facebookresearch/laser

Analysis updated 2026-07-03

★ 3,661 | Jupyter Notebook | Audience · researcher | Complexity · 3/5 | License | Setup · moderate

Mindmap

mindmap
  root((LASER))
    What it does
      Multilingual embeddings
      Cross-language matching
      Parallel sentence mining
    Models
      LASER-2 unified encoder
      LASER-3 language-specific
    Use Cases
      Translation training data
      Cross-language classification
      Speech mining
    Tech Stack
      Python
      PyTorch
      FAISS


What do people build with it?

USE CASE 1

Mine parallel sentence pairs from Wikipedia across 200 languages to build translation training datasets.

USE CASE 2

Encode multilingual product reviews into a shared vector space so you can find similar reviews across languages.

USE CASE 3

Build a cross-language document classifier that groups news articles regardless of the language they are written in.

USE CASE 4

Create speech-to-speech translation datasets by matching spoken segments across language pairs.
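The mining and matching use cases above all reduce to comparing sentence vectors across languages. The sketch below shows one simple way to do that: keep a pair only when each sentence is the other's nearest neighbour and their cosine similarity clears a threshold. This is a simplified stand-in, not the repository's own mining tool; the function names, the 0.8 threshold, and the tiny 3-dimensional vectors are all hypothetical. In real use the vectors would come from a LASER encoder, and the search would likely use FAISS rather than plain loops.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_match(query, candidates):
    """Return (index, score) of the candidate most similar to query."""
    scores = [cosine(query, c) for c in candidates]
    idx = max(range(len(scores)), key=scores.__getitem__)
    return idx, scores[idx]


def mine_pairs(src_vecs, tgt_vecs, threshold=0.8):
    """Keep (src_idx, tgt_idx, score) when both sides pick each other."""
    pairs = []
    for i, src in enumerate(src_vecs):
        j, score = best_match(src, tgt_vecs)
        # Check the reverse direction: does the target also prefer this source?
        back, _ = best_match(tgt_vecs[j], src_vecs)
        if back == i and score >= threshold:
            pairs.append((i, j, score))
    return pairs


# Toy stand-ins for sentence embeddings (hypothetical values, not LASER output).
english = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.5, 0.5, 0.5]]
german = [[0.1, 0.9, 0.2], [0.9, 0.2, 0.0]]

pairs = mine_pairs(english, german)
for src_idx, tgt_idx, score in pairs:
    print(f"english[{src_idx}] <-> german[{tgt_idx}]  cosine={score:.3f}")
```

The third English vector is dropped because its best German match already prefers a different English sentence; the mutual check is what filters out such one-sided matches.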

What is it built with?

Python, PyTorch, FAISS, Jupyter Notebook

How does it compare?

	facebookresearch/laser	datadog/go-profiler-notes	verazuo/jailbreak_llms
Stars	3,661	3,666	3,669
Language	Jupyter Notebook	Jupyter Notebook	Jupyter Notebook
Setup difficulty	moderate	easy	easy
Complexity	3/5	1/5	2/5
Audience	researcher	developer	researcher

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · moderate Time to first run · 30min

Basic use through the laser_encoders package is pip-installable; the advanced mining tools need extra dependencies such as FAISS and language-specific tokenizers.

MIT license: use freely for any purpose, including commercial use, as long as you keep the copyright notice.

In plain English

LASER is a research library from Meta AI that converts sentences into numerical representations called embeddings, with the distinguishing property that it works across more than 200 languages. The name stands for Language-Agnostic Sentence Representations. The practical consequence is that a sentence in English and its translation in French will produce embeddings that are numerically close to each other, even though the two sentences share no words. This property makes it useful for a set of tasks that require matching text across languages without a human translator involved.

The library includes tools for mining parallel sentences from large text sources like Wikipedia and the web, meaning it can automatically find pairs of sentences across different languages that say the same thing. Those mined pairs can then be used to train translation systems.

The simplest way to use it is through a pip-installable package called laser_encoders, which supports two families of models called LASER-2 and LASER-3. LASER-2 uses one encoder for all supported languages, while LASER-3 provides 147 language-specific encoders. A few lines of Python code are enough to load a model and turn a list of sentences into numerical vectors.

The full kit includes more dependencies for advanced use cases, including tools for fast nearest-neighbor search and Chinese and Japanese text segmentation. The repository also contains several research tasks showing how the embeddings have been applied, such as cross-language document classification and speech-to-speech translation mining.
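The laser_encoders workflow described above can be sketched roughly as follows. It assumes the package is installed (`pip install laser_encoders`) and that `LaserEncoderPipeline`, `encode_sentences`, and the `normalize_embeddings` option behave as described in the package README; check the current README before relying on them. The helper names `embed`, `closest_indices`, and `demo`, the example sentences, and the language codes chosen here are illustrative assumptions. Model files are downloaded on first use, so `demo()` is defined but not called.

```python
def embed(sentences, lang):
    """Encode sentences with a LASER encoder for the given language code."""
    # Imported lazily so this file still loads without laser_encoders installed.
    from laser_encoders import LaserEncoderPipeline

    encoder = LaserEncoderPipeline(lang=lang)
    # With L2-normalised rows, a dot product equals cosine similarity.
    return encoder.encode_sentences(sentences, normalize_embeddings=True)


def closest_indices(src_vecs, tgt_vecs):
    """For each source vector, the index of the target with the largest dot product."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    return [
        max(range(len(tgt_vecs)), key=lambda j: dot(src, tgt_vecs[j]))
        for src in src_vecs
    ]


def demo():
    """Match English sentences to French ones (requires a model download)."""
    english = ["The cat is sleeping.", "I like coffee."]
    french = ["J'aime le café.", "Le chat dort."]
    en_vecs = embed(english, lang="eng_Latn")  # assumed language code
    fr_vecs = embed(french, lang="fra_Latn")   # assumed language code
    for i, j in enumerate(closest_indices(en_vecs, fr_vecs)):
        print(f"{english[i]!r} -> {french[j]!r}")
```

Calling `demo()` on a machine with network access prints the French sentence whose embedding lies closest to each English one. Plain loops keep the sketch dependency-free; for more than a few thousand sentences, a vectorised or FAISS-based search would be the more practical choice.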

Copy-paste prompts

Prompt 1

Using the LASER laser_encoders Python package, write code that loads the LASER-2 model and encodes a list of English and Spanish sentences into embeddings, then finds the closest Spanish sentence for each English one using cosine similarity.

Prompt 2

How do I use LASER-3 to encode sentences in Japanese and return their vector representations? Show me the minimal Python code needed.

Prompt 3

Write a Python script that uses LASER to mine parallel sentences from two large plain-text files (one English, one German) and saves the matched pairs to a CSV file.

Prompt 4

I want to fine-tune a translation model. How do I use LASER's parallel sentence mining tools to generate training data from Common Crawl for a low-resource language pair?

Frequently asked questions

What is laser?

LASER turns sentences into numerical embeddings that work across 200+ languages, so text in English and its French translation land near each other in vector space, no translator needed.

What language is laser written in?

GitHub reports Jupyter Notebook as the main language. The stack also includes Python, PyTorch, and FAISS.

What license does laser use?

MIT license: use freely for any purpose, including commercial use, as long as you keep the copyright notice.

How hard is laser to set up?

Setup difficulty is rated moderate, with roughly 30 minutes to a first successful run.

Who is laser for?

Mainly researchers.


Verify against the repo before relying on details.