yandexdataschool/nlp_course

★ 10,556 | Jupyter Notebook | Audience · developer | Complexity · 2/5 | Setup · easy

Mindmap

mindmap
  root((nlp_course))
    What it covers
      Text classification
      Machine translation
      Dialogue systems
      Text summarization
    Seminar projects
      Salary predictor
      POS tagging
      QA system
    Tech
      Python
      Jupyter Notebook
    Audience
      ML learners
      NLP beginners


Things people build with this

USE CASE 1

Build a neural-network salary predictor from job description text by following the provided seminar exercises

USE CASE 2

Train a machine translation system for hotel descriptions using sequence-to-sequence models

USE CASE 3

Implement part-of-speech tagging for text annotation tasks

USE CASE 4

Create a simple question-answering system using techniques from the later course weeks

Tech stack

Python, Jupyter Notebook

Getting it running

Difficulty · easy | Time to first run · 5 min

A Binder link lets you open all notebooks in your browser with no local installation required.

In plain English

This repository contains the lecture and seminar materials for a Natural Language Processing course taught at Yandex School of Data Analysis. Natural language processing, often called NLP, is the area of computer science and machine learning concerned with teaching computers to understand and work with human language, such as text and speech.

The course is organized into twelve weekly topics, each with a lecture and a hands-on seminar. The weekly topics move from foundational ideas to more advanced ones. Early weeks cover how to turn words into numbers that machines can work with, how to classify text into categories, and how language models work. Later weeks cover sequence-to-sequence models for tasks like machine translation, structured prediction, expectation-maximization, transfer learning, domain adaptation, dialogue systems, adversarial learning, and text summarization.

The seminar sessions are practical and include exercises like building a salary predictor using neural networks, training a machine translation system for hotel descriptions, implementing part-of-speech tagging, and building a simple question-answering system. The materials are written as Jupyter Notebooks, which combine explanatory text and runnable code in one document. A Binder link in the README lets you open the notebooks in a browser without installing anything locally.

The course was developed and taught by a team of five contributors including Elena Voita and several colleagues, all of whom contributed lectures, seminars, and homework assignments. Homework deadlines for YSDA students are tracked on a separate platform called Anytask. The repository also has a GitHub issues thread specifically for help with installing the required libraries.

This course is appropriate for people who already have some background in programming and mathematics and want to go deeper into how modern NLP systems are built. It is not an introductory course and assumes familiarity with machine learning concepts before the later weeks.
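To make the "turn words into numbers" idea from the early weeks concrete, here is a minimal bag-of-words sketch in plain Python. It is an illustration only and is not taken from the course notebooks. The function names `build_vocab` and `bag_of_words` and the sample job titles are hypothetical, and whitespace splitting stands in for a real tokenizer.

```python
from collections import Counter


def build_vocab(texts):
    """Map each distinct lowercase token to an integer index, in first-seen order."""
    vocab = {}
    for text in texts:
        for token in text.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def bag_of_words(text, vocab):
    """Return a count vector for `text`; tokens missing from `vocab` are ignored."""
    counts = Counter(text.lower().split())
    vector = [0] * len(vocab)
    for token, n in counts.items():
        if token in vocab:
            vector[vocab[token]] = n
    return vector


# Hypothetical sample data, invented for this sketch.
texts = ["senior python developer", "junior data analyst", "python data engineer"]
vocab = build_vocab(texts)
vectors = [bag_of_words(t, vocab) for t in texts]
```

Each text becomes a fixed-length list of counts, one position per vocabulary word, which is the kind of numeric input a classifier or regressor can consume. Learned word embeddings replace these sparse counts with dense vectors, but the text-to-numbers step is the same in spirit.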

Copy-paste prompts

Prompt 1

Using the yandexdataschool/nlp_course materials, walk me through how word embeddings work and help me implement one in Python for a text classification task.

Prompt 2

Help me build a salary predictor from job description text using the neural network approach covered in the yandexdataschool/nlp_course seminars.

Prompt 3

Based on the sequence-to-sequence lecture in yandexdataschool/nlp_course, help me build a simple English-to-French translation model using PyTorch.

Prompt 4

Using the transfer learning week from yandexdataschool/nlp_course, help me fine-tune a pretrained language model on my own text dataset.

Prompt 5

Help me run the yandexdataschool/nlp_course notebooks in my browser using the Binder link, and explain what I need to do for week 1.


Verify against the repo before relying on details.