Build a neural-network salary predictor from job description text by following the provided seminar exercises
Train a machine translation system for hotel descriptions using sequence-to-sequence models
Implement part-of-speech tagging for text annotation tasks
Create a simple question-answering system using techniques from the later course weeks
A Binder link lets you open all notebooks in your browser with no local installation required.
This repository contains the lecture and seminar materials for a Natural Language Processing course taught at Yandex School of Data Analysis. Natural language processing, often called NLP, is the area of computer science and machine learning concerned with teaching computers to understand and work with human language, such as text and speech.
The course is organized into twelve weekly topics, each with a lecture and a hands-on seminar. The weekly topics move from foundational ideas to more advanced ones. Early weeks cover how to turn words into numbers that machines can work with, how to classify text into categories, and how language models work. Later weeks cover sequence-to-sequence models for tasks like machine translation, structured prediction, expectation-maximization, transfer learning, domain adaptation, dialogue systems, adversarial learning, and text summarization.
The seminar sessions are practical and include exercises like building a salary predictor using neural networks, training a machine translation system for hotel descriptions, implementing part-of-speech tagging, and building a simple question-answering system. The materials are written as Jupyter Notebooks, which combine explanatory text and runnable code in one document. A Binder link in the README lets you open the notebooks in a browser without installing anything locally.
The course was developed and taught by a team of five contributors including Elena Voita and several colleagues, all of whom contributed lectures, seminars, and homework assignments. Homework deadlines for YSDA students are tracked on a separate platform called Anytask. The repository also has a GitHub issues thread specifically for help with installing the required libraries.
This course is appropriate for people who already have some background in programming and mathematics and want to go deeper into how modern NLP systems are built. It is not an introductory course and assumes familiarity with machine learning concepts before the later weeks.
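As a rough illustration of what "turning words into numbers" can mean in the early weeks, here is a minimal bag-of-words sketch in plain Python. It is not taken from the course notebooks; the function names `build_vocabulary` and `bag_of_words` and the sample texts are hypothetical, and the seminars may well use different techniques such as learned word embeddings.

```python
from collections import Counter

# Hypothetical helper names; this is an illustrative sketch, not course code.

def build_vocabulary(texts):
    """Map each distinct lowercase token to a column index, in sorted order."""
    tokens = sorted({tok for text in texts for tok in text.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}

def bag_of_words(text, vocab):
    """Return a count vector for `text`; tokens outside the vocabulary are ignored."""
    counts = Counter(text.lower().split())
    # Dicts preserve insertion order, so columns follow the sorted vocabulary.
    return [counts.get(tok, 0) for tok in vocab]

# Made-up job titles, standing in for job description text.
docs = ["senior python developer", "junior python tester"]
vocab = build_vocabulary(docs)
vector = bag_of_words("senior senior developer", vocab)
print(vocab)   # {'developer': 0, 'junior': 1, 'python': 2, 'senior': 3, 'tester': 4}
print(vector)  # [1, 0, 0, 2, 0]
```

A count vector like this discards word order, which is one reason the later weeks move on to sequence models.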
Verify against the repo before relying on details.