explaingit

khanhnamle1994/cracking-the-data-science-interview

4,635 | Jupyter Notebook | Audience · data | Complexity · 2/5 | Setup · easy

TLDR

A curated study kit for data science job interviews: cheatsheets on SQL, stats, ML, and deep learning, a question bank with 150 Q&As, case study prompts, and the author's own portfolio projects spanning recommendation systems, computer vision, and NLP.

Mindmap

mindmap
  root((repo))
    Cheatsheets
      SQL basics
      Stats and probability
      ML fundamentals
      Deep learning
    Question Bank
      150 common questions
      Analytics Vidhya picks
      Interview Query picks
    Case Studies
      ML system design
      Real world scenarios
    Ebooks
      Python ML books
      Finance ML
      Data science stats
    Portfolio Projects
      Recommendation systems
      Computer vision
      Tweet classification
    Data Journalism
      Published stories
      Soccer analysis


Things people build with this

USE CASE 1

Study for a data science job interview using topic-by-topic cheatsheets and a 150-question Q&A bank

USE CASE 2

Practice designing real-world ML systems with included case study prompts

USE CASE 3

Browse the author's portfolio projects as inspiration for your own data science work

USE CASE 4

Use the ebook collection as a reading list to build foundational knowledge in ML and statistics

Tech stack

Python, Jupyter Notebook, PyTorch, Keras, SQL, PDF, Machine Learning, NLP

Getting it running

Difficulty · easy | Time to first run · 5 min

No installation needed for most materials. Download or browse PDFs and notebooks directly. Running Jupyter notebooks requires Python and standard ML libraries like PyTorch or Keras.
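
Before opening the notebooks, it may help to check which libraries are already available in your environment. The following is a minimal sketch, not something shipped with the repository: the module names `torch`, `keras`, and `notebook` are assumptions based on the tech stack listed above, and individual notebooks may need other packages.

```python
import importlib.util

# Assumed module names (hypothetical list); the repository's notebooks
# may require different or additional packages.
ASSUMED_MODULES = ["torch", "keras", "notebook"]


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(ASSUMED_MODULES)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All assumed modules are importable.")
```

The check only looks up each module without importing it, so it runs quickly even when heavy libraries are installed.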

No license is mentioned in the explanation, so reuse terms are unclear.

In plain English

Cracking the Data Science Interview is a collection of study materials, practice questions, and sample projects assembled by one developer to help people prepare for data science job interviews. It is not a course or an application but a curated repository of reference files grouped into several topic areas.

The cheatsheets section covers the concepts most commonly tested in interviews: SQL for querying databases, statistics and probability, linear algebra and other mathematics, machine learning fundamentals, deep learning, supervised and unsupervised learning, computer vision, and natural language processing. Many of these are downloadable PDF summaries meant for quick review before an interview.

The ebooks section collects several books on practical data science and machine learning, including titles on Python-based machine learning, data science statistics, and applying machine learning to finance. The question bank gathers interview questions sourced from platforms like Analytics Vidhya, Interview Query, and others, including a PDF of 150 commonly asked data science questions and answers. There is also a section of case study prompts that ask candidates to reason through how they would design machine learning systems for real-world scenarios.

Beyond study materials, the repository includes the author's own portfolio of past projects. These span recommendation systems built with PyTorch and Keras, machine learning work on taxi trip optimization and grocery basket prediction, computer vision projects on clothing classification and road segmentation, tweet classification, and data analysis on topics like World Cup soccer teams and Spotify artist styles. There is also a data journalism section with published stories.

The repository is intended as both a study guide for job seekers and a portfolio reference for the author's own work.

Copy-paste prompts

Prompt 1
I am preparing for a data science interview. Using the cheatsheets in this repository as context, quiz me on 10 statistics and probability questions and then explain any I get wrong.
Prompt 2
I want to build a recommendation system similar to the PyTorch and Keras projects in this repo. Walk me through the high-level steps I would need to follow to get started.
Prompt 3
Using the ML system design case studies in this repository as a template, help me practice answering this question: how would you build a product recommendation engine for an e-commerce site?
Prompt 4
I have the 150 data science interview questions from this repository. Pick 5 SQL questions from that list and give me a hint for each one without revealing the full answer.
Prompt 5
Based on the portfolio projects in this repo covering computer vision and NLP, suggest 3 beginner-friendly project ideas I could add to my own portfolio to stand out in interviews.

Verify against the repo before relying on details.