explaingit

amathguywhocodes/day-45-100-movies-to-watch

0 · Python · Audience · developer · Complexity · 1/5 · Active · Setup · easy

TLDR

Small Day 45 Python exercise that scrapes an archived Empire top-100 movies page with BeautifulSoup and writes the titles to movies.txt in ascending order.

Mindmap

mindmap
  root((Day-45-100-movies-to-watch))
    Inputs
      Archived Empire URL
      HTML page
    Outputs
      movies.txt
      Ordered titles list
    Use Cases
      Practice web scraping
      Build a movie watchlist
      Day-by-day learning
    Tech Stack
      Python
      BeautifulSoup
      Requests

Things people build with this

USE CASE 1

Practice BeautifulSoup by extracting an ordered list of movie titles from an HTML page.

USE CASE 2

Generate a personal movies.txt watchlist of the top 100 movies of all time.

USE CASE 3

Use an Internet Archive snapshot as a stable scraping target for a reproducible exercise.

USE CASE 4

Drop in as a day-45 milestone in a 100 day Python coding course.

Tech stack

Python · BeautifulSoup · Requests

Getting it running

Difficulty · easy · Time to first run · 30 min

No code is provided; the reader installs BeautifulSoup and requests and writes the scraper from the README brief.

License is not stated in the available content.

In plain English

This repository is a small Python exercise that asks the reader to scrape the top 100 movies of all time from a webpage and save the result to a plain text file. The output file is called movies.txt and lists the titles in ascending order, starting from one. The README gives a short example of what the first few lines should look like, with titles such as The Godfather, The Empire Strikes Back, The Dark Knight, and The Shawshank Redemption.

The stated purpose of the project is to practice using BeautifulSoup, a Python library that reads the HTML of a webpage and lets you pull pieces of data out of it. The README points to Empire's best movies list as the source, but also mentions that similar curated lists from Timeout or Stacker would work for the same exercise. There is no further code in the README, only the brief on what the script should do.

The README includes one important note about the source link. Because live websites change layout often, the project recommends pointing the scraper at a snapshot stored on the Internet Archive. A specific archived URL from May 2020 is provided so that the page structure stays the same every time the script runs. This keeps the exercise reproducible long after the original page may have been updated or moved.

The project looks like a single day of a longer learning series, judging by the repository name that includes Day 45. There is no list of dependencies, no setup script, and no test suite described in the README. A reader is expected to install BeautifulSoup and a request library on their own, fetch the archived page, find the right HTML elements that hold the movie titles, and write the ordered list to disk.

The README is sparse, and that matters for anyone arriving at this repo. There is no license file mentioned, no contribution guide, and no description of the final solution. The repository works best as a starting prompt for someone practicing web scraping in Python, rather than as a finished tool to install and run.
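
Since the repository ships no solution, the parsing and writing steps described above might look roughly like the sketch below. It is not the author's code. The sample HTML, the `h3` tag, the `title` class, the "N) Title" text format, and the countdown ordering are all assumptions made for illustration; inspect the archived page to find the real selectors. The function names `extract_titles` and `write_ascending` are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the archived page's HTML. The tag name, the class
# name, and the "N) Title" text format are assumptions, not taken from the repo.
SAMPLE_HTML = """
<html><body>
  <h3 class="title">3) The Dark Knight</h3>
  <h3 class="title">2) The Empire Strikes Back</h3>
  <h3 class="title">1) The Godfather</h3>
</body></html>
"""


def extract_titles(html, tag="h3", css_class="title"):
    """Return the title strings in the order they appear in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [node.get_text(strip=True) for node in soup.find_all(tag, class_=css_class)]


def write_ascending(titles, path="movies.txt"):
    """Write titles to `path`, one per line.

    Assumes the page lists movies as a countdown (highest rank number first),
    so reversing the list yields ascending order starting from 1.
    """
    ordered = list(reversed(titles))
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(ordered) + "\n")
    return ordered


if __name__ == "__main__":
    write_ascending(extract_titles(SAMPLE_HTML))
```

To run this against the real page, the HTML string would come from fetching the archived URL given in the README, for example with `requests.get(url).text`, instead of from `SAMPLE_HTML`.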

Copy-paste prompts

Prompt 1
Write a Python script that uses requests and BeautifulSoup to scrape the archived May 2020 Empire top 100 movies page and save the titles in ascending order to movies.txt.
Prompt 2
Adapt the scraper to also work on a Timeout or Stacker best-movies list with minimal selector changes.
Prompt 3
Add a small test that asserts movies.txt has 100 lines and the first line is The Godfather.
Prompt 4
Refactor the scraper into a function that takes an archived URL and an output filename.
Prompt 5
Walk me through which BeautifulSoup selectors to use to pull the titles from the archived Empire page.
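
As a rough idea of what Prompt 3 could produce, here is a minimal standard-library sketch of an output check. The helper name `check_watchlist` is hypothetical, and the substring match on the first line is an assumption: it allows for a rank prefix in front of the title, since the exact line format of movies.txt is not shown here.

```python
from pathlib import Path


def check_watchlist(path="movies.txt", expected_count=100, expected_first="The Godfather"):
    """Return a list of problems found in the file; an empty list means it passed."""
    text = Path(path).read_text(encoding="utf-8")
    # Ignore blank lines so a trailing newline does not affect the count.
    lines = [line.strip() for line in text.splitlines() if line.strip()]

    problems = []
    if len(lines) != expected_count:
        problems.append(f"expected {expected_count} titles, found {len(lines)}")
    # Substring match: the line may carry a rank prefix in front of the title.
    if not lines or expected_first not in lines[0]:
        problems.append(f"first line does not contain {expected_first!r}")
    return problems
```

Under a test runner such as pytest, the whole check reduces to `assert check_watchlist() == []`, and a failure message lists what went wrong.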

Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.