Find publicly available datasets for training intrusion detection or malware classification models
Build a reading list of academic papers before starting a security ML project
Discover video talks and university courses on applying ML to cybersecurity
Locate phishing, spam, and network capture datasets for research experiments
This repository is a curated reading list focused on one specific crossover: applying machine learning to cybersecurity problems. It does not contain runnable code. Instead, it is an organized collection of links to datasets, academic papers, books, video talks, tutorials, courses, and miscellaneous tools that researchers and practitioners have found useful in this area.

The datasets section lists publicly available data that someone building a security-focused machine learning model might train or test against. These include network intrusion detection datasets, malware samples, spam corpora, phishing data, web attack payloads, and packet capture files from various universities, government labs, and research organizations.

The papers section covers academic research on topics such as detecting malicious PDF files, identifying malware through network behavior, spotting phishing domains using passive DNS data, modeling password strength with neural networks, and detecting anomalies in system logs. Several papers in Russian are also included. The list spans roughly a decade of published work, so some entries are older foundational research and others are more recent.

The books, talks, tutorials, and courses sections follow the same pattern: they point outward to external resources rather than providing content directly in this repository.

This kind of list is most useful as a starting point for someone who wants to explore the research landscape, find training data for a project, or build a reading list before going deeper into any one area. It is a reference index, not a guide with explanations or commentary on the items listed.
Repository author: jivoi.
Verify against the repo before relying on details.