Identify high-risk patients before appointment day so staff can send SMS reminders or make phone calls to reduce no-shows.
Adjust clinic daily schedules proactively by accounting for predicted absences and reducing wasted appointment slots.
Understand which factors drive each individual patient's no-show risk using SHAP explanations, not just a raw risk number.
Use as a portfolio reference or starting point for a custom clinic no-show prediction system tailored to your own data.
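The first two use cases above can be illustrated with a minimal sketch. It assumes a model has already produced a no-show probability per appointment. The `Appointment` class, the `plan_outreach` helper, the 0.5 threshold, and the sample scores are all hypothetical and do not come from the repository.

```python
from dataclasses import dataclass


@dataclass
class Appointment:
    patient_id: str
    no_show_risk: float  # predicted probability of a no-show, in [0, 1]


def plan_outreach(appointments, threshold=0.5):
    """Return (patient IDs to contact, expected number of no-shows).

    Patients at or above the threshold are listed highest risk first.
    The expected no-show count is the sum of all predicted probabilities,
    which is only meaningful if the model's scores are well calibrated.
    """
    flagged = sorted(
        (a for a in appointments if a.no_show_risk >= threshold),
        key=lambda a: a.no_show_risk,
        reverse=True,
    )
    expected_no_shows = sum(a.no_show_risk for a in appointments)
    return [a.patient_id for a in flagged], expected_no_shows


# Made-up scores for one clinic day (illustrative values only).
day = [
    Appointment("A", 0.82),
    Appointment("B", 0.10),
    Appointment("C", 0.55),
    Appointment("D", 0.03),
]
contact, expected = plan_outreach(day)
print("Contact first:", contact)
print("Expected no-shows:", round(expected, 2))
```

In practice the threshold would be chosen from staff capacity and the relative cost of a missed slot versus an unnecessary reminder, not fixed at 0.5.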
The dataset is not included; download it separately from Kaggle. Install the Python dependencies, then run the Jupyter notebook for exploration or the standalone scripts for training and SHAP explanations.
This project builds a machine learning system that predicts, before appointment day, which patients are likely to miss their medical appointments. It works from a dataset of over 110,000 appointment records and is designed to give clinic staff early warning so they can act, for example by sending reminder texts, calling high-risk patients, or adjusting the day's schedule to account for expected absences.
The core of the project trains two types of prediction model, LightGBM and XGBoost, both established gradient-boosting tools for this kind of classification problem. The models take in information about each appointment, including how far in advance it was booked, whether the patient received an SMS reminder, and health markers such as hypertension or diabetes, and output a risk score for the patient skipping the visit.
A notable feature is the inclusion of SHAP explanations. SHAP is a technique that shows not only whether the model flagged a patient as high-risk, but also which specific factors drove that prediction for that individual appointment. This matters in a clinical context because staff and administrators generally need to understand the reasoning behind a prediction, not just act on an opaque number.
The repository includes an exploratory Jupyter notebook for analysis and experimentation, along with separate Python scripts for the data processing, model training, and explanation steps. The dataset itself is not bundled in the repo; the README points to a public Kaggle download.
This appears to be a portfolio and consulting showcase project from a healthcare data scientist rather than a production-ready system. The code is organized cleanly and includes standard installation steps, but some sections of the README read as promotional material directed at potential clinic clients.
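To make the idea behind the SHAP explanations concrete, here is a from-scratch sketch of exact Shapley values for a single appointment. It is not the `shap` library's API and not the repository's code. The `toy_risk` function is a hypothetical hand-written stand-in for a trained LightGBM or XGBoost model, and the feature names and coefficients are assumptions chosen for illustration.

```python
from itertools import permutations


def toy_risk(features):
    """Hypothetical risk score standing in for a trained model."""
    score = 0.10 + 0.01 * features["lead_days"]
    if not features["sms_received"]:
        score += 0.05
    # Interaction: a long lead time with no reminder adds extra risk.
    if features["lead_days"] > 14 and not features["sms_received"]:
        score += 0.10
    if features["hypertension"]:
        score -= 0.02
    return score


def shapley_values(model, instance, baseline):
    """Exact Shapley values by averaging over every feature ordering.

    Starting from the baseline, features are switched to the instance's
    values one at a time; each feature is credited with the change in
    model output it causes, averaged over all orderings. This is only
    feasible for a handful of features.
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = instance[name]
            updated = model(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}


# Illustrative reference patient and the appointment being explained.
baseline = {"lead_days": 5, "sms_received": True, "hypertension": False}
patient = {"lead_days": 30, "sms_received": False, "hypertension": True}

contributions = shapley_values(toy_risk, patient, baseline)
for name, value in contributions.items():
    print(f"{name}: {value:+.3f}")
```

The per-feature contributions sum to the difference between the patient's score and the baseline score, which is the property that lets staff read them as a breakdown of one patient's risk. Libraries such as `shap` use much faster algorithms for tree models, since enumerating every ordering grows factorially with the number of features.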
← nudratds on gitmyhub — every repo by this author, as a profile.
Verify against the repo before relying on details.