explaingit

i-am-manware/dating-app-behavioural-analysis-for-secure-girls

14 · Jupyter Notebook · Audience · researcher · Complexity · 3/5 · Active · Setup · moderate

TLDR

Jupyter notebook pipeline that analyses 123 hand-rated male dating profiles to identify which latent factors and features predict a right swipe from five raters.

Mindmap

mindmap
  root((dating-analysis))
    Inputs
      Annotated profile dataset
      Rater scores
      Parquet files
    Outputs
      EDA figures
      Factor loadings
      ML model AUC
      Prescription table
    Use Cases
      Reproduce swipe study
      Try factor analysis on ratings data
      Practice SHAP and GAM modelling
    Tech Stack
      Python
      pandas
      scikit-learn
      XGBoost
      SHAP
      UMAP

Things people build with this

USE CASE 1

Reproduce the dating-app swipe study and inspect the 16 EDA figures

USE CASE 2

Reuse the EFA, PCA, t-SNE, and UMAP latent-variable pipeline on a similar rated dataset

USE CASE 3

Compare eight classifier baselines plus a GAM and SHAP explanations on a small dataset

USE CASE 4

Study a worked example of within-cohort behavioural data analysis with a research-style report

Tech stack

Python · pandas · scikit-learn · XGBoost · SHAP · UMAP

Getting it running

Difficulty · moderate · Time to first run · 30 min

Notebooks must be run in order because each writes parquet files the next consumes, and the stack pulls in XGBoost, LightGBM, CatBoost, PyGAM, SHAP, factor-analyzer, and UMAP.
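Because each notebook depends on parquet files written by the previous one, it can help to check the hand-offs before running a stage out of order. The sketch below is only an illustration of that idea: the notebook names and parquet file names in `PIPELINE` are hypothetical placeholders, not the repository's real file names.

```python
from pathlib import Path

# Hypothetical notebook names and parquet hand-offs, for illustration only.
# Check the repository for the real file names before using this.
PIPELINE = [
    ("01_cleaning.ipynb", [], ["clean.parquet"]),
    ("02_eda.ipynb", ["clean.parquet"], ["eda.parquet"]),
    ("03_features.ipynb", ["eda.parquet"], ["features.parquet"]),
]

def missing_inputs(notebook: str, data_dir: Path) -> list[str]:
    """List the parquet inputs a notebook needs that are not on disk yet."""
    for name, inputs, _outputs in PIPELINE:
        if name == notebook:
            return [f for f in inputs if not (data_dir / f).exists()]
    raise KeyError(f"unknown notebook: {notebook}")

# The first stage has no upstream inputs, so nothing can be missing.
print(missing_inputs("01_cleaning.ipynb", Path(".")))
```

An empty list means the stage's inputs are in place; a non-empty list names the upstream files that an earlier notebook still has to write.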

In plain English

This project is a Jupyter notebook pipeline that takes a hand-annotated dataset of 123 male dating-app profiles and looks at which features predict a right swipe. The profiles were rated by five women described in the README as securely-attached, and 23.6% of the profiles received a right swipe. The author treats the result as a within-cohort study, not a population average.

The repository is organised as five notebooks that run in order: data cleaning and parquet export, exploratory data analysis with 16 figures, deeper feature-level analysis, latent-variable analysis using exploratory factor analysis plus PCA, t-SNE and UMAP, and a final modelling notebook with eight machine-learning models, SHAP, a GAM, and a prescription table.

The headline finding reported in the README is that two latent factors, labelled Psychological Safety and Visual Appeal, account for 99.3% of the swipe decisions in this dataset. The strongest individual predictor is the rater-inferred emotional_stability score. All eight models reach an AUC of 1.0 on the held-out test set, which the author attributes to high rater agreement rather than overfitting.

Several common beliefs are reported as not supported by the data. Height shows no statistically significant correlation with swipe outcome in this sample. Shirtless photos in the sample receive a 0% swipe rate. Status correlates with swipes raw but drops to non-significant after controlling for perceived attractiveness. The README also notes that photo quality and warmth matter more than the number of photos.

To reproduce the work, the README lists Python 3.10 or newer plus pandas, scikit-learn, XGBoost, LightGBM, CatBoost, PyGAM, SHAP, factor-analyzer, UMAP, openpyxl, and pyarrow. Notebooks must be run in sequence because each one writes parquet files the next one reads. A 13-section research-style report covering the methods, findings, and limitations is included as report.md.
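The status finding (a raw correlation that fades once perceived attractiveness is controlled for) is the kind of result a partial correlation produces. The sketch below shows the first-order partial-correlation formula on synthetic, made-up numbers using only the standard library. It is not the repository's code or data, and the variable names are assumptions chosen for illustration.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / sqrt(saa * sbb)

def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Synthetic values: "status" and "appeal" both track "attractiveness"
# plus independent noise, so they are related only through it.
attractiveness = [1, 2, 3, 4, 5, 6, 7, 8]
status = [a + e for a, e in zip(attractiveness, [1, -1, -1, 1, 1, -1, -1, 1])]
appeal = [a + e for a, e in zip(attractiveness, [1, -1, 1, -1, -1, 1, -1, 1])]

raw = pearson(status, appeal)                               # strong raw correlation
controlled = partial_corr(status, appeal, attractiveness)  # close to zero
print(f"raw r = {raw:.2f}, |partial r| = {abs(controlled):.2f}")
```

In the study the outcome is a binary swipe, so the Pearson coefficient would be a point-biserial correlation, and a significance test (not shown here) would be needed to call the controlled result non-significant.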

Copy-paste prompts

Prompt 1
Walk me through the five notebooks in order and explain what parquet outputs each one produces
Prompt 2
Show me how the Psychological Safety and Visual Appeal factors are derived in the EFA notebook
Prompt 3
Adapt the modelling notebook to use a different held-out split and report AUC drop
Prompt 4
Run the SHAP analysis on the XGBoost model and surface the top five features
Prompt 5
Rewrite the prescription table cell so it exports to CSV instead of inline markdown

Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.