explaingit

data-centric-ai-community/fg-data-profiling

13,547 · Python · Audience · data · Complexity · 2/5 · Setup · easy

TLDR

fg-data-profiling (formerly ydata-profiling and pandas-profiling) generates a thorough HTML or JSON analysis report of any dataset, covering types, missing values, distributions, and correlations, with a single line of Python code.

Mindmap

mindmap
  root((repo))
    What it does
      One-line data report
      Column analysis
      Problem detection
    Output formats
      HTML report
      JSON export
      Notebook widget
    Data types handled
      Numeric columns
      Text and dates
      File and image cols
    Audience
      Data scientists
      Data engineers
      Analysts

Things people build with this

USE CASE 1

Generate a one-command HTML report summarizing every column in a CSV dataset, including missing values, distributions, and correlations.

USE CASE 2

Compare two versions of the same dataset side by side to spot what changed between data snapshots.

USE CASE 3

Profile a large distributed dataset using Spark without loading it all into memory on a single machine.

USE CASE 4

Embed an interactive profiling widget directly inside a Jupyter Notebook for fast exploratory data analysis.

Tech stack

Python · Pandas · Spark

Getting it running

Difficulty · easy · Time to first run · 5 min

Previously called ydata-profiling and, before that, pandas-profiling. Update imports if migrating from an older version.
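For a codebase with many files, the import swap can be scripted. What follows is a minimal sketch under stated assumptions, not the migration guide from the README: `rewrite_imports` is a hypothetical helper, and the new module name is passed in as a parameter because the exact import name should be taken from the README. `pandas_profiling` and `ydata_profiling` are the import names of the two earlier packages.

```python
import re

# Import names used by the two earlier packages.
OLD_MODULES = ("pandas_profiling", "ydata_profiling")

# Matches the start of `import X` / `from X ...` lines for the old modules.
_IMPORT_RE = re.compile(
    r"^([ \t]*(?:from|import)[ \t]+)(?:" + "|".join(OLD_MODULES) + r")\b",
    re.MULTILINE,
)


def rewrite_imports(source: str, new_module: str) -> str:
    """Hypothetical helper: point old import lines at `new_module`."""
    return _IMPORT_RE.sub(lambda m: m.group(1) + new_module, source)


old_code = (
    "from ydata_profiling import ProfileReport\n"
    "import pandas_profiling as pp\n"
)

# "NEW_MODULE" is a placeholder, not the real import name; check the README.
new_code = rewrite_imports(old_code, "NEW_MODULE")
```

Only import lines are rewritten. Code that refers to the old module by name elsewhere (for example `ydata_profiling.ProfileReport(...)` after a bare `import ydata_profiling`) still needs a manual update.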

In plain English

fg-data-profiling is a Python library that produces a detailed analysis report of a dataset with a single line of code. You load a table of data into a standard Python data structure called a DataFrame, run one command, and get back a thorough breakdown of every column, covering data types, missing values, duplicate rows, statistical summaries, and visualizations. The report can be exported as an HTML file you can open in a browser, as JSON for use in automated systems, or as an interactive widget inside a Jupyter Notebook.

The library handles several types of data automatically. For numeric columns it computes averages, medians, and distributions. For text columns it identifies character patterns and scripts. For date and time columns it detects seasonality and auto-correlation patterns. It also handles file and image columns by reporting file sizes, creation dates, and image dimensions.

It automatically flags potential problems in the data, such as columns that are almost entirely empty, values that are heavily skewed to one side, or columns that are nearly identical to each other.

One common use case is comparing two versions of the same dataset side by side, which the library supports with the same one-line approach. It also scales to large datasets through Spark support, allowing the same profiling workflow on distributed data rather than only on data that fits on a single machine.

The package was previously called ydata-profiling and before that pandas-profiling. It was recently renamed to fg-data-profiling under new stewardship by the Data-Centric AI Community. If you have older code that imports ydata-profiling, the README includes a short migration guide showing how to swap the package name and update import statements. The old package will no longer receive updates.
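To make the problem flags described above concrete (mostly empty columns, heavily skewed values, nearly identical columns), here is a rough standard-library sketch of that kind of check. It is not the library's implementation: the function names, the alert labels, the thresholds, and the use of moment-based skewness and a simple value-match ratio are all assumptions chosen for illustration.

```python
from itertools import combinations
from statistics import fmean


def missing_fraction(values):
    """Share of entries that are None."""
    return sum(v is None for v in values) / len(values)


def skewness(xs):
    """Moment-based skewness m3 / m2**1.5 (0.0 for constant data)."""
    mean = fmean(xs)
    m2 = fmean([(x - mean) ** 2 for x in xs])
    m3 = fmean([(x - mean) ** 3 for x in xs])
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def match_fraction(a, b):
    """Share of row positions where two columns hold equal values."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def find_alerts(columns, missing_limit=0.9, skew_limit=2.0, match_limit=0.95):
    """Return (alert_kind, column) pairs; all thresholds are illustrative."""
    alerts = []
    for name, values in columns.items():
        if missing_fraction(values) >= missing_limit:
            alerts.append(("mostly_missing", name))
            continue
        numeric = [v for v in values if isinstance(v, (int, float))]
        if len(numeric) >= 3 and abs(skewness(numeric)) >= skew_limit:
            alerts.append(("skewed", name))
    for a, b in combinations(columns, 2):
        if match_fraction(columns[a], columns[b]) >= match_limit:
            alerts.append(("near_duplicate", f"{a}~{b}"))
    return alerts


# Toy table stored as column name -> list of values (made-up data).
columns = {
    "age": [30, 31, 29, 32, 30, 31, 29, 30, 31, 30],
    "notes": [None] * 9 + ["x"],
    "income": [1, 1, 1, 1, 1, 1, 1, 1, 1, 100],
    "age_copy": [30, 31, 29, 32, 30, 31, 29, 30, 31, 30],
}
alerts = find_alerts(columns)
```

On this toy table, `notes` is flagged as mostly missing, `income` as skewed because of its single large value, and `age` and `age_copy` as near duplicates. A real profiler would typically use more robust statistics and type inference than this sketch.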

Copy-paste prompts

Prompt 1
Using fg-data-profiling, write Python code to load a CSV file into a pandas DataFrame and generate an HTML profiling report saved to disk.
Prompt 2
Show me how to use fg-data-profiling to compare two DataFrames, a training set and a test set, and highlight the differences.
Prompt 3
Write Python code using fg-data-profiling to profile a large Spark DataFrame and export the results as a JSON file.
Prompt 4
How do I display an interactive fg-data-profiling report as an inline widget inside a Jupyter Notebook?
Prompt 5
I have code that imports ydata-profiling. Show me exactly how to migrate it to fg-data-profiling.

Verify against the repo before relying on details.