Data Sources
Our primary dataset is the Federal Judicial Center (FJC) Integrated Database, which contains 4.9 million bankruptcy cases filed from 2008 through 2024 across all 94 federal judicial districts. To our knowledge, it is the most comprehensive public bankruptcy dataset available.
We supplement with PACER docket data for case-level analysis, Census Bureau demographic data for population adjustments, and Bureau of Labor Statistics data for economic context.
Processing Pipeline
Raw FJC data is loaded into a SQLite database and normalized for consistent analysis. We standardize court identifiers, clean attorney names, resolve district variations, and validate data integrity.
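The load-and-normalize step described above can be sketched in Python. This is a minimal illustration, not our production pipeline: the table name `cases`, its columns, and the helper names are hypothetical, and the cleaning rules shown (upper-casing, whitespace collapsing, dropping an "Esq." suffix) are assumed examples of normalization rather than the full rule set.

```python
import re
import sqlite3


def normalize_district(raw):
    """Strip whitespace and upper-case so variants of one court code match."""
    return re.sub(r"\s+", "", raw).upper()


def clean_attorney_name(raw):
    """Collapse whitespace, drop a trailing 'Esq.' suffix, and upper-case."""
    name = re.sub(r"\s+", " ", raw).strip()
    name = re.sub(r",?\s*esq\.?$", "", name, flags=re.IGNORECASE)
    return name.upper()


def validate(row):
    """Reject rows that fail basic integrity checks instead of repairing them."""
    if not row.get("case_id"):
        raise ValueError("missing case_id")
    if int(row["discharged"]) not in (0, 1):
        raise ValueError(f"bad discharged flag in case {row['case_id']}")


def load_cases(conn, rows):
    """Insert normalized rows into a hypothetical `cases` table; return the count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cases ("
        "case_id TEXT PRIMARY KEY, district TEXT NOT NULL, "
        "chapter INTEGER NOT NULL, attorney TEXT, discharged INTEGER NOT NULL)"
    )
    cleaned = []
    for row in rows:
        validate(row)
        attorney = row.get("attorney")
        cleaned.append((
            row["case_id"],
            normalize_district(row["district"]),
            int(row["chapter"]),
            # A missing attorney stays NULL; nothing is imputed.
            clean_attorney_name(attorney) if attorney else None,
            int(row["discharged"]),
        ))
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT OR REPLACE INTO cases VALUES (?, ?, ?, ?, ?)", cleaned
        )
    return len(cleaned)
```

Calling `load_cases(sqlite3.connect(":memory:"), rows)` with a list of dictionaries is enough to try the sketch. Invalid rows raise an error rather than being silently corrected, which keeps the database consistent with the raw source.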
Our analysis scripts are open-source Python, available on GitHub. Every finding can be independently verified by running the same scripts against the same data.
Statistical Methods
We use standard statistical methods: completion rates (cases reaching discharge divided by total cases filed), comparative analysis across districts and time periods, outlier detection (Z-scores and percentile rankings), and regression analysis to identify factors correlated with case outcomes.
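As a rough illustration of the completion-rate and outlier calculations, the sketch below uses only the Python standard library. The function names are hypothetical, the Z-score cutoff of 2.0 is an assumed default rather than a threshold we have documented, and the percentile rank is defined here as the share of values at or below the target, which is one of several common conventions.

```python
from statistics import mean, pstdev


def completion_rate(discharged, filed):
    """Cases reaching discharge divided by total cases filed."""
    if filed <= 0:
        raise ValueError("filed must be positive")
    return discharged / filed


def z_scores(rates):
    """Map each key (e.g. a district) to its Z-score across all rates."""
    values = list(rates.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return {key: 0.0 for key in rates}
    return {key: (value - mu) / sigma for key, value in rates.items()}


def percentile_rank(rates, key):
    """Percentage of rates that are less than or equal to the rate for `key`."""
    target = rates[key]
    at_or_below = sum(1 for value in rates.values() if value <= target)
    return 100.0 * at_or_below / len(rates)


def flag_outliers(rates, threshold=2.0):
    """Return keys whose absolute Z-score meets the (assumed) threshold."""
    scores = z_scores(rates)
    return sorted(key for key, z in scores.items() if abs(z) >= threshold)
```

A flagged district is only a candidate for closer review. A large Z-score says the rate is unusual relative to its peers, not why it is unusual.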
We do not impute missing data, extrapolate beyond the dataset, or make causal claims without appropriate statistical support.
Reproducibility
Every analysis we publish includes the specific SQL queries or Python scripts used, the dataset version and date range, the statistical methods applied, and the complete results (not just favorable findings). Our code is on GitHub for anyone to review and replicate.
Limitations
Our data has limitations that we acknowledge. FJC data covers only filed cases, so we cannot observe people who needed bankruptcy but did not file. PACER data requires manual or automated collection, so not all dockets are in our database. Attorney quality is inferred from outcomes, which establishes correlation, not causation.
Last updated: April 2026. Not legal advice.
Part of the Bankruptcy Transparency Network