Our Methodology: How We Analyze 4.9 Million Cases

Data Sources

Our primary dataset is the Federal Judicial Center (FJC) Integrated Database, containing 4.9 million bankruptcy cases filed from 2008 through 2024 across all 94 federal judicial districts. To our knowledge, it is the most comprehensive public bankruptcy dataset available.

We supplement with PACER docket data for case-level analysis, Census Bureau demographic data for population adjustments, and Bureau of Labor Statistics data for economic context.

Processing Pipeline

Raw FJC data is loaded into a SQLite database and normalized so that records can be compared consistently. We standardize court identifiers, clean attorney names, reconcile variations in how districts are recorded, and validate data integrity.
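As an illustration only (this is not the project's published code), the loading and normalization steps might look like the sketch below. The table layout, column names, and helper functions are all hypothetical, and the cleaning rules are deliberately simple assumptions.

```python
import sqlite3

def normalize_court_id(raw):
    """Collapse variants such as ' ca-n ' and 'CAN' into one code (assumed rule)."""
    return "".join(ch for ch in raw.upper() if ch.isalnum())

def clean_attorney_name(raw):
    """Build a matching key: trim, collapse repeated whitespace, uppercase."""
    return " ".join(raw.split()).upper()

def load_cases(conn, rows):
    """Insert (case_id, court, attorney, chapter) tuples after normalizing them."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS cases (
               case_id  TEXT PRIMARY KEY,
               court_id TEXT NOT NULL,
               attorney TEXT,
               chapter  INTEGER
           )"""
    )
    cleaned = [
        (
            case_id,
            normalize_court_id(court),
            clean_attorney_name(attorney) if attorney else None,
            chapter,
        )
        for case_id, court, attorney, chapter in rows
    ]
    conn.executemany("INSERT OR REPLACE INTO cases VALUES (?, ?, ?, ?)", cleaned)
    conn.commit()

def count_integrity_problems(conn):
    """Count rows with an empty court code or a missing chapter."""
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM cases WHERE court_id = '' OR chapter IS NULL"
    ).fetchone()
    return n
```

Calling `load_cases(sqlite3.connect(":memory:"), rows)` with a handful of hand-written rows is enough to try the sketch without touching real data.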

Our analysis scripts are open-source Python, available on GitHub. Every finding can be independently verified by running the same scripts against the same data.

Statistical Methods

We use standard statistical methods: completion rates (cases reaching discharge divided by total cases filed), comparative analysis across districts and time periods, outlier detection (Z-scores and percentile rankings), and regression analysis to identify factors correlated with case outcomes.
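To make the first few methods concrete, here is a minimal sketch using only the Python standard library. It is not the project's own code: the function names, the use of the population standard deviation, the "strictly below" definition of percentile rank, and the outlier threshold of 2.0 are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev

def completion_rate(discharged, filed):
    """Share of filed cases that reached discharge."""
    if filed <= 0:
        raise ValueError("filed must be positive")
    return discharged / filed

def z_scores(values):
    """Distance of each value from the mean, in population standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def percentile_ranks(values):
    """Percentage of values strictly below each value."""
    n = len(values)
    return [100.0 * sum(1 for other in values if other < v) / n for v in values]

def flag_outliers(rates_by_district, threshold=2.0):
    """Return district names whose rate has |Z| at or above the threshold."""
    names = list(rates_by_district)
    scores = z_scores([rates_by_district[name] for name in names])
    return [name for name, z in zip(names, scores) if abs(z) >= threshold]
```

With only a few districts a Z-score threshold is rarely reached, which is one reason to report percentile rankings alongside it.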

We do not impute missing data, extrapolate beyond the dataset, or make causal claims without appropriate statistical support.

Reproducibility

Every analysis we publish includes the specific SQL queries or Python scripts used, the dataset version and date range, the statistical methods applied, and the complete results (not just favorable findings). Our code is on GitHub for anyone to review and replicate.
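One way to bundle those items with a published analysis is a small machine-readable manifest. The sketch below is hypothetical: the field names, the choice of SHA-256 to fingerprint the dataset, and the JSON layout are assumptions, not a description of the project's actual files.

```python
import hashlib
import json

def sha256_of_chunks(chunks):
    """Fingerprint a dataset supplied as an iterable of byte chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()

def build_manifest(dataset_chunks, dataset_version, date_range, scripts, methods):
    """Collect the provenance details a reader needs to rerun an analysis."""
    return {
        "dataset_sha256": sha256_of_chunks(dataset_chunks),
        "dataset_version": dataset_version,
        "date_range": list(date_range),
        "scripts": list(scripts),
        "methods": list(methods),
    }

def manifest_to_json(manifest):
    """Serialize with sorted keys so that diffs between versions stay stable."""
    return json.dumps(manifest, indent=2, sort_keys=True)
```

For a file on disk, the chunks can be produced with `iter(lambda: f.read(1 << 20), b"")` on a file opened in binary mode, so the whole dataset never has to sit in memory at once.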

Limitations

Our data has limitations, which we acknowledge. FJC data covers only filed cases, so we cannot observe people who needed bankruptcy but did not file. PACER data requires manual or automated collection, so not all dockets are in our database. Attorney quality is inferred from outcomes, which shows correlation, not causation.

Frequently Asked Questions

Where does your data come from?

Our primary source is the Federal Judicial Center (FJC) Integrated Database, containing 4.9 million bankruptcy cases from 2008-2024 across all 94 federal judicial districts. We supplement it with PACER docket data, Census Bureau demographic data, and Bureau of Labor Statistics data.

Can I verify your findings?

Yes. All our analysis scripts are open-source on GitHub. You can download the FJC data, run our scripts, and reproduce any finding.

Do you accept corrections?

Absolutely. If you find an error in our data or analysis, contact us at research@openbankruptcyproject.org. We issue corrections promptly.

How often is the data updated?

We update FJC data annually and PACER data on an ongoing basis. Specific analyses note their data vintage.
