About This Project

The Epstein Files platform is a public-interest research tool for searching, analyzing, and cross-referencing the publicly released documents from the Jeffrey Epstein case. All source data comes from official releases by the U.S. Department of Justice and congressional committees.

Purpose

Between 2023 and 2025, the U.S. Department of Justice released more than 1.3 million documents related to the Jeffrey Epstein investigation across 12 datasets. These documents include court filings, FBI records, bank statements, flight logs, email archives, and detention records, among other materials.

The sheer volume of these releases makes manual review impractical. This platform was built to make these public records searchable and analyzable. Every document has had its text extracted (with OCR for scanned pages), then been indexed and cross-referenced using entity recognition to surface connections that would otherwise remain buried in more than a million PDFs.

This is a transparency and accountability project. The documents are public record. This tool simply makes them accessible.

Platform Statistics

3.13M

Document chunks indexed

1.38M

PDF documents processed

12

Datasets from DOJ releases

1.44M

Knowledge graph nodes

6.8M+

Entity relationships mapped

311

Individuals profiled

186K+

Photos classified

2,420

Flight records cleaned

Source Datasets

All datasets originate from official DOJ releases and congressional committee publications.

Dataset	Files	Content
DS1	3,158	Court filings, FBI records, and investigative documents
DS2	574	Financial documents and bank records
DS3	67	Metropolitan Correctional Center (MCC) detention records
DS4	152	Communications and correspondence
DS5	120	Additional court filings
DS6	13	Supplemental materials
DS7	17	Supplemental materials
DS8	10,595	FBI investigative records
DS9	531,307	Bank records, JPMorgan correspondence
DS10	503,154	Aviation and flight records
DS11	331,655	Email archives
DS12	152	Additional FBI files
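
As a quick consistency check, the per-dataset file counts in the table can be totaled and compared with the "1.38M PDF documents processed" figure under Platform Statistics. The short sketch below only re-adds the numbers listed above; the variable names are illustrative and not part of the platform.

```python
# File counts copied from the Source Datasets table (DS1 through DS12).
file_counts = {
    "DS1": 3_158,
    "DS2": 574,
    "DS3": 67,
    "DS4": 152,
    "DS5": 120,
    "DS6": 13,
    "DS7": 17,
    "DS8": 10_595,
    "DS9": 531_307,
    "DS10": 503_154,
    "DS11": 331_655,
    "DS12": 152,
}

total = sum(file_counts.values())

# The three largest datasets and their combined share of all files.
largest = sorted(file_counts, key=file_counts.get, reverse=True)[:3]
share = sum(file_counts[name] for name in largest) / total

print(f"{total:,} files across {len(file_counts)} datasets")
print(f"{', '.join(largest)} hold {share:.1%} of all files")
```

The total comes to 1,380,964 files, which matches the rounded 1.38M figure, and almost all of the volume sits in DS9, DS10, and DS11.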

Features

Full-Text Search

Search 3.13M indexed document chunks with faceted filtering by dataset, relevance scoring, and highlighted results.

AI Chat

Ask questions in natural language. Hybrid keyword and semantic search retrieves relevant documents to answer queries.

Crime Board

311 profiled individuals categorized by role: victims, accomplices, associates, legal figures, staff, and financials.

People Browser

Browse all entities extracted via NER: persons, organizations, locations, emails, and phone numbers with document connections.

Knowledge Graph

3D interactive graph visualization of entity relationships with depth control, type filters, and weighted edges.

Evidence Browser

Browse key evidence documents organized by category and significance.

Timeline

168 corpus-verified events from 1953 to present with activity heatmap drill-down.

Key Facts

318 verified facts across 10 categories, each linked to source documents.

Network Analysis

Entity reports, relationship discovery, clustering analysis, and CSV export.

Flight Logs

2,420 cleaned flight records with route map, top passengers, airports, and statistics.

Email Inbox

Browse emails across all datasets with parsed headers and dataset facets.

Gallery

186,000+ classified photos, 1,300 videos, and audio files. CLIP-categorized with advanced filtering.

Artworks

Documented artworks from Epstein's properties including the townhouse collection and commissioned pieces.

Black Book

1,971 contacts from Epstein's little black book with phone numbers and addresses.

Birthday Book

238-page commemorative album assembled by Ghislaine Maxwell for Epstein's 50th birthday.

Code Words

Reference guide to coded language, aliases, and euphemisms found in the documents.

Datasets

Dashboard showing all 12 datasets with file counts, extraction status, and pipeline statistics.

Deleted Files

Tracker for deleted and removed files across all datasets with recovery status.

Financial Network

Financial network analysis showing money flows, wire transfers, and account relationships.

Properties

Interactive map and details of Epstein's properties worldwide with associated documents.

Technical Details

The extraction pipeline runs in stages: hidden text is recovered from PDF layers, scanned pages are OCR-processed with FireRed-OCR, photos are classified with CLIP ViT-L/14, and named entities are extracted with GLiNER across all 3.13M+ chunks.
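
The first two stages amount to a fallback: use the text already embedded in the PDF when there is enough of it, and run OCR only when there is not. The sketch below illustrates that idea under stated assumptions. The function names, the `min_chars` threshold, and the word-based chunking are hypothetical and are not taken from the platform's code; the text-layer reader and OCR engine are passed in as plain callables, so no real library API is implied.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PageText:
    text: str
    source: str  # "text_layer" or "ocr"


def extract_page(
    page: bytes,
    read_text_layer: Callable[[bytes], Optional[str]],
    run_ocr: Callable[[bytes], str],
    min_chars: int = 20,  # assumed threshold, not a documented value
) -> PageText:
    """Prefer embedded text; fall back to OCR when too little is recovered."""
    embedded = (read_text_layer(page) or "").strip()
    if len(embedded) >= min_chars:
        return PageText(embedded, "text_layer")
    return PageText(run_ocr(page).strip(), "ocr")


def chunk_text(text: str, max_chars: int = 1000) -> list[str]:
    """Group whole words into chunks of roughly max_chars characters.

    Words are never split, so a single word longer than max_chars
    becomes its own oversized chunk.
    """
    chunks: list[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > max_chars and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


# Stand-ins for a real PDF text-layer reader and a real OCR engine.
def stub_text_layer(page: bytes) -> Optional[str]:
    return page[4:].decode() if page.startswith(b"TXT:") else None


def stub_ocr(page: bytes) -> str:
    return "text recognized from a scanned page image"


digital = extract_page(b"TXT:" + b"embedded text layer content", stub_text_layer, stub_ocr)
scanned = extract_page(b"\x89scan-bytes", stub_text_layer, stub_ocr)
print(digital.source, scanned.source)
print(chunk_text("aa bb cc", max_chars=5))
```

Keeping the reader and OCR engine as injected callables lets the fallback logic be tested without a GPU or an inference server.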

Entity data is stored in a Neo4j knowledge graph (1.44M nodes, 6.8M+ relationships), which enables network analysis and relationship discovery. Full-text search is powered by OpenSearch with 3.13M indexed chunks. Semantic search uses 5.3M vector embeddings in Qdrant. Document metadata and ground-truth statistics are maintained in PostgreSQL.
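
The AI Chat feature combines keyword results (OpenSearch) with semantic results (Qdrant), but this page does not say how the two ranked lists are merged. One common approach is reciprocal rank fusion (RRF), sketched here purely as an illustration. The function name, the document IDs, and the constant `k=60` are assumptions, not the platform's actual method.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_n=10):
    """Merge several ranked lists of document IDs into one list.

    Each document scores 1 / (k + rank) for every list it appears in,
    so items ranked well by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for stable output.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ordered[:top_n]]


# Hypothetical hits from the keyword index and the vector index.
keyword_hits = ["doc-a", "doc-b", "doc-c"]
semantic_hits = ["doc-b", "doc-d", "doc-a"]

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused)  # ['doc-b', 'doc-a', 'doc-d', 'doc-c']
```

RRF uses only rank positions, so it avoids having to normalize keyword relevance scores and vector similarity scores onto a shared scale.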

Frontend: Next.js + React + Tailwind

Backend: FastAPI + Python

Search: OpenSearch 2.19

Graph DB: Neo4j 5.27

SQL DB: PostgreSQL 16

OCR: FireRed-OCR (2B, SOTA)

Classification: CLIP ViT-L/14

NER: GLiNER

Vectors: Qdrant (5.3M embeddings)

Inference: vLLM (FireRed-OCR)

CDN: Cloudflare

Disclaimer

This platform is a research tool for publicly available documents. All source material comes from official government releases. The platform does not host, generate, or distribute any illegal content.

Some images in the original documents were not properly redacted by the releasing agencies. Where identified, these images have been censored pending review. If you encounter content that should be redacted, please report it.

Inclusion of any individual's name in these documents does not imply guilt or wrongdoing. Many people appear in the records as witnesses, victims, legal representatives, or incidental mentions. The profiles on the Crime Board reflect what the documents contain, not conclusions of culpability.

Entity extraction (names, organizations, locations) is automated using NLP models and may contain OCR errors or misattributions. Always verify against the source documents.