Intelligence Dashboard
Aggregated intelligence from 2M+ document chunks across 12 datasets
Key Documents
Investigative Terms
Most Connected Documents
Ranked by number of linked entities (persons, orgs, emails)
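A minimal sketch of this ranking, assuming each document carries the sets of persons, organizations, and email addresses extracted from it. The `Document` record and `most_connected` helper are hypothetical; the dashboard's actual data model is not described here.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    # Hypothetical record: a document ID plus the entities extracted from it.
    doc_id: str
    persons: set[str] = field(default_factory=set)
    orgs: set[str] = field(default_factory=set)
    emails: set[str] = field(default_factory=set)

    def entity_count(self) -> int:
        # Number of distinct linked entities across the three entity types.
        return len(self.persons) + len(self.orgs) + len(self.emails)


def most_connected(docs: list[Document], top_n: int = 10) -> list[tuple[str, int]]:
    """Return (doc_id, entity_count) pairs, most-linked documents first."""
    # Sort by descending entity count; break ties by doc_id for stable output.
    ranked = sorted(docs, key=lambda d: (-d.entity_count(), d.doc_id))
    return [(d.doc_id, d.entity_count()) for d in ranked[:top_n]]
```

Breaking ties on the document ID keeps the ranking stable between page loads when several documents link the same number of entities.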
Strongest Connections
Entity pairs co-occurring in the most documents
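Pair co-occurrence can be counted by enumerating every unordered entity pair within each document and tallying how many documents contain it. This is a sketch assuming a hypothetical mapping from document ID to the set of entities linked to that document; the dashboard's real implementation may differ.

```python
from collections import Counter
from itertools import combinations


def top_cooccurring_pairs(
    doc_entities: dict[str, set[str]], top_n: int = 10
) -> list[tuple[tuple[str, str], int]]:
    """Count, for every entity pair, how many documents mention both."""
    pair_counts: Counter = Counter()
    for entities in doc_entities.values():
        # Sorting gives each unordered pair one canonical key, e.g. ("A", "B")
        # and never ("B", "A"); using a set means a pair counts once per document.
        for pair in combinations(sorted(entities), 2):
            pair_counts[pair] += 1
    # Highest document count first; ties broken alphabetically by pair.
    return sorted(pair_counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
```

For a document with n linked entities this generates n(n-1)/2 pairs, so a production pipeline over millions of chunks would likely cap or pre-filter entities per document.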
What are the Epstein Files?
The Epstein Files are court documents, financial records, flight logs, communications, and other evidence related to Jeffrey Epstein and associated individuals, released by the DOJ and various courts. They span 12 distinct datasets totaling over 1.3 million documents.
Search tips
Keyword mode uses full-text matching with fuzzy typo tolerance.
Semantic mode uses AI embeddings to find conceptually similar passages.
Use the Filters button to filter by text source, dataset, or your tags.
PDF OCR (cyan badge): text extracted from the PDF's embedded text layer. Highly reliable and deterministic.
Visual OCR (amber badge): text recognized from scanned page images. May contain errors.
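Keyword mode with filters can be pictured as an OpenSearch `bool` query: a `match` clause with `"fuzziness": "AUTO"` for typo tolerance, plus `term` clauses under `filter` to restrict by text source or dataset. This is a sketch only; the field names `text`, `text_source`, and `dataset`, and values such as `"visual_ocr"`, are hypothetical, since the dashboard's index mapping and actual query are not documented here.

```python
from typing import Optional


def build_keyword_query(
    text: str,
    text_source: Optional[str] = None,  # hypothetical values, e.g. "pdf_ocr" or "visual_ocr"
    dataset: Optional[str] = None,      # hypothetical dataset identifier
    size: int = 20,
) -> dict:
    """Build an OpenSearch request body for fuzzy full-text search with filters."""
    filters = []
    if text_source:
        filters.append({"term": {"text_source": text_source}})
    if dataset:
        filters.append({"term": {"dataset": dataset}})
    return {
        "size": size,
        "query": {
            "bool": {
                # Full-text match; "AUTO" fuzziness allows 0, 1, or 2 edits
                # depending on term length, which absorbs common typos.
                "must": [{"match": {"text": {"query": text, "fuzziness": "AUTO"}}}],
                # Filter clauses narrow the result set without changing scores.
                "filter": filters,
            }
        },
    }
```

Putting the source and dataset restrictions in `filter` rather than `must` keeps them from influencing relevance scoring, which is usually what a filter button implies.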
How was this data extracted?
Each PDF was processed in three stages: text-layer extraction (PyMuPDF), visual OCR (FireRed-OCR, a 2B-parameter state-of-the-art model), and page image extraction. The resulting text was split into chunks, indexed in OpenSearch, and embedded as vectors stored in Qdrant (5.3M vectors) for semantic search. All source data comes directly from publicly released DOJ files.
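The extraction, chunking, and semantic-search steps can be sketched as below. This is an illustration under stated assumptions, not the pipeline's code: the chunk size and overlap are hypothetical, cosine similarity is assumed as the distance metric, and `top_k` does an exact brute-force scan, whereas Qdrant answers such queries with an approximate index. The embedding model that produces the vectors is not specified here, so the search function takes precomputed vectors.

```python
import math


def extract_page_texts(pdf_path: str) -> list[str]:
    """Read each page's embedded text layer with PyMuPDF (pip install pymupdf)."""
    import fitz  # PyMuPDF; imported lazily so the rest of this sketch runs without it

    with fitz.open(pdf_path) as doc:
        return [page.get_text() for page in doc]


def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows overlapping by `overlap` chars."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int = 5
) -> list[tuple[str, float]]:
    """Exact nearest chunks by cosine similarity (brute force over all vectors)."""
    scored = [(cid, cosine(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    scored.sort(key=lambda kv: kv[1], reverse=True)
    return scored[:k]
```

Overlapping windows keep a sentence that straddles a chunk boundary intact in at least one chunk, at the cost of storing some text twice.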