PyPI - epstein-files - Versions diffs - 1.0.3__tar.gz → 1.0.5__tar.gz - Mend

epstein-files 1.0.3.tar.gz → 1.0.5.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (34)

{epstein_files-1.0.3 → epstein_files-1.0.5}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: epstein-files
-Version: 1.0.3
+Version: 1.0.5
 Summary: Tools for working with the Jeffrey Epstein documents released in November 2025.
 Home-page: https://michelcrypt4d4mus.github.io/epstein_text_messages/
 License: GPL-3.0-or-later
@@ -32,35 +32,51 @@ Description-Content-Type: text/markdown
 # I Made Epstein's Text Messages Great Again
+![joi_ito](docs/joi_ito_gavin_is_clever_epstein_funds_bitcoin_dev_team.png)
 * [I Made Epstein's Text Messages Great Again (And You Should Read Them)](https://cryptadamus.substack.com/p/i-made-epsteins-text-messages-great) post on [Substack](https://cryptadamus.substack.com/p/i-made-epsteins-text-messages-great)
-* The Epstein text messages (and some of the emails along with summary counts of sent emails to/from Epstein) generated by this code can be viewed [here](https://michelcrypt4d4mus.github.io/epstein_text_messages/).
-* All of His Emails can be read at another page also generated by this code [here](https://michelcrypt4d4mus.github.io/epstein_text_messages/all_emails_epstein_files_nov_2025.html).
-* Word counts for the emails and text messages are [here](https://michelcrypt4d4mus.github.io/epstein_text_messages/communication_word_count_epstein_files_nov_2025.html).
+* The Epstein text messages (and some of the emails along with summary information) generated by this code can be viewed [here](https://michelcrypt4d4mus.github.io/epstein_text_messages/).
+* All of His Emails along with descriptions of the 496 files that were neither emails nor text messages can be read at [another page also generated by this code](https://michelcrypt4d4mus.github.io/epstein_text_messages/all_emails_epstein_files_nov_2025.html).
+* Word counts for the communications are [here](https://michelcrypt4d4mus.github.io/epstein_text_messages/communication_word_count_epstein_files_nov_2025.html).
 * Metadata containing what I have figured out about who sent or received the communications in a given file (and a brief explanation for how I figured it out for each file) is deployed [here](https://michelcrypt4d4mus.github.io/epstein_text_messages/file_metadata_epstein_files_nov_2025.json)
-* Configuration variables assigning specific `HOUSE_OVERSIGHT_XXXXXX.txt` file IDs (the `111111` part) as being emails to or from particular people based on various research and contributions can be found in [constants.py](./epstein_files/util/constants.py). Everything in `constants.py` should also appear in the JSON metadata.
+* Configuration variables assigning specific `HOUSE_OVERSIGHT_XXXXXX.txt` file IDs (the `111111` part) as being emails to or from particular people based on various research and contributions can be found in [constants.py](./epstein_files/util/constants.py). Everything in `constants.py` appears in the JSON metadata linked above.
-### Usage
-1. Requires you have a local copy of OCR text from the House Oversight document dump in a directory `/path/to/epstein/ocr_txt_files`. You can download them from [the Congressional Google Drive folder](https://drive.google.com/drive/folders/1ldncvdqIf6miiskDp_EDuGSDAaI_fJx8).
-1. Dependencies are in [pyproject.toml](./pyproject.toml). Use `poetry install` for easiest time installing. `pip install .` may or may not work.
+## Usage
+1. Requires you have a local copy of the OCR text files from the House Oversight document release in a directory `/path/to/epstein/ocr_txt_files`. You can download those OCR text files from [the Congressional Google Drive folder](https://drive.google.com/drive/folders/1ldncvdqIf6miiskDp_EDuGSDAaI_fJx8).
+1. Dependencies are in [pyproject.toml](./pyproject.toml). Use `poetry install` for easiest time installing. `pip install epstein-files` should also work, though `pipx install epstein-files` is usually better.
-You need to set the `DOCS_DIR` environment variable with the path to the folder of files you just downloaded when running. You can either create a `.env` file modeled on [`.env.example`](./.env.example) (which will set it permanently) or you can run with:
+You need to set the `EPSTEIN_DOCS_DIR` environment variable with the path to the folder of files you just downloaded when running. You can either create a `.env` file modeled on [`.env.example`](./.env.example) (which will set it permanently) or you can run with:
+```bash
+EPSTEIN_DOCS_DIR=/path/to/epstein/ocr_txt_files epstein_generate --help
+```
+All the tools that come with the package require `EPSTEIN_DOCS_DIR` to be set. These are the available tools:
 ```bash
 # Generate color highlighted texts/emails/other files
-DOCS_DIR=/path/to/epstein/ocr_txt_files epstein_generate
+epstein_generate
+# Search for a string:
+epstein_search Bannon
+# Or a regex:
+epstein_search '\bSteve\s*Bannon\b'
-# Search
-DOCS_DIR=/path/to/epstein/ocr_txt_files epstein_search Bannon
+# Show a file with color highlighting of keywords
+epstein_show 030999
+# Show both the highlighted and raw versions of the file:
+epstein_show --raw 030999
+# This also works:
+epstein_show HOUSE_OVERSIGHT_030999
-# Show a color highlighted file
-DOCS_DIR=/path/to/epstein/ocr_txt_files epstein_show 030999
-# This also works
-DOCS_DIR=/path/to/epstein/ocr_txt_files epstein_show HOUSE_OVERSIGHT_030999
+# Diff two epstein files after all the cleanup (stripping BOMs, matching newline chars, etc):
+epstein_diff 030999 020442
 ```
+The first time you run anything it will take a few minutes to fix all the data, attribute the redacted emails, etc.
 Run `epstein_generate --help` for command line option assistance.
-The first time you run anything it will take a few minutes to fix all the data, attribute the redacted emails, etc. Once you've run things once you can run the `epstein_generate --pickled` to load the cached fixed up data and things will be quick.
 #### As A Library
 ```python
@@ -69,18 +85,18 @@ epstein_files = EpsteinFiles.get_files()
 # All files
 for document in epstein_files.all_documents():
-    do_stuff()
+    do_stuff(document)
 # Emails
 for email in epstein_files.emails:
-    do_stuff()
+    do_stuff(email)
 # iMessage Logs
 for imessage_log in epstein_files.imessage_logs:
-    do_stuff()
+    do_stuff(imessage_log)
 # Other Files
-for document in epstein_files.other_files:
-    do_stuff()
+for file in epstein_files.other_files:
+    do_stuff(file)
 ```

epstein_files-1.0.5/README.md ADDED

@@ -0,0 +1,69 @@
+# I Made Epstein's Text Messages Great Again
+![joi_ito](docs/joi_ito_gavin_is_clever_epstein_funds_bitcoin_dev_team.png)
+* [I Made Epstein's Text Messages Great Again (And You Should Read Them)](https://cryptadamus.substack.com/p/i-made-epsteins-text-messages-great) post on [Substack](https://cryptadamus.substack.com/p/i-made-epsteins-text-messages-great)
+* The Epstein text messages (and some of the emails along with summary information) generated by this code can be viewed [here](https://michelcrypt4d4mus.github.io/epstein_text_messages/).
+* All of His Emails along with descriptions of the 496 files that were neither emails nor text messages can be read at [another page also generated by this code](https://michelcrypt4d4mus.github.io/epstein_text_messages/all_emails_epstein_files_nov_2025.html).
+* Word counts for the communications are [here](https://michelcrypt4d4mus.github.io/epstein_text_messages/communication_word_count_epstein_files_nov_2025.html).
+* Metadata containing what I have figured out about who sent or received the communications in a given file (and a brief explanation for how I figured it out for each file) is deployed [here](https://michelcrypt4d4mus.github.io/epstein_text_messages/file_metadata_epstein_files_nov_2025.json)
+* Configuration variables assigning specific `HOUSE_OVERSIGHT_XXXXXX.txt` file IDs (the `111111` part) as being emails to or from particular people based on various research and contributions can be found in [constants.py](./epstein_files/util/constants.py). Everything in `constants.py` appears in the JSON metadata linked above.
+## Usage
+1. Requires you have a local copy of the OCR text files from the House Oversight document release in a directory `/path/to/epstein/ocr_txt_files`. You can download those OCR text files from [the Congressional Google Drive folder](https://drive.google.com/drive/folders/1ldncvdqIf6miiskDp_EDuGSDAaI_fJx8).
+1. Dependencies are in [pyproject.toml](./pyproject.toml). Use `poetry install` for easiest time installing. `pip install epstein-files` should also work, though `pipx install epstein-files` is usually better.
+You need to set the `EPSTEIN_DOCS_DIR` environment variable with the path to the folder of files you just downloaded when running. You can either create a `.env` file modeled on [`.env.example`](./.env.example) (which will set it permanently) or you can run with:
+```bash
+EPSTEIN_DOCS_DIR=/path/to/epstein/ocr_txt_files epstein_generate --help
+```
+All the tools that come with the package require `EPSTEIN_DOCS_DIR` to be set. These are the available tools:
+```bash
+# Generate color highlighted texts/emails/other files
+epstein_generate
+# Search for a string:
+epstein_search Bannon
+# Or a regex:
+epstein_search '\bSteve\s*Bannon\b'
+# Show a file with color highlighting of keywords
+epstein_show 030999
+# Show both the highlighted and raw versions of the file:
+epstein_show --raw 030999
+# This also works:
+epstein_show HOUSE_OVERSIGHT_030999
+# Diff two epstein files after all the cleanup (stripping BOMs, matching newline chars, etc):
+epstein_diff 030999 020442
+```
+The first time you run anything it will take a few minutes to fix all the data, attribute the redacted emails, etc.
+Run `epstein_generate --help` for command line option assistance.
+#### As A Library
+```python
+from epstein_files.epstein_files import EpsteinFiles
+epstein_files = EpsteinFiles.get_files()
+# All files
+for document in epstein_files.all_documents():
+    do_stuff(document)
+# Emails
+for email in epstein_files.emails:
+    do_stuff(email)
+# iMessage Logs
+for imessage_log in epstein_files.imessage_logs:
+    do_stuff(imessage_log)
+# Other Files
+for file in epstein_files.other_files:
+    do_stuff(file)
+```
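
The README states that every tool requires `EPSTEIN_DOCS_DIR` to be set. When using the package as a library, a small pre-flight check can fail early with a readable message. This is a minimal sketch, not code from the package: the helper name `resolve_docs_dir` and its error messages are hypothetical, and only the environment variable name comes from the README.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_docs_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the directory named by EPSTEIN_DOCS_DIR or raise a clear error.

    Hypothetical helper for illustration; the package may validate differently.
    """
    env = os.environ if env is None else env
    raw = env.get("EPSTEIN_DOCS_DIR")

    if not raw:
        raise RuntimeError("EPSTEIN_DOCS_DIR is not set; point it at the OCR text folder.")

    path = Path(raw).expanduser()

    if not path.is_dir():
        raise RuntimeError(f"EPSTEIN_DOCS_DIR points at '{path}', which is not a directory.")

    return path
```

Calling `resolve_docs_dir()` before `EpsteinFiles.get_files()` would surface a missing or mistyped path before the slow first-run processing starts.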

{epstein_files-1.0.3 → epstein_files-1.0.5}/epstein_files/__init__.py RENAMED

@@ -75,7 +75,7 @@ def epstein_diff():
 def epstein_search():
     """Search the cleaned up text of the files."""
     _assert_positional_args()
-    epstein_files = EpsteinFiles.get_files(use_pickled=True)
+    epstein_files = EpsteinFiles.get_files()
     for search_term in args.positional_args:
         temp_highlighter = build_highlighter(search_term)
@@ -103,27 +103,22 @@ def epstein_show():
     """Show the color highlighted file. If --raw arg is passed, show the raw text of the file as well."""
     _assert_positional_args()
     ids = [extract_file_id(arg) for arg in args.positional_args]
+    raw_docs = [Document(coerce_file_path(id)) for id in ids]
+    docs = [document_cls(doc)(doc.file_path) for doc in raw_docs]
     console.line()
-    if args.pickled:
-        epstein_files = EpsteinFiles.get_files(use_pickled=True)
-        docs = epstein_files.get_documents_by_id(ids)
-    else:
-        raw_docs = [Document(coerce_file_path(id)) for id in ids]
-        docs = [document_cls(doc)(doc.file_path) for doc in raw_docs]
     for doc in docs:
         console.line()
         console.print(doc)
         if args.raw:
             console.line()
-            console.print(Panel(f"*** {doc.url_slug} RAW ***", expand=False, style=doc._border_style()))
+            console.print(Panel(f"RAW {doc.filename} RAW", expand=False, style=doc._border_style()))
             console.print(escape(doc.raw_text()))
             if isinstance(doc, Email):
                 console.line()
-                console.print(Panel(f"*** {doc.url_slug} actual_text ***", expand=False, style=doc._border_style()))
+                console.print(Panel(f"{doc.filename}: actual_text() output", expand=False, style=doc._border_style()))
                 console.print(escape(doc._actual_text()))
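
`epstein_show` accepts either a bare ID (`030999`) or a full stem (`HOUSE_OVERSIGHT_030999`) and normalizes both through `extract_file_id`, whose body is not part of this diff. The sketch here shows one way such normalization could work, modeled on the `FILE_STEM_REGEX` and `FILE_NAME_REGEX` patterns in `strings.py`. Assumptions: `normalize_file_id` is a hypothetical name, and the value of `HOUSE_OVERSIGHT_PREFIX` is guessed from the README examples.

```python
import re

# Assumption: the real constant is not shown in the diff; this value is inferred from the README.
HOUSE_OVERSIGHT_PREFIX = "HOUSE_OVERSIGHT_"

# Six digits plus an optional _N suffix, with optional prefix and .txt/.txt.json extension.
FILE_ID_REGEX = re.compile(
    rf"^(?:{HOUSE_OVERSIGHT_PREFIX})?(\d{{6}}(?:_\d{{1,2}})?)(?:\.txt(?:\.json)?)?$"
)


def normalize_file_id(arg: str) -> str:
    """Return the bare file ID for any accepted spelling (hypothetical helper)."""
    match = FILE_ID_REGEX.match(arg.strip())

    if match is None:
        raise ValueError(f"'{arg}' does not look like a House Oversight file ID")

    return match.group(1)
```

Anchoring the pattern with `^` and `$` rejects inputs such as a seven-digit number, which an unanchored search would silently truncate.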

{epstein_files-1.0.3 → epstein_files-1.0.5}/epstein_files/documents/document.py RENAMED

@@ -85,10 +85,9 @@ class Document:
         if self.is_local_extract_file():
             self.url_slug = LOCAL_EXTRACT_REGEX.sub('', file_stem_for_id(self.file_id))
-            cfg_type = type(self.config).__name__ if self.config else None
             # Coerce FileConfig for court docs etc. to MessageCfg for email files extracted from that document
-            if self.class_name() == EMAIL_CLASS and self.config and cfg_type != EmailCfg.__name__:
+            if self.class_name() == EMAIL_CLASS and self.config and not isinstance(self.config, EmailCfg):
                 self.config = EmailCfg.from_doc_cfg(self.config)
         else:
             self.url_slug = self.file_path.stem
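
The hunk above replaces a class-name string comparison with `isinstance`. The two checks differ as soon as a subclass is involved, which this sketch illustrates. The classes are stand-ins defined only for the example: `ForwardedEmailCfg` is hypothetical, and nothing here asserts how the package's real config classes are related.

```python
from dataclasses import dataclass


@dataclass
class DocCfg:
    """Stand-in for a generic document config."""
    id: str


@dataclass
class EmailCfg(DocCfg):
    """Stand-in for an email config."""


@dataclass
class ForwardedEmailCfg(EmailCfg):
    """Hypothetical subclass, used only to show where the two checks diverge."""


def needs_coercion_by_name(cfg: DocCfg) -> bool:
    # 1.0.3 style: compares class names, so subclasses of EmailCfg look like non-emails.
    return type(cfg).__name__ != EmailCfg.__name__


def needs_coercion_by_isinstance(cfg: DocCfg) -> bool:
    # 1.0.5 style: any EmailCfg, including subclasses, is accepted as-is.
    return not isinstance(cfg, EmailCfg)
```

With the name-based check, a `ForwardedEmailCfg` would be coerced even though it already is an email config; the `isinstance` check leaves it alone.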

{epstein_files-1.0.3 → epstein_files-1.0.5}/epstein_files/documents/email.py RENAMED

@@ -26,7 +26,7 @@ from epstein_files.util.logging import logger
 from epstein_files.util.rich import *
 BAD_FIRST_LINE_REGEX = re.compile(r'^(>>|Grant_Smith066474"eMailContent.htm|LOVE & KISSES)$')
-BAD_LINE_REGEX = re.compile(r'^(>;?|\d{1,2}|Classification: External Communication|Importance:?\s*High|[iI,•]|i (_ )?i|, [-,]|L\._)$')
+BAD_LINE_REGEX = re.compile(r'^(>;?|\d{1,2}|PAGE INTENTIONALLY LEFT BLANK|Classification: External Communication|Importance:?\s*High|[iI,•]|i (_ )?i|, [-,]|L\._)$')
 DETECT_EMAIL_REGEX = re.compile(r'^(.*\n){0,2}From:')
 LINK_LINE_REGEX = re.compile(f"^(> )?htt")
 QUOTED_REPLY_LINE_REGEX = re.compile(r'wrote:\n', re.IGNORECASE)
@@ -245,12 +245,10 @@ TRUNCATE_TERMS = [
 ]
 # Some Paul Krassner emails have a ton of CCed parties we don't care about
-KRASSNER_RECIPIENTS = uniquify(flatten(ALL_FILE_CONFIGS[id].recipients for id in ['025329', '024923', '033568']))
+KRASSNER_RECIPIENTS = uniquify(flatten([ALL_FILE_CONFIGS[id].recipients for id in ['025329', '024923', '033568']]))
 # No point in ever displaying these; their emails show up elsewhere because they're mostly CC recipients
-USELESS_EMAILERS = IRAN_NUCLEAR_DEAL_SPAM_EMAIL_RECIPIENTS + \
-                   KRASSNER_RECIPIENTS + \
-                   FLIGHT_IN_2012_PEOPLE + [
+USELESS_EMAILERS = FLIGHT_IN_2012_PEOPLE + IRAN_DEAL_RECIPIENTS + KRASSNER_RECIPIENTS + [
     'Alan Rogers',                           # Random CC
     'Andrew Friendly',                       # Presumably some relation of Kelly Friendly
     'BS Stern',                              # A random fwd of email we have
@@ -322,11 +320,18 @@ class Email(Communication):
     def __post_init__(self):
         super().__post_init__()
-        if self.config and self.config.recipients:
-            self.recipients = cast(list[str | None], self.config.recipients)
-        else:
-            for recipient in self.header.recipients():
-                self.recipients.extend(self._get_names(recipient))
+        try:
+            if self.config and self.config.recipients:
+                self.recipients = cast(list[str | None], self.config.recipients)
+            else:
+                for recipient in self.header.recipients():
+                    self.recipients.extend(self._get_names(recipient))
+        except Exception as e:
+            console.print_exception()
+            console.line(2)
+            logger.fatal(f"Failed on {self.file_id}")
+            console.line(2)
+            raise e
         # Remove self CCs
         recipients = [r for r in self.recipients if r != self.author or self.file_id in SELF_EMAILS_FILE_IDS]
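
Version 1.0.5 adds `PAGE INTENTIONALLY LEFT BLANK` to `BAD_LINE_REGEX`. The sketch here copies the 1.0.5 pattern verbatim and applies it line by line to show which lines it would drop. How the package itself applies the pattern is not visible in this diff, so `drop_bad_lines` and the sample text are illustrative assumptions.

```python
import re

# Pattern copied from the 1.0.5 side of the email.py hunk.
BAD_LINE_REGEX = re.compile(
    r'^(>;?|\d{1,2}|PAGE INTENTIONALLY LEFT BLANK|Classification: External Communication'
    r'|Importance:?\s*High|[iI,•]|i (_ )?i|, [-,]|L\._)$'
)


def drop_bad_lines(text: str) -> str:
    """Remove lines that consist entirely of known OCR/header junk (hypothetical helper)."""
    return "\n".join(line for line in text.split("\n") if not BAD_LINE_REGEX.match(line))


# Made-up sample text; not taken from any real file.
sample = "\n".join([
    "Hello Jeffrey",
    "PAGE INTENTIONALLY LEFT BLANK",
    "12",
    "Importance: High",
    "See you at 12",
    ">",
])

cleaned = drop_bad_lines(sample)
print(cleaned)
```

Because the pattern is anchored at both ends, a line such as `See you at 12` survives even though it contains a one- or two-digit number.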

{epstein_files-1.0.3 → epstein_files-1.0.5}/epstein_files/epstein_files.py RENAMED

@@ -19,7 +19,6 @@ from epstein_files.documents.emails.email_header import AUTHOR
 from epstein_files.documents.json_file import JsonFile
 from epstein_files.documents.messenger_log import MSG_REGEX, MessengerLog
 from epstein_files.documents.other_file import OtherFile
-from epstein_files.util.constant.output_files import PICKLED_PATH
 from epstein_files.util.constant.strings import *
 from epstein_files.util.constant.urls import (EPSTEIN_MEDIA, EPSTEIN_WEB, JMAIL, epstein_media_person_url,
      epsteinify_name_url, epstein_web_person_url, search_jmail_url, search_twitter_url)
@@ -35,9 +34,10 @@ from epstein_files.util.rich import (DEFAULT_NAME_STYLE, NA_TXT, add_cols_to_tab
 from epstein_files.util.search_result import SearchResult
 from epstein_files.util.timer import Timer
+EXCLUDED_EMAILERS = [e.lower() for e in (USELESS_EMAILERS + [JEFFREY_EPSTEIN])]
+PICKLED_PATH = Path("the_epstein_files.pkl.gz")
 DEVICE_SIGNATURE = 'Device Signature'
 DEVICE_SIGNATURE_PADDING = (1, 0)
-NOT_INCLUDED_EMAILERS = [e.lower() for e in (USELESS_EMAILERS + [JEFFREY_EPSTEIN])]
 SLOW_FILE_SECONDS = 1.0
 INVALID_FOR_EPSTEIN_WEB = JUNK_EMAILERS + KRASSNER_RECIPIENTS + [
@@ -94,23 +94,23 @@ class EpsteinFiles:
         self._tally_email_data()
     @classmethod
-    def get_files(cls, timer: Timer | None = None, use_pickled: bool = False) -> 'EpsteinFiles':
+    def get_files(cls, timer: Timer | None = None) -> 'EpsteinFiles':
         """Alternate constructor that reads/writes a pickled version of the data ('timer' arg is for logging)."""
         timer = timer or Timer()
-        if ((args.pickled or use_pickled) and PICKLED_PATH.exists()) and not args.overwrite_pickle:
+        if PICKLED_PATH.exists() and not args.overwrite_pickle:
             with gzip.open(PICKLED_PATH, 'rb') as file:
                 epstein_files = pickle.load(file)
                 timer.print_at_checkpoint(f"Loaded {len(epstein_files.all_files):,} documents from '{PICKLED_PATH}' ({file_size_str(PICKLED_PATH)})")
                 epstein_files.timer = timer
                 return epstein_files
+        logger.warning(f"Building new cache file, this will take a few minutes...")
         epstein_files = EpsteinFiles(timer=timer)
-        if args.overwrite_pickle or not PICKLED_PATH.exists():
-            with gzip.open(PICKLED_PATH, 'wb') as file:
-                pickle.dump(epstein_files, file)
-                logger.warning(f"Pickled data to '{PICKLED_PATH}' ({file_size_str(PICKLED_PATH)})...")
+        with gzip.open(PICKLED_PATH, 'wb') as file:
+            pickle.dump(epstein_files, file)
+            logger.warning(f"Pickled data to '{PICKLED_PATH}' ({file_size_str(PICKLED_PATH)})...")
         timer.print_at_checkpoint(f'Processed {len(epstein_files.all_files):,} documents')
         return epstein_files
@@ -119,9 +119,9 @@ class EpsteinFiles:
         return self.imessage_logs + self.emails + self.other_files
     def all_emailers(self, include_useless: bool = False) -> list[str | None]:
-        """Returns all emailers except Epstein and USELESS_EMAILERS, sorted from least frequent to most."""
+        """Returns all emailers except Epstein and EXCLUDED_EMAILERS, sorted from least frequent to most."""
         names = [a for a in self.email_author_counts.keys()] + [r for r in self.email_recipient_counts.keys()]
-        names = names if include_useless else [e for e in names if e is None or e.lower() not in NOT_INCLUDED_EMAILERS]
+        names = names if include_useless else [e for e in names if e is None or e.lower() not in EXCLUDED_EMAILERS]
         return sorted(list(set(names)), key=lambda e: self.email_author_counts[e] + self.email_recipient_counts[e])
     def attributed_email_count(self) -> int:
@@ -200,10 +200,10 @@ class EpsteinFiles:
     def json_metadata(self) -> str:
         """Create a JSON string containing metadata for all the files."""
         metadata = {
-            EMAIL_CLASS: _sorted_metadata(self.emails),
-            JSON_FILE_CLASS: _sorted_metadata(self.json_files),
-            MESSENGER_LOG_CLASS: _sorted_metadata(self.imessage_logs),
-            OTHER_FILE_CLASS: _sorted_metadata(self.non_json_other_files()),
+            Email.__name__: _sorted_metadata(self.emails),
+            JsonFile.__name__: _sorted_metadata(self.json_files),
+            MessengerLog.__name__: _sorted_metadata(self.imessage_logs),
+            OtherFile.__name__: _sorted_metadata(self.non_json_other_files()),
         }
         return json.dumps(metadata, indent=4, sort_keys=True)
@@ -372,12 +372,12 @@ def count_by_month(docs: Sequence[Document]) -> dict[str | None, int]:
     return counts
-def document_cls(document: Document) -> Type[Document]:
-    search_area = document.text[0:5000]  # Limit search area to avoid pointless scans of huge files
+def document_cls(doc: Document) -> Type[Document]:
+    search_area = doc.text[0:5000]  # Limit search area to avoid pointless scans of huge files
-    if document.text[0] == '{':
+    if doc.text[0] == '{':
         return JsonFile
-    elif isinstance(document.config, EmailCfg) or DETECT_EMAIL_REGEX.match(search_area):
+    elif isinstance(doc.config, EmailCfg) or (DETECT_EMAIL_REGEX.match(search_area) and doc.config is None):
         return Email
     elif MSG_REGEX.search(search_area):
         return MessengerLog
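
In 1.0.5 `get_files` no longer takes a `use_pickled` flag: it loads the gzipped pickle whenever the file exists and overwriting was not requested, and otherwise builds the data and always writes the cache. A minimal generic sketch of that load-or-build pattern follows. The names `load_or_build`, `cache_path`, and `build` are illustrative and not part of the package.

```python
import gzip
import pickle
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_build(cache_path: Path, build: Callable[[], T], overwrite: bool = False) -> T:
    """Load a gzipped pickle if present, otherwise build the object and cache it."""
    if cache_path.exists() and not overwrite:
        with gzip.open(cache_path, "rb") as file:
            return pickle.load(file)

    # Cache miss (or forced rebuild): do the slow work once, then persist it.
    obj = build()

    with gzip.open(cache_path, "wb") as file:
        pickle.dump(obj, file)

    return obj
```

Since unpickling can execute arbitrary code, a cache like this should only be loaded from a file the same user created.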

{epstein_files-1.0.3 → epstein_files-1.0.5}/epstein_files/util/constant/output_files.py RENAMED

@@ -1,11 +1,10 @@
 from pathlib import Path
-PICKLED_PATH = Path("the_epstein_files.pkl.gz")
-EPSTEIN_FILES_NOV_2025 = 'epstein_files_nov_2025'
 URLS_ENV = '.urls.env'
+# Files output by the code
 HTML_DIR = Path('docs')
+EPSTEIN_FILES_NOV_2025 = 'epstein_files_nov_2025'
 ALL_EMAILS_PATH = HTML_DIR.joinpath(f'all_emails_{EPSTEIN_FILES_NOV_2025}.html')
 JSON_METADATA_PATH = HTML_DIR.joinpath(f'file_metadata_{EPSTEIN_FILES_NOV_2025}.json')
 TEXT_MSGS_HTML_PATH = HTML_DIR.joinpath('index.html')

{epstein_files-1.0.3 → epstein_files-1.0.5}/epstein_files/util/constant/strings.py RENAMED

@@ -2,13 +2,6 @@ import re
 from typing import Literal
-# Document subclass names (this sucks)
-DOCUMENT_CLASS = 'Document'
-EMAIL_CLASS = 'Email'
-JSON_FILE_CLASS = 'JsonFile'
-MESSENGER_LOG_CLASS = 'MessengerLog'
-OTHER_FILE_CLASS = 'OtherFile'
 # categories
 ACADEMIA = 'academia'
 ARTS = 'arts'
@@ -27,6 +20,7 @@ POLITICS = 'politics'
 PROPERTY = 'property'
 PUBLICIST = 'publicist'
 REPUTATION = 'reputation'
+SKYPE_LOG= 'skype log'
 SOCIAL = 'social'
 SPEECH = 'speech'
@@ -76,5 +70,12 @@ FILE_STEM_REGEX = re.compile(fr"{HOUSE_OVERSIGHT_PREFIX}(\d{{6}}(_\d{{1,2}})?)")
 FILE_NAME_REGEX = re.compile(fr"{FILE_STEM_REGEX.pattern}(\.txt(\.json)?)?")
 QUESTION_MARKS_REGEX = re.compile(fr' {re.escape(QUESTION_MARKS)}$')
+# Document subclass names (this sucks)
+DOCUMENT_CLASS = 'Document'
+EMAIL_CLASS = 'Email'
+JSON_FILE_CLASS = 'JsonFile'
+MESSENGER_LOG_CLASS = 'MessengerLog'
+OTHER_FILE_CLASS = 'OtherFile'
 remove_question_marks = lambda name: QUESTION_MARKS_REGEX.sub('', name)

{epstein_files-1.0.3 → epstein_files-1.0.5}/epstein_files/util/constant/urls.py RENAMED

@@ -47,7 +47,7 @@ extracted_file_url = lambda f: f"{EXTRACTS_BASE_URL}/{f}"
 COFFEEZILLA_ARCHIVE_URL = 'https://journaliststudio.google.com/pinpoint/search?collection=061ce61c9e70bdfd'
 COURIER_NEWSROOM_ARCHIVE_URL = 'https://journaliststudio.google.com/pinpoint/search?collection=092314e384a58618'
 EPSTEINIFY_URL = 'https://epsteinify.com'
-EPSTEIN_MEDIA_URL = 'https://www.epstein.media'
+EPSTEIN_MEDIA_URL = 'https://epstein.media'
 EPSTEIN_WEB_URL = 'https://epsteinweb.org'
 JMAIL_URL = 'https://jmail.world'
 OVERSIGHT_REPUBLICANS_PRESSER_URL = 'https://oversight.house.gov/release/oversight-committee-releases-additional-epstein-estate-documents/'

{epstein_files-1.0.3 → epstein_files-1.0.5}/epstein_files/util/constants.py RENAMED

@@ -1,5 +1,6 @@
 import re
 from copy import deepcopy
+from typing import cast
 from dateutil.parser import parse
@@ -84,7 +85,7 @@ EMAILER_ID_REGEXES: dict[str, re.Pattern] = {
     JAMES_HILL: re.compile(r"hill, james e.|james.e.hill@abc.com", re.IGNORECASE),
     JEAN_LUC_BRUNEL: re.compile(r'Jean[- ]Luc Brunel?', re.IGNORECASE),
     JEFF_FULLER: re.compile(r"jeff@mc2mm.com|Jeff Fuller", re.IGNORECASE),
-    JEFFREY_EPSTEIN: re.compile(r'[djl]ee[vy]acation[©@]?g?(mail.com)?|Epstine|\bJEE?\b|Jeffrey E((sp|ps)tein?)?|jeeproject@yahoo.com|J Jep|Jeffery Edwards|(?<!Mark L. )Epstein', re.IGNORECASE),
+    JEFFREY_EPSTEIN: re.compile(r'[djl]\s?ee[vy]acation[©@]?g?(mail.com)?|Epstine|\bJEE?\b|Jeffrey E((sp|ps)tein?)?|jeeproject@yahoo.com|J Jep|Jeffery Edwards|(?<!Mark L. )Epstein', re.IGNORECASE),
     JESSICA_CADWELL: re.compile(r'Jessica Cadwell?', re.IGNORECASE),
     JOHNNY_EL_HACHEM: re.compile(r'el hachem johnny|johnny el hachem', re.IGNORECASE),
     JOI_ITO: re.compile(r'ji@media.mit.?edu|(joichi|joi)( Ito)?', re.IGNORECASE),
@@ -94,7 +95,7 @@ EMAILER_ID_REGEXES: dict[str, re.Pattern] = {
     LANDON_THOMAS: re.compile(r'lando[nr] thomas( jr)?|thomas jr.?, lando[nr]', re.IGNORECASE),
     LARRY_SUMMERS: re.compile(r'(La(wrence|rry).{1,5})?Summers?|^LH$|LHS|Ihsofficel', re.IGNORECASE),
     LAWRANCE_VISOSKI: re.compile(r'La(rry|wrance) Visoski?|Lvjet', re.IGNORECASE),
-    LAWRENCE_KRAUSS: re.compile(r'Lawrence Kraus|lawkrauss', re.IGNORECASE),
+    LAWRENCE_KRAUSS: re.compile(r'Lawrence Kraus|[jl]awkrauss', re.IGNORECASE),
     LEON_BLACK: re.compile(r'Leon Black?', re.IGNORECASE),
     MANUELA_MARTINEZ: re.compile(fr'Manuela (- Mega Partners|Martinez)', re.IGNORECASE),
     MARIANA_IDZKOWSKA: re.compile(r'Mariana [Il]d[źi]kowska?', re.IGNORECASE),
@@ -268,7 +269,7 @@ SHIMON_POST = 'The Shimon Post'
 SHIMON_POST_ARTICLE = f'selection of articles about the mideast'
 SINGLE_PAGE = 'single page of'
 STRANGE_BEDFELLOWS = "'Strange Bedfellows' list of invitees f. Johnny Depp, Woody Allen, Obama, and more"
-SWEDISH_LIFE_SCIENCES_SUMMIT = f"{BARBRO_C_EHNBOM}'s Swedish American Life Science Summit"
+SWEDISH_LIFE_SCIENCES_SUMMIT = f"{BARBRO_C_EHNBOM}'s Swedish American Life Science Summit (SALSS)"
 THE_REAL_DEAL_ARTICLE = 'article by Keith Larsen'
 TRUMP_DISCLOSURES = f"Donald Trump financial disclosures from U.S. Office of Government Ethics"
 UBS_CIO_REPORT = 'CIO Monthly Extended report'
@@ -371,8 +372,8 @@ TEXTS_CONFIG = CONFIRMED_TEXTS_CONFIG + UNCONFIRMED_TEXTS_CONFIG
 ########################################################################################################
 # Some emails have a lot of uninteresting CCs
-IRAN_NUCLEAR_DEAL_SPAM_EMAIL_RECIPIENTS: list[str | None] = ['Allen West', 'Rafael Bardaji', 'Philip Kafka', 'Herb Goodman', 'Grant Seeger', 'Lisa Albert', 'Janet Kafka', 'James Ramsey', 'ACT for America', 'John Zouzelka', 'Joel Dunn', 'Nate McClain', 'Bennet Greenwald', 'Taal Safdie', 'Uri Fouzailov', 'Neil Anderson', 'Nate White', 'Rita Hortenstine', 'Henry Hortenstine', 'Gary Gross', 'Forrest Miller', 'Bennett Schmidt', 'Val Sherman', 'Marcie Brown', 'Michael Horowitz', 'Marshall Funk']
-FLIGHT_IN_2012_PEOPLE: list[str | None] = ['Francis Derby', 'Januiz Banasiak', 'Louella Rabuyo', 'Richard Barnnet']
+IRAN_DEAL_RECIPIENTS = ['Allen West', 'Rafael Bardaji', 'Philip Kafka', 'Herb Goodman', 'Grant Seeger', 'Lisa Albert', 'Janet Kafka', 'James Ramsey', 'ACT for America', 'John Zouzelka', 'Joel Dunn', 'Nate McClain', 'Bennet Greenwald', 'Taal Safdie', 'Uri Fouzailov', 'Neil Anderson', 'Nate White', 'Rita Hortenstine', 'Henry Hortenstine', 'Gary Gross', 'Forrest Miller', 'Bennett Schmidt', 'Val Sherman', 'Marcie Brown', 'Michael Horowitz', 'Marshall Funk']
+FLIGHT_IN_2012_PEOPLE = ['Francis Derby', 'Januiz Banasiak', 'Louella Rabuyo', 'Richard Barnnet']
 EMAILS_CONFIG = [
     EmailCfg(id='032436', author=ALIREZA_ITTIHADIEH, attribution_reason='Signature'),
@@ -491,9 +492,6 @@ EMAILS_CONFIG = [
     EmailCfg(id='032727', author=KATHRYN_RUEMMLER, attribution_reason=KATHY_REASON, is_attribution_uncertain=True),
     EmailCfg(id='030478', author=LANDON_THOMAS),
     EmailCfg(id='029013', author=LARRY_SUMMERS, recipients=[JEFFREY_EPSTEIN]),    # Bad OCR (nofix)
-    EmailCfg(id='032206', author=LAWRENCE_KRAUSS),                                # More of a text convo?
-    EmailCfg(id='032208', author=LAWRENCE_KRAUSS, recipients=[JEFFREY_EPSTEIN]),  # More of a text convo?
-    EmailCfg(id='032209', author=LAWRENCE_KRAUSS, recipients=[JEFFREY_EPSTEIN]),  # More of a text convo?
     EmailCfg(id='029196', author=LAWRENCE_KRAUSS, recipients=[JEFFREY_EPSTEIN], actual_text='Talk in 40?'),
     EmailCfg(id='033593', author=LAWRANCE_VISOSKI, attribution_reason='Signature'),
     EmailCfg(id='033370', author=LAWRANCE_VISOSKI, attribution_reason=LARRY_REASON),
@@ -575,7 +573,7 @@ EMAILS_CONFIG = [
         attribution_reason='ends with "Respectfully, terry"',
         author=TERRY_KAFKA,
         fwded_text_after='From: Mike Cohen',
-        recipients=[JEFFREY_EPSTEIN, MARK_EPSTEIN, MICHAEL_BUCHHOLTZ] + IRAN_NUCLEAR_DEAL_SPAM_EMAIL_RECIPIENTS,
+        recipients=[JEFFREY_EPSTEIN, MARK_EPSTEIN, MICHAEL_BUCHHOLTZ] + IRAN_DEAL_RECIPIENTS,
         duplicate_ids=['028482'],
     ),
     EmailCfg(id='029992', author=TERRY_KAFKA, attribution_reason='Quoted reply'),
@@ -600,7 +598,6 @@ EMAILS_CONFIG = [
     EmailCfg(id='022202', recipients=[JEAN_LUC_BRUNEL], attribution_reason='Follow up / reply', duplicate_ids=['029975']),
     EmailCfg(id='022187', recipients=[JEFFREY_EPSTEIN]),  # Bad OCR (nofix)
     EmailCfg(id='031489', recipients=[JEFFREY_EPSTEIN]),  # Bad OCR (unfixable)
-    EmailCfg(id='032210', recipients=[JEFFREY_EPSTEIN]),  # More of a text convo?
     EmailCfg(id='030347', recipients=[JEFFREY_EPSTEIN]),  # Bad OCR (nofix)
     EmailCfg(id='030367', recipients=[JEFFREY_EPSTEIN]),  # Bad OCR (nofix)
     EmailCfg(id='033274', recipients=[JEFFREY_EPSTEIN]),  # this is a note sent to self
@@ -751,7 +748,7 @@ EMAILS_CONFIG = [
     EmailCfg(id='031118', duplicate_ids=['019465']),
     EmailCfg(id='031912', duplicate_ids=['032158']),
     EmailCfg(id='030587', duplicate_ids=['030514']),
-    EmailCfg(id='029773', duplicate_ids=['012685']),
+    EmailCfg(id='029773', duplicate_ids=['012685'], fwded_text_after='Omar Quadhafi'),
     EmailCfg(id='033297', duplicate_ids=['033586']),
     EmailCfg(id='031089', duplicate_ids=['018084']),
     EmailCfg(id='031088', duplicate_ids=['030885']),
@@ -1195,7 +1192,7 @@ OTHER_FILES_CONFERENCES = [
     DocCfg(id='019300', author=SVETLANA_POZHIDAEVA, description=f'{WOMEN_EMPOWERMENT} f. {KATHRYN_RUEMMLER}', date='2019-04-05'),
     DocCfg(id='022267', author=SVETLANA_POZHIDAEVA, description=f'{WOMEN_EMPOWERMENT} founder essay about growing the seminar business'),
     DocCfg(id='022407', author=SVETLANA_POZHIDAEVA, description=f'{WOMEN_EMPOWERMENT} seminar pitch deck'),
-    DocCfg(id='017524', author=SWEDISH_LIFE_SCIENCES_SUMMIT, description=f"2012 program"),
+    DocCfg(id='017524', author=SWEDISH_LIFE_SCIENCES_SUMMIT, description=f"2012 program emailed to epstein BY {BARBRO_C_EHNBOM} in 031226", date='2012-08-18'),
     DocCfg(id='026747', author=SWEDISH_LIFE_SCIENCES_SUMMIT, description=f"2017 program", date='2017-08-23'),
     DocCfg(id='014951', author='TED Talks', description=f"2017 program", date='2017-04-20'),
     DocCfg(id='024179', author=UN_GENERAL_ASSEMBLY, description=f'president and first lady schedule', date='2012-09-21'),
@@ -1326,7 +1323,7 @@ OTHER_FILES_LETTERS = [
 ]
 OTHER_FILES_PROPERTY = [
-    DocCfg(id='026759', author='Great Bay Condominium Owners Association', description=f'{PRESS_RELEASE} by about Hurricane Irma damage', date='2017-09-13'),
+    DocCfg(id='026759', author='Great Bay Condominium Owners Association', description=f'{PRESS_RELEASE} about Hurricane Irma damage', date='2017-09-13'),
     DocCfg(id='016602', author=PALM_BEACH_CODE_ENFORCEMENT, description='board minutes', date='2008-04-17'),
     DocCfg(id='016554', author=PALM_BEACH_CODE_ENFORCEMENT, description='board minutes', date='2008-07-17', duplicate_ids=['016616', '016574']),
     DocCfg(id='027068', author=THE_REAL_DEAL, description=f"{THE_REAL_DEAL_ARTICLE} Palm House Hotel Bankruptcy and EB-5 Visa Fraud Allegations"),
@@ -1379,8 +1376,8 @@ OTHER_FILES_SOCIAL = [
 ]
 OTHER_FILES_POLITICS = [
-    DocCfg(id='029918', author=DIANA_DEGETTE_CAMPAIGN, description=f"bio", date='2012-01-01'),
-    DocCfg(id='031184', author=DIANA_DEGETTE_CAMPAIGN, description=f"fundraiser invitation"),
+    DocCfg(id='029918', author=DIANA_DEGETTE_CAMPAIGN, description=f"bio", date='2012-09-27'),
+    DocCfg(id='031184', author=DIANA_DEGETTE_CAMPAIGN, description=f"invitation to fundraiser hosted by {BARBRO_C_EHNBOM}", date='2012-09-27'),
     DocCfg(id='026827', author='Scowcroft Group', description=f'report on ISIS', date='2015-11-14'),
     DocCfg(id='024294', author=STACEY_PLASKETT, description=f"campaign flier", date='2016-10-01'),
     DocCfg(
@@ -1482,6 +1479,11 @@ OTHER_FILES_ARTS = [
 OTHER_FILES_MISC = [
     DocCfg(id='022780', category=FLIGHT_LOGS),
     DocCfg(id='022816', category=FLIGHT_LOGS),
+    DocCfg(id='032206', category=SKYPE_LOG, author=LAWRENCE_KRAUSS),
+    DocCfg(id='032208', category=SKYPE_LOG, author=LAWRENCE_KRAUSS),
+    DocCfg(id='032209', category=SKYPE_LOG, author=LAWRENCE_KRAUSS),
+    DocCfg(id='018224', category=SKYPE_LOG, author=LAWRENCE_KRAUSS, description=f'conversations with linkspirit (French?) and {LAWRENCE_KRAUSS}'),
+    DocCfg(id='032210', category=SKYPE_LOG, description=f'conversation with linkspirit'),
     DocCfg(
         id='025147',
         author=BROCKMAN_INC,
@@ -1496,7 +1498,6 @@ OTHER_FILES_MISC = [
     DocCfg(id='027074', author=FEMALE_HEALTH_COMPANY, description=f"pitch deck (USAID was a customer)"),
     DocCfg(id='032735', author=GORDON_GETTY, description=f"on Trump", date='2018-03-20'),  # Dated based on concurrent emails from Getty
     DocCfg(id='025540', author=JEFFREY_EPSTEIN, description=f"rough draft of Epstein's side of the story?"),
-    DocCfg(id='018224', author=LAWRENCE_KRAUSS, description=f"Skype conversation log"),
     DocCfg(id='026634', author='Michael Carrier', description=f"comments about an Apollo linked hedge fund 'DE Fund VIII'"),
     DocCfg(id='031425', author=SCOTT_J_LINK, description=f'completely redacted email from'),
     DocCfg(id='020447', author='Working Group on Chinese Influence Activities in the U.S.', description=f'Promoting Constructive Vigilance'),
@@ -1589,8 +1590,8 @@ SENT_FROM_REGEX = re.compile(r'^(?:(Please forgive|Sorry for all the) typos.{1,4
 # Error checking.
-if len(OTHER_FILES_CONFIG) != 438:
-    logger.warning(f"Only {len(OTHER_FILES_CONFIG)} configured other files!")
+if len(OTHER_FILES_CONFIG) != 442:
+    logger.warning(f"Found {len(OTHER_FILES_CONFIG)} configured other files!")
 encountered_file_ids = set()

{epstein_files-1.0.3 → epstein_files-1.0.5}/epstein_files/util/doc_cfg.py RENAMED

@@ -109,7 +109,9 @@ class DocCfg:
     def info_str(self) -> str | None:
         """String that summarizes what is known about this document."""
-        if self.category == REPUTATION:
+        if self.category and not self.description:
+            return self.category
+        elif self.category == REPUTATION:
             return f"{REPUTATION_MGMT}: {self.description}"
         elif self.author and self.description:
             if self.category in [ACADEMIA, BOOK]:
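
Note on the hunk above: the new first branch makes `info_str()` fall back to the bare category when a document has a category but no description, which is what the new `SKYPE_LOG` entries rely on. A minimal sketch of that ordering follows. `MiniDocCfg`, the value of `SKYPE_LOG`, and the simplified author/description branch are assumptions for illustration, not the package's actual class.

```python
from __future__ import annotations

from dataclasses import dataclass

SKYPE_LOG = 'skype log'  # hypothetical value; the real constant is not shown in this diff


@dataclass
class MiniDocCfg:
    """Hypothetical stand-in for DocCfg with only the fields used here."""
    id: str
    category: str | None = None
    author: str | None = None
    description: str | None = None

    def info_str(self) -> str | None:
        # New in 1.0.5: a category with no description is summarized by the category alone.
        if self.category and not self.description:
            return self.category
        # Simplified stand-in for the remaining branches of the real method.
        elif self.author and self.description:
            return f"{self.author}: {self.description}"
        return self.description


print(MiniDocCfg(id='032206', category=SKYPE_LOG, author='Lawrence Krauss').info_str())
```

Because the category check comes first, an entry with an author but no description still produces a summary instead of falling through to `None`.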

epstein_files-1.0.5/epstein_files/util/env.py ADDED

@@ -0,0 +1,84 @@
+import logging
+from argparse import ArgumentParser
+from os import environ
+from pathlib import Path
+from sys import argv
+from epstein_files.util.logging import datefinder_logger, env_log_level, logger
+COUNT_WORDS_SCRIPT = 'count_words.py'
+DEFAULT_WIDTH = 145
+HTML_SCRIPTS = ['epstein_generate', 'generate_html.py', COUNT_WORDS_SCRIPT]
+parser = ArgumentParser(description="Parse epstein OCR docs and generate HTML page.")
+parser.add_argument('--name', '-n', action='append', dest='names', help='specify the name(s) whose communications should be output')
+parser.add_argument('--overwrite-pickle', '-op', action='store_true', help='overwrite cached EpsteinFiles')
+output = parser.add_argument_group('OUTPUT')
+output.add_argument('--all-emails', '-ae', action='store_true', help='all the emails instead of just the interesting ones')
+output.add_argument('--all-other-files', '-ao', action='store_true', help='all the non-email, non-text msg files instead of just the interesting ones')
+output.add_argument('--build', '-b', action='store_true', help='write output to HTML file')
+output.add_argument('--make-clean', '-mc', action='store_true', help='delete all build artifact HTML and JSON files')
+output.add_argument('--output-emails', '-oe', action='store_true', help='generate emails section')
+output.add_argument('--output-other-files', '-oo', action='store_true', help='generate other files section')
+output.add_argument('--output-texts', '-ot', action='store_true', help='generate text messages section')
+output.add_argument('--suppress-output', action='store_true', help='no output to terminal (use with --build)')
+output.add_argument('--width', '-w', type=int, default=DEFAULT_WIDTH, help='screen width to use (in characters)')
+output.add_argument('--use-epstein-web-links', action='store_true', help='use epsteinweb.org links instead of epstein.media')
+scripts = parser.add_argument_group('SCRIPTS', 'Arguments used only by epstein_search, epstein_show, epstein_diff')
+scripts.add_argument('positional_args', nargs='*', help='strings to search for, file IDs to show or diff, etc.')
+scripts.add_argument('--raw', '-r', action='store_true', help='show raw contents of file (only used by scripts)')
+scripts.add_argument('--whole-file', '-wf', action='store_true', help='print whole file (only used by epstein_search)')
+debug = parser.add_argument_group('DEBUG')
+debug.add_argument('--colors-only', '-c', action='store_true', help='print header with color key table and links and exit')
+debug.add_argument('--debug', '-d', action='store_true', help='set debug level to INFO')
+debug.add_argument('--deep-debug', '-dd', action='store_true', help='set debug level to DEBUG')
+debug.add_argument('--json-metadata', '-jm', action='store_true', help='dump JSON metadata for all files')
+debug.add_argument('--json-stats', '-j', action='store_true', help='print JSON formatted stats at the end')
+debug.add_argument('--sort-alphabetical', action='store_true', help='sort emailers alphabetically in counts table')
+debug.add_argument('--suppress-logs', '-sl', action='store_true', help='set debug level to FATAL')
+args = parser.parse_args()
+current_script = Path(argv[0]).name
+is_env_var_set = lambda s: len(environ.get(s) or '') > 0
+is_html_script = current_script in HTML_SCRIPTS
+args.debug = args.deep_debug or args.debug or is_env_var_set('DEBUG')
+args.output_emails = args.output_emails or args.all_emails
+args.output_other_files = args.output_other_files or args.all_other_files
+args.overwrite_pickle = args.overwrite_pickle or (is_env_var_set('OVERWRITE_PICKLE') and not is_env_var_set('PICKLED'))
+args.width = args.width if is_html_script else None
+specified_names: list[str | None] = [None if n == 'None' else n for n in (args.names or [])]
+# Log level args
+if args.deep_debug:
+    logger.setLevel(logging.DEBUG)
+elif args.debug:
+    logger.setLevel(logging.INFO)
+elif args.suppress_logs:
+    logger.setLevel(logging.FATAL)
+elif not env_log_level:
+    logger.setLevel(logging.WARNING)
+logger.info(f'Log level set to {logger.level}...')
+datefinder_logger.setLevel(logger.level)
+# Massage args that depend on other args to the appropriate state
+if not (args.json_metadata or args.output_texts or args.output_emails or args.output_other_files):
+    if is_html_script and current_script != COUNT_WORDS_SCRIPT and not args.make_clean and not args.colors_only:
+        logger.warning(f"No output section chosen; outputting default selection of texts, selected emails, and other files...")
+    args.output_texts = True
+    args.output_emails = True
+    args.output_other_files = True
+if args.use_epstein_web_links:
+    logger.warning(f"Using links to epsteinweb.org instead of epsteinify.com...")
+if args.debug:
+    logger.warning(f"Invocation args:\ncurrent_script={current_script}\nis_html_script={is_html_script},\nspecified_names={specified_names},\nargs={args}")
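
Note on the added file above: 1.0.5 moves the flags into `argparse` argument groups and then derives some flags from others after parsing. The sketch below shows that pattern in isolation, assuming a reduced set of flags and an explicit `argv` list instead of `sys.argv`. `build_parser` and `resolve` are hypothetical helper names, and the `--json-metadata` check from the real code is left out.

```python
from argparse import ArgumentParser, Namespace


def build_parser() -> ArgumentParser:
    """Reduced parser with one argument group, mirroring the OUTPUT group."""
    parser = ArgumentParser(description="Argument group sketch.")
    output = parser.add_argument_group('OUTPUT')
    output.add_argument('--all-emails', action='store_true', help='all the emails')
    output.add_argument('--output-emails', action='store_true', help='generate emails section')
    output.add_argument('--output-other-files', action='store_true', help='generate other files section')
    output.add_argument('--output-texts', action='store_true', help='generate text messages section')
    return parser


def resolve(argv: list[str]) -> Namespace:
    """Parse argv, then apply the same style of dependent defaults as env.py."""
    args = build_parser().parse_args(argv)
    # --all-emails implies the emails section.
    args.output_emails = args.output_emails or args.all_emails

    # If no section was chosen, fall back to emitting every section.
    if not (args.output_texts or args.output_emails or args.output_other_files):
        args.output_texts = True
        args.output_emails = True
        args.output_other_files = True

    return args


print(resolve([]))
print(resolve(['--all-emails']))
```

Groups only change how `--help` is laid out; the parsed values still land on a single namespace.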

{epstein_files-1.0.3 → epstein_files-1.0.5}/epstein_files/util/highlighted_group.py RENAMED

@@ -159,7 +159,7 @@ HIGHLIGHTED_NAMES = [
         pattern=r'Gruterite|(John\s*)?Kluge|Marc Rich|(Mi(chael|ke)\s*)?Ovitz|(Steve\s+)?Wynn|(Les(lie)?\s+)?Wexner|SALSS|Swedish[-\s]*American\s*Life\s*Science\s*Summit|Valhi|(Yves\s*)?Bouvier',
         emailers = {
             ALIREZA_ITTIHADIEH: 'CEO Freestream Aircraft Limited',
-            BARBRO_C_EHNBOM: 'Swedish pharmaceuticals',
+            BARBRO_C_EHNBOM: 'Swedish pharmaceuticals, SALSS',
             FRED_HADDAD: "co-founder of Heck's in West Virginia",
             GERALD_BARTON: "Maryland property developer Landmark Land Company, fan of Trump's Irish golf course",
             GORDON_GETTY: 'heir of oil tycoon J. Paul Getty',
@@ -296,6 +296,7 @@ HIGHLIGHTED_NAMES = [
         emailers = {
             DAVID_STERN: f'emailed Epstein from Moscow, appears to know chairman of {DEUTSCHE_BANK}',
             JONATHAN_FARKAS: "heir to the Alexander's department store fortune",
+            'linkspirit': "Skype username of someone Epstein communicated with",
             'Peter Thomas Roth': 'student of Epstein at Dalton, skincare company founder',
             STEPHEN_HANSON: None,
             TOM_BARRACK: 'long time friend of Trump',
@@ -304,7 +305,7 @@ HIGHLIGHTED_NAMES = [
     HighlightedNames(
         label='finance',
         style='green',
-        pattern=r'Apollo|Ari\s*Glass|(Bernie\s*)?Madoff|Black(rock|stone)|BofA|Boothbay(\sFund\sManagement)?|Chase\s*Bank|Credit\s*Suisse|DB|Deutsche\s*(Asset|Bank)|Electron\s*Capital\s*(Partners)?|Fenner|FRBNY|Goldman(\s*Sachs)|HSBC|Invesco|(Janet\s*)?Yellen|(Jerome\s*)?Powell(?!M\. Cabot)|(Jimmy\s*)?Cayne|JPMC?|j\.?p\.?\s*morgan(\.?com|\s*Chase)?|Madoff|Merrill(\s*Lynch)?|(Michael\s*)?(Cembalest|Milken)|Mizrahi\s*Bank|MLPF&S|(money\s+)?launder(s?|ers?|ing)?(\s+money)?|Morgan Stanley|(Peter L. )?Scher|(Ray\s*)?Dalio|Schwartz?man|Serageldin|UBS|us.gio@jpmorgan.com',
+        pattern=r'Apollo|Ari\s*Glass|Bank|(Bernie\s*)?Madoff|Black(rock|stone)|B\s*of\s*A|Boothbay(\sFund\sManagement)?|Chase\s*Bank|Credit\s*Suisse|DB|Deutsche\s*(Asset|Bank)|Electron\s*Capital\s*(Partners)?|Fenner|FRBNY|Goldman(\s*Sachs)|HSBC|Invesco|(Janet\s*)?Yellen|(Jerome\s*)?Powell(?!M\. Cabot)|(Jimmy\s*)?Cayne|JPMC?|j\.?p\.?\s*morgan(\.?com|\s*Chase)?|Madoff|Merrill(\s*Lynch)?|(Michael\s*)?(Cembalest|Milken)|Mizrahi\s*Bank|MLPF&S|(money\s+)?launder(s?|ers?|ing)?(\s+money)?|Morgan Stanley|(Peter L. )?Scher|(Ray\s*)?Dalio|Schwartz?man|Serageldin|UBS|us.gio@jpmorgan.com',
         emailers={
             AMANDA_ENS: 'Citigroup',
             DANIEL_SABBA: 'UBS Investment Bank',
@@ -587,7 +588,7 @@ HIGHLIGHTED_NAMES = [
     HighlightedText(
         label='phone_number',
         style='bright_green',
-        pattern=r"\+?(1?\(?\d{3}\)?[- ]\d{3}[- ]\d{4}|\d{2}[- ]\(?0?\)?\d{2}[- ]\d{4}[- ]\d{4})|[\d+]{10,12}",
+        pattern=r"\+?(1?\(?\d{3}\)?[- ]\d{3}[- ]\d{4}|\d{2}[- ]\(?0?\)?\d{2}[- ]\d{4}[- ]\d{4})|\b[\d+]{10,12}\b",
     ),
 ]
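
Note on the `phone_number` hunk above: the only change is wrapping the last alternative in `\b` word boundaries. A small sketch of the effect, using just that final alternative rather than the full pattern (`OLD_TAIL` and `NEW_TAIL` are names made up for this illustration):

```python
import re

# Only the final alternative of the phone_number pattern, before and after the change.
OLD_TAIL = re.compile(r"[\d+]{10,12}")
NEW_TAIL = re.compile(r"\b[\d+]{10,12}\b")

long_id = "id 1234567890123456"    # 16 digits: too long to be a phone number
phone = "call 2125551234 now"      # a standalone 10 digit number

# Without boundaries the pattern matches a 12 character slice of the longer run.
print(OLD_TAIL.search(long_id))
# With boundaries the digit run has to start and end at a word boundary.
print(NEW_TAIL.search(long_id))
print(NEW_TAIL.search(phone))
```

The boundary version should therefore stop highlighting fragments of long numeric strings while still matching standalone 10 to 12 digit numbers.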

{epstein_files-1.0.3 → epstein_files-1.0.5}/epstein_files/util/output.py RENAMED

@@ -7,7 +7,6 @@ from epstein_files.util.constant.output_files import JSON_METADATA_PATH
 from epstein_files.util.constant import urls
 from epstein_files.util.constant.html import *
 from epstein_files.util.constant.names import *
-from epstein_files.util.constant.strings import EMAIL_CLASS, MESSENGER_LOG_CLASS
 from epstein_files.util.data import dict_sets_to_lists
 from epstein_files.util.env import args, specified_names
 from epstein_files.util.logging import log_file_write, logger
@@ -122,9 +121,9 @@ def print_json_metadata(epstein_files: EpsteinFiles) -> None:
 def print_json_stats(epstein_files: EpsteinFiles) -> None:
     console.line(5)
     console.print(Panel('JSON Stats Dump', expand=True, style='reverse bold'), '\n')
-    print_json(f"{MESSENGER_LOG_CLASS} Sender Counts", MessengerLog.count_authors(epstein_files.imessage_logs), skip_falsey=True)
-    print_json(f"{EMAIL_CLASS} Author Counts", epstein_files.email_author_counts, skip_falsey=True)
-    print_json(f"{EMAIL_CLASS} Recipient Counts", epstein_files.email_recipient_counts, skip_falsey=True)
+    print_json(f"MessengerLog Sender Counts", MessengerLog.count_authors(epstein_files.imessage_logs), skip_falsey=True)
+    print_json(f"Email Author Counts", epstein_files.email_author_counts, skip_falsey=True)
+    print_json(f"Email Recipient Counts", epstein_files.email_recipient_counts, skip_falsey=True)
     print_json("Email signature_substitution_counts", epstein_files.email_signature_substitution_counts(), skip_falsey=True)
     print_json("email_author_device_signatures", dict_sets_to_lists(epstein_files.email_authors_to_device_signatures))
     print_json("email_sent_from_devices", dict_sets_to_lists(epstein_files.email_device_signatures_to_authors))
@@ -147,16 +146,12 @@ def print_text_messages(epstein_files: EpsteinFiles) -> None:
 def write_urls() -> None:
     """Write _URL style constant variables to a file bash scripts can load as env vars."""
-    if args.output_file == 'index.html':
-        logger.warning(f"Can't write env vars to '{args.output_file}', writing to '{URLS_ENV}' instead.\n")
-        args.output_file = URLS_ENV
     url_vars = {
         k: v for k, v in vars(urls).items()
         if isinstance(v, str) and k.split('_')[-1] in ['URL'] and 'github.io' in v and 'BASE' not in k
     }
-    with open(args.output_file, 'w') as f:
+    with open(URLS_ENV, 'w') as f:
         for var_name, url in url_vars.items():
             key_value = f"{var_name}='{url}'"
@@ -166,7 +161,7 @@ def write_urls() -> None:
             f.write(f"{key_value}\n")
     console.line()
-    logger.warning(f"Wrote {len(url_vars)} URL variables to '{args.output_file}'\n")
+    logger.warning(f"Wrote {len(url_vars)} URL variables to '{URLS_ENV}'\n")
 def _verify_all_emails_were_printed(epstein_files: EpsteinFiles, already_printed_emails: list[Email]) -> None:
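
Note on the `write_urls()` hunks above: with `--output-file` removed, the function always writes to `URLS_ENV`. The filter-and-write step can be sketched as below. The namespace contents, `select_url_vars`, and `write_env_file` are hypothetical names for illustration; only the filter condition and the `KEY='value'` line format come from the diff.

```python
from pathlib import Path
from types import SimpleNamespace


def select_url_vars(namespace) -> dict[str, str]:
    """Keep string constants named *_URL that point at github.io, skipping BASE ones."""
    return {
        k: v for k, v in vars(namespace).items()
        if isinstance(v, str) and k.split('_')[-1] == 'URL' and 'github.io' in v and 'BASE' not in k
    }


def write_env_file(url_vars: dict[str, str], path: Path) -> None:
    """Write KEY='value' lines that a bash script can source."""
    with open(path, 'w') as f:
        for var_name, url in url_vars.items():
            f.write(f"{var_name}='{url}'\n")


# Hypothetical stand-in for the package's urls module.
fake_urls = SimpleNamespace(
    WORD_COUNT_URL='https://example.github.io/words.html',
    BASE_URL='https://example.github.io',
    EXTERNAL_URL='https://example.com/page',
    NOT_A_LINK='plain string',
)

print(select_url_vars(fake_urls))
```

Writing to one fixed path removes the old special case that redirected output when `--output-file` was left at `index.html`.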

{epstein_files-1.0.3 → epstein_files-1.0.5}/epstein_files/util/rich.py RENAMED

@@ -231,10 +231,13 @@ def print_other_site_link(is_header: bool = True) -> None:
     other_site_msg += f" Epstein's {other_site_type}s also generated by this code"
     markup_msg = link_markup(SITE_URLS[other_site_type], other_site_msg, OTHER_SITE_LINK_STYLE)
     print_centered(parenthesize(Text.from_markup(markup_msg)), style='bold')
-    word_count_link = link_text_obj(WORD_COUNT_URL, 'site showing the most frequently used words in these communiques', OTHER_SITE_LINK_STYLE)
-    print_centered(parenthesize(word_count_link))
-    metadata_link = link_text_obj(JSON_METADATA_URL, 'metadata with author attribution explanations', OTHER_SITE_LINK_STYLE)
-    print_centered(parenthesize(metadata_link))
+    if is_header:
+        metadata_link = link_text_obj(JSON_METADATA_URL, 'metadata with author attribution explanations', OTHER_SITE_LINK_STYLE)
+        print_centered(parenthesize(metadata_link))
+        word_count_link = link_text_obj(WORD_COUNT_URL, 'most frequently used words', OTHER_SITE_LINK_STYLE)
+        print_centered(parenthesize(word_count_link))
+        print_centered(parenthesize(link_text_obj(GH_PROJECT_URL, '@github', 'dark_orange3 bold')))
 def print_page_title(expand: bool = True, width: int | None = None) -> None:
@@ -247,8 +250,8 @@ def print_page_title(expand: bool = True, width: int | None = None) -> None:
 def print_panel(msg: str, style: str = 'black on white', padding: tuple | None = None, centered: bool = False) -> None:
     _padding: list[int] = list(padding or [0, 0, 0, 0])
     _padding[2] += 1  # Bottom pad
-    panel = Panel(Text.from_markup(msg, justify='center'), width=70, style=style)
     actual_padding: tuple[int, int, int, int] = tuple(_padding)
+    panel = Panel(Text.from_markup(msg, justify='center'), width=70, style=style)
     if centered:
         console.print(Align.center(Padding(panel, actual_padding)))
@@ -335,6 +338,7 @@ def _print_external_links() -> None:
     print_centered(link_markup(COURIER_NEWSROOM_ARCHIVE_URL, 'Searchable Archive') + " (Courier Newsroom)")
     print_centered(link_markup(EPSTEINIFY_URL) + " (raw document images)")
     print_centered(link_markup(EPSTEIN_WEB_URL) + " (character summaries)")
+    print_centered(link_markup(EPSTEIN_MEDIA_URL) + " (raw document images)")
 # if args.deep_debug:

{epstein_files-1.0.3 → epstein_files-1.0.5}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "epstein-files"
-version = "1.0.3"
+version = "1.0.5"
 description = "Tools for working with the Jeffrey Epstein documents released in November 2025."
 authors = ["Michel de Cryptadamus"]
 readme = "README.md"

epstein_files-1.0.3/README.md DELETED

@@ -1,53 +0,0 @@
-# I Made Epstein's Text Messages Great Again
-* [I Made Epstein's Text Messages Great Again (And You Should Read Them)](https://cryptadamus.substack.com/p/i-made-epsteins-text-messages-great) post on [Substack](https://cryptadamus.substack.com/p/i-made-epsteins-text-messages-great)
-* The Epstein text messages (and some of the emails along with summary counts of sent emails to/from Epstein) generated by this code can be viewed [here](https://michelcrypt4d4mus.github.io/epstein_text_messages/).
-* All of His Emails can be read at another page also generated by this code [here](https://michelcrypt4d4mus.github.io/epstein_text_messages/all_emails_epstein_files_nov_2025.html).
-* Word counts for the emails and text messages are [here](https://michelcrypt4d4mus.github.io/epstein_text_messages/communication_word_count_epstein_files_nov_2025.html).
-* Metadata containing what I have figured out about who sent or received the communications in a given file (and a brief explanation for how I figured it out for each file) is deployed [here](https://michelcrypt4d4mus.github.io/epstein_text_messages/file_metadata_epstein_files_nov_2025.json)
-* Configuration variables assigning specific `HOUSE_OVERSIGHT_XXXXXX.txt` file IDs (the `111111` part) as being emails to or from particular people based on various research and contributions can be found in [constants.py](./epstein_files/util/constants.py). Everything in `constants.py` should also appear in the JSON metadata.
-### Usage
-1. Requires you have a local copy of OCR text from the House Oversight document dump in a directory `/path/to/epstein/ocr_txt_files`. You can download them from [the Congressional Google Drive folder](https://drive.google.com/drive/folders/1ldncvdqIf6miiskDp_EDuGSDAaI_fJx8).
-1. Dependencies are in [pyproject.toml](./pyproject.toml). Use `poetry install` for easiest time installing. `pip install .` may or may not work.
-You need to set the `DOCS_DIR` environment variable with the path to the folder of files you just downloaded when running. You can either create a `.env` file modeled on [`.env.example`](./.env.example) (which will set it permanently) or you can run with:
-```bash
-# Generate color highlighted texts/emails/other files
-DOCS_DIR=/path/to/epstein/ocr_txt_files epstein_generate
-# Search
-DOCS_DIR=/path/to/epstein/ocr_txt_files epstein_search Bannon
-# Show a color highlighted file
-DOCS_DIR=/path/to/epstein/ocr_txt_files epstein_show 030999
-# This also works
-DOCS_DIR=/path/to/epstein/ocr_txt_files epstein_show HOUSE_OVERSIGHT_030999
-```
-Run `epstein_generate --help` for command line option assistance.
-The first time you run anything it will take a few minutes to fix all the data, attribute the redacted emails, etc. Once you've run things once you can run the `epstein_generate --pickled` to load the cached fixed up data and things will be quick.
-#### As A Library
-```python
-from epstein_files.epstein_files import EpsteinFiles
-epstein_files = EpsteinFiles.get_files()
-# All files
-for document in epstein_files.all_documents():
-    do_stuff()
-# Emails
-for email in epstein_files.emails:
-    do_stuff()
-# iMessage Logs
-for imessage_log in epstein_files.imessage_logs:
-    do_stuff()
-# Other Files
-for document in epstein_files.other_files:
-    do_stuff()
-```

epstein_files-1.0.3/epstein_files/util/env.py DELETED

@@ -1,80 +0,0 @@
-import logging
-from argparse import ArgumentParser
-from os import environ
-from pathlib import Path
-from sys import argv
-from epstein_files.util.logging import datefinder_logger, env_log_level, logger
-COUNT_WORDS_SCRIPT = 'count_words.py'
-DEFAULT_WIDTH = 154
-HTML_SCRIPTS = ['epstein_generate', 'generate_html.py', COUNT_WORDS_SCRIPT]
-parser = ArgumentParser(description="Parse epstein OCR docs and generate HTML page.")
-parser.add_argument('--build', '-b', action='store_true', help='write output to file')
-parser.add_argument('--all-emails', '-ae', action='store_true', help='all the emails instead of just the interesting ones')
-parser.add_argument('--all-other-files', '-ao', action='store_true', help='all the non-email, non-text msg files instead of just interesting ones')
-parser.add_argument('--colors-only', '-c', action='store_true', help='print header with color key table and links and exit')
-parser.add_argument('--name', '-n', action='append', dest='names', help='specify the name(s) whose communications should be output')
-parser.add_argument('--output-file', '-out', metavar='FILE', default='index.html', help='write output to FILE in docs/ (default=index.html)')
-parser.add_argument('--output-emails', '-oe', action='store_true', help='generate other files section')
-parser.add_argument('--output-other-files', '-oo', action='store_true', help='generate other files section')
-parser.add_argument('--output-texts', '-ot', action='store_true', help='generate other files section')
-parser.add_argument('--pickled', '-p', action='store_true', help='use pickled EpsteinFiles object')
-parser.add_argument('--overwrite-pickle', '-op', action='store_true', help='generate new pickled EpsteinFiles object')
-parser.add_argument('--raw', '-r', action='store_true', help='show raw contents of file (only used by scripts)')
-parser.add_argument('--sort-alphabetical', '-alpha', action='store_true', help='sort emailers alphabetically in counts table')
-parser.add_argument('--suppress-output', '-s', action='store_true', help='no output to terminal (use with --build)')
-parser.add_argument('--use-epstein-web-links', '-use', action='store_true', help='use epsteinweb.org links instead of epstein.media')
-parser.add_argument('--width', '-w', type=int, default=DEFAULT_WIDTH, help='screen width to use')
-parser.add_argument('--whole-file', '-wf', action='store_true', help='print whole file (only used by search script)')
-parser.add_argument('--debug', '-d', action='store_true', help='set debug level to INFO')
-parser.add_argument('--deep-debug', '-dd', action='store_true', help='set debug level to DEBUG')
-parser.add_argument('--make-clean', '-mc', action='store_true', help='delete all build artifact HTML and JSON files')
-parser.add_argument('--suppress-logs', '-sl', action='store_true', help='set debug level to FATAL')
-parser.add_argument('--json-metadata', '-jm', action='store_true', help='dump JSON metadata for all files')
-parser.add_argument('--json-stats', '-j', action='store_true', help='print JSON formatted stats at the end')
-parser.add_argument('positional_args', nargs='*', help='Optional args (only used by helper scripts)')
-args = parser.parse_args()
-current_script = Path(argv[0]).name
-is_env_var_set = lambda s: len(environ.get(s) or '') > 0
-is_html_script = current_script in HTML_SCRIPTS
-args.debug = args.deep_debug or args.debug or is_env_var_set('DEBUG')
-args.output_emails = args.output_emails or args.all_emails
-args.output_other_files = args.output_other_files or args.all_other_files
-args.pickled = args.pickled or is_env_var_set('PICKLED') or args.colors_only or len(args.names or []) > 0
-args.width = args.width if is_html_script else None
-specified_names: list[str | None] = [None if n == 'None' else n for n in (args.names or [])]
-# Log level args
-if args.deep_debug:
-    logger.setLevel(logging.DEBUG)
-elif args.debug:
-    logger.setLevel(logging.INFO)
-elif args.suppress_logs:
-    logger.setLevel(logging.FATAL)
-elif not env_log_level:
-    logger.setLevel(logging.WARNING)
-logger.info(f'Log level set to {logger.level}...')
-datefinder_logger.setLevel(logger.level)
-# Massage args that depend on other args to the appropriate state
-if not (args.json_metadata or args.output_texts or args.output_emails or args.output_other_files):
-    if is_html_script and current_script != COUNT_WORDS_SCRIPT and not args.make_clean:
-        logger.warning(f"No output section chosen; outputting default of texts, selected emails, and other files...")
-    args.output_texts = True
-    args.output_emails = True
-    args.output_other_files = True
-if args.use_epstein_web_links:
-    logger.warning(f"Using links to epsteinweb.org links instead of epsteinify.com...")
-if args.debug:
-    logger.warning(f"Invocation args:\nis_html_script={is_html_script},\nspecified_names={specified_names},\nargs={args}")