doctrail 0.3.1__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- doctrail-0.3.1/.github/workflows/docs.yml +51 -0
- doctrail-0.3.1/.github/workflows/release.yml +110 -0
- doctrail-0.3.1/.github/workflows/test.yml +36 -0
- doctrail-0.3.1/.gitignore +279 -0
- doctrail-0.3.1/CHANGELOG.md +134 -0
- doctrail-0.3.1/CONTRIBUTING.md +7 -0
- doctrail-0.3.1/LICENSE +21 -0
- doctrail-0.3.1/PKG-INFO +103 -0
- doctrail-0.3.1/README.md +51 -0
- doctrail-0.3.1/docs/CNAME +1 -0
- doctrail-0.3.1/docs/assets/demo-econ-threat.gif +0 -0
- doctrail-0.3.1/docs/assets/demo.gif +0 -0
- doctrail-0.3.1/docs/assets/mascot.png +0 -0
- doctrail-0.3.1/docs/cli.md +11 -0
- doctrail-0.3.1/docs/data-model.md +115 -0
- doctrail-0.3.1/docs/index.md +114 -0
- doctrail-0.3.1/docs/llms.txt +1272 -0
- doctrail-0.3.1/docs/quickstart.md +62 -0
- doctrail-0.3.1/docs/stylesheets/extra.css +112 -0
- doctrail-0.3.1/docs/tutorial.md +99 -0
- doctrail-0.3.1/docs/yaml.md +262 -0
- doctrail-0.3.1/doctrail.py +38 -0
- doctrail-0.3.1/examples/doctrail-server.yaml +30 -0
- doctrail-0.3.1/examples/tutorial/.doctrail/config.yml +30 -0
- doctrail-0.3.1/examples/tutorial/.doctrail/enrichments/country_mentions.yml +18 -0
- doctrail-0.3.1/examples/tutorial/.doctrail/enrichments/country_stance.yml +11 -0
- doctrail-0.3.1/examples/tutorial/.doctrail/enrichments/econ_threat.yml +30 -0
- doctrail-0.3.1/examples/tutorial/.doctrail/enrichments/mentions_climate.yml +11 -0
- doctrail-0.3.1/examples/tutorial/.doctrail/enrichments/optimism.yml +11 -0
- doctrail-0.3.1/examples/tutorial/.doctrail/enrichments/securitization.yml +28 -0
- doctrail-0.3.1/examples/tutorial/.doctrail/enrichments/test.yml +27 -0
- doctrail-0.3.1/examples/tutorial/.doctrail/replay/country_mentions.jsonl +10 -0
- doctrail-0.3.1/examples/tutorial/.doctrail/replay/country_stance.jsonl +20 -0
- doctrail-0.3.1/examples/tutorial/.doctrail/replay/econ_threat.jsonl +10 -0
- doctrail-0.3.1/examples/tutorial/.doctrail/replay/mentions_climate.jsonl +20 -0
- doctrail-0.3.1/examples/tutorial/.doctrail/replay/optimism.jsonl +20 -0
- doctrail-0.3.1/examples/tutorial/.doctrail/replay/securitization.jsonl +10 -0
- doctrail-0.3.1/examples/tutorial/.doctrail/replay/test.jsonl +18 -0
- doctrail-0.3.1/examples/tutorial/.doctrail/views/country_mentions.yml +14 -0
- doctrail-0.3.1/examples/tutorial/.doctrail/views/econ_threat.yml +13 -0
- doctrail-0.3.1/examples/tutorial/README.md +41 -0
- doctrail-0.3.1/examples/tutorial/corpus/federalist/federalist_01.html +9 -0
- doctrail-0.3.1/examples/tutorial/corpus/federalist/federalist_02.docx +0 -0
- doctrail-0.3.1/examples/tutorial/corpus/federalist/federalist_06.docx +0 -0
- doctrail-0.3.1/examples/tutorial/corpus/federalist/federalist_09.docx +0 -0
- doctrail-0.3.1/examples/tutorial/corpus/federalist/federalist_10.docx +0 -0
- doctrail-0.3.1/examples/tutorial/corpus/federalist/federalist_14.docx +0 -0
- doctrail-0.3.1/examples/tutorial/corpus/federalist/federalist_49.pdf +99 -0
- doctrail-0.3.1/examples/tutorial/corpus/federalist/federalist_50.pdf +99 -0
- doctrail-0.3.1/examples/tutorial/corpus/federalist/federalist_51.pdf +99 -0
- doctrail-0.3.1/examples/tutorial/corpus/federalist/federalist_52.pdf +99 -0
- doctrail-0.3.1/examples/tutorial/corpus/federalist/federalist_53.pdf +99 -0
- doctrail-0.3.1/examples/tutorial/corpus/federalist/federalist_54.pdf +99 -0
- doctrail-0.3.1/examples/tutorial/corpus/federalist/federalist_55.pdf +99 -0
- doctrail-0.3.1/examples/tutorial/corpus/federalist/federalist_56.pdf +99 -0
- doctrail-0.3.1/examples/tutorial/corpus/federalist/federalist_57.pdf +99 -0
- doctrail-0.3.1/examples/tutorial/corpus/federalist/federalist_58.pdf +99 -0
- doctrail-0.3.1/examples/tutorial/corpus/federalist/federalist_62.html +9 -0
- doctrail-0.3.1/examples/tutorial/corpus/federalist/federalist_63.html +9 -0
- doctrail-0.3.1/examples/tutorial/corpus/gt_editorials/README.md +29 -0
- doctrail-0.3.1/examples/tutorial/corpus/gt_editorials/gt_2012_philippines.txt +25 -0
- doctrail-0.3.1/examples/tutorial/corpus/gt_editorials/gt_2013_eu_wine.txt +23 -0
- doctrail-0.3.1/examples/tutorial/corpus/gt_editorials/gt_2013_north_korea.txt +29 -0
- doctrail-0.3.1/examples/tutorial/corpus/gt_editorials/gt_2016_south_korea_thaad.txt +31 -0
- doctrail-0.3.1/examples/tutorial/corpus/gt_editorials/gt_2017_australia.txt +23 -0
- doctrail-0.3.1/examples/tutorial/corpus/gt_editorials/gt_2018_canada_meng.txt +29 -0
- doctrail-0.3.1/examples/tutorial/corpus/gt_editorials/gt_2018_us_trade_war.txt +25 -0
- doctrail-0.3.1/examples/tutorial/corpus/gt_editorials/gt_2021_afghanistan.txt +23 -0
- doctrail-0.3.1/examples/tutorial/corpus/gt_editorials/gt_2021_europe.txt +21 -0
- doctrail-0.3.1/examples/tutorial/corpus/gt_editorials/gt_2021_us_taiwan.txt +29 -0
- doctrail-0.3.1/examples/tutorial/corpus/manifest.json +290 -0
- doctrail-0.3.1/examples/tutorial/corpus/un_speeches/au_general_debate_2023.html +4 -0
- doctrail-0.3.1/examples/tutorial/corpus/un_speeches/br_general_debate_2023.pdf +80 -0
- doctrail-0.3.1/examples/tutorial/corpus/un_speeches/ca_general_debate_2023.docx +0 -0
- doctrail-0.3.1/examples/tutorial/corpus/un_speeches/fj_general_debate_2023.pdf +80 -0
- doctrail-0.3.1/examples/tutorial/corpus/un_speeches/ie_general_debate_2023.html +3 -0
- doctrail-0.3.1/examples/tutorial/corpus/un_speeches/in_general_debate_2023.pdf +80 -0
- doctrail-0.3.1/examples/tutorial/corpus/un_speeches/jp_general_debate_2023.docx +0 -0
- doctrail-0.3.1/examples/tutorial/corpus/un_speeches/ke_general_debate_2023.docx +0 -0
- doctrail-0.3.1/examples/tutorial/corpus/un_speeches/ng_general_debate_2023.html +4 -0
- doctrail-0.3.1/examples/tutorial/corpus/un_speeches/za_general_debate_2023.pdf +80 -0
- doctrail-0.3.1/mkdocs.yml +68 -0
- doctrail-0.3.1/pyproject.toml +114 -0
- doctrail-0.3.1/scripts/build_demo_corpus.py +786 -0
- doctrail-0.3.1/scripts/build_demo_gif.sh +47 -0
- doctrail-0.3.1/scripts/build_llms_full.py +111 -0
- doctrail-0.3.1/scripts/demo.tape +44 -0
- doctrail-0.3.1/scripts/demo_econ_threat.tape +81 -0
- doctrail-0.3.1/skills/doctrail/SKILL.md +319 -0
- doctrail-0.3.1/src/doctrail/__init__.py +1 -0
- doctrail-0.3.1/src/doctrail/cli/__init__.py +32 -0
- doctrail-0.3.1/src/doctrail/cli/__main__.py +7 -0
- doctrail-0.3.1/src/doctrail/cli/enrich.py +610 -0
- doctrail-0.3.1/src/doctrail/cli/export.py +51 -0
- doctrail-0.3.1/src/doctrail/cli/icr.py +295 -0
- doctrail-0.3.1/src/doctrail/cli/ingest.py +306 -0
- doctrail-0.3.1/src/doctrail/cli/main.py +2024 -0
- doctrail-0.3.1/src/doctrail/cli/models.py +505 -0
- doctrail-0.3.1/src/doctrail/cli/query.py +575 -0
- doctrail-0.3.1/src/doctrail/cli/review.py +76 -0
- doctrail-0.3.1/src/doctrail/cli/serve.py +82 -0
- doctrail-0.3.1/src/doctrail/cli/utils.py +162 -0
- doctrail-0.3.1/src/doctrail/cli/view.py +5 -0
- doctrail-0.3.1/src/doctrail/cli.py +19 -0
- doctrail-0.3.1/src/doctrail/config/__init__.py +6 -0
- doctrail-0.3.1/src/doctrail/config/config_manager.py +213 -0
- doctrail-0.3.1/src/doctrail/config/validators.py +189 -0
- doctrail-0.3.1/src/doctrail/constants.py +105 -0
- doctrail-0.3.1/src/doctrail/core.py +30 -0
- doctrail-0.3.1/src/doctrail/core_runtime/__init__.py +22 -0
- doctrail-0.3.1/src/doctrail/core_runtime/batch.py +778 -0
- doctrail-0.3.1/src/doctrail/core_runtime/commands.py +1062 -0
- doctrail-0.3.1/src/doctrail/core_runtime/enrichment.py +946 -0
- doctrail-0.3.1/src/doctrail/core_runtime/shared.py +796 -0
- doctrail-0.3.1/src/doctrail/core_utils.py +584 -0
- doctrail-0.3.1/src/doctrail/cost_estimation.py +33 -0
- doctrail-0.3.1/src/doctrail/db_operations.py +8 -0
- doctrail-0.3.1/src/doctrail/db_ops/__init__.py +24 -0
- doctrail-0.3.1/src/doctrail/db_ops/audit_runs.py +868 -0
- doctrail-0.3.1/src/doctrail/db_ops/common.py +595 -0
- doctrail-0.3.1/src/doctrail/db_ops/enrichments.py +1722 -0
- doctrail-0.3.1/src/doctrail/db_ops/migrations.py +421 -0
- doctrail-0.3.1/src/doctrail/db_ops/views.py +1470 -0
- doctrail-0.3.1/src/doctrail/enrichment_config.py +356 -0
- doctrail-0.3.1/src/doctrail/export_operations.py +170 -0
- doctrail-0.3.1/src/doctrail/extractors/__init__.py +1 -0
- doctrail-0.3.1/src/doctrail/extractors/djvu_extractor.py +73 -0
- doctrail-0.3.1/src/doctrail/extractors/doc_extractor.py +62 -0
- doctrail-0.3.1/src/doctrail/extractors/docx_extractor.py +116 -0
- doctrail-0.3.1/src/doctrail/extractors/epub_extractor.py +198 -0
- doctrail-0.3.1/src/doctrail/extractors/html_extractor.py +71 -0
- doctrail-0.3.1/src/doctrail/extractors/mhtml-to-html.py +428 -0
- doctrail-0.3.1/src/doctrail/extractors/mhtml_extractor.py +644 -0
- doctrail-0.3.1/src/doctrail/extractors/mobi_extractor.py +59 -0
- doctrail-0.3.1/src/doctrail/extractors/pdf_extractor.py +256 -0
- doctrail-0.3.1/src/doctrail/extractors/presentation_extractor.py +173 -0
- doctrail-0.3.1/src/doctrail/extractors/smart_html_extractor.py +217 -0
- doctrail-0.3.1/src/doctrail/extractors/spreadsheet_extractor.py +482 -0
- doctrail-0.3.1/src/doctrail/file_filters.py +207 -0
- doctrail-0.3.1/src/doctrail/ingest/__init__.py +20 -0
- doctrail-0.3.1/src/doctrail/ingest/base.py +11 -0
- doctrail-0.3.1/src/doctrail/ingest/core.py +722 -0
- doctrail-0.3.1/src/doctrail/ingest/database.py +214 -0
- doctrail-0.3.1/src/doctrail/ingest/document_processor.py +1274 -0
- doctrail-0.3.1/src/doctrail/ingest/extractors.py +47 -0
- doctrail-0.3.1/src/doctrail/ingest/file_utils.py +119 -0
- doctrail-0.3.1/src/doctrail/ingest/manifest.py +109 -0
- doctrail-0.3.1/src/doctrail/ingest/text_processing.py +196 -0
- doctrail-0.3.1/src/doctrail/ingester.py +40 -0
- doctrail-0.3.1/src/doctrail/llm/__init__.py +5 -0
- doctrail-0.3.1/src/doctrail/llm/client.py +160 -0
- doctrail-0.3.1/src/doctrail/llm/token_utils.py +120 -0
- doctrail-0.3.1/src/doctrail/llm_operations.py +1719 -0
- doctrail-0.3.1/src/doctrail/llm_providers/__init__.py +46 -0
- doctrail-0.3.1/src/doctrail/llm_providers/anthropic_provider.py +568 -0
- doctrail-0.3.1/src/doctrail/llm_providers/claude_sdk_provider.py +272 -0
- doctrail-0.3.1/src/doctrail/llm_providers/cli_provider.py +478 -0
- doctrail-0.3.1/src/doctrail/llm_providers/factory.py +154 -0
- doctrail-0.3.1/src/doctrail/llm_providers/gemini_provider.py +705 -0
- doctrail-0.3.1/src/doctrail/llm_providers/openai_provider.py +551 -0
- doctrail-0.3.1/src/doctrail/llm_providers/replay_provider.py +155 -0
- doctrail-0.3.1/src/doctrail/main.py +20 -0
- doctrail-0.3.1/src/doctrail/plugins/README.md +166 -0
- doctrail-0.3.1/src/doctrail/plugins/__init__.py +135 -0
- doctrail-0.3.1/src/doctrail/plugins/_chinese_converter.py +190 -0
- doctrail-0.3.1/src/doctrail/plugins/doi_connector.py +636 -0
- doctrail-0.3.1/src/doctrail/plugins/example_custom.py +145 -0
- doctrail-0.3.1/src/doctrail/plugins/zotero.py +812 -0
- doctrail-0.3.1/src/doctrail/plugins/zotero_connector.py +61 -0
- doctrail-0.3.1/src/doctrail/plugins/zotero_ingester.py +660 -0
- doctrail-0.3.1/src/doctrail/presets/__init__.py +2 -0
- doctrail-0.3.1/src/doctrail/presets/document_type.yml +25 -0
- doctrail-0.3.1/src/doctrail/presets/extract_entities.yml +34 -0
- doctrail-0.3.1/src/doctrail/presets/keywords.yml +23 -0
- doctrail-0.3.1/src/doctrail/presets/language.yml +24 -0
- doctrail-0.3.1/src/doctrail/presets/relevance.yml +24 -0
- doctrail-0.3.1/src/doctrail/presets/research_methods.yml +38 -0
- doctrail-0.3.1/src/doctrail/presets/sentiment.yml +21 -0
- doctrail-0.3.1/src/doctrail/presets/summarize.yml +20 -0
- doctrail-0.3.1/src/doctrail/pydantic_schema.py +499 -0
- doctrail-0.3.1/src/doctrail/review_server.py +549 -0
- doctrail-0.3.1/src/doctrail/schema_managers.py +455 -0
- doctrail-0.3.1/src/doctrail/search.py +629 -0
- doctrail-0.3.1/src/doctrail/server.py +954 -0
- doctrail-0.3.1/src/doctrail/server_config.py +296 -0
- doctrail-0.3.1/src/doctrail/server_ingestor/__init__.py +11 -0
- doctrail-0.3.1/src/doctrail/server_ingestor/app.py +1205 -0
- doctrail-0.3.1/src/doctrail/templates/parallel-translation.md +13 -0
- doctrail-0.3.1/src/doctrail/types.py +104 -0
- doctrail-0.3.1/src/doctrail/utils/__init__.py +1 -0
- doctrail-0.3.1/src/doctrail/utils/build_documentation.py +289 -0
- doctrail-0.3.1/src/doctrail/utils/cost_estimation.py +561 -0
- doctrail-0.3.1/src/doctrail/utils/dependency_check.py +72 -0
- doctrail-0.3.1/src/doctrail/utils/logging_config.py +135 -0
- doctrail-0.3.1/src/doctrail/utils/model_pricing.py +778 -0
- doctrail-0.3.1/src/doctrail/utils/progress.py +89 -0
- doctrail-0.3.1/src/doctrail/utils/query_utils.py +103 -0
- doctrail-0.3.1/src/doctrail/utils/simple_error_handler.py +90 -0
- doctrail-0.3.1/src/doctrail/utils/validate_config.py +217 -0
- doctrail-0.3.1/tests/README.md +88 -0
- doctrail-0.3.1/tests/__init__.py +0 -0
- doctrail-0.3.1/tests/assets/files/federalist_fixture.djvu +0 -0
- doctrail-0.3.1/tests/assets/files/federalist_fixture.doc +0 -0
- doctrail-0.3.1/tests/assets/files/federalist_fixture.docx +0 -0
- doctrail-0.3.1/tests/assets/files/federalist_fixture.epub +0 -0
- doctrail-0.3.1/tests/assets/files/federalist_fixture.html +3 -0
- doctrail-0.3.1/tests/assets/files/federalist_fixture.md +7 -0
- doctrail-0.3.1/tests/assets/files/federalist_fixture.mhtml +26 -0
- doctrail-0.3.1/tests/assets/files/federalist_fixture.mobi +0 -0
- doctrail-0.3.1/tests/assets/files/federalist_fixture.pdf +85 -0
- doctrail-0.3.1/tests/assets/files/federalist_fixture.png +0 -0
- doctrail-0.3.1/tests/assets/files/federalist_fixture.ppt +0 -0
- doctrail-0.3.1/tests/assets/files/federalist_fixture.pptx +0 -0
- doctrail-0.3.1/tests/assets/files/federalist_fixture.rtf +14 -0
- doctrail-0.3.1/tests/assets/files/federalist_fixture.tsv +3 -0
- doctrail-0.3.1/tests/assets/files/federalist_fixture.txt +7 -0
- doctrail-0.3.1/tests/assets/files/federalist_fixture.xls +0 -0
- doctrail-0.3.1/tests/assets/files/federalist_fixture.xlsx +0 -0
- doctrail-0.3.1/tests/assets/files/test_document.md +24 -0
- doctrail-0.3.1/tests/assets/files/test_document.txt +7 -0
- doctrail-0.3.1/tests/conftest.py +25 -0
- doctrail-0.3.1/tests/demo/test_api.py +120 -0
- doctrail-0.3.1/tests/demo/test_config.yml +53 -0
- doctrail-0.3.1/tests/doctrail_runner.py +19 -0
- doctrail-0.3.1/tests/doctrail_support.py +677 -0
- doctrail-0.3.1/tests/schema_examples/README.md +344 -0
- doctrail-0.3.1/tests/schema_examples/advanced/test_array_language_validation.yml +28 -0
- doctrail-0.3.1/tests/schema_examples/advanced/test_cost_estimation.yml +35 -0
- doctrail-0.3.1/tests/schema_examples/advanced/test_enrich_multi_model.yml +23 -0
- doctrail-0.3.1/tests/schema_examples/advanced/test_enrich_validation_error.yml +23 -0
- doctrail-0.3.1/tests/schema_examples/advanced/test_enrich_with_converter.yml +25 -0
- doctrail-0.3.1/tests/schema_examples/advanced/test_enrich_with_schema.yml +27 -0
- doctrail-0.3.1/tests/schema_examples/main_config.yml +27 -0
- doctrail-0.3.1/tests/schema_examples/main_with_sql_import.yml +17 -0
- doctrail-0.3.1/tests/schema_examples/multi_field/extract_entities.yml +18 -0
- doctrail-0.3.1/tests/schema_examples/multi_field/multi_table_review.yml +36 -0
- doctrail-0.3.1/tests/schema_examples/multi_field/test_enrich_separate_table.yml +32 -0
- doctrail-0.3.1/tests/schema_examples/single_field/classify_language.yml +13 -0
- doctrail-0.3.1/tests/schema_examples/single_field/test_enrich_direct_column.yml +21 -0
- doctrail-0.3.1/tests/schema_examples/single_field/validate_content.yml +12 -0
- doctrail-0.3.1/tests/schema_examples/sql_queries.yml +11 -0
- doctrail-0.3.1/tests/schema_examples/test_framework/test_chinese_docs.yml +12 -0
- doctrail-0.3.1/tests/schema_examples/test_framework/test_enrich_overwrite_mode.yml +22 -0
- doctrail-0.3.1/tests/schema_examples/test_framework/test_export_parallel.yml.disabled +23 -0
- doctrail-0.3.1/tests/schema_examples/test_framework/test_ingest_basic.yml +3 -0
- doctrail-0.3.1/tests/schema_examples/test_framework/test_manifest_ingest.yml +4 -0
- doctrail-0.3.1/tests/schema_examples/test_framework/test_zotero_literature_plugin.yml +74 -0
- doctrail-0.3.1/tests/schema_examples/test_yaml_import_single.yml +11 -0
- doctrail-0.3.1/tests/schema_examples/test_yaml_imports.yml +14 -0
- doctrail-0.3.1/tests/test_anthropic_provider.py +533 -0
- doctrail-0.3.1/tests/test_append_overwrite.py +675 -0
- doctrail-0.3.1/tests/test_cli_basic.py +912 -0
- doctrail-0.3.1/tests/test_cli_provider.py +531 -0
- doctrail-0.3.1/tests/test_cli_provider_integration.py +282 -0
- doctrail-0.3.1/tests/test_cost_estimation.py +540 -0
- doctrail-0.3.1/tests/test_db_connection.py +315 -0
- doctrail-0.3.1/tests/test_docs_drift.py +182 -0
- doctrail-0.3.1/tests/test_doctrail_batch_openai.py +741 -0
- doctrail-0.3.1/tests/test_doctrail_batch_other.py +578 -0
- doctrail-0.3.1/tests/test_doctrail_cli_scenarios.py +186 -0
- doctrail-0.3.1/tests/test_doctrail_storage.py +1452 -0
- doctrail-0.3.1/tests/test_doctrail_views.py +566 -0
- doctrail-0.3.1/tests/test_document_processor_doc.py +46 -0
- doctrail-0.3.1/tests/test_document_processor_office.py +459 -0
- doctrail-0.3.1/tests/test_enrichment_identity.py +455 -0
- doctrail-0.3.1/tests/test_icr_override_query.py +45 -0
- doctrail-0.3.1/tests/test_init.py +330 -0
- doctrail-0.3.1/tests/test_limit_append_overwrite.py +419 -0
- doctrail-0.3.1/tests/test_openai_provider.py +1162 -0
- doctrail-0.3.1/tests/test_prompt_templates.py +193 -0
- doctrail-0.3.1/tests/test_replay_error_reporting.py +276 -0
- doctrail-0.3.1/tests/test_search.py +486 -0
- doctrail-0.3.1/tests/test_server.py +504 -0
- doctrail-0.3.1/tests/test_server_basic.py +77 -0
- doctrail-0.3.1/tests/test_text_processing.py +26 -0
- doctrail-0.3.1/tests/test_tutorial_scaffold.py +315 -0
- doctrail-0.3.1/tests/unit/__init__.py +1 -0
- doctrail-0.3.1/tests/unit/test_progress_utils.py +89 -0
- doctrail-0.3.1/tests/unit/test_query_utils.py +97 -0
- doctrail-0.3.1/tests/unit/test_token_utils.py +90 -0
- doctrail-0.3.1/uv.lock +2519 -0
@@ -0,0 +1,51 @@
+name: docs
+
+on:
+  push:
+    branches:
+      - master
+  workflow_dispatch:
+
+permissions:
+  contents: read
+  pages: write
+  id-token: write
+
+env:
+  UV_PYTHON: "3.12"
+
+concurrency:
+  group: pages
+  cancel-in-progress: false
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Install uv
+        uses: astral-sh/setup-uv@v5
+        with:
+          enable-cache: true
+      - name: Install Python
+        run: uv python install 3.12
+      - name: Generate full manual
+        run: uv run python scripts/build_llms_full.py
+      - name: Check generated manual
+        run: git diff --exit-code docs/llms.txt
+      - name: Build docs
+        run: uv run --with mkdocs-material --with mkdocs-click mkdocs build --strict
+      - uses: actions/configure-pages@v5
+      - uses: actions/upload-pages-artifact@v3
+        with:
+          path: site
+
+  deploy:
+    needs: build
+    runs-on: ubuntu-latest
+    environment:
+      name: github-pages
+      url: ${{ steps.deployment.outputs.page_url }}
+    steps:
+      - id: deployment
+        uses: actions/deploy-pages@v4
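The `Check generated manual` step in the `docs` workflow above regenerates `docs/llms.txt` and then relies on `git diff --exit-code` to fail when the committed file is stale. A minimal Python sketch of the same idea follows, assuming both versions are already in memory as strings. The function names `drift_report` and `check_drift` are hypothetical and are not part of doctrail.

```python
import difflib
import sys


def drift_report(committed: str, regenerated: str, name: str = "docs/llms.txt") -> list[str]:
    """Return unified-diff lines; the list is empty when the texts are identical."""
    return list(
        difflib.unified_diff(
            committed.splitlines(keepends=True),
            regenerated.splitlines(keepends=True),
            fromfile=f"a/{name}",
            tofile=f"b/{name}",
        )
    )


def check_drift(committed: str, regenerated: str) -> int:
    """Mimic `git diff --exit-code`: 0 when nothing changed, 1 otherwise."""
    report = drift_report(committed, regenerated)
    sys.stderr.writelines(report)  # show the drift, as git would
    return 1 if report else 0
```

In CI the second argument would be the generator's fresh output, so a nonzero result means the regenerated manual was never committed.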
@@ -0,0 +1,110 @@
+name: release
+
+on:
+  push:
+    tags:
+      - "v*"
+
+env:
+  UV_PYTHON: "3.12"
+
+jobs:
+  publish:
+    runs-on: ubuntu-latest
+    environment:
+      name: pypi
+    permissions:
+      contents: read
+      id-token: write
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v5
+        with:
+          enable-cache: true
+
+      - name: Install Python
+        run: uv python install 3.12
+
+      - name: Check release tag matches package version
+        run: |
+          package_version="$(uv run --no-sync python - <<'PY'
+          import tomllib
+          with open("pyproject.toml", "rb") as f:
+              print(tomllib.load(f)["project"]["version"])
+          PY
+          )"
+          tag_version="${GITHUB_REF_NAME#v}"
+          if [ "$package_version" != "$tag_version" ]; then
+            echo "Tag $GITHUB_REF_NAME does not match pyproject version $package_version" >&2
+            exit 1
+          fi
+
+      - name: Check lockfile
+        run: uv lock --check
+
+      - name: Install dependencies
+        run: uv sync --locked --extra test
+
+      - name: Generate full manual
+        run: uv run python scripts/build_llms_full.py
+
+      - name: Check generated manual
+        run: git diff --exit-code docs/llms.txt
+
+      - name: Run tests
+        run: uv run python -m pytest tests/ -x -q -k "not integration"
+
+      - name: Prepare distribution directory
+        run: |
+          uv run --no-sync python - <<'PY'
+          import shutil
+          from pathlib import Path
+          dist = Path("dist")
+          if dist.exists():
+              shutil.rmtree(dist)
+          PY
+
+      - name: Build package artifacts
+        run: uv build --no-sources
+
+      - name: Refuse paper artifacts
+        run: |
+          for archive in dist/*.tar.gz; do
+            if tar -tzf "$archive" | grep -E '^doctrail-[^/]+/(_paper|paper)(/|$)'; then
+              echo "Paper files are present in $archive" >&2
+              exit 1
+            fi
+          done
+          for wheel in dist/*.whl; do
+            if unzip -Z1 "$wheel" | grep -E '^(_paper|paper)(/|$)'; then
+              echo "Paper files are present in $wheel" >&2
+              exit 1
+            fi
+          done
+
+      - name: Check package metadata
+        run: uvx --from twine twine check dist/*
+
+      - name: Smoke test wheel
+        run: |
+          for wheel in dist/*.whl; do
+            uvx --from "$wheel" doctrail --version
+          done
+
+      - name: Smoke test source distribution
+        run: |
+          for archive in dist/*.tar.gz; do
+            uvx --from "$archive" doctrail --version
+          done
+
+      - name: Upload package artifacts
+        uses: actions/upload-artifact@v4
+        with:
+          name: doctrail-dist
+          path: dist/*
+
+      - name: Publish to PyPI
+        run: uv publish --trusted-publishing always
@@ -0,0 +1,36 @@
+name: tests
+
+on:
+  push:
+    branches:
+      - master
+  pull_request:
+
+env:
+  UV_PYTHON: "3.12"
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    permissions:
+      contents: read
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v5
+        with:
+          enable-cache: true
+
+      - name: Install Python
+        run: uv python install 3.12
+
+      - name: Check lockfile
+        run: uv lock --check
+
+      - name: Install dependencies
+        run: uv sync --locked --extra test
+
+      - name: Run tests
+        run: uv run python -m pytest tests/ -x -q -k "not integration"
@@ -0,0 +1,279 @@
|
|
|
1
|
+
# Byte-compiled / optimized / DLL files
|
|
2
|
+
__pycache__/
|
|
3
|
+
*.py[cod]
|
|
4
|
+
*$py.class
|
|
5
|
+
|
|
6
|
+
# C extensions
|
|
7
|
+
*.so
|
|
8
|
+
|
|
9
|
+
# Distribution / packaging
|
|
10
|
+
.Python
|
|
11
|
+
build/
|
|
12
|
+
develop-eggs/
|
|
13
|
+
dist/
|
|
14
|
+
downloads/
|
|
15
|
+
eggs/
|
|
16
|
+
.eggs/
|
|
17
|
+
lib/
|
|
18
|
+
lib64/
|
|
19
|
+
parts/
|
|
20
|
+
sdist/
|
|
21
|
+
var/
|
|
22
|
+
wheels/
|
|
23
|
+
share/python-wheels/
|
|
24
|
+
*.egg-info/
|
|
25
|
+
.installed.cfg
|
|
26
|
+
*.egg
|
|
27
|
+
MANIFEST
|
|
28
|
+
|
|
29
|
+
# PyInstaller
|
|
30
|
+
# Usually these files are written by a python script from a template
|
|
31
|
+
# before PyInstaller builds the exe, so as to inject date/other infos into it.
|
|
32
|
+
*.manifest
|
|
33
|
+
*.spec
|
|
34
|
+
|
|
35
|
+
# Installer logs
|
|
36
|
+
pip-log.txt
|
|
37
|
+
pip-delete-this-directory.txt
|
|
38
|
+
|
|
39
|
+
# Unit test / coverage reports
|
|
40
|
+
htmlcov/
|
|
41
|
+
.tox/
|
|
42
|
+
.nox/
|
|
43
|
+
.coverage
|
|
44
|
+
.coverage.*
|
|
45
|
+
.cache
|
|
46
|
+
nosetests.xml
|
|
47
|
+
coverage.xml
|
|
48
|
+
*.cover
|
|
49
|
+
*.py,cover
|
|
50
|
+
.hypothesis/
|
|
51
|
+
.pytest_cache/
|
|
52
|
+
cover/
|
|
53
|
+
|
|
54
|
+
# Translations
|
|
55
|
+
*.mo
|
|
56
|
+
*.pot
|
|
57
|
+
|
|
58
|
+
# Django stuff:
|
|
59
|
+
*.log
|
|
60
|
+
local_settings.py
|
|
61
|
+
db.sqlite3
|
|
62
|
+
db.sqlite3-journal
|
|
63
|
+
|
|
64
|
+
# Flask stuff:
|
|
65
|
+
instance/
|
|
66
|
+
.webassets-cache
|
|
67
|
+
|
|
68
|
+
# Scrapy stuff:
|
|
69
|
+
.scrapy
|
|
70
|
+
|
|
71
|
+
# Sphinx documentation
|
|
72
|
+
docs/_build/
|
|
73
|
+
|
|
74
|
+
# PyBuilder
|
|
75
|
+
.pybuilder/
|
|
76
|
+
target/
|
|
77
|
+
|
|
78
|
+
# Jupyter Notebook
|
|
79
|
+
.ipynb_checkpoints
|
|
80
|
+
|
|
81
|
+
# IPython
|
|
82
|
+
profile_default/
|
|
83
|
+
ipython_config.py
|
|
84
|
+
|
|
85
|
+
# pyenv
|
|
86
|
+
# For a library or package, you might want to ignore these files since the code is
|
|
87
|
+
# intended to run in multiple environments; otherwise, check them in:
|
|
88
|
+
# .python-version
|
|
89
|
+
|
|
90
|
+
# pipenv
|
|
91
|
+
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
|
|
92
|
+
# However, in case of collaboration, if having platform-specific dependencies or dependencies
|
|
93
|
+
# having no cross-platform support, pipenv may install dependencies that don't work, or not
|
|
94
|
+
# install all needed dependencies.
|
|
95
|
+
#Pipfile.lock
|
|
96
|
+
|
|
97
|
+
# UV
|
|
98
|
+
# Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
|
|
99
|
+
# This is especially recommended for binary packages to ensure reproducibility, and is more
|
|
100
|
+
# commonly ignored for libraries.
|
|
101
|
+
#uv.lock
|
|
102
|
+
|
|
103
|
+
# poetry
|
|
104
|
+
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
|
|
105
|
+
# This is especially recommended for binary packages to ensure reproducibility, and is more
|
|
106
|
+
# commonly ignored for libraries.
|
|
107
|
+
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
|
|
108
|
+
#poetry.lock
|
|
109
|
+
|
|
110
|
+
# pdm
|
|
111
|
+
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
|
|
112
|
+
#pdm.lock
|
|
113
|
+
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
|
|
114
|
+
# in version control.
|
|
115
|
+
# https://pdm.fming.dev/latest/usage/project/#working-with-version-control
|
|
116
|
+
.pdm.toml
|
|
117
|
+
.pdm-python
|
|
118
|
+
.pdm-build/
|
|
119
|
+
|
|
120
|
+
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
|
|
121
|
+
__pypackages__/
|
|
122
|
+
|
|
123
|
+
# Celery stuff
|
|
124
|
+
celerybeat-schedule
|
|
125
|
+
celerybeat.pid
|
|
126
|
+
|
|
127
|
+
# SageMath parsed files
|
|
128
|
+
*.sage.py
|
|
129
|
+
|
|
130
|
+
# Environments
|
|
131
|
+
.env
|
|
132
|
+
.env.local
|
|
133
|
+
.env.*.local
|
|
134
|
+
.venv
|
|
135
|
+
.venv*/
|
|
136
|
+
env/
|
|
137
|
+
venv/
|
|
138
|
+
ENV/
|
|
139
|
+
env.bak/
|
|
140
|
+
venv.bak/
|
|
141
|
+
|
|
142
|
+
# Spyder project settings
|
|
143
|
+
.spyderproject
|
|
144
|
+
.spyproject
|
|
145
|
+
|
|
146
|
+
# Rope project settings
|
|
147
|
+
.ropeproject
|
|
148
|
+
|
|
149
|
+
# mkdocs documentation
|
|
150
|
+
/site
|
|
151
|
+
|
|
152
|
+
# mypy
|
|
153
|
+
.mypy_cache/
|
|
154
|
+
.dmypy.json
|
|
155
|
+
dmypy.json
|
|
156
|
+
|
|
157
|
+
# Pyre type checker
|
|
158
|
+
.pyre/
|
|
159
|
+
|
|
160
|
+
# pytype static type analyzer
|
|
161
|
+
.pytype/
|
|
162
|
+
|
|
163
|
+
# Cython debug symbols
|
|
164
|
+
cython_debug/
|
|
165
|
+
|
|
166
|
+
# PyCharm
|
|
167
|
+
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
|
|
168
|
+
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
|
|
169
|
+
# and can be added to the global gitignore or merged into this file. For a more nuclear
|
|
170
|
+
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
|
|
171
|
+
#.idea/
|
|
172
|
+
|
|
173
|
+
# Visual Studio Code
|
|
174
|
+
# Visual Studio Code specific template is maintained in a separate VisualStudioCode.gitignore
|
|
175
|
+
# that can be found at https://github.com/github/gitignore/blob/main/Global/VisualStudioCode.gitignore
|
|
176
|
+
# and can be added to the global gitignore or merged into this file. However, if you prefer,
|
|
177
|
+
# you could uncomment the following to ignore the entire vscode folder
|
|
178
|
+
# .vscode/
|
|
179
|
+
|
|
180
|
+
# Ruff stuff:
|
|
181
|
+
.ruff_cache/
|
|
182
|
+
|
|
183
|
+
# PyPI configuration file
|
|
184
|
+
.pypirc
|
|
185
|
+
|
|
186
|
+
# Private server configuration (contains paths, funnel slugs, etc.)
|
|
187
|
+
.config.yaml
|
|
188
|
+
.serve-funnel.sh
|
|
189
|
+
|
|
190
|
+
# Cursor
|
|
191
|
+
# Cursor is an AI-powered code editor. `.cursorignore` specifies files/directories to
|
|
192
|
+
# exclude from AI features like autocomplete and code analysis. Recommended for sensitive data
|
|
193
|
+
# refer to https://docs.cursor.com/context/ignore-files
|
|
194
|
+
.cursorignore
|
|
195
|
+
.cursorindexingignore
|
|
196
|
+
|
|
197
|
+
# ==============================================================================
|
|
198
|
+
# Doctrail-specific patterns
|
|
199
|
+
# ==============================================================================
|
|
200
|
+
|
|
201
|
+
# Archived/cut code held for possible restore (also recoverable from git history)
|
|
202
|
+
/_cut/
|
|
203
|
+
|
|
204
|
+
# Development artifacts
|
|
205
|
+
.unison/
|
|
206
|
+
.history/
|
|
207
|
+
.deleted/
|
|
208
|
+
.unison*
|
|
209
|
+
*conflict_on_*
|
|
210
|
+
/agents/
|
|
211
|
+
/_paper/
|
|
212
|
+
/data/
|
|
213
|
+
/notes/
|
|
214
|
+
/out/
|
|
215
|
+
/scratch/
|
|
216
|
+
/tmp/
|
|
217
|
+
|
|
218
|
+
# IDE configurations
|
|
219
|
+
.idea/
|
|
220
|
+
.vscode/
|
|
221
|
+
|
|
222
|
+
# Database files (NEVER commit production databases!)
|
|
223
|
+
*.db
|
|
224
|
+
*.db-shm
|
|
225
|
+
*.db-wal
|
|
226
|
+
*.sqlite
|
|
227
|
+
|
|
228
|
+
# Test database copies - NEVER COMMIT THESE
|
|
229
|
+
tests/test_copy_*.db
|
|
230
|
+
tests/temp_*.db
|
|
231
|
+
tests/*_test.db
|
|
232
|
+
|
|
233
|
+
# Logs and temporary files
|
|
234
|
+
*.log
|
|
235
|
+
*.csv
|
|
236
|
+
|
|
237
|
+
# Export directories
|
|
238
|
+
exports/
|
|
239
|
+
**/exports/*
|
|
240
|
+
|
|
241
|
+
# System files
|
|
242
|
+
.DS_Store
|
|
243
|
+
Thumbs.db
|
|
244
|
+
desktop.ini
|
|
245
|
+
|
|
246
|
+
# Local notes and documentation
|
|
247
|
+
.llm_notes/
|
|
248
|
+
CLAUDE.md
|
|
249
|
+
|
|
250
|
+
# Development and personal directories
|
|
251
|
+
claude_notes/
|
|
252
|
+
presentations/
|
|
253
|
+
legacy/
|
|
254
|
+
legacy.zip
|
|
255
|
+
tests/legacy.zip
|
|
256
|
+
tmp/
|
|
257
|
+
.claude/
|
|
258
|
+
.agents/
|
|
259
|
+
.deleted/
|
|
260
|
+
.history/
|
|
261
|
+
.ra-aid/
|
|
262
|
+
.unison/
|
|
263
|
+
.scratch/
|
|
264
|
+
.uv-cache/
|
|
265
|
+
|
|
266
|
+
# Allow specific important database files
|
|
267
|
+
!tests/test.db/tmp/
|
|
268
|
+
scratch/
|
|
269
|
+
agents/
|
|
270
|
+
agents.md
|
|
271
|
+
AGENTS.md
|
|
272
|
+
agent.md
|
|
273
|
+
AGENT.md
|
|
274
|
+
notes/
|
|
275
|
+
data/
|
|
276
|
+
out/
|
|
277
|
+
.beads/
|
|
278
|
+
first_prompt.md
|
|
279
|
+
/.doctrail/
|
doctrail-0.3.1/CHANGELOG.md
ADDED
|
@@ -0,0 +1,134 @@
|
|
|
1
|
+
# Changelog
|
|
2
|
+
|
|
3
|
+
## 0.3.1 - revamp release readiness
|
|
4
|
+
|
|
5
|
+
### Documentation and tutorial fixtures
|
|
6
|
+
|
|
7
|
+
- Added a terminal demo GIF (ingest, codebook, enrich, query) to the README and docs landing page, rendered reproducibly from `scripts/demo.tape` via `scripts/build_demo_gif.sh`.
|
|
8
|
+
- Fixed Office and ebook ingestion so tuple-returning extractors populate document text instead of silently recording empty content, and hardened the tutorial corpus against short extracted files.
|
|
9
|
+
- Added committed public-domain extraction fixtures across supported ingest file families and documented the supported local file types in ingest help and quickstart docs.
|
|
10
|
+
- Added generated Click-backed CLI documentation, packaged `doctrail docs` manual output, and CI drift checks for CLI, YAML snippets, and `llms-full.txt`.
|
|
11
|
+
- Added packaged `doctrail skill` output and `doctrail skill --install` for installing the Doctrail operating doctrine into agent skill directories.
|
|
12
|
+
- Added codebook-quality prompt guidance to presets, generated enrichment scaffolds, and YAML docs, with cache-prefix caveats for provider prompt caching.
|
|
13
|
+
- Fixed `doctrail new -p` flag mode so scaffold generation stays non-interactive and reports a clear terminal-required error for wizard-only paths.
|
|
14
|
+
- Added a fully offline `doctrail init test` tutorial scaffold with replay fixtures, Federalist Papers examples, UN speech excerpts, and the `doctrail run` alias.
|
|
15
|
+
- Added tutorial ICR replay examples that bracket agreement quality: a crisp `mentions_climate` boolean codebook and a deliberately under-specified `optimism` score.
|
|
16
|
+
- Replaced the tutorial's second enrichment with `securitization`, including replay fixtures that demonstrate gate-dependent null fields.
|
|
17
|
+
- Regenerated the UN speech tutorial corpus as deterministic PDF, DOCX, and HTML containers and re-keyed the UN replay fixtures to the new file hashes.
|
|
18
|
+
- Renamed the repository tutorial fixture directory from `examples/tutorial/data/` to `examples/tutorial/corpus/` so ignored `data/` directories stay local-only.
|
|
19
|
+
- Added source context to model-by-model pivot views so ICR disagreements can be diagnosed directly from the generated review surface.
|
|
20
|
+
|
|
21
|
+
### Storage and provenance
|
|
22
|
+
|
|
23
|
+
- Renamed the package layout to `src/doctrail` and kept the legacy import surface working through compatibility exports.
|
|
24
|
+
- Added normalized enrichment identity: one current row per key, enrichment name, field, model, and prompt hash, enforced by a unique index and upsert semantics.
|
|
25
|
+
- Preserved superseded enrichment rows in side tables during identity migration instead of silently losing recoverable values.
|
|
26
|
+
- Renamed Doctrail-managed physical tables with a leading underscore and managed views with a `v_` prefix so source tables and review surfaces are easier to distinguish.
|
|
27
|
+
- Added prompt, query, run, and project provenance across `_prompts`, `_enrichment_audit`, `_enrichment_runs`, and `_enrichment_run_items`.
|
|
28
|
+
- Added ordered SQLite schema migrations stamped with `PRAGMA user_version`, with the existing idempotent schema guards folded into baseline migration 1.
|
|
29
|
+
- Recorded parsed null answers as completed normalized rows so append mode does not resubmit already answered rows.
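
The identity, migration, and null-answer bullets above can be sketched together with the standard-library `sqlite3` module. This is a minimal illustration, not Doctrail's implementation: the column layout, the index name, and the `migrate` / `upsert` helpers are assumptions made for the example.

```python
import sqlite3

# Hypothetical ordered migrations; the schema here is illustrative only.
MIGRATIONS = [
    # Migration 1: baseline schema, written with idempotent guards.
    """
    CREATE TABLE IF NOT EXISTS _enrichments (
        key TEXT NOT NULL,
        enrichment_name TEXT NOT NULL,
        field TEXT NOT NULL,
        model TEXT NOT NULL,
        prompt_hash TEXT NOT NULL,
        value TEXT
    );
    """,
    # Migration 2: enforce one current row per identity tuple.
    """
    CREATE UNIQUE INDEX IF NOT EXISTS idx_enrichment_identity
    ON _enrichments (key, enrichment_name, field, model, prompt_hash);
    """,
]


def migrate(conn):
    """Apply migrations newer than PRAGMA user_version; return the final version."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version, sql in enumerate(MIGRATIONS, start=1):
        if version > current:
            conn.executescript(sql)
            # PRAGMA values cannot be bound as parameters; version is a trusted int.
            conn.execute(f"PRAGMA user_version = {version}")
    conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]


def upsert(conn, row):
    """Insert a row, or overwrite the value of the row with the same identity."""
    conn.execute(
        """
        INSERT INTO _enrichments
            (key, enrichment_name, field, model, prompt_hash, value)
        VALUES
            (:key, :enrichment_name, :field, :model, :prompt_hash, :value)
        ON CONFLICT (key, enrichment_name, field, model, prompt_hash)
        DO UPDATE SET value = excluded.value
        """,
        row,
    )
```

In this sketch a parsed null answer is stored as a row whose `value` is `NULL`, so a later append pass can tell "answered with null" apart from "never answered" by checking whether the row exists. The `ON CONFLICT` clause requires SQLite 3.24 or newer.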
|
|
30
|
+
|
|
31
|
+
### Execution and review surfaces
|
|
32
|
+
|
|
33
|
+
- Added query-scoped and enrichment-scoped dedupe paths so append mode can skip successful prior work without treating audit rows alone as completion.
|
|
34
|
+
- Expanded run-aware view creation: run views, final views, pivot/spec/render surfaces, and editable final tables now use the persisted run ledger.
|
|
35
|
+
- Added ICR and override workflows backed by SQLite tables so modeled output, human overrides, and finalized review surfaces remain separate.
|
|
36
|
+
- Silenced cost/pricing warnings for replay-backed tutorial models while preserving warnings for real unknown models.
|
|
37
|
+
|
|
38
|
+
### Release preparation
|
|
39
|
+
|
|
40
|
+
- Added a manual-only release workflow that builds and checks artifacts by default; publishing remains inert unless explicitly enabled with a configured PyPI token.
|
|
41
|
+
- Refreshed the documented configuration surface, including stable, deprecated-but-working, and internal keys.
|
|
42
|
+
|
|
43
|
+
## 2026-03-30 - batch backends, rerun selectors, and env precedence
|
|
44
|
+
|
|
45
|
+
### Batch execution
|
|
46
|
+
|
|
47
|
+
- `--execution-mode openai-batch` now maps direct providers to their native batch APIs while keeping one doctrail-facing workflow:
|
|
48
|
+
  - OpenAI: `/v1/batches` with request lines targeting `/v1/chat/completions`
|
|
49
|
+
  - Anthropic: `/v1/messages/batches` with request params targeting `/v1/messages`
|
|
50
|
+
  - Gemini: File API upload plus `/v1beta/models/{model}:batchGenerateContent`
|
|
51
|
+
- Batch submit, poll, watch, reconcile, and cancel now work through the same CLI path for direct OpenAI, Anthropic, and Gemini models.
|
|
52
|
+
- CLI help, README, docs, and the doctrail skill now make the provider-specific endpoint mapping explicit.
|
|
53
|
+
|
|
54
|
+
### Anthropic batch hardening
|
|
55
|
+
|
|
56
|
+
- Added direct Anthropic batch support for `claude-*` and `anthropic/*` models behind the existing batch mode.
|
|
57
|
+
- Added provider-side schema compatibility handling for Anthropic structured batch output:
|
|
58
|
+
  - bounded integer fields no longer emit unsupported `minimum` / `maximum` constraints into the submitted Anthropic batch schema
|
|
59
|
+
  - doctrail now warns about those compatibility issues before submission
|
|
60
|
+
- Fixed Anthropic batch polling so provider error objects are serialized cleanly instead of causing downstream JSON serialization failures during reconciliation.
|
|
61
|
+
- Live smoke verification completed successfully with `claude-haiku-4-5`.
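
The schema compatibility handling described above (dropping `minimum` / `maximum` from bounded integer fields and warning before submission) could look roughly like the following. This is a hedged sketch: `strip_bounds` and the warning format are hypothetical names, not Doctrail's actual code.

```python
# Constraint keys that, per the note above, must not reach the submitted schema.
UNSUPPORTED_KEYS = ("minimum", "maximum")


def strip_bounds(schema):
    """Return (cleaned_schema, warnings) without mutating the input schema."""
    warnings = []

    def walk(node, where):
        if isinstance(node, dict):
            cleaned = {}
            for name, value in node.items():
                # Only drop bounds that sit directly on an integer field.
                if name in UNSUPPORTED_KEYS and node.get("type") == "integer":
                    warnings.append(f"{where}: dropped unsupported '{name}'")
                    continue
                cleaned[name] = walk(value, f"{where}.{name}")
            return cleaned
        if isinstance(node, list):
            return [walk(item, f"{where}[{i}]") for i, item in enumerate(node)]
        return node

    return walk(schema, "$"), warnings
```

Returning the warnings alongside the cleaned schema lets a caller print them before submitting the batch, which matches the "warn before submission" behavior described in the bullet.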
|
|
62
|
+
|
|
63
|
+
### Gemini batch changes
|
|
64
|
+
|
|
65
|
+
- Added direct Gemini batch support for `gemini-*` and `models/gemini-*`.
|
|
66
|
+
- Initial Gemini support used inline requests; doctrail now defaults to Google's recommended file-backed batch input mode for Gemini jobs.
|
|
67
|
+
- Gemini batch JSONL request lines are now emitted in the file-input shape Google expects, with stable per-row `key` values for reconciliation.
|
|
68
|
+
- Gemini batch results are now downloaded and reconciled from the provider result file when available.
|
|
69
|
+
- Live verification confirmed that the file-backed path is accepted by Google and produces real `files/...` input handles plus real `batches/...` jobs.
|
|
70
|
+
- Operational caveat: Gemini Batch remains unreliable in practice. Live testing saw long-lived `BATCH_STATE_PENDING` jobs and later `503 UNAVAILABLE` responses from Google's GET batch endpoint after roughly 24 hours. This appears to be a provider-side reliability issue rather than a doctrail endpoint or model-id mismatch.
|
|
71
|
+
|
|
72
|
+
### Targeted reruns
|
|
73
|
+
|
|
74
|
+
- Added `doctrail enrich --where "..."` to filter an enrichment's existing base query with an outer SQL `WHERE` predicate.
|
|
75
|
+
- `--query` remains available as the full-query replacement escape hatch.
|
|
76
|
+
- This makes targeted reruns like date filters, `LIKE`, and explicit key lists possible without cloning the YAML prompt/schema definition.
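
One way to picture the outer `WHERE` filter is as a subquery wrapper around the enrichment's base query. The sketch below is an assumption about the general technique, not Doctrail's implementation; `apply_where` and the `documents` table in the usage example are hypothetical.

```python
import sqlite3


def apply_where(base_query, predicate):
    """Wrap a base query so a predicate filters its result set.

    The predicate is interpolated as raw SQL, so it must come from a
    trusted source such as the operator's own command line.
    """
    base = base_query.strip().rstrip(";")
    return f"SELECT * FROM ({base}) AS base WHERE {predicate}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER, title TEXT, published TEXT)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?)",
    [(1, "A", "2025-12-01"), (2, "B", "2026-02-01")],
)
filtered = apply_where(
    "SELECT id, title, published FROM documents;",
    "published >= '2026-01-01'",
)
rows = conn.execute(filtered).fetchall()
```

Because the base query is left untouched inside the subquery, the YAML prompt and schema definition do not need to be cloned for a date filter, a `LIKE` match, or an explicit key list.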
|
|
77
|
+
|
|
78
|
+
### Environment precedence
|
|
79
|
+
|
|
80
|
+
- Doctrail now prefers the nearest project-local `.env` over inherited shell or global environment variables.
|
|
81
|
+
- This applies to provider resolution and cost/model utilities, so a project can reliably use its own keys without depending on the caller's ambient shell state.
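
The precedence rule can be illustrated with a small standard-library sketch that walks up from a starting directory, finds the nearest `.env`, and lets its values override the ambient environment. The helper names and the simplified `KEY=value` parsing are assumptions for illustration; Doctrail's real loader may differ.

```python
import os
from pathlib import Path


def find_nearest_env(start):
    """Return the closest .env at or above `start`, or None if there is none."""
    start = Path(start).resolve()
    for directory in (start, *start.parents):
        candidate = directory / ".env"
        if candidate.is_file():
            return candidate
    return None


def parse_env(path):
    """Parse simple KEY=value lines, skipping blanks and comments."""
    values = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        values[name.strip()] = value.strip().strip("'\"")
    return values


def resolve_env(start, ambient=None):
    """Overlay the nearest project-local .env on top of the ambient environment."""
    merged = dict(os.environ if ambient is None else ambient)
    env_file = find_nearest_env(start)
    if env_file is not None:
        merged.update(parse_env(env_file))  # project-local values win
    return merged
```

With this ordering, a key exported in the caller's shell is replaced by the project's own value whenever the project defines one, while variables the project does not mention pass through unchanged.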
|
|
82
|
+
|
|
83
|
+
## 2026-01-15 - UX overhaul for social scientists
|
|
84
|
+
|
|
85
|
+
### New commands
|
|
86
|
+
|
|
87
|
+
- **`doctrail new`** - Create custom enrichments interactively
|
|
88
|
+
  - Interactive mode: `doctrail new`
|
|
89
|
+
  - Quick mode: `doctrail new topic -p "Classify topic" -o topic --enum "a,b,c"`
|
|
90
|
+
  - Supports: string, integer, boolean, array, enum types
|
|
91
|
+
|
|
92
|
+
- **`doctrail view`** - Manage database views
|
|
93
|
+
  - `doctrail view` - list views in database
|
|
94
|
+
  - `doctrail view refresh` - execute all `.doctrail/views/*.sql` files
|
|
95
|
+
  - `doctrail view new <name>` - create custom view SQL template
|
|
96
|
+
|
|
97
|
+
- **`doctrail query`** - Query database without needing sqlite-utils
|
|
98
|
+
  - `doctrail query` - list documents
|
|
99
|
+
  - `doctrail query 1` - show document #1 details
|
|
100
|
+
  - `doctrail query "SELECT ..."` - run arbitrary SQL
|
|
101
|
+
|
|
102
|
+
### Auto-generated views
|
|
103
|
+
|
|
104
|
+
After enrichment completes, a queryable view is automatically created:
|
|
105
|
+
```
|
|
106
|
+
📊 View updated: enrichments_doctrail_demo
|
|
107
|
+
Query with: doctrail query "SELECT * FROM enrichments_doctrail_demo LIMIT 10"
|
|
108
|
+
```
|
|
109
|
+
|
|
110
|
+
This pivots the long-format `_enrichments` table into wide format for easy querying.
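
A long-to-wide pivot of this kind is commonly written as one `MAX(CASE ...)` column per field. The sketch below assumes a simplified `_enrichments` layout with `key`, `field`, and `value` columns and a hypothetical `create_wide_view` helper; the real table has more columns and the generated view may be built differently.

```python
import sqlite3


def create_wide_view(conn, view_name, fields):
    """Create a view with one row per key and one column per field.

    View and field names are interpolated into SQL, so they are assumed
    to be trusted identifiers, not user-supplied text.
    """
    cases = ", ".join(
        f"MAX(CASE WHEN field = '{name}' THEN value END) AS {name}"
        for name in fields
    )
    conn.execute(f"DROP VIEW IF EXISTS {view_name}")
    conn.execute(
        f"CREATE VIEW {view_name} AS "
        f"SELECT key, {cases} FROM _enrichments GROUP BY key"
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE _enrichments (key TEXT, field TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO _enrichments VALUES (?, ?, ?)",
    [("doc1", "summary", "s1"), ("doc1", "language", "en"), ("doc2", "summary", "s2")],
)
create_wide_view(conn, "enrichments_doctrail_demo", ["summary", "language"])
wide = conn.execute(
    "SELECT key, summary, language FROM enrichments_doctrail_demo ORDER BY key"
).fetchall()
```

Fields that have no row for a given key come back as `NULL` in the wide view, which keeps every document visible even when only some enrichments have run.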
|
|
111
|
+
|
|
112
|
+
### Preset enrichments with aliases
|
|
113
|
+
|
|
114
|
+
- Built-in presets: `summarize`, `language`, `sentiment`, `document_type`, `relevance`, `keywords`, `extract_entities`, `research_methods`
|
|
115
|
+
- British/Australian aliases: `summarise` → `summarize`, `lang` → `language`
|
|
116
|
+
- Presets auto-copy to project folder when first used (so users can edit them)
|
|
117
|
+
|
|
118
|
+
### Schema fixes
|
|
119
|
+
|
|
120
|
+
- Fixed bare type schemas like `{type: string}` being misinterpreted
|
|
121
|
+
- Now correctly wraps with `output_column`: `{summary: {type: string}}`
|
|
122
|
+
- Also handles `{enum: [...]}` and `{enum_list: [...]}` bare schemas
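
The wrapping rule described above can be sketched as a small normalizer. The `normalize_schema` and `is_bare_schema` names are hypothetical, and the detection heuristic (a string `type`, or a list under `enum` / `enum_list`) is an assumption about how a bare schema might be recognized.

```python
def is_bare_schema(schema):
    """Heuristic: a bare schema describes a single value, not named fields."""
    if not isinstance(schema, dict):
        return False
    return (
        isinstance(schema.get("type"), str)
        or isinstance(schema.get("enum"), list)
        or isinstance(schema.get("enum_list"), list)
    )


def normalize_schema(schema, output_column):
    """Wrap a bare schema under output_column; leave named-field schemas alone."""
    if is_bare_schema(schema):
        return {output_column: schema}
    return schema
```

For example, `normalize_schema({"type": "string"}, "summary")` yields `{"summary": {"type": "string"}}`, while a schema that already names its fields passes through unchanged.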
|
|
123
|
+
|
|
124
|
+
### Project tagging
|
|
125
|
+
|
|
126
|
+
- Enrichments are now tagged with `project_name` from config by default
|
|
127
|
+
- Enables project-based filtering and automatic view creation
|
|
128
|
+
|
|
129
|
+
### Python-first extraction (from earlier session)
|
|
130
|
+
|
|
131
|
+
- PDF: pymupdf as primary (260x faster than OCR-first approach)
|
|
132
|
+
- EPUB: ebooklib as primary
|
|
133
|
+
- DOCX: python-docx
|
|
134
|
+
- System tools (pdftotext, mutool, etc.) are now fallbacks
|
doctrail-0.3.1/CONTRIBUTING.md
ADDED
|
@@ -0,0 +1,7 @@
|
|
|
1
|
+
# Contributing
|
|
2
|
+
|
|
3
|
+
Doctrail has both generated and hand-written documentation. `docs/cli.md` is generated at render time from the live Click command tree through `mkdocs-click`. `docs/llms-full.txt` is generated by `scripts/build_llms_full.py`; for the CLI page it renders the same Click tree as plain text so agents see real command help, not a Markdown directive.
|
|
4
|
+
|
|
5
|
+
The CI drift guards check three things: YAML snippets in `docs/yaml.md` and `docs/quickstart.md` parse and validate unless marked as fragments, documented `doctrail ...` commands resolve against the Click tree, and the committed `docs/llms-full.txt` matches a fresh generator run.
|
|
6
|
+
|
|
7
|
+
If you change the CLI or YAML surface, CI fails until docs match. Regenerate with `scripts/build_llms_full.py` and fix the prose in the same commit.
|
doctrail-0.3.1/LICENSE
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2026 Matthew P. Robertson
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
|
13
|
+
copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
21
|
+
SOFTWARE.
|