reprompt-cli 1.0.0__tar.gz → 1.3.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (227)
  1. reprompt_cli-1.3.0/.testmondata +0 -0
  2. reprompt_cli-1.3.0/.testmondata-shm +0 -0
  3. reprompt_cli-1.3.0/.testmondata-wal +0 -0
  4. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/CLAUDE.md +36 -5
  5. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/PKG-INFO +2 -2
  6. reprompt_cli-1.3.0/docs/superpowers/specs/2026-03-22-prompt-compress-design.md +512 -0
  7. reprompt_cli-1.3.0/docs/superpowers/specs/2026-03-23-distill-design.md +375 -0
  8. reprompt_cli-1.3.0/module.yaml +12 -0
  9. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/pyproject.toml +2 -2
  10. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/__init__.py +1 -1
  11. reprompt_cli-1.3.0/src/reprompt/adapters/base.py +43 -0
  12. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/adapters/chatgpt.py +63 -0
  13. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/adapters/claude_code.py +103 -0
  14. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/cli.py +280 -0
  15. reprompt_cli-1.3.0/src/reprompt/core/compress.py +634 -0
  16. reprompt_cli-1.3.0/src/reprompt/core/conversation.py +60 -0
  17. reprompt_cli-1.3.0/src/reprompt/core/distill.py +317 -0
  18. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/core/extractors.py +14 -2
  19. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/core/insights.py +29 -0
  20. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/core/pipeline.py +17 -0
  21. reprompt_cli-1.3.0/src/reprompt/core/privacy.py +146 -0
  22. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/core/prompt_dna.py +1 -0
  23. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/core/scorer.py +4 -1
  24. reprompt_cli-1.3.0/src/reprompt/output/compress_terminal.py +42 -0
  25. reprompt_cli-1.3.0/src/reprompt/output/distill_terminal.py +94 -0
  26. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/output/html_report.py +25 -1
  27. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/output/terminal.py +90 -0
  28. reprompt_cli-1.3.0/tests/test_compress.py +491 -0
  29. reprompt_cli-1.3.0/tests/test_compress_cli.py +133 -0
  30. reprompt_cli-1.3.0/tests/test_compress_dna.py +53 -0
  31. reprompt_cli-1.3.0/tests/test_compress_html.py +46 -0
  32. reprompt_cli-1.3.0/tests/test_compress_insights.py +89 -0
  33. reprompt_cli-1.3.0/tests/test_conversation.py +120 -0
  34. reprompt_cli-1.3.0/tests/test_distill.py +368 -0
  35. reprompt_cli-1.3.0/tests/test_distill_cli.py +144 -0
  36. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_insights.py +89 -0
  37. reprompt_cli-1.3.0/tests/test_parse_conversation_base.py +59 -0
  38. reprompt_cli-1.3.0/tests/test_parse_conversation_chatgpt.py +150 -0
  39. reprompt_cli-1.3.0/tests/test_parse_conversation_claude.py +160 -0
  40. reprompt_cli-1.3.0/tests/test_privacy.py +241 -0
  41. reprompt_cli-1.3.0/tests/test_privacy_cli.py +47 -0
  42. reprompt_cli-1.3.0/tests/test_privacy_e2e.py +65 -0
  43. reprompt_cli-1.3.0/tests/test_privacy_output.py +185 -0
  44. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_scorer.py +30 -0
  45. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/uv.lock +1 -1
  46. reprompt_cli-1.0.0/src/reprompt/adapters/base.py +0 -25
  47. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/.editorconfig +0 -0
  48. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/.github/ISSUE_TEMPLATE/bug_report.md +0 -0
  49. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/.github/ISSUE_TEMPLATE/bug_report.yml +0 -0
  50. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/.github/ISSUE_TEMPLATE/feature_request.md +0 -0
  51. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/.github/ISSUE_TEMPLATE/feature_request.yml +0 -0
  52. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/.github/PULL_REQUEST_TEMPLATE.md +0 -0
  53. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/.github/dependabot.yml +0 -0
  54. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/.github/workflows/ci.yml +0 -0
  55. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/.github/workflows/publish.yml +0 -0
  56. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/.gitignore +0 -0
  57. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/.pre-commit-config.yaml +0 -0
  58. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/CHANGELOG.md +0 -0
  59. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/CODE_OF_CONDUCT.md +0 -0
  60. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/CONTRIBUTING.md +0 -0
  61. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/LICENSE +0 -0
  62. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/README.md +0 -0
  63. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/SECURITY.md +0 -0
  64. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/action.yml +0 -0
  65. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/docs/launch-post.md +0 -0
  66. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/docs/roadmap.md +0 -0
  67. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/docs/superpowers/specs/2026-03-11-html-dashboard-design.md +0 -0
  68. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/docs/superpowers/specs/2026-03-11-merge-view-design.md +0 -0
  69. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/docs/superpowers/specs/2026-03-11-prompt-templates-design.md +0 -0
  70. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/scripts/generate_demo_data.py +0 -0
  71. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/scripts/launch/hn_monitor.py +0 -0
  72. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/scripts/launch/reddit_helper.py +0 -0
  73. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/adapters/__init__.py +0 -0
  74. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/adapters/aider.py +0 -0
  75. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/adapters/claude_chat.py +0 -0
  76. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/adapters/cline.py +0 -0
  77. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/adapters/cursor.py +0 -0
  78. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/adapters/filters.py +0 -0
  79. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/adapters/gemini.py +0 -0
  80. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/adapters/openclaw.py +0 -0
  81. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/bridge/__init__.py +0 -0
  82. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/bridge/handler.py +0 -0
  83. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/bridge/host.py +0 -0
  84. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/bridge/manifest.py +0 -0
  85. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/bridge/protocol.py +0 -0
  86. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/commands/__init__.py +0 -0
  87. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/commands/telemetry.py +0 -0
  88. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/commands/wrapped.py +0 -0
  89. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/config.py +0 -0
  90. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/core/__init__.py +0 -0
  91. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/core/analyzer.py +0 -0
  92. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/core/dedup.py +0 -0
  93. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/core/digest.py +0 -0
  94. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/core/effectiveness.py +0 -0
  95. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/core/extractors_zh.py +0 -0
  96. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/core/lang_detect.py +0 -0
  97. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/core/library.py +0 -0
  98. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/core/lint.py +0 -0
  99. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/core/merge_view.py +0 -0
  100. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/core/models.py +0 -0
  101. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/core/persona.py +0 -0
  102. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/core/recommend.py +0 -0
  103. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/core/segmenter.py +0 -0
  104. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/core/session_meta.py +0 -0
  105. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/core/style.py +0 -0
  106. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/core/templates.py +0 -0
  107. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/core/timeutil.py +0 -0
  108. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/core/trends.py +0 -0
  109. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/core/wrapped.py +0 -0
  110. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/demo.py +0 -0
  111. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/embeddings/__init__.py +0 -0
  112. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/embeddings/base.py +0 -0
  113. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/embeddings/local_embed.py +0 -0
  114. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/embeddings/ollama.py +0 -0
  115. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/embeddings/openai_embed.py +0 -0
  116. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/embeddings/tfidf.py +0 -0
  117. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/mcp.py +0 -0
  118. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/mcp_main.py +0 -0
  119. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/output/__init__.py +0 -0
  120. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/output/chartjs.min.js +0 -0
  121. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/output/json_out.py +0 -0
  122. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/output/markdown.py +0 -0
  123. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/output/wrapped_html.py +0 -0
  124. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/output/wrapped_terminal.py +0 -0
  125. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/py.typed +0 -0
  126. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/sharing/__init__.py +0 -0
  127. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/sharing/client.py +0 -0
  128. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/sharing/clipboard.py +0 -0
  129. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/storage/__init__.py +0 -0
  130. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/storage/db.py +0 -0
  131. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/telemetry/__init__.py +0 -0
  132. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/telemetry/collector.py +0 -0
  133. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/telemetry/consent.py +0 -0
  134. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/telemetry/events.py +0 -0
  135. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/telemetry/prompt.py +0 -0
  136. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/telemetry/queue.py +0 -0
  137. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/src/reprompt/telemetry/sender.py +0 -0
  138. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/__init__.py +0 -0
  139. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/conftest.py +0 -0
  140. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/fixtures/aider_chat_history.md +0 -0
  141. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/fixtures/chatgpt_conversations.json +0 -0
  142. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/fixtures/claude_chat_export.json +0 -0
  143. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/fixtures/claude_session.jsonl +0 -0
  144. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/fixtures/cline_task/api_conversation_history.json +0 -0
  145. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/fixtures/gemini_session.json +0 -0
  146. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/fixtures/openclaw_session.jsonl +0 -0
  147. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_adapter_aider.py +0 -0
  148. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_adapter_chatgpt.py +0 -0
  149. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_adapter_claude.py +0 -0
  150. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_adapter_claude_chat.py +0 -0
  151. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_adapter_cline.py +0 -0
  152. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_adapter_gemini.py +0 -0
  153. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_adapter_openclaw.py +0 -0
  154. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_analyzer.py +0 -0
  155. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_bridge_cli.py +0 -0
  156. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_bridge_handler.py +0 -0
  157. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_bridge_integration.py +0 -0
  158. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_bridge_manifest.py +0 -0
  159. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_bridge_protocol.py +0 -0
  160. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_cli.py +0 -0
  161. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_cli_library_effectiveness.py +0 -0
  162. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_clipboard.py +0 -0
  163. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_config.py +0 -0
  164. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_coverage_boost.py +0 -0
  165. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_cursor_adapter.py +0 -0
  166. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_db.py +0 -0
  167. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_db_digest.py +0 -0
  168. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_db_effectiveness.py +0 -0
  169. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_db_trends.py +0 -0
  170. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_dedup.py +0 -0
  171. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_demo.py +0 -0
  172. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_digest.py +0 -0
  173. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_digest_cli.py +0 -0
  174. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_e2e.py +0 -0
  175. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_effectiveness.py +0 -0
  176. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_embeddings_local.py +0 -0
  177. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_embeddings_ollama.py +0 -0
  178. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_embeddings_openai.py +0 -0
  179. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_empty_state.py +0 -0
  180. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_extractors.py +0 -0
  181. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_extractors_routing.py +0 -0
  182. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_extractors_zh.py +0 -0
  183. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_extractors_zh_e2e.py +0 -0
  184. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_html_report.py +0 -0
  185. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_import_cli.py +0 -0
  186. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_import_e2e.py +0 -0
  187. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_insights_cli.py +0 -0
  188. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_install_hook.py +0 -0
  189. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_lang_detect.py +0 -0
  190. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_library.py +0 -0
  191. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_lint.py +0 -0
  192. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_lint_cli.py +0 -0
  193. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_markdown.py +0 -0
  194. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_mcp.py +0 -0
  195. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_merge_view.py +0 -0
  196. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_models.py +0 -0
  197. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_output.py +0 -0
  198. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_persona.py +0 -0
  199. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_pipeline.py +0 -0
  200. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_prompt_dna.py +0 -0
  201. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_public_api.py +0 -0
  202. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_recommend.py +0 -0
  203. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_schema_version.py +0 -0
  204. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_score_cli.py +0 -0
  205. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_segmenter.py +0 -0
  206. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_share_e2e.py +0 -0
  207. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_sharing_client.py +0 -0
  208. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_style.py +0 -0
  209. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_telemetry_cli.py +0 -0
  210. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_telemetry_collector.py +0 -0
  211. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_telemetry_consent.py +0 -0
  212. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_telemetry_e2e.py +0 -0
  213. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_telemetry_events.py +0 -0
  214. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_telemetry_prompt.py +0 -0
  215. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_telemetry_queue.py +0 -0
  216. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_telemetry_sender.py +0 -0
  217. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_templates.py +0 -0
  218. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_timeutil.py +0 -0
  219. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_trends.py +0 -0
  220. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_trends_cli.py +0 -0
  221. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_use_cli.py +0 -0
  222. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_wrapped.py +0 -0
  223. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_wrapped_cli.py +0 -0
  224. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_wrapped_e2e.py +0 -0
  225. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_wrapped_html.py +0 -0
  226. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_wrapped_output.py +0 -0
  227. {reprompt_cli-1.0.0 → reprompt_cli-1.3.0}/tests/test_wrapped_share.py +0 -0
Binary file
Binary file
Binary file
@@ -18,7 +18,7 @@ uv run python -m build # build wheel
18
18
 
19
19
  ```
20
20
  src/reprompt/
21
- ├── cli.py # Typer CLI (scan, import, report, search, library, recommend, demo, status, purge, install-hook, install-extension, extension-status, score, compare, insights, digest, style, use) + plugin loading
21
+ ├── cli.py # Typer CLI (scan, import, report, search, library, recommend, demo, status, purge, install-hook, install-extension, extension-status, score, compare, insights, digest, style, use, privacy, compress, distill) + plugin loading
22
22
  ├── config.py # pydantic-settings, env vars (REPROMPT_ prefix) + TOML config
23
23
  ├── demo.py # Built-in demo data generator (no network required)
24
24
  ├── core/
@@ -38,9 +38,13 @@ src/reprompt/
38
38
  │ ├── lang_detect.py # Language detection (zh/ja/ko/en) via Unicode ranges
39
39
  │ ├── extractors_zh.py # Chinese feature extraction (jieba + Chinese regex)
40
40
  │ ├── persona.py # 6 prompt personas (Architect/Debugger/Explorer/Novelist/Sniper/Teacher)
41
- │ └── wrapped.py # WrappedReport dataclass + build_wrapped(db) aggregation
41
+ │ ├── wrapped.py # WrappedReport dataclass + build_wrapped(db) aggregation
42
+ │ ├── privacy.py # Privacy metadata registry + exposure summary per adapter
43
+ │ ├── compress.py # 4-layer prompt compression (char norm + phrase simplify + filler delete + structure cleanup)
44
+ │ ├── conversation.py # ConversationTurn, Conversation, DistillResult dataclasses
45
+ │ └── distill.py # 6-signal importance scoring + filtering + summary generation
42
46
  ├── adapters/
43
- │ ├── base.py # BaseAdapter ABC
47
+ │ ├── base.py # BaseAdapter ABC + parse_conversation() default
44
48
  │ ├── claude_code.py # Claude Code JSONL parser
45
49
  │ ├── openclaw.py # OpenClaw JSON parser (supports ~/.openclaw/ + legacy ~/.opencode/)
46
50
  │ ├── cursor.py # Cursor IDE .vscdb parser (cursorDiskKV + legacy ItemTable)
@@ -80,7 +84,9 @@ src/reprompt/
80
84
  ├── json_out.py # JSON for pipelines
81
85
  ├── markdown.py # Markdown export
82
86
  ├── wrapped_terminal.py # Rich Prompt Wrapped report rendering
83
- └── wrapped_html.py # Self-contained HTML share card (dark theme)
87
+ ├── wrapped_html.py # Self-contained HTML share card (dark theme)
88
+ ├── compress_terminal.py # Rich output for compress command
89
+ └── distill_terminal.py # Rich output for distill command
84
90
  ```
85
91
 
86
92
  ## Data Flow
@@ -117,7 +123,7 @@ reprompt-extension (private) ← Browser extension: Chrome/Firefox prompt capt
117
123
  - Pattern upsert (not clear+re-insert) for stable IDs
118
124
  - Prompts starting with `<` are filtered (system-injected XML)
119
125
  - Config: env vars (REPROMPT_ prefix) > TOML (~/.config/reprompt/config.toml) > defaults
120
- - Tests: pytest, 923 tests, 95% coverage target
126
+ - Tests: pytest, 1217 tests, 95% coverage target
121
127
 
122
128
  ## Prompt Science Engine
123
129
 
@@ -128,3 +134,28 @@ Research-backed prompt analysis (added v0.6.0):
128
134
 
129
135
  Papers: Google 2512.14982 (repetition), Stanford 2307.03172 (position),
130
136
  SPELL EMNLP 2023 (perplexity), Prompt Report 2406.06608 (taxonomy).
137
+
138
+ ## Prompt Compression Engine
139
+
140
+ Rule-based prompt optimization (added v1.2.0):
141
+ - `reprompt compress "prompt"` — 4-layer compression with token savings display
142
+ - Layer 0: Character normalization (curly quotes, zero-width chars, NFKC)
143
+ - Layer 2: Phrase simplification (40+ zh, 50+ en rules)
144
+ - Layer 1: Filler word deletion (50+ zh, 40+ en phrases, jieba-aware)
145
+ - Layer 3: Structure cleanup (markdown strip, emoji, LLM output artifacts)
146
+ - `--json` for pipeline integration, `--copy` to clipboard
147
+ - `compressibility` field in PromptDNA, visible in insights + HTML dashboard
148
+
149
+ Sources: LLMLingua (Microsoft), CompactPrompt, TSC, stopwords-iso/zh, Prompt Report 2406.06608.
150
+
151
+ ## Conversation Distillation Engine
152
+
153
+ Conversation-level analysis (added v1.3.0):
154
+ - `reprompt distill` — extract important turns from AI conversations
155
+ - 6-signal importance scoring: position, length, tool_trigger, error_recovery, semantic_shift, uniqueness
156
+ - Hybrid data source: raw session files (full conversation) + DB enrichment
157
+ - `parse_conversation()` on adapters returns both user and assistant turns
158
+ - Claude Code and ChatGPT adapters have full implementations; others fall back to user-only
159
+ - `--last N` for recent sessions, `--summary` for compressed output, `--json`, `--copy`
160
+ - `--threshold` to control importance cutoff (default 0.3)
161
+ - Pro plugin interface: `reprompt.distill_backends` entry point for LLM summarization (future)
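The six-signal scoring and `--threshold` cutoff described in this hunk can be pictured with a minimal sketch. Only the signal names and the 0.3 default come from the diff; the equal weights, the `Turn` class, and the `importance` / `filter_turns` helpers are hypothetical and are not taken from `distill.py`.

```python
from dataclasses import dataclass, field

# Signal names as listed in CLAUDE.md.
SIGNALS = (
    "position", "length", "tool_trigger",
    "error_recovery", "semantic_shift", "uniqueness",
)

# Assumption: equal weights. The real weighting is not shown in this diff.
WEIGHTS = {name: 1 / len(SIGNALS) for name in SIGNALS}


@dataclass
class Turn:
    """Hypothetical stand-in for a conversation turn with per-signal scores in [0, 1]."""
    text: str
    signals: dict[str, float] = field(default_factory=dict)


def importance(turn: Turn) -> float:
    """Weighted sum of the six signals; missing signals count as 0."""
    return sum(WEIGHTS[name] * turn.signals.get(name, 0.0) for name in SIGNALS)


def filter_turns(turns: list[Turn], threshold: float = 0.3) -> list[Turn]:
    """Keep turns whose score reaches the cutoff (inclusive comparison is an assumption)."""
    return [t for t in turns if importance(t) >= threshold]
```

With equal weights, a turn needs at least two fully firing signals to pass the default cutoff, which is one way to read why 0.3 might be a sensible default; the actual rationale is not stated in the diff.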
@@ -1,7 +1,7 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: reprompt-cli
3
- Version: 1.0.0
4
- Summary: Discover, analyze, and evolve your best prompts from AI coding sessions
3
+ Version: 1.3.0
4
+ Summary: Discover, analyze, and optimize your prompts from AI coding sessions
5
5
  Project-URL: Homepage, https://github.com/reprompt-dev/reprompt
6
6
  Project-URL: Repository, https://github.com/reprompt-dev/reprompt
7
7
  Project-URL: Issues, https://github.com/reprompt-dev/reprompt/issues
@@ -0,0 +1,512 @@
1
+ # reprompt compress — Prompt Compression Design
2
+
3
+ **Date:** 2026-03-22
4
+ **Version:** v1.2.x (after Phase 4 extension release)
5
+ **Status:** Approved (updated with research findings)
6
+
7
+ ## Summary
8
+
9
+ Rule-based prompt compression that removes filler words, simplifies phrases, normalizes characters, and cleans LLM output formatting — no LLM required. First step from "analyze your prompts" toward "improve your prompts."
10
+
11
+ Informed by: LLMLingua (Microsoft), CompactPrompt, TSC, metawake/prompt_compressor, clean-text/ftfy, sanitext, stopwords-iso/zh, goto456/stopwords, and academic hedging/discourse marker research. Full research reports in `.claude/cache/agents/oracle/output-2026-03-22-*.md`.
12
+
13
+ ## Decisions
14
+
15
+ | Decision | Choice | Rationale |
16
+ |----------|--------|-----------|
17
+ | Input | CLI argument only (`reprompt compress "text"`) | Start simple; --file / stdin / --last N deferred |
18
+ | Approach | jieba + 4-layer rule engine | Research validated: rule-based achieves 20-35% compression without ML (TSC reports 40-60%, metawake 22%) |
19
+ | Storage | Not stored to DB | Ad-hoc tool; dashboard stats derived from prompt_dna compressibility field |
20
+ | Compression level | Medium | Filler removal + phrase simplification + structure cleanup; no sentence rewriting |
21
+ | No spaCy/ML deps | Curated word lists instead of POS tagging | Zero-config principle; spaCy adds 300MB+ |
22
+
23
+ ## Architecture
24
+
25
+ ### New file: `src/reprompt/core/compress.py`
26
+
27
+ Four-layer pipeline:
28
+
29
+ ```
30
+ Input text
31
+ → lang_detect() (reuse core/lang_detect.py)
32
+ → Detect mixed language (zh_ratio 0.2–0.8 → "mixed")
33
+ → Mark protected zones (code blocks, file paths, URLs, error messages)
34
+ → Layer 0: Character normalization (lossless, always-on)
35
+ → Layer 2: Phrase simplification (runs before Layer 1)
36
+ → Layer 1: Filler word deletion (jieba segmentation + phrase table)
37
+ → Layer 3: Structure cleanup (markdown, whitespace, LLM output artifacts)
38
+ → Restore protected zones
39
+ → Return CompressResult
40
+ ```
41
+
42
+ **Why Layer 2 before Layer 1:** Layer 2's replacement table contains phrases that overlap with Layer 1's deletion list. Running simplification first ensures meaningful replacements ("帮我看看" → "检查") take priority over blind deletion. Layer 1 then only removes remaining filler words that Layer 2 did not match.
43
+
44
+ **Table deduplication rule:** Phrases in `*_PHRASE_SIMPLIFY` must NOT also appear in `*_FILLER_PHRASES`. Layer 1 tables contain only words not covered by Layer 2.
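The deduplication rule above lends itself to a small automated check. This is a minimal sketch, assuming the Layer 2 tables are dicts and the Layer 1 tables are lists as shown in the spec; `overlapping_phrases` is a hypothetical helper name, and the filler entries below are illustrative only.

```python
def overlapping_phrases(simplify: dict[str, str], fillers: list[str]) -> set[str]:
    """Return phrases that appear in both a Layer 2 table and a Layer 1 list."""
    return set(simplify) & set(fillers)


# Two entries copied from EN_PHRASE_SIMPLIFY in the spec.
en_simplify = {"in order to": "to", "prior to": "before"}
# Illustrative filler list; not the real Layer 1 entries.
en_fillers = ["basically", "just"]

# The rule holds when the intersection is empty.
assert not overlapping_phrases(en_simplify, en_fillers)
```

A check like this could run once per language pair in the test suite, so a phrase added to both tables fails fast instead of silently changing which layer handles it.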
45
+
46
+ ### Data model
47
+
48
+ ```python
49
+ @dataclass
50
+ class CompressResult:
51
+     original: str # Input text
52
+     compressed: str # Compressed output
53
+     original_tokens: int # Estimated token count (see Token Counting below)
54
+     compressed_tokens: int # Compressed token count
55
+     savings_pct: float # 1 - compressed/original
56
+     changes: list[str] # Per-layer summaries
57
+     language: str # Detected language (zh/en/mixed)
58
+ ```
59
+
60
+ **`changes` field contract:** Each entry is a per-layer summary string with count and layer name. Examples:
61
+ - `"normalized 3 characters"` (Layer 0)
62
+ - `"simplified 2 phrases"` (Layer 2)
63
+ - `"removed 5 filler words"` (Layer 1)
64
+ - `"cleaned 4 markdown artifacts"` (Layer 3)
65
+
66
+ Terminal output joins these with `, `. No per-match detail — keep it human-readable.
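A minimal sketch of how `savings_pct` and the joined `changes` line could be derived. The formula and the `, ` separator come from the spec; the helper names and the zero-token guard are assumptions.

```python
def savings_pct(original_tokens: int, compressed_tokens: int) -> float:
    """Fraction of tokens saved: 1 - compressed/original."""
    if original_tokens == 0:
        # Assumption: empty input reports no savings rather than dividing by zero.
        return 0.0
    return 1 - compressed_tokens / original_tokens


def summarize_changes(changes: list[str]) -> str:
    """Join per-layer summaries the way the terminal output is described."""
    return ", ".join(changes)
```

Note that, as specified, `savings_pct` is a fraction between 0 and 1 despite the `_pct` suffix, so a display layer would presumably multiply by 100.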
67
+
68
+ ### Token counting
69
+
70
+ - Chinese-dominant text (zh_ratio > 0.5): count characters (excluding whitespace and punctuation)
71
+ - English-dominant text (zh_ratio <= 0.5): count whitespace-separated words
72
+ - Mixed text uses the dominant language's method based on zh_ratio threshold
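The counting rules above can be sketched as follows. The 0.5 threshold and the two counting methods come from the spec; the `zh_ratio` definition here (share of non-whitespace characters in the CJK Unified Ideographs block) is an assumption and may differ from what `lang_detect.py` computes.

```python
import unicodedata


def zh_ratio(text: str) -> float:
    """Assumed definition: CJK ideographs divided by non-whitespace characters."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    cjk = sum(1 for c in chars if "\u4e00" <= c <= "\u9fff")
    return cjk / len(chars)


def estimate_tokens(text: str) -> int:
    """Estimate tokens by characters (Chinese-dominant) or words (otherwise)."""
    if zh_ratio(text) > 0.5:
        # Count characters, skipping whitespace and Unicode punctuation (category P*).
        return sum(
            1
            for c in text
            if not c.isspace() and not unicodedata.category(c).startswith("P")
        )
    # English-dominant or mixed text at or below the threshold: whitespace-separated words.
    return len(text.split())
```

Both methods are rough proxies for real tokenizer output, which is why the data model calls the count "estimated".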
73
+
74
+ ### Protected zones
75
+
76
+ Regions that must not be compressed:
77
+
78
+ - Fenced code blocks (``` ... ```)
79
+ - Inline code (`...`)
80
+ - File paths (`/path/to/file`, `file.py`)
81
+ - URLs (`http://...`, `https://...`)
82
+ - Stack traces / error message patterns
83
+ - Named entities (code identifiers, function names, variable names)
84
+
85
+ Implementation: regex-replace with numbered placeholders before compression, restore after.
86
+ Reuse detection patterns from `extractors.py` `_compute_specificity`.
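The placeholder approach described above can be sketched like this. It covers only fenced code, inline code, and URLs; the NUL-delimited placeholder format and the `protect` / `restore` names are hypothetical, and the real patterns are said to be reused from `extractors.py`.

```python
import re

FENCE = "`" * 3  # built at runtime so the sketch contains no literal fence

PROTECTED = re.compile(
    FENCE + r".*?" + FENCE   # fenced code blocks
    + r"|`[^`\n]+`"          # inline code
    + r"|https?://\S+",      # URLs
    re.DOTALL,
)


def protect(text: str) -> tuple[str, list[str]]:
    """Swap protected regions for numbered placeholders and return the saved zones."""
    zones: list[str] = []

    def stash(match: re.Match) -> str:
        zones.append(match.group(0))
        return f"\x00{len(zones) - 1}\x00"

    return PROTECTED.sub(stash, text), zones


def restore(text: str, zones: list[str]) -> str:
    """Put the saved zones back in place of their placeholders."""
    return re.sub(r"\x00(\d+)\x00", lambda m: zones[int(m.group(1))], text)
```

The placeholder has to survive every layer untouched, so it should contain nothing that the normalization, phrase, or filler tables could match; a control character plus digits is one option that satisfies this.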
87
+
88
+ ---
89
+
90
+ ## Layer 0 — Character Normalization (lossless, always-on)
91
+
92
+ Source: ftfy, sanitext, llm-textfix research.
93
+
94
+ Fixes common LLM output character problems without changing meaning:
95
+
96
+ ```python
97
+ CHAR_NORMALIZE = {
98
+ # Curly quotes → straight
99
+ "\u201c": '"', "\u201d": '"', # " " → "
100
+ "\u2018": "'", "\u2019": "'", # ' ' → '
101
+ # Dashes
102
+ "\u2014": "-", # em dash → hyphen
103
+ "\u2013": "-", # en dash → hyphen
104
+ # Spaces
105
+ "\u00a0": " ", # non-breaking space → space
106
+ "\u200b": "", # zero-width space → remove
107
+ "\u200c": "", # zero-width non-joiner → remove
108
+ "\u200d": "", # zero-width joiner → remove
109
+ "\ufeff": "", # BOM / zero-width no-break space → remove
110
+ "\u00ad": "", # soft hyphen → remove
111
+ # Full-width → half-width (common in CJK text)
112
+ # Applied via unicodedata.normalize('NFKC', text)
113
+ }
114
+ ```
115
+
116
+ Also applies Unicode NFKC normalization (full-width A → A, fi ligature → fi, etc.).
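A minimal sketch of Layer 0, assuming the table is applied with `str.translate` before NFKC. The table entries are copied from the spec; the `normalize_chars` name, the order of the two steps, and counting only table hits are assumptions.

```python
import unicodedata

CHAR_NORMALIZE = {
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u2014": "-", "\u2013": "-",   # em dash, en dash
    "\u00a0": " ",                  # non-breaking space
    "\u200b": "", "\u200c": "", "\u200d": "",  # zero-width characters
    "\ufeff": "", "\u00ad": "",     # BOM, soft hyphen
}
_TABLE = str.maketrans(CHAR_NORMALIZE)


def normalize_chars(text: str) -> tuple[str, int]:
    """Return the normalized text and the number of table characters replaced."""
    replaced = sum(1 for c in text if c in CHAR_NORMALIZE)
    mapped = text.translate(_TABLE)
    # NFKC folds full-width letters and ligatures; those are not counted above.
    return unicodedata.normalize("NFKC", mapped), replaced
```

The returned count could feed a summary such as `"normalized 3 characters"` in the `changes` list.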
117
+
118
+ ---
119
+
120
+ ## Layer 2 — Phrase Simplification (runs first)
121
+
122
+ Replacement tables. Longer patterns matched first to avoid partial matches.
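Longest-first matching can be sketched with a single alternation regex whose keys are sorted by length. `make_simplifier` is a hypothetical helper; it treats every key as a literal string and assumes a non-empty table, so entries containing `...` would need separate handling that the spec does not describe.

```python
import re


def make_simplifier(table: dict[str, str]):
    """Build a function that applies the table with longer phrases tried first."""
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))

    def simplify(text: str) -> tuple[str, int]:
        # subn returns (new_text, number_of_replacements).
        return pattern.subn(lambda m: table[m.group(0)], text)

    return simplify


# Two overlapping entries copied from ZH_PHRASE_SIMPLIFY.
simplify_zh = make_simplifier({"有没有什么": "有哪些", "有没有什么办法": "如何"})
```

Without the sort, the shorter key would match first and leave `办法` behind, which is the partial match the spec warns about.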
123
+
124
+ ### Chinese — Polite Request Prefixes (remove entirely)
125
+
126
+ Source: FluentU, HanyuAce, spoken Chinese corpus research, prompt optimization guides.
127
+
128
+ ```python
129
+ ZH_PHRASE_SIMPLIFY = {
130
+ # Polite request prefixes → remove
131
+ "不好意思打扰一下": "",
132
+ "冒昧问一下": "",
133
+ "能不能帮我": "",
134
+ "可不可以帮我": "",
135
+ "可以帮我...吗": "",
136
+ "你能帮我...吗": "",
137
+ "麻烦你帮我": "",
138
+ "麻烦帮忙": "",
139
+ "我想请你": "",
140
+ "我想请问": "",
141
+ "我想问一下": "",
142
+ "我需要你帮我": "",
143
+ "请你帮我": "",
144
+ "是否可以": "",
145
+ "如果可以的话": "",
146
+ "如果方便的话": "",
147
+ "如果不麻烦的话": "",
148
+ # Verbose action phrases → concise verbs
149
+ "帮我检查一下": "检查",
150
+ "帮我看看": "检查",
151
+ "帮我看一下": "检查",
152
+ "帮我分析一下": "分析",
153
+ "帮我写一个": "写",
154
+ "帮我写一下": "写",
155
+ "帮我改一下": "修改",
156
+ "帮我修改一下": "修改",
157
+ "帮我解释一下": "解释",
158
+ "帮我翻译一下": "翻译",
159
+ "帮我总结一下": "总结",
160
+ "帮我优化一下": "优化",
161
+ "帮我生成一个": "生成",
162
+ "帮我想想": "建议",
163
+ "帮我想一下": "建议",
164
+ # Verbose expressions → concise
165
+ "有没有什么办法": "如何",
166
+ "有没有什么好的方法": "最佳方法",
167
+ "有没有什么": "有哪些",
168
+ "能不能给我一些建议": "建议",
169
+ "你有什么建议吗": "建议",
170
+ "我现在遇到了一个问题": "", # state problem directly
171
+ "我想知道": "",
172
+ "我想要你": "",
173
+ "我希望你能": "",
174
+ # Verbose expressions
175
+ "在...方面": "关于",
176
+ "关于...这个问题": "关于",
177
+ "在这种情况下": "此时",
178
+ # Redundant intensifiers
179
+ "非常非常": "非常",
180
+ "特别特别": "非常",
181
+ "尽可能地": "尽量",
182
+ }
183
+ ```
184
+
185
+ ### English — Verbose Requests & Phrasing
186
+
187
+ Source: TSC, metawake/prompt_compressor, Portkey token optimization, IBM prompt engineering, academic hedging research.
188
+
189
+ ```python
190
+ EN_PHRASE_SIMPLIFY = {
191
+ # Polite request prefixes → remove (zero-information per TSC)
192
+ "I was wondering if you could": "",
193
+ "Could you please provide me with": "provide",
194
+ "I would like you to create a list of": "list",
195
+ "Can you help me understand": "explain",
196
+ "Could you possibly provide": "provide",
197
+ "Would you be able to": "",
198
+ "Can you go ahead and": "",
199
+ "I want you to help me with": "",
200
+ "What I'd like is for you to": "",
201
+ "I want you to": "",
202
+ "I need you to": "",
203
+ "I would like you to": "",
204
+ "Could you please": "",
205
+ "Can you help me": "",
206
+ "Go ahead and": "",
207
+ "Feel free to": "",
208
+ "Please make sure that": "ensure",
209
+ # Preamble phrases → remove
210
+ "I'm working on a project and": "",
211
+ "so basically what I need is": "",
212
+ "I have a question about": "",
213
+ "my question is": "",
214
+ "let me explain": "",
215
+ "here's what I need": "",
216
+ "what I need is": "",
217
+ "what I'm looking for is": "",
218
+ "I'm looking for": "",
219
+ "I'm trying to": "",
220
+ "I'd like to ask": "",
221
+ # Verbose phrasing → concise (per CompactPrompt/TSC)
222
+ "in order to": "to",
223
+ "due to the fact that": "because",
224
+ "for the purpose of": "to",
225
+ "with regard to": "about",
226
+ "with respect to": "about",
227
+ "in terms of": "regarding",
228
+ "as a result of": "because",
229
+ "in the event that": "if",
230
+ "in the case that": "if",
231
+ "at this point in time": "now",
232
+ "at the present time": "now",
233
+ "prior to": "before",
234
+ "subsequent to": "after",
235
+ "a large number of": "many",
236
+ "the vast majority of": "most",
237
+ "a wide variety of": "various",
238
+ # Periphrastic verbs → simple verbs
239
+ "take into consideration": "consider",
240
+ "take into account": "consider",
241
+ "come to the conclusion": "conclude",
242
+ "give an explanation of": "explain",
243
+ "provide a description of": "describe",
244
+ "make a decision": "decide",
245
+ "conduct an analysis of": "analyze",
246
+ "perform a review of": "review",
247
+ "is able to": "can",
248
+ "has the ability to": "can",
249
+ # Redundant pairs
250
+ "each and every": "every",
251
+ "first and foremost": "first",
252
+ "any and all": "all",
253
+ "completely and totally": "completely",
254
+ "various different": "various",
255
+ }
256
+ ```
257
+
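Several keys in the table above are prefixes of longer keys (for example "I would like you to" and "I would like you to create a list of"), so the order in which replacements are tried matters. The following is a minimal sketch of one way to apply such a table, assuming longest-key-first matching on word boundaries; `simplify_phrases` and the `PHRASES` excerpt are hypothetical names for illustration, not the functions in `compress.py`.

```python
import re

# Small excerpt of the EN_PHRASE_SIMPLIFY table, for illustration only.
PHRASES = {
    "I would like you to create a list of": "list",
    "I would like you to": "",
    "Could you please": "",
    "in order to": "to",
    "due to the fact that": "because",
}


def simplify_phrases(text: str, table: dict[str, str]) -> tuple[str, int]:
    """Replace verbose phrases and return (new_text, number_of_replacements)."""
    count = 0
    # Longest key first, so a long phrase wins over any shorter prefix of it.
    for phrase in sorted(table, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        replacement = table[phrase]
        # A callable replacement avoids backslash interpretation in the value.
        text, n = pattern.subn(lambda _m, r=replacement: r, text)
        count += n
    # Deleting a phrase can leave doubled or leading spaces behind.
    text = re.sub(r" {2,}", " ", text).strip()
    return text, count


if __name__ == "__main__":
    print(simplify_phrases(
        "I would like you to create a list of edge cases in order to test the parser",
        PHRASES,
    ))
```

With shortest-first ordering the first phrase would be reduced to "create a list of edge cases ..." and the more specific "list" rewrite would never fire, which is why the sketch sorts by key length.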
258
+ ---
259
+
260
+ ## Layer 1 — Filler Word Deletion (runs second)
261
+
262
+ Categorized by information density (Prompt Report arXiv:2406.06608 taxonomy):
263
+ - **Zero-information:** Always removable (politeness, tag questions, emotional fillers)
264
+ - **Low-information:** Removable in most contexts (discourse fillers, hedges, vague enumerators)
265
+
266
+ ### Chinese Filler Phrases
267
+
268
+ Source: FluentU, HanyuAce, Frontiers corpus study, stopwords-iso/zh, goto456/stopwords.
269
+
270
+ ```python
271
+ ZH_FILLER_PHRASES = [
272
+ # Discourse fillers (话语填充词) — hesitation/stalling
273
+ "嗯", "呃", "哦", "嘛", "啦", "喽", "呗",
274
+ # Verbal tics (口头禅)
275
+ "然后呢", "就是说", "那个", "那么", "那什么",
276
+ "你知道吗", "你知道", "你看", "我跟你说", "怎么说呢",
277
+ # Hedge/softening (语气缓和)
278
+ "基本上", "其实", "反正", "总之", "所以说",
279
+ "说实话", "老实说", "说白了", "毕竟", "其实就是",
280
+ # Temporal fillers
281
+ "的时候", "的话", "到时候",
282
+ # Vague enumerators
283
+ "之类的", "什么的", "啥的", "诸如此类",
284
+ # Tag questions (seeking agreement — zero information to LLM)
285
+ "对吧", "对不对", "是不是", "是吧", "好吧", "行吧",
286
+ "这样子", "就这样",
287
+ # Particles (zero-info in prompt context)
288
+ "一下", "一些",
289
+ # Preambles (state the question/task directly instead)
290
+ "我想问", "请问一下",
291
+ ]
292
+ ```
293
+
294
+ ### English Filler Phrases
295
+
296
+ Source: Wikipedia discourse markers, Cambridge filler words, Enago/SJSU hedging research, Portkey token optimization.
297
+
298
+ ```python
299
+ EN_FILLER_PHRASES = [
300
+ # Discourse fillers
301
+ "basically", "actually", "essentially", "literally",
302
+ "honestly", "you know", "I mean", "like",
303
+ "well", "anyway", "right", "okay so",
304
+ "the thing is", "here's the thing",
305
+ "as a matter of fact", "at the end of the day",
306
+ "to be honest", "to be frank", "in my opinion",
307
+ # Hedging language (per academic hedging research — Enago, SJSU)
308
+ "it seems like", "it appears that", "apparently",
309
+ "presumably", "to some extent", "to a certain degree",
310
+ "more or less", "roughly", "arguably",
311
+ "I believe", "I suppose", "I assume",
312
+ "not sure but", "I'm not entirely sure but",
313
+ # Politeness markers (zero-info per TSC)
314
+ "please", "kindly",
315
+ "I would appreciate if", "I would really appreciate",
316
+ "thank you", "thanks in advance", "thank you so much",
317
+ "if you don't mind", "I'd be grateful if",
318
+ "it would be great if", "I was wondering if",
319
+ "would it be possible to",
320
+ # Sentence-initial fillers (only when at start of sentence)
321
+ "so", # when sentence-initial filler, not conjunction
322
+ ]
323
+ ```
324
+
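The note on `"so"` above implies that some fillers need positional handling rather than plain deletion. A minimal sketch of that idea follows, assuming whole-word matching for ordinary fillers and a separate sentence-start rule for "so"; `remove_fillers` and the `FILLERS` excerpt are hypothetical and only illustrate the approach.

```python
import re

# Excerpt of EN_FILLER_PHRASES; "so" is handled separately below because it
# is only a filler at the start of a sentence.
FILLERS = ["to be honest", "basically", "you know", "please", "I mean"]


def remove_fillers(text: str, fillers: list[str]) -> str:
    """Delete whole-word fillers, then sentence-initial "so"."""
    # Longest first so multi-word fillers are tried before shorter ones.
    ordered = sorted(fillers, key=len, reverse=True)
    alternation = "|".join(re.escape(f) for f in ordered)
    # Whole-word match; also swallow one trailing comma and trailing spaces.
    pattern = re.compile(r"\b(?:" + alternation + r")\b,?\s*", re.IGNORECASE)
    text = pattern.sub("", text)
    # "so" is removed only at the start of the text or after ., ! or ?
    text = re.sub(r"(^|[.!?]\s+)so,?\s+", r"\1", text, flags=re.IGNORECASE)
    return re.sub(r" {2,}", " ", text).strip()


if __name__ == "__main__":
    print(remove_fillers(
        "Basically, the cache is stale. So please fix it so it refreshes.",
        FILLERS,
    ))
```

In this example the second "so" is a conjunction and is kept. The sketch does not repair capitalization or punctuation left behind by a deletion; that is assumed to be the job of the Layer 3 cleanup.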
325
+ ---
326
+
327
+ ## Layer 3 — Structure Cleanup (enhanced with LLM output patterns)
328
+
329
+ Source: budparr markdown regex, strip-markdown, sanitext, llm-textfix, clean-text.
330
+
331
+ Three sub-layers:
332
+
333
+ ### 3A. Whitespace normalization
334
+ ```python
335
+ text = re.sub(r'\n{3,}', '\n\n', text) # Collapse 3+ newlines → 2
336
+ text = re.sub(r'[ \t]+\n', '\n', text) # Trailing whitespace on lines
337
+ text = re.sub(r'\n[ \t]+\n', '\n\n', text) # Lines with only whitespace
338
+ text = re.sub(r' {2,}', ' ', text) # Multiple spaces → single
339
+ ```
340
+
341
+ ### 3B. Markdown/LLM output cleanup
342
+ ```python
343
+ # Header normalization (LLM outputs often have excessive depth)
344
+ text = re.sub(r'^#{5,}\s', '#### ', text, flags=re.M) # Cap at H4
345
+
346
+ # Strip formatting markers (common in pasted LLM output context)
347
+ text = re.sub(r'\*{3}([^*]+)\*{3}', r'\1', text) # ***bold italic*** → text
348
+ text = re.sub(r'\*{2}([^*]+)\*{2}', r'\1', text) # **bold** → text
349
+ text = re.sub(r'\*(?=\S)([^*\n]+)(?<=\S)\*', r'\1', text) # *italic* → text (single line only, so "* " bullets are left alone)
350
+
351
+ # Remove decorative elements
352
+ text = re.sub(r'^---+$', '', text, flags=re.M) # Horizontal rules
353
+ text = re.sub(r'^===+$', '', text, flags=re.M) # Alt horizontal rules
354
+
355
+ # Normalize bullet markers (inconsistent formatting from LLM)
356
+ text = re.sub(r'^[\*\-\+]\s', '- ', text, flags=re.M)
357
+
358
+ # Remove decorative emoji and symbols
359
+ text = re.sub(r'[\U0001F600-\U0001F9FF]', '', text) # Emoji, U+1F600 to U+1F9FF only (U+1F300 to U+1F5FF pictographs are not covered)
360
+ text = re.sub(r'[✓✗✘✔✕★☆●○■□▸▹►▻▪▫]', '', text) # Decorative symbols
361
+ ```
362
+
363
+ ### 3C. Punctuation and conjunction cleanup
364
+ ```python
365
+ # Merge duplicate punctuation
366
+ text = re.sub(r',{2,}', ',', text) # ,, → ,
367
+ text = re.sub(r'。{2,}', '。', text) # 。。→ 。
368
+ text = re.sub(r',{2,}', ',', text) # ,, → ,
369
+ text = re.sub(r'\.{4,}', '...', text) # .... → ...
370
+
371
+ # Remove dangling conjunctions at sentence start (artifact of phrase deletion)
372
+ ZH_DANGLING = ["然后", "而且", "并且", "所以", "因为", "但是", "不过"]
373
+ EN_DANGLING = ["and", "but", "so", "then", "also", "however", "therefore"]
374
+ # Match at line/sentence start after prior deletions
375
+
376
+ # Remove empty lines left by deletions
377
+ text = re.sub(r'\n{3,}', '\n\n', text) # Final pass
378
+ ```
379
+
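The 3C block lists `ZH_DANGLING` and `EN_DANGLING` but leaves the matching step as a comment. One possible way to implement "match at line/sentence start" is sketched below. The regexes and the `strip_dangling` helper are assumptions, and the sketch cannot tell a deletion artifact from a conjunction the author wrote on purpose, so a real implementation would presumably apply it only where an earlier layer removed text.

```python
import re

ZH_DANGLING = ["然后", "而且", "并且", "所以", "因为", "但是", "不过"]
EN_DANGLING = ["and", "but", "so", "then", "also", "however", "therefore"]

# English: start of a line, or sentence punctuation followed by whitespace.
# Requiring whitespace keeps names such as "libfoo.so" intact.
_EN_RE = re.compile(
    r"(^|[.!?]\s+)(?:" + "|".join(EN_DANGLING) + r")\b,?[ \t]+",
    re.IGNORECASE | re.MULTILINE,
)
# Chinese: sentences are not separated by spaces, so whitespace is optional.
_ZH_RE = re.compile(
    r"(^|[。!?!?]\s*)(?:" + "|".join(ZH_DANGLING) + r")[,,]?[ \t]*",
    re.MULTILINE,
)


def strip_dangling(text: str) -> str:
    """Drop a conjunction that opens a line or sentence, keeping the boundary."""
    text = _EN_RE.sub(r"\1", text)
    return _ZH_RE.sub(r"\1", text)


if __name__ == "__main__":
    print(strip_dangling("Check the logs. But restart first."))
    print(strip_dangling("检查日志。然后重启服务"))
```

Conjunctions in the middle of a sentence (for example 然后 in "先检查日志然后重启") are left untouched because the pattern is anchored to a line or sentence boundary.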
380
+ ---
381
+
382
+ ## Language Handling
383
+
384
+ - `lang_detect()` from `core/lang_detect.py` returns `LanguageInfo` with a single `lang` value (zh/en/ja/ko)
385
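+ - **Mixed detection:** `compress.py` computes `zh_ratio` from `LanguageInfo.script_ratios["cjk"]`. If `zh_ratio` is between 0.2 and 0.8, `CompressResult.language` is reported as "mixed"; `lang_detect` itself never returns "mixed". This label is separate from rule selection, which uses the narrower bands below, so a prompt with `zh_ratio` in (0.5, 0.8) is reported as "mixed" but processed with the Chinese rules.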
386
+ - Chinese (zh_ratio > 0.5): jieba segmentation → word-level matching → phrase replacement
387
+ - English (zh_ratio <= 0.2): whitespace tokenization → phrase replacement
388
+ - Mixed (0.2 < zh_ratio <= 0.5): segment by sentence, apply Chinese rules to Chinese sentences and English rules to English sentences
389
+
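The thresholds above define two classifications that do not fully coincide: the label stored in `CompressResult.language` and the rule set actually applied. The sketch below spells out both under stated assumptions: the 0.2 and 0.8 label bounds are treated as exclusive, and the non-mixed label is simplified to "zh" or "en" (the real label would come from `lang_detect`, which can also return ja/ko). `classify` is a hypothetical helper.

```python
def classify(zh_ratio: float) -> tuple[str, str]:
    """Return (reported_language, rule_set) for a CJK ratio in [0, 1]."""
    # Reporting label: "mixed" between 0.2 and 0.8 (bounds assumed exclusive).
    if 0.2 < zh_ratio < 0.8:
        label = "mixed"
    elif zh_ratio >= 0.8:
        label = "zh"
    else:
        label = "en"

    # Rule selection follows the three processing bands from the list above.
    if zh_ratio > 0.5:
        rules = "zh"      # jieba segmentation, then phrase replacement
    elif zh_ratio > 0.2:
        rules = "mixed"   # per-sentence routing to zh or en rules
    else:
        rules = "en"      # whitespace tokenization, then phrase replacement
    return label, rules


if __name__ == "__main__":
    for ratio in (0.1, 0.3, 0.6, 0.9):
        print(ratio, classify(ratio))
```

The 0.6 case is the one worth testing explicitly: it is reported as "mixed" yet handled by the Chinese rule set.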
390
+ ## jieba Graceful Degradation
391
+
392
+ If jieba is not installed, fall back to character-level substring matching for Chinese text, consistent with `extractors_zh.py`. Compression quality degrades (less precise word boundaries) but does not error.
393
+
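A minimal sketch of the degradation path, assuming the optional import is guarded with `try`/`except ImportError` and that the fallback is plain substring removal. The helper names are hypothetical; `jieba.lcut` is the library's list-returning segmentation call.

```python
try:
    import jieba  # optional dependency
    HAS_JIEBA = True
except ImportError:
    jieba = None
    HAS_JIEBA = False


def remove_fillers_substring(text, fillers):
    """Fallback: character-level substring removal, longest filler first."""
    for filler in sorted(fillers, key=len, reverse=True):
        text = text.replace(filler, "")
    return text


def remove_fillers_segmented(text, fillers):
    """Word-level removal: drop whole tokens that are fillers (needs jieba)."""
    filler_set = set(fillers)
    return "".join(tok for tok in jieba.lcut(text) if tok not in filler_set)


def remove_fillers_zh(text, fillers):
    """Use word-level matching when jieba is available, else substrings."""
    if HAS_JIEBA:
        return remove_fillers_segmented(text, fillers)
    return remove_fillers_substring(text, fillers)


if __name__ == "__main__":
    print(remove_fillers_substring("嗯这个函数有问题对吧", ["嗯", "对吧"]))
    # The fallback has no notion of word boundaries: here 的话 is removed
    # even though it is 的 + 话 ("his words"), not the filler.
    print(remove_fillers_substring("他的话很多", ["的话"]))
```

The second call illustrates the "less precise word boundaries" trade-off: the fallback never raises, but it can delete characters that only look like a filler. Multi-character fillers that jieba splits into several tokens would also need extra handling in the segmented path, which this sketch does not attempt.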
394
+ ## CLI Command
395
+
396
+ ```
397
+ reprompt compress "text"
398
+ ```
399
+
400
+ **Options:**
401
+ - `--json` — JSON output (CompressResult as dict)
402
+ - `--copy` — Copy compressed result to clipboard (reuse `sharing/clipboard.py`). If `copy_to_clipboard()` returns False, print warning: "Could not copy to clipboard (xclip/xsel not found)"
403
+
404
+ **Terminal output (Rich):**
405
+
406
+ ```
407
+ Original: 帮我看看这个文件的时候,我们需要检查一下错误处理的部分
408
+ Compressed: 检查此文件错误处理部分
409
+
410
+ Tokens: 28 → 11 (61% saved)
411
+ Changes: simplified 1 phrase, removed 3 filler words
412
+ ```
413
+
414
+ **New file: `src/reprompt/output/compress_terminal.py`** — Rich rendering for compress results.
415
+
416
+ ## Dashboard Integration
417
+
418
+ ### prompt_dna: new `compressibility` field
419
+
420
+ In `extractors.py` `extract_features()`, add:
421
+
422
+ ```python
423
+ compressibility: float # 0.0 - 1.0, computed via lightweight compress pass
424
+ ```
425
+
426
+ Calculated during scan by running Layer 0 + Layer 2 + Layer 1 on the prompt text and measuring `1 - (compressed_len / original_len)`. Stored in prompt_dna alongside existing features.
427
+
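The formula above can be sketched as a small function. This assumes lengths are measured in characters (the spec does not state the unit), that an empty prompt scores 0.0, and that the result is clamped to the documented 0.0 to 1.0 range; `compressibility` here takes the compress pass as a parameter purely so the example is self-contained.

```python
from typing import Callable


def compressibility(original: str, compress: Callable[[str], str]) -> float:
    """Return 1 - (compressed_len / original_len), clamped to [0.0, 1.0]."""
    if not original:
        # Nothing to compress; also avoids division by zero.
        return 0.0
    compressed = compress(original)
    ratio = 1 - (len(compressed) / len(original))
    # Clamp in case a pass ever makes the text longer.
    return min(1.0, max(0.0, ratio))


if __name__ == "__main__":
    # Stand-in for the real Layer 0 + Layer 2 + Layer 1 pass.
    def drop_please(s: str) -> str:
        return s.replace("please ", "")

    print(round(compressibility("please fix the bug", drop_please), 3))
```

For "please fix the bug" (18 characters) compressed to "fix the bug" (11 characters) this gives 1 - 11/18, roughly 0.389.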
428
+ **`feature_vector()` audit:** Adding `compressibility` to `PromptDNA` changes the length of `feature_vector()` output. Before merging, audit `scorer.py` and any other consumer of `feature_vector()` to confirm they do not depend on positional indexing. The scorer uses named field access (`dna.context_specificity`), not vector positions, so this should be safe — but must be verified during implementation.
429
+
430
+ ### Terminal report (`reprompt report`)
431
+
432
+ Add one line in the insights section:
433
+
434
+ ```
435
+ Compressibility: avg 23% — your prompts could be ~23% shorter without losing information
436
+ ```
437
+
438
+ ### HTML dashboard (`reprompt report --html`)
439
+
440
+ Add compressibility stats to `report_data["overview"]` dict (computed in `pipeline.py` alongside existing overview stats). The HTML template renders a new card:
441
+ - Average compressibility percentage + visual bar
442
+ - Top 3 most compressible prompts (hover shows before/after)
443
+
444
+ ### `reprompt insights`
445
+
446
+ Add compressibility insight using the existing dict shape:
447
+
448
+ ```python
449
+ {
450
+ "category": "verbosity",
451
+ "finding": f"You: {avg_compress:.0%} avg compressible content",
452
+ "optimal": "Research-optimal: <15%",
453
+ "action": "Remove filler phrases, be more direct with instructions",
454
+ "impact": "medium",
455
+ }
456
+ ```
457
+
458
+ ## Files to Create/Modify
459
+
460
+ | File | Action | Purpose |
461
+ |------|--------|---------|
462
+ | `core/compress.py` | **Create** | 4-layer compression engine + CompressResult |
463
+ | `output/compress_terminal.py` | **Create** | Rich terminal rendering |
464
+ | `cli.py` | Modify | Add `compress` command |
465
+ | `core/extractors.py` | Modify | Add `compressibility` field to extract_features |
466
+ | `core/prompt_dna.py` | Modify | Add `compressibility` to PromptDNA (audit `feature_vector()`) |
467
+ | `core/insights.py` | Modify | Add compressibility insight |
468
+ | `core/pipeline.py` | Modify | Compute avg compressibility for report_data["overview"] |
469
+ | `output/terminal.py` | Modify | Add compressibility line to report |
470
+ | `output/html_report.py` | Modify | Add compressibility card using report_data["overview"] |
471
+
472
+ ## Testing
473
+
474
+ - Unit tests for each compression layer (0, 1, 2, 3) with zh + en
475
+ - Layer execution order test (Layer 2 before Layer 1, no overlapping matches)
476
+ - Character normalization tests (curly quotes, zero-width chars, full-width)
477
+ - Markdown cleanup tests (excessive newlines, bold/italic strip, header cap, emoji removal)
478
+ - Protected zone tests (code blocks, URLs, file paths not compressed)
479
+ - Edge cases: empty input, pure code, pure English, pure Chinese, mixed zh/en
480
+ - jieba-absent fallback test (character-level matching still works)
481
+ - CompressResult correctness (savings_pct matches actual, changes list format)
482
+ - Token counting: zh-dominant vs en-dominant vs mixed
483
+ - CLI integration test (--json output format, --copy with clipboard failure)
484
+ - Compressibility field in prompt_dna (scan pipeline)
485
+ - `feature_vector()` length stability test
486
+
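The layer execution order test can be illustrated with a single entry from each table. The sketch below uses simplified stand-ins (`layer2`, `layer1`) rather than the real engine, and shows why phrase simplification has to run before filler deletion: the filler "一下" is part of the phrase key "帮我检查一下".

```python
PHRASES = {"帮我检查一下": "检查"}  # Layer 2 excerpt
FILLERS = ["一下"]                  # Layer 1 excerpt


def layer2(text):
    """Phrase simplification (stand-in)."""
    for verbose, concise in PHRASES.items():
        text = text.replace(verbose, concise)
    return text


def layer1(text):
    """Filler deletion (stand-in)."""
    for filler in FILLERS:
        text = text.replace(filler, "")
    return text


def test_layer2_runs_before_layer1():
    prompt = "帮我检查一下错误处理"
    # Intended order: the whole phrase collapses to the concise verb.
    assert layer1(layer2(prompt)) == "检查错误处理"
    # Reversed order: deleting the filler first destroys the phrase key,
    # so the verbose prefix 帮我 survives.
    assert layer2(layer1(prompt)) == "帮我检查错误处理"


if __name__ == "__main__":
    test_layer2_runs_before_layer1()
    print("ok")
```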
487
+ ## Research References
488
+
489
+ | Source | Used For |
490
+ |--------|----------|
491
+ | [LLMLingua (Microsoft)](https://arxiv.org/abs/2310.05736) | Information density taxonomy (not used for ML, only for categorization) |
492
+ | [CompactPrompt](https://arxiv.org/abs/2510.18043) | Self-information taxonomy: zero/low/compressible/context-dependent tokens |
493
+ | [TSC](https://developer-service.blog/telegraphic-semantic-compression-tsc-a-semantic-compression-method-for-llm-contexts/) | Grammar filtering rules, preserve list (nouns, verbs, numbers, entities, negations) |
494
+ | [metawake/prompt_compressor](https://github.com/metawake/prompt_compressor) | YAML rule engine pattern, 22% avg compression benchmark |
495
+ | [stopwords-iso/zh](https://github.com/stopwords-iso/stopwords-zh) | 1,892 Chinese stop words |
496
+ | [goto456/stopwords](https://github.com/goto456/stopwords) | HIT/Baidu/SCU Chinese stop word lists |
497
+ | [ftfy](https://github.com/rspeer/python-ftfy) | Character normalization patterns |
498
+ | [sanitext](https://github.com/panispani/sanitext) | Zero-width char removal, homoglyph mapping |
499
+ | [clean-text](https://github.com/jfilter/clean-text) | URL/email stripping, Unicode normalization |
500
+ | [Prompt Report (arXiv:2406.06608)](https://arxiv.org/abs/2406.06608) | Prompt taxonomy for categorizing compression rules |
501
+ | [Enago: Hedging in Academic Writing](https://www.enago.com/academy/hedging-in-academic-writing/) | English hedging language list |
502
+ | [FluentU Chinese Filler Words](https://www.fluentu.com/blog/chinese/chinese-filler-words/) | Chinese filler word categories |
503
+ | [Frontiers: ni zhidao corpus](https://www.frontiersin.org/articles/10.3389/fpsyg.2021.716791) | Chinese pragmatic marker research |
504
+
505
+ ## Not in Scope (deferred)
506
+
507
+ - `--file` / stdin / `--last N` input modes
508
+ - Custom filler word config via config.toml / YAML (inspired by metawake pattern)
509
+ - LLM-powered semantic compression (`reprompt suggest`, v1.3)
510
+ - Sentence rewriting / paraphrasing
511
+ - spaCy POS tagging (TSC approach — too heavy for zero-config)
512
+ - N-gram alias compression (CompactPrompt/TokenSpan — needs repeated phrases, better for long docs)