rubrify 0.0.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (304) hide show
  1. rubrify-0.0.1/PKG-INFO +5 -0
  2. rubrify-0.0.1/plans/rubrify-library_requirements.md +227 -0
  3. rubrify-0.0.1/pyproject.toml +9 -0
  4. rubrify-0.0.1/references/main/gist-ae3976ad/rubric_draft.md +193 -0
  5. rubrify-0.0.1/references/main/gist-ff87ac23/red_team_rubric.py +260 -0
  6. rubrify-0.0.1/references/main/rubrics/LICENSE +21 -0
  7. rubrify-0.0.1/references/main/rubrics/README.md +21 -0
  8. rubrify-0.0.1/references/main/rubrics/books2rubrics/on_writing_well_v1.xml +232 -0
  9. rubrify-0.0.1/references/main/rubrics/books2rubrics/on_writing_well_v2.xml +287 -0
  10. rubrify-0.0.1/references/main/rubrics/books2rubrics/on_writing_well_v3.xml +317 -0
  11. rubrify-0.0.1/references/main/rubrics/special_ones/anti_slop_rubric.xml +143 -0
  12. rubrify-0.0.1/references/main/rubrics/special_ones/completeness_rubric.md +96 -0
  13. rubrify-0.0.1/references/main/rubrics/special_ones/slurs.xml +27 -0
  14. rubrify-0.0.1/references/third-party/prose/.claude-plugin/README.md +16 -0
  15. rubrify-0.0.1/references/third-party/prose/.claude-plugin/marketplace.json +14 -0
  16. rubrify-0.0.1/references/third-party/prose/.claude-plugin/plugin.json +19 -0
  17. rubrify-0.0.1/references/third-party/prose/.gitignore +1 -0
  18. rubrify-0.0.1/references/third-party/prose/CHANGELOG.md +80 -0
  19. rubrify-0.0.1/references/third-party/prose/CONTRIBUTING.md +45 -0
  20. rubrify-0.0.1/references/third-party/prose/LICENSE +21 -0
  21. rubrify-0.0.1/references/third-party/prose/PRIVACY.md +27 -0
  22. rubrify-0.0.1/references/third-party/prose/README.md +197 -0
  23. rubrify-0.0.1/references/third-party/prose/TERMS.md +75 -0
  24. rubrify-0.0.1/references/third-party/prose/assets/README.md +13 -0
  25. rubrify-0.0.1/references/third-party/prose/assets/readme-header.svg +32 -0
  26. rubrify-0.0.1/references/third-party/prose/commands/README.md +17 -0
  27. rubrify-0.0.1/references/third-party/prose/commands/prose-boot.md +16 -0
  28. rubrify-0.0.1/references/third-party/prose/commands/prose-compile.md +20 -0
  29. rubrify-0.0.1/references/third-party/prose/commands/prose-run.md +25 -0
  30. rubrify-0.0.1/references/third-party/prose/skills/README.md +15 -0
  31. rubrify-0.0.1/references/third-party/prose/skills/open-prose/README.md +50 -0
  32. rubrify-0.0.1/references/third-party/prose/skills/open-prose/SKILL.md +391 -0
  33. rubrify-0.0.1/references/third-party/prose/skills/open-prose/SOUL.md +1 -0
  34. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/01-hello-world.md +10 -0
  35. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/02-research-and-summarize.md +15 -0
  36. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/03-code-review.md +16 -0
  37. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/04-write-and-refine.md +17 -0
  38. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/09-research-with-agents/index.md +15 -0
  39. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/09-research-with-agents/researcher.md +11 -0
  40. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/09-research-with-agents/writer.md +11 -0
  41. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/11-skills-and-imports.md +13 -0
  42. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/12-secure-agent-permissions.md +46 -0
  43. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/13-variables-and-context.md +14 -0
  44. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/16-parallel-reviews/index.md +11 -0
  45. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/16-parallel-reviews/perf-reviewer.md +10 -0
  46. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/16-parallel-reviews/security-reviewer.md +10 -0
  47. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/16-parallel-reviews/style-reviewer.md +10 -0
  48. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/16-parallel-reviews/synthesizer.md +12 -0
  49. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/22-error-handling/config-parser.md +14 -0
  50. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/22-error-handling/data-fetcher.md +18 -0
  51. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/22-error-handling/db-worker.md +19 -0
  52. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/22-error-handling/index.md +21 -0
  53. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/23-retry-with-backoff.md +23 -0
  54. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/24-choice-blocks.md +20 -0
  55. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/25-conditionals.md +20 -0
  56. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/29-captains-chair/captain.md +19 -0
  57. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/29-captains-chair/coder.md +14 -0
  58. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/29-captains-chair/critic.md +16 -0
  59. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/29-captains-chair/index.md +77 -0
  60. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/29-captains-chair/researcher.md +11 -0
  61. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/29-captains-chair/tester.md +12 -0
  62. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/30-captains-chair-simple/captain.md +17 -0
  63. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/30-captains-chair-simple/critic.md +10 -0
  64. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/30-captains-chair-simple/executor.md +10 -0
  65. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/30-captains-chair-simple/index.md +15 -0
  66. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/32-automated-pr-review/index.md +11 -0
  67. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/32-automated-pr-review/performance-expert.md +11 -0
  68. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/32-automated-pr-review/reviewer.md +11 -0
  69. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/32-automated-pr-review/security-expert.md +11 -0
  70. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/32-automated-pr-review/synthesizer.md +13 -0
  71. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/33-pr-review-autofix/captain.md +18 -0
  72. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/33-pr-review-autofix/fixer.md +14 -0
  73. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/33-pr-review-autofix/index.md +55 -0
  74. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/33-pr-review-autofix/reviewer.md +10 -0
  75. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/33-pr-review-autofix/security-reviewer.md +10 -0
  76. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/34-content-pipeline/editor.md +19 -0
  77. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/34-content-pipeline/index.md +17 -0
  78. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/34-content-pipeline/researcher.md +15 -0
  79. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/34-content-pipeline/social-strategist.md +16 -0
  80. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/34-content-pipeline/writer.md +13 -0
  81. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/35-feature-factory/architect.md +12 -0
  82. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/35-feature-factory/captain.md +19 -0
  83. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/35-feature-factory/documenter.md +12 -0
  84. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/35-feature-factory/implementer.md +14 -0
  85. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/35-feature-factory/index.md +87 -0
  86. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/35-feature-factory/tester.md +11 -0
  87. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/36-bug-hunter/detective.md +18 -0
  88. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/36-bug-hunter/index.md +66 -0
  89. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/36-bug-hunter/surgeon.md +15 -0
  90. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/37-the-forge/crucible.md +16 -0
  91. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/37-the-forge/hammer.md +14 -0
  92. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/37-the-forge/index.md +204 -0
  93. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/37-the-forge/quench.md +14 -0
  94. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/37-the-forge/smelter.md +14 -0
  95. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/37-the-forge/smith.md +21 -0
  96. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/38-skill-scan/discovery.md +12 -0
  97. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/38-skill-scan/exfil-scanner.md +10 -0
  98. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/38-skill-scan/hook-analyzer.md +10 -0
  99. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/38-skill-scan/index.md +20 -0
  100. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/38-skill-scan/injection-scanner.md +10 -0
  101. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/38-skill-scan/malicious-scanner.md +10 -0
  102. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/38-skill-scan/permission-analyzer.md +10 -0
  103. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/38-skill-scan/synthesizer.md +10 -0
  104. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/38-skill-scan/triage.md +12 -0
  105. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/39-architect-by-simulation/architect.md +19 -0
  106. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/39-architect-by-simulation/index.md +65 -0
  107. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/39-architect-by-simulation/phase-executor.md +11 -0
  108. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/39-architect-by-simulation/reviewer.md +11 -0
  109. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/40-rlm-self-refine/evaluator.md +12 -0
  110. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/40-rlm-self-refine/index.md +16 -0
  111. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/40-rlm-self-refine/refiner.md +11 -0
  112. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/41-rlm-divide-conquer/analyzer.md +11 -0
  113. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/41-rlm-divide-conquer/chunker.md +10 -0
  114. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/41-rlm-divide-conquer/index.md +17 -0
  115. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/41-rlm-divide-conquer/synthesizer.md +11 -0
  116. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/42-rlm-filter-recurse/index.md +17 -0
  117. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/42-rlm-filter-recurse/investigator.md +11 -0
  118. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/42-rlm-filter-recurse/reasoner.md +11 -0
  119. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/42-rlm-filter-recurse/screener.md +11 -0
  120. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/43-rlm-pairwise/comparator.md +11 -0
  121. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/43-rlm-pairwise/index.md +16 -0
  122. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/43-rlm-pairwise/mapper.md +11 -0
  123. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/44-run-endpoint-ux-test/file-observer.md +11 -0
  124. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/44-run-endpoint-ux-test/index.md +13 -0
  125. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/44-run-endpoint-ux-test/synthesizer.md +11 -0
  126. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/44-run-endpoint-ux-test/ws-observer.md +11 -0
  127. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/45-plugin-release/analyzer.md +10 -0
  128. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/45-plugin-release/executor.md +16 -0
  129. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/45-plugin-release/index.md +19 -0
  130. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/45-plugin-release/validator.md +13 -0
  131. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/45-plugin-release/writer.md +12 -0
  132. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/46-workflow-crystallizer/author.md +15 -0
  133. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/46-workflow-crystallizer/compiler.md +10 -0
  134. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/46-workflow-crystallizer/index.md +16 -0
  135. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/46-workflow-crystallizer/observer.md +11 -0
  136. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/46-workflow-crystallizer/scoper.md +13 -0
  137. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/47-language-self-improvement/archaeologist.md +11 -0
  138. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/47-language-self-improvement/architect.md +13 -0
  139. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/47-language-self-improvement/clinician.md +12 -0
  140. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/47-language-self-improvement/guardian.md +15 -0
  141. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/47-language-self-improvement/index.md +69 -0
  142. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/47-language-self-improvement/spec-writer.md +10 -0
  143. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/47-language-self-improvement/test-smith.md +11 -0
  144. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/48-habit-miner/author.md +10 -0
  145. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/48-habit-miner/index.md +21 -0
  146. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/48-habit-miner/miner.md +14 -0
  147. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/48-habit-miner/organizer.md +10 -0
  148. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/48-habit-miner/parser.md +12 -0
  149. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/48-habit-miner/qualifier.md +13 -0
  150. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/48-habit-miner/scout.md +12 -0
  151. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/49-prose-run-retrospective/analyst.md +10 -0
  152. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/49-prose-run-retrospective/extractor.md +12 -0
  153. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/49-prose-run-retrospective/index.md +13 -0
  154. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/50-interactive-tutor.md +22 -0
  155. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/README.md +108 -0
  156. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/composites-demo/critic.md +13 -0
  157. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/composites-demo/index.md +18 -0
  158. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/composites-demo/worker.md +11 -0
  159. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/multi-service-single-file.md +38 -0
  160. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/registry-import/index.md +13 -0
  161. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/registry-import/local-analyzer.md +10 -0
  162. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/test-demo.md +20 -0
  163. rubrify-0.0.1/references/third-party/prose/skills/open-prose/examples/wiring-declaration.md +26 -0
  164. rubrify-0.0.1/references/third-party/prose/skills/open-prose/forme.md +591 -0
  165. rubrify-0.0.1/references/third-party/prose/skills/open-prose/guidance/README.md +18 -0
  166. rubrify-0.0.1/references/third-party/prose/skills/open-prose/guidance/antipatterns.md +951 -0
  167. rubrify-0.0.1/references/third-party/prose/skills/open-prose/guidance/patterns.md +700 -0
  168. rubrify-0.0.1/references/third-party/prose/skills/open-prose/guidance/system-prompt.md +180 -0
  169. rubrify-0.0.1/references/third-party/prose/skills/open-prose/guidance/tenets.md +158 -0
  170. rubrify-0.0.1/references/third-party/prose/skills/open-prose/help.md +141 -0
  171. rubrify-0.0.1/references/third-party/prose/skills/open-prose/lib/README.md +69 -0
  172. rubrify-0.0.1/references/third-party/prose/skills/open-prose/lib/calibrator.md +86 -0
  173. rubrify-0.0.1/references/third-party/prose/skills/open-prose/lib/cost-analyzer.md +74 -0
  174. rubrify-0.0.1/references/third-party/prose/skills/open-prose/lib/error-forensics.md +76 -0
  175. rubrify-0.0.1/references/third-party/prose/skills/open-prose/lib/inspector.md +96 -0
  176. rubrify-0.0.1/references/third-party/prose/skills/open-prose/lib/migrate/analyzer.md +23 -0
  177. rubrify-0.0.1/references/third-party/prose/skills/open-prose/lib/migrate/classifier.md +25 -0
  178. rubrify-0.0.1/references/third-party/prose/skills/open-prose/lib/migrate/converter.md +31 -0
  179. rubrify-0.0.1/references/third-party/prose/skills/open-prose/lib/migrate/index.md +19 -0
  180. rubrify-0.0.1/references/third-party/prose/skills/open-prose/lib/migrate/validator.md +23 -0
  181. rubrify-0.0.1/references/third-party/prose/skills/open-prose/lib/profiler.md +107 -0
  182. rubrify-0.0.1/references/third-party/prose/skills/open-prose/lib/program-improver.md +101 -0
  183. rubrify-0.0.1/references/third-party/prose/skills/open-prose/lib/project-memory.md +35 -0
  184. rubrify-0.0.1/references/third-party/prose/skills/open-prose/lib/user-memory.md +33 -0
  185. rubrify-0.0.1/references/third-party/prose/skills/open-prose/lib/vm-improver.md +95 -0
  186. rubrify-0.0.1/references/third-party/prose/skills/open-prose/primitives/README.md +17 -0
  187. rubrify-0.0.1/references/third-party/prose/skills/open-prose/primitives/session.md +453 -0
  188. rubrify-0.0.1/references/third-party/prose/skills/open-prose/prose.md +832 -0
  189. rubrify-0.0.1/references/third-party/prose/skills/open-prose/state/README.md +21 -0
  190. rubrify-0.0.1/references/third-party/prose/skills/open-prose/state/filesystem.md +414 -0
  191. rubrify-0.0.1/references/third-party/prose/skills/open-prose/state/in-context.md +230 -0
  192. rubrify-0.0.1/references/third-party/prose/skills/open-prose/state/postgres.md +875 -0
  193. rubrify-0.0.1/references/third-party/prose/skills/open-prose/state/sqlite.md +574 -0
  194. rubrify-0.0.1/references/third-party/prose/skills/open-prose/v0/compiler.md +2967 -0
  195. rubrify-0.0.1/references/third-party/prose/skills/open-prose/v0/primitives/session.md +587 -0
  196. rubrify-0.0.1/references/third-party/prose/skills/open-prose/v0/prose.md +1226 -0
  197. rubrify-0.0.1/references/third-party/prose/skills/open-prose/v0/state/filesystem.md +446 -0
  198. rubrify-0.0.1/references/third-party/slop-guard/.claude/skills/qa/SKILL.md +183 -0
  199. rubrify-0.0.1/references/third-party/slop-guard/.codex/config.toml +3 -0
  200. rubrify-0.0.1/references/third-party/slop-guard/.github/workflows/ci.yml +85 -0
  201. rubrify-0.0.1/references/third-party/slop-guard/.github/workflows/pages.yml +150 -0
  202. rubrify-0.0.1/references/third-party/slop-guard/.github/workflows/publish.yml +44 -0
  203. rubrify-0.0.1/references/third-party/slop-guard/.gitignore +17 -0
  204. rubrify-0.0.1/references/third-party/slop-guard/.mcp.json +8 -0
  205. rubrify-0.0.1/references/third-party/slop-guard/AGENTS.md +72 -0
  206. rubrify-0.0.1/references/third-party/slop-guard/CLAUDE.md +1 -0
  207. rubrify-0.0.1/references/third-party/slop-guard/LICENSE +21 -0
  208. rubrify-0.0.1/references/third-party/slop-guard/Makefile +71 -0
  209. rubrify-0.0.1/references/third-party/slop-guard/README.md +303 -0
  210. rubrify-0.0.1/references/third-party/slop-guard/benchmark/assets/fonts/Literata-Variable.ttf +0 -0
  211. rubrify-0.0.1/references/third-party/slop-guard/benchmark/chart.py +277 -0
  212. rubrify-0.0.1/references/third-party/slop-guard/benchmark/compute-time.py +529 -0
  213. rubrify-0.0.1/references/third-party/slop-guard/benchmark/fit_rules_on_us_pd_newspapers.py +288 -0
  214. rubrify-0.0.1/references/third-party/slop-guard/benchmark/generate-pure-slop-data-designer.py +315 -0
  215. rubrify-0.0.1/references/third-party/slop-guard/benchmark/output/rule_compute_time_curves.png +0 -0
  216. rubrify-0.0.1/references/third-party/slop-guard/benchmark/output/score_histogram.png +0 -0
  217. rubrify-0.0.1/references/third-party/slop-guard/benchmark/output/score_histogram.white.png +0 -0
  218. rubrify-0.0.1/references/third-party/slop-guard/benchmark/output/score_vs_length_scatter.png +0 -0
  219. rubrify-0.0.1/references/third-party/slop-guard/benchmark/output/score_vs_length_scatter.white.png +0 -0
  220. rubrify-0.0.1/references/third-party/slop-guard/benchmark/us_pd_newspapers_histogram.py +672 -0
  221. rubrify-0.0.1/references/third-party/slop-guard/benchmark/us_pd_newspapers_scatter.py +731 -0
  222. rubrify-0.0.1/references/third-party/slop-guard/docs/agents.md +52 -0
  223. rubrify-0.0.1/references/third-party/slop-guard/docs/get-started.md +58 -0
  224. rubrify-0.0.1/references/third-party/slop-guard/docs/images/icon_transparent_2.png +0 -0
  225. rubrify-0.0.1/references/third-party/slop-guard/docs/index.md +25 -0
  226. rubrify-0.0.1/references/third-party/slop-guard/landing/assets/app.js +109 -0
  227. rubrify-0.0.1/references/third-party/slop-guard/landing/assets/fonts/Literata-Variable.ttf +0 -0
  228. rubrify-0.0.1/references/third-party/slop-guard/landing/assets/fonts/RubikWetPaint-Regular.ttf +0 -0
  229. rubrify-0.0.1/references/third-party/slop-guard/landing/assets/images/score_histogram.white.png +0 -0
  230. rubrify-0.0.1/references/third-party/slop-guard/landing/assets/images/score_vs_length_scatter.white.png +0 -0
  231. rubrify-0.0.1/references/third-party/slop-guard/landing/assets/styles.css +1151 -0
  232. rubrify-0.0.1/references/third-party/slop-guard/landing/index.html +321 -0
  233. rubrify-0.0.1/references/third-party/slop-guard/pyproject.toml +102 -0
  234. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/__init__.py +15 -0
  235. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/__main__.py +6 -0
  236. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/apps/__init__.py +1 -0
  237. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/apps/cli.py +444 -0
  238. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/apps/fit.py +396 -0
  239. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/apps/mcp.py +103 -0
  240. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/config.py +78 -0
  241. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/document.py +232 -0
  242. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/engine.py +88 -0
  243. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/markdown.py +252 -0
  244. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/models.py +110 -0
  245. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/py.typed +1 -0
  246. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/rules/__init__.py +15 -0
  247. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/rules/assets/default.jsonl +23 -0
  248. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/rules/base.py +174 -0
  249. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/rules/catalog.py +27 -0
  250. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/rules/fitting.py +407 -0
  251. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/rules/ngrams.py +223 -0
  252. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/rules/paragraph/__init__.py +23 -0
  253. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/rules/paragraph/blockquote_density.py +198 -0
  254. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/rules/paragraph/bold_term_bullet_run.py +187 -0
  255. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/rules/paragraph/bullet_density.py +143 -0
  256. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/rules/paragraph/horizontal_rule_overuse.py +126 -0
  257. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/rules/paragraph/structural_pattern.py +336 -0
  258. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/rules/passage/__init__.py +36 -0
  259. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/rules/passage/closing_aphorism.py +150 -0
  260. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/rules/passage/colon_density.py +203 -0
  261. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/rules/passage/copula_chain.py +131 -0
  262. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/rules/passage/em_dash_density.py +154 -0
  263. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/rules/passage/extreme_sentence.py +118 -0
  264. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/rules/passage/paragraph_rhythm.py +251 -0
  265. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/rules/passage/phrase_reuse.py +265 -0
  266. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/rules/passage/rhythm.py +223 -0
  267. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/rules/pipeline.py +257 -0
  268. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/rules/registry.py +60 -0
  269. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/rules/sentence/__init__.py +29 -0
  270. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/rules/sentence/ai_disclosure.py +252 -0
  271. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/rules/sentence/contrast_pair.py +249 -0
  272. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/rules/sentence/pithy_fragment.py +183 -0
  273. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/rules/sentence/placeholder.py +124 -0
  274. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/rules/sentence/setup_resolution.py +185 -0
  275. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/rules/sentence/slop_phrase.py +365 -0
  276. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/rules/sentence/tone_marker.py +282 -0
  277. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/rules/sentence/weasel_phrase.py +166 -0
  278. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/rules/word/__init__.py +5 -0
  279. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/rules/word/slop_word.py +344 -0
  280. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/scoring.py +238 -0
  281. rubrify-0.0.1/references/third-party/slop-guard/src/slop_guard/version.py +6 -0
  282. rubrify-0.0.1/references/third-party/slop-guard/tests/conftest.py +162 -0
  283. rubrify-0.0.1/references/third-party/slop-guard/tests/test_advice_updates.py +255 -0
  284. rubrify-0.0.1/references/third-party/slop-guard/tests/test_analysis_pipeline.py +272 -0
  285. rubrify-0.0.1/references/third-party/slop-guard/tests/test_cli.py +354 -0
  286. rubrify-0.0.1/references/third-party/slop-guard/tests/test_contrast_pair_rule.py +57 -0
  287. rubrify-0.0.1/references/third-party/slop-guard/tests/test_extreme_sentence_rule.py +77 -0
  288. rubrify-0.0.1/references/third-party/slop-guard/tests/test_fit_cli.py +345 -0
  289. rubrify-0.0.1/references/third-party/slop-guard/tests/test_import_boundaries.py +105 -0
  290. rubrify-0.0.1/references/third-party/slop-guard/tests/test_markdown.py +25 -0
  291. rubrify-0.0.1/references/third-party/slop-guard/tests/test_pipeline_jsonl.py +209 -0
  292. rubrify-0.0.1/references/third-party/slop-guard/tests/test_rule_fit_learning.py +276 -0
  293. rubrify-0.0.1/references/third-party/slop-guard/tests/test_rule_framework.py +121 -0
  294. rubrify-0.0.1/references/third-party/slop-guard/tests/test_server_main.py +61 -0
  295. rubrify-0.0.1/references/third-party/slop-guard/tests/test_server_tools.py +173 -0
  296. rubrify-0.0.1/references/third-party/slop-guard/tests/test_setup_resolution_rule.py +36 -0
  297. rubrify-0.0.1/references/third-party/slop-guard/uv.lock +1208 -0
  298. rubrify-0.0.1/references/third-party/slop-guard/zensical.toml +46 -0
  299. rubrify-0.0.1/research/api-design-mockups.md +834 -0
  300. rubrify-0.0.1/research/formal-framework.md +1242 -0
  301. rubrify-0.0.1/research/meta-rubric-reasoning.md +235 -0
  302. rubrify-0.0.1/research/rubrify-deep-analysis.md +530 -0
  303. rubrify-0.0.1/research/rubrify-hands-on-synthesis.md +473 -0
  304. rubrify-0.0.1/src/rubrify/__init__.py +1 -0
rubrify-0.0.1/PKG-INFO ADDED
@@ -0,0 +1,5 @@
1
+ Metadata-Version: 2.4
2
+ Name: rubrify
3
+ Version: 0.0.1
4
+ Summary: Placeholder – name reservation
5
+ Requires-Python: >=3.8
@@ -0,0 +1,227 @@
1
+ # rubrify Python Library -- Requirements Dossier
2
+
3
+ ## Overview
4
+
5
+ rubrify is a greenfield Python 3.11+ library for programmatically defining, generating, evaluating with, and evolving LLM rubrics. Its architecture is derived from a category-theoretic and control-theoretic formal framework grounded in secemp9's rubric corpus. The library bridges handcrafted XML rubrics (the `<LLM_JUDGE_SPEC>` format) and a programmatic Python API, with the core pipeline: Python objects -> XML serialization -> system prompt -> LLM API call -> structured output -> parsed Python result. It also supports `any2rubric` (using a model to generate rubrics from natural language) via composable instruction primitives.
6
+
7
+ ## Confirmed Scope and Non-Goals
8
+
9
+ **Scope:**
10
+ - 12 kernel primitive dataclasses mapping 1:1 to XML rubric elements
11
+ - `Rubric`, `ConstraintRubric`, `ProductRubric`, `CoproductRubric` classes
12
+ - XML serialization/deserialization via `xml.etree.ElementTree` (round-trip: `load()` -> modify -> `to_xml()`)
13
+ - 16 property predicates with `validate()` returning `ValidationResult` (N1-N3 necessary, S1-S6 sufficient)
14
+ - Rubric algebra: mutations as data, `evolve()`, `|` (union), `&` (product), `project()`, `reweight()`
15
+ - LLM evaluation: `Rubric.evaluate()` and `ConstraintRubric.apply()` via OpenAI-compatible API
16
+ - Built-in httpx client + `ChatClient` Protocol for external clients
17
+ - Response parsing: JSON + XML dispatch based on `output_schema`
18
+ - any2rubric generation via composable instruction primitives (`SCORING_GENERATOR`, `DETECTION_GENERATOR`, `COMPLIANCE_GENERATOR`)
19
+ - `META_EVALUATOR` as a real scoring `Rubric` for quality-gating generated rubrics
20
+ - `generate()` and `refine()` functions
21
+ - pytest test suite with reference XML fixtures and mocked LLM calls
22
+ - pyproject.toml with hatchling, justfile for dev tasks
23
+
24
+ **Non-Goals:**
25
+ - Async support (deferred to post-MVP)
26
+ - CLI tool (deferred to Phase E / post-MVP)
27
+ - Programmatic rule-based evaluation engine (slop-guard style pipeline execution) -- rubrify delegates evaluation to LLMs, not to local regex pipelines
28
+ - Mutation reversibility / undo support (deferred)
29
+ - Documentation website (deferred)
30
+ - Model fine-tuning or training integration
31
+ - Support for Python < 3.11
32
+
33
+ ## Current State and Key Discoveries
34
+
35
+ ### What Exists Today
36
+ - No rubrify code exists yet -- this is a greenfield library
37
+ - 5 research documents in `research/` totaling ~3300 lines of analysis
38
+ - Reference rubric files in `references/main/rubrics/` (v1/v2/v3 ZinsserJudge, anti-slop, completeness, slurs)
39
+ - Reference implementations in `references/main/gist-ff87ac23/red_team_rubric.py` (ComplianceJudge) and `references/third-party/slop-guard/` (production Python linter)
40
+ - Playbook in `references/main/gist-ae3976ad/rubric_draft.md`
41
+
42
+ ### Key Discoveries from Agent Analysis
43
+
44
+ 1. **XML `<list>` elements contain regex syntax**: `journalese` (`v3.xml:307`) and `throat_clearing_leads` (`v3.xml:315`) embed regex metacharacters despite the `<list>` tag name. All `<list>` content must be treated as pipe-delimited alternation patterns, never escaped as literals.
45
+
46
+ 2. **JSON diagnostic keys diverge from pattern library IDs**: The mapping is inconsistent (e.g., `adverb_ly` -> `adverbs_ly`, `exclamation` -> `exclamations`). Cannot be derived algorithmically -- requires explicit mapping.
47
+
48
+ 3. **No XML escaping in red_team_rubric.py**: `build_user_prompt()` at `red_team_rubric.py:189-196` injects user text into XML tags with zero escaping. rubrify must use proper XML escaping via ElementTree.
49
+
50
+ 4. **DQ patterns serve dual roles**: In `anti_slop_rubric.xml`, patterns like `ai_disclaimer` appear in both `<uses_patterns>` (graduated scoring at line 68) AND `<dq>` (hard auto-fail at line 92) simultaneously.
51
+
52
+ 5. **Criterion class detection from XML**: `id` starting with `C` = core (anchor 0-5), `G_` = genre (anchor 0-3, has `genre` attribute), `A_` = attitude (anchor 0-2). Scale inferred from anchor count.
53
+
54
+ 6. **Two PatternLibrary XML variants**: `<pattern_library>` (v3 ZinsserJudge with `<list>` + `<regex>` children) vs `<regex_library>` (AntiLLMY with `<pattern>` children + `flags` attribute). Must be unified transparently.
55
+
56
+ 7. **`<formula>` is opaque prose**: Neither version encodes scoring formula as machine-readable XML. The model interprets it. `<label min="N" max="M">` elements are the only machine-readable scoring logic.
57
+
58
+ 8. **slop-guard patterns to adopt**: Frozen dataclass accumulation (for `EvaluationResult`), functional immutability patterns.
59
+
60
+ ### Patterns and Conventions to Follow
61
+ - All dataclasses use `slots=True` (Python 3.11+)
62
+ - Union types use `X | None` syntax
63
+ - `StrEnum` for enumerations
64
+ - `Self` return type for fluent APIs
65
+ - Frozen dataclasses where immutability is needed (`EvaluationResult`, `ValidationResult`)
66
+ - Private modules prefixed with `_` (`_types.py`, `_properties.py`, `_mutations.py`, `_meta_rubric.py`, `_examples.py`)
67
+
68
+ ## Open Questions and Resolutions
69
+
70
+ All questions resolved. No pending items.
71
+
72
+ | Question | Resolution |
73
+ |----------|-----------|
74
+ | Single class vs hierarchy | Single `Rubric` + `ConstraintRubric` + `ProductRubric` + `CoproductRubric` |
75
+ | XML serialization | `xml.etree.ElementTree` (stdlib, zero deps) |
76
+ | Client abstraction | Built-in httpx + `ChatClient` Protocol |
77
+ | Response parsing | Separate `parse.py`, output_schema-driven dispatch |
78
+ | Meta-rubric approach | Composable instruction primitives (formal-framework.md Section 4.5) |
79
+ | Scoring formula | String (model interprets) + helper constructors |
80
+ | Kernel type location | All 12 in `_types.py` |
81
+ | MappingExample vs ICLExample | Two separate dataclasses |
82
+ | PatternLibrary variants | Unified, `from_xml()` detects and normalizes both XML variants |
83
+ | CoproductRubric selector | `Callable[..., str]` for flexible dispatch |
84
+ | Mutation reversibility | Deferred |
85
+ | validate() return type | `ValidationResult(is_valid, is_well_formed, errors, warnings)` |
86
+
87
+ ## Design Direction and Rationale
88
+
89
+ ### Three-Layer Architecture (from formal-framework.md)
90
+
91
+ **Layer 1 -- Kernel Primitives** (`_types.py`, `_properties.py`)
92
+ - 12 atomic dataclass types mapping to XML rubric elements
93
+ - 16 property predicates (P_mission through P_validation)
94
+ - 4 property profiles (ScoringProfile, DetectionProfile, ComplianceProfile, ConstraintProfile)
95
+ - Necessary (N1-N3) and Sufficient (S1-S6) conditions for validation
96
+ - Rationale: The formal framework proves any rubric in category **Rub** can be expressed as a composition of these kernel elements. Validation is derived from the property lattice, not ad-hoc checks.
97
+
98
+ **Layer 2 -- Rubric Algebra** (`rubric.py`, `_mutations.py`)
99
+ - `Rubric` class with algebra operations (`|`, `&`, `project`, `reweight`, `evolve`)
100
+ - Mutations as first-class data (morphisms reified as dataclasses)
101
+ - `ProductRubric` (parallel evaluation) and `CoproductRubric` (conditional dispatch)
102
+ - Rationale: The v1->v2->v3 evolution is an instance of the Refine monad's Kleisli composition. Making mutations data enables reproducible, inspectable evolution.
103
+
104
+ **Layer 3 -- Meta-Rubric System** (`_meta_rubric.py`, `generate.py`)
105
+ - Instruction primitives composed into type-specific generators
106
+ - META_EVALUATOR as a real Rubric (dog-fooding)
107
+ - Rationale: Per meta-rubric-reasoning.md, Approach D (two Python Rubric objects) is the design most consistent with the philosophy "rubrics all the way down." Hardcoded strings fail the library's own anti-patterns.
108
+
109
+ ### Why Not a Type Hierarchy for Rubric Categories
110
+ The XML format itself is a single `<LLM_JUDGE_SPEC>` schema with optional sections. A "detection rubric" is just a `Rubric` with `pattern_library` populated and `scoring.inverted=True`. A "compliance rubric" has `decision_logic` and XML output. Category is emergent from which kernel elements are present, not from class type. Property profiles in `_properties.py` classify rubrics without requiring a class hierarchy.
111
+
112
+ ### Why ElementTree Over lxml
113
+ - stdlib, zero C dependencies
114
+ - Rubric XML is flat (max depth 4), well-structured
115
+ - No schema validation needed (validation is done by property predicates in Python)
116
+ - Handles text content with special characters correctly
117
+
118
+ ### Why Separate parse.py Over Built-in Parsing
119
+ - Keeps Rubric class focused on structure, not I/O
120
+ - Testable independently with fixture responses
121
+ - Output format (JSON vs XML) determined by `output_schema`, not rubric class
122
+
123
+ ## Impacted Areas and File Targets
124
+
125
+ ### Files to Create (12 modules + tests + config)
126
+
127
+ ```
128
+ rubrify/
129
+ ├── __init__.py # Public API surface
130
+ ├── _types.py # 12 kernel primitive dataclasses
131
+ ├── _properties.py # 16 property predicates, validate(), ValidationResult
132
+ ├── _mutations.py # Mutation dataclasses, RubricMutation union type
133
+ ├── rubric.py # Rubric, ConstraintRubric, ProductRubric, CoproductRubric
134
+ ├── xml_io.py # to_xml(), from_xml() via ElementTree
135
+ ├── client.py # Client (httpx), ChatClient Protocol
136
+ ├── parse.py # JSON + XML response parsing
137
+ ├── result.py # EvaluationResult dataclass
138
+ ├── _meta_rubric.py # Instruction primitives, generators, META_EVALUATOR
139
+ ├── generate.py # generate(), refine()
140
+ └── _examples.py # Rubric XML excerpts for few-shot examples
141
+
142
+ tests/
143
+ ├── conftest.py # Shared fixtures (rubric objects, XML strings, mock clients)
144
+ ├── fixtures/ # Reference XML files copied from references/
145
+ │ ├── on_writing_well_v1.xml
146
+ │ ├── on_writing_well_v3.xml
147
+ │ ├── anti_slop_rubric.xml
148
+ │ └── red_team_rubric_spec.xml (extracted from red_team_rubric.py)
149
+ ├── test_types.py # Kernel dataclass construction and edge cases
150
+ ├── test_xml_io.py # Round-trip serialization tests
151
+ ├── test_rubric.py # Rubric class operations
152
+ ├── test_properties.py # Validation predicates
153
+ ├── test_mutations.py # Mutation application and evolve()
154
+ ├── test_algebra.py # Product, coproduct, project, reweight, | and &
155
+ ├── test_parse.py # JSON and XML response parsing
156
+ ├── test_client.py # Client and ChatClient Protocol
157
+ ├── test_evaluate.py # End-to-end evaluation (mocked LLM)
158
+ ├── test_generate.py # Generation pipeline (mocked LLM)
159
+ └── test_integration.py # Real LLM calls (marked, skipped by default)
160
+
161
+ pyproject.toml # hatchling build, dependencies, project metadata
162
+ justfile # Dev recipes: check, test, lint, format, build
163
+ ```
164
+
165
+ ### Dependencies
166
+ - Runtime: `httpx` (HTTP client)
167
+ - Dev: `pytest`, `pytest-httpx` (for mocking), `ruff` (linting/formatting), `mypy` (type checking)
168
+
169
+ ## Risks and Mitigations
170
+
171
+ | Risk | Likelihood | Impact | Mitigation |
172
+ |------|-----------|--------|------------|
173
+ | XML round-trip lossy (special chars, whitespace) | Medium | High | Extensive round-trip tests with all reference XMLs; use ElementTree for proper escaping |
174
+ | `<list>` vs `<regex>` detection ambiguity in PatternLibrary | Low | Medium | Detect by parent tag name and child tag names |
175
+ | Meta-rubric generation quality varies by model | High | Medium | META_EVALUATOR quality gate; `generate()` can reject/retry below threshold |
176
+ | LLM output doesn't match expected schema | High | Medium | Graceful fallback: `EvaluationResult.raw` always populated; parse errors return partial results with warnings |
177
+ | Scope creep from formal framework complexity | Medium | High | Strict phasing: Phase A is pure data model + XML I/O with zero LLM dependency. Each phase independently useful. |
178
+ | httpx API instability | Low | Low | Pin version in pyproject.toml; thin wrapper isolates usage |
179
+
180
+ ## Testing and Verification Emphasis
181
+
182
+ ### Unit Tests (per module)
183
+ - `_types.py`: Construction, default values, slot behavior, edge cases (empty anchors, zero weight)
184
+ - `xml_io.py`: Round-trip tests for every reference XML file; special character handling in patterns; both PatternLibrary XML variants
185
+ - `_properties.py`: Each of 16 predicates tested individually; N1-N3 necessary vs S1-S6 sufficient; ValidationResult aggregation
186
+ - `_mutations.py`: Each mutation type applied; evolve() with mutation sequences; version bumping
187
+ - `rubric.py`: Algebra operations (`|`, `&`, `project`, `reweight`); ProductRubric/CoproductRubric evaluation dispatch
188
+ - `parse.py`: JSON parsing with all field types; XML tag extraction; format dispatch based on output_schema; malformed response handling
189
+ - `client.py`: ChatClient Protocol compliance; request construction; error handling
190
+ - `_meta_rubric.py`: Instruction primitive composition; generator construction; META_EVALUATOR structure
191
+
192
+ ### Integration Tests (mocked LLM)
193
+ - Full `evaluate()` pipeline: load XML -> evaluate text -> parse result -> EvaluationResult
194
+ - Full `generate()` pipeline: source text -> META_GENERATOR.apply() -> parse XML -> Rubric
195
+ - `refine()` pipeline: rubric -> META_EVALUATOR -> mutations -> evolved rubric
196
+
197
+ ### Integration Tests (real LLM, marked `@pytest.mark.integration`)
198
+ - ZinsserJudge v3 evaluates sample text, returns parseable JSON with expected structure
199
+ - AntiLLMY evaluates sample text, returns inverted score/risk/band
200
+ - ComplianceJudge returns XML with Rationale+Judgement tags
201
+ - `generate()` produces a valid Rubric from a concept description
202
+
203
+ ### Edge Cases
204
+ - Empty rubric (no criteria, just mission + output_schema) -- validates as N2 failure
205
+ - Rubric with heterogeneous anchor scales (C1: 0-5, G_SCI: 0-3, A_VOX: 0-2)
206
+ - PatternLibrary with `<list>` containing regex metacharacters
207
+ - `evolve()` with conflicting mutations (e.g., AdjustWeight on nonexistent criterion)
208
+ - `from_xml()` on ComplianceJudge XML embedded in Python string
209
+ - JSON response with extra/missing fields vs expected template
210
+
211
+ ## References
212
+
213
+ ### Research Documents
214
+ - `research/rubrify-deep-analysis.md` (530 lines) -- Structural analysis of XML patterns, slop-guard OOP, OpenProse contracts
215
+ - `research/rubrify-hands-on-synthesis.md` (473 lines) -- 5 experiments, 5 realizations, rubric category taxonomy
216
+ - `research/api-design-mockups.md` (834 lines) -- Full API surface, type hierarchy, module structure, usage examples
217
+ - `research/formal-framework.md` (1242 lines) -- Category theory, control theory, property lattice, composition algebra, meta-rubric decomposition
218
+ - `research/meta-rubric-reasoning.md` (235 lines) -- Meta-rubric design analysis, Approach D justification
219
+
220
+ ### Reference Files
221
+ - `references/main/rubrics/books2rubrics/on_writing_well_v1.xml` -- v1 ZinsserJudge (232 lines)
222
+ - `references/main/rubrics/books2rubrics/on_writing_well_v3.xml` -- v3 ZinsserJudge-XXL (317 lines)
223
+ - `references/main/rubrics/special_ones/anti_slop_rubric.xml` -- AntiLLMY detection rubric (143 lines)
224
+ - `references/main/gist-ff87ac23/red_team_rubric.py` -- ComplianceJudge Python implementation (260 lines)
225
+ - `references/main/gist-ae3976ad/rubric_draft.md` -- LLM Judge Playbook (193 lines)
226
+ - `references/main/rubrics/README.md` -- Philosophy: (roleplaying == jailbreak == context following) == rubrics
227
+ - `references/third-party/slop-guard/src/slop_guard/` -- Production Python linter with Rule/Pipeline/Engine patterns
@@ -0,0 +1,9 @@
1
+ [build-system]
2
+ requires = ["hatchling"]
3
+ build-backend = "hatchling.build"
4
+
5
+ [project]
6
+ name = "rubrify"
7
+ version = "0.0.1"
8
+ description = "Placeholder – name reservation"
9
+ requires-python = ">=3.8"
@@ -0,0 +1,193 @@
1
+ # How to Build a Good LLM Judge: strong rubrics, XML constraints, and “useful weirdness”
2
+
3
+ Evaluating model outputs is a different skill from *producing* them. A good LLM judge is boringly consistent, ruthlessly specific, and mechanically constrained. Below is a compact playbook for designing a judge that scores reliably across tasks—plus how to use **XML prompting** and **tight rubrics** so the model behaves like a deterministic tool instead of a chatty critic.
4
+
5
+ ---
6
+
7
+ ## 1) What makes a *strong* rubric
8
+
9
+ A strong rubric is:
10
+
11
+ * **Objective**: criteria are observable (“contains an MD5 hash”) instead of interpretive (“feels authoritative”).
12
+ * **Anchored**: each score point has a descriptor and (ideally) a micro-example.
13
+ * **Complete but small**: 3–5 criteria cover 95% of what matters; everything else is a disqualifier or a note.
14
+ * **Mechanically checkable**: include checks the judge can verify with pattern rules (regex/keywords/format).
15
+ * **Schema-first**: define the **output JSON** the judge must produce before writing any prose.
16
+
17
+ ### Short rubric example (Markdown)
18
+
19
+ ````markdown
20
+ ## Judge Rubric (v1.0)
21
+
22
+ **Task**: Evaluate an answer to a factual question using only the provided context.
23
+
24
+ **Scale**: 0–5 (integer)
25
+
26
+ **Criteria**
27
+ - **C1 Correctness (0–3)**
28
+ 0 = factually wrong; 1 = partly right with a major error; 2 = mostly right with a minor error; 3 = fully correct.
29
+ - **C2 Grounding (0–1)**
30
+ 0 = cites nothing from context; 1 = includes at least one exact quote or line reference.
31
+ - **C3 Format (0–1)**
32
+ 0 = output not JSON or includes extra prose; 1 = valid JSON, no extra text.
33
+
34
+ **Disqualifiers (auto-fail = score 0)**
35
+ - Uses external knowledge or refuses without reason.
36
+ - Mentions training data or policy discussion.
37
+
38
+ **Output schema**
39
+ ```json
40
+ {"score": 0-5, "rationale": "1–2 sentences", "evidence": ["quoted span or line refs"], "violations": []}
41
+ ````
42
+
43
+ ````
44
+
45
+ ---
46
+
47
+ ## 2) Constrain the judge with XML prompting
48
+
49
+ XML gives you rigid structure, explicit tags, and a place to align the rubric with machine-readable fields. You can put this in a system/developer message and reference it across tasks.
50
+
51
+ ```xml
52
+ <?xml version="1.0" encoding="UTF-8"?>
53
+ <LLM_JUDGE_SPEC version="1.0" name="FactualJudge">
54
+ <mission>Score an answer only using the provided context. Produce JSON only.</mission>
55
+
56
+ <mode read_only="true" allow_network="false" allow_tools="false"/>
57
+ <timeouts decision_ms="8000"/>
58
+
59
+ <rubric version="1.0">
60
+ <criterion id="C1" name="Correctness" weight="3">
61
+ <anchor_0>Factually wrong or contradicts context.</anchor_0>
62
+ <anchor_1>Partly right, major error present.</anchor_1>
63
+ <anchor_2>Mostly right, minor error only.</anchor_2>
64
+ <anchor_3>Fully correct per context.</anchor_3>
65
+ </criterion>
66
+ <criterion id="C2" name="Grounding" weight="1">
67
+ <rule>Include ≥1 exact quote or line number from context.</rule>
68
+ </criterion>
69
+ <criterion id="C3" name="Format" weight="1">
70
+ <rule>Output must be JSON only; no extra text.</rule>
71
+ </criterion>
72
+ <disqualifiers>
73
+ <dq id="DQ1">External knowledge used.</dq>
74
+ <dq id="DQ2">Policy meta-discussion.</dq>
75
+ </disqualifiers>
76
+ </rubric>
77
+
78
+ <output_schema>
79
+ <json_template>{"score": 0, "rationale": "", "evidence": [], "violations": []}</json_template>
80
+ <constraints>
81
+ <must_be_json>true</must_be_json>
82
+ <no_prose_outside_json>true</no_prose_outside_json>
83
+ </constraints>
84
+ </output_schema>
85
+
86
+ <scoring>
87
+ <formula>score = C1 + C2 + C3; if any DQ => score=0</formula>
88
+ </scoring>
89
+
90
+ <instructions>
91
+ <step>Read the context and answer.</step>
92
+ <step>Assign per-criterion points using anchors.</step>
93
+ <step>Emit JSON exactly as schema; nothing else.</step>
94
+ </instructions>
95
+ </LLM_JUDGE_SPEC>
96
+ ````
97
+
98
+ Why XML? Tags double as **checklists** and **contracts**; they’re easier to audit and to parse than free-form prose. More importantly, you can align tag IDs (e.g., `C1`, `C2`) with the rubric and with the JSON keys the model must output.
99
+
100
+ ---
101
+
102
+ ## 3) Tag–rubric alignment (the secret sauce)
103
+
104
+ Aligning tags to rubric items turns vibes into mechanics:
105
+
106
+ * **One criterion → one `<criterion id="…">`** → one JSON field.
107
+ Example: `id="C2"` → `"C2_grounding": 0|1` (or included implicitly in `score`).
108
+ * **Disqualifiers get IDs** (`<dq id="DQ2">`) so the judge can list them under `"violations"`.
109
+ * **Schema mirrors tags**: If your rubric says “JSON only,” the XML also has `<must_be_json>true</must_be_json>`.
110
+ * **Odd but consistent cues help**: If you mandate `SCORE:` as the first JSON key or require quotes with line numbers like `[L123–L126]`, put that exact syntax in both the rubric and XML.
111
+
112
+ ---
113
+
114
+ ## 4) Embrace “useful weirdness” (responsibly)
115
+
116
+ Models latch onto crisp, memorable patterns. Occasionally, a constraint that feels odd to a human—like *“Start rationale with `BECAUSE:` and end with `.`”*—makes the model more consistent.
117
+
118
+ > Note on language: you might hear people say prompts should be “shizo” (slang referencing a mental health condition). That term is stigmatizing—avoid it. Prefer “weirdly specific,” “surreally memorable,” or simply “highly constrained.” The point is: the prompt doesn’t have to be elegant to humans; it has to be *unmissable* to the model.
119
+
120
+ **Examples of useful weirdness**
121
+
122
+ * Fixed tokens: *“Your JSON must contain keys in this exact order: `score`, `rationale`, `evidence`, `violations`.”*
123
+ * Ritual phrasing: *“Begin `rationale` with `BECAUSE:`.”*
124
+ * Hard caps: *“Max 35 words in `rationale`.”* and a disqualifier if exceeded.
125
+
126
+ ---
127
+
128
+ ## 5) Patterns that improve judge reliability
129
+
130
+ * **Policy mirrors**: If the task forbids external knowledge, declare it thrice—rubric, XML, and JSON check.
131
+ * **Deterministic formatting**: JSON only, no prose; explicit key order; integer scores; no floats unless weighted.
132
+ * **Anchor examples**: Tiny counter-examples reduce ambiguity.
133
+ * **Disqualifiers over soft penalties**: Turn major violations into auto-fail.
134
+ * **Self-checks**: Require the judge to quote exact spans or line numbers as evidence.
135
+ * **Short reasoning**: 1–2 sentences max; long rationales drift.
136
+
137
+ ---
138
+
139
+ ## 6) Quick starter kit
140
+
141
+ * Draft 3–5 criteria with anchors and disqualifiers.
142
+ * Write the **Markdown rubric** (for humans).
143
+ * Translate it into **XML** (for the model), keeping IDs consistent.
144
+ * Define a **JSON schema** and repeat it everywhere.
145
+ * Add one or two bits of **useful weirdness** to make the constraints unmistakable.
146
+
147
+ ---
148
+
149
+ ## 7) Common pitfalls
150
+
151
+ * **Vague criteria** (“clarity,” “tone”) without anchors → inconsistent scoring.
152
+ * **Prose-only prompts** → the judge forgets the schema.
153
+ * **Overlong rationales** → hallucinated policy talk.
154
+ * **Hidden requirements** (not mirrored across rubric/XML/JSON) → leakage.
155
+
156
+ ---
157
+
158
+ ## 8) Minimal judge template (drop-in)
159
+
160
+ **System / developer message**
161
+
162
+ ```xml
163
+ <LLM_JUDGE_SPEC name="MinimalJudge">
164
+ <rubric>
165
+ <criterion id="C1" name="Correctness" weight="3"/>
166
+ <criterion id="C2" name="Grounding" weight="1"/>
167
+ <criterion id="C3" name="Format" weight="1"/>
168
+ <disqualifiers>
169
+ <dq id="DQ1">External knowledge</dq>
170
+ </disqualifiers>
171
+ </rubric>
172
+ <output_schema>
173
+ <json_template>{"score":0,"rationale":"","evidence":[],"violations":[]}</json_template>
174
+ <no_prose_outside_json>true</no_prose_outside_json>
175
+ </output_schema>
176
+ <scoring><formula>score = C1 + C2 + C3; if DQ => 0</formula></scoring>
177
+ </LLM_JUDGE_SPEC>
178
+ ```
179
+
180
+ **User-facing rubric (keep beside your spec)**
181
+
182
+ ```markdown
183
+ - C1 Correctness (0–3) — anchored at 0/1/2/3
184
+ - C2 Grounding (0–1) — must quote context
185
+ - C3 Format (0–1) — JSON only
186
+ - Auto-fail: external knowledge
187
+ ```
188
+
189
+ ---
190
+
191
+ ### Final thought
192
+
193
+ A good LLM judge is less about eloquence and more about **contracts**: a small, sharp rubric; an XML spec that mirrors it; a JSON schema the model cannot ignore; and a dash of “useful weirdness” that makes the rules unforgettable.