deepwork 0.5.1__py3-none-any.whl → 0.7.0a1__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (66)
  1. deepwork/__init__.py +1 -1
  2. deepwork/cli/hook.py +3 -4
  3. deepwork/cli/install.py +70 -117
  4. deepwork/cli/main.py +2 -2
  5. deepwork/cli/serve.py +133 -0
  6. deepwork/cli/sync.py +93 -58
  7. deepwork/core/adapters.py +91 -102
  8. deepwork/core/generator.py +19 -386
  9. deepwork/core/hooks_syncer.py +1 -1
  10. deepwork/core/parser.py +270 -1
  11. deepwork/hooks/README.md +0 -44
  12. deepwork/hooks/__init__.py +3 -6
  13. deepwork/hooks/check_version.sh +54 -21
  14. deepwork/mcp/__init__.py +23 -0
  15. deepwork/mcp/quality_gate.py +347 -0
  16. deepwork/mcp/schemas.py +263 -0
  17. deepwork/mcp/server.py +253 -0
  18. deepwork/mcp/state.py +422 -0
  19. deepwork/mcp/tools.py +394 -0
  20. deepwork/schemas/job.schema.json +347 -0
  21. deepwork/schemas/job_schema.py +27 -239
  22. deepwork/standard_jobs/deepwork_jobs/doc_specs/job_spec.md +9 -15
  23. deepwork/standard_jobs/deepwork_jobs/job.yml +146 -46
  24. deepwork/standard_jobs/deepwork_jobs/steps/define.md +100 -33
  25. deepwork/standard_jobs/deepwork_jobs/steps/errata.md +154 -0
  26. deepwork/standard_jobs/deepwork_jobs/steps/fix_jobs.md +207 -0
  27. deepwork/standard_jobs/deepwork_jobs/steps/fix_settings.md +177 -0
  28. deepwork/standard_jobs/deepwork_jobs/steps/implement.md +22 -138
  29. deepwork/standard_jobs/deepwork_jobs/steps/iterate.md +221 -0
  30. deepwork/standard_jobs/deepwork_jobs/steps/learn.md +2 -26
  31. deepwork/standard_jobs/deepwork_jobs/steps/test.md +154 -0
  32. deepwork/standard_jobs/deepwork_jobs/templates/job.yml.template +2 -0
  33. deepwork/templates/claude/settings.json +16 -0
  34. deepwork/templates/claude/skill-deepwork.md.jinja +37 -0
  35. deepwork/templates/gemini/skill-deepwork.md.jinja +37 -0
  36. deepwork-0.7.0a1.dist-info/METADATA +317 -0
  37. deepwork-0.7.0a1.dist-info/RECORD +64 -0
  38. deepwork/cli/rules.py +0 -32
  39. deepwork/core/command_executor.py +0 -190
  40. deepwork/core/pattern_matcher.py +0 -271
  41. deepwork/core/rules_parser.py +0 -559
  42. deepwork/core/rules_queue.py +0 -321
  43. deepwork/hooks/rules_check.py +0 -759
  44. deepwork/schemas/rules_schema.py +0 -135
  45. deepwork/standard_jobs/deepwork_jobs/steps/review_job_spec.md +0 -208
  46. deepwork/standard_jobs/deepwork_jobs/templates/doc_spec.md.example +0 -86
  47. deepwork/standard_jobs/deepwork_rules/hooks/capture_prompt_work_tree.sh +0 -38
  48. deepwork/standard_jobs/deepwork_rules/hooks/global_hooks.yml +0 -8
  49. deepwork/standard_jobs/deepwork_rules/hooks/user_prompt_submit.sh +0 -16
  50. deepwork/standard_jobs/deepwork_rules/job.yml +0 -49
  51. deepwork/standard_jobs/deepwork_rules/rules/.gitkeep +0 -13
  52. deepwork/standard_jobs/deepwork_rules/rules/api-documentation-sync.md.example +0 -10
  53. deepwork/standard_jobs/deepwork_rules/rules/readme-documentation.md.example +0 -10
  54. deepwork/standard_jobs/deepwork_rules/rules/security-review.md.example +0 -11
  55. deepwork/standard_jobs/deepwork_rules/rules/skill-md-validation.md +0 -46
  56. deepwork/standard_jobs/deepwork_rules/rules/source-test-pairing.md.example +0 -13
  57. deepwork/standard_jobs/deepwork_rules/steps/define.md +0 -249
  58. deepwork/templates/claude/skill-job-meta.md.jinja +0 -77
  59. deepwork/templates/claude/skill-job-step.md.jinja +0 -235
  60. deepwork/templates/gemini/skill-job-meta.toml.jinja +0 -76
  61. deepwork/templates/gemini/skill-job-step.toml.jinja +0 -162
  62. deepwork-0.5.1.dist-info/METADATA +0 -381
  63. deepwork-0.5.1.dist-info/RECORD +0 -72
  64. {deepwork-0.5.1.dist-info → deepwork-0.7.0a1.dist-info}/WHEEL +0 -0
  65. {deepwork-0.5.1.dist-info → deepwork-0.7.0a1.dist-info}/entry_points.txt +0 -0
  66. {deepwork-0.5.1.dist-info → deepwork-0.7.0a1.dist-info}/licenses/LICENSE.md +0 -0
@@ -0,0 +1,221 @@
+ # Iterate on Workflow Design
+
+ ## Objective
+
+ Review the test run conversation and improve the job definition based on what happened. This step closes the feedback loop by incorporating learnings from the test into the workflow itself, making future runs more efficient and producing better results.
+
+ ## Task
+
+ Analyze the conversation history from the test step, identify areas for improvement, and update the job definition and step instructions accordingly.
+
+ ### Step 1: Review the Conversation History
+
+ Carefully analyze the conversation from the test step, looking for:
+
+ 1. **Process Inefficiencies**
+    - Steps that took multiple attempts to complete
+    - Questions the agent had to ask that should have been in the instructions
+    - Unnecessary back-and-forth with the user
+    - Information that had to be repeated
+
+ 2. **Output Quality Issues**
+    - Issues identified during critique (from Step 3 of the test step)
+    - Corrections requested by the user
+    - Patterns in user feedback (what did they consistently want changed?)
+
+ 3. **Tool Usage Problems**
+    - Tools that didn't work as expected
+    - Missing tools that would have helped
+    - Inefficient tool sequences
+
+ 4. **Missing or Unclear Instructions**
+    - Ambiguities that led to wrong outputs
+    - Missing guidance that caused confusion
+    - Quality criteria that weren't clear enough
+
+ ### Step 2: Plan Improvements
+
+ For each issue identified, determine the appropriate fix:
+
+ | Issue Type | Solution Location |
+ |------------|-------------------|
+ | Process inefficiency | Update step instructions with clearer guidance |
+ | Output quality | Update quality criteria or add examples |
+ | Missing information | Add to step inputs or instructions |
+ | Tool problems | Suggest different tools in instructions |
+ | Unclear criteria | Rewrite quality criteria to be specific |
+
+ **Prioritize improvements** that will have the most impact on future runs. Focus on:
+ - Issues that caused multiple iterations
+ - Problems that affected the final output quality
+ - Confusion that could be eliminated with clearer instructions
+
+ ### Step 3: Update Step Instructions
+
+ For each step that needs improvement:
+
+ 1. **Read the current instruction file** at `.deepwork/jobs/[job_name]/steps/[step_id].md`
+
+ 2. **Make targeted improvements**:
+    - Add missing context or clarification
+    - Include examples of good output (use what worked in the test)
+    - Clarify ambiguous instructions
+    - Add tool recommendations if a different approach would be better
+    - Update quality criteria to match user expectations
+
+ 3. **Keep instructions concise**:
+    - Avoid redundancy
+    - Be direct and actionable
+    - Use bullet points where appropriate
+
+ ### Step 4: Update Quality Criteria
+
+ Review and update quality criteria in two places:
+
+ 1. **In step instruction files** - The "Quality Criteria" section should reflect what the user actually cared about during testing
+
+ 2. **In job.yml** - If steps have `quality_criteria` or `stop_hooks`, update them to:
+    - Remove criteria that weren't relevant
+    - Add criteria based on user feedback
+    - Make existing criteria more specific
+
+ **Example improvement:**
+ ```yaml
+ # Before
+ quality_criteria:
+   - "Report is formatted correctly"
+
+ # After
+ quality_criteria:
+   - "Report uses distinct colors for each data series in charts"
+   - "Tables have sufficient padding and font size for readability"
+   - "Executive summary is understandable by non-technical readers"
+ ```
+
+ ### Step 5: Consider Alternative Tools
+
+ If any tools didn't work well during the test:
+
+ 1. **Identify the problem** - What went wrong? (slow, wrong output, hard to use)
+
+ 2. **Research alternatives** - What other tools could accomplish the same goal?
+
+ 3. **Update instructions** - If a better tool exists, update the step instructions to recommend it
+
+ Examples:
+ - If web scraping was unreliable, suggest a specific browser automation approach
+ - If data processing was slow, suggest a different method or tool
+ - If file generation had issues, recommend a different library or format
+
+ ### Step 6: Update Job Version and Changelog
+
+ After making improvements:
+
+ 1. **Bump the version** in `job.yml`:
+    - Patch version (x.x.1) for minor instruction tweaks
+    - Minor version (x.1.0) for quality criteria changes or significant improvements
+
+ 2. **Add a changelog entry** describing what was improved:
+ ```yaml
+ changelog:
+   - version: "1.0.1"
+     changes: "Improved [step] instructions based on test run: added examples, clarified criteria, updated tool recommendations"
+ ```
+
+ ### Step 7: Provide Recap
+
+ Summarize the improvements made:
+
+ 1. **List the issues identified** from the test run
+
+ 2. **Describe the changes made** to address each issue
+
+ 3. **Explain the expected impact** on future runs
+
+ Example recap format:
+ ```
+ ## Iteration Summary
+
+ ### Issues Identified
+ 1. Chart colors were hard to distinguish
+ 2. Tables were too small to read
+ 3. Executive summary was too technical
+
+ ### Changes Made
+
+ **gather_data step:**
+ - Added instruction to use distinct color palettes for charts
+ - Updated quality criteria to require readable table formatting
+
+ **generate_report step:**
+ - Added example of good executive summary
+ - Updated quality criteria: "Summary understandable by non-technical stakeholders"
+
+ **job.yml:**
+ - Updated version to 1.0.1
+ - Added changelog entry
+
+ ### Expected Impact
+ Future runs should produce reports with better visual design and clearer summaries, reducing the need for post-generation corrections.
+ ```
+
+ ## Example Dialog
+
+ ```
+ Claude: Let me review our test run and improve the workflow based on what happened.
+
+ **Analyzing Test Conversation...**
+
+ I identified the following issues from the test run:
+
+ ### Issues Found
+
+ 1. **Chart readability (gather_data step)**: The instructions didn't specify how to handle color choices for visualizations. During the test, we had to fix colors that were too similar.
+
+ 2. **Missing month-over-month comparison (analyze_trends step)**: The user wanted comparisons to previous periods, but this wasn't in the instructions. We had to add it after the fact.
+
+ 3. **Technical executive summary (generate_report step)**: The summary was too technical for non-technical stakeholders. The quality criteria didn't specify the audience clearly.
+
+ 4. **Table formatting (generate_report step)**: Tables were hard to read. No guidance on sizing or padding.
+
+ ### Changes Made
+
+ **`.deepwork/jobs/monthly_report/steps/gather_data.md`:**
+ - Added: "Use distinct, high-contrast colors for chart data series. Avoid similar shades for different categories."
+ - Added to Quality Criteria: "Charts use visually distinct colors that are accessible to colorblind readers"
+
+ **`.deepwork/jobs/monthly_report/steps/analyze_trends.md`:**
+ - Added: "Always include comparison to the same period in the previous month/quarter"
+ - Added input: "previous_period_data" to formalize this requirement
+
+ **`.deepwork/jobs/monthly_report/steps/generate_report.md`:**
+ - Added example of good executive summary targeting non-technical readers
+ - Added: "Format tables with sufficient padding (at least 8px) and readable font size (minimum 11pt)"
+ - Updated Quality Criteria:
+   - "Executive summary is understandable by non-technical stakeholders"
+   - "Tables are readable with adequate padding and font size"
+
+ **`.deepwork/jobs/monthly_report/job.yml`:**
+ - Version bumped to 1.0.1
+ - Added changelog: "Improved visual design guidance and audience-appropriate writing based on test feedback"
+
+ Done! The workflow has been updated. Changes are immediately available through the MCP server.
+
+ ### Expected Impact
+
+ Future monthly reports should:
+ - Have better chart color choices from the start
+ - Include month-over-month comparisons automatically
+ - Have clearer executive summaries for leadership
+ - Have more readable tables
+
+ This should significantly reduce the number of corrections needed after the initial output.
+ ```
+
+ ## Important Guidelines
+
+ 1. **Be specific** - Don't just note "instructions were unclear" - explain exactly what was unclear and how you fixed it
+ 2. **Use test results as examples** - If something worked well in the test, add it as an example in the instructions
+ 3. **Don't over-engineer** - Only add improvements that address actual problems from the test
+ 4. **Maintain consistency** - Keep the same structure and style in instruction files
+ 5. **Test-driven improvement** - Every change should trace back to something that happened in the test
@@ -233,15 +233,6 @@ If instruction files were modified:
  changes: "Improved [step] instructions based on execution learnings: [brief description]"
  ```

- ### Step 7: Sync and Relay Instructions
-
- 1. **Run deepwork sync** (if instructions were modified)
- ```bash
- deepwork sync
- ```
-
- 2. **If skills were regenerated**, look at the "To use the new skills" section in the `deepwork sync` output and **relay these exact reload instructions to the user** (e.g., "Type 'exit' then run 'claude --resume'" for Claude Code)
-
  ## File Reference Patterns

  When adding entries to AGENTS.md, prefer these patterns:
@@ -260,23 +251,10 @@ When adding entries to AGENTS.md, prefer these patterns:
  - Configuration requires these fields: name, version, ...
  ```

- ## Quality Criteria
-
- - Conversation has been analyzed for job executions
- - Points of confusion and inefficiency are identified
- - Learnings are correctly classified (generalizable vs bespoke)
- - Job instructions updated for generalizable improvements
- - Instructions are concise - no redundancy or unnecessary verbosity
- - Shared/lengthy content extracted into referenced files where appropriate
- - AGENTS.md created/updated with bespoke learnings
- - File references used instead of duplicating content
- - AGENTS.md is in the correct folder (the deepest common folder for the topic)
- - When all criteria are met, include `<promise>✓ Quality Criteria Met</promise>`
-
  ## Example Dialog

  ```
- User: /deepwork_jobs.learn
+ User: [Invokes the learn workflow via MCP]

  Claude: I'll analyze this conversation for learnings from DeepWork job executions.

@@ -326,11 +304,9 @@ I found the following job executions:

  3. Updated job.yml version to 1.0.1 with changelog entry

- 4. Ran `deepwork sync`
-
  **Summary**

- Updated job instructions and created AGENTS.md with bespoke learnings. To get the updated skills, type 'exit' then run 'claude --resume'.
+ Updated job instructions and created AGENTS.md with bespoke learnings.
  ```

  ## Handling Edge Cases
@@ -0,0 +1,154 @@
+ # Test the New Workflow
+
+ ## Objective
+
+ Run the newly created workflow on a real use case chosen by the user, critique the output, and iterate until the user is satisfied with the results. This step validates that the workflow works as intended before finalizing it.
+
+ ## Task
+
+ Guide the user through testing their new workflow by running it on a real example, then critically evaluating the output and refining it based on user feedback.
+
+ ### Step 1: Announce Readiness and Gather Test Case
+
+ The workflow is now implemented and ready to test. Use the AskUserQuestion tool to:
+
+ 1. **Inform the user** that the workflow is ready for a test run
+ 2. **Ask what they'd like to test it on** - Get a specific, real use case
+
+ Example question to ask:
+ ```
+ Your new workflow is ready to try out! What would you like to use it on for the first test run?
+
+ Please describe a specific case you want to run through the workflow - ideally something you actually need done, so we can validate the workflow produces useful results.
+ ```
+
+ **Important**: Get a concrete, specific test case. Vague responses like "just test it" should be followed up with clarifying questions to understand what inputs/context the workflow needs.
+
+ ### Step 2: Prepare and Run the Workflow
+
+ 1. **Prepare clean context** - Before invoking the workflow, consider compacting the conversation history (e.g., using `/compact` in Claude Code) to ensure the workflow starts with clean context focused on the test case.
+
+ 2. **Start the new workflow** - Use `start_workflow` through the DeepWork MCP server with the job name and workflow name to begin executing the workflow.
+
+ 3. **Complete the full workflow** - Continue through all steps of the workflow until it produces its final output. Use `finished_step` to progress through each step.
+
+ 4. **Note any issues during execution** - Pay attention to:
+    - Confusion or ambiguity in instructions
+    - Missing information that had to be asked for
+    - Steps that took longer than expected
+    - Awkward tool usage or process flow
+
+ ### Step 3: Critique the Output
+
+ After the workflow completes, perform a self-critique of the output:
+
+ 1. **Review the final deliverable** - Read through all outputs produced by the workflow
+
+ 2. **Identify up to 3 top issues** - Look for problems such as:
+    - Missing information or sections
+    - Formatting issues (layout, structure, readability)
+    - Quality problems (vague content, errors, inconsistencies)
+    - Misalignment with what the user likely wanted
+    - Technical issues (broken links, malformed data, etc.)
+
+ 3. **Present each issue to the user** with a specific question asking if they want it fixed. For example:
+ ```
+ I noticed a few things we could improve:
+
+ 1. **Text overlap**: The PDF has some text overlapping images in a few places - shall I correct that?
+
+ 2. **Missing summary**: The report doesn't have an executive summary at the top - would you like me to add one?
+
+ 3. **Data formatting**: The numbers aren't consistently formatted (some have commas, some don't) - should I standardize them?
+ ```
+
+ 4. **Make corrections** for any issues the user confirms they want fixed
+
+ ### Step 4: Gather Additional Feedback
+
+ After addressing the identified issues:
+
+ 1. **Ask if the user has more comments** on the output:
+ ```
+ I've made those corrections. Is there anything else you'd like me to change about the output?
+ ```
+
+ 2. **Iterate on feedback** - Keep making corrections based on user feedback until they indicate they're satisfied
+
+ 3. **Confirm completion** - When the user says the output is good, confirm that testing is complete
+
+ ### Loop Behavior
+
+ The feedback loop should continue until the user explicitly indicates satisfaction. Look for signals like:
+ - "Looks good"
+ - "That's fine"
+ - "I'm happy with it"
+ - "No more changes"
+ - "Perfect"
+
+ If the user provides more feedback, address it and ask again. Don't assume satisfaction without explicit confirmation.
+
+ ## Example Dialog
+
+ ```
+ Claude: Your new workflow is ready to try out!
+
+ The `monthly_report` workflow has been implemented with these steps:
+ 1. gather_data - Collects metrics from various sources
+ 2. analyze_trends - Identifies patterns and insights
+ 3. generate_report - Creates the final PDF report
+
+ What would you like to use it on for the first test run? Please describe a specific report you need - for example, which month's data and any particular focus areas.
+
+ User: Let's do the January 2026 engineering report, focusing on deployment frequency and incident response times.
+
+ Claude: Great, let me run the workflow on your January 2026 engineering report.
+
+ [Starting the monthly_report workflow via MCP...]
+
+ [... workflow runs through all steps ...]
+
+ Claude: The January 2026 Engineering Report has been generated. Let me review it for any issues.
+
+ I noticed a few things we could improve:
+
+ 1. **Chart readability**: The deployment frequency chart uses very similar colors for different teams - shall I switch to a more distinct color palette?
+
+ 2. **Missing context**: The incident response times don't include comparison to the previous month - would you like me to add month-over-month comparison?
+
+ 3. **Executive summary**: The summary is quite technical - should I add a simpler high-level overview for non-technical stakeholders?
+
+ User: Yes to all three, please.
+
+ Claude: Done! I've updated the report with:
+ - Distinct color palette for the deployment chart
+ - Month-over-month comparison for incident response times
+ - A non-technical executive summary
+
+ Is there anything else you'd like me to change about the report?
+
+ User: Can you make the tables larger? They're a bit hard to read.
+
+ Claude: I've increased the table sizes for better readability. The font is now larger and there's more padding around cells.
+
+ Is there anything else you'd like me to adjust?
+
+ User: No, that looks great!
+
+ Claude: Excellent! The test run is complete. The workflow successfully generated your January 2026 Engineering Report with all the refinements applied.
+
+ **Test Summary:**
+ - Workflow ran successfully through all 3 steps
+ - Output required adjustments to chart colors, comparisons, and table sizing
+ - User is satisfied with the final result
+
+ Ready to proceed to the iterate step to improve the workflow based on what we learned.
+ ```
+
+ ## Important Guidelines
+
+ 1. **Use real test cases** - Testing with actual data/needs validates the workflow better than hypothetical examples
+ 2. **Be specific in critiques** - Don't say "formatting issues" - say exactly what's wrong and where
+ 3. **Limit initial critique to 3 issues** - Too many issues at once is overwhelming; address more in the feedback loop
+ 4. **Don't assume what the user wants fixed** - Always ask before making corrections
+ 5. **Iterate until satisfied** - Don't rush to completion; the user's satisfaction is the goal
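The `start_workflow` and `finished_step` calls referenced throughout test.md are MCP tool invocations. As a minimal sketch only, assuming payload shapes that match the parameter names documented in the skill templates later in this diff (the actual schemas live in `deepwork/mcp/schemas.py`, which this diff lists but does not expand), the test run above might begin with:

```yaml
# Hypothetical tool-call payloads; all values are illustrative.
- tool: start_workflow
  arguments:
    goal: "January 2026 engineering report"    # the user's test case
    job_name: "monthly_report"
    workflow_name: "monthly_report"            # assumed name; a job may define several workflows
- tool: finished_step                          # called after completing each step
  arguments:
    outputs:
      - "reports/january_2026_report.pdf"      # illustrative output path
```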
@@ -34,6 +34,8 @@ steps:
    outputs:
      - [output_filename_or_path] # e.g., "report.md" or "reports/analysis.md"
    dependencies: [] # List of step IDs that must complete first
+   # Optional: Delegate to a specific agent type (uses context: fork)
+   # agent: general-purpose # or other agent type
    # Optional: Quality validation hooks
    hooks:
      after_agent:
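Combined with the surrounding template fields, a step that opts into the new `agent` key might look like the sketch below; the step id and values are hypothetical, and the forking behavior is known only from the template comment above:

```yaml
# Hypothetical step entry using the optional agent field.
- id: gather_data                # illustrative step id
  outputs:
    - "reports/analysis.md"
  dependencies: []
  agent: general-purpose         # delegates this step to a forked-context agent
```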
@@ -0,0 +1,16 @@
+ {
+   "permissions": {
+     "allow": [
+       "Read(./.deepwork/**)",
+       "Edit(./.deepwork/**)",
+       "Write(./.deepwork/**)",
+       "Bash(deepwork:*)",
+       "Bash(./.deepwork/jobs/deepwork_jobs/make_new_job.sh:*)",
+       "WebSearch",
+       "mcp__deepwork__get_workflows",
+       "mcp__deepwork__start_workflow",
+       "mcp__deepwork__finished_step",
+       "mcp__deepwork__abort_workflow"
+     ]
+   }
+ }
@@ -0,0 +1,37 @@
+ {#
+ Template: skill-deepwork.md.jinja
+ Purpose: Generates the main /deepwork skill that instructs agents to use MCP tools
+
+ This template is used to create the entry-point skill for DeepWork.
+ Instead of containing step instructions, it directs agents to use the
+ DeepWork MCP server tools.
+ #}
+ ---
+ name: deepwork
+ description: "Start or continue DeepWork workflows using MCP tools"
+ ---
+
+ # DeepWork Workflow Manager
+
+ Execute multi-step workflows with quality gate checkpoints.
+
+ > **IMPORTANT**: Use the DeepWork MCP server tools. All workflow operations
+ > are performed by calling MCP tools and following the instructions they return,
+ > not by reading instructions from files.
+
+ ## How to Use
+
+ 1. Call `get_workflows` to discover available workflows
+ 2. Call `start_workflow` with goal, job_name, and workflow_name
+ 3. Follow the step instructions returned
+ 4. Call `finished_step` with your outputs when done
+ 5. Handle the response: `needs_work`, `next_step`, or `workflow_complete`
+
+ ## Intent Parsing
+
+ When the user invokes `/deepwork`, parse their intent:
+ 1. **ALWAYS**: Call `get_workflows` to discover available workflows
+ 2. Based on the available workflows and the user's request, proceed:
+    - **Explicit workflow**: `/deepwork <workflow name>` → start that workflow
+    - **General request**: `/deepwork <request>` → infer the best match from the available workflows
+    - **No context**: `/deepwork` alone → ask the user to choose from the available workflows
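Step 5 of the skill names three response states but the template does not show their envelope. A rough sketch, assuming a simple status field (everything beyond the three documented status values is an assumption):

```yaml
# Assumed finished_step response shapes; only the status values are documented.
- status: needs_work            # quality gate failed: revise outputs, call finished_step again
  feedback: "..."               # hypothetical field
- status: next_step             # follow the returned step instructions
  instructions: "..."           # hypothetical field
- status: workflow_complete     # all steps done
```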
@@ -0,0 +1,37 @@
+ {#
+ Template: skill-deepwork.md.jinja
+ Purpose: Generates the main /deepwork skill that instructs agents to use MCP tools
+
+ This template is used to create the entry-point skill for DeepWork.
+ Instead of containing step instructions, it directs agents to use the
+ DeepWork MCP server tools.
+ #}
+ +++
+ name = "deepwork"
+ description = "Start or continue DeepWork workflows using MCP tools"
+ +++
+
+ # DeepWork Workflow Manager
+
+ Execute multi-step workflows with quality gate checkpoints.
+
+ > **IMPORTANT**: Use the DeepWork MCP server tools. All workflow operations
+ > are performed by calling MCP tools and following the instructions they return,
+ > not by reading instructions from files.
+
+ ## How to Use
+
+ 1. Call `get_workflows` to discover available workflows
+ 2. Call `start_workflow` with goal, job_name, and workflow_name
+ 3. Follow the step instructions returned
+ 4. Call `finished_step` with your outputs when done
+ 5. Handle the response: `needs_work`, `next_step`, or `workflow_complete`
+
+ ## Intent Parsing
+
+ When the user invokes `/deepwork`, parse their intent:
+ 1. **ALWAYS**: Call `get_workflows` to discover available workflows
+ 2. Based on the available workflows and the user's request, proceed:
+    - **Explicit workflow**: `/deepwork <workflow name>` → start that workflow
+    - **General request**: `/deepwork <request>` → infer the best match from the available workflows
+    - **No context**: `/deepwork` alone → ask the user to choose from the available workflows