@mclawnet/agent 0.5.9 → 0.6.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/cli.js +168 -61
- package/dist/__tests__/cli.test.d.ts +2 -0
- package/dist/__tests__/cli.test.d.ts.map +1 -0
- package/dist/__tests__/service-config.test.d.ts +2 -0
- package/dist/__tests__/service-config.test.d.ts.map +1 -0
- package/dist/__tests__/service-linux.test.d.ts +2 -0
- package/dist/__tests__/service-linux.test.d.ts.map +1 -0
- package/dist/__tests__/service-macos.test.d.ts +2 -0
- package/dist/__tests__/service-macos.test.d.ts.map +1 -0
- package/dist/__tests__/service-windows.test.d.ts +2 -0
- package/dist/__tests__/service-windows.test.d.ts.map +1 -0
- package/dist/backend-adapter.d.ts +2 -0
- package/dist/backend-adapter.d.ts.map +1 -1
- package/dist/chunk-CBZIH6FY.js +93 -0
- package/dist/chunk-CBZIH6FY.js.map +1 -0
- package/dist/{chunk-KHPEQTWF.js → chunk-GLO5OZAY.js} +203 -213
- package/dist/chunk-GLO5OZAY.js.map +1 -0
- package/dist/chunk-RO47ET27.js +88 -0
- package/dist/chunk-RO47ET27.js.map +1 -0
- package/dist/hub-connection.d.ts.map +1 -1
- package/dist/index.js +5 -3
- package/dist/index.js.map +1 -1
- package/dist/linux-6AR7SXHW.js +176 -0
- package/dist/linux-6AR7SXHW.js.map +1 -0
- package/dist/macos-XVPWIH4C.js +174 -0
- package/dist/macos-XVPWIH4C.js.map +1 -0
- package/dist/service/config.d.ts +19 -0
- package/dist/service/config.d.ts.map +1 -0
- package/dist/service/index.d.ts +6 -0
- package/dist/service/index.d.ts.map +1 -0
- package/dist/service/index.js +47 -0
- package/dist/service/index.js.map +1 -0
- package/dist/service/linux.d.ts +18 -0
- package/dist/service/linux.d.ts.map +1 -0
- package/dist/service/macos.d.ts +18 -0
- package/dist/service/macos.d.ts.map +1 -0
- package/dist/service/types.d.ts +19 -0
- package/dist/service/types.d.ts.map +1 -0
- package/dist/service/windows.d.ts +18 -0
- package/dist/service/windows.d.ts.map +1 -0
- package/dist/session-manager.d.ts +4 -7
- package/dist/session-manager.d.ts.map +1 -1
- package/dist/skill-loader.d.ts +8 -0
- package/dist/skill-loader.d.ts.map +1 -0
- package/dist/start.d.ts.map +1 -1
- package/dist/start.js +2 -1
- package/dist/windows-NLONSCDA.js +165 -0
- package/dist/windows-NLONSCDA.js.map +1 -0
- package/package.json +7 -5
- package/skills/academic-search/SKILL.md +147 -0
- package/skills/architecture/SKILL.md +294 -0
- package/skills/changelog-generator/SKILL.md +112 -0
- package/skills/chart-visualization/SKILL.md +183 -0
- package/skills/code-review/SKILL.md +304 -0
- package/skills/codebase-health/SKILL.md +281 -0
- package/skills/consulting-analysis/SKILL.md +584 -0
- package/skills/content-research-writer/SKILL.md +546 -0
- package/skills/data-analysis/SKILL.md +194 -0
- package/skills/deep-research/SKILL.md +198 -0
- package/skills/docx/SKILL.md +211 -0
- package/skills/github-deep-research/SKILL.md +207 -0
- package/skills/image-generation/SKILL.md +209 -0
- package/skills/lead-research-assistant/SKILL.md +207 -0
- package/skills/mcp-builder/SKILL.md +304 -0
- package/skills/meeting-insights-analyzer/SKILL.md +335 -0
- package/skills/pair-programming/SKILL.md +196 -0
- package/skills/pdf/SKILL.md +309 -0
- package/skills/performance-analysis/SKILL.md +261 -0
- package/skills/podcast-generation/SKILL.md +224 -0
- package/skills/pptx/SKILL.md +497 -0
- package/skills/project-learnings/SKILL.md +280 -0
- package/skills/security-audit/SKILL.md +211 -0
- package/skills/skill-creator/SKILL.md +200 -0
- package/skills/technical-writing/SKILL.md +286 -0
- package/skills/testing/SKILL.md +363 -0
- package/skills/video-generation/SKILL.md +247 -0
- package/skills/web-design-guidelines/SKILL.md +203 -0
- package/skills/webapp-testing/SKILL.md +162 -0
- package/skills/workflow-automation/SKILL.md +299 -0
- package/skills/xlsx/SKILL.md +305 -0
- package/dist/chunk-KHPEQTWF.js.map +0 -1
@@ -0,0 +1,194 @@
---
name: data-analysis
description: Guide systematic data analysis workflows using Python (pandas, DuckDB) or SQL. Use when analyzing datasets, generating statistics, creating summaries, or exploring structured data from CSV/Excel/database sources.
---

# Data Analysis

A systematic framework for analyzing structured data — from initial inspection through statistical analysis to visualization and reporting.

## Overview

Data analysis follows a predictable arc:

1. **Understand** — What question are we answering? What decision does this inform?
2. **Inspect** — What does the data look like? Types, ranges, quality issues?
3. **Transform** — Clean, reshape, and enrich the data for analysis.
4. **Analyze** — Apply aggregation or statistical techniques to extract insights.
5. **Visualize** — Create charts that communicate findings clearly.
6. **Report** — Summarize findings with context, caveats, and recommendations.

Never skip the inspection step. Jumping straight to analysis on data you do not understand produces confidently wrong results.

## When to Use

- Exploring a dataset's structure and contents
- Summary statistics, distributions, or trend analysis
- Answering business questions from CSV, Excel, or database data
- Aggregation reports with grouping, filtering, and ranking
- Comparing cohorts or time periods; detecting anomalies

## When NOT to Use

- **ML model training** — This covers descriptive analysis, not predictive modeling.
- **Real-time streaming** — Use stream processing tools.
- **ETL pipeline design** — This is ad-hoc analysis, not production pipelines.

## Step 1: Understand Requirements

Before touching data, clarify the question.

- What specific question needs an answer? Restate it precisely.
- Who is the audience? Technical team, executives, external stakeholders?
- What format is expected? Table, chart, single number, written summary?
- What time range, filters, or segments apply?

A vague question like "analyze our sales data" must be narrowed: "Top 10 products by revenue in Q1 2025, broken down by region?"

## Step 2: Data Inspection

**Pandas:**
```python
df = pd.read_csv("data.csv")
df.shape; df.dtypes; df.head(10); df.describe()
df.isnull().sum(); df.nunique(); df.duplicated().sum()
```

**DuckDB:**
```sql
DESCRIBE TABLE 'data.csv';
SELECT COUNT(*) FROM 'data.csv';
SUMMARIZE SELECT * FROM 'data.csv';
```

**What to note:** Column types match expectations? Missing values random or systematic? Cardinality sensible? Date formats consistent? Numeric ranges plausible (no negative ages, percentages over 100)?
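These plausibility checks can be scripted rather than eyeballed. A minimal sketch — the column names `age` and `discount_pct` are hypothetical stand-ins for whatever ranges your dataset implies:

```python
import pandas as pd

# Hypothetical dataset with an age column and a percentage column
df = pd.DataFrame({"age": [34, 51, -2], "discount_pct": [10, 105, 20]})

# Surface rows that violate expected ranges instead of silently analyzing them
bad_age = df[df["age"] < 0]
bad_pct = df[(df["discount_pct"] < 0) | (df["discount_pct"] > 100)]
print(len(bad_age), len(bad_pct))  # prints: 1 1
```

Any nonzero count here means the inspection step is not done yet.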

## Step 3: Choosing Your Tool

| Scenario | Best Tool | Why |
|---|---|---|
| Quick exploration, single file | **pandas** | Fastest to write, rich API |
| Large file (100MB+), multi-file joins | **DuckDB** | Columnar engine, minimal memory |
| Existing database | **SQL** | Query where data lives |
| Complex reshaping (pivot, melt) | **pandas** | Most flexible transformation API |
| Window functions, CTEs | **DuckDB / SQL** | SQL is more expressive for these |

**Rule of thumb:** Data fits in memory and one-off exploration? Use pandas. Joins, window functions, or files larger than RAM? Use DuckDB.

```python
import duckdb
result = duckdb.sql("SELECT region, SUM(revenue) FROM 'sales.csv' GROUP BY region")
result.df()  # convert to pandas DataFrame
```

## Step 4: Common Analysis Patterns

### Aggregation and Grouping

```python
# pandas
df.groupby("region")["revenue"].agg(["sum", "mean", "count"])
```
```sql
-- SQL
SELECT region, SUM(revenue) AS total, AVG(revenue) AS avg, COUNT(*) AS n
FROM sales GROUP BY region ORDER BY total DESC;
```

### Ranking and Top-N

```python
df.nlargest(10, "revenue")
```
```sql
SELECT *, RANK() OVER (PARTITION BY region ORDER BY revenue DESC) AS rnk
FROM sales QUALIFY rnk <= 10;
```

### Time Series Aggregation

```python
df["date"] = pd.to_datetime(df["date"])
df.set_index("date").resample("M")["revenue"].sum()
```
```sql
SELECT DATE_TRUNC('month', order_date) AS month, SUM(revenue) AS total
FROM orders GROUP BY month ORDER BY month;
```

### Joins

```python
merged = pd.merge(orders, customers, on="customer_id", how="left")
```
```sql
SELECT o.*, c.segment, c.region
FROM orders o LEFT JOIN customers c ON o.customer_id = c.customer_id;
```

### Window Functions (SQL)

```sql
-- Running total
SUM(revenue) OVER (ORDER BY date ROWS UNBOUNDED PRECEDING)
-- Month-over-month change
revenue - LAG(revenue) OVER (ORDER BY month) AS mom_change
-- Percentile rank
PERCENT_RANK() OVER (PARTITION BY department ORDER BY salary)
```

### Pivot / Crosstab

```python
pd.pivot_table(df, values="revenue", index="region", columns="quarter", aggfunc="sum", fill_value=0)
```
```sql
-- DuckDB
PIVOT (SELECT region, category, revenue FROM sales)
ON category USING SUM(revenue) GROUP BY region;
```

## Step 5: Visualization Guidance

| Message | Chart Type |
|---|---|
| Comparison across categories | Bar chart (horizontal if many labels) |
| Trend over time | Line chart |
| Part-of-whole composition | Stacked bar or pie (2-5 slices only) |
| Distribution | Histogram or box plot |
| Correlation | Scatter plot |

**Rules:**
- Title states the insight, not just the metric. "Revenue grew 40% in Q2" not "Revenue by Quarter."
- Label axes with units. "Revenue ($M)" not "Revenue."
- Sort bar charts by value unless order is inherent (months, stages).
- Avoid 3D charts, dual axes, and pie charts with more than 5 slices.

```python
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style("whitegrid")
fig, ax = plt.subplots(figsize=(10, 6))
ax.set_title("Clear, Descriptive Title")
ax.set_xlabel("X Label with Units"); ax.set_ylabel("Y Label with Units")
plt.tight_layout(); plt.savefig("output.png", dpi=150)
```

## Step 6: Output and Reporting

Present results in the format most useful to the audience:
- **Tables** — For precise values. Markdown for small results, CSV export for large.
- **Charts** — For trends, comparisons, distributions. Save as PNG.
- **Written narrative** — For executive audiences. Finding, evidence, caveats.

### Reporting Structure

Every analysis report should include: **Key Findings** (headline results), **Methodology** (data source, time range, filters), **Detailed Results** (tables/charts), **Caveats** (missing data, assumptions), and **Recommendations** (next steps).

### Quality Checklist
- [ ] Numbers add up — totals match, percentages sum correctly
- [ ] Null handling is explicit — excluded, filled, or counted separately?
- [ ] Date ranges are correct — no boundary date errors
- [ ] Units are consistent — no mixing dollars/cents, bytes/megabytes
- [ ] Sample size is noted — context for statistical significance
- [ ] Results are reproducible — steps are clear enough to replicate
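The "numbers add up" check can be made mechanical before publishing a report. A minimal sketch with made-up figures:

```python
import pandas as pd

# Hypothetical report data: verify group subtotals reconcile with the overall total
df = pd.DataFrame({"region": ["N", "N", "S"], "revenue": [100.0, 50.0, 75.0]})
by_region = df.groupby("region")["revenue"].sum()

# Totals add up
assert abs(by_region.sum() - df["revenue"].sum()) < 1e-9
# Percentage shares sum to 100
shares = by_region / by_region.sum() * 100
assert abs(shares.sum() - 100) < 1e-9
```

Run checks like these against the final numbers, not intermediate ones, so filtering bugs cannot slip in between.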

@@ -0,0 +1,198 @@
---
name: deep-research
description: Conduct systematic multi-angle web research across multiple sources and depths. Use when answering questions requiring current comprehensive information, researching topics in depth, or gathering data before content generation tasks.
---

# Deep Research

A systematic methodology for conducting thorough web research across multiple angles, depths, and sources. Load this skill BEFORE starting any content generation task to ensure comprehensive information gathering.

## When to Use

**Always load this skill when:**

### Research Questions
- User asks "what is X", "explain X", "research X", "investigate X"
- User wants to understand a concept, technology, or topic in depth
- The question requires current, comprehensive information from multiple sources
- A single web search would be insufficient to answer properly

### Content Generation (Pre-research)
- Creating presentations, articles, reports, or documentation
- Creating frontend designs or UI mockups
- Producing any content that requires real-world information, examples, or current data

## When NOT to Use

- **Academic literature specifically** — use the `academic-search` skill for scholarly papers
- **GitHub repository analysis** — use the `github-deep-research` skill
- **Questions answerable from the codebase** — read the code directly
- **Consulting-grade reports** — use the `consulting-analysis` skill (which uses deep-research as a sub-step)

## Core Principle

**Never generate content based solely on general knowledge.** The quality of your output directly depends on the quality and quantity of research conducted beforehand. A single search query is NEVER enough.

## Research Methodology

### Phase 1: Broad Exploration

Start with broad searches to understand the landscape:

1. **Initial Survey**: Search for the main topic to understand the overall context
2. **Identify Dimensions**: From initial results, identify key subtopics, themes, angles, or aspects that need deeper exploration
3. **Map the Territory**: Note different perspectives, stakeholders, or viewpoints that exist

Example:
```
Topic: "AI in healthcare"
Initial searches:
- "AI healthcare applications 2025"
- "artificial intelligence medical diagnosis"
- "healthcare AI market trends"

Identified dimensions:
- Diagnostic AI (radiology, pathology)
- Treatment recommendation systems
- Administrative automation
- Patient monitoring
- Regulatory landscape
- Ethical considerations
```

### Phase 2: Deep Dive

For each important dimension identified, conduct targeted research:

1. **Specific Queries**: Use WebSearch with precise keywords for each subtopic
2. **Multiple Phrasings**: Try different keyword combinations and phrasings
3. **Fetch Full Content**: Use WebFetch to read important sources in full, not just snippets
4. **Follow References**: When sources mention other important resources, search for those too

Example:
```
Dimension: "Diagnostic AI in radiology"
Targeted searches:
- "AI radiology FDA approved systems"
- "chest X-ray AI detection accuracy"
- "radiology AI clinical trials results"

Then fetch and read:
- Key research papers or summaries
- Industry reports
- Real-world case studies
```

### Phase 3: Diversity & Validation

Ensure comprehensive coverage by seeking diverse information types:

| Information Type | Purpose | Example Searches |
|-----------------|---------|------------------|
| **Facts & Data** | Concrete evidence | "statistics", "data", "numbers", "market size" |
| **Examples & Cases** | Real-world applications | "case study", "example", "implementation" |
| **Expert Opinions** | Authority perspectives | "expert analysis", "interview", "commentary" |
| **Trends & Predictions** | Future direction | "trends 2025", "forecast", "future of" |
| **Comparisons** | Context and alternatives | "vs", "comparison", "alternatives" |
| **Challenges & Criticisms** | Balanced view | "challenges", "limitations", "criticism" |

### Phase 4: Synthesis Check

Before proceeding to content generation, verify:

- [ ] Have I searched from at least 3-5 different angles?
- [ ] Have I fetched and read the most important sources in full?
- [ ] Do I have concrete data, examples, and expert perspectives?
- [ ] Have I explored both positive aspects and challenges/limitations?
- [ ] Is my information current and from authoritative sources?

**If any answer is NO, continue researching before generating content.**

## Search Strategy Tips

### Effective Query Patterns

```
# Be specific with context
Bad: "AI trends"
Good: "enterprise AI adoption trends 2025"

# Include authoritative source hints
"[topic] research paper"
"[topic] McKinsey report"
"[topic] industry analysis"

# Search for specific content types
"[topic] case study"
"[topic] statistics"
"[topic] expert interview"

# Use temporal qualifiers with the actual current year
"[topic] 2025"
"[topic] latest"
"[topic] recent developments"
```

### Temporal Awareness

**Always use today's date when forming time-sensitive search queries.** The current date is available in your system prompt context.

Use the right level of precision depending on what the user is asking:

| User intent | Temporal precision needed | Example query |
|---|---|---|
| "today / this morning / just released" | **Month + Day** | `"tech news February 28 2025"` |
| "this week" | **Week range** | `"technology releases week of Feb 24 2025"` |
| "recently / latest / new" | **Month** | `"AI breakthroughs February 2025"` |
| "this year / trends" | **Year** | `"software trends 2025"` |

**Rules:**
- When the user asks about "today" or "just released", use **month + day + year** in your search queries to get same-day results
- Never drop to year-only when day-level precision is needed — `"tech news 2025"` will NOT surface today's news
- Try multiple phrasings: numeric form (`2025-02-28`), written form (`February 28 2025`), and relative terms (`today`, `this week`) across different queries
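The precision rules above can be captured in a small helper that derives query suffixes from the current date. A sketch — `temporal_queries` is an illustrative function, not part of any tool, and `%B` assumes an English locale:

```python
from datetime import date

def temporal_queries(topic: str, today: date) -> dict:
    """Build search queries at several temporal precisions (illustrative helper)."""
    return {
        "day": f'{topic} {today.strftime("%B")} {today.day} {today.year}',
        "month": f'{topic} {today.strftime("%B")} {today.year}',
        "year": f"{topic} {today.year}",
        "iso": f"{topic} {today.isoformat()}",
    }

q = temporal_queries("tech news", date(2025, 2, 28))
print(q["day"])  # prints: tech news February 28 2025
```

Issuing the `day`, `month`, and `iso` variants as separate searches covers the multiple-phrasings rule mechanically.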

### When to Use WebFetch

Use WebFetch to read full page content when:
- A search result looks highly relevant and authoritative
- You need detailed information beyond the snippet
- The source contains data, case studies, or expert analysis
- You want to understand the full context of a finding

### Iterative Refinement

Research is iterative. After initial searches:
1. Review what you have learned
2. Identify gaps in your understanding
3. Formulate new, more targeted queries
4. Repeat until you have comprehensive coverage

## Quality Bar

Your research is sufficient when you can confidently answer:
- What are the key facts and data points?
- What are 2-3 concrete real-world examples?
- What do experts say about this topic?
- What are the current trends and future directions?
- What are the challenges or limitations?
- What makes this topic relevant or important now?

## Common Mistakes to Avoid

- Stopping after 1-2 searches
- Relying on search snippets without reading full sources
- Searching only one aspect of a multi-faceted topic
- Ignoring contradicting viewpoints or challenges
- Using outdated information when current data exists
- Starting content generation before research is complete

## Output

After completing research, you should have:
1. A comprehensive understanding of the topic from multiple angles
2. Specific facts, data points, and statistics
3. Real-world examples and case studies
4. Expert perspectives and authoritative sources
5. Current trends and relevant context

**Only then proceed to content generation**, using the gathered information to create high-quality, well-informed content.

@@ -0,0 +1,211 @@
---
name: docx
description: Create, edit, and analyze Word documents (.docx) with support for tracked changes, comments, formatting preservation, and text extraction. Use when working with professional documents for creating, modifying, reviewing with redlines, or extracting content.
---

# DOCX creation, editing, and analysis

## Overview

A user may ask you to create, edit, or analyze the contents of a .docx file. A .docx file is essentially a ZIP archive containing XML files and other resources that you can read or edit. You have different tools and workflows available for different tasks.
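Because a .docx is just a ZIP archive, Python's standard library can inspect one directly. A minimal sketch using an in-memory stand-in file (a real document would be opened with `zipfile.ZipFile("document.docx")`):

```python
import io
import zipfile

# Build a tiny stand-in .docx in memory to show the archive structure
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("word/document.xml", "<w:document/>")
    z.writestr("word/comments.xml", "<w:comments/>")

# Listing the archive members reveals the XML parts described below
with zipfile.ZipFile(buf) as z:
    names = z.namelist()
print(names)  # prints: ['word/document.xml', 'word/comments.xml']
```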

## When to Use

- Creating new Word documents from scratch
- Editing or reviewing existing .docx files with tracked changes
- Extracting text, comments, or metadata from Word documents
- Adding redline/tracked changes to legal, business, or academic documents
- Converting documents between formats

## When NOT to Use

- **Spreadsheets** — use the `xlsx` skill
- **Presentations** — use the `pptx` skill
- **PDF documents** — use the `pdf` skill
- **Plain text or Markdown** — edit directly, no special tooling needed

## Workflow Decision Tree

### Reading/Analyzing Content
Use "Text extraction" or "Raw XML access" sections below

### Creating New Document
Use "Creating a new Word document" workflow

### Editing Existing Document
- **Your own document + simple changes**
  Use "Basic OOXML editing" workflow

- **Someone else's document**
  Use **"Redlining workflow"** (recommended default)

- **Legal, academic, business, or government docs**
  Use **"Redlining workflow"** (required)

## Reading and analyzing content

### Text extraction
If you just need to read the text contents of a document, you should convert the document to markdown using pandoc. Pandoc provides excellent support for preserving document structure and can show tracked changes:

```bash
# Convert document to markdown with tracked changes
pandoc --track-changes=all path-to-file.docx -o output.md
# Options: --track-changes=accept/reject/all
```

### Raw XML access
You need raw XML access for: comments, complex formatting, document structure, embedded media, and metadata. For any of these features, you'll need to unpack a document and read its raw XML contents.

#### Unpacking a file
`python ooxml/scripts/unpack.py <office_file> <output_directory>`

#### Key file structures
* `word/document.xml` - Main document contents
* `word/comments.xml` - Comments referenced in document.xml
* `word/media/` - Embedded images and media files
* Tracked changes use `<w:ins>` (insertions) and `<w:del>` (deletions) tags
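The tracked-change tags can be located with any XML parser. A sketch over an illustrative fragment of `word/document.xml` — for files you did not create, prefer defusedxml (listed under Dependencies) over the stdlib parser:

```python
import xml.etree.ElementTree as ET  # for untrusted files, use defusedxml instead

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Illustrative fragment with one insertion and one deletion
xml = f'''<w:body xmlns:w="{W}">
  <w:p>
    <w:ins><w:r><w:t>60</w:t></w:r></w:ins>
    <w:del><w:r><w:delText>30</w:delText></w:r></w:del>
  </w:p>
</w:body>'''

root = ET.fromstring(xml)
ins = root.findall(f".//{{{W}}}ins")
dels = root.findall(f".//{{{W}}}del")
print(len(ins), len(dels))  # prints: 1 1
```

Counting `<w:ins>`/`<w:del>` elements this way is a quick check that a document actually contains tracked changes before deeper editing.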

## Creating a new Word document

When creating a new Word document from scratch, use **docx-js**, which allows you to create Word documents using JavaScript/TypeScript.

### Workflow
1. **MANDATORY - READ ENTIRE FILE**: Read [`docx-js.md`](docx-js.md) (~500 lines) completely from start to finish. **NEVER set any range limits when reading this file.** Read the full file content for detailed syntax, critical formatting rules, and best practices before proceeding with document creation.
2. Create a JavaScript/TypeScript file using Document, Paragraph, TextRun components (You can assume all dependencies are installed, but if not, refer to the dependencies section below)
3. Export as .docx using Packer.toBuffer()

## Editing an existing Word document

When editing an existing Word document, use the **Document library** (a Python library for OOXML manipulation). The library automatically handles infrastructure setup and provides methods for document manipulation. For complex scenarios, you can access the underlying DOM directly through the library.

### Workflow
1. **MANDATORY - READ ENTIRE FILE**: Read [`ooxml.md`](ooxml.md) (~600 lines) completely from start to finish. **NEVER set any range limits when reading this file.** Read the full file content for the Document library API and XML patterns for directly editing document files.
2. Unpack the document: `python ooxml/scripts/unpack.py <office_file> <output_directory>`
3. Create and run a Python script using the Document library (see "Document Library" section in ooxml.md)
4. Pack the final document: `python ooxml/scripts/pack.py <input_directory> <office_file>`

The Document library provides both high-level methods for common operations and direct DOM access for complex scenarios.

## Redlining workflow for document review

This workflow allows you to plan comprehensive tracked changes using markdown before implementing them in OOXML. **CRITICAL**: For complete tracked changes, you must implement ALL changes systematically.

**Batching Strategy**: Group related changes into batches of 3-10 changes. This makes debugging manageable while maintaining efficiency. Test each batch before moving to the next.

**Principle: Minimal, Precise Edits**
When implementing tracked changes, only mark text that actually changes. Repeating unchanged text makes edits harder to review and appears unprofessional. Break replacements into: [unchanged text] + [deletion] + [insertion] + [unchanged text]. Preserve the original run's RSID for unchanged text by extracting the `<w:r>` element from the original and reusing it.

Example - Changing "30 days" to "60 days" in a sentence:
```python
# BAD - Replaces entire sentence
'<w:del><w:r><w:delText>The term is 30 days.</w:delText></w:r></w:del><w:ins><w:r><w:t>The term is 60 days.</w:t></w:r></w:ins>'

# GOOD - Only marks what changed, preserves original <w:r> for unchanged text
'<w:r w:rsidR="00AB12CD"><w:t>The term is </w:t></w:r><w:del><w:r><w:delText>30</w:delText></w:r></w:del><w:ins><w:r><w:t>60</w:t></w:r></w:ins><w:r w:rsidR="00AB12CD"><w:t> days.</w:t></w:r>'
```

### Tracked changes workflow

1. **Get markdown representation**: Convert document to markdown with tracked changes preserved:
   ```bash
   pandoc --track-changes=all path-to-file.docx -o current.md
   ```

2. **Identify and group changes**: Review the document and identify ALL changes needed, organizing them into logical batches:

   **Location methods** (for finding changes in XML):
   - Section/heading numbers (e.g., "Section 3.2", "Article IV")
   - Paragraph identifiers if numbered
   - Grep patterns with unique surrounding text
   - Document structure (e.g., "first paragraph", "signature block")
   - **DO NOT use markdown line numbers** - they don't map to XML structure

   **Batch organization** (group 3-10 related changes per batch):
   - By section: "Batch 1: Section 2 amendments", "Batch 2: Section 5 updates"
   - By type: "Batch 1: Date corrections", "Batch 2: Party name changes"
   - By complexity: Start with simple text replacements, then tackle complex structural changes
   - Sequential: "Batch 1: Pages 1-3", "Batch 2: Pages 4-6"

3. **Read documentation and unpack**:
   - **MANDATORY - READ ENTIRE FILE**: Read [`ooxml.md`](ooxml.md) (~600 lines) completely from start to finish. **NEVER set any range limits when reading this file.** Pay special attention to the "Document Library" and "Tracked Change Patterns" sections.
   - **Unpack the document**: `python ooxml/scripts/unpack.py <file.docx> <dir>`
   - **Note the suggested RSID**: The unpack script will suggest an RSID to use for your tracked changes. Copy this RSID for use in step 4b.

4. **Implement changes in batches**: Group changes logically (by section, by type, or by proximity) and implement them together in a single script. This approach:
   - Makes debugging easier (smaller batch = easier to isolate errors)
   - Allows incremental progress
   - Maintains efficiency (batch size of 3-10 changes works well)

   **Suggested batch groupings:**
   - By document section (e.g., "Section 3 changes", "Definitions", "Termination clause")
   - By change type (e.g., "Date changes", "Party name updates", "Legal term replacements")
   - By proximity (e.g., "Changes on pages 1-3", "Changes in first half of document")

   For each batch of related changes:

   **a. Map text to XML**: Grep for text in `word/document.xml` to verify how text is split across `<w:r>` elements.

   **b. Create and run script**: Use `get_node` to find nodes, implement changes, then `doc.save()`. See **"Document Library"** section in ooxml.md for patterns.

   **Note**: Always grep `word/document.xml` immediately before writing a script to get current line numbers and verify text content. Line numbers change after each script run.

5. **Pack the document**: After all batches are complete, convert the unpacked directory back to .docx:
   ```bash
   python ooxml/scripts/pack.py unpacked reviewed-document.docx
   ```

6. **Final verification**: Do a comprehensive check of the complete document:
   - Convert final document to markdown:
     ```bash
     pandoc --track-changes=all reviewed-document.docx -o verification.md
     ```
   - Verify ALL changes were applied correctly:
     ```bash
     grep "original phrase" verification.md # Should NOT find it
     grep "replacement phrase" verification.md # Should find it
     ```
   - Check that no unintended changes were introduced

## Converting Documents to Images

To visually analyze Word documents, convert them to images using a two-step process:

1. **Convert DOCX to PDF**:
   ```bash
   soffice --headless --convert-to pdf document.docx
   ```

2. **Convert PDF pages to JPEG images**:
   ```bash
   pdftoppm -jpeg -r 150 document.pdf page
   ```
   This creates files like `page-1.jpg`, `page-2.jpg`, etc.

Options:
- `-r 150`: Sets resolution to 150 DPI (adjust for quality/size balance)
- `-jpeg`: Output JPEG format (use `-png` for PNG if preferred)
- `-f N`: First page to convert (e.g., `-f 2` starts from page 2)
- `-l N`: Last page to convert (e.g., `-l 5` stops at page 5)
- `page`: Prefix for output files

Example for specific range:
```bash
pdftoppm -jpeg -r 150 -f 2 -l 5 document.pdf page # Converts only pages 2-5
```

## Code Style Guidelines
**IMPORTANT**: When generating code for DOCX operations:
- Write concise code
- Avoid verbose variable names and redundant operations
- Avoid unnecessary print statements

## Dependencies

Required dependencies (install if not available):

- **pandoc**: `sudo apt-get install pandoc` (for text extraction)
- **docx**: `npm install -g docx` (for creating new documents)
- **LibreOffice**: `sudo apt-get install libreoffice` (for PDF conversion)
- **Poppler**: `sudo apt-get install poppler-utils` (for pdftoppm to convert PDF to images)
- **defusedxml**: `pip install defusedxml` (for secure XML parsing)