sc-research 1.0.13 → 1.0.14

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,36 +1,19 @@
  ---
  name: social_media_sentiment
- description: Analyze existing Reddit/X raw data and generate `classified_sentiment.json` with strict `SentimentData` output. Use for sentiment, mood, or positive-vs-negative analysis requests.
+ description: Analysis-only worker. Reads existing raw Reddit/X data and generates `classified_sentiment.json` with strict `SentimentData` output. Use for sentiment, mood, or positive-vs-negative analysis requests.
  ---

  # Social Media Sentiment Skill

- This worker measures community tone and produces evidence-backed sentiment output for the dashboard.
+ This worker measures community tone and produces evidence-backed sentiment output for the dashboard. It performs **analysis only** — fetching is handled by the orchestrator via `social_media_fetch`.

- ## Required Inputs
+ ## Prerequisites

- Use existing raw files only:
+ The following files must already exist (produced by `social_media_fetch`):

- - `reddit_data.json`
- - `x_data.json`
+ - `reddit_data.json` and/or `x_data.json`

- At least one valid source file must exist.
-
- ## Command Execution Flow
-
- Use this sequence for sentiment analysis:
-
- 1. Fetch or refresh raw data (outside this worker):
-
-    - `sc-research research:deep "TOPIC"`
-    - Optional filters: `--source=reddit|x|both --from=YYYY-MM-DD --to=YYYY-MM-DD`
-
- 2. Run this `social_media_sentiment` worker to generate `classified_sentiment.json`.
- 3. Optional visualization:
-
-    - `sc-research visualize`
-
- If raw files are missing, stale, or mismatched for the requested topic/date range, run step 1 first.
+ At least one valid source file must be present. If both are missing, **stop and report failure** — do not attempt to fetch data.

  ## Step 1: Preflight Validation

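The new `Prerequisites` rule above (read existing raw files; if both are missing, stop rather than fetch) can be sketched as a small preflight check. This is an illustration only, not code shipped in the package; the helper name `preflight` and the `SystemExit` failure behavior are my assumptions:

```python
import json
import os


def preflight(paths=("reddit_data.json", "x_data.json")):
    """Return the subset of raw files that exist and parse as JSON.

    Mirrors the documented rule: at least one valid source file must be
    present; if none is, stop and report failure instead of fetching.
    """
    valid = []
    for path in paths:
        if not os.path.exists(path):
            continue
        try:
            with open(path) as fh:
                json.load(fh)
        except (OSError, json.JSONDecodeError):
            continue  # unreadable or invalid JSON: treat as missing
        valid.append(path)
    if not valid:
        raise SystemExit("preflight failed: no valid raw data file found")
    return valid
```

A worker would call this once before analysis and abort on `SystemExit` rather than falling back to a fetch.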
@@ -93,6 +76,59 @@ Save result to:
 
  - `classified_sentiment.json`

+ ## Output Type Contract
+
+ Your output MUST match this exact shape. The dashboard detects sentiment data by checking for `distribution` (object) + `by_source` (object). Missing either field = broken tab.
+
+ ```json
+ {
+   "topic": "iPhone 16 Pro reviews",
+   "overall_mood": "Positive",
+   "distribution": {
+     "very_positive": 15,
+     "positive": 38,
+     "mixed": 22,
+     "negative": 8
+   },
+   "by_source": {
+     "reddit": {
+       "very_positive": 10,
+       "positive": 25,
+       "mixed": 15,
+       "negative": 5
+     },
+     "x": {
+       "very_positive": 5,
+       "positive": 13,
+       "mixed": 7,
+       "negative": 3
+     }
+   },
+   "product_sentiments": [
+     {
+       "name": "iPhone 16 Pro",
+       "overall": "Positive",
+       "reddit_sentiment": "Positive",
+       "x_sentiment": "Mixed",
+       "evidence_quotes": [
+         {
+           "text": "Camera upgrade alone makes it worth it",
+           "author": "u/tech_reviewer",
+           "link": "https://reddit.com/r/iphone/comments/xyz789",
+           "sentiment": "Very Positive"
+         }
+       ]
+     }
+   ]
+ }
+ ```
+
+ ### Enum Rules for Sentiment
+
+ - ALL sentiment labels: **Title Case** — `"Very Positive"`, `"Positive"`, `"Mixed"`, `"Negative"`. NEVER use `"positive"` or `"neutral"`.
+ - `distribution` keys: **snake_case** — `very_positive`, `positive`, `mixed`, `negative`. NEVER use `veryPositive` or `Very Positive` as keys.
+ - `by_source` keys: **lowercase** — `reddit`, `x`. NEVER use `Reddit` or `X`.
+
  ## Final Validation Checklist

  - JSON parse succeeds.
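The shape and enum rules added in this hunk are mechanically checkable. The following validator is a hedged sketch (it is not part of the package; the function name and error strings are mine) that enforces the documented field signatures and casing:

```python
def validate_sentiment(data):
    """Minimal shape check for a classified_sentiment.json payload.

    Checks the documented dashboard signature (`distribution` object +
    `by_source` object) and the enum casing rules. Returns a list of
    human-readable problems; an empty list means the checks passed.
    """
    labels = {"Very Positive", "Positive", "Mixed", "Negative"}
    dist_keys = {"very_positive", "positive", "mixed", "negative"}
    errors = []

    if not isinstance(data.get("distribution"), dict):
        errors.append("distribution missing or not an object")
    elif set(data["distribution"]) - dist_keys:
        errors.append("distribution keys must be snake_case sentiment buckets")

    if not isinstance(data.get("by_source"), dict):
        errors.append("by_source missing or not an object")
    elif set(data["by_source"]) - {"reddit", "x"}:
        errors.append("by_source keys must be lowercase 'reddit'/'x'")

    for product in data.get("product_sentiments", []):
        if product.get("overall") not in labels:
            errors.append(f"bad sentiment label: {product.get('overall')!r}")
    return errors
```

Running it over the example payload above should yield no errors; a payload with `veryPositive` keys or a `Reddit` source key would be flagged before the dashboard ever sees it.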
@@ -1,33 +1,19 @@
  ---
  name: social_media_trend
- description: Analyze existing Reddit/X raw data and generate `classified_trend.json` with strict `TrendData` output. Use for timeline, growth/decline, and discussion-peak analysis requests.
+ description: Analysis-only worker. Reads existing raw Reddit/X data and generates `classified_trend.json` with strict `TrendData` output. Use for timeline, growth/decline, and discussion-peak analysis requests.
  ---

  # Social Media Trend Skill

- This worker converts raw discussion timestamps into a trend timeline with key moments.
+ This worker converts raw discussion timestamps into a trend timeline with key moments. It performs **analysis only** — fetching is handled by the orchestrator via `social_media_fetch`.

- ## Required Inputs
+ ## Prerequisites

- Use existing files only:
+ The following files must already exist (produced by `social_media_fetch`):

- - `reddit_data.json`
- - `x_data.json`
+ - `reddit_data.json` and/or `x_data.json`

- At least one valid source file must exist.
-
- ## Command Execution Flow
-
- Use this sequence for trend analysis:
-
- 1. Fetch or refresh raw data (outside this worker):
-    - `sc-research research:deep "TOPIC"`
-    - Optional filters: `--source=reddit|x|both --from=YYYY-MM-DD --to=YYYY-MM-DD`
- 2. Run this `social_media_trend` worker to generate `classified_trend.json`.
- 3. Optional visualization:
-    - `sc-research visualize`
-
- If raw files are missing, stale, or mismatched for the requested topic/date range, run step 1 first.
+ At least one valid source file must be present. If both are missing, **stop and report failure** — do not attempt to fetch data.

  ## Step 1: Preflight and Date Parsing

@@ -84,6 +70,60 @@ Read `../social_media_schema/SKILL.md` and output strict `TrendData` JSON to:
 
  - `classified_trend.json`

+ ## Output Type Contract
+
+ Your output MUST match this exact shape. The dashboard detects trend data by checking for `date_range` (object) + `timeline` (array). Missing either field = broken tab.
+
+ ```json
+ {
+   "topic": "AI coding assistants",
+   "date_range": {
+     "from": "2025-01-01",
+     "to": "2025-01-31"
+   },
+   "granularity": "day",
+   "timeline": [
+     {
+       "period": "2025-01-01",
+       "post_count": 12,
+       "total_engagement": 3400,
+       "reddit_posts": 8,
+       "x_posts": 4
+     },
+     {
+       "period": "2025-01-02",
+       "post_count": 7,
+       "total_engagement": 1850,
+       "reddit_posts": 5,
+       "x_posts": 2
+     }
+   ],
+   "key_moments": [
+     {
+       "date": "2025-01-15",
+       "event": "Major product update announcement drove spike in discussion",
+       "significance": "high",
+       "url": "https://reddit.com/r/programming/comments/def456"
+     }
+   ]
+ }
+ ```
+
+ ### Period Format Rules
+
+ The `period` field in each timeline entry MUST match the selected granularity:
+
+ | Granularity | Period Format | Example        |
+ | ----------- | ------------- | -------------- |
+ | `"day"`     | `YYYY-MM-DD`  | `"2025-01-15"` |
+ | `"week"`    | `YYYY-Www`    | `"2025-W03"`   |
+ | `"month"`   | `YYYY-MM`     | `"2025-01"`    |
+
+ ### Enum Rules for Trend
+
+ - `significance` on KeyMoment: **lowercase** — `"high"`, `"medium"`, `"low"`. NEVER use `"High"`, `"Medium"`, `"Low"`.
+ - All numeric fields (`post_count`, `total_engagement`, `reddit_posts`, `x_posts`) must be numbers, NEVER strings or null.
+
  ## Final Validation Checklist

  - JSON parse succeeds.
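The period-format table added in this hunk translates directly into per-granularity patterns. The sketch below is illustrative (the regexes are my own rendering of the documented `YYYY-MM-DD` / `YYYY-Www` / `YYYY-MM` formats, and the helper is not package code):

```python
import re

# One pattern per granularity, following the Period Format Rules table.
PERIOD_RE = {
    "day": re.compile(r"^\d{4}-\d{2}-\d{2}$"),  # YYYY-MM-DD
    "week": re.compile(r"^\d{4}-W\d{2}$"),      # YYYY-Www
    "month": re.compile(r"^\d{4}-\d{2}$"),      # YYYY-MM
}


def check_timeline(granularity, timeline):
    """Return the periods that do not match the selected granularity."""
    pattern = PERIOD_RE[granularity]
    return [entry["period"] for entry in timeline
            if not pattern.match(entry["period"])]
```

A non-empty return value means the `timeline` mixes formats (e.g. a `YYYY-MM-DD` period in a `"week"`-granularity file) and the output should be corrected before it is written.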
@@ -1,15 +1,28 @@
  ---
  name: using_social_media_research
- description: This skill should be used when the user asks what people on Reddit/X think about a topic, including requests like "best/top recommendation", "compare options", "sentiment", "trend over time", "controversy/debate", "what is trending", "quick summary", or "full analysis". It acts as the entrypoint router and selects the correct fetch strategy and worker skill.
+ description: This skill should be used when the user asks what people on Reddit/X think about a topic, including requests like "best/top recommendation", "compare options", "sentiment", "trend over time", "controversy/debate", "what is trending", "quick summary", or "full analysis". It acts as the entrypoint router and delegates to the correct fetch and worker skills.
  ---

  # Using Social Media Research (Orchestrator)

- This skill is the router for the research pipeline. Its job is to choose the right path, fetch the right depth, run the right worker, and return the right output file.
+ This skill is the **single entrypoint** for the entire research pipeline.
+
+ ## Pipeline Flow (always follow this order)
+
+ ```
+ User Question
+   → Step 1: Resolve Intent (pick the right worker)
+   → Step 2: Fetch Data (delegate to social_media_fetch)
+   → Step 3: Classify (delegate to the chosen worker skill)
+   → Step 4: Present Results
+   → Step 5: Visualize (optional — run sc-research visualize)
+ ```
+
+ **No other skill or command should run fetch commands directly.** Only this orchestrator decides when and how to fetch.

  ## Auto-Trigger Cues

- Treat this as the entrypoint skill when user requests map to social-media research intent, for example:
+ Activate this skill when the user's request maps to social-media research intent:

  - "What do people think about X?"
  - "Best X according to Reddit?"
@@ -18,111 +31,116 @@ Treat this as the entrypoint skill when user requests map to social-media resear
  - "Is X trending recently?"
  - "What are people debating about X?"
  - "Give me a quick social media summary"
+ - "Full analysis of X"

- ## Core Routing Contract
-
- - For analysis intents, run exactly one worker by default.
- - Run multiple workers only when the user explicitly asks for multi-view output (for example: "full analysis", "all views", "run everything").
- - Use deep fetch for every worker analysis.
- - Use quick fetch for explicit quick-answer requests.
- - If intent is still ambiguous after routing rules, use rank as the default overview route.
- - Reuse existing raw data only if it still matches topic, source, and date range; otherwise refetch.
-
- ## Command Execution Summary
+ ## Core Contract

- After route selection, execute commands as follows:
+ - Run exactly **one** worker by default.
+ - Run **multiple** workers only when the user explicitly asks for multi-view output ("full analysis", "all views", "run everything").
+ - Always use **deep** fetch for worker analysis routes.
+ - Use **quick** fetch only for explicit quick-answer requests.
+ - If intent is ambiguous after all routing rules, default to **quick-answer mode** (direct text response, no classified file).
+ - **Delegate all fetching to `social_media_fetch`** — never run `sc-research research` commands directly from this skill or any worker.

- - Rank / Sentiment / Trend / Controversy routes:
-   - `sc-research research:deep "TOPIC" [--source=...] [--from=YYYY-MM-DD --to=YYYY-MM-DD]`
-   - then run the selected worker skill to produce the matching `classified_*.json`.
- - Discovery route:
-   - Broad weekly feed: `sc-research research:deep "DISCOVERY_WEEKLY" --mode=discovery [--source=...] [--from=... --to=...]`
-   - Topic-focused: `sc-research research:deep "TOPIC" --mode=discovery [--source=...] [--from=... --to=...]`
-   - then run `social_media_discovery`.
- - Quick answer route:
-   - `sc-research research "TOPIC" --source=reddit [--from=... --to=...]`
-   - then synthesize a short text answer (no classified file).
- - Optional dashboard view after any classified output:
-   - `sc-research visualize`
+ ---

  ## Step 1: Resolve Intent (Strict Precedence)

- Apply rules top-to-bottom:
+ Apply rules top-to-bottom. First match wins.

  1. **Explicit multi-analysis request**
-    - If user asks for "full analysis", "all views", or equivalent, run:
-      1. `social_media_rank`
-      2. `social_media_sentiment`
-      3. `social_media_trend`
-      4. `social_media_controversy`
-    - Include `social_media_discovery` only if the user also asks for emerging/viral topic discovery.
+    - Trigger: "full analysis", "all views", "run everything", or equivalent.
+    - Run all four workers in order: `social_media_rank` → `social_media_sentiment` → `social_media_trend` → `social_media_controversy`.
+    - Include `social_media_discovery` only if the user also asks about emerging/viral topics.
+
  2. **Explicit template request**
-    - If the user names a template ("sentiment", "trend", "controversy", "discovery", "rank"), that route wins.
+    - Trigger: the user names a specific analysis ("sentiment", "trend", "controversy", "discovery", "rank").
+    - Route directly to that single worker.
+
  3. **Explicit quick-answer request**
-    - If the user asks for a quick/brief answer, use quick-answer mode (no classified file).
+    - Trigger: "quick answer", "short summary", "brief".
+    - Use quick-answer mode (no classified file, direct text response).
+
  4. **Inferred strongest intent**
-    - Map by primary question intent (keywords table below).
- 5. **Fallback**
+    - Map by primary question keywords (see table below).

-    - If still ambiguous, default to `social_media_rank`.
+ 5. **Fallback**
+    - Default to **quick-answer mode** — synthesize a 3–5 sentence answer directly from the raw data. Do not produce a `classified_*.json` file.

- ## Step 2: Map Intent to Route
+ ### Intent → Worker Mapping

- | Route | Typical trigger phrases | Worker | Output |
+ | Route | Trigger phrases | Worker skill | Output file |
  | ------------ | ------------------------------------------------- | -------------------------- | ----------------------------- |
  | Rank | best, top, compare, recommendation, which one | `social_media_rank` | `classified_rank.json` |
  | Sentiment | feel, sentiment, opinion, positive/negative | `social_media_sentiment` | `classified_sentiment.json` |
  | Trend | timeline, over time, peak, growth, decline | `social_media_trend` | `classified_trend.json` |
  | Controversy | debate, divisive, disagreement, polarizing, vs | `social_media_controversy` | `classified_controversy.json` |
  | Discovery | trending topics, viral, discover themes, clusters | `social_media_discovery` | `classified_discovery.json` |
- | Quick Answer | quick answer, short summary, brief | none | direct text answer |
+ | Quick Answer | quick answer, short summary, brief | _(none)_ | direct text answer |
+
+ ### Source Preference Detection
+
+ | User wording | Source value |
+ | ------------------------------------- | ------------------------------------------------------- |
+ | "on Reddit", "subreddit", "Redditors" | `reddit` |
+ | "on X", "on Twitter", "tweets" | `x` |
+ | no explicit source | _(omit — runtime uses all sources with valid API keys)_ |
+
+ ## Step 2: Fetch Data (delegate to `social_media_fetch`)
+
+ Read the `social_media_fetch` skill and follow its instructions to fetch raw data. Provide it with:
+
+ - **topic**: the user's topic string
+ - **depth**: `deep` for all worker routes, `quick` for the quick-answer route
+ - **mode**: `discovery` for the discovery route, `research` for all others
+ - **source**: from the source preference above (omit if not specified)
+ - **date range**: `from`/`to` if the user provided dates
+
+ The fetch skill handles data freshness checks, CLI execution, and output validation. Do not duplicate that logic here.
+
+ After fetch completes, confirm that at least one raw data file (`reddit_data.json` / `x_data.json`) exists and is valid before proceeding.
+
+ ## Step 3: Classify (delegate to worker skill)

- ## Step 3: Detect Source and Fetch Depth
+ Based on the route chosen in Step 1:

- ### Source Detection
+ - **Single route**: Read the selected worker skill's instructions (e.g., `social_media_rank`) and follow them to produce the matching `classified_*.json` file.
+ - **Multi-route** (full analysis): Read and execute each worker skill in order. Each worker reads existing raw data and writes its own output file independently.
+ - **Quick answer**: Synthesize a 3–5 sentence answer directly from the raw data. Do not produce any `classified_*.json` file.

- | User wording | Source flag |
- | ------------------------------------- | ----------------------------------------------------------------------------- |
- | "on Reddit", "subreddit", "Redditors" | `--source=reddit` |
- | "on X", "on Twitter", "tweets" | `--source=x` |
- | no explicit source | no source flag (runtime uses all enabled sources based on available API keys) |
+ ### CRITICAL: Schema Enforcement

- ### Fetch Strategy
+ Before writing ANY `classified_*.json` file, you MUST:

- - **Worker routes (rank/sentiment/trend/controversy):**
-   - `sc-research research:deep "TOPIC" [--source=...] [--from=YYYY-MM-DD --to=YYYY-MM-DD]`
- - **Discovery route:**
-   - Broad weekly feed: `sc-research research:deep "DISCOVERY_WEEKLY" --mode=discovery [--source=...] [--from=... --to=...]`
-   - Topic-focused: `sc-research research:deep "TOPIC" --mode=discovery [--source=...] [--from=... --to=...]`
- - **Quick answer route:**
-   - `sc-research research "TOPIC" --source=reddit [--from=... --to=...]`
+ 1. Read the `social_media_schema` skill — it is the **single source of truth** for JSON shapes.
+ 2. Each worker skill contains an **Output Type Contract** section with a concrete JSON example — match it exactly.
+ 3. Use ONLY the enum values listed in the schema. Wrong casing = broken dashboard.

- ## Step 4: Data Freshness Check Before Fetch
+ The dashboard auto-detects each classified type by checking for specific field signatures. If required fields are missing or misnamed, the dashboard will **not show that tab at all**.

- Before refetching, check whether existing `reddit_data.json` / `x_data.json` can be reused:
+ ## Step 4: Present Results

- - Same or equivalent topic intent
- - Same requested source scope
- - Same requested date window (if provided)
- - Files are parseable and contain `items` arrays
+ - Confirm the expected `classified_*.json` file(s) exist and are parseable.
+ - Present results matching what was requested:
+   - Single route → single output summary
+   - Multi route → sectioned summary per route
+   - Quick answer → 3–5 sentence direct text

- If any check fails, run a fresh fetch.
+ ## Step 5: Visualize (optional)

- ## Step 5: Execute and Return
+ Suggest running `sc-research visualize` to view results in the web dashboard. The visualize command:

- - Run the selected worker skill.
- - Confirm expected `classified_*.json` file exists and is parseable.
- - Present only what was requested:
-   - Single-route request -> single output summary
-   - Multi-route request -> sectioned summary per route
-   - Quick answer -> 3-5 sentence direct answer, no classified output required
+ 1. Reads all `classified_*.json` files in the working directory.
+ 2. Validates each against the expected schema.
+ 3. Merges them into a single `data.json` for the dashboard.
+ 4. Opens the dashboard at `localhost:5173`.

- ## Ambiguity and Safety Rules
+ ## Safety Rules

- - If a request mixes intents without explicit multi-analysis wording, prefer one route and state what was chosen.
- - Never fabricate missing output files.
+ - If a request mixes intents without explicit multi-analysis wording, pick the single strongest route and state what was chosen.
+ - Never fabricate output files or data.
  - Never silently switch from deep to quick fetch for worker routes.
- - Keep each route independent: each `classified_*.json` can be produced without generating the others.
+ - Each `classified_*.json` file is independent; producing one never requires producing another.

  ## File Map
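The keyword table in the orchestrator's Step 1 can be approximated by a first-match router. This sketch is a simplification for illustration (route names and keywords come from the table; the matching logic, function name, and the explicit-request precedence rules 1–3 are deliberately omitted and would sit in front of it in a real implementation):

```python
# First-match routing table, in precedence order from the mapping table.
ROUTES = [
    ("rank", ("best", "top", "compare", "recommendation", "which one")),
    ("sentiment", ("feel", "sentiment", "opinion", "positive", "negative")),
    ("trend", ("timeline", "over time", "peak", "growth", "decline")),
    ("controversy", ("debate", "divisive", "disagreement", "polarizing")),
    ("discovery", ("trending topics", "viral", "discover themes", "clusters")),
]


def resolve_route(question):
    """Return the first matching route; fall back to quick-answer mode."""
    q = question.lower()
    for route, keywords in ROUTES:
        if any(keyword in q for keyword in keywords):
            return route
    return "quick_answer"
```

Note the fallback returns quick-answer mode, matching the 1.0.14 change away from defaulting to `social_media_rank`.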