@botlearn/twitter-intel 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2025 BotLearn
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,35 @@
+ # @botlearn/twitter-intel
+
+ > Twitter/X platform intelligence gathering — tracking KOLs, extracting trending topics, analyzing engagement signals, detecting bot activity, and synthesizing actionable insights for OpenClaw Agent
+
+ ## Installation
+
+ ```bash
+ # via npm
+ npm install @botlearn/twitter-intel
+
+ # via clawhub
+ clawhub install @botlearn/twitter-intel
+ ```
+
+ ## Category
+
+ Information Retrieval
+
+ ## Dependencies
+
+ None
+
+ ## Files
+
+ | File | Description |
+ |------|-------------|
+ | `manifest.json` | Skill metadata and configuration |
+ | `skill.md` | Role definition and activation rules |
+ | `knowledge/` | Domain knowledge documents |
+ | `strategies/` | Behavioral strategy definitions |
+ | `tests/` | Smoke and benchmark tests |
+
+ ## License
+
+ MIT
package/knowledge/anti-patterns.md ADDED
@@ -0,0 +1,86 @@
+ ---
+ domain: twitter-intel
+ topic: anti-patterns
+ priority: medium
+ ttl: 30d
+ ---
+
+ # Twitter Intelligence — Anti-Patterns
+
+ ## Engagement Blindness Anti-Patterns
+
+ ### 1. Equating Virality with Credibility
+ - **Problem**: Treating a tweet with 50K retweets as inherently more credible than one with 500 retweets
+ - **Why it fails**: Virality is driven by emotional resonance, controversy, and algorithmic amplification — not accuracy. Misinformation routinely outperforms corrections in engagement metrics
+ - **Fix**: Always evaluate the source account's credibility score independently of engagement numbers. A Nano-KOL with domain expertise and 200 likes may carry more intelligence value than a Mega-KOL hot take with 100K likes
+
+ ### 2. Like Count as Sentiment Proxy
+ - **Problem**: Interpreting high like counts as public agreement or approval
+ - **Why it fails**: Users like tweets for many reasons: humor, relatability, bookmarking, ironic appreciation. Likes on sarcastic or critical tweets are easily misread as endorsement of the surface message
+ - **Fix**: Use explicit textual sentiment analysis on the tweet content itself. Treat likes as an attention signal, not an opinion signal. Cross-reference with reply sentiment for ground truth
+
+ ### 3. Follower Count as Authority Measure
+ - **Problem**: Automatically ranking accounts with more followers as more authoritative
+ - **Why it fails**: Follower counts can be inflated through purchasing, follow-back schemes, or historical virality unrelated to current domain expertise. Many genuine domain experts have modest followings
+ - **Fix**: Use the composite credibility score from knowledge/best-practices.md, which weights listed count, original content ratio, and engagement quality alongside follower count
+
+ ### 4. Impression Count Overreliance
+ - **Problem**: Treating impression counts as a reliable measure of message reach and impact
+ - **Why it fails**: Impressions measure timeline appearances, not actual reading. Auto-scrolling, algorithmic insertion, and muted accounts all inflate impressions without genuine attention
+ - **Fix**: Use engagement rate (interactions / impressions) as the meaningful metric. Low engagement rate on high impressions suggests passive exposure, not active reception
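The engagement-rate fix above is a one-line ratio; a minimal sketch (the helper name and parameter choices are ours, not part of the skill package):

```python
def engagement_rate(likes: int, retweets: int, replies: int, quotes: int,
                    impressions: int) -> float:
    """Total interactions divided by impressions; 0.0 when impressions are unknown."""
    if impressions <= 0:
        return 0.0
    return (likes + retweets + replies + quotes) / impressions

# High impressions with few interactions suggests passive exposure, not reception
rate = engagement_rate(likes=1200, retweets=300, replies=40, quotes=10,
                       impressions=500_000)
print(f"{rate:.4f}")  # 0.0031
```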
+
+ ## Sarcasm & Tone Anti-Patterns
+
+ ### 5. Ignoring Sarcasm and Irony Markers
+ - **Problem**: Taking tweet text at face value without assessing tone, leading to inverted sentiment classification
+ - **Why it fails**: Twitter culture is heavily sarcastic. Tweets like "Oh great, another data breach, exactly what we needed" would be classified as positive without tone analysis
+ - **Fix**: Check for sarcasm indicators:
+   - Quotation marks around praise ("great" move)
+   - Hyperbolic positive language in negative contexts
+   - Eye-roll or clown emojis following a statement
+   - "Surely" / "definitely" / "totally" in contexts that suggest the opposite
+   - Thread context: sarcastic reply to a serious tweet
+   - Account history: does this user typically use irony?
+
+ ### 6. Context-Free Quote Tweet Analysis
+ - **Problem**: Analyzing the quoted tweet's text without considering the quoting user's commentary
+ - **Why it fails**: Quote tweets are frequently used to disagree, mock, or add critical commentary. The quoted content and the quoting commentary often have opposite sentiments
+ - **Fix**: Always analyze the quote tweet as a composite: original text + quoting commentary + any added media. The quoting user's intent is the primary signal, not the original content
+
+ ### 7. Thread Fragment Extraction
+ - **Problem**: Extracting a single tweet from a multi-tweet thread and analyzing it in isolation
+ - **Why it fails**: Thread authors build nuanced arguments across tweets. A single tweet may contain a devil's advocate position, a setup for a counterpoint, or a hypothetical — all of which are misrepresented when isolated
+ - **Fix**: Always retrieve and analyze the complete thread (using `conversation_id`). Attribute the overall thread thesis, not individual tweet fragments
+
+ ## Bot Amplification Anti-Patterns
+
+ ### 8. Treating All Engagement as Organic
+ - **Problem**: Including bot-generated retweets, likes, and replies in engagement metrics and trend calculations without filtering
+ - **Why it fails**: Bot networks can manufacture artificial trends, inflate engagement by 10-100x, and create a false impression of consensus. Reporting bot-amplified metrics as organic misleads intelligence consumers
+ - **Fix**: Run bot detection heuristics (knowledge/domain.md) on the engagement sources before reporting metrics. Report both raw and bot-filtered engagement numbers. Flag topics where >15% of engagement comes from suspected bots
+
+ ### 9. Coordinated Hashtag Campaigns as Organic Trends
+ - **Problem**: Reporting a trending hashtag as an organic trend when it is being driven by a coordinated campaign
+ - **Why it fails**: Organized groups (political campaigns, marketing agencies, state actors) routinely coordinate hashtag pushes. The hashtag may trend without genuine organic interest
+ - **Fix**: Check for coordination signals:
+   - Multiple accounts tweeting the same hashtag within the same 5-minute window with similar/identical text
+   - Accounts in the campaign share creation dates, follower patterns, or bio templates
+   - Hashtag volume drops sharply after the campaign window — organic trends have longer tails
+   - Compare the hashtag's geographic spread — coordinated campaigns often originate from a single region
+
+ ### 10. Astroturfing Misread as Grassroots Movement
+ - **Problem**: Presenting an astroturfing campaign (organized fake grassroots activity) as genuine public sentiment
+ - **Why it fails**: Sophisticated astroturfing uses aged accounts, varied content, and staggered timing to mimic organic activity. Without careful analysis, it passes initial filters
+ - **Fix**: Apply the KOL Cascade Analysis from knowledge/best-practices.md. Genuine grassroots movements show Micro-KOL-to-Macro-KOL cascade over days. Astroturfing shows simultaneous activation across account tiers with no prior Micro-KOL buildup
+
+ ## Analysis & Reporting Anti-Patterns
+
+ ### 11. Recency Bias in Trend Assessment
+ - **Problem**: Reporting the most recent tweets as the definitive position on a topic without historical baseline
+ - **Why it fails**: Twitter discourse oscillates rapidly. A negative reaction in the last 2 hours may follow days of positive sentiment, or vice versa. Snapshot analysis misrepresents the trajectory
+ - **Fix**: Always establish a baseline period (7-30 days) before assessing current sentiment. Report both the current state and the direction of change. Use the Sentiment Shift Detection technique from knowledge/best-practices.md
+
+ ### 12. Single-Platform Echo Chamber
+ - **Problem**: Treating Twitter as representative of broader public opinion
+ - **Why it fails**: Twitter's user base skews toward specific demographics (urban, media-engaged, tech-savvy). Topics that dominate Twitter may be irrelevant to the broader population, and vice versa. Twitter's algorithmic amplification creates feedback loops
+ - **Fix**: Always caveat intelligence with "on Twitter/X" — never extrapolate to general public sentiment. Recommend cross-platform validation when the user needs broader opinion data. Note the platform's demographic skew in the confidence assessment
package/knowledge/best-practices.md ADDED
@@ -0,0 +1,117 @@
+ ---
+ domain: twitter-intel
+ topic: signal-filtering-credibility-trend-detection
+ priority: high
+ ttl: 30d
+ ---
+
+ # Twitter Intelligence — Best Practices
+
+ ## Signal Filtering Methodology
+
+ ### 1. Multi-Layer Noise Reduction
+ Apply filters sequentially to reduce the tweet corpus to actionable signals:
+ 1. **Language filter** — Restrict to target language(s) using `lang:` operator
+ 2. **Bot filter** — Exclude accounts matching bot heuristics from knowledge/domain.md
+ 3. **Retweet deduplication** — Collapse retweet chains to original tweet; count retweets as amplification metric
+ 4. **Relevance filter** — Score tweet-to-topic semantic similarity; discard below 0.6 threshold
+ 5. **Authority filter** — Weight remaining tweets by source KOL tier (from knowledge/domain.md)
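The sequential layers above can be sketched as a chain of list filters. This is a minimal illustration under assumed inputs: the record fields (`lang`, `is_bot`, `is_retweet`, `relevance`) stand in for annotations produced upstream and are not Twitter API fields.

```python
# Hypothetical pre-annotated tweet records (field names are illustrative).
tweets = [
    {"lang": "en", "is_bot": False, "is_retweet": False, "relevance": 0.85},
    {"lang": "de", "is_bot": False, "is_retweet": False, "relevance": 0.90},
    {"lang": "en", "is_bot": True,  "is_retweet": False, "relevance": 0.80},
    {"lang": "en", "is_bot": False, "is_retweet": True,  "relevance": 0.70},
    {"lang": "en", "is_bot": False, "is_retweet": False, "relevance": 0.40},
]

def filter_corpus(tweets, lang="en", min_relevance=0.6):
    """Apply layers 1-4 in order; layer 5 (authority weighting) happens downstream."""
    signal = [t for t in tweets if t["lang"] == lang]                 # 1. language
    signal = [t for t in signal if not t["is_bot"]]                   # 2. bot filter
    signal = [t for t in signal if not t["is_retweet"]]               # 3. retweet dedup
    signal = [t for t in signal if t["relevance"] >= min_relevance]   # 4. relevance
    return signal

print(len(filter_corpus(tweets)))  # 1
```

Order matters for cost, not correctness: cheap filters (language) run first so expensive ones (relevance scoring) see a smaller corpus.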
+
+ ### 2. Signal-to-Noise Ratio Optimization
+ - Prefer `-is:retweet` for opinion extraction — retweets indicate amplification, not original thought
+ - Use `is:quote` to capture annotated discourse — quote tweets reveal disagreement, nuance, and counter-narratives
+ - Filter for `has:links` when seeking evidence-backed claims
+ - Apply `min_faves:` thresholds proportional to the topic's tweet volume:
+   - High-volume topic (>10K tweets/day): `min_faves:50`
+   - Medium-volume topic (1K-10K/day): `min_faves:10`
+   - Low-volume/niche topic (<1K/day): no minimum — all signals matter
+
+ ### 3. Temporal Window Selection
+ - **Breaking news**: Last 1-4 hours — prioritize recency over engagement
+ - **Trend monitoring**: Last 24-72 hours — balance recency with engagement signals
+ - **Sentiment baseline**: Last 7-30 days — establish norms before measuring shifts
+ - **Historical analysis**: Full archive — requires Academic/Pro API access
+
+ ## Credibility Assessment Framework
+
+ ### Account Credibility Score (0-100)
+
+ Calculate a composite credibility score for each account:
+
+ | Factor | Weight | Scoring |
+ |--------|--------|---------|
+ | Account age | 15% | <30 days: 0, 30d-1y: 40, 1-3y: 70, 3y+: 100 |
+ | Follower/following ratio | 15% | <0.5: 20, 0.5-2: 40, 2-10: 70, 10+: 100 |
+ | Listed count per 1K followers | 15% | <1: 20, 1-5: 50, 5-20: 80, 20+: 100 |
+ | Original content ratio | 15% | <20%: 20, 20-50%: 50, 50-80%: 80, 80%+: 100 |
+ | Verified status | 10% | None: 40, Blue: 60, Gold/Grey: 100 |
+ | Bio completeness | 10% | Empty: 0, Generic: 30, Professional with affiliations: 100 |
+ | Engagement quality | 10% | Bot-like patterns: 0, Normal: 60, High-quality replies: 100 |
+ | Posting consistency | 10% | Sporadic/burst: 30, Regular cadence: 70, Daily with variety: 100 |
+
+ ### Credibility Tiers
+
+ | Tier | Score | Treatment |
+ |------|-------|-----------|
+ | Authoritative | 80-100 | Primary source — cite directly, high confidence |
+ | Credible | 60-79 | Reliable source — cite with standard attribution |
+ | Provisional | 40-59 | Use with caution — require corroboration from higher tier |
+ | Suspect | 20-39 | Do not cite alone — only as supporting data with corroboration |
+ | Unreliable | 0-19 | Exclude from analysis — flag if part of coordinated campaign |
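A minimal sketch of the composite score and tier mapping, using the weights and cut-offs from the two tables above. The per-factor sub-scores passed in are assumed to be pre-computed on the 0-100 scales shown; the function and field names are ours.

```python
# Weights from the Account Credibility Score table (sum to 1.0).
WEIGHTS = {
    "account_age": 0.15, "follower_ratio": 0.15, "listed_per_1k": 0.15,
    "original_ratio": 0.15, "verified": 0.10, "bio": 0.10,
    "engagement_quality": 0.10, "posting_consistency": 0.10,
}

def credibility_score(factors: dict) -> float:
    """Weighted sum of per-factor sub-scores, each already on a 0-100 scale."""
    return sum(WEIGHTS[name] * factors[name] for name in WEIGHTS)

def credibility_tier(score: float) -> str:
    """Map a composite score to the tier table above."""
    if score >= 80: return "Authoritative"
    if score >= 60: return "Credible"
    if score >= 40: return "Provisional"
    if score >= 20: return "Suspect"
    return "Unreliable"

account = {
    "account_age": 100, "follower_ratio": 70, "listed_per_1k": 80,
    "original_ratio": 80, "verified": 60, "bio": 100,
    "engagement_quality": 60, "posting_consistency": 70,
}
score = credibility_score(account)
print(score, credibility_tier(score))  # 78.5 Credible
```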
+
+ ### Cross-Referencing Protocol
+ - **Single-source claim**: Never report. Require 2+ independent Credible-tier sources
+ - **Controversial claim**: Require 3+ independent sources across different KOL tiers
+ - **Statistical claim**: Require primary data source or link to verifiable dataset
+ - **Breaking event**: Allow single Authoritative-tier source with "unconfirmed" label; upgrade after corroboration
+
+ ## Trend Detection Techniques
+
+ ### 1. Volume Velocity Analysis
+ Track tweet volume over sliding windows to detect acceleration:
+ - Calculate **tweets per hour** for the target topic over the last 72 hours
+ - Compute **velocity** = (current hour volume) / (average hourly volume over past 72h)
+ - Thresholds:
+   - Velocity 1.5-3x: **Elevated interest** — monitor closely
+   - Velocity 3-10x: **Emerging trend** — begin analysis
+   - Velocity >10x: **Viral event** — prioritize for immediate briefing
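The velocity computation and thresholds above, sketched in Python. Bucketing tweets into hourly counts is assumed to happen upstream; the helper names are ours.

```python
def volume_velocity(hourly_counts: list[int]) -> float:
    """Current-hour volume divided by mean hourly volume; last element is the current hour."""
    baseline = sum(hourly_counts) / len(hourly_counts)
    return hourly_counts[-1] / baseline if baseline else 0.0

def classify_velocity(velocity: float) -> str:
    """Map velocity to the threshold bands above."""
    if velocity > 10: return "viral event"
    if velocity > 3: return "emerging trend"
    if velocity >= 1.5: return "elevated interest"
    return "baseline"

counts = [100] * 71 + [450]   # 72 hourly buckets; the current hour spikes
v = volume_velocity(counts)
print(round(v, 2), classify_velocity(v))  # 4.29 emerging trend
```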
+
+ ### 2. Hashtag Co-occurrence Mapping
+ - Track which hashtags appear together in tweets about the target topic
+ - New co-occurrences signal narrative evolution (e.g., a tech topic suddenly co-occurring with #regulation)
+ - Build a co-occurrence graph; detect new clusters forming over 24-48h windows
+
+ ### 3. KOL Cascade Analysis
+ - Track when a topic moves across KOL tiers:
+   - Nano/Micro-KOL discussion first → Macro-KOL pickup → Mega-KOL amplification = organic trend
+   - Mega-KOL first → immediate broad amplification without prior Micro-KOL discussion = top-down narrative push
+ - The **cascade direction** indicates whether a trend is grassroots or manufactured
+
+ ### 4. Sentiment Shift Detection
+ - Establish a rolling 7-day sentiment baseline for the target topic
+ - Detect statistically significant shifts (>2 standard deviations from baseline)
+ - Categorize shifts:
+   - **Gradual drift**: Sentiment changes over days — underlying narrative evolution
+   - **Sharp reversal**: Sentiment flips within hours — triggered by a specific event or revelation
+   - **Polarization spike**: Average sentiment stays similar but variance increases — growing disagreement
+
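The 2-standard-deviation shift test above can be sketched as a z-score check. This assumes daily sentiment scores (e.g., on a -1 to 1 scale) are already computed upstream; the function is illustrative, not part of the package.

```python
import statistics

def sentiment_shift(baseline: list[float], current: float, sigmas: float = 2.0):
    """Return (z-score, shifted?) for the current score against a rolling baseline."""
    mean = statistics.fmean(baseline)
    sd = statistics.stdev(baseline)          # sample standard deviation
    z = (current - mean) / sd if sd else 0.0
    return z, abs(z) > sigmas

# 7-day baseline of mildly positive daily sentiment, then a sharply negative day
baseline = [0.10, 0.15, 0.12, 0.08, 0.11, 0.14, 0.10]
z, shifted = sentiment_shift(baseline, current=-0.35)
print(round(z, 1), shifted)
```

A polarization spike would instead be caught by comparing the variance of the current window against the baseline variance, with the mean roughly unchanged.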
+ ### 5. Geographic & Demographic Spread
+ - Track when a topic crosses from one geographic region or language community to another
+ - Use `place_country:` and `lang:` operators to segment
+ - Cross-community spread is a strong indicator of a trend gaining mainstream traction
+
+ ## Intelligence Report Structure
+
+ ### Standard Briefing Format
+ 1. **Executive Summary** — 2-3 sentence overview: what happened, why it matters, confidence level
+ 2. **Key Findings** — Bulleted list of 3-5 main intelligence points, each with source attribution
+ 3. **KOL Positions** — Table of notable KOL statements with credibility tier and reach metrics
+ 4. **Trend Metrics** — Volume, velocity, sentiment data with time-series context
+ 5. **Bot/Inauthenticity Assessment** — Percentage of engagement flagged as inauthentic; impact on findings
+ 6. **Confidence Rating** — Overall confidence: High (80-100%), Medium (50-79%), Low (<50%) with justification
+ 7. **Recommended Actions** — What the user should do with this intelligence; monitoring suggestions
+
+ ### Confidence Rating Criteria
+ - **High (80-100%)**: 3+ Authoritative sources, consistent across KOL tiers, minimal bot contamination (<5%)
+ - **Medium (50-79%)**: 2+ Credible sources, mostly consistent with minor discrepancies, moderate bot presence (5-15%)
+ - **Low (<50%)**: Single source or conflicting accounts, high bot contamination (>15%), or rapidly evolving situation
package/knowledge/domain.md ADDED
@@ -0,0 +1,140 @@
+ ---
+ domain: twitter-intel
+ topic: twitter-api-engagement-metrics-kol-signals
+ priority: high
+ ttl: 30d
+ ---
+
+ # Twitter Intelligence — API, Engagement Metrics & KOL Signals
+
+ ## Twitter/X API v2 Endpoints
+
+ ### Tweet Search
+ - **Recent Search** — `GET /2/tweets/search/recent` — Tweets from last 7 days
+   - Query operators: `from:`, `to:`, `is:retweet`, `is:reply`, `is:quote`, `has:media`, `has:links`, `lang:`
+   - Max results per request: 100 (paginate with `next_token`)
+   - Rate limit: 450 requests / 15-min window (App-level), 180 (User-level)
+ - **Full-Archive Search** — `GET /2/tweets/search/all` — Complete tweet history (Academic/Pro access)
+   - Same query operators as Recent Search
+   - Rate limit: 300 requests / 15-min window
+ - **Filtered Stream** — `POST /2/tweets/search/stream/rules` + `GET /2/tweets/search/stream`
+   - Real-time streaming with up to 25 concurrent rules (Basic), 1000 (Pro)
+   - Supports all search operators as filter rules
+
+ ### User & Account Data
+ - **User Lookup** — `GET /2/users/by/username/:username`
+   - Fields: `id`, `name`, `username`, `created_at`, `description`, `public_metrics`, `verified`, `verified_type`
+ - **User Tweets Timeline** — `GET /2/users/:id/tweets` — Up to 3,200 most recent tweets
+ - **Followers/Following** — `GET /2/users/:id/followers`, `GET /2/users/:id/following`
+   - Rate limit: 15 requests / 15-min window
+
+ ### Engagement & Metrics
+ - **Tweet Metrics** (via `tweet.fields=public_metrics`):
+   - `retweet_count`, `reply_count`, `like_count`, `quote_count`, `bookmark_count`, `impression_count`
+ - **User Metrics** (via `user.fields=public_metrics`):
+   - `followers_count`, `following_count`, `tweet_count`, `listed_count`
+
+ ### Trend Data
+ - **Trending Topics** — `GET /1.1/trends/place.json?id={WOEID}`
+   - Returns top 50 trends for a location (WOEID-based)
+   - Includes `tweet_volume` (last 24h) when available
+   - Rate limit: 75 requests / 15-min window
+
+ ## Search Query Operators
+
+ ### Content Filters
+ - `"exact phrase"` — Match exact phrase in tweet text
+ - `keyword1 keyword2` — Both terms required (implicit AND)
+ - `keyword1 OR keyword2` — Match either term
+ - `-keyword` — Exclude tweets containing term
+ - `#hashtag` — Match hashtag
+ - `$TICKER` — Match cashtag (financial symbols)
+ - `url:"domain.com"` — Tweets containing links to domain
+
+ ### Account Filters
+ - `from:username` — Tweets authored by account
+ - `to:username` — Tweets directed at account (replies/mentions)
+ - `@username` — Tweets mentioning account
+ - `retweets_of:username` — Retweets of account's tweets
+
+ ### Tweet Type Filters
+ - `is:retweet` / `-is:retweet` — Include/exclude retweets
+ - `is:reply` / `-is:reply` — Include/exclude replies
+ - `is:quote` — Quote tweets only
+ - `is:verified` — From verified accounts only
+ - `has:media` — Tweets with images or video
+ - `has:links` — Tweets with URLs
+ - `has:hashtags` — Tweets with at least one hashtag
+
+ ### Temporal & Engagement Filters
+ - `since:2024-01-01` / `until:2024-12-31` — Date range (YYYY-MM-DD)
+ - `min_retweets:100` — Minimum retweet threshold
+ - `min_faves:500` — Minimum like threshold
+ - `min_replies:50` — Minimum reply threshold
+ - `lang:en` — Language filter (ISO 639-1)
+
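The operators above compose into a single query string. A hypothetical builder sketch: the function and its defaults are ours, and note that the `min_*` engagement operators are not available on all API access tiers, so their inclusion here is an assumption. Terms are parenthesized because a bare space (AND) binds tighter than `OR` in Twitter's query grammar.

```python
def build_query(keywords, lang="en", exclude_retweets=True, min_faves=None):
    """Compose a search query from keywords plus type, language, and engagement filters."""
    # Quote multi-word phrases; group the OR alternatives so AND terms apply to all
    terms = " OR ".join(f'"{k}"' if " " in k else k for k in keywords)
    parts = [f"({terms})"]
    if exclude_retweets:
        parts.append("-is:retweet")
    if lang:
        parts.append(f"lang:{lang}")
    if min_faves:
        parts.append(f"min_faves:{min_faves}")
    return " ".join(parts)

q = build_query(["open source llm", "#opensource"], min_faves=50)
print(q)  # ("open source llm" OR #opensource) -is:retweet lang:en min_faves:50
```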
+ ## KOL Identification Signals
+
+ ### Authority Indicators
+ | Signal | Description | Weight |
+ |--------|------------|--------|
+ | Verified badge | Blue checkmark (paid) or gold/grey (org/gov) | Medium — verification is now pay-to-play; gold/grey is stronger |
+ | Follower-to-following ratio | High ratio (>10:1) suggests organic authority | High |
+ | Listed count | Number of lists the account appears on — indicates curation by others | High |
+ | Account age | Older accounts with consistent activity are more credible | Medium |
+ | Bio & affiliations | Institutional affiliation, professional credentials | High |
+
+ ### Content Quality Indicators
+ | Signal | Description | Weight |
+ |--------|------------|--------|
+ | Original tweet ratio | Proportion of original tweets vs retweets — high ratio suggests thought leadership | High |
+ | Thread creation | Regularly publishes long-form threads — indicates deep analysis | Medium |
+ | Citation behavior | Links to primary sources, papers, data — indicates research rigor | High |
+ | Engagement quality | Reply-to-like ratio — high reply engagement suggests genuine discourse | Medium |
+ | Consistency | Posts on-topic regularly over months/years — not a flash account | High |
+
+ ### KOL Classification Tiers
+
+ | Tier | Followers | Characteristics | Intelligence Value |
+ |------|-----------|----------------|-------------------|
+ | Mega-KOL | 1M+ | Broad reach, high noise, opinion-shaping | Trend confirmation, narrative direction |
+ | Macro-KOL | 100K-1M | Industry visibility, media crossover | Sector sentiment, emerging narratives |
+ | Mid-KOL | 10K-100K | Domain specialists, practitioner voices | Technical signals, insider perspective |
+ | Micro-KOL | 1K-10K | Niche experts, early adopters | Early signals, ground-truth validation |
+ | Nano-KOL | <1K | Hyper-specialized, often undervalued | Deep domain knowledge, contrarian signals |
+
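The tier table above reduces to a simple threshold lookup; a sketch with the lower bound of each band assumed inclusive (the table itself leaves boundary ownership ambiguous):

```python
def kol_tier(followers: int) -> str:
    """Map a follower count to the KOL tier bands in the classification table."""
    if followers >= 1_000_000: return "Mega-KOL"
    if followers >= 100_000: return "Macro-KOL"
    if followers >= 10_000: return "Mid-KOL"
    if followers >= 1_000: return "Micro-KOL"
    return "Nano-KOL"

print(kol_tier(42_000))  # Mid-KOL
print(kol_tier(850))     # Nano-KOL
```

Follower count alone only picks the tier; intelligence value still depends on the credibility score from knowledge/best-practices.md.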
+ ## Engagement Metric Interpretation
+
+ ### Healthy Engagement Ratios
+ - **Like-to-impression ratio**: 1-3% is typical; >5% indicates high resonance
+ - **Retweet-to-like ratio**: 0.1-0.3 is normal; >0.5 suggests strong shareability or controversy
+ - **Reply-to-like ratio**: 0.01-0.05 is normal; >0.1 indicates contentious content ("ratio'd")
+ - **Quote-to-retweet ratio**: >0.3 suggests the tweet is being challenged or annotated
+
+ ### Anomalous Engagement Patterns
+ - **Spike without context**: Sudden engagement surge with no clear catalyst — possible bot amplification
+ - **Follower burst**: Account gains 10K+ followers in <24h without viral content — possible purchased followers
+ - **Uniform engagement timing**: Likes/retweets arriving at metronomic intervals — bot signature
+ - **Low-quality reply flood**: High reply count but replies are generic, single-emoji, or from low-follower accounts — astroturfing
+
+ ## Bot Detection Heuristics
+
+ ### Account-Level Signals
+ - Default profile image or AI-generated avatar
+ - Username contains long random number strings (e.g., `user83749201`)
+ - Account created in bulk pattern (similar creation dates, sequential naming)
+ - Bio is generic, copied, or empty; displays no domain expertise
+ - Following/follower ratio near 1:1 with high absolute numbers (follow-back bot)
+
+ ### Behavior-Level Signals
+ - Tweets at inhuman frequency (>100 tweets/hour)
+ - Posts 24/7 with no sleep pattern
+ - Content is entirely retweets or templated replies
+ - Engages across unrelated topics with no coherent interest graph
+ - Replies within seconds of target tweet publication
+
+ ### Network-Level Signals
+ - Cluster of accounts retweeting the same content within minutes
+ - Shared followers/following lists across multiple accounts
+ - Coordinated hashtag usage at the same timestamps
+ - Reply chains that form repetitive patterns (e.g., same phrase variations)
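A toy scorer over a few of the account-level signals listed above. The field names, point weights, and 0-100 scale are our assumptions for illustration; they are not part of the skill package, and a real detector would also need the behavior- and network-level signals.

```python
import re

def bot_signals(account: dict) -> int:
    """Score 0-100 from account-level heuristics; higher means more bot-like."""
    score = 0
    if account.get("default_avatar"):
        score += 25                                        # default/AI avatar
    if re.search(r"\d{6,}$", account.get("username", "")):
        score += 25                                        # long trailing digit run
    if not account.get("bio"):
        score += 20                                        # empty bio
    followers = account.get("followers", 0)
    following = account.get("following", 0)
    if followers > 1000 and following and 0.9 <= followers / following <= 1.1:
        score += 30                                        # follow-back pattern
    return min(score, 100)

suspect = {"username": "user83749201", "default_avatar": True, "bio": "",
           "followers": 5200, "following": 5100}
print(bot_signals(suspect))  # 100
```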
package/manifest.json ADDED
@@ -0,0 +1,26 @@
+ {
+   "name": "@botlearn/twitter-intel",
+   "version": "0.1.0",
+   "description": "Twitter/X platform intelligence gathering — tracking KOLs, extracting trending topics, analyzing engagement signals, detecting bot activity, and synthesizing actionable insights for OpenClaw Agent",
+   "category": "information-retrieval",
+   "author": "BotLearn",
+   "benchmarkDimension": "information-retrieval",
+   "expectedImprovement": 30,
+   "dependencies": {},
+   "compatibility": {
+     "openclaw": ">=0.5.0"
+   },
+   "files": {
+     "skill": "skill.md",
+     "knowledge": [
+       "knowledge/domain.md",
+       "knowledge/best-practices.md",
+       "knowledge/anti-patterns.md"
+     ],
+     "strategies": [
+       "strategies/main.md"
+     ],
+     "smokeTest": "tests/smoke.json",
+     "benchmark": "tests/benchmark.json"
+   }
+ }
package/package.json ADDED
@@ -0,0 +1,35 @@
+ {
+   "name": "@botlearn/twitter-intel",
+   "version": "0.1.0",
+   "description": "Twitter/X platform intelligence gathering — tracking KOLs, extracting trending topics, analyzing engagement signals, detecting bot activity, and synthesizing actionable insights for OpenClaw Agent",
+   "type": "module",
+   "main": "manifest.json",
+   "files": [
+     "manifest.json",
+     "skill.md",
+     "knowledge/",
+     "strategies/",
+     "tests/",
+     "README.md"
+   ],
+   "keywords": [
+     "botlearn",
+     "openclaw",
+     "skill",
+     "information-retrieval"
+   ],
+   "author": "BotLearn",
+   "license": "MIT",
+   "repository": {
+     "type": "git",
+     "url": "https://github.com/readai-team/botlearn-awesome-skills.git",
+     "directory": "packages/skills/twitter-intel"
+   },
+   "homepage": "https://github.com/readai-team/botlearn-awesome-skills/tree/main/packages/skills/twitter-intel",
+   "bugs": {
+     "url": "https://github.com/readai-team/botlearn-awesome-skills/issues"
+   },
+   "publishConfig": {
+     "access": "public"
+   }
+ }
package/skill.md ADDED
@@ -0,0 +1,48 @@
+ ---
+ name: twitter-intel
+ role: Twitter Intelligence Analyst
+ version: 1.0.0
+ triggers:
+   - "twitter"
+   - "tweet"
+   - "KOL"
+   - "trending"
+   - "X platform"
+   - "twitter intelligence"
+   - "twitter analysis"
+   - "influencer tracking"
+   - "twitter trends"
+   - "social listening"
+ ---
+
+ # Role
+
+ You are a Twitter Intelligence Analyst. When activated, you monitor the Twitter/X platform to track key opinion leaders (KOLs), extract trending narratives, analyze engagement signals, detect bot-driven amplification, and synthesize actionable intelligence reports from the platform's real-time discourse.
+
+ # Capabilities
+
+ 1. Curate and maintain watchlists of KOLs, domain experts, and emerging voices within specified topics or industries
+ 2. Filter high-signal tweets from noise using engagement metrics, account credibility scoring, and content relevance analysis
+ 3. Extract and classify opinions, stances, and sentiment from tweet threads, quote tweets, and reply chains
+ 4. Detect emerging trends, narrative shifts, and coordinated amplification campaigns before they reach mainstream awareness
+ 5. Synthesize multi-source Twitter intelligence into structured, time-stamped briefings with confidence ratings and source attribution
+ 6. Identify bot networks, astroturfing patterns, and inauthentic engagement to separate organic signal from manufactured consensus
+
+ # Constraints
+
+ 1. Never treat high engagement (likes, retweets) as a proxy for credibility — always verify the source account's authenticity and authority
+ 2. Never report on a trend based on a single tweet or a single account — require corroboration from 3+ independent sources
+ 3. Never ignore sarcasm, irony, or satire markers — always assess tweet tone before extracting sentiment or opinion
+ 4. Never present bot-amplified content as organic public opinion — always flag suspected inauthentic activity
+ 5. Always include temporal context (timestamps, trend velocity) — Twitter intelligence is time-sensitive by nature
+ 6. Always respect rate limits and platform terms of service when interfacing with Twitter/X API endpoints
+
+ # Activation
+
+ WHEN the user requests Twitter monitoring, KOL tracking, or trend analysis:
+ 1. Identify the target topic, industry, or set of accounts to monitor
+ 2. Execute source curation and signal filtering following strategies/main.md
+ 3. Apply knowledge/domain.md for API usage, metric interpretation, and KOL identification
+ 4. Evaluate findings using knowledge/best-practices.md for credibility and trend validation
+ 5. Check against knowledge/anti-patterns.md to avoid engagement blindness, sarcasm misreads, and bot amplification traps
+ 6. Output a structured intelligence briefing with confidence levels, source attribution, and temporal context
package/strategies/main.md ADDED
@@ -0,0 +1,113 @@
1
+ ---
2
+ strategy: twitter-intel
3
+ version: 1.0.0
4
+ steps: 6
5
+ ---
6
+
7
+ # Twitter Intelligence Strategy
8
+
9
+ ## Step 1: Source Curation
10
+ - Parse the user's request to identify: **target topic**, **accounts of interest**, **time window**, **geographic scope**, and **desired intelligence type** (trend analysis, KOL tracking, sentiment monitoring, or event coverage)
11
+ - IF the request is vague THEN ask one clarifying question to narrow the scope (e.g., "Which industry vertical?" or "Any specific accounts to prioritize?")
12
+ - Build initial source lists:
13
+ - **Topic keywords**: Extract 3-7 primary keywords and hashtags relevant to the topic
14
+ - **KOL watchlist**: Identify 5-15 accounts across KOL tiers (from knowledge/domain.md) that are relevant to the topic
15
+ - **Exclusion list**: Identify known bot accounts, spam hashtags, and noise sources to exclude
16
+ - Construct Twitter/X API search queries using operators from knowledge/domain.md:
17
+ - Primary query: topic keywords + `-is:retweet` + `lang:` filter
18
+ - KOL query: `from:` operators for watchlist accounts + topic keywords
19
+ - Trend query: hashtag co-occurrences + `min_faves:` threshold
20
+ - Set the temporal window based on intelligence type:
21
+ - Breaking event: last 1-4 hours
22
+ - Trend monitoring: last 24-72 hours
23
+ - Baseline establishment: last 7-30 days
24
+
25
+ ## Step 2: Signal Filtering
26
+ - Execute queries and collect raw tweet corpus
27
+ - Apply multi-layer noise reduction from knowledge/best-practices.md:
28
+ 1. Remove exact duplicates and retweet chains — collapse to original tweets with amplification counts
29
+ 2. Run bot detection heuristics from knowledge/domain.md on all accounts in the corpus
30
+ 3. Score each account using the Account Credibility Score framework (knowledge/best-practices.md)
31
+ 4. Filter tweets by semantic relevance to the target topic — discard below 0.6 similarity threshold
32
+ 5. Apply engagement thresholds proportional to topic volume
33
+ - Tag each remaining tweet with metadata:
34
+ - Source credibility tier (Authoritative / Credible / Provisional / Suspect)
35
+ - Bot probability score (0-100%)
36
+ - Relevance score (0.6-1.0)
37
+ - IF >15% of engagement comes from flagged bot accounts THEN mark the topic as an inauthenticity risk
38
+ - VERIFY against anti-pattern #8 (knowledge/anti-patterns.md): ensure bot engagement is separated from organic metrics
39
+
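The filter chain above can be sketched as a single pass over the corpus, assuming each tweet arrives as a dict with precomputed `bot_prob`, `relevance`, and `engagement` fields (field names and cutoffs are illustrative; the bot and relevance scores come from the heuristics in the knowledge files):

```python
def filter_corpus(tweets, bot_cutoff=0.8, relevance_cutoff=0.6):
    """Multi-layer noise reduction: dedupe, drop likely bots, enforce relevance."""
    seen, kept = set(), []
    for t in tweets:
        if t["text"] in seen:                  # layer 1: exact-duplicate collapse
            continue
        seen.add(t["text"])
        if t["bot_prob"] >= bot_cutoff:        # layer 2: bot-detection heuristics
            continue
        if t["relevance"] < relevance_cutoff:  # layer 4: semantic relevance floor
            continue
        kept.append(t)
    return kept

def bot_engagement_share(tweets, bot_cutoff=0.8):
    """Fraction of total engagement attributable to likely-bot accounts."""
    total = sum(t["engagement"] for t in tweets) or 1
    flagged = sum(t["engagement"] for t in tweets if t["bot_prob"] >= bot_cutoff)
    return flagged / total  # >0.15 triggers the inauthenticity-risk flag
```

Note that `bot_engagement_share` runs on the raw corpus, not the filtered one, so bot-driven engagement is measured before it is removed.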
40
+ ## Step 3: Opinion Extraction
41
+ - For each high-signal tweet (credibility tier >= Provisional, relevance >= 0.7):
42
+ - Extract the **stated position** — what claim or opinion does the tweet express?
43
+ - Classify **sentiment** — positive / negative / neutral / mixed toward the target topic
44
+ - Detect **tone markers** — check for sarcasm, irony, or satire per anti-pattern #5 (knowledge/anti-patterns.md)
45
+ - IF the tweet is a quote tweet THEN analyze as composite (original + commentary) per anti-pattern #6
46
+ - IF the tweet is part of a thread THEN retrieve full thread via `conversation_id` and analyze holistically per anti-pattern #7
47
+ - Group extracted opinions into **stance clusters**:
48
+ - Supporters (positive sentiment toward topic/entity)
49
+ - Critics (negative sentiment toward topic/entity)
50
+ - Neutral analysts (factual commentary without clear stance)
51
+ - Contrarians (minority positions that diverge from the dominant narrative)
52
+ - For each cluster, identify the **strongest voice** — the highest-credibility KOL representing that position
53
+ - Calculate **opinion distribution** — percentage of credible voices in each cluster
54
+
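The cluster bookkeeping above can be sketched as follows, assuming upstream sentiment and tone analysis has already attached a `stance` label and `credibility` score to each extracted opinion (dict keys and stance names are illustrative):

```python
from collections import Counter

STANCES = ("supporter", "critic", "neutral", "contrarian")

def opinion_distribution(opinions):
    """Percentage of credible voices in each stance cluster (Step 3)."""
    counts = Counter(o["stance"] for o in opinions)
    total = sum(counts.values()) or 1
    return {s: round(100 * counts[s] / total, 1) for s in STANCES}

def strongest_voice(opinions, stance):
    """Highest-credibility account representing a cluster, or None if empty."""
    cluster = [o for o in opinions if o["stance"] == stance]
    return max(cluster, key=lambda o: o["credibility"], default=None)
```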
55
+ ## Step 4: Trend Detection
56
+ - Apply Volume Velocity Analysis from knowledge/best-practices.md:
57
+ - Calculate tweets per hour over the analysis window
58
+ - Compute velocity relative to the 72-hour average
59
+ - IF velocity > 3x THEN classify as emerging trend
60
+ - IF velocity > 10x THEN classify as viral event — prioritize for immediate briefing
61
+ - Execute Hashtag Co-occurrence Mapping:
62
+ - Build co-occurrence graph for hashtags in the filtered corpus
63
+ - Identify new hashtag clusters forming in the last 24-48 hours
64
+ - Flag unexpected co-occurrences (e.g., tech topic + political hashtag = narrative hijacking risk)
65
+ - Perform KOL Cascade Analysis:
66
+ - Track the topic's chronological spread across KOL tiers
67
+ - IF Micro-KOL → Macro-KOL → Mega-KOL cascade THEN classify as organic trend
68
+ - IF Mega-KOL-first or simultaneous cross-tier activation THEN flag as top-down narrative push per anti-pattern #9
69
+ - Run Sentiment Shift Detection:
70
+ - Compare current sentiment against the 7-day rolling baseline
71
+ - IF shift > 2 standard deviations THEN classify the shift type (gradual drift, sharp reversal, or polarization spike)
72
+ - VERIFY against anti-pattern #10: check for astroturfing signatures in any detected trends
73
+
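The velocity thresholds and the 2-standard-deviation shift test can be sketched directly; the function signatures are assumptions for the sketch, with hourly tweet counts and daily sentiment scores supplied by the earlier steps:

```python
from statistics import mean, stdev

def classify_velocity(recent_hourly, baseline_hourly):
    """Compare current tweets-per-hour against the 72-hour average."""
    ratio = mean(recent_hourly) / (mean(baseline_hourly) or 1)
    if ratio > 10:
        return ratio, "viral-event"      # prioritize for immediate briefing
    if ratio > 3:
        return ratio, "emerging-trend"
    return ratio, "baseline"

def sentiment_shift(current, baseline_daily):
    """Z-score of current sentiment against the 7-day rolling baseline."""
    mu, sigma = mean(baseline_daily), stdev(baseline_daily)
    z = (current - mu) / sigma if sigma else 0.0
    return z, abs(z) > 2  # True => go on to classify the shift type
```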
74
+ ## Step 5: Insight Synthesis
75
+ - Merge findings from Steps 2-4 into a coherent intelligence picture:
76
+ - Connect opinion clusters (Step 3) with trend dynamics (Step 4)
77
+ - Identify **causal narratives**: what events or statements triggered the observed patterns?
78
+ - Assess **trajectory**: is the trend accelerating, plateauing, or declining?
79
+ - Evaluate **cross-topic spillover**: is the topic affecting or being affected by adjacent conversations?
80
+ - Generate confidence ratings using criteria from knowledge/best-practices.md:
81
+ - Count corroborating sources across credibility tiers
82
+ - Calculate bot contamination percentage
83
+ - Assess source diversity (single echo chamber vs. multi-community signal)
84
+ - Identify **intelligence gaps** — what questions remain unanswered? What data would increase confidence?
85
+ - VERIFY against anti-pattern #11: ensure the assessment includes baseline context, not just the latest snapshot
86
+ - VERIFY against anti-pattern #12: caveat all findings as Twitter/X-specific; do not extrapolate to general public opinion
87
+
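One way the three confidence criteria could be combined is a simple additive score; the weights and cutoffs below are assumptions for the sketch, not values from knowledge/best-practices.md:

```python
def confidence_rating(corroborating_tiers, bot_share, community_count):
    """Map Step 5 confidence criteria to a High/Medium/Low rating."""
    score = 0
    score += min(len(set(corroborating_tiers)), 3)  # corroboration across tiers
    score += 2 if bot_share < 0.15 else 0           # low bot contamination
    score += 1 if community_count >= 3 else 0       # multi-community signal
    if score >= 5:
        return "High"
    return "Medium" if score >= 3 else "Low"
```

Whatever the exact weights, the rating should degrade sharply when all corroboration comes from a single tier or a single echo chamber.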
88
+ ## Step 6: Intelligence Briefing Output
89
+ - Structure the output following the Standard Briefing Format from knowledge/best-practices.md:
90
+ 1. **Executive Summary** — 2-3 sentences: what the intelligence reveals, why it matters, confidence level
91
+ 2. **Key Findings** — 3-5 bulleted intelligence points, each with:
92
+ - The finding itself
93
+ - Source attribution (KOL name, credibility tier, tweet link)
94
+ - Corroboration count (how many independent sources support this)
95
+ 3. **KOL Positions** — Table of notable KOL statements:
96
+ - Account handle | Credibility tier | Follower count | Position summary | Tweet link
97
+ 4. **Trend Metrics** — Quantitative data:
98
+ - Tweet volume and velocity (with time-series if available)
99
+ - Sentiment distribution (% positive / negative / neutral)
100
+ - Top hashtags and co-occurrences
101
+ - Geographic spread (if detectable)
102
+ 5. **Bot & Inauthenticity Assessment** — Percentage of flagged engagement; impact on findings
103
+ 6. **Confidence Rating** — High / Medium / Low with explicit justification
104
+ 7. **Recommended Actions** — What the user should do next:
105
+ - Accounts to watch for follow-up developments
106
+ - Suggested monitoring frequency
107
+ - Cross-platform validation recommendations if confidence is Medium or Low
108
+ - SELF-CHECK before delivery:
109
+ - Are all claims backed by attributed sources?
110
+ - Have bot-contaminated metrics been flagged?
111
+ - Is temporal context (timestamps, trend direction) included throughout?
112
+ - Does the briefing avoid the 12 anti-patterns from knowledge/anti-patterns.md?
113
+ - IF any check fails THEN loop back to the relevant step and re-analyze
@@ -0,0 +1,476 @@
1
+ {
2
+ "version": "0.0.1",
3
+ "dimension": "information-retrieval",
4
+ "tasks": [
5
+ {
6
+ "id": "bench-easy-01",
7
+ "difficulty": "easy",
8
+ "description": "Identify top KOLs for a well-known topic",
9
+ "input": "List the top 5 most influential accounts on Twitter/X discussing artificial intelligence. For each account, provide their handle, follower count, KOL tier, and a brief description of why they are influential in the AI space.",
10
+ "rubric": [
11
+ {
12
+ "criterion": "KOL Identification Accuracy",
13
+ "weight": 0.4,
14
+ "scoring": {
15
+ "5": "Identifies 5 genuinely influential AI accounts across multiple tiers (researchers, industry leaders, commentators); accounts are widely recognized in the AI community",
16
+ "3": "Identifies 3-4 relevant accounts but selection is skewed toward one category (e.g., all corporate accounts)",
17
+ "1": "Identifies 1-2 relevant accounts mixed with irrelevant ones",
18
+ "0": "No relevant AI KOLs identified"
19
+ }
20
+ },
21
+ {
22
+ "criterion": "Metadata Completeness",
23
+ "weight": 0.3,
24
+ "scoring": {
25
+ "5": "Each account includes: handle, follower count, KOL tier classification, credibility indicators, and a specific reason for their influence",
26
+ "3": "Handles and follower counts provided but missing tier classification or influence rationale",
27
+ "1": "Only handles listed without supporting data",
28
+ "0": "No metadata provided"
29
+ }
30
+ },
31
+ {
32
+ "criterion": "Source Diversity",
33
+ "weight": 0.3,
34
+ "scoring": {
35
+ "5": "Accounts span multiple KOL tiers (Mega, Macro, Mid) and roles (researcher, founder, journalist, practitioner)",
36
+ "3": "Accounts from 2 tiers or roles",
37
+ "1": "All accounts from same tier or role",
38
+ "0": "No diversity consideration"
39
+ }
40
+ }
41
+ ],
42
+ "expectedScoreWithout": 35,
43
+ "expectedScoreWith": 75
44
+ },
45
+ {
46
+ "id": "bench-easy-02",
47
+ "difficulty": "easy",
48
+ "description": "Extract trending hashtags for a specific topic",
49
+ "input": "What are the top trending hashtags on Twitter/X related to cryptocurrency in the past 24 hours? Provide the hashtag, estimated tweet volume, and a one-sentence description of what each hashtag is about.",
50
+ "rubric": [
51
+ {
52
+ "criterion": "Hashtag Relevance",
53
+ "weight": 0.4,
54
+ "scoring": {
55
+ "5": "Returns 5+ genuinely trending crypto-related hashtags that are currently active; all are directly relevant to cryptocurrency topics",
56
+ "3": "Returns 3-4 relevant hashtags but includes 1-2 generic or outdated ones",
57
+ "1": "Returns mostly generic hashtags (#crypto, #bitcoin) without trending context",
58
+ "0": "Irrelevant hashtags"
59
+ }
60
+ },
61
+ {
62
+ "criterion": "Volume & Context Data",
63
+ "weight": 0.3,
64
+ "scoring": {
65
+ "5": "Each hashtag includes estimated tweet volume, what it refers to, and why it is trending now",
66
+ "3": "Volume estimates provided but context is vague",
67
+ "1": "Hashtags listed without volume or context",
68
+ "0": "No supporting data"
69
+ }
70
+ },
71
+ {
72
+ "criterion": "Temporal Accuracy",
73
+ "weight": 0.3,
74
+ "scoring": {
75
+ "5": "Hashtags are clearly from the last 24 hours with timestamps or recency indicators; distinguishes current trends from evergreen tags",
76
+ "3": "Some recency indication but not clearly time-bounded",
77
+ "1": "No temporal context — could be from any time period",
78
+ "0": "Returns outdated trends"
79
+ }
80
+ }
81
+ ],
82
+ "expectedScoreWithout": 35,
83
+ "expectedScoreWith": 80
84
+ },
85
+ {
86
+ "id": "bench-easy-03",
87
+ "difficulty": "easy",
88
+ "description": "Basic sentiment check on a public figure",
89
+ "input": "What is the general sentiment on Twitter/X toward Elon Musk this week? Provide a simple breakdown: percentage positive, negative, and neutral, with 2-3 example tweets for each category.",
90
+ "rubric": [
91
+ {
92
+ "criterion": "Sentiment Classification",
93
+ "weight": 0.4,
94
+ "scoring": {
95
+ "5": "Provides clear percentage breakdown with reasonable distribution; acknowledges mixed/polarized sentiment; methodology is explained",
96
+ "3": "Provides a general sentiment direction (mostly positive/negative) but percentages are vague or unsupported",
97
+ "1": "Single-word sentiment (e.g., 'negative') without nuance or data",
98
+ "0": "No sentiment analysis"
99
+ }
100
+ },
101
+ {
102
+ "criterion": "Example Quality",
103
+ "weight": 0.3,
104
+ "scoring": {
105
+ "5": "2-3 examples per category from credible accounts; examples clearly represent the stated sentiment; tone is correctly interpreted (sarcasm detected)",
106
+ "3": "1-2 examples per category but some are ambiguous or from low-credibility accounts",
107
+ "1": "Generic examples that do not clearly demonstrate the sentiment category",
108
+ "0": "No examples provided"
109
+ }
110
+ },
111
+ {
112
+ "criterion": "Temporal Framing",
113
+ "weight": 0.3,
114
+ "scoring": {
115
+ "5": "Analysis is clearly bounded to the current week; notes any specific events driving sentiment shifts",
116
+ "3": "Roughly time-bounded but does not connect sentiment to specific events",
117
+ "1": "No clear time boundary; could be about any period",
118
+ "0": "Clearly outdated analysis"
119
+ }
120
+ }
121
+ ],
122
+ "expectedScoreWithout": 40,
123
+ "expectedScoreWith": 80
124
+ },
125
+ {
126
+ "id": "bench-med-01",
127
+ "difficulty": "medium",
128
+ "description": "Track narrative evolution around a policy event",
129
+ "input": "Analyze how the Twitter/X conversation around US data privacy legislation has evolved over the past month. Identify the key narrative phases, which KOLs drove each phase, and how sentiment shifted at each stage. Flag any coordinated campaign activity.",
130
+ "rubric": [
131
+ {
132
+ "criterion": "Narrative Phase Identification",
133
+ "weight": 0.3,
134
+ "scoring": {
135
+ "5": "Identifies 3+ distinct narrative phases with clear temporal boundaries and triggering events; shows how the conversation evolved from one phase to the next",
136
+ "3": "Identifies 2 phases but boundaries are unclear or triggering events are not specified",
137
+ "1": "Treats the entire month as a single undifferentiated conversation",
138
+ "0": "No temporal analysis"
139
+ }
140
+ },
141
+ {
142
+ "criterion": "KOL Attribution",
143
+ "weight": 0.25,
144
+ "scoring": {
145
+ "5": "Attributes specific KOLs to each narrative phase; shows cascade patterns (who spoke first, who amplified); credibility tiers assigned",
146
+ "3": "Names some KOLs but does not connect them to specific phases or show cascade dynamics",
147
+ "1": "Generic mention of 'influencers' without specific attribution",
148
+ "0": "No KOL analysis"
149
+ }
150
+ },
151
+ {
152
+ "criterion": "Sentiment Trajectory",
153
+ "weight": 0.25,
154
+ "scoring": {
155
+ "5": "Maps sentiment across the full month with shifts clearly linked to events; distinguishes gradual drift from sharp reversals; reports baseline and deviations",
156
+ "3": "Reports overall sentiment change but without phase-by-phase granularity",
157
+ "1": "Single sentiment label for the whole period",
158
+ "0": "No sentiment analysis"
159
+ }
160
+ },
161
+ {
162
+ "criterion": "Coordination Detection",
163
+ "weight": 0.2,
164
+ "scoring": {
165
+ "5": "Explicitly checks for coordinated campaigns; reports findings (whether coordination was detected or not) with evidence; distinguishes organic from manufactured activity",
166
+ "3": "Mentions coordination possibility but without systematic detection or evidence",
167
+ "1": "Does not address coordination at all",
168
+ "0": "Presents potentially coordinated activity as organic"
169
+ }
170
+ }
171
+ ],
172
+ "expectedScoreWithout": 25,
173
+ "expectedScoreWith": 65
174
+ },
175
+ {
176
+ "id": "bench-med-02",
177
+ "difficulty": "medium",
178
+ "description": "Competitive intelligence via Twitter KOL monitoring",
179
+ "input": "I'm launching a fintech startup focused on embedded payments. Monitor Twitter/X to identify: (1) the top 10 KOLs in the embedded finance space, (2) what they're saying about current market trends, and (3) any emerging competitors or partnerships being discussed. Provide a competitive intelligence briefing.",
180
+ "rubric": [
181
+ {
182
+ "criterion": "KOL Discovery",
183
+ "weight": 0.3,
184
+ "scoring": {
185
+ "5": "Identifies 10 relevant KOLs across tiers (industry analysts, fintech founders, VC investors, developers); includes credibility scores and relevance justification for each",
186
+ "3": "Identifies 5-7 relevant KOLs but limited to one or two categories",
187
+ "1": "Identifies fewer than 5 KOLs or includes irrelevant accounts",
188
+ "0": "No KOL discovery"
189
+ }
190
+ },
191
+ {
192
+ "criterion": "Market Trend Extraction",
193
+ "weight": 0.3,
194
+ "scoring": {
195
+ "5": "Extracts 3+ specific market trends with KOL source attribution; trends are actionable and relevant to embedded payments; includes supporting tweet evidence",
196
+ "3": "Identifies general fintech trends but not specific to embedded payments or lacking attribution",
197
+ "1": "Vague trend statements without evidence",
198
+ "0": "No trend analysis"
199
+ }
200
+ },
201
+ {
202
+ "criterion": "Competitive Signal Detection",
203
+ "weight": 0.25,
204
+ "scoring": {
205
+ "5": "Identifies specific competitors or partnerships being discussed by KOLs; provides context on competitive positioning; flags early signals of market moves",
206
+ "3": "Mentions some competitors but without context or KOL attribution",
207
+ "1": "Generic competitive landscape without Twitter-specific intelligence",
208
+ "0": "No competitive signals"
209
+ }
210
+ },
211
+ {
212
+ "criterion": "Briefing Structure",
213
+ "weight": 0.15,
214
+ "scoring": {
215
+ "5": "Follows structured briefing format with executive summary, KOL table, trend analysis, competitive signals, and recommended monitoring actions",
216
+ "3": "Partially structured with some sections missing",
217
+ "1": "Unstructured narrative",
218
+ "0": "No organization"
219
+ }
220
+ }
221
+ ],
222
+ "expectedScoreWithout": 25,
223
+ "expectedScoreWith": 65
224
+ },
225
+ {
226
+ "id": "bench-med-03",
227
+ "difficulty": "medium",
228
+ "description": "Detect and analyze bot activity around a controversial topic",
229
+ "input": "Investigate whether bot networks are amplifying specific narratives about genetically modified organisms (GMOs) on Twitter/X. Identify suspected bot accounts, quantify their impact on the conversation's engagement metrics, and determine which narratives they are pushing. Separate organic KOL opinions from bot-amplified messaging.",
230
+ "rubric": [
231
+ {
232
+ "criterion": "Bot Detection Methodology",
233
+ "weight": 0.3,
234
+ "scoring": {
235
+ "5": "Applies systematic bot detection using account-level, behavior-level, and network-level signals from domain knowledge; explains the heuristics used and their confidence",
236
+ "3": "Applies some bot detection but only at one level (e.g., account-level only) without network analysis",
237
+ "1": "Mentions bots but does not apply systematic detection",
238
+ "0": "No bot detection"
239
+ }
240
+ },
241
+ {
242
+ "criterion": "Impact Quantification",
243
+ "weight": 0.25,
244
+ "scoring": {
245
+ "5": "Reports bot contamination percentage; provides both raw and bot-filtered engagement metrics; quantifies how much bot activity inflates specific narrative visibility",
246
+ "3": "Reports presence of bots but does not quantify their impact on metrics",
247
+ "1": "Vague mention of bot activity without quantification",
248
+ "0": "No quantification"
249
+ }
250
+ },
251
+ {
252
+ "criterion": "Narrative Separation",
253
+ "weight": 0.25,
254
+ "scoring": {
255
+ "5": "Clearly separates organic KOL opinions from bot-amplified narratives; shows which narratives are genuinely held by credible voices vs. artificially boosted",
256
+ "3": "Identifies different narratives but does not clearly separate organic from amplified",
257
+ "1": "Treats all narratives equally without bot-organic distinction",
258
+ "0": "No narrative analysis"
259
+ }
260
+ },
261
+ {
262
+ "criterion": "Evidence Presentation",
263
+ "weight": 0.2,
264
+ "scoring": {
265
+ "5": "Provides specific examples of suspected bot accounts with evidence (account age, posting patterns, network connections); shows coordination patterns",
266
+ "3": "Mentions suspected accounts but evidence is thin or anecdotal",
267
+ "1": "No specific examples or evidence",
268
+ "0": "Unsupported claims"
269
+ }
270
+ }
271
+ ],
272
+ "expectedScoreWithout": 20,
273
+ "expectedScoreWith": 60
274
+ },
275
+ {
276
+ "id": "bench-med-04",
277
+ "difficulty": "medium",
278
+ "description": "Multi-language Twitter intelligence gathering",
279
+ "input": "Monitor Twitter/X discourse about renewable energy policy in both English and Spanish-speaking communities. Compare the dominant narratives, key KOLs, and sentiment between the two language communities. Identify any cross-language narrative transfer.",
280
+ "rubric": [
281
+ {
282
+ "criterion": "Multi-Language Coverage",
283
+ "weight": 0.3,
284
+ "scoring": {
285
+ "5": "Constructs separate queries for English and Spanish using lang: filters; identifies KOLs in both communities; provides analysis for each language independently",
286
+ "3": "Covers both languages but analysis is significantly deeper for one language",
287
+ "1": "Only covers one language effectively",
288
+ "0": "Single-language analysis despite multi-language request"
289
+ }
290
+ },
291
+ {
292
+ "criterion": "Cross-Community Comparison",
293
+ "weight": 0.3,
294
+ "scoring": {
295
+ "5": "Provides structured comparison of narratives, KOLs, and sentiment between communities; identifies shared vs. unique concerns; notes cultural/political context differences",
296
+ "3": "Compares sentiment but does not compare narratives or KOLs in depth",
297
+ "1": "Presents each community separately without comparison",
298
+ "0": "No comparison"
299
+ }
300
+ },
301
+ {
302
+ "criterion": "Cross-Language Transfer Detection",
303
+ "weight": 0.2,
304
+ "scoring": {
305
+ "5": "Identifies specific narratives or talking points that transferred from one language community to the other; traces the transfer timeline and amplifiers",
306
+ "3": "Notes some overlap between communities but does not trace transfer dynamics",
307
+ "1": "Does not address cross-language transfer",
308
+ "0": "Ignores the cross-language aspect entirely"
309
+ }
310
+ },
311
+ {
312
+ "criterion": "Output Quality",
313
+ "weight": 0.2,
314
+ "scoring": {
315
+ "5": "Structured briefing with side-by-side comparison tables; confidence ratings for each community; recommended monitoring for both languages",
316
+ "3": "Readable output but lacks comparative structure",
317
+ "1": "Unstructured mixed narrative",
318
+ "0": "Disorganized output"
319
+ }
320
+ }
321
+ ],
322
+ "expectedScoreWithout": 20,
323
+ "expectedScoreWith": 60
324
+ },
325
+ {
326
+ "id": "bench-hard-01",
327
+ "difficulty": "hard",
328
+ "description": "Real-time crisis monitoring and misinformation detection",
329
+ "input": "A major cybersecurity breach has just been reported at a Fortune 500 company. Monitor Twitter/X in real-time to: (1) separate verified facts from speculation and misinformation, (2) identify which KOLs have credible insider knowledge vs. those amplifying rumors, (3) detect any coordinated disinformation campaigns attempting to manipulate the narrative, and (4) provide a real-time intelligence briefing that distinguishes confirmed facts from unverified claims with confidence levels.",
330
+ "rubric": [
331
+ {
332
+ "criterion": "Fact vs. Speculation Separation",
333
+ "weight": 0.3,
334
+ "scoring": {
335
+ "5": "Explicitly categorizes each claim as confirmed/unconfirmed/speculation with evidence basis; traces claims to primary sources; flags contradictions between accounts; assigns per-claim confidence levels",
336
+ "3": "Separates some facts from speculation but categorization is inconsistent or missing evidence basis",
337
+ "1": "Presents all information with similar authority regardless of verification status",
338
+ "0": "No fact-checking or verification"
339
+ }
340
+ },
341
+ {
342
+ "criterion": "KOL Credibility Assessment",
343
+ "weight": 0.25,
344
+ "scoring": {
345
+ "5": "Assesses each KOL's likely access to insider information based on professional background, prior accuracy, and institutional affiliation; distinguishes cybersecurity experts from general tech commentators; applies credibility scoring framework",
346
+ "3": "Identifies some credible sources but does not systematically assess expertise relevance",
347
+ "1": "Treats all KOLs equally regardless of domain expertise",
348
+ "0": "No credibility assessment"
349
+ }
350
+ },
351
+ {
352
+ "criterion": "Disinformation Detection",
353
+ "weight": 0.25,
354
+ "scoring": {
355
+ "5": "Systematically checks for coordinated campaigns: synchronized posting, bot amplification, narrative manipulation; reports findings with specific evidence; quantifies disinformation contamination of the overall conversation",
356
+ "3": "Notes potential disinformation but without systematic detection or quantification",
357
+ "1": "Does not address disinformation possibility",
358
+ "0": "Presents disinformation as credible intelligence"
359
+ }
360
+ },
361
+ {
362
+ "criterion": "Real-Time Briefing Quality",
363
+ "weight": 0.2,
364
+ "scoring": {
365
+ "5": "Structured briefing with clear timestamps, fact/speculation labels on each item, evolving confidence ratings, and recommended next monitoring actions; suitable for decision-makers",
366
+ "3": "Useful briefing but lacks timestamps or evolving confidence tracking",
367
+ "1": "Static summary without real-time structure",
368
+ "0": "Unusable output"
369
+ }
370
+ }
371
+ ],
372
+ "expectedScoreWithout": 15,
373
+ "expectedScoreWith": 55
374
+ },
375
+ {
376
+ "id": "bench-hard-02",
377
+ "difficulty": "hard",
378
+ "description": "Geopolitical narrative tracking across multiple stakeholder groups",
379
+ "input": "Track the Twitter/X discourse around US-China technology competition, specifically regarding semiconductor export controls. Map the conversation across four stakeholder groups: (1) US policy hawks, (2) industry/business voices, (3) Chinese state-affiliated accounts, and (4) neutral academic analysts. For each group, identify their narrative framing, key talking points, and how they engage with opposing narratives. Detect any state-sponsored information operations.",
380
+ "rubric": [
381
+ {
382
+ "criterion": "Stakeholder Group Mapping",
383
+ "weight": 0.25,
384
+ "scoring": {
385
+ "5": "Successfully identifies and separates accounts into all 4 stakeholder groups with clear classification criteria; accounts are correctly attributed; includes 3+ KOLs per group",
386
+ "3": "Maps 2-3 groups but classification is imprecise or one group is missing",
387
+ "1": "Treats the conversation as monolithic without stakeholder segmentation",
388
+ "0": "No stakeholder mapping"
389
+ }
390
+ },
391
+ {
392
+ "criterion": "Narrative Framing Analysis",
393
+ "weight": 0.25,
394
+ "scoring": {
395
+ "5": "For each group: identifies the core narrative frame (e.g., national security vs. free trade vs. sovereign rights), key talking points, and rhetorical strategies; shows how frames differ and where they conflict",
396
+ "3": "Identifies general positions but does not analyze framing strategies or rhetorical differences",
397
+ "1": "Surface-level position summary without framing analysis",
398
+ "0": "No narrative analysis"
399
+ }
400
+ },
401
+ {
402
+ "criterion": "Cross-Group Engagement Analysis",
403
+ "weight": 0.25,
404
+ "scoring": {
405
+ "5": "Maps how groups engage with each other: quote tweets, reply patterns, counter-narratives; identifies which groups are talking past each other vs. directly engaging; shows information flow between groups",
406
+ "3": "Notes some cross-group interaction but analysis is shallow",
407
+ "1": "Analyzes each group in isolation without cross-group dynamics",
408
+ "0": "No engagement analysis"
409
+ }
410
+ },
411
+ {
412
+ "criterion": "State-Sponsored Detection",
413
+ "weight": 0.25,
414
+ "scoring": {
415
+ "5": "Applies systematic detection for state-affiliated accounts (official labels, behavior patterns, coordination signals); distinguishes organic vs. state-directed messaging; provides evidence for assessments; quantifies state-sponsored content share",
416
+ "3": "Identifies some state-affiliated accounts but detection is not systematic",
417
+ "1": "Mentions state involvement without evidence",
418
+ "0": "Ignores state-sponsored operations entirely"
419
+ }
420
+ }
421
+ ],
422
+ "expectedScoreWithout": 15,
423
+ "expectedScoreWith": 55
424
+ },
425
+ {
426
+ "id": "bench-hard-03",
427
+ "difficulty": "hard",
428
+ "description": "Predictive intelligence from early Twitter signals",
429
+ "input": "Based on current Twitter/X signals, identify 3 emerging technology trends that have not yet reached mainstream media coverage but are gaining traction among Micro-KOLs and Mid-KOLs in the tech space. For each trend, provide: the evidence trail (earliest tweets, KOL cascade progression), current velocity metrics, predicted timeline to mainstream awareness, and confidence level. Explain your methodology for distinguishing genuine early signals from noise.",
430
+ "rubric": [
431
+ {
432
+ "criterion": "Early Signal Detection",
433
+ "weight": 0.3,
434
+ "scoring": {
435
+ "5": "Identifies 3 genuine emerging trends with clear evidence they originated in Micro/Mid-KOL circles; provides earliest tweet timestamps and shows the signal before mainstream pickup; trends are plausible and specific",
436
+ "3": "Identifies 2 trends but evidence trail is incomplete or one trend is already mainstream",
437
+ "1": "Identifies already-mainstream topics or vague trend categories",
438
+ "0": "No genuine early signal detection"
439
+ }
440
+ },
441
+ {
442
+ "criterion": "Evidence Quality & Cascade Analysis",
443
+ "weight": 0.25,
444
+ "scoring": {
445
+ "5": "For each trend: provides specific tweet evidence, shows KOL cascade progression with timestamps, maps the spread across KOL tiers; evidence is verifiable and attributed",
446
+ "3": "Some evidence provided but cascade analysis is incomplete or evidence is anecdotal",
447
+ "1": "Claims without supporting evidence or timeline",
448
+ "0": "No evidence"
449
+ }
450
+ },
451
+ {
452
+ "criterion": "Predictive Assessment",
453
+ "weight": 0.25,
454
+ "scoring": {
455
+ "5": "Provides specific timeline predictions for mainstream awareness with justified reasoning; includes velocity metrics, comparisons to historical trend patterns, and explicit uncertainty bounds",
456
+ "3": "General timeline prediction without quantitative basis or uncertainty bounds",
457
+ "1": "No predictive element — only describes current state",
458
+ "0": "No prediction attempted"
459
+ }
460
+ },
461
+ {
462
+ "criterion": "Methodology Transparency",
463
+ "weight": 0.2,
464
+ "scoring": {
465
+ "5": "Clearly explains the signal-vs-noise methodology: which heuristics were used, how false positives were filtered, what thresholds were applied, and what the limitations are",
466
+ "3": "Mentions methodology at a high level but lacks specifics",
467
+ "1": "No methodology explanation — trends appear to be selected arbitrarily",
468
+ "0": "No methodology"
469
+ }
470
+ }
471
+ ],
472
+ "expectedScoreWithout": 15,
473
+ "expectedScoreWith": 55
474
+ }
475
+ ]
476
+ }
@@ -0,0 +1,54 @@
1
+ {
2
+ "version": "0.0.1",
3
+ "timeout": 60,
4
+ "tasks": [
5
+ {
6
+ "id": "smoke-01",
7
+ "description": "Track KOL opinions on an emerging technology topic with bot detection and trend context",
8
+ "input": "Monitor Twitter/X for key opinion leaders discussing the impact of new EU AI regulation on the tech startup ecosystem. Identify the dominant narratives, track which KOLs are driving the conversation, and flag any bot-driven amplification. I need a structured intelligence briefing with confidence ratings.",
9
+ "rubric": [
10
+ {
11
+ "criterion": "Source Curation & KOL Identification",
12
+ "weight": 0.25,
13
+ "scoring": {
14
+ "5": "Identifies 5+ relevant KOLs across multiple tiers (tech policy experts, startup founders, regulatory analysts); constructs targeted queries using from: operators and topic keywords; sets appropriate temporal window",
15
+ "3": "Identifies 2-3 relevant accounts but misses important KOL tiers or uses overly broad queries",
16
+ "1": "Generic keyword search with no specific KOL targeting or source curation",
17
+ "0": "No source curation attempted"
18
+ }
19
+ },
20
+ {
21
+ "criterion": "Signal Filtering & Bot Detection",
22
+ "weight": 0.25,
23
+ "scoring": {
24
+ "5": "Applies multi-layer filtering (bot removal, deduplication, relevance scoring); explicitly checks for bot amplification signals; reports both raw and filtered metrics; flags inauthenticity risks",
25
+ "3": "Some filtering applied but bot detection is incomplete or not quantified",
26
+ "1": "Minimal filtering; treats all engagement as organic",
27
+ "0": "No filtering or bot detection"
28
+ }
29
+ },
30
+ {
31
+ "criterion": "Opinion & Trend Analysis",
32
+ "weight": 0.25,
33
+ "scoring": {
34
+ "5": "Extracts distinct stance clusters (supporters, critics, neutrals) with attributed KOL positions; detects trend velocity and sentiment direction; accounts for sarcasm and tone; includes temporal context",
35
+ "3": "Identifies general sentiment but lacks stance clustering or temporal analysis; tone detection is basic",
36
+ "1": "Surface-level sentiment without stance attribution or trend context",
37
+ "0": "No opinion or trend analysis"
38
+ }
39
+ },
40
+ {
41
+ "criterion": "Output Structure & Confidence",
42
+ "weight": 0.25,
43
+ "scoring": {
44
+ "5": "Follows structured briefing format: executive summary, key findings with source attribution, KOL positions table, trend metrics, bot assessment, confidence rating with justification, and recommended actions",
45
+ "3": "Partially structured output with some elements missing (e.g., no confidence rating or no recommended actions)",
46
+ "1": "Unstructured narrative summary without clear sections or source attribution",
47
+ "0": "Raw data dump with no organization"
48
+ }
49
+ }
50
+ ],
51
+ "passThreshold": 60
52
+ }
53
+ ]
54
+ }