@botlearn/reddit-tracker 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/LICENSE +21 -0
- package/README.md +35 -0
- package/knowledge/anti-patterns.md +72 -0
- package/knowledge/best-practices.md +124 -0
- package/knowledge/domain.md +140 -0
- package/manifest.json +26 -0
- package/package.json +35 -0
- package/skill.md +43 -0
- package/strategies/main.md +117 -0
- package/tests/benchmark.json +486 -0
- package/tests/smoke.json +54 -0
package/LICENSE
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2025 BotLearn
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
package/README.md
ADDED
@@ -0,0 +1,35 @@
+# @botlearn/reddit-tracker
+
+> Reddit community monitoring, engagement velocity tracking, cross-subreddit correlation, and early trend detection for OpenClaw Agent
+
+## Installation
+
+```bash
+# via npm
+npm install @botlearn/reddit-tracker
+
+# via clawhub
+clawhub install @botlearn/reddit-tracker
+```
+
+## Category
+
+Information Retrieval
+
+## Dependencies
+
+None
+
+## Files
+
+| File | Description |
+|------|-------------|
+| `manifest.json` | Skill metadata and configuration |
+| `skill.md` | Role definition and activation rules |
+| `knowledge/` | Domain knowledge documents |
+| `strategies/` | Behavioral strategy definitions |
+| `tests/` | Smoke and benchmark tests |
+
+## License
+
+MIT
package/knowledge/anti-patterns.md
ADDED
@@ -0,0 +1,72 @@
+---
+domain: reddit-tracker
+topic: anti-patterns
+priority: medium
+ttl: 30d
+---
+
+# Reddit Tracker — Anti-Patterns
+
+## Vote & Score Anti-Patterns
+
+### 1. Vote Manipulation Blindness
+- **Problem**: Treating upvote scores at face value without accounting for vote manipulation, brigading, or bot networks; a post with 10K upvotes may have been artificially boosted
+- **Fix**: Cross-validate score with independent signals — comment quality and diversity, account age distribution of commenters, upvote-to-unique-commenter ratio. A genuinely popular post has a diverse, organic commenter base; a manipulated one has sparse or formulaic comments relative to its score
+
+### 2. Conflating Upvotes with Quality
+- **Problem**: Assuming high-score posts represent high-quality or important information; Reddit's upvote system rewards early timing, emotional triggers, and community in-jokes more than accuracy or significance
+- **Fix**: Evaluate post quality independently of score — check if claims are sourced, if expert commenters corroborate or challenge the post, and whether the upvote ratio aligns with the comment sentiment. A post with 5K upvotes but a top comment debunking it is a misinformation signal, not a quality signal
+
+### 3. Ignoring Vote Fuzzing
+- **Problem**: Relying on exact upvote/downvote counts for precision analysis; Reddit deliberately fuzzes vote counts to prevent manipulation detection, so absolute numbers are approximate
+- **Fix**: Use `upvote_ratio` and relative rank changes instead of absolute score changes. Track a post's position in the subreddit listing over time rather than its raw score. Velocity trends are reliable even when individual data points are fuzzed
+
+### 4. Score Snapshot Fallacy
+- **Problem**: Measuring a post's score at a single point in time and drawing conclusions; a post at 500 points could be rising rapidly toward 5K or slowly decaying from 2K
+- **Fix**: Always sample engagement metrics at multiple time points to calculate velocity. A minimum of 3 data points over 30+ minutes is required to establish a reliable trend direction
+
+## Trend Detection Anti-Patterns
+
+### 5. Single-Subreddit Myopia
+- **Problem**: Declaring a topic as "trending on Reddit" based on activity in only one subreddit; a topic trending in r/technology may be completely unknown in broader Reddit
+- **Fix**: Always check at least 3-5 related subreddits before classifying a trend as cross-community. Use the cross-subreddit spread score from best-practices.md. Qualify all trend claims with scope — "trending in r/technology" is different from "trending across Reddit"
+
+### 6. Repost Conflation
+- **Problem**: Counting reposts and duplicates as separate trend signals; the same content reposted across subreddits by the same user or bot network inflates apparent spread
+- **Fix**: Deduplicate by URL, image hash, and text similarity before counting cross-subreddit appearances. Check if the poster accounts are related (similar names, creation dates, post histories). A genuine trend has diverse original posters, not one account spamming copies
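The dedup step in that fix can be sketched with stdlib tools only. This is a minimal illustration, not the package's implementation: it assumes each post is a dict with `url` and `title` keys, and omits image hashing.

```python
from difflib import SequenceMatcher
from urllib.parse import urlsplit

def dedupe_posts(posts, title_threshold=0.85):
    """Collapse likely reposts: same URL (ignoring the query string)
    or near-identical title text. Keeps the first occurrence."""
    unique = []
    for post in posts:
        parts = urlsplit(post.get("url") or "")
        duplicate = False
        for kept in unique:
            kept_parts = urlsplit(kept.get("url") or "")
            # URL match: same host and path; tracking query params ignored
            if parts.netloc and (parts.netloc, parts.path) == (kept_parts.netloc, kept_parts.path):
                duplicate = True
                break
            # Text-similarity match on lowercased titles
            if SequenceMatcher(None, post["title"].lower(),
                               kept["title"].lower()).ratio() >= title_threshold:
                duplicate = True
                break
        if not duplicate:
            unique.append(post)
    return unique
```

A production version would also compare image hashes and poster-account metadata, per the fix above.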
+
+### 7. Survivorship Bias in Trend Analysis
+- **Problem**: Only studying posts that reached the front page and generalizing their patterns; for every front-page post, hundreds had similar early signals but failed to break out
+- **Fix**: Maintain a baseline of posts that showed early breakout signals but did NOT reach the front page. Calculate precision and recall — what fraction of flagged posts actually broke out, and what fraction of actual breakouts were detected early? Tune thresholds based on both
+
+### 8. Ignoring Megathread Absorption
+- **Problem**: Missing that a topic's individual posts are being removed by moderators who consolidate discussion into a megathread; this makes the topic appear to have declining post volume when it actually has concentrated, high-volume discussion
+- **Fix**: Monitor for megathread creation (pinned/stickied posts with "Megathread" or "Discussion Thread" in the title). When a megathread exists, measure trend engagement through the megathread's comment volume and velocity, not through individual post counts
+
+## Community & Context Anti-Patterns
+
+### 9. Ignoring Subreddit Culture
+- **Problem**: Applying the same engagement thresholds and interpretation rules to all subreddits; r/science has strict sourcing norms while r/memes rewards absurdity — the same engagement patterns mean completely different things
+- **Fix**: Build per-subreddit behavioral profiles. Calibrate velocity thresholds, sentiment interpretation, and content quality signals relative to each community's baseline. A 50-comment discussion in r/AskHistorians is exceptional; the same count in r/AskReddit is negligible
+
+### 10. Bot & Karma Farm Blindness
+- **Problem**: Including engagement from bot accounts and karma farming operations in trend calculations; these accounts artificially inflate metrics and can create phantom trends
+- **Fix**: Flag and discount engagement from accounts that match bot patterns: account age < 30 days + high post frequency + identical comment patterns + posts exclusively to karma-farming subreddits. Weight signals from accounts with established, diverse histories more heavily
+
+### 11. Astroturfing Detection Failure
+- **Problem**: Failing to recognize coordinated inauthentic behavior — marketing campaigns, political operations, or corporate PR disguised as organic community interest
+- **Fix**: Check for: (1) multiple accounts posting about the same product/topic within a short window with suspiciously positive framing, (2) comments that read like ad copy or talking points, (3) accounts with histories that show sudden topic pivots, (4) submission timing patterns that suggest coordination (e.g., 5 posts about the same startup within 10 minutes from accounts in different subreddits)
+
+### 12. Timezone Ignorance
+- **Problem**: Interpreting low engagement during off-peak hours as lack of interest; a post submitted at 3 AM EST to a US-centric subreddit will naturally underperform regardless of topic quality
+- **Fix**: Always normalize engagement by the subreddit's hourly activity baseline. Compare a post's velocity against other posts submitted in the same time window, not against the subreddit's all-time averages. Flag posts that show strong velocity during off-peak hours as especially significant
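One way to make that hourly normalization concrete. This is a hypothetical helper, assuming a precomputed 24-slot activity profile whose weights average roughly 1.0:

```python
def normalized_velocity(raw_velocity, hour_utc, hourly_profile):
    """Scale a post's raw velocity by the subreddit's typical activity
    for the hour it was measured in.

    hourly_profile: 24 relative activity weights (mean ~1.0), built from
    ~30 days of history. Off-peak hours (weight < 1.0) boost the signal;
    peak hours (weight > 1.0) damp it.
    """
    weight = hourly_profile[hour_utc % 24]
    return raw_velocity / weight if weight > 0 else raw_velocity
```

The same 10 upvotes/min thus reads as a stronger signal at 3 AM than at noon, matching the fix above.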
+
+## Output Anti-Patterns
+
+### 13. Trend Report Without Confidence Levels
+- **Problem**: Presenting trend detections as binary (trending / not trending) without indicating confidence or supporting evidence strength
+- **Fix**: Every trend detection should include: confidence level (high/medium/low), number of confirming signals, time horizon for the prediction, and explicit list of evidence supporting the classification
+
+### 14. Missing Temporal Context
+- **Problem**: Reporting a trend without specifying its lifecycle phase — is it emerging, peaking, or decaying? A trend already at peak is not actionable for early detection
+- **Fix**: Always classify the trend's current phase (Seed, Ignition, Surge, Peak, Decay) and provide estimated time to peak or time since peak. Include the velocity curve direction (accelerating, stable, decelerating)
package/knowledge/best-practices.md
ADDED
@@ -0,0 +1,124 @@
+---
+domain: reddit-tracker
+topic: velocity-trend-detection-cross-subreddit-correlation
+priority: high
+ttl: 30d
+---
+
+# Reddit Tracker — Best Practices
+
+## Velocity-Based Trend Detection
+
+### 1. Engagement Velocity Definition
+Engagement velocity measures the rate of change in a post's engagement metrics over time, not the absolute values:
+- **Upvote velocity** — Score change per minute: `(score_t2 - score_t1) / (t2 - t1)`
+- **Comment velocity** — New comments per minute over a sliding window
+- **Award velocity** — Awards received per hour (early awards are a strong signal)
+- **Crosspost velocity** — Rate at which the post is crossposted to other subreddits
+
+### 2. Velocity Curve Phases
+Every breakout post follows a characteristic velocity curve:
+
+| Phase | Time from Post | Velocity Behavior | Detection Signal |
+|-------|---------------|-------------------|-----------------|
+| Seed | 0-15 min | Low, erratic | Not yet detectable |
+| Ignition | 15-60 min | Sharp acceleration | **Primary detection window** |
+| Surge | 1-4 hours | Sustained high velocity | Confirmed breakout |
+| Peak | 4-12 hours | Velocity plateau then decline | Maximum reach |
+| Decay | 12-48 hours | Declining velocity | Trend exhaustion |
+
+### 3. Velocity Normalization
+Raw velocity must be normalized to produce comparable signals:
+- **By subreddit size**: Divide by subscriber count — 50 upvotes/min in a 10K sub is more significant than in a 10M sub
+- **By time-of-day**: Compare against the subreddit's historical hourly engagement baseline
+- **By day-of-week**: Weekend vs weekday patterns differ significantly for many communities
+- **By post type**: Image posts typically accelerate faster than text posts in the same subreddit
+
+### 4. Breakout Detection Threshold
+A post is flagged as a potential breakout when:
+- Upvote velocity exceeds 2x the subreddit's 90th-percentile velocity for its post age
+- Comment velocity exceeds 3x the median for the subreddit within the first hour
+- OR the post receives 2+ awards within the first 30 minutes in a community where awards are rare
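The velocity definition from section 1 and these thresholds combine into a simple check. A sketch only — the snapshot and baseline field names are illustrative, not a fixed schema:

```python
def upvote_velocity(snap1, snap2):
    """Score change per minute between two engagement snapshots
    (each: {"t": unix_seconds, "score": int, "num_comments": int, ...})."""
    minutes = (snap2["t"] - snap1["t"]) / 60.0
    return (snap2["score"] - snap1["score"]) / minutes if minutes > 0 else 0.0

def is_breakout(snap1, snap2, baseline):
    """Apply the three threshold rules above against a per-subreddit baseline."""
    minutes = (snap2["t"] - snap1["t"]) / 60.0
    comment_velocity = ((snap2["num_comments"] - snap1["num_comments"]) / minutes
                        if minutes > 0 else 0.0)
    age_minutes = (snap2["t"] - snap2["created_utc"]) / 60.0

    # Rule 1: upvote velocity above 2x the 90th-percentile baseline
    if upvote_velocity(snap1, snap2) > 2 * baseline["p90_upvote_velocity"]:
        return True
    # Rule 2: comment velocity above 3x the median, within the first hour
    if age_minutes <= 60 and comment_velocity > 3 * baseline["median_comment_velocity"]:
        return True
    # Rule 3: 2+ early awards in a community where awards are rare
    if (age_minutes <= 30 and snap2.get("total_awards_received", 0) >= 2
            and baseline.get("awards_rare", False)):
        return True
    return False
```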
+
+### 5. Multi-Metric Confirmation
+Never rely on a single velocity metric. Confirm with at least two:
+- High upvote velocity + high comment velocity = strong engagement (likely genuine trend)
+- High upvote velocity + low comment velocity = passive consumption (may be meme/image virality, not discussion-worthy trend)
+- Low upvote velocity + high comment velocity = controversial or niche discussion (check controversial flag)
+
+## Cross-Subreddit Correlation
+
+### 1. Same-Topic Detection
+Identify when the same topic emerges across multiple independent subreddits:
+- **URL matching** — Same link posted to different subreddits (crosspost or independent submission)
+- **Keyword clustering** — Same key terms appearing in titles across different communities within a 6-hour window
+- **Entity co-occurrence** — Same named entities (people, companies, products) surfacing in unrelated subreddits
+- **Semantic similarity** — Post titles or bodies with high cosine similarity across communities
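The keyword-clustering signal can be sketched very cheaply. The helper names and stopword list below are illustrative; a real pipeline would use embeddings for the semantic-similarity case:

```python
import re

# Minimal stopword list for title comparison — illustrative, not exhaustive
STOPWORDS = {"the", "a", "an", "of", "to", "in", "on", "for", "and", "is", "are", "with", "from"}

def title_keywords(title):
    """Lowercased content words extracted from a post title."""
    return {w for w in re.findall(r"[a-z0-9']+", title.lower()) if w not in STOPWORDS}

def same_topic(title_a, title_b, threshold=0.5):
    """Keyword-clustering check: Jaccard overlap of title keyword sets."""
    a, b = title_keywords(title_a), title_keywords(title_b)
    if not a or not b:
        return False
    return len(a & b) / len(a | b) >= threshold
```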
+
+### 2. Cross-Community Spread Score
+Calculate a spread score to quantify how broadly a topic has penetrated:
+```
+spread_score = (num_subreddits / expected_subreddits) * avg_normalized_velocity * diversity_factor
+```
+- `num_subreddits` — Count of distinct subreddits where the topic appeared
+- `expected_subreddits` — Baseline expectation based on the topic domain (tech news may naturally span 3-5 subreddits)
+- `avg_normalized_velocity` — Average engagement velocity across all subreddits (normalized per community)
+- `diversity_factor` — Higher when subreddits span different categories (e.g., both r/technology and r/stocks discussing the same company)
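The formula transcribes directly; only the guard on `expected_subreddits` is an added assumption:

```python
def spread_score(num_subreddits, expected_subreddits,
                 avg_normalized_velocity, diversity_factor):
    """Cross-community spread score as defined above."""
    if expected_subreddits <= 0:
        raise ValueError("expected_subreddits must be positive")
    return (num_subreddits / expected_subreddits) * avg_normalized_velocity * diversity_factor
```

For example, a tech story in 6 subreddits against an expected 3, with average normalized velocity 1.5 and diversity factor 1.2, scores (6/3) * 1.5 * 1.2 = 3.6.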
+
+### 3. Origin Tracing
+Identify where a trend started to understand its trajectory:
+- Sort all related posts by `created_utc` — the earliest post is likely the origin
+- Check if the origin subreddit is a known "incubator" community (e.g., niche hobby subs often incubate trends before they reach mega-subs)
+- Track the crosspost chain to map the exact spread path
+
+### 4. Propagation Pattern Classification
+
+| Pattern | Description | Significance |
+|---------|------------|--------------|
+| Hub-and-Spoke | One mega-sub post spawns crossposts | Top-down virality; already mainstream |
+| Grassroots | Multiple small subs independently discover topic | Organic emergence; high prediction value |
+| Cascade | Topic hops through communities sequentially | Building momentum; time-sensitive |
+| Synchronized | Same topic appears simultaneously in unrelated subs | External event trigger (news, product launch) |
+
+## Temporal Analysis Best Practices
+
+### 1. Peak Activity Windows
+Every subreddit has characteristic activity patterns:
+- US-centric subs peak 9 AM - 12 PM EST on weekdays
+- Global subs have multiple peaks across time zones
+- Gaming subs peak evenings and weekends
+- Finance subs spike at market open/close and during events
+
+### 2. Anomaly Detection via Temporal Baseline
+- Build a 30-day rolling baseline of hourly post and comment volume per subreddit
+- Flag any hour where volume exceeds 2 standard deviations above the baseline
+- Volume spikes during off-peak hours are particularly significant
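The two-standard-deviation rule can be sketched with the stdlib `statistics` module, assuming `hourly_counts` holds the 30-day volume history being used as the baseline:

```python
from statistics import mean, stdev

def is_volume_anomaly(hourly_counts, current_count, z_threshold=2.0):
    """Flag an hour whose volume exceeds the rolling baseline by more
    than z_threshold sample standard deviations."""
    if len(hourly_counts) < 2:
        return False  # not enough history for a stable baseline
    mu = mean(hourly_counts)
    sigma = stdev(hourly_counts)
    if sigma == 0:
        return current_count > mu  # flat baseline: any increase is anomalous
    return (current_count - mu) / sigma > z_threshold
```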
+
+### 3. Trend Timing Prediction
+To predict when a trend will peak:
+- Measure the velocity curve slope during the Ignition phase
+- Compare against historical breakout curves for the same subreddit
+- Apply time-of-day correction — a post entering Surge phase during a subreddit's peak hours will peak faster
+
+## Sentiment Signal Extraction
+
+### 1. Comment Sentiment Distribution
+For any trending post, analyze the comment section:
+- **Positive / Negative / Neutral** ratio in top-level comments
+- **Controversial flag** density — high density indicates polarizing topic
+- **Sentiment shift** — Compare early comments vs. later comments to detect narrative evolution
+
+### 2. Community Reaction Markers
+- Top comment agreeing with post = community endorsement
+- Top comment contradicting post = community skepticism (even if post score is high)
+- Pinned moderator comment = topic requires community governance attention
+- Post locked by mods = extreme engagement or rule violations
+
+### 3. Engagement Quality Tiers
+
+| Tier | Indicator | Meaning |
+|------|-----------|---------|
+| Deep | Long comments, citations, debate threads | Genuine interest and expertise |
+| Reactive | Short affirmative comments, emojis, memes | Viral moment but shallow engagement |
+| Hostile | Insults, reports, mod intervention | Controversial topic, unreliable sentiment signal |
+| Astroturfed | Identical phrasing, new accounts, coordinated timing | Inauthentic engagement — exclude from analysis |
package/knowledge/domain.md
ADDED
@@ -0,0 +1,140 @@
+---
+domain: reddit-tracker
+topic: reddit-api-scoring-karma-post-types
+priority: high
+ttl: 30d
+---
+
+# Reddit Platform — API, Scoring, Karma & Post Types
+
+## Reddit Data API
+
+### Authentication
+- Reddit uses OAuth2 for API access; all requests require a bearer token
+- Rate limit: 100 requests per minute per OAuth client (600/min for mod accounts)
+- User-Agent header is mandatory — requests without it are deprioritized or blocked
+- App types: `script` (personal), `web` (3rd-party), `installed` (mobile/desktop)
+
+### Key Endpoints for Monitoring
+
+#### Subreddit Listings
+- `GET /r/{subreddit}/hot` — Posts ranked by Reddit's "hot" algorithm (recency + engagement)
+- `GET /r/{subreddit}/new` — Chronological newest posts (critical for early detection)
+- `GET /r/{subreddit}/rising` — Posts gaining traction faster than peers (best signal for breakouts)
+- `GET /r/{subreddit}/top?t={hour|day|week}` — Highest-scoring posts within a time window
+- `GET /r/{subreddit}/controversial?t={hour|day|week}` — Posts with high engagement but split up/down ratio
+
+#### Post & Comment Data
+- `GET /comments/{article_id}` — Full comment tree for a post
+- `GET /r/{subreddit}/comments` — Latest comments across the subreddit (firehose)
+- Each post object includes: `score`, `upvote_ratio`, `num_comments`, `created_utc`, `gilded`, `total_awards_received`, `is_crosspost`
+
+#### Search
+- `GET /r/{subreddit}/search?q={query}&sort=new&restrict_sr=true` — Subreddit-scoped search
+- `GET /search?q={query}&sort=relevance&t=day` — Site-wide search with time filter
+- Supports Lucene-style syntax: `title:keyword`, `selftext:keyword`, `author:username`, `flair:tag`
+
+### Pagination
+- Reddit uses cursor-based pagination with `after` and `before` fullname tokens
+- Maximum 100 items per request (default 25)
+- Listings capped at ~1000 items total — cannot paginate beyond that
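The cursor walk can be sketched independently of any HTTP library by injecting a page fetcher. `fetch_page` here is a stand-in for an authenticated request to one of the listing endpoints above:

```python
def paginate_listing(fetch_page, limit=100, max_items=1000):
    """Collect a full listing by following `after` cursor tokens.

    fetch_page(after, limit) must return parsed listing JSON of the shape
    {"data": {"children": [{"data": {...}}, ...], "after": "t3_..." or None}}.
    Stops when `after` is exhausted or the ~1000-item listing cap is hit.
    """
    items, after = [], None
    while len(items) < max_items:
        data = fetch_page(after, limit)["data"]
        if not data["children"]:
            break
        items.extend(child["data"] for child in data["children"])
        after = data["after"]
        if not after:
            break
    return items[:max_items]
```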
+
+## Reddit Scoring System
+
+### Vote Score
+- `score = upvotes - downvotes` (approximate — Reddit applies vote fuzzing)
+- `upvote_ratio` — Fraction of votes that are upvotes (0.0 to 1.0)
+- Vote fuzzing: Reddit adds/subtracts random votes to obscure true counts; ratios are more reliable than absolute scores
+
+### Hot Ranking Algorithm
+Reddit's hot ranking combines score magnitude with recency:
+```
+hot_score = log10(max(|score|, 1)) + sign(score) * (created_utc - epoch) / 45000
+```
+- Logarithmic score: the first 10 votes matter as much as the next 100
+- Time decay: a post needs exponentially more votes to maintain rank as it ages
+- Epoch reference: Reddit uses a fixed epoch (December 8, 2005)
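A direct transcription of the formula. The epoch constant (December 8, 2005) is the value commonly cited from Reddit's formerly open-source ranking code; treat it as an assumption:

```python
from math import log10

REDDIT_EPOCH = 1134028003  # 2005-12-08 UTC, seconds since the Unix epoch (assumed value)

def hot_score(score, created_utc):
    """log10-scaled vote magnitude plus a linear recency bonus, as above."""
    sign = 1 if score > 0 else (-1 if score < 0 else 0)
    return log10(max(abs(score), 1)) + sign * (created_utc - REDDIT_EPOCH) / 45000
```

The 45000-second divisor means a post needs roughly 10x the votes to tie an otherwise equal post submitted 12.5 hours later.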
+
+### Best Ranking (Comments)
+- Uses Wilson score confidence interval — favors comments with high upvote ratio AND sufficient sample size
+- A comment with 10 up / 1 down outranks one with 100 up / 50 down
+
+### Controversy Score
+- `controversy = (upvotes + downvotes) / max(upvotes, downvotes)` when both > 0
+- Higher controversy score = more evenly split votes
+- Controversial posts show a dagger icon (†) on old Reddit
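Both scores can be checked numerically. The Wilson lower bound below is the standard formula; the z value (~80% confidence, commonly attributed to Reddit's "best" sort) is an assumption, and `controversy` implements exactly the definition above:

```python
from math import sqrt

def wilson_lower_bound(ups, downs, z=1.281):
    """Lower bound of the Wilson score interval on the true upvote ratio."""
    n = ups + downs
    if n == 0:
        return 0.0
    p = ups / n
    return ((p + z * z / (2 * n)
             - z * sqrt((p * (1 - p) + z * z / (4 * n)) / n))
            / (1 + z * z / n))

def controversy(ups, downs):
    """Controversy score as defined above; 0 when either side is empty."""
    if ups <= 0 or downs <= 0:
        return 0.0
    return (ups + downs) / max(ups, downs)
```

This reproduces the claim above: a 10-up / 1-down comment outranks a 100-up / 50-down one.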
+
+## Karma System
+
+### Post Karma
+- Earned from upvotes on link posts and text posts
+- Not 1:1 with score — Reddit applies diminishing returns (approximately logarithmic)
+- Subreddit-specific karma thresholds gate posting privileges in many communities
+
+### Comment Karma
+- Earned from upvotes on comments
+- Also subject to diminishing returns at scale
+- Many subreddits require minimum comment karma for participation
+
+### Award Karma
+- Earned when posts/comments receive premium awards
+- Different awards grant different karma amounts
+- Awards visible as icons on the post — a signal of high perceived value
+
+### Karma Relevance to Trend Detection
+- High-karma accounts posting about a topic is a stronger signal than low-karma accounts
+- Sudden karma spikes on a topic across multiple users indicate organic interest
+- Accounts with zero/very low karma posting identical content suggest coordinated campaigns
+
+## Post Types
+
+### Link Posts
+- External URL submissions — the classic Reddit post type
+- Engagement measured by: score, upvote_ratio, num_comments, crossposts
+- A link appearing in multiple subreddits simultaneously is a strong breakout signal
+
+### Text Posts (Self Posts)
+- User-written content with a title and body (supports Markdown)
+- Often longer-form discussion starters in niche communities
+- Text posts that generate high comment-to-score ratios indicate active discussion
+
+### Image & Video Posts
+- Hosted on Reddit's own media infrastructure (i.redd.it, v.redd.it)
+- Image posts in meme-oriented subreddits can go viral in under an hour
+- Video view counts are available but not exposed in the standard API
+
+### Crossposts
+- A post shared from one subreddit to another, maintaining a link to the original
+- Crosspost chains are a primary indicator of content spreading across communities
+- `is_crosspost: true` and `crosspost_parent` fields in the post object
+
+### Polls
+- Reddit-native poll posts — votes are anonymized and results visible after voting
+- High participation polls indicate community engagement with the topic
+
+### Live Threads & Talk Posts
+- Real-time discussion formats for breaking events
+- Creation of a live thread for a topic signals perceived significance
+
+## Subreddit Metadata
+
+### Key Fields for Monitoring
+- `subscribers` — Total subscriber count (used for normalizing engagement)
+- `active_user_count` — Users online now (available via `GET /r/{subreddit}/about`)
+- `created_utc` — Subreddit creation date (older = more established baseline)
+- `public_description` — Community self-description and scope
+
+### Subreddit Size Tiers
+
+| Tier | Subscribers | Characteristics |
+|------|------------|-----------------|
+| Mega | >10M | Front-page feeder; high noise, fast velocity |
+| Large | 1M-10M | Established communities; reliable trend signals |
+| Medium | 100K-1M | Niche but active; early signals before mega-subs |
+| Small | 10K-100K | Specialist communities; high signal-to-noise for domain topics |
+| Micro | <10K | Very niche; useful for domain-expert sentiment only |
+
+### Flair System
+- Post flairs categorize content within a subreddit (e.g., "Discussion", "News", "OC")
+- User flairs indicate community standing or expertise
+- Flair-based filtering: `GET /r/{subreddit}/search?q=flair:News` — narrows monitoring to specific content types
package/manifest.json
ADDED
@@ -0,0 +1,26 @@
+{
+  "name": "@botlearn/reddit-tracker",
+  "version": "0.1.0",
+  "description": "Reddit community monitoring, engagement velocity tracking, cross-subreddit correlation, and early trend detection for OpenClaw Agent",
+  "category": "information-retrieval",
+  "author": "BotLearn",
+  "benchmarkDimension": "information-retrieval",
+  "expectedImprovement": 30,
+  "dependencies": {},
+  "compatibility": {
+    "openclaw": ">=0.5.0"
+  },
+  "files": {
+    "skill": "skill.md",
+    "knowledge": [
+      "knowledge/domain.md",
+      "knowledge/best-practices.md",
+      "knowledge/anti-patterns.md"
+    ],
+    "strategies": [
+      "strategies/main.md"
+    ],
+    "smokeTest": "tests/smoke.json",
+    "benchmark": "tests/benchmark.json"
+  }
+}
package/package.json
ADDED
@@ -0,0 +1,35 @@
+{
+  "name": "@botlearn/reddit-tracker",
+  "version": "0.1.0",
+  "description": "Reddit community monitoring, engagement velocity tracking, cross-subreddit correlation, and early trend detection for OpenClaw Agent",
+  "type": "module",
+  "main": "manifest.json",
+  "files": [
+    "manifest.json",
+    "skill.md",
+    "knowledge/",
+    "strategies/",
+    "tests/",
+    "README.md"
+  ],
+  "keywords": [
+    "botlearn",
+    "openclaw",
+    "skill",
+    "information-retrieval"
+  ],
+  "author": "BotLearn",
+  "license": "MIT",
+  "repository": {
+    "type": "git",
+    "url": "https://github.com/readai-team/botlearn-awesome-skills.git",
+    "directory": "packages/skills/reddit-tracker"
+  },
+  "homepage": "https://github.com/readai-team/botlearn-awesome-skills/tree/main/packages/skills/reddit-tracker",
+  "bugs": {
+    "url": "https://github.com/readai-team/botlearn-awesome-skills/issues"
+  },
+  "publishConfig": {
+    "access": "public"
+  }
+}
package/skill.md
ADDED
|
@@ -0,0 +1,43 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: reddit-tracker
|
|
3
|
+
role: Reddit Trend Analyst
|
|
4
|
+
version: 1.0.0
|
|
5
|
+
triggers:
|
|
6
|
+
- "reddit"
|
|
7
|
+
- "subreddit"
|
|
8
|
+
- "trending"
|
|
9
|
+
- "community"
|
|
10
|
+
- "hotspot"
|
|
11
|
+
- "reddit trends"
|
|
12
|
+
- "what's hot on reddit"
|
|
13
|
+
---
|
|
14
|
+
|
|
15
|
+
# Role
|
|
16
|
+
|
|
17
|
+
You are a Reddit Trend Analyst. When activated, you monitor subreddit activity, track engagement velocity, correlate signals across communities, and predict emerging trends up to 24 hours before they peak on Reddit's front page.
|
|
18
|
+
|
|
19
|
+
# Capabilities
|
|
20
|
+
|
|
21
|
+
1. Monitor targeted subreddits for rising posts by tracking upvote velocity, comment acceleration, and award density within configurable time windows
2. Detect emerging trends by computing engagement velocity curves and comparing them against historical breakout patterns for the target subreddit
3. Correlate cross-subreddit signals — identify when the same topic, URL, or narrative surfaces independently in multiple communities simultaneously
4. Analyze community sentiment by evaluating comment tone distribution, controversial-flag ratios, and top-comment polarity within trending threads
5. Predict trend trajectories by combining velocity metrics, cross-community spread rate, and temporal posting patterns to forecast peak timing and reach

# Constraints

1. Never treat raw upvote counts as a reliable quality or trend signal — always normalize by subreddit size, post age, and historical baseline
2. Never ignore Reddit's vote fuzzing — reported scores are approximate; rely on velocity and rank changes rather than absolute numbers
3. Never conflate karma farming or repost surges with organic trend emergence — check for duplicate URLs, bot account patterns, and artificial award clustering
4. Always distinguish between subreddit-local trends (relevant only within a niche community) and cross-community breakouts (genuine broad interest)
5. Always account for time-zone posting patterns — a post's velocity must be interpreted relative to the subreddit's peak-activity hours

# Activation

WHEN the user requests Reddit monitoring, trend detection, or community analysis:
1. Identify the target subreddits, topics, or keywords from the user's request
2. Execute the monitoring strategy from strategies/main.md
3. Apply Reddit platform knowledge from knowledge/domain.md for correct API usage and scoring interpretation
4. Evaluate signals using the velocity and correlation methods from knowledge/best-practices.md
5. Verify findings against the known pitfalls in knowledge/anti-patterns.md
6. Output a trend report with confidence scores, predicted peak timing, and supporting evidence
@@ -0,0 +1,117 @@
---
strategy: reddit-tracker
version: 1.0.0
steps: 6
---

# Reddit Tracker Strategy

## Step 1: Target Identification & Scope Definition
- Parse the user's request to identify: **target subreddits**, **topics/keywords**, **time horizon**, and **desired output** (trend report, sentiment summary, breakout alerts)
- IF specific subreddits are named THEN add them to the watch list directly
- IF only a topic or domain is given THEN identify the top 3-5 relevant subreddits by:
  - Searching Reddit for the topic and noting which subreddits surface most frequently
  - Checking subreddit recommendation resources and community directories
  - Including both mega-subs (broad reach) and niche subs (early signal) for the topic
- Determine the monitoring mode:
  - **Snapshot** — One-time scan of the current state (default if the user asks "what's trending")
  - **Watch** — Continuous monitoring over a specified time window (if the user asks "alert me" or "track")
- Record each subreddit's subscriber count and current active-user count as baseline context (see knowledge/domain.md)

## Step 2: Data Collection & Velocity Tracking
- For each target subreddit, collect posts from three listing endpoints:
  - `/rising` — Primary signal source for early breakout detection
  - `/new` — Catch posts in the Seed phase that rising has not yet surfaced
  - `/hot` — Baseline for what the community's algorithm already considers trending
- For each collected post, record the initial engagement snapshot:
  - `score`, `upvote_ratio`, `num_comments`, `total_awards_received`, `created_utc`, `is_crosspost`
- IF monitoring mode is Watch THEN sample again after 15 minutes and 60 minutes to calculate velocity:
  - Upvote velocity: `(score_t2 - score_t1) / elapsed_minutes`
  - Comment velocity: `(comments_t2 - comments_t1) / elapsed_minutes`
  - Award velocity: awards received per hour
- Normalize all velocities using subreddit size and time-of-day baselines from knowledge/best-practices.md
- Flag posts that exceed breakout detection thresholds:
  - Upvote velocity > 2x the subreddit's 90th-percentile for the post's age
  - Comment velocity > 3x the subreddit's median within the first hour
  - Awards > 2 within the first 30 minutes (in communities where awards are uncommon)

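The two-sample velocity calculation and breakout thresholds above can be sketched as a small helper. This is a minimal illustration; the `Snapshot` type and the baseline values (90th-percentile upvote velocity, median comment velocity for posts of this age) are hypothetical stand-ins for data the strategy would collect.

```python
from dataclasses import dataclass

@dataclass
class Snapshot:
    """One post's engagement metrics at a sampling time (minutes since posting)."""
    minutes: float
    score: int
    num_comments: int

def velocity(earlier: Snapshot, later: Snapshot) -> tuple[float, float]:
    """Upvote and comment velocity between two samples, per minute."""
    dt = later.minutes - earlier.minutes
    if dt <= 0:
        raise ValueError("samples must be in chronological order")
    return ((later.score - earlier.score) / dt,
            (later.num_comments - earlier.num_comments) / dt)

def is_breakout(upvote_v: float, comment_v: float,
                p90_upvote_v: float, median_comment_v: float) -> bool:
    """Flag a post exceeding the Step 2 thresholds: upvote velocity above
    2x the subreddit's 90th percentile, or comment velocity above 3x its median."""
    return upvote_v > 2 * p90_upvote_v or comment_v > 3 * median_comment_v

# 40 -> 400 upvotes and 10 -> 130 comments over one hour:
uv, cv = velocity(Snapshot(15, 40, 10), Snapshot(75, 400, 130))
print(uv, cv, is_breakout(uv, cv, p90_upvote_v=2.5, median_comment_v=0.5))  # 6.0 2.0 True
```

Award velocity follows the same two-sample pattern on `total_awards_received`, scaled to a per-hour rate.
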
## Step 3: Cross-Community Correlation
- Aggregate flagged posts and rising content across all monitored subreddits
- Detect cross-subreddit topic convergence using:
  - **URL deduplication** — Same link appearing in multiple subreddits (crosspost or independent)
  - **Keyword overlap** — Same key terms in post titles across different communities within a 6-hour window
  - **Entity matching** — Same named entities (people, companies, events) surfacing independently
- For each correlated topic, calculate the cross-community spread score:
  - `spread_score = (num_subreddits / expected_subreddits) * avg_normalized_velocity * diversity_factor`
  - `diversity_factor` increases when subreddits span different categories
- Classify the propagation pattern from knowledge/best-practices.md:
  - Hub-and-Spoke (top-down from a mega-sub)
  - Grassroots (independent emergence in small subs — highest prediction value)
  - Cascade (sequential community hopping)
  - Synchronized (simultaneous appearance suggesting an external trigger)
- IF a Grassroots pattern is detected THEN elevate the trend's priority — these are the signals that predict breakouts 24 hours early

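The spread-score formula above can be sketched directly. Here `diversity_factor` is an assumed implementation — the count of distinct subreddit categories — since the strategy only says it increases when subreddits span different categories:

```python
def spread_score(num_subreddits: int, expected_subreddits: int,
                 avg_normalized_velocity: float,
                 categories: list[str]) -> float:
    """Cross-community spread score from Step 3.

    The diversity factor here is a stand-in: the number of distinct
    categories spanned, so unrelated communities weigh more than clones.
    """
    diversity_factor = len(set(categories))
    return (num_subreddits / expected_subreddits) * avg_normalized_velocity * diversity_factor

# A topic in 4 subreddits (2 expected for its niche) across 3 distinct categories:
print(spread_score(4, 2, avg_normalized_velocity=1.5,
                   categories=["tech", "news", "finance"]))  # 9.0
```
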
## Step 4: Sentiment Analysis & Discussion Quality
- For each flagged trending post, analyze the comment section:
  - Sample the top 20 comments by "best" ranking (Wilson score)
  - Classify each comment's sentiment: positive, negative, neutral, mixed
  - Calculate the sentiment distribution ratio for the thread
- Assess discussion quality using the engagement quality tiers from knowledge/best-practices.md:
  - **Deep** — Long comments, citations, debate threads (genuine interest)
  - **Reactive** — Short affirmations, emojis, memes (viral but shallow)
  - **Hostile** — Insults, reports, mod intervention (controversial, unreliable signal)
  - **Astroturfed** — Identical phrasing, new accounts, coordinated timing (exclude from analysis)
- Check for community reaction markers:
  - Top-comment alignment with the post (endorsement vs. contradiction)
  - Moderator intervention (pinned comments, post locks, flair changes)
  - Controversial-flag density in the comment tree
- IF astroturfing indicators are detected THEN flag the trend and discount its metrics per knowledge/anti-patterns.md
- IF the top comment contradicts the post THEN note the divergence between post score and community consensus

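The sentiment distribution ratio from the sampling step above reduces to simple class fractions; the per-comment labels are assumed to come from an upstream classifier, which is not specified here:

```python
from collections import Counter

CLASSES = ("positive", "negative", "neutral", "mixed")

def sentiment_distribution(labels: list[str]) -> dict[str, float]:
    """Fraction of sampled comments in each sentiment class (Step 4)."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: counts.get(cls, 0) / total for cls in CLASSES}

# Distribution over the top 20 comments of a hypothetical thread:
dist = sentiment_distribution(["positive"] * 12 + ["negative"] * 5 + ["neutral"] * 3)
print(dist)  # {'positive': 0.6, 'negative': 0.25, 'neutral': 0.15, 'mixed': 0.0}
```
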
## Step 5: Trend Prediction & Confidence Scoring
- For each detected trend, determine its lifecycle phase:
  - **Seed** (0-15 min, low/erratic velocity) — Too early for reliable prediction
  - **Ignition** (15-60 min, sharp acceleration) — Primary prediction window
  - **Surge** (1-4 hours, sustained high velocity) — Confirmed breakout
  - **Peak** (4-12 hours, velocity plateau then decline) — Maximum reach achieved
  - **Decay** (12-48 hours, declining velocity) — Trend exhaustion
- Calculate a composite trend confidence score (0-100):
  - Velocity strength (30%): How far above baseline thresholds is the engagement velocity?
  - Cross-community spread (25%): How many independent subreddits have surfaced this topic?
  - Sentiment alignment (15%): Is community sentiment consistent and positive/engaged?
  - Discussion quality (15%): Is engagement deep and organic, or shallow and potentially inauthentic?
  - Temporal fit (15%): Is the timing consistent with the subreddit's peak activity patterns?
- Estimate peak timing:
  - IF in the Ignition phase THEN predict a peak in 3-8 hours (adjusted by subreddit size and time of day)
  - IF in the Surge phase THEN predict a peak in 1-4 hours
  - IF already at Peak THEN report as "currently peaking" with an estimated decay onset
- SELF-CHECK against the anti-patterns from knowledge/anti-patterns.md:
  - Is this a repost surge rather than a genuine trend?
  - Could vote manipulation be inflating the signal?
  - Is a megathread absorbing individual post activity?
  - Are bot accounts driving the engagement?
- IF any check flags a concern THEN reduce the confidence score by 15-30 points and note the risk factor

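The weighted composite and the 15-30 point self-check deduction above can be sketched as follows. Treating each factor as a 0-100 sub-score is an assumption; the document fixes only the weights and the penalty range:

```python
# Step 5 weights for the composite confidence score.
WEIGHTS = {
    "velocity": 0.30,   # velocity strength
    "spread": 0.25,     # cross-community spread
    "sentiment": 0.15,  # sentiment alignment
    "quality": 0.15,    # discussion quality
    "temporal": 0.15,   # temporal fit
}

def confidence(components: dict[str, float], risk_penalty: float = 0.0) -> float:
    """Composite 0-100 confidence score, minus any anti-pattern penalty (15-30 points)."""
    assert set(components) == set(WEIGHTS), "need one sub-score per factor"
    raw = sum(WEIGHTS[k] * components[k] for k in WEIGHTS)
    return max(0.0, min(100.0, raw - risk_penalty))

score = confidence({"velocity": 90, "spread": 80, "sentiment": 70,
                    "quality": 60, "temporal": 50}, risk_penalty=15)
print(round(score, 1))  # 59.0
```
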
## Step 6: Report Generation & Output
- Present findings in a structured trend report:
  - **Trend Summary** — One-sentence description of the detected trend
  - **Confidence** — Score (0-100) with label: Low (<40), Medium (40-69), High (>=70)
  - **Phase** — Current lifecycle phase and estimated time to peak
  - **Scope** — Subreddits involved, propagation pattern, spread score
  - **Velocity Metrics** — Key engagement rates with normalization context
  - **Sentiment** — Community sentiment distribution and quality tier
  - **Evidence** — Links to the top 3-5 posts driving the trend, with per-post metrics
  - **Risk Factors** — Any detected anti-patterns or confidence-reducing signals
  - **Recommendation** — Actionable advice: monitor, act now, or wait for confirmation
- IF multiple trends are detected THEN rank them by confidence score descending and present a prioritized list
- IF monitoring mode is Watch THEN specify the next recommended check time based on the trend phase:
  - Seed phase → recheck in 15 minutes
  - Ignition phase → recheck in 30 minutes
  - Surge phase → recheck in 1 hour
  - Peak or Decay → recheck in 4 hours, or close out monitoring
- SELF-CHECK output completeness:
  - Does every trend have a confidence score and lifecycle phase?
  - Are raw metrics accompanied by normalization context?
  - Is the scope correctly qualified (subreddit-local vs. cross-community)?
  - Are risk factors and anti-pattern warnings included where applicable?
- IF any check fails THEN revise the report before delivering
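The recheck schedule and confidence labels above reduce to small lookups. A sketch, in which the "close out monitoring" option for Peak/Decay is folded into the 4-hour interval:

```python
# Next recommended check time in minutes, by lifecycle phase (Step 6).
RECHECK_MINUTES = {"Seed": 15, "Ignition": 30, "Surge": 60, "Peak": 240, "Decay": 240}

def confidence_label(score: float) -> str:
    """Map a 0-100 confidence score onto the report's labels."""
    if score >= 70:
        return "High"
    if score >= 40:
        return "Medium"
    return "Low"

print(confidence_label(59.0), RECHECK_MINUTES["Ignition"])  # Medium 30
```
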
@@ -0,0 +1,486 @@
{
  "version": "0.0.1",
  "dimension": "information-retrieval",
  "tasks": [
    {
      "id": "bench-easy-01",
      "difficulty": "easy",
      "description": "Identify the current top trending topic in a single subreddit",
      "input": "What is the hottest topic on r/technology right now? Give me the top trending post with its engagement metrics.",
      "rubric": [
        {
          "criterion": "Data Retrieval",
          "weight": 0.4,
          "scoring": {
            "5": "Retrieves current hot/rising posts from r/technology with score, upvote_ratio, num_comments, and post age",
            "3": "Retrieves top posts but missing some engagement metrics",
            "1": "Provides a general description without specific post data",
            "0": "No data retrieved from the subreddit"
          }
        },
        {
          "criterion": "Metric Interpretation",
          "weight": 0.3,
          "scoring": {
            "5": "Explains what the metrics mean in context — normalizes by subreddit size, notes velocity, identifies lifecycle phase",
            "3": "Reports metrics but without normalization or velocity context",
            "1": "Raw numbers only with no interpretation",
            "0": "No metrics provided"
          }
        },
        {
          "criterion": "Output Quality",
          "weight": 0.3,
          "scoring": {
            "5": "Structured output with post title, link, metrics, and a brief assessment of why it is trending",
            "3": "Post identified with partial metrics",
            "1": "Vague description of trending content",
            "0": "No usable output"
          }
        }
      ],
      "expectedScoreWithout": 35,
      "expectedScoreWith": 75
    },
    {
      "id": "bench-easy-02",
      "difficulty": "easy",
      "description": "Check a subreddit's current activity level relative to its baseline",
      "input": "Is r/worldnews more active than usual right now? Check the current activity level and compare it to what you'd expect for this time of day.",
      "rubric": [
        {
          "criterion": "Baseline Awareness",
          "weight": 0.4,
          "scoring": {
            "5": "References subreddit subscriber count and typical active user count; compares current activity to expected hourly baseline for this time of day and day of week",
            "3": "Notes current activity level but baseline comparison is approximate or missing temporal context",
            "1": "Reports current active users without any baseline reference",
            "0": "No activity level assessment"
          }
        },
        {
          "criterion": "Data Collection",
          "weight": 0.3,
          "scoring": {
            "5": "Checks active user count, recent post frequency in /new, and comment rates to triangulate activity level",
            "3": "Checks one or two activity indicators",
            "1": "Only mentions subscriber count without current activity data",
            "0": "No data collected"
          }
        },
        {
          "criterion": "Contextual Output",
          "weight": 0.3,
          "scoring": {
            "5": "Clear verdict (above/below/at baseline) with specific numbers and time-of-day context; notes if an event may be driving unusual activity",
            "3": "Provides a verdict but without supporting numbers or temporal context",
            "1": "Vague assessment without data",
            "0": "No assessment provided"
          }
        }
      ],
      "expectedScoreWithout": 30,
      "expectedScoreWith": 70
    },
    {
      "id": "bench-easy-03",
      "difficulty": "easy",
      "description": "Find relevant subreddits for a given topic",
      "input": "What are the most relevant subreddits for tracking developments in artificial intelligence? List them with subscriber counts and typical activity levels.",
      "rubric": [
        {
          "criterion": "Subreddit Discovery",
          "weight": 0.4,
          "scoring": {
            "5": "Identifies 5+ relevant subreddits spanning different aspects of AI (research, news, applications, ethics) including both mega-subs and niche communities",
            "3": "Identifies 3-4 relevant subreddits but misses niche or specialized communities",
            "1": "Only identifies 1-2 obvious subreddits (e.g., r/artificial)",
            "0": "No relevant subreddits identified"
          }
        },
        {
          "criterion": "Community Profiling",
          "weight": 0.3,
          "scoring": {
            "5": "Each subreddit includes: subscriber count, size tier, activity level, content focus, and typical post types",
            "3": "Subscriber counts and basic descriptions but missing activity characterization",
            "1": "Names only without metadata",
            "0": "No profiling"
          }
        },
        {
          "criterion": "Strategic Value Assessment",
          "weight": 0.3,
          "scoring": {
            "5": "Explains which subreddits are best for early trend detection vs. broad coverage vs. expert analysis; recommends a monitoring set",
            "3": "Lists subreddits but doesn't differentiate their strategic value for monitoring",
            "1": "Flat list with no monitoring strategy advice",
            "0": "No strategic guidance"
          }
        }
      ],
      "expectedScoreWithout": 35,
      "expectedScoreWith": 80
    },
    {
      "id": "bench-med-01",
      "difficulty": "medium",
      "description": "Detect cross-subreddit topic convergence for a breaking event",
      "input": "A major tech company just announced a significant product. Check r/technology, r/gadgets, r/apple, and r/Android to see if this topic is spreading across communities. Identify the topic, trace where it originated, and assess the cross-community spread pattern.",
      "rubric": [
        {
          "criterion": "Cross-Subreddit Detection",
          "weight": 0.3,
          "scoring": {
            "5": "Scans all 4 subreddits for overlapping topics using URL matching, keyword clustering, and entity co-occurrence; identifies the converging topic with specific evidence",
            "3": "Checks multiple subreddits and finds overlap but uses only one correlation method",
            "1": "Checks subreddits independently without systematic cross-correlation",
            "0": "No cross-subreddit analysis"
          }
        },
        {
          "criterion": "Origin Tracing",
          "weight": 0.25,
          "scoring": {
            "5": "Identifies the earliest post by timestamp, traces the crosspost chain, determines which subreddit served as the origin, and maps the propagation path",
            "3": "Identifies the likely origin subreddit but doesn't trace the full propagation path",
            "1": "Mentions where the topic appeared but doesn't determine origin",
            "0": "No origin analysis"
          }
        },
        {
          "criterion": "Spread Pattern Classification",
          "weight": 0.25,
          "scoring": {
            "5": "Correctly classifies the propagation pattern (hub-and-spoke, grassroots, cascade, or synchronized) with evidence; calculates spread score",
            "3": "Describes the spread qualitatively but doesn't formally classify the pattern",
            "1": "Notes that the topic appears in multiple places without pattern analysis",
            "0": "No spread analysis"
          }
        },
        {
          "criterion": "Report Completeness",
          "weight": 0.2,
          "scoring": {
            "5": "Structured report with topic summary, origin, spread map, per-subreddit engagement metrics, and prediction of further spread",
            "3": "Reports the topic and affected subreddits but missing metrics or predictions",
            "1": "Partial report with significant gaps",
            "0": "No coherent report"
          }
        }
      ],
      "expectedScoreWithout": 25,
      "expectedScoreWith": 65
    },
    {
      "id": "bench-med-02",
      "difficulty": "medium",
      "description": "Perform velocity-based breakout detection on rising posts",
      "input": "Analyze the rising posts in r/science right now. Identify any posts that are showing breakout velocity — engagement rates significantly above the subreddit's baseline. For each candidate, calculate the velocity metrics and predict whether it will reach the front page.",
      "rubric": [
        {
          "criterion": "Velocity Calculation",
          "weight": 0.3,
          "scoring": {
            "5": "Calculates upvote velocity, comment velocity, and award velocity for rising posts; normalizes by subreddit size (3M+ subscribers) and post age; compares against baseline thresholds",
            "3": "Computes some velocity metrics but normalization is incomplete or thresholds are not referenced",
            "1": "Reports engagement numbers without computing velocity (rate of change over time)",
            "0": "No velocity analysis"
          }
        },
        {
          "criterion": "Breakout Identification",
          "weight": 0.3,
          "scoring": {
            "5": "Flags posts exceeding the 90th-percentile velocity threshold for r/science; classifies each flagged post's lifecycle phase; provides multi-metric confirmation (upvote + comment velocity agreement)",
            "3": "Identifies high-engagement posts but breakout criteria are loosely defined",
            "1": "Lists popular posts without breakout-specific analysis",
            "0": "No breakout detection"
          }
        },
        {
          "criterion": "Prediction Quality",
          "weight": 0.25,
          "scoring": {
            "5": "Provides front-page probability estimate with reasoning based on velocity curve phase, time-of-day, and subreddit-specific breakout history; includes predicted peak timing",
            "3": "Makes a prediction but without detailed supporting analysis",
            "1": "Vague speculation about whether posts might trend",
            "0": "No predictions made"
          }
        },
        {
          "criterion": "Anti-Pattern Awareness",
          "weight": 0.15,
          "scoring": {
            "5": "Checks for vote manipulation signals, repost conflation, and bot activity; explicitly notes if any flagged posts have suspicious patterns",
            "3": "Mentions the possibility of manipulation but doesn't perform specific checks",
            "1": "No consideration of manipulation or false signals",
            "0": "Analysis is likely skewed by undetected anti-patterns"
          }
        }
      ],
      "expectedScoreWithout": 20,
      "expectedScoreWith": 65
    },
    {
      "id": "bench-med-03",
      "difficulty": "medium",
      "description": "Analyze community sentiment on a controversial topic",
      "input": "There's a heated discussion about AI regulation in r/technology and r/MachineLearning. Analyze the sentiment in both communities. I need to know: Are the communities aligned or divided? What are the dominant viewpoints? Is the discussion quality genuine or astroturfed?",
      "rubric": [
        {
          "criterion": "Sentiment Analysis",
          "weight": 0.3,
          "scoring": {
            "5": "Analyzes comment sentiment distribution (positive/negative/neutral/mixed) in both subreddits; compares sentiment between communities; identifies dominant viewpoints with supporting quotes",
            "3": "Provides sentiment assessment for both subreddits but without systematic distribution analysis",
            "1": "Vague characterization of sentiment without evidence",
            "0": "No sentiment analysis"
          }
        },
        {
          "criterion": "Cross-Community Comparison",
          "weight": 0.25,
          "scoring": {
            "5": "Compares r/technology (general audience) vs r/MachineLearning (practitioner audience) sentiment explicitly; identifies where they agree and diverge; explains the divergence in terms of community composition",
            "3": "Notes differences between communities but doesn't explain the reasons",
            "1": "Treats both communities as a single unit",
            "0": "No comparison"
          }
        },
        {
          "criterion": "Discussion Quality Assessment",
          "weight": 0.25,
          "scoring": {
            "5": "Evaluates engagement quality tier (Deep/Reactive/Hostile/Astroturfed) for each community; checks for coordinated messaging, bot accounts, and formulaic comments; provides evidence for quality classification",
            "3": "Comments on discussion quality but without systematic evaluation against quality tiers",
            "1": "No quality assessment beyond basic observation",
            "0": "Assumes all engagement is genuine without checking"
          }
        },
        {
          "criterion": "Output Structure",
          "weight": 0.2,
          "scoring": {
            "5": "Structured comparison report with per-community sentiment breakdown, dominant viewpoints, quality assessment, and a synthesis identifying key points of alignment and divergence",
            "3": "Covers main points but output lacks clear structure",
            "1": "Unstructured observations",
            "0": "No coherent output"
          }
        }
      ],
      "expectedScoreWithout": 25,
      "expectedScoreWith": 65
    },
    {
      "id": "bench-med-04",
      "difficulty": "medium",
      "description": "Detect early signals of an emerging trend before it peaks",
      "input": "I need to detect trends 24 hours before they hit the front page. Scan r/gaming, r/pcgaming, and r/Games for any posts currently in the Ignition or early Surge phase that haven't reached mainstream visibility yet. Focus on posts less than 2 hours old with anomalous engagement velocity.",
      "rubric": [
        {
          "criterion": "Early Detection Methodology",
          "weight": 0.35,
          "scoring": {
            "5": "Scans /new and /rising (not just /hot) for posts under 2 hours old; calculates velocity at this early stage; compares against historical breakout curves for gaming subreddits; correctly focuses on Ignition-phase signals",
            "3": "Checks rising posts but doesn't filter by age or compare against historical breakout patterns",
            "1": "Reports currently hot posts that are already mainstream — misses the early detection goal",
            "0": "No early detection methodology applied"
          }
        },
        {
          "criterion": "Velocity Normalization",
          "weight": 0.25,
          "scoring": {
            "5": "Normalizes velocity by each subreddit's size and current time-of-day baseline; accounts for different content types (game announcements vs. memes vs. discussion posts) in gaming subs",
            "3": "Some normalization but missing important context factors",
            "1": "Reports raw engagement numbers without normalization",
            "0": "No normalization"
          }
        },
        {
          "criterion": "Multi-Signal Confirmation",
          "weight": 0.2,
          "scoring": {
            "5": "Confirms breakout candidates using at least 2 signals (e.g., high upvote velocity + high comment velocity + early awards); notes signal agreement or disagreement",
            "3": "Uses one velocity metric without multi-signal confirmation",
            "1": "Relies on a single metric (e.g., just upvote count)",
            "0": "No signal confirmation"
          }
        },
        {
          "criterion": "Actionable Prediction",
          "weight": 0.2,
          "scoring": {
            "5": "For each candidate, provides predicted peak timing, confidence score, and recommended action (monitor, act, wait); compares against similar past breakouts",
            "3": "Flags candidates as potentially trending but without timing or confidence specifics",
            "1": "Lists posts without prediction or actionability",
            "0": "No predictions"
          }
        }
      ],
      "expectedScoreWithout": 20,
      "expectedScoreWith": 65
    },
    {
      "id": "bench-hard-01",
      "difficulty": "hard",
      "description": "Distinguish organic trends from coordinated manipulation campaigns",
      "input": "I've noticed a sudden surge of posts about a specific cryptocurrency across r/CryptoCurrency, r/technology, r/investing, and r/wallstreetbets. Investigate whether this is organic community interest or a coordinated pump campaign. Analyze posting patterns, account authenticity, comment quality, and engagement metrics to make a determination.",
      "rubric": [
        {
          "criterion": "Account Analysis",
          "weight": 0.3,
          "scoring": {
            "5": "Checks posting accounts for: account age, karma history, post history diversity, recent activity patterns; identifies clusters of new or single-purpose accounts; compares against bot/shill indicators from anti-patterns",
            "3": "Notes some account characteristics but doesn't perform systematic authentication analysis",
            "1": "Ignores account characteristics entirely",
            "0": "No account analysis"
          }
        },
        {
          "criterion": "Coordination Detection",
          "weight": 0.3,
          "scoring": {
            "5": "Analyzes temporal posting patterns (are posts suspiciously synchronized?); checks for identical or near-identical phrasing across posts/comments; examines whether accounts have coordinated history; identifies talking-point patterns",
            "3": "Notes suspicious timing or phrasing but doesn't perform comprehensive coordination analysis",
            "1": "Treats all posts as independent without checking for coordination",
            "0": "No coordination analysis"
          }
        },
        {
          "criterion": "Organic vs. Inauthentic Verdict",
          "weight": 0.25,
          "scoring": {
            "5": "Provides a clear, evidence-based verdict (organic / likely manipulated / mixed) with confidence level; lists specific evidence supporting the conclusion; quantifies the ratio of authentic vs. suspicious engagement",
            "3": "Provides a verdict but supporting evidence is thin or speculative",
            "1": "Hedges without committing to an assessment despite available evidence",
            "0": "No determination made"
          }
        },
        {
          "criterion": "Cross-Subreddit Pattern Comparison",
          "weight": 0.15,
          "scoring": {
            "5": "Compares the engagement pattern in each subreddit — notes if behavior differs by community (e.g., organic in r/CryptoCurrency but astroturfed in r/technology); explains why different communities may react differently",
            "3": "Compares across subreddits but superficially",
            "1": "Treats all subreddits as a single unit",
            "0": "No cross-subreddit comparison"
          }
        }
      ],
      "expectedScoreWithout": 15,
      "expectedScoreWith": 60
    },
    {
      "id": "bench-hard-02",
      "difficulty": "hard",
      "description": "Full trend lifecycle analysis with prediction accuracy assessment",
      "input": "Track a developing story across r/news, r/worldnews, r/politics, and relevant niche subreddits. I need: (1) origin identification — where did this story first appear on Reddit, (2) full propagation timeline across communities, (3) velocity curves at each stage, (4) sentiment evolution as the story spread, and (5) a prediction of whether it will sustain for 48+ hours or burn out quickly. Include confidence levels for all predictions.",
      "rubric": [
        {
          "criterion": "Origin & Timeline Tracing",
          "weight": 0.25,
          "scoring": {
            "5": "Identifies the earliest Reddit post about the story with exact timestamp; maps the complete propagation timeline showing when each subreddit picked up the topic; identifies crosspost chains and independent discoveries",
            "3": "Identifies origin but timeline is incomplete or approximate",
            "1": "Notes which subreddits discuss the topic but no temporal ordering",
            "0": "No origin or timeline analysis"
          }
        },
        {
          "criterion": "Multi-Subreddit Velocity Analysis",
          "weight": 0.25,
          "scoring": {
            "5": "Provides per-subreddit velocity curves showing engagement over time; identifies lifecycle phase in each community; notes velocity differences between communities and explains them (e.g., r/worldnews peaked before r/politics due to international angle)",
            "3": "Provides velocity data for some subreddits but analysis is inconsistent",
            "1": "Reports final engagement numbers without velocity over time",
            "0": "No velocity analysis"
          }
        },
        {
          "criterion": "Sentiment Evolution",
          "weight": 0.2,
          "scoring": {
            "5": "Tracks how sentiment changed as the story spread — e.g., initial concern in r/news, political framing in r/politics, international perspective in r/worldnews; identifies narrative shifts over time",
            "3": "Provides snapshot sentiment but doesn't track evolution",
            "1": "Single sentiment label for the entire story",
            "0": "No sentiment analysis"
          }
        },
        {
          "criterion": "Sustainability Prediction",
          "weight": 0.2,
          "scoring": {
            "5": "Provides a reasoned prediction of story sustainability (48h+ or burnout) based on: velocity decay rate, cross-community engagement depth, historical comparison with similar stories, and external event dependency; includes confidence level",
            "3": "Makes a prediction but with limited supporting analysis",
            "1": "Vague guess about story longevity",
            "0": "No prediction"
          }
        },
        {
          "criterion": "Report Synthesis",
          "weight": 0.1,
          "scoring": {
            "5": "Coherent narrative connecting all five requested elements; executive summary with key findings and confidence levels",
            "3": "All elements present but not well connected",
            "1": "Fragmented analysis",
            "0": "Incomplete report"
          }
        }
      ],
      "expectedScoreWithout": 15,
      "expectedScoreWith": 55
    },
{
|
|
436
|
+
"id": "bench-hard-03",
|
|
437
|
+
"difficulty": "hard",
|
|
438
|
+
"description": "Predict an emerging trend 24 hours before peak from weak early signals",
|
|
439
|
+
"input": "You have access to r/startups, r/SaaS, r/Entrepreneur, r/smallbusiness, and r/webdev. Using early signal detection, identify any topics that are currently showing Seed or early Ignition phase characteristics that could become significant trends within 24 hours. I want you to detect trends that have NOT yet appeared on any subreddit's front page. Provide evidence for each prediction and a false-positive assessment.",
|
|
440
|
+
"rubric": [
|
|
441
|
+
{
|
|
442
|
+
"criterion": "Early Signal Detection",
|
|
443
|
+
"weight": 0.3,
|
|
444
|
+
"scoring": {
|
|
445
|
+
"5": "Systematically scans /new and /rising across all 5 subreddits for posts under 1 hour old; identifies anomalous velocity in the Seed/Ignition phase using multi-metric signals; successfully spots at least one genuinely early trend",
|
|
446
|
+
"3": "Checks rising in some subreddits but may miss very early signals in /new; detection criteria are not rigorous",
|
|
447
|
+
"1": "Reports already-visible trends rather than pre-peak signals",
|
|
448
|
+
"0": "No early detection capability demonstrated"
|
|
449
|
+
}
|
|
450
|
+
},
|
|
451
|
+
{
|
|
452
|
+
"criterion": "Prediction Methodology",
|
|
453
|
+
"weight": 0.25,
|
|
454
|
+
"scoring": {
|
|
455
|
+
"5": "Predictions are based on velocity analysis, cross-subreddit convergence checks, temporal pattern matching, and comparison with historical breakout curves; methodology is explicitly stated and repeatable",
|
|
456
|
+
"3": "Predictions have some analytical basis but methodology is not comprehensive",
|
|
457
|
+
"1": "Predictions are speculative without clear analytical grounding",
|
|
458
|
+
"0": "No methodology for prediction"
|
|
459
|
+
}
|
|
460
|
+
},
|
|
461
|
+
{
|
|
462
|
+
"criterion": "False Positive Assessment",
|
|
463
|
+
"weight": 0.25,
|
|
464
|
+
"scoring": {
|
|
465
|
+
"5": "For each flagged trend, provides: confidence score, list of confirming signals, list of potential confounds (repost, bot, niche-only), and explicit false-positive probability estimate; demonstrates awareness that most early signals do NOT become major trends",
|
|
466
|
+
"3": "Acknowledges uncertainty but false-positive assessment is informal",
|
|
467
|
+
"1": "Presents all predictions with equal confidence; no false-positive awareness",
|
|
468
|
+
"0": "No false-positive consideration"
|
|
469
|
+
}
|
|
470
|
+
},
|
|
471
|
+
{
|
|
472
|
+
"criterion": "Evidence & Traceability",
|
|
473
|
+
"weight": 0.2,
|
|
474
|
+
"scoring": {
|
|
475
|
+
"5": "Each prediction is backed by specific posts with links, metrics, and timeline; evidence is sufficient for the user to independently verify the assessment",
|
|
476
|
+
"3": "Some supporting evidence but not enough for independent verification",
|
|
477
|
+
"1": "Claims without supporting evidence",
|
|
478
|
+
"0": "No evidence provided"
|
|
479
|
+
}
|
|
480
|
+
}
|
|
481
|
+
],
|
|
482
|
+
"expectedScoreWithout": 15,
|
|
483
|
+
"expectedScoreWith": 55
|
|
484
|
+
}
|
|
485
|
+
]
|
|
486
|
+
}
|
package/tests/smoke.json
ADDED
@@ -0,0 +1,54 @@
{
  "version": "0.0.1",
  "timeout": 60,
  "tasks": [
    {
      "id": "smoke-01",
      "description": "Monitor a technology subreddit for emerging trends and provide a structured trend report with velocity analysis",
      "input": "Monitor r/technology and r/programming for any emerging trends in the last 24 hours. I want to know what topics are gaining unusual traction, which posts are showing breakout velocity, and whether any topics are spreading across both subreddits. Provide confidence scores and predicted peak timing for each trend.",
      "rubric": [
        {
          "criterion": "Subreddit Monitoring & Data Collection",
          "weight": 0.25,
          "scoring": {
            "5": "Checks rising, new, and hot listings for both subreddits; collects engagement metrics (score, comments, awards, upvote_ratio) for candidate posts; notes subreddit subscriber counts for normalization",
            "3": "Checks at least one listing type for both subreddits; collects basic metrics but misses some dimensions",
            "1": "Only checks one subreddit or one listing type; minimal metric collection",
            "0": "No systematic data collection from subreddit listings"
          }
        },
        {
          "criterion": "Velocity Analysis & Trend Detection",
          "weight": 0.3,
          "scoring": {
            "5": "Calculates engagement velocity (upvotes/time, comments/time); normalizes by subreddit size and time-of-day; identifies posts exceeding breakout thresholds; classifies trend lifecycle phase (Seed/Ignition/Surge/Peak/Decay)",
            "3": "Identifies high-engagement posts but velocity calculation is approximate or unnormalized; lifecycle phase mentioned but not rigorously determined",
            "1": "Lists popular posts by raw score without velocity or normalization analysis",
            "0": "No velocity-based analysis; just reports current top posts"
          }
        },
        {
          "criterion": "Cross-Subreddit Correlation",
          "weight": 0.25,
          "scoring": {
            "5": "Detects topics appearing in both subreddits; classifies propagation pattern (grassroots, hub-and-spoke, etc.); calculates spread score; identifies origin subreddit",
            "3": "Notes overlapping topics between subreddits but without formal correlation analysis or propagation classification",
            "1": "Treats each subreddit independently with no cross-community analysis",
            "0": "No awareness of cross-subreddit patterns"
          }
        },
        {
          "criterion": "Report Quality & Actionability",
          "weight": 0.2,
          "scoring": {
            "5": "Structured report with: trend summary, confidence score (0-100), lifecycle phase, predicted peak timing, sentiment snapshot, supporting evidence links, and risk factors; trends ranked by confidence",
            "3": "Report includes trend descriptions and some metrics but missing confidence scores, timing predictions, or risk factors",
            "1": "Unstructured list of trending topics without metrics or predictions",
            "0": "No coherent report format"
          }
        }
      ],
      "passThreshold": 60
    }
  ]
}