npm - @opendirectory.dev/skills - Versions diffs - 0.1.0 - Mend

@opendirectory.dev/skills 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (212) hide show

package/skills/show-hn-writer/SKILL.md ADDED Viewed

@@ -0,0 +1,303 @@
+---
+name: show-hn-writer
+description: ''
+compatibility: [claude-code, gemini-cli, github-copilot]
+author: OpenDirectory
+version: 1.0.0
+---
+# Show HN Writer
+Draft a Show HN post title and body that follows the unwritten rules of Hacker News: specific, honest, first-person, no marketing.
+---
+**Critical rule:** Never use marketing language. No "game-changing", "revolutionary", "powerful", "robust", "seamless", "innovative", "best-in-class", or "streamline". Never write in third person about the product. Never ask readers to upvote, share, or check out other links.
+---
+## Step 1: Gather Project Context
+Check if the user has already provided enough context to write the post. You need:
+- What the project does (one technical sentence)
+- What problem it solves and who has that problem
+- Why the builder made it (the honest story: scratch your own itch, side project, weekend hack)
+- What makes it technically interesting or different from existing solutions
+- Current state: alpha, beta, open source, free, paid, solo project, team
+If any of these are missing, ask in a single message:
+"To write your Show HN post, I need a few details:
+1. What does [project name] do, technically? (one sentence)
+2. Who has the problem it solves? (be specific: 'developers who...' not 'anyone who...')
+3. Why did you build it? (the real story)
+4. What's technically interesting about how it works?
+5. What's the current state: open source? free? alpha?"
+Do not proceed until you have answers to all five.
+---
+## Step 2: Read Context Files (if available)
+Check for project context files before asking:
+```bash
+ls README.md 2>/dev/null && echo "README found"
+ls CLAUDE.md 2>/dev/null && echo "CLAUDE.md found"
+ls package.json 2>/dev/null && echo "package.json found"
+```
+If README.md exists, read the first 100 lines. Extract: what it does, tech stack, how to run it, any stated motivation.
+If you find enough context in the files, skip the Step 1 questions entirely or ask only what's missing.
+---
+## Step 3: Draft the Title
+The Show HN title must start with "Show HN:": this is required, not optional.
+**Title format A: Product-First:**
+```
+Show HN: [Project Name] – [what it does in plain English]
+```
+**Title format B: Outcome-Focused:**
+```
+Show HN: [Project Name] – [specific outcome] for [specific person]
+```
+**Title rules:**
+- 60-80 characters total (including "Show HN: ")
+- No exclamation marks
+- No adjectives ("fast", "simple", "easy", "powerful") unless they are literal technical specs
+- The dash is an en dash (–), not a hyphen (-)
+- No trailing punctuation
+- Describe what it does, not what it could do for someone
+Good examples:
+- `Show HN: Zulip – Group chat that threads every conversation`
+- `Show HN: Lite XL – A lightweight text editor written in C and Lua`
+- `Show HN: Datasette – Instantly publish SQLite databases to the web`
+Bad examples (never write these):
+- `Show HN: The most powerful tool for managing your workflow` (adjective, no specifics)
+- `Show HN: Check out my new project!` (no description, no name)
+- `Show HN: I built a thing for developers` (vague)
+Draft three title variants:
+1. Product-First format
+2. Outcome-Focused format
+3. Technical-Angle format (lead with the interesting technical decision)
+---
+## Step 4: Draft the Body
+The Show HN body is a builder talking to peers. It is not a product description. It is not a pitch.
+**Structure:**
+**Opening line:** One sentence, first-person, what you built. Not "Introducing X" or "X is a tool that". Just: "I built [X] because [reason]." or "For the past [N] months I've been working on [X]."
+**The why:** Two to four sentences on why you made it. Was it a problem you had personally? Something frustrating at work? A technical curiosity? Be specific and honest. If you built it for fun, say so.
+**How it works:** Three to six sentences on the technical approach. This is what HN readers care about. What's the interesting engineering decision? What did you learn? What tradeoff did you make and why? Name the specific technology choices.
+**Current state:** One to two sentences. Is it open source? Free? Alpha? Looking for beta users? Solo project or team? How long have you been working on it?
+**Invitation:** One sentence to close. Invite feedback, questions, or criticism. Never ask for upvotes or shares. Examples: "Would love to hear what you think." / "Happy to answer questions about the implementation." / "Criticism welcome: still early days."
+**Body rules:**
+- Write in first person throughout
+- 150-350 words total
+- No bullet points, no headers, no bold text
+- No links in the body (the URL goes in the submission, not the body)
+- No asking people to sign up, follow, or subscribe
+- No comparison tables or feature lists
+- If there's a demo or GitHub link, do NOT add it to the body: it goes in the URL field of the submission
+---
+## Step 5: Summarize Submission with Gemini
+Write a Gemini request to evaluate the draft and check for Show HN anti-patterns:
+```bash
+cat > /tmp/show-hn-review-request.json << 'ENDJSON'
+{
+  "system_instruction": {
+    "parts": [{
+      "text": "You are a longtime Hacker News member reviewing a Show HN post draft. Your job is to catch anything that will hurt its reception: marketing language, vague descriptions, third-person writing, requests for upvotes/shares, adjectives without specifics, titles over 80 characters. For each issue found, state: the exact phrase, why it hurts, and a specific suggested replacement. If the post passes, say 'Passes review.' Output only the review: no commentary, no preamble. Do not use em dashes. Do not praise the post."
+    }]
+  },
+  "contents": [{
+    "parts": [{
+      "text": "DRAFT_POST_HERE"
+    }]
+  }],
+  "generationConfig": {
+    "temperature": 0.2,
+    "maxOutputTokens": 1024
+  }
+}
+ENDJSON
+curl -s -X POST \
+  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=$GEMINI_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d @/tmp/show-hn-review-request.json \
+  | python3 -c "import sys,json; d=json.load(sys.stdin); print(d['candidates'][0]['content']['parts'][0]['text'])"
+```
+Replace `DRAFT_POST_HERE` with the full title and body text.
+**If GEMINI_API_KEY is not set:** Skip this step. Proceed with the manual self-QA in Step 6.
+---
+## Step 6: Self-QA
+Before presenting the final output, check each item:
+**Title checks:**
+- [ ] Starts with exactly "Show HN:" (capital H, capital N, colon, space)
+- [ ] 60-80 characters total
+- [ ] Contains no marketing adjectives
+- [ ] Describes what the product does, not what it will do for someone
+- [ ] No exclamation marks
+**Body checks:**
+- [ ] Opens in first person ("I built...", "For the past N months...", "I've been working on...")
+- [ ] 150-350 words
+- [ ] Contains at least one technical detail (language, approach, architecture decision)
+- [ ] No links in the body text
+- [ ] Closes with an invitation for feedback, not a call to action
+- [ ] No bullet points or headers
+- [ ] No marketing words: "game-changer", "revolutionary", "powerful", "robust", "seamless", "innovative", "best-in-class", "streamline", "leverage", "transform"
+If any check fails, fix before presenting.
+---
+## Step 7: Post to Hacker News (Optional)
+If `HN_USERNAME` and `HN_PASSWORD` are set in the environment, offer to post directly.
+```bash
+echo "HN_USERNAME: ${HN_USERNAME:-not set}"
+echo "HN_PASSWORD: ${HN_PASSWORD:+set (hidden)}"
+```
+If both are set, tell the user: "HN credentials found. Confirm to submit this post directly to Hacker News, or say 'output only' to get the text."
+On confirmation, run the three-step submission:
+**Step A: Authenticate and get session cookie**
+```bash
+HN_COOKIE=$(curl -s -c /tmp/hn-cookies.txt -b /tmp/hn-cookies.txt \
+  -X POST "https://news.ycombinator.com/login" \
+  -d "acct=${HN_USERNAME}&pw=${HN_PASSWORD}&goto=news" \
+  -D /tmp/hn-headers.txt \
+  -L -o /dev/null -w "%{http_code}")
+# Verify login succeeded (should redirect to news, not back to login)
+grep -c "user?id=${HN_USERNAME}" /tmp/hn-headers.txt > /dev/null 2>&1 \
+  || curl -s -b /tmp/hn-cookies.txt "https://news.ycombinator.com/" \
+     | grep -c "logout" > /dev/null 2>&1 \
+  && echo "Login: success" || echo "Login: failed: check credentials"
+```
+If login fails: stop. Tell the user their credentials did not work and present the draft for manual submission instead.
+**Step B: Fetch the submission form to get the CSRF token (fnid)**
+```bash
+FNID=$(curl -s -b /tmp/hn-cookies.txt \
+  "https://news.ycombinator.com/submit" \
+  | python3 -c "
+import sys, re
+html = sys.stdin.read()
+m = re.search(r'name=\"fnid\" value=\"([^\"]+)\"', html)
+print(m.group(1) if m else 'NOT_FOUND')
+")
+echo "fnid: $FNID"
+```
+If fnid is NOT_FOUND: stop. HN may have changed their form structure. Present the draft for manual submission.
+**Step C: Submit the post**
+For a URL submission (project URL):
+```bash
+curl -s -b /tmp/hn-cookies.txt \
+  -X POST "https://news.ycombinator.com/r" \
+  -d "title={ENCODED_TITLE}&url={ENCODED_URL}&fnid=${FNID}&fnop=submit-page" \
+  -D /tmp/hn-submit-headers.txt \
+  -L -o /tmp/hn-submit-response.html
+# Check for success: HN redirects to the post or to newest
+grep -E "item\?id=|/newest" /tmp/hn-submit-headers.txt \
+  && echo "Submission: success" || echo "Submission: check manually"
+```
+For a text post (no URL, just body):
+```bash
+curl -s -b /tmp/hn-cookies.txt \
+  -X POST "https://news.ycombinator.com/r" \
+  -d "title={ENCODED_TITLE}&text={ENCODED_BODY}&fnid=${FNID}&fnop=submit-page" \
+  -D /tmp/hn-submit-headers.txt \
+  -L -o /tmp/hn-submit-response.html
+```
+**After submission:**
+1. Extract the post URL from the redirect headers
+2. Tell the user: "Posted: https://news.ycombinator.com/item?id=[ID]"
+3. Remind them: "Reply to comments within the first two hours. Do not reshare the link for 24 hours: HN penalizes vote rings."
+**If any step fails:** Clean up cookie file (`rm -f /tmp/hn-cookies.txt`) and present the draft for manual submission. Do not retry automatically.
+```bash
+# Always clean up credentials from disk after the session
+rm -f /tmp/hn-cookies.txt /tmp/hn-headers.txt /tmp/hn-submit-headers.txt /tmp/hn-submit-response.html
+```
+If `HN_USERNAME` or `HN_PASSWORD` is not set: skip this step entirely and proceed to Step 8.
+---
+## Step 8: Present Output
+Present in this order:
+```
+## Show HN Post
+### Recommended Title
+Show HN: [title]
+### Alternative Titles
+1. Show HN: [variant 1]
+2. Show HN: [variant 2]
+---
+### Body
+[post body text]
+---
+### Submission Notes
+- URL field: [project URL or GitHub URL]
+- Best time to post: Tuesday to Thursday, 8–10 AM US Eastern
+- After posting: Respond to every comment in the first two hours
+- Do not share the link elsewhere for the first 24 hours: HN flags vote rings
+```
+Do not add commentary about the post. Present the output, then stop.

package/skills/show-hn-writer/evals/evals.json ADDED Viewed

@@ -0,0 +1,35 @@
+{
+  "skill_name": "show-hn-writer",
+  "evals": [
+    {
+      "id": 1,
+      "prompt": "Write a Show HN post for my project. It's called Datasync: it watches a folder and automatically syncs changes to an S3 bucket using a local daemon. I built it because rsync kept breaking on binary files and I needed something that just worked. It's open source, written in Go, free.",
+      "expected_output": "Agent reads context, skips Step 1 questions (enough context provided). Drafts three title variants all starting with 'Show HN:'. Recommended title is 60-80 characters. No marketing adjectives. Body opens in first person ('I built...' or 'For the past...'). Body is 150-350 words. Includes technical details (Go, S3, daemon, how it handles binary files). At least one technical decision explained. Closes with feedback invitation, not a CTA. No links in body. Self-QA passes all checks. Output includes three title variants, recommended title, body, and submission notes.",
+      "files": []
+    },
+    {
+      "id": 2,
+      "prompt": "Help me launch on Hacker News. Here's my README: [README with marketing language: 'revolutionary AI-powered platform', 'game-changing productivity boost', 'the future of developer tooling']",
+      "expected_output": "Agent reads the README. Detects marketing language in the source material. Drafts title and body without marketing language. Gemini review step flags any remaining adjectives. Self-QA catches 'revolutionary', 'game-changing' if they leaked through. Final output contains none of the original marketing phrases. Title is specific and descriptive. Body explains what the tool actually does technically.",
+      "files": ["README.md"]
+    },
+    {
+      "id": 3,
+      "prompt": "Write a Show HN for my weekend project. I just made a simple todo app.",
+      "expected_output": "Agent recognizes missing context (what makes it technically interesting, why they built this one and not used an existing one, what is different). Asks the 5 required questions before drafting. Does not draft a post based on 'simple todo app' alone: that is not enough to write a good Show HN. After the user provides more context (e.g., 'it uses a CRDT for offline sync'), drafts a specific post about the technical decision, not the generic 'todo app' framing.",
+      "files": []
+    },
+    {
+      "id": 4,
+      "prompt": "Write a Show HN title for my project. It's a CLI tool called 'recast' that converts any JSON API response into a typed TypeScript interface.",
+      "expected_output": "Agent drafts three title variants. All start with 'Show HN:'. Character counts are shown. Recommended title is the one that most specifically describes what it does. Example acceptable title: 'Show HN: Recast – Generate TypeScript interfaces from any JSON API response' (under 80 chars). No adjectives. No marketing words. En dash (–) used as separator.",
+      "files": []
+    },
+    {
+      "id": 5,
+      "prompt": "Write a Show HN post. My project README is attached.",
+      "expected_output": "Agent reads README.md. Extracts: what it does, tech stack, motivation. Asks only for information not found in the README (likely: why the builder made it, what makes it technically interesting, current state). Does not ask for information that is available in the file. Drafts complete post using README content plus user answers. Post passes all self-QA checks.",
+      "files": ["README.md"]
+    }
+  ]
+}

package/skills/show-hn-writer/references/hn-rules.md ADDED Viewed

@@ -0,0 +1,74 @@
+# Show HN Rules and Community Norms
+## Official Rules
+- Title must start with exactly "Show HN:": no variation
+- Post in the Show HN comments thread if your submission does not get front page traction
+- Do not ask for upvotes in the post or comments
+- Do not post the same project more than once every few months
+- Must be something you made (or a significant contribution to)
+- Works-in-progress, betas, and experiments are acceptable
+## What Gets Upvoted
+Analysis of high-performing Show HN posts:
+**Strong signals:**
+- Specific technical explanation in the first paragraph of comments
+- Builder responds quickly and in depth to technical questions
+- Unusual technical decision (doing it differently than the standard approach)
+- Honest about limitations and what it does NOT do
+- Clear target audience ("this is for people who...")
+- Open source, or explains why it is not
+**Weak signals:**
+- "I built this in a weekend" (people stopped caring about this)
+- Vague descriptions ("productivity tool", "developer tool")
+- Screenshots without explanation
+- Body text that reads like a press release
+## Title Patterns That Work
+From top Show HN submissions:
+- Lead with the capability: "Show HN: X – converts Y to Z"
+- Technical specificity: "Show HN: X – builds Y using Z (no Y required)"
+- Comparison as context (use sparingly): "Show HN: X – like [known thing] but for [specific use case]"
+- The honest experiment: "Show HN: X – an experiment in doing Y differently"
+## Title Patterns That Fail
+- Adjective overload: "Show HN: X – a fast, simple, powerful tool for Y"
+- Missing the point: "Show HN: I made a website"
+- Marketing speak: "Show HN: The future of developer tooling"
+- Too long: titles over 80 characters get cut off
+## Body Norms
+The body is optional but high-performing posts almost always have one.
+The body is read by people who are already interested (they clicked from the title). Do not re-explain what it does: go deeper. Talk about:
+- Why you made it instead of using [existing tool]
+- The technical decision you are most proud of or most uncertain about
+- What surprised you during the build
+- What you are still figuring out
+## Comments Strategy
+- Respond to every comment on launch day
+- When someone asks a question, answer it thoroughly: do not deflect
+- When someone critiques your approach, acknowledge it honestly
+- If you do not know the answer, say so
+- Never get defensive
+## Timing
+- Best time: Tuesday to Thursday, 8-10 AM US Eastern
+- Worst time: Friday afternoon, weekends, major news days
+- The first 2 hours determine front page placement
+- Do not share the link on other platforms until 24 hours after posting (vote rings get flagged)
+## Resubmission Policy
+You can repost a significantly updated version of a project. Wait at least 3-6 months. Acknowledge the original post in your new body: "I posted an early version 6 months ago. Here is what changed."

package/skills/show-hn-writer/references/title-formulas.md ADDED Viewed

@@ -0,0 +1,93 @@
+# Show HN Title Formulas
+## The Core Formula
+```
+Show HN: [Name] – [verb phrase describing what it does]
+```
+The dash is an en dash (–), not a hyphen (-).
+Total length: 60-80 characters including "Show HN: ".
+## Five Proven Formats
+### Format 1: Direct Capability
+```
+Show HN: [Name] – [verb] + [object] + [context]
+```
+Examples:
+- `Show HN: Datasette – Instantly publish SQLite databases to the web`
+- `Show HN: Jitsi – Free, open source video conferencing`
+- `Show HN: Pandoc – Universal document format converter`
+Best for: tools with a clear, single function
+### Format 2: Problem-Solution
+```
+Show HN: [Name] – [pain point] without [the annoying part]
+```
+Examples:
+- `Show HN: Zulip – Group chat that threads every conversation`
+- `Show HN: Coolify – Self-hosted Heroku alternative with no vendor lock-in`
+Best for: tools that replace something people already use but dislike
+### Format 3: Technical Architecture
+```
+Show HN: [Name] – [what it is] built with [interesting tech choice]
+```
+Examples:
+- `Show HN: Lite XL – A lightweight text editor written in C and Lua`
+- `Show HN: Bun – A fast all-in-one JavaScript runtime built in Zig`
+Best for: developer tools where the implementation is the interesting part
+### Format 4: Outcome for Specific User
+```
+Show HN: [Name] – [specific outcome] for [specific person]
+```
+Examples:
+- `Show HN: Fly.io – Deploy app servers close to your users`
+- `Show HN: Retool – Build internal tools in minutes`
+Best for: B2B tools with a clear target persona
+### Format 5: The Experiment Frame
+```
+Show HN: [Name] – An experiment in [doing X] differently
+```
+Examples:
+- `Show HN: Ink – React for command-line apps`
+- `Show HN: Tauri – Electron alternative using system webview`
+Best for: projects that challenge a conventional approach
+## Character Counting
+Count characters in your title before submitting:
+```python
+title = "Show HN: Datasette – Instantly publish SQLite databases to the web"
+print(f"{len(title)} characters")
+```
+Target: 60-80 characters. Under 60 feels thin. Over 80 gets cut off in some views.
+## Words That Add Length Without Value
+Remove these from your title:
+- "simple", "easy", "fast", "powerful", "lightweight" (unless it is a literal technical spec like "under 5MB")
+- "tool", "app", "platform", "solution", "service" (usually redundant)
+- "for developers", "for teams" (only include if it narrows the audience usefully)
+- "open source" (put it in the body; the title has no room for it)
+## En Dash vs Hyphen
+Always use an en dash (–) as the separator between name and description.
+Copy it: –
+Not a hyphen (-) and not an em dash (—).
+Most Show HN titles follow this convention. It is a visual signal of the format.

package/skills/stargazer/README.md ADDED Viewed

@@ -0,0 +1,79 @@
+# Stargazer
+![stargazer](https://github.com/user-attachments/assets/54d6b003-2f2c-42bd-a8f9-b4e835338dfc)
+Stargazer is a GitHub star scraper that collects detailed information about users who star a specific repository. It scrapes data like their emails, usernames, names, company names, locations, and social links. While it grabs a full profile, it is primarily designed to find and verify user emails.
+It outperforms standard scrapers by implementing a multi-tier verification system that filters out false positives, specifically those originating from repository forks (e.g., stopping the scraper from accidentally pulling a famous developer's email just because a user forked their repository).
+https://github.com/user-attachments/assets/1e15a51b-8bff-4282-928e-fc99da2343cb
+## The 5-Tier Extraction Architecture
+This system uses five distinct methods to find and verify user emails, ensuring maximum coverage and accuracy:
+1. **Profile API**: Direct extraction from the public GitHub user profile.
+2. **Events API**: Analysis of recent user activity and public event streams to find associated email addresses.
+3. **GPG Keys**: Extraction of email identities linked to verified GPG keys on the user account.
+4. **Patch Regex**: Deep scanning of commit patches and diffs using regular expressions to identify author emails.
+5. **Global Search API**: Advanced search queries with cryptographic login verification to find user mentions across the platform.
+## Key Features
+*   **Multi-Token Rotation**: Supports a pool of GitHub Personal Access Tokens to distribute requests and maximize speed.
+*   **Smart Rate Limiting**: Uses semaphores to manage concurrency and bypass the standard 30 requests per minute search limit safely.
+*   **JSONL Checkpointing**: Saves progress in real-time using JSON Lines format, allowing for easy resumption if your internet drops or the script is stopped.
+*   **Jitter**: Implements randomized delays between requests to mimic human behavior and avoid GitHub abuse detection.
+*   **CSV Generation**: Includes built-in tools to transform raw extraction data into clean, structured CSV files for marketing or analysis.
+## How to get a GitHub Personal Access Token (PAT)
+To run the scraper, you will need at least one GitHub Personal Access Token (though 3 or more are recommended for repositories with thousands of stars).
+1. Go to your GitHub Token Settings: https://github.com/settings/tokens
+2. Click **Generate new token (classic)**.
+3. Give it a descriptive note (e.g., "Stargazer Scraper").
+4. Under scopes, select **`read:user`** and **`user:email`**. This allows the script to read public profile data and emails. No other permissions are required.
+5. Scroll down and click **Generate token**.
+6. Copy the `ghp_...` string and paste it into your `.env` file.
+## Installation
+1. Clone the repository to your local machine.
+2. Install the required dependencies using pip:
+   ```bash
+   pip install aiohttp python-dotenv
+   ```
+3. Create a `.env` file in the root directory. You can use `stargazer-skill/assets/.env.example` as a template.
+## Configuration
+Edit the `.env` file with your specific settings:
+*   `GITHUB_PATS`: A comma-separated list of your GitHub Personal Access Tokens.
+*   `TARGET_OWNER`: The username or organization that owns the repository.
+*   `TARGET_REPO`: The name of the repository to scan.
+*   `MAX_USERS`: The maximum number of stargazers to process.
+*   `MAX_CONCURRENT`: The number of concurrent requests to run (50 is recommended).
+## Usage Guide
+Run the extraction process in three sequential steps.
+### 1. Extract Data
+Start the main extraction script to gather detailed user information.
+```bash
+python stargazer-skill/scripts/stargazer_deep_extractor.py
+```
+### 2. Count Results
+Analyze the extracted data to see how many unique emails were found. You can run this while the main script is running.
+```bash
+python stargazer-skill/scripts/count_emails.py
+```
+### 3. Convert to CSV
+Generate a final CSV file for use in other applications.
+```bash
+python stargazer-skill/scripts/convert_to_csv.py
+```

package/skills/stargazer/SKILL.md ADDED Viewed

@@ -0,0 +1,7 @@
+---
+name: stargazer
+description: A skill for stargazer
+author: OpenDirectory
+version: 1.0.0
+---

package/skills/stargazer/stargazer-skill/SKILL.md ADDED Viewed

@@ -0,0 +1,58 @@
+---
+name: stargazer-deep-extractor
+description: Advanced 5-tier OSINT scraper for extracting GitHub stargazer emails. Use this skill when a user wants to scrape, extract, or download stargazers from a GitHub repository.
+---
+# Stargazer Deep Extractor Skill
+This skill provides a highly detailed, script-like workflow for an AI Agent to extract GitHub stargazers and their email addresses. It leverages the 5-tier Stargazer Deep Extractor toolkit (Profile API, PushEvents, GPG Keys, Patch Regex, and Global Search API) to maximize extraction yields while bypassing rate limits through multi-token rotation and asyncio semaphores.
+## Tone and Formatting Constraints
+- You must adopt a strictly professional, technical tone.
+- Do not use emojis in your responses.
+- Do not use em-dashes (use standard hyphens or colons instead).
+## Workflow Execution Steps
+### Step 1: Input Validation and Extraction
+When a user requests a repository scrape, you must extract the exact GitHub owner and repository name.
+- For a URL like `https://github.com/openai/codex`, the owner is `openai` and the repository is `codex`.
+- If the user only provides `openai/codex`, split it by the slash.
+- If the repository name is not provided, you must ask the user for it.
+### Step 2: Interrogation and Environment Setup
+If the user is running this for the first time, you must interrogate them to set up the `.env` file correctly. Ask the following questions in a point-wise format:
+1. "What is the maximum number of stargazers you would like to scrape? By default, this will be the total star count of the repository."
+2. "Please provide your GitHub Personal Access Tokens (PATs) as a comma-separated list."
+### Step 3: Token Requirement Notice
+You must remind the user about GitHub's strict rate limits and recommend the optimal number of tokens based on the repository's size.
+Provide the following advisory notice to the user:
+- Less than 2,000 stars: 1 token is generally sufficient.
+- 2,000 to 5,000 stars: 2 to 3 tokens are recommended.
+- More than 20,000 stars: 4 or more tokens are strongly recommended.
+**CRITICAL RULE**: Do not hard code this token math as a mandatory constraint. You must deliver this as a notice or recommendation. If the user decides not to follow the advice and wants to proceed with fewer tokens, you must allow them to do so and proceed with whatever they provide. Let the user use whatever they want.
+### Step 4: Configuration Editing
+After gathering the user's responses, you must configure the `.env` file in the execution directory (referencing `assets/.env.example` if needed).
+Ensure the following variables are written to `.env`:
+- `GITHUB_PATS`: Comma-separated list of the user's tokens.
+- `TARGET_OWNER`: The parsed repository owner.
+- `TARGET_REPO`: The parsed repository name.
+- `MAX_USERS`: The user's defined limit (or the total star count).
+- `MAX_CONCURRENT`: Set to 50 for optimal performance.
+### Step 5: Sequence Execution
+You must run the bundled scripts in the exact sequence below using the Bash tool. The JSONL checkpointing system ensures that if the process is interrupted, it can resume without losing data.
+1. **Deep Extraction**: Run `python scripts/stargazer_deep_extractor.py`. This executes the 5-tier OSINT extraction.
+2. **Real-time Statistics**: Run `python scripts/count_emails.py`. This analyzes the generated JSONL file (`{TARGET_OWNER}_{TARGET_REPO}_detailed.jsonl`) to count the exact number of emails successfully found vs. null emails.
+3. **CSV Conversion**: Run `python scripts/convert_to_csv.py`. This converts the final JSONL data into a structured CSV file with proper `utf-8-sig` encoding for Excel compatibility.
+### Step 6: Final Transparency Report
+After the sequence completes, you must be fully transparent with the user. Provide a final summary reporting:
+- The total number of people fetched.
+- The exact total of emails successfully extracted.
+- The total number of null (hidden) emails.
+- The absolute path to the final CSV deliverable.

package/skills/stargazer/stargazer-skill/assets/.env.example ADDED Viewed

@@ -0,0 +1,18 @@
+# Add your GitHub Personal Access Tokens here
+# Separate multiple tokens with commas for rotation (e.g. ghp_xyz,ghp_abc)
+GITHUB_PATS="ghp_token1,ghp_token2,ghp_token3,ghp_token4,ghp_token5"
+# Set the target repository here
+TARGET_OWNER="openai"
+TARGET_REPO="codex"
+# ----------------------------------------
+# SCALING OPTIONS (ENTERPRISE SETTINGS)
+# ----------------------------------------
+# STOP SCRAPING AT EXACTLY THIS MANY STARGAZERS
+MAX_USERS=25000
+# CONCURRENCY: How many users to process concurrently (Continuous Stream)
+# Higher = Faster, but might trigger secondary GitHub abuse limits. 50 is a safe maximum.
+MAX_CONCURRENT=50