@saulwade/swl-ses 1.6.3 → 1.6.6

Files changed (46)
  1. package/CLAUDE.md +3 -3
  2. package/README.md +2 -2
  3. package/agentes/gh-fix-ci-swl.md +275 -0
  4. package/agentes/nemesis-auditor-swl.md +90 -1
  5. package/comandos/swl/exportar-vault.md +106 -14
  6. package/comandos/swl/nemesis.md +70 -3
  7. package/comandos/swl/release.md +62 -2
  8. package/comandos/swl/salud.md +32 -0
  9. package/comandos/swl/verificar.md +116 -2
  10. package/habilidades/agent-browser/SKILL.md +111 -4
  11. package/habilidades/agent-deep-links/SKILL.md +148 -0
  12. package/habilidades/backend-async-postgres-testing/SKILL.md +215 -0
  13. package/habilidades/backend-error-design/SKILL.md +221 -0
  14. package/habilidades/browser-interaction-patterns/SKILL.md +514 -0
  15. package/habilidades/browser-research-domains/SKILL.md +635 -0
  16. package/habilidades/changelog-generator/SKILL.md +172 -0
  17. package/habilidades/changelog-generator/scripts/parse-commits.js +354 -0
  18. package/habilidades/devsecops-pipeline-security/SKILL.md +3 -0
  19. package/habilidades/fastapi-experto/SKILL.md +49 -4
  20. package/habilidades/harness-claude-code/SKILL.md +4 -1
  21. package/habilidades/postgresql-experto/SKILL.md +80 -4
  22. package/habilidades/proceso-discovery-machote/SKILL.md +157 -0
  23. package/habilidades/proceso-modular-split/SKILL.md +256 -0
  24. package/habilidades/tdd-workflow/SKILL.md +12 -5
  25. package/hooks/extraccion-aprendizajes.js +8 -0
  26. package/hooks/lib/deep-links.js +185 -0
  27. package/hooks/lib/evolution-tracker.js +148 -20
  28. package/hooks/lib/gateway-notify.js +70 -7
  29. package/manifiestos/modulos.json +13 -3
  30. package/manifiestos/skills-lock.json +1247 -1191
  31. package/package.json +92 -92
  32. package/plugin.json +371 -362
  33. package/reglas/arquitectura.md +38 -0
  34. package/reglas/arreglar-al-detectar.md +93 -0
  35. package/reglas/auditorias-documentales-estructurales.md +38 -0
  36. package/reglas/registro-componentes-nuevos.md +14 -0
  37. package/reglas/tests-cleanup.md +220 -0
  38. package/scripts/instalador.js +72 -4
  39. package/scripts/lib/mcp_config.py +29 -14
  40. package/scripts/lib/notificaciones-telegram.js +14 -0
  41. package/scripts/lib/transformadores/codex.js +4 -0
  42. package/scripts/lib/transformadores/cursor.js +5 -0
  43. package/scripts/mcp-orchestrator.py +153 -131
  44. package/scripts/mcp-pool-manager.py +132 -107
  45. package/scripts/mcp-telemetry.py +139 -120
  46. package/scripts/verificar-release.js +199 -1
@@ -0,0 +1,635 @@
+ ---
+ name: browser-research-domains
+ description: >
+   Per-domain shortcuts for technical research that avoid opening a browser
+   when a documented API is available. Covers GitHub, ArXiv, Hacker News, Stack
+   Overflow, PubMed, OpenAlex, and SEC EDGAR: the right endpoint, batch fetching,
+   rate limits, and gotchas. Load when investigador-swl, agent-browser, or a
+   research flow consumes multiple pages from any of these domains and the goal
+   is to reduce tokens and latency.
+ version: "1.0.0"
+ herramientasPermitidas: [Read, Bash, WebFetch]
+ evolved: false
+ fuente: "browser-use/browser-harness, domain-skills (MIT License, 2026)"
+ evolvable: true
+ exclusiones:
+   - "Do not load if the research touches a domain not listed here; use agent-browser directly."
+   - "Do not load for automation that needs login or interaction; this is read-only research via public APIs."
+   - "Do not load for tasks that require JS-rendered content; use agent-browser or browser-interaction-patterns."
+   - "Do not load for downloading academic PDFs; use swl-markitdown on the direct PDF URL."
+ ---
+
+ # Skill: browser-research-domains
+
+ For the 7 domains covered here, **do not open a browser** (GitHub `/trending`
+ is the main exception). All the data is reachable via `http_get` plus a
+ REST/Atom/XML API, with no auth or optional auth. This cuts tokens by 20-50×
+ and latency by 5-20×.
+
+ Adapted from `browser-use/browser-harness/agent-workspace/domain-skills/`
+ (MIT License). The snippets are runtime-agnostic: they work in any Python
+ context that has `urllib.request` or an equivalent. In SWL, call them via the
+ `agent-browser` CLI when needed, or directly with `Bash + curl/python`
+ if the situation allows.
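
The snippets in this skill all call `http_get`, which the skill itself does not define. The block below is a minimal sketch of such a helper built on `urllib.request`; the name `decode_body`, the default `User-Agent`, and the timeout are assumptions made for illustration, not the harness's actual implementation.

```python
import gzip
import urllib.request


def decode_body(raw, content_encoding=""):
    """Decompress a gzip-encoded body if needed and decode it as UTF-8."""
    if content_encoding.lower() == "gzip":
        raw = gzip.decompress(raw)
    return raw.decode("utf-8", errors="replace")


def http_get(url, headers=None, timeout=20.0):
    """Fetch a URL and return the body as text (hypothetical helper)."""
    # Assumed defaults; callers such as the SEC EDGAR snippets override User-Agent.
    merged = {"User-Agent": "Mozilla/5.0", "Accept-Encoding": "gzip"}
    merged.update(headers or {})
    req = urllib.request.Request(url, headers=merged)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        encoding = resp.headers.get("Content-Encoding") or ""
        return decode_body(resp.read(), encoding)
```

Handling gzip explicitly matters because some of these APIs (Stack Exchange in particular) return compressed bodies, and `urllib` does not decompress them on its own.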
+
+ ## When to load
+
+ - Technical research that will touch GitHub, ArXiv, Hacker News, Stack Overflow,
+   PubMed, OpenAlex, or SEC EDGAR.
+ - A need to batch-fetch (10+ items) from any of these domains.
+ - `agent-browser` is being invoked for something that has a documented API.
+
+ ## When NOT to load
+
+ - Research that spans domains NOT listed here.
+ - Automation tasks with interaction (login, forms).
+ - Downloading academic PDFs: use `swl-markitdown` on the PDF URL.
+
+ ## Universal rule: API before browser
+
+ For the 7 domains covered, the browser is only needed for:
+ - The GitHub trending page (server-side rendered, no API equivalent).
+ - MathJax rendering on Stack Overflow (rare).
+ - Any content outside the public API.
+
+ **Everything else is done with `http_get` + JSON/XML.** Typical gains:
+ - `arxiv` browser: 5-8s per paper / API batch: 1.9s per 10 papers (~25× faster).
+ - `hackernews` browser: 3-8s per page / `http_get` + regex: 170ms.
+
+ ---
+
+ ## 1. GitHub
+
+ A mix of REST API and browser. The browser is only for `/trending` (server-side
+ rendered, no API equivalent). Everything else goes through REST.
+
+ ### Repo metadata
+
+ ```python
+ import json
+
+ owner, repo = "owner", "repo"  # placeholders: set to the target repository
+ data = json.loads(http_get(f"https://api.github.com/repos/{owner}/{repo}"))
+ # Fields: stargazers_count, forks_count, description, language, topics,
+ #         open_issues_count, created_at, updated_at, pushed_at, default_branch,
+ #         license, homepage, visibility
+ ```
+
+ ### File contents via `raw.githubusercontent.com`
+
+ Does not consume REST API quota, needs no auth for public repos, and returns
+ the file as-is (no base64 decoding).
+
+ ```python
+ # "owner", "repo" and "main" are placeholders; use the repo's default_branch.
+ readme = http_get("https://raw.githubusercontent.com/owner/repo/main/README.md")
+ ```
+
+ ### Repo search
+
+ ```python
+ import json
+
+ results = json.loads(http_get(
+     "https://api.github.com/search/repositories"
+     "?q=browser+automation+language:python&sort=stars&per_page=10"
+ ))
+ ```
+
+ Search rate limit: **10 req/min unauthenticated** (separate from the core limit of 60/hour).
+
+ ### Trending page: the only case that needs a browser
+
+ ```python
+ import json
+
+ goto_url("https://github.com/trending")  # or /trending/python?since=weekly
+ wait_for_load()
+ wait(2)  # React hydration after readyState=complete
+
+ result = js("""
+ (function(){
+   var rows = Array.from(document.querySelectorAll('article.Box-row'));
+   return JSON.stringify(rows.map(function(el){
+     var h2link = el.querySelector('h2 a');
+     var starLink = el.querySelector('a[href*="/stargazers"]');
+     return {
+       name: h2link?.innerText.trim().replace(/\\s+/g,' '),
+       url: h2link ? 'https://github.com' + h2link.getAttribute('href') : null,
+       stars_total: starLink?.innerText.trim()
+     };
+   }));
+ })()
+ """)
+ ```
+
+ ### Auth
+
+ Without a token: 60 req/hour core, 10 req/min search.
+ With `GITHUB_TOKEN`: 5,000 req/hour core, 30 req/min search.
+
+ ```python
+ import os
+
+ headers = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
+            "X-GitHub-Api-Version": "2022-11-28"}
+ ```
+
+ ### Gotchas
+
+ - A 404 raises `urllib.error.HTTPError` rather than returning a JSON error.
+ - Code search (`/search/code`) requires auth and returns 401 without a token.
+ - Trending star counts arrive as strings such as `"4,548"`: convert with `int(s.replace(',', ''))`.
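
The first and last gotchas can be sketched as two small helpers. The names `parse_star_count` and `get_json_or_none` are hypothetical, and `fetch` stands for whatever `http_get`-style callable is in use; this is one possible handling under those assumptions, not the harness's own code.

```python
import json
import urllib.error


def parse_star_count(text):
    """Convert a trending-page star string such as "4,548" to an int."""
    return int(text.strip().replace(",", ""))


def get_json_or_none(fetch, url):
    """Call fetch(url) and parse JSON; return None on a 404 instead of raising."""
    try:
        return json.loads(fetch(url))
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # missing repo or path: treat as "no data"
        raise  # rate limits (403/429) and other errors still propagate
```

Returning `None` only for 404 keeps rate-limit errors visible, which matters given the 60 req/hour unauthenticated limit.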
+
+ ---
+
+ ## 2. ArXiv
+
+ **NEVER use a browser for ArXiv.** Everything is reachable via the Atom API. `id_list`
+ supports batches of up to 200 IDs per call.
+
+ ### Batch fetch (~3× faster than parallel single calls)
+
+ ```python
+ import xml.etree.ElementTree as ET
+
+ NS = {'atom': 'http://www.w3.org/2005/Atom',
+       'arxiv': 'http://arxiv.org/schemas/atom'}
+
+ ids = ['1706.03762', '1810.04805', '2005.14165']
+ xml = http_get(
+     f"http://export.arxiv.org/api/query?id_list={','.join(ids)}"
+     f"&max_results={len(ids)}"
+ )
+ root = ET.fromstring(xml)
+ for e in root.findall('atom:entry', NS):
+     arxiv_id = e.find('atom:id', NS).text.split('/')[-1]
+     title = e.find('atom:title', NS).text.strip()
+     # Some entries have no atom:summary, so guard before reading .text
+     summary_el = e.find('atom:summary', NS)
+     abstract = summary_el.text.strip() if summary_el is not None else None
+     pdf_link = next((l.get('href') for l in e.findall('atom:link', NS)
+                      if l.get('title') == 'pdf'), None)
+ ```
+
+ Measured: a batch of 10 IDs took 1.91s; 10 parallel single calls took 6.34s.
+
+ ### Search by category
+
+ ```python
+ xml = http_get(
+     "http://export.arxiv.org/api/query"
+     "?search_query=ti:transformer+AND+cat:cs.LG"
+     "&max_results=10&sortBy=submittedDate&sortOrder=descending"
+ )
+ ```
+
+ Field prefixes: `ti:` title, `au:` author, `abs:` abstract, `cat:` category,
+ `all:` all fields. Boolean operators: `AND`/`OR`/`ANDNOT`.
+
+ ### URL construction
+
+ ```python
+ import re
+
+ arxiv_id = "1706.03762v7"
+ pdf_versioned = f"https://arxiv.org/pdf/{arxiv_id}"
+ # Strip the trailing version suffix (v7) to point at the latest version
+ pdf_latest = f"https://arxiv.org/pdf/{re.sub(r'v\d+$', '', arxiv_id)}"
+ ```
+
+ ### Gotchas
+
+ - Rate limit: wait 3s between requests when bulk crawling. Quick bursts of ~10
+   requests work without being blocked.
+ - `atom:id` is a full URL, `http://arxiv.org/abs/1706.03762v7`: always take
+   `.split('/')[-1]`.
+ - A batch `id_list` returns entries in unpredictable order: index by ID, not
+   by position.
+ - ~5% of papers have no `atom:summary`: guard with `if el is not None`.
+ - `max_results` is capped at 2000 per call. Paginate with the `start` offset plus a 3s sleep.
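
Several of these gotchas combine naturally into one parsing step: split the ID out of the URL, index by ID rather than position, and guard the missing summary. The sketch below assumes a hypothetical helper name, `index_entries`, and runs on a hand-written sample feed (titles and abstract are placeholders, not real API output).

```python
import re
import xml.etree.ElementTree as ET

NS = {"atom": "http://www.w3.org/2005/Atom"}


def index_entries(atom_xml):
    """Map bare arXiv ID (version suffix stripped) to title and abstract."""
    papers = {}
    for entry in ET.fromstring(atom_xml).findall("atom:entry", NS):
        versioned = entry.find("atom:id", NS).text.split("/")[-1]
        bare_id = re.sub(r"v\d+$", "", versioned)
        summary = entry.find("atom:summary", NS)
        papers[bare_id] = {
            # Titles can wrap across lines in the feed; collapse whitespace.
            "title": " ".join(entry.find("atom:title", NS).text.split()),
            "abstract": summary.text.strip() if summary is not None else None,
        }
    return papers


# Hand-written sample: entries arrive out of order and one lacks a summary.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/1810.04805v2</id>
    <title>Sample title
      split over two lines</title>
  </entry>
  <entry>
    <id>http://arxiv.org/abs/1706.03762v7</id>
    <title>Another sample title</title>
    <summary> Sample abstract text. </summary>
  </entry>
</feed>"""

papers = index_entries(SAMPLE)
```

Looking papers up as `papers["1706.03762"]` stays correct regardless of the order in which the batch returns them.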
+
+ ---
+
+ ## 3. Hacker News
+
+ Three sources (HTML scrape, Algolia API, Firebase API), none of which needs a browser:
+
+ | Goal | Approach | Latency |
+ |------|----------|---------|
+ | Front page (30 stories) | `http_get` + regex | ~170ms |
+ | Historical / keyword search | Algolia API | ~400ms |
+ | Full comment tree | Algolia items API | ~300ms |
+ | Specific item | Firebase API | ~200ms |
+ | 500 ranked IDs | Firebase topstories | ~200ms |
+
+ ### Front page scrape (fastest for real-time)
+
+ ```python
+ import re, html as htmllib
+
+ page = http_get("https://news.ycombinator.com")
+ story_ids = re.findall(r'<tr class="athing submission" id="(\d+)">', page)
+ titles_urls = re.findall(
+     r'class="titleline"[^>]*><a href="([^"]*)"[^>]*>(.*?)</a>', page
+ )
+ # Titles MUST go through html.unescape(): they contain &#x27;, &amp;, etc.
+ titles = [htmllib.unescape(title) for _url, title in titles_urls]
+ ```
+
+ ### Algolia (search + nested comments)
+
+ ```python
+ import json
+
+ # Keyword search
+ data = json.loads(http_get(
+     "https://hn.algolia.com/api/v1/search"
+     "?query=llm&tags=story&hitsPerPage=20"
+ ))
+
+ # Most recent
+ data = json.loads(http_get(
+     "https://hn.algolia.com/api/v1/search_by_date"
+     "?tags=story&hitsPerPage=20"
+ ))
+
+ # Full thread with nested tree
+ thread = json.loads(http_get(
+     "https://hn.algolia.com/api/v1/items/47806725"
+ ))
+ # thread['children'] = top-level comments, each with nested .children
+ ```
+
+ ### Official Firebase API
+
+ ```python
+ import json
+
+ top = json.loads(http_get("https://hacker-news.firebaseio.com/v0/topstories.json"))
+ item_id = top[0]   # first ranked story ID
+ username = "pg"    # example username
+ item = json.loads(http_get(f"https://hacker-news.firebaseio.com/v0/item/{item_id}.json"))
+ user = json.loads(http_get(f"https://hacker-news.firebaseio.com/v0/user/{username}.json"))
+ ```
+
+ ### Gotchas
+
+ - Titles contain HTML entities (`&#x27;`, `&amp;`): always apply `html.unescape()`.
+ - Job posts have no score or author, so a lookup such as `scores_by_id.get(sid)` returns `None`.
+ - Algolia comment fields use `comment_text`, NOT `text`.
+ - Fetching 500 Firebase items sequentially takes ~100s; Algolia with `tags=front_page` is
+   much faster for bulk.
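
A minimal sketch of the front-page parse with the entity gotcha applied follows. It assumes the row and title-line markup matched by the regexes above; `parse_front_page` and the sample HTML are hypothetical and hand-written, so the patterns may need adjusting if the live markup differs.

```python
import html
import re

ROW_RE = re.compile(r'<tr class="athing submission" id="(\d+)">')
TITLE_RE = re.compile(r'class="titleline"[^>]*><a href="([^"]*)"[^>]*>(.*?)</a>')


def parse_front_page(page):
    """Pair story IDs with their URL and title, unescaping HTML entities."""
    ids = ROW_RE.findall(page)
    links = TITLE_RE.findall(page)
    return [
        {"id": int(sid), "url": html.unescape(url), "title": html.unescape(title)}
        for sid, (url, title) in zip(ids, links)
    ]


# Hand-written sample row, not captured from the live site.
SAMPLE = (
    '<tr class="athing submission" id="101">'
    '<td><span class="titleline"><a href="https://example.com/?a=1&amp;b=2">'
    "Show HN: It&#x27;s a demo</a></span></td></tr>"
)
stories = parse_front_page(SAMPLE)
```

Unescaping the `href` as well as the title avoids passing `&amp;` through to later requests.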
+
+ ---
+
+ ## 4. Stack Overflow
+
+ Stack Exchange API v2.3, JSON, no browser. Hard rate limits:
+ - **300 req/day per IP** without a key.
+ - **10,000 req/day** with a key.
+
+ ### Top questions by tag
+
+ ```python
+ import json
+
+ data = json.loads(http_get(
+     "https://api.stackexchange.com/2.3/questions"
+     "?order=desc&sort=votes&tagged=python&site=stackoverflow"
+     "&pagesize=5&filter=withbody"
+ ))
+ print("Quota:", data['quota_remaining'])
+ ```
+
+ `filter=withbody` is **required** to include the body: without it, the field
+ is simply absent (no error, no warning).
+
+ ### Batch IDs (semicolon-delimited, up to 100)
+
+ ```python
+ data = json.loads(http_get(
+     "https://api.stackexchange.com/2.3/questions/231767;419163;394809"
+     "?site=stackoverflow&filter=withbody"
+ ))
+ ```
+
+ ### Decoding
+
+ - `title`: contains entities (`&quot;`, `&#39;`) → `html.unescape()`.
+ - `body`: full HTML → `HTMLParser` to get plain text.
+ - `display_name`, `tags`: plain text, no decoding needed.
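
The decoding rules above can be sketched with the standard library only. `_TextExtractor` and `body_to_text` are hypothetical names, and the `question` dict is a hand-written stand-in for one item of an API response.

```python
import html
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes and drop all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def body_to_text(body_html):
    """Reduce an HTML body to whitespace-normalized plain text."""
    parser = _TextExtractor()
    parser.feed(body_html)
    parser.close()
    return " ".join("".join(parser.parts).split())


# Hand-written sample item, shaped like an entry of data['items'].
question = {
    "title": "What does &quot;yield&quot; do?",
    "body": "<p>Use <code>a &lt; b</code></p>\n<p>Done.</p>",
}
title = html.unescape(question["title"])
text = body_to_text(question["body"])
```

This flattens code blocks into the surrounding text; a flow that needs the code separately would have to track `<pre>` and `<code>` tags in `handle_starttag`.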
+
+ ### Multi-site
+
+ `site=stackoverflow` can be replaced with `superuser`, `serverfault`,
+ `askubuntu`, `unix`, `datascience`, `math`. Same API, same quota pool.
+
+ ### Gotchas
+
+ - Quota is per IP and resets at midnight UTC. A run of 6 tests consumed about 27 requests of quota.
+ - Check `data.get('backoff')` and sleep for that many seconds if it returns an int.
+ - `pagesize` max is 100. `page=` is 1-indexed.
+ - Errors raise an `HTTPError` exception, so no JSON body comes back from `http_get`.
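
The `backoff` check can be sketched as a small helper called after every response. `respect_backoff` is a hypothetical name, and the injectable `sleep` argument is there only so the behavior can be exercised without actually waiting.

```python
import time


def respect_backoff(data, sleep=time.sleep):
    """Sleep for data['backoff'] seconds if the API asked for it.

    Returns the number of seconds waited (0 when no backoff was requested).
    """
    seconds = data.get("backoff")
    if isinstance(seconds, int) and seconds > 0:
        sleep(seconds)
        return seconds
    return 0
```

A typical call site would be `respect_backoff(data)` immediately after `json.loads(http_get(...))` and before the next request to the same method.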
+
+ ---
+
+ ## 5. PubMed
+
+ NCBI E-utilities REST. **NEVER use a browser.** Without an API key: 3 req/s. With a free key:
+ 10 req/s.
+
+ ### ESearch → ESummary pipeline (most common)
+
+ ```python
+ import json
+
+ # Step 1: search → PMIDs
+ search = json.loads(http_get(
+     "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
+     "?db=pubmed&term=deep+learning+radiology&retmax=10&retmode=json"
+ ))
+ pmids = search['esearchresult']['idlist']
+
+ # Step 2: batch metadata
+ summary = json.loads(http_get(
+     f"https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"
+     f"?db=pubmed&id={','.join(pmids)}&retmode=json"
+ ))
+ result = summary['result']
+ for uid in result['uids']:
+     art = result[uid]
+     title = art['title']
+     pubdate = art['pubdate']  # e.g. '2026 Apr 18'
+     authors = [a['name'] for a in art['authors']]  # abbreviated as 'Last I'
+     doi = {x['idtype']: x['value'] for x in art['articleids']}.get('doi')
+ ```
+
+ ### EFetch XML (full abstract + MeSH + author full names)
+
+ ```python
+ import xml.etree.ElementTree as ET
+
+ raw = http_get(
+     "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
+     "?db=pubmed&id=41999029&retmode=xml&rettype=abstract"
+ )
+ root = ET.fromstring(raw)
+ for art in root.findall('.//PubmedArticle'):
+     mc = art.find('MedlineCitation')
+     pmid = mc.find('PMID').text
+     article = mc.find('Article')
+     title = ''.join(article.find('ArticleTitle').itertext()).strip()
+     # The abstract may be structured (BACKGROUND/METHODS/RESULTS/CONCLUSION)
+     abstract = None  # stays None for articles without an abstract
+     abstract_el = article.find('Abstract')
+     if abstract_el is not None:
+         sections = []
+         for t in abstract_el.findall('AbstractText'):
+             label = t.get('Label', '')
+             text = ''.join(t.itertext()).strip()
+             sections.append(f"[{label}] {text}" if label else text)
+         abstract = ' '.join(sections)
+ ```
+
+ ### Bulk with WebEnv (>10k results)
+
+ ```python
+ import json
+ import time
+
+ search = json.loads(http_get(
+     "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
+     "?db=pubmed&term=CRISPR&retmax=0&retmode=json&usehistory=y"
+ ))
+ webenv = search['esearchresult']['webenv']
+ query_key = search['esearchresult']['querykey']
+
+ for start in range(0, 1000, 200):
+     raw = http_get(
+         f"https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
+         f"?db=pubmed&query_key={query_key}&WebEnv={webenv}"
+         f"&retstart={start}&retmax=200&retmode=xml&rettype=abstract"
+     )
+     time.sleep(0.34)  # stay under 3 req/s when no API key is used
+ ```
+
+ ### Gotchas
+
+ - `count` is a string, not an int: `int(search['esearchresult']['count'])`.
+ - EFetch `retmode` must be `xml`, NOT `json` (json returns plain MEDLINE text).
+ - `ArticleTitle` may contain embedded tags (`<i>`, `<sub>`): use `itertext()`.
+ - ~15% of articles have no abstract: guard with `if abstract_el is not None`.
+ - An author may be a `CollectiveName` (consortium) instead of `LastName/ForeName`:
+   check `CollectiveName` first.
+ - ELink `pubmed_pubmed` (related articles) is persistently broken: use the DOI as a
+   fallback.
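
The `CollectiveName` gotcha can be sketched as below. `author_names` is a hypothetical helper, and the sample XML is a hand-written fragment that mirrors only the element names mentioned above (`AuthorList`, `Author`, `LastName`, `ForeName`, `CollectiveName`), not a full EFetch record.

```python
import xml.etree.ElementTree as ET


def author_names(article):
    """Return display names, preferring CollectiveName when present."""
    names = []
    for author in article.findall(".//AuthorList/Author"):
        collective = author.findtext("CollectiveName")
        if collective:
            names.append(collective.strip())
            continue
        last = author.findtext("LastName", default="").strip()
        fore = author.findtext("ForeName", default="").strip()
        full = f"{fore} {last}".strip()
        if full:  # skip authors with neither field
            names.append(full)
    return names


# Hand-written fragment for illustration.
SAMPLE = """<Article>
  <AuthorList>
    <Author><LastName>Doe</LastName><ForeName>Jane</ForeName></Author>
    <Author><CollectiveName>Example Consortium</CollectiveName></Author>
    <Author><LastName>Roe</LastName></Author>
  </AuthorList>
</Article>"""
names = author_names(ET.fromstring(SAMPLE))
```

Checking the collective name first avoids emitting an empty string for consortium entries that have no `LastName`.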
+
+ ---
+
+ ## 6. OpenAlex
+
+ 260M+ works, 90M+ authors, JSON API, no auth.
+
+ **Always include `mailto=` to use the polite pool** (10 req/s, more reliable).
+
+ ### Search papers
+
+ ```python
+ import json
+
+ data = json.loads(http_get(
+     "https://api.openalex.org/works"
+     "?search=transformer+attention"
+     "&per-page=5&sort=cited_by_count:desc"
+     "&select=id,doi,display_name,publication_year,cited_by_count,open_access"
+     "&mailto=you@example.com"
+ ))
+ for w in data["results"]:
+     bare_id = w["id"].split("/")[-1]  # W2626778328
+     print(bare_id, w["publication_year"], w["cited_by_count"])
+ ```
+
+ ### By DOI
+
+ ```python
+ w = json.loads(http_get(
+     "https://api.openalex.org/works/https://doi.org/10.1038/nature14539"
+     "?mailto=you@example.com"
+ ))
+ ```
+
+ ### Reconstruct the abstract (inverted index)
+
+ ```python
+ w = json.loads(http_get(
+     "https://api.openalex.org/works/W2626778328"
+     "?select=id,abstract_inverted_index&mailto=you@example.com"
+ ))
+ aii = w.get("abstract_inverted_index") or {}
+ # Each word maps to its positions; sort by position to rebuild the text
+ words_pos = [(pos, word) for word, positions in aii.items() for pos in positions]
+ abstract = " ".join(word for _, word in sorted(words_pos))
+ ```
+
+ ### Citation traversal
+
+ ```python
+ paper_id = "W2626778328"  # bare OpenAlex work ID
+
+ # Forward citations
+ citing = json.loads(http_get(
+     f"https://api.openalex.org/works?filter=cites:{paper_id}"
+     "&per-page=5&sort=cited_by_count:desc&mailto=you@example.com"
+ ))
+
+ # Backward references
+ paper = json.loads(http_get(
+     f"https://api.openalex.org/works/{paper_id}"
+     "?select=referenced_works&mailto=you@example.com"
+ ))
+ refs = paper.get("referenced_works", [])
+ ```
+
+ ### Cursor pagination (bulk >10k)
+
+ ```python
+ import json
+ import urllib.parse
+
+ flt = "cites:W2626778328"  # example filter
+ cursor = "*"
+ while True:
+     encoded = urllib.parse.quote(cursor, safe="")
+     data = json.loads(http_get(
+         f"https://api.openalex.org/works?filter={flt}"
+         f"&per-page=200&cursor={encoded}&mailto=you@example.com"
+     ))
+     if not data.get("results"):
+         break
+     # process data["results"] here
+     cursor = data["meta"].get("next_cursor")
+     if not cursor:
+         break
+ ```
+
+ ### Filter syntax
+
+ `filter=author.id:A5108093963,publication_year:>2020,open_access.is_oa:true`
+ - AND: comma
+ - OR: pipe `2022|2023`
+ - Negation: `!2020`
+ - Range: `>1000`, `<2010`, `100-500`
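
A small sketch of assembling a filter string from the rules above: comma for AND, pipe for OR. `build_filter` is a hypothetical helper, and the choice to percent-encode `|` and `>` while leaving `:` and `,` literal is an assumption about what is safest in a query string, not something the API documentation is being quoted on.

```python
from urllib.parse import quote


def build_filter(conditions):
    """Join conditions with commas (AND); list values become pipe-separated OR."""
    parts = []
    for key, value in conditions.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            value = "|".join(str(v) for v in value)
        parts.append(f"{key}:{value}")
    return ",".join(parts)


flt = build_filter({
    "publication_year": [2022, 2023],   # OR
    "open_access.is_oa": True,
    "cited_by_count": ">1000",          # range
})
url = (
    "https://api.openalex.org/works?filter="
    + quote(flt, safe=":,")
    + "&per-page=5&mailto=you@example.com"
)
```

Negation fits the same shape by passing the value as a string such as `"!2020"`.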
+
+ ### Entity ID prefixes
+
+ `W` Work, `A` Author, `I` Institution, `S` Source, `C` Concept, `T` Topic,
+ `F` Funder, `P` Publisher.
+
+ ### Gotchas
+
+ - The `id` field is a full URL, `https://openalex.org/W2626778328`: always use
+   `.split("/")[-1]` to get the bare ID.
+ - DOI lookup uses the full URL: `/works/https://doi.org/...`, NOT `/works/10.1038/...`.
+ - Page-based pagination hard-stops at 10,000 results. Use `cursor=*` to go beyond that.
+ - `cursor=*` must be URL-encoded: `urllib.parse.quote(cursor, safe="")`.
+ - `group_by` and `page` are incompatible.
+ - `abstract_inverted_index` can be `null` for closed-access papers.
+ - `select=` reduces the payload by ~90%: use it in bulk harvests.
+
+ ---
+
+ ## 7. SEC EDGAR
+
+ Public data, no auth. **`www.sec.gov` requires a custom User-Agent** or it returns 403.
+
+ ```python
+ UA = {"User-Agent": "swl-ses research@example.com"}
+ # Required format: "CompanyName contact@email.com"
+ ```
+
+ ### Ticker → CIK
+
+ ```python
+ import json
+
+ tickers = json.loads(http_get(
+     "https://www.sec.gov/files/company_tickers.json", headers=UA
+ ))
+ aapl = next(v for v in tickers.values() if v['ticker'] == 'AAPL')
+ cik = str(aapl['cik_str']).zfill(10)  # "0000320193"
+ ```
+
+ ### Submissions (~1000 most recent filings)
+
+ ```python
+ data = json.loads(http_get(
+     f"https://data.sec.gov/submissions/CIK{cik}.json", headers=UA
+ ))
+ recent = data['filings']['recent']
+ # The parallel arrays are zipped into (form, date, accession, document) tuples
+ periodic_filings = [
+     (f, d, a, doc)
+     for f, d, a, doc in zip(
+         recent['form'], recent['filingDate'],
+         recent['accessionNumber'], recent['primaryDocument']
+     )
+     if f in ('10-K', '10-Q')
+ ]
+ ```
+
+ ### XBRL: one concept over time
+
+ ```python
+ data = json.loads(http_get(
+     f"https://data.sec.gov/api/xbrl/companyconcept/CIK{cik}"
+     "/us-gaap/Assets.json", headers=UA
+ ))
+ entries = data['units']['USD']
+ # Deduplicate: restatements produce multiple entries per period
+ seen = {}
+ for e in entries:
+     if e.get('form') == '10-K' and e.get('fp') == 'FY':
+         end = e['end']
+         # Keep the most recently filed entry for each period end date
+         if end not in seen or e['filed'] > seen[end]['filed']:
+             seen[end] = e
+ ```
+
+ ### Cross-company (XBRL frames)
+
+ ```python
+ data = json.loads(http_get(
+     "https://data.sec.gov/api/xbrl/frames/us-gaap"
+     "/RevenueFromContractWithCustomerExcludingAssessedTax/USD/CY2024.json",
+     headers=UA
+ ))
+ # data['data'] = list of all companies reporting that concept for the period
+ ```
+
+ ### Full-text search
+
+ ```python
+ data = json.loads(http_get(
+     "https://efts.sec.gov/LATEST/search-index"
+     "?q=%22climate+risk%22&forms=10-K&dateRange=custom&startdt=2024-01-01",
+     headers=UA
+ ))
+ # efts.sec.gov also accepts the default Mozilla/5.0 User-Agent
+ ```
+
+ ### Rate limit
+
+ **10 req/s** documented. Keep `max_workers ≤ 8` for `ThreadPoolExecutor`.
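
Capping `max_workers` bounds concurrency but not request rate, so a shared limiter is one way to stay under 10 req/s. The `RateLimiter` class below is a hypothetical sketch (not part of the harness); `clock` and `sleep` are injectable only so the spacing logic can be checked without waiting.

```python
import threading
import time


class RateLimiter:
    """Space calls evenly so that at most `rate` happen per second across threads."""

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rate
        self.clock = clock
        self.sleep = sleep
        self.lock = threading.Lock()
        self.next_slot = clock()

    def wait(self):
        """Block until this caller's slot arrives; return the seconds slept."""
        with self.lock:
            now = self.clock()
            slot = max(self.next_slot, now)
            self.next_slot = slot + self.interval  # reserve the following slot
        delay = slot - now
        if delay > 0:
            self.sleep(delay)  # sleep outside the lock so other threads can queue
        return max(delay, 0.0)
```

Each worker in a `ThreadPoolExecutor(max_workers=8)` would call `limiter.wait()` on a shared `RateLimiter(10)` right before `http_get(url, headers=UA)`.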
+
+ ### Gotchas
+
+ - `www.sec.gov` with the default `Mozilla/5.0` returns 403. ALWAYS pass `headers=UA`.
+ - `data.sec.gov` and `efts.sec.gov` are more permissive, but send the UA anyway as a matter of
+   policy.
+ - XBRL contains duplicates per period: dedup by `end`, keeping the latest `filed`.
+ - The revenue concept varies: `RevenueFromContractWithCustomerExcludingAssessedTax`
+   (post-2018) vs `SalesRevenueNet` (older).
+ - `fp` for annual figures is `'FY'`; quarterly figures also appear in 10-Ks, so filter on
+   both `form == '10-K'` AND `fp == 'FY'`.
+ - `companyfacts.json` is ~5MB: for a single metric use `companyconcept`.
+ - CIK format: `cik_str` is an int in company_tickers; the APIs require `str(cik).zfill(10)`.
+
+ ---
+
+ ## Quick decision table
+
+ | Domain | Browser needed | Preferred endpoint |
+ |--------|----------------|--------------------|
+ | GitHub | Only for /trending | api.github.com (60/h without a token) |
+ | ArXiv | NEVER | export.arxiv.org/api/query (Atom) |
+ | Hacker News | NEVER | hn.algolia.com / hacker-news.firebaseio.com |
+ | Stack Overflow | NEVER | api.stackexchange.com (300/day without a key) |
+ | PubMed | NEVER | eutils.ncbi.nlm.nih.gov (3/s without a key) |
+ | OpenAlex | NEVER | api.openalex.org (with mailto) |
+ | SEC EDGAR | NEVER | data.sec.gov + mandatory UA |
+
+ ## Canonical pattern
+
+ For the 7 domains, the pattern is:
+
+ 1. Skip `goto_url`, `wait_for_load`, `capture_screenshot`.
+ 2. Build the API endpoint URL with its query params.
+ 3. `http_get` (or `WebFetch` in SWL if the agent has no Python CLI).
+ 4. Parse JSON / XML as appropriate.
+ 5. Check the quota / rate-limit field in the response if the API provides one.
+
+ If the domain is NOT on this list, default to `agent-browser` + `browser-interaction-patterns`.
+
+ ---
+
+ ## Relationship to other skills
+
+ - **`agent-browser`**: when the domain is NOT covered here or requires JS/login.
+ - **`browser-interaction-patterns`**: low-level CDP patterns for when the UI
+   has to be automated.
+ - **`web-fetcher-routing`**: orchestrates WebFetch vs agent-browser by URL type.
+ - **`swl-markitdown`**: for academic PDFs linked from papers (arxiv, pubmed).
+
+ <!-- Adapted from browser-use/browser-harness under the MIT License (browser-use, 2026). -->