pubchem-cli 0.1.2 → 0.1.4

package/README.md CHANGED
@@ -18,15 +18,30 @@ Global install:
18
18
 
19
19
  ```bash
20
20
  npm install -g pubchem-cli
21
+ pubchem --help
21
22
  ```
22
23
 
23
- For local development in this repo:
24
+ If you are hacking on this repository locally:
24
25
 
25
26
  ```bash
26
27
  go build -o ./pubchem ./cmd/pubchem
27
28
  ./pubchem --help
28
29
  ```
29
30
 
31
+ ## Mental Model
32
+
33
+ Use the CLI in this order:
34
+
35
+ 1. Resolve an identifier to a record when you start from a name, CAS RN, InChIKey, or source ID.
36
+ 2. Use exact identity commands when you already have the CID, SID, or AID.
37
+ 3. Use search commands when you need candidate matches.
38
+ 4. Use `view` when you need headings and sections from PUG View.
39
+ 5. Use `raw` only when there is no higher-level verb yet.
40
+ 6. Use `export` when you need files.
41
+ 7. Use `batch` when you need many records.
42
+
43
+ This CLI does not guess silently. That is intentional.
44
+
30
45
  ## Usage
31
46
 
32
47
  ```bash
@@ -38,8 +53,8 @@ Root flags:
38
53
  - `--json`: Output JSON to stdout.
39
54
  - `--plain`: Output stable plain text to stdout.
40
55
  - `--results-only`: In JSON mode, emit only the primary result.
41
- - `--select`: In JSON mode, select comma-separated fields with dot-path support.
42
- - `--force`: Skip confirmations for future destructive commands.
56
+ - `--select`: In JSON mode, select comma-separated fields with exact dot-path support.
57
+ - `--force`: Skip confirmations for future destructive commands and file overwrites.
43
58
  - `--no-input`: Never prompt; fail instead.
44
59
  - `--enable-commands`: Comma-separated list of enabled top-level commands.
45
60
  - `--version`: Print version and exit.
@@ -62,76 +77,472 @@ Top-level command families:
62
77
  - `agent`
63
78
  - `completion`
64
79
 
65
- ## Examples
80
+ ## What To Use
81
+
82
+ | Task | Command |
83
+ |---|---|
84
+ | Resolve a drug name, CAS RN, InChI, or InChIKey | `pubchem resolve compound ...` |
85
+ | Exact compound lookup by CID | `pubchem identity compound ...` or `pubchem compound get ...` |
86
+ | Similarity search around a known structure | `pubchem compound similar <cid> --query-type cid` |
87
+ | Substructure or superstructure search | `pubchem compound substructure <cid> --query-type cid` or `pubchem compound superstructure <cid> --query-type cid` |
88
+ | Resolve a substance by SID or depositor source ID | `pubchem resolve substance ...` or `pubchem substance sourceid ...` |
89
+ | Search assays by gene, protein, name, source, or keyword | `pubchem assay search ...` |
90
+ | PubMed, patents, and xrefs | `pubchem refs ...` |
91
+ | Browse nested PUG View sections | `pubchem view ...` |
92
+ | Fetch a raw endpoint that has no higher-level verb | `pubchem raw fetch ...` |
93
+ | Export files or tabular data | `pubchem export ...` |
94
+ | Repeat a workflow across many IDs | `pubchem batch ...` |
95
+ | Discover exact flags and JSON shapes | `pubchem schema ...` |
96
+ | Generate shell completion | `pubchem completion ...` |
97
+ | Inspect stable exit codes | `pubchem agent exit-codes` |
98
+
99
+ ## Agent Playbook
66
100
 
67
- Version:
101
+ If you are an agent, use this sequence:
102
+
103
+ 1. Start with `pubchem schema <command>` when you need the exact flag surface.
104
+ 2. Use `pubchem agent exit-codes` when you need a stable automation contract.
105
+ 3. Use `--json` by default.
106
+ 4. Add `--results-only` when the downstream consumer wants the primary payload only.
107
+ 5. Add `--select` only with exact field paths that already exist in the JSON output.
108
+ 6. Use `--enable-commands` when you need a restricted command allowlist.
109
+ 7. Prefer CID-based structure workflows over name-based structure workflows.
110
+
111
+ ## Resolve vs Identity
112
+
113
+ Use `resolve` when the input is a name, registry number, InChIKey, or other identifier that may need normalization.
114
+
115
+ Use `identity` when you already have the exact record ID and want the explicit no-guess form.
116
+
117
+ Examples:
68
118
 
69
119
  ```bash
70
- $ pubchem --version
71
- pubchem dev
120
+ pubchem identity compound 2244
121
+ pubchem identity substance 92297672
122
+ pubchem identity assay 541
72
123
  ```
73
124
 
74
- Compound search:
125
+ ## Compound Workflows
126
+
127
+ Resolve a drug name to a CID:
75
128
 
76
129
  ```bash
77
- $ pubchem compound search aspirin --max 1
78
- {
79
- "mode": "name",
80
- "query": "aspirin",
81
- "totalFound": 1,
82
- "returned": 1,
83
- "pageSize": 1,
84
- "results": [
85
- {
86
- "cid": 2244,
87
- "properties": {
88
- "Charge": 0,
89
- "Complexity": 212,
90
- "ConnectivitySMILES": "CC(=O)OC1=CC=CC=C1C(=O)O",
91
- "HBondAcceptorCount": 4,
92
- "HBondDonorCount": 1,
93
- "HeavyAtomCount": 13,
94
- "IUPACName": "2-acetyloxybenzoic acid",
95
- "InChIKey": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
96
- "MolecularFormula": "C9H8O4",
97
- "MolecularWeight": "180.16",
98
- "RotatableBondCount": 3,
99
- "SMILES": "CC(=O)OC1=CC=CC=C1C(=O)O",
100
- "TPSA": 63.6,
101
- "XLogP": 1.2
102
- }
130
+ pubchem resolve compound imatinib --json --results-only --max 1
131
+ ```
132
+
133
+ Sample output:
134
+
135
+ ```json
136
+ [
137
+ {
138
+ "cid": 5291,
139
+ "properties": {
140
+ "InChIKey": "KTUFNOKKBVMGRW-UHFFFAOYSA-N",
141
+ "MolecularFormula": "C29H31N7O",
142
+ "MolecularWeight": "493.6"
103
143
  }
104
- ]
105
- }
144
+ }
145
+ ]
146
+ ```
147
+
148
+ When you already know the CID, use exact identity:
149
+
150
+ ```bash
151
+ pubchem identity compound 2244 --json
152
+ ```
153
+
154
+ Get compound details and common pharma fields:
155
+
156
+ ```bash
157
+ pubchem compound get 2244 --synonyms --description --classification --drug-likeness --json
158
+ ```
159
+
160
+ Pull a compact property panel:
161
+
162
+ ```bash
163
+ pubchem compound properties 2244 --json --results-only --select=cid,properties.MolecularFormula
164
+ ```
165
+
166
+ Sample output:
167
+
168
+ ```json
169
+ [
170
+ {
171
+ "cid": 2244,
172
+ "properties.MolecularFormula": "C9H8O4"
173
+ }
174
+ ]
175
+ ```
176
+
177
+ Search compounds by name:
178
+
179
+ ```bash
180
+ pubchem compound search aspirin --max 1
181
+ ```
182
+
183
+ Search compounds by formula:
184
+
185
+ ```bash
186
+ pubchem compound search C9H8O4 --mode formula --allow-other-elements --max 10
187
+ ```
188
+
189
+ Run SAR-style similarity search from a CID:
190
+
191
+ ```bash
192
+ pubchem compound similar 5291 --query-type cid --threshold 90 --max 20
193
+ ```
194
+
195
+ Run structure searches from a CID:
196
+
197
+ ```bash
198
+ pubchem compound substructure 5291 --query-type cid --max 20
199
+ pubchem compound superstructure 5291 --query-type cid --max 20
200
+ ```
201
+
202
+ Important rule:
203
+
204
+ - Do not feed a plain drug name directly into `compound similar`, `compound substructure`, or `compound superstructure` and expect the CLI to guess.
205
+ - If you start from a name, run `pubchem resolve compound <name>` first, then feed the resulting CID into the structure search.
206
+ - This is the deterministic, agent-safe path.
207
+
208
+ Name-to-SAR recipe:
209
+
210
+ ```bash
211
+ pubchem resolve compound imatinib --json --results-only --select=cid --max 1
212
+ pubchem compound similar <cid> --query-type cid --max 20
213
+ ```
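
The two-step recipe above can be chained in a script. This is a minimal sketch, not part of the CLI: `first_cid` and `name_to_similar` are hypothetical helpers, and the sketch assumes that `--select=cid` output keeps the array-of-objects shape shown in the sample outputs of this README. It uses only `grep` and `head`, so no JSON tool is required. The `return 3` simply mirrors the "not found" exit code listed under Exit Codes.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of pubchem-cli): print the first "cid" value
# found in JSON read from stdin. Assumes output shaped like [{"cid": 5291}].
first_cid() {
  grep -o '"cid"[[:space:]]*:[[:space:]]*[0-9][0-9]*' | head -n 1 | grep -o '[0-9][0-9]*$'
}

# Hypothetical helper: resolve a name to a CID, then run the similarity
# search on that CID. Fails explicitly when no CID comes back.
name_to_similar() {
  local name="$1" cid
  cid="$(pubchem resolve compound "$name" --json --results-only --select=cid --max 1 | first_cid)"
  if [ -z "$cid" ]; then
    echo "no CID resolved for: $name" >&2
    return 3
  fi
  pubchem compound similar "$cid" --query-type cid --max 20
}
```

With the CLI installed, `name_to_similar imatinib` would run both steps in order and stop before the structure search if the name does not resolve.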
214
+
215
+ Structure files and metadata:
216
+
217
+ ```bash
218
+ pubchem compound structure 2244 --record-type 3d --format json
219
+ pubchem compound structure 2244 --record-type 3d --format sdf --out aspirin-3d.sdf
220
+ pubchem compound structure 2244 --record-type 2d --format mol --out aspirin-2d.mol
221
+ ```
222
+
223
+ Compound images:
224
+
225
+ ```bash
226
+ pubchem compound image 2244 --inline --json
227
+ pubchem compound image 2244 --out aspirin.png --force
228
+ ```
229
+
230
+ Compound xrefs and safety:
231
+
232
+ ```bash
233
+ pubchem compound xref 2244
234
+ pubchem compound safety 2244
235
+ ```
236
+
237
+ ## Substance Workflows
238
+
239
+ Resolve a substance by SID:
240
+
241
+ ```bash
242
+ pubchem resolve substance 92297672 --json
243
+ ```
244
+
245
+ Search substances by name:
246
+
247
+ ```bash
248
+ pubchem substance search aspirin --max 20
249
+ ```
250
+
251
+ Search substances by depositor source and source ID:
252
+
253
+ ```bash
254
+ pubchem substance sourceid ChemIDplus 0002153982 --max 20
255
+ ```
256
+
257
+ Search substances by registry xref:
258
+
259
+ ```bash
260
+ pubchem substance search D41527A7-A9EB-472D-A7FC-312821130549 --mode xref --xref-type RegistryID
261
+ ```
262
+
263
+ Map substance records to compounds:
264
+
265
+ ```bash
266
+ pubchem substance cids 92297672
267
+ ```
268
+
269
+ Fetch substance xrefs for registry normalization:
270
+
271
+ ```bash
272
+ pubchem substance xref 92297672 --types RegistryID,DBURL,SBURL
273
+ ```
274
+
275
+ Batch substance workflows:
276
+
277
+ ```bash
278
+ pubchem batch substance cids 92297672 135052148 --progress
279
+ pubchem batch substance xref 92297672 135052148 --types RegistryID,DBURL,SBURL --progress
280
+ ```
281
+
282
+ ## Assay Workflows
283
+
284
+ Resolve an assay by AID or by target/name:
285
+
286
+ ```bash
287
+ pubchem resolve assay 541 --json
288
+ pubchem resolve assay EGFR --mode target --target-type genesymbol --json
289
+ ```
290
+
291
+ Search assays:
292
+
293
+ ```bash
294
+ pubchem assay search EGFR --mode target --target-type genesymbol --max 20
295
+ pubchem assay search viability --mode name --max 20
296
+ pubchem assay search ncgc --mode source --max 20
297
+ pubchem assay search kinase --mode keyword --max 20
298
+ ```
299
+
300
+ Fetch assay details and concise result tables:
301
+
302
+ ```bash
303
+ pubchem assay get 541
304
+ pubchem assay results 541 --outcome active --max 20
305
+ ```
306
+
307
+ Batch assay workflows:
308
+
309
+ ```bash
310
+ pubchem batch assay get 541 542 --progress
311
+ pubchem batch assay results 541 542 --outcome active --progress
312
+ ```
313
+
314
+ ## References, Literature, and Patents
315
+
316
+ Search PubMed:
317
+
318
+ ```bash
319
+ pubchem refs search aspirin --max 20
320
+ ```
321
+
322
+ Fetch PubMed metadata by PMID:
323
+
324
+ ```bash
325
+ pubchem refs get 22385 --json --results-only --select=url
326
+ ```
327
+
328
+ Sample output:
329
+
330
+ ```json
331
+ [
332
+ {
333
+ "url": "https://pubmed.ncbi.nlm.nih.gov/22385/"
334
+ }
335
+ ]
336
+ ```
337
+
338
+ Pull compound-linked literature:
339
+
340
+ ```bash
341
+ pubchem refs literature compound 2244 --max 20
342
+ ```
343
+
344
+ Pull compound-linked patents:
345
+
346
+ ```bash
347
+ pubchem refs patents compound 2244 --max 20
348
+ ```
349
+
350
+ Pull compound and substance xrefs through the reference surface:
351
+
352
+ ```bash
353
+ pubchem refs external compound 2244
354
+ pubchem refs external substance 92297672
355
+ ```
356
+
357
+ When you need a bibliography or patent landscape, prefer `refs` over raw PubMed scraping. It already normalizes the citation records.
358
+
359
+ ## PUG View Browsing
360
+
361
+ Use `view` when you want headings and sections, not just raw IDs.
362
+
363
+ Get or browse a compound record:
364
+
365
+ ```bash
366
+ pubchem view get compound 2244
367
+ pubchem view browse compound 2244
106
368
  ```
107
369
 
108
- 3D structure metadata:
370
+ Search inside a compound record:
109
371
 
110
372
  ```bash
111
- $ pubchem compound structure 2244 --record-type 3d --format json
373
+ pubchem view search compound 2244 Safety --json --max 1
374
+ ```
375
+
376
+ Sample output:
377
+
378
+ ```json
112
379
  {
113
- "recordType": "3d",
114
- "downloadUrl": "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244/SDF?record_type=3d",
380
+ "entityType": "compound",
381
+ "identifier": "2244",
382
+ "query": "Safety",
383
+ "totalFound": 20,
384
+ "pageSize": 1,
385
+ "truncated": true,
115
386
  "results": [
116
387
  {
117
- "cid": 2244,
118
- "structure": {
119
- "title": "Aspirin",
120
- "molecularFormula": "C9H8O4",
121
- "molecularWeight": 180.16,
122
- "inchiKey": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
123
- "has3d": true
124
- }
388
+ "path": "Chemical and Physical Properties",
389
+ "tocHeading": "Chemical and Physical Properties",
390
+ "description": "Various chemical and physical properties that are experimentally determined for this compound."
125
391
  }
126
392
  ]
127
393
  }
128
394
  ```
129
395
 
130
- For the full command tree and machine-readable schema, use:
396
+ Browse specific sections:
131
397
 
132
398
  ```bash
133
- pubchem --help
134
- pubchem schema
399
+ pubchem view section compound 2244 --heading "Safety and Hazards"
400
+ pubchem view section compound 2244 --heading "Names and Identifiers"
401
+ ```
402
+
403
+ The same patterns work for substances and assays.
404
+
405
+ ## Raw Escape Hatch
406
+
407
+ Use `raw fetch` only when the CLI does not yet have a dedicated verb.
408
+
409
+ ```bash
410
+ pubchem raw fetch pug /compound/cid/2244/JSON
411
+ pubchem raw fetch view /data/compound/2244/JSON
412
+ pubchem raw fetch pubmed '/esummary.fcgi?db=pubmed&id=22385&retmode=json'
413
+ ```
414
+
415
+ You can also pass an allowed full URL, but only for PubChem and NCBI hosts.
416
+
417
+ ## Exports
418
+
419
+ Export compound structures:
420
+
421
+ ```bash
422
+ pubchem export compound structure 2244 --format smiles
423
+ pubchem export compound structure 2244 --record-type 3d --format sdf --out aspirin-3d.sdf
424
+ pubchem export compound structure 2244 --record-type 2d --format mol --out aspirin-2d.mol
425
+ ```
426
+
427
+ Export compound properties:
428
+
429
+ ```bash
430
+ pubchem export compound properties 2244 5291 --properties MolecularFormula,MolecularWeight --format tsv --out compounds.tsv
431
+ ```
432
+
433
+ Export assay results:
434
+
435
+ ```bash
436
+ pubchem export assay results 541 --outcome active --format csv --out assay-results.csv
437
+ ```
438
+
439
+ ## Batch Automation
440
+
441
+ Use `batch` when you have multiple IDs and want one command to do the same work repeatedly.
442
+
443
+ ```bash
444
+ pubchem batch compound get 2244 5291 --progress
445
+ pubchem batch compound properties 2244 5291 --properties MolecularFormula,MolecularWeight --progress
446
+ pubchem batch compound bioactivity 2244 5291 --outcome active --progress
447
+ pubchem batch compound xref 2244 5291 --progress
448
+ ```
449
+
450
+ Progress goes to stderr so stdout stays machine-readable.
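
Because progress is written to stderr, the two streams can be redirected independently. This is a minimal sketch that assumes that behavior; `run_split` is a hypothetical wrapper rather than a CLI feature, and the file names in the usage note are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of pubchem-cli): run a command, saving
# stdout (machine-readable results) and stderr (progress and errors)
# to separate files. The wrapped command's exit status is passed through.
# Usage: run_split RESULTS_FILE LOG_FILE command [args...]
run_split() {
  local results="$1" log="$2"
  shift 2
  "$@" >"$results" 2>"$log"
}
```

For example, `run_split results.json progress.log pubchem batch compound get 2244 5291 --json --progress` should leave only the JSON payload in `results.json`, assuming `--json` combines with `--progress` as the root flags suggest.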
451
+
452
+ ## Selection and JSON Shape
453
+
454
+ `--select` is exact-path only. There is no fuzzy matching.
455
+
456
+ Good examples:
457
+
458
+ ```bash
459
+ pubchem compound search aspirin --json --results-only --select=cid,properties.MolecularFormula --max 1
460
+ pubchem refs get 22385 --json --results-only --select=url
461
+ pubchem substance search aspirin --json --results-only --select=sid --max 1
462
+ ```
463
+
464
+ If you need a nested field, use the exact path that appears in the JSON output.
465
+
466
+ ## Exit Codes
467
+
468
+ Use `pubchem agent exit-codes` for automation.
469
+
470
+ The current stable codes are:
471
+
472
+ - `0` success
473
+ - `1` generic error
474
+ - `2` usage error
475
+ - `3` not found
476
+ - `4` timeout
477
+ - `5` command disabled by `--enable-commands`
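
A script can branch on these codes instead of parsing error text. This is a minimal sketch based only on the list above: `describe_exit` and `retry_on_timeout` are hypothetical helpers, and retrying only on code `4` with a one-second pause is an illustrative policy, not something the CLI prescribes.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of pubchem-cli): map an exit status to the
# label listed above. Any other status is reported as unknown.
describe_exit() {
  case "$1" in
    0) echo "success" ;;
    1) echo "generic error" ;;
    2) echo "usage error" ;;
    3) echo "not found" ;;
    4) echo "timeout" ;;
    5) echo "command disabled" ;;
    *) echo "unknown ($1)" ;;
  esac
}

# Hypothetical helper: retry a command only when it exits with 4 (timeout).
# Every other status, including success, is returned immediately.
# Usage: retry_on_timeout MAX_ATTEMPTS command [args...]
retry_on_timeout() {
  local max="$1" attempt=1 rc
  shift
  while :; do
    rc=0
    "$@" || rc=$?
    if [ "$rc" -ne 4 ] || [ "$attempt" -ge "$max" ]; then
      return "$rc"
    fi
    attempt=$((attempt + 1))
    sleep 1
  done
}
```

A call such as `retry_on_timeout 3 pubchem compound similar 5291 --query-type cid --max 20` would then retry upstream timeouts up to three times while still surfacing usage errors and "not found" results on the first attempt.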
478
+
479
+ ## Shell Completion
480
+
481
+ ```bash
482
+ pubchem completion bash
483
+ pubchem completion zsh
484
+ pubchem completion fish
485
+ pubchem completion powershell
486
+ ```
487
+
488
+ ## Common Limits and Good Practice
489
+
490
+ - Prefer CID-based structure workflows whenever you can.
491
+ - If you start from a name, resolve first and reuse the CID.
492
+ - Broad SMILES similarity and substructure searches can still fail or time out upstream at PubChem.
493
+ - The CLI now reports those failures cleanly instead of hiding them behind parser noise.
494
+ - `raw fetch` is intentionally allowlisted; it is an escape hatch, not a free-for-all.
495
+ - Use `schema` instead of guessing flags.
496
+
497
+ ## Sample Command Set For Agents
498
+
499
+ If you want a small, practical agent allowlist:
500
+
501
+ ```bash
502
+ pubchem --enable-commands compound,substance,assay,refs,view,raw,resolve,identity,batch,export,schema,agent,completion compound search aspirin --max 1
503
+ ```
504
+
505
+ That style keeps the command surface predictable while still covering most PubChem work.
506
+
507
+ ## More Examples
508
+
509
+ Find a structure, then inspect it, then export a file:
510
+
511
+ ```bash
512
+ pubchem resolve compound imatinib --json --results-only --max 1
513
+ pubchem compound get 5291 --synonyms --classification --json
514
+ pubchem export compound structure 5291 --record-type 3d --format sdf --out imatinib-3d.sdf
515
+ ```
516
+
517
+ Trace a compound into the literature:
518
+
519
+ ```bash
520
+ pubchem refs literature compound 5291 --max 20
521
+ pubchem refs patents compound 5291 --max 20
522
+ ```
523
+
524
+ Move from an assay target to a hit list:
525
+
526
+ ```bash
527
+ pubchem assay search EGFR --mode target --target-type genesymbol --max 20
528
+ pubchem assay results 3364 --outcome active --max 20
529
+ ```
530
+
531
+ Inspect a PubChem record tree and then pull only one section:
532
+
533
+ ```bash
534
+ pubchem view search compound 2244 Hazard --max 5
535
+ pubchem view section compound 2244 --heading "Safety and Hazards"
536
+ ```
537
+
538
+ ## For New Users
539
+
540
+ If you only remember three commands, remember these:
541
+
542
+ ```bash
543
+ pubchem resolve compound aspirin
544
+ pubchem compound get 2244
545
+ pubchem view search compound 2244 Safety
135
546
  ```
136
547
 
137
- If you want the exact command and flag surface for automation, prefer `pubchem schema <command>` and `pubchem --help` over guessing from memory.
548
+ Everything else in the CLI builds from those patterns.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "pubchem-cli",
3
- "version": "0.1.2",
3
+ "version": "0.1.4",
4
4
  "description": "PubChem in your terminal.",
5
5
  "repository": {
6
6
  "type": "git",
Binary file
Binary file
Binary file
Binary file