pubchem-cli 0.1.3 → 0.1.7

package/README.md CHANGED
@@ -6,6 +6,8 @@ PubChem in your terminal.
 
  The npm package ships prebuilt binaries for Linux, macOS, and Windows on `amd64` and `arm64`, so `npx pubchem-cli` works directly on common platforms.
 
+ npm package: [pubchem-cli on npm](https://www.npmjs.com/package/pubchem-cli)
+
  ## Install
 
  Run without installing:
@@ -18,15 +20,30 @@ Global install:
 
  ```bash
  npm install -g pubchem-cli
+ pubchem --help
  ```
 
- For local development in this repo:
+ If you are hacking on this repository locally:
 
  ```bash
  go build -o ./pubchem ./cmd/pubchem
  ./pubchem --help
  ```
 
+ ## Mental Model
+
+ Use the CLI in this order:
+
+ 1. Resolve an identifier to a record when you start from a name, CAS RN, InChIKey, or source ID.
+ 2. Use exact identity commands when you already have the CID, SID, or AID.
+ 3. Use search commands when you need candidate matches.
+ 4. Use `view` when you need headings and sections from PUG View.
+ 5. Use `raw` only when there is no higher-level verb yet.
+ 6. Use `export` when you need files.
+ 7. Use `batch` when you need many records.
+
+ This CLI does not guess silently. That is intentional.
+
  ## Usage
 
  ```bash
@@ -38,8 +55,8 @@ Root flags:
  - `--json`: Output JSON to stdout.
  - `--plain`: Output stable plain text to stdout.
  - `--results-only`: In JSON mode, emit only the primary result.
- - `--select`: In JSON mode, select comma-separated fields with dot-path support.
- - `--force`: Skip confirmations for future destructive commands.
+ - `--select`: In JSON mode, select comma-separated fields with exact dot-path support.
+ - `--force`: Skip confirmations for future destructive commands and file overwrites.
  - `--no-input`: Never prompt; fail instead.
  - `--enable-commands`: Comma-separated list of enabled top-level commands.
  - `--version`: Print version and exit.
@@ -62,76 +79,472 @@ Top-level command families:
  - `agent`
  - `completion`
 
- ## Examples
+ ## What To Use
+
+ | Task | Command |
+ |---|---|
+ | Resolve a drug name, CAS RN, InChI, or InChIKey | `pubchem resolve compound ...` |
+ | Exact compound lookup by CID | `pubchem identity compound ...` or `pubchem compound get ...` |
+ | Similarity search around a known structure | `pubchem compound similar <cid> --query-type cid` |
+ | Substructure or superstructure search | `pubchem compound substructure <cid> --query-type cid` or `pubchem compound superstructure <cid> --query-type cid` |
+ | Resolve a substance by SID or depositor source ID | `pubchem resolve substance ...` or `pubchem substance sourceid ...` |
+ | Search assays by gene, protein, name, source, or keyword | `pubchem assay search ...` |
+ | PubMed, patents, and xrefs | `pubchem refs ...` |
+ | Browse nested PUG View sections | `pubchem view ...` |
+ | Fetch a raw endpoint that has no higher-level verb | `pubchem raw fetch ...` |
+ | Export files or tabular data | `pubchem export ...` |
+ | Repeat a workflow across many IDs | `pubchem batch ...` |
+ | Discover exact flags and JSON shapes | `pubchem schema ...` |
+ | Generate shell completion | `pubchem completion ...` |
+ | Inspect stable exit codes | `pubchem agent exit-codes` |
+
+ ## Agent Playbook
 
- Version:
+ If you are an agent, use this sequence:
+
+ 1. Start with `pubchem schema <command>` when you need the exact flag surface.
+ 2. Use `pubchem agent exit-codes` when you need a stable automation contract.
+ 3. Use `--json` by default.
+ 4. Add `--results-only` when the downstream consumer wants the primary payload only.
+ 5. Add `--select` only with exact field paths that already exist in the JSON output.
+ 6. Use `--enable-commands` when you need a restricted command allowlist.
+ 7. Prefer CID-based structure workflows over name-based structure workflows.
+
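When an agent shells out to the CLI, the playbook steps can be folded into one argv builder so every call asks for JSON and uses exact selections. This is a minimal Python sketch, not part of pubchem-cli: `build_argv` is a hypothetical helper name, it only assembles flags documented in this README (`--enable-commands`, `--json`, `--results-only`, `--select`), and it assumes root flags may precede the command (as in the allowlist example) while output flags follow it (as in the other examples).

```python
import shlex
from typing import Optional, Sequence


def build_argv(
    command: Sequence[str],
    *,
    results_only: bool = False,
    select: Optional[Sequence[str]] = None,
    enable: Optional[Sequence[str]] = None,
) -> list[str]:
    """Assemble a pubchem argv that always requests JSON (hypothetical helper)."""
    argv = ["pubchem"]
    if enable:
        # Playbook step 6: restrict the command surface with an allowlist.
        argv += ["--enable-commands", ",".join(enable)]
    argv += list(command)
    argv.append("--json")  # step 3: JSON by default
    if results_only:
        argv.append("--results-only")  # step 4: primary payload only
    if select:
        # Step 5: exact field paths only, comma-separated.
        argv.append("--select=" + ",".join(select))
    return argv


if __name__ == "__main__":
    argv = build_argv(
        ["resolve", "compound", "imatinib", "--max", "1"],
        results_only=True,
        select=["cid"],
        enable=["resolve", "compound"],
    )
    # prints: pubchem --enable-commands resolve,compound resolve compound imatinib --max 1 --json --results-only --select=cid
    print(shlex.join(argv))
```

Building a list rather than a shell string sidesteps quoting problems entirely when the list is handed to `subprocess.run`.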
+ ## Resolve vs Identity
+
+ Use `resolve` when the input is a name, registry number, InChIKey, or other identifier that may need normalization.
+
+ Use `identity` when you already have the exact record ID and want the explicit no-guess form.
+
+ Examples:
 
  ```bash
- $ pubchem --version
- pubchem dev
+ pubchem identity compound 2244
+ pubchem identity substance 92297672
+ pubchem identity assay 541
  ```
 
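A script that receives raw input can apply this rule mechanically: purely numeric input goes to `identity`, everything else to `resolve`. The Python sketch below is illustrative only. `plan` is a hypothetical helper, the InChIKey and CAS RN patterns are simplified shape checks (no check-digit or hash validation), and the CLI does not expose this logic. It also assumes the caller knows a numeric value is a CID, SID, or AID; a numeric depositor source ID would be misrouted.

```python
import re
from typing import List, Tuple

# Simplified shape checks (assumptions): they do not verify CAS check digits
# or InChIKey hashes.
RECORD_ID = re.compile(r"[0-9]+")
INCHIKEY = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")
CAS_RN = re.compile(r"[0-9]{2,7}-[0-9]{2}-[0-9]")


def plan(entity: str, text: str) -> Tuple[str, List[str]]:
    """Classify raw input and return (kind, pubchem subcommand argv)."""
    text = text.strip()
    if RECORD_ID.fullmatch(text):
        # Exact record ID: use the explicit no-guess form.
        return "record ID", ["identity", entity, text]
    if INCHIKEY.fullmatch(text):
        kind = "InChIKey"
    elif CAS_RN.fullmatch(text):
        kind = "CAS RN"
    else:
        kind = "name or other identifier"
    # Anything that may need normalization goes through resolve.
    return kind, ["resolve", entity, text]
```

For example, `plan("compound", "50-78-2")` returns `("CAS RN", ["resolve", "compound", "50-78-2"])`.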
- Compound search:
+ ## Compound Workflows
+
+ Resolve a drug name to a CID:
 
  ```bash
- $ pubchem compound search aspirin --max 1
- {
-   "mode": "name",
-   "query": "aspirin",
-   "totalFound": 1,
-   "returned": 1,
-   "pageSize": 1,
-   "results": [
-     {
-       "cid": 2244,
-       "properties": {
-         "Charge": 0,
-         "Complexity": 212,
-         "ConnectivitySMILES": "CC(=O)OC1=CC=CC=C1C(=O)O",
-         "HBondAcceptorCount": 4,
-         "HBondDonorCount": 1,
-         "HeavyAtomCount": 13,
-         "IUPACName": "2-acetyloxybenzoic acid",
-         "InChIKey": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
-         "MolecularFormula": "C9H8O4",
-         "MolecularWeight": "180.16",
-         "RotatableBondCount": 3,
-         "SMILES": "CC(=O)OC1=CC=CC=C1C(=O)O",
-         "TPSA": 63.6,
-         "XLogP": 1.2
-       }
+ pubchem resolve compound imatinib --json --results-only --max 1
+ ```
+
+ Sample output:
+
+ ```json
+ [
+   {
+     "cid": 5291,
+     "properties": {
+       "InChIKey": "KTUFNOKKBVMGRW-UHFFFAOYSA-N",
+       "MolecularFormula": "C29H31N7O",
+       "MolecularWeight": "493.6"
      }
-   ]
- }
+   }
+ ]
  ```
 
- 3D structure metadata:
+ When you already know the CID, use exact identity:
 
  ```bash
- $ pubchem compound structure 2244 --record-type 3d --format json
+ pubchem identity compound 2244 --json
+ ```
+
+ Get compound details and common pharma fields:
+
+ ```bash
+ pubchem compound get 2244 --synonyms --description --classification --drug-likeness --json
+ ```
+
+ Pull a compact property panel:
+
+ ```bash
+ pubchem compound properties 2244 --json --results-only --select=cid,properties.MolecularFormula
+ ```
+
+ Sample output:
+
+ ```json
+ [
+   {
+     "cid": 2244,
+     "properties.MolecularFormula": "C9H8O4"
+   }
+ ]
+ ```
+
+ Search compounds by name:
+
+ ```bash
+ pubchem compound search aspirin --max 1
+ ```
+
+ Search compounds by formula:
+
+ ```bash
+ pubchem compound search C9H8O4 --mode formula --allow-other-elements --max 10
+ ```
+
+ Run a SAR-style similarity search from a CID:
+
+ ```bash
+ pubchem compound similar 5291 --query-type cid --threshold 90 --max 20
+ ```
+
+ Run structure searches from a CID:
+
+ ```bash
+ pubchem compound substructure 5291 --query-type cid --max 20
+ pubchem compound superstructure 5291 --query-type cid --max 20
+ ```
+
+ Important rule:
+
+ - Do not feed a plain drug name directly into `compound similar`, `compound substructure`, or `compound superstructure` and expect the CLI to guess.
+ - If you start from a name, run `pubchem resolve compound <name>` first, then feed the resulting CID into the structure search.
+ - This is the deterministic, agent-safe path.
+
+ Name-to-SAR recipe:
+
+ ```bash
+ pubchem resolve compound imatinib --json --results-only --select=cid --max 1
+ pubchem compound similar <cid> --query-type cid --max 20
+ ```
+
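The `<cid>` placeholder in the recipe has to be filled from the first command's output. A minimal Python sketch of that hand-off, assuming `pubchem` is on `PATH` and that the `--select=cid` output is a JSON array of objects shaped like the sample outputs above (for example `[{"cid": 5291}]`). `first_cid` and `similar_from_name` are hypothetical helper names.

```python
import json
import subprocess


def first_cid(stdout: str) -> int:
    """Pull the first CID out of `--json --results-only` output."""
    rows = json.loads(stdout)
    if not rows:
        # Fail loudly instead of guessing, in the spirit of the CLI.
        raise LookupError("resolve returned no compounds")
    return int(rows[0]["cid"])


def similar_from_name(name: str, max_hits: int = 20) -> str:
    """Resolve a name to a CID, then run a CID-based similarity search."""
    resolved = subprocess.run(
        ["pubchem", "resolve", "compound", name,
         "--json", "--results-only", "--select=cid", "--max", "1"],
        capture_output=True, text=True, check=True,
    )
    cid = first_cid(resolved.stdout)
    hits = subprocess.run(
        ["pubchem", "compound", "similar", str(cid),
         "--query-type", "cid", "--max", str(max_hits), "--json"],
        capture_output=True, text=True, check=True,
    )
    return hits.stdout
```

Keeping the parse step (`first_cid`) separate from the process calls makes it easy to test against saved sample output.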
+ Structure files and metadata:
+
+ ```bash
+ pubchem compound structure 2244 --record-type 3d --format json
+ pubchem compound structure 2244 --record-type 3d --format sdf --out aspirin-3d.sdf
+ pubchem compound structure 2244 --record-type 2d --format mol --out aspirin-2d.mol
+ ```
+
+ Compound images:
+
+ ```bash
+ pubchem compound image 2244 --inline --json
+ pubchem compound image 2244 --out aspirin.png --force
+ ```
+
+ Compound xrefs and safety:
+
+ ```bash
+ pubchem compound xref 2244
+ pubchem compound safety 2244
+ ```
+
+ ## Substance Workflows
+
+ Resolve a substance by SID:
+
+ ```bash
+ pubchem resolve substance 92297672 --json
+ ```
+
+ Search substances by name:
+
+ ```bash
+ pubchem substance search aspirin --max 20
+ ```
+
+ Search substances by depositor source and source ID:
+
+ ```bash
+ pubchem substance sourceid ChemIDplus 0002153982 --max 20
+ ```
+
+ Search substances by registry xref:
+
+ ```bash
+ pubchem substance search D41527A7-A9EB-472D-A7FC-312821130549 --mode xref --xref-type RegistryID
+ ```
+
+ Map substance records to compounds:
+
+ ```bash
+ pubchem substance cids 92297672
+ ```
+
+ Fetch substance xrefs for registry normalization:
+
+ ```bash
+ pubchem substance xref 92297672 --types RegistryID,DBURL,SBURL
+ ```
+
+ Batch substance workflows:
+
+ ```bash
+ pubchem batch substance cids 92297672 135052148 --progress
+ pubchem batch substance xref 92297672 135052148 --types RegistryID,DBURL,SBURL --progress
+ ```
+
+ ## Assay Workflows
+
+ Resolve an assay by AID or by target/name:
+
+ ```bash
+ pubchem resolve assay 541 --json
+ pubchem resolve assay EGFR --mode target --target-type genesymbol --json
+ ```
+
+ Search assays:
+
+ ```bash
+ pubchem assay search EGFR --mode target --target-type genesymbol --max 20
+ pubchem assay search viability --mode name --max 20
+ pubchem assay search ncgc --mode source --max 20
+ pubchem assay search kinase --mode keyword --max 20
+ ```
+
+ Fetch assay details and concise result tables:
+
+ ```bash
+ pubchem assay get 541
+ pubchem assay results 541 --outcome active --max 20
+ ```
+
+ Batch assay workflows:
+
+ ```bash
+ pubchem batch assay get 541 542 --progress
+ pubchem batch assay results 541 542 --outcome active --progress
+ ```
+
+ ## References, Literature, and Patents
+
+ Search PubMed:
+
+ ```bash
+ pubchem refs search aspirin --max 20
+ ```
+
+ Fetch PubMed metadata by PMID:
+
+ ```bash
+ pubchem refs get 22385 --json --results-only --select=url
+ ```
+
+ Sample output:
+
+ ```json
+ [
+   {
+     "url": "https://pubmed.ncbi.nlm.nih.gov/22385/"
+   }
+ ]
+ ```
+
+ Pull compound-linked literature:
+
+ ```bash
+ pubchem refs literature compound 2244 --max 20
+ ```
+
+ Pull compound-linked patents:
+
+ ```bash
+ pubchem refs patents compound 2244 --max 20
+ ```
+
+ Pull compound and substance xrefs through the reference surface:
+
+ ```bash
+ pubchem refs external compound 2244
+ pubchem refs external substance 92297672
+ ```
+
+ When you need a bibliography or patent landscape, prefer `refs` over raw PubMed scraping. It already normalizes the citation records.
+
+ ## PUG View Browsing
+
+ Use `view` when you want headings and sections, not just raw IDs.
+
+ Get or browse a compound record:
+
+ ```bash
+ pubchem view get compound 2244
+ pubchem view browse compound 2244
+ ```
+
+ Search inside a compound record:
+
+ ```bash
+ pubchem view search compound 2244 Safety --json --max 1
+ ```
+
+ Sample output:
+
+ ```json
  {
-   "recordType": "3d",
-   "downloadUrl": "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/2244/SDF?record_type=3d",
+   "entityType": "compound",
+   "identifier": "2244",
+   "query": "Safety",
+   "totalFound": 20,
+   "pageSize": 1,
+   "truncated": true,
    "results": [
      {
-       "cid": 2244,
-       "structure": {
-         "title": "Aspirin",
-         "molecularFormula": "C9H8O4",
-         "molecularWeight": 180.16,
-         "inchiKey": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
-         "has3d": true
-       }
+       "path": "Chemical and Physical Properties",
+       "tocHeading": "Chemical and Physical Properties",
+       "description": "Various chemical and physical properties that are experimentally determined for this compound."
      }
    ]
  }
  ```
 
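The `truncated`, `totalFound`, and `pageSize` fields in the sample output tell a script whether it saw every match. A minimal Python sketch that reads them and suggests a larger `--max` for a re-run. It assumes the envelope keeps the field names shown in the sample, and that the command runs without `--results-only`, which emits only the primary result and would presumably drop these fields. `next_max` is a hypothetical helper name.

```python
import json
from typing import Optional


def next_max(stdout: str) -> Optional[int]:
    """Return a --max value that covers every match, or None if nothing was cut off."""
    payload = json.loads(stdout)
    if not payload.get("truncated"):
        return None
    # The server reported how many matches exist; ask for all of them.
    return int(payload["totalFound"])
```

With the sample above, `next_max` returns `20`, so the follow-up call would be `pubchem view search compound 2244 Safety --json --max 20`.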
- For the full command tree and machine-readable schema, use:
+ Browse specific sections:
 
  ```bash
- pubchem --help
- pubchem schema
+ pubchem view section compound 2244 --heading "Safety and Hazards"
+ pubchem view section compound 2244 --heading "Names and Identifiers"
+ ```
+
+ The same patterns work for substances and assays.
+
+ ## Raw Escape Hatch
+
+ Use `raw fetch` only when the CLI does not yet have a dedicated verb.
+
+ ```bash
+ pubchem raw fetch pug /compound/cid/2244/JSON
+ pubchem raw fetch view /data/compound/2244/JSON
+ pubchem raw fetch pubmed '/esummary.fcgi?db=pubmed&id=22385&retmode=json'
+ ```
+
+ You can also pass an allowed full URL, but only for PubChem and NCBI hosts.
+
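If an agent assembles full URLs before handing them to `raw fetch`, it can pre-check the host locally so a rejected URL fails early. This Python sketch approximates the rule stated above and is not the CLI's actual allowlist: the exact hosts pubchem-cli accepts are not listed in this README, so `ALLOWED_SUFFIXES` is an assumption (PubChem is served from `pubchem.ncbi.nlm.nih.gov`, and E-utilities from `eutils.ncbi.nlm.nih.gov`). Requiring HTTPS is likewise this sketch's own choice.

```python
from urllib.parse import urlparse

# Assumed allowlist: one suffix covers pubchem.ncbi.nlm.nih.gov and
# eutils.ncbi.nlm.nih.gov alike.
ALLOWED_SUFFIXES = ("ncbi.nlm.nih.gov",)


def host_allowed(url: str) -> bool:
    """Return True for HTTPS URLs whose host is (or is under) an allowed suffix."""
    parts = urlparse(url)
    if parts.scheme != "https":
        return False
    host = (parts.hostname or "").lower()
    # Match whole DNS labels so "evilncbi.nlm.nih.gov" does not slip through.
    return any(host == s or host.endswith("." + s) for s in ALLOWED_SUFFIXES)
```

The label-boundary check matters: a naive `endswith("ncbi.nlm.nih.gov")` would accept look-alike hosts.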
+ ## Exports
+
+ Export compound structures:
+
+ ```bash
+ pubchem export compound structure 2244 --format smiles
+ pubchem export compound structure 2244 --record-type 3d --format sdf --out aspirin-3d.sdf
+ pubchem export compound structure 2244 --record-type 2d --format mol --out aspirin-2d.mol
+ ```
+
+ Export compound properties:
+
+ ```bash
+ pubchem export compound properties 2244 5291 --properties MolecularFormula,MolecularWeight --format tsv --out compounds.tsv
+ ```
+
+ Export assay results:
+
+ ```bash
+ pubchem export assay results 541 --outcome active --format csv --out assay-results.csv
+ ```
+
+ ## Batch Automation
+
+ Use `batch` when you have multiple IDs and want one command to do the same work repeatedly.
+
+ ```bash
+ pubchem batch compound get 2244 5291 --progress
+ pubchem batch compound properties 2244 5291 --properties MolecularFormula,MolecularWeight --progress
+ pubchem batch compound bioactivity 2244 5291 --outcome active --progress
+ pubchem batch compound xref 2244 5291 --progress
+ ```
+
+ Progress goes to stderr so stdout stays machine-readable.
+
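Because progress lines go to stderr, a wrapper can capture stdout for parsing while letting progress flow straight to the terminal. A minimal Python sketch; `run_json` is a hypothetical helper, and it assumes a batch command run with `--json` prints a single JSON document on stdout.

```python
import json
import subprocess
from typing import Any, Sequence


def run_json(argv: Sequence[str]) -> Any:
    """Run a command, parse its stdout as JSON, and pass stderr through."""
    proc = subprocess.run(
        list(argv),
        stdout=subprocess.PIPE,  # capture the machine-readable payload
        stderr=None,             # inherit stderr so --progress output stays visible
        text=True,
        check=True,              # a non-zero exit raises CalledProcessError
    )
    return json.loads(proc.stdout)
```

Usage: `run_json(["pubchem", "batch", "compound", "get", "2244", "5291", "--progress", "--json"])`. On failure, the raised `subprocess.CalledProcessError` carries the CLI's exit code in `returncode`.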
+ ## Selection and JSON Shape
+
+ `--select` is exact-path only. There is no fuzzy matching.
+
+ Good examples:
+
+ ```bash
+ pubchem compound search aspirin --json --results-only --select=cid,properties.MolecularFormula --max 1
+ pubchem refs get 22385 --json --results-only --select=url
+ pubchem substance search aspirin --json --results-only --select=sid --max 1
+ ```
+
+ For a nested field, join the keys with dots exactly as they appear in the JSON output (for example, `properties.MolecularFormula`).
+
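To make the exact-path rule concrete, here is a Python sketch of how selection behaves on one record, reconstructed from the sample outputs in this README: each requested path is walked key by key, and the result keeps the full dotted path as its key (as in `"properties.MolecularFormula": "C9H8O4"`). This is an illustration, not the CLI's implementation; `select_fields` is a hypothetical name, and because this README does not say how the CLI reports a missing path, the sketch simply raises `KeyError`.

```python
from typing import Any, Mapping, Sequence


def select_fields(record: Mapping[str, Any], paths: Sequence[str]) -> dict[str, Any]:
    """Return {path: value} for each exact dot-path; no fuzzy matching."""
    out: dict[str, Any] = {}
    for path in paths:
        node: Any = record
        for segment in path.split("."):
            if not isinstance(node, Mapping) or segment not in node:
                # Exact paths only: wrong casing or a typo is an error.
                raise KeyError(f"no such field: {path}")
            node = node[segment]
        out[path] = node  # keep the dotted path as the output key
    return out
```

Applied to `{"cid": 2244, "properties": {"MolecularFormula": "C9H8O4"}}` with `["cid", "properties.MolecularFormula"]`, it yields the same shape as the property-panel sample output.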
+ ## Exit Codes
+
+ Use `pubchem agent exit-codes` for automation.
+
+ The current stable codes are:
+
+ - `0` success
+ - `1` generic error
+ - `2` usage error
+ - `3` not found
+ - `4` timeout
+ - `5` command disabled by `--enable-commands`
+
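A script can mirror these codes in an enum and branch on them after `subprocess.run(...).returncode`. A Python sketch; the member names are this sketch's own labels for the documented meanings, mapping unknown codes to a generic error is an assumption, and treating only a timeout (`4`) as retryable is an assumed policy rather than something the CLI prescribes.

```python
import enum


class PubchemExit(enum.IntEnum):
    """The current stable exit codes listed above."""
    SUCCESS = 0
    GENERIC_ERROR = 1
    USAGE_ERROR = 2
    NOT_FOUND = 3
    TIMEOUT = 4
    COMMAND_DISABLED = 5


def classify(returncode: int) -> PubchemExit:
    """Map a raw return code to its meaning; unknown codes count as generic errors."""
    try:
        return PubchemExit(returncode)
    except ValueError:
        return PubchemExit.GENERIC_ERROR


def should_retry(returncode: int) -> bool:
    """Assumed policy: only upstream timeouts are worth retrying."""
    return classify(returncode) is PubchemExit.TIMEOUT
```

Retrying a usage error or a disabled command would only repeat the same failure, which is why the sketch limits retries to timeouts.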
+ ## Shell Completion
+
+ ```bash
+ pubchem completion bash
+ pubchem completion zsh
+ pubchem completion fish
+ pubchem completion powershell
+ ```
+
+ ## Common Limits and Good Practice
+
+ - Prefer CID-based structure workflows whenever you can.
+ - If you start from a name, resolve first and reuse the CID.
+ - Broad SMILES similarity and substructure searches can still fail or time out upstream at PubChem.
+ - The CLI now reports those failures cleanly instead of hiding them behind parser noise.
+ - `raw fetch` is intentionally allowlisted; it is an escape hatch, not a free-for-all.
+ - Use `schema` instead of guessing flags.
+
+ ## Sample Command Set For Agents
+
+ If you want a small, practical agent allowlist:
+
+ ```bash
+ pubchem --enable-commands compound,substance,assay,refs,view,raw,resolve,identity,batch,export,schema,agent,completion compound search aspirin --max 1
+ ```
+
+ That style keeps the command surface predictable while still covering most PubChem work.
+
+ ## More Examples
+
+ Find a structure, then inspect it, then export a file:
+
+ ```bash
+ pubchem resolve compound imatinib --json --results-only --max 1
+ pubchem compound get 5291 --synonyms --classification --json
+ pubchem export compound structure 5291 --record-type 3d --format sdf --out imatinib-3d.sdf
+ ```
+
+ Trace a compound into the literature:
+
+ ```bash
+ pubchem refs literature compound 5291 --max 20
+ pubchem refs patents compound 5291 --max 20
+ ```
+
+ Move from an assay target to a hit list (replace `3364` with an AID returned by the search):
+
+ ```bash
+ pubchem assay search EGFR --mode target --target-type genesymbol --max 20
+ pubchem assay results 3364 --outcome active --max 20
+ ```
+
+ Inspect a PubChem record tree and then pull only one section:
+
+ ```bash
+ pubchem view search compound 2244 Hazard --max 5
+ pubchem view section compound 2244 --heading "Safety and Hazards"
+ ```
+
+ ## For New Users
+
+ If you only remember three commands, remember these:
+
+ ```bash
+ pubchem resolve compound aspirin
+ pubchem compound get 2244
+ pubchem view search compound 2244 Safety
  ```
 
- If you want the exact command and flag surface for automation, prefer `pubchem schema <command>` and `pubchem --help` over guessing from memory.
+ Everything else in the CLI builds from those patterns.
package/package.json CHANGED
@@ -1,10 +1,10 @@
  {
    "name": "pubchem-cli",
-   "version": "0.1.3",
+   "version": "0.1.7",
    "description": "PubChem in your terminal.",
    "repository": {
      "type": "git",
-     "url": "https://github.com/siddheshkothadi/pubchem-cli.git"
+     "url": "https://github.com/BrainGnosis/pubchem-cli.git"
    },
    "bin": {
      "pubchem": "bin/pubchem.js"
Binary file
Binary file
Binary file
Binary file