smiles-js 1.0.5 → 1.1.0
- package/.claude/settings.local.json +11 -1
- package/API.md +443 -0
- package/IMPLEMENTATION_STATUS.md +27 -27
- package/README.md +33 -367
- package/bunfig.toml +8 -0
- package/docs/SAMPLE_PROMPT.md +1 -0
- package/docs/atorvastatin-named.js +17 -12
- package/docs/atorvastatin-synthesis.js +15 -10
- package/docs/esomeprazole-showcase.js +62 -57
- package/docs/readable-generated-code/AFTER_ACTION_REPORT.md +206 -0
- package/docs/readable-generated-code/EXECUTION_PLAN.md +275 -0
- package/docs/readable-generated-code/READABLE_GENERATED_CODE.md +255 -0
- package/docs/readable-generated-code/REFACTOR_PLAN.md +311 -0
- package/docs/readable-generated-code/UPDATED_PLAN.md +249 -0
- package/docs/ritonavir-synthesis.js +45 -38
- package/docs/roundtrip-validation-demo.js +67 -62
- package/docs/sildenafil-synthesis.js +28 -21
- package/docs/test-drive.js +21 -16
- package/examples/basic-usage.js +31 -26
- package/examples/decompiler-demo.js +34 -29
- package/examples/fused-ring-manipulation.js +72 -67
- package/examples/linear-manipulation.js +67 -62
- package/examples/parser-usage.js +68 -63
- package/package.json +1 -1
- package/src/ast.test.js +31 -0
- package/src/clone-utils.js +51 -0
- package/src/codegen/branch-crossing-ring.js +116 -0
- package/src/codegen/branch-crossing-ring.test.js +488 -0
- package/src/codegen/branch-walker.js +138 -0
- package/src/codegen/index.js +12 -0
- package/src/codegen/index.test.js +39 -0
- package/src/codegen/interleaved-fused-ring.js +173 -0
- package/src/codegen/interleaved-fused-ring.test.js +574 -0
- package/src/codegen/simple-fused-ring.js +123 -0
- package/src/codegen/simple-fused-ring.test.js +307 -0
- package/src/codegen/smiles-codegen-core.js +226 -0
- package/src/codegen/smiles-codegen-core.test.js +150 -0
- package/src/common.js +21 -11
- package/src/common.test.js +11 -0
- package/src/constructors.js +43 -788
- package/src/constructors.test.js +770 -1
- package/src/decompiler.js +691 -572
- package/src/decompiler.test.js +671 -0
- package/src/fragment.js +10 -4
- package/src/index.js +2 -2
- package/src/index.test.js +45 -0
- package/src/layout/index.js +780 -0
- package/src/layout/index.test.js +56 -0
- package/src/manipulation.js +135 -28
- package/src/manipulation.test.js +17 -0
- package/src/metadata.js +32 -0
- package/src/method-attachers.js +211 -0
- package/src/node-creators.js +97 -0
- package/src/parser/ast-builder.js +341 -0
- package/src/parser/atom-builder.js +177 -0
- package/src/parser/atom-builder.test.js +251 -0
- package/src/parser/branch-utils.js +128 -0
- package/src/parser/branch-utils.test.js +289 -0
- package/src/parser/index.js +6 -0
- package/src/parser/index.test.js +1566 -0
- package/src/parser/ring-group-builder.js +561 -0
- package/src/parser/ring-node-builder.js +279 -0
- package/src/parser/ring-node-builder.test.js +11 -0
- package/src/parser/ring-utils.js +234 -0
- package/src/parser/ring-utils.test.js +452 -0
- package/src/parser/smiles-parser-core.js +184 -0
- package/src/parser/smiles-parser-core.test.js +296 -0
- package/src/roundtrip.js +17 -11
- package/src/roundtrip.test.js +220 -165
- package/src/sequential-rings.test.js +117 -0
- package/src/tokenizer.js +24 -53
- package/test-integration/__snapshots__/acetaminophen.test.js.snap +153 -0
- package/test-integration/__snapshots__/adjuvant-analgesics.test.js.snap +576 -0
- package/test-integration/__snapshots__/cholesterol-drugs.test.js.snap +2255 -0
- package/test-integration/__snapshots__/dexamethasone.test.js.snap +265 -0
- package/test-integration/__snapshots__/endocannabinoids.test.js.snap +751 -0
- package/test-integration/__snapshots__/hypertension-medication.test.js.snap +589 -0
- package/test-integration/__snapshots__/local-anesthetics.test.js.snap +800 -0
- package/test-integration/__snapshots__/nsaids-otc.test.js.snap +497 -0
- package/test-integration/__snapshots__/nsaids-prescription.test.js.snap +957 -0
- package/test-integration/__snapshots__/opioids.test.js.snap +1049 -0
- package/test-integration/__snapshots__/steroids.test.js.snap +3757 -0
- package/test-integration/acetaminophen.test.js +9 -109
- package/test-integration/adjuvant-analgesics.test.js +21 -343
- package/test-integration/cbd.test.js +472 -0
- package/test-integration/cholesterol-drugs.smiles.js +85 -0
- package/test-integration/cholesterol-drugs.test.js +414 -0
- package/test-integration/cortisone.test.js +326 -0
- package/test-integration/dexamethasone.test.js +415 -0
- package/test-integration/endocannabinoids.test.js +29 -417
- package/test-integration/etodolac.test.js +176 -0
- package/test-integration/ezetimibe.test.js +284 -0
- package/test-integration/fluvastatin.test.js +290 -0
- package/test-integration/hypertension-medication.smiles.js +4 -13
- package/test-integration/hypertension-medication.test.js +23 -528
- package/test-integration/ketorolac.test.js +269 -0
- package/test-integration/leading-bond.test.js +86 -0
- package/test-integration/local-anesthetics.test.js +34 -538
- package/test-integration/nsaids-otc.test.js +23 -347
- package/test-integration/nsaids-prescription.test.js +41 -637
- package/test-integration/omeprazole.test.js +357 -0
- package/test-integration/opioids.test.js +93 -671
- package/test-integration/roundtrip.test.js +174 -0
- package/test-integration/steroids.test.js +255 -1623
- package/test-integration/telmisartan.test.js +773 -0
- package/test-integration/thc.test.js +219 -0
- package/test-integration/utils.js +38 -8
- package/test-integration/valsartan.test.js +345 -0
- package/testimony.png +0 -0
- package/todo +5 -0
- package/src/codegen.js +0 -718
- package/src/parser.branch-tracking.test.js +0 -189
- package/src/parser.js +0 -1231
- package/src/parser.test.js +0 -641
- package/src/telmisartan.test.js +0 -277
|
@@ -22,7 +22,17 @@
|
|
|
22
22
|
"Bash(move sildenafil-synthesis.js docs )",
|
|
23
23
|
"Bash(move ritonavir-synthesis.js docs )",
|
|
24
24
|
"Bash(gh run list:*)",
|
|
25
|
-
"Bash(echo:*)"
|
|
25
|
+
"Bash(echo:*)",
|
|
26
|
+
"Bash(for file in test-integration/*.test.js)",
|
|
27
|
+
"Bash(do sed -i \"s/import { stripExports } from ''.\\\\/utils.js'';/import { stripExports, createFunction, executeCode } from ''.\\\\/utils.js'';/\" \"$file\")",
|
|
28
|
+
"Bash(done)",
|
|
29
|
+
"Bash(do sed -i \"s/factory = new Function\\(''Ring'', ''Linear'', ''FusedRing'', ''Molecule'', executableCode\\);/factory = createFunction\\(''Ring'', ''Linear'', ''FusedRing'', ''Molecule'', executableCode\\);/\" \"$file\")",
|
|
30
|
+
"Bash(wc:*)",
|
|
31
|
+
"Bash(Select-String -Pattern \"Valsartan\" -Context 10)",
|
|
32
|
+
"Bash(find:*)",
|
|
33
|
+
"Bash(Select-String -Pattern \"pass|fail|% Lines|All files\")",
|
|
34
|
+
"Bash(Select-String -Pattern \"\\(pass|fail\\)\")",
|
|
35
|
+
"Bash(Select-Object -Last 5)"
|
|
26
36
|
]
|
|
27
37
|
}
|
|
28
38
|
}
|
package/API.md
ADDED
|
@@ -0,0 +1,443 @@
|
|
|
1
|
+
# SMILES-JS API Reference
|
|
2
|
+
|
|
3
|
+
Full API documentation for smiles-js. For a quick introduction, see the [README](./README.md).
|
|
4
|
+
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
## Constructors
|
|
8
|
+
|
|
9
|
+
### `Ring(options)`
|
|
10
|
+
|
|
11
|
+
Create ring structures with substitutions and attachments.
|
|
12
|
+
|
|
13
|
+
**Options:**
|
|
14
|
+
|
|
15
|
+
| Parameter | Type | Default | Description |
|
|
16
|
+
|-----------|------|---------|-------------|
|
|
17
|
+
| `atoms` | `string` | required | Base atom type (e.g., `'c'`, `'C'`, `'N'`) |
|
|
18
|
+
| `size` | `number` | required | Number of atoms in the ring |
|
|
19
|
+
| `ringNumber` | `number` | `1` | Ring number for SMILES notation |
|
|
20
|
+
| `offset` | `number` | `0` | Offset for fused rings |
|
|
21
|
+
| `substitutions` | `object` | `{}` | Position -> atom substitutions |
|
|
22
|
+
| `attachments` | `object` | `{}` | Position -> attachment list |
|
|
23
|
+
| `bonds` | `array` | `[]` | Bond types between atoms |
|
|
24
|
+
|
|
25
|
+
```javascript
|
|
26
|
+
// Simple benzene
|
|
27
|
+
const benzene = Ring({ atoms: 'c', size: 6 });
|
|
28
|
+
|
|
29
|
+
// Pyridine (nitrogen at position 5)
|
|
30
|
+
const pyridine = Ring({
|
|
31
|
+
atoms: 'c',
|
|
32
|
+
size: 6,
|
|
33
|
+
substitutions: { 5: 'n' }
|
|
34
|
+
});
|
|
35
|
+
|
|
36
|
+
// Toluene (methyl attached at position 1)
|
|
37
|
+
const toluene = Ring({
|
|
38
|
+
atoms: 'c',
|
|
39
|
+
size: 6,
|
|
40
|
+
attachments: { 1: [Linear(['C'])] }
|
|
41
|
+
});
|
|
42
|
+
```
|
|
43
|
+
|
|
44
|
+
### `Linear(atoms, bonds?, attachments?)`
|
|
45
|
+
|
|
46
|
+
Create linear chains with optional bond specifications.
|
|
47
|
+
|
|
48
|
+
| Parameter | Type | Default | Description |
|
|
49
|
+
|-----------|------|---------|-------------|
|
|
50
|
+
| `atoms` | `string[]` | required | Array of atom symbols |
|
|
51
|
+
| `bonds` | `string[]` | `[]` | Array of bond types (`'='`, `'#'`, `null`) |
|
|
52
|
+
| `attachments` | `object` | `{}` | Position -> attachment list |
|
|
53
|
+
|
|
54
|
+
```javascript
|
|
55
|
+
// Simple propane
|
|
56
|
+
const propane = Linear(['C', 'C', 'C']);
|
|
57
|
+
|
|
58
|
+
// Propene (with double bond)
|
|
59
|
+
const propene = Linear(['C', 'C', 'C'], [null, '=']);
|
|
60
|
+
|
|
61
|
+
// Ethanol with hydroxyl
|
|
62
|
+
const ethyl = Linear(['C', 'C']);
|
|
63
|
+
const hydroxyl = Linear(['O']);
|
|
64
|
+
const ethanol = ethyl.attach(hydroxyl, 2);
|
|
65
|
+
```
|
|
66
|
+
|
|
67
|
+
### `FusedRing(rings)`
|
|
68
|
+
|
|
69
|
+
Create fused ring systems like naphthalene.
|
|
70
|
+
|
|
71
|
+
| Parameter | Type | Description |
|
|
72
|
+
|-----------|------|-------------|
|
|
73
|
+
| `rings` | `Ring[]` | Array of Ring nodes (minimum 2) |
|
|
74
|
+
|
|
75
|
+
```javascript
|
|
76
|
+
const naphthalene = FusedRing([
|
|
77
|
+
Ring({ atoms: 'C', size: 10, offset: 0, ringNumber: 1 }),
|
|
78
|
+
Ring({ atoms: 'C', size: 6, offset: 2, ringNumber: 2 })
|
|
79
|
+
]);
|
|
80
|
+
```
|
|
81
|
+
|
|
82
|
+
### `Molecule(components)`
|
|
83
|
+
|
|
84
|
+
Combine multiple structural components.
|
|
85
|
+
|
|
86
|
+
| Parameter | Type | Description |
|
|
87
|
+
|-----------|------|-------------|
|
|
88
|
+
| `components` | `object[]` | Array of Ring, Linear, FusedRing, or Molecule nodes |
|
|
89
|
+
|
|
90
|
+
```javascript
|
|
91
|
+
const propyl = Linear(['C', 'C', 'C']);
|
|
92
|
+
const benzene = Ring({ atoms: 'c', size: 6 });
|
|
93
|
+
const propylbenzene = Molecule([propyl, benzene]);
|
|
94
|
+
|
|
95
|
+
console.log(propylbenzene.smiles); // CCCc1ccccc1
|
|
96
|
+
```
|
|
97
|
+
|
|
98
|
+
---
|
|
99
|
+
|
|
100
|
+
## Manipulation Methods
|
|
101
|
+
|
|
102
|
+
All manipulation methods are **immutable** -- they return new nodes and never modify the original.
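As a rough illustration of what "immutable" means here, the sketch below models a ring as a plain frozen object and attaches a substituent by copying. The helper `attachImmutable` and the plain-object shape are assumptions made for illustration; they are not the library's internals.

```javascript
// Hypothetical stand-in for a ring node. Field names follow the Ring options
// table, but this plain object is NOT the library's real representation.
function attachImmutable(node, attachment, position) {
  const existing = node.attachments[position] ?? [];
  return {
    ...node,
    // Copy the attachment map and the list at `position` instead of mutating them.
    attachments: { ...node.attachments, [position]: [...existing, attachment] },
  };
}

// Freezing the original makes any accidental in-place write a no-op or an error.
const benzeneLike = Object.freeze({ atoms: 'c', size: 6, attachments: Object.freeze({}) });
const tolueneLike = attachImmutable(benzeneLike, { atoms: ['C'] }, 1);

console.log(Object.keys(benzeneLike.attachments).length); // 0
console.log(tolueneLike.attachments[1].length); // 1
```

Because every call returns a fresh object, an intermediate such as `benzeneLike` can be reused as the starting point for several derivatives.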
|
|
103
|
+
|
|
104
|
+
### Ring Methods
|
|
105
|
+
|
|
106
|
+
```javascript
|
|
107
|
+
const benzene = Ring({ atoms: 'c', size: 6 });
|
|
108
|
+
|
|
109
|
+
// Attach substituent at position
|
|
110
|
+
const toluene = benzene.attach(Linear(['C']), 1);
|
|
111
|
+
|
|
112
|
+
// Substitute atom at position
|
|
113
|
+
const pyridine = benzene.substitute(5, 'n');
|
|
114
|
+
|
|
115
|
+
// Multiple substitutions
|
|
116
|
+
const triazine = benzene.substituteMultiple({ 1: 'n', 3: 'n', 5: 'n' });
|
|
117
|
+
|
|
118
|
+
// Fuse with another ring (offset = position in this ring where the other ring starts)
|
|
119
|
+
const ring2 = Ring({ atoms: 'C', size: 6 });
|
|
120
|
+
const naphthalene = benzene.fuse(ring2, 2);
|
|
121
|
+
|
|
122
|
+
// Clone
|
|
123
|
+
const benzeneClone = benzene.clone();
|
|
124
|
+
```
|
|
125
|
+
|
|
126
|
+
#### `ring.attach(attachment, position, options?)`
|
|
127
|
+
|
|
128
|
+
Attach a node to the ring at a 1-indexed position.
|
|
129
|
+
|
|
130
|
+
| Parameter | Type | Description |
|
|
131
|
+
|-----------|------|-------------|
|
|
132
|
+
| `attachment` | `object` | Node to attach |
|
|
133
|
+
| `position` | `number` | 1-indexed ring position |
|
|
134
|
+
| `options.sibling` | `boolean` | If set, marks the attachment as sibling (true) or inline (false) |
|
|
135
|
+
|
|
136
|
+
#### `ring.substitute(position, newAtom)`
|
|
137
|
+
|
|
138
|
+
Replace the atom at position with a different atom symbol.
|
|
139
|
+
|
|
140
|
+
#### `ring.substituteMultiple(substitutionMap)`
|
|
141
|
+
|
|
142
|
+
Replace multiple atoms. `substitutionMap` is `{ position: atomSymbol }`.
|
|
143
|
+
|
|
144
|
+
#### `ring.fuse(otherRing, offset, options?)`
|
|
145
|
+
|
|
146
|
+
Fuse this ring with another ring. `offset` is how many positions into this ring the other ring starts.
|
|
147
|
+
|
|
148
|
+
#### `ring.clone()`
|
|
149
|
+
|
|
150
|
+
Return a deep copy of the ring.
|
|
151
|
+
|
|
152
|
+
### Linear Methods
|
|
153
|
+
|
|
154
|
+
```javascript
|
|
155
|
+
const butane = Linear(['C', 'C', 'C', 'C']);
|
|
156
|
+
|
|
157
|
+
// Attach branch at position
|
|
158
|
+
const methyl = Linear(['C']);
|
|
159
|
+
const branched = butane.attach(methyl, 2);
|
|
160
|
+
|
|
161
|
+
// Concatenate chains
|
|
162
|
+
const hexane = butane.concat(Linear(['C', 'C']));
|
|
163
|
+
|
|
164
|
+
// Branch at specific position
|
|
165
|
+
const isopentane = butane.branchAt({ 2: methyl });
|
|
166
|
+
|
|
167
|
+
// Branch with multiple attachments
|
|
168
|
+
const decorated = butane.branch(2, methyl, Linear(['O']));
|
|
169
|
+
```
|
|
170
|
+
|
|
171
|
+
#### `linear.attach(attachment, position)`
|
|
172
|
+
|
|
173
|
+
Attach a node at a 1-indexed position.
|
|
174
|
+
|
|
175
|
+
#### `linear.concat(other)`
|
|
176
|
+
|
|
177
|
+
Concatenate with another Linear (merges atoms/bonds) or other node (creates Molecule).
|
|
178
|
+
|
|
179
|
+
#### `linear.branch(position, ...branches)`
|
|
180
|
+
|
|
181
|
+
Attach one or more branches at a position.
|
|
182
|
+
|
|
183
|
+
#### `linear.branchAt(branchMap)`
|
|
184
|
+
|
|
185
|
+
Attach branches at multiple positions. `branchMap` is `{ position: node | [nodes] }`.
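The `{ position: node | [nodes] }` shape means each value may be a single node or an array of nodes. One plausible way to handle both uniformly is to normalize every value to an array first. `normalizeBranchMap` below is a hypothetical helper written for illustration, not a function exported by the library.

```javascript
// Hypothetical helper: turn { position: node | [nodes] } into { position: [nodes] }.
function normalizeBranchMap(branchMap) {
  const normalized = {};
  for (const [position, value] of Object.entries(branchMap)) {
    // Wrap a single node in an array; copy an existing array so the input is untouched.
    normalized[position] = Array.isArray(value) ? [...value] : [value];
  }
  return normalized;
}

// Strings stand in for nodes to keep the sketch self-contained.
const normalizedMap = normalizeBranchMap({ 2: 'methyl', 3: ['methyl', 'hydroxyl'] });
console.log(normalizedMap[2].length); // 1
console.log(normalizedMap[3].length); // 2
```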
|
|
186
|
+
|
|
187
|
+
### Molecule Methods
|
|
188
|
+
|
|
189
|
+
```javascript
|
|
190
|
+
const mol = Molecule([Linear(['C', 'C', 'C'])]);
|
|
191
|
+
|
|
192
|
+
// Append component
|
|
193
|
+
const extended = mol.append(Ring({ atoms: 'c', size: 6 }));
|
|
194
|
+
|
|
195
|
+
// Prepend component
|
|
196
|
+
const withPrefix = mol.prepend(Linear(['C']));
|
|
197
|
+
|
|
198
|
+
// Get component by index
|
|
199
|
+
const first = mol.getComponent(0);
|
|
200
|
+
|
|
201
|
+
// Replace component by index
|
|
202
|
+
const modified = mol.replaceComponent(0, Linear(['N', 'N']));
|
|
203
|
+
|
|
204
|
+
// Concatenate molecules
|
|
205
|
+
const combined = mol.concat(Molecule([Ring({ atoms: 'c', size: 6 })]));
|
|
206
|
+
```
|
|
207
|
+
|
|
208
|
+
### FusedRing Methods
|
|
209
|
+
|
|
210
|
+
```javascript
|
|
211
|
+
const fused = ring1.fuse(ring2, 4);
|
|
212
|
+
|
|
213
|
+
// Add another ring to the fused system
|
|
214
|
+
const triple = fused.addRing(ring3, 8);
|
|
215
|
+
|
|
216
|
+
// Get a specific ring by number
|
|
217
|
+
const r = fused.getRing(1);
|
|
218
|
+
|
|
219
|
+
// Substitute in a specific ring
|
|
220
|
+
const modified = fused.substituteInRing(1, 3, 'N');
|
|
221
|
+
|
|
222
|
+
// Attach to a specific ring
|
|
223
|
+
const decorated = fused.attachToRing(1, Linear(['O']), 4);
|
|
224
|
+
|
|
225
|
+
// Renumber rings
|
|
226
|
+
const renumbered = fused.renumber(10);
|
|
227
|
+
|
|
228
|
+
// Add sequential continuation rings
|
|
229
|
+
const withSeq = fused.addSequentialRings([ring3, ring4], {
|
|
230
|
+
atomAttachments: { 25: [Linear(['O'], ['='])] }
|
|
231
|
+
});
|
|
232
|
+
|
|
233
|
+
// Add attachment to a sequential atom position
|
|
234
|
+
const withAtt = fused.addSequentialAtomAttachment(25, Linear(['O']));
|
|
235
|
+
```
|
|
236
|
+
|
|
237
|
+
---
|
|
238
|
+
|
|
239
|
+
## Parsing & Serialization
|
|
240
|
+
|
|
241
|
+
```javascript
|
|
242
|
+
import { parse, tokenize, buildSMILES, decompile } from 'smiles-js';
|
|
243
|
+
|
|
244
|
+
// Parse SMILES to AST
|
|
245
|
+
const ast = parse('c1ccccc1');
|
|
246
|
+
|
|
247
|
+
// Tokenize SMILES into token stream
|
|
248
|
+
const tokens = tokenize('C(=O)O');
|
|
249
|
+
|
|
250
|
+
// Generate SMILES from AST
|
|
251
|
+
const smiles = buildSMILES(ast);
|
|
252
|
+
|
|
253
|
+
// Decompile AST to JavaScript constructor code
|
|
254
|
+
const code = decompile(ast);
|
|
255
|
+
|
|
256
|
+
// Every node has a .smiles getter
|
|
257
|
+
console.log(ast.smiles); // c1ccccc1
|
|
258
|
+
|
|
259
|
+
// Every node has a .toCode() method
|
|
260
|
+
console.log(ast.toCode());
|
|
261
|
+
// const ring1 = Ring({ atoms: 'c', size: 6 });
|
|
262
|
+
```
|
|
263
|
+
|
|
264
|
+
---
|
|
265
|
+
|
|
266
|
+
## Round-Trip Validation
|
|
267
|
+
|
|
268
|
+
Validate SMILES parsing fidelity with built-in round-trip testing:
|
|
269
|
+
|
|
270
|
+
```javascript
|
|
271
|
+
import {
|
|
272
|
+
validateRoundTrip,
|
|
273
|
+
isValidRoundTrip,
|
|
274
|
+
normalize,
|
|
275
|
+
parseWithValidation
|
|
276
|
+
} from 'smiles-js';
|
|
277
|
+
|
|
278
|
+
// Quick boolean check
|
|
279
|
+
if (isValidRoundTrip('c1ccccc1')) {
|
|
280
|
+
console.log('Perfect round-trip!');
|
|
281
|
+
}
|
|
282
|
+
|
|
283
|
+
// Detailed validation
|
|
284
|
+
const result = validateRoundTrip('COc1ccc2nc(S(=O)Cc3ncc(C)c(OC)c3C)[nH]c2c1');
|
|
285
|
+
console.log(result.status); // 'perfect', 'stabilized', or 'unstable'
|
|
286
|
+
|
|
287
|
+
if (result.stabilizes) {
|
|
288
|
+
console.log('Use normalized form:', result.firstRoundTrip);
|
|
289
|
+
}
|
|
290
|
+
|
|
291
|
+
// Automatic normalization
|
|
292
|
+
const normalized = normalize('COc1ccc2nc(S(=O)Cc3ncc(C)c(OC)c3C)[nH]c2c1');
|
|
293
|
+
|
|
294
|
+
// Parse with automatic warnings
|
|
295
|
+
const ast = parseWithValidation(smiles);
|
|
296
|
+
|
|
297
|
+
// Silent mode
|
|
298
|
+
const ast2 = parseWithValidation(smiles, { silent: true });
|
|
299
|
+
|
|
300
|
+
// Strict mode (throws on imperfect)
|
|
301
|
+
const ast3 = parseWithValidation(smiles, { strict: true });
|
|
302
|
+
```
|
|
303
|
+
|
|
304
|
+
**Round-Trip Validation Logic:**
|
|
305
|
+
1. **Perfect**: First round-trip matches exactly -- no action needed
|
|
306
|
+
2. **Stabilized**: Second round-trip stabilizes -- use `normalize()` to get stable form
|
|
307
|
+
3. **Unstable**: Doesn't stabilize after 2 round-trips -- file a bug report
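The three outcomes above can be sketched as a small classifier. This is a minimal sketch under stated assumptions: `roundTrip` is a caller-supplied function standing in for "parse, then serialize back to SMILES", and the returned field names (`first`, `second`) are illustrative rather than the library's actual result shape.

```javascript
// Hypothetical classifier for the perfect / stabilized / unstable logic.
// `roundTrip` is an assumed stand-in for parse-then-serialize.
function classifyRoundTrip(input, roundTrip) {
  const first = roundTrip(input);
  if (first === input) {
    return { status: 'perfect', first };
  }
  const second = roundTrip(first);
  if (second === first) {
    // The output changed once, then stopped changing: `first` is the stable form.
    return { status: 'stabilized', first, second };
  }
  return { status: 'unstable', first, second };
}

// Toy round-trip functions, used only to exercise each branch.
const perfectCase = classifyRoundTrip('c1ccccc1', (s) => s);
const stabilizedCase = classifyRoundTrip(' CCO ', (s) => s.trim());
const unstableCase = classifyRoundTrip('C', (s) => s + 'C');

console.log(perfectCase.status); // perfect
console.log(stabilizedCase.status); // stabilized
console.log(unstableCase.status); // unstable
```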
|
|
308
|
+
|
|
309
|
+
---
|
|
310
|
+
|
|
311
|
+
## AST Inspection
|
|
312
|
+
|
|
313
|
+
```javascript
|
|
314
|
+
import { parse, ASTNodeType } from 'smiles-js';
|
|
315
|
+
|
|
316
|
+
const mol = parse('c1ccccc1');
|
|
317
|
+
|
|
318
|
+
console.log(mol.type); // 'ring'
|
|
319
|
+
console.log(mol.atoms); // 'c'
|
|
320
|
+
console.log(mol.size); // 6
|
|
321
|
+
console.log(mol.substitutions); // {}
|
|
322
|
+
console.log(mol.attachments); // {}
|
|
323
|
+
```
|
|
324
|
+
|
|
325
|
+
### Node Types (`ASTNodeType`)
|
|
326
|
+
|
|
327
|
+
| Type | Description |
|
|
328
|
+
|------|-------------|
|
|
329
|
+
| `'ring'` | Ring structure |
|
|
330
|
+
| `'linear'` | Linear chain |
|
|
331
|
+
| `'fused_ring'` | Fused ring system |
|
|
332
|
+
| `'molecule'` | Multi-component molecule |
|
|
333
|
+
|
|
334
|
+
---
|
|
335
|
+
|
|
336
|
+
## Functional API
|
|
337
|
+
|
|
338
|
+
For a more functional programming style, import manipulation functions directly:
|
|
339
|
+
|
|
340
|
+
```javascript
|
|
341
|
+
import {
|
|
342
|
+
ringAttach,
|
|
343
|
+
ringSubstitute,
|
|
344
|
+
ringSubstituteMultiple,
|
|
345
|
+
ringFuse,
|
|
346
|
+
ringClone,
|
|
347
|
+
linearAttach,
|
|
348
|
+
linearConcat,
|
|
349
|
+
linearBranch,
|
|
350
|
+
linearBranchAt,
|
|
351
|
+
fusedRingAddRing,
|
|
352
|
+
fusedRingGetRing,
|
|
353
|
+
fusedRingSubstituteInRing,
|
|
354
|
+
fusedRingAttachToRing,
|
|
355
|
+
fusedRingRenumber,
|
|
356
|
+
fusedRingAddSequentialRings,
|
|
357
|
+
fusedRingAddSequentialAtomAttachment,
|
|
358
|
+
moleculeAppend,
|
|
359
|
+
moleculePrepend,
|
|
360
|
+
moleculeConcat,
|
|
361
|
+
moleculeGetComponent,
|
|
362
|
+
moleculeReplaceComponent,
|
|
363
|
+
} from 'smiles-js/manipulation';
|
|
364
|
+
|
|
365
|
+
const benzene = Ring({ atoms: 'c', size: 6 });
|
|
366
|
+
const toluene = ringAttach(benzene, Linear(['C']), 1);
|
|
367
|
+
```
|
|
368
|
+
|
|
369
|
+
---
|
|
370
|
+
|
|
371
|
+
## Clone Utilities
|
|
372
|
+
|
|
373
|
+
Deep-clone AST nodes for safe modification:
|
|
374
|
+
|
|
375
|
+
```javascript
|
|
376
|
+
import {
|
|
377
|
+
deepCloneRing,
|
|
378
|
+
deepCloneLinear,
|
|
379
|
+
deepCloneFusedRing,
|
|
380
|
+
deepCloneMolecule,
|
|
381
|
+
cloneAttachments,
|
|
382
|
+
cloneSubstitutions,
|
|
383
|
+
cloneComponents,
|
|
384
|
+
} from 'smiles-js';
|
|
385
|
+
```
|
|
386
|
+
|
|
387
|
+
---
|
|
388
|
+
|
|
389
|
+
## Integration with RDKit
|
|
390
|
+
|
|
391
|
+
```javascript
|
|
392
|
+
import { parse, Ring, Linear } from 'smiles-js';
|
|
393
|
+
import RDKit from '@rdkit/rdkit';
|
|
394
|
+
|
|
395
|
+
// Build molecule programmatically
|
|
396
|
+
const benzene = Ring({ atoms: 'c', size: 6 });
|
|
397
|
+
const methyl = Linear(['C']);
|
|
398
|
+
const toluene = benzene.attach(methyl, 1);
|
|
399
|
+
|
|
400
|
+
// Use with RDKit
|
|
401
|
+
const rdkit = await RDKit.load();
|
|
402
|
+
const mol = rdkit.get_mol(toluene.smiles);
|
|
403
|
+
console.log(mol.get_svg());
|
|
404
|
+
```
|
|
405
|
+
|
|
406
|
+
---
|
|
407
|
+
|
|
408
|
+
## Validation Results
|
|
409
|
+
|
|
410
|
+
The library has been validated with **32+ real-world pharmaceutical molecules**:
|
|
411
|
+
|
|
412
|
+
| Category | Molecules Tested | Status |
|
|
413
|
+
|----------|-----------------|--------|
|
|
414
|
+
| **Steroids** | Cortisone, Hydrocortisone, Prednisone, Dexamethasone | Perfect |
|
|
415
|
+
| **Opioids** | Fentanyl, Tramadol, Morphine, Oxycodone, Hydrocodone | Perfect |
|
|
416
|
+
| **NSAIDs** | Ibuprofen, Naproxen, Celecoxib, Meloxicam, Ketoprofen | Perfect |
|
|
417
|
+
| **Statins** | Atorvastatin (Lipitor) | Perfect |
|
|
418
|
+
| **PDE5 Inhibitors** | Sildenafil (Viagra) | Perfect |
|
|
419
|
+
| **HIV Protease Inhibitors** | Ritonavir (Norvir) | Works* |
|
|
420
|
+
| **Proton Pump Inhibitors** | Esomeprazole (Nexium), Omeprazole | Works* |
|
|
421
|
+
| **Cannabinoids** | THC, CBD, Nabilone | Perfect |
|
|
422
|
+
|
|
423
|
+
*Minor notation differences that don't affect structure
|
|
424
|
+
|
|
425
|
+
---
|
|
426
|
+
|
|
427
|
+
## Known Limitations
|
|
428
|
+
|
|
429
|
+
### Minor Round-Trip Issues
|
|
430
|
+
|
|
431
|
+
Some complex molecules may have minor notation differences during round-trip:
|
|
432
|
+
|
|
433
|
+
1. **Terminal substituents** - May be omitted in certain edge cases
|
|
434
|
+
2. **Bond notation in branches** - `C(N)=O` may serialize as `C(N)O`
|
|
435
|
+
|
|
436
|
+
**Impact**: Low - Structure is preserved, only notation differs.
|
|
437
|
+
|
|
438
|
+
### toCode() Limitation
|
|
439
|
+
|
|
440
|
+
The `.toCode()` method has a limitation with certain sequential continuation patterns in very complex nested structures. This does NOT affect:
|
|
441
|
+
- Parsing SMILES -> AST
|
|
442
|
+
- Serializing AST -> SMILES
|
|
443
|
+
- Round-trip fidelity
|
package/IMPLEMENTATION_STATUS.md
CHANGED
|
@@ -1,22 +1,41 @@
|
|
|
1
1
|
# Implementation Status
|
|
2
2
|
|
|
3
|
-
##
|
|
4
|
-
|
|
5
|
-
-
|
|
6
|
-
-
|
|
3
|
+
## 1618 TESTS. 0 FAILURES. ALL CODEGEN ROUND-TRIPS PASSING.
|
|
4
|
+
|
|
5
|
+
**Date:** 2026-02-06
|
|
6
|
+
**Branch:** sneaky-bugs
|
|
7
|
+
**Result:** 1618 tests across 43 files in ~379ms
|
|
8
|
+
|
|
9
|
+
Every molecule parses, serializes, decompiles to JavaScript, and round-trips back to the exact same SMILES string. No exceptions.
|
|
7
10
|
|
|
8
11
|
## Supported Molecule Classes
|
|
9
12
|
|
|
10
13
|
| Category | Examples | Status |
|
|
11
14
|
|----------|----------|--------|
|
|
12
|
-
| NSAIDs | Ibuprofen, Naproxen, Celecoxib, Meloxicam, Piroxicam | ✅ |
|
|
15
|
+
| NSAIDs | Ibuprofen, Naproxen, Celecoxib, Meloxicam, Piroxicam, Etodolac | ✅ |
|
|
13
16
|
| Opioids | Fentanyl, Tramadol, Morphine, Codeine, Oxycodone | ✅ |
|
|
14
17
|
| Steroids | Cortisone, Hydrocortisone, Prednisone, Dexamethasone | ✅ |
|
|
15
18
|
| Cannabinoids | THC, CBD, Nabilone | ✅ |
|
|
16
19
|
| Hypertension | Losartan, Valsartan, Telmisartan | ✅ |
|
|
20
|
+
| Cholesterol | Fluvastatin, Ezetimibe, Fenofibrate | ✅ |
|
|
17
21
|
| Analgesics | Acetaminophen, Phenacetin, Gabapentin, Pregabalin | ✅ |
|
|
18
22
|
| Endocannabinoids | Anandamide, 2-AG | ✅ |
|
|
19
23
|
|
|
24
|
+
## Recent Fixes (sneaky-bugs branch)
|
|
25
|
+
|
|
26
|
+
1. **Fluvastatin directional bonds** - `/C=C/` stereochemistry was lost or double-emitted through the parser/codegen/decompiler pipeline. Fixed `metaLeadingBond` handling across three modules.
|
|
27
|
+
2. **Ezetimibe ring attachments** - `(O)` and `(F)` attachments on base and sequential rings were silently dropped by `decompileComplexFusedRing`. Extracted `generateAttachmentCode` helper.
|
|
28
|
+
3. **Etodolac sequential atom attachments** - `metaSeqAtomAttachments` always serialized as empty maps, losing branches like `(=O)`. Fixed serialization and variable ordering.
|
|
29
|
+
|
|
30
|
+
## Divide-and-Conquer Test Files
|
|
31
|
+
|
|
32
|
+
- `test-integration/telmisartan.test.js`
|
|
33
|
+
- `test-integration/fluvastatin.test.js` - 67 tests
|
|
34
|
+
- `test-integration/ezetimibe.test.js` - 56 tests
|
|
35
|
+
- `test-integration/etodolac.test.js` - 44 tests
|
|
36
|
+
|
|
37
|
+
Each file builds up from simple fragments to the full molecule, testing both AST and codegen round-trips at every step.
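The build-up strategy can be pictured as checking a list of progressively larger fragments and stopping at the first one that fails to round-trip. The sketch below is hypothetical: `firstFailingFragment` and the deliberately buggy `buggyRoundTrip` are invented for illustration and are not taken from the test files.

```javascript
// Hypothetical helper: return the first fragment whose round-trip output
// differs from its input, or null if every fragment survives.
function firstFailingFragment(fragments, roundTrip) {
  for (const smiles of fragments) {
    if (roundTrip(smiles) !== smiles) return smiles;
  }
  return null;
}

// Toy round-trip that silently drops a `(=O)` branch, mimicking a lost-attachment bug.
const buggyRoundTrip = (smiles) => smiles.replace('(=O)', '');

// Fragments ordered from simple to complex.
const fragments = ['CC', 'CCO', 'CC(=O)O', 'CCC(=O)OC'];
const culprit = firstFailingFragment(fragments, buggyRoundTrip);

console.log(culprit); // CC(=O)O
```

Ordering fragments from simple to complex means the first failure is also the smallest reproduction, which narrows down where a bug enters the pipeline.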
|
|
38
|
+
|
|
20
39
|
## API Completeness
|
|
21
40
|
|
|
22
41
|
**Constructors**: Ring, Linear, FusedRing, Molecule, Fragment
|
|
@@ -31,33 +50,14 @@
|
|
|
31
50
|
|
|
32
51
|
**Parsing**: tokenize, parse, buildSMILES, decompile, .smiles getters, .toCode()
|
|
33
52
|
|
|
34
|
-
## Implementation Checkpoints: 21/21
|
|
53
|
+
## Implementation Checkpoints: 21/21
|
|
35
54
|
|
|
36
55
|
All planned features implemented:
|
|
37
56
|
- Foundation (AST types, constructors)
|
|
38
57
|
- Ring/Linear/FusedRing/Molecule constructors and manipulation
|
|
39
58
|
- SMILES tokenizer and parser
|
|
40
|
-
- Code generator (AST
|
|
41
|
-
- Decompiler (AST
|
|
59
|
+
- Code generator (AST to SMILES)
|
|
60
|
+
- Decompiler (AST to JavaScript)
|
|
42
61
|
- Fragment integration
|
|
43
62
|
- Round-trip validation
|
|
44
63
|
- Documentation and examples
|
|
45
|
-
|
|
46
|
-
## Known Limitation
|
|
47
|
-
|
|
48
|
-
**toCode() for complex nested structures**: The decompiler cannot generate JavaScript code for certain sequential continuation patterns. This does NOT affect:
|
|
49
|
-
- ✅ Parsing SMILES → AST
|
|
50
|
-
- ✅ Serializing AST → SMILES
|
|
51
|
-
- ✅ Round-trip fidelity
|
|
52
|
-
|
|
53
|
-
## Key Fixes Implemented
|
|
54
|
-
|
|
55
|
-
1. Double bonds in rings preserved
|
|
56
|
-
2. Rings inside branches handled correctly
|
|
57
|
-
3. Ring closures at different branch depths
|
|
58
|
-
4. Steroid polycyclic structures (shared atoms)
|
|
59
|
-
5. Sequential continuation rings (celecoxib pattern)
|
|
60
|
-
6. Deeply nested branches with multiple rings
|
|
61
|
-
7. Bracket atom serialization (`[nH]`)
|
|
62
|
-
8. Linear chain double bond positions
|
|
63
|
-
9. Bridge ring detection for morphinans
|