debase-0.1.11-py3-none-any.whl → debase-0.1.16-py3-none-any.whl
- debase/_version.py +1 -1
- debase/enzyme_lineage_extractor.py +373 -222
- debase/reaction_info_extractor.py +3 -3
- debase/substrate_scope_extractor.py +516 -67
- {debase-0.1.11.dist-info → debase-0.1.16.dist-info}/METADATA +1 -1
- debase-0.1.16.dist-info/RECORD +16 -0
- debase/PIPELINE_FLOW.md +0 -100
- debase-0.1.11.dist-info/RECORD +0 -17
- {debase-0.1.11.dist-info → debase-0.1.16.dist-info}/WHEEL +0 -0
- {debase-0.1.11.dist-info → debase-0.1.16.dist-info}/entry_points.txt +0 -0
- {debase-0.1.11.dist-info → debase-0.1.16.dist-info}/licenses/LICENSE +0 -0
- {debase-0.1.11.dist-info → debase-0.1.16.dist-info}/top_level.txt +0 -0
debase-0.1.16.dist-info/RECORD
ADDED
@@ -0,0 +1,16 @@
+debase/__init__.py,sha256=YeKveGj_8fwuu5ozoK2mUU86so_FjiCwsvg1d_lYVZU,586
+debase/__main__.py,sha256=LbxYt2x9TG5Ced7LpzzX_8gkWyXeZSlVHzqHfqAiPwQ,160
+debase/_version.py,sha256=l25FRqoNjxB5d3qBHsLMMA_9YWsIZ7nJ5BiTLj0qYE8,50
+debase/build_db.py,sha256=bW574GxsL1BJtDwM19urLbciPcejLzfraXZPpzm09FQ,7167
+debase/cleanup_sequence.py,sha256=QyhUqvTBVFTGM7ebAHmP3tif3Jq-8hvoLApYwAJtpH4,32702
+debase/enzyme_lineage_extractor.py,sha256=jNxNCh8VF0dUFxUlTall0w1-oQojXRXLnWcuPFs5ij8,106879
+debase/lineage_format.py,sha256=mACni9M1RXA_1tIyDZJpStQoutd_HLG2qQMAORTusZs,30045
+debase/reaction_info_extractor.py,sha256=9DkEZh7TgsxKpFkKbLyUhS_w0Z84LczkDFv-v_NEHE4,112174
+debase/substrate_scope_extractor.py,sha256=9XDF-DxOqB63AwaVceAMvg7BcjoTQXE_pG2c_seM_DA,100698
+debase/wrapper.py,sha256=lTx375a57EVuXcZ_roXaj5UDj8HjRcb5ViNaSgPN4Ik,10352
+debase-0.1.16.dist-info/licenses/LICENSE,sha256=5sk9_tcNmr1r2iMIUAiioBo7wo38u8BrPlO7f0seqgE,1075
+debase-0.1.16.dist-info/METADATA,sha256=7sv2OcIuHaoOImkBdoEtRzyOjp9Kuoz2ZmgK4tosaUc,10790
+debase-0.1.16.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
+debase-0.1.16.dist-info/entry_points.txt,sha256=hUcxA1b4xORu-HHBFTe9u2KTdbxPzt0dwz95_6JNe9M,48
+debase-0.1.16.dist-info/top_level.txt,sha256=2BUeq-4kmQr0Rhl06AnRzmmZNs8WzBRK9OcJehkcdk8,7
+debase-0.1.16.dist-info/RECORD,,
debase/PIPELINE_FLOW.md
DELETED
@@ -1,100 +0,0 @@
-# DEBase Pipeline Flow
-
-## Overview
-The DEBase pipeline extracts enzyme engineering data from chemistry papers through a series of modular steps.
-
-## Pipeline Architecture
-
-```
- ┌─────────────────────┐     ┌─────────────────────┐
- │   Manuscript PDF    │     │       SI PDF        │
- └──────────┬──────────┘     └──────────┬──────────┘
-            │                           │
-            └───────────┬───────────────┘
-                        │
-                        ▼
-          ┌─────────────────────────────┐
-          │ 1. enzyme_lineage_extractor │
-          │  - Extract enzyme variants  │
-          │  - Parse mutations          │
-          │  - Get basic metadata       │
-          └─────────────┬───────────────┘
-                        │
-                        ▼
-          ┌─────────────────────────────┐
-          │ 2. cleanup_sequence         │
-          │  - Validate sequences       │
-          │  - Fix formatting issues    │
-          │  - Generate full sequences  │
-          └─────────────┬───────────────┘
-                        │
-            ┌───────────┴───────────────┐
-            │                           │
-            ▼                           ▼
-┌─────────────────────────┐ ┌─────────────────────────┐
-│ 3a. reaction_info       │ │ 3b. substrate_scope     │
-│     _extractor          │ │     _extractor          │
-│  - Performance metrics  │ │  - Substrate variations │
-│  - Model reaction       │ │  - Additional variants  │
-│  - Conditions           │ │  - Scope data           │
-└───────────┬─────────────┘ └───────────┬─────────────┘
-            │                           │
-            └───────────┬───────────────┘
-                        │
-                        ▼
-          ┌─────────────────────────────┐
-          │ 4. lineage_format_o3        │
-          │  - Merge all data           │
-          │  - Fill missing sequences   │
-          │  - Format final output      │
-          └─────────────┬───────────────┘
-                        │
-                        ▼
-                 ┌─────────────┐
-                 │  Final CSV  │
-                 └─────────────┘
-```
-
-## Module Details
-
-### 1. enzyme_lineage_extractor.py
-- **Input**: Manuscript PDF, SI PDF
-- **Output**: CSV with enzyme variants and mutations
-- **Function**: Extracts enzyme identifiers, mutation lists, and basic metadata
-
-### 2. cleanup_sequence.py
-- **Input**: Enzyme lineage CSV
-- **Output**: CSV with validated sequences
-- **Function**: Validates protein sequences, generates full sequences from mutations
-
-### 3a. reaction_info_extractor.py
-- **Input**: PDFs + cleaned enzyme CSV
-- **Output**: CSV with reaction performance data
-- **Function**: Extracts yield, TTN, selectivity, reaction conditions
-
-### 3b. substrate_scope_extractor.py
-- **Input**: PDFs + cleaned enzyme CSV
-- **Output**: CSV with substrate scope entries
-- **Function**: Extracts substrate variations tested with different enzymes
-
-### 4. lineage_format_o3.py
-- **Input**: Reaction CSV + Substrate scope CSV
-- **Output**: Final formatted CSV
-- **Function**: Merges data, fills missing sequences, applies consistent formatting
-
-## Key Features
-
-1. **Modular Design**: Each step can be run independently
-2. **Parallel Extraction**: Steps 3a and 3b run independently
-3. **Error Recovery**: Pipeline can resume from any step
-4. **Clean Interfaces**: Each module has well-defined inputs/outputs
-
-## Usage
-
-```bash
-# Full pipeline
-python -m debase.wrapper_clean manuscript.pdf --si si.pdf --output results.csv
-
-# With intermediate files kept for debugging
-python -m debase.wrapper_clean manuscript.pdf --si si.pdf --keep-intermediates
-```
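The removed PIPELINE_FLOW.md describes a four-stage flow in which steps 3a and 3b both consume the cleaned CSV and feed a final merge. A minimal Python sketch of that ordering follows. Every function name and return value here is a hypothetical stand-in, since this diff does not show the modules' real entry points, and the thread pool is only one assumed way to realize "run independently".

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the four stages. The real modules' function
# names and signatures are not shown in this diff.
def extract_lineage(manuscript, si):
    return f"lineage({manuscript},{si})"

def cleanup_sequences(lineage_csv):
    return f"cleaned({lineage_csv})"

def extract_reaction_info(manuscript, si, cleaned_csv):
    return f"reactions({cleaned_csv})"

def extract_substrate_scope(manuscript, si, cleaned_csv):
    return f"scope({cleaned_csv})"

def format_lineage(reaction_csv, scope_csv):
    return f"final({reaction_csv},{scope_csv})"

def run_pipeline(manuscript, si):
    lineage = extract_lineage(manuscript, si)   # step 1
    cleaned = cleanup_sequences(lineage)        # step 2
    # Steps 3a and 3b depend only on the PDFs and the cleaned CSV,
    # so they can be submitted together and joined afterwards.
    with ThreadPoolExecutor(max_workers=2) as pool:
        reactions = pool.submit(extract_reaction_info, manuscript, si, cleaned)
        scope = pool.submit(extract_substrate_scope, manuscript, si, cleaned)
        # Step 4 merges both results once they are available.
        return format_lineage(reactions.result(), scope.result())

if __name__ == "__main__":
    print(run_pipeline("manuscript.pdf", "si.pdf"))
```

Because each stage takes the previous stage's output as a plain argument, any stage could be rerun on a saved intermediate file, which is the property the "Error Recovery" item in the removed document refers to.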
debase-0.1.11.dist-info/RECORD
DELETED
@@ -1,17 +0,0 @@
-debase/PIPELINE_FLOW.md,sha256=S4nQyZlX39-Bchw1gQWPK60sHiFpB1eWHqo5GR9oTY8,4741
-debase/__init__.py,sha256=YeKveGj_8fwuu5ozoK2mUU86so_FjiCwsvg1d_lYVZU,586
-debase/__main__.py,sha256=LbxYt2x9TG5Ced7LpzzX_8gkWyXeZSlVHzqHfqAiPwQ,160
-debase/_version.py,sha256=L4sqaU-oAJRWrcboH-vA95jHfUiXr5-fAsrF7lqZSyQ,50
-debase/build_db.py,sha256=bW574GxsL1BJtDwM19urLbciPcejLzfraXZPpzm09FQ,7167
-debase/cleanup_sequence.py,sha256=QyhUqvTBVFTGM7ebAHmP3tif3Jq-8hvoLApYwAJtpH4,32702
-debase/enzyme_lineage_extractor.py,sha256=at4OYHdXtgMku1FR_6AsHWk64UKInWkGQL9m3H6cKIQ,99809
-debase/lineage_format.py,sha256=mACni9M1RXA_1tIyDZJpStQoutd_HLG2qQMAORTusZs,30045
-debase/reaction_info_extractor.py,sha256=6wWj4IyUNSugNjxpwMGjABSAp68yHABaz_7ZRjh9GEk,112162
-debase/substrate_scope_extractor.py,sha256=dbve8q3K7ggA3A6EwB-KK9L19BnMNgPZMZ05G937dSY,82262
-debase/wrapper.py,sha256=lTx375a57EVuXcZ_roXaj5UDj8HjRcb5ViNaSgPN4Ik,10352
-debase-0.1.11.dist-info/licenses/LICENSE,sha256=5sk9_tcNmr1r2iMIUAiioBo7wo38u8BrPlO7f0seqgE,1075
-debase-0.1.11.dist-info/METADATA,sha256=ZSR0Yl36Al_rQm9Ht9jut7om3xQT8yqyobIjEUH_Xfo,10790
-debase-0.1.11.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
-debase-0.1.11.dist-info/entry_points.txt,sha256=hUcxA1b4xORu-HHBFTe9u2KTdbxPzt0dwz95_6JNe9M,48
-debase-0.1.11.dist-info/top_level.txt,sha256=2BUeq-4kmQr0Rhl06AnRzmmZNs8WzBRK9OcJehkcdk8,7
-debase-0.1.11.dist-info/RECORD,,
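{debase-0.1.11.dist-info → debase-0.1.16.dist-info}/WHEEL
File without changes
{debase-0.1.11.dist-info → debase-0.1.16.dist-info}/entry_points.txt
File without changes
{debase-0.1.11.dist-info → debase-0.1.16.dist-info}/licenses/LICENSE
File without changes
{debase-0.1.11.dist-info → debase-0.1.16.dist-info}/top_level.txt
File without changes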
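Each RECORD line is a CSV row of path, hash, and size, so the two RECORD listings in this diff can be compared mechanically to see which files changed. The sketch below is one way to do that in Python. The helper names (`record_digest`, `parse_record`, `changed_paths`) are illustrative and not part of debase or any packaging tool, and the sample data is a three-line excerpt of each RECORD rather than the full files.

```python
import base64
import csv
import hashlib
import io


def record_digest(data: bytes) -> str:
    """Format a SHA-256 digest the way wheel RECORD files store it."""
    digest = hashlib.sha256(data).digest()
    # RECORD uses URL-safe base64 with the trailing '=' padding stripped.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=")
    return "sha256=" + encoded.decode("ascii")


def parse_record(text: str) -> dict[str, tuple[str, str]]:
    """Map each path in a RECORD file to its (hash, size) pair."""
    entries = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) == 3:
            path, digest, size = row
            entries[path] = (digest, size)
    return entries


def changed_paths(old: dict, new: dict) -> list[str]:
    """Paths present in both RECORDs whose hash or size differs."""
    return sorted(p for p in old.keys() & new.keys() if old[p] != new[p])


# Three entries copied from each RECORD in this diff (abridged).
OLD_RECORD = """\
debase/_version.py,sha256=L4sqaU-oAJRWrcboH-vA95jHfUiXr5-fAsrF7lqZSyQ,50
debase/build_db.py,sha256=bW574GxsL1BJtDwM19urLbciPcejLzfraXZPpzm09FQ,7167
debase/substrate_scope_extractor.py,sha256=dbve8q3K7ggA3A6EwB-KK9L19BnMNgPZMZ05G937dSY,82262
"""
NEW_RECORD = """\
debase/_version.py,sha256=l25FRqoNjxB5d3qBHsLMMA_9YWsIZ7nJ5BiTLj0qYE8,50
debase/build_db.py,sha256=bW574GxsL1BJtDwM19urLbciPcejLzfraXZPpzm09FQ,7167
debase/substrate_scope_extractor.py,sha256=9XDF-DxOqB63AwaVceAMvg7BcjoTQXE_pG2c_seM_DA,100698
"""

if __name__ == "__main__":
    old, new = parse_record(OLD_RECORD), parse_record(NEW_RECORD)
    for path in changed_paths(old, new):
        print(path)
```

To check an installed file against its RECORD entry, `record_digest(path.read_bytes())` can be compared with the hash field from `parse_record`. The RECORD file's own row has empty hash and size fields, which is why it appears as `RECORD,,` in both listings.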