debase 0.1.9__py3-none-any.whl → 0.1.16__py3-none-any.whl

This diff shows the contents of publicly available package versions released to one of the supported registries. It is provided for informational purposes only and reflects the changes between package versions as they appear in their respective public registries.
@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: debase
- Version: 0.1.9
+ Version: 0.1.16
  Summary: Enzyme lineage analysis and sequence extraction package
  Home-page: https://github.com/YuemingLong/DEBase
  Author: DEBase Team
@@ -0,0 +1,16 @@
+ debase/__init__.py,sha256=YeKveGj_8fwuu5ozoK2mUU86so_FjiCwsvg1d_lYVZU,586
+ debase/__main__.py,sha256=LbxYt2x9TG5Ced7LpzzX_8gkWyXeZSlVHzqHfqAiPwQ,160
+ debase/_version.py,sha256=l25FRqoNjxB5d3qBHsLMMA_9YWsIZ7nJ5BiTLj0qYE8,50
+ debase/build_db.py,sha256=bW574GxsL1BJtDwM19urLbciPcejLzfraXZPpzm09FQ,7167
+ debase/cleanup_sequence.py,sha256=QyhUqvTBVFTGM7ebAHmP3tif3Jq-8hvoLApYwAJtpH4,32702
+ debase/enzyme_lineage_extractor.py,sha256=jNxNCh8VF0dUFxUlTall0w1-oQojXRXLnWcuPFs5ij8,106879
+ debase/lineage_format.py,sha256=mACni9M1RXA_1tIyDZJpStQoutd_HLG2qQMAORTusZs,30045
+ debase/reaction_info_extractor.py,sha256=9DkEZh7TgsxKpFkKbLyUhS_w0Z84LczkDFv-v_NEHE4,112174
+ debase/substrate_scope_extractor.py,sha256=9XDF-DxOqB63AwaVceAMvg7BcjoTQXE_pG2c_seM_DA,100698
+ debase/wrapper.py,sha256=lTx375a57EVuXcZ_roXaj5UDj8HjRcb5ViNaSgPN4Ik,10352
+ debase-0.1.16.dist-info/licenses/LICENSE,sha256=5sk9_tcNmr1r2iMIUAiioBo7wo38u8BrPlO7f0seqgE,1075
+ debase-0.1.16.dist-info/METADATA,sha256=7sv2OcIuHaoOImkBdoEtRzyOjp9Kuoz2ZmgK4tosaUc,10790
+ debase-0.1.16.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
+ debase-0.1.16.dist-info/entry_points.txt,sha256=hUcxA1b4xORu-HHBFTe9u2KTdbxPzt0dwz95_6JNe9M,48
+ debase-0.1.16.dist-info/top_level.txt,sha256=2BUeq-4kmQr0Rhl06AnRzmmZNs8WzBRK9OcJehkcdk8,7
+ debase-0.1.16.dist-info/RECORD,,
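Each RECORD entry above has the form `path,sha256=<digest>,<size>`, where the digest is the URL-safe base64 encoding of the file's SHA-256 hash with the trailing `=` padding removed. The following is a minimal sketch of how such an entry could be parsed and how a digest in that encoding could be recomputed; the helper names `record_digest` and `parse_record_line` are illustrative assumptions and are not part of the debase package.

```python
import base64
import csv
import hashlib
from typing import Optional, Tuple


def record_digest(data: bytes) -> str:
    """Return the RECORD-style digest of `data`.

    This is the URL-safe base64 form of the SHA-256 hash with "=" padding stripped.
    """
    raw = hashlib.sha256(data).digest()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def parse_record_line(line: str) -> Tuple[str, str, Optional[int]]:
    """Split one RECORD line into (path, digest, size).

    The entry for RECORD itself has empty hash and size fields,
    so the digest may be "" and the size may be None.
    """
    path, hash_field, size_field = next(csv.reader([line]))
    digest = hash_field.partition("=")[2]  # drop the "sha256=" prefix
    size = int(size_field) if size_field else None
    return path, digest, size


if __name__ == "__main__":
    entry = "debase/_version.py,sha256=l25FRqoNjxB5d3qBHsLMMA_9YWsIZ7nJ5BiTLj0qYE8,50"
    print(parse_record_line(entry))
```

To check a file extracted from the wheel, one would compare `record_digest(path.read_bytes())` with the digest parsed from its RECORD line.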
debase/PIPELINE_FLOW.md DELETED
@@ -1,100 +0,0 @@
- # DEBase Pipeline Flow
-
- ## Overview
- The DEBase pipeline extracts enzyme engineering data from chemistry papers through a series of modular steps.
-
- ## Pipeline Architecture
-
- ```
- ┌─────────────────────┐ ┌─────────────────────┐
- │ Manuscript PDF │ │ SI PDF │
- └──────────┬──────────┘ └──────────┬──────────┘
- │ │
- └───────────┬───────────────┘
-
-
- ┌─────────────────────────────┐
- │ 1. enzyme_lineage_extractor │
- │ - Extract enzyme variants │
- │ - Parse mutations │
- │ - Get basic metadata │
- └─────────────┬───────────────┘
-
-
- ┌─────────────────────────────┐
- │ 2. cleanup_sequence │
- │ - Validate sequences │
- │ - Fix formatting issues │
- │ - Generate full sequences │
- └─────────────┬───────────────┘
-
- ┌───────────┴───────────────┐
- │ │
- ▼ ▼
- ┌─────────────────────────┐ ┌─────────────────────────┐
- │ 3a. reaction_info │ │ 3b. substrate_scope │
- │ _extractor │ │ _extractor │
- │ - Performance metrics │ │ - Substrate variations │
- │ - Model reaction │ │ - Additional variants │
- │ - Conditions │ │ - Scope data │
- └───────────┬─────────────┘ └───────────┬─────────────┘
- │ │
- └───────────┬───────────────┘
-
-
- ┌─────────────────────────────┐
- │ 4. lineage_format_o3 │
- │ - Merge all data │
- │ - Fill missing sequences │
- │ - Format final output │
- └─────────────┬───────────────┘
-
-
- ┌─────────────┐
- │ Final CSV │
- └─────────────┘
- ```
-
- ## Module Details
-
- ### 1. enzyme_lineage_extractor.py
- - **Input**: Manuscript PDF, SI PDF
- - **Output**: CSV with enzyme variants and mutations
- - **Function**: Extracts enzyme identifiers, mutation lists, and basic metadata
-
- ### 2. cleanup_sequence.py
- - **Input**: Enzyme lineage CSV
- - **Output**: CSV with validated sequences
- - **Function**: Validates protein sequences, generates full sequences from mutations
-
- ### 3a. reaction_info_extractor.py
- - **Input**: PDFs + cleaned enzyme CSV
- - **Output**: CSV with reaction performance data
- - **Function**: Extracts yield, TTN, selectivity, reaction conditions
-
- ### 3b. substrate_scope_extractor.py
- - **Input**: PDFs + cleaned enzyme CSV
- - **Output**: CSV with substrate scope entries
- - **Function**: Extracts substrate variations tested with different enzymes
-
- ### 4. lineage_format_o3.py
- - **Input**: Reaction CSV + Substrate scope CSV
- - **Output**: Final formatted CSV
- - **Function**: Merges data, fills missing sequences, applies consistent formatting
-
- ## Key Features
-
- 1. **Modular Design**: Each step can be run independently
- 2. **Parallel Extraction**: Steps 3a and 3b run independently
- 3. **Error Recovery**: Pipeline can resume from any step
- 4. **Clean Interfaces**: Each module has well-defined inputs/outputs
-
- ## Usage
-
- ```bash
- # Full pipeline
- python -m debase.wrapper_clean manuscript.pdf --si si.pdf --output results.csv
-
- # With intermediate files kept for debugging
- python -m debase.wrapper_clean manuscript.pdf --si si.pdf --keep-intermediates
- ```
@@ -1,17 +0,0 @@
- debase/PIPELINE_FLOW.md,sha256=S4nQyZlX39-Bchw1gQWPK60sHiFpB1eWHqo5GR9oTY8,4741
- debase/__init__.py,sha256=YeKveGj_8fwuu5ozoK2mUU86so_FjiCwsvg1d_lYVZU,586
- debase/__main__.py,sha256=LbxYt2x9TG5Ced7LpzzX_8gkWyXeZSlVHzqHfqAiPwQ,160
- debase/_version.py,sha256=j8HFA4JhFiintNU67gau8Re8N3rsxPqodcW8xAgdwqY,49
- debase/build_db.py,sha256=bW574GxsL1BJtDwM19urLbciPcejLzfraXZPpzm09FQ,7167
- debase/cleanup_sequence.py,sha256=QyhUqvTBVFTGM7ebAHmP3tif3Jq-8hvoLApYwAJtpH4,32702
- debase/enzyme_lineage_extractor.py,sha256=mHKo6cxQdcAFuthQTpxc4fsGH73JO3VuLXSsixA7mOA,97421
- debase/lineage_format.py,sha256=mACni9M1RXA_1tIyDZJpStQoutd_HLG2qQMAORTusZs,30045
- debase/reaction_info_extractor.py,sha256=6wWj4IyUNSugNjxpwMGjABSAp68yHABaz_7ZRjh9GEk,112162
- debase/substrate_scope_extractor.py,sha256=dbve8q3K7ggA3A6EwB-KK9L19BnMNgPZMZ05G937dSY,82262
- debase/wrapper.py,sha256=lTx375a57EVuXcZ_roXaj5UDj8HjRcb5ViNaSgPN4Ik,10352
- debase-0.1.9.dist-info/licenses/LICENSE,sha256=5sk9_tcNmr1r2iMIUAiioBo7wo38u8BrPlO7f0seqgE,1075
- debase-0.1.9.dist-info/METADATA,sha256=QVjGEYd1VWDmQszko8IQ5jJ9xMQoT45SCY_oG9XvbMs,10789
- debase-0.1.9.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
- debase-0.1.9.dist-info/entry_points.txt,sha256=hUcxA1b4xORu-HHBFTe9u2KTdbxPzt0dwz95_6JNe9M,48
- debase-0.1.9.dist-info/top_level.txt,sha256=2BUeq-4kmQr0Rhl06AnRzmmZNs8WzBRK9OcJehkcdk8,7
- debase-0.1.9.dist-info/RECORD,,
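
The two RECORD listings in this diff can be compared mechanically to see which files were added, removed, or changed between versions. The sketch below is one way to do that, assuming each listing is available as plain text. The helper names `load_record` and `compare_records` are illustrative and not part of the debase package, and the sample data is a three-line excerpt of the entries shown above rather than the full listings.

```python
import csv
from typing import Dict, List, Tuple


def load_record(text: str) -> Dict[str, str]:
    """Map each path in a RECORD listing to its hash field (e.g. "sha256=...")."""
    entries = {}
    for row in csv.reader(text.splitlines()):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        path, hash_field, _size = row
        entries[path] = hash_field
    return entries


def compare_records(
    old: Dict[str, str], new: Dict[str, str]
) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, removed, changed) paths between two RECORD mappings."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return added, removed, changed


# Excerpt of the 0.1.9 and 0.1.16 RECORD entries from the diff above.
OLD_RECORD = "\n".join([
    "debase/PIPELINE_FLOW.md,sha256=S4nQyZlX39-Bchw1gQWPK60sHiFpB1eWHqo5GR9oTY8,4741",
    "debase/_version.py,sha256=j8HFA4JhFiintNU67gau8Re8N3rsxPqodcW8xAgdwqY,49",
    "debase/wrapper.py,sha256=lTx375a57EVuXcZ_roXaj5UDj8HjRcb5ViNaSgPN4Ik,10352",
])
NEW_RECORD = "\n".join([
    "debase/_version.py,sha256=l25FRqoNjxB5d3qBHsLMMA_9YWsIZ7nJ5BiTLj0qYE8,50",
    "debase/wrapper.py,sha256=lTx375a57EVuXcZ_roXaj5UDj8HjRcb5ViNaSgPN4Ik,10352",
])

added, removed, changed = compare_records(load_record(OLD_RECORD), load_record(NEW_RECORD))
print("added:  ", added)
print("removed:", removed)
print("changed:", changed)
```

On the full listings, the `*.dist-info/` paths embed the version number, so every such file would show up as one removed and one added entry even when its hash is identical.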