opencloning 0.2.8.2__py3-none-any.whl → 0.3.0__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,138 @@
# Bug Fixing

## backend_v0_3.py

The PR fixing these bugs is here: https://github.com/manulera/OpenCloning_backend/pull/305

### Bug in assemblies with locations spanning the origin

Before version 0.3, there was a bug affecting assembly fields that include locations spanning the origin. For example, take the following two circular sequences from [this test file](../../../tests/test_files/bug_fixing/digestion_spanning_origin.json):

```
ttcaaaagaa

ttcccccccgaa
```

In both of them, the EcoRI site `GAATTC` is split by the origin. The assembly field in the current format should be:

```json
{
  "assembly": [
    {
      "sequence": 2,
      "left_location": "join(9..10,1..2)",
      "right_location": "join(9..10,1..2)",
      "reverse_complemented": false
    },
    {
      "sequence": 4,
      "left_location": "join(11..12,1..2)",
      "right_location": "join(11..12,1..2)",
      "reverse_complemented": false
    }
  ],
  "restriction_enzymes": [
    "EcoRI"
  ]
}
```
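
Here the length of each `join` location is 4, corresponding to the `AATT` overhang left by EcoRI, which spans the origin. This can be checked with the same location helper used elsewhere in this repository (a minimal sketch, not part of the package):

```python
from opencloning.pydantic_models import SequenceLocationStr

# "join(9..10,1..2)" wraps around the origin of the 10 bp circular sequence
overlap = SequenceLocationStr("join(9..10,1..2)").to_biopython_location()
print(len(overlap))  # 4 -> the AATT sticky end produced by EcoRI
```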

However, the old code was not handling this use case correctly, and produced something like this (`left_location` and `right_location` span the entire sequence rather than the common part):

```json
{
  "assembly": [
    {
      "sequence": 2,
      "left_location": "1..10",
      "right_location": "1..10",
      "reverse_complemented": false
    },
    {
      "sequence": 4,
      "left_location": "1..12",
      "right_location": "1..12",
      "reverse_complemented": false
    }
  ],
  "restriction_enzymes": [
    "EcoRI"
  ]
}
```

These locations were then used in `generate_assemblies`, producing wrong assembly products.

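The `backend_v0_3.py` script below flags affected files by checking that the two locations of each assembly join have the same length; with the buggy output above they do not (a small sketch using the same location helper, not part of the package):

```python
from opencloning.pydantic_models import SequenceLocationStr

left = SequenceLocationStr("1..10").to_biopython_location()
right = SequenceLocationStr("1..12").to_biopython_location()
print(len(left), len(right))  # 10 12 -> mismatched lengths reveal the bug
```
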
### Bug in gateway assemblies (rare, but could happen)

`gateway_overlap` was returning the entire match of a regex like `twtGTACAAAaaa` (for attB1) as the overlap. That could create assemblies in which the overlapping parts have mismatches at the `w` positions (rare, but possible). Now, instead of returning the whole `twtGTACAAAaaa` match as the overlap, it returns only the constant core `GTACAAA`. See, for example, the [test file](../../../tests/test_files/bug_fixing/gateway_13bp_overlap.json).

Wrong (before fix):

```json
{
  "assembly": [
    {
      "sequence": 4,
      "left_location": "2893..2905", # < Length 13 (applies to all locations)
      "right_location": "649..661",
      "reverse_complemented": false
    },
    {
      "sequence": 8,
      "left_location": "10..22",
      "right_location": "3112..3124",
      "reverse_complemented": false
    }
  ],
  "reaction_type": "BP",
}
```

Right (after fix):

```json
{
  "assembly": [
    {
      "sequence": 4,
      "left_location": "2896..2902", # < Length 7 (common part, all locations)
      "right_location": "652..658",
      "reverse_complemented": false
    },
    {
      "sequence": 8,
      "left_location": "13..19",
      "right_location": "3115..3121",
      "reverse_complemented": false
    }
  ],
  "reaction_type": "BP",
}
```
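
To illustrate the change with a simplified attB1-like pattern (illustrative only; the real site definitions live in `opencloning/gateway.py`): the old code reported the whole 13 bp regex match as the overlap, while the fixed code reports only the constant 7 bp core, 3 bp into the match.

```python
import re

# Simplified attB1-like pattern, where w can be A or T (illustrative only)
pattern = re.compile("t[at]tgtacaaaaaa", re.IGNORECASE)
seq = "cctatgtacaaaaaacc"

m = pattern.search(seq)
old_overlap = m.group()                           # 13 bp, includes the ambiguous w position
new_overlap = seq[m.start() + 3:m.start() + 10]   # constant 7 bp core
print(old_overlap.upper(), new_overlap.upper())   # TATGTACAAAAAA GTACAAA
```
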
### Fixing these bugs

If you load a JSON file into the web application, the fix is applied automatically.

If you want to fix several files from the command line, you can use the `backend_v0_3.py` script as shown below.

Before running this script, you need to migrate the data to the latest version of the schema. See the [full documentation](https://github.com/OpenCloning/OpenCloning_LinkML?tab=readme-ov-file#migration-from-previous-versions-of-the-schema), but basically:

```bash
python -m opencloning.migrations.migrate file1.json file2.json ...
```

Then, you can run the script:

```bash
python -m opencloning.bug_fixing.backend_v0_3 file1.json file2.json ...
```

For each file:
* If the file does not need fixing, it will be skipped.
* If the file needs fixing, the script will create a new file with a `_needs_fixing.json` suffix (e.g. `file1_needs_fixing.json`) in the same folder as the original, with the problematic sources replaced by templates.
* You can then load these files into the web application and run the correct steps manually.

Unless you use Gateway cloning a lot, most files should not need fixing.
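
For reference, the same check can also be run programmatically (a minimal sketch of what the script does per file; `my_strategy.json` is a hypothetical file name, and the version bookkeeping done by the script is omitted):

```python
import json

from opencloning.bug_fixing.backend_v0_3 import fix_backend_v0_3

with open("my_strategy.json") as f:  # hypothetical input file
    data = json.load(f)

fixed = fix_backend_v0_3(data)  # returns None when the file does not need fixing
if fixed is not None:
    with open("my_strategy_needs_fixing.json", "w") as f:
        f.write(fixed.model_dump_json(indent=2, exclude_none=True))
```
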
@@ -0,0 +1,117 @@
"""
See info in README.md
"""

from ..pydantic_models import (
    BaseCloningStrategy as CloningStrategy,
    AssemblySource,
    TextFileSequence,
    PrimerModel,
    SequenceLocationStr,
)
from .._version import __version__
import json
import os
from packaging import version
import copy


def fix_backend_v0_3(input_data: dict) -> CloningStrategy | None:

    data = copy.deepcopy(input_data)
    # Make sure that it is a valid CloningStrategy
    cs = CloningStrategy.model_validate(data)

    # First fix gateway assemblies
    problematic_source_ids = set()

    for source in data['sources']:
        if source['type'] == 'GatewaySource':
            # Take the first assembly fragment and check that its location has length 7 (the constant core)
            assembly = source['assembly']
            if len(assembly):
                feat2check = (
                    assembly[0]['left_location']
                    if assembly[0]['left_location'] is not None
                    else assembly[0]['right_location']
                )
                if len(SequenceLocationStr(feat2check).to_biopython_location()) != 7:
                    problematic_source_ids.add(source['id'])

        elif 'assembly' in source:
            assembly_source = AssemblySource(
                id=source['id'],
                input=source['input'],
                output=source['output'],
                circular=source['circular'],
                assembly=source['assembly'],
            )
            input_seqs = [
                TextFileSequence.model_validate(s) for s in data['sequences'] if s['id'] in assembly_source.input
            ]
            # Sort input_seqs as in input
            input_seqs.sort(key=lambda x: assembly_source.input.index(x.id))
            if source['type'] == 'PCRSource':
                # For PCRs, the first and third assembly fragments are the primers
                primer_ids = [assembly_source.assembly[0].sequence, assembly_source.assembly[2].sequence]
                primers = [PrimerModel.model_validate(p) for p in data['primers'] if p['id'] in primer_ids]
                input_seqs = [primers[0], input_seqs[0], primers[1]]

            # In a correct assembly, the two locations of each join have the same length
            assembly_plan = assembly_source.get_assembly_plan(input_seqs)
            for join in assembly_plan:
                if len(join[2]) != len(join[3]):
                    problematic_source_ids.add(source['id'])
                    break

    if len(problematic_source_ids) == 0:
        return None

    # Replace problematic sources and their output sequences by templates
    problematic_source_ids.update(sum([cs.all_children_source_ids(s) for s in problematic_source_ids], []))
    for source_id in problematic_source_ids:
        source = next(s for s in data['sources'] if s['id'] == source_id)
        output_seq = next(s for s in data['sequences'] if s['id'] == source['output'])
        remove_keys = ['assembly', 'circular']
        source_keep = {key: value for key, value in source.items() if key not in remove_keys}
        source.clear()
        source.update(source_keep)

        seq_keep = {'id': output_seq['id'], 'type': 'TemplateSequence'}
        output_seq.clear()
        output_seq.update(seq_keep)

    return CloningStrategy.model_validate(data)


def main(file_path: str):
    file_dir = os.path.dirname(file_path)
    file_base = os.path.splitext(os.path.basename(file_path))[0]
    new_file_path = os.path.join(file_dir, f'{file_base}_needs_fixing.json')

    with open(file_path, 'r') as f:
        data = json.load(f)

    if 'backend_version' not in data or data['backend_version'] is None:

        # Fix the data
        cs = fix_backend_v0_3(data)

        if cs is not None:
            cs.backend_version = __version__ if version.parse(__version__) > version.parse('0.3') else '0.3'
            with open(new_file_path, 'w') as f:
                f.write(cs.model_dump_json(indent=2, exclude_none=True))


if __name__ == '__main__':
    import sys

    if len(sys.argv) == 1:
        print('Usage: python -m opencloning.bug_fixing.backend_v0_3 <file1> <file2> ...')
        sys.exit(1)

    file_paths = sys.argv[1:]

    for file_path in file_paths:
        if file_path.endswith('_needs_fixing.json'):
            print(f'Skipping {file_path}')
            continue
        main(file_path)
opencloning/cre_lox.py CHANGED
@@ -3,6 +3,8 @@ from pydna.dseqrecord import Dseqrecord
  from Bio.Data.IUPACData import ambiguous_dna_values
  from Bio.Seq import reverse_complement
  from .dna_utils import compute_regex_site, dseqrecord_finditer
+ from Bio.SeqFeature import Location, SimpleLocation, SeqFeature
+ from pydna.utils import shift_location

  # We create a dictionary to map ambiguous bases to their consensus base
  # For example, ambigous_base_dict['ACGT'] -> 'N'
@@ -56,3 +58,59 @@ def cre_loxP_overlap(x: Dseqrecord, y: Dseqrecord, _l: None = None) -> list[tupl
          if item not in unique_out:
              unique_out.append(item)
      return unique_out
+
+
+ loxP_dict = {
+     'loxP': 'ATAACTTCGTATANNNTANNNTATACGAAGTTAT',
+     'lox66': 'ATAACTTCGTATANNNTANNNTATACGAACGGTA',
+     'lox71': 'TACCGTTCGTATANNNTANNNTATACGAAGTTAT',
+     'loxP_mutant': 'TACCGTTCGTATANNNTANNNTATACGAACGGTA',
+ }
+
+
+ def get_regex_dict(original_dict: dict[str, str]) -> dict[str, str]:
+     """Get the regex dictionary for the original dictionary."""
+     out = dict()
+     for site in original_dict:
+         consensus_seq = original_dict[site]
+         is_palindromic = consensus_seq == reverse_complement(consensus_seq)
+         out[site] = {
+             'forward_regex': compute_regex_site(original_dict[site]),
+             'reverse_regex': None if is_palindromic else compute_regex_site(reverse_complement(original_dict[site])),
+         }
+     return out
+
+
+ def find_loxP_sites(seq: Dseqrecord) -> dict[str, list[Location]]:
+     """Find all loxP sites in a sequence and return a dictionary with the name and positions of the sites."""
+
+     out = dict()
+     regex_dict = get_regex_dict(loxP_dict)
+     for site in loxP_dict:
+
+         for pattern in ['forward_regex', 'reverse_regex']:
+             # Palindromic sequences have no reverse complement
+             if regex_dict[site][pattern] is None:
+                 continue
+             matches = list(dseqrecord_finditer(regex_dict[site][pattern], seq))
+             for match in matches:
+                 if site not in out:
+                     out[site] = []
+                 strand = 1 if pattern == 'forward_regex' else -1
+                 loc = SimpleLocation(match.start(), match.end(), strand)
+                 loc = shift_location(loc, 0, len(seq))
+                 out[site].append(loc)
+     return out
+
+
+ def annotate_loxP_sites(seq: Dseqrecord) -> Dseqrecord:
+     sites = find_loxP_sites(seq)
+     for site in sites:
+         for loc in sites[site]:
+             # Don't add the same feature twice
+             if not any(
+                 f.location == loc and f.type == 'protein_bind' and f.qualifiers.get('label', []) == [site]
+                 for f in seq.features
+             ):
+                 seq.features.append(SeqFeature(loc, type='protein_bind', qualifiers={'label': [site]}))
+     return seq
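
For context, the new `annotate_loxP_sites` helper can be used on its own (a minimal sketch, not taken from the package's tests; the example sequence is a canonical loxP site and the expected output is an assumption based on the code above):

```python
from pydna.dseqrecord import Dseqrecord
from opencloning.cre_lox import annotate_loxP_sites

# A canonical loxP site (13 bp arm + ATGTATGC spacer + 13 bp arm)
seq = Dseqrecord("ATAACTTCGTATAATGTATGCTATACGAAGTTAT")
annotated = annotate_loxP_sites(seq)
print([f.qualifiers['label'] for f in annotated.features])  # expected: [['loxP']]
```
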
@@ -5,7 +5,7 @@ from pydna.primer import Primer as PydnaPrimer
  from pydna.crispr import cas9
  from pydantic import conlist, create_model
  from Bio.Restriction.Restriction import RestrictionBatch
- from opencloning.cre_lox import cre_loxP_overlap
+ from opencloning.cre_lox import cre_loxP_overlap, annotate_loxP_sites
  from ..dna_functions import (
      get_invalid_enzyme_names,
      format_sequence_genbank,
@@ -57,7 +57,9 @@ def format_known_assembly_response(
      # If a specific assembly is requested
      assembly_plan = source.get_assembly_plan(fragments)
      for s in out_sources:
-         if s == source:
+         # TODO: it seems that assemble() is not getting is_insertion ever
+         other_assembly_plan = s.get_assembly_plan(fragments)
+         if assembly_plan == other_assembly_plan:
              return {
                  'sequences': [
                      format_sequence_genbank(product_callback(assemble(fragments, assembly_plan)), s.output_name)
@@ -553,7 +555,14 @@ async def cre_lox_recombination(source: CreLoxRecombinationSource, sequences: co
      )

      resp = generate_assemblies(
-         source, create_source, fragments, False, cre_loxP_overlap, True, recombination_mode=True
+         source,
+         create_source,
+         fragments,
+         False,
+         cre_loxP_overlap,
+         True,
+         recombination_mode=True,
+         product_callback=annotate_loxP_sites,
      )

      if len(resp['sources']) == 0:
@@ -24,7 +24,7 @@ from ..pydantic_models import (
      GenomeCoordinatesSource,
      SequenceFileFormat,
      SEVASource,
-     SimpleSequenceLocation,
+     SequenceLocationStr,
  )
  from ..dna_functions import (
      format_sequence_genbank,
@@ -150,12 +150,12 @@ async def read_from_file(

      seq_feature = None
      if start is not None and end is not None:
-         seq_feature = SimpleSequenceLocation(start=start, end=end)
      extracted_sequences = list()
      for dseq in dseqs:
          try:
+             seq_feature = SequenceLocationStr.from_start_and_end(start=start, end=end, seq_len=len(dseq))
              # TODO: We could use extract when this is addressed: https://github.com/biopython/biopython/issues/4989
-             location = seq_feature.to_biopython_location(circular=dseq.circular, seq_len=len(dseq))
+             location = seq_feature.to_biopython_location()
              i, j = location_boundaries(location)
              extracted_sequence = dseq[i:j]
              # Only add the sequence if the interval is not out of bounds
@@ -1,6 +1,10 @@
- from fastapi import Query, HTTPException
+ from fastapi import Query, HTTPException, Response
  from Bio.Restriction.Restriction_Dictionary import rest_dict
+ from pydantic import ValidationError
+ from opencloning_linkml.migrations import migrate
+ from opencloning_linkml._version import __version__ as schema_version

+ from ..bug_fixing.backend_v0_3 import fix_backend_v0_3

  from ..dna_functions import (
      format_sequence_genbank,
@@ -12,7 +16,7 @@ from ..pydantic_models import (
      BaseCloningStrategy,
  )
  from ..get_router import get_router
- from ..utils import api_version
+ from .._version import __version__ as backend_version


  router = get_router()
@@ -20,7 +24,7 @@ router = get_router()

  @router.get('/version')
  async def get_version():
-     return api_version()
+     return {'backend_version': backend_version, 'schema_version': schema_version}


  @router.get('/restriction_enzyme_list', response_model=dict[str, list[str]])
@@ -32,12 +36,50 @@ async def get_restriction_enzyme_list():
  @router.post(
      '/validate',
      summary='Validate a cloning strategy',
+     responses={
+         200: {
+             'description': 'The cloning strategy is valid',
+             'headers': {
+                 'x-warning': {
+                     'description': 'A warning returned if the file either contains errors or is in a previous version of the model',
+                     'schema': {'type': 'string'},
+                 },
+             },
+         },
+         422: {
+             'description': 'The cloning strategy is invalid',
+         },
+     },
  )
- async def cloning_strategy_is_valid(
-     cloning_strategy: BaseCloningStrategy,
- ) -> bool:
+ async def cloning_strategy_is_valid(data: dict, response: Response):
      """Validate a cloning strategy"""
-     return True
+     warnings = []
+     if any(key not in data for key in ['primers', 'sources', 'sequences']):
+         raise HTTPException(status_code=422, detail='The cloning strategy is invalid')
+
+     try:
+         migrated_data = migrate(data)
+         if migrated_data is None:
+             BaseCloningStrategy.model_validate(data)
+             return None
+
+         data = migrated_data
+         warnings.append(
+             'The cloning strategy is in a previous version of the model and has been migrated to the latest version.'
+         )
+
+         fixed_data = fix_backend_v0_3(data)
+         if fixed_data is not None:
+             data = fixed_data
+             warnings.append('The cloning strategy contained an error and has been turned into a template.')
+         cs = BaseCloningStrategy.model_validate(data)
+         if len(warnings) > 0:
+             response.headers['x-warning'] = ';'.join(warnings)
+             return cs
+         return None
+
+     except ValidationError:
+         raise HTTPException(status_code=422, detail='The cloning strategy is invalid')


  @router.post('/rename_sequence', response_model=TextFileSequence)
@@ -62,10 +62,10 @@ async def primer_design_homologous_recombination(
      validate_spacers(spacers, 1, False)

      pcr_seq = read_dsrecord_from_json(pcr_template.sequence)
-     pcr_loc = pcr_template.location.to_biopython_location(pcr_seq.circular, len(pcr_seq))
+     pcr_loc = pcr_template.location.to_biopython_location()

      hr_seq = read_dsrecord_from_json(homologous_recombination_target.sequence)
-     hr_loc = homologous_recombination_target.location.to_biopython_location(hr_seq.circular, len(hr_seq))
+     hr_loc = homologous_recombination_target.location.to_biopython_location()

      insert_forward = pcr_template.forward_orientation

@@ -112,7 +112,7 @@ async def primer_design_gibson_assembly(
      templates = list()
      for query in pcr_templates:
          dseqr = read_dsrecord_from_json(query.sequence)
-         location = query.location.to_biopython_location(dseqr.circular, len(dseqr))
+         location = query.location.to_biopython_location()
          template = location.extract(dseqr)
          if not query.forward_orientation:
              template = template.reverse_complement()
@@ -167,7 +167,7 @@ async def primer_design_simple_pair(
      validate_spacers(spacers, 1, False)

      dseqr = read_dsrecord_from_json(pcr_template.sequence)
-     location = pcr_template.location.to_biopython_location(dseqr.circular, len(dseqr))
+     location = pcr_template.location.to_biopython_location()
      template = location.extract(dseqr)
      if not pcr_template.forward_orientation:
          template = template.reverse_complement()
@@ -201,8 +201,7 @@ async def primer_design_ebic(
  ):
      """Design primers for EBIC"""
      dseqr = read_dsrecord_from_json(template.sequence)
-     location = template.location.to_biopython_location(dseqr.circular, len(dseqr))
-
+     location = template.location.to_biopython_location()
      return {'primers': ebic_primers(dseqr, location, max_inside, max_outside)}

opencloning/gateway.py CHANGED
@@ -105,7 +105,8 @@ def gateway_overlap(seqx: _Dseqrecord, seqy: _Dseqrecord, reaction: str, greedy:
              continue

          for match_x, match_y in _itertools.product(matches_x, matches_y):
-             # Find the overlap sequence within each match
+             # Find the overlap sequence within each match, and use the
+             # core 7 bp that are constant
              overlap_x = re.search(overlap_regex, match_x.group())
              overlap_y = re.search(overlap_regex, match_y.group())

@@ -116,9 +117,9 @@ def gateway_overlap(seqx: _Dseqrecord, seqy: _Dseqrecord, reaction: str, greedy:

          out.append(
              (
-                 match_x.start() + overlap_x.start(),
-                 match_y.start() + overlap_y.start(),
-                 len(overlap_x.group()),
+                 match_x.start() + overlap_x.start() + 3,
+                 match_y.start() + overlap_y.start() + 3,
+                 7,
              )
          )