span-aligner (PyPI) version diff: 0.1.0.tar.gz → 0.1.2.tar.gz (Mend)

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (16)

span_aligner-0.1.2/PKG-INFO ADDED

@@ -0,0 +1,169 @@
+Metadata-Version: 2.4
+Name: span-aligner
+Version: 0.1.2
+Summary: A utility for aligning and mapping text spans between different text representations.
+License: MIT
+Requires-Python: >=3.8
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: rapidfuzz>=3.0.0
+Provides-Extra: dev
+Requires-Dist: pytest>=7.0.0; extra == "dev"
+Dynamic: license-file
+# Span Aligner
+A utility for aligning and mapping text spans between different text representations, particularly useful for Label Studio annotation compatibility.
+## Features
+- Sanitize span boundaries to avoid special characters.
+- Find exact and fuzzy matches of text segments in original documents.
+- Map spans from one text representation to another.
+- Rebuild tagged text with nested annotations.
+- Merge result objects containing span annotations.
+## Installation
+Install from source:
+```bash
+pip install span-aligner
+```
+## Usage
+### Get Annotations from Tagged Text
+Extract structured spans and entities from a string with inline tags.
+```python
+tagged_input = "<administrative_body>Environmental Committee</administrative_body> discussed the <impact_location>central park</impact_location> renovation on <publication_date>2025-12-15</publication_date>."
+ner_map = {
+    "administrative_body": "ADMINISTRATIVE BODY",
+    "publication_date": "PUBLICATION DATE",
+    "impact_location": "PRIMARY LOCATION"
+}
+span_map ={
+    "motivation" : "MOTIVATION"
+}
+annotations = SpanAligner.get_annotations_from_tagged_text(
+    tagged_input,
+    ner_map=ner_map,
+    span_map=span_map
+)
+print(annotations["entities"])
+# Output:
+#[
+#    {'start': 0, 'end': 23, 'text': 'Environmental Committee', 'labels': ['ADMINISTRATIVE BODY']},
+#    {'start': 38, 'end': 50, 'text': 'central park', 'labels': ['PRIMARY LOCATION']},
+#    {'start': 65, 'end': 75, 'text': '2025-12-15', 'labels': ['PUBLICATION DATE']}
+#]
+print(annotations["spans"])
+# Output:
+#[
+#    {'start': 0, 'end': 76, 'text': 'Environmental Committee discussed the central park renovation on 2025-12-15.', 'labels': ['MOTIVATION']}
+#]
+print(annotations["plain_text"])
+# Output: "Environmental Committee discussed the central park renovation on 2025-12-15."
+```
+### Rebuild Tagged Text
+Reconstruct a string with XML-like tags from raw text and span/entity lists.
+```python
+text = "On 2026-01-12, the Budget Committee finalized the annual report."
+# Spans corresponding to 'MOTIVATION' label, mapped to 'motivation' tag
+spans = [{"start": 0, "end": 64, "labels": ["motivation"]}]
+# Entities corresponding to 'ADMINISTRATIVE BODY' label, mapped to 'administrative_body' tag
+entities = [{"start": 15, "end": 35, "labels": ["administrative_body"]}]
+tagged, stats = SpanAligner.rebuild_tagged_text(text, spans, entities)
+print(tagged)
+# Output: <motivation>On 2026-01-12, the <administrative_body>Budget Committee</administrative_body> finalized the annual report.</motivation>
+```
+### Rebuild Tagged Text from Task
+Generate tagged text directly from a Label Studio task object.
+```python
+# Assuming 'task' is a Label Studio task object (or similar structure)
+# with .data['text'] and .annotations attributes
+mapping = {
+    "DECISION": "decision",
+    "LEGAL FRAMEWORK": "legal_framework",
+    "EXPIRATION DATE": "expiry_date"
+}
+tagged_output = SpanAligner.rebuild_tagged_text_from_task(task, mapping)
+print(tagged_output)
+```
+### Map Tags to Original
+Align annotated spans from a tagged string back to their positions in the original text, keeping the mistakes and text as written in the original.
+```python
+original_text = "Budget Budget Committee met on 2026-01-12 to view\n\n the central park prject."
+# Imagine the text was slightly modified or translated, but tags are present
+tagged_text = "<administrative_body>Budget Committee</administrative_body> met on <publication_date>2026-01-12</publication_date> to review the <impact_location>central park</impact_location> project."
+mapped_tagged_text = SpanAligner.map_tags_to_original(
+    original_text=original_text,
+    tagged_text=tagged_text,
+    min_ratio=0.7
+)
+print(mapped_tagged_text)
+# Output might look like: "Budget <administrative_body>Budget Committee</administrative_body> met on <publication_date>2026-01-12</publication_date> to view\n\n the <impact_location>central park</impact_location> prject."
+```
+### Map Tags to Original and Get Positions
+Combine mapping tags to original text and extracting entities with correct labels.
+```python
+original_text = "Legal basis: Art. 5. The Env. Committee met on 2026-01-12."
+tagged_text = "Legal basis: <article>Art. 5</article>. The <administrative_body>Environmental Committee</administrative_body> met on <session_date>2026-01-12</session_date>."
+# 1. Map tags to the noisy original text
+mapped_tagged_text = SpanAligner.map_tags_to_original(
+    original_text=original_text,
+    tagged_text=tagged_text,
+    min_ratio=0.7
+)
+# 2. Extract annotations using the mapping
+ner_label_mapping = {
+    "administrative_body": "ADMINISTRATIVE BODY",
+    "session_date": "SESSION DATE",
+    "article": "ARTICLE"
+}
+annotations = SpanAligner.get_annotations_from_tagged_text(
+    mapped_tagged_text,
+    ner_map=ner_label_mapping
+)
+print(annotations["entities"])
+# Output:
+# [
+#  {'start': 13, 'end': 19, 'text': 'Art. 5', 'labels': ['ARTICLE']},
+#  {'start': 47, 'end': 57, 'text': '2026-01-12', 'labels': ['SESSION DATE']}
+# ]
+```
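The "Rebuild Tagged Text" example above depends on exact character offsets into `text`. In that example, `text[15:35]` is "the Budget Committee", while the printed output wraps only "Budget Committee", which is `text[19:35]`. The following is a simplified, hypothetical `rebuild_tags` helper (not the package's implementation) that inserts nested tags using the ordering shown in the 0.1.2 `aligner.py` diff: start ascending, longer annotations first, so outer tags open before inner ones. It assumes annotations nest cleanly and uses each item's first label as the tag name.

```python
from typing import Any, Dict, List


def rebuild_tags(text: str, items: List[Dict[str, Any]]) -> str:
    """Hypothetical sketch: wrap each [start, end) range of `text` in <label>...</label>.

    Assumes no partial overlaps; the first entry of "labels" is used as the tag.
    """
    # Outer annotations first: start ascending, longer before shorter.
    ordered = sorted(items, key=lambda a: (a["start"], -(a["end"] - a["start"])))
    opens: Dict[int, List[str]] = {}
    closes: Dict[int, List[str]] = {}
    for a in ordered:
        tag = a["labels"][0]
        opens.setdefault(a["start"], []).append(f"<{tag}>")
        # Inner tags must close before outer ones, so prepend at each end offset.
        closes.setdefault(a["end"], []).insert(0, f"</{tag}>")
    out: List[str] = []
    for i in range(len(text) + 1):
        out.extend(closes.get(i, []))  # close tags ending here first
        out.extend(opens.get(i, []))   # then open tags starting here
        if i < len(text):
            out.append(text[i])
    return "".join(out)
```

With `spans = [{"start": 0, "end": 64, "labels": ["motivation"]}]` and the entity at offsets 19 to 35, this sketch produces exactly the tagged string printed in the example.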

span_aligner-0.1.2/README.md ADDED

@@ -0,0 +1,156 @@
+# Span Aligner
+A utility for aligning and mapping text spans between different text representations, particularly useful for Label Studio annotation compatibility.
+## Features
+- Sanitize span boundaries to avoid special characters.
+- Find exact and fuzzy matches of text segments in original documents.
+- Map spans from one text representation to another.
+- Rebuild tagged text with nested annotations.
+- Merge result objects containing span annotations.
+## Installation
+Install from source:
+```bash
+pip install span-aligner
+```
+## Usage
+### Get Annotations from Tagged Text
+Extract structured spans and entities from a string with inline tags.
+```python
+tagged_input = "<administrative_body>Environmental Committee</administrative_body> discussed the <impact_location>central park</impact_location> renovation on <publication_date>2025-12-15</publication_date>."
+ner_map = {
+    "administrative_body": "ADMINISTRATIVE BODY",
+    "publication_date": "PUBLICATION DATE",
+    "impact_location": "PRIMARY LOCATION"
+}
+span_map ={
+    "motivation" : "MOTIVATION"
+}
+annotations = SpanAligner.get_annotations_from_tagged_text(
+    tagged_input,
+    ner_map=ner_map,
+    span_map=span_map
+)
+print(annotations["entities"])
+# Output:
+#[
+#    {'start': 0, 'end': 23, 'text': 'Environmental Committee', 'labels': ['ADMINISTRATIVE BODY']},
+#    {'start': 38, 'end': 50, 'text': 'central park', 'labels': ['PRIMARY LOCATION']},
+#    {'start': 65, 'end': 75, 'text': '2025-12-15', 'labels': ['PUBLICATION DATE']}
+#]
+print(annotations["spans"])
+# Output:
+#[
+#    {'start': 0, 'end': 76, 'text': 'Environmental Committee discussed the central park renovation on 2025-12-15.', 'labels': ['MOTIVATION']}
+#]
+print(annotations["plain_text"])
+# Output: "Environmental Committee discussed the central park renovation on 2025-12-15."
+```
+### Rebuild Tagged Text
+Reconstruct a string with XML-like tags from raw text and span/entity lists.
+```python
+text = "On 2026-01-12, the Budget Committee finalized the annual report."
+# Spans corresponding to 'MOTIVATION' label, mapped to 'motivation' tag
+spans = [{"start": 0, "end": 64, "labels": ["motivation"]}]
+# Entities corresponding to 'ADMINISTRATIVE BODY' label, mapped to 'administrative_body' tag
+entities = [{"start": 15, "end": 35, "labels": ["administrative_body"]}]
+tagged, stats = SpanAligner.rebuild_tagged_text(text, spans, entities)
+print(tagged)
+# Output: <motivation>On 2026-01-12, the <administrative_body>Budget Committee</administrative_body> finalized the annual report.</motivation>
+```
+### Rebuild Tagged Text from Task
+Generate tagged text directly from a Label Studio task object.
+```python
+# Assuming 'task' is a Label Studio task object (or similar structure)
+# with .data['text'] and .annotations attributes
+mapping = {
+    "DECISION": "decision",
+    "LEGAL FRAMEWORK": "legal_framework",
+    "EXPIRATION DATE": "expiry_date"
+}
+tagged_output = SpanAligner.rebuild_tagged_text_from_task(task, mapping)
+print(tagged_output)
+```
+### Map Tags to Original
+Align annotated spans from a tagged string back to their positions in the original text, keeping the mistakes and text as written in the original.
+```python
+original_text = "Budget Budget Committee met on 2026-01-12 to view\n\n the central park prject."
+# Imagine the text was slightly modified or translated, but tags are present
+tagged_text = "<administrative_body>Budget Committee</administrative_body> met on <publication_date>2026-01-12</publication_date> to review the <impact_location>central park</impact_location> project."
+mapped_tagged_text = SpanAligner.map_tags_to_original(
+    original_text=original_text,
+    tagged_text=tagged_text,
+    min_ratio=0.7
+)
+print(mapped_tagged_text)
+# Output might look like: "Budget <administrative_body>Budget Committee</administrative_body> met on <publication_date>2026-01-12</publication_date> to view\n\n the <impact_location>central park</impact_location> prject."
+```
+### Map Tags to Original and Get Positions
+Combine mapping tags to original text and extracting entities with correct labels.
+```python
+original_text = "Legal basis: Art. 5. The Env. Committee met on 2026-01-12."
+tagged_text = "Legal basis: <article>Art. 5</article>. The <administrative_body>Environmental Committee</administrative_body> met on <session_date>2026-01-12</session_date>."
+# 1. Map tags to the noisy original text
+mapped_tagged_text = SpanAligner.map_tags_to_original(
+    original_text=original_text,
+    tagged_text=tagged_text,
+    min_ratio=0.7
+)
+# 2. Extract annotations using the mapping
+ner_label_mapping = {
+    "administrative_body": "ADMINISTRATIVE BODY",
+    "session_date": "SESSION DATE",
+    "article": "ARTICLE"
+}
+annotations = SpanAligner.get_annotations_from_tagged_text(
+    mapped_tagged_text,
+    ner_map=ner_label_mapping
+)
+print(annotations["entities"])
+# Output:
+# [
+#  {'start': 13, 'end': 19, 'text': 'Art. 5', 'labels': ['ARTICLE']},
+#  {'start': 47, 'end': 57, 'text': '2026-01-12', 'labels': ['SESSION DATE']}
+# ]
+```
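The offsets returned by `get_annotations_from_tagged_text` index into the tag-free `plain_text`. The sketch below is a hypothetical `extract_tags` helper, not the package's code; it reuses the bare-tag regex visible in the 0.1.2 `aligner.py` diff and falls back to the raw tag name for unmapped tags, mirroring `annotation_map.get(tag_name, tag_name)`. On the README's `tagged_input` it reproduces the entity offsets shown there (0 to 23, 38 to 50, 65 to 75). That input contains no `<motivation>` tag, so this sketch produces no MOTIVATION annotation; the `spans` output printed in the README does not follow from the input as written.

```python
import re
from typing import Any, Dict, List, Tuple

# Bare-tag pattern as it appears in the 0.1.2 aligner.py diff.
TAG_RE = re.compile(r"<(/?)([a-zA-Z_][a-zA-Z0-9_-]*)>")


def extract_tags(tagged: str, label_map: Dict[str, str]) -> Tuple[str, List[Dict[str, Any]]]:
    """Hypothetical sketch: strip bare tags, return (plain_text, annotations)."""
    plain: List[str] = []
    stack: List[Tuple[str, int]] = []  # (tag name, start offset in plain text)
    annotations: List[Dict[str, Any]] = []
    pos_in = 0   # position in the tagged string
    pos_out = 0  # position in the plain string being built
    for m in TAG_RE.finditer(tagged):
        # Copy the untagged text between the previous tag and this one.
        chunk = tagged[pos_in:m.start()]
        plain.append(chunk)
        pos_out += len(chunk)
        pos_in = m.end()
        is_closing = m.group(1) == "/"
        name = m.group(2)
        if not is_closing:
            stack.append((name, pos_out))
            continue
        # Close the most recent matching opening tag; ignore unmatched closers.
        for i in range(len(stack) - 1, -1, -1):
            if stack[i][0] == name:
                _, start = stack.pop(i)
                annotations.append({"start": start, "end": pos_out, "label": label_map.get(name, name)})
                break
    plain.append(tagged[pos_in:])
    text = "".join(plain)
    for a in annotations:
        a["text"] = text[a["start"]:a["end"]]
    # Outer annotations first, like the rebuild ordering in the diff.
    annotations.sort(key=lambda a: (a["start"], -a["end"]))
    return text, annotations
```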

{span_aligner-0.1.0 → span_aligner-0.1.2}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "span-aligner"
-version = "0.1.0"
+version = "0.1.2"
 description = "A utility for aligning and mapping text spans between different text representations."
 readme = "README.md"
 requires-python = ">=3.8"

{span_aligner-0.1.0 → span_aligner-0.1.2}/span_aligner/aligner.py RENAMED

@@ -442,7 +442,7 @@ class SpanAligner:
         all_spans_aligned, updated["spans"] = realign(input_spans, enable_fuzzy)
         all_entities_aligned, updated["entities"] = realign(input_entities, enable_fuzzy)
         updated["task"]["data"]["text"] = original_text
-        return all_spans_aligned and all_entities_aligned, updated
+        return  updated, all_spans_aligned and all_entities_aligned
     @staticmethod
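This hunk swaps the order of the tuple returned by `map_spans_to_original`: 0.1.0 returned `(success, updated)`, 0.1.2 returns `(updated, success)`. The `success, mapped = SpanAligner.map_spans_to_original(...)` unpacking shown in the 0.1.0 README therefore binds the wrong values under 0.1.2. Code that must run against both releases could normalize the result by type. The helper below is a hypothetical sketch; it assumes the flag is a `bool` and the result object a `dict`, which matches the diff but is not a documented contract.

```python
from typing import Any, Dict, Tuple


def unpack_mapping_result(result: Tuple[Any, Any]) -> Tuple[Dict[str, Any], bool]:
    """Hypothetical helper: return (mapped, success) for either tuple order.

    span-aligner 0.1.0 returned (success, mapped); 0.1.2 returns (mapped, success).
    ASSUMPTION: success is a bool and mapped is a dict.
    """
    first, second = result
    if isinstance(first, bool) and isinstance(second, dict):
        return second, first  # 0.1.0 order
    if isinstance(first, dict) and isinstance(second, bool):
        return first, second  # 0.1.2 order
    raise TypeError(
        f"Unexpected result shape: ({type(first).__name__}, {type(second).__name__})"
    )
```

A call site would then read `mapped, success = unpack_mapping_result(SpanAligner.map_spans_to_original(original, result_obj))` regardless of the installed release.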
@@ -713,7 +713,7 @@ class SpanAligner:
             raise ValueError("No tagged_text found in input result.")
         # Default allowed tags (from your SYSTEM/USER prompts)
-        if allowed_tags is None:
+        if allowed_tags is None and (span_map or ner_map):
             # Safely handle None maps
             s_map = span_map or {}
             n_map = ner_map or {}
@@ -725,10 +725,6 @@ class SpanAligner:
             if mapping:
                 annotation_map.update(mapping)
-        # If annotation_map ends up empty, initialize with identity mapping
-        if not annotation_map:
-            annotation_map = {t: t for t in allowed_tags}
         # Regex to capture bare tags like <tag> or </tag>
         tag_re = re.compile(r"<(/?)([a-zA-Z_][a-zA-Z0-9_-]*)>")
@@ -770,7 +766,7 @@ class SpanAligner:
                     inside_attachments_level = max(0, inside_attachments_level - 1)
             # Handle span stack only for allowed tags
-            if tag_name in allowed_tags:
+            if allowed_tags is None or tag_name in allowed_tags:
                 if not is_closing:
                     # Opening tag
                     stack.append((tag_name, pos_out))
@@ -809,13 +805,14 @@ class SpanAligner:
                                         "start": adjusted_start,
                                         "end": adjusted_end,
                                         "text": span_text,
-                                        "labels": [annotation_map.get(tag_name, tag_name)]
+                                        "labels": [annotation_map.get(tag_name, tag_name) if annotation_map else tag_name]
                                     }
                                     if ner_map and tag_name in ner_map:
                                         entities.append(annotation_entry)
                                     else:
                                         spans.append(annotation_entry)
                             found_open = True
                             break
                     # If no matching opening tag found, ignore gracefully
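Taken together, the three hunks above change how `get_annotations_from_tagged_text` treats tags: `allowed_tags` is derived from the maps only when `span_map` or `ner_map` is given, an `allowed_tags` of `None` now accepts every tag, the identity-mapping fallback is removed, and labels fall back to the raw tag name. A tag becomes an entity only if it is a key of `ner_map`; everything else is a span. The function below is a hypothetical, condensed restatement of those conditions for one tag, not code from the package. The hunk hides how `allowed_tags` is actually derived, so the union of both maps' keys, and merging both maps into one label map, are assumptions.

```python
from typing import Dict, Optional, Set, Tuple


def classify_tag(
    tag_name: str,
    span_map: Optional[Dict[str, str]] = None,
    ner_map: Optional[Dict[str, str]] = None,
    allowed_tags: Optional[Set[str]] = None,
) -> Optional[Tuple[str, str]]:
    """Hypothetical restatement of the 0.1.2 rules for one matched tag pair.

    Returns ("entity" or "span", label), or None if the tag is filtered out.
    ASSUMPTION: the derived allowed_tags is the union of both maps' keys.
    """
    # Derive a filter only when at least one map is supplied (new 0.1.2 guard).
    if allowed_tags is None and (span_map or ner_map):
        allowed_tags = set(span_map or {}) | set(ner_map or {})
    # allowed_tags still None means every tag is accepted.
    if allowed_tags is not None and tag_name not in allowed_tags:
        return None
    # ASSUMPTION: both maps are merged into a single annotation_map.
    annotation_map: Dict[str, str] = {**(span_map or {}), **(ner_map or {})}
    # No identity-map fallback any more: unmapped tags keep their raw name.
    label = annotation_map.get(tag_name, tag_name) if annotation_map else tag_name
    # Only tags listed in ner_map become entities; everything else is a span.
    kind = "entity" if ner_map and tag_name in ner_map else "span"
    return kind, label
```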
@@ -914,8 +911,8 @@ class SpanAligner:
     @staticmethod
     def rebuild_tagged_text(
         original_text: str,
-        spans: List[Dict[str, Any]],
-        entities: List[Dict[str, Any]],
+        spans: List[Dict[str, Any]] = None,
+        entities: List[Dict[str, Any]] = None,
         label_to_tag: Optional[Dict[str, str]] = None
     ) -> Tuple[str, Dict[str, int]]:
         """
@@ -983,8 +980,10 @@ class SpanAligner:
                     "length": e - s,
                 })
-        add_items(spans)
-        add_items(entities)
+        if spans and len(spans)>0:
+            add_items(spans)
+        if entities and len(entities)>0:
+            add_items(entities)
         # Sort: by start asc, longer first (end desc) to open outers before inners
         annotations.sort(key=lambda a: (a["start"], -a["length"]))
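`rebuild_tagged_text` now accepts `spans` and `entities` as optional arguments, which lets `map_tags_to_original` call it with `spans=` alone. The diff annotates them as `List[Dict[str, Any]] = None`; recent mypy releases reject such implicit `Optional` defaults unless `Optional[...]` is spelled out, and the `if spans and len(spans)>0` guard is equivalent to `if spans`. The hypothetical `collect_annotations` function below sketches the same pattern with explicit `Optional` types and the diff's sort order; it is not the package's code.

```python
from typing import Any, Dict, List, Optional

Annotation = Dict[str, Any]


def collect_annotations(
    spans: Optional[List[Annotation]] = None,
    entities: Optional[List[Annotation]] = None,
) -> List[Annotation]:
    """Hypothetical sketch: merge optional span/entity lists into one sorted list.

    None and [] are treated alike; sorting mirrors the diff's order
    (start ascending, longer first) so outer annotations open first.
    """
    merged: List[Annotation] = []
    for group in (spans, entities):
        if group:  # same effect as `group and len(group) > 0`
            merged.extend(group)
    return sorted(merged, key=lambda a: (a["start"], -(a["end"] - a["start"])))
```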
@@ -1079,7 +1078,9 @@ class SpanAligner:
         tagged_text: str,
         min_ratio: float = 0.8,
         max_dist: int = 20,
+        enable_fuzzy: bool = False,
         logging: bool = False
     ) -> str:
         """
         Map spans from tagged text back to their positions in the original text.
@@ -1115,12 +1116,15 @@ class SpanAligner:
         }
         # Now map spans/entities back to original_text
-        success, mapped = SpanAligner.map_spans_to_original(
+        mapped, _ = SpanAligner.map_spans_to_original(
             original_text,
             result_obj,
             min_ratio=min_ratio,
             max_dist=max_dist,
-            logging=logging
+            enable_fuzzy = enable_fuzzy,
+            logging=logging,
         )
-        return mapped["task"]["data"].get("tagged_text", "")
+        original_text_tagged, _ = SpanAligner.rebuild_tagged_text(original_text, spans = mapped.get("spans", []))
+        return original_text_tagged
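In 0.1.2, `map_tags_to_original` gains an `enable_fuzzy` flag (default `False`) that it forwards to `map_spans_to_original`, and it now builds its return value with `rebuild_tagged_text(original_text, spans=mapped.get("spans", []))` instead of reading `tagged_text` from the mapped task. Because only `spans` is passed, anything the intermediate result carries under `entities` does not appear in the returned string. The package lists rapidfuzz as its dependency for fuzzy matching; the sketch below instead uses the standard library's `difflib.SequenceMatcher` to illustrate an exact-first lookup with an optional fuzzy fallback gated by `min_ratio`. The `locate` name, the same-length sliding window, and the scoring are hypothetical and not the package's algorithm.

```python
from difflib import SequenceMatcher
from typing import Optional, Tuple


def locate(
    needle: str,
    haystack: str,
    min_ratio: float = 0.8,
    enable_fuzzy: bool = False,
) -> Optional[Tuple[int, int]]:
    """Hypothetical sketch: find `needle` in `haystack`, exact match first.

    With enable_fuzzy=True, falls back to scanning same-length windows and
    returning the first window with the best SequenceMatcher ratio, provided
    that ratio reaches min_ratio.
    """
    idx = haystack.find(needle)
    if idx != -1:
        return idx, idx + len(needle)
    if not enable_fuzzy or not needle:
        return None
    n = len(needle)
    best: Optional[Tuple[int, int]] = None
    best_ratio = 0.0
    for start in range(max(1, len(haystack) - n + 1)):
        window = haystack[start:start + n]
        ratio = SequenceMatcher(None, needle, window).ratio()
        if ratio > best_ratio:  # strict '>' keeps the earliest best window
            best, best_ratio = (start, start + n), ratio
    return best if best_ratio >= min_ratio else None
```

Applied to the README's `original_text`, "central park" is found exactly, while "project" only matches the misspelled "prject" once `enable_fuzzy=True` and `min_ratio` is low enough.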

span_aligner-0.1.2/span_aligner.egg-info/PKG-INFO ADDED

@@ -0,0 +1,169 @@
+Metadata-Version: 2.4
+Name: span-aligner
+Version: 0.1.2
+Summary: A utility for aligning and mapping text spans between different text representations.
+License: MIT
+Requires-Python: >=3.8
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: rapidfuzz>=3.0.0
+Provides-Extra: dev
+Requires-Dist: pytest>=7.0.0; extra == "dev"
+Dynamic: license-file
+# Span Aligner
+A utility for aligning and mapping text spans between different text representations, particularly useful for Label Studio annotation compatibility.
+## Features
+- Sanitize span boundaries to avoid special characters.
+- Find exact and fuzzy matches of text segments in original documents.
+- Map spans from one text representation to another.
+- Rebuild tagged text with nested annotations.
+- Merge result objects containing span annotations.
+## Installation
+Install from source:
+```bash
+pip install span-aligner
+```
+## Usage
+### Get Annotations from Tagged Text
+Extract structured spans and entities from a string with inline tags.
+```python
+tagged_input = "<administrative_body>Environmental Committee</administrative_body> discussed the <impact_location>central park</impact_location> renovation on <publication_date>2025-12-15</publication_date>."
+ner_map = {
+    "administrative_body": "ADMINISTRATIVE BODY",
+    "publication_date": "PUBLICATION DATE",
+    "impact_location": "PRIMARY LOCATION"
+}
+span_map ={
+    "motivation" : "MOTIVATION"
+}
+annotations = SpanAligner.get_annotations_from_tagged_text(
+    tagged_input,
+    ner_map=ner_map,
+    span_map=span_map
+)
+print(annotations["entities"])
+# Output:
+#[
+#    {'start': 0, 'end': 23, 'text': 'Environmental Committee', 'labels': ['ADMINISTRATIVE BODY']},
+#    {'start': 38, 'end': 50, 'text': 'central park', 'labels': ['PRIMARY LOCATION']},
+#    {'start': 65, 'end': 75, 'text': '2025-12-15', 'labels': ['PUBLICATION DATE']}
+#]
+print(annotations["spans"])
+# Output:
+#[
+#    {'start': 0, 'end': 76, 'text': 'Environmental Committee discussed the central park renovation on 2025-12-15.', 'labels': ['MOTIVATION']}
+#]
+print(annotations["plain_text"])
+# Output: "Environmental Committee discussed the central park renovation on 2025-12-15."
+```
+### Rebuild Tagged Text
+Reconstruct a string with XML-like tags from raw text and span/entity lists.
+```python
+text = "On 2026-01-12, the Budget Committee finalized the annual report."
+# Spans corresponding to 'MOTIVATION' label, mapped to 'motivation' tag
+spans = [{"start": 0, "end": 64, "labels": ["motivation"]}]
+# Entities corresponding to 'ADMINISTRATIVE BODY' label, mapped to 'administrative_body' tag
+entities = [{"start": 15, "end": 35, "labels": ["administrative_body"]}]
+tagged, stats = SpanAligner.rebuild_tagged_text(text, spans, entities)
+print(tagged)
+# Output: <motivation>On 2026-01-12, the <administrative_body>Budget Committee</administrative_body> finalized the annual report.</motivation>
+```
+### Rebuild Tagged Text from Task
+Generate tagged text directly from a Label Studio task object.
+```python
+# Assuming 'task' is a Label Studio task object (or similar structure)
+# with .data['text'] and .annotations attributes
+mapping = {
+    "DECISION": "decision",
+    "LEGAL FRAMEWORK": "legal_framework",
+    "EXPIRATION DATE": "expiry_date"
+}
+tagged_output = SpanAligner.rebuild_tagged_text_from_task(task, mapping)
+print(tagged_output)
+```
+### Map Tags to Original
+Align annotated spans from a tagged string back to their positions in the original text, keeping the mistakes and text as written in the original.
+```python
+original_text = "Budget Budget Committee met on 2026-01-12 to view\n\n the central park prject."
+# Imagine the text was slightly modified or translated, but tags are present
+tagged_text = "<administrative_body>Budget Committee</administrative_body> met on <publication_date>2026-01-12</publication_date> to review the <impact_location>central park</impact_location> project."
+mapped_tagged_text = SpanAligner.map_tags_to_original(
+    original_text=original_text,
+    tagged_text=tagged_text,
+    min_ratio=0.7
+)
+print(mapped_tagged_text)
+# Output might look like: "Budget <administrative_body>Budget Committee</administrative_body> met on <publication_date>2026-01-12</publication_date> to view\n\n the <impact_location>central park</impact_location> prject."
+```
+### Map Tags to Original and Get Positions
+Combine mapping tags to original text and extracting entities with correct labels.
+```python
+original_text = "Legal basis: Art. 5. The Env. Committee met on 2026-01-12."
+tagged_text = "Legal basis: <article>Art. 5</article>. The <administrative_body>Environmental Committee</administrative_body> met on <session_date>2026-01-12</session_date>."
+# 1. Map tags to the noisy original text
+mapped_tagged_text = SpanAligner.map_tags_to_original(
+    original_text=original_text,
+    tagged_text=tagged_text,
+    min_ratio=0.7
+)
+# 2. Extract annotations using the mapping
+ner_label_mapping = {
+    "administrative_body": "ADMINISTRATIVE BODY",
+    "session_date": "SESSION DATE",
+    "article": "ARTICLE"
+}
+annotations = SpanAligner.get_annotations_from_tagged_text(
+    mapped_tagged_text,
+    ner_map=ner_label_mapping
+)
+print(annotations["entities"])
+# Output:
+# [
+#  {'start': 13, 'end': 19, 'text': 'Art. 5', 'labels': ['ARTICLE']},
+#  {'start': 47, 'end': 57, 'text': '2026-01-12', 'labels': ['SESSION DATE']}
+# ]
+```

span_aligner-0.1.0/PKG-INFO DELETED

@@ -1,122 +0,0 @@
-Metadata-Version: 2.4
-Name: span-aligner
-Version: 0.1.0
-Summary: A utility for aligning and mapping text spans between different text representations.
-License: MIT
-Requires-Python: >=3.8
-Description-Content-Type: text/markdown
-License-File: LICENSE
-Requires-Dist: rapidfuzz>=3.0.0
-Provides-Extra: dev
-Requires-Dist: pytest>=7.0.0; extra == "dev"
-Dynamic: license-file
-# Span Aligner
-A utility for aligning and mapping text spans between different text representations, particularly useful for Label Studio annotation compatibility.
-## Features
-- Sanitize span boundaries to avoid special characters.
-- Find exact and fuzzy matches of text segments in original documents.
-- Map spans from one text representation to another.
-- Rebuild tagged text with nested annotations.
-- Merge result objects containing span annotations.
-## Installation
-Install from source:
-```bash
-pip install .
-```
-For development:
-```bash
-pip install -e ".[dev]"
-```
-## Usage
-```python
-from span_aligner import SpanAligner
-original = "Hello, World!"
-result_obj = {
-    "spans": [{"start": 0, "end": 5, "text": "Hello", "labels": ["greeting"]}],
-    "entities": [],
-    "task": {"data": {"text": ""}}
-}
-success, mapped = SpanAligner.map_spans_to_original(original, result_obj)
-print(mapped)
-```
-### Map Tags to Original
-Align annotated spans from a tagged string back to their positions in the original text, keeping the mistakes and original text as written in the original.
-```python
-original_text = "The quick brown fox jumps\n\n over the dog."
-# Imagine the text was slightly modified or translated, but tags are present
-tagged_text = "The <adj>quick</adj> brown fox jumps over the <animal>dog</animal>."
-mapped_tagged_text = SpanAligner.map_tags_to_original(
-    original_text=original_text,
-    tagged_text=tagged_text,
-    min_ratio=0.8
-)
-print(mapped_tagged_text)
-# Output might look like: "The <adj>quick</adj> brown fox jumps\n\n over the <animal>dog</animal>."
-# (If original text differed slightly, tags would be placed on best matching spans)
-```
-### Rebuild Tagged Text
-Reconstruct a string with XML-like tags from raw text and span/entity lists.
-```python
-text = "Hello World"
-spans = [{"start": 0, "end": 11, "labels": ["sentence"]}]
-entities = [{"start": 6, "end": 11, "labels": ["location"]}]
-tagged, stats = SpanAligner.rebuild_tagged_text(text, spans, entities)
-print(tagged)
-# Output: <sentence>Hello <location>World</location></sentence>
-```
-### Rebuild Tagged Text from Task
-Generate tagged text directly from a Label Studio task object.
-```python
-# Assuming 'task' is a Label Studio task object (or similar structure)
-# with .data['text'] and .annotations attributes
-mapping = {"Location": "loc", "Person": "per"}
-tagged_output = SpanAligner.rebuild_tagged_text_from_task(task, mapping)
-print(tagged_output)
-```
-### Get Annotations from Tagged Text
-Extract structured spans and entities from a string with inline tags.
-```python
-tagged_input = "Visit <loc>Paris</loc> and see the <landmark>Eiffel Tower</landmark>."
-annotations = SpanAligner.get_annotations_from_tagged_text(
-    tagged_input,
-    ner_map={"loc": "Location", "landmark": "Location"}
-)
-print(annotations["entities"])
-# Output:
-# [
-#   {"start": 6, "end": 11, "text": "Paris", "labels": ["Location"]},
-#   {"start": 24, "end": 36, "text": "Eiffel Tower", "labels": ["Location"]}
-# ]
-print(annotations["plain_text"])
-# Output: "Visit Paris and see the Eiffel Tower."
-```

span_aligner-0.1.0/README.md DELETED

@@ -1,109 +0,0 @@
-# Span Aligner
-A utility for aligning and mapping text spans between different text representations, particularly useful for Label Studio annotation compatibility.
-## Features
-- Sanitize span boundaries to avoid special characters.
-- Find exact and fuzzy matches of text segments in original documents.
-- Map spans from one text representation to another.
-- Rebuild tagged text with nested annotations.
-- Merge result objects containing span annotations.
-## Installation
-Install from source:
-```bash
-pip install .
-```
-For development:
-```bash
-pip install -e ".[dev]"
-```
-## Usage
-```python
-from span_aligner import SpanAligner
-original = "Hello, World!"
-result_obj = {
-    "spans": [{"start": 0, "end": 5, "text": "Hello", "labels": ["greeting"]}],
-    "entities": [],
-    "task": {"data": {"text": ""}}
-}
-success, mapped = SpanAligner.map_spans_to_original(original, result_obj)
-print(mapped)
-```
-### Map Tags to Original
-Align annotated spans from a tagged string back to their positions in the original text, keeping the mistakes and original text as written in the original.
-```python
-original_text = "The quick brown fox jumps\n\n over the dog."
-# Imagine the text was slightly modified or translated, but tags are present
-tagged_text = "The <adj>quick</adj> brown fox jumps over the <animal>dog</animal>."
-mapped_tagged_text = SpanAligner.map_tags_to_original(
-    original_text=original_text,
-    tagged_text=tagged_text,
-    min_ratio=0.8
-)
-print(mapped_tagged_text)
-# Output might look like: "The <adj>quick</adj> brown fox jumps\n\n over the <animal>dog</animal>."
-# (If original text differed slightly, tags would be placed on best matching spans)
-```
-### Rebuild Tagged Text
-Reconstruct a string with XML-like tags from raw text and span/entity lists.
-```python
-text = "Hello World"
-spans = [{"start": 0, "end": 11, "labels": ["sentence"]}]
-entities = [{"start": 6, "end": 11, "labels": ["location"]}]
-tagged, stats = SpanAligner.rebuild_tagged_text(text, spans, entities)
-print(tagged)
-# Output: <sentence>Hello <location>World</location></sentence>
-```
-### Rebuild Tagged Text from Task
-Generate tagged text directly from a Label Studio task object.
-```python
-# Assuming 'task' is a Label Studio task object (or similar structure)
-# with .data['text'] and .annotations attributes
-mapping = {"Location": "loc", "Person": "per"}
-tagged_output = SpanAligner.rebuild_tagged_text_from_task(task, mapping)
-print(tagged_output)
-```
-### Get Annotations from Tagged Text
-Extract structured spans and entities from a string with inline tags.
-```python
-tagged_input = "Visit <loc>Paris</loc> and see the <landmark>Eiffel Tower</landmark>."
-annotations = SpanAligner.get_annotations_from_tagged_text(
-    tagged_input,
-    ner_map={"loc": "Location", "landmark": "Location"}
-)
-print(annotations["entities"])
-# Output:
-# [
-#   {"start": 6, "end": 11, "text": "Paris", "labels": ["Location"]},
-#   {"start": 24, "end": 36, "text": "Eiffel Tower", "labels": ["Location"]}
-# ]
-print(annotations["plain_text"])
-# Output: "Visit Paris and see the Eiffel Tower."
-```

span_aligner-0.1.0/span_aligner.egg-info/PKG-INFO DELETED

@@ -1,122 +0,0 @@
-Metadata-Version: 2.4
-Name: span-aligner
-Version: 0.1.0
-Summary: A utility for aligning and mapping text spans between different text representations.
-License: MIT
-Requires-Python: >=3.8
-Description-Content-Type: text/markdown
-License-File: LICENSE
-Requires-Dist: rapidfuzz>=3.0.0
-Provides-Extra: dev
-Requires-Dist: pytest>=7.0.0; extra == "dev"
-Dynamic: license-file
-# Span Aligner
-A utility for aligning and mapping text spans between different text representations, particularly useful for Label Studio annotation compatibility.
-## Features
-- Sanitize span boundaries to avoid special characters.
-- Find exact and fuzzy matches of text segments in original documents.
-- Map spans from one text representation to another.
-- Rebuild tagged text with nested annotations.
-- Merge result objects containing span annotations.
-## Installation
-Install from source:
-```bash
-pip install .
-```
-For development:
-```bash
-pip install -e ".[dev]"
-```
-## Usage
-```python
-from span_aligner import SpanAligner
-original = "Hello, World!"
-result_obj = {
-    "spans": [{"start": 0, "end": 5, "text": "Hello", "labels": ["greeting"]}],
-    "entities": [],
-    "task": {"data": {"text": ""}}
-}
-success, mapped = SpanAligner.map_spans_to_original(original, result_obj)
-print(mapped)
-```
-### Map Tags to Original
-Align annotated spans from a tagged string back to their positions in the original text, keeping the mistakes and original text as written in the original.
-```python
-original_text = "The quick brown fox jumps\n\n over the dog."
-# Imagine the text was slightly modified or translated, but tags are present
-tagged_text = "The <adj>quick</adj> brown fox jumps over the <animal>dog</animal>."
-mapped_tagged_text = SpanAligner.map_tags_to_original(
-    original_text=original_text,
-    tagged_text=tagged_text,
-    min_ratio=0.8
-)
-print(mapped_tagged_text)
-# Output might look like: "The <adj>quick</adj> brown fox jumps\n\n over the <animal>dog</animal>."
-# (If original text differed slightly, tags would be placed on best matching spans)
-```
-### Rebuild Tagged Text
-Reconstruct a string with XML-like tags from raw text and span/entity lists.
-```python
-text = "Hello World"
-spans = [{"start": 0, "end": 11, "labels": ["sentence"]}]
-entities = [{"start": 6, "end": 11, "labels": ["location"]}]
-tagged, stats = SpanAligner.rebuild_tagged_text(text, spans, entities)
-print(tagged)
-# Output: <sentence>Hello <location>World</location></sentence>
-```
-### Rebuild Tagged Text from Task
-Generate tagged text directly from a Label Studio task object.
-```python
-# Assuming 'task' is a Label Studio task object (or similar structure)
-# with .data['text'] and .annotations attributes
-mapping = {"Location": "loc", "Person": "per"}
-tagged_output = SpanAligner.rebuild_tagged_text_from_task(task, mapping)
-print(tagged_output)
-```
-### Get Annotations from Tagged Text
-Extract structured spans and entities from a string with inline tags.
-```python
-tagged_input = "Visit <loc>Paris</loc> and see the <landmark>Eiffel Tower</landmark>."
-annotations = SpanAligner.get_annotations_from_tagged_text(
-    tagged_input,
-    ner_map={"loc": "Location", "landmark": "Location"}
-)
-print(annotations["entities"])
-# Output:
-# [
-#   {"start": 6, "end": 11, "text": "Paris", "labels": ["Location"]},
-#   {"start": 24, "end": 36, "text": "Eiffel Tower", "labels": ["Location"]}
-# ]
-print(annotations["plain_text"])
-# Output: "Visit Paris and see the Eiffel Tower."
-```