struggle-annotator 0.1.0__tar.gz

@@ -0,0 +1 @@
1
+ recursive-include struggle_annotator/frontend/build *
@@ -0,0 +1,196 @@
1
+ Metadata-Version: 2.4
2
+ Name: struggle-annotator
3
+ Version: 0.1.0
4
+ Summary: Streamlit custom component for interactive NER-style text annotation
5
+ License: MIT
6
+ Requires-Python: >=3.9
7
+ Description-Content-Type: text/markdown
8
+ Requires-Dist: streamlit>=1.28
9
+ Provides-Extra: dev
10
+ Requires-Dist: build; extra == "dev"
11
+ Requires-Dist: twine; extra == "dev"
12
+
13
+ # Struggle Annotator
14
+
15
+ A Streamlit custom component for interactive text annotation, useful for NER-style labeling tasks. The Python wrapper is published as `struggle_annotator`; the frontend is built with TypeScript and React per the standard Streamlit Components pattern.
16
+
17
+ ## Installation
18
+
19
+ ```bash
20
+ pip install struggle-annotator
21
+ ```
22
+
23
+ ## Quick Start
24
+
25
+ ```python
26
+ import streamlit as st
27
+ from struggle_annotator import txt_annotator
28
+
29
+ text = (
30
+     "Yesterday, at 3 PM, Emily Johnson and Michael Smith met at the Central Park "
31
+     "in New York to discuss the merger between TechCorp and Global Solutions.\n\n"
32
+     "The deal, worth approximately 500 million dollars, is expected to "
33
+     "significantly impact the tech industry. Later, at 6 PM, they joined a "
34
+     "conference call with the CEO of TechCorp, David Brown, who was in London "
35
+     "for a technology summit. During the call, they discussed the market trends "
36
+     "in Asia and Europe and planned for the next quarterly meeting, which is "
37
+     "scheduled for January 15th, 2024, in Paris."
38
+ )
39
+
40
+ label_dict = {
41
+     "Personal names": {"color": "red"},
42
+     "Organizations": {"color": "blue"},
43
+     "Locations": {"color": "green"},
44
+     "Time": {"color": "orange"},
45
+     "Money": {"color": "purple"},
46
+ }
47
+
48
+ label_dict = txt_annotator(text, label_dict)
49
+ st.json(label_dict)
50
+ ```
51
+
52
+ ## UI Layout
53
+
54
+ The component renders two stacked regions:
55
+
56
+ 1. **Top — Entity legend.** One button per entity, styled with that entity's color. Clicking a button makes it the *active* entity; the active button is visually emphasized (border + slight scale).
57
+ 2. **Bottom — Annotatable document.** The full `text` is shown with any existing annotations highlighted in their entity's color. Below the text, a live status line shows the currently selected span and the active label.
58
+
59
+ ## API
60
+
61
+ ### Signature
62
+
63
+ ```python
64
+ txt_annotator(text: str, label_dict: dict, key: str | None = None) -> dict
65
+ ```
66
+
67
+ ### Parameters
68
+
69
+ - **`text`** (`str`): The raw text to annotate. Offsets are Python `str` indices, counted in Unicode code points rather than UTF-16 code units.
70
+ - **`label_dict`** (`dict[str, dict]`): Defines entities. Each key is the label name; each value must contain:
71
+   - `color` (`str`, required): Any valid CSS color (`"red"`, `"#ff8800"`, `"rgb(0, 128, 255)"`).
72
+   - `annotation` (`list[dict]`, optional): Pre-existing spans rendered on load. Each entry has the shape `{"start": int, "end": int, "value": str}`. If omitted, it is initialized to `[]`.
73
+ - **`key`** (`str`, optional): Standard Streamlit component key. Required if you render multiple annotators on the same page.
74
+
75
+ ### Returns
76
+
77
+ The same `label_dict` shape, with every entity's `annotation` list reflecting the current state of the UI. The function returns on every interaction (Streamlit's standard component re-run model), so the latest annotations are always available after the call.
78
+
79
+ ### Offsets
80
+
81
+ - Half-open interval `[start, end)`, matching Python slicing: `text[start:end] == value`.
82
+ - Indices are over the raw `text` string, including newlines and whitespace.
83
+
84
+ ## Annotation Workflow
85
+
86
+ 1. The user clicks an entity button in the legend. That entity becomes active.
87
+ 2. The user selects a span of text with the mouse.
88
+ 3. On `mouseup`, leading/trailing whitespace is trimmed from the selection. If the trimmed span is empty, the selection is ignored.
89
+ 4. The trimmed span is highlighted in the active entity's color and appended to that entity's `annotation` list.
90
+ 5. The status line below the text updates to show the selected text and label.
91
+ 6. To remove an annotation, the user clicks an existing highlighted span (a single click with no drag). The highlight is removed and the corresponding entry is deleted from that entity's `annotation` list.
92
+
93
+ If text is selected while **no** entity is active, the selection is ignored and a hint is shown in the status line ("Select an entity first").
94
+
95
+ ### Overlap Policy
96
+
97
+ New spans that overlap an existing annotation are **rejected**, and a brief warning is shown in the status line. This avoids ambiguous nested annotations in v1. (Allowing nesting or replacement is out of scope for the initial release; see *Non-goals* below.)
98
+
99
+ ### Click vs. Drag
100
+
101
+ A click on a highlighted span is interpreted as "remove" only when the `mousedown` and `mouseup` positions are within the same span and no selection range was produced. Any drag that produces a non-empty selection is treated as a new annotation attempt, never as a remove.
102
+
103
+ ## State Model
104
+
105
+ The component uses Streamlit's standard component value mechanism. Internally, the frontend keeps its own annotation state and sends the updated `label_dict` back to Python on every change. Streamlit re-runs the script with the new return value; no `st.session_state` plumbing is required from the caller.
106
+
107
+ `st.session_state` keeps values only for the current session, which ends when the page is reloaded. To persist annotations across page reloads or sessions, write the returned `label_dict` to disk or another external store yourself.
108
+
109
+ ## Data Examples
110
+
111
+ ### Input
112
+
113
+ ```python
114
+ label_dict = {
115
+     "Personal names": {"color": "red"},
116
+     "Organizations": {"color": "blue"},
117
+     "Locations": {"color": "green"},
118
+     "Time": {"color": "orange"},
119
+     "Money": {"color": "purple"},
120
+ }
121
+ ```
122
+
123
+ ### Output (after annotation)
124
+
125
+ ```python
126
+ {
127
+     "Personal names": {
128
+         "color": "red",
129
+         "annotation": [
130
+             {"start": 20, "end": 33, "value": "Emily Johnson"},
131
+             {"start": 38, "end": 51, "value": "Michael Smith"},
132
+             {"start": 328, "end": 339, "value": "David Brown"},
133
+         ],
134
+     },
135
+     "Organizations": {
136
+         "color": "blue",
137
+         "annotation": [
138
+             {"start": 118, "end": 126, "value": "TechCorp"},
139
+             {"start": 131, "end": 147, "value": "Global Solutions"},
140
+         ],
141
+     },
142
+     "Locations": {
143
+         "color": "green",
144
+         "annotation": [
145
+             {"start": 63, "end": 75, "value": "Central Park"},
146
+             {"start": 79, "end": 87, "value": "New York"},
147
+             {"start": 352, "end": 358, "value": "London"},
148
+             {"start": 437, "end": 441, "value": "Asia"},
149
+             {"start": 446, "end": 452, "value": "Europe"},
150
+             {"start": 543, "end": 548, "value": "Paris"},
151
+         ],
152
+     },
153
+     "Time": {
154
+         "color": "orange",
155
+         "annotation": [
156
+             {"start": 0, "end": 9, "value": "Yesterday"},
157
+             {"start": 14, "end": 18, "value": "3 PM"},
158
+             {"start": 266, "end": 270, "value": "6 PM"},
159
+             {"start": 520, "end": 532, "value": "January 15th"},
160
+             {"start": 534, "end": 538, "value": "2024"},
161
+         ],
162
+     },
163
+     "Money": {
164
+         "color": "purple",
165
+         "annotation": [
166
+             {"start": 180, "end": 199, "value": "500 million dollars"},
167
+         ],
168
+     },
169
+ }
170
+ ```
171
+
172
+ Annotations within each entity are sorted by `start` ascending. Key order within each annotation is `start`, `end`, `value`.
173
+
174
+ ## Non-goals (v1)
175
+
176
+ - Nested or overlapping annotations.
177
+ - Relation annotation between spans.
178
+ - Multi-document workflows or document navigation.
179
+ - Keyboard shortcuts (planned for a future release).
180
+ - Annotation history / undo-redo beyond the most recent action.
181
+
182
+ ## Development
183
+
184
+ The frontend lives in `frontend/` (TypeScript + React, built with Vite). The Python wrapper in `struggle_annotator/__init__.py` declares the component via `streamlit.components.v1.declare_component` and re-exports `txt_annotator`.
185
+
186
+ ```bash
187
+ # Frontend dev
188
+ cd frontend
189
+ npm install
190
+ npm run dev
191
+
192
+ # Python (editable install)
193
+ pip install -e .
194
+ ```
195
+
196
+ Set `_RELEASE = False` in `struggle_annotator/__init__.py` during local development to point at the Vite dev server.
@@ -0,0 +1,184 @@
1
+ # Struggle Annotator
2
+
3
+ A Streamlit custom component for interactive text annotation, useful for NER-style labeling tasks. The Python wrapper is published as `struggle_annotator`; the frontend is built with TypeScript and React per the standard Streamlit Components pattern.
4
+
5
+ ## Installation
6
+
7
+ ```bash
8
+ pip install struggle-annotator
9
+ ```
10
+
11
+ ## Quick Start
12
+
13
+ ```python
14
+ import streamlit as st
15
+ from struggle_annotator import txt_annotator
16
+
17
+ text = (
18
+     "Yesterday, at 3 PM, Emily Johnson and Michael Smith met at the Central Park "
19
+     "in New York to discuss the merger between TechCorp and Global Solutions.\n\n"
20
+     "The deal, worth approximately 500 million dollars, is expected to "
21
+     "significantly impact the tech industry. Later, at 6 PM, they joined a "
22
+     "conference call with the CEO of TechCorp, David Brown, who was in London "
23
+     "for a technology summit. During the call, they discussed the market trends "
24
+     "in Asia and Europe and planned for the next quarterly meeting, which is "
25
+     "scheduled for January 15th, 2024, in Paris."
26
+ )
27
+
28
+ label_dict = {
29
+     "Personal names": {"color": "red"},
30
+     "Organizations": {"color": "blue"},
31
+     "Locations": {"color": "green"},
32
+     "Time": {"color": "orange"},
33
+     "Money": {"color": "purple"},
34
+ }
35
+
36
+ label_dict = txt_annotator(text, label_dict)
37
+ st.json(label_dict)
38
+ ```
39
+
40
+ ## UI Layout
41
+
42
+ The component renders two stacked regions:
43
+
44
+ 1. **Top — Entity legend.** One button per entity, styled with that entity's color. Clicking a button makes it the *active* entity; the active button is visually emphasized (border + slight scale).
45
+ 2. **Bottom — Annotatable document.** The full `text` is shown with any existing annotations highlighted in their entity's color. Below the text, a live status line shows the currently selected span and the active label.
46
+
47
+ ## API
48
+
49
+ ### Signature
50
+
51
+ ```python
52
+ txt_annotator(text: str, label_dict: dict, key: str | None = None) -> dict
53
+ ```
54
+
55
+ ### Parameters
56
+
57
+ - **`text`** (`str`): The raw text to annotate. Offsets are Python `str` indices, counted in Unicode code points rather than UTF-16 code units.
58
+ - **`label_dict`** (`dict[str, dict]`): Defines entities. Each key is the label name; each value must contain:
59
+   - `color` (`str`, required): Any valid CSS color (`"red"`, `"#ff8800"`, `"rgb(0, 128, 255)"`).
60
+   - `annotation` (`list[dict]`, optional): Pre-existing spans rendered on load. Each entry has the shape `{"start": int, "end": int, "value": str}`. If omitted, it is initialized to `[]`.
61
+ - **`key`** (`str`, optional): Standard Streamlit component key. Required if you render multiple annotators on the same page.
62
+
63
+ ### Returns
64
+
65
+ The same `label_dict` shape, with every entity's `annotation` list reflecting the current state of the UI. The function returns on every interaction (Streamlit's standard component re-run model), so the latest annotations are always available after the call.
66
+
67
+ ### Offsets
68
+
69
+ - Half-open interval `[start, end)`, matching Python slicing: `text[start:end] == value`.
70
+ - Indices are over the raw `text` string, including newlines and whitespace.
71
+
72
+ ## Annotation Workflow
73
+
74
+ 1. The user clicks an entity button in the legend. That entity becomes active.
75
+ 2. The user selects a span of text with the mouse.
76
+ 3. On `mouseup`, leading/trailing whitespace is trimmed from the selection. If the trimmed span is empty, the selection is ignored.
77
+ 4. The trimmed span is highlighted in the active entity's color and appended to that entity's `annotation` list.
78
+ 5. The status line below the text updates to show the selected text and label.
79
+ 6. To remove an annotation, the user clicks an existing highlighted span (a single click with no drag). The highlight is removed and the corresponding entry is deleted from that entity's `annotation` list.
80
+
81
+ If text is selected while **no** entity is active, the selection is ignored and a hint is shown in the status line ("Select an entity first").
82
+
83
+ ### Overlap Policy
84
+
85
+ New spans that overlap an existing annotation are **rejected**, and a brief warning is shown in the status line. This avoids ambiguous nested annotations in v1. (Allowing nesting or replacement is out of scope for the initial release; see *Non-goals* below.)
86
+
87
+ ### Click vs. Drag
88
+
89
+ A click on a highlighted span is interpreted as "remove" only when the `mousedown` and `mouseup` positions are within the same span and no selection range was produced. Any drag that produces a non-empty selection is treated as a new annotation attempt, never as a remove.
90
+
91
+ ## State Model
92
+
93
+ The component uses Streamlit's standard component value mechanism. Internally, the frontend keeps its own annotation state and sends the updated `label_dict` back to Python on every change. Streamlit re-runs the script with the new return value; no `st.session_state` plumbing is required from the caller.
94
+
95
+ `st.session_state` keeps values only for the current session, which ends when the page is reloaded. To persist annotations across page reloads or sessions, write the returned `label_dict` to disk or another external store yourself.
96
+
97
+ ## Data Examples
98
+
99
+ ### Input
100
+
101
+ ```python
102
+ label_dict = {
103
+     "Personal names": {"color": "red"},
104
+     "Organizations": {"color": "blue"},
105
+     "Locations": {"color": "green"},
106
+     "Time": {"color": "orange"},
107
+     "Money": {"color": "purple"},
108
+ }
109
+ ```
110
+
111
+ ### Output (after annotation)
112
+
113
+ ```python
114
+ {
115
+     "Personal names": {
116
+         "color": "red",
117
+         "annotation": [
118
+             {"start": 20, "end": 33, "value": "Emily Johnson"},
119
+             {"start": 38, "end": 51, "value": "Michael Smith"},
120
+             {"start": 328, "end": 339, "value": "David Brown"},
121
+         ],
122
+     },
123
+     "Organizations": {
124
+         "color": "blue",
125
+         "annotation": [
126
+             {"start": 118, "end": 126, "value": "TechCorp"},
127
+             {"start": 131, "end": 147, "value": "Global Solutions"},
128
+         ],
129
+     },
130
+     "Locations": {
131
+         "color": "green",
132
+         "annotation": [
133
+             {"start": 63, "end": 75, "value": "Central Park"},
134
+             {"start": 79, "end": 87, "value": "New York"},
135
+             {"start": 352, "end": 358, "value": "London"},
136
+             {"start": 437, "end": 441, "value": "Asia"},
137
+             {"start": 446, "end": 452, "value": "Europe"},
138
+             {"start": 543, "end": 548, "value": "Paris"},
139
+         ],
140
+     },
141
+     "Time": {
142
+         "color": "orange",
143
+         "annotation": [
144
+             {"start": 0, "end": 9, "value": "Yesterday"},
145
+             {"start": 14, "end": 18, "value": "3 PM"},
146
+             {"start": 266, "end": 270, "value": "6 PM"},
147
+             {"start": 520, "end": 532, "value": "January 15th"},
148
+             {"start": 534, "end": 538, "value": "2024"},
149
+         ],
150
+     },
151
+     "Money": {
152
+         "color": "purple",
153
+         "annotation": [
154
+             {"start": 180, "end": 199, "value": "500 million dollars"},
155
+         ],
156
+     },
157
+ }
158
+ ```
159
+
160
+ Annotations within each entity are sorted by `start` ascending. Key order within each annotation is `start`, `end`, `value`.
161
+
162
+ ## Non-goals (v1)
163
+
164
+ - Nested or overlapping annotations.
165
+ - Relation annotation between spans.
166
+ - Multi-document workflows or document navigation.
167
+ - Keyboard shortcuts (planned for a future release).
168
+ - Annotation history / undo-redo beyond the most recent action.
169
+
170
+ ## Development
171
+
172
+ The frontend lives in `frontend/` (TypeScript + React, built with Vite). The Python wrapper in `struggle_annotator/__init__.py` declares the component via `streamlit.components.v1.declare_component` and re-exports `txt_annotator`.
173
+
174
+ ```bash
175
+ # Frontend dev
176
+ cd frontend
177
+ npm install
178
+ npm run dev
179
+
180
+ # Python (editable install)
181
+ pip install -e .
182
+ ```
183
+
184
+ Set `_RELEASE = False` in `struggle_annotator/__init__.py` during local development to point at the Vite dev server.
@@ -0,0 +1,22 @@
1
+ [build-system]
2
+ requires = ["setuptools>=68", "wheel"]
3
+ build-backend = "setuptools.build_meta"
4
+
5
+ [project]
6
+ name = "struggle-annotator"
7
+ version = "0.1.0"
8
+ description = "Streamlit custom component for interactive NER-style text annotation"
9
+ readme = "README.md"
10
+ requires-python = ">=3.9"
11
+ license = { text = "MIT" }
12
+ dependencies = ["streamlit>=1.28"]
13
+
14
+ [project.optional-dependencies]
15
+ dev = ["build", "twine"]
16
+
17
+ [tool.setuptools.packages.find]
18
+ where = ["."]
19
+ include = ["struggle_annotator*"]
20
+
21
+ [tool.setuptools.package-data]
22
+ struggle_annotator = ["frontend/build/**/*"]
@@ -0,0 +1,4 @@
1
+ [egg_info]
2
+ tag_build =
3
+ tag_date = 0
4
+
@@ -0,0 +1,48 @@
1
+ import os
2
+ import streamlit.components.v1 as components
3
+
4
+ # Set to False during local frontend development (points at Vite dev server).
5
+ _RELEASE = True
6
+
7
+ if not _RELEASE:
8
+     _component_func = components.declare_component(
9
+         "txt_annotator",
10
+         url="http://localhost:5173",
11
+     )
12
+ else:
13
+     _build_dir = os.path.join(
14
+         os.path.dirname(os.path.abspath(__file__)), "frontend", "build"
15
+     )
16
+     _component_func = components.declare_component("txt_annotator", path=_build_dir)
17
+
18
+
19
+ def txt_annotator(text: str, label_dict: dict, key: "str | None" = None) -> dict:
20
+     """Interactive NER-style text annotation component.
21
+
22
+     Parameters
23
+     ----------
24
+     text:
25
+         The raw text to annotate. Offsets are code-point–based Python str indices.
26
+     label_dict:
27
+         Entity definitions. Keys are label names; values must contain ``color``
28
+         (any valid CSS color) and optionally ``annotation`` (list of
29
+         ``{"start": int, "end": int, "value": str}`` dicts).
30
+     key:
31
+         Standard Streamlit component key; required when rendering multiple
32
+         annotators on the same page.
33
+
34
+     Returns
35
+     -------
36
+     dict
37
+         Updated ``label_dict`` with each entity's ``annotation`` list reflecting
38
+         the current UI state. Annotations within each entity are sorted by ``start``.
39
+     """
40
+     normalised = {
41
+         label: {
42
+             "color": cfg["color"],
43
+             "annotation": list(cfg.get("annotation", [])),
44
+         }
45
+         for label, cfg in label_dict.items()
46
+     }
47
+     result = _component_func(text=text, label_dict=normalised, key=key, default=normalised)
48
+     return result if result is not None else normalised
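
The README's offset rules (`text[start:end] == value`, annotations sorted by `start`, no overlapping spans) can be checked on the Python side after `txt_annotator` returns. The following is a minimal sketch and is not part of the package: `validate_label_dict` is a hypothetical helper, it assumes the returned dict has the documented shape, and it assumes the overlap policy applies across all entities rather than only within one.

```python
from typing import Dict, List, Tuple


def validate_label_dict(text: str, label_dict: Dict[str, dict]) -> List[str]:
    """Return human-readable problems; an empty list means the dict is consistent.

    Hypothetical helper for illustration, not part of struggle_annotator.
    """
    problems: List[str] = []
    spans: List[Tuple[int, int, str]] = []  # (start, end, label) across all entities

    for label, cfg in label_dict.items():
        previous_start = -1
        for ann in cfg.get("annotation", []):
            start, end, value = ann["start"], ann["end"], ann["value"]
            # Half-open interval [start, end) must lie inside the text.
            if not (0 <= start < end <= len(text)):
                problems.append(f"{label}: span [{start}, {end}) is out of range")
                continue
            # The stored value must equal the Python slice.
            if text[start:end] != value:
                problems.append(
                    f"{label}: text[{start}:{end}] is {text[start:end]!r}, expected {value!r}"
                )
            if start < previous_start:
                problems.append(f"{label}: annotations are not sorted by start")
            previous_start = start
            spans.append((start, end, label))

    # After sorting by start, any overlap shows up between some pair of neighbours,
    # so this reports at least one overlapping pair whenever an overlap exists.
    spans.sort()
    for (s1, e1, l1), (s2, e2, l2) in zip(spans, spans[1:]):
        if s2 < e1:
            problems.append(f"{l1} [{s1}, {e1}) overlaps {l2} [{s2}, {e2})")
    return problems


# Small example; offsets count both characters of the "\n\n" paragraph break.
sample_text = "Emily met Bob.\n\nBob left."
good = {
    "Names": {
        "color": "red",
        "annotation": [
            {"start": 0, "end": 5, "value": "Emily"},
            {"start": 16, "end": 19, "value": "Bob"},
        ],
    }
}
# Off by one after the paragraph break: the slice no longer matches the value.
bad = {"Names": {"color": "red", "annotation": [{"start": 15, "end": 18, "value": "Bob"}]}}

print(validate_label_dict(sample_text, good))
print(validate_label_dict(sample_text, bad))
```

A check like this is cheap to run on every rerun and is most useful for catching offsets that drift by one after a paragraph break, where the two characters of `"\n\n"` are easy to miscount.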