adaptive-oci-chunking 0.1.0__tar.gz

Files changed (29)
  1. adaptive_oci_chunking-0.1.0/.env.example +8 -0
  2. adaptive_oci_chunking-0.1.0/.github/workflows/ci.yml +21 -0
  3. adaptive_oci_chunking-0.1.0/.gitignore +13 -0
  4. adaptive_oci_chunking-0.1.0/Architecture.png +0 -0
  5. adaptive_oci_chunking-0.1.0/CONTRIBUTING.md +105 -0
  6. adaptive_oci_chunking-0.1.0/LICENSE +21 -0
  7. adaptive_oci_chunking-0.1.0/PKG-INFO +343 -0
  8. adaptive_oci_chunking-0.1.0/README.md +298 -0
  9. adaptive_oci_chunking-0.1.0/examples/basic_adaptive_chunking.py +36 -0
  10. adaptive_oci_chunking-0.1.0/examples/custom_selector.py +54 -0
  11. adaptive_oci_chunking-0.1.0/examples/langchain_integration.py +25 -0
  12. adaptive_oci_chunking-0.1.0/examples/llama_index_integration.py +23 -0
  13. adaptive_oci_chunking-0.1.0/examples/oci_object_storage.py +21 -0
  14. adaptive_oci_chunking-0.1.0/examples/sample.md +12 -0
  15. adaptive_oci_chunking-0.1.0/pyproject.toml +66 -0
  16. adaptive_oci_chunking-0.1.0/src/adaptive_chunking/__init__.py +5 -0
  17. adaptive_oci_chunking-0.1.0/src/adaptive_chunking/api.py +40 -0
  18. adaptive_oci_chunking-0.1.0/src/adaptive_chunking/chunkers.py +342 -0
  19. adaptive_oci_chunking-0.1.0/src/adaptive_chunking/cli.py +62 -0
  20. adaptive_oci_chunking-0.1.0/src/adaptive_chunking/io.py +24 -0
  21. adaptive_oci_chunking-0.1.0/src/adaptive_chunking/langchain.py +33 -0
  22. adaptive_oci_chunking-0.1.0/src/adaptive_chunking/llama_index.py +74 -0
  23. adaptive_oci_chunking-0.1.0/src/adaptive_chunking/metrics.py +287 -0
  24. adaptive_oci_chunking-0.1.0/src/adaptive_chunking/models.py +61 -0
  25. adaptive_oci_chunking-0.1.0/src/adaptive_chunking/oci.py +73 -0
  26. adaptive_oci_chunking-0.1.0/src/adaptive_chunking/pipeline.py +25 -0
  27. adaptive_oci_chunking-0.1.0/src/adaptive_chunking/selector.py +30 -0
  28. adaptive_oci_chunking-0.1.0/src/adaptive_chunking/text.py +71 -0
  29. adaptive_oci_chunking-0.1.0/tests/test_adaptive_chunking.py +186 -0
@@ -0,0 +1,8 @@
1
+ OCI_CONFIG_FILE=~/.oci/config
2
+ OCI_PROFILE=DEFAULT
3
+ OCI_COMPARTMENT_ID=ocid1.compartment.oc1..example
4
+ OCI_GENAI_ENDPOINT=https://inference.generativeai.us-chicago-1.oci.oraclecloud.com
5
+ OCI_GENAI_EMBEDDING_MODEL=cohere.embed-english-v3.0
6
+ OCI_OBJECT_STORAGE_NAMESPACE=my-namespace
7
+ OCI_OBJECT_STORAGE_BUCKET=my-bucket
8
+
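The variables in `.env.example` can be read with the standard library alone. The sketch below is an assumption about how a caller might consume them, not the package's own loader: `OCISettings` and `load_oci_settings` are hypothetical names, and the two defaults simply mirror the example file.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class OCISettings:
    """Hypothetical container for the values named in .env.example."""

    config_file: str
    profile: str
    compartment_id: Optional[str]
    genai_endpoint: Optional[str]
    embedding_model: Optional[str]
    namespace: Optional[str]
    bucket: Optional[str]


def load_oci_settings(env: Optional[Mapping[str, str]] = None) -> OCISettings:
    """Read OCI settings from a mapping; falls back to os.environ when none is given."""
    env = os.environ if env is None else env
    return OCISettings(
        # Expand "~" so the path can be passed to code that expects a real path.
        config_file=os.path.expanduser(env.get("OCI_CONFIG_FILE", "~/.oci/config")),
        profile=env.get("OCI_PROFILE", "DEFAULT"),
        compartment_id=env.get("OCI_COMPARTMENT_ID"),
        genai_endpoint=env.get("OCI_GENAI_ENDPOINT"),
        embedding_model=env.get("OCI_GENAI_EMBEDDING_MODEL"),
        namespace=env.get("OCI_OBJECT_STORAGE_NAMESPACE"),
        bucket=env.get("OCI_OBJECT_STORAGE_BUCKET"),
    )
```

Passing an explicit mapping instead of relying on `os.environ` keeps the function easy to test without touching the real environment.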
@@ -0,0 +1,21 @@
1
+ name: CI
2
+
3
+ on:
4
+   push:
5
+   pull_request:
6
+
7
+ jobs:
8
+   test:
9
+     runs-on: ubuntu-latest
10
+     strategy:
11
+       matrix:
12
+         python-version: ["3.10", "3.11", "3.12"]
13
+     steps:
14
+       - uses: actions/checkout@v4
15
+       - uses: actions/setup-python@v5
16
+         with:
17
+           python-version: ${{ matrix.python-version }}
18
+       - run: python -m pip install --upgrade pip
19
+       - run: pip install -e ".[dev]"
20
+       - run: ruff check .
21
+       - run: pytest
@@ -0,0 +1,13 @@
1
+ .venv/
2
+ __pycache__/
3
+ *.py[cod]
4
+ .pytest_cache/
5
+ .ruff_cache/
6
+ .mypy_cache/
7
+ .coverage
8
+ dist/
9
+ build/
10
+ *.egg-info/
11
+ .env
12
+ .DS_Store
13
+
@@ -0,0 +1,105 @@
1
+ # Contributing
2
+
3
+ Thanks for helping improve Adaptive OCI Chunking. This project is intended to be useful for practitioners building RAG systems, researchers testing chunking strategies, and maintainers who want a clean place to compare document-splitting ideas.
4
+
5
+ Maintainer: [Yash Shukla](https://www.linkedin.com/in/yashtechi/), focused on AI, cloud, and RAG systems.
6
+
7
+ ## Ways to Contribute
8
+
9
+ - Add new chunkers for specific document structures or domains.
10
+ - Improve intrinsic metrics or add new evaluation dimensions.
11
+ - Add examples for LangChain, LlamaIndex, OCI, or other RAG workflows.
12
+ - Improve tests, documentation, type hints, and packaging.
13
+ - Report bugs with small reproducible examples.
14
+ - Share benchmark results from real document collections.
15
+
16
+ ## Development Setup
17
+
18
+ Clone the repo, create a virtual environment, and install the package in editable mode. On Windows:
19
+
20
+ ```bash
21
+ python -m venv .venv
22
+ .venv\Scripts\activate
23
+ pip install -e ".[dev]"
24
+ ```
25
+
26
+ On macOS or Linux:
27
+
28
+ ```bash
29
+ python -m venv .venv
30
+ source .venv/bin/activate
31
+ pip install -e ".[dev]"
32
+ ```
33
+
34
+ If you only want optional integration support:
35
+
36
+ ```bash
37
+ pip install -e ".[langchain,llama-index,oci,api]"
38
+ ```
39
+
40
+ ## Running Checks
41
+
42
+ ```bash
43
+ ruff check .
44
+ pytest
45
+ python -m compileall src tests examples
46
+ ```
47
+
48
+ If you add an example, make sure it either runs without optional credentials or clearly documents the required environment variables.
49
+
50
+ ## Adding a Chunker
51
+
52
+ Chunkers live in `src/adaptive_chunking/chunkers.py` and implement `BaseChunker`.
53
+
54
+ A good chunker should:
55
+
56
+ - Preserve source order.
57
+ - Return non-empty `Chunk` objects with stable `start_char` and `end_char` spans.
58
+ - Avoid silently dropping text.
59
+ - Fall back gracefully when its preferred structure is not present.
60
+ - Include focused tests in `tests/test_adaptive_chunking.py`.
61
+ - Be added to `default_chunkers()` only if it is broadly useful.
62
+
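The first three expectations in the list above (source order, stable spans, no dropped text) can be checked mechanically. The following is a minimal sketch under stated assumptions: `Chunk` here is a stand-in dataclass with only the fields named above, not the package's real model, and `check_chunks` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Chunk:
    """Stand-in for the package's Chunk model (fields assumed from the list above)."""

    text: str
    start_char: int
    end_char: int


def check_chunks(source: str, chunks: List[Chunk]) -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems: List[str] = []
    covered = [False] * len(source)
    previous_start = -1
    for i, chunk in enumerate(chunks):
        if not chunk.text.strip():
            problems.append(f"chunk {i} is empty")
        if chunk.start_char < previous_start:
            problems.append(f"chunk {i} is out of source order")
        previous_start = chunk.start_char
        # A stable span must slice back to exactly the chunk's text.
        if source[chunk.start_char:chunk.end_char] != chunk.text:
            problems.append(f"chunk {i} span does not match its text")
        for pos in range(max(chunk.start_char, 0), min(chunk.end_char, len(source))):
            covered[pos] = True
    # Whitespace between chunks may be skipped; any other uncovered character was dropped.
    dropped = [ch for ch, seen in zip(source, covered) if not seen and not ch.isspace()]
    if dropped:
        problems.append(f"{len(dropped)} non-whitespace characters were dropped")
    return problems
```

Overlapping chunks are allowed by this check, since only the start offsets must be non-decreasing.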
63
+ ## Adding a Metric
64
+
65
+ Metrics live in `src/adaptive_chunking/metrics.py`.
66
+
67
+ A good metric should:
68
+
69
+ - Return a bounded score from `0.0` to `1.0`.
70
+ - Be explainable from document and chunk structure alone.
71
+ - Have a default weight in `MetricWeights`.
72
+ - Include an explanation string in `IntrinsicMetricEvaluator.evaluate`.
73
+ - Include tests for normal and edge cases.
74
+
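As an illustration of the "bounded and explainable" requirements above, here is a toy metric that depends only on chunk lengths. It is a sketch, not the package's Size Compliance implementation: the function name, signature, and default bounds are assumptions.

```python
from typing import Sequence


def size_compliance(
    chunk_texts: Sequence[str], min_size: int = 200, max_size: int = 1800
) -> float:
    """Fraction of chunks whose length lies in [min_size, max_size]; always within 0.0 to 1.0."""
    if not chunk_texts:
        # Edge case: with no chunks there is nothing compliant, so score the worst value.
        return 0.0
    compliant = sum(1 for text in chunk_texts if min_size <= len(text) <= max_size)
    return compliant / len(chunk_texts)
```

Because the result is a simple ratio, it stays in range without clamping, and the empty-input case is handled explicitly rather than raising a division error.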
75
+ ## Pull Request Checklist
76
+
77
+ Before opening a PR:
78
+
79
+ - Run `ruff check .`.
80
+ - Run `pytest`.
81
+ - Run `python -m compileall src tests examples`.
82
+ - Update README or examples when behavior changes.
83
+ - Add or update tests for code changes.
84
+ - Keep changes focused on one concern where possible.
85
+
86
+ ## Design Principles
87
+
88
+ - The core package should stay dependency-light.
89
+ - Optional integrations should import their heavy dependencies only when used.
90
+ - Chunking behavior should be inspectable and explainable.
91
+ - Metrics should help users understand tradeoffs, not hide them behind a black box.
92
+ - OCI support should remain optional.
93
+
94
+ ## Reporting Issues
95
+
96
+ Please include:
97
+
98
+ - Python version.
99
+ - Installation command.
100
+ - Minimal input text or document shape.
101
+ - Expected chunking behavior.
102
+ - Actual chunking behavior.
103
+ - Any traceback or metric output.
104
+
105
+ For private or sensitive documents, replace content with a synthetic example that preserves the relevant structure.
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Yash Shukla
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,343 @@
1
+ Metadata-Version: 2.4
2
+ Name: adaptive-oci-chunking
3
+ Version: 0.1.0
4
+ Summary: Adaptive document chunking for RAG with optional Oracle Cloud Infrastructure integrations.
5
+ Project-URL: Repository, https://github.com/CaptnSalazar/adaptive-oci-chunking
6
+ Project-URL: LinkedIn, https://www.linkedin.com/in/yashtechi/
7
+ Author: Yash Shukla
8
+ License-Expression: MIT
9
+ License-File: LICENSE
10
+ Keywords: adaptive-chunking,chunking,document-ai,genai,oci,rag
11
+ Classifier: Development Status :: 3 - Alpha
12
+ Classifier: Intended Audience :: Developers
13
+ Classifier: License :: OSI Approved :: MIT License
14
+ Classifier: Programming Language :: Python :: 3
15
+ Classifier: Programming Language :: Python :: 3.10
16
+ Classifier: Programming Language :: Python :: 3.11
17
+ Classifier: Programming Language :: Python :: 3.12
18
+ Classifier: Topic :: Text Processing
19
+ Requires-Python: >=3.10
20
+ Requires-Dist: pydantic>=2.7
21
+ Requires-Dist: rich>=13.7
22
+ Requires-Dist: typer>=0.12
23
+ Provides-Extra: api
24
+ Requires-Dist: fastapi>=0.111; extra == 'api'
25
+ Requires-Dist: uvicorn[standard]>=0.30; extra == 'api'
26
+ Provides-Extra: dev
27
+ Requires-Dist: fastapi>=0.111; extra == 'dev'
28
+ Requires-Dist: langchain-text-splitters>=0.2; extra == 'dev'
29
+ Requires-Dist: llama-index-core>=0.10; extra == 'dev'
30
+ Requires-Dist: mypy>=1.10; extra == 'dev'
31
+ Requires-Dist: oci>=2.130; extra == 'dev'
32
+ Requires-Dist: pytest>=8.0; extra == 'dev'
33
+ Requires-Dist: ruff>=0.5; extra == 'dev'
34
+ Requires-Dist: uvicorn[standard]>=0.30; extra == 'dev'
35
+ Provides-Extra: integrations
36
+ Requires-Dist: langchain-text-splitters>=0.2; extra == 'integrations'
37
+ Requires-Dist: llama-index-core>=0.10; extra == 'integrations'
38
+ Provides-Extra: langchain
39
+ Requires-Dist: langchain-text-splitters>=0.2; extra == 'langchain'
40
+ Provides-Extra: llama-index
41
+ Requires-Dist: llama-index-core>=0.10; extra == 'llama-index'
42
+ Provides-Extra: oci
43
+ Requires-Dist: oci>=2.130; extra == 'oci'
44
+ Description-Content-Type: text/markdown
45
+
46
+ <div align="center">
47
+
48
+ # Adaptive OCI Chunking
49
+
50
+ **Adaptive chunking toolkit for RAG with OCI, LangChain, and LlamaIndex support**
51
+
52
+ [![CI](https://github.com/CaptnSalazar/adaptive-oci-chunking/actions/workflows/ci.yml/badge.svg)](https://github.com/CaptnSalazar/adaptive-oci-chunking/actions/workflows/ci.yml)
53
+ [![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)
54
+ [![Python 3.10+](https://img.shields.io/badge/Python-3.10%2B-blue.svg)](https://www.python.org/downloads/)
55
+ [![arXiv](https://img.shields.io/badge/arXiv-2603.25333-b31b1b.svg)](https://arxiv.org/abs/2603.25333)
56
+
57
+ </div>
58
+
59
+ Adaptive OCI Chunking is an extensible Python implementation for document-aware chunk selection in Retrieval-Augmented Generation (RAG). It is inspired by Ekimetrics' `adaptive-chunking` repository and the paper _Adaptive Chunking: Optimizing Chunking-Method Selection for RAG_.
60
+
61
+ The package evaluates several chunking strategies for each document, scores them with intrinsic metrics, and selects the best candidate before indexing or generation. Oracle Cloud Infrastructure (OCI) integrations are optional: the core chunking engine runs locally, while OCI Object Storage and Generative AI can be enabled when needed.
62
+
63
+ ## Architecture
64
+
65
+ ![Adaptive OCI Chunking architecture](Architecture.png)
66
+
67
+ ## What is Adaptive Chunking?
68
+
69
+ No single chunking method works best for every document in a RAG pipeline. Adaptive chunking treats chunking as a selection problem: try multiple splitting strategies, score each result with intrinsic quality metrics, and choose the best candidate for the document at hand.
70
+
71
+ This repo builds on that idea as a practical toolkit. It keeps the core dependency-light, adds extra production-oriented metrics, and includes optional adapters for OCI, LangChain, and LlamaIndex.
72
+
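The selection idea described above can be sketched in a few lines of plain Python. Everything here is illustrative and assumed for the example: the two toy chunkers, the single toy metric, and the `select_best` helper are not the package's API.

```python
from typing import Callable, Dict, List, Tuple

Chunker = Callable[[str], List[str]]


def fixed_window(text: str, size: int = 40) -> List[str]:
    """Toy chunker: cut the text every `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def by_paragraph(text: str) -> List[str]:
    """Toy chunker: split on blank lines and skip empty pieces."""
    return [part for part in text.split("\n\n") if part.strip()]


def boundary_score(chunks: List[str]) -> float:
    """Toy intrinsic metric: share of chunks ending in sentence-final punctuation."""
    if not chunks:
        return 0.0
    clean = sum(1 for chunk in chunks if chunk.rstrip().endswith((".", "!", "?")))
    return clean / len(chunks)


def select_best(
    text: str,
    chunkers: Dict[str, Chunker],
    score: Callable[[List[str]], float],
) -> Tuple[str, float]:
    """Run every candidate chunker, score its output, and return the top (name, score)."""
    scored = [(name, score(chunker(text))) for name, chunker in chunkers.items()]
    return max(scored, key=lambda item: item[1])
```

With a two-paragraph input, the paragraph splitter wins because the fixed window cuts through the middle of a sentence.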
73
+ ## Features
74
+
75
+ - Candidate chunkers:
76
+   - single-document
77
+   - fixed window with overlap
78
+   - recursive split
79
+   - split-then-merge
80
+   - section-aware
81
+   - delimiter-aware
82
+   - page-aware
83
+   - semantic lexical drift
84
+   - regex-guided section splitting
85
+ - Metric-guided selection using paper-aligned intrinsic metrics:
86
+   - References Completeness (RC)
87
+   - Intrachunk Cohesion (ICC)
88
+   - Document Contextual Coherence (DCC)
89
+   - Block Integrity (BI)
90
+   - Size Compliance (SC)
91
+ - Additional practical metrics:
92
+   - source coverage
93
+   - overlap control
94
+   - boundary quality
95
+   - semantic drift
96
+   - information density
97
+   - redundancy
98
+ - Weighted strategy selection with explainable per-metric scores.
99
+ - LangChain `TextSplitter` adapter.
100
+ - LlamaIndex node conversion and parser-style adapter.
101
+ - CLI for local text/Markdown files.
102
+ - Optional OCI Object Storage loader and OCI Generative AI embedding adapter.
103
+ - Small, dependency-light core for local document chunking workflows.
104
+
105
+ ## Contributing
106
+
107
+ Contributions are welcome for new chunkers, metrics, examples, integrations, benchmarks, documentation, and bug fixes.
108
+
109
+ See [CONTRIBUTING.md](CONTRIBUTING.md) for setup instructions, PR expectations, and guidance for adding chunkers or metrics.
110
+
111
+ Maintained by [Yash Shukla](https://www.linkedin.com/in/yashtechi/), focused on AI, cloud, and RAG systems.
112
+
113
+ ## Install
114
+
115
+ ```bash
116
+ pip install -e ".[dev]"
117
+ ```
118
+
119
+ With OCI support:
120
+
121
+ ```bash
122
+ pip install -e ".[oci]"
123
+ ```
124
+
125
+ With the API server:
126
+
127
+ ```bash
128
+ pip install -e ".[api]"
129
+ ```
130
+
131
+ With framework integrations:
132
+
133
+ ```bash
134
+ pip install -e ".[langchain,llama-index]"
135
+ ```
136
+
137
+ ## Quick Start
138
+
139
+ ```bash
140
+ adaptive-chunk chunk examples/sample.md --json
141
+ ```
142
+
143
+ Python usage:
144
+
145
+ ```python
146
+ from adaptive_chunking import AdaptiveChunker
147
+
148
+ text = "## Introduction\nAdaptive chunking chooses a splitter per document.\n\n## Details\n..."
149
+ chunker = AdaptiveChunker()
150
+ result = chunker.chunk(text, document_id="demo")
151
+
152
+ print(result.strategy_name)
153
+ for chunk in result.chunks:
154
+     print(chunk.text)
155
+ ```
156
+
157
+ ## Examples
158
+
159
+ Runnable examples live in `examples/`:
160
+
161
+ - `basic_adaptive_chunking.py`: end-to-end adaptive selection with metric output.
162
+ - `custom_selector.py`: custom chunker list and metric weights.
163
+ - `langchain_integration.py`: LangChain `TextSplitter` usage.
164
+ - `llama_index_integration.py`: LlamaIndex `TextNode` conversion.
165
+ - `oci_object_storage.py`: loading source text from OCI Object Storage.
166
+
167
+ ## Chunker Options
168
+
169
+ ```python
170
+ from adaptive_chunking.chunkers import (
171
+     DelimiterChunker,
172
+     PageChunker,
173
+     SectionAwareChunker,
174
+     SemanticChunker,
175
+ )
176
+ from adaptive_chunking.selector import AdaptiveSelector
177
+ from adaptive_chunking import AdaptiveChunker
178
+
179
+ selector = AdaptiveSelector(
180
+     chunkers=[
181
+         SectionAwareChunker(max_size=1800),
182
+         DelimiterChunker(delimiter="\n---\n"),
183
+         PageChunker(page_delimiter="\f"),
184
+         SemanticChunker(max_size=1400, similarity_threshold=0.08),
185
+     ]
186
+ )
187
+
188
+ result = AdaptiveChunker(selector=selector).chunk(text)
189
+ ```
190
+
191
+ ## Metrics
192
+
193
+ The selector ranks every candidate by a weighted average of intrinsic scores. The first five metrics follow the paper's evaluation dimensions; the additional metrics make the implementation more practical for production RAG systems where dropped text, excessive overlap, and duplicated chunks are common failure modes.
194
+
195
+ Weights can be tuned:
196
+
197
+ ```python
198
+ from adaptive_chunking.metrics import IntrinsicMetricEvaluator, MetricConfig, MetricWeights
199
+ from adaptive_chunking.selector import AdaptiveSelector
200
+
201
+ weights = MetricWeights(
202
+     block_integrity=1.4,
203
+     coverage=1.5,
204
+     redundancy=0.8,
205
+ )
206
+ evaluator = IntrinsicMetricEvaluator(MetricConfig(weights=weights))
207
+ selector = AdaptiveSelector(evaluator=evaluator)
208
+ ```
209
+
210
+ ## Adaptive Scoring
211
+
212
+ For each document, the selector runs every candidate chunker and evaluates the chunks it produces. Each candidate receives a normalized weighted score:
213
+
214
+ ```text
215
+ score(candidate) = sum(metric_value_i * metric_weight_i) / sum(metric_weight_i)
216
+ ```
217
+
218
+ Where:
219
+
220
+ - `metric_value_i` is the metric score for a candidate, normalized from `0.0` to `1.0`.
221
+ - `metric_weight_i` controls how important that metric is for selection.
222
+ - Higher scores are better.
223
+ - Candidates are ranked from highest score to lowest score.
224
+
225
+ For example, a domain that cares about preserving source text and section boundaries might emphasize `coverage` and `block_integrity`:
226
+
227
+ | Metric | Value | Weight | Weighted value |
228
+ |--------|------:|-------:|---------------:|
229
+ | coverage | 1.00 | 1.50 | 1.50 |
230
+ | block_integrity | 0.90 | 1.40 | 1.26 |
231
+ | redundancy | 0.80 | 0.80 | 0.64 |
232
+
233
+ ```text
234
+ score = (1.50 + 1.26 + 0.64) / (1.50 + 1.40 + 0.80)
235
+ = 3.40 / 3.70
236
+ = 0.919
237
+ ```
238
+
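The worked arithmetic above can be reproduced with a short function. This is a sketch of the stated formula only; `weighted_score` is a hypothetical helper, and the dictionaries reuse the values and weights from the example table rather than the package's defaults.

```python
from typing import Dict


def weighted_score(values: Dict[str, float], weights: Dict[str, float]) -> float:
    """Normalized weighted average: sum(value_i * weight_i) / sum(weight_i)."""
    total_weight = sum(weights[name] for name in values)
    if total_weight == 0:
        # No weight on any metric: avoid dividing by zero and return the lowest score.
        return 0.0
    weighted_sum = sum(values[name] * weights[name] for name in values)
    return weighted_sum / total_weight


# Values and weights taken from the example table.
values = {"coverage": 1.00, "block_integrity": 0.90, "redundancy": 0.80}
weights = {"coverage": 1.50, "block_integrity": 1.40, "redundancy": 0.80}

print(round(weighted_score(values, weights), 3))  # 0.919
```

Dividing by the sum of weights keeps the result between `0.0` and `1.0` whenever every metric value is, so scores stay comparable when weights are retuned.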
239
+ You can inspect every candidate, not just the winner:
240
+
241
+ ```python
242
+ from adaptive_chunking import AdaptiveChunker
243
+
244
+ result = AdaptiveChunker().chunk(text, document_id="demo")
245
+
246
+ for candidate in result.candidates:
247
+     print(candidate.strategy_name, round(candidate.score, 3), len(candidate.chunks))
248
+     for metric in candidate.metrics:
249
+         print(" ", metric.name, metric.value, "weight=", metric.weight)
250
+ ```
251
+
252
+ This makes the selection process explainable: if a chunker loses, you can see whether it dropped content, produced excessive overlap, cut through structure, or failed a size constraint.
253
+
254
+ ## LangChain
255
+
256
+ ```python
257
+ from adaptive_chunking.langchain import LangChainAdaptiveTextSplitter
258
+
259
+ splitter = LangChainAdaptiveTextSplitter()
260
+ documents = splitter.create_documents([text])
261
+ ```
262
+
263
+ ## LlamaIndex
264
+
265
+ ```python
266
+ from adaptive_chunking import AdaptiveChunker
267
+ from adaptive_chunking.llama_index import result_to_llama_nodes
268
+
269
+ result = AdaptiveChunker().chunk(text, document_id="policy")
270
+ nodes = result_to_llama_nodes(result)
271
+ ```
272
+
273
+ ## OCI Usage
274
+
275
+ Copy `.env.example` to `.env` and set the values for your tenancy and compartment. The core library does not require OCI credentials unless you instantiate an OCI adapter.
276
+
277
+ ```python
278
+ from adaptive_chunking.oci import OCIObjectStorageTextLoader
279
+
280
+ loader = OCIObjectStorageTextLoader(
281
+     namespace="my-namespace",
282
+     bucket_name="documents",
283
+ )
284
+ text = loader.load_text("policies/example.md")
285
+ ```
286
+
287
+ ## API Server
288
+
289
+ ```bash
290
+ uvicorn adaptive_chunking.api:app --reload
291
+ ```
292
+
293
+ Then send a POST request:
294
+
295
+ ```bash
296
+ curl -X POST http://127.0.0.1:8000/chunk \
297
+ -H "Content-Type: application/json" \
298
+ -d "{\"text\":\"# Title\nBody text\", \"document_id\":\"demo\"}"
299
+ ```
300
+
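The same request can be built from Python with the standard library. This sketch assumes only what the `curl` call shows (the `/chunk` path and the `text` and `document_id` fields); `build_chunk_request` and `send` are hypothetical helpers, and the shape of the response body is not documented here, so it is returned as parsed JSON without interpretation.

```python
import json
import urllib.request
from typing import Any


def build_chunk_request(
    text: str, document_id: str, base_url: str = "http://127.0.0.1:8000"
) -> urllib.request.Request:
    """Build a POST request for the /chunk endpoint with a JSON body."""
    payload = json.dumps({"text": text, "document_id": document_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/chunk",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> Any:
    """Send the request and decode the JSON response (requires the server to be running)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server started as shown above, `send(build_chunk_request("# Title\nBody text", "demo"))` mirrors the `curl` call; letting `json.dumps` build the body avoids the manual quote escaping the shell version needs.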
301
+ ## Project Layout
302
+
303
+ ```text
304
+ src/adaptive_chunking/
305
+   chunkers.py      # candidate splitting strategies
306
+   metrics.py       # intrinsic metric implementations
307
+   selector.py      # weighted adaptive strategy selection
308
+   pipeline.py      # high-level AdaptiveChunker
309
+   langchain.py     # optional LangChain TextSplitter adapter
310
+   llama_index.py   # optional LlamaIndex node helpers
311
+   oci.py           # optional OCI adapters
312
+   api.py           # optional FastAPI app
313
+   cli.py           # command line interface
314
+ tests/
315
+ examples/
316
+ ```
317
+
318
+ ## Notes
319
+
320
+ This repo is designed as a clean, extensible foundation rather than a verbatim copy of the reference implementation. The metric implementations are practical approximations intended for engineering use and experimentation. Production RAG deployments should calibrate weights, chunk sizes, and embedding models against their document domains.
321
+
322
+ ## References
323
+
324
+ - Ekimetrics reference implementation: [ekimetrics/adaptive-chunking](https://github.com/ekimetrics/adaptive-chunking)
325
+ - Paper: [Adaptive Chunking: Optimizing Chunking-Method Selection for RAG](https://arxiv.org/abs/2603.25333)
326
+
327
+ ## Citation
328
+
329
+ If this project helps your work, please cite the original adaptive chunking paper:
330
+
331
+ ```bibtex
332
+ @inproceedings{demoura2026adaptive,
333
+ title={Adaptive Chunking: Optimizing Chunking-Method Selection for RAG},
334
+ author={de Moura Junior, Paulo Roberto and Lelong, Jean and Blangero, Annabelle},
335
+ booktitle={Proceedings of the 15th Language Resources and Evaluation Conference (LREC 2026)},
336
+ year={2026},
337
+ url={https://arxiv.org/abs/2603.25333},
338
+ }
339
+ ```
340
+
341
+ ## License
342
+
343
+ This project is licensed under the [MIT License](LICENSE).