eba-xbridge 1.5.0rc2__py3-none-any.whl → 1.5.0rc4__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,308 @@
1
+ Metadata-Version: 2.4
2
+ Name: eba-xbridge
3
+ Version: 1.5.0rc4
4
+ Summary: XBRL-XML to XBRL-CSV converter for EBA Taxonomy (version 4.2)
5
+ License: Apache 2.0
6
+ License-File: LICENSE
7
+ Keywords: xbrl,eba,taxonomy,csv,xml
8
+ Author: MeaningfulData
9
+ Author-email: info@meaningfuldata.eu
10
+ Maintainer: Antonio Olleros
11
+ Maintainer-email: antonio.olleros@meaningfuldata.eu
12
+ Requires-Python: >=3.9
13
+ Classifier: Development Status :: 5 - Production/Stable
14
+ Classifier: Intended Audience :: Developers
15
+ Classifier: Intended Audience :: Information Technology
16
+ Classifier: Intended Audience :: Science/Research
17
+ Classifier: Programming Language :: Python :: 3
18
+ Classifier: Typing :: Typed
19
+ Requires-Dist: lxml (>=5.2.1,<6.0)
20
+ Requires-Dist: numpy (>=1.23.2,<2) ; python_version < "3.13"
21
+ Requires-Dist: numpy (>=2.1.0) ; python_version >= "3.13"
22
+ Requires-Dist: pandas (>=2.1.4,<3.0)
23
+ Project-URL: Documentation, https://docs.xbridge.meaningfuldata.eu
24
+ Project-URL: IssueTracker, https://github.com/Meaningful-Data/xbridge/issues
25
+ Project-URL: MeaningfulData, https://www.meaningfuldata.eu/
26
+ Project-URL: Repository, https://github.com/Meaningful-Data/xbridge
27
+ Description-Content-Type: text/x-rst
28
+
29
+ XBridge (eba-xbridge)
30
+ #####################
31
+
32
+ .. image:: https://img.shields.io/pypi/v/eba-xbridge.svg
33
+ :target: https://pypi.org/project/eba-xbridge/
34
+ :alt: PyPI version
35
+
36
+ .. image:: https://img.shields.io/pypi/pyversions/eba-xbridge.svg
37
+ :target: https://pypi.org/project/eba-xbridge/
38
+ :alt: Python versions
39
+
40
+ .. image:: https://img.shields.io/github/license/Meaningful-Data/xbridge.svg
41
+ :target: https://github.com/Meaningful-Data/xbridge/blob/main/LICENSE
42
+ :alt: License
43
+
44
+ .. image:: https://img.shields.io/github/actions/workflow/status/Meaningful-Data/xbridge/testing.yml?branch=main
45
+ :target: https://github.com/Meaningful-Data/xbridge/actions
46
+ :alt: Build status
47
+
48
+ Overview
49
+ ========
50
+
51
+ XBridge is a Python library for converting XBRL-XML files into XBRL-CSV files using the EBA (European Banking Authority) taxonomy. It provides a simple, reliable way to transform regulatory reporting data from XML format to CSV format.
52
+
53
+ The library currently supports **EBA Taxonomy version 4.2** and includes support for DORA (Digital Operational Resilience Act) CSV conversion. The library must be updated with each new EBA taxonomy version release.
54
+
55
+ Key Features
56
+ ============
57
+
58
+ * **XBRL-XML to XBRL-CSV Conversion**: Seamlessly convert XBRL-XML instance files to XBRL-CSV format
59
+ * **Command-Line Interface**: Quick conversions without writing code using the ``xbridge`` CLI
60
+ * **Python API**: Programmatic conversion for integration with other tools and workflows
61
+ * **EBA Taxonomy 4.2 Support**: Built for the latest EBA taxonomy specification
62
+ * **DORA CSV Conversion**: Support for Digital Operational Resilience Act reporting
63
+ * **Configurable Validation**: Flexible filing indicator validation with strict or warning modes
64
+ * **Decimal Handling**: Intelligent decimal precision handling with configurable options
65
+ * **Type Safety**: Fully typed codebase with MyPy strict mode compliance
66
+ * **Python 3.9+**: Supports Python 3.9 through 3.13
67
+
68
+ Prerequisites
69
+ =============
70
+
71
+ * **Python**: 3.9 or higher
72
+ * **7z Command-Line Tool**: Required for loading compressed taxonomy files (7z or ZIP format)
73
+
74
+ * On Ubuntu/Debian: ``sudo apt-get install p7zip-full``
75
+ * On macOS: ``brew install p7zip``
76
+ * On Windows: Download from `7-zip.org <https://www.7-zip.org/>`_
77
+
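To fail fast when the archiver is missing, a small pre-flight check like the following can be run before conversion. This is an illustrative sketch, not part of the xbridge API; the function name and the list of binary names are assumptions:

```python
import shutil

def seven_zip_available() -> bool:
    """Return True when a 7-Zip compatible executable is on the PATH."""
    # 7z (full), 7za (standalone) and 7zr (reduced) are common binary names.
    return any(shutil.which(name) for name in ("7z", "7za", "7zr"))
```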
78
+ Installation
79
+ ============
80
+
81
+ Install XBridge from PyPI using pip:
82
+
83
+ .. code-block:: bash
84
+
85
+ pip install eba-xbridge
86
+
87
+ For development installation, see `CONTRIBUTING.md <CONTRIBUTING.md>`_.
88
+
89
+ Quick Start
90
+ ===========
91
+
92
+ XBridge offers two ways to convert XBRL-XML files to XBRL-CSV: a command-line interface (CLI) for quick conversions, and a Python API for programmatic use.
93
+
94
+ Command-Line Interface
95
+ ----------------------
96
+
97
+ The CLI provides a quick way to convert files without writing code:
98
+
99
+ .. code-block:: bash
100
+
101
+ # Basic conversion (output to same directory as input)
102
+ xbridge instance.xbrl
103
+
104
+ # Specify output directory
105
+ xbridge instance.xbrl --output-path ./output
106
+
107
+ # Continue with warnings instead of errors
108
+ xbridge instance.xbrl --no-strict-validation
109
+
110
+ # Include headers as datapoints
111
+ xbridge instance.xbrl --headers-as-datapoints
112
+
113
+ **CLI Options:**
114
+
115
+ * ``--output-path PATH``: Output directory (default: same as input file)
116
+ * ``--headers-as-datapoints``: Treat headers as datapoints (default: False)
117
+ * ``--strict-validation``: Raise errors on validation failures (default: True)
118
+ * ``--no-strict-validation``: Emit warnings instead of errors
119
+
120
+ For more CLI options, run ``xbridge --help``.
121
+
122
+ Python API - Basic Conversion
123
+ ------------------------------
124
+
125
+ Convert an XBRL-XML instance file to XBRL-CSV using the Python API:
126
+
127
+ .. code-block:: python
128
+
129
+ from xbridge.api import convert_instance
130
+
131
+ # Basic conversion
132
+ input_path = "path/to/instance.xbrl"
133
+ output_path = "path/to/output"
134
+
135
+ convert_instance(input_path, output_path)
136
+
137
+ The converted XBRL-CSV files will be saved as a ZIP archive in the output directory.
138
+
139
+ Python API - Advanced Usage
140
+ ----------------------------
141
+
142
+ Customize the conversion with additional parameters:
143
+
144
+ .. code-block:: python
145
+
146
+ from xbridge.api import convert_instance
147
+
148
+ # Conversion with custom options
149
+ convert_instance(
150
+ instance_path="path/to/instance.xbrl",
151
+ output_path="path/to/output",
152
+ headers_as_datapoints=True, # Treat headers as datapoints
153
+ validate_filing_indicators=True, # Validate filing indicators
154
+ strict_validation=False, # Emit warnings instead of errors for orphaned facts
155
+ )
156
+
157
+ Loading an Instance
158
+ -------------------
159
+
160
+ Load and inspect an XBRL-XML instance without converting:
161
+
162
+ .. code-block:: python
163
+
164
+ from xbridge.api import load_instance
165
+
166
+ instance = load_instance("path/to/instance.xbrl")
167
+
168
+ # Access instance properties
169
+ print(f"Entity: {instance.entity}")
170
+ print(f"Period: {instance.period}")
171
+ print(f"Facts count: {len(instance.facts)}")
172
+
173
+ How XBridge Works
174
+ =================
175
+
176
+ XBridge performs the conversion in several steps:
177
+
178
+ 1. **Load the XBRL-XML instance**: Parse and extract facts, contexts, scenarios, and filing indicators
179
+ 2. **Load the EBA taxonomy**: Access pre-processed taxonomy modules containing tables and variables
180
+ 3. **Match and validate**: Join instance facts with taxonomy definitions
181
+ 4. **Generate CSV files**: Create XBRL-CSV files including:
182
+
183
+ * Data tables with facts and dimensions
184
+ * Filing indicators showing reported tables
185
+ * Parameters (entity, period, base currency, decimals)
186
+
187
+ 5. **Package output**: Bundle all CSV files into a ZIP archive
188
+
189
+ Output Structure
190
+ ----------------
191
+
192
+ The output ZIP file contains:
193
+
194
+ * **META-INF/**: JSON report package metadata
195
+ * **reports/**: CSV files for each reported table
196
+ * **filing-indicators.csv**: Table reporting indicators
197
+ * **parameters.csv**: Report-level parameters
198
+
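As a quick sanity check after a conversion, the archive contents can be listed with the standard library. This is a hedged sketch (the helper name is illustrative); the exact layout inside the ZIP may vary by taxonomy version:

```python
from zipfile import ZipFile

def list_output_files(zip_path):
    """Return the sorted file names inside a converted XBRL-CSV package."""
    with ZipFile(zip_path) as zf:
        return sorted(zf.namelist())
```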
199
+ Documentation
200
+ =============
201
+
202
+ Comprehensive documentation is available at `docs.xbridge.meaningfuldata.eu <https://docs.xbridge.meaningfuldata.eu>`_.
203
+
204
+ The documentation includes:
205
+
206
+ * **API Reference**: Complete API documentation
207
+ * **Quickstart Guide**: Step-by-step tutorials
208
+ * **Technical Notes**: Architecture and design details
209
+ * **FAQ**: Frequently asked questions
210
+
211
+ Taxonomy Loading
212
+ ================
213
+
214
+ If you need to work with the EBA taxonomy directly, you can load it using:
215
+
216
+ .. code-block:: bash
217
+
218
+ python -m xbridge.taxonomy_loader --input_path path/to/FullTaxonomy.7z
219
+
220
+ This generates an ``index.json`` file containing module references alongside the pre-processed taxonomy data.
221
+
222
+ .. warning::
223
+ Loading the taxonomy from a 7z package may take several minutes. Ensure the ``7z`` command is available on your system.
224
+
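Once generated, the index can be inspected like any JSON document. The sketch below only assumes the file is valid JSON; the concrete schema (module names, file references) is an internal detail of xbridge, and the helper name is illustrative:

```python
import json
from pathlib import Path

def load_taxonomy_index(index_path):
    """Parse the index.json produced by the taxonomy loader into a dict."""
    return json.loads(Path(index_path).read_text(encoding="utf-8"))
```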
225
+ Configuration Options
226
+ =====================
227
+
228
+ convert_instance Parameters
229
+ ----------------------------
230
+
231
+ * **instance_path** (str | Path): Path to the XBRL-XML instance file
232
+ * **output_path** (str | Path | None): Output directory for CSV files (default: current directory)
233
+ * **headers_as_datapoints** (bool): Treat table headers as datapoints (default: False)
234
+ * **validate_filing_indicators** (bool): Validate that facts belong to reported tables (default: True)
235
+ * **strict_validation** (bool): Raise errors on validation failures; if False, emit warnings (default: True)
236
+
237
+ Troubleshooting
238
+ ===============
239
+
240
+ Common Issues
241
+ -------------
242
+
243
+ **7z command not found**
244
+ Install the 7z command-line tool using your system's package manager (see Prerequisites).
245
+
246
+ **Taxonomy version mismatch**
247
+ Ensure you're using the correct version of XBridge for your taxonomy version. XBridge 1.5.x supports EBA Taxonomy 4.2.
248
+
249
+ **Orphaned facts warning/error**
250
+ This occurs when facts don't belong to any reported table. Set ``strict_validation=False`` to continue with warnings instead of errors.
251
+
252
+ **Decimal precision issues**
253
+ XBridge automatically handles decimal precision from the taxonomy. Check the parameters.csv file for applied decimal settings.
254
+
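To inspect the applied decimal settings, ``parameters.csv`` can be read straight from the output archive. This is an illustrative sketch; the member name follows the Output Structure section above, and its exact location inside the ZIP may differ:

```python
import csv
import io
from zipfile import ZipFile

def read_csv_member(zip_path, member="parameters.csv"):
    """Read one CSV member of a converted package as a list of rows."""
    with ZipFile(zip_path) as zf:
        with zf.open(member) as fh:
            return list(csv.reader(io.TextIOWrapper(fh, encoding="utf-8")))
```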
255
+ For other issues, see our `FAQ <https://docs.xbridge.meaningfuldata.eu/faq.html>`_ or `open an issue <https://github.com/Meaningful-Data/xbridge/issues>`_.
256
+
257
+ Contributing
258
+ ============
259
+
260
+ We welcome contributions! Please see `CONTRIBUTING.md <CONTRIBUTING.md>`_ for:
261
+
262
+ * Development setup instructions
263
+ * Code style guidelines
264
+ * Testing requirements
265
+ * Pull request process
266
+
267
+ Before contributing, please read our `Code of Conduct <CODE_OF_CONDUCT.md>`_.
268
+
269
+ Changelog
270
+ =========
271
+
272
+ See `CHANGELOG.md <CHANGELOG.md>`_ for a detailed history of changes.
273
+
274
+ Support
275
+ =======
276
+
277
+ * **Documentation**: https://docs.xbridge.meaningfuldata.eu
278
+ * **Issue Tracker**: https://github.com/Meaningful-Data/xbridge/issues
279
+ * **Email**: info@meaningfuldata.eu
280
+ * **Company**: https://www.meaningfuldata.eu/
281
+
282
+ Security
283
+ ========
284
+
285
+ For security issues, please see our `Security Policy <SECURITY.md>`_.
286
+
287
+ License
288
+ =======
289
+
290
+ This project is licensed under the Apache License 2.0 - see the `LICENSE <LICENSE>`_ file for details.
291
+
292
+ Authors & Maintainers
293
+ =====================
294
+
295
+ **MeaningfulData** - https://www.meaningfuldata.eu/
296
+
297
+ Maintainers:
298
+
299
+ * Antonio Olleros (antonio.olleros@meaningfuldata.eu)
300
+ * Jesus Simon (jesus.simon@meaningfuldata.eu)
301
+ * Francisco Javier Hernandez del Caño (javier.hernandez@meaningfuldata.eu)
302
+ * Guillermo Garcia Martin (guillermo.garcia@meaningfuldata.eu)
303
+
304
+ Acknowledgments
305
+ ===============
306
+
307
+ This project is designed to work with the European Banking Authority (EBA) taxonomy for regulatory reporting.
308
+
@@ -1,7 +1,8 @@
1
- xbridge/__init__.py,sha256=BrHDgfv0XiuLA3wGkiWOTyi13tkIQKlHy2_r9kdC8mE,68
2
- xbridge/api.py,sha256=IhP-nMHxxw5RLQgKWi1-c5v8OXMRIWuSkS6G5RLmZII,1326
3
- xbridge/converter.py,sha256=X6ZFSyIiXFq_MKpCqtvK9Jno1A-umozs3gs_MiZx0ZQ,25992
4
- xbridge/instance.py,sha256=_Cjle0vt3cEfyWQeStLT9if0aDOio4185ig7YykVGNs,27984
1
+ xbridge/__init__.py,sha256=joASbfhYee_2irYZhRCZ6J4oTn6u1fjt6ilQbXwL4M4,68
2
+ xbridge/__main__.py,sha256=trtFEv7TRJgrLL84leIapPvgC_iVTj05qLHRRS1Olts,2219
3
+ xbridge/api.py,sha256=NCBz7VRJWE3gID6ndgL4Awoxw0w1yMIIf_OTLRuZyyQ,1559
4
+ xbridge/converter.py,sha256=uu6djzgGZcmq0nibrkmg5lW-npcolB4XtQoNWu1p_3o,23498
5
+ xbridge/instance.py,sha256=KQpXhsZIM9oTYJf2hyrzc9pqFY2-1JBF5y1xbnLbqk8,29991
5
6
  xbridge/modules/ae_ae_4.2.json,sha256=AdFvwZqX0KVP3jF1iHeQc5QSnSMvvT3GvoA2G1AgXis,460165
6
7
  xbridge/modules/ae_con_cir-680-2014_2017-04-04.json,sha256=4n0t9dKJNU8Nb5QHpssrDs8ZLwzI-Mw75ax-ar9pLu0,363273
7
8
  xbridge/modules/ae_con_cir-680-2014_2018-03-31.json,sha256=aVWeLLs20p39kQQUthUzqrxBGKTycqhgX9WLk1rVlNw,363538
@@ -379,10 +380,11 @@ xbridge/modules/sbpimv_ind_its-2016-svbxx_2016-02-01.json,sha256=SED-dW--UKxhHNY
379
380
  xbridge/modules/sbpimv_sbp_4.2.json,sha256=Bj4z7zofZngG9EJ7-q74F-JF41O1FK_mX8RTfYdLP9I,7023
380
381
  xbridge/modules/sepa_ipr_pay_4.1.json,sha256=awsJeBUDhMIFs5so6CWUQmlcHSDcGMd8fnLy_r_iMik,27054
381
382
  xbridge/modules/sepa_ipr_pay_4.2.json,sha256=JLJvR02LOAJy6SWPRuhV1TT02oXQhsG83FBn176KWsA,27742
382
- xbridge/modules.py,sha256=8TheJY7oZIy_n-doALa_9AYwwZFu284jaBWt-aol0MA,22292
383
+ xbridge/modules.py,sha256=bTvBXtp3w4Gad2DpEQE7Hb-UfuUQLlRl8gywRstQtpU,22399
383
384
  xbridge/py.typed,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
384
385
  xbridge/taxonomy_loader.py,sha256=K0lnJVryvkKsaoK3fMis-L2JpmwLO6z3Ruq3yj9FxDY,9317
385
- eba_xbridge-1.5.0rc2.dist-info/METADATA,sha256=XwKBzNPYFZSqK_KtlWwzXlKeCfl90o4_79gsZucf0fs,2088
386
- eba_xbridge-1.5.0rc2.dist-info/WHEEL,sha256=zp0Cn7JsFoX2ATtOhtaFYIiE2rmFAD4OcMhtUki8W3U,88
387
- eba_xbridge-1.5.0rc2.dist-info/licenses/LICENSE,sha256=xx0jnfkXJvxRnG63LTGOxlggYnIysveWIZ6H3PNdCrQ,11357
388
- eba_xbridge-1.5.0rc2.dist-info/RECORD,,
386
+ eba_xbridge-1.5.0rc4.dist-info/METADATA,sha256=5BAX_xFnRrIxcQiJbNi3y68A_42F8dR-qpL6Z-bBT0U,10430
387
+ eba_xbridge-1.5.0rc4.dist-info/WHEEL,sha256=zp0Cn7JsFoX2ATtOhtaFYIiE2rmFAD4OcMhtUki8W3U,88
388
+ eba_xbridge-1.5.0rc4.dist-info/entry_points.txt,sha256=FATct4icSewM04cegjhybtm7xcQWhaSahL-DTtuFdZw,49
389
+ eba_xbridge-1.5.0rc4.dist-info/licenses/LICENSE,sha256=xx0jnfkXJvxRnG63LTGOxlggYnIysveWIZ6H3PNdCrQ,11357
390
+ eba_xbridge-1.5.0rc4.dist-info/RECORD,,
@@ -0,0 +1,3 @@
1
+ [console_scripts]
2
+ xbridge=xbridge.__main__:main
3
+
xbridge/__init__.py CHANGED
@@ -2,4 +2,4 @@
2
2
  Init file for eba-xbridge library
3
3
  """
4
4
 
5
- __version__ = "1.5.0rc2"
5
+ __version__ = "1.5.0rc4"
xbridge/__main__.py ADDED
@@ -0,0 +1,82 @@
1
+ """Command-line interface for xbridge."""
2
+
3
+ import argparse
4
+ import sys
5
+ from pathlib import Path
6
+
7
+ from xbridge.api import convert_instance
8
+
9
+
10
+ def main() -> None:
11
+ """Main CLI entry point for xbridge converter."""
12
+ parser = argparse.ArgumentParser(
13
+ description="Convert XBRL-XML instances to XBRL-CSV format",
14
+ prog="xbridge",
15
+ )
16
+
17
+ parser.add_argument(
18
+ "input_file",
19
+ type=str,
20
+ help="Path to the input XBRL-XML file",
21
+ )
22
+
23
+ parser.add_argument(
24
+ "--output-path",
25
+ type=str,
26
+ default=None,
27
+ help="Output directory path (default: same folder as input file)",
28
+ )
29
+
30
+ parser.add_argument(
31
+ "--headers-as-datapoints",
32
+ action="store_true",
33
+ default=False,
34
+ help="Treat headers as datapoints (default: False)",
35
+ )
36
+
37
+ parser.add_argument(
38
+ "--strict-validation",
39
+ action="store_true",
40
+ default=True,
41
+ help="Raise errors on validation failures (default: True)",
42
+ )
43
+
44
+ parser.add_argument(
45
+ "--no-strict-validation",
46
+ action="store_false",
47
+ dest="strict_validation",
48
+ help="Emit warnings instead of errors for validation failures",
49
+ )
50
+
51
+ args = parser.parse_args()
52
+
53
+ # Determine output path
54
+ input_path = Path(args.input_file)
55
+ if not input_path.exists():
56
+ print(f"Error: Input file not found: {args.input_file}", file=sys.stderr)
57
+ sys.exit(1)
58
+
59
+ if args.output_path is None:
60
+ output_path = input_path.parent
61
+ else:
62
+ output_path = Path(args.output_path)
63
+ if not output_path.exists():
64
+ print(f"Error: Output path does not exist: {args.output_path}", file=sys.stderr)
65
+ sys.exit(1)
66
+
67
+ try:
68
+ result_path = convert_instance(
69
+ instance_path=input_path,
70
+ output_path=output_path,
71
+ headers_as_datapoints=args.headers_as_datapoints,
72
+ validate_filing_indicators=True,
73
+ strict_validation=args.strict_validation,
74
+ )
75
+ print(f"Conversion successful: {result_path}")
76
+ except Exception as e:
77
+ print(f"Conversion failed: {e}", file=sys.stderr)
78
+ sys.exit(1)
79
+
80
+
81
+ if __name__ == "__main__":
82
+ main()
xbridge/api.py CHANGED
@@ -14,6 +14,7 @@ def convert_instance(
14
14
  output_path: Optional[Union[str, Path]] = None,
15
15
  headers_as_datapoints: bool = False,
16
16
  validate_filing_indicators: bool = True,
17
+ strict_validation: bool = True,
17
18
  ) -> Path:
18
19
  """
19
20
  Convert one single instance of XBRL-XML file to a CSV file
@@ -27,6 +28,9 @@ def convert_instance(
27
28
  :param validate_filing_indicators: If True, validate that no facts are orphaned
28
29
  (belong only to non-reported tables). Default is True.
29
30
 
31
+ :param strict_validation: If True (default), raise an error on orphaned facts. If False,
32
+ emit a warning instead and continue.
33
+
30
34
  :return: Converted CSV file.
31
35
 
32
36
  """
@@ -34,7 +38,12 @@ def convert_instance(
34
38
  output_path = Path(".")
35
39
 
36
40
  converter = Converter(instance_path)
37
- return converter.convert(output_path, headers_as_datapoints, validate_filing_indicators)
41
+ return converter.convert(
42
+ output_path,
43
+ headers_as_datapoints,
44
+ validate_filing_indicators,
45
+ strict_validation,
46
+ )
38
47
 
39
48
 
40
49
  def load_instance(instance_path: Union[str, Path]) -> Instance:
xbridge/converter.py CHANGED
@@ -6,10 +6,11 @@ from __future__ import annotations
6
6
 
7
7
  import csv
8
8
  import json
9
+ import warnings
9
10
  from pathlib import Path
10
11
  from shutil import rmtree
11
12
  from tempfile import TemporaryDirectory
12
- from typing import Any, Dict, Set, Union
13
+ from typing import Any, Dict, Union
13
14
  from zipfile import ZipFile
14
15
 
15
16
  import pandas as pd
@@ -76,6 +77,7 @@ class Converter:
76
77
  output_path: Union[str, Path],
77
78
  headers_as_datapoints: bool = False,
78
79
  validate_filing_indicators: bool = True,
80
+ strict_validation: bool = True,
79
81
  ) -> Path:
80
82
  """Convert the ``XML Instance`` to a CSV file or between CSV formats"""
81
83
  if not output_path:
@@ -90,7 +92,9 @@ class Converter:
90
92
  raise ValueError("Module of the instance file not found in the taxonomy")
91
93
 
92
94
  if isinstance(self.instance, XmlInstance):
93
- return self.convert_xml(output_path, headers_as_datapoints, validate_filing_indicators)
95
+ return self.convert_xml(
96
+ output_path, headers_as_datapoints, validate_filing_indicators, strict_validation
97
+ )
94
98
  elif isinstance(self.instance, CsvInstance):
95
99
  if self.module.architecture != "headers":
96
100
  raise ValueError("Cannot convert CSV instance with non-headers architecture")
@@ -103,6 +107,7 @@ class Converter:
103
107
  output_path: Path,
104
108
  headers_as_datapoints: bool = False,
105
109
  validate_filing_indicators: bool = True,
110
+ strict_validation: bool = True,
106
111
  ) -> Path:
107
112
  module_filind_codes = [table.filing_indicator_code for table in self.module.tables]
108
113
 
@@ -147,7 +152,7 @@ class Converter:
147
152
  self._convert_filing_indicator(report_dir)
148
153
 
149
154
  if validate_filing_indicators:
150
- self._validate_filing_indicators()
155
+ self._validate_filing_indicators(strict_validation=strict_validation)
151
156
 
152
157
  with open(MAPPING_PATH / self.module.dim_dom_file_name, "r", encoding="utf-8") as fl:
153
158
  mapping_dict: Dict[str, str] = json.load(fl)
@@ -280,111 +285,44 @@ class Converter:
280
285
  instance_df = instance_df.loc[mask]
281
286
  instance_df.drop(columns=nrd_list, inplace=True)
282
287
 
283
- return instance_df
284
-
285
- def _normalize_allowed_values(
286
- self, table_df: pd.DataFrame, datapoint_df: pd.DataFrame
287
- ) -> pd.DataFrame:
288
- """
289
- Normalizes fact values against allowed_values for each variable.
290
-
291
- For variables with allowed_values:
292
- 1. Extracts code part from fact values (after ":")
293
- 2. Maps to correct namespaced value from allowed_values
294
- 3. Updates dimension columns with normalized values
295
- 4. Validates no unmatched codes remain
296
-
297
- :param table_df: The merged dataframe with facts and variables
298
- :param datapoint_df: The dataframe with variable definitions including allowed_values
299
- :return: The normalized dataframe
300
- """
301
- if "allowed_values" not in datapoint_df.columns:
302
- return table_df
303
-
304
- # Build mapping: datapoint → {code → full_value}
305
- datapoint_allowed_map: Dict[str, Dict[str, str]] = {}
288
+ # Rows missing values for required open keys do not belong to the table
289
+ if open_keys:
290
+ instance_df.dropna(subset=list(open_keys), inplace=True)
306
291
 
307
- for _, row in datapoint_df.iterrows():
308
- datapoint = row.get("datapoint")
309
- allowed_values = row.get("allowed_values")
310
-
311
- if not datapoint or not allowed_values or len(allowed_values) == 0:
312
- continue
292
+ return instance_df
313
293
 
314
- # Group allowed values by the dimension they apply to
315
- # For now, we'll apply them to all dimension columns
316
- # In the future, we could make this more sophisticated
317
- code_map: Dict[str, str] = {}
318
- for allowed_val in allowed_values:
319
- if ":" in allowed_val:
320
- code = allowed_val.split(":")[-1]
321
- code_map[code] = allowed_val
322
-
323
- if code_map:
324
- datapoint_allowed_map[datapoint] = code_map
325
-
326
- if not datapoint_allowed_map:
327
- return table_df
328
-
329
- # Identify columns to normalize
330
- # We normalize both dimension columns AND the value column (for enumerated values)
331
- exclude_cols = {"datapoint", "decimals", "unit", "data_type", "allowed_values"}
332
- columns_to_check = [col for col in table_df.columns if col not in exclude_cols]
333
-
334
- # For each column that might contain namespaced values
335
- for dim_col in columns_to_check:
336
- if dim_col not in table_df.columns or table_df[dim_col].isna().all():
337
- continue
294
+ def _matching_fact_indices(self, table: Table) -> set[int]:
295
+ """Return indices of instance facts that actually match the table definition."""
296
+ if self.instance.instance_df is None:
297
+ return set()
338
298
 
339
- # Check if column contains namespaced values (contains ":")
340
- sample_values = table_df[dim_col].dropna()
341
- if sample_values.empty:
342
- continue
299
+ instance_df = self._get_instance_df(table)
300
+ if instance_df.empty or table.variable_df is None:
301
+ return set()
343
302
 
344
- has_namespace = sample_values.astype(str).str.contains(":", regex=False).any()
345
- if not has_namespace:
346
- continue
303
+ open_keys = set(table.open_keys)
347
304
 
348
- # Extract codes from values (vectorized operation)
349
- mask = table_df[dim_col].notna()
350
- temp_code_col = f"_{dim_col}_temp_code"
351
- table_df.loc[mask, temp_code_col] = (
352
- table_df.loc[mask, dim_col].astype(str).str.split(":").str[-1]
353
- )
305
+ datapoint_df = table.variable_df.copy()
354
306
 
355
- # Normalize values for each datapoint
356
- for datapoint, code_map in datapoint_allowed_map.items():
357
- dp_mask = (table_df["datapoint"] == datapoint) & mask
307
+ # For validation we match minimally on metric (concept) and any open keys present
308
+ merge_cols: list[str] = []
309
+ if "metric" in datapoint_df.columns and "metric" in instance_df.columns:
310
+ merge_cols.append("metric")
311
+ merge_cols.extend(
312
+ [key for key in open_keys if key in datapoint_df.columns and key in instance_df.columns]
313
+ )
358
314
 
359
- if not dp_mask.any():
360
- continue
315
+ instance_df = instance_df.copy()
316
+ instance_df["_idx"] = instance_df.index
361
317
 
362
- # Store original values for error reporting
363
- original_values = table_df.loc[dp_mask, dim_col].copy()
364
-
365
- # Map codes to correct full values
366
- normalized_values = table_df.loc[dp_mask, temp_code_col].map(code_map)
367
-
368
- # Update only the values that were successfully mapped
369
- mapped_mask = dp_mask & normalized_values.notna()
370
- table_df.loc[mapped_mask, dim_col] = normalized_values[mapped_mask]
371
-
372
- # Check for values that couldn't be mapped (validation errors)
373
- unmapped_mask = dp_mask & normalized_values.isna()
374
- if unmapped_mask.any():
375
- invalid_codes = table_df.loc[unmapped_mask, temp_code_col].unique()
376
- valid_codes = list(code_map.keys())
377
- raise ValueError(
378
- f"Invalid values for datapoint '{datapoint}' in column '{dim_col}': "
379
- f"Found codes {list(invalid_codes)} but only {valid_codes} are allowed. "
380
- f"Original values: {original_values[unmapped_mask].tolist()}"
381
- )
318
+ merged_df = pd.merge(datapoint_df, instance_df, on=merge_cols, how="inner")
382
319
 
383
- # Clean up temporary column
384
- if temp_code_col in table_df.columns:
385
- table_df.drop(columns=[temp_code_col], inplace=True)
320
+ if open_keys:
321
+ valid_open_keys = [key for key in open_keys if key in merged_df.columns]
322
+ if valid_open_keys:
323
+ merged_df.dropna(subset=valid_open_keys, inplace=True)
386
324
 
387
- return table_df
325
+ return set(merged_df["_idx"].tolist())
388
326
 
389
327
  def _variable_generator(self, table: Table) -> pd.DataFrame:
390
328
  """Returns the dataframe with the CSV file for the table
@@ -406,7 +344,7 @@ class Converter:
406
344
  )
407
345
 
408
346
  # Do the intersection and drop from datapoints the columns and records
409
- datapoint_df = table.variable_df
347
+ datapoint_df = table.variable_df.copy()
410
348
  missing_cols = list(variable_columns - instance_columns)
411
349
  if "data_type" in missing_cols:
412
350
  missing_cols.remove("data_type")
@@ -417,10 +355,8 @@ class Converter:
417
355
 
418
356
  # Join the dataframes on the datapoint_columns
419
357
  merge_cols = list(variable_columns & instance_columns)
420
- table_df = pd.merge(datapoint_df, instance_df, on=merge_cols, how="inner")
421
358
 
422
- # Normalize values against allowed_values
423
- table_df = self._normalize_allowed_values(table_df, datapoint_df)
359
+ table_df = pd.merge(datapoint_df, instance_df, on=merge_cols, how="inner")
424
360
 
425
361
  if "data_type" in table_df.columns and "decimals" in table_df.columns:
426
362
  decimals_table = table_df[["decimals", "data_type"]].drop_duplicates()
@@ -432,17 +368,27 @@ class Converter:
432
368
  decimals = row["decimals"]
433
369
 
434
370
  if data_type not in self._decimals_parameters:
435
- self._decimals_parameters[data_type] = decimals
371
+ self._decimals_parameters[data_type] = (
372
+ int(decimals) if decimals not in {"INF", "#none"} else decimals
373
+ )
436
374
  else:
437
375
  # If new value is a special value, skip it (prefer numeric values)
438
376
  if decimals in {"INF", "#none"}:
439
377
  pass
440
378
  # If new value is numeric
441
379
  else:
380
+ try:
381
+ decimals = int(decimals)
382
+ except ValueError:
383
+ raise ValueError(
384
+ f"Invalid decimals value: {decimals}, "
385
+ "should be integer, 'INF' or '#none'"
386
+ )
387
+
442
388
  # If existing value is special, replace with numeric
443
- if self._decimals_parameters[data_type] in {"INF", "#none"} or (
444
- isinstance(self._decimals_parameters[data_type], int)
445
- and decimals < self._decimals_parameters[data_type]
389
+ if (
390
+ self._decimals_parameters[data_type] in {"INF", "#none"}
391
+ or decimals < self._decimals_parameters[data_type]
446
392
  ):
447
393
  self._decimals_parameters[data_type] = decimals
448
394
 
@@ -497,13 +443,6 @@ class Converter:
497
443
  # Defined by the EBA in the JSON files. We take them from the taxonomy
498
444
  # Because EBA is using exactly those for the JSON files.
499
445
 
500
- for open_key in table.open_keys:
501
- if open_key in datapoints.columns:
502
- dim_name = mapping_dict.get(open_key)
503
- # For open keys, there are no dim_names (they are not mapped)
504
- if dim_name and not datapoints.empty:
505
- datapoints[open_key] = dim_name + ":" + datapoints[open_key].astype(str)
506
-
507
446
  datapoints.sort_values(by=["datapoint"], ascending=True, inplace=True)
508
447
  output_path_table = temp_dir_path / (table.url or "table.csv")
509
448
 
@@ -550,7 +489,7 @@ class Converter:
550
489
  if fil_ind.value and fil_ind.table:
551
490
  self._reported_tables.append(fil_ind.table)
552
491
 
553
- def _validate_filing_indicators(self) -> None:
492
+ def _validate_filing_indicators(self, strict_validation: bool = True) -> None:
554
493
  """Validate that no facts are orphaned (belong only to non-reported tables).
555
494
 
556
495
  Raises:
@@ -559,44 +498,56 @@ class Converter:
559
498
  if self.instance.instance_df is None or self.instance.instance_df.empty:
560
499
  return
561
500
 
562
- # Step 1: Collect indices of facts that belong to ANY reported table
563
- reported_fact_indices: Set[int] = set()
501
+ # Step 1: Track which facts belong to ANY reported table without materializing a huge set
+ reported_mask = pd.Series(False, index=self.instance.instance_df.index)
  for table in self.module.tables:
      if table.filing_indicator_code in self._reported_tables:
-         instance_df = self._get_instance_df(table)
-         if not instance_df.empty:
-             # Add all fact indices (DataFrame row indices) to the set
-             reported_fact_indices.update(instance_df.index)
+         reported_indices = self._matching_fact_indices(table)
+         if reported_indices:
+             reported_mask.loc[list(reported_indices)] = True
 
  # Step 2: Find facts that belong ONLY to non-reported tables
- all_orphaned_indices = set()
+ orphaned_mask = pd.Series(False, index=self.instance.instance_df.index)
  orphaned_per_table = {}
 
  for table in self.module.tables:
      if table.filing_indicator_code not in self._reported_tables:
-         instance_df = self._get_instance_df(table)
-         if not instance_df.empty:
-             # Find facts that are in this table but NOT in any reported table
-             orphaned_in_this_table = set(instance_df.index) - reported_fact_indices
+         orphaned_indices = self._matching_fact_indices(table)
+         if orphaned_indices:
+             # Facts in this table that never appear in a reported table
+             orphaned_in_this_table = [
+                 idx for idx in orphaned_indices if not reported_mask.loc[idx]
+             ]
              if orphaned_in_this_table:
+                 orphaned_mask.loc[orphaned_in_this_table] = True
                  orphaned_per_table[table.filing_indicator_code] = len(
                      orphaned_in_this_table
                  )
-                 all_orphaned_indices.update(orphaned_in_this_table)
 
- if all_orphaned_indices:
+ total_orphaned = int(orphaned_mask.sum())
+
+ if total_orphaned:
      error_msg = (
          f"Filing indicator inconsistency detected:\n"
-         f"Found {len(all_orphaned_indices)} fact(s) that belong ONLY"
+         f"Found {total_orphaned} fact(s) that belong ONLY"
          f" to non-reported tables:\n"
      )
      for table_code, count in orphaned_per_table.items():
          error_msg += f" - {table_code}: {count} fact(s)\n"
+
+     if strict_validation:
+         error_msg += (
+             "\nThe conversion process will not continue due to strict validation mode. "
+             "Either set filed=true for the relevant tables "
+             "or remove these facts from the XML."
+         )
+         raise ValueError(error_msg)
      error_msg += (
          "\nThese facts will be excluded from the output. "
-         "Either set filed=true for the relevant tables or remove these facts from the XML."
+         "Consider setting filed=true for the relevant tables "
+         "or removing these facts from the XML."
      )
-     raise ValueError(error_msg)
+     warnings.warn(error_msg)
 
  def _convert_parameters(self, temp_dir_path: Path) -> None:
      # Workaround;
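The mask-based filing-indicator check above can be exercised in isolation. Below is a minimal, self-contained sketch with toy fact indices and an invented two-table layout standing in for `_matching_fact_indices`; the table codes and index values are hypothetical, chosen only to show why a fact shared with a reported table is not flagged as orphaned:

```python
import pandas as pd

# Toy fact index covering all facts in a hypothetical instance
index = pd.RangeIndex(6)

# Hypothetical stand-in for _matching_fact_indices: facts matched per table
table_facts = {
    "C_01.00": {0, 1, 2},  # filed=true (reported)
    "C_02.00": {2, 3, 4},  # filed=false (not reported)
}
reported_tables = {"C_01.00"}

# Step 1: mark facts that appear in ANY reported table
reported_mask = pd.Series(False, index=index)
for code, facts in table_facts.items():
    if code in reported_tables:
        reported_mask.loc[list(facts)] = True

# Step 2: facts appearing ONLY in non-reported tables are orphaned
orphaned_mask = pd.Series(False, index=index)
for code, facts in table_facts.items():
    if code not in reported_tables:
        orphaned = [i for i in facts if not reported_mask.loc[i]]
        orphaned_mask.loc[orphaned] = True

# Fact 2 is shared with the reported table, so only 3 and 4 are orphaned
print(sorted(orphaned_mask[orphaned_mask].index))  # [3, 4]
```

The boolean-mask bookkeeping keeps memory bounded by the instance index, instead of growing a Python set with one entry per fact.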
xbridge/instance.py CHANGED
@@ -13,6 +13,59 @@ from zipfile import ZipFile
  import pandas as pd
  from lxml import etree
 
+ # Cache namespace → CSV prefix derivations to avoid repeated string work during parse
+ _namespace_prefix_cache: Dict[str, str] = {}
+
+
+ def _derive_csv_prefix(namespace_uri: str) -> Optional[str]:
+     """Derive the fixed CSV prefix from a namespace URI using the EBA convention."""
+     if not namespace_uri:
+         return None
+
+     cached = _namespace_prefix_cache.get(namespace_uri)
+     if cached is not None:
+         return cached
+
+     cleaned = namespace_uri.rstrip("#/")
+     if "#" in namespace_uri:
+         segment = namespace_uri.rsplit("#", 1)[-1]
+     else:
+         segment = cleaned.rsplit("/", 1)[-1] if "/" in cleaned else cleaned
+
+     if not segment:
+         return None
+
+     prefix = f"eba_{segment}"
+     _namespace_prefix_cache[namespace_uri] = prefix
+     return prefix
+
+
+ def _normalize_namespaced_value(
+     value: Optional[str], nsmap: Dict[Optional[str], str]
+ ) -> Optional[str]:
+     """
+     Normalize a namespaced value (e.g., 'dom:qAE' or '{uri}qAE') to the CSV prefix convention.
+     Returns the original value if no namespace can be resolved.
+     """
+     if value is None:
+         return None
+
+     # Clark notation: {uri}local
+     if value.startswith("{") and "}" in value:
+         uri, local = value[1:].split("}", 1)
+         derived = _derive_csv_prefix(uri)
+         return f"{derived}:{local}" if derived else value
+
+     # Prefixed notation: prefix:local
+     if ":" in value:
+         potential_prefix, local = value.split(":", 1)
+         namespace_uri = nsmap.get(potential_prefix)
+         if namespace_uri:
+             derived = _derive_csv_prefix(namespace_uri)
+             return f"{derived}:{local}" if derived else value
+
+     return value
+
 
  class Instance:
      """
@@ -548,7 +601,7 @@ class Scenario:
                  continue
              dimension = dimension_raw.split(":")[1]
              value = self.get_value(child)
-             value = value.split(":")[1] if ":" in value else value
+             value = _normalize_namespaced_value(value, child.nsmap) or ""
              self.dimensions[dimension] = value
 
      @staticmethod
@@ -667,7 +720,7 @@ class Fact:
      def parse(self) -> None:
          """Parse the XML node with the `fact <https://www.xbrl.org/guidance/xbrl-glossary/#:~:text=accounting%20standards%20body.-,Fact,-A%20fact%20is>`_."""
          self.metric = self.fact_xml.tag
-         self.value = self.fact_xml.text
+         self.value = _normalize_namespaced_value(self.fact_xml.text, self.fact_xml.nsmap)
          self.decimals = self.fact_xml.attrib.get("decimals")
          self.context = self.fact_xml.attrib.get("contextRef")
          self.unit = self.fact_xml.attrib.get("unitRef")
@@ -675,7 +728,11 @@ class Fact:
      def __dict__(self) -> Dict[str, Any]:  # type: ignore[override]
          metric_clean = ""
          if self.metric:
-             metric_clean = self.metric.split("}")[1] if "}" in self.metric else self.metric
+             # Normalize metric to use consistent eba_* prefix like other dimensions
+             metric_clean = _normalize_namespaced_value(self.metric, self.fact_xml.nsmap) or ""
+             # If still in Clark notation, extract the local name
+             if metric_clean.startswith("{") and "}" in metric_clean:
+                 metric_clean = metric_clean.split("}", 1)[1]
 
          return {
              "metric": metric_clean,
xbridge/modules.py CHANGED
@@ -306,9 +306,9 @@ class Table:
          variable_info: dict[str, Any] = {}
          for dim_k, dim_v in variable.dimensions.items():
              if dim_k not in ("unit", "decimals"):
-                 variable_info[dim_k] = dim_v.split(":")[1]
+                 variable_info[dim_k] = dim_v
          if "concept" in variable.dimensions:
-             variable_info["metric"] = variable.dimensions["concept"].split(":")[1]
+             variable_info["metric"] = variable.dimensions["concept"]
              del variable_info["concept"]
 
          if variable.code is None:
@@ -324,9 +324,11 @@ class Table:
          if "dimensions" in column:
              for dim_k, dim_v in column["dimensions"].items():
                  if dim_k == "concept":
-                     variable_info["metric"] = dim_v.split(":")[1]
+                     variable_info["metric"] = dim_v
                  elif dim_k not in ("unit", "decimals"):
-                     variable_info[dim_k.split(":")[1]] = dim_v.split(":")[1]
+                     # Keep the full dimension key and value with prefixes
+                     dim_k_clean = dim_k.split(":")[1] if ":" in dim_k else dim_k
+                     variable_info[dim_k_clean] = dim_v
 
          if "decimals" in column:
              variable_info["data_type"] = column["decimals"]
@@ -1,62 +0,0 @@
- Metadata-Version: 2.4
- Name: eba-xbridge
- Version: 1.5.0rc2
- Summary: XBRL-XML to XBRL-CSV converter for EBA Taxonomy (version 4.1)
- License: Apache 2.0
- License-File: LICENSE
- Keywords: xbrl,eba,taxonomy,csv,xml
- Author: MeaningfulData
- Author-email: info@meaningfuldata.eu
- Maintainer: Antonio Olleros
- Maintainer-email: antonio.olleros@meaningfuldata.eu
- Requires-Python: >=3.9
- Classifier: Development Status :: 5 - Production/Stable
- Classifier: Intended Audience :: Developers
- Classifier: Intended Audience :: Information Technology
- Classifier: Intended Audience :: Science/Research
- Classifier: Programming Language :: Python :: 3
- Classifier: Typing :: Typed
- Requires-Dist: lxml (>=5.2.1,<6.0)
- Requires-Dist: numpy (>=1.23.2,<2) ; python_version < "3.13"
- Requires-Dist: numpy (>=2.1.0) ; python_version >= "3.13"
- Requires-Dist: pandas (>=2.1.4,<3.0)
- Project-URL: Documentation, https://docs.xbridge.meaningfuldata.eu
- Project-URL: IssueTracker, https://github.com/Meaningful-Data/xbridge/issues
- Project-URL: MeaningfulData, https://www.meaningfuldata.eu/
- Project-URL: Repository, https://github.com/Meaningful-Data/xbridge
- Description-Content-Type: text/x-rst
-
- Overview
- ============
- XBridge is a Python library which main function is to convert XBRL-XML files into XBRL-CSV files by using EBA's taxonomy.
- It works with EBA Taxonomy latest published version (4.1). Library must be updated on each new EBA taxonomy version.
-
- Installation
- ============
-
- To install the library, run the following command:
-
- .. code:: bash
-
-     pip install eba-xbridge
-
-
- How XBridge works:
- =========================
-
- Firstly, an XBRL-XML file has to be selected to convert it. Then, that XBRL-XML file is input in the following function contained in the ``API`` package:
-
- .. code:: python
-
-     >>> from xbridge.api import convert_instance
-
-     >>> input_path = "data/input"
-
-     >>> output_path = "data/output"
-
-     >>> convert_instance(input_path, output_path)
-
- The sources to do this process are two: The XML-instances and EBA´s taxonomy.
-
- The output is the converted XBRL-CSV file placed in the output_path, as zip format
-