eba-xbridge 1.5.0rc2__py3-none-any.whl → 1.5.0rc4__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,308 @@
1
+ Metadata-Version: 2.4
2
+ Name: eba-xbridge
3
+ Version: 1.5.0rc4
4
+ Summary: XBRL-XML to XBRL-CSV converter for EBA Taxonomy (version 4.2)
5
+ License: Apache 2.0
6
+ License-File: LICENSE
7
+ Keywords: xbrl,eba,taxonomy,csv,xml
8
+ Author: MeaningfulData
9
+ Author-email: info@meaningfuldata.eu
10
+ Maintainer: Antonio Olleros
11
+ Maintainer-email: antonio.olleros@meaningfuldata.eu
12
+ Requires-Python: >=3.9
13
+ Classifier: Development Status :: 5 - Production/Stable
14
+ Classifier: Intended Audience :: Developers
15
+ Classifier: Intended Audience :: Information Technology
16
+ Classifier: Intended Audience :: Science/Research
17
+ Classifier: Programming Language :: Python :: 3
18
+ Classifier: Typing :: Typed
19
+ Requires-Dist: lxml (>=5.2.1,<6.0)
20
+ Requires-Dist: numpy (>=1.23.2,<2) ; python_version < "3.13"
21
+ Requires-Dist: numpy (>=2.1.0) ; python_version >= "3.13"
22
+ Requires-Dist: pandas (>=2.1.4,<3.0)
23
+ Project-URL: Documentation, https://docs.xbridge.meaningfuldata.eu
24
+ Project-URL: IssueTracker, https://github.com/Meaningful-Data/xbridge/issues
25
+ Project-URL: MeaningfulData, https://www.meaningfuldata.eu/
26
+ Project-URL: Repository, https://github.com/Meaningful-Data/xbridge
27
+ Description-Content-Type: text/x-rst
28
+
29
+ XBridge (eba-xbridge)
30
+ #####################
31
+
32
+ .. image:: https://img.shields.io/pypi/v/eba-xbridge.svg
33
+ :target: https://pypi.org/project/eba-xbridge/
34
+ :alt: PyPI version
35
+
36
+ .. image:: https://img.shields.io/pypi/pyversions/eba-xbridge.svg
37
+ :target: https://pypi.org/project/eba-xbridge/
38
+ :alt: Python versions
39
+
40
+ .. image:: https://img.shields.io/github/license/Meaningful-Data/xbridge.svg
41
+ :target: https://github.com/Meaningful-Data/xbridge/blob/main/LICENSE
42
+ :alt: License
43
+
44
+ .. image:: https://img.shields.io/github/actions/workflow/status/Meaningful-Data/xbridge/testing.yml?branch=main
45
+ :target: https://github.com/Meaningful-Data/xbridge/actions
46
+ :alt: Build status
47
+
48
+ Overview
49
+ ========
50
+
51
+ XBridge is a Python library for converting XBRL-XML files into XBRL-CSV files using the EBA (European Banking Authority) taxonomy. It provides a simple, reliable way to transform regulatory reporting data from XML format to CSV format.
52
+
53
+ The library currently supports **EBA Taxonomy version 4.2** and includes support for DORA (Digital Operational Resilience Act) CSV conversion. The library must be updated with each new EBA taxonomy version release.
54
+
55
+ Key Features
56
+ ============
57
+
58
+ * **XBRL-XML to XBRL-CSV Conversion**: Seamlessly convert XBRL-XML instance files to XBRL-CSV format
59
+ * **Command-Line Interface**: Quick conversions without writing code using the ``xbridge`` CLI
60
+ * **Python API**: Programmatic conversion for integration with other tools and workflows
61
+ * **EBA Taxonomy 4.2 Support**: Built for the latest EBA taxonomy specification
62
+ * **DORA CSV Conversion**: Support for Digital Operational Resilience Act reporting
63
+ * **Configurable Validation**: Flexible filing indicator validation with strict or warning modes
64
+ * **Decimal Handling**: Intelligent decimal precision handling with configurable options
65
+ * **Type Safety**: Fully typed codebase with MyPy strict mode compliance
66
+ * **Python 3.9+**: Supports Python 3.9 through 3.13
67
+
68
+ Prerequisites
69
+ =============
70
+
71
+ * **Python**: 3.9 or higher
72
+ * **7z Command-Line Tool**: Required for loading compressed taxonomy files (7z or ZIP format)
73
+
74
+ * On Ubuntu/Debian: ``sudo apt-get install p7zip-full``
75
+ * On macOS: ``brew install p7zip``
76
+ * On Windows: Download from `7-zip.org <https://www.7-zip.org/>`_
77
+
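To fail fast when the archiver is missing, a small pre-flight check like the following can be run before conversion. This is an illustrative sketch, not part of the xbridge API; the function name and the list of binary names are assumptions:

```python
import shutil

def seven_zip_available() -> bool:
    """Return True when a 7-Zip compatible executable is on the PATH."""
    # 7z (full), 7za (standalone) and 7zr (reduced) are common binary names.
    return any(shutil.which(name) for name in ("7z", "7za", "7zr"))
```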
78
+ Installation
79
+ ============
80
+
81
+ Install XBridge from PyPI using pip:
82
+
83
+ .. code-block:: bash
84
+
85
+ pip install eba-xbridge
86
+
87
+ For development installation, see `CONTRIBUTING.md <CONTRIBUTING.md>`_.
88
+
89
+ Quick Start
90
+ ===========
91
+
92
+ XBridge offers two ways to convert XBRL-XML files to XBRL-CSV: a command-line interface (CLI) for quick conversions, and a Python API for programmatic use.
93
+
94
+ Command-Line Interface
95
+ ----------------------
96
+
97
+ The CLI provides a quick way to convert files without writing code:
98
+
99
+ .. code-block:: bash
100
+
101
+ # Basic conversion (output to same directory as input)
102
+ xbridge instance.xbrl
103
+
104
+ # Specify output directory
105
+ xbridge instance.xbrl --output-path ./output
106
+
107
+ # Continue with warnings instead of errors
108
+ xbridge instance.xbrl --no-strict-validation
109
+
110
+ # Include headers as datapoints
111
+ xbridge instance.xbrl --headers-as-datapoints
112
+
113
+ **CLI Options:**
114
+
115
+ * ``--output-path PATH``: Output directory (default: same as input file)
116
+ * ``--headers-as-datapoints``: Treat headers as datapoints (default: False)
117
+ * ``--strict-validation``: Raise errors on validation failures (default: True)
118
+ * ``--no-strict-validation``: Emit warnings instead of errors
119
+
120
+ For more CLI options, run ``xbridge --help``.
121
+
122
+ Python API - Basic Conversion
123
+ ------------------------------
124
+
125
+ Convert an XBRL-XML instance file to XBRL-CSV using the Python API:
126
+
127
+ .. code-block:: python
128
+
129
+ from xbridge.api import convert_instance
130
+
131
+ # Basic conversion
132
+ input_path = "path/to/instance.xbrl"
133
+ output_path = "path/to/output"
134
+
135
+ convert_instance(input_path, output_path)
136
+
137
+ The converted XBRL-CSV files will be saved as a ZIP archive in the output directory.
138
+
139
+ Python API - Advanced Usage
140
+ ----------------------------
141
+
142
+ Customize the conversion with additional parameters:
143
+
144
+ .. code-block:: python
145
+
146
+ from xbridge.api import convert_instance
147
+
148
+ # Conversion with custom options
149
+ convert_instance(
150
+ instance_path="path/to/instance.xbrl",
151
+ output_path="path/to/output",
152
+ headers_as_datapoints=True, # Treat headers as datapoints
153
+ validate_filing_indicators=True, # Validate filing indicators
154
+ strict_validation=False, # Emit warnings instead of errors for orphaned facts
155
+ )
156
+
157
+ Loading an Instance
158
+ -------------------
159
+
160
+ Load and inspect an XBRL-XML instance without converting:
161
+
162
+ .. code-block:: python
163
+
164
+ from xbridge.api import load_instance
165
+
166
+ instance = load_instance("path/to/instance.xbrl")
167
+
168
+ # Access instance properties
169
+ print(f"Entity: {instance.entity}")
170
+ print(f"Period: {instance.period}")
171
+ print(f"Facts count: {len(instance.facts)}")
172
+
173
+ How XBridge Works
174
+ =================
175
+
176
+ XBridge performs the conversion in several steps:
177
+
178
+ 1. **Load the XBRL-XML instance**: Parse and extract facts, contexts, scenarios, and filing indicators
179
+ 2. **Load the EBA taxonomy**: Access pre-processed taxonomy modules containing tables and variables
180
+ 3. **Match and validate**: Join instance facts with taxonomy definitions
181
+ 4. **Generate CSV files**: Create XBRL-CSV files including:
182
+
183
+ * Data tables with facts and dimensions
184
+ * Filing indicators showing reported tables
185
+ * Parameters (entity, period, base currency, decimals)
186
+
187
+ 5. **Package output**: Bundle all CSV files into a ZIP archive
188
+
189
+ Output Structure
190
+ ----------------
191
+
192
+ The output ZIP file contains:
193
+
194
+ * **META-INF/**: JSON report package metadata
195
+ * **reports/**: CSV files for each reported table
196
+ * **filing-indicators.csv**: Table reporting indicators
197
+ * **parameters.csv**: Report-level parameters
198
+
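As a quick sanity check after a conversion, the archive contents can be listed with the standard library. This is a hedged sketch (the helper name is illustrative); the exact layout inside the ZIP may vary by taxonomy version:

```python
from zipfile import ZipFile

def list_output_files(zip_path):
    """Return the sorted file names inside a converted XBRL-CSV package."""
    with ZipFile(zip_path) as zf:
        return sorted(zf.namelist())
```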
199
+ Documentation
200
+ =============
201
+
202
+ Comprehensive documentation is available at `docs.xbridge.meaningfuldata.eu <https://docs.xbridge.meaningfuldata.eu>`_.
203
+
204
+ The documentation includes:
205
+
206
+ * **API Reference**: Complete API documentation
207
+ * **Quickstart Guide**: Step-by-step tutorials
208
+ * **Technical Notes**: Architecture and design details
209
+ * **FAQ**: Frequently asked questions
210
+
211
+ Taxonomy Loading
212
+ ================
213
+
214
+ If you need to work with the EBA taxonomy directly, you can load it using:
215
+
216
+ .. code-block:: bash
217
+
218
+ python -m xbridge.taxonomy_loader --input_path path/to/FullTaxonomy.7z
219
+
220
+ This generates an ``index.json`` file containing module references alongside the pre-processed taxonomy data.
221
+
222
+ .. warning::
223
+ Loading the taxonomy from a 7z package may take several minutes. Ensure the ``7z`` command is available on your system.
224
+
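Once generated, the index can be inspected like any JSON document. The sketch below only assumes the file is valid JSON; the concrete schema (module names, file references) is an internal detail of xbridge, and the helper name is illustrative:

```python
import json
from pathlib import Path

def load_taxonomy_index(index_path):
    """Parse the index.json produced by the taxonomy loader into a dict."""
    return json.loads(Path(index_path).read_text(encoding="utf-8"))
```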
225
+ Configuration Options
226
+ =====================
227
+
228
+ convert_instance Parameters
229
+ ----------------------------
230
+
231
+ * **instance_path** (str | Path): Path to the XBRL-XML instance file
232
+ * **output_path** (str | Path | None): Output directory for CSV files (default: current directory)
233
+ * **headers_as_datapoints** (bool): Treat table headers as datapoints (default: False)
234
+ * **validate_filing_indicators** (bool): Validate that facts belong to reported tables (default: True)
235
+ * **strict_validation** (bool): Raise errors on validation failures; if False, emit warnings (default: True)
236
+
237
+ Troubleshooting
238
+ ===============
239
+
240
+ Common Issues
241
+ -------------
242
+
243
+ **7z command not found**
244
+ Install the 7z command-line tool using your system's package manager (see Prerequisites).
245
+
246
+ **Taxonomy version mismatch**
247
+ Ensure you're using the correct version of XBridge for your taxonomy version. XBridge 1.5.x supports EBA Taxonomy 4.2.
248
+
249
+ **Orphaned facts warning/error**
250
+ This occurs when facts don't belong to any reported table. Set ``strict_validation=False`` to continue with warnings instead of errors.
251
+
252
+ **Decimal precision issues**
253
+ XBridge automatically handles decimal precision from the taxonomy. Check the parameters.csv file for applied decimal settings.
254
+
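To inspect the applied decimal settings, ``parameters.csv`` can be read straight from the output archive. This is an illustrative sketch; the member name follows the Output Structure section above, and its exact location inside the ZIP may differ:

```python
import csv
import io
from zipfile import ZipFile

def read_csv_member(zip_path, member="parameters.csv"):
    """Read one CSV member of a converted package as a list of rows."""
    with ZipFile(zip_path) as zf:
        with zf.open(member) as fh:
            return list(csv.reader(io.TextIOWrapper(fh, encoding="utf-8")))
```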
255
+ For other issues, see our `FAQ <https://docs.xbridge.meaningfuldata.eu/faq.html>`_ or `open an issue <https://github.com/Meaningful-Data/xbridge/issues>`_.
256
+
257
+ Contributing
258
+ ============
259
+
260
+ We welcome contributions! Please see `CONTRIBUTING.md <CONTRIBUTING.md>`_ for:
261
+
262
+ * Development setup instructions
263
+ * Code style guidelines
264
+ * Testing requirements
265
+ * Pull request process
266
+
267
+ Before contributing, please read our `Code of Conduct <CODE_OF_CONDUCT.md>`_.
268
+
269
+ Changelog
270
+ =========
271
+
272
+ See `CHANGELOG.md <CHANGELOG.md>`_ for a detailed history of changes.
273
+
274
+ Support
275
+ =======
276
+
277
+ * **Documentation**: https://docs.xbridge.meaningfuldata.eu
278
+ * **Issue Tracker**: https://github.com/Meaningful-Data/xbridge/issues
279
+ * **Email**: info@meaningfuldata.eu
280
+ * **Company**: https://www.meaningfuldata.eu/
281
+
282
+ Security
283
+ ========
284
+
285
+ For security issues, please see our `Security Policy <SECURITY.md>`_.
286
+
287
+ License
288
+ =======
289
+
290
+ This project is licensed under the Apache License 2.0 - see the `LICENSE <LICENSE>`_ file for details.
291
+
292
+ Authors & Maintainers
293
+ =====================
294
+
295
+ **MeaningfulData** - https://www.meaningfuldata.eu/
296
+
297
+ Maintainers:
298
+
299
+ * Antonio Olleros (antonio.olleros@meaningfuldata.eu)
300
+ * Jesus Simon (jesus.simon@meaningfuldata.eu)
301
+ * Francisco Javier Hernandez del Caño (javier.hernandez@meaningfuldata.eu)
302
+ * Guillermo Garcia Martin (guillermo.garcia@meaningfuldata.eu)
303
+
304
+ Acknowledgments
305
+ ===============
306
+
307
+ This project is designed to work with the European Banking Authority (EBA) taxonomy for regulatory reporting.
308
+
@@ -1,7 +1,8 @@
1
- xbridge/__init__.py,sha256=BrHDgfv0XiuLA3wGkiWOTyi13tkIQKlHy2_r9kdC8mE,68
2
- xbridge/api.py,sha256=IhP-nMHxxw5RLQgKWi1-c5v8OXMRIWuSkS6G5RLmZII,1326
3
- xbridge/converter.py,sha256=X6ZFSyIiXFq_MKpCqtvK9Jno1A-umozs3gs_MiZx0ZQ,25992
4
- xbridge/instance.py,sha256=_Cjle0vt3cEfyWQeStLT9if0aDOio4185ig7YykVGNs,27984
1
+ xbridge/__init__.py,sha256=joASbfhYee_2irYZhRCZ6J4oTn6u1fjt6ilQbXwL4M4,68
2
+ xbridge/__main__.py,sha256=trtFEv7TRJgrLL84leIapPvgC_iVTj05qLHRRS1Olts,2219
3
+ xbridge/api.py,sha256=NCBz7VRJWE3gID6ndgL4Awoxw0w1yMIIf_OTLRuZyyQ,1559
4
+ xbridge/converter.py,sha256=uu6djzgGZcmq0nibrkmg5lW-npcolB4XtQoNWu1p_3o,23498
5
+ xbridge/instance.py,sha256=KQpXhsZIM9oTYJf2hyrzc9pqFY2-1JBF5y1xbnLbqk8,29991
5
6
  xbridge/modules/ae_ae_4.2.json,sha256=AdFvwZqX0KVP3jF1iHeQc5QSnSMvvT3GvoA2G1AgXis,460165
6
7
  xbridge/modules/ae_con_cir-680-2014_2017-04-04.json,sha256=4n0t9dKJNU8Nb5QHpssrDs8ZLwzI-Mw75ax-ar9pLu0,363273
7
8
  xbridge/modules/ae_con_cir-680-2014_2018-03-31.json,sha256=aVWeLLs20p39kQQUthUzqrxBGKTycqhgX9WLk1rVlNw,363538
@@ -379,10 +380,11 @@ xbridge/modules/sbpimv_ind_its-2016-svbxx_2016-02-01.json,sha256=SED-dW--UKxhHNY
379
380
  xbridge/modules/sbpimv_sbp_4.2.json,sha256=Bj4z7zofZngG9EJ7-q74F-JF41O1FK_mX8RTfYdLP9I,7023
380
381
  xbridge/modules/sepa_ipr_pay_4.1.json,sha256=awsJeBUDhMIFs5so6CWUQmlcHSDcGMd8fnLy_r_iMik,27054
381
382
  xbridge/modules/sepa_ipr_pay_4.2.json,sha256=JLJvR02LOAJy6SWPRuhV1TT02oXQhsG83FBn176KWsA,27742
382
- xbridge/modules.py,sha256=8TheJY7oZIy_n-doALa_9AYwwZFu284jaBWt-aol0MA,22292
383
+ xbridge/modules.py,sha256=bTvBXtp3w4Gad2DpEQE7Hb-UfuUQLlRl8gywRstQtpU,22399
383
384
  xbridge/py.typed,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
384
385
  xbridge/taxonomy_loader.py,sha256=K0lnJVryvkKsaoK3fMis-L2JpmwLO6z3Ruq3yj9FxDY,9317
385
- eba_xbridge-1.5.0rc2.dist-info/METADATA,sha256=XwKBzNPYFZSqK_KtlWwzXlKeCfl90o4_79gsZucf0fs,2088
386
- eba_xbridge-1.5.0rc2.dist-info/WHEEL,sha256=zp0Cn7JsFoX2ATtOhtaFYIiE2rmFAD4OcMhtUki8W3U,88
387
- eba_xbridge-1.5.0rc2.dist-info/licenses/LICENSE,sha256=xx0jnfkXJvxRnG63LTGOxlggYnIysveWIZ6H3PNdCrQ,11357
388
- eba_xbridge-1.5.0rc2.dist-info/RECORD,,
386
+ eba_xbridge-1.5.0rc4.dist-info/METADATA,sha256=5BAX_xFnRrIxcQiJbNi3y68A_42F8dR-qpL6Z-bBT0U,10430
387
+ eba_xbridge-1.5.0rc4.dist-info/WHEEL,sha256=zp0Cn7JsFoX2ATtOhtaFYIiE2rmFAD4OcMhtUki8W3U,88
388
+ eba_xbridge-1.5.0rc4.dist-info/entry_points.txt,sha256=FATct4icSewM04cegjhybtm7xcQWhaSahL-DTtuFdZw,49
389
+ eba_xbridge-1.5.0rc4.dist-info/licenses/LICENSE,sha256=xx0jnfkXJvxRnG63LTGOxlggYnIysveWIZ6H3PNdCrQ,11357
390
+ eba_xbridge-1.5.0rc4.dist-info/RECORD,,
@@ -0,0 +1,3 @@
1
+ [console_scripts]
2
+ xbridge=xbridge.__main__:main
3
+
xbridge/__init__.py CHANGED
@@ -2,4 +2,4 @@
2
2
  Init file for eba-xbridge library
3
3
  """
4
4
 
5
- __version__ = "1.5.0rc2"
5
+ __version__ = "1.5.0rc4"
xbridge/__main__.py ADDED
@@ -0,0 +1,82 @@
1
+ """Command-line interface for xbridge."""
2
+
3
+ import argparse
4
+ import sys
5
+ from pathlib import Path
6
+
7
+ from xbridge.api import convert_instance
8
+
9
+
10
+ def main() -> None:
11
+ """Main CLI entry point for xbridge converter."""
12
+ parser = argparse.ArgumentParser(
13
+ description="Convert XBRL-XML instances to XBRL-CSV format",
14
+ prog="xbridge",
15
+ )
16
+
17
+ parser.add_argument(
18
+ "input_file",
19
+ type=str,
20
+ help="Path to the input XBRL-XML file",
21
+ )
22
+
23
+ parser.add_argument(
24
+ "--output-path",
25
+ type=str,
26
+ default=None,
27
+ help="Output directory path (default: same folder as input file)",
28
+ )
29
+
30
+ parser.add_argument(
31
+ "--headers-as-datapoints",
32
+ action="store_true",
33
+ default=False,
34
+ help="Treat headers as datapoints (default: False)",
35
+ )
36
+
37
+ parser.add_argument(
38
+ "--strict-validation",
39
+ action="store_true",
40
+ default=True,
41
+ help="Raise errors on validation failures (default: True)",
42
+ )
43
+
44
+ parser.add_argument(
45
+ "--no-strict-validation",
46
+ action="store_false",
47
+ dest="strict_validation",
48
+ help="Emit warnings instead of errors for validation failures",
49
+ )
50
+
51
+ args = parser.parse_args()
52
+
53
+ # Determine output path
54
+ input_path = Path(args.input_file)
55
+ if not input_path.exists():
56
+ print(f"Error: Input file not found: {args.input_file}", file=sys.stderr)
57
+ sys.exit(1)
58
+
59
+ if args.output_path is None:
60
+ output_path = input_path.parent
61
+ else:
62
+ output_path = Path(args.output_path)
63
+ if not output_path.exists():
64
+ print(f"Error: Output path does not exist: {args.output_path}", file=sys.stderr)
65
+ sys.exit(1)
66
+
67
+ try:
68
+ result_path = convert_instance(
69
+ instance_path=input_path,
70
+ output_path=output_path,
71
+ headers_as_datapoints=args.headers_as_datapoints,
72
+ validate_filing_indicators=True,
73
+ strict_validation=args.strict_validation,
74
+ )
75
+ print(f"Conversion successful: {result_path}")
76
+ except Exception as e:
77
+ print(f"Conversion failed: {e}", file=sys.stderr)
78
+ sys.exit(1)
79
+
80
+
81
+ if __name__ == "__main__":
82
+ main()
xbridge/api.py CHANGED
@@ -14,6 +14,7 @@ def convert_instance(
14
14
  output_path: Optional[Union[str, Path]] = None,
15
15
  headers_as_datapoints: bool = False,
16
16
  validate_filing_indicators: bool = True,
17
+ strict_validation: bool = True,
17
18
  ) -> Path:
18
19
  """
19
20
  Convert one single instance of XBRL-XML file to a CSV file
@@ -27,6 +28,9 @@ def convert_instance(
27
28
  :param validate_filing_indicators: If True, validate that no facts are orphaned
28
29
  (belong only to non-reported tables). Default is True.
29
30
 
31
+ :param strict_validation: If True (default), raise an error on orphaned facts. If False,
32
+ emit a warning instead and continue.
33
+
30
34
  :return: Converted CSV file.
31
35
 
32
36
  """
@@ -34,7 +38,12 @@ def convert_instance(
34
38
  output_path = Path(".")
35
39
 
36
40
  converter = Converter(instance_path)
37
- return converter.convert(output_path, headers_as_datapoints, validate_filing_indicators)
41
+ return converter.convert(
42
+ output_path,
43
+ headers_as_datapoints,
44
+ validate_filing_indicators,
45
+ strict_validation,
46
+ )
38
47
 
39
48
 
40
49
  def load_instance(instance_path: Union[str, Path]) -> Instance:
xbridge/converter.py CHANGED
@@ -6,10 +6,11 @@ from __future__ import annotations
6
6
 
7
7
  import csv
8
8
  import json
9
+ import warnings
9
10
  from pathlib import Path
10
11
  from shutil import rmtree
11
12
  from tempfile import TemporaryDirectory
12
- from typing import Any, Dict, Set, Union
13
+ from typing import Any, Dict, Union
13
14
  from zipfile import ZipFile
14
15
 
15
16
  import pandas as pd
@@ -76,6 +77,7 @@ class Converter:
76
77
  output_path: Union[str, Path],
77
78
  headers_as_datapoints: bool = False,
78
79
  validate_filing_indicators: bool = True,
80
+ strict_validation: bool = True,
79
81
  ) -> Path:
80
82
  """Convert the ``XML Instance`` to a CSV file or between CSV formats"""
81
83
  if not output_path:
@@ -90,7 +92,9 @@ class Converter:
90
92
  raise ValueError("Module of the instance file not found in the taxonomy")
91
93
 
92
94
  if isinstance(self.instance, XmlInstance):
93
- return self.convert_xml(output_path, headers_as_datapoints, validate_filing_indicators)
95
+ return self.convert_xml(
96
+ output_path, headers_as_datapoints, validate_filing_indicators, strict_validation
97
+ )
94
98
  elif isinstance(self.instance, CsvInstance):
95
99
  if self.module.architecture != "headers":
96
100
  raise ValueError("Cannot convert CSV instance with non-headers architecture")
@@ -103,6 +107,7 @@ class Converter:
103
107
  output_path: Path,
104
108
  headers_as_datapoints: bool = False,
105
109
  validate_filing_indicators: bool = True,
110
+ strict_validation: bool = True,
106
111
  ) -> Path:
107
112
  module_filind_codes = [table.filing_indicator_code for table in self.module.tables]
108
113
 
@@ -147,7 +152,7 @@ class Converter:
147
152
  self._convert_filing_indicator(report_dir)
148
153
 
149
154
  if validate_filing_indicators:
150
- self._validate_filing_indicators()
155
+ self._validate_filing_indicators(strict_validation=strict_validation)
151
156
 
152
157
  with open(MAPPING_PATH / self.module.dim_dom_file_name, "r", encoding="utf-8") as fl:
153
158
  mapping_dict: Dict[str, str] = json.load(fl)
@@ -280,111 +285,44 @@ class Converter:
280
285
  instance_df = instance_df.loc[mask]
281
286
  instance_df.drop(columns=nrd_list, inplace=True)
282
287
 
283
- return instance_df
284
-
285
- def _normalize_allowed_values(
286
- self, table_df: pd.DataFrame, datapoint_df: pd.DataFrame
287
- ) -> pd.DataFrame:
288
- """
289
- Normalizes fact values against allowed_values for each variable.
290
-
291
- For variables with allowed_values:
292
- 1. Extracts code part from fact values (after ":")
293
- 2. Maps to correct namespaced value from allowed_values
294
- 3. Updates dimension columns with normalized values
295
- 4. Validates no unmatched codes remain
296
-
297
- :param table_df: The merged dataframe with facts and variables
298
- :param datapoint_df: The dataframe with variable definitions including allowed_values
299
- :return: The normalized dataframe
300
- """
301
- if "allowed_values" not in datapoint_df.columns:
302
- return table_df
303
-
304
- # Build mapping: datapoint → {code → full_value}
305
- datapoint_allowed_map: Dict[str, Dict[str, str]] = {}
288
+ # Rows missing values for required open keys do not belong to the table
289
+ if open_keys:
290
+ instance_df.dropna(subset=list(open_keys), inplace=True)
306
291
 
307
- for _, row in datapoint_df.iterrows():
308
- datapoint = row.get("datapoint")
309
- allowed_values = row.get("allowed_values")
310
-
311
- if not datapoint or not allowed_values or len(allowed_values) == 0:
312
- continue
292
+ return instance_df
313
293
 
314
- # Group allowed values by the dimension they apply to
315
- # For now, we'll apply them to all dimension columns
316
- # In the future, we could make this more sophisticated
317
- code_map: Dict[str, str] = {}
318
- for allowed_val in allowed_values:
319
- if ":" in allowed_val:
320
- code = allowed_val.split(":")[-1]
321
- code_map[code] = allowed_val
322
-
323
- if code_map:
324
- datapoint_allowed_map[datapoint] = code_map
325
-
326
- if not datapoint_allowed_map:
327
- return table_df
328
-
329
- # Identify columns to normalize
330
- # We normalize both dimension columns AND the value column (for enumerated values)
331
- exclude_cols = {"datapoint", "decimals", "unit", "data_type", "allowed_values"}
332
- columns_to_check = [col for col in table_df.columns if col not in exclude_cols]
333
-
334
- # For each column that might contain namespaced values
335
- for dim_col in columns_to_check:
336
- if dim_col not in table_df.columns or table_df[dim_col].isna().all():
337
- continue
294
+ def _matching_fact_indices(self, table: Table) -> set[int]:
295
+ """Return indices of instance facts that actually match the table definition."""
296
+ if self.instance.instance_df is None:
297
+ return set()
338
298
 
339
- # Check if column contains namespaced values (contains ":")
340
- sample_values = table_df[dim_col].dropna()
341
- if sample_values.empty:
342
- continue
299
+ instance_df = self._get_instance_df(table)
300
+ if instance_df.empty or table.variable_df is None:
301
+ return set()
343
302
 
344
- has_namespace = sample_values.astype(str).str.contains(":", regex=False).any()
345
- if not has_namespace:
346
- continue
303
+ open_keys = set(table.open_keys)
347
304
 
348
- # Extract codes from values (vectorized operation)
349
- mask = table_df[dim_col].notna()
350
- temp_code_col = f"_{dim_col}_temp_code"
351
- table_df.loc[mask, temp_code_col] = (
352
- table_df.loc[mask, dim_col].astype(str).str.split(":").str[-1]
353
- )
305
+ datapoint_df = table.variable_df.copy()
354
306
 
355
- # Normalize values for each datapoint
356
- for datapoint, code_map in datapoint_allowed_map.items():
357
- dp_mask = (table_df["datapoint"] == datapoint) & mask
307
+ # For validation we match minimally on metric (concept) and any open keys present
308
+ merge_cols: list[str] = []
309
+ if "metric" in datapoint_df.columns and "metric" in instance_df.columns:
310
+ merge_cols.append("metric")
311
+ merge_cols.extend(
312
+ [key for key in open_keys if key in datapoint_df.columns and key in instance_df.columns]
313
+ )
358
314
 
359
- if not dp_mask.any():
360
- continue
315
+ instance_df = instance_df.copy()
316
+ instance_df["_idx"] = instance_df.index
361
317
 
362
- # Store original values for error reporting
363
- original_values = table_df.loc[dp_mask, dim_col].copy()
364
-
365
- # Map codes to correct full values
366
- normalized_values = table_df.loc[dp_mask, temp_code_col].map(code_map)
367
-
368
- # Update only the values that were successfully mapped
369
- mapped_mask = dp_mask & normalized_values.notna()
370
- table_df.loc[mapped_mask, dim_col] = normalized_values[mapped_mask]
371
-
372
- # Check for values that couldn't be mapped (validation errors)
373
- unmapped_mask = dp_mask & normalized_values.isna()
374
- if unmapped_mask.any():
375
- invalid_codes = table_df.loc[unmapped_mask, temp_code_col].unique()
376
- valid_codes = list(code_map.keys())
377
- raise ValueError(
378
- f"Invalid values for datapoint '{datapoint}' in column '{dim_col}': "
379
- f"Found codes {list(invalid_codes)} but only {valid_codes} are allowed. "
380
- f"Original values: {original_values[unmapped_mask].tolist()}"
381
- )
318
+ merged_df = pd.merge(datapoint_df, instance_df, on=merge_cols, how="inner")
382
319
 
383
- # Clean up temporary column
384
- if temp_code_col in table_df.columns:
385
- table_df.drop(columns=[temp_code_col], inplace=True)
320
+ if open_keys:
321
+ valid_open_keys = [key for key in open_keys if key in merged_df.columns]
322
+ if valid_open_keys:
323
+ merged_df.dropna(subset=valid_open_keys, inplace=True)
386
324
 
387
- return table_df
325
+ return set(merged_df["_idx"].tolist())
388
326
 
389
327
  def _variable_generator(self, table: Table) -> pd.DataFrame:
390
328
  """Returns the dataframe with the CSV file for the table
@@ -406,7 +344,7 @@ class Converter:
406
344
  )
407
345
 
408
346
  # Do the intersection and drop from datapoints the columns and records
409
- datapoint_df = table.variable_df
347
+ datapoint_df = table.variable_df.copy()
410
348
  missing_cols = list(variable_columns - instance_columns)
411
349
  if "data_type" in missing_cols:
412
350
  missing_cols.remove("data_type")
@@ -417,10 +355,8 @@ class Converter:
417
355
 
418
356
  # Join the dataframes on the datapoint_columns
419
357
  merge_cols = list(variable_columns & instance_columns)
420
- table_df = pd.merge(datapoint_df, instance_df, on=merge_cols, how="inner")
421
358
 
422
- # Normalize values against allowed_values
423
- table_df = self._normalize_allowed_values(table_df, datapoint_df)
359
+ table_df = pd.merge(datapoint_df, instance_df, on=merge_cols, how="inner")
424
360
 
425
361
  if "data_type" in table_df.columns and "decimals" in table_df.columns:
426
362
  decimals_table = table_df[["decimals", "data_type"]].drop_duplicates()
@@ -432,17 +368,27 @@ class Converter:
432
368
  decimals = row["decimals"]
433
369
 
434
370
  if data_type not in self._decimals_parameters:
435
- self._decimals_parameters[data_type] = decimals
371
+ self._decimals_parameters[data_type] = (
372
+ int(decimals) if decimals not in {"INF", "#none"} else decimals
373
+ )
436
374
  else:
437
375
  # If new value is a special value, skip it (prefer numeric values)
438
376
  if decimals in {"INF", "#none"}:
439
377
  pass
440
378
  # If new value is numeric
441
379
  else:
380
+ try:
381
+ decimals = int(decimals)
382
+ except ValueError:
383
+ raise ValueError(
384
+ f"Invalid decimals value: {decimals}, "
385
+ "should be integer, 'INF' or '#none'"
386
+ )
387
+
442
388
  # If existing value is special, replace with numeric
443
- if self._decimals_parameters[data_type] in {"INF", "#none"} or (
444
- isinstance(self._decimals_parameters[data_type], int)
445
- and decimals < self._decimals_parameters[data_type]
389
+ if (
390
+ self._decimals_parameters[data_type] in {"INF", "#none"}
391
+ or decimals < self._decimals_parameters[data_type]
446
392
  ):
447
393
  self._decimals_parameters[data_type] = decimals
448
394
 
@@ -497,13 +443,6 @@ class Converter:
497
443
  # Defined by the EBA in the JSON files. We take them from the taxonomy
498
444
  # Because EBA is using exactly those for the JSON files.
499
445
 
500
- for open_key in table.open_keys:
501
- if open_key in datapoints.columns:
502
- dim_name = mapping_dict.get(open_key)
503
- # For open keys, there are no dim_names (they are not mapped)
504
- if dim_name and not datapoints.empty:
505
- datapoints[open_key] = dim_name + ":" + datapoints[open_key].astype(str)
506
-
507
446
  datapoints.sort_values(by=["datapoint"], ascending=True, inplace=True)
508
447
  output_path_table = temp_dir_path / (table.url or "table.csv")
509
448
 
@@ -550,7 +489,7 @@ class Converter:
550
489
  if fil_ind.value and fil_ind.table:
551
490
  self._reported_tables.append(fil_ind.table)
552
491
 
553
- def _validate_filing_indicators(self) -> None:
492
+ def _validate_filing_indicators(self, strict_validation: bool = True) -> None:
554
493
  """Validate that no facts are orphaned (belong only to non-reported tables).
555
494
 
556
495
  Raises:
@@ -559,44 +498,56 @@ class Converter:
559
498
  if self.instance.instance_df is None or self.instance.instance_df.empty:
560
499
  return
561
500
 
562
- # Step 1: Collect indices of facts that belong to ANY reported table
563
- reported_fact_indices: Set[int] = set()
501
+ # Step 1: Track which facts belong to ANY reported table without materializing a huge set
+ reported_mask = pd.Series(False, index=self.instance.instance_df.index)
  for table in self.module.tables:
      if table.filing_indicator_code in self._reported_tables:
-         instance_df = self._get_instance_df(table)
-         if not instance_df.empty:
-             # Add all fact indices (DataFrame row indices) to the set
-             reported_fact_indices.update(instance_df.index)
+         reported_indices = self._matching_fact_indices(table)
+         if reported_indices:
+             reported_mask.loc[list(reported_indices)] = True
 
  # Step 2: Find facts that belong ONLY to non-reported tables
- all_orphaned_indices = set()
+ orphaned_mask = pd.Series(False, index=self.instance.instance_df.index)
  orphaned_per_table = {}
 
  for table in self.module.tables:
      if table.filing_indicator_code not in self._reported_tables:
-         instance_df = self._get_instance_df(table)
-         if not instance_df.empty:
-             # Find facts that are in this table but NOT in any reported table
-             orphaned_in_this_table = set(instance_df.index) - reported_fact_indices
+         orphaned_indices = self._matching_fact_indices(table)
+         if orphaned_indices:
+             # Facts in this table that never appear in a reported table
+             orphaned_in_this_table = [
+                 idx for idx in orphaned_indices if not reported_mask.loc[idx]
+             ]
              if orphaned_in_this_table:
+                 orphaned_mask.loc[orphaned_in_this_table] = True
                  orphaned_per_table[table.filing_indicator_code] = len(
                      orphaned_in_this_table
                  )
-                 all_orphaned_indices.update(orphaned_in_this_table)
 
- if all_orphaned_indices:
+ total_orphaned = int(orphaned_mask.sum())
+
+ if total_orphaned:
      error_msg = (
          f"Filing indicator inconsistency detected:\n"
-         f"Found {len(all_orphaned_indices)} fact(s) that belong ONLY"
+         f"Found {total_orphaned} fact(s) that belong ONLY"
          f" to non-reported tables:\n"
      )
      for table_code, count in orphaned_per_table.items():
          error_msg += f" - {table_code}: {count} fact(s)\n"
+
+     if strict_validation:
+         error_msg += (
+             "\nThe conversion process will not continue due to strict validation mode. "
+             "Either set filed=true for the relevant tables "
+             "or remove these facts from the XML."
+         )
+         raise ValueError(error_msg)
      error_msg += (
          "\nThese facts will be excluded from the output. "
-         "Either set filed=true for the relevant tables or remove these facts from the XML."
+         "Consider setting filed=true for the relevant tables "
+         "or removing these facts from the XML."
      )
-     raise ValueError(error_msg)
+     warnings.warn(error_msg)
 
  def _convert_parameters(self, temp_dir_path: Path) -> None:
      # Workaround;
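The mask-based filing-indicator check above can be exercised in isolation. Below is a minimal, self-contained sketch with toy fact indices and an invented two-table layout standing in for `_matching_fact_indices`; the table codes and index values are hypothetical, chosen only to show why a fact shared with a reported table is not flagged as orphaned:

```python
import pandas as pd

# Toy fact index covering all facts in a hypothetical instance
index = pd.RangeIndex(6)

# Hypothetical stand-in for _matching_fact_indices: facts matched per table
table_facts = {
    "C_01.00": {0, 1, 2},  # filed=true (reported)
    "C_02.00": {2, 3, 4},  # filed=false (not reported)
}
reported_tables = {"C_01.00"}

# Step 1: mark facts that appear in ANY reported table
reported_mask = pd.Series(False, index=index)
for code, facts in table_facts.items():
    if code in reported_tables:
        reported_mask.loc[list(facts)] = True

# Step 2: facts appearing ONLY in non-reported tables are orphaned
orphaned_mask = pd.Series(False, index=index)
for code, facts in table_facts.items():
    if code not in reported_tables:
        orphaned = [i for i in facts if not reported_mask.loc[i]]
        orphaned_mask.loc[orphaned] = True

# Fact 2 is shared with the reported table, so only 3 and 4 are orphaned
print(sorted(orphaned_mask[orphaned_mask].index))  # [3, 4]
```

The boolean-mask bookkeeping keeps memory bounded by the instance index, instead of growing a Python set with one entry per fact.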
xbridge/instance.py CHANGED
@@ -13,6 +13,59 @@ from zipfile import ZipFile
  import pandas as pd
  from lxml import etree
 
+ # Cache namespace → CSV prefix derivations to avoid repeated string work during parse
+ _namespace_prefix_cache: Dict[str, str] = {}
+
+
+ def _derive_csv_prefix(namespace_uri: str) -> Optional[str]:
+     """Derive the fixed CSV prefix from a namespace URI using the EBA convention."""
+     if not namespace_uri:
+         return None
+
+     cached = _namespace_prefix_cache.get(namespace_uri)
+     if cached is not None:
+         return cached
+
+     cleaned = namespace_uri.rstrip("#/")
+     if "#" in namespace_uri:
+         segment = namespace_uri.rsplit("#", 1)[-1]
+     else:
+         segment = cleaned.rsplit("/", 1)[-1] if "/" in cleaned else cleaned
+
+     if not segment:
+         return None
+
+     prefix = f"eba_{segment}"
+     _namespace_prefix_cache[namespace_uri] = prefix
+     return prefix
+
+
+ def _normalize_namespaced_value(
+     value: Optional[str], nsmap: Dict[Optional[str], str]
+ ) -> Optional[str]:
+     """
+     Normalize a namespaced value (e.g., 'dom:qAE' or '{uri}qAE') to the CSV prefix convention.
+     Returns the original value if no namespace can be resolved.
+     """
+     if value is None:
+         return None
+
+     # Clark notation: {uri}local
+     if value.startswith("{") and "}" in value:
+         uri, local = value[1:].split("}", 1)
+         derived = _derive_csv_prefix(uri)
+         return f"{derived}:{local}" if derived else value
+
+     # Prefixed notation: prefix:local
+     if ":" in value:
+         potential_prefix, local = value.split(":", 1)
+         namespace_uri = nsmap.get(potential_prefix)
+         if namespace_uri:
+             derived = _derive_csv_prefix(namespace_uri)
+             return f"{derived}:{local}" if derived else value
+
+     return value
+
 
  class Instance:
      """
@@ -548,7 +601,7 @@ class Scenario:
                  continue
              dimension = dimension_raw.split(":")[1]
              value = self.get_value(child)
-             value = value.split(":")[1] if ":" in value else value
+             value = _normalize_namespaced_value(value, child.nsmap) or ""
              self.dimensions[dimension] = value
 
      @staticmethod
@@ -667,7 +720,7 @@ class Fact:
      def parse(self) -> None:
          """Parse the XML node with the `fact <https://www.xbrl.org/guidance/xbrl-glossary/#:~:text=accounting%20standards%20body.-,Fact,-A%20fact%20is>`_."""
          self.metric = self.fact_xml.tag
-         self.value = self.fact_xml.text
+         self.value = _normalize_namespaced_value(self.fact_xml.text, self.fact_xml.nsmap)
          self.decimals = self.fact_xml.attrib.get("decimals")
          self.context = self.fact_xml.attrib.get("contextRef")
          self.unit = self.fact_xml.attrib.get("unitRef")
@@ -675,7 +728,11 @@ class Fact:
      def __dict__(self) -> Dict[str, Any]:  # type: ignore[override]
          metric_clean = ""
          if self.metric:
-             metric_clean = self.metric.split("}")[1] if "}" in self.metric else self.metric
+             # Normalize metric to use consistent eba_* prefix like other dimensions
+             metric_clean = _normalize_namespaced_value(self.metric, self.fact_xml.nsmap) or ""
+             # If still in Clark notation, extract the local name
+             if metric_clean.startswith("{") and "}" in metric_clean:
+                 metric_clean = metric_clean.split("}", 1)[1]
 
          return {
              "metric": metric_clean,
xbridge/modules.py CHANGED
@@ -306,9 +306,9 @@ class Table:
          variable_info: dict[str, Any] = {}
          for dim_k, dim_v in variable.dimensions.items():
              if dim_k not in ("unit", "decimals"):
-                 variable_info[dim_k] = dim_v.split(":")[1]
+                 variable_info[dim_k] = dim_v
          if "concept" in variable.dimensions:
-             variable_info["metric"] = variable.dimensions["concept"].split(":")[1]
+             variable_info["metric"] = variable.dimensions["concept"]
              del variable_info["concept"]
 
          if variable.code is None:
@@ -324,9 +324,11 @@ class Table:
          if "dimensions" in column:
              for dim_k, dim_v in column["dimensions"].items():
                  if dim_k == "concept":
-                     variable_info["metric"] = dim_v.split(":")[1]
+                     variable_info["metric"] = dim_v
                  elif dim_k not in ("unit", "decimals"):
-                     variable_info[dim_k.split(":")[1]] = dim_v.split(":")[1]
+                     # Keep the full dimension key and value with prefixes
+                     dim_k_clean = dim_k.split(":")[1] if ":" in dim_k else dim_k
+                     variable_info[dim_k_clean] = dim_v
 
          if "decimals" in column:
              variable_info["data_type"] = column["decimals"]
@@ -1,62 +0,0 @@
- Metadata-Version: 2.4
- Name: eba-xbridge
- Version: 1.5.0rc2
- Summary: XBRL-XML to XBRL-CSV converter for EBA Taxonomy (version 4.1)
- License: Apache 2.0
- License-File: LICENSE
- Keywords: xbrl,eba,taxonomy,csv,xml
- Author: MeaningfulData
- Author-email: info@meaningfuldata.eu
- Maintainer: Antonio Olleros
- Maintainer-email: antonio.olleros@meaningfuldata.eu
- Requires-Python: >=3.9
- Classifier: Development Status :: 5 - Production/Stable
- Classifier: Intended Audience :: Developers
- Classifier: Intended Audience :: Information Technology
- Classifier: Intended Audience :: Science/Research
- Classifier: Programming Language :: Python :: 3
- Classifier: Typing :: Typed
- Requires-Dist: lxml (>=5.2.1,<6.0)
- Requires-Dist: numpy (>=1.23.2,<2) ; python_version < "3.13"
- Requires-Dist: numpy (>=2.1.0) ; python_version >= "3.13"
- Requires-Dist: pandas (>=2.1.4,<3.0)
- Project-URL: Documentation, https://docs.xbridge.meaningfuldata.eu
- Project-URL: IssueTracker, https://github.com/Meaningful-Data/xbridge/issues
- Project-URL: MeaningfulData, https://www.meaningfuldata.eu/
- Project-URL: Repository, https://github.com/Meaningful-Data/xbridge
- Description-Content-Type: text/x-rst
-
- Overview
- ============
- XBridge is a Python library which main function is to convert XBRL-XML files into XBRL-CSV files by using EBA's taxonomy.
- It works with EBA Taxonomy latest published version (4.1). Library must be updated on each new EBA taxonomy version.
-
- Installation
- ============
-
- To install the library, run the following command:
-
- .. code:: bash
-
-     pip install eba-xbridge
-
-
- How XBridge works:
- =========================
-
- Firstly, an XBRL-XML file has to be selected to convert it. Then, that XBRL-XML file is input in the following function contained in the ``API`` package:
-
- .. code:: python
-
-     >>> from xbridge.api import convert_instance
-
-     >>> input_path = "data/input"
-
-     >>> output_path = "data/output"
-
-     >>> convert_instance(input_path, output_path)
-
- The sources to do this process are two: The XML-instances and EBA´s taxonomy.
-
- The output is the converted XBRL-CSV file placed in the output_path, as zip format
-