ingestr 0.13.21__tar.gz → 0.13.22__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {ingestr-0.13.21 → ingestr-0.13.22}/PKG-INFO +5 -5
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/.vitepress/config.mjs +1 -0
- ingestr-0.13.22/docs/media/pipedrive.png +0 -0
- ingestr-0.13.22/docs/supported-sources/pipedrive.md +43 -0
- ingestr-0.13.22/docs/tutorials/load-kinesis-bigquery.md +130 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/adjust/adjust_helpers.py +6 -2
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/applovin_max/__init__.py +5 -3
- ingestr-0.13.22/ingestr/src/buildinfo.py +1 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/factory.py +2 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/hubspot/__init__.py +0 -1
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/kinesis/__init__.py +3 -4
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/partition.py +2 -2
- ingestr-0.13.22/ingestr/src/pipedrive/__init__.py +198 -0
- ingestr-0.13.22/ingestr/src/pipedrive/helpers/__init__.py +23 -0
- ingestr-0.13.22/ingestr/src/pipedrive/helpers/custom_fields_munger.py +102 -0
- ingestr-0.13.22/ingestr/src/pipedrive/helpers/pages.py +115 -0
- ingestr-0.13.22/ingestr/src/pipedrive/settings.py +27 -0
- ingestr-0.13.22/ingestr/src/pipedrive/typing.py +3 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/sources.py +46 -14
- {ingestr-0.13.21 → ingestr-0.13.22}/pyproject.toml +2 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/requirements.in +4 -4
- {ingestr-0.13.21 → ingestr-0.13.22}/requirements.txt +5 -4
- ingestr-0.13.21/docs/tutorials/load-kinesis-bigquery.md +0 -67
- ingestr-0.13.21/ingestr/src/buildinfo.py +0 -1
- {ingestr-0.13.21 → ingestr-0.13.22}/.dockerignore +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/.githooks/pre-commit-hook.sh +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/.github/workflows/deploy-docs.yml +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/.github/workflows/release.yml +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/.github/workflows/secrets-scan.yml +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/.github/workflows/tests.yml +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/.gitignore +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/.gitleaksignore +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/.python-version +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/.vale.ini +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/Dockerfile +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/LICENSE.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/Makefile +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/README.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/.vitepress/theme/custom.css +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/.vitepress/theme/index.js +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/commands/example-uris.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/commands/ingest.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/getting-started/core-concepts.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/getting-started/incremental-loading.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/getting-started/quickstart.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/getting-started/telemetry.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/index.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/media/applovin_max.png +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/media/athena.png +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/media/clickhouse_img.png +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/media/github.png +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/media/googleanalytics.png +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/media/kinesis.bigquery.png +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/media/linkedin_ads.png +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/media/personio.png +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/media/personio_duckdb.png +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/media/stripe_postgres.png +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/media/tiktok.png +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/adjust.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/airtable.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/applovin.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/applovin_max.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/appsflyer.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/appstore.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/asana.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/athena.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/bigquery.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/chess.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/clickhouse.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/csv.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/custom_queries.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/databricks.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/db2.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/duckdb.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/dynamodb.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/facebook-ads.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/gcs.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/github.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/google-ads.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/google_analytics.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/gorgias.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/gsheets.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/hubspot.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/kafka.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/kinesis.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/klaviyo.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/linkedin_ads.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/mongodb.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/mssql.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/mysql.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/notion.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/oracle.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/personio.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/postgres.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/redshift.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/s3.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/salesforce.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/sap-hana.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/shopify.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/slack.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/snowflake.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/sqlite.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/stripe.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/tiktok-ads.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/supported-sources/zendesk.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/tutorials/load-personio-duckdb.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/docs/tutorials/load-stripe-postgres.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/main.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/.gitignore +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/adjust/__init__.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/airtable/__init__.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/applovin/__init__.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/appsflyer/_init_.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/appsflyer/client.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/appstore/__init__.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/appstore/client.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/appstore/errors.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/appstore/models.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/appstore/resources.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/arrow/__init__.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/asana_source/__init__.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/asana_source/helpers.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/asana_source/settings.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/blob.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/chess/__init__.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/chess/helpers.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/chess/settings.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/destinations.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/dynamodb/__init__.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/errors.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/facebook_ads/__init__.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/facebook_ads/exceptions.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/facebook_ads/helpers.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/facebook_ads/settings.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/filesystem/__init__.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/filesystem/helpers.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/filesystem/readers.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/filters.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/github/__init__.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/github/helpers.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/github/queries.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/github/settings.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/google_ads/__init__.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/google_ads/field.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/google_ads/metrics.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/google_ads/predicates.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/google_ads/reports.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/google_analytics/__init__.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/google_analytics/helpers.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/google_sheets/README.md +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/google_sheets/__init__.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/google_sheets/helpers/__init__.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/google_sheets/helpers/api_calls.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/google_sheets/helpers/data_processing.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/gorgias/__init__.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/gorgias/helpers.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/hubspot/helpers.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/hubspot/settings.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/kafka/__init__.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/kafka/helpers.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/kinesis/helpers.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/klaviyo/_init_.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/klaviyo/client.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/klaviyo/helpers.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/linkedin_ads/__init__.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/linkedin_ads/dimension_time_enum.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/linkedin_ads/helpers.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/loader.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/mongodb/__init__.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/mongodb/helpers.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/notion/__init__.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/notion/helpers/__init__.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/notion/helpers/client.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/notion/helpers/database.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/notion/settings.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/personio/__init__.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/personio/helpers.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/resource.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/salesforce/__init__.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/salesforce/helpers.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/shopify/__init__.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/shopify/exceptions.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/shopify/helpers.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/shopify/settings.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/slack/__init__.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/slack/helpers.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/slack/settings.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/sql_database/__init__.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/sql_database/callbacks.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/stripe_analytics/__init__.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/stripe_analytics/helpers.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/stripe_analytics/settings.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/table_definition.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/telemetry/event.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/testdata/fakebqcredentials.json +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/tiktok_ads/__init__.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/tiktok_ads/tiktok_helpers.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/time.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/version.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/zendesk/__init__.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/zendesk/helpers/__init__.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/zendesk/helpers/api_helpers.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/zendesk/helpers/credentials.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/zendesk/helpers/talk_api.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/zendesk/settings.py +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/testdata/.gitignore +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/testdata/create_replace.csv +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/testdata/delete_insert_expected.csv +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/testdata/delete_insert_part1.csv +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/testdata/delete_insert_part2.csv +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/testdata/merge_expected.csv +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/testdata/merge_part1.csv +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/ingestr/testdata/merge_part2.csv +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/package-lock.json +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/package.json +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/requirements-dev.txt +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/resources/demo.gif +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/resources/demo.tape +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/resources/ingestr.svg +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/styles/Google/AMPM.yml +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/styles/Google/Acronyms.yml +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/styles/Google/Colons.yml +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/styles/Google/Contractions.yml +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/styles/Google/DateFormat.yml +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/styles/Google/Ellipses.yml +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/styles/Google/EmDash.yml +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/styles/Google/Exclamation.yml +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/styles/Google/FirstPerson.yml +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/styles/Google/Gender.yml +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/styles/Google/GenderBias.yml +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/styles/Google/HeadingPunctuation.yml +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/styles/Google/Headings.yml +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/styles/Google/Latin.yml +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/styles/Google/LyHyphens.yml +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/styles/Google/OptionalPlurals.yml +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/styles/Google/Ordinal.yml +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/styles/Google/OxfordComma.yml +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/styles/Google/Parens.yml +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/styles/Google/Passive.yml +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/styles/Google/Periods.yml +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/styles/Google/Quotes.yml +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/styles/Google/Ranges.yml +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/styles/Google/Semicolons.yml +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/styles/Google/Slang.yml +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/styles/Google/Spacing.yml +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/styles/Google/Spelling.yml +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/styles/Google/Units.yml +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/styles/Google/We.yml +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/styles/Google/Will.yml +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/styles/Google/WordList.yml +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/styles/Google/meta.json +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/styles/Google/vocab.txt +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/styles/bruin/Ingestr.yml +0 -0
- {ingestr-0.13.21 → ingestr-0.13.22}/styles/config/vocabularies/bruin/accept.txt +0 -0
{ingestr-0.13.21 → ingestr-0.13.22}/PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: ingestr
-Version: 0.13.21
+Version: 0.13.22
 Summary: ingestr is a command-line application that ingests data from various sources and stores them in any database.
 Project-URL: Homepage, https://github.com/bruin-data/ingestr
 Project-URL: Issues, https://github.com/bruin-data/ingestr/issues
@@ -16,7 +16,7 @@ Classifier: Topic :: Database
 Requires-Python: >=3.9
 Requires-Dist: aiobotocore==2.21.1
 Requires-Dist: aiohappyeyeballs==2.4.8
-Requires-Dist: aiohttp==3.11.
+Requires-Dist: aiohttp==3.11.15
 Requires-Dist: aioitertools==0.12.0
 Requires-Dist: aiosignal==1.3.2
 Requires-Dist: alembic==1.15.1
@@ -55,8 +55,8 @@ Requires-Dist: facebook-business==20.0.0
 Requires-Dist: filelock==3.17.0
 Requires-Dist: flatten-json==0.1.14
 Requires-Dist: frozenlist==1.5.0
-Requires-Dist: fsspec==
-Requires-Dist: gcsfs==
+Requires-Dist: fsspec==2025.3.2
+Requires-Dist: gcsfs==2025.3.2
 Requires-Dist: gitdb==4.0.12
 Requires-Dist: gitpython==3.1.44
 Requires-Dist: giturlparse==0.12.0
@@ -149,7 +149,7 @@ Requires-Dist: rich-argparse==1.7.0
 Requires-Dist: rich==13.9.4
 Requires-Dist: rsa==4.9
 Requires-Dist: rudder-sdk-python==2.1.4
-Requires-Dist: s3fs==
+Requires-Dist: s3fs==2025.3.2
 Requires-Dist: s3transfer==0.11.3
 Requires-Dist: scramp==1.4.5
 Requires-Dist: semver==3.0.4
{ingestr-0.13.21 → ingestr-0.13.22}/docs/.vitepress/config.mjs

@@ -126,6 +126,7 @@ export default defineConfig({
           { text: "LinkedIn Ads", link: "/supported-sources/linkedin_ads.md" },
           { text: "Notion", link: "/supported-sources/notion.md" },
           { text: "Personio", link: "/supported-sources/personio.md" },
+          { text: "Pipedrive", link: "/supported-sources/pipedrive.md" },
           { text: "S3", link: "/supported-sources/s3.md" },
           { text: "Salesforce", link: "/supported-sources/salesforce.md" },
           { text: "Shopify", link: "/supported-sources/shopify.md" },
ingestr-0.13.22/docs/media/pipedrive.png

Binary file (new)
ingestr-0.13.22/docs/supported-sources/pipedrive.md

@@ -0,0 +1,43 @@
+# Pipedrive
+[Pipedrive](https://www.pipedrive.com/) is a cloud-based sales Customer Relationship Management (CRM) tool designed to help businesses manage leads and deals, track communication, and automate sales processes.
+
+ingestr supports Pipedrive as a source.
+
+## URI format
+
+The URI format for Pipedrive is as follows:
+
+```plaintext
+pipedrive://?api_token=<api_token>
+```
+
+URI parameters:
+- `api_token`: the token used for authentication with the Pipedrive API
+
+## Setting up a Pipedrive Integration
+
+To grab Pipedrive credentials, please follow the guide [here](https://dlthub.com/docs/dlt-ecosystem/verified-sources/pipedrive#grab-api-token).
+
+Once you complete the guide, you should have an `api_token`. Let's say your `api_token` is `token_123`; here's a sample command that will copy the data from Pipedrive into a DuckDB database:
+
+```bash
+ingestr ingest \
+    --source-uri 'pipedrive://?api_token=token_123' \
+    --source-table 'users' \
+    --dest-uri duckdb:///pipedrive.duckdb \
+    --dest-table 'dest.users'
+```
+
+<img alt="pipedrive_img" src="../media/pipedrive.png"/>
+
+The Pipedrive source allows ingesting the following resources into separate tables:
+
+- `activities`: Scheduled events or tasks associated with deals, contacts, or organizations
+- `organizations`: Companies or entities with which you have potential or existing business dealings
+- `products`: Items or services offered for sale that can be associated with deals
+- `deals`: Potential sales or transactions that you can track through various stages
+- `users`: Individuals with unique login credentials who can access and use the platform
+- `persons`: Individual contacts or leads that can be linked to sales deals
+
+
+Use these as the `--source-table` parameter in the `ingestr ingest` command.
ingestr-0.13.22/docs/tutorials/load-kinesis-bigquery.md

@@ -0,0 +1,130 @@
+# Load Data from Amazon Kinesis to Google BigQuery
+
+Welcome! 👋
+This beginner-friendly guide will help you load data from `Amazon Kinesis` into `Google BigQuery` using `ingestr`, a simple yet powerful command-line tool. No prior experience is needed, and best of all, no coding required!
+
+By the end of this guide, you'll have your Kinesis data securely stored in BigQuery. But before we dive in, let's take a quick look at `ingestr`.
+
+## Overview of ingestr
+
+`ingestr` is a command-line tool that simplifies data ingestion by allowing users to load data from a source to a destination using simple command-line flags.
+
+### ingestr Command
+
+```bash
+ingestr ingest \
+    --source-uri '<your-source-uri-here>' \
+    --source-table '<your-schema>.<your-table>' \
+    --dest-uri '<your-destination-uri-here>' \
+    --dest-table '<your-schema>.<your-table>'
+```
+
+- `ingestr ingest`: Executes the data ingestion process.
+- `--source-uri TEXT`: Specifies the URI of the data source.
+- `--dest-uri TEXT`: Specifies the URI of the destination.
+- `--source-table TEXT`: Defines the table to fetch data from.
+- `--dest-table TEXT`: Specifies the destination table. If not provided, it defaults to `--source-table`.
+
+With this command, we connect to the source, retrieve the specified data, and load it into the destination database.
+
+## Let's Load Data from Kinesis to BigQuery Together!
+
+Amazon Kinesis is a cloud-based service for real-time data streaming and analytics that processes large data streams. To analyze this data, you may need to load it into a data warehouse like Google BigQuery. `ingestr` makes this process simple.
+
+### Step 1: Install ingestr
+
+Ensure `ingestr` is installed. If not, follow the installation guide [here](../getting-started/quickstart.md#Installation).
+
+### Step 2: Get AWS Credentials
+Kinesis will be our data source. To access it, you need AWS credentials.
+
+1. Log in to your AWS account.
+2. Navigate to `IAM` (Identity and Access Management).
+3. Create a new IAM user or select an existing one.
+4. Assign the necessary permissions (e.g., `AmazonKinesisReadOnlyAccess`).
+5. Generate and copy the `Access Key ID` and `Secret Access Key`.
+
+For more details, read [here](https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html).
+
+### Step 3: Configure Kinesis as the Source
+
+#### `--source-uri`
+This flag connects to your Kinesis stream. The URI format is:
+
+```bash
+kinesis://?aws_access_key_id=<YOUR_KEY_ID>&aws_secret_access_key=<YOUR_SECRET_KEY>&region_name=<YOUR_REGION>
+```
+
+Required parameters:
+- `aws_access_key_id`: Your AWS access key
+- `aws_secret_access_key`: Your AWS secret key
+- `region_name`: AWS region of your Kinesis stream
+
+#### `--source-table`
+This flag specifies which Kinesis stream to read from:
+
+```bash
+--source-table 'kinesis_stream_name'
+```
+
+### Step 4: Configure BigQuery as the Destination
+
+#### `--dest-uri`
+This flag connects to BigQuery. The URI format is:
+
+```bash
+bigquery://<project-name>?credentials_path=/path/to/service/account.json&location=<location>
+```
+
+Required parameters:
+- `project-name`: Your BigQuery project name
+- `credentials_path`: Path to the service account JSON file
+- `location`: (Optional) Dataset location
+
+#### `--dest-table`
+This flag specifies where to save the data:
+
+```bash
+--dest-table 'dataset.table_name'
+```
+
+### Step 5: Run the ingestr Command
+
+Execute the following command to load data from Kinesis to BigQuery:
+
+```bash
+ingestr ingest \
+    --source-uri 'kinesis://?aws_access_key_id=<YOUR_KEY_ID>&aws_secret_access_key=<YOUR_SECRET_KEY>&region_name=eu-central-1' \
+    --source-table 'kinesis_stream_name' \
+    --dest-uri 'bigquery://project-name?credentials_path=/Users/abc.json' \
+    --dest-table 'dataset.results'
+```
+
+### Step 6: Verify Data in BigQuery
+Once the command runs successfully, your Kinesis data will be available in BigQuery. Follow these steps to verify the data:
+
+1. Open the [BigQuery Console](https://console.cloud.google.com/bigquery) and select your project.
+
+2. In the left-hand side panel:
+   - Expand your project.
+   - Navigate to the appropriate dataset and click on the table name.
+
+3. Select the "Preview" tab to view a sample of the ingested data.
+   - Confirm that rows are present and fields appear as expected.
+
+4. Go to the "Query" tab and run a basic query to inspect your data more closely. For example:
+```sql
+SELECT * FROM `project-name.dataset.results` LIMIT 100;
+```
+
+Ensure that the retrieved data matches what was expected from the Kinesis stream.
+
+### Example Output
+
+After running the ingestion process, your Kinesis data will be available in BigQuery. Here's an example of what the data might look like:
+
+<img alt="kinesis_bigquery" src="../media/kinesis.bigquery.png" />
+
+## 🎉 Congratulations!
+You have successfully loaded data from Amazon Kinesis to BigQuery using `ingestr`.
{ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/adjust/adjust_helpers.py

@@ -82,7 +82,9 @@ class AdjustAPI:
             items = result.get("rows", [])
             yield items
         else:
-            raise HTTPError(
+            raise HTTPError(
+                f"Request failed with status code: {response.status_code}, {response.text}."
+            )
 
     def fetch_events(self):
         headers = {"Authorization": f"Bearer {self.api_key}"}
@@ -93,7 +95,9 @@ class AdjustAPI:
             result = response.json()
             yield result
         else:
-            raise HTTPError(
+            raise HTTPError(
+                f"Request failed with status code: {response.status_code}, {response.text}."
+            )
 
 
 def parse_filters(filters_raw: str) -> dict:
{ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/applovin_max/__init__.py

@@ -105,11 +105,13 @@ def get_data(
     if response.status_code == 404:
         if "No Mediation App Id found for platform" in response.text:
             return None
-        error_message =
+        error_message = (
+            f"AppLovin MAX API error (status {response.status_code}): {response.text}"
+        )
         raise requests.HTTPError(error_message)
-
+
     response_url = response.json().get("ad_revenue_report_url")
     df = pd.read_csv(response_url)
     df["Date"] = pd.to_datetime(df["Date"])
     df["partition_date"] = df["Date"].dt.date
-    return df
+    return df
ingestr-0.13.22/ingestr/src/buildinfo.py

@@ -0,0 +1 @@
+version = "v0.13.22"
{ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/factory.py

@@ -43,6 +43,7 @@ from ingestr.src.sources import (
     MongoDbSource,
     NotionSource,
     PersonioSource,
+    PipedriveSource,
     S3Source,
     SalesforceSource,
     ShopifySource,
@@ -144,6 +145,7 @@ class SourceDestinationFactory:
         "salesforce": SalesforceSource,
         "personio": PersonioSource,
         "kinesis": KinesisSource,
+        "pipedrive": PipedriveSource,
     }
     destinations: Dict[str, Type[DestinationProtocol]] = {
         "bigquery": BigQueryDestination,
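Note: the two added lines are all that is needed to expose the new source through the `pipedrive://` URI scheme, because the factory resolves the scheme of `--source-uri` against its `sources` dictionary. A minimal sketch of that dispatch, with simplified names that are assumptions rather than code copied from `factory.py`:

```python
# Hypothetical sketch of scheme-based dispatch; the real mapping lives in
# ingestr/src/factory.py and covers many more schemes.
from urllib.parse import urlparse


class PipedriveSource:  # stand-in for ingestr.src.sources.PipedriveSource
    pass


sources = {"pipedrive": PipedriveSource}


def resolve_source(uri: str):
    scheme = urlparse(uri).scheme  # "pipedrive://?api_token=..." -> "pipedrive"
    try:
        return sources[scheme]()
    except KeyError:
        raise ValueError(f"unsupported source scheme: {scheme}")


print(resolve_source("pipedrive://?api_token=token_123"))
```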
{ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/kinesis/__init__.py

@@ -16,7 +16,7 @@ from .helpers import get_shard_iterator, max_sequence_by_shard
     name=lambda args: args["stream_name"],
     primary_key="kinesis_msg_id",
     standalone=True,
-    max_table_nesting=0
+    max_table_nesting=0,
 )
 def kinesis_stream(
     stream_name: str,
@@ -75,7 +75,6 @@ def kinesis_stream(
 
     # get next shard to fetch messages from
     while shard_id := shard_ids.pop(0) if shard_ids else None:
-
         shard_iterator, _ = get_shard_iterator(
             kinesis_client,
             stream_name,
@@ -83,14 +82,14 @@ def kinesis_stream(
             last_msg,  # type: ignore
             initial_at_datetime,  # type: ignore
         )
-
+
         while shard_iterator:
             records = []
             records_response = kinesis_client.get_records(
                 ShardIterator=shard_iterator,
                 Limit=chunk_size,  # The size of data can be up to 1 MB, it must be controlled by the user
             )
-
+
             for record in records_response["Records"]:
                 sequence_number = record["SequenceNumber"]
                 content = record["Data"]
{ingestr-0.13.21 → ingestr-0.13.22}/ingestr/src/partition.py

@@ -13,7 +13,6 @@ def apply_athena_hints(
     additional_hints: Dict[str, TColumnSchema] = {},
 ) -> None:
     def _apply_partition_hint(resource: DltResource) -> None:
-
         columns = resource.columns if resource.columns else {}
 
         partition_hint = (
@@ -24,7 +23,8 @@ def apply_athena_hints(
         athena_adapter(
             resource,
             athena_partition.day(partition_column)
-            if partition_hint
+            if partition_hint
+            and partition_hint.get("data_type") in ("timestamp", "date")
             else partition_column,
         )
 
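Note: the added condition narrows when the Athena day-partitioning hint applies; `athena_partition.day(...)` only makes sense for temporal columns, so columns without a `timestamp` or `date` data type now fall back to plain column-name partitioning. A small illustrative sketch of the resulting decision, assuming a simplified hint dictionary:

```python
# Illustrative only: mirrors the condition added in partition.py above.
from typing import Any, Dict, Optional


def partition_spec(partition_column: str, hint: Optional[Dict[str, Any]]) -> str:
    # day() bucketing requires a temporal column type
    if hint and hint.get("data_type") in ("timestamp", "date"):
        return f"day({partition_column})"  # stands in for athena_partition.day(...)
    return partition_column


assert partition_spec("created_at", {"data_type": "timestamp"}) == "day(created_at)"
assert partition_spec("country", {"data_type": "text"}) == "country"
```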
ingestr-0.13.22/ingestr/src/pipedrive/__init__.py

@@ -0,0 +1,198 @@
+"""Highly customizable source for Pipedrive, supports endpoint addition, selection and column rename
+
+Pipedrive api docs: https://developers.pipedrive.com/docs/api/v1
+
+Pipedrive changes or deprecates fields and endpoints without versioning the api.
+If something breaks, it's a good idea to check the changelog.
+Api changelog: https://developers.pipedrive.com/changelog
+
+To get an api key: https://pipedrive.readme.io/docs/how-to-find-the-api-token
+"""
+
+from typing import Any, Dict, Iterator, List, Optional, Union  # noqa: F401
+
+import dlt
+from dlt.common import pendulum
+from dlt.common.time import ensure_pendulum_datetime
+from dlt.sources import DltResource, TDataItems
+
+from .helpers import group_deal_flows
+from .helpers.custom_fields_munger import rename_fields, update_fields_mapping
+from .helpers.pages import get_pages, get_recent_items_incremental
+from .settings import ENTITY_MAPPINGS, RECENTS_ENTITIES
+from .typing import TDataPage
+
+
+@dlt.source(name="pipedrive", max_table_nesting=0)
+def pipedrive_source(
+    pipedrive_api_key: str = dlt.secrets.value,
+    since_timestamp: Optional[Union[pendulum.DateTime, str]] = "1970-01-01 00:00:00",
+) -> Iterator[DltResource]:
+    """
+    Get data from the Pipedrive API. Supports incremental loading and custom fields mapping.
+
+    Args:
+        pipedrive_api_key: https://pipedrive.readme.io/docs/how-to-find-the-api-token
+        since_timestamp: Starting timestamp for incremental loading. By default complete history is loaded on first run.
+        incremental: Enable or disable incremental loading.
+
+    Returns resources:
+        custom_fields_mapping
+        activities
+        activityTypes
+        deals
+        deals_flow
+        deals_participants
+        files
+        filters
+        notes
+        persons
+        organizations
+        pipelines
+        products
+        stages
+        users
+        leads
+
+    For custom fields rename the `custom_fields_mapping` resource must be selected or loaded before other resources.
+
+    Resources that depend on another resource are implemented as transformers
+    so they can re-use the original resource data without re-downloading.
+    Examples: deals_participants, deals_flow
+    """
+
+    # yield nice rename mapping
+    yield create_state(pipedrive_api_key) | parsed_mapping
+
+    # parse timestamp and build kwargs
+    since_timestamp = ensure_pendulum_datetime(since_timestamp).strftime(
+        "%Y-%m-%d %H:%M:%S"
+    )
+    resource_kwargs: Any = (
+        {"since_timestamp": since_timestamp} if since_timestamp else {}
+    )
+
+    # create resources for all endpoints
+    endpoints_resources = {}
+    for entity, resource_name in RECENTS_ENTITIES.items():
+        endpoints_resources[resource_name] = dlt.resource(
+            get_recent_items_incremental,
+            name=resource_name,
+            primary_key="id",
+            write_disposition="merge",
+        )(entity, pipedrive_api_key, **resource_kwargs)
+
+    yield from endpoints_resources.values()
+
+    # create transformers for deals to participants and flows
+    yield endpoints_resources["deals"] | dlt.transformer(
+        name="deals_participants", write_disposition="merge", primary_key="id"
+    )(_get_deals_participants)(pipedrive_api_key)
+
+    yield endpoints_resources["deals"] | dlt.transformer(
+        name="deals_flow", write_disposition="merge", primary_key="id"
+    )(_get_deals_flow)(pipedrive_api_key)
+
+    yield leads(pipedrive_api_key, update_time=since_timestamp)
+
+
+def _get_deals_flow(
+    deals_page: TDataPage, pipedrive_api_key: str
+) -> Iterator[TDataItems]:
+    custom_fields_mapping = dlt.current.source_state().get("custom_fields_mapping", {})
+    for row in deals_page:
+        url = f"deals/{row['id']}/flow"
+        pages = get_pages(url, pipedrive_api_key)
+        for entity, page in group_deal_flows(pages):
+            yield dlt.mark.with_table_name(
+                rename_fields(page, custom_fields_mapping.get(entity, {})),
+                "deals_flow_" + entity,
+            )
+
+
+def _get_deals_participants(
+    deals_page: TDataPage, pipedrive_api_key: str
+) -> Iterator[TDataPage]:
+    for row in deals_page:
+        url = f"deals/{row['id']}/participants"
+        yield from get_pages(url, pipedrive_api_key)
+
+
+@dlt.resource(selected=False)
+def create_state(pipedrive_api_key: str) -> Iterator[Dict[str, Any]]:
+    def _get_pages_for_rename(
+        entity: str, fields_entity: str, pipedrive_api_key: str
+    ) -> Dict[str, Any]:
+        existing_fields_mapping: Dict[str, Dict[str, str]] = (
+            custom_fields_mapping.setdefault(entity, {})
+        )
+        # we need to process all pages before yielding
+        for page in get_pages(fields_entity, pipedrive_api_key):
+            existing_fields_mapping = update_fields_mapping(
+                page, existing_fields_mapping
+            )
+        return existing_fields_mapping
+
+    # gets all *Fields data and stores in state
+    custom_fields_mapping = dlt.current.source_state().setdefault(
+        "custom_fields_mapping", {}
+    )
+    for entity, fields_entity, _ in ENTITY_MAPPINGS:
+        if fields_entity is None:
+            continue
+        custom_fields_mapping[entity] = _get_pages_for_rename(
+            entity, fields_entity, pipedrive_api_key
+        )
+
+    yield custom_fields_mapping
+
+
+@dlt.transformer(
+    name="custom_fields_mapping",
+    write_disposition="replace",
+    columns={"options": {"data_type": "json"}},
+)
+def parsed_mapping(
+    custom_fields_mapping: Dict[str, Any],
+) -> Optional[Iterator[List[Dict[str, str]]]]:
+    """
+    Parses and yields custom fields' mapping in order to be stored in the destination by dlt
+    """
+    for endpoint, data_item_mapping in custom_fields_mapping.items():
+        yield [
+            {
+                "endpoint": endpoint,
+                "hash_string": hash_string,
+                "name": names["name"],
+                "normalized_name": names["normalized_name"],
+                "options": names["options"],
+                "field_type": names["field_type"],
+            }
+            for hash_string, names in data_item_mapping.items()
+        ]
+
+
+@dlt.resource(primary_key="id", write_disposition="merge")
+def leads(
+    pipedrive_api_key: str = dlt.secrets.value,
+    update_time: dlt.sources.incremental[str] = dlt.sources.incremental(
+        "update_time", "1970-01-01 00:00:00"
+    ),
+) -> Iterator[TDataPage]:
+    """Resource to incrementally load pipedrive leads by update_time"""
+    # Leads inherit custom fields from deals
+    fields_mapping = (
+        dlt.current.source_state().get("custom_fields_mapping", {}).get("deals", {})
+    )
+    # Load leads pages sorted from newest to oldest and stop loading when
+    # last incremental value is reached
+    pages = get_pages(
+        "leads",
+        pipedrive_api_key,
+        extra_params={"sort": "update_time DESC"},
+    )
+    for page in pages:
+        yield rename_fields(page, fields_mapping)
+
+        if update_time.start_out_of_range:
+            return
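Note: since `pipedrive_source` is a regular `dlt` source, it can also be run outside ingestr. A minimal sketch, assuming ingestr is installed, DuckDB is used as the destination, and `token_123` is a placeholder API token; ingestr performs the equivalent wiring when given a `pipedrive://` source URI:

```python
import dlt

from ingestr.src.pipedrive import pipedrive_source

# Placeholder token; see the pipedrive.md docs above for how to obtain one.
source = pipedrive_source(pipedrive_api_key="token_123")

pipeline = dlt.pipeline(
    pipeline_name="pipedrive_demo",
    destination="duckdb",
    dataset_name="pipedrive_data",
)
# The source yields its custom_fields_mapping resource first, so custom-field
# renames are applied to deals, persons, etc. within the same run.
print(pipeline.run(source))
```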
ingestr-0.13.22/ingestr/src/pipedrive/helpers/__init__.py

@@ -0,0 +1,23 @@
+"""Pipedrive source helpers"""
+
+from itertools import groupby
+from typing import Any, Dict, Iterable, List, Tuple, cast  # noqa: F401
+
+from dlt.common import pendulum  # noqa: F401
+
+
+def _deals_flow_group_key(item: Dict[str, Any]) -> str:
+    return item["object"]  # type: ignore[no-any-return]
+
+
+def group_deal_flows(
+    pages: Iterable[Iterable[Dict[str, Any]]],
+) -> Iterable[Tuple[str, List[Dict[str, Any]]]]:
+    for page in pages:
+        for entity, items in groupby(
+            sorted(page, key=_deals_flow_group_key), key=_deals_flow_group_key
+        ):
+            yield (
+                entity,
+                [dict(item["data"], timestamp=item["timestamp"]) for item in items],
+            )
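Note: `group_deal_flows` regroups each page of `deals/{id}/flow` items by their `object` type and flattens the nested `data` payload, which is what lets the `deals_flow` transformer in `__init__.py` above route each group into its own `deals_flow_<entity>` table. A quick illustration on fabricated input:

```python
from ingestr.src.pipedrive.helpers import group_deal_flows

# Fabricated flow page; real items come from the Pipedrive API.
pages = [[
    {"object": "dealChange", "timestamp": "2024-01-01 10:00:00", "data": {"id": 1}},
    {"object": "note", "timestamp": "2024-01-01 11:00:00", "data": {"id": 2}},
    {"object": "dealChange", "timestamp": "2024-01-02 09:00:00", "data": {"id": 3}},
]]
for entity, items in group_deal_flows(pages):
    print(entity, items)
# dealChange [{'id': 1, 'timestamp': '2024-01-01 10:00:00'},
#             {'id': 3, 'timestamp': '2024-01-02 09:00:00'}]
# note       [{'id': 2, 'timestamp': '2024-01-01 11:00:00'}]
```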
ingestr-0.13.22/ingestr/src/pipedrive/helpers/custom_fields_munger.py

@@ -0,0 +1,102 @@
+from typing import Any, Dict, Optional, TypedDict
+
+import dlt
+
+from ..typing import TDataPage
+
+
+class TFieldMapping(TypedDict):
+    name: str
+    normalized_name: str
+    options: Optional[Dict[str, str]]
+    field_type: str
+
+
+def update_fields_mapping(
+    new_fields_mapping: TDataPage, existing_fields_mapping: Dict[str, Any]
+) -> Dict[str, Any]:
+    """
+    Specific function to perform data munging and push changes to custom fields' mapping stored in dlt's state
+    The endpoint must be an entity fields' endpoint
+    """
+    for data_item in new_fields_mapping:
+        # 'edit_flag' field contains a boolean value, which is set to 'True' for custom fields and 'False' otherwise.
+        if data_item.get("edit_flag"):
+            # Regarding custom fields, 'key' field contains pipedrive's hash string representation of its name
+            # We assume that pipedrive's hash strings are meant to be an unambiguous representation of custom fields' names, so dlt's state shouldn't be updated while those values
+            # remain unchanged
+            existing_fields_mapping = _update_field(data_item, existing_fields_mapping)
+        # Built in enum and set fields are mapped if their options have int ids
+        # Enum fields with bool and string key options are left intact
+        elif data_item.get("field_type") in {"set", "enum"}:
+            options = data_item.get("options", [])
+            first_option = options[0]["id"] if len(options) >= 1 else None
+            if isinstance(first_option, int) and not isinstance(first_option, bool):
+                existing_fields_mapping = _update_field(
+                    data_item, existing_fields_mapping
+                )
+    return existing_fields_mapping
+
+
+def _update_field(
+    data_item: Dict[str, Any],
+    existing_fields_mapping: Optional[Dict[str, TFieldMapping]],
+) -> Dict[str, TFieldMapping]:
+    """Create or update the given field's info in the custom fields state
+    If the field hash already exists in the state from previous runs the name is not updated.
+    New enum options (if any) are appended to the state.
+    """
+    existing_fields_mapping = existing_fields_mapping or {}
+    key = data_item["key"]
+    options = data_item.get("options", [])
+    new_options_map = {str(o["id"]): o["label"] for o in options}
+    existing_field = existing_fields_mapping.get(key)
+    if not existing_field:
+        existing_fields_mapping[key] = dict(
+            name=data_item["name"],
+            normalized_name=_normalized_name(data_item["name"]),
+            options=new_options_map,
+            field_type=data_item["field_type"],
+        )
+        return existing_fields_mapping
+    existing_options = existing_field.get("options", {})
+    if not existing_options or existing_options == new_options_map:
+        existing_field["options"] = new_options_map
+        existing_field["field_type"] = data_item[
+            "field_type"
+        ]  # Add for backwards compat
+        return existing_fields_mapping
+    # Add new enum options to the existing options array
+    # so that when option is renamed the original label remains valid
+    new_option_keys = set(new_options_map) - set(existing_options)
+    for key in new_option_keys:
+        existing_options[key] = new_options_map[key]
+    existing_field["options"] = existing_options
+    return existing_fields_mapping
+
+
+def _normalized_name(name: str) -> str:
+    source_schema = dlt.current.source_schema()
+    normalized_name = name.strip()  # remove leading and trailing spaces
+    return source_schema.naming.normalize_identifier(normalized_name)
+
+
+def rename_fields(data: TDataPage, fields_mapping: Dict[str, Any]) -> TDataPage:
+    if not fields_mapping:
+        return data
+    for data_item in data:
+        for hash_string, field in fields_mapping.items():
+            if hash_string not in data_item:
+                continue
+            field_value = data_item.pop(hash_string)
+            field_name = field["name"]
+            options_map = field["options"]
+            # Get label instead of ID for 'enum' and 'set' fields
+            if field_value and field["field_type"] == "set":  # Multiple choice
+                field_value = [
+                    options_map.get(str(enum_id), enum_id) for enum_id in field_value
+                ]
+            elif field_value and field["field_type"] == "enum":
+                field_value = options_map.get(str(field_value), field_value)
+            data_item[field_name] = field_value
+    return data
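Note: to see `rename_fields` end to end, consider a stored mapping for a single custom field: the hash key on each row is replaced by the field's display name, and enum IDs are swapped for their labels. The data below is fabricated purely to illustrate the behavior:

```python
from ingestr.src.pipedrive.helpers.custom_fields_munger import rename_fields

fields_mapping = {
    # "abc123" stands in for Pipedrive's real hash-string key
    "abc123": {
        "name": "Deal Priority",
        "normalized_name": "deal_priority",
        "options": {"1": "High", "2": "Low"},
        "field_type": "enum",
    }
}
page = [{"id": 42, "abc123": 1}, {"id": 43, "abc123": 2}]
print(rename_fields(page, fields_mapping))
# [{'id': 42, 'Deal Priority': 'High'}, {'id': 43, 'Deal Priority': 'Low'}]
```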