undatum 1.0.14__tar.gz → 1.0.15__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release: this version of undatum may be problematic.
- {undatum-1.0.14/undatum.egg-info → undatum-1.0.15}/PKG-INFO +55 -7
- {undatum-1.0.14 → undatum-1.0.15}/README.rst +19 -1
- {undatum-1.0.14 → undatum-1.0.15}/setup.py +7 -1
- undatum-1.0.15/tests/test.py +12 -0
- {undatum-1.0.14 → undatum-1.0.15}/undatum/__init__.py +2 -1
- {undatum-1.0.14 → undatum-1.0.15}/undatum/__main__.py +4 -3
- undatum-1.0.15/undatum/ai/perplexity.py +78 -0
- undatum-1.0.15/undatum/cmds/analyzer.py +472 -0
- {undatum-1.0.14 → undatum-1.0.15}/undatum/cmds/converter.py +194 -21
- undatum-1.0.15/undatum/cmds/ingester.py +110 -0
- {undatum-1.0.14 → undatum-1.0.15}/undatum/cmds/query.py +3 -7
- undatum-1.0.15/undatum/cmds/schemer.py +274 -0
- {undatum-1.0.14 → undatum-1.0.15}/undatum/cmds/selector.py +63 -36
- {undatum-1.0.14 → undatum-1.0.15}/undatum/cmds/statistics.py +31 -17
- {undatum-1.0.14 → undatum-1.0.15}/undatum/cmds/textproc.py +20 -8
- undatum-1.0.15/undatum/cmds/transformer.py +78 -0
- {undatum-1.0.14 → undatum-1.0.15}/undatum/cmds/validator.py +3 -8
- undatum-1.0.15/undatum/common/__init__.py +1 -0
- {undatum-1.0.14 → undatum-1.0.15}/undatum/common/functions.py +2 -1
- {undatum-1.0.14 → undatum-1.0.15}/undatum/common/iterable.py +3 -3
- {undatum-1.0.14 → undatum-1.0.15}/undatum/common/scheme.py +1 -1
- {undatum-1.0.14 → undatum-1.0.15}/undatum/constants.py +3 -2
- undatum-1.0.15/undatum/core.py +350 -0
- undatum-1.0.15/undatum/formats/__init__.py +0 -0
- undatum-1.0.15/undatum/formats/docx.py +159 -0
- {undatum-1.0.14 → undatum-1.0.15}/undatum/utils.py +2 -34
- {undatum-1.0.14 → undatum-1.0.15}/undatum/validate/__init__.py +1 -0
- {undatum-1.0.14 → undatum-1.0.15}/undatum/validate/commonrules.py +2 -1
- {undatum-1.0.14 → undatum-1.0.15}/undatum/validate/ruscodes.py +1 -0
- {undatum-1.0.14 → undatum-1.0.15/undatum.egg-info}/PKG-INFO +55 -7
- {undatum-1.0.14 → undatum-1.0.15}/undatum.egg-info/SOURCES.txt +6 -0
- {undatum-1.0.14 → undatum-1.0.15}/undatum.egg-info/requires.txt +5 -0
- undatum-1.0.14/undatum/cmds/analyzer.py +0 -284
- undatum-1.0.14/undatum/cmds/schemer.py +0 -46
- undatum-1.0.14/undatum/cmds/transformer.py +0 -94
- undatum-1.0.14/undatum/core.py +0 -444
- {undatum-1.0.14 → undatum-1.0.15}/AUTHORS.rst +0 -0
- {undatum-1.0.14 → undatum-1.0.15}/LICENSE +0 -0
- {undatum-1.0.14 → undatum-1.0.15}/setup.cfg +0 -0
- {undatum-1.0.14/undatum/cmds → undatum-1.0.15/undatum/ai}/__init__.py +0 -0
- {undatum-1.0.14/undatum/common → undatum-1.0.15/undatum/cmds}/__init__.py +0 -0
- {undatum-1.0.14 → undatum-1.0.15}/undatum.egg-info/dependency_links.txt +0 -0
- {undatum-1.0.14 → undatum-1.0.15}/undatum.egg-info/entry_points.txt +0 -0
- {undatum-1.0.14 → undatum-1.0.15}/undatum.egg-info/not-zip-safe +0 -0
- {undatum-1.0.14 → undatum-1.0.15}/undatum.egg-info/top_level.txt +0 -0
PKG-INFO
@@ -1,6 +1,6 @@
-Metadata-Version: 2.
+Metadata-Version: 2.4
 Name: undatum
-Version: 1.0.14
+Version: 1.0.15
 Summary: undatum: a command-line tool for data processing. Brings CSV simplicity to JSON lines and BSON
 Home-page: https://github.com/datacoon/undatum/
 Download-URL: https://github.com/datacoon/undatum/
@@ -8,7 +8,6 @@ Author: Ivan Begtin
 Author-email: ivan@begtin.tech
 License: MIT
 Keywords: json jsonl csv bson cli dataset
-Platform: UNKNOWN
 Classifier: Development Status :: 5 - Production/Stable
 Classifier: Programming Language :: Python
 Classifier: Programming Language :: Python :: 3 :: Only
@@ -24,9 +23,42 @@ Classifier: Topic :: Text Processing
 Classifier: Topic :: Utilities
 Requires-Python: >=3.8
 Description-Content-Type: text/x-rst
-Provides-Extra: python_version == "3.8" or python_version == "3.8"
 License-File: LICENSE
 License-File: AUTHORS.rst
+Requires-Dist: chardet>=3.0.4
+Requires-Dist: click>=8.0.3
+Requires-Dist: dictquery>=0.4.0
+Requires-Dist: jsonlines>=1.2.0
+Requires-Dist: openpyxl>=3.0.5
+Requires-Dist: orjson>=3.6.6
+Requires-Dist: pandas>=1.1.3
+Requires-Dist: pymongo>=3.11.0
+Requires-Dist: qddate>=0.1.1
+Requires-Dist: tabulate>=0.8.7
+Requires-Dist: validators>=0.18.1
+Requires-Dist: xlrd>=1.2.0
+Requires-Dist: xmltodict
+Requires-Dist: rich
+Requires-Dist: duckdb
+Requires-Dist: pyzstd
+Requires-Dist: pydantic
+Requires-Dist: typer
+Provides-Extra: python-version-3-8-or-python-version-3-8
+Requires-Dist: argparse>=1.2.1; extra == "python-version-3-8-or-python-version-3-8"
+Dynamic: author
+Dynamic: author-email
+Dynamic: classifier
+Dynamic: description
+Dynamic: description-content-type
+Dynamic: download-url
+Dynamic: home-page
+Dynamic: keywords
+Dynamic: license
+Dynamic: license-file
+Dynamic: provides-extra
+Dynamic: requires-dist
+Dynamic: requires-python
+Dynamic: summary
 
 ==================================================
 undatum -- a command-line tool for data processing
@@ -52,7 +84,7 @@ Main features
 * Common data operations against CSV, JSON lines and BSON files
 * Built-in data filtering
 * Support data compressed with ZIP, XZ, GZ, BZ2
-* Conversion between CSV, JSONl, BSON, XML, XLS, XLSX, Parquet file types
+* Conversion between CSV, JSONl, BSON, XML, XLS, XLSX, Parquet, AVRO and ORC file types
 * Low memory footprint
 * Support for compressed datasets
 * Advanced statistics calculations
@@ -278,6 +310,24 @@ Converts CSV file feddomains.csv to Parquet file feddomains.parquet
 $ undatum convert examples/feddomains.csv examples/feddomains.parquet
 
 
+*Data formats conversion table map*
+
+============ ====== ============ ======= ======= ====== ======= ====== ========== ====== =======
+From / To    CSV    JSONlines    BSON    JSON    XLS    XLSX    XML    Parquet    ORC    AVRO
+============ ====== ============ ======= ======= ====== ======= ====== ========== ====== =======
+CSV          -      Yes          Yes     No      No     No      No     Yes        Yes    Yes
+JSONlines    Yes    -            No      No      No     No      No     Yes        Yes    No
+BSON         No     Yes          -       No      No     No      No     No         No     No
+JSON         No     Yes          No      -       No     No      No     No         No     No
+XLS          No     Yes          Yes     No      -      No      No     No         No     No
+XLSX         No     Yes          Yes     No      No     -       No     No         No     No
+XML          No     Yes          No      No      No     No      -      No         No     No
+Parquet      No     No           No      No      No     No      No     -          No     No
+ORC          No     No           No      No      No     No      No     No         -      No
+AVRO         No     No           No      No      No     No      No     No         No     -
+============ ====== ============ ======= ======= ====== ======= ====== ========== ====== =======
+
+
 Validate command
 ----------------
 
@@ -483,5 +533,3 @@ JSONl
 -----
 
 JSON lines is a replacement to CSV and JSON files, with JSON flexibility and ability to process data line by line, without loading everything into memory.
-
-
README.rst
@@ -22,7 +22,7 @@ Main features
 * Common data operations against CSV, JSON lines and BSON files
 * Built-in data filtering
 * Support data compressed with ZIP, XZ, GZ, BZ2
-* Conversion between CSV, JSONl, BSON, XML, XLS, XLSX, Parquet file types
+* Conversion between CSV, JSONl, BSON, XML, XLS, XLSX, Parquet, AVRO and ORC file types
 * Low memory footprint
 * Support for compressed datasets
 * Advanced statistics calculations
@@ -248,6 +248,24 @@ Converts CSV file feddomains.csv to Parquet file feddomains.parquet
 $ undatum convert examples/feddomains.csv examples/feddomains.parquet
 
 
+*Data formats conversion table map*
+
+============ ====== ============ ======= ======= ====== ======= ====== ========== ====== =======
+From / To    CSV    JSONlines    BSON    JSON    XLS    XLSX    XML    Parquet    ORC    AVRO
+============ ====== ============ ======= ======= ====== ======= ====== ========== ====== =======
+CSV          -      Yes          Yes     No      No     No      No     Yes        Yes    Yes
+JSONlines    Yes    -            No      No      No     No      No     Yes        Yes    No
+BSON         No     Yes          -       No      No     No      No     No         No     No
+JSON         No     Yes          No      -       No     No      No     No         No     No
+XLS          No     Yes          Yes     No      -      No      No     No         No     No
+XLSX         No     Yes          Yes     No      No     -       No     No         No     No
+XML          No     Yes          No      No      No     No      -      No         No     No
+Parquet      No     No           No      No      No     No      No     -          No     No
+ORC          No     No           No      No      No     No      No     No         -      No
+AVRO         No     No           No      No      No     No      No     No         No     -
+============ ====== ============ ======= ======= ====== ======= ====== ========== ====== =======
+
+
 Validate command
 ----------------
 
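Per the new table, CSV converts directly to JSON lines, BSON, Parquet, ORC and AVRO, while Parquet, ORC and AVRO are write-only targets with no conversions back out. As a hedged illustration of chaining two supported conversions (assuming, as in the README's feddomains example, that the output format is inferred from the target file extension):

    $ undatum convert examples/feddomains.csv examples/feddomains.jsonl
    $ undatum convert examples/feddomains.jsonl examples/feddomains.parquet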
setup.py
@@ -1,3 +1,4 @@
+# -*- coding: utf8 -*-
 # This is purely the result of trial and error.
 
 import sys
@@ -46,7 +47,12 @@ install_requires = [
     'tabulate>=0.8.7',
     'validators>=0.18.1',
     'xlrd>=1.2.0',
-    'xmltodict'
+    'xmltodict',
+    'rich',
+    'duckdb',
+    'pyzstd',
+    'pydantic',
+    'typer'
 ]
 
 
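Since the published metadata above (Requires-Dist) mirrors this install_requires list, upgrading to the new release should pull in rich, duckdb, pyzstd, pydantic and typer automatically; assuming installation from PyPI, something like:

    $ pip install --upgrade undatum==1.0.15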
tests/test.py
@@ -0,0 +1,12 @@
+import pandas as pd
+from undatum.cmds.analyzer import duckdb_decompose
+
+DATA = [
+    {"foo": 1, "bar": "some string", "baz": 1.23},
+    {"foo": 2, "bar": "some other string", "baz": 2.34},
+    {"foo": 3, "bar": "yet another string", "baz": 3.45},
+]
+
+df = pd.DataFrame(DATA)
+
+print(duckdb_decompose(frame=df))
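The new test exercises duckdb_decompose from undatum.cmds.analyzer against a small pandas DataFrame. No test-runner configuration appears in the diff, so presumably it is meant to be run directly once the package and its pandas and (newly added) duckdb dependencies are installed:

    $ python tests/test.py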
undatum/__main__.py
@@ -1,3 +1,4 @@
+# -*- coding: utf8 -*-
 #!/usr/bin/env python
 """The main entry point. Invoke as `undatum' or `python -m undatum`.
 
@@ -7,12 +8,12 @@ import sys
 
 def main():
     try:
-        from .core import
-
+        from .core import app
+        app()
     except KeyboardInterrupt:
         print("Ctrl-C pressed. Aborting")
         sys.exit(0)
 
 
 if __name__ == '__main__':
-
+    app()
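The entry point now imports app from undatum.core and calls it, which is the usual shape of a Typer application (typer is newly added to install_requires). A minimal, hypothetical sketch of that pattern, not the actual contents of undatum/core.py:

    # Hypothetical sketch of a Typer-based CLI entry point, mirroring the
    # `from .core import app; app()` pattern introduced in this release.
    # The command and its arguments are illustrative only.
    import typer

    app = typer.Typer()

    @app.command()
    def convert(input_file: str, output_file: str):
        """Convert INPUT_FILE to OUTPUT_FILE, format inferred from extension."""
        typer.echo(f"Converting {input_file} -> {output_file}")

    if __name__ == "__main__":
        app()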
undatum/ai/perplexity.py
@@ -0,0 +1,78 @@
+import requests
+import csv
+import sys
+import os
+from io import StringIO
+
+PERPLEXITY_API_KEY = os.getenv('PERPLEXITY_API_KEY', )
+
+
+def find_between( s, first, last ):
+    try:
+        start = s.index( first ) + len( first )
+        end = s.index( last, start )
+        return s[start:end]
+    except ValueError:
+        return ""
+
+def get_fields_info(fields, language='English'):
+    """Returns information about data fields"""
+    url = "https://api.perplexity.ai/chat/completions"
+    headers = {"Authorization": f"Bearer {PERPLEXITY_API_KEY}"}
+    payload = {
+        "model": "sonar",
+        "messages": [
+            {"role": "system", "content": "Be precise and concise, provide data output only CSV or JSON, accrording to request"},
+            {"role": "user", "content": (
+                f"Please describe in {language} these fields delimited by comma: {fields}"
+                "Please output as single csv table only with following fields: name and description"
+            )},
+        ],
+        "response_format": {
+            "type": "text",
+        },
+    }
+    response = requests.post(url, headers=headers, json=payload).json()
+    text = response["choices"][0]["message"]["content"]
+    a_text = find_between(text, "```csv", "```").strip()
+    if len(a_text) == 0:
+        a_text = find_between(text, "```", "```").strip()
+    f = StringIO()
+    f.write(a_text)
+    f.seek(0)
+    table = {}
+    dr = csv.reader(f, delimiter=',')
+    n = 0
+    for r in dr:
+        n += 1
+        if n == 1: continue
+        table[r[0]] = r[1]
+    return table
+
+
+
+def get_description(data, language='English'):
+    url = "https://api.perplexity.ai/chat/completions"
+    headers = {"Authorization": f"Bearer {PERPLEXITY_API_KEY}"}
+    payload = {
+        "model": "sonar",
+        "messages": [
+            {"role": "system", "content": "Be precise and concise, provide data output only CSV or JSON, accrording to request"},
+            {"role": "user", "content": (
+                f"""
+I have the following CSV data:
+{data}
+Please provide short description in {language} about this data in English. Consider this data as sample of the bigger dataset.Don't generate any code and data examples""")},
+        ],
+        "response_format": {
+            "type": "text",
+        },
+    }
+    response = requests.post(url, headers=headers, json=payload).json()
+    return response["choices"][0]["message"]["content"]
+
+
+
+
+if __name__ == "__main__":
+    print(get_fields_info(sys.argv[1], sys.argv[2]))
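The new module reads PERPLEXITY_API_KEY from the environment at import time and, when executed as a script, passes a comma-separated field list and a language name straight to get_fields_info. A minimal invocation sketch (the key value and field names are placeholders, and the module path assumes the package is installed):

    $ export PERPLEXITY_API_KEY=pplx-xxxxxxxx
    $ python -m undatum.ai.perplexity "id,name,region,population" English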