PyPI - xmlpydict - Versions diffs - 0.0.8__tar.gz → 0.0.12__tar.gz - Mend

xmlpydict 0.0.8.tar.gz → 0.0.12.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (20)

{xmlpydict-0.0.8/src/xmlpydict.egg-info → xmlpydict-0.0.12}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
-Metadata-Version: 2.1
+Metadata-Version: 2.4
 Name: xmlpydict
-Version: 0.0.8
+Version: 0.0.12
 Summary: xml to dictionary tool for python
 Author-email: Matthew Taylor <matthew.taylor.andre@gmail.com>
 Project-URL: Homepage, https://github.com/MatthewAndreTaylor/xml-to-pydict
@@ -10,29 +10,30 @@ Classifier: Intended Audience :: Developers
 Classifier: License :: OSI Approved :: MIT License
 Classifier: Programming Language :: Python :: 3
 Classifier: Programming Language :: Python :: 3 :: Only
-Classifier: Programming Language :: Python :: 3.7
 Classifier: Programming Language :: Python :: 3.8
 Classifier: Programming Language :: Python :: 3.9
 Classifier: Programming Language :: Python :: 3.10
 Classifier: Programming Language :: Python :: 3.11
 Classifier: Programming Language :: Python :: Implementation :: CPython
 Classifier: Topic :: Software Development :: Libraries :: Python Modules
-Requires-Python: >=3.7
+Classifier: Topic :: Text Processing :: Markup :: XML
+Requires-Python: >=3.8
 Description-Content-Type: text/markdown
 License-File: LICENSE
 Provides-Extra: tests
 Requires-Dist: pytest; extra == "tests"
-Requires-Dist: xmltodict; extra == "tests"
+Requires-Dist: requests; extra == "tests"
+Dynamic: license-file
 # xmlpydict 📑
 [![XML Tests](https://github.com/MatthewAndreTaylor/xml-to-pydict/actions/workflows/tests.yml/badge.svg)](https://github.com/MatthewAndreTaylor/xml-to-pydict/actions/workflows/tests.yml)
-[![PyPI versions](https://img.shields.io/badge/python-3.7%2B-blue)](https://github.com/MatthewAndreTaylor/xml-to-pydict)
+[![PyPI versions](https://img.shields.io/badge/python-3.8%2B-blue)](https://github.com/MatthewAndreTaylor/xml-to-pydict)
 [![PyPI](https://img.shields.io/pypi/v/xmlpydict.svg)](https://pypi.org/project/xmlpydict/)
 ## Requirements
-- `python 3.7+`
+- `python 3.8+`
 ## Installation
@@ -54,13 +55,11 @@ pip install xmlpydict
 ## Goals
-Create a consistent parsing strategy between xml and python dictionaries.
-xmlpydict takes a more laid pack approack to enforcing the syntax of xml.
+Create a consistent parsing strategy between XML and Python dictionaries using the specification found [here](https://www.xml.com/pub/a/2006/05/31/converting-between-xml-and-json.html). `xmlpydict` focuses on speed; see the benchmarks below.
-## Features
+<img width="256" alt="small_xml_document" src="https://github.com/user-attachments/assets/0248a408-6bb6-4790-bd0f-f90537e2f21a" />
+<img width="256" alt="large_xml_document" src="https://github.com/user-attachments/assets/539a2a69-f475-46a5-bffc-1e8805a5a5e7" />
-xmlpydict allows for multiple root elements.
-The root object is treated as the python object.
 ### xmlpydict supports the following
@@ -72,19 +71,15 @@ The root object is treated as the python object.
 [Characters](https://www.w3.org/TR/xml/#charsets):  Similar to CDATA text is stored as {'#text': Char} , however this text is stripped.
-### dict.get(key[, default]) will not cause exceptions
 ```py
 # Empty tags are containers
 >>> from xmlpydict import parse
 >>> parse("<a></a>")
-{'a': {}}
+{'a': None}
 >>> parse("<a/>")
-{'a': {}}
+{'a': None}
 >>> parse("<a/>").get('href')
 None
->>> parse("")
-{}
 ```
 ### Attribute prefixing
@@ -103,7 +98,7 @@ None
 # Grammar and structure of the xml_content is checked while parsing
 >>> from xmlpydict import parse
 >>> parse("<a></ a>")
-Exception: not well formed (violation at pos=5)
+xml.parsers.expat.ExpatError: not well-formed (invalid token): line 1, column 5
 ```

{xmlpydict-0.0.8 → xmlpydict-0.0.12}/README.md RENAMED

@@ -1,12 +1,12 @@
 # xmlpydict 📑
 [![XML Tests](https://github.com/MatthewAndreTaylor/xml-to-pydict/actions/workflows/tests.yml/badge.svg)](https://github.com/MatthewAndreTaylor/xml-to-pydict/actions/workflows/tests.yml)
-[![PyPI versions](https://img.shields.io/badge/python-3.7%2B-blue)](https://github.com/MatthewAndreTaylor/xml-to-pydict)
+[![PyPI versions](https://img.shields.io/badge/python-3.8%2B-blue)](https://github.com/MatthewAndreTaylor/xml-to-pydict)
 [![PyPI](https://img.shields.io/pypi/v/xmlpydict.svg)](https://pypi.org/project/xmlpydict/)
 ## Requirements
-- `python 3.7+`
+- `python 3.8+`
 ## Installation
@@ -28,13 +28,11 @@ pip install xmlpydict
 ## Goals
-Create a consistent parsing strategy between xml and python dictionaries.
-xmlpydict takes a more laid pack approack to enforcing the syntax of xml.
+Create a consistent parsing strategy between XML and Python dictionaries using the specification found [here](https://www.xml.com/pub/a/2006/05/31/converting-between-xml-and-json.html). `xmlpydict` focuses on speed; see the benchmarks below.
-## Features
+<img width="256" alt="small_xml_document" src="https://github.com/user-attachments/assets/0248a408-6bb6-4790-bd0f-f90537e2f21a" />
+<img width="256" alt="large_xml_document" src="https://github.com/user-attachments/assets/539a2a69-f475-46a5-bffc-1e8805a5a5e7" />
-xmlpydict allows for multiple root elements.
-The root object is treated as the python object.
 ### xmlpydict supports the following
@@ -46,19 +44,15 @@ The root object is treated as the python object.
 [Characters](https://www.w3.org/TR/xml/#charsets):  Similar to CDATA text is stored as {'#text': Char} , however this text is stripped.
-### dict.get(key[, default]) will not cause exceptions
 ```py
 # Empty tags are containers
 >>> from xmlpydict import parse
 >>> parse("<a></a>")
-{'a': {}}
+{'a': None}
 >>> parse("<a/>")
-{'a': {}}
+{'a': None}
 >>> parse("<a/>").get('href')
 None
->>> parse("")
-{}
 ```
 ### Attribute prefixing
@@ -77,7 +71,7 @@ None
 # Grammar and structure of the xml_content is checked while parsing
 >>> from xmlpydict import parse
 >>> parse("<a></ a>")
-Exception: not well formed (violation at pos=5)
+xml.parsers.expat.ExpatError: not well-formed (invalid token): line 1, column 5
 ```

{xmlpydict-0.0.8 → xmlpydict-0.0.12}/pyproject.toml RENAMED

@@ -4,13 +4,13 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "xmlpydict"
-version = "0.0.8"
+version = "0.0.12"
 description="xml to dictionary tool for python"
 authors = [
     {name = "Matthew Taylor", email = "matthew.taylor.andre@gmail.com"},
 ]
 urls = {Homepage = "https://github.com/MatthewAndreTaylor/xml-to-pydict"}
-requires-python = ">=3.7"
+requires-python = ">=3.8"
 keywords = [ "xml", "dictionary" ]
 classifiers = [
     "Development Status :: 3 - Alpha",
@@ -18,13 +18,13 @@ classifiers = [
     "License :: OSI Approved :: MIT License",
     "Programming Language :: Python :: 3",
     "Programming Language :: Python :: 3 :: Only",
-    "Programming Language :: Python :: 3.7",
     "Programming Language :: Python :: 3.8",
     "Programming Language :: Python :: 3.9",
     "Programming Language :: Python :: 3.10",
     "Programming Language :: Python :: 3.11",
     "Programming Language :: Python :: Implementation :: CPython",
     "Topic :: Software Development :: Libraries :: Python Modules",
+    "Topic :: Text Processing :: Markup :: XML",
 ]
 [project.readme]
@@ -32,4 +32,4 @@ file = "README.md"
 content-type = "text/markdown"
 [project.optional-dependencies]
-tests = [ "pytest", "xmltodict" ]
+tests = [ "pytest", "requests" ]

{xmlpydict-0.0.8 → xmlpydict-0.0.12}/setup.py RENAMED

@@ -16,8 +16,8 @@ class build_ext(build_ext_orig):
 setup(
     include_package_data=True,
     ext_modules=[
-        Extension("xmlpydict", ["src/xmlparse.cpp"]),
+        Extension("pyxmlhandler", ["src/xmlparse.cpp"]),
     ],
     cmdclass={"build_ext": build_ext},
-    package_data={"xmlpydict": ["py.typed"], "": ["xmlpydict.pyi"]},
+    packages=["xmlpydict"],
 )

xmlpydict-0.0.12/src/xmlparse.cpp ADDED

@@ -0,0 +1,222 @@
+/**
+ * Copyright (c) 2023 Matthew Andre Taylor
+ */
+#include <Python.h>
+#include <stdio.h>
+#include <cctype>
+#include <vector>
+static PyObject* strip(PyObject* s_obj) {
+    Py_ssize_t start = 0;
+    Py_ssize_t end = PyUnicode_GetLength(s_obj);
+    while (start < end && std::isspace(PyUnicode_ReadChar(s_obj, start))) {
+      ++start;
+    }
+    while (end > start && std::isspace(PyUnicode_ReadChar(s_obj, end - 1))) {
+      --end;
+    }
+    return PyUnicode_Substring(s_obj, start, end);
+}
+typedef struct {
+    PyObject_HEAD PyObject* item;          // current dict
+    PyObject* data;        // character data buffer
+    std::vector<PyObject*> item_stack;
+    std::vector<PyObject*> data_stack;
+    PyObject* attr_prefix;
+    PyObject* cdata_key;
+} PyDictHandler;
+static PyObject* PyDictHandler_new(PyTypeObject* type, PyObject* args,
+                            PyObject* kwargs) {
+    PyDictHandler* self;
+    self = (PyDictHandler*)type->tp_alloc(type, 0);
+    return (PyObject*)self;
+}
+static int PyDictHandler_init(PyDictHandler* self, PyObject* args,
+                          PyObject* kwargs) {
+    const char* attr_prefix = "@";
+    const char* cdata_key = "#text";
+    static char* kwlist[] = {"attr_prefix", "cdata_key", NULL};
+    if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|ss", kwlist,
+                                     &attr_prefix, &cdata_key))
+        return -1;
+    self->item = Py_None;
+    self->data = PyUnicode_New(0, 127); // empty string
+    self->attr_prefix = PyUnicode_FromString(attr_prefix);
+    self->cdata_key = PyUnicode_FromString(cdata_key);
+    return 0;
+}
+static PyObject* characters(PyDictHandler* self, PyObject* data_obj) {
+    PyUnicode_Append(&self->data, data_obj);
+    Py_RETURN_NONE;
+}
+static PyObject* startElement(PyDictHandler* self, PyObject* args) {
+    self->item_stack.push_back(self->item);
+    self->data_stack.push_back(self->data);
+    self->data = PyUnicode_New(0, 127); // reset data buffer
+    const char* name;
+    PyObject* attrs;
+    if (!PyArg_ParseTuple(args, "sO", &name, &attrs)) {
+        return NULL;
+    }
+    if (!PyDict_Check(attrs) || PyDict_Size(attrs) == 0) {
+        self->item = Py_None;
+        Py_RETURN_NONE;
+    }
+    PyObject* newDict = PyDict_New();
+    PyObject *key, *value;
+    Py_ssize_t pos = 0;
+    while (PyDict_Next(attrs, &pos, &key, &value)) {
+        PyObject* prefixed_key = PyUnicode_Concat(self->attr_prefix, key);
+        PyDict_SetItem(newDict, prefixed_key, value);
+    }
+    self->item = newDict;
+    Py_RETURN_NONE;
+}
+static PyObject* updateChildren(PyObject*& target, PyObject* key, PyObject* value) {
+    if (target == Py_None) {
+        target = PyDict_New();
+    }
+    if (!PyDict_Contains(target, key)) {
+        PyDict_SetItem(target, key, value);
+    } else {
+        PyObject* existing = PyDict_GetItem(target, key);
+        if (PyList_Check(existing)) {
+            PyList_Append(existing, value);
+        } else {
+            PyObject* newList = PyList_New(2);
+            PyList_SetItem(newList, 0, existing);
+            PyList_SetItem(newList, 1, value);
+            PyDict_SetItem(target, key, newList);
+        }
+    }
+    return target;
+}
+static PyObject* endElement(PyDictHandler* self, PyObject* name_obj) {
+    if (!self->data_stack.empty()) {
+        PyObject* temp_data = strip(self->data);
+        bool has_data = (PyUnicode_GetLength(temp_data) > 0);
+        PyObject* py_data = has_data ? temp_data : Py_None;
+        PyObject* temp_item = self->item;
+        self->item = self->item_stack.back();
+        self->data = self->data_stack.back();
+        self->item_stack.pop_back();
+        self->data_stack.pop_back();
+        if (temp_item != Py_None) {
+            if (has_data) {
+                PyDict_SetItem(temp_item, self->cdata_key, py_data);
+            }
+            temp_item = PyDict_Copy(temp_item);
+            self->item = updateChildren(self->item, name_obj, temp_item);
+        }
+        else {
+            self->item = updateChildren(self->item, name_obj, py_data);
+        }
+    }
+    Py_RETURN_NONE;
+}
+static PyMethodDef PyDictHandler_methods[] = {
+    {"characters", (PyCFunction)characters, METH_O, "Handle character data"},
+    {"startElement", (PyCFunction)startElement, METH_VARARGS, "Handle start of an element"},
+    {"endElement", (PyCFunction)endElement, METH_O, "Handle end of an element"},
+    {NULL, NULL, 0, NULL}
+};
+static PyObject* PyDictHandler_get_item(PyDictHandler *self, void *closure)
+{
+    Py_INCREF(self->item);
+    return self->item;
+}
+static PyGetSetDef PyDictHandler_getset[] = {
+    {
+        "item",                                   /* name */
+        (getter)PyDictHandler_get_item,           /* get */
+        NULL,           /* set */
+        NULL,                    /* doc */
+        NULL                                      /* closure */
+    },
+    {NULL}  /* Sentinel */
+};
+static PyTypeObject PyDictHandlerType = {
+    PyVarObject_HEAD_INIT(NULL, 0) "pyxmlhandler._PyDictHandler", // tp_name
+    sizeof(PyDictHandler),                                    // tp_basicsize
+    0,                                                        // tp_itemsize
+    0,                                                        // tp_dealloc
+    0,                                                        // tp_vectorcall_offset
+    0,                                                        // tp_getattr
+    0,                                                        // tp_setattr
+    0,                                                        // tp_as_async
+    0,                                                        // tp_repr
+    0,                                                        // tp_as_number
+    0,                                                        // tp_as_sequence
+    0,                                                        // tp_as_mapping
+    0,                                                        // tp_hash
+    0,                                                        // tp_call
+    0,                                                        // tp_str
+    0,                                                        // tp_getattro
+    0,                                                        // tp_setattro
+    0,                                                        // tp_as_buffer
+    Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,                // tp_flags
+    "Handler that converts XML to Python dict",               // tp_doc
+    0,                                                        // tp_traverse
+    0,                                                        // tp_clear
+    0,                                                        // tp_richcompare
+    0,                                                        // tp_weaklistoffset
+    0,                                                        // tp_iter
+    0,                                                        // tp_iternext
+    PyDictHandler_methods,                                    // tp_methods
+    0,                                                        // tp_members
+    PyDictHandler_getset,                                     // tp_getset
+    0,                                                        // tp_base
+    0,                                                        // tp_dict
+    0,                                                        // tp_descr_get
+    0,                                                        // tp_descr_set
+    0,                                                        // tp_dictoffset
+    (initproc)PyDictHandler_init,                             // tp_init
+    0,                                                        // tp_alloc
+    PyDictHandler_new,                                        // tp_new
+};
+static PyModuleDef pyxmlhandlermodule = {
+    PyModuleDef_HEAD_INIT,
+    "pyxmlhandler",
+    "Module that provides XML to Python dict parsing",
+    -1,
+    NULL, NULL, NULL, NULL, NULL
+};
+PyMODINIT_FUNC PyInit_pyxmlhandler(void) {
+    PyObject* m;
+    if (PyType_Ready(&PyDictHandlerType) < 0)
+        return NULL;
+    m = PyModule_Create(&pyxmlhandlermodule);
+    if (m == NULL)
+        return NULL;
+    Py_INCREF(&PyDictHandlerType);
+    PyModule_AddObject(m, "_PyDictHandler", (PyObject*)&PyDictHandlerType);
+    return m;
+}
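
For readers more comfortable with Python than with the C API, the handler logic added in `src/xmlparse.cpp` can be approximated as below. This is only a sketch under stated assumptions, not the package's code: the names `DictHandlerSketch` and `parse_sketch` are invented for illustration, `str.strip()` stands in for the C++ `strip` helper, and reference counting is left to the interpreter. It wires up the same three expat callbacks (character data, start element, end element) that the handler exposes.

```python
from xml.parsers import expat


class DictHandlerSketch:
    """Pure-Python approximation of the C++ handler (illustrative only)."""

    def __init__(self, attr_prefix="@", cdata_key="#text"):
        self.attr_prefix = attr_prefix
        self.cdata_key = cdata_key
        self.item = None  # value built so far for the element currently open
        self.data = ""    # character data collected for that element
        self.stack = []   # saved (item, data) pairs of the enclosing elements

    def characters(self, data):
        # expat may deliver text in several pieces, so accumulate it.
        self.data += data

    def start_element(self, name, attrs):
        # Save the parent's state and start a fresh buffer for the child.
        self.stack.append((self.item, self.data))
        self.data = ""
        # No attributes: start as None. Otherwise: dict of prefixed names.
        self.item = {self.attr_prefix + k: v for k, v in attrs.items()} or None

    def end_element(self, name):
        text = self.data.strip() or None
        node = self.item
        # Restore the parent's state before attaching the finished child.
        self.item, self.data = self.stack.pop()
        if node is not None:
            if text is not None:
                node[self.cdata_key] = text
            value = node
        else:
            value = text  # plain string, or None for an empty element
        if self.item is None:
            self.item = {}
        if name not in self.item:
            self.item[name] = value
        elif isinstance(self.item[name], list):
            self.item[name].append(value)
        else:
            # Second occurrence of a tag name: promote the value to a list.
            self.item[name] = [self.item[name], value]


def parse_sketch(xml_content, attr_prefix="@", cdata_key="#text"):
    handler = DictHandlerSketch(attr_prefix, cdata_key)
    parser = expat.ParserCreate()
    parser.CharacterDataHandler = handler.characters
    parser.StartElementHandler = handler.start_element
    parser.EndElementHandler = handler.end_element
    parser.Parse(xml_content, True)
    return handler.item
```

Called on inputs taken from `tests/test_parse.py` in this diff, `parse_sketch("<p/>")` gives `{"p": None}` and `parse_sketch('<p width="10">Hello World</p>')` gives `{"p": {"@width": "10", "#text": "Hello World"}}`.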

{xmlpydict-0.0.8 → xmlpydict-0.0.12}/tests/test_parse.py RENAMED

@@ -1,13 +1,14 @@
 import pytest
-from xmlpydict import parse
 import json
+from xmlpydict import parse
 def test_simple():
-    assert parse("") == {}
-    assert parse("<p/>") == {"p": {}}
-    assert parse("<p></p>") == {"p": {}}
+    assert parse("<p/>") == {"p": None}
+    assert parse("<p></p>") == {"p": None}
     assert parse('<p width="10"></p>') == {"p": {"@width": "10"}}
+    assert parse("<p>Hello</p>") == {"p": "Hello"}
     assert parse('<p width="10">Hello World</p>') == {
         "p": {"@width": "10", "#text": "Hello World"}
     }
@@ -21,7 +22,18 @@ def test_simple():
         "p": {"@width": "10", "@height": "20"}
     }
     assert parse("<p>Hey <b>bold</b>There</p>") == {
-        "p": {"#text": "HeyThere", "b": {"#text": "bold"}}
+        "p": {"#text": "Hey There", "b": "bold"}
+    }
+    assert parse("<p>Hey <b>bold</b>There <b>bold</b>Buddy </p>") == {
+        "p": {"#text": "Hey There Buddy", "b": ["bold", "bold"]}
+    }
+    assert parse("<p>Hey <b/>There Buddy</p>") == {
+        "p": {"#text": "Hey There Buddy", "b": None}
+    }
+    assert parse("<p>Hey <b/>There Buddy <b/> </p>") == {
+        "p": {"#text": "Hey There Buddy", "b": [None, None]}
     }
     assert (
@@ -68,19 +80,16 @@ def test_simple():
 def test_cdata():
     assert parse("<content><![CDATA[<p>This is a paragraph</p>]]></content>") == {
-        "content": {"#text": "<p>This is a paragraph</p>"}
+        "content": "<p>This is a paragraph</p>"
     }
+    assert parse(
+        "<special_chars><![CDATA[$ ^ * % & <> () + - + ` ~]]></special_chars>"
+    ) == {"special_chars": "$ ^ * % & <> () + - + ` ~"}
 def test_nested():
-    assert parse("<book><p/></book> ") == {"book": {"p": {}}}
-    assert parse("<book><p></p></book>") == {"book": {"p": {}}}
-    assert parse("<book><p></p></book><card/>") == {"book": {"p": {}}, "card": {}}
-    assert parse("<pizza></pizza><book><p></p></book><card/>") == {
-        "pizza": {},
-        "book": {"p": {}},
-        "card": {},
-    }
+    assert parse("<book><p/></book> ") == {"book": {"p": None}}
+    assert parse("<book><p></p></book>") == {"book": {"p": None}}
 def test_list():
@@ -95,12 +104,20 @@ def test_list():
 def test_comment():
-    assert parse("<!-- simple comment -->") == {}
+    assert parse("<p/><!-- simple comment -->") == {"p": None}
     comment = """<world>
   <!-- $comment+++@python -->
   <lake>Content</lake>
 </world>"""
-    assert parse(comment) == {"world": {"lake": {"#text": "Content"}}}
+    assert parse(comment) == {"world": {"lake": "Content"}}
+    multiple_comments = """<book>
+    <!-- Comment 0 -->
+    <!-- Comment 1 -->
+    <lines>510</lines>
+    <!-- Comment 2 -->
+    <!-- -->
+</book>"""
+    assert parse(multiple_comments) == {"book": {"lines": "510"}}
 def test_files():
@@ -275,15 +292,17 @@ def test_files():
 def test_exception():
     xml_strings = [
-        "< p/>",
-        "<p>",
-        "<p/ >",
         "<p height'10'/>",
         "<p height='10'width='5'/>",
-        "<p width='5/>",
         "<p width=5'/>",
-        "</p>",
         "<pwidth='5'/>",
+        "<!---->",
+        "<a></p>",
+        "<></>",
+        "</>",
+        "<",
+        ">",
+        "<nested></p></nested>",
     ]
     for xml_str in xml_strings:
         with pytest.raises(Exception):
@@ -291,8 +310,43 @@ def test_exception():
 def test_prefix():
-    assert parse("<p></p>", attr_prefix="$") == {"p": {}}
+    assert parse("<p></p>", attr_prefix="$") == {"p": None}
     assert parse('<p width="10"></p>', attr_prefix="$") == {"p": {"$width": "10"}}
     assert parse('<p width="10" height="5"></p>', attr_prefix="$") == {
         "p": {"$width": "10", "$height": "5"}
     }
+    assert parse('<p width="10" height="5"></p>', attr_prefix="$$$$$$$$$") == {
+        "p": {"$$$$$$$$$width": "10", "$$$$$$$$$height": "5"}
+    }
+    assert parse('<p width="10" height="5"></p>', attr_prefix="") == {
+        "p": {"width": "10", "height": "5"}
+    }
+def test_document():
+    s = """<?xml version="1.0" encoding="UTF-8"?><repository>
+  <project pypi="xmlpydict">
+    <title>XML document parser</title>
+    <author>Matthew Taylor</author>
+  </project>
+  <project pypi="blank">
+    <title>Test project</title>
+    <author>Matthew Taylor</author>
+  </project>
+</repository>"""
+    assert parse(s) == {
+        "repository": {
+            "project": [
+                {
+                    "@pypi": "xmlpydict",
+                    "title": "XML document parser",
+                    "author": "Matthew Taylor",
+                },
+                {
+                    "@pypi": "blank",
+                    "title": "Test project",
+                    "author": "Matthew Taylor",
+                },
+            ]
+        }
+    }
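
The updated tests show that an element's value now takes one of four shapes: `None` for an empty element, a plain string for text-only content, a dict (with the `#text` key holding any text) once attributes or children are present, and a list when a tag repeats. Code written against 0.0.8, where every element was a dict, may need to branch on these shapes. The helpers below are a hypothetical sketch of one way to do that; `element_text` and `as_list` are not part of `xmlpydict`.

```python
def element_text(node, cdata_key="#text"):
    """Return the text of a parsed element, whatever shape it has."""
    if node is None:
        return None                 # empty element, e.g. <p/>
    if isinstance(node, str):
        return node                 # text-only element, e.g. <p>Hello</p>
    if isinstance(node, dict):
        return node.get(cdata_key)  # element with attributes or children
    raise TypeError(f"unexpected element value: {type(node).__name__}")


def as_list(node):
    """Wrap a single child in a list so repeated and single tags look alike."""
    return node if isinstance(node, list) else [node]
```

For example, given the result dict from `test_document`, `as_list(result["repository"]["project"])` always yields a list of project dicts, whether the document contains one `<project>` or several.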

xmlpydict-0.0.12/xmlpydict/__init__.py ADDED

@@ -0,0 +1,75 @@
+from pyxmlhandler import _PyDictHandler
+from xml.parsers import expat
+def parse(xml_content, attr_prefix: str = "@", cdata_key: str = "#text") -> dict:
+    """
+    Parse XML content into a python dictionary.
+    Args:
+        xml_content: The XML content to be parsed.
+        attr_prefix: The prefix to use for attributes in the resulting dictionary.
+        cdata_key: The key to use for character data in the resulting dictionary.
+    Returns:
+        A dictionary representation of the XML content.
+    """
+    handler = _PyDictHandler(attr_prefix=attr_prefix, cdata_key=cdata_key)
+    parser = expat.ParserCreate()
+    parser.CharacterDataHandler = handler.characters
+    parser.StartElementHandler = handler.startElement
+    parser.EndElementHandler = handler.endElement
+    parser.Parse(xml_content, True)
+    return handler.item
+def parse_file(file_path, attr_prefix: str = "@", cdata_key: str = "#text") -> dict:
+    """
+    Parse an XML file into a python dictionary.
+    Args:
+        file_path: The path to the XML file to be parsed.
+        attr_prefix: The prefix to use for attributes in the resulting dictionary.
+        cdata_key: The key to use for character data in the resulting dictionary.
+    Returns:
+        A dictionary representation of the XML file content.
+    """
+    handler = _PyDictHandler(attr_prefix=attr_prefix, cdata_key=cdata_key)
+    parser = expat.ParserCreate()
+    parser.CharacterDataHandler = handler.characters
+    parser.StartElementHandler = handler.startElement
+    parser.EndElementHandler = handler.endElement
+    with open(file_path, "rb") as f:
+        parser.ParseFile(f)
+    return handler.item
+def iter_xml_documents(file_path, chunk_size=64 * 1024):
+    start_token = b"<?xml"
+    buffer = b""
+    with open(file_path, "rb") as f:
+        while True:
+            chunk = f.read(chunk_size)
+            if not chunk:
+                if buffer.strip():
+                    yield buffer
+                break
+            buffer += chunk
+            while True:
+                start_index = buffer.find(start_token, 1)
+                if start_index == -1:
+                    break
+                yield buffer[:start_index]
+                buffer = buffer[start_index:]
+def parse_xml_collections(file_path, attr_prefix: str = "@", cdata_key: str = "#text"):
+    for xml_content in iter_xml_documents(file_path):
+        yield parse(
+            xml_content.decode("utf-8"),
+            attr_prefix=attr_prefix,
+            cdata_key=cdata_key
+        )
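
The splitting strategy in `iter_xml_documents` can be tried without a file on disk. The sketch below restates the same loop over any binary stream; the function name `split_xml_documents` and the `stream` parameter are assumptions made for illustration and do not exist in the package. Because the buffer keeps growing until a second `<?xml` token is found, a token that straddles two chunks is still detected once the next chunk arrives.

```python
import io


def split_xml_documents(stream, chunk_size=64 * 1024, start_token=b"<?xml"):
    """Yield each concatenated XML document in a binary stream as bytes."""
    buffer = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            # End of input: whatever is left is the last document.
            if buffer.strip():
                yield buffer
            break
        buffer += chunk
        while True:
            # Search from index 1 so the declaration that opens the
            # current document is not matched again.
            start_index = buffer.find(start_token, 1)
            if start_index == -1:
                break
            yield buffer[:start_index]
            buffer = buffer[start_index:]


data = b'<?xml version="1.0"?><a/>\n<?xml version="1.0"?><b/>\n'
# A deliberately small chunk size forces the token to span two reads.
documents = list(split_xml_documents(io.BytesIO(data), chunk_size=7))
```

Two limits follow directly from the logic: documents are only separated where an XML declaration begins, so a file of concatenated documents without declarations comes back as a single chunk, and a literal `<?xml` inside a comment or CDATA section would also trigger a split.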

{xmlpydict-0.0.8 → xmlpydict-0.0.12/xmlpydict.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
-Metadata-Version: 2.1
+Metadata-Version: 2.4
 Name: xmlpydict
-Version: 0.0.8
+Version: 0.0.12
 Summary: xml to dictionary tool for python
 Author-email: Matthew Taylor <matthew.taylor.andre@gmail.com>
 Project-URL: Homepage, https://github.com/MatthewAndreTaylor/xml-to-pydict
@@ -10,29 +10,30 @@ Classifier: Intended Audience :: Developers
 Classifier: License :: OSI Approved :: MIT License
 Classifier: Programming Language :: Python :: 3
 Classifier: Programming Language :: Python :: 3 :: Only
-Classifier: Programming Language :: Python :: 3.7
 Classifier: Programming Language :: Python :: 3.8
 Classifier: Programming Language :: Python :: 3.9
 Classifier: Programming Language :: Python :: 3.10
 Classifier: Programming Language :: Python :: 3.11
 Classifier: Programming Language :: Python :: Implementation :: CPython
 Classifier: Topic :: Software Development :: Libraries :: Python Modules
-Requires-Python: >=3.7
+Classifier: Topic :: Text Processing :: Markup :: XML
+Requires-Python: >=3.8
 Description-Content-Type: text/markdown
 License-File: LICENSE
 Provides-Extra: tests
 Requires-Dist: pytest; extra == "tests"
-Requires-Dist: xmltodict; extra == "tests"
+Requires-Dist: requests; extra == "tests"
+Dynamic: license-file
 # xmlpydict 📑
 [![XML Tests](https://github.com/MatthewAndreTaylor/xml-to-pydict/actions/workflows/tests.yml/badge.svg)](https://github.com/MatthewAndreTaylor/xml-to-pydict/actions/workflows/tests.yml)
-[![PyPI versions](https://img.shields.io/badge/python-3.7%2B-blue)](https://github.com/MatthewAndreTaylor/xml-to-pydict)
+[![PyPI versions](https://img.shields.io/badge/python-3.8%2B-blue)](https://github.com/MatthewAndreTaylor/xml-to-pydict)
 [![PyPI](https://img.shields.io/pypi/v/xmlpydict.svg)](https://pypi.org/project/xmlpydict/)
 ## Requirements
-- `python 3.7+`
+- `python 3.8+`
 ## Installation
@@ -54,13 +55,11 @@ pip install xmlpydict
 ## Goals
-Create a consistent parsing strategy between xml and python dictionaries.
-xmlpydict takes a more laid pack approack to enforcing the syntax of xml.
+Create a consistent parsing strategy between XML and Python dictionaries using the specification found [here](https://www.xml.com/pub/a/2006/05/31/converting-between-xml-and-json.html). `xmlpydict` focuses on speed; see the benchmarks below.
-## Features
+<img width="256" alt="small_xml_document" src="https://github.com/user-attachments/assets/0248a408-6bb6-4790-bd0f-f90537e2f21a" />
+<img width="256" alt="large_xml_document" src="https://github.com/user-attachments/assets/539a2a69-f475-46a5-bffc-1e8805a5a5e7" />
-xmlpydict allows for multiple root elements.
-The root object is treated as the python object.
 ### xmlpydict supports the following
@@ -72,19 +71,15 @@ The root object is treated as the python object.
 [Characters](https://www.w3.org/TR/xml/#charsets):  Similar to CDATA text is stored as {'#text': Char} , however this text is stripped.
-### dict.get(key[, default]) will not cause exceptions
 ```py
 # Empty tags are containers
 >>> from xmlpydict import parse
 >>> parse("<a></a>")
-{'a': {}}
+{'a': None}
 >>> parse("<a/>")
-{'a': {}}
+{'a': None}
 >>> parse("<a/>").get('href')
 None
->>> parse("")
-{}
 ```
 ### Attribute prefixing
@@ -103,7 +98,7 @@ None
 # Grammar and structure of the xml_content is checked while parsing
 >>> from xmlpydict import parse
 >>> parse("<a></ a>")
-Exception: not well formed (violation at pos=5)
+xml.parsers.expat.ExpatError: not well-formed (invalid token): line 1, column 5
 ```

xmlpydict-0.0.12/xmlpydict.egg-info/SOURCES.txt ADDED

@@ -0,0 +1,13 @@
+LICENSE
+MANIFEST.in
+README.md
+pyproject.toml
+setup.py
+src/xmlparse.cpp
+tests/test_parse.py
+xmlpydict/__init__.py
+xmlpydict.egg-info/PKG-INFO
+xmlpydict.egg-info/SOURCES.txt
+xmlpydict.egg-info/dependency_links.txt
+xmlpydict.egg-info/requires.txt
+xmlpydict.egg-info/top_level.txt

{xmlpydict-0.0.8/src → xmlpydict-0.0.12}/xmlpydict.egg-info/requires.txt RENAMED

@@ -1,4 +1,4 @@
 [tests]
 pytest
-xmltodict
+requests

xmlpydict-0.0.12/xmlpydict.egg-info/top_level.txt ADDED

@@ -0,0 +1,2 @@
+pyxmlhandler
+xmlpydict

xmlpydict-0.0.8/src/xmlparse.cpp DELETED

@@ -1,413 +0,0 @@
-/**
- * Copyright (c) 2023 Matthew Andre Taylor
- */
-#include <Python.h>
-#include <string>
-#include <vector>
-typedef enum {
-  PRIMITIVE,
-  CONTAINER_OPEN,
-  CONTAINER_CLOSE,
-  TEXT,
-  COMMENT
-} NodeType;
-typedef struct {
-  std::string key;
-  std::string value;
-} Pair;
-typedef struct {
-  NodeType type;
-  std::string elementName;
-  std::vector<Pair> attr;
-} XMLNode;
-size_t i;
-static void parseContainerClose(XMLNode *node, const char *xmlContent) {
-  node->type = CONTAINER_CLOSE;
-  i++;
-  if (std::isalpha(xmlContent[i]) || xmlContent[i] == '_' ||
-      xmlContent[i] == ':') {
-    node->elementName.push_back(xmlContent[i]);
-    i++;
-  } else {
-    PyErr_Format(PyExc_Exception, "not well formed (violation at pos=%d)", i);
-    return;
-  }
-  while (xmlContent[i] != '\0' && xmlContent[i] != '>') {
-    if (std::isalnum(xmlContent[i]) || xmlContent[i] == '_' ||
-        xmlContent[i] == ':' || xmlContent[i] == '-' || xmlContent[i] == '.') {
-      node->elementName.push_back(xmlContent[i]);
-    } else if (std::isspace(xmlContent[i])) {
-      if (node->elementName.empty()) {
-        PyErr_Format(PyExc_Exception, "not well formed (violation at pos=%d)",
-                     i);
-        return;
-      }
-    } else {
-      PyErr_Format(PyExc_Exception, "not well formed (violation at pos=%d)", i);
-      return;
-    }
-    i++;
-  }
-  i++;
-}
-static void parseContainerOpen(XMLNode *node, const char *xmlContent) {
-  node->type = CONTAINER_OPEN;
-  if (std::isalpha(xmlContent[i]) || xmlContent[i] == '_' ||
-      xmlContent[i] == ':') {
-    node->elementName.push_back(xmlContent[i]);
-    i++;
-  } else {
-    PyErr_Format(PyExc_Exception, "not well formed (violation at pos=%d)", i);
-    return;
-  }
-  bool hasAttr = false;
-  // Parse name
-  while (xmlContent[i] != '\0' && xmlContent[i] != '>') {
-    if (xmlContent[i] == '/' && xmlContent[i + 1] == '>') {
-      node->type = PRIMITIVE;
-      i += 2;
-      return;
-    }
-    if (std::isalnum(xmlContent[i]) || xmlContent[i] == '_' ||
-        xmlContent[i] == ':' || xmlContent[i] == '-' || xmlContent[i] == '.') {
-      node->elementName.push_back(xmlContent[i]);
-      i++;
-    } else if (std::isspace(xmlContent[i])) {
-      if (node->elementName.empty()) {
-        PyErr_Format(PyExc_Exception, "not well formed (violation at pos=%d)",
-                     i);
-        return;
-      }
-      hasAttr = true;
-      break;
-    } else {
-      PyErr_Format(PyExc_Exception, "not well formed (violation at pos=%d)", i);
-      return;
-    }
-  }
-  // 0: space, 1: start, 2: name, 3: equals, 4: quote, 5: value
-  char state = 0;
-  if (hasAttr) {
-    std::string key;
-    std::string val;
-    char quoteType = 0;
-    while (xmlContent[i] != '\0' && xmlContent[i] != '>') {
-      switch (state) {
-      case 0:
-        if (xmlContent[i] == '/' && xmlContent[i + 1] == '>') {
-          node->type = PRIMITIVE;
-          i += 2;
-          return;
-        }
-        if (std::isspace(xmlContent[i])) {
-          i++;
-          state = 1;
-        } else {
-          PyErr_Format(PyExc_Exception, "not well formed (violation at pos=%d)",
-                       i);
-          return;
-        }
-        break;
-      case 1:
-        if (xmlContent[i] == '/' && xmlContent[i + 1] == '>') {
-          node->type = PRIMITIVE;
-          i += 2;
-          return;
-        }
-        if (std::isspace(xmlContent[i])) {
-          i++;
-        } else if (std::isalpha(xmlContent[i]) || xmlContent[i] == '_' ||
-                   xmlContent[i] == ':') {
-          state = 2;
-          key.push_back(xmlContent[i]);
-          i++;
-        } else {
-          PyErr_Format(PyExc_Exception, "not well formed (violation at pos=%d)",
-                       i);
-          return;
-        }
-        break;
-      case 2:
-        if (xmlContent[i] == '=') {
-          state = 4;
-        } else if (std::isalnum(xmlContent[i]) || xmlContent[i] == '_' ||
-                   xmlContent[i] == ':' || xmlContent[i] == '-' ||
-                   xmlContent[i] == '.') {
-          key.push_back(xmlContent[i]);
-        } else if (std::isspace(xmlContent[i])) {
-          state = 3;
-        } else {
-          PyErr_Format(PyExc_Exception, "not well formed (violation at pos=%d)",
-                       i);
-          return;
-        }
-        i++;
-        break;
-      case 3:
-        if (xmlContent[i] == '=') {
-          state = 4;
-        } else if (!std::isspace(xmlContent[i])) {
-          PyErr_Format(PyExc_Exception, "not well formed (violation at pos=%d)",
-                       i);
-          return;
-        }
-        i++;
-        break;
-      case 4:
-        if (xmlContent[i] == '\'' || xmlContent[i] == '\"') {
-          state = 5;
-          quoteType = xmlContent[i];
-        } else if (!std::isspace(xmlContent[i])) {
-          PyErr_Format(PyExc_Exception, "not well formed (violation at pos=%d)",
-                       i);
-          return;
-        }
-        i++;
-        break;
-      default:
-        if (xmlContent[i] == quoteType) {
-          state = 0;
-          node->attr.push_back({key, val});
-          key.clear();
-          val.clear();
-        } else {
-          val.push_back(xmlContent[i]);
-        }
-        i++;
-        break;
-      }
-    }
-  }
-  if (state > 1) {
-    PyErr_Format(PyExc_Exception, "not well formed (violation at pos=%d)", i);
-    return;
-  }
-  i++;
-}
-static void parseComment(XMLNode *node, const char *xmlContent) {
-  node->type = COMMENT;
-  i++;
-  if (xmlContent[i] != '-' || xmlContent[i + 1] != '-') {
-    PyErr_Format(PyExc_Exception, "not well formed (violation at pos=%d)", i);
-    return;
-  }
-  i += 2;
-  while (xmlContent[i] != '\0' || xmlContent[i + 1] != '\0') {
-    if (xmlContent[i] == '-' && xmlContent[i + 1] == '-' &&
-        xmlContent[i + 2] == '>') {
-      // Found the end of the comment
-      if (xmlContent[i - 1] == '-') {
-        PyErr_Format(PyExc_Exception, "not well formed (violation at pos=%d)",
-                     i - 1);
-        return;
-      }
-      i += 3;
-      return;
-    }
-    i++;
-  }
-  PyErr_SetString(PyExc_Exception, "unclosed token");
-}
-static void parseCData(XMLNode *node, const char *xmlContent) {
-  node->type = TEXT;
-  i+=2;
-  std::string cdata = "CDATA[";
-  size_t j = 0;
-  while (xmlContent[i] != '\0') {
-    if (j >= cdata.size()) {
-      break;
-    }
-    if (cdata[j] != xmlContent[i]) {
-      PyErr_Format(PyExc_Exception, "not well formed (violation at pos=%d)", i);
-      return;
-    }
-    i++;
-    j++;
-  }
-  while (xmlContent[i] != '\0' || xmlContent[i + 1] != '\0') {
-    if (xmlContent[i] == ']' && xmlContent[i + 1] == ']' &&
-        xmlContent[i + 2] == '>') {
-      i += 3;
-      return;
-    }
-    node->elementName.push_back(xmlContent[i]);
-    i++;
-  }
-  PyErr_SetString(PyExc_Exception, "unclosed token");
-}
-static void parseText(XMLNode *node, const char *xmlContent) {
-  node->type = TEXT;
-  bool isSpace = false;
-  while (xmlContent[i] != '\0' && xmlContent[i] != '<') {
-    if (xmlContent[i] == '&') {
-      PyErr_Format(PyExc_Exception, "not well formed (violation at pos=%d)", i);
-      return;
-    }
-    if (isSpace || !std::isspace(xmlContent[i])) {
-      node->elementName.push_back(xmlContent[i]);
-      isSpace = true;
-    }
-    i++;
-  }
-  while (std::isspace(node->elementName.back())) {
-    node->elementName.pop_back();
-  }
-}
-static std::vector<XMLNode> splitNodes(const char *xmlContent) {
-  std::vector<XMLNode> nodes;
-  i = 0;
-  while (xmlContent[i] != '\0') {
-    XMLNode node;
-    if (xmlContent[i] == '<') {
-      i++;
-      if (xmlContent[i] == '/') {
-        parseContainerClose(&node, xmlContent);
-      } else if (xmlContent[i] == '!') {
-        if (xmlContent[i+1] == '[') {
-          parseCData(&node, xmlContent);
-        } else {
-          parseComment(&node, xmlContent);
-        }
-      } else {
-        parseContainerOpen(&node, xmlContent);
-      }
-    } else {
-      parseText(&node, xmlContent);
-    }
-    if (!node.elementName.empty()) {
-      nodes.push_back(node);
-    }
-  }
-  return nodes;
-}
-static PyObject *createDict(const std::vector<Pair> &attributes, char* attributePrefix) {
-  PyObject *dict = PyDict_New();
-  for (const Pair &attr : attributes) {
-    const std::string &key = attributePrefix + attr.key;
-    PyObject *val = PyUnicode_FromString(attr.value.c_str());
-    PyDict_SetItemString(dict, key.c_str(), val);
-  }
-  return dict;
-}
-PyDoc_STRVAR(xml_parse_doc, "parse(xml_content: str, attr_prefix=\"@\") -> dict:\n"
-                            "...\n\n"
-                            "Parse XML content into a dictionary.\n\n"
-                            "Args:\n\t"
-                            "xml_content (str): xml document to be parsed.\n"
-                            "Returns:\n\t"
-                            "dict: Dictionary of the xml dom.\n");
-static PyObject *xml_parse(PyObject *self, PyObject *args, PyObject *kwargs) {
-  const char *xmlContent;
-  char* attributePrefix = "@";
-  static char *kwlist[] = {"xml_content", "attr_prefix", NULL};
-  if (!PyArg_ParseTupleAndKeywords(args, kwargs, "s|s", kwlist, &xmlContent, &attributePrefix)) {
-    return NULL;
-  }
-  std::vector<XMLNode> nodes = splitNodes(xmlContent);
-  if (PyErr_Occurred() != NULL) {
-    return NULL;
-  }
-  PyObject *currDict = PyDict_New();
-  std::vector<std::string> containerStackNames;
-  std::vector<PyObject *> containerStack;
-  containerStack.push_back(currDict);
-  containerStackNames.push_back("");
-  bool isList = false;
-  for (const XMLNode &node : nodes) {
-    PyObject *childKey = PyUnicode_FromString(node.elementName.c_str());
-    if (node.type == TEXT) {
-      PyObject *item = PyDict_GetItemString(currDict, "#text");
-      if (item != NULL) {
-        PyDict_SetItemString(currDict, "#text", PyUnicode_Concat(item, childKey));
-      } else {
-        PyDict_SetItemString(currDict, "#text", childKey);
-      }
-    } else if (node.type == CONTAINER_OPEN || node.type == PRIMITIVE) {
-      PyObject *d = createDict(node.attr, attributePrefix);
-      PyObject *item = PyDict_GetItem(currDict, childKey);
-      if (item != NULL) {
-        // Check if it is a List or dict
-        if (isList && PyList_Check(item)) {
-          PyList_Append(item, d);
-        } else {
-          PyObject *children = PyList_New(2);
-          PyList_SetItem(children, 0, item);
-          PyList_SetItem(children, 1, d);
-          PyDict_SetItem(currDict, childKey, children);
-          isList = true;
-        }
-      } else {
-        PyDict_SetItem(currDict, childKey, d);
-        isList = false;
-      }
-      if (node.type == CONTAINER_OPEN) {
-        currDict = d;
-        containerStack.push_back(d);
-        containerStackNames.push_back(node.elementName);
-      }
-    } else if (node.type == CONTAINER_CLOSE) {
-      if (containerStackNames.back() != node.elementName) {
-        PyErr_Format(PyExc_Exception,
-                     "tag mismatch ('%U' does not match the last start tag)",
-                     childKey);
-      }
-      containerStackNames.pop_back();
-      containerStack.pop_back();
-      currDict = containerStack.back();
-    }
-    Py_DECREF(childKey);
-  }
-  if (containerStack.size() > 1) {
-    PyErr_Format(PyExc_Exception, "not well formed (%d unclosed tags)",
-                 containerStack.size() - 1);
-    return NULL;
-  }
-  PyObject *result = containerStack.front();
-  Py_INCREF(result);
-  return result;
-}
-static PyMethodDef XMLParserMethods[] = {
-    {"parse", (PyCFunction)xml_parse, METH_VARARGS | METH_KEYWORDS, xml_parse_doc},
-    {NULL, NULL, 0, NULL}};
-static struct PyModuleDef xmlparsermodule = {PyModuleDef_HEAD_INIT, "xmlpydict",
-                                             NULL, -1, XMLParserMethods};
-PyMODINIT_FUNC PyInit_xmlpydict() { return PyModule_Create(&xmlparsermodule); }
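
Note on the removed `parseComment` above: it scans for the first `-->`, reports "unclosed token" if none is found, and rejects a terminator that is immediately preceded by another `-` (as in `--->`). The following is a minimal Python sketch of that rule only, not the package's code. The helper name `find_comment_end` and the use of `ValueError` are assumptions made for illustration. One deliberate difference: the sketch accepts an empty comment body, whereas the removed C++ also inspects the character just before the body.

```python
def find_comment_end(xml: str, start: int) -> int:
    """Return the index just past '-->' for a comment body beginning at `start`.

    Hypothetical helper: `start` is assumed to point just after the opening '<!--'.
    """
    end = xml.find("-->", start)
    if end < 0:
        # No terminator anywhere in the remaining input.
        raise ValueError("unclosed token")
    if end > start and xml[end - 1] == "-":
        # A terminator preceded by '-' (e.g. '--->') is treated as malformed.
        raise ValueError(f"not well formed (violation at pos={end - 1})")
    return end + 3


# The body starts at index 4, right after '<!--'.
print(find_comment_end("<!-- hi -->", 4))
```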

xmlpydict-0.0.8/src/xmlparse.py DELETED Viewed

@@ -1,68 +0,0 @@
-def parse(xml_content: str) -> dict:
-    i = 0
-    key = "@"
-    val = ""
-    xml_content += " "
-    curr_dict = {}
-    container_stack = [curr_dict]
-    while i < len(xml_content):
-        element_name = ""
-        if xml_content[i] == "<":
-            if xml_content[i + 1] == "/":
-                container_stack.pop()
-                curr_dict = container_stack[-1]
-                i = xml_content.find(">", i + 1)
-            elif xml_content[i + 1] == "!":
-                i = xml_content.find(">", i + 1)
-            else:
-                i += 1
-                has_attr = False
-                in_quotes = False
-                is_container = True
-                d = {}
-                while i < len(xml_content) and xml_content[i] != ">":
-                    is_space = xml_content[i].isspace()
-                    if xml_content[i] == "/" and xml_content[i + 1] == ">":
-                        is_container = False
-                    elif not has_attr and is_space:
-                        has_attr = True
-                    else:
-                        if has_attr:
-                            if xml_content[i] == "'" or xml_content[i] == '"':
-                                in_quotes = not in_quotes
-                                if not in_quotes and key != "" and val != "":
-                                    d[key] = val
-                                    key = "@"
-                                    val = ""
-                            elif in_quotes:
-                                val += xml_content[i]
-                            elif xml_content[i] != "=" and not is_space:
-                                key += xml_content[i]
-                        else:
-                            element_name += xml_content[i]
-                    i += 1
-                item = curr_dict.get(element_name)
-                if item is None:
-                    curr_dict[element_name] = d
-                else:
-                    if isinstance(item, list):
-                        item.append(d)
-                    else:
-                        curr_dict[element_name] = [item, d]
-                if is_container:
-                    curr_dict = d
-                    container_stack.append(d)
-            i += 1
-        else:
-            j = xml_content.find("<", i + 1)
-            if j < 0:
-                return container_stack.pop()
-            element_name = xml_content[i:j].strip()
-            i = j
-            if len(element_name) > 0:
-                curr_dict["#text"] = element_name
-    return container_stack.pop()
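
Note on the removed `parse` above: repeated sibling elements are collapsed into a list through the `item is None` / `isinstance(item, list)` branch. The following is a minimal sketch of that merging rule in isolation. The helper name `add_child` is hypothetical and is not part of xmlpydict.

```python
def add_child(parent: dict, name: str, child: dict) -> None:
    """Attach `child` under `name`, promoting repeated names to a list."""
    item = parent.get(name)
    if item is None:
        # First occurrence: store the child dict directly.
        parent[name] = child
    elif isinstance(item, list):
        # Third or later occurrence: extend the existing list.
        item.append(child)
    else:
        # Second occurrence: wrap the earlier dict and the new one in a list.
        parent[name] = [item, child]


root = {}
add_child(root, "circle", {"@r": "50"})
add_child(root, "circle", {"@r": "25"})
add_child(root, "rect", {"@x": "10"})
print(root)
```

With this rule a name seen once maps to a dict and a name seen more than once maps to a list of dicts, so callers need to handle both shapes.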

xmlpydict-0.0.8/src/xmlpydict.egg-info/SOURCES.txt DELETED Viewed

@@ -1,14 +0,0 @@
-LICENSE
-MANIFEST.in
-README.md
-pyproject.toml
-setup.py
-src/xmlparse.cpp
-src/xmlparse.py
-src/xmlpydict.egg-info/PKG-INFO
-src/xmlpydict.egg-info/SOURCES.txt
-src/xmlpydict.egg-info/dependency_links.txt
-src/xmlpydict.egg-info/requires.txt
-src/xmlpydict.egg-info/top_level.txt
-tests/test.py
-tests/test_parse.py

xmlpydict-0.0.8/src/xmlpydict.egg-info/top_level.txt DELETED Viewed

@@ -1,2 +0,0 @@
-xmlparse
-xmlpydict

xmlpydict-0.0.8/tests/test.py DELETED Viewed

@@ -1,24 +0,0 @@
-import xmlpydict
-import xmltodict
-import timeit
-s = """<svg xmlns="http://www.w3.org/2000/svg" width="400" height="400">
-  <rect x="50" y="50" width="100" height="50" fill="blue" />
-  <circle cx="200" cy="100" r="50" fill="red" />
-  <ellipse cx="350" cy="75" rx="50" ry="25" fill="green" />
-  <line x1="50" y1="200" x2="150" y2="300" stroke="orange" />
-  <polyline points="200,200 250,250 300,200 350,250" fill="none" stroke="purple" />
-  <polygon points="350,200 400,250 400,150" fill="yellow" />
-  <path d="M50,350 L100,350 Q125,375 150,350 T200,350" fill="none" stroke="black"/>
-  <rect x="10" y="10" height="100" width="100"
-        style="stroke:#ff0000; fill: #0000ff"/>
-        <path d="M50,350 L100,350 Q125,375 150,350 T200,350" fill="none" stroke="black"/><polygon points="350,200 400,250 400,150" fill="yellow" />
-  <circle cx="200" cy="100" r="50" fill="red"></circle>
-  <polygon points="350,200 400,250 400,150" fill="yellow" />
-</svg>"""
-print(timeit.timeit(lambda: xmlpydict.parse(s), number=100))
-print(timeit.timeit(lambda: xmltodict.parse(s), number=100))

{xmlpydict-0.0.8 → xmlpydict-0.0.12}/LICENSE RENAMED Viewed

File without changes

{xmlpydict-0.0.8 → xmlpydict-0.0.12}/MANIFEST.in RENAMED Viewed

File without changes

{xmlpydict-0.0.8 → xmlpydict-0.0.12}/setup.cfg RENAMED Viewed

File without changes

{xmlpydict-0.0.8/src → xmlpydict-0.0.12}/xmlpydict.egg-info/dependency_links.txt RENAMED Viewed

File without changes
