PyPI - xmlpydict - Versions diffs - 0.0.7__tar.gz → 0.0.9__tar.gz - Mend


This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of xmlpydict might be problematic.

Files changed (17)

{xmlpydict-0.0.7/src/xmlpydict.egg-info → xmlpydict-0.0.9}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: xmlpydict
-Version: 0.0.7
+Version: 0.0.9
 Summary: xml to dictionary tool for python
 Author-email: Matthew Taylor <matthew.taylor.andre@gmail.com>
 Project-URL: Homepage, https://github.com/MatthewAndreTaylor/xml-to-pydict
@@ -52,9 +52,26 @@ pip install xmlpydict
 {'person': {'@name': 'Matthew', '#text': 'Hello!'}}
 ```
-## Tags
+## Goals
-# dict.get(key[, default]) will not cause exceptions
+Create a consistent parsing strategy between XML and Python dictionaries. xmlpydict takes a more laid-back approach to enforce the syntax of XML. However, still ensures fast speeds by using finite automata.
+## Features
+xmlpydict allows for multiple root elements.
+The root object is treated as the Python object.
+### xmlpydict supports the following
+[CDataSection](https://www.w3.org/TR/xml/#sec-cdata-sect):  CDATA Sections are stored as {'#text': CData}.
+[Comments](https://www.w3.org/TR/xml/#sec-comments):  Comments are tokenized for corectness, but have no effect in what is returned.
+[Element Tags](https://www.w3.org/TR/xml/#sec-starttags):  Allows for duplicate attributes, however only the latest defined will be taken.
+[Characters](https://www.w3.org/TR/xml/#charsets):  Similar to CDATA text is stored as {'#text': Char} , however this text is stripped.
+### dict.get(key[, default]) will not cause exceptions
 ```py
 # Empty tags are containers
@@ -68,3 +85,31 @@ None
 >>> parse("")
 {}
 ```
+### Attribute prefixing
+```py
+# Change prefix from default "@" with keyword argument attr_prefix
+>>> from xmlpydict import parse
+>>> parse('<p width="10" height="5"></p>', attr_prefix="$")
+{"p": {"$width": "10", "$height": "5"}}
+```
+### Exceptions
+```py
+# Grammar and structure of the xml_content is checked while parsing
+>>> from xmlpydict import parse
+>>> parse("<a></ a>")
+Exception: not well formed (violation at pos=5)
+```
+### Unsupported
+Prolog / Enforcing Document Type Definition and Element Type Declarations
+Entity Referencing
+Namespaces
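Because empty tags come back as `{}` rather than `None`, chained `dict.get` lookups on a parse result stay safe. The helper below is a hypothetical sketch, not part of xmlpydict: it walks plain nested dicts shaped like the Quickstart output and needs no parser, only the dict layout the README shows.

```python
from typing import Any

# Sentinel distinguishing "key absent" from a stored value such as {}.
_MISSING = object()


def get_path(tree: Any, *keys: str, default: Any = None) -> Any:
    """Follow `keys` through nested dicts (hypothetical helper).

    Returns `default` as soon as a key is missing or a non-dict value,
    such as an attribute string, is reached; never raises KeyError.
    """
    node = tree
    for key in keys:
        node = node.get(key, _MISSING) if isinstance(node, dict) else _MISSING
        if node is _MISSING:
            return default
    return node
```

For example, `get_path({'package': {'xmlpydict': {'@language': 'python'}}}, 'package', 'xmlpydict', '@language')` returns `'python'`, while `get_path({'a': {}}, 'a', '@href')` returns `None`.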

xmlpydict-0.0.9/README.md ADDED

@@ -0,0 +1,89 @@
+# xmlpydict 📑
+[![XML Tests](https://github.com/MatthewAndreTaylor/xml-to-pydict/actions/workflows/tests.yml/badge.svg)](https://github.com/MatthewAndreTaylor/xml-to-pydict/actions/workflows/tests.yml)
+[![PyPI versions](https://img.shields.io/badge/python-3.7%2B-blue)](https://github.com/MatthewAndreTaylor/xml-to-pydict)
+[![PyPI](https://img.shields.io/pypi/v/xmlpydict.svg)](https://pypi.org/project/xmlpydict/)
+## Requirements
+- `python 3.7+`
+## Installation
+To install xmlpydict, using pip:
+```bash
+pip install xmlpydict
+```
+## Quickstart
+```py
+>>> from xmlpydict import parse
+>>> parse("<package><xmlpydict language='python'/></package>")
+{'package': {'xmlpydict': {'@language': 'python'}}}
+>>> parse("<person name='Matthew'>Hello!</person>")
+{'person': {'@name': 'Matthew', '#text': 'Hello!'}}
+```
+## Goals
+Create a consistent parsing strategy between XML and Python dictionaries. xmlpydict takes a more laid-back approach to enforce the syntax of XML. However, still ensures fast speeds by using finite automata.
+## Features
+xmlpydict allows for multiple root elements.
+The root object is treated as the Python object.
+### xmlpydict supports the following
+[CDataSection](https://www.w3.org/TR/xml/#sec-cdata-sect):  CDATA Sections are stored as {'#text': CData}.
+[Comments](https://www.w3.org/TR/xml/#sec-comments):  Comments are tokenized for corectness, but have no effect in what is returned.
+[Element Tags](https://www.w3.org/TR/xml/#sec-starttags):  Allows for duplicate attributes, however only the latest defined will be taken.
+[Characters](https://www.w3.org/TR/xml/#charsets):  Similar to CDATA text is stored as {'#text': Char} , however this text is stripped.
+### dict.get(key[, default]) will not cause exceptions
+```py
+# Empty tags are containers
+>>> from xmlpydict import parse
+>>> parse("<a></a>")
+{'a': {}}
+>>> parse("<a/>")
+{'a': {}}
+>>> parse("<a/>").get('href')
+None
+>>> parse("")
+{}
+```
+### Attribute prefixing
+```py
+# Change prefix from default "@" with keyword argument attr_prefix
+>>> from xmlpydict import parse
+>>> parse('<p width="10" height="5"></p>', attr_prefix="$")
+{"p": {"$width": "10", "$height": "5"}}
+```
+### Exceptions
+```py
+# Grammar and structure of the xml_content is checked while parsing
+>>> from xmlpydict import parse
+>>> parse("<a></ a>")
+Exception: not well formed (violation at pos=5)
+```
+### Unsupported
+Prolog / Enforcing Document Type Definition and Element Type Declarations
+Entity Referencing
+Namespaces
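In the Exceptions example, `pos=5` is the index of the space after `</` in `<a></ a>`. A minimal sketch of how a close-tag check can report that index, assuming a hypothetical `check_close_tag` helper that validates only the first character of the element name (xmlpydict's own automaton is not shown in this diff):

```python
def check_close_tag(xml: str, pos: int) -> int:
    """Validate the close tag starting at xml[pos] ('<' followed by '/').

    Hypothetical helper. Returns the index just past the closing '>'.
    Raises a plain Exception, mirroring the PyExc_Exception used by the
    C extension, with the README's message format.
    """
    i = pos + 2  # first character after "</"
    # XML does not allow whitespace between "</" and the element name
    if i >= len(xml) or not (xml[i].isalpha() or xml[i] in "_:"):
        raise Exception(f"not well formed (violation at pos={i})")
    # Skip the name up to the closing '>'
    while i < len(xml) and xml[i] != ">":
        i += 1
    if i == len(xml):
        raise Exception("unclosed token")
    return i + 1
```

Called as `check_close_tag("<a></ a>", 3)`, it raises with `pos=5`; on `"<a></a>"` it returns `7`, the end of the input.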

{xmlpydict-0.0.7 → xmlpydict-0.0.9}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "xmlpydict"
-version = "0.0.7"
+version = "0.0.9"
 description="xml to dictionary tool for python"
 authors = [
     {name = "Matthew Taylor", email = "matthew.taylor.andre@gmail.com"},

{xmlpydict-0.0.7 → xmlpydict-0.0.9}/src/xmlparse.cpp RENAMED

@@ -225,6 +225,34 @@ static void parseComment(XMLNode *node, const char *xmlContent) {
   PyErr_SetString(PyExc_Exception, "unclosed token");
 }
+static void parseCData(XMLNode *node, const char *xmlContent) {
+  node->type = TEXT;
+  i+=2;
+  std::string cdata = "CDATA[";
+  size_t j = 0;
+  while (xmlContent[i] != '\0') {
+    if (j >= cdata.size()) {
+      break;
+    }
+    if (cdata[j] != xmlContent[i]) {
+      PyErr_Format(PyExc_Exception, "not well formed (violation at pos=%d)", i);
+      return;
+    }
+    i++;
+    j++;
+  }
+  while (xmlContent[i] != '\0' || xmlContent[i + 1] != '\0') {
+    if (xmlContent[i] == ']' && xmlContent[i + 1] == ']' &&
+        xmlContent[i + 2] == '>') {
+      i += 3;
+      return;
+    }
+    node->elementName.push_back(xmlContent[i]);
+    i++;
+  }
+  PyErr_SetString(PyExc_Exception, "unclosed token");
+}
 static void parseText(XMLNode *node, const char *xmlContent) {
   node->type = TEXT;
   bool isSpace = false;
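`parseCData` matches the literal `CDATA[` after `<!`, then copies characters until `]]>` and stores them as a TEXT node, which is what yields `{'#text': ...}` in `test_cdata`. Below is a Python sketch of the same scan, assuming the index points at the opening `<` (hypothetical `read_cdata`, not the extension's code). It uses a bounded `str.find`; by contrast, the hunk's second loop condition uses `||`, so on an unclosed section it evaluates `xmlContent[i + 1]` after `xmlContent[i]` is already the terminating `'\0'`.

```python
from typing import Tuple

CDATA_OPEN = "<![CDATA["
CDATA_CLOSE = "]]>"


def read_cdata(xml: str, i: int) -> Tuple[str, int]:
    """Read the CDATA section starting at xml[i] ('<'); hypothetical helper.

    Returns (raw_text, index just past ']]>'). The text is kept verbatim,
    so markup such as '<p>' inside the section is not interpreted.
    """
    if not xml.startswith(CDATA_OPEN, i):
        # Report the first index where the input stops matching "<![CDATA["
        j = 0
        while j < len(CDATA_OPEN) and i + j < len(xml) and xml[i + j] == CDATA_OPEN[j]:
            j += 1
        raise Exception(f"not well formed (violation at pos={i + j})")
    start = i + len(CDATA_OPEN)
    end = xml.find(CDATA_CLOSE, start)  # bounded by len(xml), never reads past it
    if end == -1:
        raise Exception("unclosed token")
    return xml[start:end], end + len(CDATA_CLOSE)
```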
@@ -245,9 +273,28 @@ static void parseText(XMLNode *node, const char *xmlContent) {
   }
 }
+static void parseProlog(const char *xmlContent) {
+  const char* startTag = "<?xml";
+  int j = 0;
+  for (j = 0; j < 5 && xmlContent[j] != '\0'; j++) {
+      if (xmlContent[j] != startTag[j]) {
+          return;
+      }
+  }
+  i = j;
+  while (xmlContent[i] != '\0') {
+    if (xmlContent[i] == '>'){
+      i++;
+      break;
+    }
+    i++;
+  }
+}
 static std::vector<XMLNode> splitNodes(const char *xmlContent) {
   std::vector<XMLNode> nodes;
   i = 0;
+  parseProlog(xmlContent);
   while (xmlContent[i] != '\0') {
     XMLNode node;
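`parseProlog` checks for `<?xml` at offset 0 and, if it is present, advances `i` past the first `>`. The declaration's contents are not validated, which is consistent with the README listing the prolog under Unsupported. A rough Python equivalent (hypothetical `skip_prolog`) that returns the index where element parsing should begin:

```python
def skip_prolog(xml: str) -> int:
    """Return the start index for parsing (hypothetical helper).

    If the input begins with '<?xml', skip everything up to and including
    the first '>'; otherwise start at 0. Like parseProlog, only offset 0 is
    examined and the declaration itself is not checked.
    """
    if not xml.startswith("<?xml"):
        return 0
    end = xml.find(">", len("<?xml"))
    # An unterminated declaration consumes the rest of the input
    return len(xml) if end == -1 else end + 1
```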
@@ -256,7 +303,11 @@ static std::vector<XMLNode> splitNodes(const char *xmlContent) {
       if (xmlContent[i] == '/') {
         parseContainerClose(&node, xmlContent);
       } else if (xmlContent[i] == '!') {
-        parseComment(&node, xmlContent);
+        if (xmlContent[i+1] == '[') {
+          parseCData(&node, xmlContent);
+        } else {
+          parseComment(&node, xmlContent);
+        }
       } else {
         parseContainerOpen(&node, xmlContent);
       }
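The dispatch change looks one character past `<!`: a `[` routes to `parseCData`, anything else to `parseComment`. A sketch of that look-ahead over a Python string, using a hypothetical `classify_markup` that is indexed from the `<` itself rather than from the character after it as in the C++:

```python
def classify_markup(xml: str, i: int) -> str:
    """Classify the token starting at xml[i] == '<' (hypothetical helper).

    '</' closes a container, '<![' starts a CDATA section, any other '<!'
    goes to the comment parser, and everything else opens an element.
    """
    nxt = xml[i + 1] if i + 1 < len(xml) else ""
    if nxt == "/":
        return "container_close"
    if nxt == "!":
        after = xml[i + 2] if i + 2 < len(xml) else ""
        return "cdata" if after == "[" else "comment"
    return "container_open"
```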
@@ -271,10 +322,10 @@ static std::vector<XMLNode> splitNodes(const char *xmlContent) {
   return nodes;
 }
-static PyObject *createDict(const std::vector<Pair> &attributes) {
+static PyObject *createDict(const std::vector<Pair> &attributes, char* attributePrefix) {
   PyObject *dict = PyDict_New();
   for (const Pair &attr : attributes) {
-    const std::string &key = "@" + attr.key;
+    const std::string &key = attributePrefix + attr.key;
     PyObject *val = PyUnicode_FromString(attr.value.c_str());
     PyDict_SetItemString(dict, key.c_str(), val);
   }
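`createDict` now builds each key as `attributePrefix + attr.key`. Because a later `PyDict_SetItemString` with the same key replaces the earlier value, a duplicated attribute keeps its last definition, matching the README's note on element tags. A sketch with a hypothetical `Pair` stand-in for the C++ struct:

```python
from typing import Dict, List, NamedTuple


class Pair(NamedTuple):
    """Hypothetical Python stand-in for the C++ Pair (one attribute)."""
    key: str
    value: str


def create_dict(attributes: List[Pair], attribute_prefix: str = "@") -> Dict[str, str]:
    d: Dict[str, str] = {}
    for attr in attributes:
        # Same key construction as the hunk: prefix + attribute name.
        # Re-assigning an existing key overwrites it, so the last
        # duplicate attribute wins.
        d[attribute_prefix + attr.key] = attr.value
    return d
```

With `attribute_prefix="$"`, the pairs `width="10"` and `height="5"` produce `{"$width": "10", "$height": "5"}`, the same mapping shown in `test_prefix`.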
@@ -282,17 +333,20 @@ static PyObject *createDict(const std::vector<Pair> &attributes) {
   return dict;
 }
-PyDoc_STRVAR(xml_parse_doc, "parse(xml_content: str) -> dict:\n"
+PyDoc_STRVAR(xml_parse_doc, "parse(xml_content: str, attr_prefix=\"@\") -> dict:\n"
                             "...\n\n"
                             "Parse XML content into a dictionary.\n\n"
                             "Args:\n\t"
                             "xml_content (str): xml document to be parsed.\n"
                             "Returns:\n\t"
                             "dict: Dictionary of the xml dom.\n");
-static PyObject *xml_parse(PyObject *self, PyObject *args) {
+static PyObject *xml_parse(PyObject *self, PyObject *args, PyObject *kwargs) {
   const char *xmlContent;
+  char* attributePrefix = "@";
+  static char *kwlist[] = {"xml_content", "attr_prefix", NULL};
-  if (!PyArg_ParseTuple(args, "s", &xmlContent)) {
+  if (!PyArg_ParseTupleAndKeywords(args, kwargs, "s|s", kwlist, &xmlContent, &attributePrefix)) {
     return NULL;
   }
@@ -319,7 +373,7 @@ static PyObject *xml_parse(PyObject *self, PyObject *args) {
         PyDict_SetItemString(currDict, "#text", childKey);
       }
     } else if (node.type == CONTAINER_OPEN || node.type == PRIMITIVE) {
-      PyObject *d = createDict(node.attr);
+      PyObject *d = createDict(node.attr, attributePrefix);
       PyObject *item = PyDict_GetItem(currDict, childKey);
       if (item != NULL) {
@@ -369,7 +423,7 @@ static PyObject *xml_parse(PyObject *self, PyObject *args) {
 }
 static PyMethodDef XMLParserMethods[] = {
-    {"parse", (PyCFunction)xml_parse, METH_VARARGS, xml_parse_doc},
+    {"parse", (PyCFunction)xml_parse, METH_VARARGS | METH_KEYWORDS, xml_parse_doc},
     {NULL, NULL, 0, NULL}};
 static struct PyModuleDef xmlparsermodule = {PyModuleDef_HEAD_INIT, "xmlpydict",
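With `METH_VARARGS | METH_KEYWORDS`, CPython passes a keywords dict to `xml_parse`. The `"s|s"` format together with `kwlist` declares one required `str` and, after `|`, one optional `str`, each of which may be passed positionally or by name. The resulting calling convention corresponds roughly to the Python signature below; `parse_stub` is a hypothetical stand-in that does no parsing and only echoes the arguments it was bound to.

```python
def parse_stub(xml_content: str, attr_prefix: str = "@") -> dict:
    """Hypothetical stand-in mirroring the "s|s" format and kwlist."""
    for name, value in (("xml_content", xml_content), ("attr_prefix", attr_prefix)):
        # An "s" format unit accepts only str objects
        if not isinstance(value, str):
            raise TypeError(f"{name} must be str, not {type(value).__name__}")
    return {"xml_content": xml_content, "attr_prefix": attr_prefix}
```

The calls `parse_stub("<p/>", "$")` and `parse_stub(xml_content="<p/>", attr_prefix="$")` bind identically, and omitting `attr_prefix` falls back to `"@"`, the initial value of `attributePrefix`.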

{xmlpydict-0.0.7 → xmlpydict-0.0.9}/src/xmlparse.py RENAMED

@@ -1,6 +1,6 @@
-def parse(xml_content: str) -> dict:
+def parse(xml_content: str, attr_prefix="@") -> dict:
     i = 0
-    key = "@"
+    key = attr_prefix
     val = ""
     xml_content += " "
@@ -35,7 +35,7 @@ def parse(xml_content: str) -> dict:
                                 in_quotes = not in_quotes
                                 if not in_quotes and key != "" and val != "":
                                     d[key] = val
-                                    key = "@"
+                                    key = attr_prefix
                                     val = ""
                             elif in_quotes:
                                 val += xml_content[i]
@@ -63,6 +63,6 @@ def parse(xml_content: str) -> dict:
             element_name = xml_content[i:j].strip()
             i = j
             if len(element_name) > 0:
-                curr_dict["#text"] = element_name
+                curr_dict["#text"] = curr_dict.setdefault("#text", "") + element_name
     return container_stack.pop()
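Previously, the pure-Python `parse` in `src/xmlparse.py` overwrote `#text` with each new text segment, so only the last segment survived. In 0.0.9 it appends instead. The helper below isolates that one line (hypothetical name `append_text`). Note that no separator is inserted between segments.

```python
def append_text(curr_dict: dict, text: str) -> None:
    """Append a non-empty text segment to curr_dict['#text'] (hypothetical helper).

    Mirrors the 0.0.9 line: setdefault creates '#text' as "" on first use,
    and later segments are concatenated onto it.
    """
    if len(text) > 0:
        curr_dict["#text"] = curr_dict.setdefault("#text", "") + text
```

For example, two segments `"Hello"` and `"World"` end up as `{'#text': 'HelloWorld'}`, whereas the 0.0.7 assignment would have kept only `'World'`.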

{xmlpydict-0.0.7 → xmlpydict-0.0.9/src/xmlpydict.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: xmlpydict
-Version: 0.0.7
+Version: 0.0.9
 Summary: xml to dictionary tool for python
 Author-email: Matthew Taylor <matthew.taylor.andre@gmail.com>
 Project-URL: Homepage, https://github.com/MatthewAndreTaylor/xml-to-pydict
@@ -52,9 +52,26 @@ pip install xmlpydict
 {'person': {'@name': 'Matthew', '#text': 'Hello!'}}
 ```
-## Tags
+## Goals
-# dict.get(key[, default]) will not cause exceptions
+Create a consistent parsing strategy between XML and Python dictionaries. xmlpydict takes a more laid-back approach to enforce the syntax of XML. However, still ensures fast speeds by using finite automata.
+## Features
+xmlpydict allows for multiple root elements.
+The root object is treated as the Python object.
+### xmlpydict supports the following
+[CDataSection](https://www.w3.org/TR/xml/#sec-cdata-sect):  CDATA Sections are stored as {'#text': CData}.
+[Comments](https://www.w3.org/TR/xml/#sec-comments):  Comments are tokenized for corectness, but have no effect in what is returned.
+[Element Tags](https://www.w3.org/TR/xml/#sec-starttags):  Allows for duplicate attributes, however only the latest defined will be taken.
+[Characters](https://www.w3.org/TR/xml/#charsets):  Similar to CDATA text is stored as {'#text': Char} , however this text is stripped.
+### dict.get(key[, default]) will not cause exceptions
 ```py
 # Empty tags are containers
@@ -68,3 +85,31 @@ None
 >>> parse("")
 {}
 ```
+### Attribute prefixing
+```py
+# Change prefix from default "@" with keyword argument attr_prefix
+>>> from xmlpydict import parse
+>>> parse('<p width="10" height="5"></p>', attr_prefix="$")
+{"p": {"$width": "10", "$height": "5"}}
+```
+### Exceptions
+```py
+# Grammar and structure of the xml_content is checked while parsing
+>>> from xmlpydict import parse
+>>> parse("<a></ a>")
+Exception: not well formed (violation at pos=5)
+```
+### Unsupported
+Prolog / Enforcing Document Type Definition and Element Type Declarations
+Entity Referencing
+Namespaces

{xmlpydict-0.0.7 → xmlpydict-0.0.9}/tests/test_parse.py RENAMED

@@ -66,6 +66,12 @@ def test_simple():
     )
+def test_cdata():
+    assert parse("<content><![CDATA[<p>This is a paragraph</p>]]></content>") == {
+        "content": {"#text": "<p>This is a paragraph</p>"}
+    }
 def test_nested():
     assert parse("<book><p/></book> ") == {"book": {"p": {}}}
     assert parse("<book><p></p></book>") == {"book": {"p": {}}}
@@ -282,3 +288,11 @@ def test_exception():
     for xml_str in xml_strings:
         with pytest.raises(Exception):
             parse(xml_str)
+def test_prefix():
+    assert parse("<p></p>", attr_prefix="$") == {"p": {}}
+    assert parse('<p width="10"></p>', attr_prefix="$") == {"p": {"$width": "10"}}
+    assert parse('<p width="10" height="5"></p>', attr_prefix="$") == {
+        "p": {"$width": "10", "$height": "5"}
+    }

xmlpydict-0.0.7/README.md DELETED

@@ -1,44 +0,0 @@
-# xmlpydict 📑
-[![XML Tests](https://github.com/MatthewAndreTaylor/xml-to-pydict/actions/workflows/tests.yml/badge.svg)](https://github.com/MatthewAndreTaylor/xml-to-pydict/actions/workflows/tests.yml)
-[![PyPI versions](https://img.shields.io/badge/python-3.7%2B-blue)](https://github.com/MatthewAndreTaylor/xml-to-pydict)
-[![PyPI](https://img.shields.io/pypi/v/xmlpydict.svg)](https://pypi.org/project/xmlpydict/)
-## Requirements
-- `python 3.7+`
-## Installation
-To install xmlpydict, using pip:
-```bash
-pip install xmlpydict
-```
-## Quickstart
-```py
->>> from xmlpydict import parse
->>> parse("<package><xmlpydict language='python'/></package>")
-{'package': {'xmlpydict': {'@language': 'python'}}}
->>> parse("<person name='Matthew'>Hello!</person>")
-{'person': {'@name': 'Matthew', '#text': 'Hello!'}}
-```
-## Tags
-# dict.get(key[, default]) will not cause exceptions
-```py
-# Empty tags are containers
->>> from xmlpydict import parse
->>> parse("<a></a>")
-{'a': {}}
->>> parse("<a/>")
-{'a': {}}
->>> parse("<a/>").get('href')
-None
->>> parse("")
-{}
-```