xmlpydict 0.0.7__tar.gz → 0.0.9__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.


@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: xmlpydict
- Version: 0.0.7
+ Version: 0.0.9
  Summary: xml to dictionary tool for python
  Author-email: Matthew Taylor <matthew.taylor.andre@gmail.com>
  Project-URL: Homepage, https://github.com/MatthewAndreTaylor/xml-to-pydict
@@ -52,9 +52,26 @@ pip install xmlpydict
  {'person': {'@name': 'Matthew', '#text': 'Hello!'}}
  ```
  
- ## Tags
+ ## Goals
  
- # dict.get(key[, default]) will not cause exceptions
+ Create a consistent parsing strategy between XML and Python dictionaries. xmlpydict takes a more laid-back approach to enforce the syntax of XML. However, still ensures fast speeds by using finite automata.
+
+ ## Features
+
+ xmlpydict allows for multiple root elements.
+ The root object is treated as the Python object.
+
+ ### xmlpydict supports the following
+
+ [CDataSection](https://www.w3.org/TR/xml/#sec-cdata-sect): CDATA Sections are stored as {'#text': CData}.
+
+ [Comments](https://www.w3.org/TR/xml/#sec-comments): Comments are tokenized for corectness, but have no effect in what is returned.
+
+ [Element Tags](https://www.w3.org/TR/xml/#sec-starttags): Allows for duplicate attributes, however only the latest defined will be taken.
+
+ [Characters](https://www.w3.org/TR/xml/#charsets): Similar to CDATA text is stored as {'#text': Char} , however this text is stripped.
+
+ ### dict.get(key[, default]) will not cause exceptions
  
  ```py
  # Empty tags are containers
@@ -68,3 +85,31 @@ None
  >>> parse("")
  {}
  ```
+
+ ### Attribute prefixing
+
+ ```py
+ # Change prefix from default "@" with keyword argument attr_prefix
+ >>> from xmlpydict import parse
+ >>> parse('<p width="10" height="5"></p>', attr_prefix="$")
+ {"p": {"$width": "10", "$height": "5"}}
+ ```
+
+
+ ### Exceptions
+
+ ```py
+ # Grammar and structure of the xml_content is checked while parsing
+ >>> from xmlpydict import parse
+ >>> parse("<a></ a>")
+ Exception: not well formed (violation at pos=5)
+ ```
+
+
+ ### Unsupported
+
+ Prolog / Enforcing Document Type Definition and Element Type Declarations
+
+ Entity Referencing
+
+ Namespaces
@@ -0,0 +1,89 @@
+ # xmlpydict 📑
+
+ [![XML Tests](https://github.com/MatthewAndreTaylor/xml-to-pydict/actions/workflows/tests.yml/badge.svg)](https://github.com/MatthewAndreTaylor/xml-to-pydict/actions/workflows/tests.yml)
+ [![PyPI versions](https://img.shields.io/badge/python-3.7%2B-blue)](https://github.com/MatthewAndreTaylor/xml-to-pydict)
+ [![PyPI](https://img.shields.io/pypi/v/xmlpydict.svg)](https://pypi.org/project/xmlpydict/)
+
+ ## Requirements
+
+ - `python 3.7+`
+
+ ## Installation
+
+ To install xmlpydict, using pip:
+
+ ```bash
+ pip install xmlpydict
+ ```
+
+ ## Quickstart
+
+ ```py
+ >>> from xmlpydict import parse
+ >>> parse("<package><xmlpydict language='python'/></package>")
+ {'package': {'xmlpydict': {'@language': 'python'}}}
+ >>> parse("<person name='Matthew'>Hello!</person>")
+ {'person': {'@name': 'Matthew', '#text': 'Hello!'}}
+ ```
+
+ ## Goals
+
+ Create a consistent parsing strategy between XML and Python dictionaries. xmlpydict takes a more laid-back approach to enforce the syntax of XML. However, still ensures fast speeds by using finite automata.
+
+ ## Features
+
+ xmlpydict allows for multiple root elements.
+ The root object is treated as the Python object.
+
+ ### xmlpydict supports the following
+
+ [CDataSection](https://www.w3.org/TR/xml/#sec-cdata-sect): CDATA Sections are stored as {'#text': CData}.
+
+ [Comments](https://www.w3.org/TR/xml/#sec-comments): Comments are tokenized for corectness, but have no effect in what is returned.
+
+ [Element Tags](https://www.w3.org/TR/xml/#sec-starttags): Allows for duplicate attributes, however only the latest defined will be taken.
+
+ [Characters](https://www.w3.org/TR/xml/#charsets): Similar to CDATA text is stored as {'#text': Char} , however this text is stripped.
+
+ ### dict.get(key[, default]) will not cause exceptions
+
+ ```py
+ # Empty tags are containers
+ >>> from xmlpydict import parse
+ >>> parse("<a></a>")
+ {'a': {}}
+ >>> parse("<a/>")
+ {'a': {}}
+ >>> parse("<a/>").get('href')
+ None
+ >>> parse("")
+ {}
+ ```
+
+ ### Attribute prefixing
+
+ ```py
+ # Change prefix from default "@" with keyword argument attr_prefix
+ >>> from xmlpydict import parse
+ >>> parse('<p width="10" height="5"></p>', attr_prefix="$")
+ {"p": {"$width": "10", "$height": "5"}}
+ ```
+
+
+ ### Exceptions
+
+ ```py
+ # Grammar and structure of the xml_content is checked while parsing
+ >>> from xmlpydict import parse
+ >>> parse("<a></ a>")
+ Exception: not well formed (violation at pos=5)
+ ```
+
+
+ ### Unsupported
+
+ Prolog / Enforcing Document Type Definition and Element Type Declarations
+
+ Entity Referencing
+
+ Namespaces
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
  
  [project]
  name = "xmlpydict"
- version = "0.0.7"
+ version = "0.0.9"
  description="xml to dictionary tool for python"
  authors = [
  {name = "Matthew Taylor", email = "matthew.taylor.andre@gmail.com"},
@@ -225,6 +225,34 @@ static void parseComment(XMLNode *node, const char *xmlContent) {
  PyErr_SetString(PyExc_Exception, "unclosed token");
  }
  
+ static void parseCData(XMLNode *node, const char *xmlContent) {
+ node->type = TEXT;
+ i+=2;
+ std::string cdata = "CDATA[";
+ size_t j = 0;
+ while (xmlContent[i] != '\0') {
+ if (j >= cdata.size()) {
+ break;
+ }
+ if (cdata[j] != xmlContent[i]) {
+ PyErr_Format(PyExc_Exception, "not well formed (violation at pos=%d)", i);
+ return;
+ }
+ i++;
+ j++;
+ }
+ while (xmlContent[i] != '\0' || xmlContent[i + 1] != '\0') {
+ if (xmlContent[i] == ']' && xmlContent[i + 1] == ']' &&
+ xmlContent[i + 2] == '>') {
+ i += 3;
+ return;
+ }
+ node->elementName.push_back(xmlContent[i]);
+ i++;
+ }
+ PyErr_SetString(PyExc_Exception, "unclosed token");
+ }
+
  static void parseText(XMLNode *node, const char *xmlContent) {
  node->type = TEXT;
  bool isSpace = false;
@@ -245,9 +273,28 @@ static void parseText(XMLNode *node, const char *xmlContent) {
  }
  }
  
+ static void parseProlog(const char *xmlContent) {
+ const char* startTag = "<?xml";
+ int j = 0;
+ for (j = 0; j < 5 && xmlContent[j] != '\0'; j++) {
+ if (xmlContent[j] != startTag[j]) {
+ return;
+ }
+ }
+ i = j;
+ while (xmlContent[i] != '\0') {
+ if (xmlContent[i] == '>'){
+ i++;
+ break;
+ }
+ i++;
+ }
+ }
+
  static std::vector<XMLNode> splitNodes(const char *xmlContent) {
  std::vector<XMLNode> nodes;
  i = 0;
+ parseProlog(xmlContent);
  
  while (xmlContent[i] != '\0') {
  XMLNode node;
@@ -256,7 +303,11 @@ static std::vector<XMLNode> splitNodes(const char *xmlContent) {
  if (xmlContent[i] == '/') {
  parseContainerClose(&node, xmlContent);
  } else if (xmlContent[i] == '!') {
- parseComment(&node, xmlContent);
+ if (xmlContent[i+1] == '[') {
+ parseCData(&node, xmlContent);
+ } else {
+ parseComment(&node, xmlContent);
+ }
  } else {
  parseContainerOpen(&node, xmlContent);
  }
@@ -271,10 +322,10 @@ static std::vector<XMLNode> splitNodes(const char *xmlContent) {
  return nodes;
  }
  
- static PyObject *createDict(const std::vector<Pair> &attributes) {
+ static PyObject *createDict(const std::vector<Pair> &attributes, char* attributePrefix) {
  PyObject *dict = PyDict_New();
  for (const Pair &attr : attributes) {
- const std::string &key = "@" + attr.key;
+ const std::string &key = attributePrefix + attr.key;
  PyObject *val = PyUnicode_FromString(attr.value.c_str());
  PyDict_SetItemString(dict, key.c_str(), val);
  }
@@ -282,17 +333,20 @@ static PyObject *createDict(const std::vector<Pair> &attributes) {
  return dict;
  }
  
- PyDoc_STRVAR(xml_parse_doc, "parse(xml_content: str) -> dict:\n"
+ PyDoc_STRVAR(xml_parse_doc, "parse(xml_content: str, attr_prefix=\"@\") -> dict:\n"
  "...\n\n"
  "Parse XML content into a dictionary.\n\n"
  "Args:\n\t"
  "xml_content (str): xml document to be parsed.\n"
  "Returns:\n\t"
  "dict: Dictionary of the xml dom.\n");
- static PyObject *xml_parse(PyObject *self, PyObject *args) {
+ static PyObject *xml_parse(PyObject *self, PyObject *args, PyObject *kwargs) {
  const char *xmlContent;
+ char* attributePrefix = "@";
+
+ static char *kwlist[] = {"xml_content", "attr_prefix", NULL};
  
- if (!PyArg_ParseTuple(args, "s", &xmlContent)) {
+ if (!PyArg_ParseTupleAndKeywords(args, kwargs, "s|s", kwlist, &xmlContent, &attributePrefix)) {
  return NULL;
  }
  
@@ -319,7 +373,7 @@ static PyObject *xml_parse(PyObject *self, PyObject *args) {
  PyDict_SetItemString(currDict, "#text", childKey);
  }
  } else if (node.type == CONTAINER_OPEN || node.type == PRIMITIVE) {
- PyObject *d = createDict(node.attr);
+ PyObject *d = createDict(node.attr, attributePrefix);
  
  PyObject *item = PyDict_GetItem(currDict, childKey);
  if (item != NULL) {
@@ -369,7 +423,7 @@ static PyObject *xml_parse(PyObject *self, PyObject *args) {
  }
  
  static PyMethodDef XMLParserMethods[] = {
- {"parse", (PyCFunction)xml_parse, METH_VARARGS, xml_parse_doc},
+ {"parse", (PyCFunction)xml_parse, METH_VARARGS | METH_KEYWORDS, xml_parse_doc},
  {NULL, NULL, 0, NULL}};
  
  static struct PyModuleDef xmlparsermodule = {PyModuleDef_HEAD_INIT, "xmlpydict",
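
The C++ hunks above add two scanning steps: `parseProlog` skips a leading `<?xml ... >` declaration, and `parseCData` matches the literal `CDATA[` and then copies characters until `]]>`. A minimal Python sketch of the same two scans follows. The function names `skip_prolog` and `scan_cdata`, the use of `ValueError`, and the choice to report the section start as the error position are assumptions made for illustration; the extension itself raises a plain `Exception` and tracks position in a shared index.

```python
from typing import Tuple

PROLOG_OPEN = "<?xml"
CDATA_OPEN = "<![CDATA["
CDATA_CLOSE = "]]>"


def skip_prolog(xml_content: str) -> int:
    """Return the index just past a leading '<?xml ... >' prolog, or 0 if absent."""
    if not xml_content.startswith(PROLOG_OPEN):
        return 0
    end = xml_content.find(">", len(PROLOG_OPEN))
    # An unterminated prolog consumes the rest of the input.
    return len(xml_content) if end == -1 else end + 1


def scan_cdata(xml_content: str, start: int) -> Tuple[str, int]:
    """Return (text, next_index) for the CDATA section that begins at `start`."""
    # The opening marker must match character for character.
    if not xml_content.startswith(CDATA_OPEN, start):
        raise ValueError(f"not well formed (violation at pos={start})")
    body_start = start + len(CDATA_OPEN)
    # Everything up to the first ']]>' is taken verbatim, markup included.
    end = xml_content.find(CDATA_CLOSE, body_start)
    if end == -1:
        raise ValueError("unclosed token")
    return xml_content[body_start:end], end + len(CDATA_CLOSE)
```

Unlike ordinary character data, the CDATA body is returned without stripping, which is why markup such as `<p>` survives inside `#text` in the new `test_cdata` case.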
@@ -1,6 +1,6 @@
- def parse(xml_content: str) -> dict:
+ def parse(xml_content: str, attr_prefix="@") -> dict:
  i = 0
- key = "@"
+ key = attr_prefix
  val = ""
  xml_content += " "
  
@@ -35,7 +35,7 @@ def parse(xml_content: str) -> dict:
  in_quotes = not in_quotes
  if not in_quotes and key != "" and val != "":
  d[key] = val
- key = "@"
+ key = attr_prefix
  val = ""
  elif in_quotes:
  val += xml_content[i]
@@ -63,6 +63,6 @@ def parse(xml_content: str) -> dict:
  element_name = xml_content[i:j].strip()
  i = j
  if len(element_name) > 0:
- curr_dict["#text"] = element_name
+ curr_dict["#text"] = curr_dict.setdefault("#text", "") + element_name
  
  return container_stack.pop()
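
The last hunk above changes how the pure-Python parser stores text: a new chunk is appended to any existing `#text` value instead of replacing it. The sketch below isolates that one statement to show the effect. The helper name `append_text` is hypothetical and the surrounding parser loop is omitted.

```python
def append_text(curr_dict: dict, chunk: str) -> None:
    """Append a stripped text chunk to curr_dict['#text'], creating the key if needed."""
    chunk = chunk.strip()
    if len(chunk) > 0:
        # setdefault returns the existing text, or "" the first time through.
        curr_dict["#text"] = curr_dict.setdefault("#text", "") + chunk


node = {}
append_text(node, "Hello ")   # first chunk creates the key
append_text(node, "   ")      # whitespace-only chunks are ignored
append_text(node, "world")    # later chunks are appended, not overwritten
print(node)  # {'#text': 'Helloworld'}
```

Because each chunk is stripped before concatenation, whitespace at the boundary between two chunks is lost, as the output shows.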
@@ -1,6 +1,6 @@
  Metadata-Version: 2.1
  Name: xmlpydict
- Version: 0.0.7
+ Version: 0.0.9
  Summary: xml to dictionary tool for python
  Author-email: Matthew Taylor <matthew.taylor.andre@gmail.com>
  Project-URL: Homepage, https://github.com/MatthewAndreTaylor/xml-to-pydict
@@ -52,9 +52,26 @@ pip install xmlpydict
  {'person': {'@name': 'Matthew', '#text': 'Hello!'}}
  ```
  
- ## Tags
+ ## Goals
  
- # dict.get(key[, default]) will not cause exceptions
+ Create a consistent parsing strategy between XML and Python dictionaries. xmlpydict takes a more laid-back approach to enforce the syntax of XML. However, still ensures fast speeds by using finite automata.
+
+ ## Features
+
+ xmlpydict allows for multiple root elements.
+ The root object is treated as the Python object.
+
+ ### xmlpydict supports the following
+
+ [CDataSection](https://www.w3.org/TR/xml/#sec-cdata-sect): CDATA Sections are stored as {'#text': CData}.
+
+ [Comments](https://www.w3.org/TR/xml/#sec-comments): Comments are tokenized for corectness, but have no effect in what is returned.
+
+ [Element Tags](https://www.w3.org/TR/xml/#sec-starttags): Allows for duplicate attributes, however only the latest defined will be taken.
+
+ [Characters](https://www.w3.org/TR/xml/#charsets): Similar to CDATA text is stored as {'#text': Char} , however this text is stripped.
+
+ ### dict.get(key[, default]) will not cause exceptions
  
  ```py
  # Empty tags are containers
@@ -68,3 +85,31 @@ None
  >>> parse("")
  {}
  ```
+
+ ### Attribute prefixing
+
+ ```py
+ # Change prefix from default "@" with keyword argument attr_prefix
+ >>> from xmlpydict import parse
+ >>> parse('<p width="10" height="5"></p>', attr_prefix="$")
+ {"p": {"$width": "10", "$height": "5"}}
+ ```
+
+
+ ### Exceptions
+
+ ```py
+ # Grammar and structure of the xml_content is checked while parsing
+ >>> from xmlpydict import parse
+ >>> parse("<a></ a>")
+ Exception: not well formed (violation at pos=5)
+ ```
+
+
+ ### Unsupported
+
+ Prolog / Enforcing Document Type Definition and Element Type Declarations
+
+ Entity Referencing
+
+ Namespaces
@@ -66,6 +66,12 @@ def test_simple():
  )
  
  
+ def test_cdata():
+ assert parse("<content><![CDATA[<p>This is a paragraph</p>]]></content>") == {
+ "content": {"#text": "<p>This is a paragraph</p>"}
+ }
+
+
  def test_nested():
  assert parse("<book><p/></book> ") == {"book": {"p": {}}}
  assert parse("<book><p></p></book>") == {"book": {"p": {}}}
@@ -282,3 +288,11 @@ def test_exception():
  for xml_str in xml_strings:
  with pytest.raises(Exception):
  parse(xml_str)
+
+
+ def test_prefix():
+ assert parse("<p></p>", attr_prefix="$") == {"p": {}}
+ assert parse('<p width="10"></p>', attr_prefix="$") == {"p": {"$width": "10"}}
+ assert parse('<p width="10" height="5"></p>', attr_prefix="$") == {
+ "p": {"$width": "10", "$height": "5"}
+ }
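
The `test_prefix` cases above exercise the new `attr_prefix` keyword, which `createDict` applies by concatenating the prefix with each attribute name. A rough Python equivalent of that key-building step is sketched here. The name `prefixed_attributes` and the list-of-pairs input are assumptions for illustration, not part of the package API.

```python
from typing import Dict, Iterable, Tuple


def prefixed_attributes(
    attributes: Iterable[Tuple[str, str]], attr_prefix: str = "@"
) -> Dict[str, str]:
    """Build an attribute dict whose keys are attr_prefix + attribute name."""
    result: Dict[str, str] = {}
    for name, value in attributes:
        # A repeated attribute name overwrites the earlier value, so the latest wins.
        result[attr_prefix + name] = value
    return result


print(prefixed_attributes([("width", "10"), ("height", "5")], attr_prefix="$"))
# {'$width': '10', '$height': '5'}
```

With no attributes the result is an empty dict, which matches the `{"p": {}}` expectation in the first assertion of `test_prefix`.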
xmlpydict-0.0.7/README.md DELETED
@@ -1,44 +0,0 @@
- # xmlpydict 📑
-
- [![XML Tests](https://github.com/MatthewAndreTaylor/xml-to-pydict/actions/workflows/tests.yml/badge.svg)](https://github.com/MatthewAndreTaylor/xml-to-pydict/actions/workflows/tests.yml)
- [![PyPI versions](https://img.shields.io/badge/python-3.7%2B-blue)](https://github.com/MatthewAndreTaylor/xml-to-pydict)
- [![PyPI](https://img.shields.io/pypi/v/xmlpydict.svg)](https://pypi.org/project/xmlpydict/)
-
- ## Requirements
-
- - `python 3.7+`
-
- ## Installation
-
- To install xmlpydict, using pip:
-
- ```bash
- pip install xmlpydict
- ```
-
- ## Quickstart
-
- ```py
- >>> from xmlpydict import parse
- >>> parse("<package><xmlpydict language='python'/></package>")
- {'package': {'xmlpydict': {'@language': 'python'}}}
- >>> parse("<person name='Matthew'>Hello!</person>")
- {'person': {'@name': 'Matthew', '#text': 'Hello!'}}
- ```
-
- ## Tags
-
- # dict.get(key[, default]) will not cause exceptions
-
- ```py
- # Empty tags are containers
- >>> from xmlpydict import parse
- >>> parse("<a></a>")
- {'a': {}}
- >>> parse("<a/>")
- {'a': {}}
- >>> parse("<a/>").get('href')
- None
- >>> parse("")
- {}
- ```
File without changes
File without changes
File without changes
File without changes
File without changes