selectolax 0.3.28__cp38-cp38-musllinux_1_2_aarch64.whl

Potentially problematic release.



@@ -0,0 +1,183 @@
1
+ Metadata-Version: 2.1
2
+ Name: selectolax
3
+ Version: 0.3.28
4
+ Summary: Fast HTML5 parser with CSS selectors.
5
+ Home-page: https://github.com/rushter/selectolax
6
+ Author: Artem Golubin
7
+ Author-email: me@rushter.com
8
+ License: MIT license
9
+ Project-URL: Source code, https://github.com/rushter/selectolax
10
+ Keywords: selectolax
11
+ Classifier: Development Status :: 5 - Production/Stable
12
+ Classifier: Topic :: Text Processing :: Markup :: HTML
13
+ Classifier: Topic :: Internet
14
+ Classifier: Topic :: Internet :: WWW/HTTP
15
+ Classifier: Intended Audience :: Developers
16
+ Classifier: License :: OSI Approved :: MIT License
17
+ Classifier: Natural Language :: English
18
+ Classifier: Programming Language :: Python :: 3
19
+ Classifier: Programming Language :: Python :: 3.7
20
+ Classifier: Programming Language :: Python :: 3.8
21
+ Classifier: Programming Language :: Python :: 3.9
22
+ Classifier: Programming Language :: Python :: 3.10
23
+ Classifier: Programming Language :: Python :: 3.11
24
+ Classifier: Programming Language :: Python :: 3.12
25
+ Classifier: Programming Language :: Python :: 3.13
26
+ License-File: LICENSE
27
+ Provides-Extra: cython
28
+ Requires-Dist: Cython==3.0.11; extra == "cython"
29
+
30
+ .. image:: docs/logo.png
31
+    :alt: selectolax logo
32
+
33
+ -------------------------
34
+
35
+ .. image:: https://img.shields.io/pypi/v/selectolax.svg
36
+    :target: https://pypi.python.org/pypi/selectolax
37
+
38
+ A fast HTML5 parser with CSS selectors using `Modest <https://github.com/lexborisov/Modest/>`_ and
39
+ `Lexbor <https://github.com/lexbor/lexbor>`_ engines.
40
+
41
+
42
+ Installation
43
+ ------------
44
+ From PyPI using pip:
45
+
46
+ .. code-block:: bash
47
+
48
+     pip install selectolax
49
+
50
+ If installation fails due to compilation errors, you may need to install `Cython <https://github.com/cython/cython>`_:
51
+
52
+ .. code-block:: bash
53
+
54
+     pip install selectolax[cython]
55
+
56
+ This usually happens when you try to install an outdated version of selectolax on a newer version of Python.
57
+
58
+
59
+ Development version from GitHub:
60
+
61
+ .. code-block:: bash
62
+
63
+     git clone --recursive https://github.com/rushter/selectolax
64
+     cd selectolax
65
+     pip install -r requirements_dev.txt
66
+     python setup.py install
67
+
68
+ How to compile selectolax while developing:
69
+
70
+ .. code-block:: bash
71
+
72
+     make clean
73
+     make dev
74
+
75
+ Basic examples
76
+ --------------
77
+
78
+ Here are some basic examples to get you started with selectolax:
79
+
80
+ Parsing HTML and extracting text:
81
+
82
+ .. code:: python
83
+
84
+     In [1]: from selectolax.parser import HTMLParser
85
+        ...:
86
+        ...: html = """
87
+        ...: <h1 id="title" data-updated="20201101">Hi there</h1>
88
+        ...: <div class="post">Lorem Ipsum is simply dummy text of the printing and typesetting industry. </div>
89
+        ...: <div class="post">Lorem ipsum dolor sit amet, consectetur adipiscing elit.</div>
90
+        ...: """
91
+        ...: tree = HTMLParser(html)
92
+
93
+     In [2]: tree.css_first('h1#title').text()
94
+     Out[2]: 'Hi there'
95
+
96
+     In [3]: tree.css_first('h1#title').attributes
97
+     Out[3]: {'id': 'title', 'data-updated': '20201101'}
98
+
99
+     In [4]: [node.text() for node in tree.css('.post')]
100
+     Out[4]:
101
+     ['Lorem Ipsum is simply dummy text of the printing and typesetting industry. ',
102
+      'Lorem ipsum dolor sit amet, consectetur adipiscing elit.']
103
+
104
+ Using advanced CSS selectors:
105
+
106
+ .. code:: python
107
+
108
+     In [1]: html = "<div><p id=p1><p id=p2><p id=p3><a>link</a><p id=p4><p id=p5>text<p id=p6></div>"
109
+        ...: selector = "div > :nth-child(2n+1):not(:has(a))"
110
+
111
+     In [2]: for node in HTMLParser(html).css(selector):
112
+        ...:     print(node.attributes, node.text(), node.tag)
113
+        ...:     print(node.parent.tag)
114
+        ...:     print(node.html)
115
+        ...:
116
+     {'id': 'p1'}  p
117
+     div
118
+     <p id="p1"></p>
119
+     {'id': 'p5'} text p
120
+     div
121
+     <p id="p5">text</p>
122
+
123
+
124
+ * `Detailed overview <https://github.com/rushter/selectolax/blob/master/examples/walkthrough.ipynb>`_
125
+
126
+ Available backends
127
+ ------------------
128
+
129
+ Selectolax supports two backends: ``Modest`` and ``Lexbor``. By default, all examples use the Modest backend.
130
+ The two backends offer almost identical features, but some differences remain.
131
+
132
+ As of 2024, the preferred backend is ``Lexbor``. The ``Modest`` backend is still available for compatibility reasons,
133
+ but the underlying C library that it relies on is no longer maintained.
134
+
135
+
136
+ To use ``Lexbor``, import its parser and use it in a similar way to ``HTMLParser``.
137
+
138
+ .. code:: python
139
+
140
+     In [1]: from selectolax.lexbor import LexborHTMLParser
141
+
142
+     In [2]: html = """
143
+        ...: <title>Hi there</title>
144
+        ...: <div id="updated">2021-08-15</div>
145
+        ...: """
146
+
147
+     In [3]: parser = LexborHTMLParser(html)
148
+     In [4]: parser.root.css_first("#updated").text()
149
+     Out[4]: '2021-08-15'
150
+
151
+
152
+ Simple Benchmark
153
+ ----------------
154
+
155
+ * Extract the title, links, scripts and a meta tag from the main pages of the top 754 domains. See ``examples/benchmark.py`` for more information.
156
+
157
+ ============================ ===========
158
+ Package                      Time
159
+ ============================ ===========
160
+ Beautiful Soup (html.parser) 61.02 sec.
161
+ lxml / Beautiful Soup (lxml) 9.09 sec.
162
+ html5_parser                 16.10 sec.
163
+ selectolax (Modest)          2.94 sec.
164
+ selectolax (Lexbor)          2.39 sec.
165
+ ============================ ===========
166
+
167
+ Links
168
+ -----
169
+
170
+ * `selectolax API reference <http://selectolax.readthedocs.io/en/latest/parser.html>`_
171
+ * `Video introduction to web scraping using selectolax <https://youtu.be/HpRsfpPuUzE>`_
172
+ * `How to Scrape 7k Products with Python using selectolax and httpx <https://www.youtube.com/watch?v=XpGvq755J2U>`_
173
+ * `Detailed overview <https://github.com/rushter/selectolax/blob/master/examples/walkthrough.ipynb>`_
174
+ * `Modest introduction <https://lexborisov.github.io/Modest/>`_
175
+ * `Modest benchmark <http://lexborisov.github.io/benchmark-html-persers/>`_
176
+ * `Python benchmark <https://rushter.com/blog/python-fast-html-parser/>`_
177
+ * `Another Python benchmark <https://www.peterbe.com/plog/selectolax-or-pyquery>`_
178
+
179
+ License
180
+ -------
181
+
182
+ * Modest engine - `LGPL2.1 <https://github.com/lexborisov/Modest/blob/master/LICENSE>`_
183
+ * selectolax - `MIT <https://github.com/rushter/selectolax/blob/master/LICENSE>`_
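
The METADATA file above uses an RFC 822-style layout: header fields first, then a blank line, then the long description. As a rough sketch, the standard-library `email` module can read it. The `METADATA` string below is a shortened excerpt assembled for illustration (its one-line description is the Summary text, not the real README body), and the variable names are assumptions, not part of the package.

```python
import email

# Shortened excerpt of the METADATA shown above (illustrative, not the full file).
METADATA = """\
Metadata-Version: 2.1
Name: selectolax
Version: 0.3.28
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Provides-Extra: cython
Requires-Dist: Cython==3.0.11; extra == "cython"

Fast HTML5 parser with CSS selectors.
"""

msg = email.message_from_string(METADATA)

name = msg["Name"]                        # single-valued field
version = msg["Version"]
classifiers = msg.get_all("Classifier")   # repeated fields come back as a list
requires = msg.get_all("Requires-Dist")
description = msg.get_payload()           # everything after the blank line

print(name, version)
print(classifiers)
print(requires)
```

Repeated fields such as `Classifier` need `get_all`; indexing with `msg["Classifier"]` would return only one of the values.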
@@ -0,0 +1,26 @@
1
+ selectolax-0.3.28.dist-info/LICENSE,sha256=MYCcM-Cv_rC2-lQiwDumin0E-rMXAhK-qIGGA29434Y,1077
2
+ selectolax-0.3.28.dist-info/WHEEL,sha256=BHRewEbUf61vtumHxoNlQbDQLU_vG0pDuFQgyArrrYE,111
3
+ selectolax-0.3.28.dist-info/METADATA,sha256=vTdwlSLUc9k1DeDfD4h5ZcM0cB0aIgShNit3UMiniAU,6060
4
+ selectolax-0.3.28.dist-info/top_level.txt,sha256=e5MuEM2PrQzoDlWetkFli9uXSlxa_ktW5jJEihhaI1c,11
5
+ selectolax-0.3.28.dist-info/RECORD,,
6
+ selectolax/parser.pxd,sha256=zZlg1vHUg6o4MXaiwKAo5S5hO_DqBGc4_E10qJ2EcM4,24564
7
+ selectolax/__init__.py,sha256=IhnQaAtBWz03SUIe66y78uQqmWBontg4z13rRupwa7Q,175
8
+ selectolax/base.pxi,sha256=eiPKlY9gG3l49qJoRQVLl1Ljza6z1k0A-met6sDPcqE,89
9
+ selectolax/lexbor.pyi,sha256=gf0rPd2B1EZyz_oN6EER-wFojg__Sz18GwjjVYo7SkU,6552
10
+ selectolax/lexbor.cpython-38-aarch64-linux-gnu.so,sha256=RdpEP3hSN58notcTjtbCPX_OtWKYdEJzDFv9hQNKRJE,9320136
11
+ selectolax/parser.pyi,sha256=qLkvStGG4K3daXChLChzPHGV5w5gmIEMvFwRpC_Q4EM,11561
12
+ selectolax/utils.pxi,sha256=uB0-0naFQPy1JpR2DiIlKnyLyC76yWLnUHSuH11xg6s,3459
13
+ selectolax/parser.c,sha256=yd8gEU0netTAe3rf5haknFFz-QtMHTi6nhKGW5GUTTE,2214825
14
+ selectolax/lexbor.pxd,sha256=PwygBdb1blWAQcxXubZS5uffhgcXaqgySNMPFMT02-c,20958
15
+ selectolax/lexbor.pyx,sha256=rpb32yQ2E_6nJeaPDQs3kb3GFoALZqQbVCN35kcUM-M,10882
16
+ selectolax/py.typed,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
17
+ selectolax/parser.pyx,sha256=o1HkYE_nQr3TS7EPlldJx2-ygU9B5FI2uWYFzdF-VaI,12953
18
+ selectolax/parser.cpython-38-aarch64-linux-gnu.so,sha256=UPArZ08hYBRVp22NBbvWfT2ecV9c4Rdu-j6pG0l6Vss,7582216
19
+ selectolax/lexbor.c,sha256=UkiarPPcp6IKa-HMZLnhap1CRltADEZdwsDWo5EviLQ,2359509
20
+ selectolax/modest/util.pxi,sha256=aX9UnRNTITImHVBTlIs9efOd3EyugLq_Lwuo0zVTiuQ,551
21
+ selectolax/modest/node.pxi,sha256=NrMzJnQJDCmgTHpUxpMHDyAfQ_AS_n_Cr_2ryEKjyL0,32550
22
+ selectolax/modest/selection.pxi,sha256=S55MMxEW2B1oPExB_DRwPM46WoWZU73J3rFRZU1URuQ,6393
23
+ selectolax/lexbor/attrs.pxi,sha256=Ol2RNzXZAcWaqJdDBUe0ChOCcA8HC990Hjncj98XAkw,3138
24
+ selectolax/lexbor/util.pxi,sha256=Zq7S-zlyU3wOo49wGHQHnmmhpbkrcJm59ZCTPENcZQA,563
25
+ selectolax/lexbor/node.pxi,sha256=-cqsA4gz9yL6hCte6uGgdQKvhIBZF_BZc_xHJn0rkCM,29340
26
+ selectolax/lexbor/selection.pxi,sha256=FA6npHtXjJjvS8H2_e_LS53i5zbpGYgb5zTh5Tf_XQY,6571
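
Each RECORD entry above has the form `path,sha256=<digest>,<size>`, where the digest is the URL-safe Base64 encoding of the SHA-256 hash with the trailing `=` padding removed. The sketch below shows one way to check a file's bytes against an entry. The helper names (`record_digest`, `parse_record`, `entry_matches`) are hypothetical, and the only entry tested is the empty `py.typed` file, whose content is known from its size of 0.

```python
import base64
import csv
import hashlib


def record_digest(data: bytes) -> str:
    """Return 'sha256=' plus the unpadded URL-safe Base64 SHA-256 of data."""
    raw = hashlib.sha256(data).digest()
    return "sha256=" + base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def parse_record(text: str):
    """Yield (path, digest, size) tuples; digest and size may be empty."""
    for path, digest, size in csv.reader(text.splitlines()):
        yield path, digest, int(size) if size else None


def entry_matches(digest: str, size, data: bytes) -> bool:
    """Compare one RECORD entry with the actual file bytes."""
    if not digest:
        return True  # the RECORD file lists itself without a hash or size
    return digest == record_digest(data) and size == len(data)


record_text = (
    "selectolax/py.typed,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0\n"
    "selectolax-0.3.28.dist-info/RECORD,,\n"
)
entries = list(parse_record(record_text))
print(entries[0][0], entry_matches(entries[0][1], entries[0][2], b""))
```

Parsing with `csv` rather than a plain `split(",")` keeps paths that contain commas intact, since RECORD is a CSV file.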
@@ -0,0 +1,5 @@
1
+ Wheel-Version: 1.0
2
+ Generator: setuptools (75.1.0)
3
+ Root-Is-Purelib: false
4
+ Tag: cp38-cp38-musllinux_1_2_aarch64
5
+
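
The `Tag:` value in the WHEEL file above combines three parts: the Python tag, the ABI tag, and the platform tag. A minimal sketch of splitting it with the standard library follows. `WheelTag`, `parse_tag` and `musl_platform` are hypothetical helper names, and the regular expression assumes the `musllinux_<major>_<minor>_<arch>` platform layout.

```python
import re
from typing import NamedTuple, Optional, Tuple


class WheelTag(NamedTuple):
    python: str    # interpreter, e.g. cp38 = CPython 3.8
    abi: str       # ABI the extension modules were built against
    platform: str  # target platform


def parse_tag(tag: str) -> WheelTag:
    """Split a single 'python-abi-platform' tag into its three parts."""
    python, abi, platform = tag.split("-")
    return WheelTag(python, abi, platform)


def musl_platform(platform: str) -> Optional[Tuple[int, int, str]]:
    """Return (musl major, musl minor, architecture), or None if not musllinux."""
    match = re.fullmatch(r"musllinux_(\d+)_(\d+)_(.+)", platform)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), match.group(3)


tag = parse_tag("cp38-cp38-musllinux_1_2_aarch64")
print(tag)
print(musl_platform(tag.platform))
```

For this wheel the result is CPython 3.8 for both the interpreter and ABI tags, targeting musl 1.2 on aarch64.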
@@ -0,0 +1 @@
1
+ selectolax