scrape-cli 1.1__tar.gz → 1.1.2__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- scrape_cli-1.1.2/PKG-INFO +131 -0
- {scrape_cli-1.1 → scrape_cli-1.1.2}/README.md +41 -18
- {scrape_cli-1.1 → scrape_cli-1.1.2}/scrape_cli/__init__.py +1 -1
- {scrape_cli-1.1 → scrape_cli-1.1.2}/scrape_cli/scrape.py +1 -1
- scrape_cli-1.1.2/scrape_cli.egg-info/PKG-INFO +131 -0
- {scrape_cli-1.1 → scrape_cli-1.1.2}/setup.py +9 -1
- scrape_cli-1.1/PKG-INFO +0 -14
- scrape_cli-1.1/scrape_cli.egg-info/PKG-INFO +0 -14
- {scrape_cli-1.1 → scrape_cli-1.1.2}/LICENSE +0 -0
- {scrape_cli-1.1 → scrape_cli-1.1.2}/scrape_cli.egg-info/SOURCES.txt +0 -0
- {scrape_cli-1.1 → scrape_cli-1.1.2}/scrape_cli.egg-info/dependency_links.txt +0 -0
- {scrape_cli-1.1 → scrape_cli-1.1.2}/scrape_cli.egg-info/entry_points.txt +0 -0
- {scrape_cli-1.1 → scrape_cli-1.1.2}/scrape_cli.egg-info/requires.txt +0 -0
- {scrape_cli-1.1 → scrape_cli-1.1.2}/scrape_cli.egg-info/top_level.txt +0 -0
- {scrape_cli-1.1 → scrape_cli-1.1.2}/setup.cfg +0 -0

@@ -0,0 +1,131 @@
+Metadata-Version: 2.1
+Name: scrape-cli
+Version: 1.1.2
+Summary: It's a command-line tool to extract HTML elements using an XPath query or CSS3 selector.
+Home-page: https://github.com/aborruso/scrape-cli
+Author: Andrea Borruso
+Author-email: aborruso@gmail.com
+Classifier: Programming Language :: Python :: 3
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Requires-Python: >=3.6
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: cssselect
+Requires-Dist: lxml
+
+[](https://badge.fury.io/py/scrape-cli)
+[](https://pypi.org/project/scrape-cli/)
+
+# scrape cli
+
+It's a **command-line tool** to **extract** HTML elements using an [**XPath**](https://www.w3schools.com/xml/xpath_intro.asp) query or [**CSS3 selector**](https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Selectors).
+
+It's based on the great and simple [scraping tool](https://github.com/jeroenjanssens/data-science-at-the-command-line/blob/master/tools/scrape) written by [**Jeroen Janssens**](http://jeroenjanssens.com).
+
+- [How does it work?](#how-does-it-work)
+- [How to use it in Linux](#how-to-use-it-in-linux)
+- [Note on building it](#note-on-building-it)
+
+
+
+## Installation
+
+You can install scrape-cli using pip:
+
+### Using pipx (recommended for CLI tools)
+
+```bash
+pipx install scrape-cli
+```
+
+Using pip
+
+```bash
+pip install scrape-cli
+```
+
+Or install from source:
+
+```bash
+git clone https://github.com/aborruso/scrape-cli
+cd scrape-cli
+pip install -e .
+```
+
+## Requirements
+- Python >=3.6
+- requests
+- lxml
+- cssselect
+
+## How does it work?
+
+A CSS selector query like this
+
+```bash
+curl -L 'https://en.wikipedia.org/wiki/List_of_sovereign_states' -s \
+| scrape -be 'table.wikitable > tbody > tr > td > b > a'
+```
+
+or an XPATH query like this one:
+
+```bash
+curl -L 'https://en.wikipedia.org/wiki/List_of_sovereign_states' -s \
+| scrape -be '//table[contains(@class, 'wikitable')]/tbody/tr/td/b/a'
+```
+
+gives you back:
+
+```html
+<html>
+<head>
+</head>
+<body>
+<a href="/wiki/Afghanistan" title="Afghanistan">
+Afghanistan
+</a>
+<a href="/wiki/Albania" title="Albania">
+Albania
+</a>
+<a href="/wiki/Algeria" title="Algeria">
+Algeria
+</a>
+<a href="/wiki/Andorra" title="Andorra">
+Andorra
+</a>
+<a href="/wiki/Angola" title="Angola">
+Angola
+</a>
+<a href="/wiki/Antigua_and_Barbuda" title="Antigua and Barbuda">
+Antigua and Barbuda
+</a>
+<a href="/wiki/Argentina" title="Argentina">
+Argentina
+</a>
+<a href="/wiki/Armenia" title="Armenia">
+Armenia
+</a>
+...
+...
+</body>
+</html>
+```
+
+Some notes on the commands:
+
+- `-e` to set the query
+- `-b` to add `<html>`, `<head>` and `<body>` tags to the HTML output.
+
+
+## Linux 64 bit precompiled binary
+
+If you are looking for precompiled executables for Linux, please refer to the [Releases](https://github.com/aborruso/scrape-cli/releases) page on GitHub where you can find the latest precompiled binary file.
+
+I have built the `scrape-linux-x86_64` precompiled binary, using [pyinstaller](https://www.pyinstaller.org/) and this command: `pyinstaller --onefile scrape.py`.<br>
+
+Once you have built it, it's an executable, and it's possible to use it Linux 64 bit environment.
+
+## License
+
+[MIT](LICENSE)
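
The XPath example in the README text above nests single quotes inside a single-quoted shell argument. Under POSIX shell quoting the inner quotes close and reopen the string, so the query that actually reaches `scrape` is `//table[contains(@class, wikitable)]/...`, where `wikitable` is an unquoted XPath location path (a child element) rather than a string literal. If no such child exists, XPath 1.0 converts the empty node-set to the empty string, and `contains()` with an empty second argument is always true, so the filter silently matches every table. The sketch below uses Python's `shlex.split`, which follows POSIX-style quoting rules, as a stand-in for the shell; the double-quoted variant is a suggested fix, not what the README ships.

```python
import shlex

# The XPath command from the README, tokenized with POSIX-style quoting rules.
as_written = "scrape -be '//table[contains(@class, 'wikitable')]/tbody/tr/td/b/a'"

# Suggested fix (not in the README): use double quotes inside the
# single-quoted argument so 'wikitable' stays an XPath string literal.
fixed = "scrape -be '//table[contains(@class, \"wikitable\")]/tbody/tr/td/b/a'"

# The adjacent quoted and unquoted pieces are glued into one argument,
# and the inner quotes around wikitable disappear.
print(shlex.split(as_written)[2])

# Inside single quotes, double quotes are kept literally.
print(shlex.split(fixed)[2])
```

Swapping the inner quotes is the smallest change, since POSIX shells allow no escape sequences inside single quotes.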

@@ -1,3 +1,6 @@
+[](https://badge.fury.io/py/scrape-cli)
+[](https://pypi.org/project/scrape-cli/)
+
 # scrape cli
 
 It's a **command-line tool** to **extract** HTML elements using an [**XPath**](https://www.w3schools.com/xml/xpath_intro.asp) query or [**CSS3 selector**](https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Selectors).

@@ -8,6 +11,38 @@ It's based on the great and simple [scraping tool](https://github.com/jeroenjans
 - [How to use it in Linux](#how-to-use-it-in-linux)
 - [Note on building it](#note-on-building-it)
 
+
+
+## Installation
+
+You can install scrape-cli using pip:
+
+### Using pipx (recommended for CLI tools)
+
+```bash
+pipx install scrape-cli
+```
+
+Using pip
+
+```bash
+pip install scrape-cli
+```
+
+Or install from source:
+
+```bash
+git clone https://github.com/aborruso/scrape-cli
+cd scrape-cli
+pip install -e .
+```
+
+## Requirements
+- Python >=3.6
+- requests
+- lxml
+- cssselect
+
 ## How does it work?
 
 A CSS selector query like this

@@ -66,27 +101,15 @@ Some notes on the commands:
 - `-e` to set the query
 - `-b` to add `<html>`, `<head>` and `<body>` tags to the HTML output.
 
-## How to use it in Linux
 
-
-# go in example to the home folder
-cd ~
-# download scrape-cli
-wget "https://github.com/aborruso/scrape-cli/releases/download/v1.0/scrape"
-# move it in a folder of your PATH as /usr/bin
-sudo mv ./scrape /usr/bin
-# give it execute permission
-sudo chmod +x /usr/bin/scrape
-# use it
-```
+## Linux 64 bit precompiled binary
 
-
+If you are looking for precompiled executables for Linux, please refer to the [Releases](https://github.com/aborruso/scrape-cli/releases) page on GitHub where you can find the latest precompiled binary file.
 
-
+I have built the `scrape-linux-x86_64` precompiled binary, using [pyinstaller](https://www.pyinstaller.org/) and this command: `pyinstaller --onefile scrape.py`.<br>
 
-
-There are two modules requirements: install in this environment `cssselect` and then `lxml`, in this order (using pip).
+Once you have built it, it's an executable, and it's possible to use it Linux 64 bit environment.
 
-
+## License
 
-
+[MIT](LICENSE)

@@ -42,7 +42,7 @@ def main():
 
 expression = [e if e.startswith('//') else GenericTranslator().css_to_xpath(e) for e in args.expression]
 
-html_parser = etree.HTMLParser(encoding='utf-8', recover=True
+html_parser = etree.HTMLParser(encoding='utf-8', recover=True)
 
 inp = open(args.file, 'rb') if args.file else args.html
 if args.rawinput:
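
The only code change in `scrape.py` closes the parenthesis of the `etree.HTMLParser(...)` call. Because the 1.1 line leaves the `(` open, Python cannot parse the module at all, so any import of it fails with a `SyntaxError`. A minimal way to check this without installing `lxml` is to pass the changed lines to the built-in `compile()`, which parses but does not execute, so the undefined names `etree` and `args` do not matter. The excerpt is an assumption in one respect: the indentation the lines have inside `main()` is dropped.

```python
# Dedented excerpt of the changed lines in scrape.py (originally inside main()).
# compile() only parses the source, so etree and args need not be defined.
OLD = (
    "html_parser = etree.HTMLParser(encoding='utf-8', recover=True\n"
    "\n"
    "inp = open(args.file, 'rb') if args.file else args.html\n"
)
# The 1.1.2 version: identical except for the closing parenthesis.
NEW = OLD.replace("recover=True\n", "recover=True)\n", 1)


def parses(source: str) -> bool:
    """Return True if source is syntactically valid Python."""
    try:
        compile(source, "<scrape.py excerpt>", "exec")
    except SyntaxError:
        return False
    return True


print("1.1:", parses(OLD))    # False: the '(' is never closed
print("1.1.2:", parses(NEW))  # True
```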

@@ -0,0 +1,131 @@
+Metadata-Version: 2.1
+Name: scrape-cli
+Version: 1.1.2
+Summary: It's a command-line tool to extract HTML elements using an XPath query or CSS3 selector.
+Home-page: https://github.com/aborruso/scrape-cli
+Author: Andrea Borruso
+Author-email: aborruso@gmail.com
+Classifier: Programming Language :: Python :: 3
+Classifier: License :: OSI Approved :: MIT License
+Classifier: Operating System :: OS Independent
+Requires-Python: >=3.6
+Description-Content-Type: text/markdown
+License-File: LICENSE
+Requires-Dist: cssselect
+Requires-Dist: lxml
+
+[](https://badge.fury.io/py/scrape-cli)
+[](https://pypi.org/project/scrape-cli/)
+
+# scrape cli
+
+It's a **command-line tool** to **extract** HTML elements using an [**XPath**](https://www.w3schools.com/xml/xpath_intro.asp) query or [**CSS3 selector**](https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Selectors).
+
+It's based on the great and simple [scraping tool](https://github.com/jeroenjanssens/data-science-at-the-command-line/blob/master/tools/scrape) written by [**Jeroen Janssens**](http://jeroenjanssens.com).
+
+- [How does it work?](#how-does-it-work)
+- [How to use it in Linux](#how-to-use-it-in-linux)
+- [Note on building it](#note-on-building-it)
+
+
+
+## Installation
+
+You can install scrape-cli using pip:
+
+### Using pipx (recommended for CLI tools)
+
+```bash
+pipx install scrape-cli
+```
+
+Using pip
+
+```bash
+pip install scrape-cli
+```
+
+Or install from source:
+
+```bash
+git clone https://github.com/aborruso/scrape-cli
+cd scrape-cli
+pip install -e .
+```
+
+## Requirements
+- Python >=3.6
+- requests
+- lxml
+- cssselect
+
+## How does it work?
+
+A CSS selector query like this
+
+```bash
+curl -L 'https://en.wikipedia.org/wiki/List_of_sovereign_states' -s \
+| scrape -be 'table.wikitable > tbody > tr > td > b > a'
+```
+
+or an XPATH query like this one:
+
+```bash
+curl -L 'https://en.wikipedia.org/wiki/List_of_sovereign_states' -s \
+| scrape -be '//table[contains(@class, 'wikitable')]/tbody/tr/td/b/a'
+```
+
+gives you back:
+
+```html
+<html>
+<head>
+</head>
+<body>
+<a href="/wiki/Afghanistan" title="Afghanistan">
+Afghanistan
+</a>
+<a href="/wiki/Albania" title="Albania">
+Albania
+</a>
+<a href="/wiki/Algeria" title="Algeria">
+Algeria
+</a>
+<a href="/wiki/Andorra" title="Andorra">
+Andorra
+</a>
+<a href="/wiki/Angola" title="Angola">
+Angola
+</a>
+<a href="/wiki/Antigua_and_Barbuda" title="Antigua and Barbuda">
+Antigua and Barbuda
+</a>
+<a href="/wiki/Argentina" title="Argentina">
+Argentina
+</a>
+<a href="/wiki/Armenia" title="Armenia">
+Armenia
+</a>
+...
+...
+</body>
+</html>
+```
+
+Some notes on the commands:
+
+- `-e` to set the query
+- `-b` to add `<html>`, `<head>` and `<body>` tags to the HTML output.
+
+
+## Linux 64 bit precompiled binary
+
+If you are looking for precompiled executables for Linux, please refer to the [Releases](https://github.com/aborruso/scrape-cli/releases) page on GitHub where you can find the latest precompiled binary file.
+
+I have built the `scrape-linux-x86_64` precompiled binary, using [pyinstaller](https://www.pyinstaller.org/) and this command: `pyinstaller --onefile scrape.py`.<br>
+
+Once you have built it, it's an executable, and it's possible to use it Linux 64 bit environment.
+
+## License
+
+[MIT](LICENSE)

@@ -1,9 +1,17 @@
+# setup.py
 from setuptools import setup
+from pathlib import Path
+
+# Read the README
+this_directory = Path(__file__).parent
+long_description = (this_directory / "README.md").read_text(encoding="utf-8")
 
 setup(
 name="scrape-cli",
-version="1.1",
+version="1.1.2",
 description="It's a command-line tool to extract HTML elements using an XPath query or CSS3 selector.",
+long_description=long_description,
+long_description_content_type="text/markdown", # Specify Markdown format
 author="Andrea Borruso",
 author_email="aborruso@gmail.com",
 url="https://github.com/aborruso/scrape-cli",
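
The `setup.py` change reads `README.md` into `long_description` and sets `long_description_content_type`, which is why the new PKG-INFO files gain a `Description-Content-Type: text/markdown` header and carry the README text after the header block. PKG-INFO uses an email-header style layout, so the standard library's `email.parser` can read it. The sketch below parses a trimmed copy of the 1.1.2 PKG-INFO (several headers and all but the first README line removed, an assumption made for brevity). It also makes visible that the declared dependencies are only `cssselect` and `lxml`, although the README's Requirements list also names `requests`.

```python
from email.parser import Parser

# Trimmed copy of the 1.1.2 PKG-INFO: header fields, a blank line,
# then the README text that setup.py now passes as long_description.
PKG_INFO = """\
Metadata-Version: 2.1
Name: scrape-cli
Version: 1.1.2
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: cssselect
Requires-Dist: lxml

# scrape cli
"""

meta = Parser().parsestr(PKG_INFO)

print(meta["Version"])                     # 1.1.2
print(meta["Description-Content-Type"])    # text/markdown
# Repeated fields such as Requires-Dist need get_all(), not indexing.
print(meta.get_all("Requires-Dist"))       # ['cssselect', 'lxml']
# Everything after the blank line is the long description (the README).
print(meta.get_payload().splitlines()[0])  # first README line: the title
```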

scrape_cli-1.1/PKG-INFO DELETED
@@ -1,14 +0,0 @@
-Metadata-Version: 2.1
-Name: scrape-cli
-Version: 1.1
-Summary: It's a command-line tool to extract HTML elements using an XPath query or CSS3 selector.
-Home-page: https://github.com/aborruso/scrape-cli
-Author: Andrea Borruso
-Author-email: aborruso@gmail.com
-Classifier: Programming Language :: Python :: 3
-Classifier: License :: OSI Approved :: MIT License
-Classifier: Operating System :: OS Independent
-Requires-Python: >=3.6
-License-File: LICENSE
-Requires-Dist: cssselect
-Requires-Dist: lxml

@@ -1,14 +0,0 @@
-Metadata-Version: 2.1
-Name: scrape-cli
-Version: 1.1
-Summary: It's a command-line tool to extract HTML elements using an XPath query or CSS3 selector.
-Home-page: https://github.com/aborruso/scrape-cli
-Author: Andrea Borruso
-Author-email: aborruso@gmail.com
-Classifier: Programming Language :: Python :: 3
-Classifier: License :: OSI Approved :: MIT License
-Classifier: Operating System :: OS Independent
-Requires-Python: >=3.6
-License-File: LICENSE
-Requires-Dist: cssselect
-Requires-Dist: lxml
|
|
File without changes
|
|
File without changes
|
|
File without changes
|
|
File without changes
|
|
File without changes
|
|
File without changes
|
|
File without changes
|