scrape-cli 1.1__tar.gz → 1.1.2__tar.gz

@@ -0,0 +1,131 @@
1
+ Metadata-Version: 2.1
2
+ Name: scrape-cli
3
+ Version: 1.1.2
4
+ Summary: It's a command-line tool to extract HTML elements using an XPath query or CSS3 selector.
5
+ Home-page: https://github.com/aborruso/scrape-cli
6
+ Author: Andrea Borruso
7
+ Author-email: aborruso@gmail.com
8
+ Classifier: Programming Language :: Python :: 3
9
+ Classifier: License :: OSI Approved :: MIT License
10
+ Classifier: Operating System :: OS Independent
11
+ Requires-Python: >=3.6
12
+ Description-Content-Type: text/markdown
13
+ License-File: LICENSE
14
+ Requires-Dist: cssselect
15
+ Requires-Dist: lxml
16
+
17
+ [![PyPI version](https://badge.fury.io/py/scrape-cli.svg)](https://badge.fury.io/py/scrape-cli)
18
+ [![Python Versions](https://img.shields.io/pypi/pyversions/scrape-cli.svg)](https://pypi.org/project/scrape-cli/)
19
+
20
+ # scrape cli
21
+
22
+ It's a **command-line tool** to **extract** HTML elements using an [**XPath**](https://www.w3schools.com/xml/xpath_intro.asp) query or [**CSS3 selector**](https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Selectors).
23
+
24
+ It's based on the great and simple [scraping tool](https://github.com/jeroenjanssens/data-science-at-the-command-line/blob/master/tools/scrape) written by [**Jeroen Janssens**](http://jeroenjanssens.com).
25
+
26
+ - [How does it work?](#how-does-it-work)
27
+ - [How to use it in Linux](#how-to-use-it-in-linux)
28
+ - [Note on building it](#note-on-building-it)
29
+
30
+
31
+
32
+ ## Installation
33
+
34
+ You can install scrape-cli using pip:
35
+
36
+ ### Using pipx (recommended for CLI tools)
37
+
38
+ ```bash
39
+ pipx install scrape-cli
40
+ ```
41
+
42
+ ### Using pip
43
+
44
+ ```bash
45
+ pip install scrape-cli
46
+ ```
47
+
48
+ Or install from source:
49
+
50
+ ```bash
51
+ git clone https://github.com/aborruso/scrape-cli
52
+ cd scrape-cli
53
+ pip install -e .
54
+ ```
55
+
56
+ ## Requirements
57
+ - Python >=3.6
58
+ - requests
59
+ - lxml
60
+ - cssselect
61
+
62
+ ## How does it work?
63
+
64
+ A CSS selector query like this
65
+
66
+ ```bash
67
+ curl -L 'https://en.wikipedia.org/wiki/List_of_sovereign_states' -s \
68
+ | scrape -be 'table.wikitable > tbody > tr > td > b > a'
69
+ ```
70
+
71
+ or an XPath query like this one:
72
+
73
+ ```bash
74
+ curl -L 'https://en.wikipedia.org/wiki/List_of_sovereign_states' -s \
75
+ | scrape -be '//table[contains(@class, "wikitable")]/tbody/tr/td/b/a'
76
+ ```
77
+
78
+ gives you back:
79
+
80
+ ```html
81
+ <html>
82
+ <head>
83
+ </head>
84
+ <body>
85
+ <a href="/wiki/Afghanistan" title="Afghanistan">
86
+ Afghanistan
87
+ </a>
88
+ <a href="/wiki/Albania" title="Albania">
89
+ Albania
90
+ </a>
91
+ <a href="/wiki/Algeria" title="Algeria">
92
+ Algeria
93
+ </a>
94
+ <a href="/wiki/Andorra" title="Andorra">
95
+ Andorra
96
+ </a>
97
+ <a href="/wiki/Angola" title="Angola">
98
+ Angola
99
+ </a>
100
+ <a href="/wiki/Antigua_and_Barbuda" title="Antigua and Barbuda">
101
+ Antigua and Barbuda
102
+ </a>
103
+ <a href="/wiki/Argentina" title="Argentina">
104
+ Argentina
105
+ </a>
106
+ <a href="/wiki/Armenia" title="Armenia">
107
+ Armenia
108
+ </a>
109
+ ...
110
+ ...
111
+ </body>
112
+ </html>
113
+ ```
114
+
115
+ Some notes on the commands:
116
+
117
+ - `-e` to set the query
118
+ - `-b` to add `<html>`, `<head>` and `<body>` tags to the HTML output.
119
+
120
+
121
+ ## Linux 64 bit precompiled binary
122
+
123
+ If you are looking for precompiled executables for Linux, please refer to the [Releases](https://github.com/aborruso/scrape-cli/releases) page on GitHub where you can find the latest precompiled binary file.
124
+
125
+ I have built the `scrape-linux-x86_64` precompiled binary, using [pyinstaller](https://www.pyinstaller.org/) and this command: `pyinstaller --onefile scrape.py`.<br>
126
+
127
+ Once built, it is a standalone executable that can be used in a 64-bit Linux environment.
128
+
129
+ ## License
130
+
131
+ [MIT](LICENSE)
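
A note on the XPath example in the README above: when an XPath string literal such as `wikitable` sits inside a single-quoted shell argument, it needs double quotes, because nested single quotes are consumed by the shell. The query then still runs, but `wikitable` is read as an element name rather than a string, and in XPath 1.0 `contains()` with an empty second argument is true, so the class filter silently matches every table. The sketch below uses Python's `shlex` to approximate POSIX shell quoting; the helper name `xpath_argument` is hypothetical and is not part of scrape-cli.

```python
import shlex

# Argument written with nested single quotes: the shell closes the first
# quoted segment before `wikitable` and reopens it afterwards.
nested = "scrape -be '//table[contains(@class, 'wikitable')]/tbody/tr/td/b/a'"

# Same argument with double quotes around the XPath string literal.
mixed = "scrape -be '//table[contains(@class, \"wikitable\")]/tbody/tr/td/b/a'"


def xpath_argument(command: str) -> str:
    """Return the last argument a program would receive under POSIX quoting rules."""
    return shlex.split(command)[-1]


print(xpath_argument(nested))  # inner quotes are gone
print(xpath_argument(mixed))   # inner double quotes survive
```

Running this shows that only the second form delivers a quoted string literal to the XPath engine.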
@@ -1,3 +1,6 @@
1
+ [![PyPI version](https://badge.fury.io/py/scrape-cli.svg)](https://badge.fury.io/py/scrape-cli)
2
+ [![Python Versions](https://img.shields.io/pypi/pyversions/scrape-cli.svg)](https://pypi.org/project/scrape-cli/)
3
+
1
4
  # scrape cli
2
5
 
3
6
  It's a **command-line tool** to **extract** HTML elements using an [**XPath**](https://www.w3schools.com/xml/xpath_intro.asp) query or [**CSS3 selector**](https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Selectors).
@@ -8,6 +11,38 @@ It's based on the great and simple [scraping tool](https://github.com/jeroenjans
8
11
  - [How to use it in Linux](#how-to-use-it-in-linux)
9
12
  - [Note on building it](#note-on-building-it)
10
13
 
14
+
15
+
16
+ ## Installation
17
+
18
+ You can install scrape-cli using pip:
19
+
20
+ ### Using pipx (recommended for CLI tools)
21
+
22
+ ```bash
23
+ pipx install scrape-cli
24
+ ```
25
+
26
+ ### Using pip
27
+
28
+ ```bash
29
+ pip install scrape-cli
30
+ ```
31
+
32
+ Or install from source:
33
+
34
+ ```bash
35
+ git clone https://github.com/aborruso/scrape-cli
36
+ cd scrape-cli
37
+ pip install -e .
38
+ ```
39
+
40
+ ## Requirements
41
+ - Python >=3.6
42
+ - requests
43
+ - lxml
44
+ - cssselect
45
+
11
46
  ## How does it work?
12
47
 
13
48
  A CSS selector query like this
@@ -66,27 +101,15 @@ Some notes on the commands:
66
101
  - `-e` to set the query
67
102
  - `-b` to add `<html>`, `<head>` and `<body>` tags to the HTML output.
68
103
 
69
- ## How to use it in Linux
70
104
 
71
- ```bash
72
- # go in example to the home folder
73
- cd ~
74
- # download scrape-cli
75
- wget "https://github.com/aborruso/scrape-cli/releases/download/v1.0/scrape"
76
- # move it in a folder of your PATH as /usr/bin
77
- sudo mv ./scrape /usr/bin
78
- # give it execute permission
79
- sudo chmod +x /usr/bin/scrape
80
- # use it
81
- ```
105
+ ## Linux 64 bit precompiled binary
82
106
 
83
- **Please note**: in OSX it seems not to work ([#8](https://github.com/aborruso/scrape-cli/issues/8)).
107
+ If you are looking for precompiled executables for Linux, please refer to the [Releases](https://github.com/aborruso/scrape-cli/releases) page on GitHub where you can find the latest precompiled binary file.
84
108
 
85
- ## Note on building it
109
+ I have built the `scrape-linux-x86_64` precompiled binary, using [pyinstaller](https://www.pyinstaller.org/) and this command: `pyinstaller --onefile scrape.py`.<br>
86
110
 
87
- The original source is written in Python 2, then I have built it in Python 2 environment.<br>
88
- There are two modules requirements: install in this environment `cssselect` and then `lxml`, in this order (using pip).
111
+ Once built, it is a standalone executable that can be used in a 64-bit Linux environment.
89
112
 
90
- I have built it using [pyinstaller](https://www.pyinstaller.org/) and this command: `pyinstaller --onefile scrape.py`.<br>
113
+ ## License
91
114
 
92
- Once you have built it, it's an executable, and it's possible to use it in any environment.
115
+ [MIT](LICENSE)
@@ -4,7 +4,7 @@ scrape-cli - A command-line tool to extract HTML elements using XPath or CSS3 se
4
4
 
5
5
  from scrape_cli.scrape import main
6
6
 
7
- __version__ = "1.1"
7
+ __version__ = "1.1.2"
8
8
  __author__ = "Andrea Borruso"
9
9
  __author_email__ = "aborruso@gmail.com"
10
10
 
@@ -42,7 +42,7 @@ def main():
42
42
 
43
43
  expression = [e if e.startswith('//') else GenericTranslator().css_to_xpath(e) for e in args.expression]
44
44
 
45
- html_parser = etree.HTMLParser(encoding='utf-8', recover=True, strip_cdata=True)
45
+ html_parser = etree.HTMLParser(encoding='utf-8', recover=True)
46
46
 
47
47
  inp = open(args.file, 'rb') if args.file else args.html
48
48
  if args.rawinput:
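
The context line that builds `expression` in the hunk above decides between XPath and CSS purely by prefix: anything starting with `//` is passed through, everything else goes to the CSS translator. A minimal sketch of that rule follows. To stay runnable without third-party packages it takes the translator as a parameter; `toy_css_to_xpath` is a hypothetical stand-in for `GenericTranslator().css_to_xpath` that only understands bare tag names, and `to_xpath` is an illustrative name, not a function in scrape-cli.

```python
from typing import Callable, List


def to_xpath(expressions: List[str], css_to_xpath: Callable[[str], str]) -> List[str]:
    """Pass '//'-prefixed strings through as XPath; translate the rest as CSS."""
    return [e if e.startswith('//') else css_to_xpath(e) for e in expressions]


def toy_css_to_xpath(selector: str) -> str:
    """Hypothetical stand-in for the real CSS translator; bare tag names only."""
    if not selector.isidentifier():
        raise ValueError(f"not a bare tag name: {selector!r}")
    return f"descendant-or-self::{selector}"


print(to_xpath(["//a", "a"], toy_css_to_xpath))

# An XPath that starts with a single slash is not recognised by the prefix
# check, so it is handed to the CSS translator instead.
try:
    to_xpath(["/html/body"], toy_css_to_xpath)
except ValueError as err:
    print("treated as CSS:", err)
```

One consequence of the prefix check is that XPath expressions which do not begin with `//`, such as `/html/body`, are treated as CSS selectors.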
@@ -0,0 +1,131 @@
1
+ Metadata-Version: 2.1
2
+ Name: scrape-cli
3
+ Version: 1.1.2
4
+ Summary: It's a command-line tool to extract HTML elements using an XPath query or CSS3 selector.
5
+ Home-page: https://github.com/aborruso/scrape-cli
6
+ Author: Andrea Borruso
7
+ Author-email: aborruso@gmail.com
8
+ Classifier: Programming Language :: Python :: 3
9
+ Classifier: License :: OSI Approved :: MIT License
10
+ Classifier: Operating System :: OS Independent
11
+ Requires-Python: >=3.6
12
+ Description-Content-Type: text/markdown
13
+ License-File: LICENSE
14
+ Requires-Dist: cssselect
15
+ Requires-Dist: lxml
16
+
17
+ [![PyPI version](https://badge.fury.io/py/scrape-cli.svg)](https://badge.fury.io/py/scrape-cli)
18
+ [![Python Versions](https://img.shields.io/pypi/pyversions/scrape-cli.svg)](https://pypi.org/project/scrape-cli/)
19
+
20
+ # scrape cli
21
+
22
+ It's a **command-line tool** to **extract** HTML elements using an [**XPath**](https://www.w3schools.com/xml/xpath_intro.asp) query or [**CSS3 selector**](https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Selectors).
23
+
24
+ It's based on the great and simple [scraping tool](https://github.com/jeroenjanssens/data-science-at-the-command-line/blob/master/tools/scrape) written by [**Jeroen Janssens**](http://jeroenjanssens.com).
25
+
26
+ - [How does it work?](#how-does-it-work)
27
+ - [How to use it in Linux](#how-to-use-it-in-linux)
28
+ - [Note on building it](#note-on-building-it)
29
+
30
+
31
+
32
+ ## Installation
33
+
34
+ You can install scrape-cli using pip:
35
+
36
+ ### Using pipx (recommended for CLI tools)
37
+
38
+ ```bash
39
+ pipx install scrape-cli
40
+ ```
41
+
42
+ ### Using pip
43
+
44
+ ```bash
45
+ pip install scrape-cli
46
+ ```
47
+
48
+ Or install from source:
49
+
50
+ ```bash
51
+ git clone https://github.com/aborruso/scrape-cli
52
+ cd scrape-cli
53
+ pip install -e .
54
+ ```
55
+
56
+ ## Requirements
57
+ - Python >=3.6
58
+ - requests
59
+ - lxml
60
+ - cssselect
61
+
62
+ ## How does it work?
63
+
64
+ A CSS selector query like this
65
+
66
+ ```bash
67
+ curl -L 'https://en.wikipedia.org/wiki/List_of_sovereign_states' -s \
68
+ | scrape -be 'table.wikitable > tbody > tr > td > b > a'
69
+ ```
70
+
71
+ or an XPath query like this one:
72
+
73
+ ```bash
74
+ curl -L 'https://en.wikipedia.org/wiki/List_of_sovereign_states' -s \
75
+ | scrape -be '//table[contains(@class, "wikitable")]/tbody/tr/td/b/a'
76
+ ```
77
+
78
+ gives you back:
79
+
80
+ ```html
81
+ <html>
82
+ <head>
83
+ </head>
84
+ <body>
85
+ <a href="/wiki/Afghanistan" title="Afghanistan">
86
+ Afghanistan
87
+ </a>
88
+ <a href="/wiki/Albania" title="Albania">
89
+ Albania
90
+ </a>
91
+ <a href="/wiki/Algeria" title="Algeria">
92
+ Algeria
93
+ </a>
94
+ <a href="/wiki/Andorra" title="Andorra">
95
+ Andorra
96
+ </a>
97
+ <a href="/wiki/Angola" title="Angola">
98
+ Angola
99
+ </a>
100
+ <a href="/wiki/Antigua_and_Barbuda" title="Antigua and Barbuda">
101
+ Antigua and Barbuda
102
+ </a>
103
+ <a href="/wiki/Argentina" title="Argentina">
104
+ Argentina
105
+ </a>
106
+ <a href="/wiki/Armenia" title="Armenia">
107
+ Armenia
108
+ </a>
109
+ ...
110
+ ...
111
+ </body>
112
+ </html>
113
+ ```
114
+
115
+ Some notes on the commands:
116
+
117
+ - `-e` to set the query
118
+ - `-b` to add `<html>`, `<head>` and `<body>` tags to the HTML output.
119
+
120
+
121
+ ## Linux 64 bit precompiled binary
122
+
123
+ If you are looking for precompiled executables for Linux, please refer to the [Releases](https://github.com/aborruso/scrape-cli/releases) page on GitHub where you can find the latest precompiled binary file.
124
+
125
+ I have built the `scrape-linux-x86_64` precompiled binary, using [pyinstaller](https://www.pyinstaller.org/) and this command: `pyinstaller --onefile scrape.py`.<br>
126
+
127
+ Once built, it is a standalone executable that can be used in a 64-bit Linux environment.
128
+
129
+ ## License
130
+
131
+ [MIT](LICENSE)
@@ -1,9 +1,17 @@
1
+ # setup.py
1
2
  from setuptools import setup
3
+ from pathlib import Path
4
+
5
+ # Read the README
6
+ this_directory = Path(__file__).parent
7
+ long_description = (this_directory / "README.md").read_text(encoding="utf-8")
2
8
 
3
9
  setup(
4
10
  name="scrape-cli",
5
- version="1.1",
11
+ version="1.1.2",
6
12
  description="It's a command-line tool to extract HTML elements using an XPath query or CSS3 selector.",
13
+ long_description=long_description,
14
+ long_description_content_type="text/markdown", # Specify Markdown format
7
15
  author="Andrea Borruso",
8
16
  author_email="aborruso@gmail.com",
9
17
  url="https://github.com/aborruso/scrape-cli",
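
The `long_description` and `long_description_content_type` arguments added above are what give the 1.1.2 PKG-INFO its `Description-Content-Type` header and README body, both absent from the 1.1 metadata. As a rough way to check the result, PKG-INFO can be read with the standard-library `email` parser, since it uses RFC 822-style headers followed by a body. The sample text below is abbreviated from the 1.1.2 PKG-INFO in this diff, and `summarize` is a hypothetical helper name.

```python
from email import message_from_string

# Abbreviated copy of the 1.1.2 PKG-INFO; the README body is cut to its title.
PKG_INFO = """\
Metadata-Version: 2.1
Name: scrape-cli
Version: 1.1.2
Description-Content-Type: text/markdown
Requires-Dist: cssselect
Requires-Dist: lxml

# scrape cli
"""


def summarize(pkg_info_text: str) -> dict:
    """Extract a few fields from PKG-INFO text: headers plus the description body."""
    msg = message_from_string(pkg_info_text)
    return {
        "version": msg["Version"],
        "content_type": msg["Description-Content-Type"],
        "requires": msg.get_all("Requires-Dist") or [],
        "description": msg.get_payload(),
    }


info = summarize(PKG_INFO)
print(info["version"], info["content_type"], info["requires"])
```

Comparing `requires` with the README's Requirements section is a quick consistency check: the README lists `requests`, while the metadata declares only `cssselect` and `lxml`.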
scrape_cli-1.1/PKG-INFO DELETED
@@ -1,14 +0,0 @@
1
- Metadata-Version: 2.1
2
- Name: scrape-cli
3
- Version: 1.1
4
- Summary: It's a command-line tool to extract HTML elements using an XPath query or CSS3 selector.
5
- Home-page: https://github.com/aborruso/scrape-cli
6
- Author: Andrea Borruso
7
- Author-email: aborruso@gmail.com
8
- Classifier: Programming Language :: Python :: 3
9
- Classifier: License :: OSI Approved :: MIT License
10
- Classifier: Operating System :: OS Independent
11
- Requires-Python: >=3.6
12
- License-File: LICENSE
13
- Requires-Dist: cssselect
14
- Requires-Dist: lxml
@@ -1,14 +0,0 @@
1
- Metadata-Version: 2.1
2
- Name: scrape-cli
3
- Version: 1.1
4
- Summary: It's a command-line tool to extract HTML elements using an XPath query or CSS3 selector.
5
- Home-page: https://github.com/aborruso/scrape-cli
6
- Author: Andrea Borruso
7
- Author-email: aborruso@gmail.com
8
- Classifier: Programming Language :: Python :: 3
9
- Classifier: License :: OSI Approved :: MIT License
10
- Classifier: Operating System :: OS Independent
11
- Requires-Python: >=3.6
12
- License-File: LICENSE
13
- Requires-Dist: cssselect
14
- Requires-Dist: lxml
File without changes
File without changes