stealth-requests-0.1.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,110 @@
Metadata-Version: 2.1
Name: stealth-requests
Version: 0.1
Summary: Make HTTP requests exactly like a browser.
Home-page: https://github.com/jpjacobpadilla/Stealth-Requests
Author: Jacob Padilla
Author-email: Jacob Padilla <jp@jacobpadilla.com>
License: MIT
Project-URL: Homepage, https://github.com/jpjacobpadilla/Stealth-Requests
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: curl_cffi
Provides-Extra: parsers
Requires-Dist: lxml; extra == "parsers"
Requires-Dist: html2text; extra == "parsers"
Requires-Dist: beautifulsoup4; extra == "parsers"

<p align="center">
    <img src="https://github.com/jpjacobpadilla/Stealth-Requests/blob/7f83b67a0d62a932663d8216bad7d25971c90aaf/logo.png">
</p>

<h1 align="center">Stay Undetected While Scraping the Web.</h1>

### The All-In-One Solution to Web Scraping:
- Mimic the headers sent by a browser when visiting a website (GET requests)
- Automatically handle and update the `Referer` header and client hint headers
- Mask the TLS fingerprint of the request using the [curl_cffi](https://curl-cffi.readthedocs.io/en/latest/) package
- Automatically parse metadata from HTML responses, such as the page title, description, thumbnail, and author
- Easily get an [lxml](https://lxml.de/apidoc/lxml.html) tree or [BeautifulSoup](https://beautiful-soup-4.readthedocs.io/en/latest/) object from the HTTP response

### Sending Requests

This package mimics the API of the `requests` package, so it can be used in much the same way.

You can send one-off requests like this:

```python
import stealth_requests as requests

resp = requests.get(link)
```

Or you can use a `StealthSession` object, which keeps track of certain headers between requests, such as the `Referer` header:

```python
from stealth_requests import StealthSession

with StealthSession() as s:
    resp = s.get(link)
```

When sending a one-off request or creating a session, you can specify the type of browser that you want the request to mimic: either `safari` or `chrome` (the default).

### Sending Requests With Asyncio

This package also supports asyncio, with the same interface as the synchronous API:

```python
from stealth_requests import AsyncStealthSession

async with AsyncStealthSession(impersonate='chrome') as s:
    resp = await s.get(link)
```

Or, for a one-off request, you can do something like this:

```python
from curl_cffi import requests

resp = await requests.post(link, data=...)
```

### Getting Response Metadata

The response returned from this package is a `StealthResponse`, which has all of the same methods and attributes as a standard `requests` response, plus a few added features. One of these is automatic parsing of metadata from the HTML head. The metadata can be accessed from the `meta` attribute, which gives you access to the following fields (when they are available on the scraped page):

- title: str
- description: str
- thumbnail: str
- author: str
- keywords: tuple[str]
- twitter_handle: str
- robots: tuple[str]
- canonical: str

Here's an example of how to get the title of a page:

```python
import stealth_requests as requests

resp = requests.get(link)
print(resp.meta.title)
```

### Parsing Response

To make parsing HTML easier, I've also added two popular parsing packages to this project: `lxml` and `BeautifulSoup4`. To install these add-ons, install the `parsers` extra: `pip install stealth_requests[parsers]`.

To get an lxml tree, use `resp.tree()`; to get a BeautifulSoup object, use the `resp.soup()` method.

For simple parsing, I've also added the following convenience methods right to the `StealthResponse` object:

- `iterlinks`: Iterate through all links in an HTML response
- `itertext`: Iterate through all text in an HTML response
- `text_content`: Get all of the text content in an HTML response
- `xpath`: Use XPath expressions directly, without building your own lxml tree

### Getting the HTML Response in Markdown Format

Sometimes it's easier to work with a webpage in Markdown format instead of HTML. After sending a GET request, use the `resp.markdown()` method to get a Markdown version of the page.
@@ -0,0 +1,93 @@
<p align="center">
    <img src="https://github.com/jpjacobpadilla/Stealth-Requests/blob/7f83b67a0d62a932663d8216bad7d25971c90aaf/logo.png">
</p>

<h1 align="center">Stay Undetected While Scraping the Web.</h1>

### The All-In-One Solution to Web Scraping:
- Mimic the headers sent by a browser when visiting a website (GET requests)
- Automatically handle and update the `Referer` header and client hint headers
- Mask the TLS fingerprint of the request using the [curl_cffi](https://curl-cffi.readthedocs.io/en/latest/) package
- Automatically parse metadata from HTML responses, such as the page title, description, thumbnail, and author
- Easily get an [lxml](https://lxml.de/apidoc/lxml.html) tree or [BeautifulSoup](https://beautiful-soup-4.readthedocs.io/en/latest/) object from the HTTP response

### Sending Requests

This package mimics the API of the `requests` package, so it can be used in much the same way.

You can send one-off requests like this:

```python
import stealth_requests as requests

resp = requests.get(link)
```

Or you can use a `StealthSession` object, which keeps track of certain headers between requests, such as the `Referer` header:

```python
from stealth_requests import StealthSession

with StealthSession() as s:
    resp = s.get(link)
```

When sending a one-off request or creating a session, you can specify the type of browser that you want the request to mimic: either `safari` or `chrome` (the default).

### Sending Requests With Asyncio

This package also supports asyncio, with the same interface as the synchronous API:

```python
from stealth_requests import AsyncStealthSession

async with AsyncStealthSession(impersonate='chrome') as s:
    resp = await s.get(link)
```

Or, for a one-off request, you can do something like this:

```python
from curl_cffi import requests

resp = await requests.post(link, data=...)
```

### Getting Response Metadata

The response returned from this package is a `StealthResponse`, which has all of the same methods and attributes as a standard `requests` response, plus a few added features. One of these is automatic parsing of metadata from the HTML head. The metadata can be accessed from the `meta` attribute, which gives you access to the following fields (when they are available on the scraped page):

- title: str
- description: str
- thumbnail: str
- author: str
- keywords: tuple[str]
- twitter_handle: str
- robots: tuple[str]
- canonical: str

Here's an example of how to get the title of a page:

```python
import stealth_requests as requests

resp = requests.get(link)
print(resp.meta.title)
```

### Parsing Response

To make parsing HTML easier, I've also added two popular parsing packages to this project: `lxml` and `BeautifulSoup4`. To install these add-ons, install the `parsers` extra: `pip install stealth_requests[parsers]`.

To get an lxml tree, use `resp.tree()`; to get a BeautifulSoup object, use the `resp.soup()` method.

For simple parsing, I've also added the following convenience methods right to the `StealthResponse` object:

- `iterlinks`: Iterate through all links in an HTML response
- `itertext`: Iterate through all text in an HTML response
- `text_content`: Get all of the text content in an HTML response
- `xpath`: Use XPath expressions directly, without building your own lxml tree

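These helpers delegate to the underlying lxml tree. As a rough, self-contained illustration of the kind of values they yield (this sketch uses the standard library's `xml.etree.ElementTree` on a small well-formed snippet; the real methods use lxml, which handles arbitrary HTML):

```python
# Illustration only: StealthResponse delegates iterlinks/itertext/text_content
# to an lxml HtmlElement; ElementTree mimics the idea on well-formed markup.
import xml.etree.ElementTree as ET

html = """
<html><body>
  <p>Read the <a href="https://example.com/docs">docs</a> or the
  <a href="https://example.com/blog">blog</a>.</p>
</body></html>
"""

root = ET.fromstring(html)

# Roughly what iterlinks yields: every link target in the document.
links = [a.get("href") for a in root.iter("a")]

# Roughly what itertext/text_content yield: the document's text nodes.
text = "".join(root.itertext())

print(links)  # ['https://example.com/docs', 'https://example.com/blog']
```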
### Getting the HTML Response in Markdown Format

Sometimes it's easier to work with a webpage in Markdown format instead of HTML. After sending a GET request, use the `resp.markdown()` method to get a Markdown version of the page.
@@ -0,0 +1,28 @@
[build-system]
requires = ["setuptools>=42", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "stealth-requests"
version = "0.1"
description = "Make HTTP requests exactly like a browser."
readme = "README.md"
requires-python = ">=3.6"
license = { text = "MIT" }
authors = [
    { name = "Jacob Padilla", email = "jp@jacobpadilla.com" }
]
keywords = []
urls = { "Homepage" = "https://github.com/jpjacobpadilla/Stealth-Requests" }

dependencies = ["curl_cffi"]

[project.optional-dependencies]
parsers = [
    "lxml",
    "html2text",
    "beautifulsoup4"
]

[tool.setuptools.packages.find]
include = ["stealth_requests"]
@@ -0,0 +1,4 @@
[egg_info]
tag_build =
tag_date = 0
@@ -0,0 +1,27 @@
from setuptools import setup


with open('README.md', 'r', encoding='utf-8') as f:
    long_description = f.read()

setup(
    name='stealth-requests',
    description='Make HTTP requests exactly like a browser.',
    version='0.1',
    packages=['stealth_requests'],
    install_requires=['curl_cffi'],
    extras_require={
        'parsers': [
            'lxml',
            'html2text',
            'beautifulsoup4'
        ]
    },
    author='Jacob Padilla',
    author_email='jp@jacobpadilla.com',
    url='https://github.com/jpjacobpadilla/Stealth-Requests',
    license='MIT',
    long_description=long_description,
    long_description_content_type='text/markdown',
    keywords=''
)
@@ -0,0 +1,16 @@
from functools import partial

from .session import StealthSession, AsyncStealthSession
from .response import StealthResponse
from curl_cffi.requests import *


def request(method: str, url: str, *args, **kwargs) -> StealthResponse:
    # Each one-off request gets its own short-lived session.
    with StealthSession() as s:
        return s.request(method, url, *args, **kwargs)


head = partial(request, "HEAD")
get = partial(request, "GET")
post = partial(request, "POST")
put = partial(request, "PUT")
patch = partial(request, "PATCH")
delete = partial(request, "DELETE")
options = partial(request, "OPTIONS")
@@ -0,0 +1,108 @@
from __future__ import annotations

from dataclasses import dataclass
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from lxml.html import HtmlElement
    from bs4 import BeautifulSoup


@dataclass
class Metadata:
    title: str | None
    description: str | None
    thumbnail: str | None
    author: str | None
    keywords: tuple[str, ...] | None
    twitter_handle: str | None
    robots: tuple[str, ...] | None
    canonical: str | None


PARSER_IMPORT_SOLUTION = "Install it using 'pip install stealth-requests[parsers]'."


class StealthResponse:
    def __init__(self, resp):
        self._response = resp

        self._tree = None
        self._important_meta_tags = None

    def __getattr__(self, name):
        # Fall back to the wrapped curl_cffi response for everything else.
        return getattr(self._response, name)

    def _get_tree(self):
        try:
            from lxml import html
        except ImportError:
            raise ImportError(f'Lxml is not installed. {PARSER_IMPORT_SOLUTION}')

        self._tree = html.fromstring(self.content)
        return self._tree

    def tree(self) -> HtmlElement:
        return self._tree or self._get_tree()

    def soup(self, parser: str = 'html.parser') -> BeautifulSoup:
        try:
            from bs4 import BeautifulSoup
        except ImportError:
            raise ImportError(f'BeautifulSoup4 is not installed. {PARSER_IMPORT_SOLUTION}')

        return BeautifulSoup(self.content, parser)

    def markdown(self) -> str:
        try:
            import html2text
        except ImportError:
            raise ImportError(f'Html2text is required for markdown extraction. {PARSER_IMPORT_SOLUTION}')

        text_maker = html2text.HTML2Text()
        text_maker.ignore_links = True
        return text_maker.handle(str(self.soup()))

    def xpath(self, xp: str):
        return self.tree().xpath(xp)

    def iterlinks(self, *args, **kwargs):
        return self.tree().iterlinks(*args, **kwargs)

    def itertext(self, *args, **kwargs):
        return self.tree().itertext(*args, **kwargs)

    def text_content(self, *args, **kwargs):
        return self.tree().text_content(*args, **kwargs)

    @staticmethod
    def _format_meta_list(content: str) -> tuple[str, ...]:
        items = content.split(',')
        return tuple(item.strip() for item in items)

    def _set_important_meta_tags(self) -> Metadata:
        tree = self.tree()

        title = tree.xpath('//head/title/text()')
        description = tree.xpath('//head/meta[@name="description"]/@content')
        thumbnail = tree.xpath('//head/meta[@property="og:image"]/@content')
        author = tree.xpath('//head/meta[@name="author"]/@content')
        keywords = tree.xpath('//head/meta[@name="keywords"]/@content')
        twitter_handle = tree.xpath('//head/meta[@name="twitter:site"]/@content')
        robots = tree.xpath('//head/meta[@name="robots"]/@content')
        # A canonical <link> stores its URL in href, not content.
        canonical = tree.xpath('//head/link[@rel="canonical"]/@href')

        self._important_meta_tags = Metadata(
            title=title[0] if title else None,
            description=description[0] if description else None,
            thumbnail=thumbnail[0] if thumbnail else None,
            author=author[0] if author else None,
            keywords=self._format_meta_list(keywords[0]) if keywords else None,
            twitter_handle=twitter_handle[0] if twitter_handle else None,
            robots=self._format_meta_list(robots[0]) if robots else None,
            canonical=canonical[0] if canonical else None
        )
        return self._important_meta_tags

    @property
    def meta(self) -> Metadata:
        return self._important_meta_tags or self._set_important_meta_tags()
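The head-scanning pattern in `_set_important_meta_tags` can be exercised in isolation. A minimal stdlib sketch of the same idea on a well-formed snippet (illustration only; the class itself runs XPath against an lxml tree built from the real response):

```python
# Mirrors the extraction pattern of _set_important_meta_tags using the
# stdlib; note that a canonical <link> carries its URL in href.
import xml.etree.ElementTree as ET

head = """
<head>
  <title>Example Page</title>
  <meta name="description" content="A short description." />
  <meta name="keywords" content="scraping, http, headers" />
  <link rel="canonical" href="https://example.com/" />
</head>
"""

tree = ET.fromstring(head)

title = tree.findtext("title")
description = tree.find("meta[@name='description']").get("content")
keywords = tuple(k.strip() for k in
                 tree.find("meta[@name='keywords']").get("content").split(","))
canonical = tree.find("link[@rel='canonical']").get("href")

print(title)  # Example Page
```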
@@ -0,0 +1,154 @@
from __future__ import annotations

import os
import json
import random
from dataclasses import dataclass
from urllib.parse import urlparse
from collections import defaultdict
from functools import partialmethod

from .response import StealthResponse

from curl_cffi.requests.session import Session, AsyncSession


@dataclass
class ClientProfile:
    user_agent: str
    sec_ch_ua: str | None
    sec_ch_ua_mobile: str | None
    sec_ch_ua_platform: str | None


class BaseStealthSession:
    def __init__(
        self,
        client_profile: ClientProfile | None = None,
        impersonate: str = 'chrome124',
        **kwargs
    ):
        # Normalize user-facing aliases to curl_cffi's impersonation targets.
        if impersonate.lower() in ('chrome', 'chrome124'):
            impersonate = 'chrome124'
        elif impersonate.lower() in ('safari', 'safari_17_0', 'safari17'):
            impersonate = 'safari17_0'

        self.profile = client_profile or BaseStealthSession.create_profile(impersonate)
        self.last_request_url = defaultdict(lambda: 'https://www.google.com/')

        super().__init__(
            headers=self.initialize_chrome_headers()
                if impersonate == 'chrome124'
                else self.initialize_safari_headers(),
            impersonate=impersonate,
            **kwargs
        )

    def __enter__(self):
        return self

    def __exit__(self, *_):
        self.close()
        return False

    async def __aenter__(self):
        return self

    async def __aexit__(self, *_):
        self.close()
        return False

    @staticmethod
    def create_profile(impersonate: str) -> ClientProfile:
        file_path = os.path.join(os.path.dirname(__file__), 'profiles.json')

        with open(file_path, encoding='utf-8', mode='r') as file:
            user_agents = json.load(file)

        assert impersonate in user_agents, f'Please choose one of the supported profiles: {list(user_agents)}'

        # Client hint (sec-ch-ua) headers are only sent by Chrome.
        return ClientProfile(
            user_agent=random.choice(user_agents[impersonate]),
            sec_ch_ua='"Not A;Brand";v="99", "Chromium";v="124", "Google Chrome";v="124"' if impersonate == 'chrome124' else None,
            sec_ch_ua_mobile='?0' if impersonate == 'chrome124' else None,
            sec_ch_ua_platform='"macOS"' if impersonate == 'chrome124' else None
        )

    def initialize_chrome_headers(self) -> dict[str, str]:
        return {
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
            "Accept-Encoding": "gzip, deflate, br, zstd",
            "Accept-Language": "en-US,en;q=0.9",
            "Cache-Control": "no-cache",
            "Connection": "keep-alive",
            "Pragma": "no-cache",
            "Upgrade-Insecure-Requests": "1",
            "User-Agent": self.profile.user_agent,
            "Sec-Fetch-Dest": "document",
            "Sec-Fetch-Mode": "navigate",
            "Sec-Fetch-Site": "same-origin",
            "Sec-Fetch-User": "?1",
            "sec-ch-ua": self.profile.sec_ch_ua,
            "sec-ch-ua-mobile": self.profile.sec_ch_ua_mobile,
            "sec-ch-ua-platform": self.profile.sec_ch_ua_platform,
        }

    def initialize_safari_headers(self) -> dict[str, str]:
        return {
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Encoding": "gzip, deflate, br",
            "Accept-Language": "en-US,en;q=0.9",
            "Cache-Control": "no-cache",
            "Connection": "keep-alive",
            "Pragma": "no-cache",
            "Sec-Fetch-Dest": "document",
            "Sec-Fetch-Mode": "navigate",
            "Sec-Fetch-Site": "same-origin",
            "User-Agent": self.profile.user_agent
        }

    def get_dynamic_headers(self, url: str) -> dict[str, str]:
        parsed_url = urlparse(url)
        host = parsed_url.netloc

        headers = {
            "Host": host,
            "Referer": self.last_request_url[host]
        }

        self.last_request_url[host] = url
        return headers


class StealthSession(BaseStealthSession, Session):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def request(self, method: str, url: str, *args, **kwargs) -> StealthResponse:
        headers = self.get_dynamic_headers(url) | kwargs.pop('headers', {})
        resp = Session.request(self, method, url, *args, headers=headers, **kwargs)
        return StealthResponse(resp)

    head = partialmethod(request, "HEAD")
    get = partialmethod(request, "GET")
    post = partialmethod(request, "POST")
    put = partialmethod(request, "PUT")
    patch = partialmethod(request, "PATCH")
    delete = partialmethod(request, "DELETE")
    options = partialmethod(request, "OPTIONS")


class AsyncStealthSession(BaseStealthSession, AsyncSession):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    async def request(self, method: str, url: str, *args, **kwargs) -> StealthResponse:
        headers = self.get_dynamic_headers(url) | kwargs.pop('headers', {})
        resp = await AsyncSession.request(self, method, url, *args, headers=headers, **kwargs)
        return StealthResponse(resp)

    head = partialmethod(request, "HEAD")
    get = partialmethod(request, "GET")
    post = partialmethod(request, "POST")
    put = partialmethod(request, "PUT")
    patch = partialmethod(request, "PATCH")
    delete = partialmethod(request, "DELETE")
    options = partialmethod(request, "OPTIONS")
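The `Referer` bookkeeping in `get_dynamic_headers` is self-contained enough to exercise on its own. A minimal sketch with hypothetical names (not part of the package):

```python
# First request to a host gets a search-engine Referer; each later request
# to that host carries the previously requested URL for that host.
from collections import defaultdict
from urllib.parse import urlparse

last_request_url = defaultdict(lambda: "https://www.google.com/")

def dynamic_headers(url: str) -> dict:
    host = urlparse(url).netloc
    headers = {"Host": host, "Referer": last_request_url[host]}
    last_request_url[host] = url  # remembered for the next request
    return headers

first = dynamic_headers("https://example.com/a")
second = dynamic_headers("https://example.com/b")
print(first["Referer"], "->", second["Referer"])
# https://www.google.com/ -> https://example.com/a
```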
@@ -0,0 +1,110 @@
Metadata-Version: 2.1
Name: stealth-requests
Version: 0.1
Summary: Make HTTP requests exactly like a browser.
Home-page: https://github.com/jpjacobpadilla/Stealth-Requests
Author: Jacob Padilla
Author-email: Jacob Padilla <jp@jacobpadilla.com>
License: MIT
Project-URL: Homepage, https://github.com/jpjacobpadilla/Stealth-Requests
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: curl_cffi
Provides-Extra: parsers
Requires-Dist: lxml; extra == "parsers"
Requires-Dist: html2text; extra == "parsers"
Requires-Dist: beautifulsoup4; extra == "parsers"

<p align="center">
    <img src="https://github.com/jpjacobpadilla/Stealth-Requests/blob/7f83b67a0d62a932663d8216bad7d25971c90aaf/logo.png">
</p>

<h1 align="center">Stay Undetected While Scraping the Web.</h1>

### The All-In-One Solution to Web Scraping:
- Mimic the headers sent by a browser when visiting a website (GET requests)
- Automatically handle and update the `Referer` header and client hint headers
- Mask the TLS fingerprint of the request using the [curl_cffi](https://curl-cffi.readthedocs.io/en/latest/) package
- Automatically parse metadata from HTML responses, such as the page title, description, thumbnail, and author
- Easily get an [lxml](https://lxml.de/apidoc/lxml.html) tree or [BeautifulSoup](https://beautiful-soup-4.readthedocs.io/en/latest/) object from the HTTP response

### Sending Requests

This package mimics the API of the `requests` package, so it can be used in much the same way.

You can send one-off requests like this:

```python
import stealth_requests as requests

resp = requests.get(link)
```

Or you can use a `StealthSession` object, which keeps track of certain headers between requests, such as the `Referer` header:

```python
from stealth_requests import StealthSession

with StealthSession() as s:
    resp = s.get(link)
```

When sending a one-off request or creating a session, you can specify the type of browser that you want the request to mimic: either `safari` or `chrome` (the default).

### Sending Requests With Asyncio

This package also supports asyncio, with the same interface as the synchronous API:

```python
from stealth_requests import AsyncStealthSession

async with AsyncStealthSession(impersonate='chrome') as s:
    resp = await s.get(link)
```

Or, for a one-off request, you can do something like this:

```python
from curl_cffi import requests

resp = await requests.post(link, data=...)
```

### Getting Response Metadata

The response returned from this package is a `StealthResponse`, which has all of the same methods and attributes as a standard `requests` response, plus a few added features. One of these is automatic parsing of metadata from the HTML head. The metadata can be accessed from the `meta` attribute, which gives you access to the following fields (when they are available on the scraped page):

- title: str
- description: str
- thumbnail: str
- author: str
- keywords: tuple[str]
- twitter_handle: str
- robots: tuple[str]
- canonical: str

Here's an example of how to get the title of a page:

```python
import stealth_requests as requests

resp = requests.get(link)
print(resp.meta.title)
```

### Parsing Response

To make parsing HTML easier, I've also added two popular parsing packages to this project: `lxml` and `BeautifulSoup4`. To install these add-ons, install the `parsers` extra: `pip install stealth_requests[parsers]`.

To get an lxml tree, use `resp.tree()`; to get a BeautifulSoup object, use the `resp.soup()` method.

For simple parsing, I've also added the following convenience methods right to the `StealthResponse` object:

- `iterlinks`: Iterate through all links in an HTML response
- `itertext`: Iterate through all text in an HTML response
- `text_content`: Get all of the text content in an HTML response
- `xpath`: Use XPath expressions directly, without building your own lxml tree

### Getting the HTML Response in Markdown Format

Sometimes it's easier to work with a webpage in Markdown format instead of HTML. After sending a GET request, use the `resp.markdown()` method to get a Markdown version of the page.
@@ -0,0 +1,11 @@
README.md
pyproject.toml
setup.py
stealth_requests/__init__.py
stealth_requests/response.py
stealth_requests/session.py
stealth_requests.egg-info/PKG-INFO
stealth_requests.egg-info/SOURCES.txt
stealth_requests.egg-info/dependency_links.txt
stealth_requests.egg-info/requires.txt
stealth_requests.egg-info/top_level.txt
@@ -0,0 +1,6 @@
curl_cffi

[parsers]
lxml
html2text
beautifulsoup4
@@ -0,0 +1 @@
stealth_requests