fetchxml 0.1.0__tar.gz

fetchxml-0.1.0/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Saurabh Kumar Agarwal
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,356 @@
1
+ Metadata-Version: 2.4
2
+ Name: fetchxml
3
+ Version: 0.1.0
4
+ Summary: Lightweight session-based XML fetcher with browser-like behavior.
5
+ Author: Saurabh Kumar Agarwal
6
+ Project-URL: Homepage, https://github.com/yourusername/fetchxml
7
+ Requires-Python: >=3.8
8
+ Description-Content-Type: text/markdown
9
+ License-File: LICENSE
10
+ Requires-Dist: requests
11
+ Dynamic: license-file
12
+
19
+ # 📦 fetchxml
20
+
21
+ Lightweight, session-based XML fetcher for Python.
22
+
23
+ `fetchxml` provides a clean, reusable way to fetch XML from web endpoints that require:
24
+
25
+ * Browser-like headers
26
+ * Session initialization
27
+ * Cookie handling
28
+ * Referer validation
29
+ * Basic anti-bot protection handling
30
+
31
+ It abstracts session bootstrapping and retry logic into a simple interface.
32
+
33
+ ---
34
+
35
+ ## 🚀 Why fetchxml?
36
+
37
+ Some websites block simple HTTP requests and require:
38
+
39
+ * A session cookie
40
+ * Proper User-Agent
41
+ * Referer header
42
+ * Basic browser simulation
43
+
44
+ `fetchxml` handles this automatically.
45
+
46
+ Instead of writing repetitive session logic every time, you can do:
47
+
48
+ ```python
49
+ from fetchxml import FetchXML
50
+
51
+ client = FetchXML(base_url="https://example.com")
52
+ xml = client.fetch("https://example.com/file.xml")
53
+
54
+ print(xml)
55
+ ```
56
+
57
+ ---
58
+
59
+ # 📥 Installation
60
+
61
+ ## Option 1 – Install from local project
62
+
63
+ From the project root (where `pyproject.toml` is located):
64
+
65
+ ```bash
66
+ pip install .
67
+ ```
68
+
69
+ For development mode:
70
+
71
+ ```bash
72
+ pip install -e .
73
+ ```
74
+
75
+ ---
76
+
77
+ ## Option 2 – Install from PyPI (after publishing)
78
+
79
+ ```bash
80
+ pip install fetchxml
81
+ ```
82
+
83
+ ---
84
+
85
+ # 🧠 Basic Usage
86
+
87
+ ## 1️⃣ Simple XML Fetch
88
+
89
+ ```python
90
+ from fetchxml import FetchXML
91
+
92
+ client = FetchXML()
93
+
94
+ xml = client.fetch("https://example.com/sample.xml")
95
+
96
+ print(xml[:500])
97
+ ```
98
+
99
+ Use this when the target site does NOT require session bootstrap.
100
+
101
+ ---
102
+
103
+ ## 2️⃣ Fetch XML With Session Bootstrap
104
+
105
+ Some sites require hitting their homepage first to establish cookies.
106
+
107
+ ```python
108
+ from fetchxml import FetchXML
109
+
110
+ client = FetchXML(base_url="https://example.com")
111
+
112
+ xml = client.fetch("https://example.com/sample.xml")
113
+
114
+ print(xml)
115
+ ```
116
+
117
+ `base_url` triggers automatic session initialization.
118
+
119
+ ---
120
+
121
+ ## 3️⃣ Fetch With Custom Referer
122
+
123
+ If a specific referer header is required:
124
+
125
+ ```python
126
+ xml = client.fetch(
+     "https://example.com/sample.xml",
+     referer="https://example.com/dashboard"
+ )
130
+ ```
131
+
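Judging by `fetchxml/client.py` in this release, an explicit `referer` takes precedence, and when it is omitted the client appears to fall back to `base_url` if one was configured. The sketch below reproduces only that selection logic as a standalone function; `choose_headers` is a hypothetical name used for illustration and is not part of the package API.

```python
from typing import Dict, Optional


def choose_headers(base_url: Optional[str] = None,
                   referer: Optional[str] = None) -> Dict[str, str]:
    """Mirror how the per-request headers appear to be assembled."""
    headers = {"Accept": "application/xml,text/xml,*/*;q=0.1"}
    if referer:
        headers["Referer"] = referer      # explicit value wins
    elif base_url:
        headers["Referer"] = base_url     # fallback: the bootstrap URL
    return headers


explicit = choose_headers("https://example.com", "https://example.com/dashboard")
fallback = choose_headers("https://example.com")
neither = choose_headers()
print(explicit["Referer"], fallback["Referer"], "Referer" in neither)
```

With neither argument set, no `Referer` header is added at all.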
132
+ ---
133
+
134
+ # ⚙️ Configuration Options
135
+
136
+ When initializing:
137
+
138
+ ```python
139
+ client = FetchXML(
+     base_url="https://example.com",  # optional
+     delay=0.5,                       # delay between requests (seconds)
+     timeout=15                       # request timeout (seconds)
+ )
144
+ ```
145
+
146
+ ### Parameters
147
+
148
+ | Parameter | Description |
149
+ | ---------- | --------------------------------------------- |
150
+ | `base_url` | URL used to bootstrap session cookies |
151
+ | `delay` | Sleep time before each request (default 0.5s) |
152
+ | `timeout` | Request timeout in seconds (default 15s) |
153
+
154
+ ---
155
+
156
+ # 🔁 Automatic Retry Behavior
157
+
158
+ If a request returns **HTTP 403**, `fetchxml` will:
159
+
160
+ 1. Attempt to re-bootstrap session (if `base_url` provided)
161
+ 2. Retry the request once
162
+
163
+ If the retry also fails, or no `base_url` was configured, `FetchXMLError` is raised.
164
+
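The retry flow described above can be sketched without any network access. This is a minimal illustration under stated assumptions, not the package's code: `FetchError` stands in for `FetchXMLError`, and `fetch_with_refresh` plus the fake transport are hypothetical names.

```python
from typing import Callable, Optional, Tuple


class FetchError(Exception):
    """Stand-in for FetchXMLError so the sketch runs without the package."""


def fetch_with_refresh(
    get: Callable[[str], Tuple[int, str]],
    bootstrap: Optional[Callable[[], None]],
    url: str,
) -> str:
    """Fetch url; on HTTP 403, refresh the session and retry exactly once."""
    status, body = get(url)
    if status == 403 and bootstrap is not None:
        bootstrap()                 # re-establish session cookies
        status, body = get(url)     # single retry, no loop
    if status != 200:
        raise FetchError(f"Failed to fetch XML. Status {status}")
    return body


# Fake transport: the first call is rejected, the second succeeds.
responses = iter([(403, ""), (200, "<ok/>")])
bootstrap_calls = []
result = fetch_with_refresh(
    lambda url: next(responses),
    lambda: bootstrap_calls.append(1),
    "https://example.com/sample.xml",
)
print(result, len(bootstrap_calls))
```

Passing `bootstrap=None` models a client created without `base_url`: the 403 is not retried and surfaces as an error straight away.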
165
+ ---
166
+
167
+ # ❗ Exception Handling
168
+
169
+ All errors raise:
170
+
171
+ ```python
172
+ FetchXMLError
173
+ ```
174
+
175
+ Import it like:
176
+
177
+ ```python
178
+ from fetchxml import FetchXMLError
179
+ ```
180
+
181
+ Example:
182
+
183
+ ```python
184
+ from fetchxml import FetchXML, FetchXMLError
185
+
186
+ client = FetchXML(base_url="https://example.com")
187
+
188
+ try:
+     xml = client.fetch("https://example.com/sample.xml")
+     print(xml)
+ except FetchXMLError as e:
+     print("Failed to fetch XML:", str(e))
193
+ ```
194
+
195
+ ---
196
+
197
+ # 🔍 What Triggers FetchXMLError?
198
+
199
+ * Session bootstrap failure
200
+ * Non-200 HTTP response
201
+ * Timeout
202
+ * Connection error
203
+ * Persistent 403 after retry
204
+
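One common way to make every failure in a list like this surface as a single exception type is to catch transport errors at the call site and re-raise them. The sketch below shows that pattern only; it uses Python's built-in `TimeoutError` and `ConnectionError` as stand-ins for the corresponding `requests` exceptions, and `FetchError`, `guarded`, and `flaky` are hypothetical names that are not part of `fetchxml`.

```python
class FetchError(Exception):
    """Stand-in for FetchXMLError so the sketch runs without the package."""


def guarded(call, *args):
    """Run call(*args) and re-raise transport failures as FetchError."""
    try:
        return call(*args)
    except (TimeoutError, ConnectionError) as exc:
        # "from exc" keeps the original error reachable via __cause__.
        raise FetchError(f"Request failed: {exc}") from exc


def flaky(url):
    # Hypothetical transport that always times out.
    raise TimeoutError(f"timed out fetching {url}")


try:
    guarded(flaky, "https://example.com/sample.xml")
except FetchError as err:
    caught = err
    print("Failed to fetch XML:", err)
```

Chaining with `from` means a caller who needs the low-level detail can still inspect `caught.__cause__`.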
205
+ ---
206
+
207
+ # 🛡️ Rate Limiting
208
+
209
+ `delay` ensures a pause before each request:
210
+
211
+ ```python
212
+ client = FetchXML(delay=1.5)
213
+ ```
214
+
215
+ Recommended for:
216
+
217
+ * Bulk XML downloads
218
+ * Respecting server load
219
+ * Avoiding bot detection
220
+
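Because the pause is applied before each request, a batch of N fetches spends at least N × `delay` seconds sleeping; three files at `delay=1.5` add roughly 4.5 seconds. The sketch below illustrates that pacing with a stand-in fetch function. `fetch_all` is a hypothetical helper, not something the package provides.

```python
import time


def fetch_all(urls, fetch, delay=0.5):
    """Call fetch(url) for each URL, sleeping `delay` seconds before each call."""
    results = []
    for url in urls:
        time.sleep(delay)           # the pause happens before every request
        results.append(fetch(url))
    return results


urls = ["https://example.com/file1.xml", "https://example.com/file2.xml"]
delay = 0.05                        # kept tiny so the demo finishes quickly
start = time.monotonic()
bodies = fetch_all(urls, lambda u: f"<doc src='{u}'/>", delay=delay)
elapsed = time.monotonic() - start
print(f"{len(bodies)} fetches, about {elapsed:.2f}s elapsed")
```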
221
+ ---
222
+
223
+ # 📁 Example: Download and Save XML
224
+
225
+ ```python
226
+ from fetchxml import FetchXML
227
+
228
+ client = FetchXML(base_url="https://example.com")
229
+
230
+ url = "https://example.com/sample.xml"
231
+ xml = client.fetch(url)
232
+
233
+ with open("sample.xml", "w", encoding="utf-8") as f:
+     f.write(xml)
235
+
236
+ print("Saved successfully.")
237
+ ```
238
+
239
+ ---
240
+
241
+ # 🔧 Advanced: Reusing One Client for Multiple Files
242
+
243
+ Best practice for bulk downloads:
244
+
245
+ ```python
246
+ from fetchxml import FetchXML
247
+
248
+ client = FetchXML(base_url="https://example.com")
249
+
250
+ urls = [
+     "https://example.com/file1.xml",
+     "https://example.com/file2.xml",
+     "https://example.com/file3.xml"
+ ]
+
+ for url in urls:
+     xml = client.fetch(url)
+     print(f"Downloaded {url}")
259
+ ```
260
+
261
+ This reuses the same session and cookies.
262
+
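In a bulk run it is often useful to keep going when one URL fails rather than abort the whole batch. The sketch below shows one possible shape for that, assuming a client object with a `fetch(url)` method that raises a single error type. `FakeClient`, `FetchError`, and `download_all` are hypothetical stand-ins so the example runs offline.

```python
class FetchError(Exception):
    """Stand-in for FetchXMLError so the sketch runs without the package."""


class FakeClient:
    """Hypothetical client: fails for URLs containing 'missing'."""

    def fetch(self, url):
        if "missing" in url:
            raise FetchError("Failed to fetch XML. Status 404")
        return "<ok/>"


def download_all(client, urls):
    """Fetch every URL with one client; collect bodies and failures separately."""
    bodies, failures = {}, {}
    for url in urls:
        try:
            bodies[url] = client.fetch(url)
        except FetchError as exc:
            failures[url] = str(exc)    # record the error and continue
    return bodies, failures


bodies, failures = download_all(
    FakeClient(),
    ["https://example.com/file1.xml", "https://example.com/missing.xml"],
)
print(sorted(bodies), failures)
```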
263
+ ---
264
+
265
+ # 🧪 Testing Connectivity
266
+
267
+ You can quickly test a URL:
268
+
269
+ ```python
270
+ from fetchxml import FetchXML
271
+
272
+ client = FetchXML()
273
+
274
+ try:
+     xml = client.fetch("https://example.com/sample.xml")
+     print("Success")
+ except Exception as e:
+     print("Error:", e)
279
+ ```
280
+
281
+ ---
282
+
283
+ # 🏗️ Project Structure
284
+
285
+ ```
286
+ fetchxml/
+ │
+ ├── fetchxml/
+ │   ├── __init__.py
+ │   ├── client.py
+ │   ├── exceptions.py
+ │
+ ├── pyproject.toml
+ ├── README.md
+ └── LICENSE
296
+ ```
297
+
298
+ ---
299
+
300
+ # 📜 License
301
+
302
+ MIT License
303
+
304
+ See `LICENSE` file for full text.
305
+
306
+ ---
307
+
308
+ # ⚠️ Disclaimer
309
+
310
+ `fetchxml` does not bypass authentication systems or CAPTCHAs.
311
+
312
+ It simply mimics normal browser session behavior using:
313
+
314
+ * Session cookies
315
+ * Proper headers
316
+ * Referer validation
317
+
318
+ Users are responsible for complying with website terms of service.
319
+
320
+ ---
321
+
322
+ # 💡 When To Use fetchxml
323
+
324
+ Use it when:
325
+
326
+ * A site blocks naive `requests.get()`
327
+ * Cookies must be initialized first
328
+ * Referer headers are required
329
+ * You want clean, reusable XML fetching logic
330
+
331
+ Do NOT use it for:
332
+
333
+ * Circumventing login walls
334
+ * Bypassing paywalls
335
+ * Evading legal restrictions
336
+
337
+ ---
338
+
339
+ # 🧩 Roadmap (Optional Future Enhancements)
340
+
341
+ * Async version
342
+ * Disk caching layer
343
+ * Proxy support
344
+ * Built-in XML validation
345
+ * Exponential backoff strategy
346
+ * Logging integration
347
+
348
+ ---
349
+
350
+ # 👤 Author
351
+
352
+ Saurabh Kumar Agarwal
353
+ 2026
354
+
355
+ ---
356
+
@@ -0,0 +1,344 @@
7
+ # 📦 fetchxml
8
+
9
+ Lightweight, session-based XML fetcher for Python.
10
+
11
+ `fetchxml` provides a clean, reusable way to fetch XML from web endpoints that require:
12
+
13
+ * Browser-like headers
14
+ * Session initialization
15
+ * Cookie handling
16
+ * Referer validation
17
+ * Basic anti-bot protection handling
18
+
19
+ It abstracts session bootstrapping and retry logic into a simple interface.
20
+
21
+ ---
22
+
23
+ ## 🚀 Why fetchxml?
24
+
25
+ Some websites block simple HTTP requests and require:
26
+
27
+ * A session cookie
28
+ * Proper User-Agent
29
+ * Referer header
30
+ * Basic browser simulation
31
+
32
+ `fetchxml` handles this automatically.
33
+
34
+ Instead of writing repetitive session logic every time, you can do:
35
+
36
+ ```python
37
+ from fetchxml import FetchXML
38
+
39
+ client = FetchXML(base_url="https://example.com")
40
+ xml = client.fetch("https://example.com/file.xml")
41
+
42
+ print(xml)
43
+ ```
44
+
45
+ ---
46
+
47
+ # 📥 Installation
48
+
49
+ ## Option 1 – Install from local project
50
+
51
+ From the project root (where `pyproject.toml` is located):
52
+
53
+ ```bash
54
+ pip install .
55
+ ```
56
+
57
+ For development mode:
58
+
59
+ ```bash
60
+ pip install -e .
61
+ ```
62
+
63
+ ---
64
+
65
+ ## Option 2 – Install from PyPI (after publishing)
66
+
67
+ ```bash
68
+ pip install fetchxml
69
+ ```
70
+
71
+ ---
72
+
73
+ # 🧠 Basic Usage
74
+
75
+ ## 1️⃣ Simple XML Fetch
76
+
77
+ ```python
78
+ from fetchxml import FetchXML
79
+
80
+ client = FetchXML()
81
+
82
+ xml = client.fetch("https://example.com/sample.xml")
83
+
84
+ print(xml[:500])
85
+ ```
86
+
87
+ Use this when the target site does NOT require session bootstrap.
88
+
89
+ ---
90
+
91
+ ## 2️⃣ Fetch XML With Session Bootstrap
92
+
93
+ Some sites require hitting their homepage first to establish cookies.
94
+
95
+ ```python
96
+ from fetchxml import FetchXML
97
+
98
+ client = FetchXML(base_url="https://example.com")
99
+
100
+ xml = client.fetch("https://example.com/sample.xml")
101
+
102
+ print(xml)
103
+ ```
104
+
105
+ `base_url` triggers automatic session initialization.
106
+
107
+ ---
108
+
109
+ ## 3️⃣ Fetch With Custom Referer
110
+
111
+ If a specific referer header is required:
112
+
113
+ ```python
114
+ xml = client.fetch(
+     "https://example.com/sample.xml",
+     referer="https://example.com/dashboard"
+ )
118
+ ```
119
+
120
+ ---
121
+
122
+ # ⚙️ Configuration Options
123
+
124
+ When initializing:
125
+
126
+ ```python
127
+ client = FetchXML(
+     base_url="https://example.com",  # optional
+     delay=0.5,                       # delay between requests (seconds)
+     timeout=15                       # request timeout (seconds)
+ )
132
+ ```
133
+
134
+ ### Parameters
135
+
136
+ | Parameter | Description |
137
+ | ---------- | --------------------------------------------- |
138
+ | `base_url` | URL used to bootstrap session cookies |
139
+ | `delay` | Sleep time before each request (default 0.5s) |
140
+ | `timeout` | Request timeout in seconds (default 15s) |
141
+
142
+ ---
143
+
144
+ # 🔁 Automatic Retry Behavior
145
+
146
+ If a request returns **HTTP 403**, `fetchxml` will:
147
+
148
+ 1. Attempt to re-bootstrap session (if `base_url` provided)
149
+ 2. Retry the request once
150
+
151
+ If the retry also fails, or no `base_url` was configured, `FetchXMLError` is raised.
152
+
153
+ ---
154
+
155
+ # ❗ Exception Handling
156
+
157
+ All errors raise:
158
+
159
+ ```python
160
+ FetchXMLError
161
+ ```
162
+
163
+ Import it like:
164
+
165
+ ```python
166
+ from fetchxml import FetchXMLError
167
+ ```
168
+
169
+ Example:
170
+
171
+ ```python
172
+ from fetchxml import FetchXML, FetchXMLError
173
+
174
+ client = FetchXML(base_url="https://example.com")
175
+
176
+ try:
+     xml = client.fetch("https://example.com/sample.xml")
+     print(xml)
+ except FetchXMLError as e:
+     print("Failed to fetch XML:", str(e))
181
+ ```
182
+
183
+ ---
184
+
185
+ # 🔍 What Triggers FetchXMLError?
186
+
187
+ * Session bootstrap failure
188
+ * Non-200 HTTP response
189
+ * Timeout
190
+ * Connection error
191
+ * Persistent 403 after retry
192
+
193
+ ---
194
+
195
+ # 🛡️ Rate Limiting
196
+
197
+ `delay` ensures a pause before each request:
198
+
199
+ ```python
200
+ client = FetchXML(delay=1.5)
201
+ ```
202
+
203
+ Recommended for:
204
+
205
+ * Bulk XML downloads
206
+ * Respecting server load
207
+ * Avoiding bot detection
208
+
209
+ ---
210
+
211
+ # 📁 Example: Download and Save XML
212
+
213
+ ```python
214
+ from fetchxml import FetchXML
215
+
216
+ client = FetchXML(base_url="https://example.com")
217
+
218
+ url = "https://example.com/sample.xml"
219
+ xml = client.fetch(url)
220
+
221
+ with open("sample.xml", "w", encoding="utf-8") as f:
+     f.write(xml)
223
+
224
+ print("Saved successfully.")
225
+ ```
226
+
227
+ ---
228
+
229
+ # 🔧 Advanced: Reusing One Client for Multiple Files
230
+
231
+ Best practice for bulk downloads:
232
+
233
+ ```python
234
+ from fetchxml import FetchXML
235
+
236
+ client = FetchXML(base_url="https://example.com")
237
+
238
+ urls = [
+     "https://example.com/file1.xml",
+     "https://example.com/file2.xml",
+     "https://example.com/file3.xml"
+ ]
+
+ for url in urls:
+     xml = client.fetch(url)
+     print(f"Downloaded {url}")
247
+ ```
248
+
249
+ This reuses the same session and cookies.
250
+
251
+ ---
252
+
253
+ # 🧪 Testing Connectivity
254
+
255
+ You can quickly test a URL:
256
+
257
+ ```python
258
+ from fetchxml import FetchXML
259
+
260
+ client = FetchXML()
261
+
262
+ try:
+     xml = client.fetch("https://example.com/sample.xml")
+     print("Success")
+ except Exception as e:
+     print("Error:", e)
267
+ ```
268
+
269
+ ---
270
+
271
+ # 🏗️ Project Structure
272
+
273
+ ```
274
+ fetchxml/
+ │
+ ├── fetchxml/
+ │   ├── __init__.py
+ │   ├── client.py
+ │   ├── exceptions.py
+ │
+ ├── pyproject.toml
+ ├── README.md
+ └── LICENSE
284
+ ```
285
+
286
+ ---
287
+
288
+ # 📜 License
289
+
290
+ MIT License
291
+
292
+ See `LICENSE` file for full text.
293
+
294
+ ---
295
+
296
+ # ⚠️ Disclaimer
297
+
298
+ `fetchxml` does not bypass authentication systems or CAPTCHAs.
299
+
300
+ It simply mimics normal browser session behavior using:
301
+
302
+ * Session cookies
303
+ * Proper headers
304
+ * Referer validation
305
+
306
+ Users are responsible for complying with website terms of service.
307
+
308
+ ---
309
+
310
+ # 💡 When To Use fetchxml
311
+
312
+ Use it when:
313
+
314
+ * A site blocks naive `requests.get()`
315
+ * Cookies must be initialized first
316
+ * Referer headers are required
317
+ * You want clean, reusable XML fetching logic
318
+
319
+ Do NOT use it for:
320
+
321
+ * Circumventing login walls
322
+ * Bypassing paywalls
323
+ * Evading legal restrictions
324
+
325
+ ---
326
+
327
+ # 🧩 Roadmap (Optional Future Enhancements)
328
+
329
+ * Async version
330
+ * Disk caching layer
331
+ * Proxy support
332
+ * Built-in XML validation
333
+ * Exponential backoff strategy
334
+ * Logging integration
335
+
336
+ ---
337
+
338
+ # 👤 Author
339
+
340
+ Saurabh Kumar Agarwal
341
+ 2026
342
+
343
+ ---
344
+
@@ -0,0 +1,4 @@
1
+ from .client import FetchXML
2
+ from .exceptions import FetchXMLError
3
+
4
+ __all__ = ["FetchXML", "FetchXMLError"]
@@ -0,0 +1,76 @@
+ import time
+
+ import requests
+
+ from .exceptions import FetchXMLError
+
+
+ class FetchXML:
+     """Session-based XML fetcher that sends browser-like headers."""
+
+     def __init__(self, base_url=None, delay=0.5, timeout=15):
+         self.base_url = base_url
+         self.delay = delay
+         self.timeout = timeout
+         self.session = requests.Session()
+         self._init_headers()
+         if base_url:
+             self._bootstrap_session()
+
+     def _init_headers(self):
+         # Default headers sent with every request on this session.
+         self.session.headers.update({
+             "User-Agent": (
+                 "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
+                 "AppleWebKit/537.36 (KHTML, like Gecko) "
+                 "Chrome/122.0.0.0 Safari/537.36"
+             ),
+             "Accept-Language": "en-US,en;q=0.9",
+             "Connection": "keep-alive",
+         })
+
+     def _bootstrap_session(self):
+         # Visit base_url once so the server can set its session cookies.
+         try:
+             r = self.session.get(self.base_url, timeout=self.timeout)
+         except requests.RequestException as e:
+             raise FetchXMLError(f"Session bootstrap failed: {e}") from e
+         if r.status_code != 200:
+             raise FetchXMLError(
+                 f"Failed to initialize session. Status {r.status_code}"
+             )
+
+     def _get(self, url, headers):
+         # Wrap transport errors (timeouts, connection failures) so that
+         # callers only need to catch FetchXMLError.
+         try:
+             return self.session.get(url, headers=headers, timeout=self.timeout)
+         except requests.RequestException as e:
+             raise FetchXMLError(f"Request failed: {e}") from e
+
+     def fetch(self, url, referer=None):
+         headers = {
+             "Accept": "application/xml,text/xml,*/*;q=0.1",
+         }
+
+         # An explicit referer wins; otherwise fall back to base_url.
+         if referer:
+             headers["Referer"] = referer
+         elif self.base_url:
+             headers["Referer"] = self.base_url
+
+         time.sleep(self.delay)
+
+         response = self._get(url, headers)
+
+         # On 403, refresh the session and retry once (requires base_url).
+         if response.status_code == 403 and self.base_url:
+             self._bootstrap_session()
+             response = self._get(url, headers)
+
+         if response.status_code != 200:
+             raise FetchXMLError(
+                 f"Failed to fetch XML. Status {response.status_code}"
+             )
+
+         return response.text
@@ -0,0 +1,3 @@
+ class FetchXMLError(Exception):
+     """Custom exception for fetchxml errors."""
+     pass
@@ -0,0 +1,11 @@
1
+ LICENSE
2
+ README.md
3
+ pyproject.toml
4
+ fetchxml/__init__.py
5
+ fetchxml/client.py
6
+ fetchxml/exceptions.py
7
+ fetchxml.egg-info/PKG-INFO
8
+ fetchxml.egg-info/SOURCES.txt
9
+ fetchxml.egg-info/dependency_links.txt
10
+ fetchxml.egg-info/requires.txt
11
+ fetchxml.egg-info/top_level.txt
@@ -0,0 +1 @@
1
+ requests
@@ -0,0 +1 @@
1
+ fetchxml
@@ -0,0 +1,19 @@
1
+ [build-system]
2
+ requires = ["setuptools>=61.0"]
3
+ build-backend = "setuptools.build_meta"
4
+
5
+ [project]
6
+ name = "fetchxml"
7
+ version = "0.1.0"
8
+ description = "Lightweight session-based XML fetcher with browser-like behavior."
9
+ authors = [
10
+ { name="Saurabh Kumar Agarwal" }
11
+ ]
12
+ readme = "README.md"
13
+ requires-python = ">=3.8"
14
+ dependencies = [
15
+ "requests"
16
+ ]
17
+
18
+ [project.urls]
19
+ "Homepage" = "https://github.com/yourusername/fetchxml"
@@ -0,0 +1,4 @@
1
+ [egg_info]
2
+ tag_build =
3
+ tag_date = 0
4
+