ilpost-api-wrapper 0.3.0__tar.gz → 0.4.0__tar.gz

@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: ilpost-api-wrapper
- Version: 0.3.0
+ Version: 0.4.0
  Summary: Python wrapper for Il Post newspaper API
  Author: Antonio Girasella
  License-Expression: MIT
@@ -72,6 +72,9 @@ for doc in result.docs:
  | `content_type` | `ContentType` | `None` | Filter by content type |
  | `category` | `str` | `None` | Editorial category (articles only) |
  | `date_range` | `DateRange` | `None` | Publication date filter |
+ | `fetch_content` | `bool` | `False` | Scrape and return full article text for each article result (see [`Document.content`](#document)) |
+
+ > `fetch_content` is available on `search()`, `search_articles()`, and `paginate()`. It has no effect on podcast or newsletter results — `doc.content` will be `None` for those types.
 
  #### Enums
 
@@ -127,6 +130,7 @@ for doc in result.docs:
  | `category` | `str \| None` | Editorial category (articles only) |
  | `post_tag_text` | `list[str]` | Tags (articles only) |
  | `derived_info` | `dict` | Extra data: episode or newsletter metadata |
+ | `content` | `str \| None` | Full article body text, populated when `fetch_content=True` (articles only) |
  | `is_article` | `bool` | Convenience property |
  | `is_podcast` | `bool` | Convenience property |
  | `is_newsletter` | `bool` | Convenience property |
@@ -162,6 +166,17 @@ for group in result.filters:
      print(f"{group.label}:")
      for opt in group.options:
          print(f" {opt.label}: {opt.doc_count}")
+
+ # Fetch full article text alongside search results
+ result = client.search_articles("economia", hits=5, fetch_content=True)
+ for doc in result.docs:
+     print(doc.title)
+     if doc.content:
+         print(doc.content[:300])
+
+ # Use the scraper directly (e.g. for parallel fetching)
+ from ilpost import fetch_article_content
+ text = fetch_article_content("https://www.ilpost.it/2026/04/02/...")
  ```
 
  ## CLI
  ## CLI
@@ -173,10 +188,12 @@ usage: ilpost-search [-h] [--type {articles,podcasts,newsletters}]
                       [--sort {relevance,newest,oldest}]
                       [--date {all,year,month}] [--category CATEGORY]
                       [--page PAGE] [--hits HITS] [--all-pages]
-                      [--max-pages N]
+                      [--max-pages N] [--fetch-content]
                       query
  ```
 
+ Each result is printed with labelled fields: `type`, `category`, `title`, `link`, `date`, `score`, `summary`, and either `content` (when `--fetch-content` is used) or `excerpt` (search highlight).
+
  ```bash
  # Basic search
  ilpost-search berlusconi
@@ -192,6 +209,9 @@ ilpost-search sicilia --sort oldest --hits 5 --page 2
 
  # Fetch all pages of newsletter results (up to 3 pages)
  ilpost-search economia --type newsletters --all-pages --max-pages 3
+
+ # Fetch full article text for each result
+ ilpost-search bondi --type articles --hits 3 --fetch-content
  ```
 
  ## Notes
@@ -1,176 +1,196 @@
- # ilpost-api-wrapper
-
- A Python wrapper for the [Il Post](https://www.ilpost.it) public search API.
- Searches articles, podcast episodes, and newsletters — no authentication required.
-
- ## Installation
-
- ```bash
- pip install ilpost-api-wrapper
- ```
-
- Requires Python 3.9+. No third-party dependencies.
-
- ## Quick start
-
- ```python
- from ilpost import IlPostClient, SortOrder, ContentType, DateRange
-
- client = IlPostClient()
-
- result = client.search("berlusconi")
- for doc in result.docs:
-     print(doc.title, doc.link)
- ```
-
- ## API reference
-
- ### `IlPostClient(timeout=10)`
-
- | Method | Description |
- |--------|-------------|
- | `search(query, ...)` | General search across all content types |
- | `search_articles(query, ...)` | Articles only |
- | `search_podcasts(query, ...)` | Podcast episodes only |
- | `search_newsletters(query, ...)` | Newsletter issues only |
- | `paginate(query, ...)` | Generator that yields one `SearchResult` per page |
-
- #### Common parameters
-
- | Parameter | Type | Default | Description |
- |-----------|------|---------|-------------|
- | `query` | `str` | — | Search term |
- | `page` | `int` | `1` | Page number (1-based) |
- | `hits` | `int` | `10` | Results per page |
- | `sort` | `SortOrder` | `RELEVANCE` | Sort order |
- | `content_type` | `ContentType` | `None` | Filter by content type |
- | `category` | `str` | `None` | Editorial category (articles only) |
- | `date_range` | `DateRange` | `None` | Publication date filter |
-
- #### Enums
-
- **`SortOrder`**
- | Value | Description |
- |-------|-------------|
- | `RELEVANCE` | Sort by relevance score (default) |
- | `NEWEST` | Most recent first |
- | `OLDEST` | Oldest first |
-
- **`ContentType`**
- | Value | Description |
- |-------|-------------|
- | `ARTICLES` | Articles and news posts |
- | `PODCASTS` | Podcast episodes |
- | `NEWSLETTERS` | Newsletter issues |
-
- **`DateRange`**
- | Value | Description |
- |-------|-------------|
- | `ALL_TIME` | Entire archive (default) |
- | `PAST_YEAR` | Past 12 months |
- | `PAST_30_DAYS` | Past 30 days |
-
- ### `SearchResult`
-
- | Attribute | Type | Description |
- |-----------|------|-------------|
- | `total` | `int` | Total number of matching results |
- | `docs` | `list[Document]` | Results for this page |
- | `filters` | `list[FilterGroup]` | Available filters with counts |
- | `sort` | `str` | Active sort value |
- | `hits` | `int` | Page size |
- | `page` | `int` | Current page number |
- | `total_pages` | `int` | Total number of pages |
- | `has_next_page` | `bool` | Whether a next page exists |
- | `has_prev_page` | `bool` | Whether a previous page exists |
-
- ### `Document`
-
- | Attribute | Type | Description |
- |-----------|------|-------------|
- | `id` | `int` | Unique content identifier |
- | `type` | `str` | `"post"`, `"episodes"`, or `"newsletter"` |
- | `title` | `str` | Content title |
- | `link` | `str` | URL to the content page |
- | `timestamp` | `str` | Publication date (ISO 8601, Italian local time) |
- | `summary` | `str` | Short excerpt |
- | `image` | `str` | Cover image URL |
- | `score` | `float` | Relevance score (`0.0` when sorting by date) |
- | `subscriber` | `bool` | `True` if content is paywalled |
- | `highlight` | `str \| None` | Snippet with matched term in `<span>` tags |
- | `category` | `str \| None` | Editorial category (articles only) |
- | `post_tag_text` | `list[str]` | Tags (articles only) |
- | `derived_info` | `dict` | Extra data: episode or newsletter metadata |
- | `is_article` | `bool` | Convenience property |
- | `is_podcast` | `bool` | Convenience property |
- | `is_newsletter` | `bool` | Convenience property |
- | `is_paywalled` | `bool` | Alias for `subscriber` |
-
- ## Examples
-
- ```python
- from ilpost import IlPostClient, SortOrder, ContentType, DateRange
-
- client = IlPostClient()
-
- # Most recent articles in politics
- result = client.search_articles(
-     "renzi",
-     sort=SortOrder.NEWEST,
-     category="politica",
-     date_range=DateRange.PAST_30_DAYS,
- )
-
- # Podcast search
- result = client.search_podcasts("cacao", sort=SortOrder.NEWEST)
-
- # Paginate through all results, 5 per page
- for page in client.paginate("sicilia", hits=5, max_pages=10):
-     print(f"Page {page.page}/{page.total_pages}")
-     for doc in page.docs:
-         print(f" [{doc.type}] {doc.title}")
-
- # Access filter counts from a response
- result = client.search("europa")
- for group in result.filters:
-     print(f"{group.label}:")
-     for opt in group.options:
-         print(f" {opt.label}: {opt.doc_count}")
- ```
-
- ## CLI
-
- The `ilpost-search` command is included with the package:
-
- ```
- usage: ilpost-search [-h] [--type {articles,podcasts,newsletters}]
-                      [--sort {relevance,newest,oldest}]
-                      [--date {all,year,month}] [--category CATEGORY]
-                      [--page PAGE] [--hits HITS] [--all-pages]
-                      [--max-pages N]
-                      query
- ```
-
- ```bash
- # Basic search
- ilpost-search berlusconi
-
- # Most recent articles in politics
- ilpost-search renzi --type articles --sort newest --category politica
-
- # Podcast search, past 30 days
- ilpost-search cacao --type podcasts --date month
-
- # Page 2, 5 results per page, oldest first
- ilpost-search sicilia --sort oldest --hits 5 --page 2
-
- # Fetch all pages of newsletter results (up to 3 pages)
- ilpost-search economia --type newsletters --all-pages --max-pages 3
- ```
-
- ## Notes
-
- - Paywalled content (`subscriber: true`) is included in search results — title, summary, and highlight are visible, but the full article requires an active ilpost.it subscription.
- - When sorting by date (`NEWEST` or `OLDEST`), `score` is always `0.0`.
- - The `category` filter only applies to articles. It is ignored by the server when `content_type=PODCASTS`.
- - Timestamps are in Italian local time (CET/CEST) with no UTC offset.
+ # ilpost-api-wrapper
+
+ A Python wrapper for the [Il Post](https://www.ilpost.it) public search API.
+ Searches articles, podcast episodes, and newsletters — no authentication required.
+
+ ## Installation
+
+ ```bash
+ pip install ilpost-api-wrapper
+ ```
+
+ Requires Python 3.9+. No third-party dependencies.
+
+ ## Quick start
+
+ ```python
+ from ilpost import IlPostClient, SortOrder, ContentType, DateRange
+
+ client = IlPostClient()
+
+ result = client.search("berlusconi")
+ for doc in result.docs:
+     print(doc.title, doc.link)
+ ```
+
+ ## API reference
+
+ ### `IlPostClient(timeout=10)`
+
+ | Method | Description |
+ |--------|-------------|
+ | `search(query, ...)` | General search across all content types |
+ | `search_articles(query, ...)` | Articles only |
+ | `search_podcasts(query, ...)` | Podcast episodes only |
+ | `search_newsletters(query, ...)` | Newsletter issues only |
+ | `paginate(query, ...)` | Generator that yields one `SearchResult` per page |
+
+ #### Common parameters
+
+ | Parameter | Type | Default | Description |
+ |-----------|------|---------|-------------|
+ | `query` | `str` | — | Search term |
+ | `page` | `int` | `1` | Page number (1-based) |
+ | `hits` | `int` | `10` | Results per page |
+ | `sort` | `SortOrder` | `RELEVANCE` | Sort order |
+ | `content_type` | `ContentType` | `None` | Filter by content type |
+ | `category` | `str` | `None` | Editorial category (articles only) |
+ | `date_range` | `DateRange` | `None` | Publication date filter |
+ | `fetch_content` | `bool` | `False` | Scrape and return full article text for each article result (see [`Document.content`](#document)) |
+
+ > `fetch_content` is available on `search()`, `search_articles()`, and `paginate()`. It has no effect on podcast or newsletter results — `doc.content` will be `None` for those types.
+
+ #### Enums
+
+ **`SortOrder`**
+ | Value | Description |
+ |-------|-------------|
+ | `RELEVANCE` | Sort by relevance score (default) |
+ | `NEWEST` | Most recent first |
+ | `OLDEST` | Oldest first |
+
+ **`ContentType`**
+ | Value | Description |
+ |-------|-------------|
+ | `ARTICLES` | Articles and news posts |
+ | `PODCASTS` | Podcast episodes |
+ | `NEWSLETTERS` | Newsletter issues |
+
+ **`DateRange`**
+ | Value | Description |
+ |-------|-------------|
+ | `ALL_TIME` | Entire archive (default) |
+ | `PAST_YEAR` | Past 12 months |
+ | `PAST_30_DAYS` | Past 30 days |
+
+ ### `SearchResult`
+
+ | Attribute | Type | Description |
+ |-----------|------|-------------|
+ | `total` | `int` | Total number of matching results |
+ | `docs` | `list[Document]` | Results for this page |
+ | `filters` | `list[FilterGroup]` | Available filters with counts |
+ | `sort` | `str` | Active sort value |
+ | `hits` | `int` | Page size |
+ | `page` | `int` | Current page number |
+ | `total_pages` | `int` | Total number of pages |
+ | `has_next_page` | `bool` | Whether a next page exists |
+ | `has_prev_page` | `bool` | Whether a previous page exists |
+
+ ### `Document`
+
+ | Attribute | Type | Description |
+ |-----------|------|-------------|
+ | `id` | `int` | Unique content identifier |
+ | `type` | `str` | `"post"`, `"episodes"`, or `"newsletter"` |
+ | `title` | `str` | Content title |
+ | `link` | `str` | URL to the content page |
+ | `timestamp` | `str` | Publication date (ISO 8601, Italian local time) |
+ | `summary` | `str` | Short excerpt |
+ | `image` | `str` | Cover image URL |
+ | `score` | `float` | Relevance score (`0.0` when sorting by date) |
+ | `subscriber` | `bool` | `True` if content is paywalled |
+ | `highlight` | `str \| None` | Snippet with matched term in `<span>` tags |
+ | `category` | `str \| None` | Editorial category (articles only) |
+ | `post_tag_text` | `list[str]` | Tags (articles only) |
+ | `derived_info` | `dict` | Extra data: episode or newsletter metadata |
+ | `content` | `str \| None` | Full article body text, populated when `fetch_content=True` (articles only) |
+ | `is_article` | `bool` | Convenience property |
+ | `is_podcast` | `bool` | Convenience property |
+ | `is_newsletter` | `bool` | Convenience property |
+ | `is_paywalled` | `bool` | Alias for `subscriber` |
+
+ ## Examples
+
+ ```python
+ from ilpost import IlPostClient, SortOrder, ContentType, DateRange
+
+ client = IlPostClient()
+
+ # Most recent articles in politics
+ result = client.search_articles(
+     "renzi",
+     sort=SortOrder.NEWEST,
+     category="politica",
+     date_range=DateRange.PAST_30_DAYS,
+ )
+
+ # Podcast search
+ result = client.search_podcasts("cacao", sort=SortOrder.NEWEST)
+
+ # Paginate through all results, 5 per page
+ for page in client.paginate("sicilia", hits=5, max_pages=10):
+     print(f"Page {page.page}/{page.total_pages}")
+     for doc in page.docs:
+         print(f" [{doc.type}] {doc.title}")
+
+ # Access filter counts from a response
+ result = client.search("europa")
+ for group in result.filters:
+     print(f"{group.label}:")
+     for opt in group.options:
+         print(f" {opt.label}: {opt.doc_count}")
+
+ # Fetch full article text alongside search results
+ result = client.search_articles("economia", hits=5, fetch_content=True)
+ for doc in result.docs:
+     print(doc.title)
+     if doc.content:
+         print(doc.content[:300])
+
+ # Use the scraper directly (e.g. for parallel fetching)
+ from ilpost import fetch_article_content
+ text = fetch_article_content("https://www.ilpost.it/2026/04/02/...")
+ ```
+
+ ## CLI
+
+ The `ilpost-search` command is included with the package:
+
+ ```
+ usage: ilpost-search [-h] [--type {articles,podcasts,newsletters}]
+                      [--sort {relevance,newest,oldest}]
+                      [--date {all,year,month}] [--category CATEGORY]
+                      [--page PAGE] [--hits HITS] [--all-pages]
+                      [--max-pages N] [--fetch-content]
+                      query
+ ```
+
+ Each result is printed with labelled fields: `type`, `category`, `title`, `link`, `date`, `score`, `summary`, and either `content` (when `--fetch-content` is used) or `excerpt` (search highlight).
+
+ ```bash
+ # Basic search
+ ilpost-search berlusconi
+
+ # Most recent articles in politics
+ ilpost-search renzi --type articles --sort newest --category politica
+
+ # Podcast search, past 30 days
+ ilpost-search cacao --type podcasts --date month
+
+ # Page 2, 5 results per page, oldest first
+ ilpost-search sicilia --sort oldest --hits 5 --page 2
+
+ # Fetch all pages of newsletter results (up to 3 pages)
+ ilpost-search economia --type newsletters --all-pages --max-pages 3
+
+ # Fetch full article text for each result
+ ilpost-search bondi --type articles --hits 3 --fetch-content
+ ```
+
+ ## Notes
+
+ - Paywalled content (`subscriber: true`) is included in search results — title, summary, and highlight are visible, but the full article requires an active ilpost.it subscription.
+ - When sorting by date (`NEWEST` or `OLDEST`), `score` is always `0.0`.
+ - The `category` filter only applies to articles. It is ignored by the server when `content_type=PODCASTS`.
+ - Timestamps are in Italian local time (CET/CEST) with no UTC offset.
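
The last README note says timestamps are Italian local time with no UTC offset. A minimal sketch of how a caller might attach the zone and convert to UTC, assuming the value parses with `datetime.fromisoformat`. The helper name `to_utc` and the sample value are hypothetical and not part of the package.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9

ROME = ZoneInfo("Europe/Rome")


def to_utc(timestamp: str) -> datetime:
    """Read a naive ISO 8601 string as Italian local time and convert it to UTC."""
    naive = datetime.fromisoformat(timestamp)
    # Attach the Rome zone first so CET/CEST is resolved for that date.
    return naive.replace(tzinfo=ROME).astimezone(timezone.utc)


# Hypothetical sample value; real values would come from Document.timestamp.
print(to_utc("2026-04-02T10:30:00").isoformat())
```

Keeping the conversion on the caller side leaves `Document.timestamp` as the plain string the README documents.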
@@ -1,13 +1,16 @@
- from .client import IlPostClient
- from .models import SearchResult, Document, FilterGroup, FilterOption, SortOrder, ContentType, DateRange
-
- __all__ = [
-     "IlPostClient",
-     "SearchResult",
-     "Document",
-     "FilterGroup",
-     "FilterOption",
-     "SortOrder",
-     "ContentType",
-     "DateRange",
- ]
+ from .client import IlPostClient
+ from .models import SearchResult, Document, FilterGroup, FilterOption, SortOrder, ContentType, DateRange
+ from .scraper import ArticleScraper, fetch_article_content
+
+ __all__ = [
+     "IlPostClient",
+     "SearchResult",
+     "Document",
+     "FilterGroup",
+     "FilterOption",
+     "SortOrder",
+     "ContentType",
+     "DateRange",
+     "ArticleScraper",
+     "fetch_article_content",
+ ]
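
The newly exported `fetch_article_content` is described in the README as usable for parallel fetching. One way that could look is a thread pool, sketched below. The helper `fetch_all` and the stub `fake_fetch` are hypothetical. The sketch assumes only that the fetch function takes a URL and returns text, as the README example shows.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional


def fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], Optional[str]],
    max_workers: int = 4,
) -> Dict[str, Optional[str]]:
    """Run `fetch` on each URL in a thread pool; a failed fetch maps to None."""

    def safe_fetch(url: str) -> Optional[str]:
        try:
            return fetch(url)
        except Exception:
            # Assumption: one failed page should not abort the whole batch.
            return None

    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each URL with its result.
        return dict(zip(url_list, pool.map(safe_fetch, url_list)))


# Stand-in for the real scraper so the sketch runs offline.
def fake_fetch(url: str) -> str:
    return f"body of {url}"


texts = fetch_all(["https://example.com/a", "https://example.com/b"], fake_fetch)
print(texts)
```

With the package installed, passing `fetch_article_content` in place of `fake_fetch` should work the same way, provided it behaves as the README example suggests. A small `max_workers` keeps the request rate modest.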