substack-api 0.0.2__tar.gz → 0.1.0__tar.gz

@@ -1,21 +1,21 @@
1
- MIT License
2
-
3
- Copyright (c) 2023 Nick Hagar
4
-
5
- Permission is hereby granted, free of charge, to any person obtaining a copy
6
- of this software and associated documentation files (the "Software"), to deal
7
- in the Software without restriction, including without limitation the rights
8
- to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
- copies of the Software, and to permit persons to whom the Software is
10
- furnished to do so, subject to the following conditions:
11
-
12
- The above copyright notice and this permission notice shall be included in all
13
- copies or substantial portions of the Software.
14
-
15
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
- AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
- OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
1
+ MIT License
2
+
3
+ Copyright (c) 2023 Nick Hagar
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
21
  SOFTWARE.
@@ -0,0 +1,61 @@
1
+ Metadata-Version: 2.1
2
+ Name: substack-api
3
+ Version: 0.1.0
4
+ Summary: unofficial python wrapper for collecting substack data
5
+ License: MIT
6
+ Author: NHagar
7
+ Author-email: nicholasrhagar@gmail.com
8
+ Requires-Python: >=3.8,<4.0
9
+ Classifier: License :: OSI Approved :: MIT License
10
+ Classifier: Programming Language :: Python :: 3
11
+ Classifier: Programming Language :: Python :: 3.8
12
+ Classifier: Programming Language :: Python :: 3.9
13
+ Classifier: Programming Language :: Python :: 3.10
14
+ Classifier: Programming Language :: Python :: 3.11
15
+ Classifier: Programming Language :: Python :: 3.12
16
+ Requires-Dist: beautifulsoup4 (>=4.12.3,<5.0.0)
17
+ Requires-Dist: requests (>=2.31.0,<3.0.0)
18
+ Description-Content-Type: text/markdown
19
+
20
+ # Substack-api
21
+
22
+ **An unofficial Python wrapper around Substack's API.**
23
+
24
+ I developed this package as a lightweight tool to help researchers collect data about Substack newsletters, and to help writers archive their work off-platform. This is not a tool designed for bulk text extraction/web scraping. It supports the following functionality:
25
+
26
+ * Download full JSON metadata about newsletters by category
27
+ * Download full JSON metadata about posts by newsletter
28
+ * Download text of individual, publicly-available posts
29
+ * List newsletter categories
30
+
31
+ ## Installation
32
+
33
+ `pip install substack-api`
34
+
35
+ ## Usage
36
+
37
+ ```from substack_api import substack_api```
38
+
39
+ List all categories on Substack:
40
+
41
+ ```
42
+ substack_api.list_all_categories()
43
+ ```
44
+
45
+ Get metadata for the first 2 pages of Technology newsletters:
46
+
47
+ ```
48
+ substack_api.get_newsletters_in_category(4, start_page=0, end_page=2)
49
+ ```
50
+
51
+ Get post metadata for the most recent 30 posts from a newsletter:
52
+
53
+ ```
54
+ substack_api.get_newsletter_post_metadata("platformer", start_offset=0, end_offset=30)
55
+ ```
56
+
57
+ Get post contents (HTML only) from one newsletter post:
58
+
59
+ ```
60
+ substack_api.get_post_contents("platformer", "how-a-single-engineer-brought-down", html_only=True)
61
+ ```
@@ -1,42 +1,42 @@
1
- # Substack-api
2
-
3
- **An unofficial Python wrapper around Substack's API.**
4
-
5
- I developed this package as a lightweight tool to help researchers collect data about Substack newsletters, and to help writers archive their work off-platform. This is not a tool designed for bulk text extraction/web scraping. It supports the following functionality:
6
-
7
- * Download full JSON metadata about newsletters by category
8
- * Download full JSON metadata about posts by newsletter
9
- * Download text of individual, publicly-available posts
10
- * List newsletter categories
11
-
12
- ## Installation
13
-
14
- `pip install substack-api`
15
-
16
- ## Usage
17
-
18
- ```from substack_api import substack_api```
19
-
20
- List all categories on Substack:
21
-
22
- ```
23
- substack_api.list_all_categories()
24
- ```
25
-
26
- Get metadata for the first 2 pages of Technology newsletters:
27
-
28
- ```
29
- substack_api.get_newsletters_in_category(4, start_page=0, end_page=2)
30
- ```
31
-
32
- Get post metadata for the most recent 30 posts from a newsletter:
33
-
34
- ```
35
- substack_api.get_newsletter_post_metadata("platformer", start_offset=0, end_offset=30)
36
- ```
37
-
38
- Get post contents (HTML only) from one newsletter post:
39
-
40
- ```
41
- substack_api.get_post_contents("platformer", "how-a-single-engineer-brought-down", html_only=True)
1
+ # Substack-api
2
+
3
+ **An unofficial Python wrapper around Substack's API.**
4
+
5
+ I developed this package as a lightweight tool to help researchers collect data about Substack newsletters, and to help writers archive their work off-platform. This is not a tool designed for bulk text extraction/web scraping. It supports the following functionality:
6
+
7
+ * Download full JSON metadata about newsletters by category
8
+ * Download full JSON metadata about posts by newsletter
9
+ * Download text of individual, publicly-available posts
10
+ * List newsletter categories
11
+
12
+ ## Installation
13
+
14
+ `pip install substack-api`
15
+
16
+ ## Usage
17
+
18
+ ```from substack_api import substack_api```
19
+
20
+ List all categories on Substack:
21
+
22
+ ```
23
+ substack_api.list_all_categories()
24
+ ```
25
+
26
+ Get metadata for the first 2 pages of Technology newsletters:
27
+
28
+ ```
29
+ substack_api.get_newsletters_in_category(4, start_page=0, end_page=2)
30
+ ```
31
+
32
+ Get post metadata for the most recent 30 posts from a newsletter:
33
+
34
+ ```
35
+ substack_api.get_newsletter_post_metadata("platformer", start_offset=0, end_offset=30)
36
+ ```
37
+
38
+ Get post contents (HTML only) from one newsletter post:
39
+
40
+ ```
41
+ substack_api.get_post_contents("platformer", "how-a-single-engineer-brought-down", html_only=True)
42
42
  ```
@@ -0,0 +1,17 @@
1
+ [tool.poetry]
2
+ name = "substack-api"
3
+ version = "0.1.0"
4
+ description = "unofficial python wrapper for collecting substack data"
5
+ authors = ["NHagar <nicholasrhagar@gmail.com>"]
6
+ license = "MIT"
7
+ readme = "README.md"
8
+
9
+ [tool.poetry.dependencies]
10
+ python = "^3.8"
11
+ requests = "^2.31.0"
12
+ beautifulsoup4 = "^4.12.3"
13
+
14
+
15
+ [build-system]
16
+ requires = ["poetry-core"]
17
+ build-backend = "poetry.core.masonry.api"
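
The `^` constraints under `[tool.poetry.dependencies]` correspond to the bounded ranges in the generated metadata; for example, `requests = "^2.31.0"` is published as `requests (>=2.31.0,<3.0.0)`. Below is a minimal sketch of that expansion, following Poetry's documented caret rule. `caret_to_range` is a hypothetical helper written for illustration; it is not part of Poetry or of this package, and it assumes plain numeric versions.

```python
from typing import List, Tuple


def caret_to_range(constraint: str) -> Tuple[str, str]:
    """Expand a caret constraint such as "^2.31.0" into (lower, upper) bounds.

    Hypothetical helper: handles plain numeric versions only.
    """
    parts: List[int] = [int(p) for p in constraint.lstrip("^").split(".")]
    # Updates may not change the leftmost non-zero component, so the
    # exclusive upper bound bumps that component. If every component is
    # zero, the last one given is bumped instead.
    idx = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    upper = parts[:idx] + [parts[idx] + 1]
    # Pad both bounds to three components for a uniform display.
    lower = parts + [0] * (3 - len(parts))
    upper = upper + [0] * (3 - len(upper))
    return ".".join(map(str, lower)), ".".join(map(str, upper))


for spec in ("^3.8", "^2.31.0", "^4.12.3"):
    low, high = caret_to_range(spec)
    print(f"{spec}: >={low},<{high}")
```

The sketch pads bounds to three components, so `^3.8` is shown as `>=3.8.0,<4.0.0` rather than the shorter `>=3.8,<4.0` form that appears in the metadata.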
@@ -1,139 +1,178 @@
1
- import math
2
- from time import sleep
3
- from typing import Dict, List, Tuple, Union
4
-
5
- import requests
6
-
7
- HEADERS = {
8
- "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36"
9
- }
10
-
11
- def list_all_categories() -> List[Tuple[str, int]]:
12
- """
13
- Get name / id representations of all newsletter categories
14
- """
15
- endpoint_cat = "https://substack.com/api/v1/categories"
16
- r = requests.get(endpoint_cat, headers=HEADERS)
17
- categories = [(i['name'], i['id']) for i in r.json()]
18
- return categories
19
-
20
-
21
- def category_id_to_name(id: int) -> str:
22
- """
23
- Map a numerical category id to a name
24
-
25
- Parameters
26
- ----------
27
- id : Numerical category identifier
28
- """
29
- categories = list_all_categories()
30
- category_name = [i[0] for i in categories if i[1] == id]
31
- if len(category_name) > 0:
32
- return category_name[0]
33
- else:
34
- raise ValueError(f"{id} is not in Substack's list of categories")
35
-
36
-
37
- def category_name_to_id(name: str) -> int:
38
- """
39
- Map a category name to a numerical id
40
-
41
- Parameters
42
- ----------
43
- name : Category name
44
- """
45
- categories = list_all_categories()
46
- category_id = [i[1] for i in categories if i[0] == name]
47
- if len(category_id) > 0:
48
- return category_id[0]
49
- else:
50
- raise ValueError(f"{name} is not in Substack's list of categories")
51
-
52
-
53
- def get_newsletters_in_category(category_id: int, subdomains_only: bool = False, start_page: int = None, end_page: int = None) -> List:
54
- """
55
- Collects newsletter objects listed under specified category
56
-
57
- Parameters
58
- ----------
59
- category_id : Numerical category identifier
60
- subdomains_only : Whether to return only newsletter subdomains (needed for post collection), or to return all metadata
61
- start_page : Start page for paginated API results
62
- end_page : End page for paginated API results
63
- """
64
- page_num = 0 if start_page is None else start_page
65
- page_num_end = math.inf if end_page is None else end_page
66
-
67
- base_url = f"https://substack.com/api/v1/category/public/{category_id}/all?page="
68
- page_num = 0
69
- more = True
70
- all_pubs = []
71
- while more and page_num < page_num_end:
72
- full_url = base_url + str(page_num)
73
- pubs = requests.get(full_url, headers=HEADERS).json()
74
- more = pubs["more"]
75
- if subdomains_only:
76
- pubs = [i["id"] for i in pubs["publications"]]
77
- else:
78
- pubs = pubs["publications"]
79
- all_pubs.extend(pubs)
80
- page_num += 1
81
- print(f"page {page_num} done")
82
- sleep(1)
83
-
84
- return all_pubs
85
-
86
-
87
- def get_newsletter_post_metadata(newsletter_subdomain: str, slugs_only: bool = False, start_offset: int = None, end_offset: int = None) -> List:
88
- """
89
- Get available post metadata for newsletter
90
-
91
- Parameters
92
- ----------
93
- newsletter_subdomain : Substack subdomain of newsletter (can be retrieved from `get_newsletters_in_category`)
94
- slugs_only : Whether to return only post slugs (needed for post content collection), or to return all metadata
95
- start_page : Start page for paginated API results
96
- end_page : End page for paginated API results
97
- """
98
- offset_start = 0 if start_offset is None else start_offset
99
- offset_end = math.inf if end_offset is None else end_offset
100
-
101
- last_id_ref = 0
102
- all_posts = []
103
- while offset_start < offset_end:
104
- full_url = f"https://{newsletter_subdomain}.substack.com/api/v1/archive?sort=new&search=&offset={offset_start}&limit=10"
105
- posts = requests.get(full_url, headers=HEADERS).json()
106
-
107
- last_id = posts[-1]["id"]
108
- if last_id == last_id_ref:
109
- break
110
- else:
111
- last_id_ref = last_id
112
-
113
- if slugs_only:
114
- all_posts.extend([i["slug"] for i in posts])
115
- else:
116
- all_posts.extend(posts)
117
-
118
- offset_start += 10
119
- sleep(1)
120
-
121
- return all_posts
122
-
123
-
124
- def get_post_contents(newsletter_subdomain: str, slug: str, html_only: bool = False) -> Union[Dict, str]:
125
- """
126
- Gets individual post metadata and contents
127
-
128
- Parameters
129
- ----------
130
- newsletter_subdomain : Substack subdomain of newsletter (can be retrieved from `get_newsletters_in_category`)
131
- slug : Slug of post to retrieve (can be retrieved from `get_newsletter_post_metadata`)
132
- html_only : Whether to get only HTML of body text, or all metadata/content
133
- """
134
- endpoint = f"https://{newsletter_subdomain}.substack.com/api/v1/posts/{slug}"
135
- post_info = requests.get(endpoint, headers=HEADERS).json()
136
- if html_only:
137
- return post_info["body_html"]
138
- else:
139
- return post_info
1
+ import math
2
+ from time import sleep
3
+ from typing import Dict, List, Tuple, Union
4
+
5
+ from bs4 import BeautifulSoup
6
+ import requests
7
+
8
+
9
+ HEADERS = {
10
+ "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36"
11
+ }
12
+
13
+
14
+ def list_all_categories() -> List[Tuple[str, int]]:
15
+ """
16
+ Get name / id representations of all newsletter categories
17
+ """
18
+ endpoint_cat = "https://substack.com/api/v1/categories"
19
+ r = requests.get(endpoint_cat, headers=HEADERS, timeout=30)
20
+ categories = [(i["name"], i["id"]) for i in r.json()]
21
+ return categories
22
+
23
+
24
+ def category_id_to_name(user_id: int) -> str:
25
+ """
26
+ Map a numerical category id to a name
27
+
28
+ Parameters
29
+ ----------
30
+ user_id : Numerical category identifier
31
+ """
32
+ categories = list_all_categories()
33
+ category_name = [i[0] for i in categories if i[1] == user_id]
34
+ if len(category_name) > 0:
35
+ return category_name[0]
36
+
37
+ raise ValueError(f"{user_id} is not in Substack's list of categories")
38
+
39
+
40
+ def category_name_to_id(name: str) -> int:
41
+ """
42
+ Map a category name to a numerical id
43
+
44
+ Parameters
45
+ ----------
46
+ name : Category name
47
+ """
48
+ categories = list_all_categories()
49
+ category_id = [i[1] for i in categories if i[0] == name]
50
+ if len(category_id) > 0:
51
+ return category_id[0]
52
+ else:
53
+ raise ValueError(f"{name} is not in Substack's list of categories")
54
+
55
+
56
+ def get_newsletters_in_category(
57
+ category_id: int,
58
+ subdomains_only: bool = False,
59
+ start_page: int = None,
60
+ end_page: int = None,
61
+ ) -> List:
62
+ """
63
+ Collects newsletter objects listed under specified category
64
+
65
+ Parameters
66
+ ----------
67
+ category_id : Numerical category identifier
68
+ subdomains_only : Whether to return only newsletter subdomains (needed for post collection)
69
+ start_page : Start page for paginated API results
70
+ end_page : End page for paginated API results
71
+ """
72
+ page_num = 0 if start_page is None else start_page
73
+ page_num_end = math.inf if end_page is None else end_page
74
+
75
+ base_url = f"https://substack.com/api/v1/category/public/{category_id}/all?page="
76
+ # page_num was initialised from start_page above, so it is not reset here
77
+ more = True
78
+ all_pubs = []
79
+ while more and page_num < page_num_end:
80
+ full_url = base_url + str(page_num)
81
+ pubs = requests.get(full_url, headers=HEADERS, timeout=30).json()
82
+ more = pubs["more"]
83
+ if subdomains_only:
84
+ pubs = [i["id"] for i in pubs["publications"]]
85
+ else:
86
+ pubs = pubs["publications"]
87
+ all_pubs.extend(pubs)
88
+ page_num += 1
89
+ print(f"page {page_num} done")
90
+ sleep(1)
91
+
92
+ return all_pubs
93
+
94
+
95
+ def get_newsletter_post_metadata(
96
+ newsletter_subdomain: str,
97
+ slugs_only: bool = False,
98
+ start_offset: int = None,
99
+ end_offset: int = None,
100
+ ) -> List:
101
+ """
102
+ Get available post metadata for newsletter
103
+
104
+ Parameters
105
+ ----------
106
+ newsletter_subdomain : Substack subdomain of newsletter
107
+ slugs_only : Whether to return only post slugs (needed for post content collection)
108
+ start_offset : Offset of the first post to retrieve (posts are fetched 10 at a time)
109
+ end_offset : Offset at which to stop retrieving posts
110
+ """
111
+ offset_start = 0 if start_offset is None else start_offset
112
+ offset_end = math.inf if end_offset is None else end_offset
113
+
114
+ last_id_ref = 0
115
+ all_posts = []
116
+ while offset_start < offset_end:
117
+ full_url = f"https://{newsletter_subdomain}.substack.com/api/v1/archive?sort=new&search=&offset={offset_start}&limit=10"
118
+ posts = requests.get(full_url, headers=HEADERS, timeout=30).json()
119
+
120
+ if len(posts) == 0:
121
+ break
122
+
123
+ last_id = posts[-1]["id"]
124
+ if last_id == last_id_ref:
125
+ break
126
+
127
+ last_id_ref = last_id
128
+
129
+ if slugs_only:
130
+ all_posts.extend([i["slug"] for i in posts])
131
+ else:
132
+ all_posts.extend(posts)
133
+
134
+ offset_start += 10
135
+ sleep(1)
136
+
137
+ return all_posts
138
+
139
+
140
+ def get_post_contents(
141
+ newsletter_subdomain: str, slug: str, html_only: bool = False
142
+ ) -> Union[Dict, str]:
143
+ """
144
+ Gets individual post metadata and contents
145
+
146
+ Parameters
147
+ ----------
148
+ newsletter_subdomain : Substack subdomain of newsletter
149
+ slug : Slug of post to retrieve (can be retrieved from `get_newsletter_post_metadata`)
150
+ html_only : Whether to get only HTML of body text, or all metadata/content
151
+ """
152
+ endpoint = f"https://{newsletter_subdomain}.substack.com/api/v1/posts/{slug}"
153
+ post_info = requests.get(endpoint, headers=HEADERS, timeout=30).json()
154
+ if html_only:
155
+ return post_info["body_html"]
156
+
157
+ return post_info
158
+
159
+
160
+ def get_newsletter_recommendations(newsletter_subdomain: str) -> List[Dict[str, str]]:
161
+ """
162
+ Gets recommended newsletters for a given newsletter
163
+
164
+ Parameters
165
+ ----------
166
+ newsletter_subdomain : Substack subdomain of newsletter
167
+ """
168
+ endpoint = f"https://{newsletter_subdomain}.substack.com/recommendations"
169
+ r = requests.get(endpoint, headers=HEADERS, timeout=30)
170
+ recs = r.text
171
+ soup = BeautifulSoup(recs, "html.parser")
172
+ div_elements = soup.find_all("div", class_="publication-content")
173
+ a_elements = [div.find("a") for div in div_elements]
174
+ titles = [i.text for i in soup.find_all("div", {"class": "publication-title"})]
175
+ links = [i["href"].split("?")[0] for i in a_elements]
176
+ results = [{"title": t, "url": u} for t, u in zip(titles, links)]
177
+
178
+ return results
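
The page loop used by `get_newsletters_in_category` can be exercised without network access if the fetch step is passed in as a function. The following is a minimal sketch under that assumption. `collect_pages`, `fake_fetch`, and `FAKE_PAGES` are hypothetical names and the sample data is made up; only the payload keys (`more`, `publications`) and the `start_page` / `end_page` parameters come from the module.

```python
import math
from typing import Callable, Dict, List, Optional


def collect_pages(
    fetch_page: Callable[[int], Dict],
    start_page: Optional[int] = None,
    end_page: Optional[int] = None,
) -> List:
    """Collect "publications" from consecutive pages until the payload
    reports "more" as False or end_page (exclusive) is reached."""
    page_num = 0 if start_page is None else start_page
    page_num_end = math.inf if end_page is None else end_page
    more = True
    all_pubs: List = []
    while more and page_num < page_num_end:
        page = fetch_page(page_num)
        more = page["more"]
        all_pubs.extend(page["publications"])
        page_num += 1
    return all_pubs


# Hypothetical stand-in for the HTTP call: three pages of made-up data.
FAKE_PAGES = [
    {"publications": ["a", "b"], "more": True},
    {"publications": ["c"], "more": True},
    {"publications": ["d"], "more": False},
]


def fake_fetch(page_num: int) -> Dict:
    return FAKE_PAGES[page_num]


print(collect_pages(fake_fetch, start_page=1, end_page=2))
```

Injecting the fetch function is a design choice of this sketch, not of the package; it keeps the stop conditions testable on their own, separate from `requests` and the one-second `sleep` between pages.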
@@ -0,0 +1,77 @@
1
+ from typing import Dict, List
2
+
3
+ import requests
4
+
5
+ HEADERS = {
6
+ "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36"
7
+ }
8
+
9
+
10
+ def get_user_id(username: str) -> int:
11
+ """
12
+ Get the user ID of a Substack user.
13
+
14
+ Parameters
15
+ ----------
16
+ username : str
17
+ The username of the Substack user.
18
+ """
19
+ endpoint = f"https://substack.com/api/v1/user/{username}/public_profile"
20
+ r = requests.get(endpoint, headers=HEADERS, timeout=30)
21
+ user_id = r.json()["id"]
22
+ return user_id
23
+
24
+
25
+ def get_user_reads(username: str) -> List[Dict[str, str]]:
26
+ """
27
+ Get newsletters from the "Reads" section of a user's profile.
28
+
29
+ Parameters
30
+ ----------
31
+ username : str
32
+ The username of the Substack user.
33
+ """
34
+ endpoint = f"https://substack.com/api/v1/user/{username}/public_profile"
35
+ r = requests.get(endpoint, headers=HEADERS, timeout=30)
36
+ user_data = r.json()
37
+ reads = [
38
+ {
39
+ "publication_id": i["publication"]["id"],
40
+ "publication_name": i["publication"]["name"],
41
+ "subscription_status": i["membership_state"],
42
+ }
43
+ for i in user_data["subscriptions"]
44
+ ]
45
+ return reads
46
+
47
+
48
+ def get_user_likes(user_id: int):
49
+ """
50
+ Get liked posts from a user's profile.
51
+
52
+ Parameters
53
+ ----------
54
+ user_id : int
55
+ The user ID of the Substack user.
56
+ """
57
+ endpoint = (
58
+ f"https://substack.com/api/v1/reader/feed/profile/{user_id}?types%5B%5D=like"
59
+ )
60
+ r = requests.get(endpoint, headers=HEADERS, timeout=30)
61
+ likes = r.json()["items"]
62
+ return likes
63
+
64
+
65
+ def get_user_notes(user_id: int):
66
+ """
67
+ Get notes and comments posted by a user.
68
+
69
+ Parameters
70
+ ----------
71
+ user_id : int
72
+ The user ID of the Substack user.
73
+ """
74
+ endpoint = f"https://substack.com/api/v1/reader/feed/profile/{user_id}"
75
+ r = requests.get(endpoint, headers=HEADERS, timeout=30)
76
+ notes = r.json()["items"]
77
+ return notes
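
The reshaping step in `get_user_reads` can be tried on a literal dictionary instead of a live response. This is a small sketch, not the package's code: `extract_reads` is a hypothetical helper and the sample payload is invented. Only the key names (`subscriptions`, `publication`, `id`, `name`, `membership_state`) are taken from the module.

```python
from typing import Any, Dict, List


def extract_reads(user_data: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten each subscription entry into id, name, and status."""
    return [
        {
            "publication_id": sub["publication"]["id"],
            "publication_name": sub["publication"]["name"],
            "subscription_status": sub["membership_state"],
        }
        for sub in user_data["subscriptions"]
    ]


# Invented payload that mimics the shape the function expects.
sample_profile = {
    "id": 1,
    "subscriptions": [
        {
            "publication": {"id": 10, "name": "Example Letter"},
            "membership_state": "free_signup",
        },
        {
            "publication": {"id": 20, "name": "Another Letter"},
            "membership_state": "subscribed",
        },
    ],
}

reads = extract_reads(sample_profile)
print([r["publication_name"] for r in reads])
```

A profile that lacks the `subscriptions` key would raise `KeyError` in this sketch, just as the direct indexing in the module would.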
@@ -1,4 +0,0 @@
1
- .conda/
2
- __pycache__/
3
- dist/
4
- .env
@@ -1,76 +0,0 @@
1
- Metadata-Version: 2.1
2
- Name: substack-api
3
- Version: 0.0.2
4
- Summary: The unofficial Substack API wrapper for Python.
5
- Project-URL: Homepage, https://github.com/NHagar/substack_api
6
- Author-email: Nick Hagar <nicholasrhagar@gmail.com>
7
- License: MIT License
8
-
9
- Copyright (c) 2023 Nick Hagar
10
-
11
- Permission is hereby granted, free of charge, to any person obtaining a copy
12
- of this software and associated documentation files (the "Software"), to deal
13
- in the Software without restriction, including without limitation the rights
14
- to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
15
- copies of the Software, and to permit persons to whom the Software is
16
- furnished to do so, subject to the following conditions:
17
-
18
- The above copyright notice and this permission notice shall be included in all
19
- copies or substantial portions of the Software.
20
-
21
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
22
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
23
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
24
- AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
25
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
26
- OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
27
- SOFTWARE.
28
- License-File: LICENSE
29
- Classifier: License :: OSI Approved :: MIT License
30
- Classifier: Operating System :: OS Independent
31
- Classifier: Programming Language :: Python :: 3
32
- Requires-Python: >=3.8
33
- Description-Content-Type: text/markdown
34
-
35
- # Substack-api
36
-
37
- **An unofficial Python wrapper around Substack's API.**
38
-
39
- I developed this package as a lightweight tool to help researchers collect data about Substack newsletters, and to help writers archive their work off-platform. This is not a tool designed for bulk text extraction/web scraping. It supports the following functionality:
40
-
41
- * Download full JSON metadata about newsletters by category
42
- * Download full JSON metadata about posts by newsletter
43
- * Download text of individual, publicly-available posts
44
- * List newsletter categories
45
-
46
- ## Installation
47
-
48
- `pip install substack-api`
49
-
50
- ## Usage
51
-
52
- ```from substack_api import substack_api```
53
-
54
- List all categories on Substack:
55
-
56
- ```
57
- substack_api.list_all_categories()
58
- ```
59
-
60
- Get metadata for the first 2 pages of Technology newsletters:
61
-
62
- ```
63
- substack_api.get_newsletters_in_category(4, start_page=0, end_page=2)
64
- ```
65
-
66
- Get post metadata for the most recent 30 posts from a newsletter:
67
-
68
- ```
69
- substack_api.get_newsletter_post_metadata("platformer", start_offset=0, end_offset=30)
70
- ```
71
-
72
- Get post contents (HTML only) from one newsletter post:
73
-
74
- ```
75
- substack_api.get_post_contents("platformer", "how-a-single-engineer-brought-down", html_only=True)
76
- ```
@@ -1,23 +0,0 @@
1
- [build-system]
2
- requires = ["hatchling",
3
- "requests"]
4
- build-backend = "hatchling.build"
5
-
6
- [project]
7
- name = "substack-api"
8
- version = "0.0.2"
9
- authors = [
10
- { name="Nick Hagar", email="nicholasrhagar@gmail.com" },
11
- ]
12
- description = "The unofficial Substack API wrapper for Python."
13
- readme = "README.md"
14
- license = { file="LICENSE" }
15
- requires-python = ">=3.8"
16
- classifiers = [
17
- "Programming Language :: Python :: 3",
18
- "License :: OSI Approved :: MIT License",
19
- "Operating System :: OS Independent",
20
- ]
21
-
22
- [project.urls]
23
- "Homepage" = "https://github.com/NHagar/substack_api"