xtimeline 0.1.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- xtimeline-0.1.0/LICENSE +21 -0
- xtimeline-0.1.0/PKG-INFO +107 -0
- xtimeline-0.1.0/README.md +93 -0
- xtimeline-0.1.0/pyproject.toml +28 -0
- xtimeline-0.1.0/setup.cfg +4 -0
- xtimeline-0.1.0/src/__init__.py +5 -0
- xtimeline-0.1.0/src/tweet.py +71 -0
- xtimeline-0.1.0/src/xclient.py +522 -0
- xtimeline-0.1.0/src/xtimeline.egg-info/PKG-INFO +107 -0
- xtimeline-0.1.0/src/xtimeline.egg-info/SOURCES.txt +11 -0
- xtimeline-0.1.0/src/xtimeline.egg-info/dependency_links.txt +1 -0
- xtimeline-0.1.0/src/xtimeline.egg-info/requires.txt +2 -0
- xtimeline-0.1.0/src/xtimeline.egg-info/top_level.txt +3 -0
xtimeline-0.1.0/LICENSE
ADDED
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 Stephan Akkerman

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
xtimeline-0.1.0/PKG-INFO
ADDED
@@ -0,0 +1,107 @@
Metadata-Version: 2.4
Name: xtimeline
Version: 0.1.0
Summary: Lightweight X/Twitter timeline client (GraphQL via cURL or auth strategies)
Author: Stephan Akkerman
License: MIT
Keywords: twitter,x,timeline,scraping,aiohttp,asyncio
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: aiohttp>=3.12.15
Requires-Dist: uncurl>=0.0.11
Dynamic: license-file

# X-Timeline Scraper
A Python client to scrape tweets from X (formerly Twitter) timelines using a cURL command.

<!-- Add a banner here like: https://github.com/StephanAkkerman/fintwit-bot/blob/main/img/logo/fintwit-banner.png -->

---
<!-- Adjust the link of the second badge to your own repo -->
<p align="center">
  <img src="https://img.shields.io/badge/python-3.13-blue.svg" alt="Supported versions">
  <img src="https://img.shields.io/github/license/StephanAkkerman/x-timeline-scraper.svg?color=brightgreen" alt="License">
  <a href="https://github.com/psf/black"><img src="https://img.shields.io/badge/code%20style-black-000000.svg" alt="Code style: black"></a>
</p>

## Introduction

This project provides a Python client to scrape tweets from X (formerly Twitter) timelines using a cURL command. It leverages asynchronous programming for efficient data retrieval and includes features for parsing tweet data.

## Table of Contents 🗂

- [Installation](#installation)
- [Usage](#usage)
- [Citation](#citation)
- [Contributing](#contributing)
- [License](#license)

## Installation ⚙️
<!-- Adjust the link of the second command to your own repo -->

The required packages are listed in `pyproject.toml`. To install the package, run the following after cloning the repository:

```bash
pip install .
```

or

```bash
pip install git+https://github.com/StephanAkkerman/x-timeline-scraper.git
```

## Usage ⌨️

To use the X-Timeline Scraper, you need to provide a cURL command that accesses the desired X timeline; instructions can be found in [curl_example.txt](curl_example.txt). Then you can use the `XTimelineClient` class to fetch and parse tweets.

Here's a simple example of how to use the client:

```python
import asyncio

from src.xclient import XTimelineClient


async def main():
    async with XTimelineClient(
        "curl.txt", persist_last_id_path="state/last_id.txt"
    ) as xc:
        tweets = await xc.fetch_tweets(update_last_id=False)
        for t in tweets:
            print(t.to_markdown())


asyncio.run(main())
```

You can also stream new tweets in real time:

```python
import asyncio

from src.xclient import XTimelineClient


async def main():
    async with XTimelineClient(
        "curl.txt", persist_last_id_path="state/last_id.txt"
    ) as xc:
        async for t in xc.stream(interval_s=5.0):
            print(t.to_markdown())


asyncio.run(main())
```

## Citation ✍️
<!-- Be sure to adjust everything here so it matches your name and repo -->
If you use this project in your research, please cite as follows:

```bibtex
@misc{project_name,
  author       = {Stephan Akkerman},
  title        = {X-Timeline Scraper},
  year         = {2025},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/StephanAkkerman/x-timeline-scraper}}
}
```

## Contributing 🛠
<!-- Be sure to adjust the repo name here for both the URL and GitHub link -->
Contributions are welcome! If you have a feature request, bug report, or proposal for code refactoring, please feel free to open an issue on GitHub. We appreciate your help in improving this project.
<!-- contributors image (not rendered in this diff) -->

## License 📜

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
xtimeline-0.1.0/README.md
ADDED
@@ -0,0 +1,93 @@
# X-Timeline Scraper
A Python client to scrape tweets from X (formerly Twitter) timelines using a cURL command.

<!-- Add a banner here like: https://github.com/StephanAkkerman/fintwit-bot/blob/main/img/logo/fintwit-banner.png -->

---
<!-- Adjust the link of the second badge to your own repo -->
<p align="center">
  <img src="https://img.shields.io/badge/python-3.13-blue.svg" alt="Supported versions">
  <img src="https://img.shields.io/github/license/StephanAkkerman/x-timeline-scraper.svg?color=brightgreen" alt="License">
  <a href="https://github.com/psf/black"><img src="https://img.shields.io/badge/code%20style-black-000000.svg" alt="Code style: black"></a>
</p>

## Introduction

This project provides a Python client to scrape tweets from X (formerly Twitter) timelines using a cURL command. It leverages asynchronous programming for efficient data retrieval and includes features for parsing tweet data.

## Table of Contents 🗂

- [Installation](#installation)
- [Usage](#usage)
- [Citation](#citation)
- [Contributing](#contributing)
- [License](#license)

## Installation ⚙️
<!-- Adjust the link of the second command to your own repo -->

The required packages are listed in `pyproject.toml`. To install the package, run the following after cloning the repository:

```bash
pip install .
```

or

```bash
pip install git+https://github.com/StephanAkkerman/x-timeline-scraper.git
```

## Usage ⌨️

To use the X-Timeline Scraper, you need to provide a cURL command that accesses the desired X timeline; instructions can be found in [curl_example.txt](curl_example.txt). Then you can use the `XTimelineClient` class to fetch and parse tweets.

Here's a simple example of how to use the client:

```python
import asyncio

from src.xclient import XTimelineClient


async def main():
    async with XTimelineClient(
        "curl.txt", persist_last_id_path="state/last_id.txt"
    ) as xc:
        tweets = await xc.fetch_tweets(update_last_id=False)
        for t in tweets:
            print(t.to_markdown())


asyncio.run(main())
```

You can also stream new tweets in real time:

```python
import asyncio

from src.xclient import XTimelineClient


async def main():
    async with XTimelineClient(
        "curl.txt", persist_last_id_path="state/last_id.txt"
    ) as xc:
        async for t in xc.stream(interval_s=5.0):
            print(t.to_markdown())


asyncio.run(main())
```

## Citation ✍️
<!-- Be sure to adjust everything here so it matches your name and repo -->
If you use this project in your research, please cite as follows:

```bibtex
@misc{project_name,
  author       = {Stephan Akkerman},
  title        = {X-Timeline Scraper},
  year         = {2025},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/StephanAkkerman/x-timeline-scraper}}
}
```

## Contributing 🛠
<!-- Be sure to adjust the repo name here for both the URL and GitHub link -->
Contributions are welcome! If you have a feature request, bug report, or proposal for code refactoring, please feel free to open an issue on GitHub. We appreciate your help in improving this project.
<!-- contributors image (not rendered in this diff) -->

## License 📜

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
xtimeline-0.1.0/pyproject.toml
ADDED
@@ -0,0 +1,28 @@
[project]
name = "xtimeline"
version = "0.1.0"
description = "Lightweight X/Twitter timeline client (GraphQL via cURL or auth strategies)"
readme = "README.md"
requires-python = ">=3.10"
license = { text = "MIT" }
authors = [{ name = "Stephan Akkerman" }]
keywords = ["twitter", "x", "timeline", "scraping", "aiohttp", "asyncio"]
dependencies = [
    "aiohttp>=3.12.15",
    "uncurl>=0.0.11",
]

[tool.isort]
multi_line_output = 3
include_trailing_comma = true
force_grid_wrap = 0
line_length = 88
profile = "black"

[tool.ruff]
line-length = 88
#select = ["I001"]

[tool.ruff.lint.pydocstyle]
# Use NumPy-style docstrings.
convention = "numpy"
xtimeline-0.1.0/src/tweet.py
ADDED
@@ -0,0 +1,71 @@
import logging
from dataclasses import asdict, dataclass

logger = logging.getLogger(__name__)


@dataclass(frozen=True, slots=True)
class MediaItem:
    """Single media attachment."""

    url: str
    type: str  # e.g. "photo", "video"


@dataclass(slots=True)
class Tweet:
    """
    Normalized tweet.

    Attributes
    ----------
    id : int
        Tweet rest_id as int.
    text : str
        Full text, HTML entities unescaped and trailing t.co removed.
    user_name : str
        Display name.
    user_screen_name : str
        Handle (without the leading @).
    user_img : str
        Profile image URL.
    url : str
        Canonical tweet URL.
    media : list[MediaItem]
        Unique media attachments.
    tickers : list[str]
        Uppercased $SYMBOLS.
    hashtags : list[str]
        Uppercased hashtags (CRYPTO excluded).
    title : str
        A short human-readable title ("X retweeted Y", etc.).
    media_types : list[str]
        Mirrors the attachment types for convenience.
    """

    id: int
    text: str
    user_name: str
    user_screen_name: str
    user_img: str
    url: str
    media: list[MediaItem]
    tickers: list[str]
    hashtags: list[str]
    title: str
    media_types: list[str]

    def to_dict(self) -> dict:
        """Serialize to a plain dict safe for JSON."""
        d = asdict(self)
        d["media"] = [asdict(m) for m in self.media]
        return d

    def to_markdown(self) -> str:
        """Compact markdown rendering."""
        md = f"**{self.user_name}** (@{self.user_screen_name})\n\n{self.text}\n\n{self.url}"
        if self.tickers:
            md += f"\n\n**Tickers:** {', '.join(self.tickers)}"
        if self.hashtags:
            md += f"\n**Hashtags:** {', '.join(self.hashtags)}"
        return md
xtimeline-0.1.0/src/xclient.py
ADDED
@@ -0,0 +1,522 @@
import asyncio
import datetime as dt
import json
import logging
import re
from collections.abc import AsyncIterator, Iterable
from pathlib import Path
from typing import Any

import aiohttp
import uncurl

from tweet import MediaItem, Tweet

logger = logging.getLogger(__name__)


def get_in(obj: Any, path: list[str] | tuple[str, ...], default: Any = None) -> Any:
    """Safe nested get: get_in(d, ['a', 'b', 'c'])."""
    cur = obj
    for key in path:
        if not isinstance(cur, dict) or key not in cur:
            return default
        cur = cur[key]
    return cur


def is_promoted_entry(entry: dict) -> bool:
    """Detect ads/promoted units."""
    eid = entry.get("entryId", "")
    if eid.startswith(("promoted-", "advertiser-")):
        return True
    content = entry.get("content", {})
    item = content.get("itemContent", {})
    return "promotedMetadata" in item or "promotedMetadata" in content


def is_tweet_item(entry: dict) -> bool:
    """Detect timeline items that actually contain tweets."""
    content = entry.get("content", {})
    if content.get("entryType") != "TimelineTimelineItem":
        return False
    item = content.get("itemContent", {})
    return item.get("itemType") == "TimelineTweet" and "tweet_results" in item


def normalize_tweet_result(tweet_results: dict) -> dict | None:
    """
    Normalize a GraphQL result:
    - 'Tweet' -> as-is
    - 'TweetWithVisibilityResults' -> unwrap '.tweet'
    """
    result = tweet_results.get("result")
    if not isinstance(result, dict):
        return None
    tname = result.get("__typename")
    if tname == "Tweet":
        return result
    if tname == "TweetWithVisibilityResults":
        inner = result.get("tweet")
        return inner if isinstance(inner, dict) else None
    return None


def unescape_entities(text: str) -> str:
    """Unescape the minimal set of HTML entities that appear in tweet text."""
    return text.replace("&amp;", "&").replace("&gt;", ">").replace("&lt;", "<")


_RE_TRAILING_TCO = re.compile(r"(https?://t\.co/\S+)$")


def strip_trailing_tco(text: str) -> str:
    return _RE_TRAILING_TCO.sub("", text)


class XTimelineClient:
    """
    Minimal client for polling an X/Twitter timeline endpoint described by a cURL command.

    Parameters
    ----------
    curl_path : str
        Path to a text file containing a single cURL command.
    timeout_s : float
        Per-request timeout in seconds.
    persist_last_id_path : str | None
        Optional path to persist the last seen tweet id between runs.
    """

    def __init__(
        self,
        curl_path: str = "curl.txt",
        timeout_s: float = 30.0,
        persist_last_id_path: str | None = None,
    ) -> None:
        self.curl_path = Path(curl_path)
        self.timeout_s = timeout_s
        self._session: aiohttp.ClientSession | None = None
        self._req: dict[str, Any] = {}
        self._last_tweet_id: int = 0
        self.persist_last_id_path = (
            Path(persist_last_id_path) if persist_last_id_path else None
        )
        self._load_curl()
        self._load_last_id()

    # ---------- lifecycle ----------

    def _load_curl(self) -> None:
        """Parse the cURL file into url/headers/cookies/json payload."""
        try:
            raw = self.curl_path.read_text(encoding="utf-8")
            ctx = uncurl.parse_context(
                "".join(line.strip() for line in raw.splitlines())
            )
            self._req = {
                "url": ctx.url,
                "headers": dict(ctx.headers) if ctx.headers else {},
                "cookies": dict(ctx.cookies) if ctx.cookies else {},
                "json": json.loads(ctx.data) if ctx.data else None,
                "method": ctx.method.upper(),
            }
        except Exception as e:
            logger.critical("Error reading %s: %s", self.curl_path, e)
            self._req = {}

    def _load_last_id(self) -> None:
        """Load the last tweet id from disk (if configured)."""
        if not self.persist_last_id_path:
            return
        try:
            self._last_tweet_id = int(
                self.persist_last_id_path.read_text().strip() or "0"
            )
        except FileNotFoundError:
            self._last_tweet_id = 0
        except Exception as e:
            logger.warning("Could not read last id file: %s", e)

    def _store_last_id(self) -> None:
        """Persist the last tweet id to disk (if configured)."""
        if not self.persist_last_id_path:
            return
        try:
            self.persist_last_id_path.parent.mkdir(parents=True, exist_ok=True)
            self.persist_last_id_path.write_text(
                str(self._last_tweet_id), encoding="utf-8"
            )
        except Exception as e:
            logger.warning("Could not write last id file: %s", e)

    async def __aenter__(self) -> "XTimelineClient":
        await self._ensure_session()
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        await self.aclose()

    async def aclose(self) -> None:
        if self._session and not self._session.closed:
            await self._session.close()
        self._session = None

    async def _ensure_session(self) -> None:
        if self._session is None or self._session.closed:
            timeout = aiohttp.ClientTimeout(total=self.timeout_s)
            self._session = aiohttp.ClientSession(
                headers=self._req.get("headers"),
                cookies=self._req.get("cookies"),
                timeout=timeout,
            )

    # ---------- HTTP ----------

    async def fetch_raw(self, *, text: bool = False) -> dict | str:
        """
        Perform a single GET call using the cURL details.

        Parameters
        ----------
        text : bool
            If True, return raw text; else parse JSON.

        Returns
        -------
        dict | str
            Parsed JSON (dict) or raw text.
        """
        if not self._req:
            logger.critical("No cURL loaded. Aborting fetch.")
            return "" if text else {}

        await self._ensure_session()
        assert self._session is not None
        url = self._req["url"]
        json_payload = self._req.get("json")

        try:
            async with self._session.get(url, json=json_payload) as resp:
                if resp.status >= 400:
                    body = await resp.text()
                    logger.error(
                        "HTTP %s for %s\nResponse: %s", resp.status, url, body[:2000]
                    )
                    return "" if text else {}

                if text:
                    return await resp.text()

                try:
                    return await resp.json()
                except aiohttp.ContentTypeError:
                    raw = await resp.text()
                    logger.error("Non-JSON response from %s\nBody: %s", url, raw[:2000])
                    return {}
                except json.JSONDecodeError as e:
                    raw = await resp.text()
                    logger.error("JSON decode error: %s\nBody: %s", e, raw[:2000])
                    return {}
        except aiohttp.ClientError as e:
            logger.error("Network error for %s: %s", url, e)
        except asyncio.TimeoutError:
            logger.error("Timeout after %ss for %s", self.timeout_s, url)
        return "" if text else {}

    # ---------- extraction ----------

    @staticmethod
    def _get_entries(payload: dict) -> list[dict]:
        """
        Extract raw timeline entries (not yet filtered/normalized).

        Returns
        -------
        list[dict]
            Entries inside TimelineAddEntries.
        """
        instructions = get_in(
            payload, ["data", "home", "home_timeline_urt", "instructions"], []
        )
        if not isinstance(instructions, list):
            return []
        for inst in instructions:
            if inst.get("type") == "TimelineAddEntries":
                entries = inst.get("entries", [])
                return entries if isinstance(entries, list) else []
        return []

    def _iter_entry_tweets(self, entries: list[dict]) -> Iterable[dict]:
        """Yield normalized tweet dicts from raw entries, skipping promoted and non-tweet items."""
        for entry in entries:
            if not is_tweet_item(entry):
                continue
            if is_promoted_entry(entry):
                continue
            item = entry["content"]["itemContent"]
            twr = item.get("tweet_results", {})
            tw = normalize_tweet_result(twr)
            if not tw:
                continue
            yield tw

    # ---------- parsing ----------

    def _save_errored_tweet(self, tweet: dict, error_msg: str) -> None:
        logger.error(error_msg)
        ts = dt.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
        Path("logs").mkdir(parents=True, exist_ok=True)
        (Path("logs") / f"error_tweet_{ts}.json").write_text(
            json.dumps(tweet, ensure_ascii=False, indent=4), encoding="utf-8"
        )

    def _user_field(self, tweet: dict, key: str) -> str:
        return (
            tweet.get("core", {})
            .get("user_results", {})
            .get("result", {})
            .get("legacy", {})
            .get(key, "")
        )

    def _entities(self, tweet: dict, key: str) -> list[str]:
        legacy = tweet.get("legacy", {})
        entities = legacy.get("entities", {}).get(key)
        if not entities:
            return []
        return [
            e.get("text", "") for e in entities if isinstance(e, dict) and "text" in e
        ]

    def _collect_media(self, tweet: dict) -> tuple[list[MediaItem], list[str]]:
        legacy = tweet.get("legacy", {})
        ext = legacy.get("extended_entities", {})
        media_items: list[MediaItem] = []
        media_types: list[str] = []
        if "media" in ext:
            for m in ext["media"]:
                if not isinstance(m, dict):
                    continue
                url = m.get("media_url_https") or m.get("media_url") or ""
                mtype = m.get("type", "")
                if url:
                    media_items.append(MediaItem(url=url, type=mtype))
                    media_types.append(mtype)
        # dedupe by URL
        seen = set()
        uniq_items: list[MediaItem] = []
        uniq_types: list[str] = []
        for mi, mt in zip(media_items, media_types):
            if mi.url in seen:
                continue
            seen.add(mi.url)
            uniq_items.append(mi)
            uniq_types.append(mt)
        return uniq_items, uniq_types

    def _tweet_url(self, tweet_id: int) -> str:
        return f"https://twitter.com/user/status/{tweet_id}"

    def _parse_single_tweet(
        self, tw: dict, *, allow_update_last_id: bool
    ) -> Tweet | None:
        """
        Parse a normalized Tweet GraphQL node into a Tweet object.

        Parameters
        ----------
        tw : dict
            GraphQL 'Tweet' node (already normalized).
        allow_update_last_id : bool
            If True, update the last seen id and skip older/equal tweets.

        Returns
        -------
        Tweet | None
            Parsed Tweet, or None if filtered/invalid.
        """
        # Prefer legacy.id_str; fall back to rest_id
        try:
            tid = int(tw.get("legacy", {}).get("id_str") or tw.get("rest_id") or 0)
        except Exception:
            tid = 0
        if tid <= 0:
            self._save_errored_tweet(tw, "Missing tweet id")
            return None

        # last-id gate
        if allow_update_last_id:
            if tid <= self._last_tweet_id:
                return None
            self._last_tweet_id = tid
            self._store_last_id()

        # text & entities
        legacy = tw.get("legacy", {})
        text = legacy.get("full_text", "")
        text = unescape_entities(strip_trailing_tco(text))

        tickers = [t.upper() for t in self._entities(tw, "symbols") if t]
        hashtags = [
            h.upper()
            for h in self._entities(tw, "hashtags")
            if h and h.upper() != "CRYPTO"
        ]

        # user info
        user_name = self._user_field(tw, "name")
        user_screen = self._user_field(tw, "screen_name")
        user_img = self._user_field(tw, "profile_image_url_https")
        url = self._tweet_url(tid)

        # media
        media_items, media_types = self._collect_media(tw)

        # reply/quote/retweet handling (best-effort)
        title = f"{user_name} tweeted"
        quoted = tw.get("quoted_status_result") or None
        retweeted = tw.get("legacy", {}).get("retweeted_status_result") or None

        # For replies, X often embeds them differently; handle when present.
        # We reuse this parser recursively on embedded nodes.
        def _parse_nested(n: dict) -> Tweet | None:
            # n may already be the normalized 'Tweet' or wrapped
            if "result" in n:
                inner = (
                    normalize_tweet_result(n)
                    if "tweet" in n.get("result", {})
                    else n.get("result")
                )
            else:
                inner = normalize_tweet_result({"result": n}) or n
            if not isinstance(inner, dict):
                return None
            # don't update last_id for nested tweets
            return self._parse_single_tweet(inner, allow_update_last_id=False)

        nested: Tweet | None = None
        if quoted:
            nested = _parse_nested(quoted)
            if nested:
                title = f"{user_name} quote tweeted {nested.user_name}"
                q_text = "\n".join("> " + line for line in nested.text.splitlines())
                text = f"{text}\n\n> [@{nested.user_screen_name}](https://twitter.com/{nested.user_screen_name}):\n{q_text}"
                # Merge entities/media
                media_items += nested.media
                media_types += nested.media_types
                tickers = sorted(set(tickers) | set(nested.tickers))
                hashtags = sorted(set(hashtags) | set(nested.hashtags))

        if retweeted:
            nested = _parse_nested(retweeted)
            if nested:
                title = f"{user_name} retweeted {nested.user_name}"
                # Use the full RT text
                text = nested.text
                media_items += nested.media
                media_types += nested.media_types
                tickers = sorted(set(tickers) | set(nested.tickers))
                hashtags = sorted(set(hashtags) | set(nested.hashtags))

        # Replies can show up as composite timeline items; those are usually handled
        # in the entry stage. If you later surface reply threads, add that here.

        # dedupe media again after merges
        uniq_media: list[MediaItem] = []
        seen_urls = set()
        for m in media_items:
            if m.url and m.url not in seen_urls:
                uniq_media.append(m)
                seen_urls.add(m.url)

        return Tweet(
            id=tid,
            text=text,
            user_name=user_name,
            user_screen_name=user_screen,
            user_img=user_img,
            url=url,
            media=uniq_media,
            tickers=sorted(set(tickers)),
            hashtags=sorted(set(hashtags)),
            title=title,
            media_types=[m.type for m in uniq_media],
        )

    # ---------- public APIs ----------

    async def fetch_tweets(self, *, update_last_id: bool = False) -> list[Tweet]:
        """
        Fetch entries and parse them into `Tweet` objects.

        Parameters
        ----------
        update_last_id : bool
            If True, update the client's last-seen tweet id (skip older/equal).

        Returns
        -------
        list[Tweet]
            Parsed tweets (ads removed, deduped).
        """
        payload = await self.fetch_raw(text=False)
        if not isinstance(payload, dict) or not payload:
            return []

        out: list[Tweet] = []
        seen: set[int] = set()
        for tw in self._iter_entry_tweets(self._get_entries(payload)):
            parsed = self._parse_single_tweet(tw, allow_update_last_id=update_last_id)
            if not parsed:
                continue
            if parsed.id in seen:
                continue
            seen.add(parsed.id)
            out.append(parsed)
        return out

    async def stream(self, interval_s: float = 5.0) -> AsyncIterator[Tweet]:
        """
        Async generator that yields new tweets forever.

        Parameters
        ----------
        interval_s : float
            Polling interval in seconds.

        Yields
        ------
        Tweet
            Each new parsed Tweet.
        """
        while True:
            try:
                for tw in await self.fetch_tweets(update_last_id=True):
                    yield tw
            except Exception as e:
                logger.error("stream() iteration error: %s", e)
            await asyncio.sleep(interval_s)


async def _example_once():
    async with XTimelineClient(
        "curl.txt", persist_last_id_path="state/last_id.txt"
    ) as xc:
        tweets = await xc.fetch_tweets(update_last_id=False)
        for t in tweets:
            print(t.to_markdown())


async def _example_stream():
    async with XTimelineClient(
        "curl.txt", persist_last_id_path="state/last_id.txt"
    ) as xc:
        async for tweet in xc.stream(interval_s=5.0):
            print(tweet.id, tweet.text)


# if __name__ == "__main__":
#     asyncio.run(_example_stream())
xtimeline-0.1.0/src/xtimeline.egg-info/PKG-INFO ADDED
@@ -0,0 +1,107 @@
Metadata-Version: 2.4
Name: xtimeline
Version: 0.1.0
Summary: Lightweight X/Twitter timeline client (GraphQL via cURL or auth strategies)
Author: Stephan Akkerman
License: MIT
Keywords: twitter,x,timeline,scraping,aiohttp,asyncio
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: aiohttp>=3.12.15
Requires-Dist: uncurl>=0.0.11
Dynamic: license-file

# X-Timeline Scraper

A Python client for scraping tweets from X (formerly Twitter) timelines using a cURL command.

---

<p align="center">
  <img src="https://img.shields.io/badge/python-3.13-blue.svg" alt="Supported versions">
  <img src="https://img.shields.io/github/license/StephanAkkerman/x-timeline-scraper.svg?color=brightgreen" alt="License">
  <a href="https://github.com/psf/black"><img src="https://img.shields.io/badge/code%20style-black-000000.svg" alt="Code style: black"></a>
</p>

## Introduction

This project provides a Python client for scraping tweets from X (formerly Twitter) timelines using a cURL command. It uses asynchronous programming for efficient data retrieval and includes utilities for parsing tweet data.

## Table of Contents 🗂

- [Installation](#installation)
- [Usage](#usage)
- [Citation](#citation)
- [Contributing](#contributing)
- [License](#license)

## Installation ⚙️

Dependencies are declared in `pyproject.toml` and installed automatically. After cloning the repository, run:

```bash
pip install .
```

or install directly from GitHub:

```bash
pip install git+https://github.com/StephanAkkerman/x-timeline-scraper.git
```

## Usage ⌨️

To use the X-Timeline Scraper, you need to provide a cURL command that accesses the desired X timeline; instructions can be found in [curl_example.txt](curl_example.txt). You can then use the `XTimelineClient` class to fetch and parse tweets.

Here's a simple example of how to use the client:

```python
import asyncio

from src.xclient import XTimelineClient


async def main() -> None:
    async with XTimelineClient(
        "curl.txt", persist_last_id_path="state/last_id.txt"
    ) as xc:
        tweets = await xc.fetch_tweets(update_last_id=False)
        for t in tweets:
            print(t.to_markdown())


asyncio.run(main())
```

You can also stream new tweets in real time:

```python
import asyncio

from src.xclient import XTimelineClient


async def main() -> None:
    async with XTimelineClient(
        "curl.txt", persist_last_id_path="state/last_id.txt"
    ) as xc:
        async for t in xc.stream(interval_s=5.0):
            print(t.to_markdown())


asyncio.run(main())
```
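The `persist_last_id_path` used in these examples keeps the newest seen tweet id on disk between runs, so repeated invocations only process tweets that are new. Conceptually, that bookkeeping works like the standalone sketch below (an illustration of the idea only, not the library's actual implementation; the helper names are made up):

```python
from pathlib import Path


def load_last_id(path: str) -> int:
    """Return the persisted last-seen tweet id, or 0 if none was saved yet."""
    p = Path(path)
    return int(p.read_text().strip()) if p.exists() else 0


def save_last_id(path: str, tweet_id: int) -> None:
    """Persist the last-seen tweet id, creating parent directories as needed."""
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    p.write_text(str(tweet_id))


def filter_new(tweet_ids: list[int], path: str) -> list[int]:
    """Keep only ids newer than the persisted one, then advance the marker."""
    last = load_last_id(path)
    fresh = [t for t in tweet_ids if t > last]
    if fresh:
        save_last_id(path, max(fresh))
    return fresh
```

Because X tweet ids are monotonically increasing snowflake ids, comparing ids is enough to decide what counts as "new" across runs.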

## Citation ✍️

If you use this project in your research, please cite it as follows:

```bibtex
@misc{x_timeline_scraper,
  author       = {Stephan Akkerman},
  title        = {X-Timeline Scraper},
  year         = {2025},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/StephanAkkerman/x-timeline-scraper}}
}
```

## Contributing 🛠

Contributions are welcome! If you have a feature request, bug report, or proposal for code refactoring, please feel free to open an issue on GitHub. We appreciate your help in improving this project.\


## License 📜

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
xtimeline-0.1.0/src/xtimeline.egg-info/SOURCES.txt ADDED
@@ -0,0 +1,11 @@
LICENSE
README.md
pyproject.toml
src/__init__.py
src/tweet.py
src/xclient.py
src/xtimeline.egg-info/PKG-INFO
src/xtimeline.egg-info/SOURCES.txt
src/xtimeline.egg-info/dependency_links.txt
src/xtimeline.egg-info/requires.txt
src/xtimeline.egg-info/top_level.txt

xtimeline-0.1.0/src/xtimeline.egg-info/dependency_links.txt ADDED
@@ -0,0 +1 @@
