linkedin_agent_cli-0.1.0-py3-none-any.whl

@@ -0,0 +1,197 @@
1
+ Metadata-Version: 2.4
2
+ Name: linkedin-agent-cli
3
+ Version: 0.1.0
4
+ Summary: Django-free library and CLI for LinkedIn platform mechanics over a bound browser session (Voyager API + Playwright).
5
+ Project-URL: Homepage, https://github.com/eracle/linkedin-cli
6
+ Project-URL: Repository, https://github.com/eracle/linkedin-cli
7
+ Project-URL: Issues, https://github.com/eracle/linkedin-cli/issues
8
+ Author-email: eracle <eracle@posteo.eu>
9
+ License-Expression: MIT
10
+ License-File: LICENSE
11
+ Keywords: agent-tools,ai-agent,browser-automation,cli,lead-generation,linkedin,linkedin-api,linkedin-automation,linkedin-scraper,llm,outreach,playwright,voyager,web-scraping
12
+ Classifier: Development Status :: 4 - Beta
13
+ Classifier: Environment :: Console
14
+ Classifier: Intended Audience :: Developers
15
+ Classifier: License :: OSI Approved :: MIT License
16
+ Classifier: Programming Language :: Python :: 3
17
+ Classifier: Topic :: Internet :: WWW/HTTP :: Browsers
18
+ Classifier: Topic :: Office/Business
19
+ Classifier: Topic :: Software Development :: Libraries
20
+ Requires-Python: >=3.10
21
+ Requires-Dist: playwright-stealth
22
+ Requires-Dist: playwright>=1.59
23
+ Requires-Dist: tenacity<10,>=8
24
+ Requires-Dist: termcolor
25
+ Description-Content-Type: text/markdown
26
+
27
+ # linkedin-cli
28
+
29
+ **Drive LinkedIn from the command line or any program** — search people, scrape
30
+ profiles, check connection status, send connection requests, and read or send
31
+ messages. One small, dependency-light Python tool that talks to LinkedIn's
32
+ private **Voyager API** through a **real, logged-in browser** (Playwright), so
33
+ it behaves like a human session instead of a cookie-only scraper.
34
+
35
+ ![Python](https://img.shields.io/badge/python-3.10%2B-blue)
36
+ ![License](https://img.shields.io/badge/license-MIT-green)
37
+ ![Playwright](https://img.shields.io/badge/browser-Playwright-2EAD33)
38
+
39
+ > No SaaS, no API key, no database. Your browser, your LinkedIn account, your machine.
40
+
41
+ ---
42
+
43
+ ## ✨ Why linkedin-cli
44
+
45
+ - **Real browser session, not raw cookies.** A persistent Chromium window is
46
+ launched once and shared; requests ride your live, authenticated session —
47
+ far more resilient than header/cookie replay.
48
+ - **Structured JSON out of every command.** Pipe it into `jq`, a script, or an
49
+ LLM agent. Human-readable summaries by default; `--json` for the full record.
50
+ - **Robust login.** Authentication is a small **page-state machine** that
51
+ understands LinkedIn's login, authwall, and security-checkpoint redirects —
52
+ not a brittle one-shot form fill.
53
+ - **Language-agnostic.** Anything that can run a subprocess and parse JSON can
54
+ drive LinkedIn — Python, Node, Go, shell, or an AI agent. No SDK lock-in.
55
+ - **Tiny surface.** Eight verbs, four dependencies, zero web framework. It knows
56
+ about *a LinkedIn page and a browser* — nothing else.
57
+
58
+ ## 📦 Install
59
+
60
+ ```bash
61
+ pip install linkedin-agent-cli
62
+ python -m playwright install chromium
63
+ ```
64
+
65
+ This installs the `linkedin-cli` command (equivalent to `python -m linkedin_cli.cli`).
66
+ The PyPI package is `linkedin-agent-cli`; the import name is `linkedin_cli`. For the
67
+ latest unreleased code, install from git:
68
+ `pip install "linkedin-agent-cli @ git+https://github.com/eracle/linkedin-cli.git@main"`.
69
+
70
+ ## 🚀 Quickstart
71
+
72
+ linkedin-cli uses a **bind + connect** model: one long-lived process owns the
73
+ browser; every command is a short client that connects to it.
74
+
75
+ ```bash
76
+ # 1. Open + bind a session once (this process owns the browser window).
77
+ linkedin-cli session open --session work
78
+
79
+ # 2. From any other shell, drive it. Set the session once via env:
80
+ export LINKEDIN_CLI_SESSION=work
81
+ export LINKEDIN_USERNAME="you@example.com"
82
+ export LINKEDIN_PASSWORD="••••••••"
83
+
84
+ linkedin-cli login # authenticate the session
85
+ linkedin-cli search "head of growth" --network first # discover → handles
86
+ linkedin-cli profile alice-smith # scrape a profile
87
+ linkedin-cli profile alice-smith --json > alice.json # save the full record
88
+ linkedin-cli status alice-smith # Connected / Pending / Qualified
89
+ linkedin-cli connect alice-smith # send a connection request
90
+ linkedin-cli message alice-smith --text "Hi Alice 👋"
91
+ linkedin-cli thread alice-smith # read the conversation
92
+
93
+ linkedin-cli session close
94
+ ```
95
+
96
+ Hit a security checkpoint? `playwright-cli attach work` opens the *same* browser
97
+ so you can clear it by hand, then carry on.
98
+
99
+ ## 🧰 Commands
100
+
101
+ `--session <name>` (or `$LINKEDIN_CLI_SESSION`) and `--json` apply to every command.
102
+
103
+ | Command | What it does | `--json` result |
104
+ |---|---|---|
105
+ | `login` | Authenticate the session (creds from env), clear checkpoints, discover your own profile | `{account, self}` |
106
+ | `whoami` | Who is this session logged in as (no login flow) | `{self}` |
107
+ | `search <kw> [--network first/second/third] [--page N]` | People search → matching profile handles | `{query, page, network, profiles[]}` |
108
+ | `profile <id>` | Scrape a profile (positions, education, location, …); `--raw` adds the raw Voyager blob | full `LinkedInProfile` |
109
+ | `status <id>` | Connection state | `{public_identifier, state}` |
110
+ | `connect <id>` | Send a connection request (no note) | `{public_identifier, state}` |
111
+ | `message <id> --text …` | Send a direct message | `{public_identifier, sent}` |
112
+ | `thread <id>` | Read a conversation's messages | `{public_identifier, messages[]}` |
113
+
114
+ An `<id>` is a public handle (`alice-smith`) or a full profile URL. Commands that
115
+ need the internal member `urn` (`message`/`thread`/`status`) resolve it for you —
116
+ every command is independent and takes only a handle.
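
As a rough illustration of the `<id>` rule above, the sketch below reduces either accepted form to a bare handle. The function name `to_handle` is hypothetical and is not taken from the package (the real logic presumably lives in `linkedin_cli/url_utils.py`, whose contents are not shown in this diff). It assumes profile URLs follow the `/in/<handle>` pattern.

```python
from urllib.parse import urlparse, unquote


def to_handle(identifier: str) -> str:
    """Return the public handle for a bare handle or a full profile URL.

    Hypothetical helper: assumes profile URLs look like
    https://www.linkedin.com/in/<handle>/ .
    """
    identifier = identifier.strip()
    if "://" not in identifier:
        # Already a bare handle such as "alice-smith".
        return identifier.strip("/")
    # Split the path into non-empty segments, ignoring query and fragment.
    parts = [p for p in urlparse(identifier).path.split("/") if p]
    if len(parts) >= 2 and parts[0] == "in":
        return unquote(parts[1])
    raise ValueError(f"not a profile URL: {identifier!r}")
```

Normalising up front like this would let a caller pass whatever the user pasted and still thread a single handle between commands.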
117
+
118
+ ## 🤖 Built for AI agents (and any language)
119
+
120
+ linkedin-cli is designed to be driven by an LLM as a **deterministic tool**. The
121
+ properties that make it agent-friendly:
122
+
123
+ - **Stable, typed JSON contract** — every verb returns one documented dict;
124
+ maps directly onto a function-calling / tool-use schema.
125
+ - **id-only, stateless commands** — a public handle is the only argument an agent
126
+ threads between steps; no session tokens, urns, or cursors to carry.
127
+ - **Predictable error taxonomy** — failures surface as `error: <type>: <message>`
128
+ on stderr with a non-zero exit, so an agent can branch on `type`
129
+ (`checkpoint_challenge`, `authentication`, `connection_limit`, …).
130
+ - **No hidden state or side effects** — stdout is result-only; logs go to stderr.
131
+ - **Self-describing** — see [`llms.txt`](llms.txt) for a compact spec an LLM can
132
+ load directly, and `linkedin-cli <verb> --help` for per-verb usage.
133
+
134
+ Because every command emits JSON on stdout, you can drive LinkedIn from anything —
135
+ Python, Node, Go, shell, or an agent loop — no SDK and no Python import required:
136
+
+ ```python
+ import subprocess, json
+
+ def li(*args):
+     out = subprocess.run(["linkedin-cli", *args, "--json"],
+                          capture_output=True, text=True, check=True)
+     return json.loads(out.stdout)
+
+ for hit in li("search", "head of growth", "--network", "first")["profiles"]:
+     handle = hit["public_identifier"]
+     if li("status", handle)["state"] == "Qualified":
+         li("message", handle, "--text", "Hi — loved your recent post!")
+ ```
150
+
151
+ The discovery → outreach loop an agent runs: **`search` → `profile` / `status` →
152
+ `message` / `thread`.**
153
+
154
+ ## 🧠 How it works
155
+
156
+ - **bind + connect** — `linkedin-cli session open` launches a persistent Chromium
157
+ with `Browser.bind()` (Playwright ≥ 1.59) and registers a local `ws://`
158
+ endpoint under the session name. Each command `chromium.connect()`s to that same
159
+ browser and drives a *real* page. Auth, cookies, and fingerprint live in the
160
+ owner's on-disk profile; the CLI keeps only a name→endpoint registry — **no
161
+ database**. One session = one LinkedIn account.
162
+ - **Voyager API** — reads (`profile`, `thread`, `status`) call LinkedIn's private
163
+ Voyager endpoints from inside the authenticated page (`fetch`), then parse the
164
+ JSON — fast and structured, no DOM scraping where an API exists.
165
+ - **Page-state auth machine** — `classify_page()` judges the live page by URL
166
+ *path* only (so a `/login?...redirect=/feed/` URL never reads as the feed), and
167
+ each transition asserts its pre/post state, raising on an illegal jump. Login,
168
+ authwall, and checkpoint flows are modeled explicitly.
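
The "URL path only" rule in the last bullet can be sketched as follows. This is not the package's `classify_page()`, whose source is not part of this excerpt. The function name `classify_by_path`, the returned state names, and the `/checkpoint` and `/authwall` prefixes are assumptions made for illustration.

```python
from urllib.parse import urlparse


def classify_by_path(url: str) -> str:
    """Classify a URL by its path component only (hypothetical state names)."""
    # urlparse separates the path from the query string, so a redirect
    # target embedded in the query cannot influence the result.
    path = urlparse(url).path
    if path.startswith("/checkpoint"):
        return "checkpoint"
    if path.startswith("/login"):
        return "login"
    if path.startswith("/authwall"):
        return "authwall"
    if path.startswith("/feed"):
        return "feed"
    return "unknown"
```

A plain substring test such as `"/feed/" in url` would misread `https://www.linkedin.com/login?redirect=/feed/` as the feed. Matching on the parsed path avoids that.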
169
+
170
+ ## 📤 Output contract
171
+
172
+ - Every command produces one result **dict** — that dict is both the `--json`
173
+ payload and the source the human summary is rendered from, so the two never drift.
174
+ - **Human-readable by default; `--json` for the full dict.**
175
+ - **No `--out` flag** — print to stdout, redirect to save (`… --json > out.json`).
176
+ - **stdout is result-only; logs and errors go to stderr** as
177
+ `error: <type>: <message>` with a non-zero exit. Error types are stable:
178
+ `checkpoint_challenge`, `authentication`, `profile_inaccessible`,
179
+ `skip_profile`, `connection_limit`.
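
A caller can branch on the error type by parsing stderr. The sketch below is one possible approach, not code from the package. `parse_error` and `CliError` are hypothetical names, and the regular expression assumes types are lowercase words joined by underscores, as in the list above. Because logs also go to stderr, it scans for the last matching line rather than assuming the error is the only output.

```python
import re
from typing import NamedTuple, Optional

# Matches the documented "error: <type>: <message>" shape.
ERROR_LINE = re.compile(r"^error: (?P<type>[a-z_]+): (?P<message>.*)$")


class CliError(NamedTuple):
    type: str
    message: str


def parse_error(stderr: str) -> Optional[CliError]:
    """Return the last `error: <type>: <message>` line found in stderr, if any."""
    for line in reversed(stderr.splitlines()):
        match = ERROR_LINE.match(line.strip())
        if match:
            return CliError(match["type"], match["message"])
    return None
```

With `subprocess.run(..., capture_output=True, text=True)` and no `check=True`, a wrapper could call `parse_error(result.stderr)` whenever `result.returncode` is non-zero and, for example, stop an outreach loop on `connection_limit`.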
180
+
181
+ ## ⚠️ Responsible use
182
+
183
+ This tool automates **your own** LinkedIn account from **your own** machine.
184
+ Automating LinkedIn may conflict with its Terms of Service, and aggressive use
185
+ can get an account restricted. Respect rate limits, only contact people for
186
+ legitimate reasons, follow applicable laws (GDPR/CAN-SPAM), and use it at your
187
+ own risk. You are responsible for how you use it.
188
+
189
+ ## 📄 License
190
+
191
+ [MIT](LICENSE) © eracle
192
+
193
+ ---
194
+
195
+ linkedin-cli was extracted from [**OpenOutreach**](https://github.com/eracle/OpenOutreach),
196
+ an open-source AI outreach tool, where it powers the LinkedIn discovery and
197
+ interaction layer. It is fully standalone and reusable on its own.
@@ -0,0 +1,34 @@
1
+ linkedin_cli/__init__.py,sha256=IdOL1XgI1gAR1WmguOT5MwNe9LgQIug9kx63B6_gT-k,414
2
+ linkedin_cli/auth.py,sha256=aC6sfZC5WqapA4yUZqdIQ_78Tx5rLTK1v5PGzT9kG3s,3693
3
+ linkedin_cli/cli.py,sha256=DTqjgq9zi1gUIqxvxvMkNlGSkmq-KIBqXlR1ZNXLYfc,15143
4
+ linkedin_cli/conf.py,sha256=kGj1l9_8cQaK110sbvnEYDH5P5QKIEm-mCEIuZwFUw8,1298
5
+ linkedin_cli/enums.py,sha256=hRNr0vSlTm-w6BeF9wLLtxbobzVqzXNQsmCPuaCWTzY,229
6
+ linkedin_cli/exceptions.py,sha256=QI7anLCNUo3ebMb6GQEUk4HX-mD-pJOkUQhPbatsm4c,1403
7
+ linkedin_cli/launcher.py,sha256=8f9o4FbrtcxBj97CXPIrvnHvo-X4wjX8Lyvbr6pwsyY,2509
8
+ linkedin_cli/page_state.py,sha256=-QovkqQFlh49cLkKmoubCqCASAURjzJJCqoGy-h04Tk,5906
9
+ linkedin_cli/session.py,sha256=2ZLHM8S4Ry8gz1m90RX8ztXmAFDTvmo0Gzg344zZ1Jc,6509
10
+ linkedin_cli/url_utils.py,sha256=rYfyWFqLfHluJMv7BfzUqlM_VEuWzC9n1QqswxKXIp8,820
11
+ linkedin_cli/actions/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
12
+ linkedin_cli/actions/connect.py,sha256=mHI3GATmogQ0psqiyxBbE25oGZaESn5rYoarcsoFtNE,4043
13
+ linkedin_cli/actions/conversations.py,sha256=KHptMUHbmjIxZjhWZzy2b7rUNYBEdsTYLzUVZUW8wPc,4778
14
+ linkedin_cli/actions/message.py,sha256=yqIks2TRlBhBlgvW-WeXlRS_-uNpE6rv5GoPWL3YOZU,5810
15
+ linkedin_cli/actions/profile.py,sha256=H-zQk4lCOhZq7nIeKlZMjokbS_URuuzPhiwLRrpm3X0,529
16
+ linkedin_cli/actions/search.py,sha256=uXPdQfsuI46ig2fsd78I2girpy1uEuuI2baIIkCRLGo,6777
17
+ linkedin_cli/actions/status.py,sha256=BNvSDoWr5Ml0gZm3lM9jNRw7jYHc3jqDD-l7avfOtrM,4207
18
+ linkedin_cli/api/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
19
+ linkedin_cli/api/client.py,sha256=u303hFM5rYeS_p7pJmjw2OEp4pASnrYw6rR7nOgLKPg,6928
20
+ linkedin_cli/api/voyager.py,sha256=GkKZPYiV5sf-lOay6ORnZgZFXsRYmkGoErJhevEZfK8,11118
21
+ linkedin_cli/api/messaging/__init__.py,sha256=pduj6tsma0IS5MqsgPp6Yk_Ncs8aOrvFQBvkWW4xY9U,381
22
+ linkedin_cli/api/messaging/conversations.py,sha256=tOCgGOnwdv1ZBXiSLn4bmJZa6NWJoIBuVtUsvNpE7nA,1967
23
+ linkedin_cli/api/messaging/send.py,sha256=YpTUhj0pFMRdPZiyNTZps0Jk4Onkws7MnOcPwWg3b8Y,2170
24
+ linkedin_cli/api/messaging/utils.py,sha256=gmTHWPrZOJ3MlhiVtTIYXVCrQw3vpM28bbQU1faW2-I,789
25
+ linkedin_cli/browser/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
26
+ linkedin_cli/browser/login.py,sha256=tAmXzeWJFmIDGpTCUmRAnF71q7_3v7of8nDshV-45RE,5298
27
+ linkedin_cli/browser/nav.py,sha256=QVr-jpn3qISH_JRzw-hDSe5gffWDUS0RIMOfOaIunOg,4011
28
+ linkedin_cli/setup/__init__.py,sha256=47DEQpj8HBSa-_TImW-5JCeuQeRkm5NMpJWZG3hSuFU,0
29
+ linkedin_cli/setup/self_profile.py,sha256=wqE9_jgzzsccQLqWpme4amKArMo9QOMvQMv0q5zhW28,991
30
+ linkedin_agent_cli-0.1.0.dist-info/METADATA,sha256=ILWkI7gnbqmoY4U-ex1KUTZsvUrABwU-E46fGoiyxuU,9522
31
+ linkedin_agent_cli-0.1.0.dist-info/WHEEL,sha256=mffPy8wBnZQn2VnJUU5jE99KsxaSfiyMHV9Yt0aLVxs,87
32
+ linkedin_agent_cli-0.1.0.dist-info/entry_points.txt,sha256=kO1MrBzrciaqNkJH3b3rxotzsqiTCJJhWZ0ypxqrgAQ,55
33
+ linkedin_agent_cli-0.1.0.dist-info/licenses/LICENSE,sha256=evnow1KJu1sfcM-e106GjsnOY0pr4NmgcWxMLdsCbNA,1063
34
+ linkedin_agent_cli-0.1.0.dist-info/RECORD,,
@@ -0,0 +1,4 @@
1
+ Wheel-Version: 1.0
2
+ Generator: hatchling 1.30.1
3
+ Root-Is-Purelib: true
4
+ Tag: py3-none-any
@@ -0,0 +1,2 @@
1
+ [console_scripts]
2
+ linkedin-cli = linkedin_cli.cli:main
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 eracle
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,9 @@
1
+ """linkedin_cli — standalone, Django-free LinkedIn interaction library.
2
+
3
+ Owns the LinkedIn *platform* mechanics — navigation, login, the Voyager API
4
+ client, profile/conversation scraping, and the connect/message/status/thread
5
+ verbs. It holds no database and no campaign/CRM context: every verb runs
6
+ against a browser session supplied by the caller (see ``session.LinkedInSession``).
7
+ """
8
+
9
+ __version__ = "0.1.0"
File without changes
@@ -0,0 +1,118 @@
+ # linkedin_cli/actions/connect.py
+ import logging
+ from typing import Dict, Any
+
+ from linkedin_cli.enums import ProfileState
+ from linkedin_cli.exceptions import SkipProfile, ReachedConnectionLimit
+ from linkedin_cli.browser.nav import find_top_card, dump_page_html
+
+ logger = logging.getLogger(__name__)
+
+ SELECTORS = {
+     "weekly_limit": 'div[class*="ip-fuse-limit-alert__warning"]',
+     "invite_to_connect": (
+         '[aria-label*="Invite"][aria-label*="to connect"]:visible, '
+         'a:has(span:text-is("Connect")):visible, '
+         'button:has(span:text-is("Connect")):visible'
+     ),
+     "error_toast": 'div[data-test-artdeco-toast-item-type="error"]',
+     "more_button": (
+         'button[aria-label="More"]:visible, '
+         'button[id*="overflow"]:visible, '
+         'button[aria-label*="More actions"]:visible, '
+         'button:has(span:text-is("More")):visible'
+     ),
+     "connect_option": (
+         'div[role="button"][aria-label^="Invite"][aria-label*=" to connect"], '
+         'div[role="button"]:text-is("Connect"), '
+         '[role="menuitem"][aria-label*="Connect"], '
+         '[role="menuitem"]:has-text("Connect"), '
+         'li:text-is("Connect"), '
+         'span[role="button"]:text-is("Connect")'
+     ),
+     "send_now": 'button:has-text("Send now"), button[aria-label*="Send without"], button[aria-label*="Send invitation"]',
+ }
+
+
+ def send_connection_request(
+     session: "LinkedInSession",
+     profile: Dict[str, Any],
+ ) -> ProfileState:
+     """
+     Sends a LinkedIn connection request WITHOUT a note (fastest & safest).
+
+     Assumes the profile page is already loaded (caller navigates via
+     ``get_connection_status`` or ``visit_profile`` beforehand).
+     """
+     public_identifier = profile.get('public_identifier')
+
+     # Send invitation WITHOUT note (current active flow)
+     if not _connect_direct(session) and not _connect_via_more(session):
+         logger.debug("Connect button not found for %s — staying at current stage", public_identifier)
+         dump_page_html(session, profile)
+         return ProfileState.QUALIFIED
+
+     _click_without_note(session)
+     _check_weekly_invitation_limit(session)
+
+     logger.debug("Connection request submitted for %s", public_identifier)
+     return ProfileState.PENDING
+
+
+ def _check_weekly_invitation_limit(session):
+     weekly_invitation_limit = session.page.locator(SELECTORS["weekly_limit"])
+     if weekly_invitation_limit.count() > 0:
+         raise ReachedConnectionLimit("Weekly connection limit pop up appeared")
+
+
+ def _connect_direct(session):
+     session.wait()
+     top_card = find_top_card(session)
+     direct = top_card.locator(SELECTORS["invite_to_connect"])
+     if direct.count() == 0:
+         return False
+
+     direct.first.click()
+     logger.debug("Clicked direct 'Connect' button")
+
+     error = session.page.locator(SELECTORS["error_toast"])
+     if error.count() > 0:
+         raise SkipProfile(error.first.inner_text().strip())  # .first avoids a strict-mode error when several toasts match
+
+     return True
+
+
+ def _connect_via_more(session):
+     session.wait()
+     top_card = find_top_card(session)
+     page = session.page
+
+     # Dropdown may render as a portal outside top_card, so search page-wide
+     connect_option = page.locator(SELECTORS["connect_option"])
+
+     # Connect option may already be visible (More dropdown opened by status check)
+     if connect_option.count() == 0:
+         more = top_card.locator(SELECTORS["more_button"])
+         if more.count() == 0:
+             return False
+         more.first.click()
+         session.wait()
+
+         connect_option = page.locator(SELECTORS["connect_option"])
+         if connect_option.count() == 0:
+             return False
+     connect_option.first.click()
+     logger.debug("Used 'More → Connect' flow")
+
+     return True
+
+
+ def _click_without_note(session):
+     """Click flow: sends connection request instantly without note."""
+     session.wait()
+
+     # Click "Send now" / "Send without a note"
+     send_btn = session.page.locator(SELECTORS["send_now"])
+     send_btn.first.click(force=True)
+     session.wait()
+     logger.debug("Connection request submitted (no note)")
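
The guard at the top of `send_connection_request` relies on short-circuit evaluation: the "More" menu path is only attempted when the direct button is missing. A minimal sketch of that pattern follows, using stand-in functions. `first_successful`, `direct`, and `via_more` are hypothetical names and do not touch a browser.

```python
from typing import Callable, Iterable, List


def first_successful(strategies: Iterable[Callable[[], bool]]) -> bool:
    """Run strategies in order and stop at the first one that returns True."""
    # any() over a generator stops consuming it as soon as a value is truthy.
    return any(strategy() for strategy in strategies)


calls: List[str] = []


def direct() -> bool:
    calls.append("direct")
    return False  # pretend no direct Connect button was found


def via_more() -> bool:
    calls.append("via_more")
    return True  # pretend the More -> Connect menu entry worked


found = first_successful([direct, via_more])
```

Here both stand-ins run because the first one fails. If `direct` returned `True`, `via_more` would never be called, which matches the behaviour of `not _connect_direct(session) and not _connect_via_more(session)`.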
@@ -0,0 +1,132 @@
+ # linkedin_cli/actions/conversations.py
+ """Retrieve past LinkedIn conversations for a given profile."""
+ import logging
+ from datetime import datetime, timezone
+
+ from linkedin_cli.api.client import PlaywrightLinkedinAPI
+ from linkedin_cli.api.messaging import fetch_conversations, fetch_messages, encode_urn
+
+ logger = logging.getLogger(__name__)
+
+
+ def find_conversation_urn(api: PlaywrightLinkedinAPI, target_urn: str, mailbox_urn: str) -> str | None:
+     """Find conversation URN for a target profile URN by scanning recent conversations."""
+     raw = fetch_conversations(api, mailbox_urn)
+     elements = raw.get("data", {}).get("messengerConversationsBySyncToken", {}).get("elements", [])
+
+     for conv in elements:
+         for p in conv.get("conversationParticipants", []):
+             if p.get("hostIdentityUrn") == target_urn:
+                 return conv.get("entityUrn")
+     return None
+
+
+ def find_conversation_urn_via_navigation(session, target_urn: str) -> str | None:
+     """Navigate to the messaging page for a profile and capture the conversation URN.
+
+     Works for older conversations not in the first page of API results.
+     """
+     page = session.page
+     captured_urn = [None]
+
+     def on_response(response):
+         if "messengerMessages" not in response.url:
+             return
+         try:
+             data = response.json()
+             elements = data.get("data", {}).get("messengerMessagesBySyncToken", {}).get("elements", [])
+             if elements:
+                 captured_urn[0] = elements[0].get("conversation", {}).get("entityUrn")
+         except Exception:
+             pass
+
+     session.context.on("response", on_response)
+     try:
+         url = f"https://www.linkedin.com/messaging/thread/new/?recipient={encode_urn(target_urn)}"
+         logger.debug("Navigating to messaging thread → %s", url)
+         page.goto(url, wait_until="domcontentloaded", timeout=30_000)
+         page.wait_for_timeout(8_000)
+     except Exception as e:
+         logger.warning("Navigation to messaging thread failed: %s", e)
+     finally:
+         session.context.remove_listener("response", on_response)
+
+     return captured_urn[0]
+
+
+ def parse_message_element(msg: dict) -> dict | None:
+     """Parse a single Voyager message element into a dict.
+
+     Returns {entityUrn, text, sender_name, sender_host_urn, delivered_at} (no ``is_outgoing`` key is set here)
+     or None if the element should be skipped.
+     """
+     body = msg.get("body", {})
+     text = body.get("text", "") if isinstance(body, dict) else str(body)
+     if not text:
+         return None
+
+     sender = msg.get("sender", {})
+     participant = sender.get("participantType", {}).get("member", {})
+     first = (participant.get("firstName") or {}).get("text", "")
+     last = (participant.get("lastName") or {}).get("text", "")
+     sender_name = f"{first} {last}".strip() or "unknown"
+
+     delivered_at = msg.get("deliveredAt")
+     ts = (
+         datetime.fromtimestamp(delivered_at / 1000, tz=timezone.utc)
+         if delivered_at
+         else None
+     )
+
+     return {
+         "entityUrn": msg.get("entityUrn"),
+         "text": text,
+         "sender_name": sender_name,
+         "sender_host_urn": sender.get("hostIdentityUrn", ""),
+         "delivered_at": ts,
+     }
+
+
+ def parse_messages(raw: dict) -> list[dict]:
+     """Parse raw messages response into a list of {sender, text, timestamp} dicts."""
+     elements = raw.get("data", {}).get("messengerMessagesBySyncToken", {}).get("elements", [])
+
+     messages = []
+     for msg in elements:
+         parsed = parse_message_element(msg)
+         if not parsed:
+             continue
+         ts = parsed["delivered_at"]
+         messages.append({
+             "sender": parsed["sender_name"],
+             "text": parsed["text"],
+             "timestamp": ts.strftime("%Y-%m-%d %H:%M") if ts else "",
+         })
+
+     messages.sort(key=lambda m: m["timestamp"])
+     return messages
+
+
+ def get_conversation(session, target_urn: str, mailbox_urn: str) -> list[dict] | None:
+     """Retrieve past messages with a profile.
+
+     Args:
+         session: Browser session.
+         target_urn: Target profile URN.
+         mailbox_urn: Authenticated user's profile URN.
+
+     Returns a list of {sender, text, timestamp} dicts, or None if no conversation exists.
+     """
+     session.ensure_browser()
+     api = PlaywrightLinkedinAPI(session=session)
+
+     conversation_urn = find_conversation_urn(api, target_urn, mailbox_urn)
+     if not conversation_urn:
+         logger.debug("Not in recent conversations, trying navigation fallback")
+         conversation_urn = find_conversation_urn_via_navigation(session, target_urn)
+     if not conversation_urn:
+         logger.info("No conversation found for %s", target_urn)
+         return None
+
+     raw = fetch_messages(api, conversation_urn)
+     return parse_messages(raw)
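
The timestamp handling in `parse_message_element` and `parse_messages` can be exercised in isolation. The sketch below reproduces only the epoch-milliseconds conversion and the string sort. `format_delivered_at` is a hypothetical helper, not a function from the package, and the sample values are made up rather than real LinkedIn data.

```python
from datetime import datetime, timezone


def format_delivered_at(delivered_at_ms):
    """Convert epoch milliseconds to 'YYYY-MM-DD HH:MM' in UTC ('' if missing)."""
    if not delivered_at_ms:
        return ""
    ts = datetime.fromtimestamp(delivered_at_ms / 1000, tz=timezone.utc)
    return ts.strftime("%Y-%m-%d %H:%M")


# Made-up sample values, listed out of order on purpose.
messages = [
    {"text": "second", "timestamp": format_delivered_at(1_700_000_060_000)},
    {"text": "no timestamp", "timestamp": format_delivered_at(None)},
    {"text": "first", "timestamp": format_delivered_at(1_700_000_000_000)},
]

# Zero-padded 'YYYY-MM-DD HH:MM' strings sort chronologically as plain text.
messages.sort(key=lambda m: m["timestamp"])
order = [m["text"] for m in messages]
```

Two consequences of this design follow. Messages without a `deliveredAt` value sort first because the empty string compares lower than any date. Messages delivered within the same minute keep their original relative order, since the formatted string drops seconds and Python's sort is stable.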