flatscraper 0.1.0__tar.gz

@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 Lukas
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
@@ -0,0 +1,191 @@
+ Metadata-Version: 2.4
+ Name: flatscraper
+ Version: 0.1.0
+ Summary: Multi-platform flat search automation
+ Requires-Python: >=3.10
+ Description-Content-Type: text/markdown
+ License-File: LICENSE
+ Requires-Dist: playwright>=1.40.0
+ Requires-Dist: groq>=0.4.0
+ Requires-Dist: python-dotenv>=1.0.0
+ Requires-Dist: pydantic>=2.0.0
+ Requires-Dist: pydantic-settings>=2.0.0
+ Requires-Dist: questionary>=2.0.0
+ Requires-Dist: rich>=13.0.0
+ Provides-Extra: dev
+ Requires-Dist: pytest>=7.0.0; extra == "dev"
+ Dynamic: license-file
+
+ # FlatScraper
+
+ **Find WG rooms and apartments in hours, not weeks.** FlatScraper automates your search on WG-Gesucht: it discovers new listings, generates personalized messages with AI, and sends contact requests, so you can land multiple viewings while others are still refreshing the page.
+
+ ---
+
+ ## Why FlatScraper?
+
+ | The old way | With FlatScraper |
+ |-------------|------------------|
+ | Manually refresh WG-Gesucht | Automatically scans for new listings |
+ | Copy-paste generic messages | AI writes personalized Anschreiben for each ad |
+ | Hope your message stands out | Tailored tone: WG-friendly or landlord-professional |
+ | One viewing per week if you're lucky | **Multiple viewings in hours** |
+
+ ---
+
+ ## Demo
+
+ ### Setup wizard (one-time, ~2 minutes)
+
+ ![Setup demo](public/setup-demo.gif)
+
+ ### FlatScraper in action
+
+ ![FlatScraper demo](public/flatscraper-demo.gif)
+
+ ---
+
+ ## Features
+
+ - **Zero cost** – Runs entirely on [Groq's free tier](https://console.groq.com). No paid APIs.
+ - **Switch models** – Choose from Groq models (compound, llama, etc.) in setup or `.env`.
+ - **Personalized messages** – Your profile (age, job, personality, hobbies) shapes every Anschreiben.
+ - **Smart extraction** – Strips LLM meta-commentary; only the message is sent.
+ - **Rate limit handling** – Automatic retries with user feedback when Groq throttles.
+ - **Headless by default** – Runs in the background; use `--visible` for debugging.
+ - **Dry run** – `--no-send` to preview messages without sending.
+
+ ---
+
+ ## Quick start
+
+ ### 1. Install
+
+ ```powershell
+ # With uv (recommended)
+ uv sync
+
+ # Or with pip
+ pip install flatscraper
+ # or from source: pip install -e .
+ ```
+
+ ### 2. Install Playwright browser
+
+ ```powershell
+ playwright install chromium
+ ```
+
+ ### 3. Run setup
+
+ ```powershell
+ uv run flatscraper setup
+ # or: uv run setup
+ ```
+
+ The wizard guides you through:
+
+ 1. WG-Gesucht credentials
+ 2. [Groq API key](https://console.groq.com) (free)
+ 3. Google Drive link for your documents
+ 4. Your profile (age, city, job, personality, hobbies)
+ 5. Model selection (Groq free-tier models)
+ 6. Search URLs from WG-Gesucht
+
+ ### 4. Run
+
+ ```powershell
+ # Test run (no messages sent)
+ uv run flatscraper --no-send
+
+ # Live run
+ uv run flatscraper
+ ```
+
+ ---
+
+ ## CLI reference
+
+ | Command | Description |
+ |---------|-------------|
+ | `flatscraper` | Run once: find listings, generate messages, send |
+ | `flatscraper --no-send` | Dry run: generate messages only, don't send |
+ | `flatscraper --visible` | Show browser window (default: headless) |
+ | `flatscraper --debug` | Include all listings (ignore age filter) |
+ | `flatscraper --schedule` | Run repeatedly on an interval |
+ | `flatscraper setup` | Run the setup wizard |
+
+ ---
+
+ ## Configuration
+
+ ### Environment (`.env`)
+
+ | Variable | Required | Description |
+ |----------|----------|-------------|
+ | `FLATSCRAPER_EMAIL` | Yes | WG-Gesucht login email |
+ | `FLATSCRAPER_PASSWORD` | Yes | WG-Gesucht password |
+ | `GROQ_API_KEY` | Yes | [Groq API key](https://console.groq.com) (free tier) |
+ | `GOOGLE_DRIVE_LINK` | Yes | Google Drive folder with your documents |
+ | `GROQ_MODEL` | No | Model (default: `llama-3.1-8b-instant`) |
+ | `RUN_INTERVAL_MINUTES` | No | Schedule interval (default: `30`) |
+ | `AUTO_RUN_ENABLED` | No | Enable schedule (default: `false`) |
+
+ Copy `.env.example` to `.env` and fill in your values. **Never commit `.env` or `user_profile.json`**: they contain personal data.
+
+ ### Profile (`user_profile.json`)
+
+ Created by the setup wizard. Contains your persona (for personalized messages) and search URLs. Edit manually or run `flatscraper setup` again.
+
+ ---
+
+ ## Groq models (free tier)
+
+ You can switch models in setup or via `GROQ_MODEL` in `.env`:
+
+ | Model | Notes |
+ |-------|-------|
+ | `groq/compound` | High throughput, good for many messages |
+ | `groq/compound-mini` | Fast, good for high frequency |
+ | `llama-3.3-70b-versatile` | Best text quality |
+ | `meta-llama/llama-4-scout-17b-16e-instruct` | Good balance |
+ | `llama-3.1-8b-instant` | Default, fast |
+
+ ---
+
+ ## Tests
+
+ ```powershell
+ uv sync --extra dev
+ uv run pytest tests/ -v
+ ```
+
+ ---
+
+ ## Project structure
+
+ ```
+ flatscraper/
+ ├── run.py              # Main entry point
+ ├── config.py           # Settings + user profile
+ ├── groq_client.py      # LLM client (Groq)
+ ├── models.py           # Pydantic models
+ ├── setup_wizard.py     # Interactive setup
+ ├── .env.example        # Env template
+ └── platforms/
+     └── wggesucht/      # WG-Gesucht implementation
+ ```
+
+ ---
+
+ ## Supported platforms
+
+ | Platform | Status |
+ |------------|--------|
+ | WG-Gesucht | ✅ Ready |
+
+ ---
+
+ ## License
+
+ MIT
@@ -0,0 +1,173 @@
+ # FlatScraper
+
+ **Find WG rooms and apartments in hours, not weeks.** FlatScraper automates your search on WG-Gesucht: it discovers new listings, generates personalized messages with AI, and sends contact requests, so you can land multiple viewings while others are still refreshing the page.
+
+ ---
+
+ ## Why FlatScraper?
+
+ | The old way | With FlatScraper |
+ |-------------|------------------|
+ | Manually refresh WG-Gesucht | Automatically scans for new listings |
+ | Copy-paste generic messages | AI writes personalized Anschreiben for each ad |
+ | Hope your message stands out | Tailored tone: WG-friendly or landlord-professional |
+ | One viewing per week if you're lucky | **Multiple viewings in hours** |
+
+ ---
+
+ ## Demo
+
+ ### Setup wizard (one-time, ~2 minutes)
+
+ ![Setup demo](public/setup-demo.gif)
+
+ ### FlatScraper in action
+
+ ![FlatScraper demo](public/flatscraper-demo.gif)
+
+ ---
+
+ ## Features
+
+ - **Zero cost** – Runs entirely on [Groq's free tier](https://console.groq.com). No paid APIs.
+ - **Switch models** – Choose from Groq models (compound, llama, etc.) in setup or `.env`.
+ - **Personalized messages** – Your profile (age, job, personality, hobbies) shapes every Anschreiben.
+ - **Smart extraction** – Strips LLM meta-commentary; only the message is sent.
+ - **Rate limit handling** – Automatic retries with user feedback when Groq throttles.
+ - **Headless by default** – Runs in the background; use `--visible` for debugging.
+ - **Dry run** – `--no-send` to preview messages without sending.
+
+ ---
+
+ ## Quick start
+
+ ### 1. Install
+
+ ```powershell
+ # With uv (recommended)
+ uv sync
+
+ # Or with pip
+ pip install flatscraper
+ # or from source: pip install -e .
+ ```
+
+ ### 2. Install Playwright browser
+
+ ```powershell
+ playwright install chromium
+ ```
+
+ ### 3. Run setup
+
+ ```powershell
+ uv run flatscraper setup
+ # or: uv run setup
+ ```
+
+ The wizard guides you through:
+
+ 1. WG-Gesucht credentials
+ 2. [Groq API key](https://console.groq.com) (free)
+ 3. Google Drive link for your documents
+ 4. Your profile (age, city, job, personality, hobbies)
+ 5. Model selection (Groq free-tier models)
+ 6. Search URLs from WG-Gesucht
+
+ ### 4. Run
+
+ ```powershell
+ # Test run (no messages sent)
+ uv run flatscraper --no-send
+
+ # Live run
+ uv run flatscraper
+ ```
+
+ ---
+
+ ## CLI reference
+
+ | Command | Description |
+ |---------|-------------|
+ | `flatscraper` | Run once: find listings, generate messages, send |
+ | `flatscraper --no-send` | Dry run: generate messages only, don't send |
+ | `flatscraper --visible` | Show browser window (default: headless) |
+ | `flatscraper --debug` | Include all listings (ignore age filter) |
+ | `flatscraper --schedule` | Run repeatedly on an interval |
+ | `flatscraper setup` | Run the setup wizard |
+
+ ---
+
+ ## Configuration
+
+ ### Environment (`.env`)
+
+ | Variable | Required | Description |
+ |----------|----------|-------------|
+ | `FLATSCRAPER_EMAIL` | Yes | WG-Gesucht login email |
+ | `FLATSCRAPER_PASSWORD` | Yes | WG-Gesucht password |
+ | `GROQ_API_KEY` | Yes | [Groq API key](https://console.groq.com) (free tier) |
+ | `GOOGLE_DRIVE_LINK` | Yes | Google Drive folder with your documents |
+ | `GROQ_MODEL` | No | Model (default: `llama-3.1-8b-instant`) |
+ | `RUN_INTERVAL_MINUTES` | No | Schedule interval (default: `30`) |
+ | `AUTO_RUN_ENABLED` | No | Enable schedule (default: `false`) |
+
+ Copy `.env.example` to `.env` and fill in your values. **Never commit `.env` or `user_profile.json`**: they contain personal data.
+
+ ### Profile (`user_profile.json`)
+
+ Created by the setup wizard. Contains your persona (for personalized messages) and search URLs. Edit manually or run `flatscraper setup` again.
+
+ ---
+
+ ## Groq models (free tier)
+
+ You can switch models in setup or via `GROQ_MODEL` in `.env`:
+
+ | Model | Notes |
+ |-------|-------|
+ | `groq/compound` | High throughput, good for many messages |
+ | `groq/compound-mini` | Fast, good for high frequency |
+ | `llama-3.3-70b-versatile` | Best text quality |
+ | `meta-llama/llama-4-scout-17b-16e-instruct` | Good balance |
+ | `llama-3.1-8b-instant` | Default, fast |
+
+ ---
+
+ ## Tests
+
+ ```powershell
+ uv sync --extra dev
+ uv run pytest tests/ -v
+ ```
+
+ ---
+
+ ## Project structure
+
+ ```
+ flatscraper/
+ ├── run.py              # Main entry point
+ ├── config.py           # Settings + user profile
+ ├── groq_client.py      # LLM client (Groq)
+ ├── models.py           # Pydantic models
+ ├── setup_wizard.py     # Interactive setup
+ ├── .env.example        # Env template
+ └── platforms/
+     └── wggesucht/      # WG-Gesucht implementation
+ ```
+
+ ---
+
+ ## Supported platforms
+
+ | Platform | Status |
+ |------------|--------|
+ | WG-Gesucht | ✅ Ready |
+
+ ---
+
+ ## License
+
+ MIT
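The README's "Rate limit handling" feature is not backed by any code shown in this diff, so the following is only a minimal sketch of how automatic retries with user feedback could work. `RateLimited`, `call_with_retries`, and the delay values are hypothetical names and numbers chosen for illustration; they are not the package's actual implementation.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for the error an LLM client raises when throttled."""


def call_with_retries(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimited with exponentially growing waits."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RateLimited:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            delay = base_delay * 2 ** (attempt - 1)
            # User feedback, so a long pause does not look like a hang.
            print(f"Rate limited, retrying in {delay:.0f}s (attempt {attempt}/{max_attempts})")
            sleep(delay)
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable so the waiting behaviour can be exercised without real delays.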
@@ -0,0 +1,165 @@
+ """Shared configuration for FlatScraper. Type-safe loading from env and user_profile.json."""
+
+ import json
+ from pathlib import Path
+
+ from pydantic import Field
+ from pydantic_settings import BaseSettings, SettingsConfigDict
+
+ from models import UserProfile
+
+ PROJECT_ROOT = Path(__file__).parent
+ PROFILE_PATH = PROJECT_ROOT / "user_profile.json"
+
+ # Default persona block when no profile exists (placeholder – run "flatscraper setup" to configure)
+ _DEFAULT_PERSONA_BLOCK = """DEINE PERSONA:
+ - Alter: 28 Jahre
+ - Herkunft: Berlin (zieht in Zielstadt)
+ - Beruf: Software Engineer
+ - Einzugstermin: Flexibel, gerne ab nächstem Monat
+ - Persönlichkeit: Ruhig, ordentlich, gesellig
+ - Hobbys: Lesen, Sport, Kochen
+ - Dokumente: Alle Unterlagen im Google Drive Link (bitte in .env konfigurieren)"""
+
+
+ class AppSettings(BaseSettings):
+     """Environment-based settings (from .env or os.environ)."""
+
+     model_config = SettingsConfigDict(
+         env_file=str(PROJECT_ROOT / ".env"),
+         env_file_encoding="utf-8",
+         extra="ignore",
+         populate_by_name=True,
+     )
+
+     email: str = Field(default="", validation_alias="FLATSCRAPER_EMAIL")
+     password: str = Field(default="", validation_alias="FLATSCRAPER_PASSWORD")
+     groq_api_key: str = Field(default="", validation_alias="GROQ_API_KEY")
+     groq_model: str = Field(default="llama-3.1-8b-instant", validation_alias="GROQ_MODEL")
+     google_drive_link: str = Field(default="", validation_alias="GOOGLE_DRIVE_LINK")
+     run_interval_minutes: int = Field(default=30, validation_alias="RUN_INTERVAL_MINUTES")
+     auto_run_enabled: bool = Field(
+         default=False,
+         validation_alias="AUTO_RUN_ENABLED",
+     )
+
+
+ _user_profile: UserProfile | None = None
+
+
+ def _load_user_profile() -> UserProfile | None:
+     global _user_profile
+     if _user_profile is not None:
+         return _user_profile
+     if PROFILE_PATH.exists():
+         try:
+             data = json.loads(PROFILE_PATH.read_text(encoding="utf-8"))
+             _user_profile = UserProfile.model_validate(data)
+             return _user_profile
+         except Exception:
+             pass  # unreadable or invalid profile: callers fall back to defaults
+     return None
+
+
+ # --- Public API ---
+
+ def get_settings() -> AppSettings:
+     return AppSettings()
+
+
+ def get_search_urls() -> list[str]:
+     profile = _load_user_profile()
+     if profile and profile.search_urls:
+         return profile.search_urls
+     return []
+
+
+ def get_persona_name() -> str:
+     profile = _load_user_profile()
+     if profile and profile.persona_name:
+         return profile.persona_name
+     return "Nutzer"
+
+
+ def get_persona_block() -> str:
+     profile = _load_user_profile()
+     if profile and profile.persona_block:
+         return profile.persona_block
+     return _DEFAULT_PERSONA_BLOCK
+
+
+ # Backward-compatible module-level exports (loaded at import)
+ _settings_instance = AppSettings()
+
+ EMAIL = _settings_instance.email
+ PASSWORD = _settings_instance.password
+ GROQ_API_KEY = _settings_instance.groq_api_key
+ GROQ_MODEL = _settings_instance.groq_model
+ GOOGLE_DRIVE_LINK = _settings_instance.google_drive_link
+ RUN_INTERVAL_MINUTES = _settings_instance.run_interval_minutes
+ AUTO_RUN_ENABLED = _settings_instance.auto_run_enabled
+
+ # AUTO_RUN_ENABLED needs no manual parsing: pydantic-settings coerces
+ # "true"/"false" strings from the environment to bool.
+
+ # LLM prompts - built from profile
+ _persona_block = get_persona_block()
+ _persona_name = get_persona_name()
+
+ LLM_SYSTEM_PROMPT = f"""Du bist ein charmanter, professioneller Assistent, der dabei hilft, ein WG-Zimmer ODER eine Wohnung zu finden. Deine Aufgabe ist es, basierend auf einer Wohnungsanzeige ein kurzes, sympathisches und persönliches Anschreiben auf Deutsch zu verfassen. Das Anschreiben passt sich dem Anzeigentyp an (WG-Zimmer vs. Wohnung).
+
+ {_persona_block}"""
+
+ # ---------------------------------------------------------------------------
+ # AD-TYPE INSTRUCTIONS
+ # ---------------------------------------------------------------------------
+
+ LLM_AD_TYPE_INSTRUCTIONS_WG = """\
+ ANZEIGENTYP: WG-Zimmer
+ - Ton: locker, herzlich, "passen wir zusammen?"-Gefühl.
+ - Fokus: gemeinsames WG-Leben, Interessen der Mitbewohner, Atmosphäre.
+ - Anrede: "Hallo [Name]," oder "Hi [Name]," – nie formell."""
+
+ LLM_AD_TYPE_INSTRUCTIONS_WOHNUNG = """\
+ ANZEIGENTYP: Wohnung (privat oder professionell)
+ - Ton: freundlich-professionell, seriös, verbindlich.
+ - Fokus: stabiles Einkommen, Zuverlässigkeit, Einzugstermin, vollständige Unterlagen.
+ - Anrede: "Hallo [Name]," oder "Sehr geehrte/r [Name]," – je nach Tonalität der Anzeige."""
+
+ # ---------------------------------------------------------------------------
+ # MESSAGE PROMPT
+ # ---------------------------------------------------------------------------
+
+ LLM_MESSAGE_PROMPT_TEMPLATE = """\
+ ANZEIGENDATEN (Typ: {ad_type_label}):
+
+ Titel: {title}
+ Adresse: {address}
+ Anbieter: {publisher_name}
+ Beschreibung:
+ \"\"\"
+ {description}
+ \"\"\"
+
+ ANZEIGENTYP-ANWEISUNGEN:
+ {ad_type_instructions}
+
+ ANREDE-REGELN (Priorität 1 → 3):
+ 1. Name unter "Anbieter" vorhanden → nutze ihn: "Hallo Marco," / "Hi Lisa,"
+ 2. Namen im Beschreibungstext ("Wir sind Jonas und Lisa") → nutze den ersten Namen.
+ 3. Kein Name bekannt → WG: "Hallo liebe WG," | Wohnung: "Hallo,"
+
+ AUFGABE:
+ Schreibe das Anschreiben für {persona_name} nach dieser Struktur:
+
+ 1. Persönliche Anrede (siehe Anrede-Regeln – niemals generisch wenn ein Name bekannt ist).
+ 2. Kurzer Icebreaker: ein konkretes Detail aus der Anzeige aufgreifen.
+ 3. Kurzvorstellung: Beruf, Herkunft, Einzugstermin.
+ 4. Call to Action: Verweis auf den Google Drive Link ({google_drive_link}) für alle Unterlagen + Freude auf Besichtigung.
+
+ STIL: persönlich, lebendig, kurz – kein Werbejargon, kein Fülltext.
+ LÄNGE: unter 150 Wörter.
+ FORMAT: Fließtext, kein Markdown, keine Signatur, kein Betreff.
+
+ WICHTIG: Antworte AUSSCHLIESSLICH mit dem fertigen Nachrichtentext. Keine Erklärungen, keine Gedanken, keine Meta-Kommentare (z.B. "Kurzfassung meiner Überlegungen"), keine Anleitung – nur die Nachricht selbst.\
+ """
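A minimal sketch of how a template like `LLM_MESSAGE_PROMPT_TEMPLATE` could be filled with `str.format`, choosing the ad-type instructions by listing type. The template and instruction strings below are shortened stand-ins, and `build_message_prompt` with its `is_wg` parameter is hypothetical; the package's real call site is not part of this diff.

```python
# Shortened stand-ins for the constants defined in config.py (assumption).
INSTRUCTIONS_WG = "ANZEIGENTYP: WG-Zimmer\n- Ton: locker, herzlich."
INSTRUCTIONS_WOHNUNG = "ANZEIGENTYP: Wohnung\n- Ton: freundlich-professionell."

TEMPLATE = """\
ANZEIGENDATEN (Typ: {ad_type_label}):

Titel: {title}
Anbieter: {publisher_name}
Beschreibung:
{description}

ANZEIGENTYP-ANWEISUNGEN:
{ad_type_instructions}

Schreibe das Anschreiben für {persona_name}. Unterlagen: {google_drive_link}"""


def build_message_prompt(
    title: str,
    publisher_name: str,
    description: str,
    is_wg: bool,
    persona_name: str,
    google_drive_link: str,
) -> str:
    """Fill the template; hypothetical helper, not taken from the package."""
    return TEMPLATE.format(
        ad_type_label="WG-Zimmer" if is_wg else "Wohnung",
        ad_type_instructions=INSTRUCTIONS_WG if is_wg else INSTRUCTIONS_WOHNUNG,
        title=title,
        publisher_name=publisher_name,
        description=description,
        persona_name=persona_name,
        google_drive_link=google_drive_link,
    )


prompt = build_message_prompt(
    title="Helles Zimmer in 3er-WG",
    publisher_name="Lisa",
    description="Wir sind Jonas und Lisa {und suchen dich}.",
    is_wg=True,
    persona_name="Nutzer",
    google_drive_link="https://example.com/docs",
)
print(prompt)
```

Because `str.format` does not re-parse substituted values, braces inside a scraped listing description pass through unchanged instead of raising a `KeyError`.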