learn-lock 0.1.0__tar.gz

@@ -0,0 +1,9 @@
1
+ # learn-lock environment variables
2
+ # Copy to .env and fill in your API keys
3
+ # Both are FREE and take 2 minutes to set up
4
+
5
+ # Groq (for extraction): https://console.groq.com
6
+ GROQ_API_KEY=your_groq_api_key
7
+
8
+ # Gemini (for evaluation): https://aistudio.google.com/apikey
9
+ GEMINI_API_KEY=your_gemini_api_key
@@ -0,0 +1,38 @@
1
+ # Byte-compiled
2
+ __pycache__/
3
+ *.py[cod]
4
+ *$py.class
5
+
6
+ # Virtual environments
7
+ .venv/
8
+ venv/
9
+ ENV/
10
+
11
+ # Environment variables
12
+ .env
13
+
14
+ # IDE
15
+ .idea/
16
+ .vscode/
17
+ *.swp
18
+ *.swo
19
+
20
+ # Database
21
+ *.db
22
+ *.sqlite
23
+
24
+ # Distribution
25
+ dist/
26
+ build/
27
+ *.egg-info/
28
+
29
+ # Testing
30
+ .pytest_cache/
31
+ .coverage
32
+ htmlcov/
33
+
34
+ # OS
35
+ .DS_Store
36
+ Thumbs.db
37
+
38
+ .ruff_cache/
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2025 Mitudru Dutta
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,494 @@
1
+ Metadata-Version: 2.4
2
+ Name: learn-lock
3
+ Version: 0.1.0
4
+ Summary: The app that argues with you. Adversarial Socratic learning with spaced repetition.
5
+ Project-URL: Homepage, https://github.com/MitudruDutta/learnlock
6
+ Project-URL: Repository, https://github.com/MitudruDutta/learnlock
7
+ Project-URL: Issues, https://github.com/MitudruDutta/learnlock/issues
8
+ Author-email: Mitudru Dutta <mitudrudutta72@gmail.com>
9
+ License: MIT
10
+ License-File: LICENSE
11
+ Keywords: ai,cli,education,flashcards,learning,socratic,spaced-repetition
12
+ Classifier: Development Status :: 4 - Beta
13
+ Classifier: Environment :: Console
14
+ Classifier: Intended Audience :: Developers
15
+ Classifier: Intended Audience :: Education
16
+ Classifier: License :: OSI Approved :: MIT License
17
+ Classifier: Operating System :: OS Independent
18
+ Classifier: Programming Language :: Python :: 3
19
+ Classifier: Programming Language :: Python :: 3.11
20
+ Classifier: Programming Language :: Python :: 3.12
21
+ Classifier: Programming Language :: Python :: 3.13
22
+ Classifier: Topic :: Education
23
+ Requires-Python: >=3.11
24
+ Requires-Dist: google-generativeai>=0.3.0
25
+ Requires-Dist: groq>=0.4.0
26
+ Requires-Dist: litellm>=1.0.0
27
+ Requires-Dist: pillow>=10.0.0
28
+ Requires-Dist: pymupdf>=1.23.0
29
+ Requires-Dist: python-dotenv>=1.0.0
30
+ Requires-Dist: requests>=2.28.0
31
+ Requires-Dist: rich>=13.0.0
32
+ Requires-Dist: trafilatura>=1.6.0
33
+ Requires-Dist: typer>=0.9.0
34
+ Requires-Dist: youtube-transcript-api>=0.6.0
35
+ Requires-Dist: yt-dlp>=2023.0.0
36
+ Provides-Extra: dev
37
+ Requires-Dist: build; extra == 'dev'
38
+ Requires-Dist: pytest; extra == 'dev'
39
+ Requires-Dist: ruff; extra == 'dev'
40
+ Requires-Dist: twine; extra == 'dev'
41
+ Provides-Extra: ocr
42
+ Requires-Dist: easyocr; extra == 'ocr'
43
+ Requires-Dist: pillow; extra == 'ocr'
44
+ Provides-Extra: whisper
45
+ Requires-Dist: groq; extra == 'whisper'
46
+ Requires-Dist: yt-dlp; extra == 'whisper'
47
+ Description-Content-Type: text/markdown
48
+
49
+ ```
50
+ ██╗ ███████╗ █████╗ ██████╗ ███╗ ██╗██╗ ██████╗ ██████╗██╗ ██╗
51
+ ██║ ██╔════╝██╔══██╗██╔══██╗████╗ ██║██║ ██╔═══██╗██╔════╝██║ ██╔╝
52
+ ██║ █████╗ ███████║██████╔╝██╔██╗ ██║██║ ██║ ██║██║ █████╔╝
53
+ ██║ ██╔══╝ ██╔══██║██╔══██╗██║╚██╗██║██║ ██║ ██║██║ ██╔═██╗
54
+ ███████╗███████╗██║ ██║██║ ██║██║ ╚████║███████╗╚██████╔╝╚██████╗██║ ██╗
55
+ ╚══════╝╚══════╝╚═╝ ╚═╝╚═╝ ╚═╝╚═╝ ╚═══╝╚══════╝ ╚═════╝ ╚═════╝╚═╝ ╚═╝
56
+ ```
57
+
58
+ > **The app that argues with you.**
59
+
60
+ LearnLock is a CLI learning tool that uses adversarial Socratic dialogue to expose gaps in your understanding. It doesn't quiz you — it _interrogates_ you.
61
+
62
+ ---
63
+
64
+ ## Table of Contents
65
+
66
+ - [Installation](#installation)
67
+ - [Quick Start](#quick-start)
68
+ - [How It Works](#how-it-works)
69
+ - [Architecture](#architecture)
70
+ - [The Duel Engine](#the-duel-engine)
71
+ - [Claim Pipeline](#claim-pipeline)
72
+ - [Module Reference](#module-reference)
73
+ - [Database Schema](#database-schema)
74
+ - [Configuration](#configuration)
75
+ - [CLI Commands](#cli-commands)
76
+ - [Known Limitations](#known-limitations)
77
+ - [Development](#development)
78
+ - [License](#license)
79
+
80
+ ---
81
+
82
+ ## Installation
83
+
84
+ ### From PyPI
85
+
86
+ Install using pip. Requires Python 3.11 or higher.
87
+
88
+ ### From Source
89
+
90
+ Clone the repository and install in editable mode.
91
+
92
+ ### Optional Dependencies
93
+
94
+ - `learn-lock[ocr]` — EasyOCR for handwritten answer support
95
+ - `learn-lock[whisper]` — Whisper fallback for YouTube videos without transcripts
96
+
97
+ ---
98
+
99
+ ## Quick Start
100
+
101
+ 1. Set your API keys as environment variables:
102
+ - `GROQ_API_KEY` (required) — Get free at console.groq.com
103
+ - `GEMINI_API_KEY` (recommended) — Get free at aistudio.google.com
104
+
105
+ 2. Launch the CLI by running `learnlock`
106
+
107
+ 3. Add content by pasting a YouTube URL, article link, PDF path, or GitHub repo
108
+
109
+ 4. Start studying with `/study`
110
+
111
+ 5. Press Enter twice to send your answer
112
+
113
+ ---
114
+
115
+ ## How It Works
116
+
117
+ 1. You explain a concept in your own words
118
+ 2. The engine infers what you believe
119
+ 3. It compares your belief against ground truth claims
120
+ 4. It finds contradictions and attacks the weakest point
121
+ 5. After 3 turns (or success), it reveals your belief trajectory
122
+ 6. Your score feeds into SM-2 spaced repetition scheduling
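
The loop described in the steps above can be sketched as a small state machine. This is a toy illustration, not the package's `DuelEngine`: the class name `ToyDuel`, the `find_errors` callable, and the wording of the attack prompt are all assumptions made for the example. Only the 3-turn limit and the idea of recording a trajectory come from the README.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

MAX_TURNS = 3  # the README states a duel ends after 3 turns or on success


@dataclass
class ToyDuel:
    """Hypothetical stand-in for the duel loop; not the real DuelEngine."""

    # find_errors is an assumed callable: answer text -> list of error labels.
    find_errors: Callable[[str], List[str]]
    turn: int = 0
    finished: bool = False
    trajectory: List[List[str]] = field(default_factory=list)

    def process(self, answer: str) -> Optional[str]:
        """Record one answer; return an attack prompt, or None when done."""
        if self.finished:
            raise RuntimeError("duel already finished")
        self.turn += 1
        errors = self.find_errors(answer)
        self.trajectory.append(errors)  # snapshot kept for the final reveal
        if not errors or self.turn >= MAX_TURNS:
            self.finished = True
            return None
        return f"Explain again: your answer shows '{errors[0]}'."
```

A caller would keep calling `process()` until `finished` is true, then show `trajectory` as the reveal. In the real tool the error detection is done by an LLM, which the injected callable only imitates.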
123
+
124
+ ---
125
+
126
+ ## Architecture
127
+
128
+ ```
129
+ Tools (youtube, article, pdf, github)
130
+
131
+
132
+ LLM ──▶ extract concepts & claims
133
+
134
+
135
+ Duel Engine ──▶ belief modeling, contradiction detection, interrogation
136
+
137
+ ├──▶ Scheduler (SM-2) ──▶ Storage (SQLite)
138
+
139
+ └──▶ HUD ──▶ CLI (claims, belief, attack, reveal)
140
+ ```
141
+
142
+ ### Data Flow
143
+
144
+ ```
145
+ Source (YouTube/PDF/Article/GitHub)
146
+
147
+
148
+ Content Extraction (tools/)
149
+
150
+
151
+ Concept Extraction (llm.py) ──▶ 8-12 concepts with claims
152
+
153
+
154
+ Storage (storage.py) ──▶ SQLite: sources, concepts, progress, duel_memory
155
+
156
+
157
+ Scheduler (scheduler.py) ──▶ SM-2 spaced repetition
158
+
159
+
160
+ Duel Engine (duel.py) ──▶ Adversarial Socratic interrogation
161
+
162
+
163
+ HUD (hud.py) ──▶ Live visualization of engine state
164
+ ```
165
+
166
+ ---
167
+
168
+ ## The Duel Engine
169
+
170
+ The Duel Engine is the cognitive core of LearnLock. It lives in `duel.py`.
171
+
172
+ ### Philosophy
173
+
174
+ Traditional learning apps ask: "Do you know X?"
175
+
176
+ LearnLock asks: "What do you _believe_ about X, and where is it wrong?"
177
+
178
+ ### Pipeline
179
+
180
+ 1. **Belief Modeling** — Infers what the user thinks from their response
181
+ 2. **Contradiction Detection** — Compares belief against claims, finds violations
182
+ 3. **Interrogation** — Generates attack question targeting highest-severity error
183
+ 4. **Snapshot** — Records belief state for trajectory tracking
184
+
185
+ ### Behaviors
186
+
187
+ - Vague answers trigger mechanism probes
188
+ - Wrong answers trigger claim-specific attacks
189
+ - "I don't know" triggers guiding questions (not punishment)
190
+ - Correct answers pass after verification
191
+ - Exhausting all 3 turns triggers the reveal with the full trajectory
192
+
193
+ ### Graded Harshness
194
+
195
+ - Turn 1: Forgiving — only clear violations flagged
196
+ - Turn 2: Moderate — violations plus omissions
197
+ - Turn 3: Strict — all violations surfaced
198
+
199
+ ### Error Types
200
+
201
+ - `wrong_mechanism` — Incorrect explanation of how something works
202
+ - `missing_mechanism` — Omitted critical mechanism
203
+ - `boundary_error` — Wrong about limitations or scope
204
+ - `conflation` — Confused two distinct concepts
205
+ - `superficial` — Surface-level understanding without depth
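
One way to picture graded harshness together with the error types above is a per-turn filter followed by a highest-severity pick. The sketch below is an assumption-heavy illustration: the mapping from error type to the turn at which it starts to count, the `ToyError` fields, and the integer severity scale are all invented here and may not match `BeliefError` in `duel.py`.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed grouping of the README's error types by the turn at which they
# start to count. The real engine may draw these lines differently.
FIRST_FLAGGED_ON_TURN = {
    "wrong_mechanism": 1,    # treated here as a clear violation
    "conflation": 1,
    "boundary_error": 1,
    "missing_mechanism": 2,  # omissions join on turn 2
    "superficial": 3,        # everything is surfaced on turn 3
}


@dataclass
class ToyError:
    """Hypothetical error record; the real BeliefError fields are unknown."""

    type: str
    severity: int     # assumed scale: higher means worse
    claim_index: int  # index of the claim the error refers to


def visible_errors(errors: List[ToyError], turn: int) -> List[ToyError]:
    """Keep only the errors that are allowed to count on this turn."""
    return [e for e in errors if FIRST_FLAGGED_ON_TURN.get(e.type, 3) <= turn]


def pick_target(errors: List[ToyError], turn: int) -> Optional[ToyError]:
    """Return the highest-severity visible error, or None if there is none."""
    return max(visible_errors(errors, turn), key=lambda e: e.severity, default=None)
```

With this shape, a turn-1 answer that is merely shallow yields no target, while the same answer on turn 3 would be attacked.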
206
+
207
+ ---
208
+
209
+ ## Claim Pipeline
210
+
211
+ Claims are the epistemic foundation. The duel is only as fair as the claims.
212
+
213
+ ### Three-Pass Verification
214
+
215
+ **Pass 1: Generation** — LLM generates claims with explicit instructions to produce conceptual truths, not transcript parroting. Demands falsifiable statements about WHY and HOW, not just WHAT.
216
+
217
+ **Pass 2: Garbage Filter** — Pattern matching rejects stateful claims ("is running", "must remain active"), tautologies ("processes requests", "serves requests"), and vague claims ("is useful", "is important").
218
+
219
+ **Pass 3: Sharpness Filter** — Rejects blurry truths that are technically correct but unfalsifiable ("handles security", "manages data", "deals with").
220
+
221
+ ### Claim Types
222
+
223
+ - `definition` — What the concept is
224
+ - `mechanism` — How it works internally
225
+ - `requirement` — What it needs to function
226
+ - `boundary` — What it cannot do or where it fails
227
+
228
+ ### Good vs Bad Claims
229
+
230
+ Bad claims get rejected:
231
+
232
+ - "The server processes requests" (tautology)
233
+ - "It handles security" (blurry)
234
+ - "Must be running to work" (stateful)
235
+
236
+ Good claims survive:
237
+
238
+ - "Validates request payloads against a JSON schema"
239
+ - "Enforces authentication via JWT token verification"
240
+ - "Uses Python type hints for automatic request validation"
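
The garbage and sharpness passes can be approximated with a handful of regular expressions. The sketch below uses only the example phrases quoted in this section. The function name `rejection_reason`, the grouping into one table, and the extra `be running` variant (added so the "Must be running to work" example is caught) are assumptions, not the package's actual filter.

```python
import re
from typing import Optional

# Phrases taken from the README's examples; the real pattern list is unknown.
REJECT_PATTERNS = {
    "stateful": [r"\b(?:is|be) running\b", r"\bmust remain active\b"],
    "tautology": [r"\bprocesses requests\b", r"\bserves requests\b"],
    "vague": [r"\bis useful\b", r"\bis important\b"],
    "blurry": [r"\bhandles security\b", r"\bmanages data\b", r"\bdeals with\b"],
}


def rejection_reason(claim: str) -> Optional[str]:
    """Return the first matching rejection category, or None if the claim survives."""
    text = claim.lower()
    for reason, patterns in REJECT_PATTERNS.items():
        if any(re.search(pattern, text) for pattern in patterns):
            return reason
    return None
```

Pattern matching like this is cheap and predictable, but it only catches phrasings it has been told about, which fits the limitation on claim quality described later in this README.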
241
+
242
+ ---
243
+
244
+ ## Module Reference
245
+
246
+ ### duel.py — The Engine
247
+
248
+ Core dataclasses: `Claim`, `BeliefError`, `BeliefSnapshot`, `BeliefState`
249
+
250
+ Main class `DuelEngine` provides:
251
+
252
+ - `process(user_input)` — Process response, return attack or reveal
253
+ - `get_reveal()` — Get final state with claims, errors, trajectory
254
+ - `get_claims()` — Get parsed claims
255
+ - `finished` — Boolean indicating duel completion
256
+
257
+ Helper functions:
258
+
259
+ - `create_duel()` — Factory for DuelEngine
260
+ - `belief_to_score()` — Convert final state to 1-5 score
261
+ - `export_duel_data()` — Export for research/training
262
+ - `save_duel_data()` — Persist to disk
263
+
264
+ ### hud.py — Visualization
265
+
266
+ - `set_gentle_mode()` — Toggle between brutal and gentle UI
267
+ - `render_duel_state()` — Render claims panel, belief panel, attack target
268
+ - `render_attack()` — Render interrogation panel with question
269
+ - `render_reveal()` — Render final verdict with trajectory and claim satisfaction
270
+
271
+ ### cli.py — Interface
272
+
273
+ Entry point `main()` launches the REPL.
274
+
275
+ Key commands routed through `handle_input()`:
276
+
277
+ - `cmd_study()` — Main duel session loop
278
+ - `cmd_add()` — Add content from URL
279
+ - `cmd_stats()` — Display progress statistics
280
+ - `cmd_list()` — List all concepts
281
+ - `cmd_due()` — Show due concepts
282
+
283
+ ### storage.py — Persistence
284
+
285
+ SQLite database with tables for sources, concepts, explanations, progress, and duel_memory.
286
+
287
+ Key functions:
288
+
289
+ - `add_source()` / `get_source()` — Source CRUD
290
+ - `add_concept()` / `get_concept()` — Concept CRUD
291
+ - `get_due_concepts()` — Query due items
292
+ - `save_duel_memory()` / `get_duel_memory()` — Persist last duel state per concept
293
+ - `update_progress()` — Update SM-2 scheduling data
294
+
295
+ ### scheduler.py — SM-2 Spaced Repetition
296
+
297
+ Implements SM-2 algorithm for scheduling reviews.
298
+
299
+ - `update_after_review()` — Update ease factor and interval after scoring
300
+ - `get_next_due()` — Get single next due concept
301
+ - `get_all_due()` — Get all due concepts
302
+ - `get_study_summary()` — Aggregate statistics
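
For readers unfamiliar with SM-2, here is a minimal sketch of one review update on a 1-5 score. It follows one common reading of the published SM-2 rules (ease adjusted on every review, intervals of 1 day, 6 days, then previous interval times ease, reset on scores below 3) and uses the defaults listed under Configuration. The `Progress` dataclass and `review()` function are hypothetical names; `update_after_review()` in `scheduler.py` may differ in its details.

```python
from dataclasses import dataclass

# Defaults as documented in the Configuration section of this README.
INITIAL_EASE = 2.5
INITIAL_INTERVAL = 1.0
MIN_EASE = 1.3
MAX_INTERVAL = 180.0


@dataclass
class Progress:
    """Hypothetical scheduling record, loosely mirroring the progress table."""

    ease_factor: float = INITIAL_EASE
    interval_days: float = INITIAL_INTERVAL
    review_count: int = 0


def review(p: Progress, score: int) -> Progress:
    """Return the updated schedule after a review scored from 1 to 5."""
    if not 1 <= score <= 5:
        raise ValueError("score must be between 1 and 5")
    # Classic SM-2 ease adjustment, clamped at the minimum ease.
    delta = 0.1 - (5 - score) * (0.08 + (5 - score) * 0.02)
    ease = max(MIN_EASE, p.ease_factor + delta)
    if score < 3:
        # Failed recall: restart the repetition sequence.
        return Progress(ease, INITIAL_INTERVAL, 0)
    if p.review_count == 0:
        interval = INITIAL_INTERVAL
    elif p.review_count == 1:
        interval = 6.0
    else:
        interval = p.interval_days * ease
    return Progress(ease, min(interval, MAX_INTERVAL), p.review_count + 1)
```

The clamp on the interval keeps well-known concepts from disappearing for longer than the configured maximum.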
303
+
304
+ ### llm.py — LLM Interface
305
+
306
+ Dual-provider setup: Groq for extraction, Gemini for evaluation.
307
+
308
+ - `extract_concepts()` — Extract concepts with claims from content
309
+ - `evaluate_explanation()` — Score user explanation (legacy, replaced by duel)
310
+ - `generate_title()` — Generate topic-based title from content
311
+
312
+ ### tools/ — Content Extraction
313
+
314
+ **youtube.py**
315
+
316
+ - `extract_youtube()` — Get transcript with timestamps
317
+ - `find_timestamp_for_text()` — Find timestamp for concept
318
+ - `extract_frame_at_timestamp()` — Extract and describe frame with Gemini Vision
319
+ - Whisper fallback for videos without transcripts
320
+
321
+ **article.py**
322
+
323
+ - `extract_article()` — Extract text from web articles using trafilatura
324
+
325
+ **pdf.py**
326
+
327
+ - `extract_pdf()` — Extract text from local or remote PDFs using pymupdf
328
+
329
+ **github.py**
330
+
331
+ - `extract_github()` — Extract README from GitHub repositories
332
+
333
+ ### ocr.py — Image Input
334
+
335
+ - `extract_text_from_image()` — OCR using EasyOCR or Tesseract
336
+ - `check_relevance()` — Verify extracted text relates to concept
337
+
338
+ ---
339
+
340
+ ## Database Schema
341
+
342
+ ### sources
343
+
344
+ Stores raw content from URLs. Fields: id, url, title, source_type, raw_content, segments (JSON for YouTube timestamps), created_at
345
+
346
+ ### concepts
347
+
348
+ Stores extracted concepts. Fields: id, source_id, name, source_quote (ground truth), question, skipped, created_at
349
+
350
+ ### explanations
351
+
352
+ Stores user responses and scores. Fields: id, concept_id, text, score, covered, missed, feedback, created_at
353
+
354
+ ### progress
355
+
356
+ SM-2 scheduling data. Fields: id, concept_id, ease_factor, interval_days, due_date, review_count, last_score
357
+
358
+ ### duel_memory
359
+
360
+ Persists last duel state for returning users. Fields: id, concept_id, last_belief, last_errors, last_attack, updated_at
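
The tables above could be created with Python's built-in `sqlite3` module roughly as follows. Table and field names are taken from this section; the column types, the foreign-key references, and the `init_db()` helper are assumptions, since the README does not show the actual DDL in `storage.py`.

```python
import sqlite3

# Field names come from the README; column types are guesses.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sources (
    id INTEGER PRIMARY KEY,
    url TEXT,
    title TEXT,
    source_type TEXT,
    raw_content TEXT,
    segments TEXT,          -- JSON for YouTube timestamps
    created_at TEXT
);
CREATE TABLE IF NOT EXISTS concepts (
    id INTEGER PRIMARY KEY,
    source_id INTEGER REFERENCES sources(id),
    name TEXT,
    source_quote TEXT,      -- ground truth
    question TEXT,
    skipped INTEGER DEFAULT 0,
    created_at TEXT
);
CREATE TABLE IF NOT EXISTS explanations (
    id INTEGER PRIMARY KEY,
    concept_id INTEGER REFERENCES concepts(id),
    text TEXT,
    score INTEGER,
    covered TEXT,
    missed TEXT,
    feedback TEXT,
    created_at TEXT
);
CREATE TABLE IF NOT EXISTS progress (
    id INTEGER PRIMARY KEY,
    concept_id INTEGER REFERENCES concepts(id),
    ease_factor REAL,
    interval_days REAL,
    due_date TEXT,
    review_count INTEGER,
    last_score INTEGER
);
CREATE TABLE IF NOT EXISTS duel_memory (
    id INTEGER PRIMARY KEY,
    concept_id INTEGER REFERENCES concepts(id),
    last_belief TEXT,
    last_errors TEXT,
    last_attack TEXT,
    updated_at TEXT
);
"""


def init_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database (in memory by default) and create the tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

Calling `init_db()` with no argument gives a throwaway in-memory database, which is handy for experimenting without touching the real data directory.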
361
+
362
+ ---
363
+
364
+ ## Configuration
365
+
366
+ All settings are configurable via environment variables.
367
+
368
+ ### Paths
369
+
370
+ - `LEARNLOCK_DATA_DIR` — Data directory (default: ~/.learnlock)
371
+
372
+ ### Models
373
+
374
+ - `LEARNLOCK_GROQ_MODEL` — Groq model for extraction
375
+ - `LEARNLOCK_GEMINI_MODEL` — Gemini model for evaluation and vision
376
+
377
+ ### SM-2 Parameters
378
+
379
+ - `LEARNLOCK_SM2_INITIAL_EASE` — Starting ease factor (default: 2.5)
380
+ - `LEARNLOCK_SM2_INITIAL_INTERVAL` — Starting interval in days (default: 1.0)
381
+ - `LEARNLOCK_SM2_MIN_EASE` — Minimum ease factor (default: 1.3)
382
+ - `LEARNLOCK_SM2_MAX_INTERVAL` — Maximum interval in days (default: 180)
383
+
384
+ ### Extraction
385
+
386
+ - `LEARNLOCK_MIN_CONCEPTS` — Minimum concepts per source (default: 8)
387
+ - `LEARNLOCK_MAX_CONCEPTS` — Maximum concepts per source (default: 12)
388
+ - `LEARNLOCK_CONTENT_MAX_CHARS` — Max content length for processing (default: 8000)
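
As an illustration of how these variables and defaults fit together, the sketch below reads them into a single settings object. The variable names and default values are the ones documented above; the `Settings` dataclass and `load_settings()` function are hypothetical and are not taken from `config.py`. The two model variables are left out because this README does not state their defaults.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the documented settings."""

    data_dir: Path
    sm2_initial_ease: float
    sm2_initial_interval: float
    sm2_min_ease: float
    sm2_max_interval: float
    min_concepts: int
    max_concepts: int
    content_max_chars: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from a mapping, falling back to the documented defaults."""
    return Settings(
        data_dir=Path(env.get("LEARNLOCK_DATA_DIR", "~/.learnlock")).expanduser(),
        sm2_initial_ease=float(env.get("LEARNLOCK_SM2_INITIAL_EASE", "2.5")),
        sm2_initial_interval=float(env.get("LEARNLOCK_SM2_INITIAL_INTERVAL", "1.0")),
        sm2_min_ease=float(env.get("LEARNLOCK_SM2_MIN_EASE", "1.3")),
        sm2_max_interval=float(env.get("LEARNLOCK_SM2_MAX_INTERVAL", "180")),
        min_concepts=int(env.get("LEARNLOCK_MIN_CONCEPTS", "8")),
        max_concepts=int(env.get("LEARNLOCK_MAX_CONCEPTS", "12")),
        content_max_chars=int(env.get("LEARNLOCK_CONTENT_MAX_CHARS", "8000")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise in tests, for example `load_settings({"LEARNLOCK_MAX_CONCEPTS": "20"})`.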
389
+
390
+ ---
391
+
392
+ ## CLI Commands
393
+
394
+ | Command | Description |
395
+ | ---------------- | ------------------------------------ |
396
+ | `/add <url>` | Add YouTube, article, PDF, or GitHub |
397
+ | `/study` | Start duel session |
398
+ | `/stats` | View progress statistics |
399
+ | `/list` | List all concepts |
400
+ | `/due` | Show concepts due for review |
401
+ | `/skip <name>` | Skip a concept |
402
+ | `/unskip <name>` | Restore skipped concept |
403
+ | `/config` | Show current configuration |
404
+ | `/help` | Show help |
405
+ | `/quit` | Exit |
406
+
407
+ ### Flags
408
+
409
+ - `--gentle` or `-g` — Gentle UI mode (minimal, supportive feedback)
410
+ - `--version` or `-v` — Show version
411
+
412
+ ---
413
+
414
+ ## Known Limitations
415
+
416
+ ### 1. Claim Quality (Epistemic Risk)
417
+
418
+ Claims are LLM-generated. Despite three-pass filtering, semantic drift can occur. A source saying "enforces authentication" might become "handles security" — technically related but unfalsifiable.
419
+
420
+ Mitigation: Pattern filters and sharpness checks reduce but don't eliminate this risk.
421
+
422
+ ### 2. Hallucinated Errors (Moral Risk)
423
+
424
+ The contradiction detector can invent violations. A correct answer might be flagged as "missing_mechanism" due to LLM drift, causing unfair attacks.
425
+
426
+ Mitigation: Graded harshness (forgiving on turn 1) and claim-index validation (errors must reference real claims). False flags remain possible.
427
+
428
+ ### 3. UI Density
429
+
430
+ The HUD displays claims, belief, attack target, and interrogation panel simultaneously. This is powerful for experienced users but can overwhelm beginners.
431
+
432
+ Mitigation: `--gentle` flag provides minimal UI with supportive framing.
433
+
434
+ ### 4. No Confidence Signals
435
+
436
+ Errors are binary. The engine cannot express "I might be wrong here."
437
+
438
+ Future: Multi-pass agreement, confidence scores, human-in-the-loop for high-stakes content.
439
+
440
+ ---
441
+
442
+ ## Development
443
+
444
+ ### Setup
445
+
446
+ Clone the repo and install with dev dependencies using pip editable mode with the `[dev]` extra.
447
+
448
+ ### Testing
449
+
450
+ Run pytest from the project root.
451
+
452
+ ### Linting
453
+
454
+ Run ruff check on the src directory.
455
+
456
+ ### Building
457
+
458
+ Use python -m build to create distribution packages.
459
+
460
+ ### File Structure
461
+
462
+ ```
463
+ src/learnlock/
464
+ ├── __init__.py
465
+ ├── cli.py # CLI interface and command routing
466
+ ├── config.py # Environment-based configuration
467
+ ├── duel.py # Duel Engine (core logic)
468
+ ├── hud.py # Rich-based visualization
469
+ ├── llm.py # LLM interface (Groq/Gemini)
470
+ ├── ocr.py # Image text extraction
471
+ ├── scheduler.py # SM-2 spaced repetition
472
+ ├── storage.py # SQLite persistence
473
+ └── tools/
474
+ ├── __init__.py
475
+ ├── youtube.py # YouTube extraction with timestamps
476
+ ├── article.py # Web article extraction
477
+ ├── pdf.py # PDF extraction
478
+ └── github.py # GitHub README extraction
479
+ ```
480
+
481
+ ---
482
+
483
+ ## License
484
+
485
+ MIT
486
+
487
+ ---
488
+
489
+ <p align="center">
490
+ <b>Stop consuming. Start retaining.</b>
491
+ <br><br>
492
+ LearnLock doesn't teach you.<br>
493
+ It finds out what you don't know.
494
+ </p>