knowledgelm 4.0.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (30)
  1. knowledgelm-4.0.0/.agent/README.md +16 -0
  2. knowledgelm-4.0.0/.agent/skills/knowledgelm-nse/SKILL.md +131 -0
  3. knowledgelm-4.0.0/.agent/skills/knowledgelm-nse/references/notebooklm_audio_prompt.md +103 -0
  4. knowledgelm-4.0.0/.context/ARCHITECTURE.md +157 -0
  5. knowledgelm-4.0.0/.context/CHANGELOG.md +99 -0
  6. knowledgelm-4.0.0/.context/CONVENTIONS.md +74 -0
  7. knowledgelm-4.0.0/.context/DESIGN.md +75 -0
  8. knowledgelm-4.0.0/.context/OVERVIEW.md +49 -0
  9. knowledgelm-4.0.0/.context/README.md +22 -0
  10. knowledgelm-4.0.0/.coverage +0 -0
  11. knowledgelm-4.0.0/.gitignore +40 -0
  12. knowledgelm-4.0.0/LICENSE +658 -0
  13. knowledgelm-4.0.0/PKG-INFO +115 -0
  14. knowledgelm-4.0.0/PUBLISHING.md +98 -0
  15. knowledgelm-4.0.0/README.md +80 -0
  16. knowledgelm-4.0.0/pyproject.toml +86 -0
  17. knowledgelm-4.0.0/src/knowledgelm/__init__.py +5 -0
  18. knowledgelm-4.0.0/src/knowledgelm/app.py +317 -0
  19. knowledgelm-4.0.0/src/knowledgelm/cli.py +269 -0
  20. knowledgelm-4.0.0/src/knowledgelm/config.py +53 -0
  21. knowledgelm-4.0.0/src/knowledgelm/core/__init__.py +0 -0
  22. knowledgelm-4.0.0/src/knowledgelm/core/service.py +211 -0
  23. knowledgelm-4.0.0/src/knowledgelm/data/__init__.py +0 -0
  24. knowledgelm-4.0.0/src/knowledgelm/data/nse_adapter.py +81 -0
  25. knowledgelm-4.0.0/src/knowledgelm/data/screener_adapter.py +256 -0
  26. knowledgelm-4.0.0/src/knowledgelm/utils/__init__.py +0 -0
  27. knowledgelm-4.0.0/src/knowledgelm/utils/file_utils.py +47 -0
  28. knowledgelm-4.0.0/src/knowledgelm/utils/log_utils.py +41 -0
  29. knowledgelm-4.0.0/tests/test_placeholder.py +6 -0
  30. knowledgelm-4.0.0/uv.lock +1560 -0
@@ -0,0 +1,16 @@
1
+ # Agent Resources
2
+
3
+ This directory contains resources for AI agents working with KnowledgeLM.
4
+
5
+ ## Structure
6
+
7
+ - **skills/** - Claude Code skill definitions for KnowledgeLM functionality
8
+   - `knowledgelm-nse/` - Main skill for downloading NSE company filings
9
+     - `SKILL.md` - Skill definition
10
+     - `references/` - Reference documentation loaded as needed
11
+       - `notebooklm_audio_prompt.md` - Template for generating fundamental analysis audio overviews
12
+     - _(Future: scripts/, assets/ for additional bundled resources)_
13
+
14
+ ## Future Extensions
15
+
16
+ This directory is designed to support additional agent protocol resources as the project evolves.
@@ -0,0 +1,131 @@
1
+ ---
2
+ name: knowledgelm-nse
3
+ description: >
4
+ Batch download Indian company filings (transcripts, investor presentations,
5
+ credit ratings, annual reports) from NSE and optionally add to NotebookLM.
6
+ Use when user asks to: (1) Download investor materials for Indian publicly
7
+ listed companies, (2) Research Indian stocks/companies, (3) Create research
8
+ notebooks with company filings, or (4) Analyze NSE-listed company documents.
9
+ ---
10
+
11
+ # KnowledgeLM NSE
12
+
13
+ Batch download Indian company filings from NSE and optionally integrate with NotebookLM.
14
+
15
+ ## Installation
16
+
17
+ Check if installed: `knowledgelm --version`
18
+
19
+ If not installed: `uv tool install knowledgelm`
20
+
21
+ To upgrade: `uv tool upgrade knowledgelm`
22
+
23
+ ## Skill Upgrade
24
+
25
+ To upgrade this skill to the latest version:
26
+
27
+ 1. Download the latest skill directory from GitHub:
28
+ ```
29
+ https://github.com/eggmasonvalue/KnowledgeLM/tree/main/.agent/skills/knowledgelm-nse
30
+ ```
31
+
32
+ 2. Download all files in the `knowledgelm-nse/` directory:
33
+ - `SKILL.md`
34
+ - `references/notebooklm_audio_prompt.md`
35
+ - (Any future bundled resources)
36
+
37
+ 3. Replace the existing skill directory in your skills directory with the downloaded version.
38
+
39
+ **Note:** The skill directory location depends on your AI agent vendor (e.g., `~/.claude/skills/knowledgelm-nse/`).
40
+
41
+ ## Command Discovery
42
+
43
+ **Use `--help` extensively to discover current options and flags.**
44
+
45
+ ```bash
46
+ knowledgelm --help
47
+ knowledgelm download --help
48
+ knowledgelm list-files --help
49
+ ```
50
+
51
+ ## Core Workflow
52
+
53
+ ### 1. Gather Required Information
54
+
55
+ **NSE Symbol:** If not provided, use `web_search` to find it.
56
+
57
+ **Date Range:** If not provided, ask for clarification. Accept various formats:
58
+ - Explicit: `"2023-01-01 to 2025-01-26"`, `"2023 to 2025"`, `"from 2023"`
59
+ - Relative: `"last 2 years"`
60
+ - Milestones: `"Since IPO"`, `"since <event>"` (use `web_search` to resolve dates)
61
+
62
+ Convert to `YYYY-MM-DD` for CLI.
63
+
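The conversion to `YYYY-MM-DD` can be sketched in Python as follows. This is a minimal illustration, not part of the skill or the CLI: `resolve_date_range` is a hypothetical helper, expanding a bare year to 1 January or 31 December (clamped to today) is an assumption, and milestone phrases such as "Since IPO" are left unhandled because they need a web search first.

```python
import re
from datetime import date


def resolve_date_range(text: str, today: date) -> tuple[str, str]:
    """Map a loose date-range phrase to (from, to) strings in YYYY-MM-DD form.

    Hypothetical helper for illustration; the accepted phrases mirror the
    examples in the skill, and the year-expansion rules are assumptions.
    """
    text = text.strip().lower()
    iso = r"\d{4}-\d{2}-\d{2}"

    # Explicit ISO dates: "2023-01-01 to 2025-01-26"
    m = re.fullmatch(rf"({iso}) to ({iso})", text)
    if m:
        return m.group(1), m.group(2)

    # Year span: "2023 to 2025" -> 1 Jan of the first year, 31 Dec of the
    # last year, never later than today.
    m = re.fullmatch(r"(\d{4}) to (\d{4})", text)
    if m:
        end = min(date(int(m.group(2)), 12, 31), today)
        return f"{m.group(1)}-01-01", end.isoformat()

    # Open-ended: "from 2023" -> start of that year until today
    m = re.fullmatch(r"from (\d{4})", text)
    if m:
        return f"{m.group(1)}-01-01", today.isoformat()

    # Relative: "last 2 years" -> same calendar day N years back
    m = re.fullmatch(r"last (\d+) years?", text)
    if m:
        years = int(m.group(1))
        try:
            start = today.replace(year=today.year - years)
        except ValueError:  # 29 Feb mapped onto a non-leap year
            start = today.replace(year=today.year - years, day=28)
        return start.isoformat(), today.isoformat()

    # Milestones ("since IPO") must be resolved to a date before calling this.
    raise ValueError(f"Unrecognized date range: {text!r}")
```

Passing `today` explicitly keeps the helper deterministic, which makes the relative forms easy to check.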
64
+ **Categories:** Default to all categories if not specified. Use `--annual-reports-all` by default.
65
+
66
+ ### 2. Download Filings
67
+
68
+ Use `knowledgelm download` with appropriate flags. Files save to `./{SYMBOL}_knowledgeLM/`.
69
+
70
+ ### 3. List Files (if needed)
71
+
72
+ Use `knowledgelm list-files` with `--json` flag to get file paths (excludes `.pkl` cookies).
73
+
74
+ ## NotebookLM Integration
75
+
76
+ If user wants to create a NotebookLM notebook:
77
+
78
+ ### 1. Ensure Latest Package Version
79
+
80
+ Check if installed and upgrade to latest:
81
+
82
+ ```bash
83
+ notebooklm --version
84
+ ```
85
+
86
+ If not installed:
87
+ ```bash
88
+ uv tool install notebooklm-py
89
+ ```
90
+
91
+ If installed, upgrade to latest:
92
+ ```bash
93
+ uv tool upgrade notebooklm-py
94
+ ```
95
+
96
+ **Browser extras (for first-time setup):** If user hasn't authenticated with NotebookLM before, they need browser login support:
97
+
98
+ ```bash
99
+ uv tool install --reinstall "notebooklm-py[browser]"
100
+ playwright install chromium
101
+ ```
102
+
103
+ ### 2. Update Skill to Latest Version
104
+
105
+ **Always run** `notebooklm skill install` to ensure the skill is current:
106
+
107
+ ```bash
108
+ notebooklm skill install
109
+ ```
110
+
111
+ This installs/updates to the default directory (typically `~/.claude/skills/notebooklm/`).
112
+
113
+ **Important:** Do NOT delete this directory - it's used for version tracking.
114
+
115
+ ### 3. Copy to Your Skills Directory (if different)
116
+
117
+ If your AI agent uses a different skills directory, copy the installed skill there. The install directory path is shown in the `skill install` output.
118
+
119
+ ### 4. Create Notebook
120
+
121
+ Use the notebooklm skill to create a notebook and add the downloaded files as sources (exclude `.pkl` files).
122
+
123
+ **Optional - Audio Overview Generation:**
124
+
125
+ For generating audio overviews focused on fundamental analysis, use the prompt template at `references/notebooklm_audio_prompt.md` as a system prompt. This provides structured guidance for creating investor-focused audio summaries.
126
+
127
+ ## Exception Handling
128
+
129
+ - **Invalid symbol:** CLI returns `"success": false` in JSON
130
+ - **Network issues:** Retry once after 5 seconds
131
+ - **Incomplete data:** May indicate a company newly listed on the NSE mainboard or a corporate action. Use `web_search` to verify.
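The first two exception-handling rules could look roughly like this in Python. It is a sketch under stated assumptions: `run_with_one_retry` and `is_success` are hypothetical names, `action` stands for any callable that invokes the CLI and raises `ConnectionError` or `TimeoutError` on a network failure, and the assumption that a successful run reports `"success": true` goes beyond what the skill states (it only documents `"success": false`).

```python
import json
import time
from typing import Callable


def is_success(cli_output: str) -> bool:
    """Return True only if the JSON payload carries "success": true.

    Assumption: successful runs report "success": true. The skill itself
    only documents "success": false for an invalid symbol.
    """
    try:
        payload = json.loads(cli_output)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("success") is True


def run_with_one_retry(
    action: Callable[[], str],
    delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Run `action`; on a network-style error wait `delay` seconds, retry once."""
    try:
        return action()
    except (ConnectionError, TimeoutError):
        sleep(delay)
        return action()  # a second failure propagates to the caller
```

Injecting `sleep` lets the retry be exercised without actually waiting five seconds.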
@@ -0,0 +1,103 @@
1
+ # NotebookLM Audio Overview Prompt Template
2
+
3
+ Use this as a system prompt when generating audio overviews for fundamental analysis of Indian publicly listed companies.
4
+
5
+ ## Core Instructions
6
+
7
+ You are creating an audio overview for investors conducting fundamental analysis of an Indian publicly listed company. Your goal is to synthesize insights from earnings call transcripts, investor presentations, annual reports, credit rating reports and other investor material into a coherent, actionable narrative.
8
+
9
+ ## Structure
10
+
11
+ ### 1. Company Overview
12
+ - Business model and core operations
13
+ - Market position and competitive advantages
14
+ - Key products/services and revenue streams
15
+ - Geographic presence and market share
16
+
17
+ ### 2. Financial Performance Analysis
18
+ - Revenue growth trends and drivers
19
+ - Profitability metrics (margins, ROE, ROCE)
20
+ - Cash flow generation and quality of earnings
21
+ - Balance sheet strength (debt levels, working capital)
22
+ - Key financial ratios and their trends
23
+
24
+ ### 3. Strategic Initiatives & Growth Drivers
25
+ - Management's stated strategic priorities
26
+ - Capital allocation plans (capex, acquisitions, dividends)
27
+ - New product launches or market expansions
28
+ - Digital transformation or operational improvements
29
+ - Sustainability and ESG initiatives
30
+
31
+ ### 4. Risk Assessment
32
+ - Industry-specific risks and headwinds
33
+ - Regulatory or policy risks
34
+ - Competitive threats
35
+ - Operational or execution risks
36
+ - Credit rating insights and concerns
37
+ - Management's risk mitigation strategies
38
+
39
+ ### 5. Management Quality & Governance
40
+ - Track record of execution against guidance
41
+ - Capital allocation discipline
42
+ - Transparency in communication
43
+ - Corporate governance practices
44
+ - Related party transactions (if material)
45
+
46
+ ### 6. Valuation Context
47
+ - Current valuation multiples vs historical averages
48
+ - Peer comparison (if data available)
49
+ - Management's commentary on valuation
50
+ - Return ratios relative to cost of capital
51
+
52
+ ### 7. Investment Thesis Summary
53
+ - Bull case: Key reasons to be optimistic
54
+ - Bear case: Key concerns and red flags
55
+ - Critical questions for further research
56
+
57
+ ## Tone & Style
58
+
59
+ - **Conversational but analytical**: Explain complex financial concepts clearly
60
+ - **Balanced perspective**: Present both positives and concerns objectively
61
+ - **Data-driven**: Reference specific numbers, percentages, and trends
62
+ - **Forward-looking**: Focus on future prospects, not just historical performance
63
+ - **Investor-centric**: Frame insights around investment decision-making
64
+
65
+ ## Key Principles
66
+
67
+ 1. **Synthesize, don't summarize**: Connect insights across documents to identify patterns
68
+ 2. **Highlight changes**: Emphasize quarter-over-quarter or year-over-year changes in strategy, guidance, or performance
69
+ 3. **Flag inconsistencies**: Note any contradictions between management commentary and actual results
70
+ 4. **Context matters**: Compare metrics to industry benchmarks and company's own historical performance
71
+ 5. **Management credibility**: Assess whether management delivers on commitments
72
+ 6. **Quality over quantity**: Focus on material insights, not exhaustive detail
73
+
74
+ ## Red Flags to Watch For
75
+
76
+ - Deteriorating working capital or cash conversion
77
+ - Frequent changes in accounting policies
78
+ - Aggressive revenue recognition practices
79
+ - Rising debt without corresponding EBITDA growth
80
+ - Declining margins despite revenue growth
81
+ - Management guidance misses becoming a pattern
82
+ - High related party transactions
83
+ - Frequent management turnover
84
+ - Credit rating downgrades or negative outlooks
85
+
86
+ ## Green Flags to Highlight
87
+
88
+ - Consistent execution against stated strategy
89
+ - Improving return ratios (ROE, ROCE)
90
+ - Strong free cash flow generation
91
+ - Market share gains in core segments
92
+ - Successful new product/service launches
93
+ - Debt reduction or deleveraging
94
+ - Credit rating upgrades or stable outlooks
95
+ - Transparent and proactive communication
96
+ - Prudent capital allocation
97
+
98
+ ## Customization Notes
99
+
100
+ When using this template:
101
+ - Emphasize sections most relevant to the specific company/industry
102
+ - Incorporate sector-specific metrics (e.g., AUM for AMCs, NIM for banks, same-store sales for retail)
103
+ - Reference specific management quotes when particularly insightful or concerning
@@ -0,0 +1,157 @@
1
+ # Architecture
2
+
3
+ ## Module Structure
4
+
5
+ ```mermaid
6
+ graph TD
7
+ subgraph Entry["Entry Points"]
8
+ CLI[src/knowledgelm/cli.py<br/>Click CLI]
9
+ APP[src/knowledgelm/app.py<br/>Streamlit UI]
10
+ end
11
+
12
+ subgraph Logic["Core Logic"]
13
+ SRV[src/knowledgelm/core/service.py]
14
+ end
15
+
16
+ subgraph Data["Data Layer"]
17
+ NSE_ADPT[src/knowledgelm/data/nse_adapter.py]
18
+ SCR_ADPT[src/knowledgelm/data/screener_adapter.py]
19
+ end
20
+
21
+ subgraph Agent["Agent Resources"]
22
+ SKILL[.agent/skills/knowledgelm-nse/SKILL.md<br/>Agent Skill]
23
+ end
24
+
25
+ subgraph Utils["Utilities"]
26
+ CONF[src/knowledgelm/config.py]
27
+ F_UTIL[src/knowledgelm/utils/file_utils.py]
28
+ end
29
+
30
+ subgraph External["External Sources"]
31
+ NSE_LIB[NSE API<br/>nse library]
32
+ SCR_WEB[screener.in<br/>Credit Ratings]
33
+ NLMPY[notebooklm-py<br/>Optional Integration]
34
+ end
35
+
36
+ CLI --> SRV
37
+ APP --> SRV
38
+ SRV --> NSE_ADPT
39
+ SRV --> SCR_ADPT
40
+ SRV --> F_UTIL
41
+ NSE_ADPT --> NSE_LIB
42
+ SCR_ADPT --> SCR_WEB
43
+ CLI .-> SKILL
44
+ SKILL .-> NLMPY
45
+ APP ..-> CONF
46
+ SRV ..-> CONF
47
+ ```
48
+
49
+ ## Project Structure
50
+
51
+ ```
52
+ KnowledgeLM/
53
+ ├── src/
54
+ │   └── knowledgelm/
55
+ │       ├── __init__.py
56
+ │       ├── cli.py  # Click CLI (v3.0)
57
+ │       ├── app.py  # Streamlit UI
58
+ │       ├── config.py  # Configuration
59
+ │       ├── core/
60
+ │       │   └── service.py  # Orchestration Logic
61
+ │       ├── data/
62
+ │       │   ├── nse_adapter.py  # NSE Library Wrapper
63
+ │       │   └── screener_adapter.py  # Screener Scraper
64
+ │       └── utils/
65
+ │           └── file_utils.py  # Sanitization & paths
66
+ ├── tests/
67
+ │   └── test_placeholder.py
68
+ ├── .agent/
69
+ │   └── skills/
70
+ │       └── knowledgelm-nse/
71
+ │           └── SKILL.md  # Agent Skill (v3.0)
72
+ ├── .context/
73
+ ├── pyproject.toml  # uv config
74
+ └── README.md
75
+ ```
76
+
77
+ ## Component Responsibilities
78
+
79
+ ### cli.py
80
+ - **CLI**: Click-based command interface (`download`, `list-categories`, `list-files`)
81
+ - **JSON Output**: `--json` flag for agent parsing
82
+ - **Help Discovery**: `--help` on all commands for agent self-discovery
83
+
84
+ ### app.py
85
+ - **UI**: Streamlit forms for symbol, dates, category selection.
86
+ - **Display**: Renders status and download tables.
87
+ - **Validations**: Delegated to `KnowledgeService` rather than performed in the UI.
88
+
89
+ ### core/service.py
90
+ - **`KnowledgeService`**: Orchestrates fetching and downloading.
91
+ - **Filters**: Applies business logic (category filters) on fetched data.
92
+
93
+ ### data/
94
+ - **`nse_adapter.py`**: Wraps the external `nse` library.
95
+   - **Validation**: Checks symbol validity via `equityQuote`.
96
+ - **`screener_adapter.py`**: Handles scraping from Screener.in.
97
+   - Resolves ICRA PDF links directly.
98
+   - Uses Selenium for high-fidelity HTML-to-PDF conversion.
99
+
100
+ ### .agent/skills/knowledgelm-nse/
101
+ - **`SKILL.md`**: Agent skill following [Agent Skills](https://agentskills.io) standard.
102
+   - Instructs AI agents on CLI usage and NotebookLM integration.
103
+   - Self-upgradeable via GitHub raw URL.
104
+
105
+ ### utils/file_utils.py
106
+ - **`sanitize_folder_name`**: Prevents path traversal security issues.
107
+
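To make the path traversal concern concrete, here is one way such a sanitizer could work. This is a hedged sketch, not the package's `sanitize_folder_name`: the name `safe_folder_name`, the allowed character set, and the `downloads` fallback are all assumptions chosen for illustration.

```python
import re


def safe_folder_name(name: str, fallback: str = "downloads") -> str:
    """Reduce user input to a single safe path component.

    Hypothetical illustration; the real helper in utils/file_utils.py
    may apply different rules.
    """
    # Split on both separator styles and keep the last non-empty part,
    # so "../../etc/passwd" cannot climb out of the base folder.
    parts = [p for p in re.split(r"[\\/]+", name.strip()) if p]
    leaf = parts[-1] if parts else ""
    # Keep a conservative character set; everything else becomes "_".
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", leaf)
    # Strip leading and trailing dots so "." and ".." never survive.
    cleaned = cleaned.strip(".")
    return cleaned or fallback
```

Returning a fallback instead of raising keeps the UI usable when the input reduces to nothing; a stricter variant could reject such input outright.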
108
+ ## Data Flow
109
+
110
+ ```mermaid
111
+ sequenceDiagram
112
+ participant U as User
113
+ participant A as app.py
114
+ participant S as Service
115
+ participant N as NSEAdapter
116
+ participant SC as ScreenerAdapter
117
+
118
+ U->>A: Enter symbol, dates, folder name
119
+ A->>S: process_request()
120
+ S->>S: sanitize_folder_name()
121
+
122
+ S->>N: validate_symbol()
123
+ alt Invalid Symbol
124
+ N-->>S: False
125
+ S--x A: Raise ValueError
126
+ end
127
+
128
+ S->>N: get_announcements()
129
+ N-->>S: JSON data
130
+
131
+ par Download Categories
132
+ loop Each Category
133
+ S->>N: download_document()
134
+ end
135
+ alt Credit Ratings
136
+ S->>SC: download_credit_ratings()
137
+ SC-->>S: Count
138
+ end
139
+ end
140
+
141
+ S-->>A: data, counts
142
+ A-->>U: Show status + updated tables
143
+ ```
144
+
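The sequence above can be approximated in a few lines of Python. Treat this as a sketch: the method names come from the diagram, but their signatures, the `desc` and `url` item fields, and `StubNSEAdapter` itself are assumptions. The folder sanitization and Screener steps are omitted, and categories run sequentially here although the diagram marks them as parallel.

```python
class StubNSEAdapter:
    """Hypothetical stand-in for NSEAdapter; the real one wraps the `nse` library."""

    def __init__(self, known_symbols, announcements):
        self.known_symbols = known_symbols
        self.announcements = announcements
        self.downloaded = []  # (category, url) pairs, recorded instead of fetched

    def validate_symbol(self, symbol):
        return symbol in self.known_symbols

    def get_announcements(self, symbol, start, end):
        return list(self.announcements)

    def download_document(self, item, category):
        self.downloaded.append((category, item["url"]))


def process_request(nse, symbol, start, end, category_filters):
    """Validate first, fetch once, then download per category and count."""
    if not nse.validate_symbol(symbol):
        # Early exit: no announcements are fetched for an invalid symbol.
        raise ValueError(f"Invalid NSE symbol: {symbol}")
    data = nse.get_announcements(symbol, start, end)
    counts = {}
    for category, keep in category_filters.items():
        matches = [item for item in data if keep(item)]
        for item in matches:
            nse.download_document(item, category)
        counts[category] = len(matches)
    return data, counts
```

Because the adapter is passed in, the orchestration can be exercised with a stub and no network access.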
145
+ ## Output Structure
146
+
147
+ ```
148
+ {folder_name}/
149
+ ├── transcripts/
150
+ ├── investor_presentations/
151
+ ├── credit_rating/
152
+ ├── related_party_txns/
153
+ ├── annual_reports/
154
+ ├── resignations/ (Optional: Logical grouping in UI, physical folder relies on user download)
155
+ ├── updates/ (Optional: Logical grouping in UI)
156
+ └── press_releases/ (Optional: Logical grouping in UI)
157
+ ```
@@ -0,0 +1,99 @@
1
+ # Changelog
2
+
3
+ ## [Unreleased]
4
+
5
+ ## [4.0.0] - 2026-02-08
6
+
7
+ ### Breaking Changes
8
+ - **Skill Location**: Moved agent skill from `src/knowledgelm/data/SKILL.md` to `.agent/skills/knowledgelm-nse/SKILL.md` following skill-creator best practices for directory-based structure with bundled resources
9
+
10
+ ### Features
11
+ - **Self-Upgradeable Skill**: Skill can now self-upgrade by downloading latest version from GitHub raw URL
12
+ - **Bundled Resources**: Added `references/` directory with NotebookLM audio overview prompt template for fundamental analysis
13
+ - **Enhanced NotebookLM Integration**:
14
+   - Comprehensive version management for both notebooklm-py package and skill
15
+   - Uses `uv tool upgrade` for package updates
16
+   - Always runs `notebooklm skill install` to ensure skill is current
17
+   - Handles browser extras conditionally for first-time authentication
18
+   - Vendor-agnostic approach for different AI agent skills directories
19
+ - **Package Upgrade**: Added `uv tool upgrade knowledgelm` to Installation section
20
+
21
+ ### Documentation
22
+ - **Skill Improvements**: Streamlined skill to be principle-based rather than task-like, emphasizing `--help` discovery over prescriptive commands
23
+ - **Architecture Updates**: Updated `.context/ARCHITECTURE.md` to reflect new `.agent/` directory structure with separate "Agent Resources" subgraph in mermaid diagram
24
+ - **Changelog Updates**: Updated skill path references throughout `.context/` artifacts
25
+
26
+ ### Cleanup
27
+ - Removed legacy pip artifacts (root `__pycache__`, committed `.coverage`, and `nse_cookies_requests.pkl`)
28
+ - Updated documentation and agent skill to prefer `uv` over `pip`
29
+
30
+ ## [3.0.0] - 2026-01-26
31
+
32
+ ### Features
33
+ - **CLI Interface**: New `knowledgelm` CLI with commands:
34
+   - `download SYMBOL --from DATE --to DATE`: Batch download filings
35
+   - `list-categories`: Show available filing types
36
+   - `list-files DIRECTORY --json`: List downloaded files for NotebookLM integration
37
+ - **AI Agent Skill**: Bundled `SKILL.md` at `.agent/skills/knowledgelm-nse/SKILL.md` following [Agent Skills](https://agentskills.io) open standard
38
+ - **NotebookLM Integration**: Skill includes workflow for adding downloads to NotebookLM notebooks via notebooklm-py
39
+
40
+ ### Architecture
41
+ - **CLI-First Design**: All operations accessible via CLI with `--help` discovery and `--json` output for agent parsing
42
+ - **Skill Maintenance**: SKILL.md uses `--help` discovery instead of hardcoded commands (reduces sync burden)
43
+
44
+ ### Dependencies
45
+ - Added `click>=8.1.0` for CLI
46
+
47
+ ### Documentation
48
+ - Updated README with CLI usage, agent skill installation prompt (LLM-agnostic)
49
+
50
+
51
+ ## [2.0.0] - 2026-01-26
52
+
53
+ ### Security
54
+ - **SSL Verification**: Enabled SSL verification for Screener.in requests to prevent MitM attacks.
55
+ - **Input Sanitization**: Implemented `sanitize_folder_name` to prevent path traversal vulnerabilities in download folders.
56
+
57
+ ### Architecture
58
+ - **Modular Design**: Split monolithic `filings_downloader.py` into:
59
+   - `core/service.py`: Business logic and orchestration.
60
+   - `data/nse_adapter.py`: Wrapper for NSE library.
61
+   - `data/screener_adapter.py`: Secure scraper for Screener.in.
62
+   - `utils/file_utils.py`: Shared utilities.
63
+   - `config.py`: Centralized configuration.
64
+ - **UI Decoupling**: Refactored `app.py` to delegate logic to `KnowledgeService`.
65
+ - **Validation**: Added early-exit symbol validation using `nse.equityQuote()` to prevent invalid API calls.
66
+ - **Cleanup**: Removed intermediate `{symbol}_announcements.json` dump from output folder (data is ephemeral/in-memory).
67
+
68
+ ### Features
69
+ - **Selenium Support**: Integrated `selenium` and `webdriver-manager` for headless Chrome operations to handle dynamic content.
70
+ - **Robust Scraper**: Added fallback logic to handle various content types (PDF vs HTML) from Screener.in.
71
+ - **Direct PDF Resolution**: Implemented direct resolution for ICRA reports to bypass viewers.
72
+ - **High-Fidelity HTML Conversion**: Replaced `markdownify` with Selenium-based "Print to PDF" for better report quality.
73
+
74
+ ### Documentation
75
+ - **Walkthrough**: Added a browser recording demonstrating app functionalities to `README.md`.
76
+ - **Assets**: Created `assets/` directory for media files.
77
+ - **Context Artifacts**: Added `.context/` documentation artifacts (DESIGN, ARCHITECTURE, CHANGELOG).
78
+
79
+ ### Improvements
80
+ - **Logging**: Redirected library logs (NSE, Selenium) to application logger and reduced noise.
81
+ - **Project Structure**: Adopted `src/` layout.
82
+ - **Dependency Management**: Migrated to `pyproject.toml` and `uv`.
83
+ - **Code Quality**: Added `tests/` directory, configured `ruff`, and applied Google-style docstrings.
84
+
85
+ ### Fixed
86
+ - **Empty ICRA Reports**: Resolved by bypassing JS viewers.
87
+ - **Missing Dependencies**: Added `beautifulsoup4`, `markdownify`, `requests`.
88
+ - **Duplicate Imports**: Cleaned up `FilingsDownloader.py` (now refactored).
89
+
90
+ ---
91
+
92
+ ## Initial Release
93
+
94
+ ### Features
95
+ - Batch download NSE announcements by category
96
+ - Credit rating dual-source (screener.in primary, NSE fallback)
97
+ - Individual filing views (Resignations, Reg 30, Press Releases)
98
+ - Annual report download (date range or all)
99
+ - Streamlit UI with status feedback
@@ -0,0 +1,74 @@
1
+ # Code Conventions
2
+
3
+ ## Project Structure
4
+
5
+ ```
6
+ KnowledgeLM/
7
+ ├── src/
8
+ │   └── knowledgelm/
9
+ │       ├── app.py
10
+ │       ├── config.py
11
+ │       ├── core/
12
+ │       ├── data/
13
+ │       └── utils/
14
+ ├── .context/
15
+ └── pyproject.toml
16
+ ```
17
+
18
+ ## Naming
19
+
20
+ | Element | Pattern | Example |
21
+ |---------|---------|---------|
22
+ | Constants | `UPPER_SNAKE_CASE` | `CREDIT_RATING_FOLDER` |
23
+ | Functions | `snake_case` | `download_announcements()` |
24
+ | Download folders | `snake_case` | `investor_presentations/` |
25
+
26
+ ## Patterns
27
+
28
+ ### Category Configuration
29
+ Categories are defined declaratively in the `DOWNLOAD_CATEGORIES` dict:
30
+ ```python
31
+ "transcripts": {
32
+     "enabled_arg": "download_transcripts",
33
+     "filter": lambda item: ...,
34
+     "label": "transcript"
35
+ }
36
+ ```
37
+
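A declarative table like this is typically consumed by a small loop. The sketch below shows one plausible shape: the keys `enabled_arg`, `filter`, and `label` come from the snippet above, while the filter predicates, the `desc` field, the second category's flag name, and `select_items` are assumptions made for illustration.

```python
# Hypothetical category table; only the key names mirror the convention above.
DOWNLOAD_CATEGORIES = {
    "transcripts": {
        "enabled_arg": "download_transcripts",
        "filter": lambda item: "transcript" in item.get("desc", "").lower(),
        "label": "transcript",
    },
    "investor_presentations": {
        "enabled_arg": "download_investor_presentations",
        "filter": lambda item: "presentation" in item.get("desc", "").lower(),
        "label": "presentation",
    },
}


def select_items(announcements, **flags):
    """Group announcements under every category whose enabled_arg flag is truthy."""
    selected = {}
    for folder, spec in DOWNLOAD_CATEGORIES.items():
        if not flags.get(spec["enabled_arg"], False):
            continue  # category switched off by the caller
        selected[folder] = [a for a in announcements if spec["filter"](a)]
    return selected
```

With this shape, adding a category means adding one dict entry; the selection loop stays unchanged.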
38
+ ### Streamlit Session State
39
+ All persistent data is stored in `st.session_state`:
40
+ ```python
41
+ if "data" not in st.session_state:
42
+     st.session_state.data = None
43
+ ```
44
+
45
+ ### Error Handling
46
+ - Use `logging` module (not `print()`)
47
+ - Catch specific exceptions (`requests.RequestException`, `ValueError`)
48
+ - Fail gracefully in UI (show error message but don't crash)
49
+
50
+ ### Logging & Output
51
+ - **No `print()`**: All output must go through `logger`.
52
+ - **Third-Party Noise**: Use `log_utils.redirect_stdout_to_logger` for libraries that print to stdout (e.g., `nse`).
53
+ - **Silence Warnings**: Explicitly suppress expected non-critical warnings (e.g., `InsecureRequestWarning` for specific legacy sites).
54
+
55
+ ## UI Components
56
+
57
+ - `st.columns()` for layout
58
+ - `st.expander()` for collapsible sections
59
+ - `st.spinner()` for loading feedback
60
+ - Checkboxes for category toggles
61
+
62
+ ## Scraping & Automation
63
+
64
+ - **Browser Automation**: Use Selenium (headless Chrome) when:
65
+   - Content is dynamically rendered via JS (e.g., PDF viewers).
66
+   - High-fidelity artifact generation (PDF) is required from HTML.
67
+ - **Direct Downloads**: Prefer `requests` for static files (PDFs, CSVs) for efficiency.
68
+ - **Resilience**: Always verify file integrity (size, content type) after download.
69
+
70
+
71
+ ## Testing
72
+
73
+ > [!CAUTION]
74
+ > No automated tests exist beyond `tests/test_placeholder.py`; test manually via the Streamlit UI.
@@ -0,0 +1,75 @@
1
+ # Feature Design
2
+
3
+ ## [done] Core Download Categories
4
+
5
+ Batch download filings by category with configurable filters:
6
+ - Analyst Call Transcripts
7
+ - Investor Presentations
8
+ - Credit Ratings (dual-source)
9
+ - Related Party Transactions
10
+ - Annual Reports (date range or all)
11
+
12
+ ## [done] CLI Interface (v3.0)
13
+
14
+ Full programmatic access via `knowledgelm` command:
15
+ - **Download**: Batch download with automated folder creation.
16
+ - **Discovery**: `--help` on all levels for self-documenting interface.
17
+ - **JSON Output**: `--json` flag for machine readability and AI agent parsing.
18
+
19
+ ## [done] Agent-First Design (v3.0)
20
+
21
+ - **Standardized Skill**: `SKILL.md` compliant with [Agent Skills](https://agentskills.io) standard.
22
+ - **Automation workflows**: Optimized for LLM tools (Claude Code, Gemini CLI, etc.).
23
+ - **NotebookLM Synergy**: Purpose-built `list-files --json` command to facilitate source injection.
24
+
25
+ ## [done] Credit Rating Dual-Source
26
+
27
+ 1. Primary: Scrape screener.in (all-time)
28
+    - **ICRA**: Direct PDF link resolution (bypassing JS viewer).
29
+    - **HTML Reports (CRISIL, etc.)**: Selenium-based HTML-to-PDF conversion.
30
+    - **PDF Reports**: Direct download.
31
+ 2. Fallback: NSE API (date-filtered)
32
+
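The primary and fallback arrangement might be wired up along these lines. This is only a sketch: `fetch_credit_ratings` is a hypothetical name, both sources are modelled as callables returning a download count, and the rule that the fallback runs when the primary fails or yields nothing is an assumption (the design note names the two sources but not the trigger).

```python
from typing import Callable


def fetch_credit_ratings(
    primary: Callable[[], int],
    fallback: Callable[[], int],
) -> tuple[int, str]:
    """Try the primary source; use the fallback if it fails or finds nothing.

    Returns (number of reports downloaded, name of the source used).
    """
    try:
        count = primary()
    except (OSError, ValueError):
        # requests.RequestException derives from OSError, so network
        # failures from a requests-based scraper land here.
        count = 0
    if count > 0:
        return count, "screener"
    return fallback(), "nse"
```

Catching specific exception types rather than a bare `Exception` follows the error-handling convention used elsewhere in the project.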
33
+ ## [done] Input Validation
34
+
35
+ - **Symbol Check**: Validates company symbol against NSE using `equityQuote` before attempting downloads.
36
+
37
+ ## [done] Individual Filing Views
38
+
39
+ Expandable tables with per-row downloads:
40
+ - Resignations
41
+ - Regulation 30 Updates (experimental)
42
+ - Press Releases
43
+
44
+ ## [done] Session State Management
45
+
46
+ Persist data across Streamlit reruns:
47
+ - Fetched announcements
48
+ - Category counts
49
+ - Status messages
50
+ - DataFrames for view tables
51
+
52
+
53
+ ## [done] Modular Architecture
54
+
55
+ Separation of concerns:
56
+ - **UI Layer**: Streamlit (`app.py`) handles only presentation.
57
+ - **Service Layer**: `KnowledgeService` orchestrates downloads.
58
+ - **Data Adapters**: Isolated `NSEAdapter` and `ScreenerAdapter`.
59
+ - **Config**: Centralized settings in `config.py`.
60
+
61
+ ## [done] Security Hardening
62
+
63
+ - **SSL Verification**: Enabled for all external requests.
64
+ - **Input Sanitization**: Strict validation of download folder names (`file_utils.sanitize_folder_name`).
65
+ - **Logging**: Replaced console printing with structured logging.
66
+
67
+
68
+ ---
69
+
70
+ ## [idea] Future Enhancements
71
+
72
+ - BSE support
73
+ - Export to structured formats (CSV, Excel)
74
+ - Configurable alert filters
75
+ - Progress bars for bulk downloads