devmind-cli 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -0,0 +1,166 @@
+ Metadata-Version: 2.4
+ Name: devmind-cli
+ Version: 0.1.0
+ Summary: DevMind - Semantic Codebase Memory and Agentic Search for Developers
+ Author: Anishp-cell
+ Project-URL: Homepage, https://github.com/Anishp-cell/devmind-CLI
+ Project-URL: Issues, https://github.com/Anishp-cell/devmind-CLI/issues
+ Classifier: Programming Language :: Python :: 3
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Operating System :: OS Independent
+ Classifier: Development Status :: 3 - Alpha
+ Classifier: Topic :: Software Development :: Libraries :: Application Frameworks
+ Requires-Python: >=3.10
+ Description-Content-Type: text/markdown
+ Requires-Dist: cognee[fastembed]>=0.1.0
+ Requires-Dist: typer[all]>=0.9.0
+ Requires-Dist: fastapi>=0.100.0
+ Requires-Dist: uvicorn>=0.22.0
+ Requires-Dist: gitpython>=3.1.30
+ Requires-Dist: fastmcp>=0.1.0
+ Requires-Dist: python-dotenv>=1.0.0
+ Requires-Dist: pydantic>=2.0.0
+ Requires-Dist: jinja2>=3.1.2
+ Requires-Dist: groq>=0.9.0
+ Provides-Extra: dev
+ Requires-Dist: pytest>=7.0; extra == "dev"
+ Requires-Dist: black>=23.0; extra == "dev"
+ Requires-Dist: isort>=5.12; extra == "dev"
+ Requires-Dist: mypy>=1.0; extra == "dev"
+
+ # DevMind – Codebase Memory for Developers
+
+ > "Your codebase finally has a memory."
+
+ DevMind is a developer CLI tool and local web interface that gives your codebase a persistent, queryable memory powered by **Cognee**. It scans source files, git commit history, comments, and architectural decisions, building a hybrid graph-vector knowledge store. This persistent memory allows developers and AI coding assistants (via MCP) to query the codebase in plain English and carry context across infinite sessions.
+
+ ---
+
+ ## Features
+
+ 1. **One-Command Ingestion** (`devmind remember`): Scans the codebase, git logs, and code comments to feed `cognee.remember()`.
+ 2. **Plain-English Q&A** (`devmind ask "..."`): Uses `cognee.recall()` to retrieve grounded, context-aware answers from the memory graph.
+ 3. **Decision Logging** (`devmind log "..."`): Records Architecture Decision Records (ADRs) to capture design reasoning.
+ 4. **Memory Refresh** (`devmind refresh`): Automatically detects modified files, updates the graph, and runs `cognee.improve()`.
+ 5. **Surgical Forget** (`devmind forget --file ...`): Prunes specific file memory from the knowledge graph using `cognee.forget()`.
+ 6. **Claude Code MCP Server** (`devmind mcp`): Seamlessly integrates with Claude Code or Cursor via standard Model Context Protocol (MCP).
+ 7. **Local Dashboard UI** (`devmind dashboard`): Provides a clean visual panel showing memory status, search queries, and recent decisions.
+ 8. **Smart API Key Rotation**: Automatically detects, formats, and rotates between multiple Groq and OpenRouter API keys to balance rate limits on free-tier LLM access.
+
+ ---
+
+ ## Installation & Setup
+
+ ### 1. Clone the repository
+ ```bash
+ git clone https://github.com/Anishp-cell/devmind-CLI.git
+ cd devmind-CLI
+ ```
+
+ ### 2. Configure Environment Variables
+ Copy `.env.example` to `.env` and fill in your keys:
+ ```bash
+ cp .env.example .env
+ ```
+ To run for **free**, configure your `.env` with a list of rotated API keys:
+ ```env
+ LLM_PROVIDER="groq"
+
+ # Add a comma-separated list of Groq keys (gsk_...) and/or OpenRouter keys (sk-or-v1-...)
+ # The CLI automatically load-balances and routes requests to the correct endpoints!
+ GROQ_API_KEYS="gsk_key1,sk-or-v1-key2,gsk_key3"
+
+ EMBEDDING_PROVIDER="fastembed"
+ EMBEDDING_MODEL="BAAI/bge-small-en-v1.5"
+ EMBEDDING_DIMENSIONS="384"
+ ```
+
+ ### 3. Install DevMind
+ Install the package in editable mode:
+ ```bash
+ pip install -e .
+ ```
+
+ ---
+
+ ## CLI Command Reference
+
+ * **Ingest Codebase**:
+   ```bash
+   devmind remember
+   ```
+ * **Ask a Question**:
+   ```bash
+   devmind ask "Why did we switch to redis for the queue?"
+   ```
+ * **Log an Architectural Decision (ADR)**:
+   ```bash
+   devmind log "Chose FastAPI for the web UI because it supports async routes natively."
+   ```
+ * **Refresh Changed Memory**:
+   ```bash
+   devmind refresh
+   ```
+ * **Forget a Specific File**:
+   ```bash
+   devmind forget --file devmind/web/app.py
+   ```
+ * **Wipe Local Database Cache**:
+   ```bash
+   devmind forget --all
+   ```
+ * **Launch Web Dashboard**:
+   ```bash
+   devmind dashboard --port 8000
+   ```
+ * **Start MCP Server**:
+   ```bash
+   devmind mcp
+   ```
+
+ ---
+
+ ## Running the Mock Demo Project
+
+ To test DevMind on a smaller project without polluting your main repo:
+ 1. Navigate to the demo directory:
+    ```bash
+    cd examples/demo_project
+    ```
+ 2. Build the memory of the demo:
+    ```bash
+    devmind remember --dir .
+    ```
+ 3. Query its memory:
+    ```bash
+    devmind ask "What open TODO tasks are left in main.py?"
+    devmind ask "Why do we use SQLite according to our architecture decisions?"
+    ```
+
+ ---
+
+ ## Claude Code MCP Integration
+
+ To connect Claude Code to DevMind's memory, add the server to your Claude MCP config:
+
+ ```bash
+ claude mcp add devmind "devmind mcp"
+ ```
+
+ Alternatively, configure your project-level `.mcp.json` file in your project root:
+ ```json
+ {
+   "mcpServers": {
+     "devmind": {
+       "command": "devmind",
+       "args": ["mcp"]
+     }
+   }
+ }
+ ```
+
+ ---
+
+ ## AI Assistant Declaration
+
+ Per the rules of **The Hangover Part AI Hackathon**, this project declares the use of **Claude** (via the Antigravity IDE agent) as an AI pair programmer.
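The package description above covers key rotation only through the `GROQ_API_KEYS` variable. That variable holds a comma-separated list in which `gsk_` keys belong to Groq and `sk-or-v1-` keys belong to OpenRouter. The rotation code itself is not part of this diff, so what follows is a minimal sketch under assumptions and not DevMind's implementation. The names `KeyRoute`, `parse_keys`, and `rotating_keys` are hypothetical, and round-robin rotation via `itertools.cycle` is an assumed strategy. The two base URLs are the providers' public OpenAI-compatible endpoints, not values read from DevMind's source.

```python
import itertools
from dataclasses import dataclass

# Assumed endpoints: the providers' public OpenAI-compatible base URLs,
# not values taken from DevMind's source.
GROQ_BASE_URL = "https://api.groq.com/openai/v1"
OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"


@dataclass(frozen=True)
class KeyRoute:
    """A single API key paired with the endpoint it should be sent to."""
    key: str
    base_url: str


def parse_keys(raw: str) -> list[KeyRoute]:
    """Split a comma-separated key list and route each key by its prefix."""
    routes = []
    # Strip surrounding whitespace and any stray quotes from each entry.
    for key in (part.strip().strip('"') for part in raw.split(",")):
        if not key:
            continue  # tolerate trailing commas and empty entries
        if key.startswith("sk-or-"):
            routes.append(KeyRoute(key, OPENROUTER_BASE_URL))
        elif key.startswith("gsk_"):
            routes.append(KeyRoute(key, GROQ_BASE_URL))
        else:
            raise ValueError(f"Unrecognized API key prefix: {key[:6]}...")
    return routes


def rotating_keys(raw: str):
    """Return an endless round-robin iterator over the parsed key routes."""
    routes = parse_keys(raw)
    if not routes:
        raise ValueError("No API keys configured")
    return itertools.cycle(routes)


if __name__ == "__main__":
    keys = rotating_keys("gsk_key1,sk-or-v1-key2,gsk_key3")
    for _ in range(4):
        route = next(keys)
        print(route.key, "->", route.base_url)
    # The fourth request wraps around to gsk_key1 again.
```

Round-robin spreads requests evenly across keys, but it does not react to a rate-limit response. A fuller version would likely take a key out of rotation for a cool-down period after a provider returns HTTP 429.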
@@ -0,0 +1,136 @@
+ # DevMind – Codebase Memory for Developers
+
+ > "Your codebase finally has a memory."
+
+ DevMind is a developer CLI tool and local web interface that gives your codebase a persistent, queryable memory powered by **Cognee**. It scans source files, git commit history, comments, and architectural decisions, building a hybrid graph-vector knowledge store. This persistent memory allows developers and AI coding assistants (via MCP) to query the codebase in plain English and carry context across infinite sessions.
+
+ ---
+
+ ## Features
+
+ 1. **One-Command Ingestion** (`devmind remember`): Scans the codebase, git logs, and code comments to feed `cognee.remember()`.
+ 2. **Plain-English Q&A** (`devmind ask "..."`): Uses `cognee.recall()` to retrieve grounded, context-aware answers from the memory graph.
+ 3. **Decision Logging** (`devmind log "..."`): Records Architecture Decision Records (ADRs) to capture design reasoning.
+ 4. **Memory Refresh** (`devmind refresh`): Automatically detects modified files, updates the graph, and runs `cognee.improve()`.
+ 5. **Surgical Forget** (`devmind forget --file ...`): Prunes specific file memory from the knowledge graph using `cognee.forget()`.
+ 6. **Claude Code MCP Server** (`devmind mcp`): Seamlessly integrates with Claude Code or Cursor via standard Model Context Protocol (MCP).
+ 7. **Local Dashboard UI** (`devmind dashboard`): Provides a clean visual panel showing memory status, search queries, and recent decisions.
+ 8. **Smart API Key Rotation**: Automatically detects, formats, and rotates between multiple Groq and OpenRouter API keys to balance rate limits on free-tier LLM access.
+
+ ---
+
+ ## Installation & Setup
+
+ ### 1. Clone the repository
+ ```bash
+ git clone https://github.com/Anishp-cell/devmind-CLI.git
+ cd devmind-CLI
+ ```
+
+ ### 2. Configure Environment Variables
+ Copy `.env.example` to `.env` and fill in your keys:
+ ```bash
+ cp .env.example .env
+ ```
+ To run for **free**, configure your `.env` with a list of rotated API keys:
+ ```env
+ LLM_PROVIDER="groq"
+
+ # Add a comma-separated list of Groq keys (gsk_...) and/or OpenRouter keys (sk-or-v1-...)
+ # The CLI automatically load-balances and routes requests to the correct endpoints!
+ GROQ_API_KEYS="gsk_key1,sk-or-v1-key2,gsk_key3"
+
+ EMBEDDING_PROVIDER="fastembed"
+ EMBEDDING_MODEL="BAAI/bge-small-en-v1.5"
+ EMBEDDING_DIMENSIONS="384"
+ ```
+
+ ### 3. Install DevMind
+ Install the package in editable mode:
+ ```bash
+ pip install -e .
+ ```
+
+ ---
+
+ ## CLI Command Reference
+
+ * **Ingest Codebase**:
+   ```bash
+   devmind remember
+   ```
+ * **Ask a Question**:
+   ```bash
+   devmind ask "Why did we switch to redis for the queue?"
+   ```
+ * **Log an Architectural Decision (ADR)**:
+   ```bash
+   devmind log "Chose FastAPI for the web UI because it supports async routes natively."
+   ```
+ * **Refresh Changed Memory**:
+   ```bash
+   devmind refresh
+   ```
+ * **Forget a Specific File**:
+   ```bash
+   devmind forget --file devmind/web/app.py
+   ```
+ * **Wipe Local Database Cache**:
+   ```bash
+   devmind forget --all
+   ```
+ * **Launch Web Dashboard**:
+   ```bash
+   devmind dashboard --port 8000
+   ```
+ * **Start MCP Server**:
+   ```bash
+   devmind mcp
+   ```
+
+ ---
+
+ ## Running the Mock Demo Project
+
+ To test DevMind on a smaller project without polluting your main repo:
+ 1. Navigate to the demo directory:
+    ```bash
+    cd examples/demo_project
+    ```
+ 2. Build the memory of the demo:
+    ```bash
+    devmind remember --dir .
+    ```
+ 3. Query its memory:
+    ```bash
+    devmind ask "What open TODO tasks are left in main.py?"
+    devmind ask "Why do we use SQLite according to our architecture decisions?"
+    ```
+
+ ---
+
+ ## Claude Code MCP Integration
+
+ To connect Claude Code to DevMind's memory, add the server to your Claude MCP config:
+
+ ```bash
+ claude mcp add devmind "devmind mcp"
+ ```
+
+ Alternatively, configure your project-level `.mcp.json` file in your project root:
+ ```json
+ {
+   "mcpServers": {
+     "devmind": {
+       "command": "devmind",
+       "args": ["mcp"]
+     }
+   }
+ }
+ ```
+
+ ---
+
+ ## AI Assistant Declaration
+
+ Per the rules of **The Hangover Part AI Hackathon**, this project declares the use of **Claude** (via the Antigravity IDE agent) as an AI pair programmer.
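The README says `devmind refresh` detects modified files. However, the CLI module in this release has `refresh` call the same `remember_pipeline` as `remember`. As far as this diff shows, that pipeline re-ingests every file returned by `scan_codebase_files`, and any filtering would have to happen inside that function, whose source is not included here. One common way to detect changes is to compare content hashes against a manifest written on the previous run. The sketch below assumes a JSON manifest on disk. The function `diff_against_manifest` and the manifest location are hypothetical. The sketch also reports files that have disappeared since the last run, which a caller could hand to `devmind forget --file`.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def diff_against_manifest(
    root: str, relative_paths: list[str], manifest_path: str
) -> tuple[list[str], list[str]]:
    """Compare current file hashes with the manifest saved on the previous run.

    Returns (changed, removed): paths that are new or whose content changed,
    and paths listed in the old manifest that are no longer present.
    The manifest is rewritten with the current hashes before returning.
    """
    manifest = Path(manifest_path)
    previous = json.loads(manifest.read_text()) if manifest.exists() else {}
    current = {rel: file_digest(Path(root) / rel) for rel in relative_paths}

    changed = [rel for rel, digest in current.items() if previous.get(rel) != digest]
    removed = sorted(set(previous) - set(current))

    manifest.write_text(json.dumps(current, indent=2, sort_keys=True))
    return changed, removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.py").write_text("x = 1\n")
        (root / "b.py").write_text("y = 2\n")
        manifest = str(root / "devmind_manifest.json")  # hypothetical location

        print(diff_against_manifest(tmp, ["a.py", "b.py"], manifest))  # (['a.py', 'b.py'], [])
        (root / "a.py").write_text("x = 42\n")
        print(diff_against_manifest(tmp, ["a.py"], manifest))  # (['a.py'], ['b.py'])
```

Only the `changed` list would then need `remember_content` calls. Hashing file contents instead of checking modification times avoids spurious re-ingestion after a fresh `git clone` or checkout, which resets mtimes without changing content.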
@@ -0,0 +1,2 @@
+ # DevMind: Codebase Memory for Developers
+ __version__ = "0.1.0"
@@ -0,0 +1,285 @@
+ # pyrefly: ignore [missing-import]
+ import typer
+ import sys
+ import asyncio
+ import os
+ import logging
+ import warnings
+
+ # Suppress ResourceWarning and DeprecationWarning from aiohttp/asyncio during garbage collection
+ warnings.filterwarnings("ignore", category=ResourceWarning)
+ warnings.filterwarnings("ignore", category=DeprecationWarning)
+
+ # Suppress Windows proactor event loop SSL bugs during shutdown
+ if sys.platform == 'win32':
+     asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
+
+ from devmind.memory import initialize_cognee, remember_content, recall_query, improve_memory, forget_memory
+ from devmind.ingestion.file_reader import scan_codebase_files
+ from devmind.ingestion.git_parser import get_git_history
+ from devmind.ingestion.comment_extractor import get_codebase_comments
+
+ # Setup logging
+ logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
+ logger = logging.getLogger("devmind.cli")
+
+ def run_async(coro):
+     """
+     Custom asyncio runner that sets an exception handler to swallow
+     noisy Win32 socket teardown/closed event loop warnings on shutdown.
+     """
+     loop = asyncio.new_event_loop()
+     asyncio.set_event_loop(loop)
+
+     def silence_exceptions(loop, context):
+         exc = context.get("exception")
+         msg = context.get("message", "")
+         # Swallows Win32 10038/not-a-socket/Event loop is closed warnings during exit
+         if (exc and ("Event loop is closed" in str(exc) or "10038" in str(exc) or "socket" in str(exc))) or "Event loop is closed" in msg or "SSL transport" in msg:
+             return
+         loop.default_exception_handler(context)
+
+     loop.set_exception_handler(silence_exceptions)
+     try:
+         return loop.run_until_complete(coro)
+     finally:
+         try:
+             loop.run_until_complete(loop.shutdown_asyncgens())
+         except Exception:
+             pass
+         loop.close()
+
+ app = typer.Typer(
+     name="devmind",
+     help="DevMind – Codebase Memory for Developers. Powered by Cognee.",
+     add_completion=False
+ )
+
+ async def remember_pipeline(directory: str):
+     """
+     Core async pipeline for scanning files, comments, and git logs,
+     and loading them into Cognee.
+     """
+     # 1. Scan the codebase files
+     files = scan_codebase_files(directory)
+     if not files:
+         typer.echo("No files found to ingest.")
+         return
+
+     typer.echo(f"Ingesting {len(files)} files into Cognee memory...")
+
+     # Ingest file contents
+     file_success = 0
+     for idx, file_data in enumerate(files, start=1):
+         rel_path = file_data["relative_path"]
+         content = file_data["content"]
+
+         tagged_content = f"File Path: {rel_path}\n---\n{content}"
+         dataset_name = rel_path.replace("/", "_").replace("\\", "_").replace(".", "_").replace(" ", "_")
+
+         logger.info(f"[{idx}/{len(files)}] Processing {rel_path}...")
+         success = await remember_content(tagged_content, dataset_name=dataset_name)
+         if success:
+             file_success += 1
+
+     typer.echo(f"Successfully remembered {file_success}/{len(files)} files.")
+
+     # 2. Extract and Ingest Git History
+     git_logs = get_git_history(directory, max_commits=20)
+     if git_logs:
+         typer.echo(f"Ingesting git history ({len(git_logs)} commits) into Cognee...")
+         git_success = 0
+         for idx, commit_log in enumerate(git_logs, start=1):
+             dataset_name = f"git_commit_{idx}"
+             success = await remember_content(commit_log, dataset_name=dataset_name)
+             if success:
+                 git_success += 1
+         typer.echo(f"Successfully remembered {git_success}/{len(git_logs)} commits.")
+
+     # 3. Extract and Ingest Inline Comments & Docstrings
+     relative_paths = [f["relative_path"] for f in files]
+     comments = get_codebase_comments(directory, relative_paths)
+     if comments:
+         typer.echo(f"Ingesting inline comments ({len(comments)} files containing comments)...")
+         comment_success = 0
+         for idx, comment_block in enumerate(comments, start=1):
+             dataset_name = f"code_comments_{idx}"
+             success = await remember_content(comment_block, dataset_name=dataset_name)
+             if success:
+                 comment_success += 1
+         typer.echo(f"Successfully remembered {comment_success}/{len(comments)} comment segments.")
+
+ @app.command()
+ def remember(
+     directory: str = typer.Option(
+         ".",
+         "--dir", "-d",
+         help="The directory of the codebase to ingest."
+     )
+ ):
+     """
+     Ingest the codebase files into persistent Cognee memory.
+     """
+     initialize_cognee()
+     run_async(remember_pipeline(directory))
+     typer.echo("[Success] Codebase memory ingestion completed.")
+
+ @app.command()
+ def ask(
+     query: str = typer.Argument(..., help="Your natural language question about the codebase.")
+ ):
+     """
+     Ask a question about the ingested codebase memory in plain English.
+     """
+     initialize_cognee()
+
+     typer.echo(f"Querying codebase memory for: '{query}'...")
+     answer = run_async(recall_query(query))
+
+     typer.echo("\n--- Response ---")
+     typer.echo(answer)
+     typer.echo("----------------")
+
+ @app.command()
+ def chat():
+     """
+     Start an interactive DevMind terminal chat session to explore your codebase.
+     """
+     initialize_cognee()
+
+     from rich.console import Console
+     from rich.markdown import Markdown
+     from rich.panel import Panel
+     from rich.prompt import Prompt
+
+     console = Console()
+     console.print(Panel.fit("[bold blue]DevMind Codebase Chat[/bold blue]\n[dim]Type your queries below. Type 'exit' or 'quit' to close.[/dim]", border_style="blue"))
+
+     while True:
+         try:
+             query = Prompt.ask("\n[bold green]You[/bold green]")
+             if not query.strip():
+                 continue
+             if query.lower().strip() in ['exit', 'quit', 'clear']:
+                 console.print("[dim]Goodbye![/dim]")
+                 break
+
+             with console.status("[bold cyan]DevMind is thinking...[/bold cyan]", spinner="dots"):
+                 answer = run_async(recall_query(query))
+
+             console.print("\n[bold magenta]DevMind:[/bold magenta]")
+             console.print(Markdown(answer))
+         except (KeyboardInterrupt, EOFError):
+             console.print("\n[dim]Goodbye![/dim]")
+             break
+         except Exception as e:
+             console.print(f"[bold red]Error:[/bold red] {str(e)}")
+
+ @app.command()
+ def log(
+     decision: str = typer.Argument(..., help="The Architectural Decision Record (ADR) text to log.")
+ ):
+     """
+     Log an Architectural Decision Record (ADR) into persistent memory.
+     """
+     initialize_cognee()
+     typer.echo(f"Logging decision: '{decision}'...")
+
+     tagged_decision = f"Architectural Decision Record:\n{decision}"
+     import time
+     dataset_name = f"adr_decision_{int(time.time())}"
+
+     success = run_async(remember_content(tagged_decision, dataset_name=dataset_name))
+     if success:
+         typer.echo("[Success] Architectural decision successfully logged.")
+     else:
+         typer.echo("[Error] Failed to log architectural decision.")
+
+ @app.command()
+ def refresh(
+     directory: str = typer.Option(
+         ".",
+         "--dir", "-d",
+         help="The directory of the codebase to refresh."
+     )
+ ):
+     """
+     Refresh codebase memory by scanning for changed files and refining relationships.
+     """
+     initialize_cognee()
+     typer.echo("Scanning for codebase changes to refresh memory...")
+     run_async(remember_pipeline(directory))
+
+     typer.echo("Refining the codebase memory graph structure...")
+     # Improve memory on all dataset partitions
+     success = run_async(improve_memory(dataset_name="codebase_memory"))
+     if success:
+         typer.echo("[Success] Memory refresh and relationship refinement completed.")
+     else:
+         typer.echo("[Warning] File changes re-ingested, but relationship refinement had warnings.")
+
+ @app.command()
+ def forget(
+     file_path: str = typer.Option(
+         None,
+         "--file", "-f",
+         help="The relative path of the file memory to forget."
+     ),
+     all_memories: bool = typer.Option(
+         False,
+         "--all", "-a",
+         help="Wipe all local memory databases completely."
+     )
+ ):
+     """
+     Surgically forget a specific file's memory, or completely wipe the local databases.
+     """
+     initialize_cognee()
+
+     if all_memories:
+         typer.echo("Wiping all local memory databases...")
+         import shutil
+         from devmind.memory import system_path, data_path
+         try:
+             if os.path.exists(system_path):
+                 shutil.rmtree(system_path)
+             if os.path.exists(data_path):
+                 shutil.rmtree(data_path)
+             typer.echo("[Success] Local memory databases completely wiped.")
+         except Exception as e:
+             typer.echo(f"[Error] Failed to wipe memory folders: {e}")
+         return
+
+     if file_path:
+         dataset_name = file_path.replace("/", "_").replace("\\", "_").replace(".", "_").replace(" ", "_")
+         typer.echo(f"Removing memory dataset '{dataset_name}'...")
+         success = run_async(forget_memory(dataset_name))
+         if success:
+             typer.echo(f"[Success] Memory of '{file_path}' successfully forgotten.")
+         else:
+             typer.echo(f"[Error] Failed to forget memory of '{file_path}'.")
+     else:
+         typer.echo("[Warning] Please specify either --file <path> to forget a file, or --all to wipe all databases.")
+
+ @app.command()
+ def dashboard(
+     port: int = typer.Option(8000, "--port", "-p", help="Port to run the dashboard server on.")
+ ):
+     """
+     Launch the DevMind Web UI dashboard.
+     """
+     import uvicorn
+     typer.echo(f"Starting DevMind Web UI Dashboard on http://localhost:{port} ...")
+     uvicorn.run("devmind.web.app:app", host="127.0.0.1", port=port, reload=False)
+
+ @app.command()
+ def mcp():
+     """
+     Start the DevMind MCP server for integration with Claude Code.
+     """
+     typer.echo("Starting DevMind MCP Server...")
+     from devmind.integrations.claude_code import mcp as mcp_instance
+     mcp_instance.run()
+
+ if __name__ == "__main__":
+     app()
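The CLI module above derives a dataset name from a relative path in two places, using the same chain of `.replace()` calls: `remember_pipeline` uses it when ingesting and `forget` uses it when deleting. `forget --file` only finds the right dataset if both produce identical names, so moving the mapping into one function is a natural refactor. The sketch below uses a hypothetical helper, `dataset_name_for`, that reproduces the mapping with `str.translate`. It also adds a hypothetical `find_collisions` check. Because `/`, `\`, `.`, and space all collapse to `_`, distinct paths such as `src/a.py` and `src_a.py` end up with the same dataset name.

```python
from collections import defaultdict

# One translation table instead of four chained .replace() calls.
# Each listed character becomes "_", matching the CLI's replace chain.
_DATASET_NAME_TABLE = str.maketrans({"/": "_", "\\": "_", ".": "_", " ": "_"})


def dataset_name_for(rel_path: str) -> str:
    """Hypothetical shared helper: map a relative path to a dataset name."""
    return rel_path.translate(_DATASET_NAME_TABLE)


def find_collisions(rel_paths: list[str]) -> dict[str, list[str]]:
    """Group paths by dataset name and keep only names shared by more than one path."""
    groups = defaultdict(list)
    for path in rel_paths:
        groups[dataset_name_for(path)].append(path)
    return {name: paths for name, paths in groups.items() if len(paths) > 1}


if __name__ == "__main__":
    print(dataset_name_for("devmind/web/app.py"))    # devmind_web_app_py
    print(dataset_name_for("devmind\\web\\app.py"))  # devmind_web_app_py
    print(find_collisions(["src/a.py", "src_a.py", "b.py"]))
    # {'src_a_py': ['src/a.py', 'src_a.py']}
```

Mapping backslashes as well as slashes means a Windows-style path passed to `forget --file` resolves to the same dataset as the POSIX-style path recorded during ingestion. A collision check like this could run once after `scan_codebase_files`, before any dataset is overwritten.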
@@ -0,0 +1,91 @@
+ import os
+ import re
+ import logging
+
+ logger = logging.getLogger("devmind.comment_extractor")
+
+ # Regex to find common developer tags in comments
+ TAG_PATTERN = re.compile(r"\b(TODO|FIXME|NOTE|BUG|HACK|WARNING|ADR|DEPRECATED)\b", re.IGNORECASE)
+
+ def extract_comments_from_file(file_path: str) -> list[str]:
+     """
+     Parses a single file, extracting inline comments and docstrings
+     that contain key developer tags (TODO, FIXME, NOTE, HACK, etc.).
+     """
+     extracted = []
+     _, ext = os.path.splitext(file_path.lower())
+
+     try:
+         with open(file_path, "r", encoding="utf-8", errors="ignore") as f:
+             content = f.read()
+
+         lines = content.splitlines()
+
+         # 1. Python parsing (extract hash comments and triple-quoted docstrings)
+         if ext in (".py", ".sh", ".yaml", ".yml", ".ini"):
+             # Inline comment scanner
+             for idx, line in enumerate(lines, start=1):
+                 comment_match = re.search(r"#\s*(.*)", line)
+                 if comment_match:
+                     comment_text = comment_match.group(1).strip()
+                     if TAG_PATTERN.search(comment_text):
+                         extracted.append(f"Line {idx}: {comment_text}")
+
+             # Simple regex search for docstrings
+             docstrings = re.findall(r'"""(.*?)"""', content, re.DOTALL)
+             docstrings.extend(re.findall(r"'''(.*?)'''", content, re.DOTALL))
+             for doc in docstrings:
+                 doc_clean = doc.strip()
+                 if doc_clean:
+                     extracted.append(f"Docstring: {doc_clean}")
+
+         # 2. C-Style languages parsing (JS, TS, C, C++, Go, Java, Rust)
+         elif ext in (".js", ".ts", ".jsx", ".tsx", ".c", ".cpp", ".h", ".go", ".java", ".rs", ".css"):
+             # Inline comment scanner (// ...)
+             for idx, line in enumerate(lines, start=1):
+                 comment_match = re.search(r"//\s*(.*)", line)
+                 if comment_match:
+                     comment_text = comment_match.group(1).strip()
+                     if TAG_PATTERN.search(comment_text):
+                         extracted.append(f"Line {idx}: {comment_text}")
+
+             # Block comment scanner (/* ... */)
+             block_comments = re.findall(r"/\*(.*?)\*/", content, re.DOTALL)
+             for block in block_comments:
+                 for idx, block_line in enumerate(block.splitlines(), start=1):
+                     block_line_clean = block_line.strip().lstrip("*").strip()
+                     if TAG_PATTERN.search(block_line_clean):
+                         extracted.append(f"Block Comment: {block_line_clean}")
+
+         # 3. HTML/XML/Markdown C-style comments (<!-- ... -->)
+         elif ext in (".html", ".htm", ".xml", ".md"):
+             html_comments = re.findall(r"<!--(.*?)-->", content, re.DOTALL)
+             for block in html_comments:
+                 for block_line in block.splitlines():
+                     block_line_clean = block_line.strip()
+                     if TAG_PATTERN.search(block_line_clean):
+                         extracted.append(f"HTML Comment: {block_line_clean}")
+
+     except Exception as e:
+         logger.error(f"Error parsing comments from {file_path}: {e}")
+
+     return extracted
+
+ def get_codebase_comments(repo_path: str, source_files: list[str]) -> list[str]:
+     """
+     Loops through all project files and extracts formatted comments/docstrings.
+     Returns a list of structured comment records.
+     """
+     all_comments = []
+     logger.info("Scanning codebase for inline comments and docstrings...")
+
+     for file_path in source_files:
+         abs_path = os.path.join(repo_path, file_path)
+         file_comments = extract_comments_from_file(abs_path)
+
+         if file_comments:
+             comment_log = [f"File Path: {file_path}"]
+             comment_log.extend(file_comments)
+             all_comments.append("\n".join(comment_log))
+
+     return all_comments
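For Python files, `extract_comments_from_file` relies on two regexes. The pattern `#\s*(.*)` also fires on a `#` inside a string literal, such as a URL fragment. The pattern `"""(.*?)"""` collects every triple-quoted string, not only docstrings. For `.py` files specifically, the standard library can tell these cases apart: `tokenize` emits `COMMENT` tokens only for real comments, and `ast.get_docstring` returns only the leading string literal of a module, class, or function. The following is a sketch of that swapped-in technique, not the package's code. `extract_python_comments` is a hypothetical name. The sketch keeps the original output format (`Line N: ...`, `Docstring: ...`) and the original filtering behavior: tags filter comments, while docstrings are kept unfiltered.

```python
import ast
import io
import re
import tokenize

# Same tag list as TAG_PATTERN in comment_extractor.py.
TAG_PATTERN = re.compile(r"\b(TODO|FIXME|NOTE|BUG|HACK|WARNING|ADR|DEPRECATED)\b", re.IGNORECASE)


def extract_python_comments(source: str) -> list[str]:
    """Hypothetical tokenize/ast-based extractor for Python source text.

    Raises SyntaxError (from ast.parse) or tokenize.TokenError on invalid input.
    """
    extracted = []

    # tokenize yields COMMENT tokens only for real comments, never for '#'
    # characters inside string literals. tok.start is (line, column).
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            text = tok.string.lstrip("#").strip()
            if TAG_PATTERN.search(text):
                extracted.append(f"Line {tok.start[0]}: {text}")

    # ast.get_docstring returns only the leading string literal of a module,
    # class, or function body, so other triple-quoted strings are skipped.
    docstring_owners = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, docstring_owners):
            doc = ast.get_docstring(node)
            if doc:
                extracted.append(f"Docstring: {doc}")

    return extracted


if __name__ == "__main__":
    sample = "\n".join([
        '"""Module doc."""',
        'URL = "http://example.com/#TODO-not-a-comment"',
        'SQL = """SELECT 1  -- not a docstring"""',
        "",
        "def f():",
        '    """Function doc."""',
        "    return 1  # TODO: handle errors",
        "",
    ])
    for record in extract_python_comments(sample):
        print(record)
```

Run on the sample, the sketch reports the `TODO` on line 7 and the two real docstrings. It skips the `#TODO` inside the URL string and the triple-quoted SQL string. Because `ast.parse` raises `SyntaxError` on invalid source, a caller would likely fall back to the existing regex path for `.py` files that do not parse, and keep the regex path for `.sh`, `.yaml`, `.yml`, and `.ini` files.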