mcp-code-indexer 1.1.5__py3-none-any.whl → 1.2.1__py3-none-any.whl

@@ -0,0 +1,851 @@
1
+ able
2
+ about
3
+ above
4
+ abroad
5
+ according
6
+ accordingly
7
+ across
8
+ actually
9
+ adj
10
+ after
11
+ afterwards
12
+ again
13
+ against
14
+ ago
15
+ ahead
16
+ ain't
17
+ all
18
+ allow
19
+ allows
20
+ almost
21
+ alone
22
+ along
23
+ alongside
24
+ already
25
+ also
26
+ although
27
+ always
28
+ am
29
+ amid
30
+ amidst
31
+ among
32
+ amongst
33
+ an
34
+ and
35
+ another
36
+ any
37
+ anybody
38
+ anyhow
39
+ anyone
40
+ anything
41
+ anyway
42
+ anyways
43
+ anywhere
44
+ apart
45
+ appear
46
+ appreciate
47
+ appropriate
48
+ are
49
+ aren't
50
+ around
51
+ as
52
+ a's
53
+ aside
54
+ ask
55
+ asking
56
+ associated
57
+ at
58
+ available
59
+ away
60
+ awfully
61
+ back
62
+ backward
63
+ backwards
64
+ be
65
+ became
66
+ because
67
+ become
68
+ becomes
69
+ becoming
70
+ been
71
+ before
72
+ beforehand
73
+ begin
74
+ behind
75
+ being
76
+ believe
77
+ below
78
+ beside
79
+ besides
80
+ best
81
+ better
82
+ between
83
+ beyond
84
+ both
85
+ brief
86
+ but
87
+ by
88
+ came
89
+ can
90
+ cannot
91
+ cant
92
+ can't
93
+ caption
94
+ cause
95
+ causes
96
+ certain
97
+ certainly
98
+ changes
99
+ clearly
100
+ c'mon
101
+ co
102
+ co.
103
+ com
104
+ come
105
+ comes
106
+ concerning
107
+ consequently
108
+ consider
109
+ considering
110
+ contain
111
+ containing
112
+ contains
113
+ corresponding
114
+ could
115
+ couldn't
116
+ course
117
+ c's
118
+ currently
119
+ dare
120
+ daren't
121
+ definitely
122
+ described
123
+ despite
124
+ did
125
+ didn't
126
+ different
127
+ directly
128
+ do
129
+ does
130
+ doesn't
131
+ doing
132
+ done
133
+ don't
134
+ down
135
+ downwards
136
+ during
137
+ each
138
+ edu
139
+ eg
140
+ eight
141
+ eighty
142
+ either
143
+ else
144
+ elsewhere
145
+ end
146
+ ending
147
+ enough
148
+ entirely
149
+ especially
150
+ et
151
+ etc
152
+ even
153
+ ever
154
+ evermore
155
+ every
156
+ everybody
157
+ everyone
158
+ everything
159
+ everywhere
160
+ ex
161
+ exactly
162
+ example
163
+ except
164
+ fairly
165
+ far
166
+ farther
167
+ few
168
+ fewer
169
+ fifth
170
+ first
171
+ five
172
+ followed
173
+ following
174
+ follows
175
+ for
176
+ forever
177
+ former
178
+ formerly
179
+ forth
180
+ forward
181
+ found
182
+ four
183
+ from
184
+ further
185
+ furthermore
186
+ get
187
+ gets
188
+ getting
189
+ given
190
+ gives
191
+ go
192
+ goes
193
+ going
194
+ gone
195
+ got
196
+ gotten
197
+ greetings
198
+ had
199
+ hadn't
200
+ half
201
+ happens
202
+ hardly
203
+ has
204
+ hasn't
205
+ have
206
+ haven't
207
+ having
208
+ he
209
+ he'd
210
+ he'll
211
+ hello
212
+ help
213
+ hence
214
+ her
215
+ here
216
+ hereafter
217
+ hereby
218
+ herein
219
+ here's
220
+ hereupon
221
+ hers
222
+ herself
223
+ he's
224
+ hi
225
+ him
226
+ himself
227
+ his
228
+ hither
229
+ hopefully
230
+ how
231
+ howbeit
232
+ however
233
+ hundred
234
+ i'd
235
+ ie
236
+ if
237
+ ignored
238
+ i'll
239
+ i'm
240
+ immediate
241
+ in
242
+ inasmuch
243
+ inc
244
+ inc.
245
+ indeed
246
+ indicate
247
+ indicated
248
+ indicates
249
+ inner
250
+ inside
251
+ insofar
252
+ instead
253
+ into
254
+ inward
255
+ is
256
+ isn't
257
+ it
258
+ it'd
259
+ it'll
260
+ its
261
+ it's
262
+ itself
263
+ i've
264
+ just
265
+ k
266
+ keep
267
+ keeps
268
+ kept
269
+ know
270
+ known
271
+ knows
272
+ last
273
+ lately
274
+ later
275
+ latter
276
+ latterly
277
+ least
278
+ less
279
+ lest
280
+ let
281
+ let's
282
+ like
283
+ liked
284
+ likely
285
+ likewise
286
+ little
287
+ look
288
+ looking
289
+ looks
290
+ low
291
+ lower
292
+ ltd
293
+ made
294
+ mainly
295
+ make
296
+ makes
297
+ many
298
+ may
299
+ maybe
300
+ mayn't
301
+ me
302
+ mean
303
+ meantime
304
+ meanwhile
305
+ merely
306
+ might
307
+ mightn't
308
+ mine
309
+ minus
310
+ miss
311
+ more
312
+ moreover
313
+ most
314
+ mostly
315
+ mr
316
+ mrs
317
+ much
318
+ must
319
+ mustn't
320
+ my
321
+ myself
322
+ name
323
+ namely
324
+ nd
325
+ near
326
+ nearly
327
+ necessary
328
+ need
329
+ needn't
330
+ needs
331
+ neither
332
+ never
333
+ neverf
334
+ neverless
335
+ nevertheless
336
+ new
337
+ next
338
+ nine
339
+ ninety
340
+ no
341
+ nobody
342
+ non
343
+ none
344
+ nonetheless
345
+ noone
346
+ no-one
347
+ nor
348
+ normally
349
+ not
350
+ nothing
351
+ notwithstanding
352
+ novel
353
+ now
354
+ nowhere
355
+ obviously
356
+ of
357
+ off
358
+ often
359
+ oh
360
+ ok
361
+ okay
362
+ old
363
+ on
364
+ once
365
+ one
366
+ ones
367
+ one's
368
+ only
369
+ onto
370
+ opposite
371
+ or
372
+ other
373
+ others
374
+ otherwise
375
+ ought
376
+ oughtn't
377
+ our
378
+ ours
379
+ ourselves
380
+ out
381
+ outside
382
+ over
383
+ overall
384
+ own
385
+ particular
386
+ particularly
387
+ past
388
+ per
389
+ perhaps
390
+ placed
391
+ please
392
+ plus
393
+ possible
394
+ presumably
395
+ probably
396
+ provided
397
+ provides
398
+ que
399
+ quite
400
+ qv
401
+ rather
402
+ rd
403
+ re
404
+ really
405
+ reasonably
406
+ recent
407
+ recently
408
+ regarding
409
+ regardless
410
+ regards
411
+ relatively
412
+ respectively
413
+ right
414
+ round
415
+ said
416
+ same
417
+ saw
418
+ say
419
+ saying
420
+ says
421
+ second
422
+ secondly
423
+ see
424
+ seeing
425
+ seem
426
+ seemed
427
+ seeming
428
+ seems
429
+ seen
430
+ self
431
+ selves
432
+ sensible
433
+ sent
434
+ serious
435
+ seriously
436
+ seven
437
+ several
438
+ shall
439
+ shan't
440
+ she
441
+ she'd
442
+ she'll
443
+ she's
444
+ should
445
+ shouldn't
446
+ since
447
+ six
448
+ so
449
+ some
450
+ somebody
451
+ someday
452
+ somehow
453
+ someone
454
+ something
455
+ sometime
456
+ sometimes
457
+ somewhat
458
+ somewhere
459
+ soon
460
+ sorry
461
+ specified
462
+ specify
463
+ specifying
464
+ still
465
+ sub
466
+ such
467
+ sup
468
+ sure
469
+ take
470
+ taken
471
+ taking
472
+ tell
473
+ tends
474
+ th
475
+ than
476
+ thank
477
+ thanks
478
+ thanx
479
+ that
480
+ that'll
481
+ thats
482
+ that's
483
+ that've
484
+ the
485
+ their
486
+ theirs
487
+ them
488
+ themselves
489
+ then
490
+ thence
491
+ there
492
+ thereafter
493
+ thereby
494
+ there'd
495
+ therefore
496
+ therein
497
+ there'll
498
+ there're
499
+ theres
500
+ there's
501
+ thereupon
502
+ there've
503
+ these
504
+ they
505
+ they'd
506
+ they'll
507
+ they're
508
+ they've
509
+ thing
510
+ things
511
+ think
512
+ third
513
+ thirty
514
+ this
515
+ thorough
516
+ thoroughly
517
+ those
518
+ though
519
+ three
520
+ through
521
+ throughout
522
+ thru
523
+ thus
524
+ till
525
+ to
526
+ together
527
+ too
528
+ took
529
+ toward
530
+ towards
531
+ tried
532
+ tries
533
+ truly
534
+ try
535
+ trying
536
+ t's
537
+ twice
538
+ two
539
+ un
540
+ under
541
+ underneath
542
+ undoing
543
+ unfortunately
544
+ unless
545
+ unlike
546
+ unlikely
547
+ until
548
+ unto
549
+ up
550
+ upon
551
+ upwards
552
+ us
553
+ use
554
+ used
555
+ useful
556
+ uses
557
+ using
558
+ usually
559
+ v
560
+ value
561
+ various
562
+ versus
563
+ very
564
+ via
565
+ viz
566
+ vs
567
+ want
568
+ wants
569
+ was
570
+ wasn't
571
+ way
572
+ we
573
+ we'd
574
+ welcome
575
+ well
576
+ we'll
577
+ went
578
+ were
579
+ we're
580
+ weren't
581
+ we've
582
+ what
583
+ whatever
584
+ what'll
585
+ what's
586
+ what've
587
+ when
588
+ whence
589
+ whenever
590
+ where
591
+ whereafter
592
+ whereas
593
+ whereby
594
+ wherein
595
+ where's
596
+ whereupon
597
+ wherever
598
+ whether
599
+ which
600
+ whichever
601
+ while
602
+ whilst
603
+ whither
604
+ who
605
+ who'd
606
+ whoever
607
+ whole
608
+ who'll
609
+ whom
610
+ whomever
611
+ who's
612
+ whose
613
+ why
614
+ will
615
+ willing
616
+ wish
617
+ with
618
+ within
619
+ without
620
+ wonder
621
+ won't
622
+ would
623
+ wouldn't
624
+ yes
625
+ yet
626
+ you
627
+ you'd
628
+ you'll
629
+ your
630
+ you're
631
+ yours
632
+ yourself
633
+ yourselves
634
+ you've
635
+ zero
636
+ a
637
+ how's
638
+ i
639
+ when's
640
+ why's
641
+ b
642
+ c
643
+ d
644
+ e
645
+ f
646
+ g
647
+ h
648
+ j
649
+ l
650
+ m
651
+ n
652
+ o
653
+ p
654
+ q
655
+ r
656
+ s
657
+ t
658
+ u
659
+ uucp
660
+ w
661
+ x
662
+ y
663
+ z
664
+ I
665
+ www
666
+ amount
667
+ bill
668
+ bottom
669
+ call
670
+ computer
671
+ con
672
+ couldnt
673
+ cry
674
+ de
675
+ describe
676
+ detail
677
+ due
678
+ eleven
679
+ empty
680
+ fifteen
681
+ fifty
682
+ fill
683
+ find
684
+ fire
685
+ forty
686
+ front
687
+ full
688
+ give
689
+ hasnt
690
+ herse
691
+ himse
692
+ interest
693
+ itse”
694
+ mill
695
+ move
696
+ myse”
697
+ part
698
+ put
699
+ show
700
+ side
701
+ sincere
702
+ sixty
703
+ system
704
+ ten
705
+ thick
706
+ thin
707
+ top
708
+ twelve
709
+ twenty
710
+ abst
711
+ accordance
712
+ act
713
+ added
714
+ adopted
715
+ affected
716
+ affecting
717
+ affects
718
+ ah
719
+ announce
720
+ anymore
721
+ apparently
722
+ approximately
723
+ aren
724
+ arent
725
+ arise
726
+ auth
727
+ beginning
728
+ beginnings
729
+ begins
730
+ biol
731
+ briefly
732
+ ca
733
+ date
734
+ ed
735
+ effect
736
+ et-al
737
+ ff
738
+ fix
739
+ gave
740
+ giving
741
+ heres
742
+ hes
743
+ hid
744
+ home
745
+ id
746
+ im
747
+ immediately
748
+ importance
749
+ important
750
+ index
751
+ information
752
+ invention
753
+ itd
754
+ keys
755
+ kg
756
+ km
757
+ largely
758
+ lets
759
+ line
760
+ 'll
761
+ means
762
+ mg
763
+ million
764
+ ml
765
+ mug
766
+ na
767
+ nay
768
+ necessarily
769
+ nos
770
+ noted
771
+ obtain
772
+ obtained
773
+ omitted
774
+ ord
775
+ owing
776
+ page
777
+ pages
778
+ poorly
779
+ possibly
780
+ potentially
781
+ pp
782
+ predominantly
783
+ present
784
+ previously
785
+ primarily
786
+ promptly
787
+ proud
788
+ quickly
789
+ ran
790
+ readily
791
+ ref
792
+ refs
793
+ related
794
+ research
795
+ resulted
796
+ resulting
797
+ results
798
+ run
799
+ sec
800
+ section
801
+ shed
802
+ shes
803
+ showed
804
+ shown
805
+ showns
806
+ shows
807
+ significant
808
+ significantly
809
+ similar
810
+ similarly
811
+ slightly
812
+ somethan
813
+ specifically
814
+ state
815
+ states
816
+ stop
817
+ strongly
818
+ substantially
819
+ successfully
820
+ sufficiently
821
+ suggest
822
+ thered
823
+ thereof
824
+ therere
825
+ thereto
826
+ theyd
827
+ theyre
828
+ thou
829
+ thoughh
830
+ thousand
831
+ throug
832
+ til
833
+ tip
834
+ ts
835
+ ups
836
+ usefully
837
+ usefulness
838
+ 've
839
+ vol
840
+ vols
841
+ wed
842
+ whats
843
+ wheres
844
+ whim
845
+ whod
846
+ whos
847
+ widely
848
+ words
849
+ world
850
+ youd
851
+ youre
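The hunk above adds the stop-word list with one bare word per line, while the loader added to `database.py` later in this diff splits each line on `': '` and keeps only lines of the form `1: word`. The sketch below is a minimal illustration of a parser that accepts either shape. `parse_stop_words` is a hypothetical helper introduced here for illustration and is not part of the package.

```python
import re
from typing import Iterable, Set

# Hypothetical helper, not part of mcp-code-indexer.
# Matches numbered entries such as "1: able" and captures the word.
_NUMBERED = re.compile(r"^\d+:\s+(.*)$")


def parse_stop_words(lines: Iterable[str]) -> Set[str]:
    """Return a lowercase stop-word set from bare or numbered lines."""
    words: Set[str] = set()
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        match = _NUMBERED.match(line)
        # Use the captured word for "N: word" lines, else the whole line.
        word = match.group(1) if match else line
        words.add(word.lower())
    return words


# Illustrative input mixing both line shapes.
sample = ["able\n", "2: about\n", "\n", "Ain't\n", "co.\n"]
stop_words = parse_stop_words(sample)
print(sorted(stop_words))
```

Entries containing an apostrophe or a dot (for example `ain't` or `co.`) can be stored in the set, but the tokenizer pattern `\b[a-zA-Z]{2,}\b` used by `analyze_word_frequency` only emits purely alphabetic tokens, so such entries would never equal a token as a whole.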
@@ -17,7 +17,7 @@ import aiosqlite
17
17
 
18
18
  from mcp_code_indexer.database.models import (
19
19
  Project, FileDescription, MergeConflict, SearchResult,
20
- CodebaseSizeInfo
20
+ CodebaseSizeInfo, ProjectOverview, WordFrequencyResult, WordFrequencyTerm
21
21
  )
22
22
 
23
23
  logger = logging.getLogger(__name__)
@@ -538,3 +538,157 @@ class DatabaseManager:
538
538
  # Check if project has any descriptions
539
539
  file_count = await self.get_file_count(project.id, "main")
540
540
  return file_count == 0
541
+
542
+ # Project Overview operations
543
+
544
+ async def create_project_overview(self, overview: ProjectOverview) -> None:
545
+ """Create or update a project overview."""
546
+ async with self.get_connection() as db:
547
+ await db.execute(
548
+ """
549
+ INSERT OR REPLACE INTO project_overviews
550
+ (project_id, branch, overview, last_modified, total_files, total_tokens)
551
+ VALUES (?, ?, ?, ?, ?, ?)
552
+ """,
553
+ (
554
+ overview.project_id,
555
+ overview.branch,
556
+ overview.overview,
557
+ overview.last_modified,
558
+ overview.total_files,
559
+ overview.total_tokens
560
+ )
561
+ )
562
+ await db.commit()
563
+ logger.debug(f"Created/updated overview for project {overview.project_id}, branch {overview.branch}")
564
+
565
+ async def get_project_overview(self, project_id: str, branch: str) -> Optional[ProjectOverview]:
566
+ """Get project overview by ID and branch."""
567
+ async with self.get_connection() as db:
568
+ cursor = await db.execute(
569
+ "SELECT * FROM project_overviews WHERE project_id = ? AND branch = ?",
570
+ (project_id, branch)
571
+ )
572
+ row = await cursor.fetchone()
573
+
574
+ if row:
575
+ return ProjectOverview(
576
+ project_id=row['project_id'],
577
+ branch=row['branch'],
578
+ overview=row['overview'],
579
+ last_modified=datetime.fromisoformat(row['last_modified']),
580
+ total_files=row['total_files'],
581
+ total_tokens=row['total_tokens']
582
+ )
583
+ return None
584
+
585
+ async def cleanup_missing_files(self, project_id: str, branch: str, project_root: Path) -> List[str]:
586
+ """
587
+ Remove descriptions for files that no longer exist on disk.
588
+
589
+ Args:
590
+ project_id: Project identifier
591
+ branch: Branch name
592
+ project_root: Path to project root directory
593
+
594
+ Returns:
595
+ List of file paths that were cleaned up
596
+ """
597
+ removed_files = []
598
+
599
+ async with self.get_connection() as db:
600
+ # Get all file descriptions for this project/branch
601
+ cursor = await db.execute(
602
+ "SELECT file_path FROM file_descriptions WHERE project_id = ? AND branch = ?",
603
+ (project_id, branch)
604
+ )
605
+
606
+ rows = await cursor.fetchall()
607
+
608
+ # Check which files no longer exist
609
+ to_remove = []
610
+ for row in rows:
611
+ file_path = row['file_path']
612
+ full_path = project_root / file_path
613
+
614
+ if not full_path.exists():
615
+ to_remove.append(file_path)
616
+ removed_files.append(file_path)
617
+
618
+ # Remove descriptions for missing files
619
+ if to_remove:
620
+ await db.executemany(
621
+ "DELETE FROM file_descriptions WHERE project_id = ? AND branch = ? AND file_path = ?",
622
+ [(project_id, branch, path) for path in to_remove]
623
+ )
624
+ await db.commit()
625
+ logger.info(f"Cleaned up {len(to_remove)} missing files from {project_id}/{branch}")
626
+
627
+ return removed_files
628
+
629
+ async def analyze_word_frequency(self, project_id: str, branch: str, limit: int = 200) -> WordFrequencyResult:
630
+ """
631
+ Analyze word frequency across all file descriptions for a project/branch.
632
+
633
+ Args:
634
+ project_id: Project identifier
635
+ branch: Branch name
636
+ limit: Maximum number of top terms to return
637
+
638
+ Returns:
639
+ WordFrequencyResult with top terms and statistics
640
+ """
641
+ from collections import Counter
642
+ import re
643
+
644
+ # Load stop words from bundled file
645
+ stop_words_path = Path(__file__).parent.parent / "data" / "stop_words_english.txt"
646
+ stop_words = set()
647
+
648
+ if stop_words_path.exists():
649
+ with open(stop_words_path, 'r', encoding='utf-8') as f:
650
+ for line in f:
651
+ # Parse lines like "1: word" and extract just the word
652
+ parts = line.strip().split(': ', 1)
653
+ if len(parts) == 2:
654
+ stop_words.add(parts[1].lower())
655
+
656
+ # Add common programming keywords to stop words
657
+ programming_keywords = {
658
+ 'if', 'else', 'for', 'while', 'do', 'break', 'continue', 'return',
659
+ 'function', 'class', 'def', 'var', 'let', 'const', 'public', 'private',
660
+ 'static', 'async', 'await', 'import', 'export', 'from', 'true', 'false',
661
+ 'null', 'undefined', 'this', 'that', 'self', 'super', 'new', 'delete'
662
+ }
663
+ stop_words.update(programming_keywords)
664
+
665
+ async with self.get_connection() as db:
666
+ # Get all descriptions for this project/branch
667
+ cursor = await db.execute(
668
+ "SELECT description FROM file_descriptions WHERE project_id = ? AND branch = ?",
669
+ (project_id, branch)
670
+ )
671
+
672
+ rows = await cursor.fetchall()
673
+
674
+ # Combine all descriptions
675
+ all_text = " ".join(row['description'] for row in rows)
676
+
677
+ # Tokenize and filter
678
+ words = re.findall(r'\b[a-zA-Z]{2,}\b', all_text.lower())
679
+ filtered_words = [word for word in words if word not in stop_words]
680
+
681
+ # Count frequencies
682
+ word_counts = Counter(filtered_words)
683
+
684
+ # Create result
685
+ top_terms = [
686
+ WordFrequencyTerm(term=term, frequency=count)
687
+ for term, count in word_counts.most_common(limit)
688
+ ]
689
+
690
+ return WordFrequencyResult(
691
+ top_terms=top_terms,
692
+ total_terms_analyzed=len(filtered_words),
693
+ total_unique_terms=len(word_counts)
694
+ )
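The tokenize, filter, and count steps of `analyze_word_frequency` above can be tried without a database. The following is a minimal sketch using the same regular expression and `Counter.most_common`; the function name `top_terms`, the sample descriptions, and the tiny stop-word set are assumptions made for illustration.

```python
import re
from collections import Counter
from typing import Iterable, List, Set, Tuple


def top_terms(
    descriptions: Iterable[str], stop_words: Set[str], limit: int = 200
) -> Tuple[List[Tuple[str, int]], int, int]:
    """Return (top terms, total kept tokens, unique kept tokens)."""
    # Join all descriptions and lowercase, as the method in the diff does.
    text = " ".join(descriptions).lower()
    # Same pattern as the diff: alphabetic runs of two or more letters.
    words = re.findall(r"\b[a-zA-Z]{2,}\b", text)
    kept = [word for word in words if word not in stop_words]
    counts = Counter(kept)
    return counts.most_common(limit), len(kept), len(counts)


# Illustrative data only.
descriptions = [
    "Parses config files and validates config schema",
    "Database manager for the config store",
]
terms, total, unique = top_terms(descriptions, {"and", "for", "the"}, limit=2)
print(terms[0], total, unique)
```

One property of this pattern worth knowing: because `_` and digits count as word characters for `\b`, identifiers such as `file_path` or `sha256` produce no tokens at all rather than being split into parts.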
@@ -62,6 +62,22 @@ class MergeConflict(BaseModel):
62
62
  created: datetime = Field(default_factory=datetime.utcnow, description="Creation timestamp")
63
63
 
64
64
 
65
+ class ProjectOverview(BaseModel):
66
+ """
67
+ Represents a condensed, interpretive overview of an entire codebase.
68
+
69
+ Stores a comprehensive narrative that captures architecture, components,
70
+ relationships, and design patterns in a single document rather than
71
+ individual file descriptions.
72
+ """
73
+ project_id: str = Field(..., description="Reference to project")
74
+ branch: str = Field(..., description="Git branch name")
75
+ overview: str = Field(..., description="Comprehensive codebase narrative")
76
+ last_modified: datetime = Field(default_factory=datetime.utcnow, description="Last update timestamp")
77
+ total_files: int = Field(..., description="Number of files in codebase")
78
+ total_tokens: int = Field(..., description="Total tokens in individual descriptions")
79
+
80
+
65
81
  class CodebaseOverview(BaseModel):
66
82
  """
67
83
  Represents a complete codebase structure with file descriptions.
@@ -116,6 +132,25 @@ class CodebaseSizeInfo(BaseModel):
116
132
  is_large: bool = Field(..., description="Whether codebase exceeds token limit")
117
133
  recommendation: str = Field(..., description="Recommended approach (use_search or use_overview)")
118
134
  token_limit: int = Field(..., description="Configured token limit")
135
+ cleaned_up_files: List[str] = Field(default_factory=list, description="Files removed during cleanup")
136
+ cleaned_up_count: int = Field(default=0, description="Number of files cleaned up")
137
+
138
+
139
+ class WordFrequencyTerm(BaseModel):
140
+ """
141
+ Represents a term and its frequency from word analysis.
142
+ """
143
+ term: str = Field(..., description="The word/term")
144
+ frequency: int = Field(..., description="Number of occurrences")
145
+
146
+
147
+ class WordFrequencyResult(BaseModel):
148
+ """
149
+ Results from word frequency analysis of file descriptions.
150
+ """
151
+ top_terms: List[WordFrequencyTerm] = Field(..., description="Top frequent terms")
152
+ total_terms_analyzed: int = Field(..., description="Total terms processed")
153
+ total_unique_terms: int = Field(..., description="Number of unique terms found")
119
154
 
120
155
 
121
156
  # Enable forward references for recursive models
@@ -26,7 +26,8 @@ from mcp_code_indexer.file_scanner import FileScanner
26
26
  from mcp_code_indexer.token_counter import TokenCounter
27
27
  from mcp_code_indexer.database.models import (
28
28
  Project, FileDescription, CodebaseOverview, SearchResult,
29
- CodebaseSizeInfo, FolderNode, FileNode
29
+ CodebaseSizeInfo, FolderNode, FileNode, ProjectOverview,
30
+ WordFrequencyResult
30
31
  )
31
32
  from mcp_code_indexer.error_handler import setup_error_handling, ErrorHandler
32
33
  from mcp_code_indexer.middleware.error_middleware import create_tool_middleware, AsyncTaskManager
@@ -297,8 +298,8 @@ class MCPCodeIndexServer:
297
298
  }
298
299
  ),
299
300
  types.Tool(
300
- name="get_codebase_overview",
301
- description="Returns the complete file and folder structure of a codebase with all descriptions. For large codebases, this will recommend using search_descriptions instead.",
301
+ name="get_all_descriptions",
302
+ description="Returns the complete file-by-file structure of a codebase with individual descriptions for each file. For large codebases, consider using get_codebase_overview for a condensed summary instead.",
302
303
  inputSchema={
303
304
  "type": "object",
304
305
  "properties": {
@@ -338,6 +339,87 @@ class MCPCodeIndexServer:
338
339
  },
339
340
  "required": ["projectName", "folderPath", "sourceBranch", "targetBranch"]
340
341
  }
342
+ ),
343
+ types.Tool(
344
+ name="get_codebase_overview",
345
+ description="Returns a condensed, interpretive overview of the entire codebase. This is a single comprehensive narrative that captures the architecture, key components, relationships, and design patterns. Unlike get_all_descriptions which lists every file, this provides a holistic view suitable for understanding the codebase's structure and purpose. If no overview exists, returns empty string.",
346
+ inputSchema={
347
+ "type": "object",
348
+ "properties": {
349
+ "projectName": {"type": "string", "description": "The name of the project"},
350
+ "folderPath": {"type": "string", "description": "Absolute path to the project folder on disk"},
351
+ "branch": {"type": "string", "description": "Git branch name"},
352
+ "remoteOrigin": {"type": "string", "description": "Git remote origin URL if available"},
353
+ "upstreamOrigin": {"type": "string", "description": "Upstream repository URL if this is a fork"}
354
+ },
355
+ "required": ["projectName", "folderPath", "branch"]
356
+ }
357
+ ),
358
+ types.Tool(
359
+ name="update_codebase_overview",
360
+ description="""Updates the condensed codebase overview. Create a comprehensive narrative that would help a new developer understand this codebase. Include: (1) A visual directory tree showing the main folders and their purposes, (2) Overall architecture - how components fit together, (3) Core business logic and main workflows, (4) Key technical patterns and conventions used, (5) Important dependencies and integrations, (6) Database schema overview if applicable, (7) API structure if applicable, (8) Testing approach, (9) Build and deployment notes. Write in a clear, structured format with headers and sections. Be thorough but organized - imagine writing a technical onboarding document. The overview should be substantial (think 10-20 pages of text) but well-structured so specific sections can be found easily.
361
+
362
+ Example Structure:
363
+
364
+ ````
365
+ ## Directory Structure
366
+ ```
367
+ src/
368
+ ├── api/ # REST API endpoints and middleware
369
+ ├── models/ # Database models and business logic
370
+ ├── services/ # External service integrations
371
+ ├── utils/ # Shared utilities and helpers
372
+ └── tests/ # Test suites
373
+ ```
374
+
375
+ ## Architecture Overview
376
+ [Describe how components interact, data flow, key design decisions]
377
+
378
+ ## Core Components
379
+ ### API Layer
380
+ [Details about API structure, authentication, routing]
381
+
382
+ ### Data Model
383
+ [Key entities, relationships, database design]
384
+
385
+ ## Key Workflows
386
+ 1. User Authentication Flow
387
+ [Step-by-step description]
388
+ 2. Data Processing Pipeline
389
+ [How data moves through the system]
390
+
391
+ [Continue with other sections...]"
392
+ ````
393
+
394
+ """,
395
+ inputSchema={
396
+ "type": "object",
397
+ "properties": {
398
+ "projectName": {"type": "string", "description": "The name of the project"},
399
+ "folderPath": {"type": "string", "description": "Absolute path to the project folder on disk"},
400
+ "branch": {"type": "string", "description": "Git branch name"},
401
+ "remoteOrigin": {"type": "string", "description": "Git remote origin URL if available"},
402
+ "upstreamOrigin": {"type": "string", "description": "Upstream repository URL if this is a fork"},
403
+ "overview": {"type": "string", "description": "Comprehensive narrative overview of the codebase (10-30k tokens recommended)"}
404
+ },
405
+ "required": ["projectName", "folderPath", "branch", "overview"]
406
+ }
407
+ ),
408
+ types.Tool(
409
+ name="get_word_frequency",
410
+ description="Analyzes all file descriptions to find the most frequently used technical terms. Filters out common English stop words and symbols, returning the top 200 meaningful terms. Useful for understanding the codebase's domain vocabulary and finding all functions/files related to specific concepts.",
411
+ inputSchema={
412
+ "type": "object",
413
+ "properties": {
414
+ "projectName": {"type": "string", "description": "The name of the project"},
415
+ "folderPath": {"type": "string", "description": "Absolute path to the project folder on disk"},
416
+ "branch": {"type": "string", "description": "Git branch name"},
417
+ "remoteOrigin": {"type": "string", "description": "Git remote origin URL if available"},
418
+ "upstreamOrigin": {"type": "string", "description": "Upstream repository URL if this is a fork"},
419
+ "limit": {"type": "integer", "default": 200, "description": "Number of top terms to return"}
420
+ },
421
+ "required": ["projectName", "folderPath", "branch"]
422
+ }
341
423
  )
342
424
  ]
343
425
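The `inputSchema` blocks above list which arguments each new tool requires. As a rough illustration, a caller could check an `update_codebase_overview` payload against that list before sending it. `missing_required` is a hypothetical client-side helper, and the project name and folder path below are made-up values.

```python
from typing import Any, Dict, List

# Required keys copied from the update_codebase_overview inputSchema above.
REQUIRED_KEYS = ["projectName", "folderPath", "branch", "overview"]


def missing_required(arguments: Dict[str, Any], required: List[str]) -> List[str]:
    """Hypothetical check: return required keys absent from arguments."""
    return [key for key in required if key not in arguments]


# Illustrative payload; optional remoteOrigin/upstreamOrigin are omitted.
arguments = {
    "projectName": "example-project",
    "folderPath": "/home/user/example-project",
    "branch": "main",
    "overview": "## Directory Structure\n...",
}
print(missing_required(arguments, REQUIRED_KEYS))
```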
 
@@ -351,7 +433,10 @@ class MCPCodeIndexServer:
351
433
  "check_codebase_size": self._handle_check_codebase_size,
352
434
  "find_missing_descriptions": self._handle_find_missing_descriptions,
353
435
  "search_descriptions": self._handle_search_descriptions,
354
- "get_codebase_overview": self._handle_get_codebase_overview,
436
+ "get_all_descriptions": self._handle_get_codebase_overview,
437
+ "get_codebase_overview": self._handle_get_condensed_overview,
438
+ "update_codebase_overview": self._handle_update_codebase_overview,
439
+ "get_word_frequency": self._handle_get_word_frequency,
355
440
  "merge_branch_descriptions": self._handle_merge_branch_descriptions,
356
441
  }
357
442
 
@@ -679,8 +764,16 @@ class MCPCodeIndexServer:
679
764
  """Handle check_codebase_size tool calls."""
680
765
  project_id = await self._get_or_create_project_id(arguments)
681
766
  resolved_branch = await self._resolve_branch(project_id, arguments["branch"])
767
+ folder_path = Path(arguments["folderPath"])
682
768
 
683
- # Get file descriptions for this project/branch
769
+ # Clean up descriptions for files that no longer exist
770
+ cleaned_up_files = await self.db_manager.cleanup_missing_files(
771
+ project_id=project_id,
772
+ branch=resolved_branch,
773
+ project_root=folder_path
774
+ )
775
+
776
+ # Get file descriptions for this project/branch (after cleanup)
684
777
  file_descriptions = await self.db_manager.get_all_file_descriptions(
685
778
  project_id=project_id,
686
779
  branch=resolved_branch
@@ -699,7 +792,9 @@ class MCPCodeIndexServer:
699
792
  "isLarge": is_large,
700
793
  "recommendation": recommendation,
701
794
  "tokenLimit": token_limit,
702
- "totalFiles": len(file_descriptions)
795
+ "totalFiles": len(file_descriptions),
796
+ "cleanedUpFiles": cleaned_up_files,
797
+ "cleanedUpCount": len(cleaned_up_files)
703
798
  }
704
799
 
705
800
  async def _handle_find_missing_descriptions(self, arguments: Dict[str, Any]) -> Dict[str, Any]:
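With this change, `check_codebase_size` prunes descriptions for files that no longer exist on disk before it counts tokens, and reports them as `cleanedUpFiles` and `cleanedUpCount`. The sketch below reproduces that pruning step under stated assumptions: it uses the synchronous standard-library `sqlite3` module instead of `aiosqlite`, and the `file_descriptions` table definition is guessed from the column names in the queries, not taken from the package's schema.

```python
import sqlite3
import tempfile
from pathlib import Path
from typing import List


def cleanup_missing_files(
    db: sqlite3.Connection, project_id: str, branch: str, project_root: Path
) -> List[str]:
    """Delete descriptions whose files are gone; return the removed paths."""
    rows = db.execute(
        "SELECT file_path FROM file_descriptions WHERE project_id = ? AND branch = ?",
        (project_id, branch),
    ).fetchall()
    # A path is missing when it does not exist under the project root.
    missing = [path for (path,) in rows if not (project_root / path).exists()]
    if missing:
        db.executemany(
            "DELETE FROM file_descriptions "
            "WHERE project_id = ? AND branch = ? AND file_path = ?",
            [(project_id, branch, path) for path in missing],
        )
        db.commit()
    return missing


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "kept.py").write_text("print('hi')\n")  # only this file exists
    db = sqlite3.connect(":memory:")
    # Assumed minimal table shape for the demonstration.
    db.execute(
        "CREATE TABLE file_descriptions "
        "(project_id TEXT, branch TEXT, file_path TEXT, description TEXT)"
    )
    db.executemany(
        "INSERT INTO file_descriptions VALUES (?, ?, ?, ?)",
        [("p1", "main", "kept.py", "exists"), ("p1", "main", "gone.py", "deleted")],
    )
    removed = cleanup_missing_files(db, "p1", "main", root)
    remaining = [row[0] for row in db.execute("SELECT file_path FROM file_descriptions")]
    db.close()

print(removed, remaining)
```

Since the cleanup now runs inside a size check, callers should expect `check_codebase_size` to modify stored data when `folderPath` points at a directory where indexed files are absent.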
@@ -787,18 +882,7 @@ class MCPCodeIndexServer:
787
882
  total_tokens = self.token_counter.calculate_codebase_tokens(file_descriptions)
788
883
  is_large = self.token_counter.is_large_codebase(total_tokens)
789
884
 
790
- # If large, recommend search instead
791
- if is_large:
792
- return {
793
- "isLarge": True,
794
- "totalTokens": total_tokens,
795
- "tokenLimit": self.token_counter.token_limit,
796
- "totalFiles": len(file_descriptions),
797
- "recommendation": "use_search",
798
- "message": f"Codebase has {total_tokens} tokens (limit: {self.token_counter.token_limit}). Use search_descriptions instead for better performance."
799
- }
800
-
801
- # Build folder structure
885
+ # Always build and return the folder structure - if the AI called this tool, it wants the overview
802
886
  structure = self._build_folder_structure(file_descriptions)
803
887
 
804
888
  return {
@@ -915,6 +999,83 @@ class MCPCodeIndexServer:
915
999
  **result
916
1000
  }
917
1001
 
1002
+ async def _handle_get_condensed_overview(self, arguments: Dict[str, Any]) -> Dict[str, Any]:
1003
+ """Handle get_codebase_overview tool calls for condensed overviews."""
1004
+ project_id = await self._get_or_create_project_id(arguments)
1005
+ resolved_branch = await self._resolve_branch(project_id, arguments["branch"])
1006
+
1007
+ # Try to get existing overview
1008
+ overview = await self.db_manager.get_project_overview(project_id, resolved_branch)
1009
+
1010
+ if overview:
1011
+ return {
1012
+ "overview": overview.overview,
1013
+ "lastModified": overview.last_modified.isoformat(),
1014
+ "totalFiles": overview.total_files,
1015
+ "totalTokensInFullDescriptions": overview.total_tokens
1016
+ }
1017
+ else:
1018
+ return {
1019
+ "overview": "",
1020
+ "lastModified": "",
1021
+ "totalFiles": 0,
1022
+ "totalTokensInFullDescriptions": 0
1023
+ }
1024
+
1025
+ async def _handle_update_codebase_overview(self, arguments: Dict[str, Any]) -> Dict[str, Any]:
1026
+ """Handle update_codebase_overview tool calls."""
1027
+ project_id = await self._get_or_create_project_id(arguments)
1028
+ resolved_branch = await self._resolve_branch(project_id, arguments["branch"])
1029
+ folder_path = Path(arguments["folderPath"])
1030
+
1031
+ # Get current file count and total tokens for context
1032
+ file_descriptions = await self.db_manager.get_all_file_descriptions(
1033
+ project_id=project_id,
1034
+ branch=resolved_branch
1035
+ )
1036
+
1037
+ total_files = len(file_descriptions)
1038
+ total_tokens = self.token_counter.calculate_codebase_tokens(file_descriptions)
1039
+
1040
+ # Create overview record
1041
+ overview = ProjectOverview(
1042
+ project_id=project_id,
1043
+ branch=resolved_branch,
1044
+ overview=arguments["overview"],
1045
+ last_modified=datetime.utcnow(),
1046
+ total_files=total_files,
1047
+ total_tokens=total_tokens
1048
+ )
1049
+
1050
+ await self.db_manager.create_project_overview(overview)
1051
+
1052
+ return {
1053
+ "success": True,
1054
+ "message": f"Overview updated for {total_files} files",
1055
+ "totalFiles": total_files,
1056
+ "totalTokens": total_tokens,
1057
+ "overviewLength": len(arguments["overview"])
1058
+ }
1059
+
1060
+ async def _handle_get_word_frequency(self, arguments: Dict[str, Any]) -> Dict[str, Any]:
1061
+ """Handle get_word_frequency tool calls."""
1062
+ project_id = await self._get_or_create_project_id(arguments)
1063
+ resolved_branch = await self._resolve_branch(project_id, arguments["branch"])
1064
+ limit = arguments.get("limit", 200)
1065
+
1066
+ # Analyze word frequency
1067
+ result = await self.db_manager.analyze_word_frequency(
1068
+ project_id=project_id,
1069
+ branch=resolved_branch,
1070
+ limit=limit
1071
+ )
1072
+
1073
+ return {
1074
+ "topTerms": [{"term": term.term, "frequency": term.frequency} for term in result.top_terms],
1075
+ "totalTermsAnalyzed": result.total_terms_analyzed,
1076
+ "totalUniqueTerms": result.total_unique_terms
1077
+ }
1078
+
918
1079
  async def _run_session_with_retry(self, read_stream, write_stream, initialization_options) -> None:
919
1080
  """Run a single MCP session with error handling and retry logic."""
920
1081
  max_retries = 3
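The `update_codebase_overview` handler above stores one overview per project and branch through `INSERT OR REPLACE`. In SQLite that statement only replaces a row when a uniqueness constraint is violated, so the sketch below assumes a primary key on `(project_id, branch)`. The table definition is not part of this diff and is an assumption, as are the helper name `save_overview` and the use of synchronous `sqlite3` in place of `aiosqlite`.

```python
import sqlite3
from datetime import datetime, timezone

db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row  # allow access by column name, as in the diff

# Assumed schema: the composite primary key is what lets REPLACE overwrite.
db.execute(
    """
    CREATE TABLE project_overviews (
        project_id TEXT NOT NULL,
        branch TEXT NOT NULL,
        overview TEXT NOT NULL,
        last_modified TEXT NOT NULL,
        total_files INTEGER NOT NULL,
        total_tokens INTEGER NOT NULL,
        PRIMARY KEY (project_id, branch)
    )
    """
)


def save_overview(project_id: str, branch: str, overview: str,
                  total_files: int, total_tokens: int) -> None:
    """Insert the overview, replacing any existing row for the same key."""
    db.execute(
        "INSERT OR REPLACE INTO project_overviews "
        "(project_id, branch, overview, last_modified, total_files, total_tokens) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (project_id, branch, overview,
         datetime.now(timezone.utc).isoformat(), total_files, total_tokens),
    )
    db.commit()


save_overview("p1", "main", "first draft", 3, 120)
save_overview("p1", "main", "second draft", 4, 150)  # same key: overwrites
rows = db.execute("SELECT * FROM project_overviews").fetchall()
print(len(rows), rows[0]["overview"])
```

Storing the timestamp as ISO 8601 text keeps it readable by `datetime.fromisoformat`, which is how `get_project_overview` in this diff parses `last_modified`.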
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: mcp-code-indexer
3
- Version: 1.1.5
3
+ Version: 1.2.1
4
4
  Summary: MCP server that tracks file descriptions across codebases, enabling AI agents to efficiently navigate and understand code through searchable summaries and token-aware overviews.
5
5
  Author: MCP Code Indexer Contributors
6
6
  Maintainer: MCP Code Indexer Contributors
@@ -158,12 +158,12 @@ mypy src/
158
158
 
159
159
  ## 🛠️ MCP Tools Available
160
160
 
161
- The server provides **8 powerful MCP tools** for intelligent codebase management:
161
+ The server provides **11 powerful MCP tools** for intelligent codebase management:
162
162
 
163
163
  ### Core Operations
164
164
  - **`get_file_description`** - Retrieve stored file descriptions instantly
165
165
  - **`update_file_description`** - Store detailed file summaries and metadata
166
- - **`check_codebase_size`** - Get token count and size-based recommendations
166
+ - **`check_codebase_size`** - Get token count and size-based recommendations with automatic file cleanup
167
167
 
168
168
  ### Batch Operations
169
169
  - **`find_missing_descriptions`** - Scan projects for files without descriptions
@@ -171,10 +171,13 @@ The server provides **8 powerful MCP tools** for intelligent codebase management
171
171
 
172
172
  ### Search & Discovery
173
173
  - **`search_descriptions`** - Fast full-text search across all descriptions
174
- - **`get_codebase_overview`** - Complete hierarchical project structure
174
+ - **`get_all_descriptions`** - Complete hierarchical project structure
175
+ - **`get_codebase_overview`** - Condensed narrative overview of entire codebase
176
+ - **`get_word_frequency`** - Technical vocabulary analysis with stop-word filtering
175
177
 
176
178
  ### Advanced Features
177
179
  - **`merge_branch_descriptions`** - Two-phase merge with conflict resolution
180
+ - **`update_codebase_overview`** - Create comprehensive codebase documentation
178
181
 
179
182
  ## 🏗️ Architecture Highlights
180
183
 
@@ -6,18 +6,19 @@ mcp_code_indexer/logging_config.py,sha256=5L1cYIG8IAX91yCjc5pzkbO_KPt0bvm_ABHB53
6
6
  mcp_code_indexer/main.py,sha256=eRc0Vl3DVDGS5XtuPCDBArgmqcBIi92O97LbE8HYGGA,13601
7
7
  mcp_code_indexer/merge_handler.py,sha256=lJR8eVq2qSrF6MW9mR3Fy8UzrNAaQ7RsI2FMNXne3vQ,14692
8
8
  mcp_code_indexer/token_counter.py,sha256=WrifOkbF99nWWHlRlhCHAB2KN7qr83GOHl7apE-hJcE,8460
9
+ mcp_code_indexer/data/stop_words_english.txt,sha256=7Zdd9ameVgA6tN_zuXROvHXD4hkWeELVywPhb7FJEkw,6343
9
10
  mcp_code_indexer/database/__init__.py,sha256=aPq_aaRp0aSwOBIq9GkuMNjmLxA411zg2vhdrAuHm-w,38
10
- mcp_code_indexer/database/database.py,sha256=eG2xY5cd-oxRZ6mgGkqqBiJJfGCPqJgzoFq6kR99WfA,20300
11
- mcp_code_indexer/database/models.py,sha256=3wOxHKb6j3zKPWFSwB5g1TLpI507vLNZcqsxZR4VuRs,5528
11
+ mcp_code_indexer/database/database.py,sha256=ziePr0QHkPwv-plLRdySB8ei8fcXc3SOIgC0uRi47KI,26600
12
+ mcp_code_indexer/database/models.py,sha256=_vCmJnPXZSiInRzyvs4c7QUWuNNW8qsOoDlGX8J-Gnk,7124
12
13
  mcp_code_indexer/middleware/__init__.py,sha256=p-mP0pMsfiU2yajCPvokCUxUEkh_lu4XJP1LyyMW2ug,220
13
14
  mcp_code_indexer/middleware/error_middleware.py,sha256=v6jaHmPxf3qerYdb85X1tHIXLxgcbybpitKVakFLQTA,10109
14
15
  mcp_code_indexer/server/__init__.py,sha256=16xMcuriUOBlawRqWNBk6niwrvtv_JD5xvI36X1Vsmk,41
15
- mcp_code_indexer/server/mcp_server.py,sha256=DtfoeoOF72kdWO7fCYU9qvhg3LUyJU12KpjCCqw_Uxw,50589
16
+ mcp_code_indexer/server/mcp_server.py,sha256=X6_TV9ac5fZX-5bKblhx-gKHP1Cba_KDatYYLqBDvkc,59474
16
17
  mcp_code_indexer/tiktoken_cache/9b5ad71b2ce5302211f9c61530b329a4922fc6a4,sha256=Ijkht27pm96ZW3_3OFE-7xAPtR0YyTWXoRO8_-hlsqc,1681126
17
18
  mcp_code_indexer/tools/__init__.py,sha256=m01mxML2UdD7y5rih_XNhNSCMzQTz7WQ_T1TeOcYlnE,49
18
- mcp_code_indexer-1.1.5.dist-info/licenses/LICENSE,sha256=JN9dyPPgYwH9C-UjYM7FLNZjQ6BF7kAzpF3_4PwY4rY,1086
19
- mcp_code_indexer-1.1.5.dist-info/METADATA,sha256=anW6T4WSZBQjIgvFaViNibdh3mgIeLW35vxiaj3wrjM,11930
20
- mcp_code_indexer-1.1.5.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
21
- mcp_code_indexer-1.1.5.dist-info/entry_points.txt,sha256=8HqWOw1Is7jOP1bvIgaSwouvT9z_Boe-9hd4NzyJOhY,68
22
- mcp_code_indexer-1.1.5.dist-info/top_level.txt,sha256=yKYCM-gMGt-cnupGfAhnZaoEsROLB6DQ1KFUuyKx4rw,17
23
- mcp_code_indexer-1.1.5.dist-info/RECORD,,
19
+ mcp_code_indexer-1.2.1.dist-info/licenses/LICENSE,sha256=JN9dyPPgYwH9C-UjYM7FLNZjQ6BF7kAzpF3_4PwY4rY,1086
20
+ mcp_code_indexer-1.2.1.dist-info/METADATA,sha256=NcWRwgD-W2qG8g6aDsT7uKxA5Ya4hFFKnGlMPmwEef0,12201
21
+ mcp_code_indexer-1.2.1.dist-info/WHEEL,sha256=_zCd3N1l69ArxyTb8rzEoP9TpbYXkqRFSNOD5OuxnTs,91
22
+ mcp_code_indexer-1.2.1.dist-info/entry_points.txt,sha256=8HqWOw1Is7jOP1bvIgaSwouvT9z_Boe-9hd4NzyJOhY,68
23
+ mcp_code_indexer-1.2.1.dist-info/top_level.txt,sha256=yKYCM-gMGt-cnupGfAhnZaoEsROLB6DQ1KFUuyKx4rw,17
24
+ mcp_code_indexer-1.2.1.dist-info/RECORD,,
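Note on the RECORD hunk above: each entry has the form `path,sha256=<digest>,<size>`, where the digest is the URL-safe base64 encoding of the file's SHA-256 hash with trailing `=` padding removed, and the size is in bytes. The sketch below shows how such an entry could be recomputed to check a file against RECORD. The helper name `record_entry` is hypothetical, and the sketch skips the CSV quoting that paths containing commas would need.

```python
import base64
import hashlib


def record_entry(path: str, data: bytes) -> str:
    """Build a wheel RECORD-style line for a file's contents (illustrative only)."""
    digest = hashlib.sha256(data).digest()
    # RECORD uses URL-safe base64 with the '=' padding stripped.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{path},sha256={encoded},{len(data)}"
```

Comparing the output for an extracted file with the matching line in RECORD is one way to confirm that the file was not altered after the wheel was built.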