n2-soul 7.0.0 → 7.0.2

package/README.ko.md CHANGED
@@ -4,14 +4,15 @@
  
  **Your AI agent forgets everything when a session ends. Soul fixes that.**
  **Your AI agent might do something dangerous. Ark stops that.**
+ **Your AI agent wastes tokens reading irrelevant code. Arachne fixes that.**
  
- > ### 🚀 v6.1 Update: Cloud Storage
+ > ### 🚀 v7.0 Update: Arachne
  >
- > Store your AI memory **anywhere**: Google Drive, OneDrive, NAS, company server, USB. One line of config is all it takes:
- > ```js
- > DATA_DIR: 'G:/내 드라이브/n2-soul'
+ > **Arachne**: a code context assembly engine. It indexes the entire codebase and delivers **exactly** what the AI needs.
  > ```
- > **$0/month. No API. No dependencies.** Soul uses your existing file sync. [Learn more →](#️-클라우드-저장--ai-기억을-원하는-곳-어디에든)
+ > 50,000-file project → 30 most relevant chunks → 30K tokens (instead of 500K+)
+ > ```
+ > BM25 search + dependency tracking + smart assembly. Semantic search via Ollama is also supported. [Learn more →](#arachne--최고의-직조사)
  >
  > Includes **Ark** (v6.0): an AI safety system that blocks dangerous actions at zero token cost. [Learn more →](#ark--최후의-방패)
  
@@ -24,6 +25,7 @@ Starting a new chat with Cursor, VS Code Copilot, or another MCP-compatible AI agent
  - 🏷️ **Entity Memory**: auto-tracks people, hardware, and projects (v5.0)
  - 💡 **Core Memory**: per-agent key facts that are always loaded (v5.0)
  - 🛡️ **Ark**: an AI safety system that blocks dangerous actions at zero token cost (v6.0)
+ - 🕸️ **Arachne**: a code context engine that delivers exactly the code the AI needs (v7.0)
  
  > ⚡ **Soul is one small component of N2 Browser**, part of the AI-native browser we are building. Many more features, including multi-agent orchestration, real-time tool routing, and inter-agent communication, are currently in testing. This is just the beginning.
  
@@ -206,6 +208,7 @@ n2_work_end(project, title, summary, todo, entities, insights)
  | **Semantic Search** | Ollama embedding integration (nomic-embed-text, optional) |
  | **Backup/Restore** | Incremental backups with configurable retention |
  | **Ark** | An AI safety system that blocks dangerous actions at zero token cost (v6.0) |
+ | **🕸️ Arachne** | 🆕 Code context assembly: indexes the codebase and delivers only what the AI needs (v7.0) |
  | **☁️ Cloud Storage** | Store memory anywhere: Google Drive, NAS, company server, any path (v6.1) |
  
  ## ☁️ Cloud Storage: Store Your AI Memory Anywhere
@@ -356,6 +359,7 @@ Because all of Soul's data is 'plain files', built-in OS features (crontab
  | `n2_kv_backup` | Back up to a SQLite DB |
  | `n2_kv_restore` | Restore from a backup |
  | `n2_kv_backup_list` | List backup history |
+ | `n2_arachne` | 🆕 Code context: index, search, assemble, backup, status (v7.0) |
  
  ## KV-Cache Progressive Loading
  
@@ -403,22 +407,23 @@ module.exports = {
  
  ```
  soul/
- ├── rules/ # Ark safety rules (active) ← NEW v6.0
+ ├── rules/ # Ark safety rules (active) ← v6.0
  │ └── default.n2 # Default ruleset (125 patterns)
  ├── lib/
- └── ark/ # Ark core engine ← NEW v6.0
- ├── index.js # createArk() factory
- ├── gate.js # SafetyGate engine
- ├── parser.js # .n2 rule parser
- ├── audit.js # Audit logger
- └── examples/ # Industry rule templates
- ├── medical.n2 # Medical (HIPAA, prescriptions)
- ├── military.n2 # Military (engagement, nuclear)
- ├── financial.n2 # Finance (payments, transactions)
- ├── legal.n2 # Legal (contracts, litigation)
- ├── privacy.n2 # Privacy (GDPR, CCPA)
- ├── autonomous.n2 # Autonomous driving (drones, vehicles)
- └── system.n2 # System (deployment, infra)
+ ├── ark/ # Ark core engine ← v6.0
+ ├── index.js # createArk() factory
+ ├── gate.js # SafetyGate engine
+ ├── parser.js # .n2 rule parser
+ ├── audit.js # Audit logger
+ └── examples/ # Industry rule templates
+ └── arachne/ # Arachne code context engine ← NEW v7.0
+ ├── index.js # createArachne() factory
+ ├── indexer.js # File scanner + incremental indexing
+ ├── chunker.js # Language-aware code chunking
+ ├── search.js # BM25 search engine
+ ├── assembler.js # Token-budget-based context assembly
+ ├── store.js # SQLite storage (sql.js)
+ │ └── ignore.js # .gitignore + .contextignore support
  ├── data/
  │ ├── memory/ # Shared brain (n2_brain_read/write)
  │ │ ├── entities.json # Entity Memory (auto-tracked)
@@ -433,7 +438,8 @@ soul/
  │ │ └── ledger/ # Immutable work logs
  │ │ └── 2026/03/09/
  │ │ └── 001-agent.json
- │ ├── ark-audit/ # Ark block/pass logs ← NEW v6.0
+ │ ├── ark-audit/ # Ark block/pass logs ← v6.0
+ │ ├── arachne/ # Arachne index DB + embeddings ← NEW v7.0
  │ └── kv-cache/ # Session snapshots
  │ ├── snapshots/ # JSON backend
  │ ├── sqlite/ # SQLite backend
package/README.md CHANGED
@@ -6,18 +6,19 @@
  [![License](https://img.shields.io/badge/license-Apache--2.0-blue.svg)](LICENSE)
  [![Node](https://img.shields.io/badge/node-%3E%3D18-brightgreen.svg)](https://nodejs.org)
  [![npm downloads](https://img.shields.io/npm/dm/n2-soul.svg)](https://www.npmjs.com/package/n2-soul)
- [![NEW](https://img.shields.io/badge/v6.1-Cloud%20Storage-4488ff?style=for-the-badge)](https://github.com/choihyunsus/soul#️-cloud-storage--store-your-ai-memory-anywhere)
+ [![NEW](https://img.shields.io/badge/v7.0-Arachne-9944ff?style=for-the-badge)](https://github.com/choihyunsus/soul#arachne--the-greatest-weaver)
  
  **Your AI agent forgets everything when a session ends. Soul fixes that.**
  **Your AI agent might do something dangerous. Ark stops that.**
+ **Your AI agent wastes tokens reading irrelevant code. Arachne fixes that.**
  
- > ### 🚀 What's New in v6.1: Cloud Storage
+ > ### 🚀 What's New in v7.0: Arachne
  >
- > Store your AI memory **anywhere**: Google Drive, OneDrive, NAS, company server, USB. Just one line:
- > ```js
- > DATA_DIR: 'G:/My Drive/n2-soul'
+ > **Arachne**: Code Context Assembly Engine. Indexes your entire codebase and picks **exactly** what your AI needs.
  > ```
- > **$0/month. Zero API keys. Zero new dependencies.** Soul uses your existing file sync. [Learn more →](#️-cloud-storage--store-your-ai-memory-anywhere)
+ > 50,000 file project → 30 most relevant chunks → 30K tokens (instead of 500K+)
+ > ```
+ > BM25 search + dependency tracking + smart assembly. Optional semantic search via Ollama. [Learn more →](#arachne--the-greatest-weaver)
  >
  > Also includes **Ark** (v6.0) — built-in AI safety that blocks dangerous actions at zero token cost. [Learn more →](#ark--the-last-shield)
  
@@ -30,6 +31,7 @@ Every time you start a new chat with Cursor, VS Code Copilot, or any MCP-compati
  - 🏷️ **Entity Memory** — auto-tracks people, hardware, projects (v5.0)
  - 💡 **Core Memory** — agent-specific always-loaded facts (v5.0)
  - 🛡️ **Ark** — built-in AI safety that blocks dangerous actions at zero token cost (v6.0)
+ - 🕸️ **Arachne** — code context assembly engine that picks exactly what AI needs (v7.0)
  
  > ⚡ **Soul is one small component of N2 Browser** — an AI-native browser we're building. Multi-agent orchestration, real-time tool routing, inter-agent communication, and much more are currently in testing. This is just the beginning.
  
@@ -47,6 +49,7 @@ Every time you start a new chat with Cursor, VS Code Copilot, or any MCP-compati
  - [Configuration](#configuration)
  - [Contributing](#contributing)
  - [Ark — The Last Shield](#ark--the-last-shield)
+ - [Arachne — The Greatest Weaver](#arachne--the-greatest-weaver)
  
  ## Quick Start
  
@@ -243,6 +246,7 @@ n2_work_end(project, title, summary, todo, entities, insights)
  | **Semantic Search** | Optional Ollama embedding (nomic-embed-text) |
  | **Backup/Restore** | Incremental backups with configurable retention |
  | **Ark** | 🆕 Built-in AI safety — blocks dangerous actions at zero token cost |
+ | **Arachne** | 🆕 Code context assembly — indexes codebase, picks exactly what AI needs (v7.0) |
  | **Cloud Storage** | 🆕 Store memory anywhere — Google Drive, NAS, network server, any path (v6.1) |
  
  ## ☁️ Cloud Storage — Store Your AI Memory Anywhere
@@ -536,6 +540,102 @@ module.exports = {
  - **Wildcard destruction** — blocks `rm *`, `find -delete`, `xargs rm` (self-protection bypass)
  - **Command execution gate** — `@gate` on `execute_command`, `run_shell`, etc. (whitelist approach)
  
+ ## Arachne — The Greatest Weaver
+
+ > *In Greek mythology, Arachne was a mortal weaver whose tapestries rivaled the gods. She wove exactly the right threads in exactly the right places.*
+
+ **Arachne** is Soul's code context assembly engine. It indexes your entire codebase and picks **exactly** the chunks your AI agent needs — no more, no less.
+
+ ### The Problem
+
+ AI agents waste massive tokens reading irrelevant code:
+
+ | Approach | Tokens used | Relevance |
+ |----------|:----------:|:---------:|
+ | **Paste entire file** | 10,000+ | ~20% relevant |
+ | **Dump whole project** | 500,000+ | ~5% relevant |
+ | **Arachne** | ~14,000 | **~90% relevant** |
+
+ ### Real-World Benchmark (N2 Browser Project)
+
+ | Metric | Value |
+ |--------|:-----:|
+ | **Project size** | 3,219 files, 4.68M tokens |
+ | **Arachne output** | 14,074 tokens |
+ | **Compression** | **333x** (99.7% reduction) |
+ | **Index time** | 627ms (incremental: 0ms) |
+ | **DB size** | 24 MB |
+
+ ### How Arachne Works
+
+ ```
+ Your 50,000-file project
+
+ ┌────┴────┐
+ │ Index │ ← Scans all files, chunks by function/class
+ │ (boot) │ Incremental: only re-indexes changed files
+ └────┬────┘
+
+ ┌────┴────┐
+ │ Search │ ← BM25 keyword search (+ optional semantic via Ollama)
+ │ (query) │ Finds the most relevant chunks across all files
+ └────┬────┘
+
+ ┌────┴────┐
+ │ Assemble │ ← Picks top chunks within your token budget
+ │ (budget) │ 4 layers: fixed + short-term + associative + spare
+ └────┬────┘
+
+ 30 most relevant
+ code chunks → AI
+ ```
+
+ ### Key Features
+
+ | Feature | Description |
+ |---------|------------|
+ | **Incremental Indexing** | Only re-indexes changed files (hash-based detection) |
+ | **Language-Aware Chunking** | Splits code by function/class boundaries, not arbitrary lines |
+ | **BM25 Search** | Fast keyword search with TF-IDF ranking |
+ | **Semantic Search** | Optional Ollama embeddings (nomic-embed-text) |
+ | **Token Budget Assembly** | Smart context assembly within configurable token limits |
+ | **4-Layer Assembly** | Fixed (10%) + Short-term (30%) + Associative (40%) + Spare (20%) |
+ | **17 Languages** | JS, TS, Python, Rust, Go, Java, C/C++, C#, Ruby, PHP, Swift, Kotlin |
+ | **12 Text Formats** | MD, JSON, YAML, XML, HTML, CSS, SQL, Shell scripts |
+ | **Backup/Restore** | Incremental backups with configurable retention |
+
+ ### Configuration
+
+ Arachne settings in `lib/config.default.js`:
+
+ ```js
+ ARACHNE: {
+     projectDir: null, // Set to your project root to enable
+     indexing: {
+         autoIndex: true, // Auto-index on boot
+         maxFileSize: 512 * 1024,
+     },
+     assembly: {
+         defaultBudget: 30000, // Token budget for context
+     },
+     embedding: {
+         enabled: false, // true = requires: ollama pull nomic-embed-text
+     },
+ }
+ ```
+
+ ### Usage
+
+ ```
+ n2_arachne(action: "index") → Index your project files
+ n2_arachne(action: "search", query: "authentication JWT") → Search code
+ n2_arachne(action: "assemble", query: "how does auth work?", budget: 30000) → Full context assembly
+ n2_arachne(action: "status") → Check index status
+ n2_arachne(action: "backup") → Backup index DB
+ ```
+
+ > **Also available as standalone package:** [`n2-arachne`](https://www.npmjs.com/package/n2-arachne) — use Arachne without Soul.
+
  ## Available Tools
  
  | Tool | Description |
@@ -559,6 +659,7 @@ module.exports = {
  | `n2_kv_backup` | Backup to portable SQLite DB |
  | `n2_kv_restore` | Restore from backup |
  | `n2_kv_backup_list` | List backup history |
+ | `n2_arachne` | 🆕 Code context: index, search, assemble, backup, status (v7.0) |
  
  ## KV-Cache Progressive Loading
  
@@ -676,22 +777,23 @@ All runtime data is stored in `data/` (gitignored, auto-created):
  
  ```
  soul/
- ├── rules/ # Ark safety rules (active) ← NEW v6.0
+ ├── rules/ # Ark safety rules (active) ← v6.0
  │ └── default.n2 # Default ruleset (125 patterns)
  ├── lib/
- └── ark/ # Ark core engine ← NEW v6.0
- ├── index.js # createArk() factory
- ├── gate.js # SafetyGate engine
- ├── parser.js # .n2 rule parser
- ├── audit.js # Audit logger
- └── examples/ # Industry rule templates
- ├── medical.n2 # Healthcare (HIPAA, prescriptions)
- ├── military.n2 # Defense (engagement, nuclear)
- ├── financial.n2 # Finance (payments, transactions)
- ├── legal.n2 # Legal (contracts, litigation)
- ├── privacy.n2 # Privacy (GDPR, CCPA, PII)
- ├── autonomous.n2 # Autonomous (drones, vehicles)
- └── system.n2 # DevOps (deployment, infra)
+ ├── ark/ # Ark core engine ← v6.0
+ ├── index.js # createArk() factory
+ ├── gate.js # SafetyGate engine
+ ├── parser.js # .n2 rule parser
+ ├── audit.js # Audit logger
+ └── examples/ # Industry rule templates
+ └── arachne/ # Arachne code context engine ← NEW v7.0
+ ├── index.js # createArachne() factory
+ ├── indexer.js # File scanner + incremental indexing
+ ├── chunker.js # Language-aware code chunking
+ ├── search.js # BM25 search engine
+ ├── assembler.js # Context assembly with token budget
+ ├── store.js # SQLite storage (sql.js)
+ │ └── ignore.js # .gitignore + .contextignore support
  ├── data/
  │ ├── memory/ # Shared brain (n2_brain_read/write)
  │ │ ├── entities.json # Entity Memory (auto-tracked)
@@ -706,7 +808,8 @@ soul/
  │ │ └── ledger/ # Immutable work logs
  │ │ └── 2026/03/09/
  │ │ └── 001-agent.json
- │ ├── ark-audit/ # Ark block/pass logs ← NEW v6.0
+ │ ├── ark-audit/ # Ark block/pass logs ← v6.0
+ │ ├── arachne/ # Arachne index DB + embeddings ← NEW v7.0
  │ └── kv-cache/ # Session snapshots
  │ ├── snapshots/ # JSON backend
  │ ├── sqlite/ # SQLite backend
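
The README changes above quote two pieces of arithmetic: the 4-layer split of the token budget (10/30/40/20) and the benchmark compression figure. A minimal sketch of both follows, using only numbers stated in the README. `splitBudget`, `compressionRatio`, and `reductionPercent` are hypothetical helper names for illustration and are not part of n2-soul.

```javascript
// Hypothetical helpers; the layer fractions and benchmark numbers come from the README.
const LAYERS = { fixed: 0.10, shortTerm: 0.30, associative: 0.40, spare: 0.20 };

// Split one token budget across the four assembly layers.
function splitBudget(totalTokens, layers = LAYERS) {
    const out = {};
    for (const [name, fraction] of Object.entries(layers)) {
        out[name] = Math.round(totalTokens * fraction);
    }
    return out;
}

// How many times smaller the assembled context is than the whole project.
function compressionRatio(projectTokens, outputTokens) {
    return projectTokens / outputTokens;
}

// Percentage of tokens removed, as a string with one decimal place.
function reductionPercent(projectTokens, outputTokens) {
    return ((1 - outputTokens / projectTokens) * 100).toFixed(1);
}

console.log(splitBudget(30000));
console.log(Math.round(compressionRatio(4680000, 14074)), reductionPercent(4680000, 14074));
```

With the default budget of 30,000 tokens the associative layer gets the largest share, and the benchmark figures (4.68M tokens in, 14,074 out) are consistent with the quoted 333x and 99.7%.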
package/index.js CHANGED
@@ -39,23 +39,32 @@ const ark = createArk({
      auditEnabled: true,
  });
  
- const _origRegisterTool = server.registerTool.bind(server);
+ // Ark-wrapped registerTool shim — bridges legacy registerTool() to SDK v1.6.1 server.tool()
+ const _origTool = server.tool.bind(server);
+ const _arkWrap = (name, handler) => async (args) => {
+     const content = JSON.stringify(args);
+     const result = ark.check(name, content, 'tool_call');
+     if (!result.allowed) {
+         return {
+             content: [{
+                 type: 'text',
+                 text: `[n2-ark] BLOCKED: ${result.reason}\n` +
+                     `Rule: ${result.rule} | Action: ${result.action}\n` +
+                     `This action requires human approval.`,
+             }],
+         };
+     }
+     return handler(args);
+ };
+ // Shim: server.registerTool(name, {title, description, inputSchema}, handler) → server.tool()
  server.registerTool = (name, schema, handler) => {
-     _origRegisterTool(name, schema, async (args) => {
-         const content = JSON.stringify(args);
-         const result = ark.check(name, content, 'tool_call');
-         if (!result.allowed) {
-             return {
-                 content: [{
-                     type: 'text',
-                     text: `[n2-ark] BLOCKED: ${result.reason}\n` +
-                         `Rule: ${result.rule} | Action: ${result.action}\n` +
-                         `This action requires human approval.`,
-                 }],
-             };
-         }
-         return handler(args);
-     });
+     const desc = schema.description || schema.title || name;
+     _origTool(name, desc, schema.inputSchema || {}, _arkWrap(name, handler));
+ };
+ // Override: server.tool() with Ark check (for files using new API directly, e.g. arachne.js)
+ server.tool = (name, ...rest) => {
+     const handler = rest.pop();
+     _origTool(name, ...rest, _arkWrap(name, handler));
  };
  // ═══ End Ark ═══
  
@@ -77,13 +86,8 @@ async function boot() {
      if (config.ARACHNE?.projectDir) {
          try {
              const arachne = await createArachne({
+                 ...config.ARACHNE,
                  dataDir: config.ARACHNE.dataDir ?? path.join(config.DATA_DIR, 'arachne'),
-                 projectDir: config.ARACHNE.projectDir,
-                 indexing: config.ARACHNE.indexing ?? { autoIndex: true },
-                 search: config.ARACHNE.search ?? {},
-                 assembly: config.ARACHNE.assembly ?? {},
-                 backup: config.ARACHNE.backup ?? {},
-                 embedding: config.ARACHNE.embedding ?? {},
              });
              registerArachneTools(server, z, arachne, config);
              console.error(`[n2-soul] Arachne enabled: ${config.ARACHNE.projectDir}`);
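
The index.js change above routes every tool handler through one Ark check, whether the tool is registered via `registerTool()` or via `tool()`. The sketch below reproduces that wrapping pattern in isolation. The `ark` and `server` objects here are hypothetical stand-ins (the real ones come from `createArk()` and the MCP SDK), and the blocking rule is invented purely for the demonstration.

```javascript
// Stand-in for the Ark gate: blocks any call whose serialized args mention "rm -rf".
// Only the result shape ({ allowed, reason, ... }) mirrors the diff; the rule is made up.
const ark = {
    check(name, content) {
        return content.includes('rm -rf')
            ? { allowed: false, reason: 'destructive command', rule: 'mock-rule', action: name }
            : { allowed: true };
    },
};

// Same shape as _arkWrap in the diff: returns a handler that consults Ark first.
const arkWrap = (name, handler) => async (args) => {
    const result = ark.check(name, JSON.stringify(args), 'tool_call');
    if (!result.allowed) {
        return { content: [{ type: 'text', text: `[n2-ark] BLOCKED: ${result.reason}` }] };
    }
    return handler(args);
};

// Stand-in server that records the registered handler under the tool name.
const tools = new Map();
const server = { tool: (name, ...rest) => tools.set(name, rest.pop()) };

// Override server.tool so the last argument (the handler) is always wrapped.
const origTool = server.tool.bind(server);
server.tool = (name, ...rest) => {
    const handler = rest.pop();
    origTool(name, ...rest, arkWrap(name, handler));
};

server.tool('run_shell', 'Run a shell command', {}, async (args) => ({
    content: [{ type: 'text', text: `ran: ${args.cmd}` }],
}));
```

Because the wrapper is applied where the handler is registered, callers never hold a reference to an unchecked handler.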
@@ -1,12 +1,27 @@
  // chunker.js — Split source code into function/class-level chunks
  // Phase 1: Regex-based (no parser needed), Phase 2: AST replacement possible
  
+ // Default: ~3.5 chars/token for English code
+ // CJK (Korean/Chinese/Japanese): ~1.5 chars/token (set via config)
+ let _tokenMultiplier = 3.5;
+
+ /**
+  * Set token multiplier (called from config)
+  * @param {number|object} multiplier — number (global) or {default, ko, zh, ja} per-language
+  */
+ function setTokenMultiplier(multiplier) {
+     if (typeof multiplier === 'number' && multiplier > 0) {
+         _tokenMultiplier = multiplier;
+     }
+ }
+
  /**
-  * Estimate token count (approximate without exact tokenizer)
-  * Code average: ~3.5 characters = 1 token
+  * Estimate token count (configurable via tokenMultiplier)
+  * @param {string} text
+  * @param {string} [language] — optional language hint for per-language multiplier
   */
- function estimateTokens(text) {
-     return Math.ceil(text.length / 3.5);
+ function estimateTokens(text, language) {
+     return Math.ceil(text.length / _tokenMultiplier);
  }
  
  // Language-specific chunk detection patterns
  // Language-specific chunk detection patterns
@@ -235,4 +250,4 @@ function detectLanguage(filePath) {
      return ext;
  }
  
- module.exports = { chunkCode, estimateTokens, detectLanguage, LANG_MAP, CHUNK_PATTERNS };
+ module.exports = { chunkCode, estimateTokens, detectLanguage, setTokenMultiplier, LANG_MAP, CHUNK_PATTERNS };
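
To see what the new multiplier does in practice, the sketch below copies the two functions from the chunker.js hunk into a standalone snippet (with the leading underscore dropped from the variable name) and estimates the same 70-character string under both documented ratios. The sample string is an arbitrary assumption.

```javascript
// Mirrors the chunker.js logic in the diff: one module-level chars-per-token ratio.
let tokenMultiplier = 3.5;

function setTokenMultiplier(multiplier) {
    if (typeof multiplier === 'number' && multiplier > 0) {
        tokenMultiplier = multiplier;
    }
}

function estimateTokens(text) {
    return Math.ceil(text.length / tokenMultiplier);
}

const sample = 'x'.repeat(70);
const asCode = estimateTokens(sample);    // 70 / 3.5
setTokenMultiplier(1.5);
const asCjk = estimateTokens(sample);     // 70 / 1.5, rounded up
setTokenMultiplier({ ko: 1.5 });          // non-number input is ignored
const unchanged = estimateTokens(sample); // still uses 1.5

console.log(asCode, asCjk, unchanged);
```

As the hunk is written, the object form mentioned in the JSDoc is silently ignored and the `language` argument of `estimateTokens` is not used, so only a single global number takes effect.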
@@ -3,7 +3,7 @@ const fs = require('fs');
  const path = require('path');
  const crypto = require('crypto');
  const { IgnoreFilter } = require('./ignore');
- const { chunkCode, detectLanguage } = require('./chunker');
+ const { chunkCode, detectLanguage, setTokenMultiplier } = require('./chunker');
  const { indexFileDependencies } = require('./dependency');
  
  class Indexer {
@@ -15,6 +15,11 @@ class Indexer {
          this._store = store;
          this._config = config;
          this._ignoreFilter = null;
+
+         // Apply token multiplier from config
+         if (config.indexing?.tokenMultiplier) {
+             setTokenMultiplier(config.indexing.tokenMultiplier);
+         }
      }
  
      /**
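
The README describes incremental indexing as hash-based change detection, and the indexer imports `crypto`. The diff does not show how the hash is computed or stored, so the following is only a sketch of the general technique: SHA-256 and the in-memory `seenHashes` map are assumptions, and `contentHash` and `needsReindex` are hypothetical names.

```javascript
const crypto = require('crypto');

// Hypothetical in-memory stand-in for the index store: path -> last seen content hash.
const seenHashes = new Map();

// Hash algorithm is an assumption; the diff does not say which one the indexer uses.
function contentHash(text) {
    return crypto.createHash('sha256').update(text).digest('hex');
}

// Returns true when the file is new or its content changed since the last call.
function needsReindex(filePath, text) {
    const hash = contentHash(text);
    if (seenHashes.get(filePath) === hash) return false;
    seenHashes.set(filePath, hash);
    return true;
}

const first = needsReindex('src/a.js', 'const a = 1;');  // new file
const second = needsReindex('src/a.js', 'const a = 1;'); // same content
const third = needsReindex('src/a.js', 'const a = 2;');  // content changed

console.log(first, second, third);
```

Comparing content hashes rather than modification times means a file that is touched but not edited is skipped.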
@@ -93,11 +93,39 @@ module.exports = {
          dataDir: null, // null = DATA_DIR/arachne/
          indexing: {
              autoIndex: true, // Auto-index on boot
+             incremental: true,
              maxFileSize: 512 * 1024, // 512KB max per file
+             maxFiles: 50000,
+             chunkStrategy: 'regex',
+             tokenMultiplier: 3.5, // Chars per token. English/code: 3.5, CJK: 1.5
+             supportedLanguages: ['js', 'ts', 'jsx', 'tsx', 'py', 'rs', 'go', 'java', 'c', 'cpp', 'h', 'hpp', 'cs', 'rb', 'php', 'swift', 'kt'],
+             alsoIndexAsText: ['md', 'json', 'yaml', 'yml', 'toml', 'xml', 'html', 'css', 'sql', 'sh', 'bat', 'ps1'],
+         },
+         ignore: {
+             useGitignore: true,
+             useContextignore: true,
+             patterns: [
+                 'node_modules/**', 'vendor/**', '__pycache__/**', '.venv/**',
+                 'dist/**', 'build/**', 'out/**', '.next/**', 'target/**',
+                 '.git/**',
+                 '*.png', '*.jpg', '*.jpeg', '*.gif', '*.ico', '*.svg', '*.bmp', '*.webp',
+                 '*.woff', '*.woff2', '*.ttf', '*.eot',
+                 '*.mp3', '*.mp4', '*.wav', '*.avi', '*.mkv', '*.webm',
+                 '*.zip', '*.tar', '*.gz', '*.rar', '*.7z',
+                 '*.exe', '*.dll', '*.so', '*.dylib', '*.bin',
+                 '*.min.js', '*.min.css', '*.map',
+                 'package-lock.json', 'yarn.lock', 'pnpm-lock.yaml',
+                 'soul/data/**', 'data/**',
+             ],
+         },
+         search: {
+             bm25: { k1: 1.2, b: 0.75 },
+             topK: 10,
          },
-         search: {},
          assembly: {
              defaultBudget: 30000, // Token budget for context assembly
+             layers: { fixed: 0.10, shortTerm: 0.30, associative: 0.40, spare: 0.20 },
+             dependencyDepth: 2,
          },
          backup: {},
          embedding: {
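
The new `search.bm25` defaults (`k1: 1.2`, `b: 0.75`) are the usual Okapi BM25 parameters. The diff does not include search.js, so the sketch below shows the standard textbook per-term score with those defaults rather than Arachne's actual implementation. `bm25TermScore` and its argument names are hypothetical.

```javascript
// Standard Okapi BM25 score of one query term against one document.
// tf: term frequency in the document, df: number of documents containing the term,
// N: total documents, docLen / avgDocLen: document length and corpus average (in tokens).
function bm25TermScore({ tf, df, N, docLen, avgDocLen }, { k1 = 1.2, b = 0.75 } = {}) {
    const idf = Math.log(1 + (N - df + 0.5) / (df + 0.5));
    const lengthNorm = 1 - b + b * (docLen / avgDocLen);
    return idf * (tf * (k1 + 1)) / (tf + k1 * lengthNorm);
}

const stats = { df: 1, N: 10, avgDocLen: 100 };
const once = bm25TermScore({ ...stats, tf: 1, docLen: 100 });
const thrice = bm25TermScore({ ...stats, tf: 3, docLen: 100 });
const longDoc = bm25TermScore({ ...stats, tf: 3, docLen: 400 });

console.log(once, thrice, longDoc);
```

`k1` controls how quickly repeated occurrences of a term stop adding to the score, and `b` controls how strongly long chunks are penalized relative to the average length.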
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "n2-soul",
-   "version": "7.0.0",
+   "version": "7.0.2",
    "description": "Multi-agent session orchestrator with KV-Cache, Ark safety, and Arachne code context for MCP",
    "main": "index.js",
    "scripts": {