@sap-ux/fiori-docs-embeddings 1.0.1 → 1.1.1

package/README.md CHANGED
@@ -25,9 +25,11 @@ https://github.com/SAP/open-ux-tools/blob/main/packages/create/README.md (Fiori
 
  https://www.npmjs.com/package/@sap/ux-ui5-tooling (@sap/ux-ui5-tooling documentation)
 
+ https://github.com/sap-tutorials/Tutorials/blob/master/tutorials/fiori-tools-mockserver-opa-testing/fiori-tools-mockserver-opa-testing.md (OPA mock server testing guide)
+
  - Parses markdown, JSON, TypeScript, and other file types
  - Generates AI-powered vector embeddings using transformers
- - Stores embeddings in a local LanceDB vector database
+ - Stores embeddings in a flat binary vector store (`embeddings.bin` + `records.jsonl`)
  - Provides tools for semantic and keyword search across documentation
 
  ## Installation
@@ -54,9 +56,15 @@ const embeddingsPath = getEmbeddingsPath();
  # Set GitHub token to avoid rate limits
  export GITHUB_TOKEN=your_github_token
 
- # Build documentation index
+ # Build documentation index (all sources)
  npm run update-docs
 
+ # Build a single source by ID
+ npm run update-docs-script -- --source=fiori-tools-opa-guide
+
+ # Shortcut script for the OPA testing guide
+ npm run update-docs-opa-guide
+
  # Generate embeddings
  npm run update-embeddings
 
@@ -66,17 +74,24 @@ npm run update-all
 
  ### Available Scripts
 
- - `update-docs` - Crawl and index documentation from configured sources
+ - `update-docs` - Crawl and index documentation from all configured sources
+ - `update-docs-script -- --source=<id>` - Crawl a single source by ID
+ - `update-docs-opa-guide` - Fetch only the OPA mock server testing guide
  - `update-embeddings` - Generate vector embeddings from indexed documents
  - `update-all` - Run both documentation indexing and embedding generation
 
  ### Configuration
 
  The module indexes documentation from these sources by default:
- - SAP-docs/btp-fiori-tools (Fiori Tools documentation)
- - SAP-docs/sapui5 (UI5 framework documentation)
- - SAP-samples/fiori-tools-samples (Sample applications)
- - SAP-samples/fiori-elements-feature-showcase (Feature examples)
+
+ | Source ID | Description |
+ |---|---|
+ | `btp-fiori-tools` | SAP-docs/btp-fiori-tools Fiori Tools documentation |
+ | `sapui5` | SAP-docs/sapui5 — UI5 Fiori Elements documentation |
+ | `fiori-samples` | SAP-samples/fiori-tools-samples — Sample applications |
+ | `fiori-showcase` | SAP-samples/fiori-elements-feature-showcase — Feature examples |
+ | `tools-suite` | ux-engineering/tools-suite — Internal Fiori Tools commands (requires `GITHUB_TOKEN`) |
+ | `fiori-tools-opa-guide` | sap-tutorials/Tutorials — OPA mock server testing guide |
 
  ### Environment Variables
 
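Note on the `--source=<id>` flag: the diff shows how the flag is used but not how the script handles it. The following is a minimal sketch of one way such a flag could be resolved against the source IDs in the table above. The `selectSources` helper, the fallback to all sources when the flag is absent, and the error on an unknown ID are all assumptions for illustration, not the package's actual behaviour.

```typescript
// Source IDs copied from the table in the README diff; everything else is hypothetical.
const KNOWN_SOURCE_IDS = [
    "btp-fiori-tools",
    "sapui5",
    "fiori-samples",
    "fiori-showcase",
    "tools-suite",
    "fiori-tools-opa-guide"
] as const;

type SourceId = (typeof KNOWN_SOURCE_IDS)[number];

// Hypothetical helper: return the sources to crawl for a given argument list.
// ASSUMPTION: no flag means "all sources"; an unknown ID is treated as an error.
function selectSources(argv: string[]): SourceId[] {
    const prefix = "--source=";
    const flag = argv.find((arg) => arg.startsWith(prefix));
    if (flag === undefined) {
        return [...KNOWN_SOURCE_IDS];
    }
    const id = flag.slice(prefix.length);
    const match = KNOWN_SOURCE_IDS.find((known) => known === id);
    if (match === undefined) {
        throw new Error(`Unknown source ID: ${id}`);
    }
    return [match];
}
```

In a Node.js script this would typically be called as `selectSources(process.argv.slice(2))`, which is where the arguments after `--` in `npm run update-docs-script -- --source=...` end up.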
@@ -88,7 +103,7 @@ Generated data is organized as:
  ```
  data/
  ├── docs/ # Parsed documentation files
- ├── embeddings/ # Vector database (LanceDB)
+ ├── embeddings/ # Flat binary vector store (embeddings.bin, records.jsonl, metadata.json)
  └── search/ # Search indexes
  ```
 
@@ -96,8 +111,8 @@ data/
 
  - **Multi-source indexing** - Supports GitHub repositories and JSON APIs
  - **File type support** - Markdown, JSON, TypeScript, JavaScript, XML, YAML, and more
- - **Vector embeddings** - Uses sentence-transformers/all-MiniLM-L6-v2 model
- - **Local storage** - All data stored locally with LanceDB
+ - **Vector embeddings** - Uses `@huggingface/transformers` with the `Xenova/all-MiniLM-L6-v2` model (q8 quantized)
+ - **Local storage** - All data stored locally as a flat binary vector store (no native database dependency)
  - **Caching** - Intelligent caching to avoid unnecessary API calls
  - **Chunking** - Smart document chunking for optimal embedding generation
 
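Note on the "Chunking" feature: the README does not describe the chunking strategy, and the only concrete parameters in this diff are the metadata values `chunkSize` 2000 and `chunkOverlap` 100. The following is a minimal sketch of plain fixed-window chunking with overlap, assuming both values are counted in characters (the unit is not stated). The `chunkText` helper is hypothetical, and the package's "smart" chunker may well split on headings or sentences instead.

```typescript
// Hypothetical helper, not part of the package.
// ASSUMPTION: chunkSize and chunkOverlap are measured in characters.
function chunkText(text: string, chunkSize = 2000, chunkOverlap = 100): string[] {
    if (chunkSize <= 0 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
        throw new Error("require chunkSize > 0 and 0 <= chunkOverlap < chunkSize");
    }
    const chunks: string[] = [];
    // Each window starts (chunkSize - chunkOverlap) characters after the previous one,
    // so consecutive chunks share chunkOverlap characters.
    const step = chunkSize - chunkOverlap;
    for (let start = 0; start < text.length; start += step) {
        chunks.push(text.slice(start, start + chunkSize));
        // Stop once a window has reached the end of the text.
        if (start + chunkSize >= text.length) {
            break;
        }
    }
    return chunks;
}
```

With the defaults, a 4000-character document yields windows starting at offsets 0, 1900, and 3800, so the last 100 characters of each chunk reappear at the start of the next.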
Binary file
@@ -1,10 +1,10 @@
  {
- "version": "1.0.0",
- "createdAt": "2026-06-04T11:06:17.796Z",
+ "version": "2.0.0",
+ "createdAt": "2026-06-11T15:49:06.269Z",
  "model": "Xenova/all-MiniLM-L6-v2",
  "dimensions": 384,
- "totalVectors": 749,
- "totalDocuments": 749,
+ "totalVectors": 750,
+ "totalDocuments": 750,
  "chunkSize": 2000,
  "chunkOverlap": 100
  }
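
Note on the flat binary vector store: the diff names the files (`embeddings.bin`, `records.jsonl`, `metadata.json`) but does not document the binary layout. The following is a minimal sketch of how a store like this could be decoded and searched, assuming a headerless, row-major array of little-endian float32 values. That layout is an assumption, and the `decodeVectors`, `cosine`, and `topK` helpers are hypothetical names, not the package's API.

```typescript
// ASSUMPTION: headerless, row-major, little-endian float32 vectors.
// The real embeddings.bin layout is not documented in this diff.
interface Hit {
    row: number;   // index of the vector (and, presumably, of its line in records.jsonl)
    score: number; // cosine similarity to the query
}

// Decode a byte buffer into vectors of the given dimensionality.
function decodeVectors(bytes: Uint8Array, dimensions: number): Float32Array[] {
    const bytesPerVector = dimensions * 4;
    if (dimensions <= 0 || bytes.byteLength % bytesPerVector !== 0) {
        throw new Error("byte length is not a multiple of dimensions * 4");
    }
    const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    const vectors: Float32Array[] = [];
    for (let offset = 0; offset < bytes.byteLength; offset += bytesPerVector) {
        const vector = new Float32Array(dimensions);
        for (let i = 0; i < dimensions; i++) {
            vector[i] = view.getFloat32(offset + i * 4, true); // true = little-endian
        }
        vectors.push(vector);
    }
    return vectors;
}

// Cosine similarity; returns 0 if either vector has zero length.
function cosine(a: Float32Array, b: Float32Array): number {
    let dot = 0;
    let normA = 0;
    let normB = 0;
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    const denom = Math.sqrt(normA) * Math.sqrt(normB);
    return denom === 0 ? 0 : dot / denom;
}

// Brute-force search: score every row and keep the k best.
function topK(vectors: Float32Array[], query: Float32Array, k: number): Hit[] {
    return vectors
        .map((vector, row) => ({ row, score: cosine(vector, query) }))
        .sort((x, y) => y.score - x.score)
        .slice(0, k);
}
```

Under the same float32 assumption, the metadata above implies a file of 750 × 384 × 4 = 1,152,000 bytes, which makes a cheap consistency check between `metadata.json` and `embeddings.bin`. A linear scan over a few hundred vectors is inexpensive, which is one plausible reason a flat file can stand in for a vector database at this size.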