@sjovanovic/recall.js 1.0.0 → 1.0.3

Files changed (3)
  1. package/README.md +31 -29
  2. package/package.json +1 -1
  3. package/recall.js +58 -61
package/README.md CHANGED
@@ -6,25 +6,33 @@

  Recall.js is long term memory for AI apps!

- It is a generic RAG (Retrieval-augmented generation) JavaScript library and command line interface focused on speed, ease of use and embeddability.
+ It is a tool for building RAG (Retrieval-augmented generation) in a form of JavaScript library and command line utility focused on speed, ease of use and embeddability.

- It is versatile: use it for generic Semantic Search, as expert memory for your AI app, as a recommendation system, there are so many possibilities.
+ It is versatile and you don't have to use it exclusively for RAG, use it for generic Semantic Search, as expert memory for your AI app, as a recommendation system, there are so many possibilities...

  Recall.js supports multilingual embeddings out of the box so you can add data in one language and then query it in another.

- Under the hood, recall.js uses sentence vector embeddings and a vector database to index and query your data. It is a light wrapper around local language models such as [MiniLM-L12-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L12-v2) (optionally LLMs can be used) and [CozoDB](https://www.cozodb.org/) vector database.
+ Under the hood, recall.js uses sentence vector embeddings and a vector database to index and query your data. It is a light wrapper around local language models such as [MiniLM-L12-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L12-v2) and [CozoDB](https://www.cozodb.org/) vector database.

  ## Install

- `npm install recall`
+ `npm install @sjovanovic/recall.js`

  ## Usage

- Warning: when this library is used for the first time, it will download a local language model MiniLM-L12-v2 which may take long time depending on your Internet connectivity. Please be patient.
+ Console:
+
+ ```console
+ recall --add 'The quick brown fox jumps over the lazy dog|Fox|{"foo":"bar"}'
+ recall --query "Un animal saute par-dessus un autre animal" --limit 1
+ ```
+ **Warning:** when this library is used for the first time, it will download a local language model MiniLM-L12-v2 which may take long time depending on your Internet connectivity. Please be patient.
+
+ Below is the same example in JavaScript:

  ```javascript

- import * as RECALL from 'recall'
+ import * as RECALL from '@sjovanovic/recall.js'

  const testRecall = async () => {
  await RECALL.addBatch([
@@ -36,7 +44,8 @@ const testRecall = async () => {
  ])

  // Semantic search query in different language (French) "Animal jumps over another animal"
- let response = await RECALL.searchText("Un animal saute par-dessus un autre animal", 1)
+ let response = await RECALL.searchText("Un animal saute par-dessus un autre animal", 1)
+ console.log(response)
  }
  testRecall()

@@ -54,7 +63,7 @@ response:
  "rows": [
  [
  0.5840495824813843, // vector similarity
- "Fox",
+ "Fox and dog",
  "08840189191373282",
  {
  "foo": "bar"
@@ -67,18 +76,11 @@ response:

  ```

- Here's how the above example looks like in CLI:
-
- ```log
- recall --add 'The quick brown fox jumps over the lazy dog|Fox|{"foo":"bar"}'
- recall --query "Un animal saute par-dessus un autre animal" --limit 1
- ```
-
  ## Options

- Easiest way to get all the options is via command line:
+ Easy way to view all the options is via command line:

- ```log
+ ```console
  recall --help

  Usage:
@@ -98,12 +100,12 @@ Options:
  --json "FILE_NAME" - import from file which has one json object per line: {input:"", result:"", data:{}}
  ```

- Note when adding data recall will generate unique id automatically. To set custom id add it as a string property named "id" in the data object (i.e. `{"id":"customID"}`).
+ **Note:** when adding data recall will generate unique id automatically. To set custom id add it as a string property named "id" in the data object (i.e. `{"id":"customID"}`).


  ## JavaScript API Reference

- **RECALL.config**
+ ### RECALL.config

  Configuration object.

@@ -117,19 +119,19 @@ export const config = {
  }
  ```

- **RECALL.getDb()**
+ ### RECALL.getDb()

  Returns reference to the CozoDB instance.

- **RECALL.getEmbeddings(text) -> Promise(Array)**
+ ### RECALL.getEmbeddings(text) -> Promise(Array)

  Given text calculates the embeddings vector

- **RECALL.add(input, result, data={}) -> Promise(Object)**
+ ### RECALL.add(input, result, data={}) -> Promise(Object)

  Add data. `input` is the sentence to get embeddings from. `result` is the string to show in the results. `data` is arbitrary object intended to hold related pieces of information and references. If `data` object contains `id` property it will be used as unique id of the record.

- **RECALL.addBatch(batch) -> Promise(Object)**
+ ### RECALL.addBatch(batch) -> Promise(Object)

  Add data in batches (faster than using add repeteadely).
  `batch` is an Array that looks like this:
@@ -137,19 +139,19 @@ Add data in batches (faster than using add repeteadely).
  let batch = [{input:"", result:"", data:{}}]
  ```

- **RECALL.remove(id) -> Promise(Object)**
+ ### RECALL.remove(id) -> Promise(Object)

  Remove data by id. id is a string.

- **RECALL.searchText(text, numResults = 5) -> Promise(Object)**
+ ### RECALL.searchText(text, numResults = 5) -> Promise(Object)

  Query the vector database. Accepts query text and number of results to return.

- **RECALL.nuke()**
+ ### RECALL.nuke()

  Deletes the database.

- **RECALL.importFromJSONStream(fileName) -> Promise(object)**
+ ### RECALL.importFromJSONStream(fileName) -> Promise(object)

  Imports from readable stream or file which consists of JSON objects, one per line. e.g.
  ```
@@ -159,13 +161,13 @@ Imports from readable stream or file which consists of JSON objects, one per lin
  ```
  This is the most efficient way to import data.

- **RECALL.importFromCSVorTSV(fileName, inputHeader=null, resultHeader=null) -> Promise()**
+ ### RECALL.importFromCSVorTSV(fileName, inputHeader=null, resultHeader=null) -> Promise()

  Imports from CSV or TSV file. By default fist column is used as input, second as result and remaining columns are put in the data object.
  If `inputHeader` is specified, function will try to find the column by that name and use it as input.
  If `resultHeader` is specified, function will try to find the column by that name and use it as result.

- **RECALL.mcp() -> Promise()**
+ ### RECALL.mcp() -> Promise()

  (Experimental)
  Runs MCP server and makes the results available when mentioning `Recall search` in the prompt. Currently only supports STDIO.
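
Review note: the README in this release still documents `RECALL.searchText(text, numResults = 5)`, and its usage example still calls `searchText("...", 1)`, while the `recall.js` changes in this same diff move `category` into the second position. A call written against the README would therefore pass `1` as the category and fall back to the default limit. Below is a minimal caller-side sketch that accepts both call shapes; `normalizeSearchArgs` is a hypothetical helper, not part of the package.

```javascript
// Hypothetical helper (not part of recall.js): accepts both the documented
// (text, numResults) form and the 1.0.3 (text, category, numResults) form.
function normalizeSearchArgs(text, second, third) {
  if (typeof second === 'number') {
    // README-style call: searchText(text, numResults)
    return { text, category: '', numResults: second }
  }
  return {
    text,
    category: typeof second === 'string' ? second : '',
    numResults: Number.isInteger(third) && third > 0 ? third : 5,
  }
}

const oldStyle = normalizeSearchArgs('Un animal saute par-dessus un autre animal', 1)
const newStyle = normalizeSearchArgs('Un animal saute par-dessus un autre animal', 'animals', 3)
console.log(oldStyle, newStyle)
```

The normalized values could then be forwarded as `RECALL.searchText(text, category, numResults)`.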
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "@sjovanovic/recall.js",
- "version": "1.0.0",
+ "version": "1.0.3",
  "description": "Semantic search as long term memory for LLMs",
  "main": "recall.js",
  "bin": {
package/recall.js CHANGED
@@ -23,22 +23,26 @@ export const config = {
  PATH: PATH // directory of recall.js
  }

- var db = null
+ var db = null, initDone = false

  export const getDb = () => {
- if(!db) db = new CozoDb('sqlite', config.DB_FILE)
+ if(!db) {
+ db = new CozoDb('sqlite', config.DB_FILE)
+ }
  return db
  }

  async function printQuery(query, params = {}) {
- try {
- if(!db) {
- getDb()
- try {
- let isCreated = await createTable()
- if(isCreated) console.log('Created embeddings table.')
- }catch(err) {}
+
+ try{
+ if(!initDone) {
+ initDone = true
+ await createTable()
  }
+ }catch(err) {
+ //console.log('CREATE TABLE ERROR', err)
+ }
+ try {
  let data = getDb().run(query, params)
  return data
  }catch(err){
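
Review note: the new `initDone` flag is set before `await createTable()`, which keeps `createTable` (itself a caller of `printQuery`) from re-entering the setup. A side effect is that any other caller arriving while the table is still being created skips the wait and runs its query immediately. One common alternative is to share a single in-flight promise, sketched below with stand-in names (`ensureInit`, `fakeCreateTable`); none of this is the package's code. In such a design the setup routine itself must not go through the guarded path, or it would wait on its own promise.

```javascript
// Hypothetical sketch: share one in-flight initialization promise between callers.
let initPromise = null
let createCalls = 0

// Stand-in for createTable(); the real one issues CozoDB queries.
async function fakeCreateTable() {
  createCalls++
  await new Promise((resolve) => setTimeout(resolve, 10))
}

function ensureInit() {
  // The first caller starts initialization; later callers reuse the same promise.
  if (!initPromise) {
    initPromise = fakeCreateTable().catch((err) => {
      initPromise = null // allow a retry after a failed attempt
      throw err
    })
  }
  return initPromise
}

async function demo() {
  // Three concurrent callers all wait for the same setup run.
  await Promise.all([ensureInit(), ensureInit(), ensureInit()])
  return createCalls
}

demo().then((n) => console.log('setup runs:', n))
```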
@@ -57,7 +61,7 @@ export const getEmbeddings = async (text) => {

  export const createTable = async () => {
  // create table (id, v, input, result, data)
- let tableCreated = await printQuery(`:create embeddings {id: String => v: <F32; ${config.VECTOR_SIZE}>, input: String, result: String, data: Json}`)
+ let tableCreated = await printQuery(`:create embeddings {id: String, category: String => v: <F32; ${config.VECTOR_SIZE}>, input: String, result: String, data: Json}`)
  if(tableCreated){
  // create index
  let indexCreated = await printQuery(`::hnsw create embeddings:index_name {
@@ -67,7 +71,6 @@ export const createTable = async () => {
  fields: [v],
  distance: L2, # Cosine, IP
  ef_construction:50, # number of nearest neighbors
- #filter: k != 'foo', # only those rows for which the expression evaluates to true are indexed
  extend_candidates: false, # include nearest neighbors of the nearest neighbors
  keep_pruned_connections: false,
  }`)
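
Review note: in CozoScript the columns listed before `=>` form the key of a stored relation, so the new `{id: String, category: String => ...}` schema identifies a row by the `(id, category)` pair rather than by `id` alone. The sketch below simulates that upsert behaviour with a plain `Map`, with no database involved; `keyOf` and `put` are hypothetical names used only for illustration.

```javascript
// Simulation only: a Map keyed by the (id, category) pair, mimicking how a
// put with a composite key overwrites or adds rows.
const rows = new Map()
const keyOf = (id, category = '') => JSON.stringify([id, category])

function put(row) {
  rows.set(keyOf(row.id, row.category), row)
}

put({ id: '42', category: '', result: 'Fox' })
put({ id: '42', category: 'animals', result: 'Fox and dog' }) // same id, new category: second row
put({ id: '42', category: '', result: 'Fox (updated)' })      // same pair: overwrites the first row

console.log(rows.size, rows.get(keyOf('42', '')).result)
```

Under this reading, reusing a custom `id` in a different category adds a row instead of replacing the existing one.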
@@ -76,21 +79,15 @@ export const createTable = async () => {
  return false
  }

- export const add = async (input, result, data={}) => {
+ export const add = async (input, result, data={}, category="") => {
  if(!input || !result) return

  input = sanitizeString(input)
  result = sanitizeString(result)
-
-
-
  const embedding = await getEmbeddings(input)
-
- console.log('Adding', input, '->', result)
-
  let id = data.id || Math.random().toString().substring(2)
- return await printQuery(`?[id, v, input, result, data] <- [["${id}", ${JSON.stringify(embedding)}, ${JSON.stringify(input.replaceAll('"', "'"))}, ${JSON.stringify(result.replaceAll('"', "'"))}, ${JSON.stringify(data)} ]]
- :put embeddings {id => v, input, result, data}
+ return await printQuery(`?[id, v, input, result, data, category] <- [["${id}", ${JSON.stringify(embedding)}, ${JSON.stringify(input.replaceAll('"', "'"))}, ${JSON.stringify(result.replaceAll('"', "'"))}, ${JSON.stringify(data)}, ${JSON.stringify(category.replaceAll('"', "'"))} ]]
+ :put embeddings {id, category => v, input, result, data}
  `)
  }

@@ -106,25 +103,26 @@ export const addBatch = async (batch) => {
  if(!batch || !Array.isArray(batch)) return
  let vectorBatch = []
  for(let i=0;i<batch.length; i++){
- let {input, result, data} = batch[i]
+ let {input, result, data, category} = batch[i]

  if(!input || !result) continue
  if(!data) data = {}
+ if(!category) category = ''
  const embedding = await getEmbeddings(input)
  batch[i].embedding = embedding
  let item = ''
  if(i == 0) {
- item += `?[id, v, input, result, data] <- [`
+ item += `?[id, v, input, result, data, category] <- [`
  }

  input = sanitizeString(input)
  result = sanitizeString(result)

  let id = data?.id ? data.id : Math.random().toString().substring(2)
- item += `["${id}", ${JSON.stringify(embedding)}, ${JSON.stringify(input)}, ${JSON.stringify(result)}, ${JSON.stringify(data)} ],`
+ item += `["${id}", ${JSON.stringify(embedding)}, ${JSON.stringify(input)}, ${JSON.stringify(result)}, ${JSON.stringify(data)}, ${JSON.stringify(category)} ],`
  if(i == batch.length-1) {
  item += `]
- :put embeddings {id => v, input, result, data}`
+ :put embeddings {id, category => v, input, result, data}`
  }
  vectorBatch.push(item)
  }
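
Review note: batch items may now carry an optional `category`. The sketch below pulls the defaulting done inside the loop into a hypothetical `normalizeBatchItem` helper and filters the batch before any script is assembled. Filtering first also sidesteps an edge case visible in the loop as shown: the `?[...] <- [` header is emitted only when `i == 0` and the `:put` footer only when `i == batch.length-1`, both after the `continue`, so a skipped first or last item would leave the script without its header or footer.

```javascript
// Hypothetical helper mirroring the defaults applied in the addBatch loop.
function normalizeBatchItem(item) {
  const { input, result, data, category } = item || {}
  if (!input || !result) return null // the loop skips such items
  return { input, result, data: data || {}, category: category || '' }
}

const batch = [
  { input: 'No result here' }, // invalid: would be skipped at i == 0
  { input: 'The quick brown fox jumps over the lazy dog', result: 'Fox', data: { foo: 'bar' }, category: 'animals' },
  { input: 'Hello', result: 'Greeting' },
]

// Drop invalid items up front so the first and last entries are always valid.
const normalized = batch.map(normalizeBatchItem).filter(Boolean)
console.log(normalized)
```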
@@ -135,31 +133,38 @@ const sanitizeString = (str)=>{
  return str.replace(/[\/#$%\^&\*{}=_`~()\"]/g," ").replace(/\s{2,}/g, " ")
  }

- export const remove = async (id) => {
+ export const remove = async (id, category="") => {
  if(!id || typeof id != 'string') return
  id.replace(/[^a-zA-Z0-9]/g, '')
  if(!id) return
  let results = await printQuery(
- `?[id] <- [['${id}']]
+ `?[id, category] <- [['${id}', '${category}']]
  ::remove embeddings {id}`)
  return results
  }

- export const searchText = async (text, numResults = 5) => {
+ export const searchText = async (text, category="", numResults = 5) => {
  const embedding = await getEmbeddings(text)
- let results = await printQuery(`?[dist, result, id, data] := ~embeddings:index_name{ id, v, input, result, data |
+ let results = await printQuery(`?[dist, result, id, data, category] := ~embeddings:index_name { id, v, input, result, data, category |
  query: q,
  k: ${numResults}, # number of results
- ef: 90, # number of neighbours to consider
+ ef: 50, # number of neighbours to consider
  bind_distance: dist,
+ filter: category==${JSON.stringify(category)},
  radius: 10.0
  }, q = vec(${JSON.stringify(embedding)})
- :sort dist`)
+ :sort -dist`)
  return results
  }

- export const vectorSearch = async (query, numResults=5) => {
- return await searchText(query, numResults)
+ export const vectorSearch = async (query, category='', numResults=5) => {
+ let result = undefined
+ try{
+ result = await searchText(query, category, numResults)
+ }catch(err){
+ if(config.SHOW_ERRORS) console.error(err.display || err.message)
+ }
+ return result
  }

  const cmdArgs = (list = []) => {
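
Review note: in `remove`, the return value of `id.replace(/[^a-zA-Z0-9]/g, '')` is discarded (JavaScript strings are immutable), so the id reaches the query unsanitized, and both `id` and `category` are interpolated straight into the script. Since `printQuery` already forwards a `params` object to `run`, one option is to pass the values as named parameters instead. The sketch below only builds the script and parameters; `buildRemoveQuery` is a hypothetical helper, and the `$name` parameter syntax and the `:rm` row-removal operation are assumptions about CozoScript that should be checked against the CozoDB documentation.

```javascript
// Hypothetical helper: returns the script and parameters for removing one row,
// or null when the id is unusable.
function buildRemoveQuery(id, category = '') {
  if (typeof id !== 'string') return null
  const cleanId = id.replace(/[^a-zA-Z0-9]/g, '') // keep the sanitized value
  if (!cleanId) return null
  return {
    // Assumed CozoScript: $id and $category are bound from params,
    // and :rm removes rows matching the given key columns.
    script: '?[id, category] <- [[$id, $category]]\n:rm embeddings {id, category}',
    params: { id: cleanId, category: String(category) },
  }
}

console.log(buildRemoveQuery('08840189191373282', 'animals'))
console.log(buildRemoveQuery('!!!'))
```

The result could then be passed on as `printQuery(q.script, q.params)`.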
@@ -274,19 +279,6 @@ export const importFromCSVorTSV = async (fileName, inputHeader, resultHeader) =>

  let results = await fetchFromFile(fileName)

- // // split results to sentences
- // let results_raw = await fetchFromFile(fileName)
- // let results = []
- // for(let i=0;i<results_raw.length; i++){
- // let sentences = splitSentences(results_raw[i].input)
- // for(let j=0; j<sentences.length; j++){
- // results.push({
- // ...results_raw[i],
- // ...{ input: sentences[j] }
- // })
- // }
- // }
-
  let batchSize = 40, batch = [], currentBatch = 0, totalBatches = Math.ceil(results.length / batchSize), dataHeaders = Object.keys(results[results.length-1]).filter(k => k != 'input' && k != 'result'), data
  for(let i=0; i<results.length; i++){
  if(i % batchSize === 0){
@@ -431,18 +423,22 @@ const splitSentences = (text) => {
  }

  const runCLI = async () => {
- let args = cmdArgs(['--query', '-q', '--add', '--db', '--import', '--json', '--mcp', '--nuke', '--input-header', '--result-header', '--test', '--limit'])
+ let args = cmdArgs(['--query', '-q', '--add', '--db', '--import', '--json', '--mcp', '--nuke', '--input-header', '--result-header', '--test', '--limit', '--category'])
  let query = args['--query'] || args['-q']
  if(args['--db']){
  config.DB_FILE = args['--db']
  }
+ let category = ''
+ if(args['--category']) {
+ category = args['--category']
+ }
  if(query){
  let numResults = 5
  if(args['--limit'] && parseInt(args['--limit'])) {
  numResults = parseInt(args['--limit'])
  }
  console.time('Search time')
- let result = await vectorSearch(query, category, numResults)
+ let result = await vectorSearch(query, category, numResults)
  console.timeEnd('Search time')
  console.log('Results:')
  console.log(JSON.stringify(result, null, 2))
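
Review note: the option handling in this hunk reduces to two defaults, an empty category and a limit of 5 when `--limit` is missing or does not parse to a usable number. A compact sketch of that resolution follows; `resolveSearchOptions` is a hypothetical name, and `args` is assumed to be a plain object keyed by flag name, as the `args['--limit']` lookups suggest. Unlike the code in the diff, the sketch also rejects negative limits.

```javascript
// Hypothetical helper: resolves category and result limit from parsed CLI args.
function resolveSearchOptions(args) {
  const category = args['--category'] ? String(args['--category']) : ''
  const parsed = parseInt(args['--limit'], 10)
  // Fall back to 5 when --limit is absent, not numeric, zero, or negative.
  const numResults = Number.isInteger(parsed) && parsed > 0 ? parsed : 5
  return { category, numResults }
}

console.log(resolveSearchOptions({ '--limit': '2', '--category': 'animals' }))
console.log(resolveSearchOptions({ '--limit': 'abc' }))
```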
@@ -451,17 +447,17 @@ const runCLI = async () => {
  if(!input || !result) {
  console.log('Usage:')
  return console.log(args._cmd + `--add 'input|result|{"foo":"bar"}'`)
- }
+ }
  let data = {}
  if(dataString) {
  try {data = JSON.parse(dataString)}catch(err) {}
  }
- let resp = await add(input, result, data)
+ let resp = await add(input, result, data, category)
  console.log(JSON.stringify(resp, null, 2))
  }else if(args['--remove']){
  let id = args['--remove']
  if(!id) return console.log('Please specify ID to remove')
- let resp = await remove(id)
+ let resp = await remove(id, category)
  console.log(JSON.stringify(resp, null, 2))
  }else if(args['--nuke'] != undefined){
  nuke()
@@ -481,17 +477,18 @@ const runCLI = async () => {
  console.log('Usage:')
  console.log(args._cmd + ' --query "Foo Bar"')
  console.log("\n" + 'Options:')
- console.log('--query "SEARCH_STRING" - search')
- console.log('--limit 2 - limit number of results (used with --query)')
- console.log(`--add 'input|result|{"foo":"bar"}' - add data`)
- console.log(`--remove 'id' - remove data`)
- console.log(`--nuke - destroy database`)
- console.log(`--mcp - run as MCP server (experimental)`)
- console.log(`--db "FILE_NAME" - database file (SQLite)`)
- console.log(`--import "file.csv | file.tsv" - import from CSV or TSV w/ columns: 1. input 2. result 3. and remaining columns are additional data`)
- console.log('--input-header "foo" - when used with --import designates specific header column as input')
- console.log('--result-header "bar" - when used with --import designates specific header column as result')
- console.log(`--json "FILE_NAME" - import from file which has one json object per line: {input:"", result:"", data:{}}`)
+ console.log('--query "SEARCH_STRING" - search')
+ console.log('--limit 2 - limit number of results (used with --query)')
+ console.log(`--add 'input|result|{"foo":"bar"}|categ' - add data`)
+ console.log(`--remove 'id' - remove data`)
+ console.log(`--nuke - destroy database`)
+ console.log(`--mcp - run as MCP server (experimental)`)
+ console.log(`--db "FILE_NAME" - database file (SQLite)`)
+ console.log(`--import "file.csv | file.tsv" - import from CSV or TSV w/ columns: 1. input 2. result 3. and remaining columns are additional data`)
+ console.log('--input-header "foo" - when used with --import designates specific header column as input')
+ console.log('--result-header "bar" - when used with --import designates specific header column as result')
+ console.log(`--json "FILE_NAME" - import from file which has one json object per line: {input:"", result:"", data:{}}`)
+ console.log(`--category "CATEGORY" - specify category when adding data and to filter by when querying (defaults to empty string)`)
  }
  }
