fact_db 0.0.1
- checksums.yaml +7 -0
- data/.envrc +1 -0
- data/CHANGELOG.md +48 -0
- data/COMMITS.md +196 -0
- data/README.md +102 -0
- data/Rakefile +41 -0
- data/db/migrate/001_enable_extensions.rb +7 -0
- data/db/migrate/002_create_contents.rb +44 -0
- data/db/migrate/003_create_entities.rb +36 -0
- data/db/migrate/004_create_entity_aliases.rb +18 -0
- data/db/migrate/005_create_facts.rb +65 -0
- data/db/migrate/006_create_entity_mentions.rb +18 -0
- data/db/migrate/007_create_fact_sources.rb +18 -0
- data/docs/api/extractors/index.md +71 -0
- data/docs/api/extractors/llm.md +162 -0
- data/docs/api/extractors/manual.md +92 -0
- data/docs/api/extractors/rule-based.md +165 -0
- data/docs/api/facts.md +300 -0
- data/docs/api/index.md +66 -0
- data/docs/api/models/content.md +165 -0
- data/docs/api/models/entity.md +202 -0
- data/docs/api/models/fact.md +270 -0
- data/docs/api/models/index.md +77 -0
- data/docs/api/pipeline/extraction.md +175 -0
- data/docs/api/pipeline/index.md +72 -0
- data/docs/api/pipeline/resolution.md +209 -0
- data/docs/api/services/content-service.md +166 -0
- data/docs/api/services/entity-service.md +202 -0
- data/docs/api/services/fact-service.md +223 -0
- data/docs/api/services/index.md +55 -0
- data/docs/architecture/database-schema.md +293 -0
- data/docs/architecture/entity-resolution.md +293 -0
- data/docs/architecture/index.md +149 -0
- data/docs/architecture/temporal-facts.md +268 -0
- data/docs/architecture/three-layer-model.md +242 -0
- data/docs/assets/css/custom.css +137 -0
- data/docs/assets/fact_db.jpg +0 -0
- data/docs/assets/images/fact_db.jpg +0 -0
- data/docs/concepts.md +183 -0
- data/docs/examples/basic-usage.md +235 -0
- data/docs/examples/hr-onboarding.md +312 -0
- data/docs/examples/index.md +64 -0
- data/docs/examples/news-analysis.md +288 -0
- data/docs/getting-started/database-setup.md +170 -0
- data/docs/getting-started/index.md +71 -0
- data/docs/getting-started/installation.md +98 -0
- data/docs/getting-started/quick-start.md +191 -0
- data/docs/guides/batch-processing.md +325 -0
- data/docs/guides/configuration.md +243 -0
- data/docs/guides/entity-management.md +364 -0
- data/docs/guides/extracting-facts.md +299 -0
- data/docs/guides/index.md +22 -0
- data/docs/guides/ingesting-content.md +252 -0
- data/docs/guides/llm-integration.md +299 -0
- data/docs/guides/temporal-queries.md +315 -0
- data/docs/index.md +121 -0
- data/examples/README.md +130 -0
- data/examples/basic_usage.rb +164 -0
- data/examples/entity_management.rb +216 -0
- data/examples/hr_system.rb +428 -0
- data/examples/rule_based_extraction.rb +258 -0
- data/examples/temporal_queries.rb +245 -0
- data/lib/fact_db/config.rb +71 -0
- data/lib/fact_db/database.rb +45 -0
- data/lib/fact_db/errors.rb +10 -0
- data/lib/fact_db/extractors/base.rb +117 -0
- data/lib/fact_db/extractors/llm_extractor.rb +179 -0
- data/lib/fact_db/extractors/manual_extractor.rb +53 -0
- data/lib/fact_db/extractors/rule_based_extractor.rb +228 -0
- data/lib/fact_db/llm/adapter.rb +109 -0
- data/lib/fact_db/models/content.rb +62 -0
- data/lib/fact_db/models/entity.rb +84 -0
- data/lib/fact_db/models/entity_alias.rb +26 -0
- data/lib/fact_db/models/entity_mention.rb +33 -0
- data/lib/fact_db/models/fact.rb +192 -0
- data/lib/fact_db/models/fact_source.rb +35 -0
- data/lib/fact_db/pipeline/extraction_pipeline.rb +146 -0
- data/lib/fact_db/pipeline/resolution_pipeline.rb +129 -0
- data/lib/fact_db/resolution/entity_resolver.rb +261 -0
- data/lib/fact_db/resolution/fact_resolver.rb +259 -0
- data/lib/fact_db/services/content_service.rb +93 -0
- data/lib/fact_db/services/entity_service.rb +150 -0
- data/lib/fact_db/services/fact_service.rb +193 -0
- data/lib/fact_db/temporal/query.rb +125 -0
- data/lib/fact_db/temporal/timeline.rb +134 -0
- data/lib/fact_db/version.rb +5 -0
- data/lib/fact_db.rb +141 -0
- data/mkdocs.yml +198 -0
- metadata +288 -0
@@ -0,0 +1,315 @@
# Temporal Queries

FactDb's temporal query system lets you retrieve facts across time - what's true now, what was true then, and how things changed.

## Current Facts

Get facts that are valid right now:

```ruby
facts = FactDb.new

# All currently valid canonical facts
current = facts.query_facts(status: :canonical)

# Current facts about a specific entity
current_about_paula = facts.current_facts_for(paula.id)

# Current facts on a topic
engineering_facts = facts.query_facts(topic: "engineering", status: :canonical)
```

## Point-in-Time Queries

What was true at a specific moment:

```ruby
# What was true about Paula on June 15, 2023?
historical = facts.facts_at(
  Date.parse("2023-06-15"),
  entity: paula.id
)

# What was true about Microsoft on Jan 1, 2024?
microsoft_facts = facts.facts_at(
  Date.parse("2024-01-01"),
  entity: microsoft.id
)
```
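
The rule behind a point-in-time lookup can be sketched in plain Ruby. This is an illustration only: `SketchFact`, `valid_on?`, and `facts_on` are hypothetical names, and the half-open interval (a fact is no longer valid on its `invalid_at` date) is an assumption rather than FactDb's confirmed behavior.

```ruby
require "date"

# Hypothetical stand-in for a fact record; not FactDb's model class.
SketchFact = Struct.new(:fact_text, :valid_at, :invalid_at)

# Assumption: a fact holds from valid_at up to, but not including, invalid_at.
# A nil invalid_at means the fact is still current.
def valid_on?(fact, date)
  fact.valid_at <= date && (fact.invalid_at.nil? || date < fact.invalid_at)
end

# Selects every fact that was valid on the given date.
def facts_on(facts, date)
  facts.select { |fact| valid_on?(fact, date) }
end

SKETCH_FACTS = [
  SketchFact.new("Paula is Senior Engineer", Date.new(2023, 1, 10), Date.new(2024, 1, 10)),
  SketchFact.new("Paula is Principal Engineer", Date.new(2024, 1, 10), nil)
].freeze

puts facts_on(SKETCH_FACTS, Date.new(2023, 6, 15)).map(&:fact_text)
```

Under this assumption, exactly one of two back-to-back facts is valid on the handover date, so a point-in-time query never returns both the old and the new title.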

## Time Range Queries

Facts active during a period:

```ruby
# Facts valid during Q4 2023
q4_facts = facts.fact_service.query(
  from: Date.parse("2023-10-01"),
  to: Date.parse("2023-12-31")
)

# Paula's employment history for 2023
paula_2023 = facts.fact_service.query(
  entity: paula.id,
  from: Date.parse("2023-01-01"),
  to: Date.parse("2023-12-31")
)
```
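
A range query boils down to an interval-overlap test. The sketch below shows one way to express it in plain Ruby. `RangeFact` and `active_during?` are hypothetical names, and the half-open validity interval is an assumption, not necessarily what FactDb's SQL does.

```ruby
require "date"

# Hypothetical stand-in for a fact record; not FactDb's model class.
RangeFact = Struct.new(:fact_text, :valid_at, :invalid_at)

# Assumption: a fact is active during a window when its validity interval
# [valid_at, invalid_at) overlaps the window [from, to] at any point.
# A nil invalid_at means the fact is still valid.
def active_during?(fact, from, to)
  starts_before_window_ends = fact.valid_at <= to
  ends_after_window_starts = fact.invalid_at.nil? || fact.invalid_at > from
  starts_before_window_ends && ends_after_window_starts
end

RANGE_FACTS = [
  RangeFact.new("Paula is Software Engineer", Date.new(2022, 3, 15), Date.new(2023, 1, 9)),
  RangeFact.new("Paula is Senior Engineer", Date.new(2023, 1, 10), Date.new(2024, 1, 9)),
  RangeFact.new("Paula is Principal Engineer", Date.new(2024, 1, 10), nil)
].freeze

q4_2023 = RANGE_FACTS.select do |fact|
  active_during?(fact, Date.new(2023, 10, 1), Date.new(2023, 12, 31))
end

puts q4_2023.map(&:fact_text)
```

A fact does not need to start or end inside the window to match; it only needs to be valid at some point within it.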

## Timelines

Build complete timelines for entities:

```ruby
# Full timeline
timeline = facts.timeline_for(paula.id)

timeline.each do |fact|
  range = fact.invalid_at ? "#{fact.valid_at} - #{fact.invalid_at}" : "#{fact.valid_at} - present"
  puts "#{range}: #{fact.fact_text}"
end

# Timeline for specific period
timeline = facts.timeline_for(
  paula.id,
  from: Date.parse("2023-01-01"),
  to: Date.parse("2024-12-31")
)
```

### Timeline Output Example

```
2022-03-15 - 2023-01-09: Paula Chen is Software Engineer at Company
2023-01-10 - 2024-01-09: Paula Chen is Senior Engineer at Company
2024-01-10 - present: Paula Chen is Principal Engineer at Microsoft
```
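
To try the formatting logic without a database, the sketch below renders the same kind of lines from in-memory records. `TimelineFact`, `timeline_line`, and `render_timeline` are hypothetical helpers, and the chronological sort is an assumption about how a timeline would be ordered.

```ruby
require "date"

# Hypothetical stand-in for a fact record; not FactDb's model class.
TimelineFact = Struct.new(:fact_text, :valid_at, :invalid_at)

# Formats one fact as "start - end: text", using "present" for open-ended facts.
def timeline_line(fact)
  ending = fact.invalid_at ? fact.invalid_at.to_s : "present"
  "#{fact.valid_at} - #{ending}: #{fact.fact_text}"
end

# Orders facts chronologically (assumed ordering) and renders one line per fact.
def render_timeline(facts)
  facts.sort_by(&:valid_at).map { |fact| timeline_line(fact) }
end

unsorted = [
  TimelineFact.new("Paula Chen is Principal Engineer at Microsoft", Date.new(2024, 1, 10), nil),
  TimelineFact.new("Paula Chen is Software Engineer at Company", Date.new(2022, 3, 15), Date.new(2023, 1, 9))
]

puts render_timeline(unsorted)
```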

## Filtering by Status

Query facts by their status:

```ruby
# Only canonical (current authoritative) facts
canonical = facts.query_facts(status: :canonical)

# Only corroborated (confirmed by multiple sources) facts
corroborated = facts.query_facts(status: :corroborated)

# Include both canonical and corroborated
trusted = facts.query_facts(status: [:canonical, :corroborated])

# Superseded facts (historical)
superseded = facts.query_facts(status: :superseded)

# Synthesized facts (derived)
synthesized = facts.query_facts(status: :synthesized)
```

## Topic Search

Search facts by text content:

```ruby
# Full-text search
engineering_facts = facts.query_facts(topic: "engineering")

# Combined with entity filter
paula_engineering = facts.query_facts(
  entity: paula.id,
  topic: "promotion"
)

# Combined with time filter
recent_engineering = facts.query_facts(
  topic: "engineering",
  at: Date.today
)
```

## Advanced Queries

### Using Scopes

```ruby
# Direct ActiveRecord queries on the Fact model.
# The result gets its own name so it does not overwrite the `facts` instance.
engineer_facts = FactDb::Models::Fact
  .canonical
  .currently_valid
  .mentioning_entity(paula.id)
  .search_text("engineer")
  .order(valid_at: :desc)
```

### Available Scopes

| Scope | Description |
|-------|-------------|
| `canonical` | Status is 'canonical' |
| `currently_valid` | `invalid_at` is nil |
| `valid_at(date)` | Valid at specific date |
| `valid_during(from, to)` | Valid during range |
| `mentioning_entity(id)` | Mentions specific entity |
| `search_text(query)` | Full-text search |
| `by_extraction_method(method)` | Filter by extractor |
| `high_confidence` | Confidence > 0.8 |

### Combining Scopes

```ruby
# High-confidence facts about Paula currently valid
trusted_paula_facts = FactDb::Models::Fact
  .mentioning_entity(paula.id)
  .canonical
  .currently_valid
  .high_confidence

# LLM-extracted facts from last month
recent_llm_facts = FactDb::Models::Fact
  .by_extraction_method('llm')
  .where('created_at > ?', 1.month.ago)
```

## Semantic Search

Search by meaning using embeddings:

```ruby
# Find facts semantically similar to a query
similar_facts = facts.fact_service.semantic_search(
  "Paula's career progression",
  limit: 10
)

# Combined with entity filter
similar_about_paula = facts.fact_service.semantic_search(
  "job title changes",
  entity: paula.id,
  limit: 5
)
```
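
Embedding search ranks stored vectors by how close they are to the query vector. The sketch below shows the idea in memory using cosine similarity. It is not FactDb's implementation: `cosine_similarity` and `rank_by_similarity` are hypothetical helpers, the three-dimensional vectors are invented, and a real deployment would compute similarity inside PostgreSQL with embeddings from a model.

```ruby
# Cosine similarity between two equal-length numeric vectors.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |y| y * y })
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

# Returns the `limit` texts whose embeddings are closest to the query embedding.
def rank_by_similarity(query_embedding, embedded_facts, limit:)
  embedded_facts
    .sort_by { |_text, embedding| -cosine_similarity(query_embedding, embedding) }
    .first(limit)
    .map(&:first)
end

# Toy 3-dimensional embeddings, invented for illustration only.
TOY_EMBEDDINGS = {
  "Paula was promoted to Senior Engineer" => [0.9, 0.1, 0.0],
  "Microsoft opened a new office" => [0.0, 0.2, 0.9],
  "Paula became Principal Engineer" => [0.8, 0.3, 0.1]
}.freeze

puts rank_by_similarity([1.0, 0.0, 0.0], TOY_EMBEDDINGS, limit: 2)
```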

## Query Results

### Fact Attributes

```ruby
fact = facts.query_facts(entity: paula.id).first

fact.fact_text          # The assertion text
fact.valid_at           # When it became true
fact.invalid_at         # When it stopped (nil if current)
fact.status             # canonical, superseded, etc.
fact.confidence         # 0.0 to 1.0
fact.extraction_method  # manual, llm, rule_based
fact.metadata           # Additional data
```

### Related Data

```ruby
# Entity mentions
fact.entity_mentions.each do |mention|
  puts "#{mention.entity.canonical_name} (#{mention.mention_role})"
end

# Source content
fact.fact_sources.each do |source|
  puts "Source: #{source.content.title}"
  puts "Excerpt: #{source.excerpt}"
end

# Superseding fact
if fact.superseded?
  new_fact = fact.superseded_by
  puts "Superseded by: #{new_fact.fact_text}"
end

# Source facts (for synthesized)
if fact.synthesized?
  fact.derived_from_ids.each do |id|
    source = FactDb::Models::Fact.find(id)
    puts "Derived from: #{source.fact_text}"
  end
end
```

## Performance Tips

### Use Indexes

The temporal indexes are optimized for point-in-time lookups and for entity-plus-status filters:

```ruby
# These queries are fast
facts.facts_at(Date.today)
facts.query_facts(entity: id, status: :canonical)
```

### Limit Results

```ruby
# Always limit when possible
queried = facts.fact_service.query(
  entity: paula.id,
  limit: 100
)
```

### Eager Load Associations

```ruby
# Preload associations to avoid one extra query per fact
paula_facts = FactDb::Models::Fact
  .includes(:entity_mentions, :fact_sources)
  .mentioning_entity(paula.id)
```

### Use Count for Totals

```ruby
# Don't load all records just to count
total = FactDb::Models::Fact.canonical.currently_valid.count
```

## Common Patterns

### Before/After Comparison

```ruby
# What changed for Paula?
before = facts.facts_at(Date.parse("2023-12-31"), entity: paula.id)
after = facts.facts_at(Date.parse("2024-01-31"), entity: paula.id)

# Find differences
new_facts = after - before
```
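
The same set difference also works in the other direction to find facts that stopped holding. The sketch below uses plain strings in place of fact records. `snapshot_diff` is a hypothetical helper, not part of FactDb.

```ruby
# Compares two snapshots and reports what appeared and what disappeared.
# Works on any objects with consistent equality (plain strings here).
def snapshot_diff(before, after)
  { added: after - before, removed: before - after }
end

before_snapshot = ["Paula is Senior Engineer at Company", "Paula reports to Engineering"]
after_snapshot = ["Paula is Principal Engineer at Microsoft", "Paula reports to Engineering"]

changes = snapshot_diff(before_snapshot, after_snapshot)
changes[:added].each { |text| puts "+ #{text}" }
changes[:removed].each { |text| puts "- #{text}" }
```

Facts present in both snapshots appear in neither list, so the result contains only what changed between the two dates.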

### Audit Trail

```ruby
# Get complete history of a topic
all_facts = FactDb::Models::Fact
  .mentioning_entity(paula.id)
  .search_text("title")
  .order(valid_at: :asc)

all_facts.each do |fact|
  status_info = fact.superseded? ? "(superseded)" : "(current)"
  puts "#{fact.valid_at}: #{fact.fact_text} #{status_info}"
end
```

### Change Detection

```ruby
# Find facts that changed recently
recently_superseded = FactDb::Models::Fact
  .where(status: 'superseded')
  .where('invalid_at > ?', 1.week.ago)
  .includes(:superseded_by)

recently_superseded.each do |old_fact|
  puts "Changed: #{old_fact.fact_text}"
  puts "To: #{old_fact.superseded_by.fact_text}"
end
```

data/docs/index.md
ADDED
@@ -0,0 +1,121 @@
# FactDb

> [!CAUTION]
> This gem is under active development. APIs and features may change without notice. See the [CHANGELOG](https://github.com/MadBomber/fact_db/blob/main/CHANGELOG.md) for details.

<table>
<tr>
<td width="50%" align="center" valign="top">
<img src="assets/images/fact_db.jpg" alt="FactDb"><br>
<em>"Do you swear to add the facts and only the facts?"</em>
</td>
<td width="50%" valign="top">
FactDb implements the Event Clock concept - a powerful approach to capturing organizational knowledge through temporal facts. Every fact has explicit validity periods (<code>valid_at</code>/<code>invalid_at</code>) so you always know when information became true and when it changed.<br><br>
The system resolves entity mentions to canonical identities, supports aliases and fuzzy matching, and maintains complete audit trails linking every fact back to its source content. Whether you're tracking employee roles, organizational changes, or any evolving information, FactDb gives you a queryable timeline of truth.
</td>
</tr>
</table>

## Key Features

- **Temporal Facts** - Track facts with validity periods, knowing when information became true and when it changed
- **Entity Resolution** - Resolve mentions to canonical entities with alias support and fuzzy matching
- **Audit Trails** - Every fact links back to source content for full provenance
- **Multiple Extractors** - Extract facts manually, via LLM, or with rule-based patterns
- **Semantic Search** - Built on PostgreSQL with pgvector for vector similarity search
- **Concurrent Processing** - Batch process content with parallel pipelines

## Quick Example

```ruby
require 'fact_db'

# Initialize the facts instance
facts = FactDb.new

# Ingest content
content = facts.ingest(
  "Paula Chen joined Microsoft as Principal Engineer on January 10, 2024.",
  type: :email,
  captured_at: Time.current
)

# Extract facts using LLM
extracted = facts.extract_facts(content.id, extractor: :llm)

# Query current facts about Paula
# (`paula` is assumed to be an existing entity record for Paula Chen)
current_facts = facts.query_facts(entity: paula.id)

# Query facts valid at a specific point in time
historical_facts = facts.facts_at(Date.parse("2023-06-15"), entity: paula.id)

# Build a timeline
timeline = facts.timeline_for(paula.id)
```

## The Event Clock Concept

The Event Clock model organizes information into three layers:

```mermaid
graph TB
    subgraph Content["Content Layer (Immutable)"]
        C1[Email]
        C2[Document]
        C3[News Article]
    end

    subgraph Entities["Entity Layer"]
        E1[Paula Chen]
        E2[Microsoft]
    end

    subgraph Facts["Fact Layer (Temporal)"]
        F1["Paula is Principal Engineer<br/>valid: 2024-01-10 to present"]
        F2["Paula works at Microsoft<br/>valid: 2024-01-10 to present"]
    end

    C1 --> F1
    C1 --> F2
    F1 --> E1
    F2 --> E1
    F2 --> E2

    style C1 fill:#1E40AF,stroke:#1E3A8A,color:#FFFFFF
    style C2 fill:#1E40AF,stroke:#1E3A8A,color:#FFFFFF
    style C3 fill:#1E40AF,stroke:#1E3A8A,color:#FFFFFF
    style E1 fill:#047857,stroke:#065F46,color:#FFFFFF
    style E2 fill:#047857,stroke:#065F46,color:#FFFFFF
    style F1 fill:#B91C1C,stroke:#991B1B,color:#FFFFFF
    style F2 fill:#B91C1C,stroke:#991B1B,color:#FFFFFF
```

1. **Content** - Immutable source documents that serve as evidence
2. **Entities** - Resolved identities (people, organizations, places)
3. **Facts** - Temporal assertions with validity periods
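
The three layers can be pictured as plain Ruby objects that reference each other by id. This is only a rough sketch: `LayerContent`, `LayerEntity`, `LayerFact`, and the two lookup helpers are hypothetical names, and FactDb itself stores these records in PostgreSQL rather than in memory.

```ruby
require "date"

# Hypothetical plain-Ruby stand-ins for the three layers; not FactDb's classes.
LayerContent = Struct.new(:id, :text)            # source evidence
LayerEntity  = Struct.new(:id, :canonical_name)  # resolved identity
LayerFact    = Struct.new(:text, :valid_at, :invalid_at, :entity_ids, :content_ids)

# Entities a fact mentions (fact layer -> entity layer).
def entities_for(fact, entities)
  entities.select { |entity| fact.entity_ids.include?(entity.id) }
end

# Source documents backing a fact (fact layer -> content layer).
def sources_for(fact, contents)
  contents.select { |content| fact.content_ids.include?(content.id) }
end

# Freezing the content record mirrors the "immutable" content layer.
email = LayerContent.new(1, "Paula Chen joined Microsoft as Principal Engineer on January 10, 2024.").freeze
paula = LayerEntity.new(1, "Paula Chen")
microsoft = LayerEntity.new(2, "Microsoft")

fact = LayerFact.new("Paula works at Microsoft", Date.new(2024, 1, 10), nil,
                     [paula.id, microsoft.id], [email.id])

puts entities_for(fact, [paula, microsoft]).map(&:canonical_name).join(", ")
puts sources_for(fact, [email]).map(&:text)
```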

## Installation

Add to your Gemfile:

```ruby
gem 'fact_db'
```

Then run:

```bash
bundle install
```

See the [Installation Guide](getting-started/installation.md) for detailed setup instructions.

## Requirements

- Ruby >= 3.0
- PostgreSQL with pgvector extension
- Optional: ruby_llm gem for LLM-powered extraction

## License

MIT License - Copyright (c) 2025 Dewayne VanHoozer

data/examples/README.md
ADDED
@@ -0,0 +1,130 @@
# FactDb Examples

This directory contains demonstration programs showcasing the capabilities of the FactDb gem.

## Prerequisites

1. PostgreSQL database with pgvector extension
2. Set the `DATABASE_URL` environment variable or use the default `postgres://localhost/fact_db_demo`
3. Run migrations to set up the schema

```bash
bundle install
DATABASE_URL=postgres://localhost/fact_db_demo rake db:migrate
```

## Examples

### basic_usage.rb

**Foundational introduction to FactDb**

Demonstrates:
- Configuring FactDb
- Ingesting content (emails, documents)
- Creating entities (people, organizations)
- Creating facts with entity mentions
- Basic fact queries
- Getting system statistics

```bash
ruby examples/basic_usage.rb
```

### entity_management.rb

**Deep dive into entity operations**

Demonstrates:
- Creating entities with multiple types (person, organization, place)
- Managing aliases (names, emails, abbreviations)
- Entity resolution using fuzzy matching
- Merging duplicate entities
- Searching entities by name and type
- Building entity timelines

```bash
ruby examples/entity_management.rb
```

### temporal_queries.rb

**Working with time-based data**

Demonstrates:
- Creating facts with temporal bounds (valid_at, invalid_at)
- Point-in-time queries ("What was true on date X?")
- Distinguishing current vs historical facts
- Superseding facts (replacing old information with new)
- Building temporal timelines
- Computing diffs between time periods
- Querying facts by entity role

```bash
ruby examples/temporal_queries.rb
```

### rule_based_extraction.rb

**Automatic fact extraction from text**

Demonstrates:
- Using the rule-based extractor
- Pattern detection for employment, relationships, locations
- Processing extraction results
- Saving extracted facts to the database
- Entity auto-creation from extracted text
- Testing individual extraction patterns

```bash
ruby examples/rule_based_extraction.rb
```

### hr_system.rb

**Practical HR knowledge management system**

A comprehensive real-world example demonstrating:
- Organizational hierarchy (company, departments, locations)
- Employee profile management
- Recording employment history
- Processing promotions with fact supersession
- Recording employee transfers
- Historical queries ("What was X's role in 2024?")
- Organization chart queries
- Complete audit trails with source documents
- HR statistics and reporting

```bash
ruby examples/hr_system.rb
```

## Key Concepts

### The Event Clock Pattern

FactDb implements the Event Clock concept where:
- Every fact has a `valid_at` timestamp (when it became true)
- Facts may have an `invalid_at` timestamp (when they stopped being true)
- This enables temporal queries at any point in time

### Entity Resolution

Entities can have multiple aliases and are resolved using fuzzy matching:
- "Bob Johnson" and "Robert Johnson" can resolve to the same entity
- Duplicates can be merged while preserving audit history
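
As a rough illustration of alias-plus-fuzzy resolution, the sketch below checks aliases first and then falls back to an edit-distance similarity score. It does not reproduce FactDb's resolver: `levenshtein`, `name_similarity`, `resolve_mention`, the entity table, and the 0.85 threshold are all assumptions made for this example.

```ruby
# Classic dynamic-programming edit distance between two strings.
def levenshtein(a, b)
  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost].min
    end
    previous = current
  end
  previous.last
end

# Similarity in 0.0..1.0, where 1.0 means identical after normalization.
def name_similarity(a, b)
  a = a.downcase.strip
  b = b.downcase.strip
  longest = [a.length, b.length].max
  return 1.0 if longest.zero?

  1.0 - levenshtein(a, b).to_f / longest
end

# Resolves a mention: exact alias match first, then fuzzy match above a threshold.
# `entities` maps each canonical name to its list of aliases.
def resolve_mention(mention, entities, threshold: 0.85)
  normalized = mention.downcase.strip
  entities.each do |canonical, aliases|
    return canonical if ([canonical] + aliases).any? { |name| name.downcase == normalized }
  end

  best, score = entities.keys
    .map { |canonical| [canonical, name_similarity(mention, canonical)] }
    .max_by { |_canonical, similarity| similarity }
  score && score >= threshold ? best : nil
end

# Invented entity table for illustration.
SKETCH_ENTITIES = {
  "Robert Johnson" => ["Bob Johnson", "R. Johnson"],
  "Paula Chen" => ["P. Chen"]
}.freeze

puts resolve_mention("Bob Johnson", SKETCH_ENTITIES)    # resolved through an alias
puts resolve_mention("Robert Jonson", SKETCH_ENTITIES)  # resolved through fuzzy matching
```

Nicknames such as "Bob" for "Robert" are too far apart for edit distance alone, which is why the sketch consults the alias list before attempting a fuzzy match.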

### Fact Lifecycle

Facts progress through states:
- `canonical` - Currently accepted as true
- `superseded` - Replaced by a newer fact
- `corroborated` - Supported by other evidence
- `synthesized` - Derived from multiple sources
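
The move from `canonical` to `superseded` can be sketched with in-memory records. `LifecycleFact` and `supersede` are hypothetical names used only for this illustration, and ending the old fact's validity at the moment the new one begins is an assumption about how supersession is recorded.

```ruby
require "date"

# Hypothetical in-memory fact; not FactDb's model class.
LifecycleFact = Struct.new(:fact_text, :status, :valid_at, :invalid_at, :superseded_by)

# Marks `old_fact` as superseded and returns the canonical replacement.
# Assumption: the old fact's validity ends when the new one begins.
def supersede(old_fact, new_text, at:)
  replacement = LifecycleFact.new(new_text, :canonical, at, nil, nil)
  old_fact.status = :superseded
  old_fact.invalid_at = at
  old_fact.superseded_by = replacement
  replacement
end

senior = LifecycleFact.new("Paula Chen is Senior Engineer", :canonical, Date.new(2023, 1, 10), nil, nil)
principal = supersede(senior, "Paula Chen is Principal Engineer", at: Date.new(2024, 1, 10))

puts "#{senior.fact_text} (#{senior.status}) -> #{principal.fact_text} (#{principal.status})"
```

The old record is kept and linked to its replacement rather than deleted, which is what makes historical queries and audit trails possible.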

### Source Tracking

All facts link back to source content:
- Primary sources (direct evidence)
- Supporting sources (additional evidence)
- Corroborating sources (independent confirmation)