github-dkg 0.1.0__tar.gz
- github_dkg-0.1.0/.gitignore +5 -0
- github_dkg-0.1.0/DESIGN_BRIEF.md +192 -0
- github_dkg-0.1.0/Dockerfile +12 -0
- github_dkg-0.1.0/LICENSE +21 -0
- github_dkg-0.1.0/PKG-INFO +143 -0
- github_dkg-0.1.0/README.md +127 -0
- github_dkg-0.1.0/action.yml +57 -0
- github_dkg-0.1.0/demo.ipynb +419 -0
- github_dkg-0.1.0/entrypoint.py +95 -0
- github_dkg-0.1.0/examples/demo_video.py +217 -0
- github_dkg-0.1.0/examples/workflow.yml +50 -0
- github_dkg-0.1.0/pyproject.toml +42 -0
- github_dkg-0.1.0/src/github_dkg/__init__.py +8 -0
- github_dkg-0.1.0/src/github_dkg/cli.py +187 -0
- github_dkg-0.1.0/src/github_dkg/client.py +139 -0
- github_dkg-0.1.0/src/github_dkg/formatter.py +135 -0
- github_dkg-0.1.0/src/github_dkg/github_client.py +182 -0
- github_dkg-0.1.0/src/github_dkg/ingestor.py +203 -0
- github_dkg-0.1.0/tests/__init__.py +0 -0
- github_dkg-0.1.0/tests/integration/__init__.py +0 -0
- github_dkg-0.1.0/tests/integration/test_integration.py +61 -0
- github_dkg-0.1.0/tests/unit/__init__.py +0 -0
- github_dkg-0.1.0/tests/unit/test_client.py +116 -0
- github_dkg-0.1.0/tests/unit/test_formatter.py +142 -0
- github_dkg-0.1.0/tests/unit/test_ingestor.py +151 -0
github_dkg-0.1.0/DESIGN_BRIEF.md
ADDED
|
@@ -0,0 +1,192 @@
|
|
|
1
|
+
# github-dkg Design Brief
|
|
2
|
+
|
|
3
|
+
**Package:** `github-dkg`
|
|
4
|
+
**Bounty tag:** `cfi-dkgv10-r1`
|
|
5
|
+
**Tier target:** Flagship (8,000–10,000 TRAC)
|
|
6
|
+
|
|
7
|
+
---
|
|
8
|
+
|
|
9
|
+
## 1. Problem
|
|
10
|
+
|
|
11
|
+
Software teams produce tacit knowledge continuously: in issue threads, PR descriptions, code review comments, and post-mortem discussions. Once an issue is closed or a PR merges, that knowledge is effectively lost: it stays in GitHub, but in a form agents cannot query.
|
|
12
|
+
|
|
13
|
+
Existing retrieval approaches (GitHub search, grep, CI logs) can surface raw text, but they provide:
|
|
14
|
+
- No provenance — who decided what, and when
|
|
15
|
+
- No trust gradient — a passing remark and an architecture decision look identical
|
|
16
|
+
- No agent-native interface — agents cannot treat GitHub as a queryable knowledge substrate
|
|
17
|
+
|
|
18
|
+
DKG v10 Working Memory solves all three. This package ingests GitHub's knowledge stream into DKG v10, where it becomes attributable, queryable, and promotable toward on-chain verification.
|
|
19
|
+
|
|
20
|
+
---
|
|
21
|
+
|
|
22
|
+
## 2. Target users
|
|
23
|
+
|
|
24
|
+
- **Platform and DevOps teams** who want their engineering knowledge base to accumulate passively, without manual curation
|
|
25
|
+
- **Research-engineering teams** running long-horizon agentic workflows (code analysis, architecture review, dependency audits) that need to query what the team previously decided
|
|
26
|
+
- **Multi-agent systems** that coordinate across a repository: an agent can write a PR review into Shared Working Memory and a downstream agent can query it to inform its next action
|
|
27
|
+
|
|
28
|
+
---
|
|
29
|
+
|
|
30
|
+
## 3. Architecture
|
|
31
|
+
|
|
32
|
+
```
GitHub Repository
    │
    ├─ Issues (+ comments)
    ├─ Pull Requests (+ reviews + inline comments)
    │
    ▼
GitHubDKGIngestor
    │
    ├─ GitHubClient ──────────────► GitHub REST API v3
    │   (issues, pulls, reviews,    (unauthenticated rate-limited
    │    inline comments)            or authenticated via GITHUB_TOKEN)
    │
    ├─ MarkdownFormatter ─────────► Structured Markdown Knowledge Asset
    │   (one KA per issue/PR,       per item, with code-aware tagging
    │    comments embedded)          and provenance metadata)
    │
    └─ DKGClient ─────────────────► POST /api/memory/turn
                                    (one Knowledge Asset per item,
                                     scoped to a repo Context Graph,
                                     sessionUri = github.com/owner/repo)

GitHub Action (Docker)
    │
    └─ Triggered on: issues, pull_request, pull_request_review events
       Reads:  GITHUB_EVENT_PATH payload → item number → ingest
       Writes: GITHUB_OUTPUT turn-uri, layer
```
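
As a rough illustration of the Action flow in the diagram, the sketch below resolves the item number from an event payload and appends outputs to the file GitHub exposes as `GITHUB_OUTPUT`. It is not the package's `entrypoint.py`: the helper names `load_event`, `resolve_item` and `write_outputs` are hypothetical, and the turn URI is a placeholder. The payload keys (`issue.number`, `pull_request.number`) follow GitHub's webhook payloads.

```python
import json
import os
import tempfile


def load_event(path: str) -> dict:
    """Read the JSON payload GitHub writes to GITHUB_EVENT_PATH."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def resolve_item(event_name: str, payload: dict) -> tuple:
    """Map a GitHub event to ("issue" | "pr", item number)."""
    if event_name == "issues":
        return "issue", int(payload["issue"]["number"])
    if event_name in ("pull_request", "pull_request_review"):
        return "pr", int(payload["pull_request"]["number"])
    raise ValueError(f"unsupported event: {event_name}")


def write_outputs(path: str, outputs: dict) -> None:
    """Append name=value lines to the GITHUB_OUTPUT file."""
    with open(path, "a", encoding="utf-8") as fh:
        for name, value in outputs.items():
            fh.write(f"{name}={value}\n")


# Demo with temporary files standing in for GITHUB_EVENT_PATH / GITHUB_OUTPUT.
with tempfile.TemporaryDirectory() as tmp:
    event_path = os.path.join(tmp, "event.json")
    output_path = os.path.join(tmp, "output.txt")
    with open(event_path, "w", encoding="utf-8") as fh:
        json.dump({"action": "opened", "issue": {"number": 42}}, fh)

    kind, number = resolve_item("issues", load_event(event_path))
    # A real run would ingest the item here and use the returned turn URI.
    write_outputs(output_path, {"turn-uri": "dkg://wm/turn/abc123", "layer": "wm"})
    with open(output_path, encoding="utf-8") as fh:
        written = fh.read()

print(kind, number)
```

In a workflow, a later step could then read these values as `steps.<id>.outputs.turn-uri` and `steps.<id>.outputs.layer`.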
|
|
60
|
+
|
|
61
|
+
### API surface used
|
|
62
|
+
|
|
63
|
+
All communication is over the public DKG v10 HTTP API — no internal packages.
|
|
64
|
+
|
|
65
|
+
| Endpoint | Purpose |
|---|---|
| `GET /api/agents` | Health check / token validation |
| `POST /api/memory/turn` | Write Knowledge Asset (one per issue/PR) |
| `POST /api/memory/search` | Tri-modal search across ingested knowledge |
| `POST /api/assertion/:name/promote` | SHARE to Shared Working Memory |
|
|
71
|
+
|
|
72
|
+
---
|
|
73
|
+
|
|
74
|
+
## 4. Memory layer mapping and LLM-Wiki alignment
|
|
75
|
+
|
|
76
|
+
Karpathy's LLM-Wiki frames knowledge substrates by who can read and write them. GitHub is where engineering teams produce knowledge that currently has no agent-native substrate. This package maps GitHub's knowledge types to the v10 trust gradient:
|
|
77
|
+
|
|
78
|
+
| GitHub artifact | DKG v10 layer | Default | Promotion trigger |
|---|---|---|---|
| Open issue | Working Memory | `wm` | Team label e.g. `architecture-decision` |
| Closed issue | Working Memory | `wm` | Post-mortem label or manual promote |
| Draft PR | Working Memory | `wm` | none |
| Merged PR | Working Memory | `wm` | Label `architecture-decision`, or manual |
| Review comment (APPROVED) | Working Memory | `wm` | PR merge + label |
|
|
85
|
+
|
|
86
|
+
The **sessionUri** for every Knowledge Asset is set to `https://github.com/owner/repo`, linking all assets for a repository into a coherent session in the Context Graph. This allows an agent to retrieve all knowledge about a repository in a single search.
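
A minimal sketch of how the shared session URI could be derived and attached to a write request. Only `sessionUri` and the `https://github.com/owner/repo` form come from this brief; the other body keys (`contextGraphId`, `markdown`) and the helper names are assumptions made for illustration, not the documented schema of `POST /api/memory/turn`.

```python
def session_uri(owner: str, repo: str) -> str:
    """Every Knowledge Asset for a repository shares this URI."""
    return f"https://github.com/{owner}/{repo}"


def build_turn_request(owner: str, repo: str, markdown: str, context_graph_id: str) -> dict:
    """Assemble a request body for one Knowledge Asset (key names partly assumed)."""
    return {
        "contextGraphId": context_graph_id,  # hypothetical key name
        "sessionUri": session_uri(owner, repo),
        "markdown": markdown,  # hypothetical key name
    }


issue_req = build_turn_request("acme", "api", "**GitHub Issue #42:** ...", "cg-123")
pr_req = build_turn_request("acme", "api", "**GitHub PR #99:** ...", "cg-123")

# Both assets land in the same session, so one search can cover the repository.
print(issue_req["sessionUri"] == pr_req["sessionUri"])
```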
|
|
87
|
+
|
|
88
|
+
---
|
|
89
|
+
|
|
90
|
+
## 5. Trust gradient and promotion path
|
|
91
|
+
|
|
92
|
+
### Working Memory → Shared Working Memory (SHARE)
|
|
93
|
+
|
|
94
|
+
An engineering team's GitHub repo is a natural unit of Shared Memory. When a significant PR merges — one labelled `architecture-decision`, or one identified by an agent as high-signal — the workflow promotes its Knowledge Asset:
|
|
95
|
+
|
|
96
|
+
```bash
|
|
97
|
+
github-dkg promote dkg://wm/turn/abc123 --context-graph my-project
|
|
98
|
+
```
|
|
99
|
+
|
|
100
|
+
This calls `POST /api/assertion/:name/promote` — a Curator-authorized operation. Nothing is promoted automatically; the agent or CI pipeline must explicitly decide.
|
|
101
|
+
|
|
102
|
+
The example workflow in `examples/workflow.yml` shows a concrete trigger: an explicit, opt-in promote step that runs when a PR labelled `architecture-decision` merges.
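
The label-based trigger can be sketched as a small predicate over a pull request object. This is an illustration under assumptions, not the logic in `examples/workflow.yml`: the function name `should_promote` is hypothetical, while the `merged` and `labels[].name` fields are those of GitHub's pull request objects.

```python
PROMOTION_LABEL = "architecture-decision"


def should_promote(pr: dict, label: str = PROMOTION_LABEL) -> bool:
    """Return True only for a merged PR that carries the promotion label."""
    label_names = {item["name"] for item in pr.get("labels", [])}
    return bool(pr.get("merged")) and label in label_names


merged_decision = {"merged": True, "labels": [{"name": "architecture-decision"}]}
merged_feature = {"merged": True, "labels": [{"name": "feature"}]}
open_decision = {"merged": False, "labels": [{"name": "architecture-decision"}]}

for pr in (merged_decision, merged_feature, open_decision):
    print(should_promote(pr))
```

Keeping the decision in one predicate leaves the SHARE call itself explicit: the caller still has to invoke `github-dkg promote` when the predicate holds.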
|
|
103
|
+
|
|
104
|
+
### Shared Working Memory → Verified Memory (PUBLISH)
|
|
105
|
+
|
|
106
|
+
Round 2 surface. Once an architecture decision or post-mortem is in Shared Working Memory, it can be published to Verified Memory via `POST /api/shared-memory/publish`. The UAL chain is preserved through all promotions: the on-chain record traces back to the original GitHub issue or PR, preserving full provenance.
|
|
107
|
+
|
|
108
|
+
### Oracle-readiness
|
|
109
|
+
|
|
110
|
+
Every Knowledge Asset written by this package:
|
|
111
|
+
- Has a stable UAL (the `turnUri` returned by `/api/memory/turn`)
|
|
112
|
+
- Is scoped to a Context Graph, making it consumable by a context oracle querying that graph
|
|
113
|
+
- Uses `sessionUri` to link assets for a repository, enabling oracle queries like "all architecture decisions for repo X"
|
|
114
|
+
- Is encoded as structured Markdown with explicit field headers (`**Author:**`, `**Labels:**`, `**State:**`), which yields consistent RDF triples and makes semantic queries predictable
|
|
115
|
+
|
|
116
|
+
---
|
|
117
|
+
|
|
118
|
+
## 6. Knowledge Asset format
|
|
119
|
+
|
|
120
|
+
Each GitHub item is encoded as structured Markdown before ingestion. The DKG node runs structural + semantic extraction on this Markdown, building RDF triples from the field headers and free-text content.
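
A minimal sketch of how an issue might be rendered into this field-header layout. It is not the package's `formatter.py`; the function name `format_issue` is hypothetical. The input keys (`number`, `title`, `user.login`, `labels[].name`, `state`, `state_reason`, `created_at`, `closed_at`, `html_url`, `body`) are those of GitHub's REST issue objects.

```python
def format_issue(repo: str, issue: dict, comments: list) -> str:
    """Render a GitHub issue and its comments as structured Markdown."""
    labels = ", ".join(label["name"] for label in issue.get("labels", []))
    state = issue["state"]
    if issue.get("state_reason"):
        state += f" ({issue['state_reason']})"
    dates = f"**Created:** {issue['created_at'][:10]}"  # keep the date part only
    if issue.get("closed_at"):
        dates += f" | **Closed:** {issue['closed_at'][:10]}"
    lines = [
        f"**GitHub Issue #{issue['number']}:** {issue['title']}",
        f"**Repository:** {repo}",
        f"**Author:** {issue['user']['login']} | **Labels:** {labels} | **State:** {state}",
        dates,
        f"**URL:** {issue['html_url']}",
        "",
        "**Description:**",
        issue.get("body") or "",
    ]
    if comments:
        lines += ["", "**Comments:**"]
        lines += [
            f"- **{c['user']['login']}** ({c['created_at'][:10]}): {c['body']}"
            for c in comments
        ]
    return "\n".join(lines)


example_issue = {
    "number": 42,
    "title": "Fix null pointer in auth flow",
    "user": {"login": "alice"},
    "labels": [{"name": "bug"}, {"name": "priority-high"}],
    "state": "closed",
    "state_reason": "completed",
    "created_at": "2024-03-01T09:00:00Z",
    "closed_at": "2024-03-05T17:30:00Z",
    "html_url": "https://github.com/acme/api/issues/42",
    "body": "The login endpoint returns 500 when password contains `$`.",
}
example_comments = [
    {"user": {"login": "bob"}, "created_at": "2024-03-02T10:00:00Z", "body": "Reproduced on 2.3.1."},
]

markdown = format_issue("acme/api", example_issue, example_comments)
print(markdown)
```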
|
|
121
|
+
|
|
122
|
+
**Issue example:**
|
|
123
|
+
|
|
124
|
+
```markdown
**GitHub Issue #42:** Fix null pointer in auth flow
**Repository:** acme/api
**Author:** alice | **Labels:** bug, priority-high | **State:** closed (completed)
**Created:** 2024-03-01 | **Closed:** 2024-03-05
**URL:** https://github.com/acme/api/issues/42

**Description:**
The login endpoint returns 500 when password contains `$`.

**Comments:**
- **bob** (2024-03-02): Reproduced on 2.3.1.
- **alice** (2024-03-04): Fixed in commit abc123.
```
|
|
138
|
+
|
|
139
|
+
**PR example:**
|
|
140
|
+
|
|
141
|
+
```markdown
**GitHub PR #99:** Add PKCE support for mobile OAuth
**Repository:** acme/api
**Author:** carol | **Labels:** feature | **State:** merged
**Requested reviewers:** dave
**Branch:** feature/pkce → main
**Created:** 2024-03-10 | **Merged:** 2024-03-15
**URL:** https://github.com/acme/api/pull/99

**Description:**
Implements RFC 7636 PKCE for public clients.

**Reviews:**
- **dave** APPROVED (2024-03-14): LGTM, clean implementation.

**Inline review comments:**
- `src/auth/pkce.py`:
  - dave: Consider caching the verifier.
```
|
|
160
|
+
|
|
161
|
+
---
|
|
162
|
+
|
|
163
|
+
## 7. v10 vocabulary compliance
|
|
164
|
+
|
|
165
|
+
All code and documentation use the exact DKG v10 terminology:
|
|
166
|
+
|
|
167
|
+
- **Context Graph** — one per repository, scoping all Knowledge Assets for that repo
|
|
168
|
+
- **Knowledge Asset** — one per GitHub issue or PR
|
|
169
|
+
- **Working Memory** / **Shared Working Memory** / **Verified Memory** — never "private/public/chain"
|
|
170
|
+
- **SHARE** — for promotion to Shared Working Memory
|
|
171
|
+
- **PUBLISH** — for promotion to Verified Memory (Round 2)
|
|
172
|
+
- **Curator** — the authority required for SHARE/PUBLISH operations
|
|
173
|
+
|
|
174
|
+
One intentional deviation: the CLI exposes a `--layer wm|swm` flag, using `wm` and `swm` as shorthand for Working Memory and Shared Working Memory as a usability affordance for operators. Internal code and documentation always expand these to the full v10 terms.
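
The shorthand expansion can be sketched as a lookup table. The names `LAYER_NAMES` and `expand_layer` are hypothetical and not taken from the package; only the `wm` and `swm` flags and the full layer names come from this brief.

```python
LAYER_NAMES = {
    "wm": "Working Memory",
    "swm": "Shared Working Memory",
}


def expand_layer(flag: str) -> str:
    """Translate the CLI shorthand into the full DKG v10 layer name."""
    key = flag.strip().lower()
    if key not in LAYER_NAMES:
        raise ValueError(f"unknown layer {flag!r}; expected one of {sorted(LAYER_NAMES)}")
    return LAYER_NAMES[key]


print(expand_layer("wm"))
print(expand_layer("swm"))
```

Rejecting unknown values at the boundary keeps the shorthand out of internal code, which only ever sees the full v10 term.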
|
|
175
|
+
|
|
176
|
+
---
|
|
177
|
+
|
|
178
|
+
## 8. Security notes
|
|
179
|
+
|
|
180
|
+
- All credentials (`DKG_TOKEN`, `GITHUB_TOKEN`) are read from environment variables — never hardcoded or logged
|
|
181
|
+
- No Curator operations (SHARE/PUBLISH) are performed automatically; all promotion is explicit and operator-initiated
|
|
182
|
+
- The Docker action image has no `postinstall` or `preinstall` scripts
|
|
183
|
+
- Network egress: GitHub REST API (`api.github.com`) and the configured DKG node endpoint — no other external domains
|
|
184
|
+
- Write authority: only `POST /api/memory/turn` (write Working Memory) and `POST /api/assertion/:name/promote` (SHARE, Curator-authorized). No chain-write operations.
|
|
185
|
+
- No dynamic code loading, no `eval` on external input
|
|
186
|
+
- The `GITHUB_TOKEN` used in the Action has the minimum required permissions: `contents: read`, `issues: read`, `pull-requests: read`
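
One way to honour the first note (credentials from the environment, never logged) is sketched below. The `Settings` class and `load_settings` function are hypothetical, not the package's code; the variable names `DKG_TOKEN`, `DKG_BASE_URL`, `DKG_CONTEXT_GRAPH` and `GITHUB_TOKEN` are the ones this package documents.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    dkg_base_url: str
    dkg_context_graph: str
    # repr=False keeps secrets out of logs and tracebacks that print the object.
    dkg_token: str = field(repr=False)
    github_token: Optional[str] = field(default=None, repr=False)


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from the environment, failing on missing variables."""
    required = ("DKG_TOKEN", "DKG_BASE_URL", "DKG_CONTEXT_GRAPH")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Report variable names only, never their values.
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return Settings(
        dkg_base_url=env["DKG_BASE_URL"],
        dkg_context_graph=env["DKG_CONTEXT_GRAPH"],
        dkg_token=env["DKG_TOKEN"],
        github_token=env.get("GITHUB_TOKEN") or None,
    )


settings = load_settings({
    "DKG_TOKEN": "s3cr3t-value",
    "DKG_BASE_URL": "http://localhost:9200",
    "DKG_CONTEXT_GRAPH": "cg-123",
})
print(settings)  # tokens are omitted from the repr
```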
|
|
187
|
+
|
|
188
|
+
---
|
|
189
|
+
|
|
190
|
+
## 9. Maintenance commitment
|
|
191
|
+
|
|
192
|
+
Six-month support window from submission date. Issues and pull requests will be reviewed within 5 business days. The package follows semantic versioning; breaking changes will be major version bumps with migration notes.
|
github_dkg-0.1.0/LICENSE
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
1
|
+
MIT License
|
|
2
|
+
|
|
3
|
+
Copyright (c) 2025
|
|
4
|
+
|
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
|
7
|
+
in the Software without restriction, including without limitation the rights
|
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
|
10
|
+
furnished to do so, subject to the following conditions:
|
|
11
|
+
|
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
|
13
|
+
copies or substantial portions of the Software.
|
|
14
|
+
|
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
|
21
|
+
SOFTWARE.
|
github_dkg-0.1.0/PKG-INFO
ADDED
|
@@ -0,0 +1,143 @@
|
|
|
1
|
+
Metadata-Version: 2.4
|
|
2
|
+
Name: github-dkg
|
|
3
|
+
Version: 0.1.0
|
|
4
|
+
Summary: Ingest GitHub issues, PRs, and review comments into DKG v10 Working Memory
|
|
5
|
+
Project-URL: Repository, https://github.com/haroldboom/github-dkg
|
|
6
|
+
License: MIT
|
|
7
|
+
License-File: LICENSE
|
|
8
|
+
Requires-Python: >=3.10
|
|
9
|
+
Requires-Dist: click>=8.1
|
|
10
|
+
Requires-Dist: httpx>=0.27
|
|
11
|
+
Provides-Extra: dev
|
|
12
|
+
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
|
|
13
|
+
Requires-Dist: pytest>=8; extra == 'dev'
|
|
14
|
+
Requires-Dist: respx>=0.21; extra == 'dev'
|
|
15
|
+
Description-Content-Type: text/markdown
|
|
16
|
+
|
|
17
|
+
# github-dkg
|
|
18
|
+
|
|
19
|
+
Ingest GitHub issues, pull requests, and review comments into [DKG v10](https://docs.origintrail.io) Working Memory as Knowledge Assets.
|
|
20
|
+
|
|
21
|
+
Every issue and PR becomes a queryable, attributable Knowledge Asset in your DKG v10 node. Key decisions can be promoted to Shared Working Memory — making your team's engineering knowledge accessible to agents.
|
|
22
|
+
|
|
23
|
+
## Demo
|
|
24
|
+
|
|
25
|
+
- **Walkthrough notebook:** [`demo.ipynb`](demo.ipynb) — runs end-to-end against a built-in mock of GitHub and the DKG node, no tokens required. Open in [Colab](https://colab.research.google.com/github/haroldboom/github-dkg/blob/master/demo.ipynb).
|
|
26
|
+
- **Live recording script:** [`examples/demo_video.py`](examples/demo_video.py) — drives all three demos against a real DKG node and the GitHub API; this is the script behind the bounty walkthrough video.
|
|
27
|
+
|
|
28
|
+
## Install
|
|
29
|
+
|
|
30
|
+
```bash
|
|
31
|
+
pip install github-dkg
|
|
32
|
+
```
|
|
33
|
+
|
|
34
|
+
## Quickstart
|
|
35
|
+
|
|
36
|
+
```bash
|
|
37
|
+
export DKG_TOKEN=your-dkg-token
|
|
38
|
+
export DKG_BASE_URL=http://localhost:9200
|
|
39
|
+
export DKG_CONTEXT_GRAPH=your-context-graph-id
|
|
40
|
+
export GITHUB_TOKEN=your-github-token
|
|
41
|
+
|
|
42
|
+
# Bulk-ingest all issues and PRs from a repository
|
|
43
|
+
github-dkg ingest owner/repo --context-graph $DKG_CONTEXT_GRAPH
|
|
44
|
+
|
|
45
|
+
# Ingest a single issue
|
|
46
|
+
github-dkg ingest-one owner/repo 42 --type issue --context-graph $DKG_CONTEXT_GRAPH
|
|
47
|
+
|
|
48
|
+
# Ingest a single PR
|
|
49
|
+
github-dkg ingest-one owner/repo 99 --type pr --context-graph $DKG_CONTEXT_GRAPH
|
|
50
|
+
|
|
51
|
+
# Search ingested knowledge
|
|
52
|
+
github-dkg search "authentication bug" --context-graph $DKG_CONTEXT_GRAPH
|
|
53
|
+
|
|
54
|
+
# Promote a Working Memory asset to Shared Working Memory (SHARE)
|
|
55
|
+
github-dkg promote dkg://wm/turn/abc123 --context-graph $DKG_CONTEXT_GRAPH
|
|
56
|
+
```
|
|
57
|
+
|
|
58
|
+
## GitHub Action
|
|
59
|
+
|
|
60
|
+
Automatically ingest issues and PRs as they are created or updated. Add to `.github/workflows/dkg-ingest.yml`:
|
|
61
|
+
|
|
62
|
+
```yaml
on:
  issues:
    types: [opened, edited, closed]
  pull_request:
    types: [opened, edited, closed]
  pull_request_review:
    types: [submitted]

jobs:
  ingest:
    runs-on: ubuntu-latest
    steps:
      - uses: haroldboom/github-dkg@v0.1.0
        id: ingest
        with:
          dkg-token: ${{ secrets.DKG_TOKEN }}
          dkg-base-url: ${{ secrets.DKG_BASE_URL }}
          dkg-context-graph: ${{ secrets.DKG_CONTEXT_GRAPH }}
```
|
|
82
|
+
|
|
83
|
+
See `examples/workflow.yml` for a complete example, including a step that promotes merged `architecture-decision` PRs to Shared Working Memory.
|
|
84
|
+
|
|
85
|
+
## Python API
|
|
86
|
+
|
|
87
|
+
```python
import asyncio
from github_dkg import DKGClient, GitHubClient, GitHubDKGIngestor

async def main():
    dkg = DKGClient(base_url="http://localhost:9200", token="your-token")
    gh = GitHubClient(token="your-github-token")
    ingestor = GitHubDKGIngestor(dkg=dkg, github=gh, context_graph_id="cg-123")

    # Bulk ingest
    result = await ingestor.ingest_repo("owner", "repo", since="2024-01-01")
    print(f"Ingested {result.total} items ({len(result.errors)} errors)")

    # Single item
    resp = await ingestor.ingest_issue("owner", "repo", 42)
    print(f"Turn URI: {resp['turnUri']}")

    # Promote to Shared Working Memory
    await ingestor.promote(resp["turnUri"])

asyncio.run(main())
```
|
|
109
|
+
|
|
110
|
+
## `--since` filtering
|
|
111
|
+
|
|
112
|
+
`--since` accepts an ISO 8601 timestamp and limits ingest to items updated after that point.
|
|
113
|
+
|
|
114
|
+
- **Issues:** filtered server-side by GitHub via the `since` parameter on `/issues`.
|
|
115
|
+
- **Pull requests:** GitHub's `/pulls` endpoint has no `since` filter, so the package requests `sort=updated&direction=desc` and stops paginating once results fall below the cutoff. Net result: only PRs touched after `--since` are fetched and ingested.
|
|
116
|
+
|
|
117
|
+
Comment-only updates (a new comment without an issue/PR body edit) still bump `updated_at`, so they're included.
|
|
118
|
+
|
|
119
|
+
## Rate limiting
|
|
120
|
+
|
|
121
|
+
`GitHubClient` raises `github_dkg.github_client.GitHubRateLimitError` when GitHub returns `403`/`429` with `X-RateLimit-Remaining: 0`. The exception carries `reset_at` (unix timestamp) so callers can decide whether to back off, sleep, or fail. Authenticated tokens get 5,000 requests/hour; bulk-ingesting a large repo with many comment-heavy PRs can approach this limit.
|
|
122
|
+
|
|
123
|
+
```python
from github_dkg.github_client import GitHubRateLimitError

async def ingest_with_rate_limit_report(ingestor):
    # `await` needs an async context; `ingestor` is a GitHubDKGIngestor as in "Python API".
    try:
        return await ingestor.ingest_repo("OriginTrail", "dkg-v9")
    except GitHubRateLimitError as e:
        print(f"Rate limited; resets at unix={e.reset_at}")
```
|
|
131
|
+
|
|
132
|
+
## Memory layers
|
|
133
|
+
|
|
134
|
+
| Layer | Flag | Visibility |
|---|---|---|
| Working Memory | `--layer wm` (default) | Private to your node |
| Shared Working Memory | `--layer swm` | Gossiped across the paranet |
|
|
138
|
+
|
|
139
|
+
Promotion from Working Memory to Shared Working Memory is always explicit — nothing is shared automatically.
|
|
140
|
+
|
|
141
|
+
## License
|
|
142
|
+
|
|
143
|
+
MIT
|
github_dkg-0.1.0/README.md
ADDED
|
@@ -0,0 +1,127 @@
|
|
|
1
|
+
# github-dkg
|
|
2
|
+
|
|
3
|
+
Ingest GitHub issues, pull requests, and review comments into [DKG v10](https://docs.origintrail.io) Working Memory as Knowledge Assets.
|
|
4
|
+
|
|
5
|
+
Every issue and PR becomes a queryable, attributable Knowledge Asset in your DKG v10 node. Key decisions can be promoted to Shared Working Memory — making your team's engineering knowledge accessible to agents.
|
|
6
|
+
|
|
7
|
+
## Demo
|
|
8
|
+
|
|
9
|
+
- **Walkthrough notebook:** [`demo.ipynb`](demo.ipynb) — runs end-to-end against a built-in mock of GitHub and the DKG node, no tokens required. Open in [Colab](https://colab.research.google.com/github/haroldboom/github-dkg/blob/master/demo.ipynb).
|
|
10
|
+
- **Live recording script:** [`examples/demo_video.py`](examples/demo_video.py) — drives all three demos against a real DKG node and the GitHub API; this is the script behind the bounty walkthrough video.
|
|
11
|
+
|
|
12
|
+
## Install
|
|
13
|
+
|
|
14
|
+
```bash
|
|
15
|
+
pip install github-dkg
|
|
16
|
+
```
|
|
17
|
+
|
|
18
|
+
## Quickstart
|
|
19
|
+
|
|
20
|
+
```bash
|
|
21
|
+
export DKG_TOKEN=your-dkg-token
|
|
22
|
+
export DKG_BASE_URL=http://localhost:9200
|
|
23
|
+
export DKG_CONTEXT_GRAPH=your-context-graph-id
|
|
24
|
+
export GITHUB_TOKEN=your-github-token
|
|
25
|
+
|
|
26
|
+
# Bulk-ingest all issues and PRs from a repository
|
|
27
|
+
github-dkg ingest owner/repo --context-graph $DKG_CONTEXT_GRAPH
|
|
28
|
+
|
|
29
|
+
# Ingest a single issue
|
|
30
|
+
github-dkg ingest-one owner/repo 42 --type issue --context-graph $DKG_CONTEXT_GRAPH
|
|
31
|
+
|
|
32
|
+
# Ingest a single PR
|
|
33
|
+
github-dkg ingest-one owner/repo 99 --type pr --context-graph $DKG_CONTEXT_GRAPH
|
|
34
|
+
|
|
35
|
+
# Search ingested knowledge
|
|
36
|
+
github-dkg search "authentication bug" --context-graph $DKG_CONTEXT_GRAPH
|
|
37
|
+
|
|
38
|
+
# Promote a Working Memory asset to Shared Working Memory (SHARE)
|
|
39
|
+
github-dkg promote dkg://wm/turn/abc123 --context-graph $DKG_CONTEXT_GRAPH
|
|
40
|
+
```
|
|
41
|
+
|
|
42
|
+
## GitHub Action
|
|
43
|
+
|
|
44
|
+
Automatically ingest issues and PRs as they are created or updated. Add to `.github/workflows/dkg-ingest.yml`:
|
|
45
|
+
|
|
46
|
+
```yaml
on:
  issues:
    types: [opened, edited, closed]
  pull_request:
    types: [opened, edited, closed]
  pull_request_review:
    types: [submitted]

jobs:
  ingest:
    runs-on: ubuntu-latest
    steps:
      - uses: haroldboom/github-dkg@v0.1.0
        id: ingest
        with:
          dkg-token: ${{ secrets.DKG_TOKEN }}
          dkg-base-url: ${{ secrets.DKG_BASE_URL }}
          dkg-context-graph: ${{ secrets.DKG_CONTEXT_GRAPH }}
```
|
|
66
|
+
|
|
67
|
+
See `examples/workflow.yml` for a complete example, including a step that promotes merged `architecture-decision` PRs to Shared Working Memory.
|
|
68
|
+
|
|
69
|
+
## Python API
|
|
70
|
+
|
|
71
|
+
```python
import asyncio
from github_dkg import DKGClient, GitHubClient, GitHubDKGIngestor

async def main():
    dkg = DKGClient(base_url="http://localhost:9200", token="your-token")
    gh = GitHubClient(token="your-github-token")
    ingestor = GitHubDKGIngestor(dkg=dkg, github=gh, context_graph_id="cg-123")

    # Bulk ingest
    result = await ingestor.ingest_repo("owner", "repo", since="2024-01-01")
    print(f"Ingested {result.total} items ({len(result.errors)} errors)")

    # Single item
    resp = await ingestor.ingest_issue("owner", "repo", 42)
    print(f"Turn URI: {resp['turnUri']}")

    # Promote to Shared Working Memory
    await ingestor.promote(resp["turnUri"])

asyncio.run(main())
```
|
|
93
|
+
|
|
94
|
+
## `--since` filtering
|
|
95
|
+
|
|
96
|
+
`--since` accepts an ISO 8601 timestamp and limits ingest to items updated after that point.
|
|
97
|
+
|
|
98
|
+
- **Issues:** filtered server-side by GitHub via the `since` parameter on `/issues`.
|
|
99
|
+
- **Pull requests:** GitHub's `/pulls` endpoint has no `since` filter, so the package requests `sort=updated&direction=desc` and stops paginating once results fall below the cutoff. Net result: only PRs touched after `--since` are fetched and ingested.
|
|
100
|
+
|
|
101
|
+
Comment-only updates (a new comment without an issue/PR body edit) still bump `updated_at`, so they're included.
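
The early-stop pagination for pull requests can be sketched as follows. This is an illustration, not the package's `github_client.py`: `parse_ts` and `prs_updated_since` are hypothetical names, and the pages are in-memory stand-ins for `/pulls` responses sorted with `sort=updated&direction=desc`. Items whose `updated_at` equals the cutoff are kept here, which is an assumption about the boundary.

```python
from datetime import datetime, timezone
from typing import Iterable, Iterator


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 date or timestamp into an aware UTC-comparable datetime."""
    ts = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return ts if ts.tzinfo else ts.replace(tzinfo=timezone.utc)


def prs_updated_since(pages: Iterable[list], since: str) -> Iterator[dict]:
    """Yield PRs from newest-first pages, stopping at the first one older than `since`."""
    cutoff = parse_ts(since)
    for page in pages:
        for pr in page:
            if parse_ts(pr["updated_at"]) < cutoff:
                return  # everything after this point is older; stop paginating
            yield pr


PAGES = [
    [{"number": 5, "updated_at": "2024-03-15T12:00:00Z"},
     {"number": 4, "updated_at": "2024-02-01T08:30:00Z"}],
    [{"number": 3, "updated_at": "2024-01-02T00:00:00Z"},
     {"number": 2, "updated_at": "2023-12-31T23:59:59Z"}],
    [{"number": 1, "updated_at": "2023-06-01T00:00:00Z"}],
]

fetched = []  # records which pages were actually requested


def fake_pages():
    for index, page in enumerate(PAGES):
        fetched.append(index)
        yield page


recent = list(prs_updated_since(fake_pages(), "2024-01-01"))
print([pr["number"] for pr in recent], fetched)
```

Because the pages are consumed lazily, the third page is never requested once the cutoff is crossed on the second.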
|
|
102
|
+
|
|
103
|
+
## Rate limiting
|
|
104
|
+
|
|
105
|
+
`GitHubClient` raises `github_dkg.github_client.GitHubRateLimitError` when GitHub returns `403`/`429` with `X-RateLimit-Remaining: 0`. The exception carries `reset_at` (unix timestamp) so callers can decide whether to back off, sleep, or fail. Authenticated tokens get 5,000 requests/hour; bulk-ingesting a large repo with many comment-heavy PRs can approach this limit.
|
|
106
|
+
|
|
107
|
+
```python
from github_dkg.github_client import GitHubRateLimitError

async def ingest_with_rate_limit_report(ingestor):
    # `await` needs an async context; `ingestor` is a GitHubDKGIngestor as in "Python API".
    try:
        return await ingestor.ingest_repo("OriginTrail", "dkg-v9")
    except GitHubRateLimitError as e:
        print(f"Rate limited; resets at unix={e.reset_at}")
```
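
If a caller chooses to sleep rather than fail, the wait can be derived from `reset_at`. The helper below is a sketch with a hypothetical name (`seconds_until_reset`); the one-second `padding` is an assumption to absorb clock skew, not something the package prescribes.

```python
import time
from typing import Optional


def seconds_until_reset(reset_at: float, now: Optional[float] = None, padding: float = 1.0) -> float:
    """Seconds to wait until the rate-limit window resets, never negative."""
    current = time.time() if now is None else now
    return max(0.0, reset_at - current) + padding


# Window resets 60 seconds from "now".
print(seconds_until_reset(1_700_000_060, now=1_700_000_000.0))
# Window already reset: only the padding remains.
print(seconds_until_reset(1_700_000_000, now=1_700_000_500.0))
```

Inside an `except GitHubRateLimitError as e:` handler, the result could be passed to `await asyncio.sleep(...)` before retrying.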
|
|
115
|
+
|
|
116
|
+
## Memory layers
|
|
117
|
+
|
|
118
|
+
| Layer | Flag | Visibility |
|---|---|---|
| Working Memory | `--layer wm` (default) | Private to your node |
| Shared Working Memory | `--layer swm` | Gossiped across the paranet |
|
|
122
|
+
|
|
123
|
+
Promotion from Working Memory to Shared Working Memory is always explicit — nothing is shared automatically.
|
|
124
|
+
|
|
125
|
+
## License
|
|
126
|
+
|
|
127
|
+
MIT
|
github_dkg-0.1.0/action.yml
ADDED
|
@@ -0,0 +1,57 @@
|
|
|
1
|
+
name: 'DKG Working Memory Ingest'
description: 'Ingest GitHub issues, PRs, and review comments into DKG v10 Working Memory as Knowledge Assets'
author: 'github-dkg'

branding:
  icon: 'database'
  color: 'purple'

inputs:
  dkg-token:
    description: 'DKG v10 node bearer token'
    required: true
  dkg-base-url:
    description: 'DKG v10 node base URL (e.g. http://my-node:9200)'
    required: true
  dkg-context-graph:
    description: 'Context Graph ID to write into'
    required: true
  github-token:
    description: 'GitHub token for API access'
    required: false
    default: ${{ github.token }}
  layer:
    description: 'Memory layer: "wm" (Working Memory, private) or "swm" (Shared Working Memory, gossiped)'
    required: false
    default: 'wm'
  event-type:
    description: 'Which GitHub event triggered this action (issues, pull_request, pull_request_review). Auto-detected from GITHUB_EVENT_NAME if omitted.'
    required: false
    default: ''
  item-number:
    description: 'Issue or PR number to ingest. Auto-detected from the event payload if omitted.'
    required: false
    default: ''

outputs:
  turn-uri:
    description: 'UAL of the Knowledge Asset written to Working Memory'
    value: ${{ steps.ingest.outputs.turn-uri }}
  layer:
    description: 'Memory layer the asset was written to'
    value: ${{ steps.ingest.outputs.layer }}

runs:
  using: 'docker'
  image: 'Dockerfile'
  env:
    DKG_TOKEN: ${{ inputs.dkg-token }}
    DKG_BASE_URL: ${{ inputs.dkg-base-url }}
    DKG_CONTEXT_GRAPH: ${{ inputs.dkg-context-graph }}
    GITHUB_TOKEN: ${{ inputs.github-token }}
    INPUT_LAYER: ${{ inputs.layer }}
    INPUT_EVENT_TYPE: ${{ inputs.event-type }}
    INPUT_ITEM_NUMBER: ${{ inputs.item-number }}
    GITHUB_REPOSITORY: ${{ github.repository }}
    GITHUB_EVENT_NAME: ${{ github.event_name }}
    GITHUB_EVENT_PATH: ${{ github.event_path }}
|