knwler 0.3.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
knwler-0.3.0/PKG-INFO ADDED
@@ -0,0 +1,198 @@
Metadata-Version: 2.4
Name: knwler
Version: 0.3.0
Summary: Fast and accurate graph extraction from text using LLMs
Requires-Python: >=3.12
Description-Content-Type: text/markdown
Requires-Dist: aiohttp>=3.13.3
Requires-Dist: tiktoken>=0.12.0
Requires-Dist: typer>=0.21.1
Requires-Dist: rich>=14.3.2
Requires-Dist: networkx>=3.6.1
Requires-Dist: pymupdf4llm[layout,ocr]>=0.2.9
Requires-Dist: jinja2>=3.1.6
knwler-0.3.0/README.md ADDED
@@ -0,0 +1,184 @@
# Knwler

**Turn any document into a structured knowledge graph**

Knwler is a lightweight, single-file Python tool that extracts structured knowledge graphs from documents using AI. Feed it a PDF or text file and receive a richly connected network of entities, relationships, and topics — complete with an interactive HTML report and exports ready for your favorite graph analytics platform.

Built for compliance teams, legal departments, research analysts, and anyone who needs to rapidly understand the structure hidden inside dense documents.

![Screenshot 1](./Screenshot1.png)

![Screenshot 2](./Screenshot2.png)

---

## Why Knwler?

| Challenge | How Knwler Solves It |
|---|---|
| Manually mapping entities and relationships in 100+ page regulatory documents | Automated extraction produces a navigable knowledge graph in minutes |
| Expensive vendor lock-in for document intelligence | Runs fully local with Ollama (zero data leaves your machine) or via OpenAI for speed |
| Documents in multiple languages across jurisdictions | Auto-detects language and adapts all prompts — supports English, German, French, Spanish, and Dutch out of the box |
| Results trapped inside one tool | Exports to HTML, GML, GraphML, and raw JSON — import directly into Neo4j, Gephi, yEd, Memgraph, SurrealDB, or any graph platform |
| High per-document processing costs | ~$0.20 per 20-page PDF with OpenAI/GPT-4o; completely free when running locally; LLM response caching means re-runs cost nothing |

---

## Key Features

### Dual LLM Backend — Cloud or Fully Local
Choose **OpenAI** for maximum speed or **Ollama** for fully offline, air-gapped operation. Qwen 2.5 at 3B–14B parameters delivers strong results locally. You can even switch backends between runs and incrementally augment the same graph.

### Automatic Schema Discovery
The pipeline analyzes a sample of your document and **infers the optimal entity types and relation types** — no manual ontology engineering required. You can also supply a schema yourself if you wish. A schema is simply a set of entity types (person, concept, location, ...) and relation types (knows, has_accepted, has_signed, ...).
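
For illustration, a user-supplied schema might look like the sketch below; the key names are an assumption made for this example, not Knwler's documented input format.

```python
# Hypothetical schema to seed a run with; "entity_types" and "relation_types"
# are illustrative key names, not necessarily what Knwler expects.
schema = {
    "entity_types": ["person", "organization", "concept", "location"],
    "relation_types": ["knows", "has_accepted", "has_signed", "refers_to"],
}
```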

### Multilingual by Design
Language is **auto-detected** on every run. All prompts (summarization, extraction, community labeling) and all console/UI output are localized. Adding a new language is as simple as extending a single JSON file.

### Incremental & Augmentable
Re-run on new documents or updated schemas and **the existing graph is augmented** rather than rebuilt. Entity descriptions from multiple sources are intelligently consolidated via LLM-powered summarization.

### Community Detection & Topic Assignment
The Louvain algorithm automatically **discovers clusters of related entities** and an LLM labels each community with human-readable topics — giving you instant thematic insight into the document's structure.
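
As a rough sketch of the idea (not Knwler's internal code), NetworkX, which is already a declared dependency, exposes Louvain community detection directly:

```python
import networkx as nx

# Toy graph standing in for the extracted entity graph.
g = nx.Graph()
g.add_edges_from([
    ("EU AI Act", "provider"), ("provider", "high-risk system"),
    ("NIST", "risk framework"), ("risk framework", "governance"),
])

# Louvain clustering; each returned set of nodes is one community / topic candidate.
for i, nodes in enumerate(nx.community.louvain_communities(g, seed=42)):
    print(f"community {i}: {sorted(nodes)}")
```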

### Self-Contained HTML Report
Export a **single HTML file** with interactive Cytoscape.js network visualization, entity index, topic overview, and rephrased text chunks — shareable without any server or dependencies.

### Rich Export Ecosystem
- **JSON** — the canonical output; import into Neo4j, Memgraph, SurrealDB, or generate vector embeddings
- **GML / GraphML** — open directly in yEd, Gephi, or any standards-compliant graph tool (see the loading sketch after this list)
- **HTML** — standalone interactive report
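
For example, the GML export can be loaded straight into NetworkX for further analysis; the filename below is a placeholder.

```python
import networkx as nx

# "results/document.gml" is a placeholder path for the exported graph.
g = nx.read_gml("results/document.gml")
print(g.number_of_nodes(), "entities,", g.number_of_edges(), "relations")

# Quick look at the most connected entities.
print(sorted(g.degree, key=lambda kv: kv[1], reverse=True)[:5])
```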

### Intelligent Caching
Every LLM call is **hashed and cached** locally. Re-generating reports, tweaking export settings, or re-running with a different schema costs zero additional API calls.
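
The mechanism can be sketched as follows; this illustrates the general idea of hashing requests into cache keys, not Knwler's actual implementation.

```python
import hashlib
import json
from pathlib import Path

CACHE_DIR = Path(".llm_cache")  # illustrative location

def cache_key(model: str, prompt: str) -> str:
    """Derive a stable key from the exact request contents."""
    payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def cached_call(model: str, prompt: str, call_llm) -> str:
    CACHE_DIR.mkdir(exist_ok=True)
    path = CACHE_DIR / f"{cache_key(model, prompt)}.txt"
    if path.exists():                    # cache hit: no API call
        return path.read_text()
    response = call_llm(model, prompt)   # cache miss: one API call
    path.write_text(response)
    return response
```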

### Human-Readable Chunk Rephrasing
Each text chunk is rephrased for readability alongside the original, making the report accessible to non-expert stakeholders while preserving full traceability to source text.

### PDF & Text Ingestion
Handles **PDF-to-text extraction** (via PyMuPDF) as well as plain text and Markdown files. Extracted text is cached to avoid redundant PDF parsing on subsequent runs.
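
A minimal sketch of that step using the declared `pymupdf4llm` dependency; the side-by-side cache file is an assumption made for this example.

```python
from pathlib import Path
import pymupdf4llm

pdf = Path("samples/EUAI.pdf")
cached = pdf.with_suffix(".md")               # assumed cache location, for illustration

if cached.exists():
    text = cached.read_text()                 # reuse the earlier extraction
else:
    text = pymupdf4llm.to_markdown(str(pdf))  # PDF -> Markdown text
    cached.write_text(text)

print(text[:500])
```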

### Portable & Minimal
A **single Python file (~2,000 lines)**, managed via `uv` with minimal dependencies. No database, no backend server, no Docker required.

---

## Cost & Performance

| Scenario | Time (20-page PDF) | Cost |
|---|---|---|
| OpenAI GPT-4o / GPT-4o-mini | ~2–4 minutes | ~$0.20 |
| Ollama Qwen 2.5 (Mac M4 Pro, 64 GB) | ~20–40 minutes | Free |
| Cached re-run (any backend) | Seconds | Free |

---

## Quick Start

```bash
# Install dependencies
uv sync

# Run with OpenAI
uv run main.py --openai -f document.pdf

# Run fully local with Ollama
uv run main.py -f document.pdf

# Re-export HTML only (no LLM calls)
uv run main.py --html-only
```

> **Tip:** When running Ollama locally, launch it via CLI with parallel processing for best throughput:
> ```bash
> OLLAMA_NUM_PARALLEL=8 ollama serve
> ```
> Adjust the number based on your machine specs (8 is suitable for a Mac M4 Pro with 64 GB RAM).

## CLI Options

| Option | Description |
|---|---|
| `--file`, `-f` | Input PDF or text file |
| `--openai` | Use OpenAI API instead of Ollama |
| `--extraction-model`, `-e` | Model for chunk extraction (default: `qwen2.5:3b` / `gpt-4o-mini`) |
| `--discovery-model`, `-d` | Model for schema discovery (default: `qwen2.5:14b` / `gpt-4o`) |
| `--concurrent`, `-c` | Max concurrent LLM requests (default: 10) |
| `--max-tokens` | Max tokens per chunk (default: 400) |
| `--no-discovery` | Skip schema discovery, use built-in defaults |
| `--no-cache` | Disable LLM response caching |
| `--language`, `-l` | Force language code (e.g., `en`, `de`, `fr`) — auto-detects if omitted |
| `--url`, `-u` | Source URL for metadata |
| `--output`, `-o` | Output JSON filename (saved to `results/`) |
| `--html-report` | Generate HTML report (default: on) |
| `--gml-export` | Generate GML graph file (default: on) |
| `--html-only` | Re-export HTML from existing results without re-running extraction |

## Examples
117
+
118
+ ```bash
119
+ # EU AI Act (English)
120
+ uv run main.py --openai \
121
+ --url "https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=OJ:L_202401689" \
122
+ -f samples/EUAI.pdf
123
+
124
+ # NIST AI Risk Management Framework
125
+ uv run main.py --openai \
126
+ --url "https://nvlpubs.nist.gov/nistpubs/ai/nist.ai.100-1.pdf" \
127
+ -f samples/Nist.pdf
128
+
129
+ # Belgian Civil Code (Dutch — auto-detected)
130
+ uv run main.py --openai \
131
+ --url "https://www.ejustice.just.fgov.be/cgi/article_body.pl?language=nl&pub_date=2022-07-01&caller=list&numac=2022032058" \
132
+ -f samples/BurgerlijkBoek5.pdf
133
+
134
+ # Deloitte Sustainability Report (German — auto-detected)
135
+ uv run main.py --openai \
136
+ --url "https://www.deloitte.com/de/de/legal/publikationen.html" \
137
+ -f examples/Deloitte/Deloitte-Nachhaltigkeitsbericht-2024.pdf
138
+ ```
139
+
140
+ ## Integration
141
+
142
+ The raw JSON output is designed for downstream integration:
143
+
144
+ - **Import into Neo4j / Memgraph / SurrealDB** — entities and relations map directly to nodes and edges
145
+ - **Generate vector embeddings** — use entity descriptions for semantic search
146
+ - **Feed into n8n workflows** — connect document intelligence to CRM, alerting, or reporting pipelines without code
147
+ - **Visualize in yEd or Gephi** — open the GML/GraphML export for advanced layout and analysis
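
As an example of the first bullet, a minimal Neo4j import could look like this. The JSON field names (`entities`, `relations`, `source`, `target`, `type`) are assumptions about the output layout, and the `neo4j` driver package is not a Knwler dependency.

```python
import json
from neo4j import GraphDatabase  # pip install neo4j

data = json.load(open("results/document.json"))  # placeholder path

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    for ent in data["entities"]:          # assumed field name
        session.run(
            "MERGE (e:Entity {name: $name}) SET e.type = $type, e.description = $desc",
            name=ent["name"], type=ent.get("type"), desc=ent.get("description"),
        )
    for rel in data["relations"]:         # assumed field name
        session.run(
            "MATCH (a:Entity {name: $src}), (b:Entity {name: $dst}) "
            "MERGE (a)-[r:RELATED {type: $type}]->(b)",
            src=rel["source"], dst=rel["target"], type=rel.get("type"),
        )
driver.close()
```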

---

## Example Outputs

You can find example reports and raw graph data in diverse languages in the `examples` directory.

## Language

Everything language-related lives in `languages.json`, which contains both the language-specific prompts and the text used for console output.
You can easily add additional languages: simply ask Copilot, Gemini, or any other AI assistant to translate the JSON.
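
As a sketch of the idea (the key names below are invented for illustration; check `languages.json` for the real structure), adding a language amounts to adding one more top-level entry:

```python
import json

langs = json.load(open("languages.json", encoding="utf-8"))

# Hypothetical Italian entry; the actual keys in languages.json may differ.
langs["it"] = {
    "extraction_prompt": "Estrai entità e relazioni dal testo seguente...",
    "summary_prompt": "Riassumi le seguenti descrizioni...",
    "console": {"processing": "Elaborazione di {file} in corso..."},
}

json.dump(langs, open("languages.json", "w", encoding="utf-8"),
          ensure_ascii=False, indent=2)
```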

## OpenAI Key

If you run the process in your terminal, the code looks for the usual `OPENAI_API_KEY` environment variable.
You can set it explicitly via a terminal export:

```bash
export OPENAI_API_KEY=...
```

or in the code (look for `os.environ.get("OPENAI_API_KEY", "")`).

## Ollama

Ollama is just a convenient local LLM service; you can use LM Studio or any other service.
The default model is Qwen 2.5, but here too you should experiment and see what works best for you.
We have run many benchmarks, and bigger models are not necessarily better; sometimes quite the opposite. Small models of 3 or 7 billion parameters are fine and a lot faster.
Reasoning ("thinking") modes, in particular, get in the way of graph extraction. Whatever you do, do not enable thinking and avoid advanced mixture-of-experts (MoE) models.

## Disclaimer

The information extracted by Knwler is generated via machine learning and natural language processing, which may result in errors, omissions, or misinterpretations of the original source material. This tool is provided "as is" for informational purposes only. Users are advised to independently verify any critical data against original source documents before making business, legal, or financial decisions.

---

*Built by [Orbifold Consulting](https://orbifold.net) and inspired by [Knwl](https://knwl.ai).*
@@ -0,0 +1,7 @@
README.md
pyproject.toml
knwler.egg-info/PKG-INFO
knwler.egg-info/SOURCES.txt
knwler.egg-info/dependency_links.txt
knwler.egg-info/requires.txt
knwler.egg-info/top_level.txt
@@ -0,0 +1,7 @@
aiohttp>=3.13.3
tiktoken>=0.12.0
typer>=0.21.1
rich>=14.3.2
networkx>=3.6.1
pymupdf4llm[layout,ocr]>=0.2.9
jinja2>=3.1.6
@@ -0,0 +1 @@
templates
@@ -0,0 +1,15 @@
[project]
name = "knwler"
version = "0.3.0"
description = "Fast and accurate graph extraction from text using LLMs"
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
    "aiohttp>=3.13.3",
    "tiktoken>=0.12.0",
    "typer>=0.21.1",
    "rich>=14.3.2",
    "networkx>=3.6.1",
    "pymupdf4llm[layout,ocr]>=0.2.9",
    "jinja2>=3.1.6",
]
knwler-0.3.0/setup.cfg ADDED
@@ -0,0 +1,4 @@
[egg_info]
tag_build =
tag_date = 0