setlr 1.0.1__tar.gz → 1.0.2__tar.gz

Files changed (34)
  1. setlr-1.0.2/CHANGELOG.md +72 -0
  2. setlr-1.0.2/MANIFEST.in +33 -0
  3. setlr-1.0.2/MIGRATION.md +166 -0
  4. setlr-1.0.2/PKG-INFO +209 -0
  5. setlr-1.0.2/README.md +177 -0
  6. setlr-1.0.2/example/ontology.csv +76 -0
  7. setlr-1.0.2/example/ontology.setl.ttl +143 -0
  8. setlr-1.0.2/example/ontology.ttl +271 -0
  9. setlr-1.0.2/example/social-naive.setl.ttl +39 -0
  10. setlr-1.0.2/example/social-naive.ttl +27 -0
  11. setlr-1.0.2/example/social.csv +5 -0
  12. setlr-1.0.2/example/social.setl.ttl +45 -0
  13. setlr-1.0.2/example/social.ttl +23 -0
  14. setlr-1.0.2/pyproject.toml +48 -0
  15. setlr-1.0.2/setlr/__init__.py +89 -0
  16. setlr-1.0.1/setlr/__init__.py → setlr-1.0.2/setlr/core.py +306 -162
  17. {setlr-1.0.1 → setlr-1.0.2}/setlr/trig_store.py +3 -5
  18. setlr-1.0.2/setlr.egg-info/PKG-INFO +209 -0
  19. setlr-1.0.2/setlr.egg-info/SOURCES.txt +26 -0
  20. {setlr-1.0.1 → setlr-1.0.2}/setlr.egg-info/requires.txt +0 -1
  21. {setlr-1.0.1 → setlr-1.0.2}/setup.cfg +0 -6
  22. setlr-1.0.2/setup.py +11 -0
  23. setlr-1.0.1/PKG-INFO +0 -34
  24. setlr-1.0.1/README.md +0 -15
  25. setlr-1.0.1/setlr/sqlite-store.py +0 -0
  26. setlr-1.0.1/setlr.egg-info/PKG-INFO +0 -34
  27. setlr-1.0.1/setlr.egg-info/SOURCES.txt +0 -15
  28. setlr-1.0.1/setlr.egg-info/pbr.json +0 -1
  29. setlr-1.0.1/setup.py +0 -60
  30. {setlr-1.0.1 → setlr-1.0.2}/LICENSE +0 -0
  31. {setlr-1.0.1 → setlr-1.0.2}/setlr/iterparse_filter.py +0 -0
  32. {setlr-1.0.1 → setlr-1.0.2}/setlr.egg-info/dependency_links.txt +0 -0
  33. {setlr-1.0.1 → setlr-1.0.2}/setlr.egg-info/entry_points.txt +0 -0
  34. {setlr-1.0.1 → setlr-1.0.2}/setlr.egg-info/top_level.txt +0 -0
setlr-1.0.2/CHANGELOG.md ADDED
@@ -0,0 +1,72 @@
1
+ # Changelog
2
+
3
+ All notable changes to this project will be documented in this file.
4
+
5
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
6
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
+
8
+ ## [Unreleased]
9
+
10
+ ## [1.0.2] - 2026-01-18
11
+
12
+ ### Changed
13
+ - Migrated from `setup.py` to `pyproject.toml` following PEP 517/518 standards for modern Python packaging
14
+ - Restructured codebase: moved implementation from `setlr/__init__.py` to `setlr/core.py` (~1020 lines)
15
+ - `setlr/__init__.py` now serves as a clean public API interface (~90 lines)
16
+
17
+ ### Added
18
+ - New public API function `run_setl()` with comprehensive documentation and type hints
19
+ - Proper deprecation warning for `_setl()` function (still available for backward compatibility)
20
+ - Improved error messages for NaN/missing values (now displays `<empty/missing>` instead of `nan`)
21
+ - Extended JSON error context from 4 to 8 lines before error for better debugging
22
+ - Comprehensive API documentation with usage examples
23
+ - Development scripts for bootstrap, build, and release
24
+ - GitHub Actions workflows for automated testing and linting
25
+ - Migration documentation (MIGRATION.md)
26
+
27
+ ### Fixed
28
+ - Improved error reporting for missing data scenarios
29
+ - Better context display for JSON syntax errors in templates
30
+ - Python version compatibility for JSON error handling
31
+
32
+ ## [1.0.1] - 2024-08-09
33
+
34
+ ### Changed
35
+ - Moved version information from `_version.py` directly into `setup.py`
36
+ - Modified `setup.py` to support `--version` flag
37
+
38
+ ### Fixed
39
+ - Fixed SHACL constraint in ontology example (changed `sh:minCount` from 1 to 0 for `rdfs:subClassOf`)
40
+
41
+ ## [1.0.0] - 2024-04-29
42
+
43
+ ### Added
44
+ - Initial stable release of setlr
45
+ - Core SETL (Semantic Extract, Transform, Load) functionality
46
+ - Support for generating RDF graphs from tabular data
47
+ - CLI tool via `setlr` command
48
+ - Data source readers: CSV, Excel, JSON, XML, and RDF graphs
49
+ - Template-based transformation using Jinja2
50
+ - Named graph support via ConjunctiveGraph
51
+ - RDF namespaces: csvw, ov, setl, prov, pv, sp, sd, dc, void, shacl
52
+ - Utility functions: `extract()`, `transform()`, `load()`, `hash()`, `camelcase()`
53
+ - SHACL validation support with pyshacl[js]
54
+ - Python 3.8+ support
55
+ - Comprehensive test suite
56
+
57
+ ### Dependencies
58
+ - rdflib >= 6.0.0
59
+ - pandas >= 0.23.0
60
+ - jinja2
61
+ - click (CLI support)
62
+ - tqdm (progress bars)
63
+ - pyshacl[js] (validation)
64
+ - beautifulsoup4, lxml (XML/HTML parsing)
65
+ - requests (HTTP support)
66
+ - toposort (dependency ordering)
67
+ - Other utility libraries: numpy, xlrd, ijson, python-slugify
68
+
69
+ [Unreleased]: https://github.com/tetherless-world/setlr/compare/v1.0.2...HEAD
70
+ [1.0.2]: https://github.com/tetherless-world/setlr/compare/v1.0.1...v1.0.2
71
+ [1.0.1]: https://github.com/tetherless-world/setlr/compare/v1.0.0...v1.0.1
72
+ [1.0.0]: https://github.com/tetherless-world/setlr/releases/tag/v1.0.0
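The 1.0.2 entry above mentions showing eight lines of context before a JSON syntax error. The package's actual implementation is not visible in this part of the diff, so the following is only a minimal standard-library sketch of the idea. The function name `json_error_context` and its `lines_before` parameter are hypothetical and are not part of the setlr API.

```python
import json


def json_error_context(text, lines_before=8):
    """Return the lines leading up to a JSON syntax error, or None if text parses.

    Hypothetical helper for illustration only; not taken from setlr.
    """
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        lines = text.splitlines()
        # err.lineno is 1-based; keep up to `lines_before` lines plus the failing line.
        start = max(0, err.lineno - 1 - lines_before)
        window = lines[start:err.lineno]
        numbered = [f"{start + i + 1:>4}: {line}" for i, line in enumerate(window)]
        summary = f"      {err.msg} (line {err.lineno}, column {err.colno})"
        return "\n".join(numbered + [summary])
    return None


# A template fragment with a missing value after "b".
broken = '{\n "a": 1,\n "b": \n}'
print(json_error_context(broken))
```

Widening the window from 4 to 8 lines only changes `lines_before`; the trade-off is a longer message in exchange for a better chance that the real mistake (often a few lines above the reported position) is visible.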
setlr-1.0.2/MANIFEST.in ADDED
@@ -0,0 +1,33 @@
1
+ # Include important files
2
+ include README.md
3
+ include LICENSE
4
+ include CHANGELOG.md
5
+ include MIGRATION.md
6
+ include pyproject.toml
7
+ include setup.py
8
+ include setup.cfg
9
+
10
+ # Include example files
11
+ recursive-include example *.csv *.ttl *.setl.ttl
12
+
13
+ # Exclude unwanted files and directories
14
+ global-exclude __pycache__
15
+ global-exclude *.py[cod]
16
+ global-exclude *.so
17
+ global-exclude .DS_Store
18
+ global-exclude *.egg-info
19
+ recursive-exclude * __pycache__
20
+ recursive-exclude * *.py[cod]
21
+
22
+ # Exclude test files
23
+ prune tests
24
+ prune .github
25
+ prune .circleci
26
+ prune script
27
+ prune docs/_build
28
+
29
+ # Exclude development files
30
+ exclude .gitignore
31
+ exclude .pylintrc
32
+ exclude unittest.cfg
33
+ exclude IMPROVEMENT_SUMMARY.md
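The exclusion patterns above are shell-style globs. As a rough illustration, and assuming Python's `fnmatch` is a close enough stand-in for how file names are matched (it is an approximation, not the setuptools matcher itself), this sketch checks which sample names `*.py[cod]`, `*.so`, and `.DS_Store` would catch. The sample file names and the `is_excluded` helper are made up for the example.

```python
from fnmatch import fnmatchcase

# Patterns copied from the global-exclude lines above.
EXCLUDE_PATTERNS = ["*.py[cod]", "*.so", ".DS_Store"]


def is_excluded(filename, patterns=EXCLUDE_PATTERNS):
    """Return True if the bare file name matches any exclusion pattern."""
    return any(fnmatchcase(filename, pattern) for pattern in patterns)


# Hypothetical file names, not taken from the package.
samples = ["core.py", "core.pyc", "core.pyo", "core.pyd", "speedups.so", ".DS_Store"]
for name in samples:
    print(f"{name:12} excluded={is_excluded(name)}")
```

Note that `*.py[cod]` matches compiled artifacts (`.pyc`, `.pyo`, `.pyd`) but leaves plain `.py` sources in the distribution.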
setlr-1.0.2/MIGRATION.md ADDED
@@ -0,0 +1,166 @@
1
+ # Migration to pyproject.toml and API Improvements
2
+
3
+ This document describes the changes made to migrate the project to modern Python packaging standards and improve the API.
4
+
5
+ ## Changes Made
6
+
7
+ ### 1. Migration to pyproject.toml
8
+
9
+ The project has been migrated from `setup.py` to `pyproject.toml`, following PEP 517/518 standards for modern Python packaging.
10
+
11
+ - **New file**: `pyproject.toml` - Contains all project metadata, dependencies, and build configuration
12
+ - **Status of setup.py**: The old `setup.py` file is still present for compatibility but is no longer the primary packaging configuration
13
+
14
+ ### 2. Code Restructuring
15
+
16
+ The implementation code has been moved from `setlr/__init__.py` to `setlr/core.py` following best practices:
17
+
18
+ - **setlr/core.py**: Contains all implementation code (916+ lines)
19
+ - **setlr/__init__.py**: Now serves as a clean public API interface (~90 lines)
20
+
21
+ This separation provides:
22
+ - Better code organization
23
+ - Clearer public API surface
24
+ - Easier maintenance
25
+ - Improved IDE support and code navigation
26
+
27
+ ### 3. New Public API: `run_setl()`
28
+
29
+ A new, well-documented public function `run_setl()` has been introduced:
30
+
31
+ ```python
32
+ from rdflib import ConjunctiveGraph
33
+ from setlr import run_setl
34
+
35
+ # Load a SETL script
36
+ setl_graph = ConjunctiveGraph()
37
+ setl_graph.parse("my_script.setl.ttl", format="turtle")
38
+
39
+ # Execute the script
40
+ resources = run_setl(setl_graph)
41
+
42
+ # Access generated resources
43
+ output_graph = resources['http://example.com/output']
44
+ ```
45
+
46
+ **Features:**
47
+ - Comprehensive docstring with examples
48
+ - Proper type hints in documentation
49
+ - Clear description of parameters and return values
50
+ - Usage examples
51
+
52
+ ### 4. Backward Compatibility
53
+
54
+ The old `_setl()` function is still available for backward compatibility:
55
+
56
+ ```python
57
+ from setlr import _setl # Still works, but deprecated
58
+
59
+ # Old code continues to work
60
+ resources = _setl(setl_graph)
61
+ ```
62
+
63
+ **Deprecation Warning:**
64
+ - Using `_setl()` will emit a `DeprecationWarning`
65
+ - The warning suggests using `run_setl()` instead
66
+ - No breaking changes - existing code continues to work
67
+
68
+ ### 5. Exported API
69
+
70
+ The following are now officially exported from the `setlr` package:
71
+
72
+ **Main Functions:**
73
+ - `run_setl()` - Primary API function (recommended)
74
+ - `_setl()` - Deprecated, use `run_setl()` instead
75
+ - `main()` - CLI entry point
76
+
77
+ **Utility Functions:**
78
+ - `read_csv()`, `read_excel()`, `read_json()`, `read_xml()`, `read_graph()`
79
+ - `extract()`, `json_transform()`, `transform()`, `load()`
80
+ - `isempty()`, `hash()`, `camelcase()`, `get_content()`
81
+
82
+ **Namespaces:**
83
+ - `csvw`, `ov`, `setl`, `prov`, `pv`, `sp`, `sd`, `dc`, `void`, `shacl`, `api_vocab`
84
+
85
+ ## Migration Guide for Users
86
+
87
+ ### If you were using `_setl()`:
88
+
89
+ **Before:**
90
+ ```python
91
+ from setlr import _setl
92
+
93
+ resources = _setl(setl_graph)
94
+ ```
95
+
96
+ **After (recommended):**
97
+ ```python
98
+ from setlr import run_setl
99
+
100
+ resources = run_setl(setl_graph)
101
+ ```
102
+
103
+ **Note:** Your old code will continue to work, but you'll see a deprecation warning. Update at your convenience.
104
+
105
+ ### If you were importing internal functions:
106
+
107
+ **Before:**
108
+ ```python
109
+ from setlr import read_csv, extract
110
+ ```
111
+
112
+ **After:**
113
+ ```python
114
+ from setlr import read_csv, extract # Still works!
115
+ ```
116
+
117
+ No changes needed - all utility functions are properly exported.
118
+
119
+ ## For Package Maintainers
120
+
121
+ ### Building the Package
122
+
123
+ With pyproject.toml, you can now build the package using modern tools:
124
+
125
+ ```bash
126
+ # Install build tool
127
+ pip install build
128
+
129
+ # Build the package
130
+ python -m build
131
+ ```
132
+
133
+ This creates both wheel and source distributions in the `dist/` directory.
134
+
135
+ ### Installing from Source
136
+
137
+ ```bash
138
+ # Development installation
139
+ pip install -e .
140
+
141
+ # Regular installation
142
+ pip install .
143
+ ```
144
+
145
+ ### Running Tests
146
+
147
+ ```bash
148
+ # Install test dependencies
149
+ pip install nose2 coverage
150
+
151
+ # Run tests
152
+ nose2 --verbose
153
+ ```
154
+
155
+ ## Benefits of This Migration
156
+
157
+ 1. **Modern Standards**: Uses PEP 517/518 standards for Python packaging
158
+ 2. **Better Documentation**: Clear, comprehensive API documentation
159
+ 3. **Improved Structure**: Cleaner separation between public API and implementation
160
+ 4. **Backward Compatible**: No breaking changes for existing users
161
+ 5. **Future-Proof**: Follows current Python best practices
162
+ 6. **Better IDE Support**: Clearer module structure aids code completion and navigation
163
+
164
+ ## Questions or Issues?
165
+
166
+ If you encounter any issues with the migration or have questions about the new API, please open an issue on GitHub.
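The backward-compatibility section of MIGRATION.md says `_setl()` still works but emits a `DeprecationWarning` pointing at `run_setl()`. The sketch below shows the usual shape of such a wrapper and how a caller can observe the warning. Both functions here are hypothetical stand-ins with simplified signatures and return values; they are not the setlr source, which is not shown in this part of the diff.

```python
import warnings


def run_setl(setl_graph):
    """Hypothetical stand-in: pretend to execute a SETL script and return resources."""
    return {"executed": setl_graph}


def _setl(setl_graph):
    """Deprecated alias kept for backward compatibility (illustrative only)."""
    warnings.warn(
        "_setl() is deprecated; use run_setl() instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not to this wrapper
    )
    return run_setl(setl_graph)


# DeprecationWarning is hidden by default in many contexts, so record
# warnings explicitly to confirm the old entry point still works.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = _setl("my-script")

print(result)
print([str(w.message) for w in caught])
```

To find remaining `_setl()` call sites in an existing code base, running the test suite with `python -W error::DeprecationWarning` turns each such warning into an exception.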
setlr-1.0.2/PKG-INFO ADDED
@@ -0,0 +1,209 @@
1
+ Metadata-Version: 2.4
2
+ Name: setlr
3
+ Version: 1.0.2
4
+ Summary: setlr is a tool for Semantic Extraction, Transformation, and Loading.
5
+ Author-email: Jamie McCusker <mccusj@cs.rpi.edu>
6
+ Project-URL: Homepage, http://packages.python.org/setlr
7
+ Keywords: rdf,semantic,etl
8
+ Classifier: Development Status :: 5 - Production/Stable
9
+ Classifier: Topic :: Utilities
10
+ Requires-Python: >=3.8
11
+ Description-Content-Type: text/markdown
12
+ License-File: LICENSE
13
+ Requires-Dist: future
14
+ Requires-Dist: cython
15
+ Requires-Dist: numpy
16
+ Requires-Dist: rdflib>=6.0.0
17
+ Requires-Dist: pandas>=0.23.0
18
+ Requires-Dist: requests
19
+ Requires-Dist: toposort
20
+ Requires-Dist: beautifulsoup4
21
+ Requires-Dist: jinja2
22
+ Requires-Dist: lxml
23
+ Requires-Dist: six
24
+ Requires-Dist: xlrd
25
+ Requires-Dist: ijson
26
+ Requires-Dist: click
27
+ Requires-Dist: tqdm
28
+ Requires-Dist: requests-testadapter
29
+ Requires-Dist: python-slugify
30
+ Requires-Dist: pyshacl[js]
31
+ Dynamic: license-file
32
+
33
+ # setlr: Semantic Extract, Transform and Load
34
+
35
+ [![Unit Tests](https://github.com/tetherless-world/setlr/actions/workflows/test.yml/badge.svg)](https://github.com/tetherless-world/setlr/actions/workflows/test.yml)
36
+ [![Lint](https://github.com/tetherless-world/setlr/actions/workflows/lint.yml/badge.svg)](https://github.com/tetherless-world/setlr/actions/workflows/lint.yml)
37
+ [![codecov](https://codecov.io/gh/tetherless-world/setlr/branch/main/graph/badge.svg)](https://codecov.io/gh/tetherless-world/setlr)
38
+
39
+ **SETLr** is a powerful Python tool for generating RDF graphs from tabular data using declarative SETL (Semantic Extract, Transform, Load) scripts.
40
+
41
+ ## Features
42
+
43
+ ✨ **Multiple Data Sources**: CSV, Excel, JSON, XML, RDF, SAS files
44
+ 🔄 **Flexible Transformations**: JSON-LD templates with Jinja2, Python functions, SPARQL
45
+ ⚡ **High Performance**: Streaming XML parsing, pandas DataFrames, progress tracking
46
+ 🐍 **Python Integration**: Use as library or CLI tool
47
+ ✅ **Validation**: Built-in SHACL validation
48
+ 📝 **Well Documented**: Comprehensive guides and API reference
49
+
50
+ ## Quick Start
51
+
52
+ ### Installation
53
+
54
+ ```bash
55
+ pip install setlr
56
+ ```
57
+
58
+ ### Simple Example
59
+
60
+ Create `data.csv`:
61
+ ```csv
62
+ ID,Name,Email
63
+ 1,Alice,alice@example.com
64
+ 2,Bob,bob@example.com
65
+ ```
66
+
67
+ Create `transform.setl.ttl`:
68
+ ```turtle
69
+ @prefix setl: <http://purl.org/twc/vocab/setl/> .
70
+ @prefix prov: <http://www.w3.org/ns/prov#> .
71
+ @prefix csvw: <http://www.w3.org/ns/csvw#> .
72
+ @prefix void: <http://rdfs.org/ns/void#> .
73
+ @prefix : <http://example.com/> .
74
+
75
+ :table a csvw:Table, setl:Table ;
76
+ prov:wasGeneratedBy [ a setl:Extract ; prov:used <data.csv> ] .
77
+
78
+ :output a void:Dataset ;
79
+ prov:wasGeneratedBy [
80
+ a setl:Transform, setl:JSLDT ;
81
+ prov:used :table ;
82
+ prov:value '''[{
83
+ "@id": "http://example.com/person/{{row.ID}}",
84
+ "@type": "http://xmlns.com/foaf/0.1/Person",
85
+ "http://xmlns.com/foaf/0.1/name": "{{row.Name}}",
86
+ "http://xmlns.com/foaf/0.1/mbox": "mailto:{{row.Email}}"
87
+ }]'''
88
+ ] .
89
+ ```
90
+
91
+ Run SETLr:
92
+ ```bash
93
+ setlr transform.setl.ttl
94
+ ```
95
+
96
+ ### Using from Python
97
+
98
+ ```python
99
+ from rdflib import Graph, URIRef
100
+ import setlr
101
+
102
+ # Load SETL script
103
+ setl_graph = Graph()
104
+ setl_graph.parse("transform.setl.ttl", format="turtle")
105
+
106
+ # Execute ETL pipeline
107
+ resources = setlr.run_setl(setl_graph)
108
+
109
+ # Access generated RDF
110
+ output = resources[URIRef('http://example.com/output')]
111
+ print(f"Generated {len(output)} RDF triples")
112
+ ```
113
+
114
+ ## Documentation
115
+
116
+ 📚 **[Complete Documentation](docs/README.md)** - Full guides and references
117
+
118
+ **Quick Links:**
119
+ - [Tutorial](docs/tutorial.md) - Step-by-step guide to SETLr
120
+ - [JSLDT Template Language](docs/jsldt.md) - Transform syntax reference
121
+ - [Python API](docs/python-api.md) - Using SETLr from Python
122
+ - [Quick Start](docs/quickstart.md) - Get started in 5 minutes
123
+ - [Examples](docs/examples.md) - Real-world examples
124
+
125
+ **Advanced Topics:**
126
+ - [Streaming XML with XPath](docs/streaming-xml.md) - Efficient large file processing
127
+ - [Python Functions](docs/python-functions.md) - Custom Python transforms
128
+ - [SPARQL Support](docs/sparql.md) - Query and update endpoints
129
+ - [SHACL Validation](docs/shacl.md) - Validate your RDF output
130
+
131
+ ## Key Concepts
132
+
133
+ SETLr uses RDF (with PROV-O vocabulary) to describe ETL workflows:
134
+
135
+ 1. **Extract**: Load data from sources (CSV, Excel, JSON, XML, RDF, SAS)
136
+ 2. **Transform**: Apply templates or Python scripts to generate RDF
137
+ 3. **Load**: Save to files or SPARQL endpoints
138
+
139
+ ## Supported Formats
140
+
141
+ **Input:**
142
+ - Tabular: CSV, TSV, Excel (XLS/XLSX), SAS (XPORT/SAS7BDAT)
143
+ - Structured: JSON (with ijson selectors), XML (with XPath streaming)
144
+ - Semantic: RDF (Turtle, JSON-LD, RDF/XML, etc.), OWL Ontologies
145
+
146
+ **Output:**
147
+ - RDF: Turtle, TriG, N-Triples, N3, RDF/XML, JSON-LD
148
+ - Destinations: Files, SPARQL Update endpoints
149
+
150
+ ## Examples
151
+
152
+ See the [examples/](example/) directory for complete working examples:
153
+
154
+ - `social.setl.ttl` - Basic CSV to RDF with conditionals and loops
155
+ - `ontology.setl.ttl` - OWL ontology transformation with SHACL shapes
156
+
157
+ ## Development
158
+
159
+ ```bash
160
+ # Clone repository
161
+ git clone https://github.com/tetherless-world/setlr.git
162
+ cd setlr
163
+
164
+ # Bootstrap (creates venv and installs dependencies)
165
+ ./script/bootstrap
166
+
167
+ # Activate virtual environment
168
+ source venv/bin/activate
169
+
170
+ # Run tests
171
+ ./script/build
172
+
173
+ # Run linter
174
+ flake8 setlr/
175
+ ```
176
+
177
+ ## Contributing
178
+
179
+ Contributions are welcome! Please see our [Contributing Guide](CONTRIBUTING.md) for details on:
180
+ - Development setup and workflow
181
+ - Code standards and style guidelines
182
+ - Testing requirements
183
+ - Pull request process
184
+
185
+ Please note that this project follows a [Code of Conduct](CODE_OF_CONDUCT.md). By participating, you are expected to uphold this code.
186
+
187
+ ## License
188
+
189
+ Apache License 2.0 - see [LICENSE](LICENSE) file for details.
190
+
191
+ ## Citation
192
+
193
+ If you use SETLr in your research, please cite:
194
+
195
+ ```bibtex
196
+ @software{setlr,
197
+ title = {SETLr: Semantic Extract, Transform and Load},
198
+ author = {McCusker, Jamie},
199
+ year = {2024},
200
+ url = {https://github.com/tetherless-world/setlr}
201
+ }
202
+ ```
203
+
204
+ ## Support
205
+
206
+ - 📖 [Documentation](docs/README.md)
207
+ - 🐛 [Issue Tracker](https://github.com/tetherless-world/setlr/issues)
208
+ - 💬 [Discussions](https://github.com/tetherless-world/setlr/discussions)
209
+ - 🔒 [Security Policy](SECURITY.md) - Report security vulnerabilities
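The header portion of PKG-INFO above uses the email-style `Key: value` layout of Python core metadata, so it can be inspected with the standard library. The sketch below parses a shortened excerpt (only a few of the `Requires-Dist` lines are copied, to keep it brief) and lists the declared dependencies. It is an illustration of reading the format, not a tool shipped with setlr.

```python
from email.parser import HeaderParser

# Shortened excerpt of the PKG-INFO headers shown above.
PKG_INFO_EXCERPT = """\
Metadata-Version: 2.4
Name: setlr
Version: 1.0.2
Requires-Python: >=3.8
Requires-Dist: rdflib>=6.0.0
Requires-Dist: pandas>=0.23.0
Requires-Dist: pyshacl[js]
"""

metadata = HeaderParser().parsestr(PKG_INFO_EXCERPT)

# Repeated headers such as Requires-Dist are collected with get_all().
requirements = metadata.get_all("Requires-Dist")

print(metadata["Name"], metadata["Version"])
for requirement in requirements:
    print(" -", requirement)
```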
setlr-1.0.2/README.md ADDED
@@ -0,0 +1,177 @@
1
+ # setlr: Semantic Extract, Transform and Load
2
+
3
+ [![Unit Tests](https://github.com/tetherless-world/setlr/actions/workflows/test.yml/badge.svg)](https://github.com/tetherless-world/setlr/actions/workflows/test.yml)
4
+ [![Lint](https://github.com/tetherless-world/setlr/actions/workflows/lint.yml/badge.svg)](https://github.com/tetherless-world/setlr/actions/workflows/lint.yml)
5
+ [![codecov](https://codecov.io/gh/tetherless-world/setlr/branch/main/graph/badge.svg)](https://codecov.io/gh/tetherless-world/setlr)
6
+
7
+ **SETLr** is a powerful Python tool for generating RDF graphs from tabular data using declarative SETL (Semantic Extract, Transform, Load) scripts.
8
+
9
+ ## Features
10
+
11
+ ✨ **Multiple Data Sources**: CSV, Excel, JSON, XML, RDF, SAS files
12
+ 🔄 **Flexible Transformations**: JSON-LD templates with Jinja2, Python functions, SPARQL
13
+ ⚡ **High Performance**: Streaming XML parsing, pandas DataFrames, progress tracking
14
+ 🐍 **Python Integration**: Use as library or CLI tool
15
+ ✅ **Validation**: Built-in SHACL validation
16
+ 📝 **Well Documented**: Comprehensive guides and API reference
17
+
18
+ ## Quick Start
19
+
20
+ ### Installation
21
+
22
+ ```bash
23
+ pip install setlr
24
+ ```
25
+
26
+ ### Simple Example
27
+
28
+ Create `data.csv`:
29
+ ```csv
30
+ ID,Name,Email
31
+ 1,Alice,alice@example.com
32
+ 2,Bob,bob@example.com
33
+ ```
34
+
35
+ Create `transform.setl.ttl`:
36
+ ```turtle
37
+ @prefix setl: <http://purl.org/twc/vocab/setl/> .
38
+ @prefix prov: <http://www.w3.org/ns/prov#> .
39
+ @prefix csvw: <http://www.w3.org/ns/csvw#> .
40
+ @prefix void: <http://rdfs.org/ns/void#> .
41
+ @prefix : <http://example.com/> .
42
+
43
+ :table a csvw:Table, setl:Table ;
44
+ prov:wasGeneratedBy [ a setl:Extract ; prov:used <data.csv> ] .
45
+
46
+ :output a void:Dataset ;
47
+ prov:wasGeneratedBy [
48
+ a setl:Transform, setl:JSLDT ;
49
+ prov:used :table ;
50
+ prov:value '''[{
51
+ "@id": "http://example.com/person/{{row.ID}}",
52
+ "@type": "http://xmlns.com/foaf/0.1/Person",
53
+ "http://xmlns.com/foaf/0.1/name": "{{row.Name}}",
54
+ "http://xmlns.com/foaf/0.1/mbox": "mailto:{{row.Email}}"
55
+ }]'''
56
+ ] .
57
+ ```
58
+
59
+ Run SETLr:
60
+ ```bash
61
+ setlr transform.setl.ttl
62
+ ```
63
+
64
+ ### Using from Python
65
+
66
+ ```python
67
+ from rdflib import Graph, URIRef
68
+ import setlr
69
+
70
+ # Load SETL script
71
+ setl_graph = Graph()
72
+ setl_graph.parse("transform.setl.ttl", format="turtle")
73
+
74
+ # Execute ETL pipeline
75
+ resources = setlr.run_setl(setl_graph)
76
+
77
+ # Access generated RDF
78
+ output = resources[URIRef('http://example.com/output')]
79
+ print(f"Generated {len(output)} RDF triples")
80
+ ```
81
+
82
+ ## Documentation
83
+
84
+ 📚 **[Complete Documentation](docs/README.md)** - Full guides and references
85
+
86
+ **Quick Links:**
87
+ - [Tutorial](docs/tutorial.md) - Step-by-step guide to SETLr
88
+ - [JSLDT Template Language](docs/jsldt.md) - Transform syntax reference
89
+ - [Python API](docs/python-api.md) - Using SETLr from Python
90
+ - [Quick Start](docs/quickstart.md) - Get started in 5 minutes
91
+ - [Examples](docs/examples.md) - Real-world examples
92
+
93
+ **Advanced Topics:**
94
+ - [Streaming XML with XPath](docs/streaming-xml.md) - Efficient large file processing
95
+ - [Python Functions](docs/python-functions.md) - Custom Python transforms
96
+ - [SPARQL Support](docs/sparql.md) - Query and update endpoints
97
+ - [SHACL Validation](docs/shacl.md) - Validate your RDF output
98
+
99
+ ## Key Concepts
100
+
101
+ SETLr uses RDF (with PROV-O vocabulary) to describe ETL workflows:
102
+
103
+ 1. **Extract**: Load data from sources (CSV, Excel, JSON, XML, RDF, SAS)
104
+ 2. **Transform**: Apply templates or Python scripts to generate RDF
105
+ 3. **Load**: Save to files or SPARQL endpoints
106
+
107
+ ## Supported Formats
108
+
109
+ **Input:**
110
+ - Tabular: CSV, TSV, Excel (XLS/XLSX), SAS (XPORT/SAS7BDAT)
111
+ - Structured: JSON (with ijson selectors), XML (with XPath streaming)
112
+ - Semantic: RDF (Turtle, JSON-LD, RDF/XML, etc.), OWL Ontologies
113
+
114
+ **Output:**
115
+ - RDF: Turtle, TriG, N-Triples, N3, RDF/XML, JSON-LD
116
+ - Destinations: Files, SPARQL Update endpoints
117
+
118
+ ## Examples
119
+
120
+ See the [examples/](example/) directory for complete working examples:
121
+
122
+ - `social.setl.ttl` - Basic CSV to RDF with conditionals and loops
123
+ - `ontology.setl.ttl` - OWL ontology transformation with SHACL shapes
124
+
125
+ ## Development
126
+
127
+ ```bash
128
+ # Clone repository
129
+ git clone https://github.com/tetherless-world/setlr.git
130
+ cd setlr
131
+
132
+ # Bootstrap (creates venv and installs dependencies)
133
+ ./script/bootstrap
134
+
135
+ # Activate virtual environment
136
+ source venv/bin/activate
137
+
138
+ # Run tests
139
+ ./script/build
140
+
141
+ # Run linter
142
+ flake8 setlr/
143
+ ```
144
+
145
+ ## Contributing
146
+
147
+ Contributions are welcome! Please see our [Contributing Guide](CONTRIBUTING.md) for details on:
148
+ - Development setup and workflow
149
+ - Code standards and style guidelines
150
+ - Testing requirements
151
+ - Pull request process
152
+
153
+ Please note that this project follows a [Code of Conduct](CODE_OF_CONDUCT.md). By participating, you are expected to uphold this code.
154
+
155
+ ## License
156
+
157
+ Apache License 2.0 - see [LICENSE](LICENSE) file for details.
158
+
159
+ ## Citation
160
+
161
+ If you use SETLr in your research, please cite:
162
+
163
+ ```bibtex
164
+ @software{setlr,
165
+ title = {SETLr: Semantic Extract, Transform and Load},
166
+ author = {McCusker, Jamie},
167
+ year = {2024},
168
+ url = {https://github.com/tetherless-world/setlr}
169
+ }
170
+ ```
171
+
172
+ ## Support
173
+
174
+ - 📖 [Documentation](docs/README.md)
175
+ - 🐛 [Issue Tracker](https://github.com/tetherless-world/setlr/issues)
176
+ - 💬 [Discussions](https://github.com/tetherless-world/setlr/discussions)
177
+ - 🔒 [Security Policy](SECURITY.md) - Report security vulnerabilities
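To make the README's simple example more concrete, the following standard-library sketch builds by hand the JSON-LD objects that the template appears intended to produce, one per CSV row. It deliberately does not call setlr or Jinja2, the `row_to_jsonld` helper is hypothetical, and the real tool's output may differ in details such as how values are typed.

```python
import csv
import io
import json

# Same sample data as data.csv in the README.
CSV_TEXT = """\
ID,Name,Email
1,Alice,alice@example.com
2,Bob,bob@example.com
"""

FOAF = "http://xmlns.com/foaf/0.1/"


def row_to_jsonld(row):
    """Mirror the README template: substitute row.ID, row.Name, and row.Email."""
    return {
        "@id": f"http://example.com/person/{row['ID']}",
        "@type": FOAF + "Person",
        FOAF + "name": row["Name"],
        FOAF + "mbox": f"mailto:{row['Email']}",
    }


people = [row_to_jsonld(row) for row in csv.DictReader(io.StringIO(CSV_TEXT))]
print(json.dumps(people, indent=2))
```

Each `{{row.X}}` placeholder in the template corresponds to one column lookup here, which is a quick way to sanity-check a template against the CSV header before running the full pipeline.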