setlr 1.0.0__tar.gz → 1.0.2__tar.gz
- setlr-1.0.2/CHANGELOG.md +72 -0
- setlr-1.0.2/MANIFEST.in +33 -0
- setlr-1.0.2/MIGRATION.md +166 -0
- setlr-1.0.2/PKG-INFO +209 -0
- setlr-1.0.2/README.md +177 -0
- setlr-1.0.2/example/ontology.csv +76 -0
- setlr-1.0.2/example/ontology.setl.ttl +143 -0
- setlr-1.0.2/example/ontology.ttl +271 -0
- setlr-1.0.2/example/social-naive.setl.ttl +39 -0
- setlr-1.0.2/example/social-naive.ttl +27 -0
- setlr-1.0.2/example/social.csv +5 -0
- setlr-1.0.2/example/social.setl.ttl +45 -0
- setlr-1.0.2/example/social.ttl +23 -0
- setlr-1.0.2/pyproject.toml +48 -0
- setlr-1.0.2/setlr/__init__.py +89 -0
- setlr-1.0.0/setlr/__init__.py → setlr-1.0.2/setlr/core.py +306 -162
- {setlr-1.0.0 → setlr-1.0.2}/setlr/trig_store.py +3 -5
- setlr-1.0.2/setlr.egg-info/PKG-INFO +209 -0
- setlr-1.0.2/setlr.egg-info/SOURCES.txt +26 -0
- {setlr-1.0.0 → setlr-1.0.2}/setlr.egg-info/requires.txt +0 -1
- {setlr-1.0.0 → setlr-1.0.2}/setup.cfg +0 -6
- setlr-1.0.2/setup.py +11 -0
- setlr-1.0.0/PKG-INFO +0 -34
- setlr-1.0.0/README.md +0 -15
- setlr-1.0.0/setlr/_version.py +0 -4
- setlr-1.0.0/setlr/sqlite-store.py +0 -0
- setlr-1.0.0/setlr.egg-info/PKG-INFO +0 -34
- setlr-1.0.0/setlr.egg-info/SOURCES.txt +0 -16
- setlr-1.0.0/setlr.egg-info/pbr.json +0 -1
- setlr-1.0.0/setup.py +0 -53
- {setlr-1.0.0 → setlr-1.0.2}/LICENSE +0 -0
- {setlr-1.0.0 → setlr-1.0.2}/setlr/iterparse_filter.py +0 -0
- {setlr-1.0.0 → setlr-1.0.2}/setlr.egg-info/dependency_links.txt +0 -0
- {setlr-1.0.0 → setlr-1.0.2}/setlr.egg-info/entry_points.txt +0 -0
- {setlr-1.0.0 → setlr-1.0.2}/setlr.egg-info/top_level.txt +0 -0
setlr-1.0.2/CHANGELOG.md
ADDED
@@ -0,0 +1,72 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

## [1.0.2] - 2026-01-18

### Changed
- Migrated from `setup.py` to `pyproject.toml` following PEP 517/518 standards for modern Python packaging
- Restructured codebase: moved implementation from `setlr/__init__.py` to `setlr/core.py` (~1020 lines)
- `setlr/__init__.py` now serves as a clean public API interface (~90 lines)

### Added
- New public API function `run_setl()` with comprehensive documentation and type hints
- Proper deprecation warning for `_setl()` function (still available for backward compatibility)
- Improved error messages for NaN/missing values (now displays `<empty/missing>` instead of `nan`)
- Extended JSON error context from 4 to 8 lines before error for better debugging
- Comprehensive API documentation with usage examples
- Development scripts for bootstrap, build, and release
- GitHub Actions workflows for automated testing and linting
- Migration documentation (MIGRATION.md)
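
The `<empty/missing>` display described above can be approximated with a small formatting helper. This is a minimal sketch, not setlr's implementation: the name `display_value` is hypothetical, and it assumes missing cells arrive as `None` or a float `NaN` (which is how pandas usually represents an empty CSV cell).

```python
import math
from typing import Any

MISSING_LABEL = "<empty/missing>"


def display_value(value: Any) -> str:
    """Format a cell value for an error message (hypothetical helper).

    None and float NaN are shown as a readable label instead of "nan".
    numpy.float64 subclasses float, so NumPy NaN values are covered too.
    """
    if value is None:
        return MISSING_LABEL
    if isinstance(value, float) and math.isnan(value):
        return MISSING_LABEL
    return str(value)


if __name__ == "__main__":
    row = {"ID": 1, "Name": "Alice", "Email": float("nan")}
    for column, value in row.items():
        print(f"{column}: {display_value(value)}")
```

If pandas is available, `pandas.isna()` is a broader check that also recognises `pandas.NA` and `NaT`.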

### Fixed
- Improved error reporting for missing data scenarios
- Better context display for JSON syntax errors in templates
- Python version compatibility for JSON error handling
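
Showing several lines before a JSON syntax error makes template mistakes, such as a missing comma, easier to locate. The sketch below illustrates the idea with the standard library only; `json_error_context` and its output layout are assumptions for illustration, not setlr's actual error format. It relies on `json.JSONDecodeError` exposing 1-based `lineno` and `colno` attributes.

```python
import json


def json_error_context(text: str, err: json.JSONDecodeError, before: int = 8) -> str:
    """Return up to `before` lines preceding a JSON error, the failing line,
    and a caret under the reported column (hypothetical helper)."""
    lines = text.splitlines()
    error_index = err.lineno - 1
    start = max(0, error_index - before)
    out = []
    for i in range(start, error_index + 1):
        # The error can point just past the last line (e.g. unexpected EOF).
        line = lines[i] if i < len(lines) else ""
        marker = ">>" if i == error_index else "  "
        out.append(f"{marker} {i + 1:4d}: {line}")
    # Each line starts with a 9-character prefix like ">>    3: ".
    out.append(" " * (9 + err.colno - 1) + "^ " + err.msg)
    return "\n".join(out)


if __name__ == "__main__":
    template = '[{\n  "@id": "http://example.com/person/1",\n  "name": "Alice"\n  "age": 30\n}]'
    try:
        json.loads(template)
    except json.JSONDecodeError as e:
        print(json_error_context(template, e))
```

With `before=8` the output covers up to eight preceding lines plus the failing line; passing `before=4` reproduces the shorter window used before 1.0.2.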

## [1.0.1] - 2024-08-09

### Changed
- Moved version information from `_version.py` directly into `setup.py`
- Modified `setup.py` to support `--version` flag

### Fixed
- Fixed SHACL constraint in ontology example (changed `sh:minCount` from 1 to 0 for `rdfs:subClassOf`)

## [1.0.0] - 2024-04-29

### Added
- Initial stable release of setlr
- Core SETL (Semantic Extract, Transform, Load) functionality
- Support for generating RDF graphs from tabular data
- CLI tool via `setlr` command
- Data source readers: CSV, Excel, JSON, XML, and RDF graphs
- Template-based transformation using Jinja2
- Named graph support via ConjunctiveGraph
- RDF namespaces: csvw, ov, setl, prov, pv, sp, sd, dc, void, shacl
- Utility functions: `extract()`, `transform()`, `load()`, `hash()`, `camelcase()`
- SHACL validation support with pyshacl[js]
- Python 3.8+ support
- Comprehensive test suite

### Dependencies
- rdflib >= 6.0.0
- pandas >= 0.23.0
- jinja2
- click (CLI support)
- tqdm (progress bars)
- pyshacl[js] (validation)
- beautifulsoup4, lxml (XML/HTML parsing)
- requests (HTTP support)
- toposort (dependency ordering)
- Other utility libraries: numpy, xlrd, ijson, python-slugify

[Unreleased]: https://github.com/tetherless-world/setlr/compare/v1.0.2...HEAD
[1.0.2]: https://github.com/tetherless-world/setlr/compare/v1.0.1...v1.0.2
[1.0.1]: https://github.com/tetherless-world/setlr/compare/v1.0.0...v1.0.1
[1.0.0]: https://github.com/tetherless-world/setlr/releases/tag/v1.0.0
setlr-1.0.2/MANIFEST.in
ADDED
@@ -0,0 +1,33 @@
# Include important files
include README.md
include LICENSE
include CHANGELOG.md
include MIGRATION.md
include pyproject.toml
include setup.py
include setup.cfg

# Include example files
recursive-include example *.csv *.ttl *.setl.ttl

# Exclude unwanted files and directories
global-exclude __pycache__
global-exclude *.py[cod]
global-exclude *.so
global-exclude .DS_Store
global-exclude *.egg-info
recursive-exclude * __pycache__
recursive-exclude * *.py[cod]

# Exclude test files
prune tests
prune .github
prune .circleci
prune script
prune docs/_build

# Exclude development files
exclude .gitignore
exclude .pylintrc
exclude unittest.cfg
exclude IMPROVEMENT_SUMMARY.md
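
To check that a built source distribution honours these rules, one option is to list the archive's members and compare them against a few expectations. A minimal sketch using the standard library's `tarfile` and `fnmatch`; `audit_members` and `audit_sdist` are hypothetical helpers, and the `REQUIRED`/`FORBIDDEN` lists are a hand-picked subset of the rules above, not generated from MANIFEST.in.

```python
import fnmatch
import tarfile
from typing import Iterable, List

# Assumed subset of the include rules: files expected at the sdist root.
REQUIRED = ["README.md", "LICENSE", "CHANGELOG.md", "MIGRATION.md", "pyproject.toml"]
# Assumed subset of the exclude/prune rules, as fnmatch patterns.
FORBIDDEN = ["tests/*", ".github/*", "script/*", "*__pycache__*", "*.pyc"]


def audit_members(names: Iterable[str]) -> List[str]:
    """Report problems given member names such as 'setlr-1.0.2/README.md'."""
    # Drop the leading '<name>-<version>/' directory that sdists use.
    relative = [n.split("/", 1)[1] for n in names if "/" in n]
    problems = [f"missing: {req}" for req in REQUIRED if req not in relative]
    for path in relative:
        if any(fnmatch.fnmatch(path, pattern) for pattern in FORBIDDEN):
            problems.append(f"unexpected: {path}")
    return problems


def audit_sdist(path: str) -> List[str]:
    """Open a gzipped sdist archive and audit its member names."""
    with tarfile.open(path, "r:gz") as archive:
        return audit_members(archive.getnames())
```

After `python -m build`, calling `audit_sdist("dist/setlr-1.0.2.tar.gz")` should return an empty list if the rules behave as intended; the archive name depends on the version being built.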
setlr-1.0.2/MIGRATION.md
ADDED
@@ -0,0 +1,166 @@
# Migration to pyproject.toml and API Improvements

This document describes the changes made to migrate the project to modern Python packaging standards and improve the API.

## Changes Made

### 1. Migration to pyproject.toml

The project has been migrated from `setup.py` to `pyproject.toml`, following PEP 517/518 standards for modern Python packaging.

- **New file**: `pyproject.toml` - Contains all project metadata, dependencies, and build configuration
- **Status of setup.py**: The old `setup.py` file is still present for compatibility but is no longer the primary packaging configuration

### 2. Code Restructuring

The implementation code has been moved from `setlr/__init__.py` to `setlr/core.py` following best practices:

- **setlr/core.py**: Contains all implementation code (916+ lines)
- **setlr/__init__.py**: Now serves as a clean public API interface (~90 lines)

This separation provides:
- Better code organization
- Clearer public API surface
- Easier maintenance
- Improved IDE support and code navigation

### 3. New Public API: `run_setl()`

A new, well-documented public function `run_setl()` has been introduced:

```python
from rdflib import ConjunctiveGraph, URIRef
from setlr import run_setl

# Load a SETL script
setl_graph = ConjunctiveGraph()
setl_graph.parse("my_script.setl.ttl", format="turtle")

# Execute the script
resources = run_setl(setl_graph)

# Access generated resources (keys are rdflib URIRef objects, not plain strings)
output_graph = resources[URIRef('http://example.com/output')]
```

**Features:**
- Comprehensive docstring with examples
- Proper type hints in documentation
- Clear description of parameters and return values
- Usage examples

### 4. Backward Compatibility

The old `_setl()` function is still available for backward compatibility:

```python
from setlr import _setl  # Still works, but deprecated

# Old code continues to work
resources = _setl(setl_graph)
```

**Deprecation Warning:**
- Using `_setl()` will emit a `DeprecationWarning`
- The warning suggests using `run_setl()` instead
- No breaking changes - existing code continues to work

### 5. Exported API

The following are now officially exported from the `setlr` package:

**Main Functions:**
- `run_setl()` - Primary API function (recommended)
- `_setl()` - Deprecated, use `run_setl()` instead
- `main()` - CLI entry point

**Utility Functions:**
- `read_csv()`, `read_excel()`, `read_json()`, `read_xml()`, `read_graph()`
- `extract()`, `json_transform()`, `transform()`, `load()`
- `isempty()`, `hash()`, `camelcase()`, `get_content()`

**Namespaces:**
- `csvw`, `ov`, `setl`, `prov`, `pv`, `sp`, `sd`, `dc`, `void`, `shacl`, `api_vocab`

## Migration Guide for Users

### If you were using `_setl()`:

**Before:**
```python
from setlr import _setl

resources = _setl(setl_graph)
```

**After (recommended):**
```python
from setlr import run_setl

resources = run_setl(setl_graph)
```

**Note:** Your old code will continue to work, but it emits a `DeprecationWarning`. Python only displays that category by default when it is triggered from `__main__`, so run with `-W default::DeprecationWarning` if you want to be sure to see it. Update at your convenience.

### If you were importing internal functions:

**Before:**
```python
from setlr import read_csv, extract
```

**After:**
```python
from setlr import read_csv, extract  # Still works!
```

No changes needed - all utility functions are properly exported.
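
To confirm that the names listed under "Exported API" are available from an installed version, a small `importlib` check is enough. A minimal sketch: `missing_exports` is a hypothetical helper, and the name list is copied from this guide rather than read from setlr itself.

```python
import importlib
from typing import Iterable, List

# Names this guide lists as exported from the top-level `setlr` package.
EXPECTED_SETLR_EXPORTS = [
    "run_setl", "_setl", "main",
    "read_csv", "read_excel", "read_json", "read_xml", "read_graph",
    "extract", "json_transform", "transform", "load",
    "isempty", "hash", "camelcase", "get_content",
    "csvw", "ov", "setl", "prov", "pv", "sp", "sd", "dc", "void", "shacl", "api_vocab",
]


def missing_exports(module_name: str, names: Iterable[str]) -> List[str]:
    """Return the names that `module_name` does not expose as attributes."""
    module = importlib.import_module(module_name)
    return [name for name in names if not hasattr(module, name)]
```

With setlr installed, `missing_exports("setlr", EXPECTED_SETLR_EXPORTS)` would be expected to return an empty list if this guide is accurate. Because the check uses `hasattr`, it confirms attribute access only, not membership in `__all__`.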

## For Package Maintainers

### Building the Package

With pyproject.toml, you can now build the package using modern tools:

```bash
# Install build tool
pip install build

# Build the package
python -m build
```

This creates both wheel and source distributions in the `dist/` directory.

### Installing from Source

```bash
# Development installation
pip install -e .

# Regular installation
pip install .
```

### Running Tests

```bash
# Install test dependencies
pip install nose2 coverage

# Run tests
nose2 --verbose
```

## Benefits of This Migration

1. **Modern Standards**: Uses PEP 517/518 standards for Python packaging
2. **Better Documentation**: Clear, comprehensive API documentation
3. **Improved Structure**: Cleaner separation between public API and implementation
4. **Backward Compatible**: No breaking changes for existing users
5. **Future-Proof**: Follows current Python best practices
6. **Better IDE Support**: Clearer module structure aids code completion and navigation

## Questions or Issues?

If you encounter any issues with the migration or have questions about the new API, please open an issue on GitHub.
setlr-1.0.2/PKG-INFO
ADDED
@@ -0,0 +1,209 @@
Metadata-Version: 2.4
Name: setlr
Version: 1.0.2
Summary: setlr is a tool for Semantic Extraction, Transformation, and Loading.
Author-email: Jamie McCusker <mccusj@cs.rpi.edu>
Project-URL: Homepage, http://packages.python.org/setlr
Keywords: rdf,semantic,etl
Classifier: Development Status :: 5 - Production/Stable
Classifier: Topic :: Utilities
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: future
Requires-Dist: cython
Requires-Dist: numpy
Requires-Dist: rdflib>=6.0.0
Requires-Dist: pandas>=0.23.0
Requires-Dist: requests
Requires-Dist: toposort
Requires-Dist: beautifulsoup4
Requires-Dist: jinja2
Requires-Dist: lxml
Requires-Dist: six
Requires-Dist: xlrd
Requires-Dist: ijson
Requires-Dist: click
Requires-Dist: tqdm
Requires-Dist: requests-testadapter
Requires-Dist: python-slugify
Requires-Dist: pyshacl[js]
Dynamic: license-file
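
PKG-INFO follows the core metadata format, which uses email-style `Field: value` headers, so the standard library's `email.parser` can read it, including repeated fields such as `Requires-Dist`. A minimal sketch with a hypothetical helper `read_core_metadata`; for an installed distribution, `importlib.metadata.metadata("setlr")` and `importlib.metadata.requires("setlr")` expose the same information without parsing the file by hand.

```python
from email.parser import Parser
from typing import Dict, List, Optional, Union


def read_core_metadata(pkg_info_text: str) -> Dict[str, Union[Optional[str], List[str]]]:
    """Extract a few core-metadata fields from PKG-INFO text (hypothetical helper)."""
    # headersonly=True stops at the first blank line, skipping the long description.
    msg = Parser().parsestr(pkg_info_text, headersonly=True)
    return {
        "name": msg["Name"],
        "version": msg["Version"],
        "requires_python": msg["Requires-Python"],
        # Repeated headers such as Requires-Dist are collected with get_all().
        "requires_dist": msg.get_all("Requires-Dist", []),
    }


if __name__ == "__main__":
    sample = (
        "Metadata-Version: 2.4\n"
        "Name: setlr\n"
        "Version: 1.0.2\n"
        "Requires-Python: >=3.8\n"
        "Requires-Dist: rdflib>=6.0.0\n"
        "Requires-Dist: pandas>=0.23.0\n"
        "\n"
        "# setlr: Semantic Extract, Transform and Load\n"
    )
    print(read_core_metadata(sample))
```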
setlr-1.0.2/README.md
ADDED
@@ -0,0 +1,177 @@
# setlr: Semantic Extract, Transform and Load

[Tests](https://github.com/tetherless-world/setlr/actions/workflows/test.yml)
[Lint](https://github.com/tetherless-world/setlr/actions/workflows/lint.yml)
[Coverage](https://codecov.io/gh/tetherless-world/setlr)

**SETLr** is a powerful Python tool for generating RDF graphs from tabular data using declarative SETL (Semantic Extract, Transform, Load) scripts.

## Features

✨ **Multiple Data Sources**: CSV, Excel, JSON, XML, RDF, SAS files
🔄 **Flexible Transformations**: JSON-LD templates with Jinja2, Python functions, SPARQL
⚡ **High Performance**: Streaming XML parsing, pandas DataFrames, progress tracking
🐍 **Python Integration**: Use as library or CLI tool
✅ **Validation**: Built-in SHACL validation
📝 **Well Documented**: Comprehensive guides and API reference

## Quick Start

### Installation

```bash
pip install setlr
```

### Simple Example

Create `data.csv`:
```csv
ID,Name,Email
1,Alice,alice@example.com
2,Bob,bob@example.com
```

Create `transform.setl.ttl`:
```turtle
@prefix setl: <http://purl.org/twc/vocab/setl/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix csvw: <http://www.w3.org/ns/csvw#> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix : <http://example.com/> .

:table a csvw:Table, setl:Table ;
    prov:wasGeneratedBy [ a setl:Extract ; prov:used <data.csv> ] .

:output a void:Dataset ;
    prov:wasGeneratedBy [
        a setl:Transform, setl:JSLDT ;
        prov:used :table ;
        prov:value '''[{
            "@id": "http://example.com/person/{{row.ID}}",
            "@type": "http://xmlns.com/foaf/0.1/Person",
            "http://xmlns.com/foaf/0.1/name": "{{row.Name}}",
            "http://xmlns.com/foaf/0.1/mbox": { "@id": "mailto:{{row.Email}}" }
        }]'''
    ] .
```

Run SETLr:
```bash
setlr transform.setl.ttl
```

### Using from Python

```python
from rdflib import Graph, URIRef
import setlr

# Load SETL script
setl_graph = Graph()
setl_graph.parse("transform.setl.ttl", format="turtle")

# Execute ETL pipeline
resources = setlr.run_setl(setl_graph)

# Access generated RDF
output = resources[URIRef('http://example.com/output')]
print(f"Generated {len(output)} RDF triples")
```

## Documentation

📚 **[Complete Documentation](docs/README.md)** - Full guides and references

**Quick Links:**
- [Tutorial](docs/tutorial.md) - Step-by-step guide to SETLr
- [JSLDT Template Language](docs/jsldt.md) - Transform syntax reference
- [Python API](docs/python-api.md) - Using SETLr from Python
- [Quick Start](docs/quickstart.md) - Get started in 5 minutes
- [Examples](docs/examples.md) - Real-world examples

**Advanced Topics:**
- [Streaming XML with XPath](docs/streaming-xml.md) - Efficient large file processing
- [Python Functions](docs/python-functions.md) - Custom Python transforms
- [SPARQL Support](docs/sparql.md) - Query and update endpoints
- [SHACL Validation](docs/shacl.md) - Validate your RDF output

## Key Concepts

SETLr uses RDF (with PROV-O vocabulary) to describe ETL workflows:

1. **Extract**: Load data from sources (CSV, Excel, JSON, XML, RDF, SAS)
2. **Transform**: Apply templates or Python scripts to generate RDF
3. **Load**: Save to files or SPARQL endpoints
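
The Extract and Transform steps can be pictured as: read each row, render the JSON-LD template with that row, and parse the result. The sketch below is a simplified, standard-library-only stand-in: it substitutes only plain `{{row.Column}}` placeholders with a regular expression instead of using Jinja2, and it stops at JSON rather than producing RDF. `render` and `expand_rows` are hypothetical names, not part of setlr's API.

```python
import csv
import io
import json
import re
from typing import Any, Dict, List

TEMPLATE = """[{
  "@id": "http://example.com/person/{{row.ID}}",
  "@type": "http://xmlns.com/foaf/0.1/Person",
  "http://xmlns.com/foaf/0.1/name": "{{row.Name}}"
}]"""

# Matches simple placeholders such as {{row.Name}} or {{ row.Name }}.
PLACEHOLDER = re.compile(r"\{\{\s*row\.(\w+)\s*\}\}")


def render(template: str, row: Dict[str, Any]) -> str:
    """Fill placeholders; missing columns become "" (like Jinja2's default Undefined)."""
    # json.dumps(...)[1:-1] escapes quotes and backslashes so the JSON stays valid.
    return PLACEHOLDER.sub(
        lambda m: json.dumps(str(row.get(m.group(1)) or ""))[1:-1], template
    )


def expand_rows(csv_text: str, template: str) -> List[Dict[str, Any]]:
    nodes: List[Dict[str, Any]] = []
    for row in csv.DictReader(io.StringIO(csv_text)):  # Extract: one dict per row
        nodes.extend(json.loads(render(template, row)))  # Transform: row -> JSON-LD
    return nodes


if __name__ == "__main__":
    data = "ID,Name,Email\n1,Alice,alice@example.com\n2,Bob,bob@example.com\n"
    for node in expand_rows(data, TEMPLATE):
        print(node["@id"], node["http://xmlns.com/foaf/0.1/name"])
```

The real JSLDT transform supports full Jinja2 syntax and turns the rendered JSON-LD into RDF; this sketch stays at the JSON stage to remain dependency-free.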

## Supported Formats

**Input:**
- Tabular: CSV, TSV, Excel (XLS/XLSX), SAS (XPORT/SAS7BDAT)
- Structured: JSON (with ijson selectors), XML (with XPath streaming)
- Semantic: RDF (Turtle, JSON-LD, RDF/XML, etc.), OWL Ontologies

**Output:**
- RDF: Turtle, TriG, N-Triples, N3, RDF/XML, JSON-LD
- Destinations: Files, SPARQL Update endpoints

## Examples

See the [example/](example/) directory for complete working examples:

- `social.setl.ttl` - Basic CSV to RDF with conditionals and loops
- `ontology.setl.ttl` - OWL ontology transformation with SHACL shapes

## Development

```bash
# Clone repository
git clone https://github.com/tetherless-world/setlr.git
cd setlr

# Bootstrap (creates venv and installs dependencies)
./script/bootstrap

# Activate virtual environment
source venv/bin/activate

# Run tests
./script/build

# Run linter
flake8 setlr/
```

## Contributing

Contributions are welcome! Please see our [Contributing Guide](CONTRIBUTING.md) for details on:
- Development setup and workflow
- Code standards and style guidelines
- Testing requirements
- Pull request process

Please note that this project follows a [Code of Conduct](CODE_OF_CONDUCT.md). By participating, you are expected to uphold this code.

## License

Apache License 2.0 - see [LICENSE](LICENSE) file for details.

## Citation

If you use SETLr in your research, please cite:

```bibtex
@software{setlr,
  title = {SETLr: Semantic Extract, Transform and Load},
  author = {McCusker, Jamie},
  year = {2024},
  url = {https://github.com/tetherless-world/setlr}
}
```

## Support

- 📖 [Documentation](docs/README.md)
- 🐛 [Issue Tracker](https://github.com/tetherless-world/setlr/issues)
- 💬 [Discussions](https://github.com/tetherless-world/setlr/discussions)
- 🔒 [Security Policy](SECURITY.md) - Report security vulnerabilities