hlsv 1.0.0
- checksums.yaml +7 -0
- data/CHANGELOG.md +5 -0
- data/LICENSE +676 -0
- data/README.md +356 -0
- data/bin/hlsv +4 -0
- data/config.default.yaml +19 -0
- data/lib/hlsv/cli.rb +85 -0
- data/lib/hlsv/find_keys.rb +979 -0
- data/lib/hlsv/html2word.rb +602 -0
- data/lib/hlsv/mon_script.rb +169 -0
- data/lib/hlsv/version.rb +5 -0
- data/lib/hlsv/web_app.rb +569 -0
- data/lib/hlsv/xpt/dataset.rb +38 -0
- data/lib/hlsv/xpt/library.rb +28 -0
- data/lib/hlsv/xpt/reader.rb +367 -0
- data/lib/hlsv/xpt/variable.rb +130 -0
- data/lib/hlsv/xpt.rb +11 -0
- data/lib/hlsv.rb +49 -0
- data/public/Contact-LOGO.png +0 -0
- data/public/app.js +569 -0
- data/public/styles.css +586 -0
- data/public/styles_csv.css +448 -0
- data/views/csv_view.erb +85 -0
- data/views/index.erb +233 -0
- data/views/report_template.erb +1144 -0
- metadata +176 -0
data/README.md
ADDED
|
@@ -0,0 +1,356 @@
|
|
|
1
|
+
# High Level SDTM Validation — `hlsv`
|
|
2
|
+
|
|
3
|
+
> A Ruby gem providing a local web application for automated structural checks on SDTM packages,
|
|
4
|
+
> including ASCII validation, define.xml key verification, and natural key discovery.
|
|
5
|
+
|
|
6
|
+
An open-source SDTM structural validation tool for clinical data teams.
|
|
7
|
+
|
|
8
|
+
[hlsv on RubyGems](https://rubygems.org/gems/hlsv)
|
|
9
|
+

|
|
10
|
+

|
|
11
|
+
|
|
12
|
+
---
|
|
13
|
+
|
|
14
|
+
## 🎯 Why this tool exists
|
|
15
|
+
|
|
16
|
+
Regulatory submissions require SDTM packages to be structurally sound and internally consistent.
|
|
17
|
+
Manual validation is often time-consuming and prone to human error.
|
|
18
|
+
|
|
19
|
+
This gem provides:
|
|
20
|
+
|
|
21
|
+
- Immediate detection of structural inconsistencies
|
|
22
|
+
- Automated key verification against `define.xml`
|
|
23
|
+
- Early duplicate detection
|
|
24
|
+
- Rapid quality control before sponsor or regulatory review
|
|
25
|
+
|
|
26
|
+
All processing is performed locally, ensuring full data confidentiality.
|
|
27
|
+
|
|
28
|
+
---
|
|
29
|
+
|
|
30
|
+
## ✨ Features
|
|
31
|
+
|
|
32
|
+
- **ASCII Validation** — detects non-ASCII characters in SDTM datasets
|
|
33
|
+
- **Define.xml Key Verification** — validates keys declared in `define.xml`
|
|
34
|
+
- **Natural Key Discovery** — performs ad hoc searches for natural keys across datasets
|
|
35
|
+
- **Duplicate Analysis** — identifies and reports duplicate records with visual grouping
|
|
36
|
+
- **Interactive Web Interface** — user-friendly configuration and results viewing
|
|
37
|
+
- **Multiple Export Formats** — Excel (.xlsx) and CSV outputs
|
|
38
|
+
- **Excel Export with README** — comprehensive Excel reports with explanatory notes
|
|
39
|
+
- **Local-First Processing** — all processing occurs on your machine, no data leaves your computer
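
As a rough illustration of what the ASCII validation feature checks, the sketch below scans string values for non-ASCII characters with Ruby's `String#ascii_only?`. The method name `non_ascii_cells` and the record layout (an array of hashes keyed by variable name) are assumptions made for this example; they are not the gem's actual API.

```ruby
# Hypothetical helper (not part of hlsv): report every cell that contains
# at least one non-ASCII character.
def non_ascii_cells(records)
  findings = []
  records.each_with_index do |record, row|
    record.each do |variable, value|
      next unless value.is_a?(String)
      next if value.ascii_only?

      # Collect the distinct offending characters for the report.
      bad_chars = value.each_char.reject(&:ascii_only?).uniq
      findings << { row: row, variable: variable, chars: bad_chars }
    end
  end
  findings
end

# Made-up records for illustration only.
records = [
  { 'USUBJID' => '001', 'AETERM' => 'Headache' },
  { 'USUBJID' => '002', 'AETERM' => "Naus\u00E9e" }
]

non_ascii_cells(records).each do |finding|
  puts "row #{finding[:row]}, #{finding[:variable]}: #{finding[:chars].join(' ')}"
end
```

Non-string values are skipped here, so numeric variables never trigger a finding.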
|
|
40
|
+
|
|
41
|
+
---
|
|
42
|
+
|
|
43
|
+
## 👤 Who is this for?
|
|
44
|
+
|
|
45
|
+
- Clinical Data Managers
|
|
46
|
+
- Biostatisticians
|
|
47
|
+
- Regulatory submission teams
|
|
48
|
+
- CROs preparing SDTM packages
|
|
49
|
+
- Sponsors performing internal QC
|
|
50
|
+
|
|
51
|
+
---
|
|
52
|
+
|
|
53
|
+
## 🔧 Prerequisites
|
|
54
|
+
|
|
55
|
+
- **Ruby** >= 3.0
|
|
56
|
+
- **RubyGems** (bundled with Ruby)
|
|
57
|
+
|
|
58
|
+
All gem dependencies are installed automatically.
|
|
59
|
+
|
|
60
|
+
---
|
|
61
|
+
|
|
62
|
+
## 📦 Installation
|
|
63
|
+
|
|
64
|
+
```bash
|
|
65
|
+
gem install hlsv
|
|
66
|
+
```
|
|
67
|
+
|
|
68
|
+
That's it. All dependencies (Sinatra, Puma, Excel libraries, etc.) are installed automatically.
|
|
69
|
+
|
|
70
|
+
---
|
|
71
|
+
|
|
72
|
+
## 🚀 Quick Start
|
|
73
|
+
|
|
74
|
+
### Start the application
|
|
75
|
+
|
|
76
|
+
```bash
|
|
77
|
+
hlsv
|
|
78
|
+
```
|
|
79
|
+
|
|
80
|
+
The application starts on **http://localhost:4567** and is accessible only from your machine.
|
|
81
|
+
|
|
82
|
+
### Custom port or host
|
|
83
|
+
|
|
84
|
+
```bash
|
|
85
|
+
hlsv --port 8080
|
|
86
|
+
hlsv --host 0.0.0.0 --port 8080   # listens on all interfaces: reachable from other machines
|
|
87
|
+
```
|
|
88
|
+
|
|
89
|
+
### First-time setup
|
|
90
|
+
|
|
91
|
+
1. Open your browser at **http://localhost:4567**
|
|
92
|
+
2. Fill in the configuration form:
|
|
93
|
+
- **Study Name** — unique identifier for your study
|
|
94
|
+
- **Output Directory** — where duplicate files will be saved
|
|
95
|
+
- **Datasets Directory** — path to your `.xpt` files
|
|
96
|
+
- **Define.xml Path** — path to `define.xml` (or `"-"` to skip validation)
|
|
97
|
+
- **Key Configuration** — variables to test for each dataset type
|
|
98
|
+
3. Click **"💾 Save Current Configuration"**
|
|
99
|
+
4. Click **"🚀 Start Analysis"**
|
|
100
|
+
5. Wait for processing to complete
|
|
101
|
+
6. Click the report file to view the full analysis
|
|
102
|
+
7. Click a CSV link to inspect detected duplicates
|
|
103
|
+
8. Refine the **Key Configuration** if needed and run again
|
|
104
|
+
9. Click **"📂 Load Results"** to browse all generated reports
|
|
105
|
+
|
|
106
|
+
---
|
|
107
|
+
|
|
108
|
+
## ⚙️ Configuration
|
|
109
|
+
|
|
110
|
+
Configuration is managed through the web interface or by editing `config.yaml` directly
|
|
111
|
+
in the directory where you launch `hlsv`.
|
|
112
|
+
|
|
113
|
+
### Parameter reference
|
|
114
|
+
|
|
115
|
+
| Parameter | Type | Description |
|
|
116
|
+
|-----------|------|-------------|
|
|
117
|
+
| `study_name` | string | Unique identifier for your study |
|
|
118
|
+
| `output_type` | string | Output format: `csv` (web interface) |
|
|
119
|
+
| `output_directory` | string | Directory to save duplicate files |
|
|
120
|
+
| `data_directory` | string | Path to your `.xpt` dataset files |
|
|
121
|
+
| `define_path` | string | Path to `define.xml`; use `"-"` to skip |
|
|
122
|
+
| `excluded_ds` | string | Space-separated datasets to exclude (e.g. `"DM SUPPDM"`) |
|
|
123
|
+
| `event_key` | string | Key variables for event datasets (e.g. AE, BE) |
|
|
124
|
+
| `intervention_key` | string | Key variables for intervention datasets (e.g. CM, EX) |
|
|
125
|
+
| `finding_key` | string | Key variables for finding datasets (e.g. LB, VS) |
|
|
126
|
+
| `finding_about_key` | string | Key variables for finding-about datasets (e.g. FA) |
|
|
127
|
+
| `ds_key` | string | Key variables for DS dataset |
|
|
128
|
+
| `relrec_key` | string | Key variables for RELREC dataset |
|
|
129
|
+
| `CO_key` … `TV_key` | string | Keys for Trial Design datasets (CO, TA, TE, TI, TS, TV) |
|
|
130
|
+
|
|
131
|
+
### Example `config.yaml`
|
|
132
|
+
|
|
133
|
+
```yaml
|
|
134
|
+
study_name: "MY_STUDY_001"
|
|
135
|
+
output_type: "csv"
|
|
136
|
+
output_directory: "duplicates"
|
|
137
|
+
data_directory: "/path/to/datasets"
|
|
138
|
+
define_path: "/path/to/define.xml"
|
|
139
|
+
excluded_ds: "DM SUPPDM"
|
|
140
|
+
|
|
141
|
+
event_key: "USUBJID AESEQ"
|
|
142
|
+
intervention_key: "USUBJID CMSEQ"
|
|
143
|
+
finding_key: "USUBJID VISITNUM SPEC SEQ"
|
|
144
|
+
finding_about_key: "USUBJID FAOBJ FATESTCD"
|
|
145
|
+
ds_key: "USUBJID EPOCH DSDECOD"
|
|
146
|
+
relrec_key: "STUDYID RDOMAIN USUBJID IDVAR IDVARVAL"
|
|
147
|
+
|
|
148
|
+
CO_key: "STUDYID DOMAIN USUBJID COSEQ"
|
|
149
|
+
TA_key: "STUDYID ARMCD EPOCH"
|
|
150
|
+
TE_key: "STUDYID ETCD"
|
|
151
|
+
TI_key: "STUDYID IETESTCD"
|
|
152
|
+
TS_key: "STUDYID TSPARMCD TSSEQ"
|
|
153
|
+
TV_key: "STUDYID VISITNUM"
|
|
154
|
+
```
|
|
155
|
+
|
|
156
|
+
### Configuration tips
|
|
157
|
+
|
|
158
|
+
- Separate variables with **spaces** in key configurations
|
|
159
|
+
- Use `"-"` for `define_path` to skip define.xml validation
|
|
160
|
+
- List excluded datasets separated by spaces: `"DM SUPPDM CO"`
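
The space-separated convention described in these tips can be sketched as follows. This is a minimal illustration using Ruby's standard `yaml` library; the helper name `split_list` is an assumption for this example and does not describe how hlsv reads its configuration internally.

```ruby
require 'yaml'

# Hypothetical helper: turn a space-separated setting into an array of names.
def split_list(value)
  value.to_s.split
end

config = YAML.safe_load(<<~CONFIG)
  define_path: "-"
  excluded_ds: "DM SUPPDM CO"
  event_key: "USUBJID AESEQ"
CONFIG

excluded    = split_list(config['excluded_ds']) # ["DM", "SUPPDM", "CO"]
event_key   = split_list(config['event_key'])   # ["USUBJID", "AESEQ"]
skip_define = config['define_path'] == '-'      # true
```

Calling `to_s` first means an empty or missing setting yields an empty list instead of raising an error.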
|
|
161
|
+
|
|
162
|
+
---
|
|
163
|
+
|
|
164
|
+
## 📊 Output examples
|
|
165
|
+
|
|
166
|
+
```
|
|
167
|
+
Dataset: AE
|
|
168
|
+
Number of records: 12
|
|
169
|
+
Dataset type: General Observation, event dataset
|
|
170
|
+
|
|
171
|
+
✓ ASCII Verification
|
|
172
|
+
No non-ASCII characters found
|
|
173
|
+
|
|
174
|
+
✓ Valid define.xml Key
|
|
175
|
+
Key: STUDYID, USUBJID, AEDECOD, AESTDTC
|
|
176
|
+
|
|
177
|
+
✓ Minimum Key Found
|
|
178
|
+
Key: USUBJID, AETERM
|
|
179
|
+
|
|
180
|
+
---
|
|
181
|
+
|
|
182
|
+
Dataset: BE
|
|
183
|
+
Number of records: 3322
|
|
184
|
+
Dataset type: General Observation, event dataset
|
|
185
|
+
|
|
186
|
+
✓ ASCII Verification
|
|
187
|
+
No non-ASCII characters found
|
|
188
|
+
|
|
189
|
+
✓ Valid define.xml Key
|
|
190
|
+
Key: STUDYID, USUBJID, BEREFID, BETERM
|
|
191
|
+
|
|
192
|
+
⚠ No Valid Key Found
|
|
193
|
+
Tested variables: USUBJID, BETERM, BESTDTC
|
|
194
|
+
Last key tested: USUBJID, BETERM, BESTDTC
|
|
195
|
+
File containing duplicated records: data_BE.csv
|
|
196
|
+
```
|
|
197
|
+
|
|
198
|
+
Duplicate records are grouped by the last key tested. A group identifier appears in the
|
|
199
|
+
first column, with alternating row colors for visual clarity.
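
The idea of testing a candidate key and grouping the offending records can be sketched in a few lines of Ruby. The function name `duplicate_groups`, the in-memory record layout, and the sample records are assumptions for illustration; the gem's own analysis engine may work differently.

```ruby
# Hypothetical helper: group records that share the same values for `key`
# and keep only the groups with more than one record (the duplicates).
def duplicate_groups(records, key)
  records
    .group_by { |record| key.map { |variable| record[variable] } }
    .select { |_values, group| group.size > 1 }
end

# Made-up records for illustration only.
records = [
  { 'USUBJID' => '001', 'BETERM' => 'Collection', 'BESTDTC' => '2024-01-10' },
  { 'USUBJID' => '001', 'BETERM' => 'Collection', 'BESTDTC' => '2024-01-10' },
  { 'USUBJID' => '002', 'BETERM' => 'Collection', 'BESTDTC' => '2024-01-11' }
]

groups = duplicate_groups(records, %w[USUBJID BETERM BESTDTC])

# A key is valid for the dataset when no group of duplicates remains.
puts(groups.empty? ? 'Valid key' : "#{groups.size} duplicate group(s)")

# Number the groups, mirroring the group identifier in the first CSV column.
groups.each_with_index do |(_values, group), index|
  group.each { |record| puts "#{index + 1},#{record.values.join(',')}" }
end
```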
|
|
200
|
+
|
|
201
|
+
---
|
|
202
|
+
|
|
203
|
+
## 🏗️ Architecture
|
|
204
|
+
|
|
205
|
+
```
|
|
206
|
+
hlsv/
|
|
207
|
+
├── bin/
|
|
208
|
+
│ └── hlsv # Executable — starts the server
|
|
209
|
+
├── lib/
|
|
210
|
+
│ ├── hlsv.rb # Entry point — loads all components
|
|
211
|
+
│ └── hlsv/
|
|
212
|
+
│ ├── version.rb # Gem version
|
|
213
|
+
│ ├── web_app.rb # Sinatra web application (routes, helpers)
|
|
214
|
+
│ ├── mon_script.rb # Orchestration layer
|
|
215
|
+
│ ├── find_keys.rb # Analysis engine
|
|
216
|
+
│ ├── html2word.rb # HTML to DOCX converter
|
|
217
|
+
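│ ├── cli.rb # Command-line option parsing
│ ├── xpt.rb # XPT file reader
│ └── xpt/ # XPT reader components (dataset, library, reader, variable)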
|
|
218
|
+
├── views/ # ERB templates
|
|
219
|
+
│ ├── index.erb # Main interface
|
|
220
|
+
│ ├── csv_view.erb # CSV viewer
|
|
221
|
+
│ └── report_template.erb # HTML report template
|
|
222
|
+
├── public/ # Static assets
|
|
223
|
+
│ ├── app.js
|
|
224
|
+
│ ├── styles.css
|
|
225
|
+
│ ├── styles_csv.css
|
|
226
|
+
│ └── Contact-LOGO.png
|
|
227
|
+
├── hlsv.gemspec # Gem specification
|
|
228
|
+
├── Gemfile # Development dependencies
|
|
229
|
+
├── config.default.yaml # Default configuration template
|
|
230
|
+
├── LICENSE # License file
|
|
231
|
+
└── README.md
|
|
232
|
+
```
|
|
233
|
+
|
|
234
|
+
Results are written into a `hlsv_results/` directory created at runtime in the working directory
|
|
235
|
+
where `hlsv` is launched.
|
|
236
|
+
|
|
237
|
+
---
|
|
238
|
+
|
|
239
|
+
## 🔒 Security
|
|
240
|
+
|
|
241
|
+
- **Local-only access** — binds to `127.0.0.1` by default (localhost only)
|
|
242
|
+
- **No external connections** — all processing is local
|
|
243
|
+
- **Path traversal protection** — directory traversal attacks are prevented in all file routes
|
|
244
|
+
- **No data collection** — no analytics or tracking of any kind
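
One common way to implement path traversal protection is to resolve the requested path and confirm that it stays inside the allowed directory. The sketch below shows that general technique only; `safe_path` and the use of `hlsv_results` as the root are illustrative assumptions and are not taken from the gem's source.

```ruby
# Hypothetical helper: resolve `requested` relative to `root` and return the
# absolute path only if it stays inside `root`; otherwise return nil.
def safe_path(root, requested)
  root_path = File.expand_path(root)
  full_path = File.expand_path(requested, root_path)
  inside = full_path == root_path ||
           full_path.start_with?(root_path + File::SEPARATOR)
  inside ? full_path : nil
end

safe_path('hlsv_results', 'report.html')    # absolute path inside hlsv_results
safe_path('hlsv_results', '../config.yaml') # nil, escapes the root
```

`File.expand_path` normalizes `..` segments but does not resolve symbolic links; a stricter check could compare `File.realpath` results for paths that already exist.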
|
|
245
|
+
|
|
246
|
+
---
|
|
247
|
+
|
|
248
|
+
## 🐛 Troubleshooting
|
|
249
|
+
|
|
250
|
+
### "Port already in use"
|
|
251
|
+
|
|
252
|
+
```bash
|
|
253
|
+
# Find the process using port 4567
|
|
254
|
+
lsof -i :4567
|
|
255
|
+
|
|
256
|
+
# Kill it
|
|
257
|
+
kill -9 <PID>
|
|
258
|
+
|
|
259
|
+
# Or launch hlsv on a different port
|
|
260
|
+
hlsv --port 8080
|
|
261
|
+
```
|
|
262
|
+
|
|
263
|
+
### "Gem not found" or load errors
|
|
264
|
+
|
|
265
|
+
```bash
|
|
266
|
+
# Reinstall the gem
|
|
267
|
+
gem uninstall hlsv
|
|
268
|
+
gem install hlsv
|
|
269
|
+
```
|
|
270
|
+
|
|
271
|
+
### Define.xml not found
|
|
272
|
+
|
|
273
|
+
- Use an absolute path, not a relative one
|
|
274
|
+
- Verify the file exists and is readable
|
|
275
|
+
- Use `"-"` to skip define.xml validation entirely
|
|
276
|
+
|
|
277
|
+
### Debug mode
|
|
278
|
+
|
|
279
|
+
```bash
|
|
280
|
+
RACK_ENV=development hlsv
|
|
281
|
+
```
|
|
282
|
+
|
|
283
|
+
---
|
|
284
|
+
|
|
285
|
+
## 🤝 Contributing
|
|
286
|
+
|
|
287
|
+
Contributions are welcome!
|
|
288
|
+
|
|
289
|
+
1. Fork the repository
|
|
290
|
+
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
|
|
291
|
+
3. Commit your changes (`git commit -m 'Add amazing feature'`)
|
|
292
|
+
4. Push to the branch (`git push origin feature/amazing-feature`)
|
|
293
|
+
5. Open a Pull Request
|
|
294
|
+
|
|
295
|
+
---
|
|
296
|
+
|
|
297
|
+
## 📝 License
|
|
298
|
+
|
|
299
|
+
This project is licensed under the **GNU Affero General Public License v3.0 (AGPL-3.0)**.
|
|
300
|
+
|
|
301
|
+
You may use, modify, and redistribute the software under the terms of the AGPL-3.0.
|
|
302
|
+
If you run a modified version as a network service, you must offer its source code to the users of that service under the same license.
|
|
303
|
+
|
|
304
|
+
For commercial licensing inquiries or proprietary integration, please contact:
|
|
305
|
+
📩 contact@adclin.com
|
|
306
|
+
|
|
307
|
+
---
|
|
308
|
+
|
|
309
|
+
## 👥 Authors
|
|
310
|
+
|
|
311
|
+
- **AdClin Team** — [https://adclin.com](https://adclin.com)
|
|
312
|
+
- [**Marie Ober**](https://www.linkedin.com/in/marie-ober-50528048)
|
|
313
|
+
|
|
314
|
+
---
|
|
315
|
+
|
|
316
|
+
## 🙏 Acknowledgments
|
|
317
|
+
|
|
318
|
+
- Built with [Sinatra](http://sinatrarb.com/)
|
|
319
|
+
- Excel generation with [fast_excel](https://github.com/Paxa/fast_excel)
|
|
320
|
+
- Served by [Puma](https://puma.io/)
|
|
321
|
+
|
|
322
|
+
---
|
|
323
|
+
|
|
324
|
+
## 📞 Support
|
|
325
|
+
|
|
326
|
+
- 📧 Email: adclin@gmail.com
|
|
327
|
+
- 🐛 Issues: [GitHub Issues](https://github.com/adclin/hlsv/issues)
|
|
328
|
+
|
|
329
|
+
## 💼 Professional Services
|
|
330
|
+
|
|
331
|
+
AdClin offers professional services related to this tool:
|
|
332
|
+
|
|
333
|
+
- Implementation of custom validation rules
|
|
334
|
+
- Integration into existing SDTM pipelines
|
|
335
|
+
- Deployment in secured environments
|
|
336
|
+
- Training sessions for data management teams
|
|
337
|
+
|
|
338
|
+
📩 contact@adclin.com
|
|
339
|
+
|
|
340
|
+
---
|
|
341
|
+
|
|
342
|
+
## 🗓️ Changelog
|
|
343
|
+
|
|
344
|
+
### Version 1.0.0 (2026-02-23)
|
|
345
|
+
|
|
346
|
+
- ✨ Initial release as a Ruby gem
|
|
347
|
+
- 🎨 Modern responsive web interface
|
|
348
|
+
- 📊 Excel export with README sheet
|
|
349
|
+
- 🔍 ASCII validation
|
|
350
|
+
- ✅ Define.xml key verification
|
|
351
|
+
- 🔑 Natural key discovery
|
|
352
|
+
- 📱 Mobile responsive design
|
|
353
|
+
|
|
354
|
+
---
|
|
355
|
+
|
|
356
|
+
*Made with ❤️ by AdClin*
|
data/bin/hlsv
ADDED
data/config.default.yaml
ADDED
|
@@ -0,0 +1,19 @@
|
|
|
1
|
+
---
|
|
2
|
+
study_name: ''
|
|
3
|
+
output_type: csv
|
|
4
|
+
output_directory: duplicates
|
|
5
|
+
data_directory: ''
|
|
6
|
+
define_path: ''
|
|
7
|
+
excluded_ds: ''
|
|
8
|
+
event_key: TERM STDTC
|
|
9
|
+
intervention_key: TRT STDTC
|
|
10
|
+
finding_key: VISITNUM TPTNUM TESTCD
|
|
11
|
+
finding_about_key: VISITNUM OBJ CAT TESTCD
|
|
12
|
+
ds_key: CAT SCAT STDTC
|
|
13
|
+
relrec_key: USUBJID RDOMAIN IDVARVAL IDVAR RELID
|
|
14
|
+
CO_key: USUBJID COREF VISITNUM IDVARVAL RDOMAIN IDVAR COVAL
|
|
15
|
+
TA_key: ARMCD TAETORD
|
|
16
|
+
TE_key: ETCD
|
|
17
|
+
TI_key: IETESTCD TIVERS
|
|
18
|
+
TS_key: TSPARMCD TSSEQ
|
|
19
|
+
TV_key: VISITNUM
|
data/lib/hlsv/cli.rb
ADDED
|
@@ -0,0 +1,85 @@
|
|
|
1
|
+
module Hlsv
|
|
2
|
+
|
|
3
|
+
##
|
|
4
|
+
# Command-line interface
|
|
5
|
+
|
|
6
|
+
class CLI
|
|
7
|
+
|
|
8
|
+
attr_reader :host
|
|
9
|
+
attr_reader :port
|
|
10
|
+
|
|
11
|
+
def initialize
|
|
12
|
+
end
|
|
13
|
+
|
|
14
|
+
def default_port = 4567
|
|
15
|
+
def default_host = '127.0.0.1'
|
|
16
|
+
|
|
17
|
+
def run
|
|
18
|
+
# p ARGV
|
|
19
|
+
# exit
|
|
20
|
+
parse_command_line
|
|
21
|
+
Hlsv.start_server(host: host, port: port)
|
|
22
|
+
end
|
|
23
|
+
|
|
24
|
+
def parse_command_line
|
|
25
|
+
|
|
26
|
+
@host = default_host
|
|
27
|
+
@port = default_port
|
|
28
|
+
|
|
29
|
+
while ARGV.first && ARGV.first[0] == '-'
|
|
30
|
+
option = ARGV.shift
|
|
31
|
+
case option
|
|
32
|
+
when '-h', '--help'
|
|
33
|
+
usage
|
|
34
|
+
exit 0
|
|
35
|
+
when '-v', '--version'
|
|
36
|
+
display_version
|
|
37
|
+
exit 0
|
|
38
|
+
when '--host'
|
|
39
|
+
host = ARGV.shift
|
|
40
|
+
if host
|
|
41
|
+
@host = host
|
|
42
|
+
else
|
|
43
|
+
warn "argument missing for #{option.inspect}"
|
|
44
|
+
end
|
|
45
|
+
when '--port'
|
|
46
|
+
port = ARGV.shift
|
|
47
|
+
if port
|
|
48
|
+
@port = port
|
|
49
|
+
else
|
|
50
|
+
warn "argument missing for #{option.inspect}"
|
|
51
|
+
end
|
|
52
|
+
when '--'
|
|
53
|
+
break
|
|
54
|
+
else
|
|
55
|
+
warn "invalid option #{option.inspect}, ignored"
|
|
56
|
+
end
|
|
57
|
+
end
|
|
58
|
+
end
|
|
59
|
+
|
|
60
|
+
def display_version
|
|
61
|
+
puts "hlsv #{Hlsv::VERSION}"
|
|
62
|
+
end
|
|
63
|
+
|
|
64
|
+
def usage
|
|
65
|
+
puts help_text
|
|
66
|
+
end
|
|
67
|
+
|
|
68
|
+
def help_text
|
|
69
|
+
<<~HELP.lines.map { |line| " #{line}" }.join
|
|
70
|
+
|
|
71
|
+
Usage: hlsv [options]
|
|
72
|
+
|
|
73
|
+
Starts a web server for SDTM validation.
|
|
74
|
+
|
|
75
|
+
Options
|
|
76
|
+
--host HOST listen at IP HOST (default #{default_host})
|
|
77
|
+
--port PORT listen on port PORT (default #{default_port})
|
|
78
|
+
-v, --version display the version
|
|
79
|
+
-h, --help display this text
|
|
80
|
+
HELP
|
|
81
|
+
end
|
|
82
|
+
|
|
83
|
+
|
|
84
|
+
end
|
|
85
|
+
end
|
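
For comparison, the same options could be parsed with Ruby's standard `OptionParser`, which converts the port to an integer and raises an error when an argument is missing or invalid. This is an alternative sketch, not the gem's implementation; the `parse_options` name is hypothetical, and the default values are copied from `default_host` and `default_port` above.

```ruby
require 'optparse'

# Hypothetical alternative to CLI#parse_command_line using the stdlib parser.
def parse_options(argv)
  options = { host: '127.0.0.1', port: 4567 }

  parser = OptionParser.new do |opts|
    opts.banner = 'Usage: hlsv [options]'
    opts.on('--host HOST', String, 'listen at IP HOST') do |value|
      options[:host] = value
    end
    # The Integer type makes OptionParser reject non-numeric ports.
    opts.on('--port PORT', Integer, 'listen on port PORT') do |value|
      options[:port] = value
    end
  end

  parser.parse!(argv) # removes the recognized options from argv
  options
end

p parse_options(%w[--port 8080])
```

With this approach a non-numeric port raises `OptionParser::InvalidArgument` instead of being passed on to the server as a string.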