hlsv 1.0.0

data/README.md ADDED
@@ -0,0 +1,356 @@
1
+ # High Level SDTM Validation — `hlsv`
2
+
3
+ > A Ruby gem providing a local web application for automated structural checks on SDTM packages,
4
+ > including ASCII validation, define.xml key verification, and natural key discovery.
5
+
6
+ An open-source SDTM structural validation tool for clinical data teams.
7
+
8
+ [![Gem Version](https://img.shields.io/gem/v/hlsv.svg)](https://rubygems.org/gems/hlsv)
9
+ ![Ruby](https://img.shields.io/badge/ruby-%3E%3D3.0-red.svg)
10
+ ![License](https://img.shields.io/badge/license-GNU%20AGPLv3-green.svg)
11
+
12
+ ---
13
+
14
+ ## 🎯 Why this tool exists
15
+
16
+ Regulatory submissions require SDTM packages to be structurally sound and internally consistent.
17
+ Manual validation is often time-consuming and prone to human error.
18
+
19
+ This gem provides:
20
+
21
+ - Immediate detection of structural inconsistencies
22
+ - Automated key verification against `define.xml`
23
+ - Early duplicate detection
24
+ - Rapid quality control before sponsor or regulatory review
25
+
26
+ All processing is performed locally, ensuring full data confidentiality.
27
+
28
+ ---
29
+
30
+ ## ✨ Features
31
+
32
+ - **ASCII Validation** — detects non-ASCII characters in SDTM datasets
33
+ - **Define.xml Key Verification** — validates keys declared in `define.xml`
34
+ - **Natural Key Discovery** — performs ad hoc searches for natural keys across datasets
35
+ - **Duplicate Analysis** — identifies and reports duplicate records with visual grouping
36
+ - **Interactive Web Interface** — user-friendly configuration and results viewing
37
+ - **Multiple Export Formats** — Excel (.xlsx) and CSV outputs
38
+ - **Excel Export with README** — comprehensive Excel reports with explanatory notes
39
+ - **Local-First Processing** — all processing occurs on your machine; no data leaves your computer
40
+
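As an illustration of the ASCII validation feature listed above, here is a minimal Ruby sketch. It is not the gem's implementation: the helper name `non_ascii_cells` and the row layout (an array of hashes keyed by variable name) are assumptions made for this example. It relies only on Ruby's built-in `String#ascii_only?`.

```ruby
# Hypothetical helper (not part of hlsv): returns [row_index, variable, value]
# for every string value that contains at least one non-ASCII character.
def non_ascii_cells(rows)
  rows.each_with_index.flat_map do |row, index|
    row.select { |_name, value| value.is_a?(String) && !value.ascii_only? }
       .map { |name, value| [index, name, value] }
  end
end

# Made-up sample records; real data would come from the .xpt files.
rows = [
  { "USUBJID" => "001", "AETERM" => "Headache" },
  { "USUBJID" => "002", "AETERM" => "Naus\u00E9e" }
]

non_ascii_cells(rows).each do |index, name, value|
  puts "row #{index}, #{name}: #{value.inspect}"
end
```

Reporting the row index and variable name alongside the value makes it easier to locate the offending cell in the source dataset.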
41
+ ---
42
+
43
+ ## 👤 Who is this for?
44
+
45
+ - Clinical Data Managers
46
+ - Biostatisticians
47
+ - Regulatory submission teams
48
+ - CROs preparing SDTM packages
49
+ - Sponsors performing internal QC
50
+
51
+ ---
52
+
53
+ ## 🔧 Prerequisites
54
+
55
+ - **Ruby** >= 3.0
56
+ - **RubyGems** (bundled with Ruby)
57
+
58
+ All gem dependencies are installed automatically.
59
+
60
+ ---
61
+
62
+ ## 📦 Installation
63
+
64
+ ```bash
65
+ gem install hlsv
66
+ ```
67
+
68
+ That's it. All dependencies (Sinatra, Puma, Excel libraries, etc.) are installed automatically.
69
+
70
+ ---
71
+
72
+ ## 🚀 Quick Start
73
+
74
+ ### Start the application
75
+
76
+ ```bash
77
+ hlsv
78
+ ```
79
+
80
+ By default, the application starts on **http://localhost:4567** and is accessible only from your machine.
81
+
82
+ ### Custom port or host
83
+
84
+ ```bash
85
+ hlsv --port 8080
86
+ hlsv --host 0.0.0.0 --port 8080
87
+ ```
88
+
89
+ ### First-time setup
90
+
91
+ 1. Open your browser at **http://localhost:4567**
92
+ 2. Fill in the configuration form:
93
+ - **Study Name** — unique identifier for your study
94
+ - **Output Directory** — where the files listing duplicate records will be saved
95
+ - **Datasets Directory** — path to your `.xpt` files
96
+ - **Define.xml Path** — path to `define.xml` (or `"-"` to skip validation)
97
+ - **Key Configuration** — variables to test for each dataset type
98
+ 3. Click **"💾 Save Current Configuration"**
99
+ 4. Click **"🚀 Start Analysis"**
100
+ 5. Wait for processing to complete
101
+ 6. Click the report file to view the full analysis
102
+ 7. Click a CSV link to inspect detected duplicates
103
+ 8. Refine the **Key Configuration** if needed and run again
104
+ 9. Click **"📂 Load Results"** to browse all generated reports
105
+
106
+ ---
107
+
108
+ ## ⚙️ Configuration
109
+
110
+ Configuration is managed through the web interface or by editing `config.yaml` directly
111
+ in the directory where you launch `hlsv`.
112
+
113
+ ### Parameter reference
114
+
115
+ | Parameter | Type | Description |
116
+ |-----------|------|-------------|
117
+ | `study_name` | string | Unique identifier for your study |
118
+ | `output_type` | string | Output format: `csv` (web interface) |
119
+ | `output_directory` | string | Directory where the files listing duplicate records are saved |
120
+ | `data_directory` | string | Path to your `.xpt` dataset files |
121
+ | `define_path` | string | Path to `define.xml`; use `"-"` to skip |
122
+ | `excluded_ds` | string | Space-separated datasets to exclude (e.g. `"DM SUPPDM"`) |
123
+ | `event_key` | string | Key variables for event datasets (e.g. AE, BE) |
124
+ | `intervention_key` | string | Key variables for intervention datasets (e.g. CM, EX) |
125
+ | `finding_key` | string | Key variables for finding datasets (e.g. LB, VS) |
126
+ | `finding_about_key` | string | Key variables for finding-about datasets (e.g. FA) |
127
+ | `ds_key` | string | Key variables for DS dataset |
128
+ | `relrec_key` | string | Key variables for RELREC dataset |
129
+ | `CO_key` … `TV_key` | string | Keys for Trial Design datasets (CO, TA, TE, TI, TS, TV) |
130
+
131
+ ### Example `config.yaml`
132
+
133
+ ```yaml
134
+ study_name: "MY_STUDY_001"
135
+ output_type: "csv"
136
+ output_directory: "duplicates"
137
+ data_directory: "/path/to/datasets"
138
+ define_path: "/path/to/define.xml"
139
+ excluded_ds: "DM SUPPDM"
140
+
141
+ event_key: "USUBJID AESEQ"
142
+ intervention_key: "USUBJID CMSEQ"
143
+ finding_key: "USUBJID VISITNUM SPEC SEQ"
144
+ finding_about_key: "USUBJID FAOBJ FATESTCD"
145
+ ds_key: "USUBJID EPOCH DSDECOD"
146
+ relrec_key: "STUDYID RDOMAIN USUBJID IDVAR IDVARVAL"
147
+
148
+ CO_key: "STUDYID DOMAIN USUBJID COSEQ"
149
+ TA_key: "STUDYID ARMCD EPOCH"
150
+ TE_key: "STUDYID ETCD"
151
+ TI_key: "STUDYID IETESTCD"
152
+ TS_key: "STUDYID TSPARMCD TSSEQ"
153
+ TV_key: "STUDYID VISITNUM"
154
+ ```
155
+
156
+ ### Configuration tips
157
+
158
+ - Separate variables with **spaces** in key configurations
159
+ - Use `"-"` for `define_path` to skip define.xml validation
160
+ - List excluded datasets separated by spaces: `"DM SUPPDM CO"`
161
+
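The conventions described above (space-separated variables, `"-"` to skip define.xml) could be consumed as in the following sketch. It is only an illustration using Ruby's standard `yaml` library; the helper `parse_config` and the returned structure are assumptions, not the gem's actual loader.

```ruby
require "yaml"

# Hypothetical helper (not part of hlsv): parses configuration text and turns
# the space-separated settings into arrays of names.
def parse_config(text)
  raw = YAML.safe_load(text) || {}
  {
    study_name:  raw["study_name"],
    skip_define: raw["define_path"] == "-",           # "-" means: skip define.xml checks
    excluded_ds: raw.fetch("excluded_ds", "").split,  # "DM SUPPDM" -> ["DM", "SUPPDM"]
    event_key:   raw.fetch("event_key", "").split     # "USUBJID AESEQ" -> ["USUBJID", "AESEQ"]
  }
end

# Sample text mirroring the example configuration; a real run would read config.yaml.
config_text = <<~YAML
  study_name: "MY_STUDY_001"
  define_path: "-"
  excluded_ds: "DM SUPPDM"
  event_key: "USUBJID AESEQ"
YAML

config = parse_config(config_text)
puts config[:event_key].inspect
puts "define.xml validation skipped" if config[:skip_define]
```

`String#split` without arguments splits on any run of whitespace, so extra spaces between variable names are harmless in this sketch.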
162
+ ---
163
+
164
+ ## 📊 Output examples
165
+
166
+ ```
167
+ Dataset: AE
168
+ Number of records: 12
169
+ Dataset type: General Observation, event dataset
170
+
171
+ ✓ ASCII Verification
172
+ No non-ASCII characters found
173
+
174
+ ✓ Valid define.xml Key
175
+ Key: STUDYID, USUBJID, AEDECOD, AESTDTC
176
+
177
+ ✓ Minimum Key Found
178
+ Key: USUBJID, AETERM
179
+
180
+ ---
181
+
182
+ Dataset: BE
183
+ Number of records: 3322
184
+ Dataset type: General Observation, event dataset
185
+
186
+ ✓ ASCII Verification
187
+ No non-ASCII characters found
188
+
189
+ ✓ Valid define.xml Key
190
+ Key: STUDYID, USUBJID, BEREFID, BETERM
191
+
192
+ ⚠ No Valid Key Found
193
+ Tested variables: USUBJID, BETERM, BESTDTC
194
+ Last key tested: USUBJID, BETERM, BESTDTC
195
+ File containing duplicated records: data_BE.csv
196
+ ```
197
+
198
+ Duplicate records are grouped by the last key tested. A group identifier appears in the
199
+ first column, with alternating row colors for visual clarity.
200
+
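The report above can be read as the outcome of a uniqueness test on candidate keys. The sketch below shows one way such a test might work. It assumes, purely for illustration, that candidate variables are tried as progressively longer prefixes; the gem's real search strategy is not documented here, and `duplicate_groups` and `minimum_key` are hypothetical names.

```ruby
# Hypothetical helper: groups rows by their values for the given key variables
# and keeps only the groups that contain more than one row.
def duplicate_groups(rows, key)
  rows.group_by { |row| row.values_at(*key) }
      .select { |_values, group| group.size > 1 }
end

# Hypothetical search: tries ever longer prefixes of the candidate variables
# and returns the first prefix that identifies every row uniquely, or nil.
def minimum_key(rows, candidates)
  (1..candidates.size).each do |length|
    key = candidates.first(length)
    return key if duplicate_groups(rows, key).empty?
  end
  nil
end

# Made-up records: the first two rows are identical on all tested variables.
rows = [
  { "USUBJID" => "001", "BETERM" => "COLLECTING", "BESTDTC" => "2024-01-05" },
  { "USUBJID" => "001", "BETERM" => "COLLECTING", "BESTDTC" => "2024-01-05" },
  { "USUBJID" => "002", "BETERM" => "COLLECTING", "BESTDTC" => "2024-01-06" }
]
candidates = %w[USUBJID BETERM BESTDTC]

if (key = minimum_key(rows, candidates))
  puts "Minimum key found: #{key.join(', ')}"
else
  puts "No valid key found; last key tested: #{candidates.join(', ')}"
  # Emit one line per duplicated record, with a group identifier in the first column.
  duplicate_groups(rows, candidates).each_value.with_index(1) do |group, id|
    group.each { |row| puts "#{id},#{row.values_at(*candidates).join(',')}" }
  end
end
```

Grouping on the last key tested is what allows a group identifier to be placed in the first column of the duplicates file.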
201
+ ---
202
+
203
+ ## 🏗️ Architecture
204
+
205
+ ```
206
+ hlsv/
207
+ ├── bin/
208
+ │   └── hlsv                    # Executable — starts the server
209
+ ├── lib/
210
+ │   ├── hlsv.rb                 # Entry point — loads all components
211
+ │   └── hlsv/
212
+ │       ├── version.rb          # Gem version
213
+ │       ├── web_app.rb          # Sinatra web application (routes, helpers)
214
+ │       ├── mon_script.rb       # Orchestration layer
215
+ │       ├── find_keys.rb        # Analysis engine
216
+ │       ├── html2word.rb        # HTML to DOCX converter
217
+ │       └── xpt.rb              # XPT file reader
218
+ ├── views/                      # ERB templates
219
+ │   ├── index.erb               # Main interface
220
+ │   ├── csv_view.erb            # CSV viewer
221
+ │   └── report_template.erb     # HTML report template
222
+ ├── public/                     # Static assets
223
+ │   ├── app.js
224
+ │   ├── styles.css
225
+ │   ├── styles_csv.css
226
+ │   └── logo.png
227
+ ├── hlsv.gemspec                # Gem specification
228
+ ├── Gemfile                     # Development dependencies
229
+ ├── config.default.yaml         # Default configuration template
230
+ ├── LICENSE                     # License file
231
+ └── README.md
232
+ ```
233
+
234
+ Results are written into a `hlsv_results/` directory created at runtime in the working directory
235
+ where `hlsv` is launched.
236
+
237
+ ---
238
+
239
+ ## 🔒 Security
240
+
241
+ - **Local-only access** — binds to `127.0.0.1` by default (localhost only)
242
+ - **No external connections** — all processing is local
243
+ - **Path traversal protection** — directory traversal attacks are prevented in all file routes
244
+ - **No data collection** — no analytics or tracking of any kind
245
+
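The path traversal protection mentioned above is commonly implemented by resolving the requested path and checking that it stays inside an allowed base directory. The sketch below shows that general technique only. `safe_path` is a hypothetical helper, and the gem's actual route code may differ.

```ruby
# Hypothetical helper (not part of hlsv): resolves a requested path against a
# base directory and returns nil when the result falls outside that directory.
def safe_path(base_dir, requested)
  base = File.expand_path(base_dir)
  full = File.expand_path(requested, base) # collapses "..", honours absolute paths
  inside = (full == base) || full.start_with?(base + File::SEPARATOR)
  inside ? full : nil
end

base = "/tmp/hlsv_results"
puts safe_path(base, "report.html").inspect      # path inside the base directory
puts safe_path(base, "../../etc/passwd").inspect # rejected, returns nil
```

Comparing against `base + File::SEPARATOR` rather than `base` alone avoids accepting sibling directories whose names merely begin with the base name, such as `/tmp/hlsv_results_old`.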
246
+ ---
247
+
248
+ ## 🐛 Troubleshooting
249
+
250
+ ### "Port already in use"
251
+
252
+ ```bash
253
+ # Find the process using port 4567
254
+ lsof -i :4567
255
+
256
+ # Kill it
257
+ kill -9 <PID>
258
+
259
+ # Or launch hlsv on a different port
260
+ hlsv --port 8080
261
+ ```
262
+
263
+ ### "Gem not found" or load errors
264
+
265
+ ```bash
266
+ # Reinstall the gem
267
+ gem uninstall hlsv
268
+ gem install hlsv
269
+ ```
270
+
271
+ ### Define.xml not found
272
+
273
+ - Use an absolute path, not a relative one
274
+ - Verify the file exists and is readable
275
+ - Use `"-"` to skip define.xml validation entirely
276
+
277
+ ### Debug mode
278
+
279
+ ```bash
280
+ RACK_ENV=development hlsv
281
+ ```
282
+
283
+ ---
284
+
285
+ ## 🤝 Contributing
286
+
287
+ Contributions are welcome!
288
+
289
+ 1. Fork the repository
290
+ 2. Create a feature branch (`git checkout -b feature/amazing-feature`)
291
+ 3. Commit your changes (`git commit -m 'Add amazing feature'`)
292
+ 4. Push to the branch (`git push origin feature/amazing-feature`)
293
+ 5. Open a Pull Request
294
+
295
+ ---
296
+
297
+ ## 📝 License
298
+
299
+ This project is licensed under the **GNU Affero General Public License v3.0 (AGPL-3.0)**.
300
+
301
+ You may use, modify, and redistribute the software under the terms of the AGPL-3.0.
302
+ Any modified version deployed over a network must also be made available under the same license.
303
+
304
+ For commercial licensing inquiries or proprietary integration, please contact:
305
+ 📩 contact@adclin.com
306
+
307
+ ---
308
+
309
+ ## 👥 Authors
310
+
311
+ - **AdClin Team** — [https://adclin.com](https://adclin.com)
312
+ - [**Marie Ober**](https://www.linkedin.com/in/marie-ober-50528048)
313
+
314
+ ---
315
+
316
+ ## 🙏 Acknowledgments
317
+
318
+ - Built with [Sinatra](http://sinatrarb.com/)
319
+ - Excel generation with [fast_excel](https://github.com/Paxa/fast_excel)
320
+ - Served by [Puma](https://puma.io/)
321
+
322
+ ---
323
+
324
+ ## 📞 Support
325
+
326
+ - 📧 Email: adclin@gmail.com
327
+ - 🐛 Issues: [GitHub Issues](https://github.com/adclin/hlsv/issues)
328
+
329
+ ## 💼 Professional Services
330
+
331
+ AdClin offers professional services related to this tool:
332
+
333
+ - Implementation of custom validation rules
334
+ - Integration into existing SDTM pipelines
335
+ - Deployment in secured environments
336
+ - Training sessions for data management teams
337
+
338
+ 📩 contact@adclin.com
339
+
340
+ ---
341
+
342
+ ## 🗓️ Changelog
343
+
344
+ ### Version 1.0.0 (2026-02-23)
345
+
346
+ - ✨ Initial release as a Ruby gem
347
+ - 🎨 Modern responsive web interface
348
+ - 📊 Excel export with README sheet
349
+ - 🔍 ASCII validation
350
+ - ✅ Define.xml key verification
351
+ - 🔑 Natural key discovery
352
+ - 📱 Mobile responsive design
353
+
354
+ ---
355
+
356
+ *Made with ❤️ by AdClin*
data/bin/hlsv ADDED
@@ -0,0 +1,4 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ require 'hlsv'
4
+ Hlsv::CLI.new.run
@@ -0,0 +1,19 @@
1
+ ---
2
+ study_name: ''
3
+ output_type: csv
4
+ output_directory: duplicates
5
+ data_directory: ''
6
+ define_path: ''
7
+ excluded_ds: ''
8
+ event_key: TERM STDTC
9
+ intervention_key: TRT STDTC
10
+ finding_key: VISITNUM TPTNUM TESTCD
11
+ finding_about_key: VISITNUM OBJ CAT TESTCD
12
+ ds_key: CAT SCAT STDTC
13
+ relrec_key: USUBJID RDOMAIN IDVARVAL IDVAR RELID
14
+ CO_key: USUBJID COREF VISITNUM IDVARVAL RDOMAIN IDVAR COVAL
15
+ TA_key: ARMCD TAETORD
16
+ TE_key: ETCD
17
+ TI_key: IETESTCD TIVERS
18
+ TS_key: TSPARMCD TSSEQ
19
+ TV_key: VISITNUM
data/lib/hlsv/cli.rb ADDED
@@ -0,0 +1,85 @@
1
+ module Hlsv
2
+
3
+   ##
4
+   # Command-line interface: parses options and starts the web server
5
+
6
+   class CLI
7
+
8
+     attr_reader :host
9
+     attr_reader :port
10
+
11
+     def initialize
12
+     end
13
+
14
+     def default_port = 4567
15
+     def default_host = '127.0.0.1'
16
+
17
+     def run
18
+       # p ARGV
19
+       # exit
20
+       parse_command_line
21
+       Hlsv.start_server(host: host, port: port)
22
+     end
23
+
24
+     def parse_command_line
25
+
26
+       @host = default_host
27
+       @port = default_port
28
+
29
+       while ARGV.first && ARGV.first[0] == '-'
30
+         option = ARGV.shift
31
+         case option
32
+         when '-h', '--help'
33
+           usage
34
+           exit 0
35
+         when '-v', '--version'
36
+           display_version
37
+           exit 0
38
+         when '--host'
39
+           host = ARGV.shift
40
+           if host
41
+             @host = host
42
+           else
43
+             warn "argument missing for #{option.inspect}"
44
+           end
45
+         when '--port'
46
+           port = ARGV.shift
47
+           if port
48
+             @port = port
49
+           else
50
+             warn "argument missing for #{option.inspect}"
51
+           end
52
+         when '--'
53
+           break
54
+         else
55
+           warn "invalid option #{option.inspect}, ignored"
56
+         end
57
+       end
58
+     end
59
+
60
+     def display_version
61
+       puts "hlsv #{Hlsv::VERSION}"
62
+     end
63
+
64
+     def usage
65
+       puts help_text
66
+     end
67
+
68
+     def help_text
69
+       <<~HELP.lines.map { |line| " #{line}" }.join
70
+
71
+         Usage: hlsv [options]
72
+
73
+         Starts a web server for SDTM validation.
74
+
75
+         Options
76
+         --host HOST listen at IP HOST (default #{default_host})
77
+         --port PORT listen on port PORT (default #{default_port})
78
+         -v, --version display the version
79
+         -h, --help display this text
80
+         HELP
81
+     end
82
+
83
+
84
+   end
85
+ end
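The hand-written loop in `cli.rb` keeps the `--port` value as a String and only warns when an argument is missing. For comparison, here is a hedged sketch of the same two options handled with Ruby's standard `OptionParser`. It is an alternative illustration, not code from the gem, and `parse_options` is a hypothetical name.

```ruby
require "optparse"

# Hypothetical alternative (not part of hlsv): returns a hash with :host and
# :port, coercing the port to an Integer and leaving argv untouched.
def parse_options(argv, default_host: "127.0.0.1", default_port: 4567)
  options = { host: default_host, port: default_port }
  parser = OptionParser.new do |opts|
    opts.banner = "Usage: hlsv [options]"
    opts.on("--host HOST", "listen at IP HOST") { |value| options[:host] = value }
    opts.on("--port PORT", Integer, "listen on port PORT") { |value| options[:port] = value }
  end
  parser.parse(argv) # non-destructive; raises OptionParser::ParseError on bad input
  options
end

puts parse_options(["--host", "0.0.0.0", "--port", "8080"]).inspect
```

With this approach a non-numeric port or a missing argument raises a subclass of `OptionParser::ParseError`, which the caller can rescue to print a message and exit with a non-zero status.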