conceptql 0.0.1

Files changed (111)
  1. checksums.yaml +7 -0
  2. data/.gitignore +22 -0
  3. data/CHANGELOG.md +17 -0
  4. data/Gemfile +4 -0
  5. data/Guardfile +28 -0
  6. data/LICENSE.txt +22 -0
  7. data/README.md +108 -0
  8. data/Rakefile +1 -0
  9. data/bin/conceptql +5 -0
  10. data/conceptql.gemspec +30 -0
  11. data/doc/ConceptQL Specification (alpha).pdf +0 -0
  12. data/doc/diagram_0.png +0 -0
  13. data/doc/spec.md +1208 -0
  14. data/lib/conceptql/behaviors/dottable.rb +71 -0
  15. data/lib/conceptql/cli.rb +135 -0
  16. data/lib/conceptql/date_adjuster.rb +45 -0
  17. data/lib/conceptql/graph.rb +49 -0
  18. data/lib/conceptql/graph_nodifier.rb +123 -0
  19. data/lib/conceptql/logger.rb +10 -0
  20. data/lib/conceptql/nodes/after.rb +12 -0
  21. data/lib/conceptql/nodes/before.rb +11 -0
  22. data/lib/conceptql/nodes/binary_operator_node.rb +41 -0
  23. data/lib/conceptql/nodes/casting_node.rb +75 -0
  24. data/lib/conceptql/nodes/complement.rb +16 -0
  25. data/lib/conceptql/nodes/concept.rb +38 -0
  26. data/lib/conceptql/nodes/condition_type.rb +63 -0
  27. data/lib/conceptql/nodes/cpt.rb +20 -0
  28. data/lib/conceptql/nodes/date_range.rb +39 -0
  29. data/lib/conceptql/nodes/death.rb +19 -0
  30. data/lib/conceptql/nodes/during.rb +16 -0
  31. data/lib/conceptql/nodes/except.rb +11 -0
  32. data/lib/conceptql/nodes/first.rb +24 -0
  33. data/lib/conceptql/nodes/from.rb +15 -0
  34. data/lib/conceptql/nodes/gender.rb +27 -0
  35. data/lib/conceptql/nodes/hcpcs.rb +20 -0
  36. data/lib/conceptql/nodes/icd10.rb +23 -0
  37. data/lib/conceptql/nodes/icd9.rb +23 -0
  38. data/lib/conceptql/nodes/icd9_procedure.rb +20 -0
  39. data/lib/conceptql/nodes/intersect.rb +29 -0
  40. data/lib/conceptql/nodes/last.rb +24 -0
  41. data/lib/conceptql/nodes/loinc.rb +20 -0
  42. data/lib/conceptql/nodes/node.rb +71 -0
  43. data/lib/conceptql/nodes/occurrence.rb +47 -0
  44. data/lib/conceptql/nodes/pass_thru.rb +11 -0
  45. data/lib/conceptql/nodes/person.rb +25 -0
  46. data/lib/conceptql/nodes/person_filter.rb +12 -0
  47. data/lib/conceptql/nodes/place_of_service_code.rb +23 -0
  48. data/lib/conceptql/nodes/procedure_occurrence.rb +21 -0
  49. data/lib/conceptql/nodes/race.rb +23 -0
  50. data/lib/conceptql/nodes/rxnorm.rb +20 -0
  51. data/lib/conceptql/nodes/snomed.rb +19 -0
  52. data/lib/conceptql/nodes/source_vocabulary_node.rb +54 -0
  53. data/lib/conceptql/nodes/standard_vocabulary_node.rb +43 -0
  54. data/lib/conceptql/nodes/started_by.rb +16 -0
  55. data/lib/conceptql/nodes/temporal_node.rb +25 -0
  56. data/lib/conceptql/nodes/time_window.rb +54 -0
  57. data/lib/conceptql/nodes/union.rb +15 -0
  58. data/lib/conceptql/nodes/visit.rb +11 -0
  59. data/lib/conceptql/nodes/visit_occurrence.rb +26 -0
  60. data/lib/conceptql/nodifier.rb +9 -0
  61. data/lib/conceptql/query.rb +39 -0
  62. data/lib/conceptql/tree.rb +36 -0
  63. data/lib/conceptql/version.rb +3 -0
  64. data/lib/conceptql/view_maker.rb +56 -0
  65. data/lib/conceptql.rb +7 -0
  66. data/spec/conceptql/behaviors/dottable_spec.rb +111 -0
  67. data/spec/conceptql/date_adjuster_spec.rb +68 -0
  68. data/spec/conceptql/nodes/after_spec.rb +18 -0
  69. data/spec/conceptql/nodes/before_spec.rb +18 -0
  70. data/spec/conceptql/nodes/casting_node_spec.rb +73 -0
  71. data/spec/conceptql/nodes/complement_spec.rb +15 -0
  72. data/spec/conceptql/nodes/concept_spec.rb +34 -0
  73. data/spec/conceptql/nodes/condition_type_spec.rb +113 -0
  74. data/spec/conceptql/nodes/cpt_spec.rb +31 -0
  75. data/spec/conceptql/nodes/date_range_spec.rb +35 -0
  76. data/spec/conceptql/nodes/death_spec.rb +12 -0
  77. data/spec/conceptql/nodes/during_spec.rb +32 -0
  78. data/spec/conceptql/nodes/except_spec.rb +18 -0
  79. data/spec/conceptql/nodes/first_spec.rb +37 -0
  80. data/spec/conceptql/nodes/from_spec.rb +15 -0
  81. data/spec/conceptql/nodes/gender_spec.rb +29 -0
  82. data/spec/conceptql/nodes/hcpcs_spec.rb +31 -0
  83. data/spec/conceptql/nodes/icd10_spec.rb +36 -0
  84. data/spec/conceptql/nodes/icd9_procedure_spec.rb +31 -0
  85. data/spec/conceptql/nodes/icd9_spec.rb +36 -0
  86. data/spec/conceptql/nodes/intersect_spec.rb +33 -0
  87. data/spec/conceptql/nodes/last_spec.rb +38 -0
  88. data/spec/conceptql/nodes/loinc_spec.rb +31 -0
  89. data/spec/conceptql/nodes/occurrence_spec.rb +89 -0
  90. data/spec/conceptql/nodes/person_filter_spec.rb +18 -0
  91. data/spec/conceptql/nodes/person_spec.rb +12 -0
  92. data/spec/conceptql/nodes/place_of_service_code_spec.rb +26 -0
  93. data/spec/conceptql/nodes/procedure_occurrence_spec.rb +12 -0
  94. data/spec/conceptql/nodes/query_double.rb +19 -0
  95. data/spec/conceptql/nodes/race_spec.rb +23 -0
  96. data/spec/conceptql/nodes/rxnorm_spec.rb +31 -0
  97. data/spec/conceptql/nodes/snomed_spec.rb +31 -0
  98. data/spec/conceptql/nodes/source_vocabulary_node_spec.rb +37 -0
  99. data/spec/conceptql/nodes/standard_vocabulary_node_spec.rb +40 -0
  100. data/spec/conceptql/nodes/started_by_spec.rb +25 -0
  101. data/spec/conceptql/nodes/temporal_node_spec.rb +57 -0
  102. data/spec/conceptql/nodes/time_window_spec.rb +66 -0
  103. data/spec/conceptql/nodes/union_spec.rb +25 -0
  104. data/spec/conceptql/nodes/visit_occurrence_spec.rb +12 -0
  105. data/spec/conceptql/query_spec.rb +20 -0
  106. data/spec/conceptql/tree_spec.rb +54 -0
  107. data/spec/doubles/stream_for_casting_double.rb +9 -0
  108. data/spec/doubles/stream_for_occurrence_double.rb +21 -0
  109. data/spec/doubles/stream_for_temporal_double.rb +6 -0
  110. data/spec/spec_helper.rb +74 -0
  111. metadata +327 -0
data/doc/spec.md ADDED
@@ -0,0 +1,1208 @@
1
+ # ConceptQL Specification
2
+
3
+ ConceptQL (pronounced concept-Q-L) is a high-level language that allows researchers to unambiguously define their research algorithms.
4
+
5
+ ## Motivation for ConceptQL
6
+ Outcomes Insights intends to build a vast library of research algorithms and apply those algorithms to large databases of claims data. Early into building the library, we realized we had to overcome two major issues:
7
+
8
+ 1. Methods sections of research papers commonly use natural language to specify the criteria used to build cohorts from a claims database.
9
+ - Algorithms defined in natural language are often imprecise, open to multiple interpretations, and generally difficult to reproduce.
10
+ - Researchers could benefit from a language that removes the ambiguity of natural language while increasing the reproducibility of their research algorithms.
11
+ 2. Querying against claims databases is often difficult.
12
+ - Hand-coding algorithms to extract cohorts from datasets is time-consuming, error-prone, and opaque.
13
+ - Researchers could benefit from a language that allows algorithms to be defined at a high level and then translated into the appropriate queries against a database.
14
+
15
+ We developed ConceptQL to address these two issues.
16
+
17
+ We are writing a tool that can read research algorithms defined in ConceptQL. The tool can create a diagram for the algorithm which makes it easy to visualize and understand. The tool can also translate the algorithm into a SQL query which runs against data structured in [OMOP's Common Data Model (CDM)](http://omop.org/CDM). The purpose of the CDM is to standardize the format and content of observational data, so standardized applications, tools and methods can be applied to them.
18
+
19
+ For instance, using ConceptQL we can take a statement that looks like this:
20
+ ```YAML
21
+ :icd9: '412'
22
+ ```
23
+
24
+ And generate a diagram that looks like this:
25
+ ```ConceptQL
26
+ { icd9: '412' }
27
+ ```
28
+
29
+ And generate SQL that looks like this:
30
+ ```SQL
31
+ SELECT *
32
+ FROM cdm_data.condition_occurrence AS co
33
+ JOIN vocabulary.source_to_concept_map AS scm ON (co.condition_concept_id = scm.target_concept_id)
34
+ WHERE scm.source_code IN ('412')
35
+ AND scm.source_vocabulary_id = 2
36
+ AND scm.source_code = co.condition_source_value
37
+ ```
38
+
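As a rough illustration of the translation step, here is a minimal Ruby sketch that assembles a query of the shape shown above from a list of ICD-9 codes. The helper name `icd9_sql` is hypothetical and is not taken from the gem; the quoting is deliberately naive and only meant to show the idea.

```ruby
# Hypothetical helper (not the gem's API): builds a query of the shape shown
# above for a list of ICD-9 source codes.
def icd9_sql(codes)
  # Naive quoting for illustration only: double any embedded single quotes.
  quoted = codes.map { |code| "'" + code.gsub("'", "''") + "'" }.join(', ')
  <<~SQL
    SELECT *
    FROM cdm_data.condition_occurrence AS co
    JOIN vocabulary.source_to_concept_map AS scm ON (co.condition_concept_id = scm.target_concept_id)
    WHERE scm.source_code IN (#{quoted})
    AND scm.source_vocabulary_id = 2
    AND scm.source_code = co.condition_source_value
  SQL
end

puts icd9_sql(['412'])
```

A real translator would more likely use a query builder with bound parameters than string interpolation.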
39
+ As stated above, one of the goals of ConceptQL is to make it easy to assemble fairly complex queries without having to roll up our sleeves and write raw SQL. To accommodate this complexity, ConceptQL itself has some complexities of its own. That said, we believe ConceptQL will help researchers define, hone, and share their research algorithms.
40
+
41
+
42
+ ## ConceptQL Overview
43
+ ### What ConceptQL Looks Like
44
+
45
+ I find seeing examples to be the quickest way to get a sense of a language. Here is a trivial example to whet your appetite. The example is in YAML, but could just as easily be in JSON or any other markup language capable of representing nested sets of heterogeneous arrays and hashes. In fact, the ConceptQL "language" is just a set of nested hashes and arrays representing search criteria and some set operations and temporal operations to glue those criteria together.
46
+
47
+ ```YAML
48
+ # Example 1: A simple example in YAML
49
+ # This is just a simple hash with a key of :icd9 and a value of 412
50
+ # This example will search the condition_occurrence table for all conditions that match the ICD-9 concept of 412.
51
+ ---
52
+ :icd9: '412'
53
+ ```
54
+
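To make the "nested hashes and arrays" point concrete, the following sketch loads the same statement from YAML and from JSON using Ruby's standard library and shows that both yield the same hash. The helper `load_statement` is a hypothetical name used only for this illustration.

```ruby
require 'yaml'
require 'json'

# Hypothetical helper: turns a YAML or JSON document into nested Ruby
# hashes and arrays with symbol keys.
def load_statement(text, format)
  case format
  # Symbol keys must be explicitly permitted when loading YAML safely.
  when :yaml then YAML.safe_load(text, permitted_classes: [Symbol])
  when :json then JSON.parse(text, symbolize_names: true)
  else raise ArgumentError, "unsupported format: #{format.inspect}"
  end
end

from_yaml = load_statement("---\n:icd9: '412'\n", :yaml)
from_json = load_statement('{ "icd9": "412" }', :json)
puts from_yaml == from_json
```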
55
+ ### ConceptQL Diagrams
56
+ Reading ConceptQL in YAML or JSON seems hard to me. I prefer to explore ConceptQL using directed graphs. For instance, the diagram for the simple example listed in YAML above is:
57
+ ```ConceptQL
58
+ # All Conditions Matching MI
59
+ { icd9: '412' }
60
+ ```
61
+
62
+
63
+ Each oval depicts a "node", or rather, a ConceptQL expression. An arrow between a pair of nodes indicates that the results from the node on the tail of the arrow pass on to the node at the head of the arrow. A simple example should help here:
64
+ ```ConceptQL
65
+ # First Office Visit Per Patient
66
+ {
67
+ first: {
68
+ cpt: '99214'
69
+ }
70
+ }
71
+ ```
72
+
73
+
74
+ The diagram above reads "get all procedures that match the CPT 99214 (Office Visit) and then filter them down to the first occurrence for each person". The diagram is much more terse than that and to accurately read the diagram, you need a lot of implicit knowledge about how each node operates. Fortunately, this document will (hopefully) impart that knowledge to you.
75
+
76
+ Please note that all of my diagrams end with an arrow pointing at nothing. You'll see why soon.
77
+
78
+
79
+ ### Think of Results as a Stream
80
+ I draw my ConceptQL diagrams with leaf nodes at the top and the "trunk" nodes at the bottom. I like to think of the results of a ConceptQL statement as a flowing stream of data. The leaf nodes, or nodes that gather results out of the database, act like tributaries. The results flow downwards and either join with other results, or filter out other results until the streams emerge at the bottom of the diagram. Think of each arrow as a stream of results, flowing down through one node to the next.
81
+
82
+ The trailing arrow in the diagrams serves as a reminder that ConceptQL yields a stream of results.
83
+
84
+
85
+ ### Streams have Types
86
+ You might have noticed that the nodes and edges in the diagrams often have a color. That color represents what "type" of stream the node or edge represents. There are many types in ConceptQL, and you'll notice they are __strongly__ correlated with the tables found in [CDM v4.0](http://omop.org/CDM):
87
+
88
+ - condition_occurrence
89
+ - red
90
+ - death
91
+ - brown
92
+ - drug_cost
93
+ - TBD
94
+ - drug_exposure
95
+ - purple
96
+ - observation
97
+ - TBD
98
+ - payer_plan_period
99
+ - TBD
100
+ - person
101
+ - blue
102
+ - procedure_cost
103
+ - gold
104
+ - procedure_occurrence
105
+ - green
106
+ - visit_occurrence
107
+ - orange
108
+
109
+ Each stream has a point of origin (essentially, the table from which we pulled the results for a stream). Based on that origin, each stream will have a particular type. The stream carries this type information as it moves through each node. When certain nodes, particularly set and temporal operation nodes, need to perform filtering, they can use this type information to determine how to best filter a stream. There will be much more discussion about types woven throughout this document. For now, it is sufficient to know that each stream has a type.
110
+
111
+ You'll also notice that the trailing arrow(s) at the end of the diagrams indicate which types of streams are ultimately passed on at the end of a ConceptQL statement.
112
+
113
+
114
+ ### What *are* Streams Really?
115
+ Though I think that a "stream" is a helpful abstraction when thinking in ConceptQL, on a few occasions we need to know what's going on under the hood.
116
+
117
+ Every table in the CDM structure has a surrogate key column (an ID column). When we execute a ConceptQL statement, the "streams" that are generated by the statement are just sets of these IDs for rows that matched the ConceptQL criteria. So each stream is just a set of IDs that point back to some rows in one of the CDM tables. When a stream has a "type" it is really just that the stream contains IDs associated with its table of origin.
118
+
119
+ So when we execute this ConceptQL statement, the resulting "stream" is all the person IDs for all male patients in the database:
120
+ ```ConceptQL
121
+ # All Male Patients
122
+ { gender: 'Male' }
123
+ ```
124
+
125
+ When we execute this ConceptQL statement, the resulting "stream" is all condition_occurrence IDs that match ICD-9 799.22:
126
+ ```ConceptQL
127
+ # All Condition Occurrences that match ICD-9 799.22
128
+ { icd9: '799.22' }
129
+ ```
130
+
131
+ Generally, I find it helpful to just think of those queries generating a "stream of people" or a "stream of conditions" and not worry about the table of origin or the fact that they are just IDs.
132
+
133
+ When a ConceptQL statement is executed, it yields a final set of streams that are just all the IDs that passed through all the criteria. What is done with that set of IDs is up to the user who assembled the ConceptQL statement. If a user gathers all 799.22 Conditions, they will end up with a set of condition_occurrence_ids. They could take those IDs and do all sorts of things like:
134
+
135
+ - Gather the first and last date of occurrence per person
136
+ - Count the number of occurrences per person
137
+ - Count number of persons with the condition
138
+ - Count the total number of occurrences for the entire population
139
+
140
+ This kind of aggregation and analysis is beyond the scope of ConceptQL. ConceptQL will get you the IDs of the rows you're interested in; it's up to other parts of the calling system to determine what you do with them.
141
+
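The sketch below models a stream as a type plus a set of row IDs and then performs two of the aggregations listed above outside of ConceptQL. The `Row` struct and the sample rows are made up for illustration; they stand in for the condition_occurrence table.

```ruby
require 'set'
require 'date'

# Hypothetical rows standing in for the condition_occurrence table.
Row = Struct.new(:id, :person_id, :start_date)
CONDITIONS = [
  Row.new(10, 1, Date.new(2010, 1, 5)),
  Row.new(11, 1, Date.new(2010, 6, 9)),
  Row.new(12, 2, Date.new(2011, 3, 1))
].freeze

# A "stream": the table of origin (its type) and the IDs that matched.
STREAM = { condition_occurrence: Set[10, 11, 12] }.freeze

# Look the IDs back up and group the matching rows by person.
def rows_per_person(stream)
  ids = stream.fetch(:condition_occurrence)
  CONDITIONS.select { |row| ids.include?(row.id) }.group_by(&:person_id)
end

# Count the number of occurrences per person.
def occurrences_per_person(stream)
  rows_per_person(stream).transform_values(&:size)
end

# Gather the first and last date of occurrence per person.
def first_and_last_dates(stream)
  rows_per_person(stream).transform_values { |rows| rows.map(&:start_date).minmax }
end
```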
142
+ ## Criterion Nodes
143
+
144
+ Criterion nodes are the parts of a ConceptQL query that search for specific values within the CDM data, e.g. searching the condition_occurrence table for a diagnosis of an old myocardial infarction (ICD-9 412) is a criterion. Criterion nodes are always leaf nodes.
145
+
146
+ There are _many_ criterion nodes. A list of currently implemented nodes is available in Appendix A.
147
+
148
+ ## All Other Nodes
149
+
150
+ Virtually all other nodes add, remove, filter, or otherwise alter streams of results. They are discussed in this section.
151
+
152
+ ## Set Operation Nodes
153
+ Because streams represent sets of results, it makes sense to include nodes that operate on sets.
154
+
155
+ ### Union
156
+ - Takes any number of child nodes and aggregates their streams
157
+ - Unions together streams with identical types
158
+ - Think of streams with the same type flowing together into a single stream
159
+ - We're really just gathering the union of all IDs for identically-typed streams
160
+ - Streams with different types flow along together concurrently without interacting
161
+ - It does not make sense to union, say, condition_occurrence_ids with visit_occurrence_ids, so streams with different types won't mingle together, but will continue to flow downstream in parallel
162
+ ```ConceptQL
163
+ # Two streams of the same type (condition_occurrence) joined into a single stream
164
+ {
165
+ union: [
166
+ { icd9: '412' },
167
+ { icd9: '799.22' }
168
+ ]
169
+ }
170
+ ```
171
+ ```ConceptQL
172
+ # Two streams of the same type (condition_occurrence) joined into a single stream, then a different stream (visit_occurrence) flows concurrently
173
+ {
174
+ union: [
175
+ {union: [
176
+ { icd9: '412' },
177
+ { icd9: '799.22' }
178
+ ]},
179
+ { place_of_service: 'Inpatient' }
180
+ ]
181
+ }
182
+ ```
183
+ ```ConceptQL
184
+ # Two streams of the same type (condition_occurrence) joined into a single stream, with a different stream (visit_occurrence) flowing concurrently (same as above example)
185
+ {
186
+ union: [
187
+ { icd9: '412' },
188
+ { icd9: '799.22' },
189
+ { place_of_service: 'Inpatient' }
190
+ ]
191
+ }
192
+ ```
193
+
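The union behavior described above can be sketched in a few lines of Ruby. This is an illustration only: it assumes a set of streams is represented as a Hash mapping a type to a Set of row IDs, which is a simplification and not necessarily how the gem represents streams.

```ruby
require 'set'

# Assumed model: a set of streams is a Hash of type => Set of row IDs.
# Streams with the same type are merged; other types pass through side by side.
def union(*stream_sets)
  stream_sets.each_with_object({}) do |streams, merged|
    streams.each do |type, ids|
      merged[type] = (merged[type] || Set.new) | ids
    end
  end
end

mi           = { condition_occurrence: Set[1, 2] }
irritability = { condition_occurrence: Set[2, 3] }
inpatient    = { visit_occurrence: Set[7] }

p union(mi, irritability, inpatient)
```

Under this model the nested and the flat forms shown in the last two examples above produce the same streams.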
194
+ ### Intersect
195
+ 1. Group incoming streams by type
196
+ 2. For each group of same-type streams
197
+ 1. Intersect all streams, yielding a single stream that contains only those IDs common to those streams
198
+ 3. A single stream for each incoming type is sent downstream
199
+ 1. If only a single stream of a type is upstream, that stream is essentially unaltered as it is passed downstream
200
+ ```ConceptQL
201
+ # Yields a single stream of all Conditions where MI was Primary Diagnosis. This involves two Condition streams and so results are intersected
202
+ {
203
+ intersect: [
204
+ { icd9: '412' },
205
+ { primary_diagnosis: true }
206
+ ]
207
+ }
208
+ ```
209
+ ```ConceptQL
210
+ # Yields two streams: a stream of all MI Conditions and a stream of all Male patients. This is essentially the same behavior as Union in this case
211
+ {
212
+ intersect: [
213
+ { icd9: '412' },
214
+ { gender: 'Male' }
215
+ ]
216
+ }
217
+ ```
218
+ ```ConceptQL
219
+ # Yields two streams: a stream of all Conditions where MI was Primary Diagnosis and a stream of all White, Male patients.
220
+ {
221
+ intersect: [
222
+ { icd9: '412' },
223
+ { primary_diagnosis: true },
224
+ { gender: 'Male' },
225
+ { race: 'White' }
226
+ ]
227
+ }
228
+ ```
229
+
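The three numbered steps above can be sketched as follows, again assuming (for illustration only) that a set of streams is a Hash mapping a type to a Set of row IDs.

```ruby
require 'set'

def intersect(*stream_sets)
  # Step 1: group incoming streams by type.
  grouped = Hash.new { |hash, type| hash[type] = [] }
  stream_sets.each do |streams|
    streams.each { |type, ids| grouped[type] << ids }
  end
  # Steps 2 and 3: intersect each group, yielding one stream per type.
  # A type with a single upstream stream passes through unaltered.
  grouped.transform_values { |id_sets| id_sets.reduce(:&) }
end

mi      = { condition_occurrence: Set[1, 2, 3] }
primary = { condition_occurrence: Set[2, 3, 4] }
male    = { person: Set[100, 101] }

p intersect(mi, primary, male)
```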
230
+ ### Complement
231
+ This node will take the complement of each set of IDs in the incoming streams.
232
+ ```ConceptQL
233
+ # All non-MI Conditions
234
+ {
235
+ complement: { icd9: '412' }
236
+ }
237
+ ```
238
+
239
+ If you're familiar with set operations, the complement of a union is the intersect of the complements of the items unioned. So in our world, these next two examples are identical:
240
+ ```ConceptQL
241
+ # All Conditions that are neither an MI nor a Primary Diagnosis
242
+ {
243
+ complement: {
244
+ union: [
245
+ { icd9: '412' },
246
+ { primary_diagnosis: true }
247
+ ]
248
+ }
249
+ }
250
+ ```
251
+ ```ConceptQL
252
+ # All Conditions that are neither an MI nor a Primary Diagnosis (same as above)
253
+ {
254
+ intersect: [
255
+ { complement: { icd9: '412' } },
256
+ { complement: { primary_diagnosis: true } }
257
+ ]
258
+ }
259
+ ```
260
+
261
+ But please be aware that this behavior of complement only affects streams of the same type. If more than one stream is involved, you need to evaluate the effects of complement on a stream-by-stream basis:
262
+ ```ConceptQL
263
+ # Yields two streams: a stream of all Conditions that are neither an MI nor a Primary Diagnosis and a stream of all non-office visit Procedures
264
+ {
265
+ complement: {
266
+ union: [
267
+ { icd9: '412' },
268
+ { primary_diagnosis: true },
269
+ { cpt: '99214' }
270
+ ]
271
+ }
272
+ }
273
+ ```
274
+ ```ConceptQL
275
+ # Yields two streams: a stream of all Conditions that are neither an MI nor a Primary Diagnosis and a stream of all non-office visit Procedures (same as above)
276
+ {
277
+ intersect: [
278
+ { complement: { icd9: '412' } },
279
+ { complement: { primary_diagnosis: true } },
280
+ { complement: { cpt: '99214' } }
281
+ ]
282
+ }
283
+ ```
284
+ ```ConceptQL
285
+ # Yields two streams: a stream of all Conditions that are neither an MI nor a Primary Diagnosis and a stream of all non-office visit Procedures (same as above)
286
+ {
287
+ union: [
288
+ {
289
+ intersect: [
290
+ { complement: { icd9: '412' } },
291
+ { complement: { primary_diagnosis: true } }
292
+ ]
293
+ },
294
+ { complement: { cpt: '99214' } }
295
+ ]
296
+ }
297
+ ```
298
+
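The equivalence between "complement of a union" and "intersect of the complements", evaluated stream by stream, can be checked with a small sketch. It assumes streams are a Hash of type to Set of row IDs and that the complement is taken against every row ID in the table of origin; the `UNIVERSE` constant and the helper names are hypothetical.

```ruby
require 'set'

# Hypothetical universe: every row ID in each table of origin.
UNIVERSE = {
  condition_occurrence: Set[1, 2, 3, 4, 5],
  procedure_occurrence: Set[20, 21]
}.freeze

# Complement each stream against the full set of IDs for its type.
def complement(streams)
  streams.map { |type, ids| [type, UNIVERSE.fetch(type) - ids] }.to_h
end

# Combine like-typed streams with :| (union) or :& (intersect).
def combine(stream_sets, operator)
  stream_sets.flat_map(&:to_a)
             .group_by(&:first)
             .transform_values { |pairs| pairs.map(&:last).reduce(operator) }
end

mi      = { condition_occurrence: Set[1, 2] }
primary = { condition_occurrence: Set[2, 3] }
office  = { procedure_occurrence: Set[20] }

left  = complement(combine([mi, primary, office], :|))
right = combine([complement(mi), complement(primary), complement(office)], :&)
puts left == right
```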
299
+
300
+ ### Except
301
+ This node takes two sets of incoming streams, a left-hand stream and a right-hand stream. The node matches like-type streams between the left-hand and right-hand streams. The node removes any results in the left-hand stream if they appear in the right-hand stream. The node passes only results for the left-hand stream downstream. The node discards all results in the right-hand stream. For example:
302
+ ```ConceptQL
303
+ # All Conditions that are MI unless they are primary diagnoses
304
+ {
305
+ except: {
306
+ left: { icd9: '412' },
307
+ right: { primary_diagnosis: true }
308
+ }
309
+ }
310
+ ```
311
+ ```ConceptQL
312
+ # All Conditions that are MI unless they are primary diagnoses (same as above)
313
+ {
314
+ intersect: [
315
+ { icd9: '412' },
316
+ { complement: { primary_diagnosis: true } }
317
+ ]
318
+ }
319
+ ```
320
+
321
+ If the left-hand stream has no types that match the right-hand stream, the left-hand stream passes through unaffected:
322
+ ```ConceptQL
323
+ # All Conditions that are MI
324
+ {
325
+ except: {
326
+ left: { icd9: '412' },
327
+ right: { cpt: '99214' }
328
+ }
329
+ }
330
+ ```
331
+
332
+ And just to show how multiple streams behave:
333
+ ```ConceptQL
334
+ # Passes three streams downstream: a stream of Conditions that are MI but not primary diagnosis, a stream of People that are Male but not White, and a stream of Procedures that are office visits (this stream is completely unaffected by the right hand stream)
335
+ {
336
+ except: {
337
+ left: {
338
+ union: [
339
+ { icd9: '412' },
340
+ { gender: 'Male' },
341
+ { cpt: '99214' }
342
+ ]
343
+ },
344
+ right: {
345
+ union: [
346
+ { primary_diagnosis: true },
347
+ { race: 'White' },
348
+ ]
349
+ }
350
+ }
351
+ }
352
+ ```
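A minimal sketch of the except behavior, under the same assumption that a set of streams is a Hash of type to Set of row IDs (an illustrative model, not the gem's internals):

```ruby
require 'set'

# Remove right-hand IDs from like-typed left-hand streams. Left-hand types
# with no right-hand match pass through; right-hand streams are discarded.
def except(left, right)
  left.map { |type, ids| [type, ids - right.fetch(type, Set.new)] }.to_h
end

left = {
  condition_occurrence: Set[1, 2],
  person: Set[100, 101],
  procedure_occurrence: Set[20]
}
right = {
  condition_occurrence: Set[2, 3],
  person: Set[101]
}

p except(left, right)
```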
353
+ ### Discussion about Set Operation Nodes
354
+ #### Union Nodes
355
+ *Q. Why should we allow two different types of streams to continue downstream concurrently?*
356
+
357
+ - This feature lets us do interesting things, like find the first occurrence of either an MI or Death as in the example below
358
+ - Throw in a few more criteria and you could find the first occurrence of all censor events for each patient
359
+
360
+ ```ConceptQL
361
+ # First occurrence of either MI or Death for each patient
362
+ {
363
+ first: {
364
+ union: [
365
+ { icd9: '412' },
366
+ { death: true }
367
+ ]
368
+ }
369
+ }
370
+ ```
371
+
372
+
373
+ Q. Why aren't all streams passed forward unaltered? Why union like-typed streams?
374
+
375
+ - The way Intersect works, if we passed like-typed streams forward without unioning them, Intersect would end up intersecting the two un-unioned like-type streams and that's not what we intended
376
+ - Essentially, these two diagrams would be identical:
377
+
378
+ ```ConceptQL
379
+ # Two streams: a stream of all Conditions matching either 412 or 799.22 and a stream of Procedures matching 99214
380
+ {
381
+ intersect: [
382
+ {
383
+ union: [
384
+ { icd9: '412' },
385
+ { icd9: '799.22' },
386
+ ]
387
+ },
388
+ { cpt: '99214' }
389
+ ]
390
+ }
391
+ ```
392
+
393
+
394
+ ```ConceptQL
395
+ # Two streams: a stream of all Conditions matching both 412 AND 799.22 (an empty stream, a condition cannot be both 412 and 799.22 at the same time) and a stream of Procedures matching 99214
396
+ {
397
+ intersect: [
398
+ {
399
+ intersect: [
400
+ { icd9: '412' },
401
+ { icd9: '799.22' },
402
+ ]
403
+ },
404
+ { cpt: '99214' }
405
+ ]
406
+ }
407
+ ```
408
+
409
+ ## Time-oriented Nodes
410
+ All results in a stream carry a start_date and end_date with them. All temporal comparisons of streams use these two date columns. Each result in a stream derives its start and end date from its corresponding row in its table of origin.
411
+
412
+ For instance, a visit_occurrence result derives its start_date from visit_start_date and its end_date from visit_end_date.
413
+
414
+ If a result comes from a table that only has a single date value, the result derives both its start_date and end_date from that single date, e.g. an observation result derives both its start_date and end_date from its corresponding row's observation_date.
415
+
416
+ The person stream is a special case. Person results use the person's date of birth as the start_date and end_date. This may sound strange, but we will explain below why this makes sense.
417
+
418
+
419
+ ### Relative Temporal Nodes
420
+ When looking at a set of results for a person, perhaps we want to select just the chronologically first or last result. Or maybe we want to select the 2nd result or 2nd to last result. Relative temporal nodes provide this type of filtering. Relative temporal nodes use a result's start_date to do chronological ordering.
421
+
422
+ #### occurrence
423
+ - Takes two arguments: the stream to select from and an integer argument
424
+ - For the integer argument
425
+ - Positive numbers mean 1st, 2nd, 3rd occurrence in chronological order
426
+ - e.g. 1 => first
427
+ - e.g. 4 => fourth
428
+ - Negative numbers mean 1st, 2nd, 3rd occurrence in reverse chronological order
429
+ - e.g. -1 => last
430
+ - e.g. -4 => fourth from last
431
+ - 0 is undefined?
432
+
433
+ ```ConceptQL
434
+ # For each patient, select the Condition that represents the third occurrence of an MI
435
+ {
436
+ occurrence: [
437
+ { icd9: '412' },
438
+ 3
439
+ ]
440
+ }
441
+ ```
442
+
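The occurrence rules above can be sketched as a per-person selection ordered by start_date. The `Result` struct is a made-up stand-in for a row in a stream, and raising an error for 0 is an assumption of this sketch, since the specification leaves that case open.

```ruby
require 'date'

# Hypothetical stand-in for one result in a stream.
Result = Struct.new(:person_id, :id, :start_date)

# Select the nth result per person in chronological order (n > 0) or in
# reverse chronological order (n < 0). People with too few results drop out.
def occurrence(results, n)
  # Assumption: the spec leaves 0 undefined, so this sketch rejects it.
  raise ArgumentError, '0 is undefined' if n.zero?

  results.group_by(&:person_id).flat_map do |_person_id, rows|
    ordered = rows.sort_by(&:start_date)
    picked = n.positive? ? ordered[n - 1] : ordered[n]
    picked ? [picked] : []
  end
end

# first and last are shorthand for occurrence 1 and -1.
def first(results)
  occurrence(results, 1)
end

def last(results)
  occurrence(results, -1)
end
```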
443
+
444
+ #### first
445
+ - Node that is shorthand for writing "occurrence: 1"
446
+
447
+ ```ConceptQL
448
+ # For each patient, select the Condition that represents the first occurrence of an MI
449
+ {
450
+ first: { icd9: '412' }
451
+ }
452
+ ```
453
+
454
+
455
+ #### last
456
+ - Node that is just shorthand for writing "occurrence: -1"
457
+
458
+ ```ConceptQL
459
+ # For each patient, select the Condition that represents the last occurrence of an MI
460
+ {
461
+ last: { icd9: '412' }
462
+ }
463
+ ```
464
+
465
+
466
+ ### Date Literals
467
+ For situations where we need to represent pre-defined date ranges, we can use "date literal" nodes.
468
+
469
+ #### date_range
470
+ - Takes a hash with two elements: { start: \<date-format\>, end: \<date-format\> }
471
+ - Creates an inclusive, continuous range of dates defined by a start and end date
472
+
473
+
474
+ #### day
475
+ - Takes a single argument: \<date-format\>
476
+ - Represents a single day
477
+ - Shorthand for creating a date range that starts and ends on the same date
478
+ - *Not yet implemented*
479
+
480
+
481
+ #### What is \<date-format\>?
482
+ Dates follow these formats:
483
+
484
+ - "YYYY-MM-DD"
485
+ - Four-digit year, two-digit month with leading 0s, two-digit day with leading 0s
486
+ - "START"
487
+ - Represents the first date of information available from the data source
488
+ - "END"
489
+ - Represents the last date of information available from the data source.
490
+
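As an illustration of the date formats above, the sketch below resolves a date-format value to a Ruby `Date` and builds an inclusive range from a date_range hash. The bounds `DATA_START` and `DATA_END` are invented placeholders; a real system would look them up from the data source.

```ruby
require 'date'

# Hypothetical bounds of the data source, invented for this example.
DATA_START = Date.new(2000, 1, 1)
DATA_END   = Date.new(2012, 12, 31)

# Resolve "YYYY-MM-DD", "START", or "END" to a Date.
def resolve_date(value)
  case value
  when 'START' then DATA_START
  when 'END' then DATA_END
  when /\A\d{4}-\d{2}-\d{2}\z/ then Date.strptime(value, '%Y-%m-%d')
  else raise ArgumentError, "unrecognized date format: #{value.inspect}"
  end
end

# Build an inclusive, continuous range from { start: ..., end: ... }.
def date_range(range)
  resolve_date(range[:start])..resolve_date(range[:end])
end

year_2010 = date_range(start: '2010-01-01', end: '2010-12-31')
puts year_2010.cover?(Date.new(2010, 12, 31))
```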
491
+
492
+ ### Temporal Comparison Nodes
493
+ As described above, each result carries a start and end date, defining its own date range. It is through these date ranges that we are able to do temporal filtering of streams via temporal nodes.
494
+
495
+ Temporal nodes work by comparing a left-hand stream (L) against a right-hand stream (R). R can be either a set of streams or a pre-defined date range. Each temporal node has a comparison operator which defines how it compares dates between L and R. A temporal node passes results only from L downstream. A temporal node discards all results in the R stream after it makes all comparisons.
496
+
497
+ The available set of temporal nodes comes from the work of Allen's Interval Algebra[^AIA]. Interval Algebra defines 13 distinct temporal relationships, as shown in this handy chart [borrowed from this website](http://people.kmi.open.ac.uk/carlos/174): ![](http://people.kmi.open.ac.uk/carlos/wp-content/uploads/2011/02/Allens-Algebra.png)
498
+
499
+ Our implementation of this algebra is initially going to be as strict as listed here, meaning that:
500
+
501
+ - Before/After
502
+ - There must be a minimum 1-day gap between date ranges
503
+ - Meets/Met-by
504
+ - Only if the first date range starts/ends a day before the next date range ends/starts
505
+ - Started-by/Starts
506
+ - The start dates of the two ranges must be equal and the end dates must not be
507
+ - Finished-by/Finishes
508
+ - The end dates of the two ranges must be equal and the start dates must not be
509
+ - Contains/During
510
+ - The start/end dates of the two ranges must be different from each other
511
+ - Overlaps/Overlapped-by
512
+ - The start date of one range and the end date of the other range must be outside the overlapping range
513
+ - Temporally coincides
514
+ - Start dates must be equal, end dates must be equal
515
+
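Read strictly, the definitions above partition every pair of date ranges into exactly one of the 13 relationships. The following sketch classifies a pair under that reading. It assumes day-granularity, inclusive ranges, and it interprets "meets" as the first range ending exactly one day before the second starts; the `Span` struct and `allen_relation` name are hypothetical.

```ruby
require 'date'

# Hypothetical stand-in for a result's date range (inclusive, whole days).
Span = Struct.new(:start_date, :end_date)

# Returns the relationship of a to b under the strict definitions.
def allen_relation(a, b)
  a_start, a_end = a.start_date, a.end_date
  b_start, b_end = b.start_date, b.end_date

  return :coincides if a_start == b_start && a_end == b_end
  return (a_end < b_end ? :starts : :started_by) if a_start == b_start
  return (a_start > b_start ? :finishes : :finished_by) if a_end == b_end
  # Disjoint ranges: adjacent days "meet", anything further apart is before/after.
  return (a_end + 1 == b_start ? :meets : :before) if a_end < b_start
  return (b_end + 1 == a_start ? :met_by : :after) if b_end < a_start

  # Remaining cases share at least one day and have distinct endpoints.
  if a_start < b_start
    a_end < b_end ? :overlaps : :contains
  else
    a_end < b_end ? :during : :overlapped_by
  end
end

mi        = Span.new(Date.new(2010, 3, 1), Date.new(2010, 3, 5))
year_2010 = Span.new(Date.new(2010, 1, 1), Date.new(2010, 12, 31))
p allen_relation(mi, year_2010)
```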
516
+ Ryan's Sidebar on These Definitions:
517
+ > These strict definitions may not be particularly handy or even intuitive. It seems like contains, starts, finishes, and coincides are all examples of overlapping ranges. Starts/finishes seem to be examples of one range containing another. Meets/met-by seem to be special cases of before/after. But these definitions, if used in their strict sense, are all mutually exclusive.
518
+
519
+ > We may want to adopt a less strict set of definitions, though their meaning may not be as easily defined as the one provided by Allen's Interval Algebra
520
+
521
+ When comparing results in L against a date range, results in L continue downstream only if they pass the comparison.
522
+ ```ConceptQL
523
+ # All MIs for the year 2010
524
+ {
525
+ during: {
526
+ left: { icd9: '412' },
527
+ right: {
528
+ date_range: {
529
+ start: '2010-01-01',
530
+ end: '2010-12-31'
531
+ }
532
+ }
533
+ }
534
+ }
535
+ ```
536
+
537
+ When comparing results in L against a set of results in R, the temporal node compares results in stream L against results in stream R on a person-by-person basis.
538
+
539
+ - If a person has results in L or R stream, but not in both, none of their results continue downstream
540
+ - On a per person basis, the temporal node joins all results in the L stream to all results in the R stream
541
+ - Any results in the L stream that meet the temporal comparison against any results in the R stream continue downstream
542
+ ```ConceptQL
543
+ # All MIs While Patients had Part A Medicare
544
+ {
545
+ during: {
546
+ left: { icd9: '412' },
547
+ right: { payer: 'Part A' }
548
+ }
549
+ }
550
+ ```
551
+
552
+ #### Edge behaviors
553
+ For 11 of the 13 temporal nodes, comparison of results is straightforward. However, the before/after nodes have a slight twist.
554
+
555
+ Imagine events 1-1-2-1-2-1. In my mind, three 1's come before a 2 and two 1's come after a 2. Accordingly:
556
+
557
+ - When comparing L **before** R, the temporal node compares L against the **LAST** occurrence of R per person
558
+ - When comparing L **after** R, the temporal node compares L against the **FIRST** occurrence of R per person
559
+
560
+ If we're looking for events in L that occur before events in R, then any event in L that occurs before the last event in R technically meets the comparison of "before". The reverse is true for after: all events in L that occur after the first event in R technically occur after R.
561
+
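The 1-1-2-1-2-1 example can be reproduced with a short sketch. It assumes the last and first occurrences of R are chosen by start_date, and for simplicity it treats "before" as L ending earlier than R starts (and "after" as L starting later than R ends), ignoring the strict one-day-gap rule. The `Event` struct and method bodies are illustrative, not the gem's implementation.

```ruby
require 'date'

# Hypothetical stand-in for a result with its person and date range.
Event = Struct.new(:person_id, :start_date, :end_date)

# L before R: compare each L result against the LAST R result per person.
def before(left, right)
  last_r = right.group_by(&:person_id)
                .transform_values { |rows| rows.max_by(&:start_date) }
  left.select do |l|
    r = last_r[l.person_id]
    r && l.end_date < r.start_date
  end
end

# L after R: compare each L result against the FIRST R result per person.
def after(left, right)
  first_r = right.group_by(&:person_id)
                 .transform_values { |rows| rows.min_by(&:start_date) }
  left.select do |l|
    r = first_r[l.person_id]
    r && l.start_date > r.end_date
  end
end

# Events 1-1-2-1-2-1 on six consecutive days for a single person.
day  = ->(n) { Date.new(2010, 1, n) }
ones = [1, 2, 4, 6].map { |n| Event.new(1, day.(n), day.(n)) }
twos = [3, 5].map { |n| Event.new(1, day.(n), day.(n)) }

puts before(ones, twos).size
puts after(ones, twos).size
```

People who appear in only one of the two streams contribute nothing, because no R result exists to compare against.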
562
+
563
+ ```ConceptQL
564
+ # All MIs that occurred before a patient's __last__ case of irritability (799.22)
565
+ {
566
+ before: {
567
+ left: { icd9: '412' },
568
+ right: { icd9: '799.22' }
569
+ }
570
+ }
571
+ ```
572
+
573
+ If this is not the behavior you desire, use one of the sequence nodes to select which event in R should be used for the comparison:
574
+ ```ConceptQL
575
+ # All MIs that occurred before a patient's __first__ case of irritability (799.22)
576
+ {
577
+ before: {
578
+ left: { icd9: '412' },
579
+ right: {
580
+ first: {
581
+ icd9: '799.22'
582
+ }
583
+ }
584
+ }
585
+ }
586
+ ```
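The before/after rule can be checked against the 1-1-2-1-2-1 sequence from the text. The sketch below is for a single person, uses list positions in place of dates, and assumes strict comparisons; `before` and `after` are illustrative helper names, not the gem's node classes.

```ruby
# L results that occur before the LAST R result.
def before(left, right)
  last_r = right.max
  return [] if last_r.nil?
  left.select { |l| l < last_r }
end

# L results that occur after the FIRST R result.
def after(left, right)
  first_r = right.min
  return [] if first_r.nil?
  left.select { |l| l > first_r }
end

sequence = [1, 1, 2, 1, 2, 1]
ones = sequence.each_index.select { |i| sequence[i] == 1 }  # positions 0, 1, 3, 5
twos = sequence.each_index.select { |i| sequence[i] == 2 }  # positions 2, 4

puts before(ones, twos).size        # 3
puts after(ones, twos).size         # 2

# Restricting R to its first occurrence, as the `first` node does:
puts before(ones, [twos.min]).size  # 2
```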
587
+
588
+ ### Time Windows
589
+ There are situations when the date columns associated with a result should have their values shifted forward or backward in time to make a comparison with another set of dates.
590
+
591
+ #### time_window
592
+ - Takes 2 arguments
593
+ - First argument is the stream on which to operate
594
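+ - Second argument is a hash with two keys, :start and :end, each with a value roughly in the following format: "(-?\d+[dmy])+" (as the examples below show, a bare number is read as days and a bare unit as 1 of that unit)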
595
+ - Both start and end must be defined, even if you are only adjusting one of the dates
596
+ - Some examples
597
+ - 30d => 30 days
598
+ - 20 => 20 days
599
+ - d => 1 day
600
+ - 1y => 1 year
601
+ - -1m => -1 month
602
+ - 10d3m => 3 months and 10 days
603
+ - -2y10m-3d => -2 years, +10 months, -3 days
604
+ - The start or end value can also be '', '0', or nil
605
+ - This will leave the date unaffected
606
+ - The start or end value can also be the string 'start' or 'end'
607
+ - 'start' represents the start_date for each result
608
+ - 'end' represents the end_date for each result
609
+ - See the example below
610
+
611
+ ```ConceptQL
612
+ # All Diagnoses of Irritability (ICD-9 799.22) within 30 days of an MI
613
+ {
614
+ during: {
615
+ left: { icd9: '799.22' },
616
+ right: {
617
+ time_window: [
618
+ { icd9: '412' },
619
+ { start: '-30d', end: '30d' }
620
+ ]
621
+ }
622
+ }
623
+ }
624
+ ```
625
+
626
+ ```ConceptQL
627
+ # Shift the window for all MIs back by 2 years
628
+ {
629
+ time_window: [
630
+ { icd9: '412' },
631
+ { start: '-2y', end: '-2y' }
632
+ ]
633
+ }
634
+ ```
635
+
636
+ ```ConceptQL
637
+ # Expand the dates for all MIs to a window ranging from 2 months and 2 days prior to 1 year and 3 days after the MI
638
+ {
639
+ time_window: [
640
+ { icd9: '412' },
641
+ { start: '-2m-2d', end: '3d1y' }
642
+ ]
643
+ }
644
+ ```
645
+
646
+ ```ConceptQL
647
+ # Collapse all hospital visits' date ranges down to just the date of admission by leaving start_date unaffected and setting end_date to start_date
648
+ {
649
+ time_window: [
650
+ { place_of_service: 'inpatient' },
651
+ { start: '', end: 'start' }
652
+ ]
653
+ }
654
+ ```
655
+
656
+ ```ConceptQL
657
+ # Nonsensical, but allowed: swap the start_date and end_date for a range
658
+ {
659
+ time_window: [
660
+ { icd9: '412' },
661
+ { start: 'end', end: 'start' }
662
+ ]
663
+ }
664
+ ```
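To make the adjustment rules concrete, here is a minimal sketch of how the start/end strings could be parsed and applied to a result's dates. It assumes results are plain hashes with `:start_date` and `:end_date`, silently ignores characters it does not recognize, and uses Ruby's `Date#>>` for month and year shifts. The helper names `parse_adjustment`, `shift`, `pick`, and `time_window` are made up for this example and are not taken from the gem.

```ruby
require 'date'

UNITS = { 'y' => :years, 'm' => :months, 'd' => :days }.freeze

# "-2y10m-3d" => [[:years, -2], [:months, 10], [:days, -3]]
def parse_adjustment(str)
  return [] if str.nil? || str.empty?
  return [[:days, str.to_i]] if str.match?(/\A-?\d+\z/)  # bare number means days
  str.downcase.scan(/(-?)(\d*)([dmy])/).map do |sign, digits, unit|
    amount = digits.empty? ? 1 : digits.to_i             # bare unit means 1
    [UNITS.fetch(unit), sign == '-' ? -amount : amount]
  end
end

# Apply each adjustment in turn to a Date.
def shift(date, adjustments)
  adjustments.reduce(date) do |d, (unit, amount)|
    case unit
    when :years  then d >> (12 * amount)
    when :months then d >> amount
    else d + amount
    end
  end
end

# Resolve one side of the window: 'start' and 'end' copy a column,
# anything else adjusts the named column.
def pick(result, spec, column)
  case spec
  when 'start' then result[:start_date]
  when 'end'   then result[:end_date]
  else shift(result[column], parse_adjustment(spec))
  end
end

def time_window(result, window)
  result.merge(
    start_date: pick(result, window[:start], :start_date),
    end_date:   pick(result, window[:end], :end_date)
  )
end

mi = { person_id: 1, start_date: Date.new(2010, 3, 15), end_date: Date.new(2010, 3, 15) }
puts time_window(mi, { start: '-30d', end: '30d' }).values_at(:start_date, :end_date).join(' to ')
# 2010-02-13 to 2010-04-14
```

Both new dates are computed from the original row before either is replaced, which is what lets `{ start: 'end', end: 'start' }` swap the two columns.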
665
+
666
+ #### Temporal Nodes and Person Streams
667
+ Person streams carry a patient's date of birth in their date columns. This makes them almost useless when they are part of the L stream of a temporal node. But person streams are useful as the R stream. By ```time_window```ing the patient's date of birth, we can filter based on the patient's age like so:
668
+ ```ConceptQL
669
+ # All MIs that occurred after a male patient's 50th birthday
670
+ {
671
+ after: {
672
+ left: { icd9: '412' },
673
+ right: {
674
+ time_window: [
675
+ { gender: 'Male' },
676
+ {
677
+ start: '50y',
678
+ end: '50y'
679
+ }
680
+ ]
681
+ }
682
+ }
683
+ }
684
+ ```
685
+
686
+ ## Type Conversion
687
+ There are situations where it is appropriate to convert the type of a stream of results into a different type. In programmer parlance, we say "typecasting" or "casting", which is the terminology we'll use here. A good analogy and mnemonic for casting is to think of taking a piece of metal, say a candle holder, melting it down, and recasting it into, say, a lamp. We'll do something similar with streams. We'll take, for example, a visit_occurrence stream and recast it into a stream of person.
688
+
689
+ ### Casting to person
690
+ - Useful if we're just checking for the presence of a condition for a person
691
+ - E.g. we want to know *if* a person has an old MI, not when an MI occurred or how many MIs occurred
692
+ ```ConceptQL
693
+ # All People Who Had an MI
694
+ {
695
+ person: {
696
+ icd9: '412'
697
+ }
698
+ }
699
+ ```
700
+ ### Casting to a visit_occurrence
701
+ - It is common to look for a set of conditions that coincide with a set of procedures
702
+ - Gathering conditions yields a condition stream, gathering procedures yields a procedure stream
703
+ - It is not possible to compare those two streams directly using AND
704
+ - It is possible to compare the streams temporally, but CDM provides a visit_occurrence table to explicitly tie a set of conditions to a set of procedures
705
+ - Casting both streams to visit_occurrence streams allows us to gather all visit_occurrences for which a set of conditions/procedures occurred in the same visit
706
+ ```ConceptQL
707
+ # All Visits Where a Patient Had an MI During an Office Visit
708
+ {
709
+ intersect: [
710
+ {
711
+ visit_occurrence: {
712
+ icd9: '412'
713
+ }
714
+ },
715
+ {
716
+ visit_occurrence: {
717
+ cpt: '99214'
718
+ }
719
+ }
720
+ ]
721
+ }
722
+ ```
723
+
724
+ Many tables have a foreign key (FK) reference to the visit_occurrence table. If we cast a result to a visit_occurrence, and its table of origin has a visit_occurrence_id FK column, the result becomes a visit_occurrence result corresponding to the row pointed to by visit_occurrence_id. If the row's visit_occurrence_id is NULL, the result is discarded from the stream.
725
+
726
+ If the result's table of origin has no visit_occurrence_id column, we will instead replace the result with ALL visit_occurrences for the person assigned to the result. This allows us to convert between a person stream and visit_occurrence stream and back. E.g. we can get all male patients, then ask for their visit_occurrences later downstream.
727
+ ```ConceptQL
728
+ # All Visits for All Male Patients
729
+ {
730
+ visit_occurrence: {
731
+ gender: 'Male'
732
+ }
733
+ }
734
+ ```
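The two casting rules for visit_occurrence (follow the FK when the table of origin has one, otherwise fall back to every visit for the person) might look like this over in-memory rows. `VISITS`, `cast_to_visits`, and `visit_occurrence` are hypothetical stand-ins for illustration. A row "has the FK column" here simply when the hash has a `:visit_occurrence_id` key.

```ruby
# Hypothetical in-memory stand-in for the CDM visit_occurrence table.
VISITS = [
  { visit_occurrence_id: 10, person_id: 1 },
  { visit_occurrence_id: 11, person_id: 1 },
  { visit_occurrence_id: 20, person_id: 2 }
].freeze

# Cast one result row to zero or more visit_occurrence rows.
def cast_to_visits(row, visits = VISITS)
  if row.key?(:visit_occurrence_id)
    # FK column present: a NULL (nil) FK discards the result.
    return [] if row[:visit_occurrence_id].nil?
    visits.select { |v| v[:visit_occurrence_id] == row[:visit_occurrence_id] }
  else
    # No FK column: return ALL visits for the row's person.
    visits.select { |v| v[:person_id] == row[:person_id] }
  end
end

def visit_occurrence(stream, visits = VISITS)
  stream.flat_map { |row| cast_to_visits(row, visits) }.uniq
end

conditions = [
  { person_id: 1, visit_occurrence_id: 10 },
  { person_id: 2, visit_occurrence_id: nil }  # discarded
]
people = [{ person_id: 1 }]

p visit_occurrence(conditions).map { |v| v[:visit_occurrence_id] }  # [10]
p visit_occurrence(people).map { |v| v[:visit_occurrence_id] }      # [10, 11]
```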
735
+
736
+ ### Casting Loses All Original Information
737
+ After a result undergoes casting, it loses its original information. E.g. casting a visit_occurrence to a person loses the visit_occurrence information and resets the start_date and end_date columns to the person's date of birth. As a side note, this is actually handy if a stream’s dates have been altered by a time_window node and you want the original dates later on. Just cast the stream to its same type and it will regain its original dates.
738
+
739
+
740
+ ### Cast all the Things!
741
+ Although casting to visit_occurrence and person are the most common types of casting, we can cast to and from any of the types in the ConceptQL system.
742
+
743
+ The general rule will be that if the source type has a defined relationship with the target type, we'll cast using that relationship, e.g. casting visit_occurrences to procedures will turn all visit_occurrence results into the set of procedure results that point at those original visit_occurrences. But if there is no direct relationship, we'll do a generous casting, e.g. casting observations to procedures will return all procedures for all persons in the observation stream.
744
+
745
+ INSERT HANDY TABLE SHOWING CONVERSION MATRIX HERE
746
+
747
+ ```ConceptQL
748
+ # Cost of 70012 while Hospitalized for MI
749
+ {
750
+ procedure_cost: {
751
+ intersect: [
752
+ { cpt: '70012' },
753
+ { procedure: {
754
+ intersect: [
755
+ { place_of_service: 'inpatient' },
756
+ { visit_occurrence: {
757
+ icd9: '412'
758
+ } }
759
+ ]
760
+ } }
761
+ ]
762
+ }
763
+ }
764
+ ```
765
+
766
+ ### Casting as a way to fetch all rows
767
+ The casting node doubles as a way to fetch all rows for a single type. Provide the casting node with an argument of ```true``` (instead of an upstream node) to get all rows as results:
768
+ ```ConceptQL
769
+ # All death results in the database
770
+ { death: true }
771
+ ```
772
+
773
+ This comes in handy for situations like these:
774
+ ```ConceptQL
775
+ # All Male patients who died
776
+ {
777
+ person_filter: {
778
+ left: { gender: 'Male' },
779
+ right: { death: true },
780
+ }
781
+ }
782
+ ```
783
+
784
+
785
+ ## Filtering by People
786
+ Often we want to filter out a set of results by people. For instance, say we wanted to find all MIs for all males. We'd use the person_filter node for that. Like the Except node, it takes a left-hand stream and a right-hand stream.
787
+
788
+ Unlike the ```except``` node, the person_filter node will use all types of all streams on the right-hand side to filter out results in all types of all streams on the left-hand side.
789
+
790
+
791
+ ```ConceptQL
792
+ # All MI Conditions for people who are male
793
+ {
794
+ person_filter: {
795
+ left: { icd9: '412' },
796
+ right: { gender: 'Male'}
797
+ }
798
+ }
799
+ ```
800
+
801
+ But we can get crazier. The right-hand side doesn't have to be a person stream. If a non-person stream is used in the right-hand side, the person_filter will cast all right-hand streams to person first and use the union of those streams:
802
+ ```ConceptQL
803
+ # All MI Conditions for people who had an office visit at some point in the data
804
+ {
805
+ person_filter: {
806
+ left: { icd9: '412' },
807
+ right: { cpt: '99214' }
808
+ }
809
+ }
810
+ ```
811
+ ```ConceptQL
812
+ # All MI Conditions for people who had an office visit at some point in the data (an explicit representation of what's happening in the diagram above)
813
+ {
814
+ person_filter: {
815
+ left: { icd9: '412' },
816
+ right: { person: { cpt: '99214' } }
817
+ }
818
+ }
819
+ ```
820
+ ```ConceptQL
821
+ # All MI Conditions for people who are Male OR had an office visit at some point in the data
822
+ {
823
+ person_filter: {
824
+ left: { icd9: '412' },
825
+ right: {
826
+ union: [
827
+ { cpt: '99214' },
828
+ { gender: 'Male' }
829
+ ]
830
+ }
831
+ }
832
+ }
833
+ ```
834
+
835
+ And don't forget the left-hand side can have multiple types of streams:
836
+ ```ConceptQL
837
+ # Yields two streams: a stream of all MI Conditions for people who are Male and a stream of all office visit Procedures for people who are Male
838
+ {
839
+ person_filter: {
840
+ left: {
841
+ union: [
842
+ { icd9: '412' },
843
+ { cpt: '99214' }
844
+ ]
845
+ },
846
+ right: { gender: 'Male' }
847
+ }
848
+ }
849
+ ```
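As a rough illustration of the person_filter semantics described in this section, the sketch below treats every stream as an array of hashes carrying a `:person_id`. "Casting the right-hand side to person and taking the union" then reduces to collecting the distinct person IDs. The `person_filter` method here is a hypothetical in-memory helper, not the gem's node.

```ruby
require 'set'

# Keep every left-hand row, of any type, whose person appears anywhere
# on the right-hand side.
def person_filter(left, right)
  people = right.map { |row| row[:person_id] }.to_set
  left.select { |row| people.include?(row[:person_id]) }
end

left = [
  { type: :condition_occurrence, person_id: 1 },
  { type: :procedure_occurrence, person_id: 2 },
  { type: :procedure_occurrence, person_id: 1 }
]
males = [{ type: :person, person_id: 1 }]

p person_filter(left, males).map { |row| row[:type] }
# [:condition_occurrence, :procedure_occurrence]
```

Rows keep their original type on the way through, so a left-hand side with two types still yields two streams.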
850
+
851
+ ## Concepts within Concepts
852
+ One of the main motivations behind keeping ConceptQL so flexible is to allow users to build ConceptQL statements from other ConceptQL statements. This section loosely describes how this feature will work. Its actual execution and implementation will differ from what is presented here.
853
+
854
+ Say a ConceptQL statement gathers all visit_occurrences where a patient had an MI and a Hospital encounter (CPT 99231):
855
+
856
+ ```ConceptQL
857
+ # All Visits where a Patient had both an MI and a Hospital Encounter
858
+ {
859
+ intersect: [
860
+ { visit_occurrence: { icd9: '412' } },
861
+ { visit_occurrence: { cpt: '99231' } }
862
+ ]
863
+ }
864
+ ```
865
+
866
+ If we wanted to gather all costs for all procedures for those visits, we could use the "concept" node to represent the concept defined above in a new concept:
867
+ ```ConceptQL
868
+ # All Procedure Costs for All Visits as defined above
869
+ {
870
+ procedure_cost: {
871
+ concept: "\nAll Visits\nwhere a Patient had\nboth an MI and\na Hospital Encounter"
872
+ }
873
+ }
874
+ ```
875
+ The color and edge coming from the concept node are black to denote that we don't know what types or streams are coming from the concept. In reality, any program that uses ConceptQL can ask the concept represented by the concept node for the concept's types. The result of nesting one concept within another is exactly the same as if we had taken the concept node and replaced it with the ConceptQL statement for the concept it represents.
876
+
877
+ ```ConceptQL
878
+ # Procedure Costs for All Visits where a Patient had both an MI and a Hospital Encounter (same as above)
879
+ {
880
+ procedure_cost: {
881
+ intersect: [
882
+ { visit_occurrence: { icd9: '412' } },
883
+ { visit_occurrence: { cpt: '99231' } }
884
+ ]
885
+ }
886
+ }
887
+ ```
888
+
889
+ In the actual implementation of the concept node, each ConceptQL statement will have a unique identifier which the concept node will use. So, assuming that the ID 2031 represents the concept we want to gather all procedure costs for, our example should really read:
890
+
891
+ ```ConceptQL
892
+ {
893
+ procedure_cost: { concept: 2031 }
894
+ }
895
+ ```
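Since a concept node is equivalent to the statement it names, one way to picture it is as a recursive substitution over the statement's hash. The sketch below assumes a hypothetical in-memory `CONCEPTS` lookup keyed by ID; the spec says the real implementation will differ. It also does no cycle detection, so a concept that refers to itself would recurse forever.

```ruby
# Hypothetical store mapping concept IDs to their ConceptQL statements.
CONCEPTS = {
  2031 => {
    intersect: [
      { visit_occurrence: { icd9: '412' } },
      { visit_occurrence: { cpt: '99231' } }
    ]
  }
}.freeze

# Walk a statement and replace every { concept: id } node with the
# (recursively expanded) statement stored under that id.
def expand(node, concepts = CONCEPTS)
  case node
  when Hash
    if node.keys == [:concept]
      expand(concepts.fetch(node[:concept]), concepts)
    else
      node.transform_values { |child| expand(child, concepts) }
    end
  when Array
    node.map { |child| expand(child, concepts) }
  else
    node
  end
end

p expand({ procedure_cost: { concept: 2031 } })
```

The printed hash has the same shape as the fully written-out procedure_cost statement shown earlier in this section.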
896
+
897
+ # Appendix A - Criterion Nodes
898
+
899
+ | Node Name | Stream Type | Arguments | Returns |
900
+ | ---- | ---- | --------- | ------- |
901
+ | cpt | procedure_occurrence | 1 or more CPT codes | All results whose source_value matches any of the CPT codes |
902
+ | icd9 | condition_occurrence | 1 or more ICD-9 codes | All results whose source_value matches any of the ICD-9 codes |
903
+ | icd9_procedure | procedure_occurrence | 1 or more ICD-9 procedure codes | All results whose source_value matches any of the ICD-9 procedure codes |
904
+ | icd10 | condition_occurrence | 1 or more ICD-10 codes | All results whose source_value matches any of the ICD-10 codes |
905
+ | hcpcs | procedure_occurrence | 1 or more HCPCS codes | All results whose source_value matches any of the HCPCS codes |
906
+ | gender | person | 1 or more gender concept_ids | All results whose gender_concept_id matches any of the concept_ids |
907
+ | loinc | observation | 1 or more LOINC codes | All results whose source_value matches any of the LOINC codes |
908
+ | place_of_service_code | visit_occurrence | 1 or more place of service codes | All results whose place of service matches any of the codes |
909
+ | race | person | 1 or more race concept_ids | All results whose race_concept_id matches any of the concept_ids |
910
+ | rxnorm | drug_exposure | 1 or more RxNorm IDs | All results whose drug_concept_id matches any of the RxNorm IDs |
911
+ | snomed | condition_occurrence | 1 or more SNOMED codes | All results whose source_value matches any of the SNOMED codes |
912
+
913
+
914
+ # Appendix B - Concept Showcase
915
+ Here I take some concepts from [OMOP's Health Outcomes of Interest](http://omop.org/HOI) and turn them into ConceptQL statements to give more examples. I truncated some of the sets of codes to help ensure the diagrams didn't get too large.
916
+
917
+ ### Acute Kidney Injury - Narrow Definition and diagnostic procedure
918
+
919
+ - ICD-9 of 584
920
+ - AND
921
+ - ICD-9 procedure codes of 39.95 or 54.98 within 60 days after diagnosis
922
+ - AND NOT
923
+ - A diagnostic code of chronic dialysis any time before initial diagnosis
924
+ - V45.1, V56.0, V56.31, V56.32, V56.8
925
+ ```ConceptQL
926
+ {
927
+ during: {
928
+ left: {
929
+ except: {
930
+ left: { icd9: '584' },
931
+ right: {
932
+ after: {
933
+ left: { icd9: '584' },
934
+ right: { icd9: [ 'V45.1', 'V56.0', 'V56.31', 'V56.32', 'V56.8' ] }
935
+ }
936
+ }
937
+ }
938
+ },
939
+ right: {
940
+ time_window: [
941
+ { icd9_procedure: [ '39.95', '54.98' ] },
942
+ { start: '0', end: '60d' }
943
+ ]
944
+ }
945
+ }
946
+ }
947
+ ```
948
+
949
+ ### Mortality after Myocardial Infarction #3
950
+ - Person Died
951
+ - And Occurrence of 410\* prior to death
952
+ - And either
953
+ - MI diagnosis within 30 days prior to 410
954
+ - MI therapy within 60 days after 410
955
+ ```ConceptQL
956
+ {
957
+ during: {
958
+ left: {
959
+ before: {
960
+ left: { icd9: '410*' },
961
+ right: { death: true }
962
+ }
963
+ },
964
+ right: {
965
+ union: [
966
+ {
967
+ time_window: [
968
+ {
969
+ union: [
970
+ { cpt: [ '0146T', '75898', '82554', '92980', '93010', '93233', '93508', '93540', '93545' ] },
971
+ { icd9_procedure: [ '00.24', '36.02', '89.53', '89.57', '89.69' ] },
972
+ { loinc: [ '10839-9', '13969-1', '18843-3', '2154-3', '33204-9', '48425-3', '49259-5', '6597-9', '8634-8' ] }
973
+ ]
974
+ },
975
+ { start: '-30d', end: '0' }
976
+ ]
977
+ },
978
+ {
979
+ time_window: [
980
+ {
981
+ union: [
982
+ { cpt: [ '0146T', '75898', '82554', '92980', '93010', '93233'] },
983
+ { icd9_procedure: [ '00.24', '36.02', '89.53', '89.57', '89.69' ] }
984
+ ]
985
+ },
986
+ { start: '', end: '60d' }
987
+ ]
988
+ }
989
+ ]
990
+ }
991
+ }
992
+ }
993
+ ```
994
+
995
+ ### GI Ulcer Hospitalization 2 (5000001002)
996
+ - Occurrence of GI Ulcer diagnostic code
997
+ - Hospitalization at time of diagnostic code
998
+ - At least one diagnostic procedure during same hospitalization
999
+ ```ConceptQL
1000
+ # We use the fact that conditions, observations, and procedures all can be tied to a visit_occurrence to find situations where the appropriate conditions, diagnostic procedures, and place of service all occur in the same visit_occurrence
1001
+ {
1002
+ intersect: [
1003
+ { place_of_service: [ 'Inpatient' ]},
1004
+ { visit_occurrence: { icd9: '410' } },
1005
+ {
1006
+ visit_occurrence: {
1007
+ union: [
1008
+ { cpt: [ '0008T', '3142F', '43205', '43236', '76975', '91110', '91111' ] },
1009
+ { hcpcs: [ 'B4081', 'B4082' ] },
1010
+ { icd9_procedure: [ '42.22', '42.23', '44.13', '45.13', '52.21', '97.01' ] },
1011
+ { loinc: [ '16125-7', '17780-8', '40820-3', '50320-1', '5177-1', '7901-2' ] }
1012
+ ]
1013
+ }
1014
+ }
1015
+ ]
1016
+ }
1017
+ ```
1018
+
1019
+ # Appendix C - Under Development
1020
+ ConceptQL is not yet fully specified. These are modifications/enhancements that are under consideration. These ideas are most likely not completely refined and might actually represent changes that would fundamentally break ConceptQL.
1021
+
1022
+ ### Todo List
1023
+ 1. Handle costs
1024
+ - How do we aggregate?
1025
+ 2. How do we count?
1026
+ 3. How do we handle missing values in streams?
1027
+ - For instance, missing DoB on patient?
1028
+ 4. What does it mean to pass a date range as an L stream?
1029
+ - I'm thinking we pass through no results
1030
+ - Turns out that, as implemented, a date_range is really a person_stream where the start and end dates represent the range (instead of the date of birth) so we're probably OK
1031
+ 5. How do we want to look up standard vocab concepts?
1032
+ - I think Marc’s approach is a bit heavy-handed
1033
+
1034
+
1035
+ ### Slots and Variables
1036
+ Some statements may be very useful and it would be handy to reuse the bulk of the statement, but perhaps vary just a few things about it. ConceptQL supports the idea of using variables to represent sub-expressions. The variable node is used as a placeholder to say "some criteria set belongs here". That variable can be defined in another part of the criteria set and will be used in all places the variable node appears.
1037
+
1038
+ If a variable node is used, but not defined, the concept is still valid, but will fail to run until a definition for all missing variables is provided.
1039
+
1040
+ I don't have a good feel for:
1041
+
1042
+ - How to represent a variable node in a diagram
1043
+ - Whether we should have users name the variables, or auto-assign a name?
1044
+ - We risk name collisions if a concept includes a sub-concept with the same variable name
1045
+ - Probably need to namespace all variables
1046
+ - How to prompt users to enter values for variables in a concept
1047
+ - If we have namespaced variables and sub-concepts needing values, how do we show this in a coherent manner to a user?
1048
+ - We'll need to do a pass through a concept to find all variables and prompt a user, then do another pass through the concept before attempting to execute it to ensure all variables have values
1049
+ - Do we throw an exception if not?
1050
+ - Do we require calling programs to invoke a check on the concept before generating the query?
1051
+
1052
+ ### Value Nodes
1053
+ So far, we can’t recreate the Charlson comorbidity index using ConceptQL. If we added a “value” node, we could.
1054
+
1055
+ By default each result row will carry a value column, set to 1. Some examples:
1056
+ ```ConceptQL
1057
+ # All MIs, defaulting value to 1
1058
+ { icd9: '412' }
1059
+ ```
1060
+
1061
+ Passing streams through a value node changes the number stored in the value column:
1062
+
1063
+ ```ConceptQL
1064
+ # All MIs, changing value to 2
1065
+ {
1066
+ value: [
1067
+ { icd9: '412' },
1068
+ 2
1069
+ ]
1070
+ }
1071
+ ```
1072
+
1073
+ Value can also take a column name instead of a number. It will derive the result row's value from the value stored in the column specified.
1074
+ ```ConceptQL
1075
+ # All copays for 99214s
1076
+ {
1077
+ value: [
1078
+ { procedure_cost: { cpt: '99214' } },
1079
+ :paid_copay
1080
+ ]
1081
+ }
1082
+ ```
1083
+
1084
+ If something nonsensical happens, like the column specified isn't present in the table pointed to by a result row, the value in the result row will be unaffected:
1085
+ ```ConceptQL
1086
+ # Still all MIs with value defaulted to 1. condition_occurrence table doesn't have a "paid_copay" column
1087
+ {
1088
+ value: [
1089
+ { icd9: '412' },
1090
+ :paid_copay
1091
+ ]
1092
+ }
1093
+ ```
1094
+
1095
+ Or if the column specified exists, but refers to a non-numerical column, we'll set the value to 0:
1096
+ ```ConceptQL
1097
+ # All MIs, with value set to 0 since the column specified by value node is a non-numerical column
1098
+ {
1099
+ value: [
1100
+ { icd9: '412' },
1101
+ :stop_reason
1102
+ ]
1103
+ }
1104
+ ```
1105
+
1106
+ With a value node defined, we could introduce a sum node that will sum by patient. This allows us to implement the Charlson comorbidity algorithm:
1107
+ ```ConceptQL
1108
+ {
1109
+ sum: [
1110
+ {
1111
+ union: [
1112
+ {
1113
+ value: [
1114
+ { person: { icd9: '412' } },
1115
+ 1
1116
+ ]
1117
+ },
1118
+ {
1119
+ value: [
1120
+ { person: { icd9: '278.02' } },
1121
+ 2
1122
+ ]
1123
+ }
1124
+ ]
1125
+ }
1126
+ ]
1127
+ }
1128
+ ```
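Because value and sum are only proposals, the following is just a sketch of how they might behave, assuming rows are hashes with a `:person_id` and that value is given a literal number (the column-name form is not covered). `value_node` and `sum_node` are hypothetical names chosen to avoid clashing with Ruby's built-in `sum`.

```ruby
# value node: set every row's value to the given number.
def value_node(stream, number)
  stream.map { |row| row.merge(value: number) }
end

# sum node: total the value column per person.
def sum_node(stream)
  stream.group_by { |row| row[:person_id] }.map do |person_id, rows|
    { person_id: person_id, value: rows.sum { |r| r[:value] } }
  end
end

mi_people    = [{ person_id: 1 }, { person_id: 2 }]  # person: { icd9: '412' }
obese_people = [{ person_id: 1 }]                    # person: { icd9: '278.02' }

scores = sum_node(value_node(mi_people, 1) + value_node(obese_people, 2))
p scores
# [{:person_id=>1, :value=>3}, {:person_id=>2, :value=>1}] (formatting varies by Ruby version)
```

Array concatenation plays the role of union here, which works because each person appears at most once per input stream after casting to person.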
1129
+
1130
+ ### Counting
1131
+ It might be helpful to count the number of occurrences of a result row in a stream. A simple "count" node could group identical rows and store the number of occurrences in the value column.
1132
+
1133
+ I need examples of algorithms that could benefit from this node. I'm concerned that we'll want to roll up occurrences by person most of the time and that would require us to first cast streams to person before passing the person stream to count.
1134
+ ```ConceptQL
1135
+ # Count the number of times each person was irritable
1136
+ {
1137
+ count: { person: { icd9: '799.22' } }
1138
+ }
1139
+ ```
1140
+
1141
+ We could do dumb things like count the number of times a row shows up in a union:
1142
+ ```ConceptQL
1143
+ # All rows with a value of 2 would be rows that were both MI and Primary
1144
+ {
1145
+ count: {
1146
+ union: [
1147
+ { icd9: '412' },
1148
+ { primary_diagnosis: true}
1149
+ ]
1150
+ }
1151
+ }
1152
+ ```
1153
+
1154
+ ### Value Comparison
1155
+ Acts like any other binary node. L and R streams, joined by person. Any L that pass comparison go downstream. R is thrown out. Comparison based on result row's value column.
1156
+
1157
+ - Less than
1158
+ - Less than or equal
1159
+ - Equal
1160
+ - Greater than or equal
1161
+ - Greater than
1162
+ - Not equal
1163
+ - Between
1164
+
1165
+
1166
+ ### value_literal
1167
+ ```ConceptQL
1168
+ # People with more than 1 MI
1169
+ {
1170
+
1171
+ greater_than: {
1172
+ left: { count: { person: { icd9: '412' }}},
1173
+ right: { value_literal: 1 }
1174
+ }
1175
+ }
1176
+ ```
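The count and greater_than ideas can be sketched together. The spec does not yet say how a value_literal on the right is joined to L, so this example simply assumes it compares each L row's value against the literal. `count_node` and `greater_than` are hypothetical helpers over in-memory hashes.

```ruby
# count node: collapse identical rows (ignoring any existing :value)
# and store the number of occurrences in :value.
def count_node(stream)
  stream.group_by { |row| row.reject { |key, _| key == :value } }
        .map { |row, dupes| row.merge(value: dupes.size) }
end

# greater_than with a value_literal on the right (assumed semantics):
# keep L rows whose value exceeds the literal.
def greater_than(left, literal)
  left.select { |row| row[:value] > literal }
end

# person: { icd9: '412' } with one row per MI
mi_people = [{ person_id: 1 }, { person_id: 1 }, { person_id: 2 }]

p greater_than(count_node(mi_people), 1).map { |row| row[:person_id] }  # [1]
```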
1177
+
1178
+ ### Filter Node
1179
+ Inspired by person_filter, why not just have a "filter" node that filters L by R? It takes L, R, and an "as" option. The :as option temporarily casts the L and R streams to the type specified by :as and then does a person-by-person comparison, keeping only rows that occur on both sides. Handy for keeping procedures that coincide with conditions without fully casting the streams:
1180
+ ```ConceptQL
1181
+ # All 99214's where person was irritable during a visit
1182
+ {
1183
+ filter: {
1184
+ left: { cpt: '99214' },
1185
+ right: { icd9: '799.22' },
1186
+ as: 'visit_occurrence'
1187
+ }
1188
+ }
1189
+ ```
1190
+
1191
+ person_filter then becomes a special case of general filter:
1192
+ ```ConceptQL
1193
+ # All 99214's where person was irritable at some point in the data
1194
+ {
1195
+ filter: {
1196
+ left: { cpt: '99214' },
1197
+ right: { icd9: '799.22' },
1198
+ as: 'person'
1199
+ }
1200
+ }
1201
+ ```
1202
+
1203
+ Filter node is the opposite of Except. It only includes L if R matches.
1204
+
1205
+ ### AS option for Except
1206
+ Just like Filter has an :as option, add one to Except node. This would simplify some of the algorithms I've developed.
1207
+
1208
+ [^AIA]: J. Allen. Maintaining knowledge about temporal intervals. Communications of the ACM (1983) vol. 26 (11) pp. 832-843