lsst-felis 26.2024.400__py3-none-any.whl

Metadata-Version: 2.1
Name: lsst-felis
Version: 26.2024.400
Summary: A vocabulary for describing catalogs and acting on those descriptions
Author-email: Rubin Observatory Data Management <dm-admin@lists.lsst.org>
License: GNU General Public License v3 or later (GPLv3+)
Project-URL: Homepage, https://github.com/lsst/felis
Keywords: lsst
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Astronomy
Requires-Python: >=3.11.0
Description-Content-Type: text/markdown
License-File: COPYRIGHT
License-File: LICENSE
Requires-Dist: astropy >=4
Requires-Dist: sqlalchemy >=1.4
Requires-Dist: click >=7
Requires-Dist: pyyaml >=6
Requires-Dist: pyld >=2
Requires-Dist: pydantic <3,>=2
Provides-Extra: test
Requires-Dist: pytest >=3.2 ; extra == 'test'

# Felis

## Introduction

Felis is a way of describing database catalogs, scientific and
otherwise, in a language- and DBMS-agnostic way. It is built on concepts
from JSON-LD/RDF and CSVW, but it is intended to provide a comprehensive way
to describe tabular data, using annotations on tables, columns, and
schemas to document scientifically useful metadata as well as
implementation-specific metadata for database management systems, file
formats, and application data models.

When processing a Felis description, we envision SQLAlchemy as the
target implementation backend, so descriptions of Tables, Columns,
Foreign Keys, Constraints, and Indexes should generally map very closely
to SQLAlchemy parameters for those objects.

Liquibase descriptions were also consulted. Liquibase is oriented around
the concept of a changeset. It should be possible to transform a Felis
description into a Liquibase changeset without too much effort.

## JSON-LD

JSON-LD is a way of representing data in a linked fashion. It is built
on the core concepts of [Linked
Data](https://www.w3.org/DesignIssues/LinkedData.html).

The rule we're most interested in for Felis is the first rule:

> Use URIs as names for things

This rule, coupled with technologies in JSON-LD, allows us to identify
things in a well-defined manner using a syntax that is "very terse and
human readable". JSON-LD also provides algorithms to translate those
descriptions into objects that are easier for a computer to process.

Because of the emphasis it puts on linking data, JSON-LD provides a natural
way of describing the fundamentally relational objects that make up a database.

Felis is influenced by work on CSVW, which uses JSON-LD to describe CSV
files. CSVW is oriented a bit more towards publishing data to the web,
and that doesn't quite capture the use case of describing tables,
especially those which haven't been created yet. Still, for services
which may return CSV files, a translation to CSVW should be
straightforward.

Some links that might be helpful for understanding JSON-LD:

- <http://arfon.org/json-ld-for-software-discovery-reuse-and-credit/index.html>
- <https://w3c.github.io/json-ld-syntax/#basic-concept>

### IRIs and @context

Following from the first rule of Linked Data, JSON-LD uses IRIs
(Internationalized Resource Identifiers, as described in \[RFC3987\]) for
unambiguous identification. This means the key in every annotation must
be an IRI.

The simplest possible schema, a schema with one table which contains a
point, represented in JSON, would look like the following:

```json
{
  "name": "MySchema",
  "tables": [
    {
      "name": "Point",
      "columns": [
        {
          "name": "ra",
          "datatype": "float"
        },
        {
          "name": "dec",
          "datatype": "float"
        }
      ]
    }
  ]
}
```

We can infer that this is probably describing a schema, but it's
possible the definitions are ambiguous. IRIs help with this:

```json
{
  "http://lsst.org/felis/name": "MySchema",
  "http://lsst.org/felis/tables": [
    {
      "http://lsst.org/felis/name": "Point",
      "http://lsst.org/felis/columns": [
        {
          "http://lsst.org/felis/name": "ra",
          "http://lsst.org/felis/datatype": "float"
        },
        {
          "http://lsst.org/felis/name": "dec",
          "http://lsst.org/felis/datatype": "float"
        }
      ]
    }
  ]
}
```

This gives each value unambiguous semantics, but it is extremely wordy
compared to the natural JSON form.

To help with this, a JSON-LD document has a context, and every Felis
description should have one as well. `@context` is similar to an XML
namespace: it is used to define the short-hand names that are used
throughout a JSON-LD document. These short-hand names are called terms
and help developers to express specific identifiers in a compact manner.

```json
{
  "@context": "http://lsst.org/felis/",
  "name": "MySchema",
  "tables": [
    {
      "name": "Point",
      "columns": [
        {
          "name": "ra",
          "datatype": "float"
        },
        {
          "name": "dec",
          "datatype": "float"
        }
      ]
    }
  ]
}
```

This is fine, but the base vocabulary of Felis doesn't help much with
annotating columns with FITS or IVOA terms, for example. So we can add
more vocabulary terms to our context:

```json
{
  "@context": {
    "@vocab": "http://lsst.org/felis/",
    "ivoa": "http://ivoa.net/rdf/",
    "fits": "http://fits.gsfc.nasa.gov/FITS/4.0/"
  },
  "name": "MySchema",
  "tables": [
    {
      "name": "Point",
      "columns": [
        {
          "name": "ra",
          "datatype": "float",
          "ivoa:ucd": "pos.eq.ra;meta.main",
          "fits:tunit": "deg"
        },
        {
          "name": "dec",
          "datatype": "float",
          "ivoa:ucd": "pos.eq.dec;meta.main",
          "fits:tunit": "deg"
        }
      ]
    }
  ]
}
```
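
To make the role of the context concrete, the sketch below expands compact keys into full IRIs using the context from the example above. It is a deliberately simplified illustration, not the JSON-LD expansion algorithm (which also restructures values, loads remote contexts, and more; the pyld library that Felis depends on implements the real thing). The names `CONTEXT`, `expand_key`, and `expand_keys` are hypothetical.

```python
# Simplified illustration of how a JSON-LD @context maps compact keys to IRIs.
# Hypothetical helpers; the real JSON-LD expansion algorithm does much more.

CONTEXT = {
    "@vocab": "http://lsst.org/felis/",
    "ivoa": "http://ivoa.net/rdf/",
    "fits": "http://fits.gsfc.nasa.gov/FITS/4.0/",
}


def expand_key(key: str, context: dict[str, str]) -> str:
    """Return the IRI that a compact key stands for under `context`."""
    if key.startswith("@") or "://" in key:
        return key  # JSON-LD keywords and absolute IRIs are left alone
    prefix, sep, suffix = key.partition(":")
    if sep and prefix in context:
        return context[prefix] + suffix  # compact IRI such as "ivoa:ucd"
    return context["@vocab"] + key  # plain term resolved against @vocab


def expand_keys(node, context: dict[str, str]):
    """Recursively expand dictionary keys; values are returned unchanged."""
    if isinstance(node, dict):
        return {expand_key(k, context): expand_keys(v, context) for k, v in node.items()}
    if isinstance(node, list):
        return [expand_keys(item, context) for item in node]
    return node


column = {"name": "ra", "datatype": "float", "ivoa:ucd": "pos.eq.ra;meta.main"}
for iri in expand_keys(column, CONTEXT):
    print(iri)
# http://lsst.org/felis/name
# http://lsst.org/felis/datatype
# http://ivoa.net/rdf/ucd
```

The plain terms expand to exactly the longhand keys used in the fully expanded example earlier in this section.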

It's also fine to [externally define a context as
well](https://json-ld.org/spec/latest/json-ld/#interpreting-json-as-json-ld).
This reduces the boilerplate in a file and allows the JSON to appear even
simpler:

```json
{
  "name": "MySchema",
  "tables": [
    {
      "name": "Point",
      "columns": [
        {
          "name": "ra",
          "datatype": "float",
          "ivoa:ucd": "pos.eq.ra;meta.main",
          "fits:tunit": "deg"
        },
        {
          "name": "dec",
          "datatype": "float",
          "ivoa:ucd": "pos.eq.dec;meta.main",
          "fits:tunit": "deg"
        }
      ]
    }
  ]
}
```

Currently, vocabularies aren't formally defined for IVOA, FITS, MySQL,
Oracle, Postgres, or SQLite. For now, we won't worry about that too much.
For most descriptions of tables, we recommend the following default
context:

```json
{
  "@context": {
    "@vocab": "http://lsst.org/felis/",
    "mysql": "http://mysql.com/",
    "postgres": "http://posgresql.org/",
    "oracle": "http://oracle.com/database/",
    "sqlite": "http://sqlite.org/",
    "fits": "http://fits.gsfc.nasa.gov/FITS/4.0/",
    "ivoa": "http://ivoa.net/rdf/",
    "votable": "http://ivoa.net/rdf/VOTable/",
    "tap": "http://ivoa.net/documents/TAP/"
  }
}
```

### @id

The main way to reference objects within a JSON-LD document is by id.
The `@id` attribute of any object MUST be unique in that document. `@id`
is the main mechanism Felis descriptions use to reference objects, such
as the columns referenced in an index.
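
Because `@id` values MUST be unique, a tool reading a description can check this up front. A minimal sketch, assuming the description has already been loaded into plain Python dicts and lists (for example from YAML); `collect_ids` and `duplicate_ids` are hypothetical names, not part of Felis.

```python
from collections import Counter


def collect_ids(node, found: list[str]) -> list[str]:
    """Depth-first walk that records every "@id" value in a description."""
    if isinstance(node, dict):
        if "@id" in node:
            found.append(node["@id"])
        for value in node.values():
            collect_ids(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_ids(item, found)
    return found


def duplicate_ids(description: dict) -> list[str]:
    """Return the @id values that occur more than once (empty if all unique)."""
    counts = Counter(collect_ids(description, []))
    return sorted(i for i, n in counts.items() if n > 1)


schema = {
    "@id": "#MySchema",
    "tables": [
        {
            "@id": "#Point",
            "columns": [
                {"@id": "#Point.ra", "name": "ra"},
                {"@id": "#Point.ra", "name": "dec"},  # copy-paste mistake
            ],
        }
    ],
}
print(duplicate_ids(schema))  # ['#Point.ra']
```

References such as `primaryKey: "#Point.ra"` are not counted, because only values stored under the `@id` key are collected.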

### As YAML

For describing schemas at rest, we recommend YAML, since we assume it
will be edited by users.

The table in YAML, with an externally defined context, would appear as
the following:

```yaml
---
name: MySchema
tables:
- name: Point
  columns:
  - name: ra
    datatype: float
    ivoa:ucd: pos.eq.ra;meta.main
    fits:tunit: deg
  - name: dec
    datatype: float
    ivoa:ucd: pos.eq.dec;meta.main
    fits:tunit: deg
```

JSON-LD keywords, those which start with `@` like `@id`, need to be
quoted in YAML, because `@` is a reserved indicator character in YAML and
cannot begin an unquoted (plain) scalar.

## Tabular Data Models

This section defines the objects which make up the model.

The annotations provide information about the columns, tables, and
schemas they are defined in. The values of an annotation may be lists,
objects, or atomic values. To maximize portability, it's recommended to
use atomic values wherever possible. A list or a structured object,
for example, may need to be serialized in target formats that only allow
key-value metadata on column and table objects. This would include
storage in a database as well.

### Schemas

A schema is a group of tables.

A schema comprises a group of annotated tables and a set of annotations
that relate to that group of tables. The core annotations of a schema
are:

- `name` \
  The name of this schema. In implementation terms, this typically
  maps to:

  - A schema in a `CREATE SCHEMA` statement in Postgres.
  - A database in a `CREATE DATABASE` statement in MySQL. There is
    also a synonym for this statement under `CREATE SCHEMA`.
  - A user in a `CREATE USER` statement in Oracle.
  - A SQLite file, which might be named according to `[name].db`.

- `@id` \
  An identifier for this group of tables. This may be used for
  relating schemas together at a higher level. Typically, the name of
  the schema can be used as the id.

- `description` \
  A textual description of this schema.

- `tables` \
  The list of tables in the schema. A schema MUST have one or more tables.

- `version` \
  Optional schema version description.

Schemas MAY in addition have any number of annotations which provide
information about the group of tables. Annotations on a group of tables
may include:

- DBMS-specific information for a schema, especially for creating a
  schema.
- IVOA metadata about the schema
- Column Groupings
- Links to other schemas which may be related
- Reference URLs
- Provenance

### Schema versioning

Database schemas usually evolve over time, and client software has to depend on
knowledge of the schema version and possibly on the compatibility of different
schema versions. Felis supports specification of versions and their possible
relations, but does not specify exactly how compatibility checks have to be
implemented. It is the client's responsibility to interpret version numbers and
to define compatibility rules.

In its simplest form, the schema version is specified as the value of the
`version` attribute, and it must be a string:

```yaml
version: "4.2.0"
```

This example uses the semantic version format, but in general any string
can be specified here. Numeric-looking values should be quoted so that YAML
does not parse them as numbers.

In the extended form, the version can be specified using nested attributes:

- `current` \
  Specifies the current version defined by the schema; must be a string.

- `compatible` \
  Specifies a list of versions that the current schema is fully compatible with;
  all items must be strings.

- `read_compatible` \
  Specifies a list of versions that the current schema is read-compatible with;
  all items must be strings.

Naturally, compatibility behavior depends on the code that implements reading
and writing of the data. An example of version specification using the extended
format:

```yaml
version:
  current: "v42"
  compatible: ["v41", "v40"]
  read_compatible: ["v39", "v38"]
```

### Tables

A table within a schema. The core annotations of a table are:

- `name` \
  The name of this table. In implementation terms, this typically maps
  to a table name in a `CREATE TABLE` statement in
  MySQL/Oracle/Postgres/SQLite.

- `@id` \
  An identifier for this table.

- `description` \
  A textual description of this table.

- `columns` \
  The list of columns in the table. A table MUST have one or more
  columns, and the order of the columns within the list is significant
  and MUST be preserved by applications.

- `primaryKey` \
  A column reference that holds either a single reference to a column
  id or a list of column id references for compound primary keys.

- `constraints` \
  The list of constraints for the table. A table MAY have zero or more
  constraints. Usually these are Foreign Key constraints.

- `indexes` \
  The list of indexes on the table. A table MAY have zero or more
  indexes.

Tables MAY in addition have any number of annotations which provide
information about the table. Annotations on a table may include:

- DBMS-specific information for a table, such as storage engine.
- IVOA metadata about the table, such as utype
- Links to other tables which may be related
- Provenance
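
The `primaryKey` annotation may be either a single column `@id` or a list of them. A minimal sketch of resolving it to column names, assuming columns carry `@id` values as they do in the Examples section; `primary_key_columns` is a hypothetical helper.

```python
def primary_key_columns(table: dict) -> list[str]:
    """Resolve a table's primaryKey (single @id or list of @ids) to column names.

    Raises KeyError if a reference does not match any column @id in the table.
    """
    ref = table.get("primaryKey")
    if ref is None:
        return []
    refs = [ref] if isinstance(ref, str) else list(ref)  # normalize to a list
    by_id = {col["@id"]: col for col in table["columns"] if "@id" in col}
    return [by_id[r]["name"] for r in refs]


table = {
    "name": "Point",
    "@id": "#Point",
    "columns": [
        {"name": "ra", "@id": "#Point.ra", "datatype": "double"},
        {"name": "dec", "@id": "#Point.dec", "datatype": "double"},
    ],
    "primaryKey": ["#Point.ra", "#Point.dec"],  # compound key
}
print(primary_key_columns(table))  # ['ra', 'dec']
```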

### Columns

Represents a column in a table. The core annotations of a column are:

- `name` \
  The name of the column.

- `@id` \
  An identifier for this column.

- `description` \
  A textual description of this column.

- `datatype` \
  The expected datatype for the value of the column. This is the
  canonical datatype, but it may often be overridden by additional
  annotations for DBMS- or format-specific datatypes.

- `value` \
  The default value for a column. This is used in DBMS systems that
  support it, and it may also be used when processing a table.

- `length` \
  The length for this column. This is used in types that support it,
  namely `char`, `string`, `unicode`, `text`, and `binary`.

- `nullable` \
  Whether the column is nullable. When set to `false`, this will cause
  `NOT NULL` to be appended to the SQL DDL. A missing value is
  assumed to be equivalent to `true`. If the value is explicitly set to
  `true` and the column is referenced in the `primaryKey` property of a
  table, then an error should be thrown during the processing of the
  metadata, since primary key columns cannot be null.

- `autoincrement` \
  If the column is the primary key or part of a primary key, this may
  be used to specify autoincrement behavior. We derive semantics from
  [SQLAlchemy](https://docs.sqlalchemy.org/en/rel_1_1/core/metadata.html#sqlalchemy.schema.Column.params.autoincrement).

Columns MAY in addition have any number of annotations which provide
information about the column. Annotations on a column may include:

- DBMS-specific information for a column, such as a DBMS-specific datatype.
- IVOA metadata about the column, such as UCD or utype
- Links to other columns which may be related
- Provenance
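
The `datatype`, `length`, `value`, and `nullable` annotations combine into a single column definition in DDL. A minimal sketch, assuming a small hypothetical generic-SQL type mapping (see the Datatypes tables for per-DBMS types) and that `value` holds a string or a number; a real processor would go through SQLAlchemy instead. `SQL_TYPES`, `sql_literal`, and `column_ddl` are illustrative names.

```python
# Hypothetical subset of a Felis-to-SQL type mapping, for illustration only.
SQL_TYPES = {
    "short": "SMALLINT",
    "int": "INTEGER",
    "long": "BIGINT",
    "double": "DOUBLE PRECISION",
    "char": "CHAR",
    "string": "VARCHAR",
}


def sql_literal(value) -> str:
    """Quote strings as SQL literals; numbers are emitted as-is."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def column_ddl(column: dict) -> str:
    """Render one column definition for a CREATE TABLE statement."""
    sql_type = SQL_TYPES[column["datatype"]]
    if "length" in column:
        sql_type += f"({column['length']})"
    parts = [column["name"], sql_type]
    if "value" in column:  # `value` is the column default
        parts.append(f"DEFAULT {sql_literal(column['value'])}")
    if column.get("nullable", True) is False:  # a missing value means nullable
        parts.append("NOT NULL")
    return " ".join(parts)


print(column_ddl({"name": "statusName", "datatype": "string", "length": 30, "nullable": False}))
# statusName VARCHAR(30) NOT NULL
```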

### Indexes

<div class="warning">

<div class="admonition-title">

Warning

</div>

This section is under development

</div>

An index associated with a table. An index is typically associated
with one or more columns from a table, but it may consist of
expressions involving the columns of a table instead.

The core annotations of an index are:

- `name` \
  The name of this index. This is optional.

- `@id` \
  An identifier for this index.

- `description` \
  A textual description of this index.

- `columns` \
  A column reference property that holds either a single reference to
  a column description object within this schema, or a list of
  references. *This annotation is mutually exclusive with the
  expressions annotation.*

- `expressions` \
  A property that holds either a single column expression object,
  or a list of them. *This annotation is mutually exclusive with the
  columns annotation.*
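
Because `columns` and `expressions` are mutually exclusive, a processor has to enforce that before generating DDL. A minimal sketch under two assumptions: expressions are plain SQL strings (the form of a column expression object is not yet specified here), and a fallback index name is invented when `name` is omitted. `create_index_ddl` and `columns_by_id` are hypothetical.

```python
def create_index_ddl(index: dict, table_name: str, columns_by_id: dict[str, str]) -> str:
    """Build CREATE INDEX for an index that uses either `columns` or `expressions`."""
    has_columns = "columns" in index
    has_exprs = "expressions" in index
    if has_columns == has_exprs:  # both present, or neither present
        raise ValueError("index needs exactly one of 'columns' or 'expressions'")
    if has_columns:
        refs = index["columns"]
        refs = [refs] if isinstance(refs, str) else refs  # single ref or list
        items = [columns_by_id[r] for r in refs]  # column @id -> column name
    else:
        exprs = index["expressions"]
        items = [exprs] if isinstance(exprs, str) else list(exprs)
    name = index.get("name", f"IDX_{table_name}")  # hypothetical fallback name
    return f"CREATE INDEX {name} ON {table_name} ({', '.join(items)})"


index = {
    "name": "IDX_sdqaRatingForAmpVisit_metricId",
    "columns": ["#sdqa_Rating_ForAmpVisit.sdqa_metricId"],
}
lookup = {"#sdqa_Rating_ForAmpVisit.sdqa_metricId": "sdqa_metricId"}
print(create_index_ddl(index, "sdqa_Rating_ForAmpVisit", lookup))
# CREATE INDEX IDX_sdqaRatingForAmpVisit_metricId ON sdqa_Rating_ForAmpVisit (sdqa_metricId)
```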

### Constraints

<div class="warning">

<div class="admonition-title">

Warning

</div>

This section is under development

</div>

- `name` \
  The name of this constraint. This is optional.

- `@id` \
  An identifier for this constraint.

- `@type` \
  One of `ForeignKey`, `Unique`, `Check`. *Required.*

- `description` \
  A description of this constraint.

- `columns` \
  A column reference property that holds either a single reference to
  a column description object within this schema, or a list of
  references.

- `referencedColumns` \
  A column reference property that holds either a single reference to
  a column description object within this schema, or a list of
  references. Used on *ForeignKey* constraints.

- `expression` \
  A column expression object. Used on *Check* constraints.

- `deferrable` \
  If set, emit `DEFERRABLE` (when `true`) or `NOT DEFERRABLE` (when
  `false`) when issuing DDL for this constraint.

- `initially` \
  If set, emit `INITIALLY <value>` (for example, `INITIALLY DEFERRED`)
  when issuing DDL for this constraint.
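
The constraint annotations map onto a table-level SQL constraint clause. The sketch below handles `Unique` and `ForeignKey` only, follows SQLAlchemy's convention for `deferrable` (true emits `DEFERRABLE`, false emits `NOT DEFERRABLE`), and assumes a hypothetical `lookup` table from column `@id` to `(table, column)` names, built while reading the description. `constraint_ddl` is an illustrative name.

```python
def constraint_ddl(constraint: dict, lookup: dict[str, tuple[str, str]]) -> str:
    """Render a Unique or ForeignKey constraint as a table-level DDL clause."""

    def resolve(refs) -> list[tuple[str, str]]:
        refs = [refs] if isinstance(refs, str) else refs  # single ref or list
        return [lookup[r] for r in refs]

    kind = constraint["@type"]
    if kind not in ("Unique", "ForeignKey"):
        raise NotImplementedError(f"{kind} constraints are not handled in this sketch")
    cols = ", ".join(col for _, col in resolve(constraint["columns"]))
    if kind == "Unique":
        clause = f"UNIQUE ({cols})"
    else:
        targets = resolve(constraint["referencedColumns"])
        ref_tables = {table for table, _ in targets}
        if len(ref_tables) != 1:
            raise ValueError("referencedColumns must all belong to one table")
        ref_cols = ", ".join(col for _, col in targets)
        clause = f"FOREIGN KEY ({cols}) REFERENCES {ref_tables.pop()} ({ref_cols})"
    if "name" in constraint:
        clause = f"CONSTRAINT {constraint['name']} {clause}"
    if "deferrable" in constraint:  # True -> DEFERRABLE, False -> NOT DEFERRABLE
        clause += " DEFERRABLE" if constraint["deferrable"] else " NOT DEFERRABLE"
    if "initially" in constraint:
        clause += f" INITIALLY {constraint['initially']}"
    return clause


lookup = {
    "#sdqa_Metric.metricName": ("sdqa_Metric", "metricName"),
    "#sdqa_Metric.sdqa_metricId": ("sdqa_Metric", "sdqa_metricId"),
    "#sdqa_Rating_ForAmpVisit.sdqa_metricId": ("sdqa_Rating_ForAmpVisit", "sdqa_metricId"),
}
unique = {"name": "UQ_sdqaMetric_metricName", "@type": "Unique",
          "columns": ["#sdqa_Metric.metricName"]}
print(constraint_ddl(unique, lookup))
# CONSTRAINT UQ_sdqaMetric_metricName UNIQUE (metricName)
```

A foreign key from `sdqa_Rating_ForAmpVisit.sdqa_metricId` to `sdqa_Metric.sdqa_metricId` would go through the same function; any constraint name given to it would be illustrative, since the example schema does not define one.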

### References

<div class="warning">

<div class="admonition-title">

Warning

</div>

This section is under development

</div>

References are annotated objects which hold a reference to a single
object, usually a Column or a Column Grouping. While a reference to a
column might normally be just an `@id`, we create a special object so
that the reference itself may be annotated with additional information.
This is mostly useful in the case of Column Groupings.

In VOTable, this is similar to the `FIELDref` and `PARAMref` elements.
It's also similar to a `GROUP` nested in a `GROUP`, where the nested
`GROUP` holds an implicit reference to its parent.

- `name` \
  The name of this reference.

- `@id` \
  An identifier for this reference.

- `description` \
  A description of the reference.

- `reference` \
  The id of the object being referenced.

### Column Groupings

<div class="warning">

<div class="admonition-title">

Warning

</div>

This section is incomplete

</div>

Groupings are annotated objects that contain one or more references to
other objects.

- `name` \
  The name of this grouping.

- `@id` \
  An identifier for this grouping, so that it may be referenced.

- `description` \
  A description of the grouping.

- `reference` \
  A reference to another column grouping, if applicable.

- `columnReferences` \
  A list of column references in the table. A Column Grouping MUST
  have one or more column references.
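
A sketch of resolving a Column Grouping to the columns it covers, under stated assumptions: each entry in `columnReferences` is a Reference object whose `reference` holds an `@id`; an `@id` that points to another grouping pulls in that grouping's columns; and `objects_by_id` is a hypothetical index of every object by `@id`. Because this section is incomplete, treat the nesting rule as illustrative only.

```python
def grouping_columns(grouping: dict, objects_by_id: dict[str, dict],
                     _seen: frozenset = frozenset()) -> list[str]:
    """Return, in order, the column names that a grouping refers to."""
    gid = grouping.get("@id")
    if gid in _seen:  # guard against groupings that refer back to themselves
        raise ValueError(f"cyclic grouping reference at {gid}")
    seen = _seen | {gid}
    names = []
    for ref in grouping["columnReferences"]:
        target = objects_by_id[ref["reference"]]
        if "columnReferences" in target:  # a nested grouping
            names.extend(grouping_columns(target, objects_by_id, seen))
        else:  # a column
            names.append(target["name"])
    return names


objects = {
    "#Point.ra": {"@id": "#Point.ra", "name": "ra"},
    "#Point.dec": {"@id": "#Point.dec", "name": "dec"},
    "#Point.position": {
        "@id": "#Point.position",
        "name": "position",
        "columnReferences": [{"reference": "#Point.ra"}, {"reference": "#Point.dec"}],
    },
}
print(grouping_columns(objects["#Point.position"], objects))  # ['ra', 'dec']
```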

## Datatypes

| Type    | C++    | Python | Java     | JDBC     | SQLAlchemy\[1\]     | Notes       |
| ------- | ------ | ------ | -------- | -------- | ------------------- | ----------- |
| boolean | bool   | bool   | boolean  | BOOLEAN  | BOOLEAN             |             |
| byte    | int8   | int    | byte     | TINYINT  | SMALLINT            | [2](#note2) |
| short   | int16  | int    | short    | SMALLINT | SMALLINT            |             |
| int     | int32  | int    | int      | INTEGER  | INTEGER             |             |
| long    | int64  | int    | long     | BIGINT   | BIGINT              |             |
| float   | float  | float  | float    | FLOAT    | FLOAT               |             |
| double  | double | float  | double   | DOUBLE   | FLOAT(precision=53) |             |
| char    | string | str    | String   | CHAR     | CHAR                | [3](#note3) |
| string  | string | str    | String   | VARCHAR  | VARCHAR             | [3](#note3) |
| unicode | string | str    | String   | NVARCHAR | NVARCHAR            | [3](#note3) |
| text    | string | str    | String   | CLOB     | CLOB                |             |
| binary  | string | bytes  | byte\[\] | BLOB     | BLOB                |             |

| Type    | MySQL    | SQLite   | Oracle        | Postgres         | Avro    | Parquet     | Notes       |
| ------- | -------- | -------- | ------------- | ---------------- | ------- | ----------- | ----------- |
| boolean | BIT(1)   | BOOLEAN  | NUMBER(1)     | BOOLEAN          | boolean | BOOLEAN     | [5](#note5) |
| byte    | TINYINT  | TINYINT  | NUMBER(3)     | SMALLINT         | int     | INT\_8      |             |
| short   | SMALLINT | SMALLINT | NUMBER(5)     | SMALLINT         | int     | INT\_16     |             |
| int     | INT      | INTEGER  | INTEGER       | INT              | int     | INT\_32     |             |
| long    | BIGINT   | BIGINT   | NUMBER(38, 0) | BIGINT           | long    | INT\_64     |             |
| float   | FLOAT    | FLOAT    | FLOAT         | FLOAT            | float   | FLOAT       |             |
| double  | DOUBLE   | DOUBLE   | FLOAT(24)     | DOUBLE PRECISION | double  | DOUBLE      |             |
| char    | CHAR     | CHAR     | CHAR          | CHAR             | string  | UTF8/STRING |             |
| string  | VARCHAR  | VARCHAR  | VARCHAR2      | VARCHAR          | string  | UTF8/STRING |             |
| unicode | NVARCHAR | NVARCHAR | NVARCHAR2     | VARCHAR          | string  | UTF8/STRING |             |
| text    | LONGTEXT | TEXT     | CLOB          | TEXT             | string  | UTF8/STRING |             |
| binary  | LONGBLOB | BLOB     | BLOB          | BYTEA            | bytes   | BYTE\_ARRAY |             |

| Type    | xsd          | VOTable          | Notes       |
| ------- | ------------ | ---------------- | ----------- |
| boolean | boolean      | boolean          |             |
| byte    | byte         | unsignedByte     | [4](#note4) |
| short   | short        | short            |             |
| int     | int          | int              |             |
| long    | long         | long             |             |
| float   | float        | float            |             |
| double  | double       | double           |             |
| char    | string       | char\[\]         | [3](#note3) |
| string  | string       | char\[\]         | [3](#note3) |
| unicode | string       | unicodeChar\[\]  | [3](#note3) |
| text    | string       | unicodeChar\[\]  |             |
| binary  | base64Binary | unsignedByte\[\] | [6](#note6) |

**Notes:**

- \[1\] This is the default SQLAlchemy mapping. It's expected that
  implementations processing Felis descriptions will use
  [with\_variant](https://docs.sqlalchemy.org/en/latest/core/type_api.html#sqlalchemy.types.TypeEngine.with_variant)
  to construct types based on the types outlined for specific database
  engines.
- \[2\] SQLAlchemy has no generic "TinyInteger" type, so the default is
  SMALLINT unless it is overridden with a dialect-specific type.
- \[3\] The length is an additional parameter; for VOTable types it is
  given elsewhere, in the `arraysize` attribute.
- \[4\] This is a single byte value between 0 and 255, not a member of a
  byte array. It's preferable not to use this type.
- \[5\] [Parquet Logical types from
  Thrift](https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift)
- \[6\] There's also hexBinary, but it was not considered, as the
  target format is usually human-readable XML.

## DBMS Extensions

DBMS extension annotations may be used to override defaults or to describe
non-standard parameters for creating objects in a database or file.

[The SQLAlchemy documentation on
dialects](https://docs.sqlalchemy.org/en/latest/dialects/mysql.html) is
a good reference for where most of these originate, and for what we
might implement.

Typically, DDL must be executed only after a schema (Postgres/MySQL),
user (Oracle), or file (SQLite) has already been created. Tools SHOULD
take into account the name of the schema defined in a Felis description,
but parameters for creating the schema object are beyond the scope of a
Felis description, because those parameters will likely be
instance-dependent and may contain secrets, as in the case of Oracle.

### MySQL

These properties are defined within the context of `http://mysql.com/`.
If using the recommended default context, this means the `engine`
property for a table would translate to `mysql:engine`, for example.

#### Table

- `engine` \
  The storage engine for this table. `INNODB` is the default for most
  instances of MySQL. `MYISAM` may perform better for some read-heavy
  workloads, but it does not support transactions or foreign keys.

- `charset` \
  The charset for this table. `latin1` was the default in MySQL 5.x
  installations; `utf8mb4`, the default since MySQL 8.0, is probably a
  more sensible choice.

#### Column

- `datatype` \
  The MySQL-specific datatype for a column.
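
As a sketch of how the `mysql:` annotations could be applied, the hypothetical helpers below turn `mysql:engine` and `mysql:charset` into MySQL table options and use each column's `mysql:datatype` override. They assume keys appear in the compact `mysql:` form used in the Examples section and that every column carries `mysql:datatype`.

```python
def mysql_table_options(table: dict) -> str:
    """Translate mysql:* table annotations into CREATE TABLE options."""
    options = []
    if "mysql:engine" in table:
        options.append(f"ENGINE={table['mysql:engine']}")
    if "mysql:charset" in table:
        options.append(f"DEFAULT CHARSET={table['mysql:charset']}")
    return " ".join(options)


def mysql_create_table(table: dict) -> str:
    """Build CREATE TABLE using each column's mysql:datatype override."""
    columns = ", ".join(f"{c['name']} {c['mysql:datatype']}" for c in table["columns"])
    ddl = f"CREATE TABLE {table['name']} ({columns})"
    options = mysql_table_options(table)
    return f"{ddl} {options}" if options else ddl


table = {
    "name": "sdqa_ImageStatus",
    "mysql:engine": "MyISAM",
    "columns": [
        {"name": "sdqa_imageStatusId", "mysql:datatype": "SMALLINT"},
        {"name": "statusName", "mysql:datatype": "VARCHAR(30)"},
    ],
}
print(mysql_create_table(table))
# CREATE TABLE sdqa_ImageStatus (sdqa_imageStatusId SMALLINT, statusName VARCHAR(30)) ENGINE=MyISAM
```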

### Oracle

These properties are defined within the context of
`http://oracle.com/database/`. If using the recommended default
context, this means the `datatype` property for a column would translate
to `oracle:datatype`, for example.

In the future, we could think about adding support for temporary tables
and specifying Sequences for column primary keys.

#### Table

- `compress` \
  If this table is to use Oracle compression, set this to `true` or to
  another value selecting a specific compression setting.

#### Index

- `bitmap` \
  If an index should be a bitmap index in Oracle, set this to `true`.

### SQLite

These properties are defined within the context of `http://sqlite.org/`.
If using the recommended default context, this means the `datatype`
property for a column would translate to `sqlite:datatype`, for example.

## Processing Metadata

> **This section is under development.**

## Creating annotated tables

> **This section is under development.**

## Metadata Compatibility

*This section is non-normative.*

As mentioned before, to maximize portability, it's recommended to use
atomic values wherever possible. A list or a structured object, for
example, may need to be serialized as a string (usually JSON) for target
formats that only allow key-value metadata on column and table objects.
This would include un-mapped storage to a database table.

In the case where all annotations are pure atoms, we can represent the
annotations in virtually every format or model which allows a way to
store key-value metadata on tables and columns. This includes Parquet
files and afw.table objects.

We assume that atomic values of an annotation will likely be stored as
strings in most formats. This means libraries processing the metadata may
need to translate a formatted number back to a float or double. Most of
this can probably be automated with a proper vocabulary for Felis.
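
One way to follow this advice when writing to a string-only key-value store (Parquet metadata, for example) is to keep strings as they are and JSON-encode everything else, then decode on read for keys known not to be strings. A minimal sketch; both helpers are hypothetical, and `non_string_keys` stands in for the type information that a Felis vocabulary could eventually supply.

```python
import json


def to_string_metadata(annotations: dict) -> dict[str, str]:
    """Flatten annotations into str -> str pairs for string-only metadata stores.

    Strings are stored as-is; every other value (numbers, booleans, lists,
    objects) is serialized as JSON so that it can be decoded later.
    """
    return {
        key: value if isinstance(value, str) else json.dumps(value)
        for key, value in annotations.items()
    }


def from_string_metadata(metadata: dict[str, str], non_string_keys: set[str]) -> dict:
    """Decode the values of keys known (e.g. from a vocabulary) not to be strings."""
    return {
        key: json.loads(value) if key in non_string_keys else value
        for key, value in metadata.items()
    }


annotations = {"ivoa:ucd": "pos.eq.ra;meta.main", "fits:tunit": "deg", "length": 30}
flat = to_string_metadata(annotations)
print(flat["length"], type(flat["length"]).__name__)  # 30 str
print(from_string_metadata(flat, {"length"}) == annotations)  # True
```

Without the key set, a reader could not tell the string `"30"` from the number `30`, which is exactly the gap a proper vocabulary would close.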

### Formats and Models

**This section is under development.**

#### afw.table

A few of the metadata values for tables and columns can be stored in
the properties of a schema (table) or field.

#### YAML/JSON

This is the most natural format. Note that `@id` keys must be quoted
in a YAML file.

#### FITS

A convention and vocabulary for FITS header keywords is being developed.
In general, a FITS keyword includes a name, a value, and a comment.

#### Avro

As Avro is very similar to YAML and JSON

#### Parquet

Parquet files allow key-value metadata on column and table objects,
though all values must be strings.

#### Relational Databases

Relational databases do not necessarily have facilities to directly
annotate columns and tables. However, we

#### VOTable

The annotations for columns and tables should be reused where possible.
The Column Groupings are based on the `GROUP` element in VOTable.

#### HDF5 and PyTables

PyTables is an opinionated way of representing tabular data in HDF5.
829
+
830
+ ## Examples
831
+
832
+ ---
833
+ name: sdqa
834
+ description: The SDQA Schema
835
+ tables:
836
+ - name: sdqa_ImageStatus
837
+ "@id": "#sdqa_ImageStatus"
838
+ description: Unique set of status names and their definitions, e.g. 'passed', 'failed',
839
+ etc.
840
+ columns:
841
+ - name: sdqa_imageStatusId
842
+ "@id": "#sdqa_ImageStatus.sdqa_imageStatusId"
843
+ datatype: short
844
+ description: Primary key
845
+ mysql:datatype: SMALLINT
846
+ - name: statusName
847
+ "@id": "#sdqa_ImageStatus.statusName"
848
+ datatype: string
849
+ description: One-word, camel-case, descriptive name of a possible image status
850
+ (e.g., passedAuto, marginallyPassedManual, etc.)
851
+ length: 30
852
+ mysql:datatype: VARCHAR(30)
853
+ - name: definition
854
+ "@id": "#sdqa_ImageStatus.definition"
855
+ datatype: string
856
+ description: Detailed Definition of the image status
857
+ length: 255
858
+ mysql:datatype: VARCHAR(255)
859
+ primaryKey: "#sdqa_ImageStatus.sdqa_imageStatusId"
860
+ mysql:engine: MyISAM
861
+
862
+ - name: sdqa_Metric
863
+ "@id": "#sdqa_Metric"
864
+ description: Unique set of metric names and associated metadata (e.g., 'nDeadPix';,
865
+ 'median';, etc.). There will be approximately 30 records total in this table.
866
+ columns:
867
+ - name: sdqa_metricId
868
+ "@id": "#sdqa_Metric.sdqa_metricId"
869
+ datatype: short
870
+ description: Primary key.
871
+ mysql:datatype: SMALLINT
872
+ - name: metricName
873
+ "@id": "#sdqa_Metric.metricName"
874
+ datatype: string
875
+ description: One-word, camel-case, descriptive name of a possible metric (e.g.,
876
+ mSatPix, median, etc).
877
+ length: 30
878
+ mysql:datatype: VARCHAR(30)
879
+ - name: physicalUnits
880
+ "@id": "#sdqa_Metric.physicalUnits"
881
+ datatype: string
882
+ description: Physical units of metric.
883
+ length: 30
884
+ mysql:datatype: VARCHAR(30)
885
+ - name: dataType
886
+ "@id": "#sdqa_Metric.dataType"
887
+ datatype: char
888
+ description: Flag indicating whether data type of the metric value is integer
889
+ (0) or float (1).
890
+ length: 1
891
+ mysql:datatype: CHAR(1)
892
+ - name: definition
893
+ "@id": "#sdqa_Metric.definition"
894
+ datatype: string
895
+ length: 255
896
+ mysql:datatype: VARCHAR(255)
897
+ primaryKey: "#sdqa_Metric.sdqa_metricId"
898
+ constraints:
899
+ - name: UQ_sdqaMetric_metricName
900
+ "@id": "#UQ_sdqaMetric_metricName"
901
+ "@type": Unique
902
+ columns:
903
+ - "#sdqa_Metric.metricName"
904
+ mysql:engine: MyISAM
905
+
906
+ - name: sdqa_Rating_ForAmpVisit
907
+ "@id": "#sdqa_Rating_ForAmpVisit"
908
+ description: Various SDQA ratings for a given amplifier image. There will be approximately
909
+ 30 of these records per image record.
910
+ columns:
911
+ - name: sdqa_ratingId
912
+ "@id": "#sdqa_Rating_ForAmpVisit.sdqa_ratingId"
913
+ datatype: long
914
+ description: Primary key. Auto-increment is used; a composite unique key is defined
915
+ so that potential duplicates are caught.
916
+ mysql:datatype: BIGINT
917
+ - name: sdqa_metricId
918
+ "@id": "#sdqa_Rating_ForAmpVisit.sdqa_metricId"
919
+ datatype: short
920
+ description: Pointer to sdqa_Metric.
921
+ mysql:datatype: SMALLINT
922
+ - name: sdqa_thresholdId
923
+ "@id": "#sdqa_Rating_ForAmpVisit.sdqa_thresholdId"
924
+ datatype: short
925
+ description: Pointer to sdqa_Threshold.
926
+ mysql:datatype: SMALLINT
927
+ - name: ampVisitId
928
+ "@id": "#sdqa_Rating_ForAmpVisit.ampVisitId"
929
+ datatype: long
930
+ description: Pointer to AmpVisit.
931
+ mysql:datatype: BIGINT
932
+ ivoa:ucd: meta.id;obs.image
933
+ - name: metricValue
934
+ "@id": "#sdqa_Rating_ForAmpVisit.metricValue"
935
+ datatype: double
936
+ description: Value of this SDQA metric.
937
+ mysql:datatype: DOUBLE
938
+ - name: metricSigma
939
+ "@id": "#sdqa_Rating_ForAmpVisit.metricSigma"
940
+ datatype: double
941
+ description: Uncertainty of the value of this metric.
942
+ mysql:datatype: DOUBLE
943
+ primaryKey: "#sdqa_Rating_ForAmpVisit.sdqa_ratingId"
944
+ constraints:
945
+ - name: UQ_sdqaRatingForAmpVisit_metricId_ampVisitId
946
+ "@id": "#UQ_sdqaRatingForAmpVisit_metricId_ampVisitId"
947
+ "@type": Unique
948
+ columns:
949
+ - "#sdqa_Rating_ForAmpVisit.sdqa_metricId"
950
+ - "#sdqa_Rating_ForAmpVisit.ampVisitId"
951
+ indexes:
952
+ - name: IDX_sdqaRatingForAmpVisit_metricId
953
+ "@id": "#IDX_sdqaRatingForAmpVisit_metricId"
954
+ columns:
955
+ - "#sdqa_Rating_ForAmpVisit.sdqa_metricId"
956
+ - name: IDX_sdqaRatingForAmpVisit_thresholdId
957
+ "@id": "#IDX_sdqaRatingForAmpVisit_thresholdId"
958
+ columns:
959
+ - "#sdqa_Rating_ForAmpVisit.sdqa_thresholdId"
960
+ - name: IDX_sdqaRatingForAmpVisit_ampVisitId
961
+ "@id": "#IDX_sdqaRatingForAmpVisit_ampVisitId"
962
+ columns:
963
+ - "#sdqa_Rating_ForAmpVisit.ampVisitId"
964
+ mysql:engine: MyISAM
965
+
966
+ - name: sdqa_Rating_CcdVisit
967
+ "@id": "#sdqa_Rating_CcdVisit"
968
+ description: Various SDQA ratings for a given CcdVisit.
969
+ columns:
970
+ - name: sdqa_ratingId
971
+ "@id": "#sdqa_Rating_CcdVisit.sdqa_ratingId"
972
+ datatype: long
973
+ description: Primary key. Auto-increment is used; a composite unique key is defined
974
+ so that potential duplicates are caught.
975
+ mysql:datatype: BIGINT
976
+ - name: sdqa_metricId
977
+ "@id": "#sdqa_Rating_CcdVisit.sdqa_metricId"
978
+ datatype: short
979
+ description: Pointer to sdqa_Metric.
980
+ mysql:datatype: SMALLINT
981
+ - name: sdqa_thresholdId
982
+ "@id": "#sdqa_Rating_CcdVisit.sdqa_thresholdId"
983
+ datatype: short
984
+ description: Pointer to sdqa_Threshold.
985
+ mysql:datatype: SMALLINT
986
+ - name: ccdVisitId
987
+ "@id": "#sdqa_Rating_CcdVisit.ccdVisitId"
988
+ datatype: long
989
+ description: Pointer to CcdVisit.
990
+ mysql:datatype: BIGINT
991
+ ivoa:ucd: meta.id;obs.image
992
+ - name: metricValue
993
+ "@id": "#sdqa_Rating_CcdVisit.metricValue"
994
+ datatype: double
995
+ description: Value of this SDQA metric.
996
+ mysql:datatype: DOUBLE
997
+ - name: metricSigma
998
+ "@id": "#sdqa_Rating_CcdVisit.metricSigma"
999
+ datatype: double
1000
+ description: Uncertainty of the value of this metric.
1001
+ mysql:datatype: DOUBLE
1002
+ primaryKey: "#sdqa_Rating_CcdVisit.sdqa_ratingId"
1003
+ constraints:
1004
+ - name: UQ_sdqaRatingCcdVisit_metricId_ccdVisitId
1005
+ "@id": "#UQ_sdqaRatingCcdVisit_metricId_ccdVisitId"
1006
+ "@type": Unique
1007
+ columns:
1008
+ - "#sdqa_Rating_CcdVisit.sdqa_metricId"
1009
+ - "#sdqa_Rating_CcdVisit.ccdVisitId"
1010
+ indexes:
1011
+ - name: IDX_sdqaRatingCcdVisit_metricId
1012
+ "@id": "#IDX_sdqaRatingCcdVisit_metricId"
1013
+ columns:
1014
+ - "#sdqa_Rating_CcdVisit.sdqa_metricId"
1015
+ - name: IDX_sdqaRatingCcdVisit_thresholdId
1016
+ "@id": "#IDX_sdqaRatingCcdVisit_thresholdId"
1017
+ columns:
1018
+ - "#sdqa_Rating_CcdVisit.sdqa_thresholdId"
1019
+ - name: IDX_sdqaRatingCcdVisit_ccdVisitId
1020
+ "@id": "#IDX_sdqaRatingCcdVisit_ccdVisitId"
1021
+ columns:
1022
+ - "#sdqa_Rating_CcdVisit.ccdVisitId"
1023
+ mysql:engine: MyISAM
1024
+
1025
+ - name: sdqa_Threshold
1026
+ "@id": "#sdqa_Threshold"
1027
+ description: Version-controlled metric thresholds. The total number of these records
1028
+ is approximately 30 x the number of times the thresholds will be changed
1029
+ over the entire period of LSST operations (of order 100), with most of the
1030
+ changes occurring in the first year of operations.
1031
+ columns:
1032
+ - name: sdqa_thresholdId
1033
+ "@id": "#sdqa_Threshold.sdqa_thresholdId"
1034
+ datatype: short
1035
+ description: Primary key.
1036
+ mysql:datatype: SMALLINT
1037
+ - name: sdqa_metricId
1038
+ "@id": "#sdqa_Threshold.sdqa_metricId"
1039
+ datatype: short
1040
+ description: Pointer to sdqa_Metric table.
1041
+ mysql:datatype: SMALLINT
1042
+ - name: upperThreshold
1043
+ "@id": "#sdqa_Threshold.upperThreshold"
1044
+ datatype: double
1045
+ description: Upper threshold; a metric value is tested for being greater than this value.
1046
+ mysql:datatype: DOUBLE
1047
+ - name: lowerThreshold
1048
+ "@id": "#sdqa_Threshold.lowerThreshold"
1049
+ datatype: double
1050
+ description: Lower threshold; a metric value is tested for being less than this value.
1051
+ mysql:datatype: DOUBLE
1052
+ - name: createdDate
1053
+ "@id": "#sdqa_Threshold.createdDate"
1054
+ datatype: timestamp
1055
+ description: Database timestamp when the record is inserted.
1056
+ value: CURRENT_TIMESTAMP
1057
+ mysql:datatype: TIMESTAMP
1058
+ primaryKey: "#sdqa_Threshold.sdqa_thresholdId"
1059
+ indexes:
1060
+ - name: IDX_sdqaThreshold_metricId
1061
+ "@id": "#IDX_sdqaThreshold_metricId"
1062
+ columns:
1063
+ - "#sdqa_Threshold.sdqa_metricId"
1064
+ mysql:engine: MyISAM
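The `sdqa_ratingId` descriptions say the surrogate key is auto-incremented while a composite unique key on (`sdqa_metricId`, `ccdVisitId`) catches duplicate ratings, and `sdqa_Threshold` stores upper and lower limits that a metric value is tested against. The following is a minimal sketch of both ideas using Python's built-in `sqlite3` module instead of MySQL/MyISAM. It is a hand translation of three of the tables above, not DDL generated by Felis; the helper names (`make_db`, `insert_rating`, `out_of_range`) are hypothetical, the schema's indexes are omitted, and the rule that a value fails when it exceeds `upperThreshold` or falls below `lowerThreshold` is an assumption read from the column descriptions.

```python
import sqlite3

# Hand-written SQLite translation of three tables from the schema above.
# Assumptions: SQLite types stand in for the MySQL ones, "INTEGER PRIMARY KEY"
# gives SQLite's implicit auto-increment, and the schema's indexes are omitted.
DDL = """
CREATE TABLE sdqa_Metric (
    sdqa_metricId INTEGER PRIMARY KEY,
    metricName    VARCHAR(30),
    physicalUnits VARCHAR(30),
    dataType      CHAR(1),
    definition    VARCHAR(255),
    CONSTRAINT UQ_sdqaMetric_metricName UNIQUE (metricName)
);
CREATE TABLE sdqa_Threshold (
    sdqa_thresholdId INTEGER PRIMARY KEY,
    sdqa_metricId    INTEGER,
    upperThreshold   DOUBLE,
    lowerThreshold   DOUBLE,
    createdDate      TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE sdqa_Rating_CcdVisit (
    sdqa_ratingId    INTEGER PRIMARY KEY,
    sdqa_metricId    INTEGER,
    sdqa_thresholdId INTEGER,
    ccdVisitId       BIGINT,
    metricValue      DOUBLE,
    metricSigma      DOUBLE,
    CONSTRAINT UQ_sdqaRatingCcdVisit_metricId_ccdVisitId
        UNIQUE (sdqa_metricId, ccdVisitId)
);
"""


def make_db() -> sqlite3.Connection:
    """Create an in-memory database containing the three tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    return conn


def insert_rating(conn, metric_id, threshold_id, ccd_visit_id, value, sigma):
    """Insert a rating and return its auto-assigned sdqa_ratingId.

    Returns None when the composite unique key rejects the row because a
    rating for the same (sdqa_metricId, ccdVisitId) pair already exists.
    """
    try:
        cur = conn.execute(
            "INSERT INTO sdqa_Rating_CcdVisit "
            "(sdqa_metricId, sdqa_thresholdId, ccdVisitId, metricValue, metricSigma) "
            "VALUES (?, ?, ?, ?, ?)",
            (metric_id, threshold_id, ccd_visit_id, value, sigma),
        )
    except sqlite3.IntegrityError:
        return None
    return cur.lastrowid


def out_of_range(conn):
    """List (sdqa_ratingId, metricName, metricValue) for ratings outside limits.

    Assumed rule: a value fails if it is greater than upperThreshold or less
    than lowerThreshold. A NULL threshold compares as unknown, so it imposes
    no limit on that side.
    """
    return conn.execute(
        """
        SELECT r.sdqa_ratingId, m.metricName, r.metricValue
        FROM sdqa_Rating_CcdVisit AS r
        JOIN sdqa_Threshold AS t ON t.sdqa_thresholdId = r.sdqa_thresholdId
        JOIN sdqa_Metric AS m ON m.sdqa_metricId = r.sdqa_metricId
        WHERE r.metricValue > t.upperThreshold
           OR r.metricValue < t.lowerThreshold
        ORDER BY r.sdqa_ratingId
        """
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    conn.execute(
        "INSERT INTO sdqa_Metric (sdqa_metricId, metricName, physicalUnits, dataType) "
        "VALUES (1, 'nDeadPix', 'pixels', '0')"
    )
    conn.execute(
        "INSERT INTO sdqa_Threshold "
        "(sdqa_thresholdId, sdqa_metricId, upperThreshold, lowerThreshold) "
        "VALUES (1, 1, 100.0, 0.0)"
    )
    first = insert_rating(conn, 1, 1, 42, 12.0, 1.0)
    dup = insert_rating(conn, 1, 1, 42, 13.0, 1.0)  # same (metric, visit): rejected
    bad = insert_rating(conn, 1, 1, 43, 250.0, 5.0)  # exceeds upperThreshold
    print(first, dup, bad)     # 1 None 2
    print(out_of_range(conn))  # [(2, 'nDeadPix', 250.0)]
```

SQLite is used here only because it ships with Python; under MySQL the same unique constraint would surface as a duplicate-key error rather than `sqlite3.IntegrityError`.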