mdmodels-core 0.2.2 → 0.2.3
- package/README.md +94 -31
- package/mdmodels-core.d.ts +113 -113
- package/mdmodels-core_bg.wasm +0 -0
- package/package.json +1 -1
package/README.md
CHANGED
|
@@ -1,29 +1,52 @@
|
|
|
1
|
-
# MD-Models
|
|
1
|
+
# MD-Models 🚀
|
|
2
2
|
|
|
3
|
-

|
|
4
|
-
](https://crates.io/crates/mdmodels)
|
|
4
|
+
[](https://www.npmjs.com/package/mdmodels-core)
|
|
5
|
+
[](https://pypi.org/project/mdmodels-core/)
|
|
6
|
+
[](https://github.com/JR-1991/sdrdm.rs/actions/workflows/test.yml)
|
|
6
7
|
|
|
7
|
-
Welcome to Markdown Models (MD-Models)
|
|
8
|
+
*Welcome to Markdown Models (MD-Models)!* 📝
|
|
8
9
|
|
|
9
|
-
|
|
10
|
+
We've created this framework to make research data management more intuitive and accessible while maintaining professional standards. Our approach uses markdown-based schema definitions to transform complex data modeling into something you'll actually enjoy working with.
|
|
10
11
|
|
|
11
|
-
|
|
12
|
+
The framework does the heavy lifting for you - automatically generating technical schemas and programming language implementations from your markdown files. This means you can focus on designing your data structures in a format that makes sense, while we handle the technical translations. ⚙️
|
|
12
13
|
|
|
13
|
-
|
|
14
|
+
## Core Philosophy 💡
|
|
14
15
|
|
|
15
|
-
|
|
16
|
-
2. Programming language implementations
|
|
17
|
-
3. Documentation for domain experts
|
|
18
|
-
4. API documentation
|
|
16
|
+
We built MD-Models to solve a common frustration in data modeling: juggling multiple versions of the same information. Here's what typically happens in traditional approaches:
|
|
19
17
|
|
|
20
|
-
|
|
18
|
+
1. Technical Schema Definitions 📊
|
|
19
|
+
- You need JSON Schema, XSD, ShEx, or SHACL
|
|
20
|
+
- Each format has its own complexity
|
|
21
|
+
- Changes need to be replicated across formats
|
|
21
22
|
|
|
22
|
-
|
|
23
|
+
2. Language-Specific Implementations 💻
|
|
24
|
+
- Different programming languages need different implementations
|
|
25
|
+
- Each requires maintenance and updates
|
|
26
|
+
- Keeping everything in sync is challenging
|
|
23
27
|
|
|
24
|
-
|
|
28
|
+
3. Documentation 📚
|
|
29
|
+
- Technical docs for developers
|
|
30
|
+
- Simplified explanations for domain experts
|
|
31
|
+
- API documentation that needs constant updates
|
|
25
32
|
|
|
26
|
-
|
|
33
|
+
Instead of dealing with all these separate pieces, MD-Models gives you one clear source of truth. Write it once, use it everywhere! ✨
|
|
34
|
+
|
|
35
|
+
Ready to see it in action? Check out our [book](https://fairchemistry.github.io/md-models/) for a deeper dive into the framework and [graph editor](https://mdmodels.vercel.app/?about) to get started.
|
|
36
|
+
|
|
37
|
+
### Schema Design 🎨
|
|
38
|
+
|
|
39
|
+
Our schema syntax makes the most of markdown's natural readability. Here's what you can do:
|
|
40
|
+
|
|
41
|
+
- Define objects with clear, descriptive headers
|
|
42
|
+
- Specify attributes with all the details you need
|
|
43
|
+
- Add rich descriptions that everyone can understand
|
|
44
|
+
- Include semantic annotations when you need them
|
|
45
|
+
- Define relationships between objects easily
|
|
46
|
+
|
|
47
|
+
We've designed this approach to work for everyone on your team - whether they're technical experts or domain specialists. You get all the precision you need for automatic code generation, while keeping things clear and approachable. 🤝
|
|
48
|
+
|
|
49
|
+
Here is an example of a markdown model definition:
|
|
27
50
|
|
|
28
51
|
```markdown
|
|
29
52
|
---
|
|
@@ -52,33 +75,71 @@ prefixes:
|
|
|
52
75
|
- Description: The street of the address
|
|
53
76
|
```
|
|
54
77
|
|
|
55
|
-
|
|
78
|
+
Let's break down the example:
|
|
79
|
+
|
|
80
|
+
We define an object `Person` with two attributes: `name` and `age`. We also define an object `Address` with one attribute: `street`. An object can be defined as a list of attributes, which can be either primitive types, other objects, or lists of other objects.
|
|
81
|
+
|
|
82
|
+
Objects are defined with a `###` header followed by a list of attributes. Each attribute is a top-level list item introduced by `-`. Its metadata, such as the type (`Type`), the description (`Description`), and the semantic term, is given as nested `- Key: value` items beneath it.
|
|
83
|
+
|
|
84
|
+
Attributes can hold any key-value pair as metadata. For instance, the `age` attribute has the following metadata:
|
|
85
|
+
|
|
86
|
+
```markdown
|
|
87
|
+
- age
|
|
88
|
+
- Type: integer
|
|
89
|
+
- Description: The age of the person
|
|
90
|
+
```
|
|
91
|
+
|
|
92
|
+
The `age` attribute is of type `integer` and has the description `The age of the person`. You could also add more metadata to the attribute, such as `minValue` and `maxValue` for JSON Schema. If your application needs further metadata, you can add it as well; there are no restrictions on the metadata.
|
|
93
|
+
|
|
94
|
+
> [!NOTE]
|
|
95
|
+
> All JSON-Schema validation keywords are supported, except for `readOnly` and `writeOnly`.
|
|
96
|
+
|
|
97
|
+
### Large Language Model Integration 🤖
|
|
98
|
+
|
|
99
|
+
Our framework also supports large language model guided extraction of information from natural language text into a structured format. Typically, you would use a JSON Schema as an intermediate format or rely on specialized libraries such as [Instructor](https://github.com/jxnl/instructor) or [LangChain](https://github.com/langchain-ai/langchain).
|
|
56
100
|
|
|
57
|
-
|
|
101
|
+
We have wrapped all of this functionality into a single command:
|
|
58
102
|
|
|
59
103
|
```bash
|
|
60
|
-
|
|
61
|
-
|
|
62
|
-
cargo install --path .
|
|
104
|
+
export OPENAI_API_KEY="sk-..."
|
|
105
|
+
md-models extract -i text.txt -m mymodel.md -o structured.json
|
|
63
106
|
```
|
|
64
107
|
|
|
65
|
-
|
|
108
|
+
This reads the input text file and extracts the information into the structured format defined in the markdown model, writing the output to `structured.json`. You can also pass an existing JSON dataset and let the LLM update it with the new information. Because updates are applied as JSON Patch operations, the original dataset is kept intact and only the new information is added.
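To illustrate why a patch-based update leaves existing data untouched, here is a minimal TypeScript sketch that applies JSON Patch (RFC 6902) `add` operations to a copy of a dataset. The `AddOp` type, the `applyAddOps` helper, and the sample data are hypothetical names chosen for illustration; they are not taken from md-models, which may implement patching differently.

```typescript
// Minimal sketch: apply RFC 6902 "add" operations to a deep copy of a dataset.
// `AddOp` and `applyAddOps` are illustrative names, not part of md-models.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

interface AddOp {
  op: "add";
  path: string; // JSON Pointer, e.g. "/address/street"
  value: Json;
}

// Decode one JSON Pointer segment ("~1" -> "/", "~0" -> "~").
function decodeSegment(segment: string): string {
  return segment.replace(/~1/g, "/").replace(/~0/g, "~");
}

function applyAddOps(dataset: Json, ops: AddOp[]): Json {
  // Work on a copy so the caller's dataset is never mutated.
  const result: Json = JSON.parse(JSON.stringify(dataset));
  for (const { path, value } of ops) {
    const segments = path.split("/").slice(1).map(decodeSegment);
    const last = segments.pop();
    if (last === undefined) {
      throw new Error("This sketch does not support replacing the whole document");
    }
    // Walk down to the container that will receive the new value.
    let parent: Json = result;
    for (const segment of segments) {
      if (parent === null || typeof parent !== "object") {
        throw new Error("Path does not exist: " + path);
      }
      parent = Array.isArray(parent) ? parent[Number(segment)] : parent[segment];
    }
    if (Array.isArray(parent)) {
      // "-" appends; a numeric index inserts before that position.
      const index = last === "-" ? parent.length : Number(last);
      parent.splice(index, 0, value);
    } else if (parent !== null && typeof parent === "object") {
      parent[last] = value;
    } else {
      throw new Error("Cannot add at: " + path);
    }
  }
  return result;
}

// Hand-written sample data for illustration only.
const original = { name: "Ada", tags: ["chemist"] };
const updated = applyAddOps(original, [
  { op: "add", path: "/age", value: 36 },
  { op: "add", path: "/tags/-", value: "author" },
]);
```

Because the helper copies the dataset before applying any operation, `original` still holds only the data it started with, while `updated` carries the additions.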
|
|
66
109
|
|
|
67
|
-
|
|
110
|
+
## Installation 🛠️
|
|
111
|
+
|
|
112
|
+
MD-Models is primarily a command line tool. To install it, run the following commands:
|
|
68
113
|
|
|
69
114
|
```bash
|
|
70
|
-
|
|
115
|
+
git clone https://github.com/FAIRChemistry/md-models
|
|
116
|
+
cd md-models
|
|
117
|
+
cargo install --path .
|
|
71
118
|
```
|
|
72
119
|
|
|
73
|
-
|
|
120
|
+
Check out our releases, where you can find pre-compiled binaries for the command line tool!
|
|
121
|
+
|
|
122
|
+
## Command line usage 📝
|
|
123
|
+
|
|
124
|
+
The command line tool can be used to convert markdown files to various formats. For instance, the following command will convert a markdown file to Python code:
|
|
74
125
|
|
|
75
126
|
```bash
|
|
76
|
-
md-models
|
|
127
|
+
md-models convert -i model.md -o lib.py -t python-dataclass
|
|
77
128
|
```
|
|
78
129
|
|
|
130
|
+
This reads the input file `model.md` and writes the output to `lib.py` using the Python dataclass template. You can also pass a URL as input to fetch the model remotely.
|
|
131
|
+
|
|
132
|
+
Here is a list of all available subcommands:
|
|
133
|
+
|
|
134
|
+
- `convert`: Convert a markdown file to a specific format
|
|
135
|
+
- `validate`: Validate and check if a markdown file conforms to our specification
|
|
136
|
+
- `pipeline`: Pipeline for generating multiple files
|
|
137
|
+
- `extract`: Large Language Model Extraction guided by a markdown model
|
|
138
|
+
- `dataset`: Validate a dataset against a markdown model
|
|
139
|
+
|
|
79
140
|
## Available templates
|
|
80
141
|
|
|
81
|
-
The following templates are available:
|
|
142
|
+
The following templates are available for the `convert` command:
|
|
82
143
|
|
|
83
144
|
- `python-dataclass`: Python dataclass implementation with JSON-LD support
|
|
84
145
|
- `python-pydantic`: Pydantic implementation with JSON-LD support
|
|
@@ -99,9 +160,11 @@ The following templates are available:
|
|
|
99
160
|
- `mkdocs`: MkDocs documentation format
|
|
100
161
|
- `linkml`: LinkML schema definition
|
|
101
162
|
|
|
102
|
-
## Installation options
|
|
163
|
+
## Installation options 📦
|
|
164
|
+
|
|
165
|
+
We've made our core Rust library incredibly versatile by compiling it to both Python and WebAssembly! This means you can use our model conversion tools not just from the command line, but directly in your Python applications or web browsers.
|
|
103
166
|
|
|
104
|
-
|
|
167
|
+
We provide several packages to make integration seamless:
|
|
105
168
|
|
|
106
169
|
- **[Core Python Package](https://pypi.org/project/mdmodels-core/)**: Install via pip:
|
|
107
170
|
```bash
|
|
@@ -121,7 +184,7 @@ The main Rust crate is compiled to Python and WebAssembly, allowing the usage be
|
|
|
121
184
|
npm install mdmodels-core
|
|
122
185
|
```
|
|
123
186
|
|
|
124
|
-
## Development
|
|
187
|
+
## Development 🔧
|
|
125
188
|
|
|
126
189
|
This project uses GitHub Actions for continuous integration. The tests can be run using the following command:
|
|
127
190
|
|
|
@@ -139,4 +202,4 @@ pip install pre-commit
|
|
|
139
202
|
pre-commit install
|
|
140
203
|
```
|
|
141
204
|
|
|
142
|
-
Once the pre-commit hooks are installed, they will run on every commit. This will ensure that the code is formatted and linted correctly. And the clippy CI will not complain about warnings.
|
|
205
|
+
Once the pre-commit hooks are installed, they will run on every commit. This ensures that the code is formatted and linted correctly, so the clippy CI will not complain about warnings.
|
package/mdmodels-core.d.ts
CHANGED
|
@@ -80,102 +80,6 @@ export enum Templates {
|
|
|
80
80
|
Linkml = 18,
|
|
81
81
|
Julia = 19,
|
|
82
82
|
}
|
|
83
|
-
/**
|
|
84
|
-
* A raw key-value representation of an attribute option.
|
|
85
|
-
*
|
|
86
|
-
* This struct provides a simple string-based representation of options,
|
|
87
|
-
* which is useful for serialization/deserialization and when working
|
|
88
|
-
* with untyped data.
|
|
89
|
-
*/
|
|
90
|
-
export interface RawOption {
|
|
91
|
-
/**
|
|
92
|
-
* The key/name of the option
|
|
93
|
-
*/
|
|
94
|
-
key: string;
|
|
95
|
-
/**
|
|
96
|
-
* The string value of the option
|
|
97
|
-
*/
|
|
98
|
-
value: string;
|
|
99
|
-
}
|
|
100
|
-
|
|
101
|
-
/**
|
|
102
|
-
* Represents an option for an attribute in a data model.
|
|
103
|
-
*
|
|
104
|
-
* This enum provides a strongly-typed representation of various attribute options
|
|
105
|
-
* that can be used to configure and constrain attributes in a data model.
|
|
106
|
-
*
|
|
107
|
-
* The options are grouped into several categories:
|
|
108
|
-
* - JSON Schema validation options (e.g., minimum/maximum values, length constraints)
|
|
109
|
-
* - SQL database options (e.g., primary key)
|
|
110
|
-
* - LinkML specific options (e.g., readonly, recommended)
|
|
111
|
-
* - Custom options via the `Other` variant
|
|
112
|
-
*
|
|
113
|
-
*/
|
|
114
|
-
export type AttrOption = { Example: string } | { MinimumValue: number } | { MaximumValue: number } | { MinItems: number } | { MaxItems: number } | { MinLength: number } | { MaxLength: number } | { Pattern: string } | { Unique: boolean } | { MultipleOf: number } | { ExclusiveMinimum: number } | { ExclusiveMaximum: number } | { PrimaryKey: boolean } | { ReadOnly: boolean } | { Recommended: boolean } | { Other: { key: string; value: string } };
|
|
115
|
-
|
|
116
|
-
/**
|
|
117
|
-
* Represents an enumeration with a name and mappings.
|
|
118
|
-
*/
|
|
119
|
-
export interface Enumeration {
|
|
120
|
-
/**
|
|
121
|
-
* Name of the enumeration.
|
|
122
|
-
*/
|
|
123
|
-
name: string;
|
|
124
|
-
/**
|
|
125
|
-
* Mappings associated with the enumeration.
|
|
126
|
-
*/
|
|
127
|
-
mappings: Map<string, string>;
|
|
128
|
-
/**
|
|
129
|
-
* Documentation string for the enumeration.
|
|
130
|
-
*/
|
|
131
|
-
docstring: string;
|
|
132
|
-
/**
|
|
133
|
-
* The line number of the enumeration
|
|
134
|
-
*/
|
|
135
|
-
position: Position | undefined;
|
|
136
|
-
}
|
|
137
|
-
|
|
138
|
-
/**
|
|
139
|
-
* Represents an object with a name, attributes, docstring, and an optional term.
|
|
140
|
-
*/
|
|
141
|
-
export interface Object {
|
|
142
|
-
/**
|
|
143
|
-
* Name of the object.
|
|
144
|
-
*/
|
|
145
|
-
name: string;
|
|
146
|
-
/**
|
|
147
|
-
* List of attributes associated with the object.
|
|
148
|
-
*/
|
|
149
|
-
attributes: Attribute[];
|
|
150
|
-
/**
|
|
151
|
-
* Documentation string for the object.
|
|
152
|
-
*/
|
|
153
|
-
docstring: string;
|
|
154
|
-
/**
|
|
155
|
-
* Optional term associated with the object.
|
|
156
|
-
*/
|
|
157
|
-
term?: string;
|
|
158
|
-
/**
|
|
159
|
-
* Parent object of the object.
|
|
160
|
-
*/
|
|
161
|
-
parent?: string;
|
|
162
|
-
/**
|
|
163
|
-
* The line number of the object
|
|
164
|
-
*/
|
|
165
|
-
position?: Position;
|
|
166
|
-
}
|
|
167
|
-
|
|
168
|
-
export interface PositionRange {
|
|
169
|
-
start: number;
|
|
170
|
-
end: number;
|
|
171
|
-
}
|
|
172
|
-
|
|
173
|
-
export interface Position {
|
|
174
|
-
line: number;
|
|
175
|
-
column: PositionRange;
|
|
176
|
-
offset: PositionRange;
|
|
177
|
-
}
|
|
178
|
-
|
|
179
83
|
export interface DataModel {
|
|
180
84
|
name?: string;
|
|
181
85
|
objects: Object[];
|
|
@@ -243,11 +147,89 @@ export interface Attribute {
|
|
|
243
147
|
import_prefix?: string;
|
|
244
148
|
}
|
|
245
149
|
|
|
150
|
+
/**
|
|
151
|
+
* Validator for checking the integrity of a data model.
|
|
152
|
+
*/
|
|
153
|
+
export interface Validator {
|
|
154
|
+
is_valid: boolean;
|
|
155
|
+
errors: ValidationError[];
|
|
156
|
+
}
|
|
157
|
+
|
|
158
|
+
/**
|
|
159
|
+
* Enum representing the type of validation error.
|
|
160
|
+
*/
|
|
161
|
+
export type ErrorType = "NameError" | "TypeError" | "DuplicateError" | "GlobalError" | "XMLError" | "ObjectError";
|
|
162
|
+
|
|
163
|
+
/**
|
|
164
|
+
* Represents a validation error in the data model.
|
|
165
|
+
*/
|
|
166
|
+
export interface ValidationError {
|
|
167
|
+
message: string;
|
|
168
|
+
object: string | undefined;
|
|
169
|
+
attribute: string | undefined;
|
|
170
|
+
location: string;
|
|
171
|
+
solution: string | undefined;
|
|
172
|
+
error_type: ErrorType;
|
|
173
|
+
positions: Position[];
|
|
174
|
+
}
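As a rough illustration of how the `Validator` result declared above might be consumed, the following TypeScript sketch formats each `ValidationError` into a readable line. The types are mirrored from this declaration file so the snippet compiles on its own (the position type is named `SourcePosition` here to avoid clashing with any global `Position` type). The `formatErrors` helper and the sample data are hypothetical, and how a `Validator` value is actually obtained from the package is not shown in this diff.

```typescript
// Types mirrored from mdmodels-core.d.ts (0.2.3) so the sketch is self-contained.
interface PositionRange { start: number; end: number; }
// Called `Position` in the declaration file; renamed here for illustration.
interface SourcePosition { line: number; column: PositionRange; offset: PositionRange; }
type ErrorType = "NameError" | "TypeError" | "DuplicateError" | "GlobalError" | "XMLError" | "ObjectError";
interface ValidationError {
  message: string;
  object: string | undefined;
  attribute: string | undefined;
  location: string;
  solution: string | undefined;
  error_type: ErrorType;
  positions: SourcePosition[];
}
interface Validator { is_valid: boolean; errors: ValidationError[]; }

// Hypothetical helper: turn a validation result into printable lines.
function formatErrors(result: Validator): string[] {
  if (result.is_valid) return [];
  return result.errors.map((err) => {
    // Build "Object.attribute" when both are known, else whichever exists.
    const target = [err.object, err.attribute]
      .filter((part) => part !== undefined)
      .join(".");
    const lines = err.positions.map((pos) => pos.line).join(", ");
    const where = lines ? " (line " + lines + ")" : "";
    const hint = err.solution ? " Hint: " + err.solution : "";
    return "[" + err.error_type + "] " + (target || "<model>") + where + ": " + err.message + hint;
  });
}

// Hand-written sample; real results would come from the library's validation step.
const sample: Validator = {
  is_valid: false,
  errors: [{
    message: "Type 'Adress' is not defined.",
    object: "Person",
    attribute: "address",
    location: "model.md",
    solution: undefined,
    error_type: "TypeError",
    positions: [{ line: 12, column: { start: 1, end: 8 }, offset: { start: 100, end: 107 } }],
  }],
};
const report = formatErrors(sample);
```

Keeping the formatting in one small function means callers can print the lines, attach them to editor diagnostics, or fail a build when `report` is non-empty.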
|
|
175
|
+
|
|
246
176
|
/**
|
|
247
177
|
* Represents an XML type, either an attribute or an element.
|
|
248
178
|
*/
|
|
249
179
|
export type XMLType = { Attribute: { is_attr: boolean; name: string } } | { Element: { is_attr: boolean; name: string } } | { Wrapped: { is_attr: boolean; name: string; wrapped: string[] | undefined } };
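The `XMLType` union above is keyed by variant name, so consumers need to narrow it before reading the payload. The sketch below shows one way to do that with TypeScript's `in` operator. The type is copied from the declaration, while `xmlInfo` is a hypothetical helper and makes no claim about how the library itself interprets the `wrapped` field.

```typescript
// Type copied from mdmodels-core.d.ts so the sketch is self-contained.
type XMLType =
  | { Attribute: { is_attr: boolean; name: string } }
  | { Element: { is_attr: boolean; name: string } }
  | { Wrapped: { is_attr: boolean; name: string; wrapped: string[] | undefined } };

// Hypothetical helper: report which variant is present and the name it carries.
function xmlInfo(xml: XMLType): { kind: string; name: string } {
  if ("Attribute" in xml) return { kind: "Attribute", name: xml.Attribute.name };
  if ("Element" in xml) return { kind: "Element", name: xml.Element.name };
  // Only the Wrapped variant remains once the other two are ruled out.
  return { kind: "Wrapped", name: xml.Wrapped.name };
}

// Hand-written sample values for illustration only.
const attributeInfo = xmlInfo({ Attribute: { is_attr: true, name: "id" } });
const wrappedInfo = xmlInfo({ Wrapped: { is_attr: false, name: "item", wrapped: ["items"] } });
```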
|
|
250
180
|
|
|
181
|
+
/**
|
|
182
|
+
* Represents an enumeration with a name and mappings.
|
|
183
|
+
*/
|
|
184
|
+
export interface Enumeration {
|
|
185
|
+
/**
|
|
186
|
+
* Name of the enumeration.
|
|
187
|
+
*/
|
|
188
|
+
name: string;
|
|
189
|
+
/**
|
|
190
|
+
* Mappings associated with the enumeration.
|
|
191
|
+
*/
|
|
192
|
+
mappings: Map<string, string>;
|
|
193
|
+
/**
|
|
194
|
+
* Documentation string for the enumeration.
|
|
195
|
+
*/
|
|
196
|
+
docstring: string;
|
|
197
|
+
/**
|
|
198
|
+
* The line number of the enumeration
|
|
199
|
+
*/
|
|
200
|
+
position: Position | undefined;
|
|
201
|
+
}
|
|
202
|
+
|
|
203
|
+
/**
|
|
204
|
+
* Represents an object with a name, attributes, docstring, and an optional term.
|
|
205
|
+
*/
|
|
206
|
+
export interface Object {
|
|
207
|
+
/**
|
|
208
|
+
* Name of the object.
|
|
209
|
+
*/
|
|
210
|
+
name: string;
|
|
211
|
+
/**
|
|
212
|
+
* List of attributes associated with the object.
|
|
213
|
+
*/
|
|
214
|
+
attributes: Attribute[];
|
|
215
|
+
/**
|
|
216
|
+
* Documentation string for the object.
|
|
217
|
+
*/
|
|
218
|
+
docstring: string;
|
|
219
|
+
/**
|
|
220
|
+
* Optional term associated with the object.
|
|
221
|
+
*/
|
|
222
|
+
term?: string;
|
|
223
|
+
/**
|
|
224
|
+
* Parent object of the object.
|
|
225
|
+
*/
|
|
226
|
+
parent?: string;
|
|
227
|
+
/**
|
|
228
|
+
* The line number of the object
|
|
229
|
+
*/
|
|
230
|
+
position?: Position;
|
|
231
|
+
}
|
|
232
|
+
|
|
251
233
|
/**
|
|
252
234
|
* Represents different types of model imports.
|
|
253
235
|
*
|
|
@@ -294,28 +276,46 @@ export interface FrontMatter {
|
|
|
294
276
|
}
|
|
295
277
|
|
|
296
278
|
/**
|
|
297
|
-
*
|
|
279
|
+
* A raw key-value representation of an attribute option.
|
|
280
|
+
*
|
|
281
|
+
* This struct provides a simple string-based representation of options,
|
|
282
|
+
* which is useful for serialization/deserialization and when working
|
|
283
|
+
* with untyped data.
|
|
298
284
|
*/
|
|
299
|
-
export interface
|
|
300
|
-
|
|
301
|
-
|
|
285
|
+
export interface RawOption {
|
|
286
|
+
/**
|
|
287
|
+
* The key/name of the option
|
|
288
|
+
*/
|
|
289
|
+
key: string;
|
|
290
|
+
/**
|
|
291
|
+
* The string value of the option
|
|
292
|
+
*/
|
|
293
|
+
value: string;
|
|
302
294
|
}
|
|
303
295
|
|
|
304
296
|
/**
|
|
305
|
-
*
|
|
297
|
+
* Represents an option for an attribute in a data model.
|
|
298
|
+
*
|
|
299
|
+
* This enum provides a strongly-typed representation of various attribute options
|
|
300
|
+
* that can be used to configure and constrain attributes in a data model.
|
|
301
|
+
*
|
|
302
|
+
* The options are grouped into several categories:
|
|
303
|
+
* - JSON Schema validation options (e.g., minimum/maximum values, length constraints)
|
|
304
|
+
* - SQL database options (e.g., primary key)
|
|
305
|
+
* - LinkML specific options (e.g., readonly, recommended)
|
|
306
|
+
* - Custom options via the `Other` variant
|
|
307
|
+
*
|
|
306
308
|
*/
|
|
307
|
-
export type
|
|
309
|
+
export type AttrOption = { Example: string } | { MinimumValue: number } | { MaximumValue: number } | { MinItems: number } | { MaxItems: number } | { MinLength: number } | { MaxLength: number } | { Pattern: string } | { Unique: boolean } | { MultipleOf: number } | { ExclusiveMinimum: number } | { ExclusiveMaximum: number } | { PrimaryKey: boolean } | { ReadOnly: boolean } | { Recommended: boolean } | { Other: { key: string; value: string } };
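Since `AttrOption` is a union of single-key objects, each variant can be told apart by checking which key is present. The following TypeScript sketch narrows a few variants and collects them into a plain keyword object. The type is copied from the declaration above; `toSchemaKeywords` and the choice of JSON Schema keyword names (`minimum`, `maximum`, and so on) are assumptions made for illustration, not the library's own conversion logic.

```typescript
// Type copied from mdmodels-core.d.ts (0.2.3) so the sketch is self-contained.
type AttrOption =
  | { Example: string } | { MinimumValue: number } | { MaximumValue: number }
  | { MinItems: number } | { MaxItems: number } | { MinLength: number }
  | { MaxLength: number } | { Pattern: string } | { Unique: boolean }
  | { MultipleOf: number } | { ExclusiveMinimum: number } | { ExclusiveMaximum: number }
  | { PrimaryKey: boolean } | { ReadOnly: boolean } | { Recommended: boolean }
  | { Other: { key: string; value: string } };

type Keywords = { [keyword: string]: string | number | boolean };

// Hypothetical mapping from a few option variants to JSON Schema keywords.
// The keyword names are an assumption, not taken from the library.
function toSchemaKeywords(options: AttrOption[]): Keywords {
  const schema: Keywords = {};
  for (const option of options) {
    if ("MinimumValue" in option) schema.minimum = option.MinimumValue;
    else if ("MaximumValue" in option) schema.maximum = option.MaximumValue;
    else if ("MinLength" in option) schema.minLength = option.MinLength;
    else if ("MaxLength" in option) schema.maxLength = option.MaxLength;
    else if ("Pattern" in option) schema.pattern = option.Pattern;
    else if ("Other" in option) schema[option.Other.key] = option.Other.value;
    // The remaining variants are ignored in this sketch.
  }
  return schema;
}

// Hand-written sample options for illustration only.
const schema = toSchemaKeywords([
  { MinimumValue: 0 },
  { MaximumValue: 120 },
  { Other: { key: "unit", value: "years" } },
]);
```

The `Other` variant is what lets arbitrary key-value metadata pass through untouched, which matches the README's statement that there are no restrictions on attribute metadata.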
|
|
308
310
|
|
|
309
|
-
|
|
310
|
-
|
|
311
|
-
|
|
312
|
-
|
|
313
|
-
|
|
314
|
-
|
|
315
|
-
|
|
316
|
-
|
|
317
|
-
|
|
318
|
-
error_type: ErrorType;
|
|
319
|
-
positions: Position[];
|
|
311
|
+
export interface PositionRange {
|
|
312
|
+
start: number;
|
|
313
|
+
end: number;
|
|
314
|
+
}
|
|
315
|
+
|
|
316
|
+
export interface Position {
|
|
317
|
+
line: number;
|
|
318
|
+
column: PositionRange;
|
|
319
|
+
offset: PositionRange;
|
|
320
320
|
}
|
|
321
321
|
|
package/mdmodels-core_bg.wasm
CHANGED
|
Binary file
|
package/package.json
CHANGED