mdmodels-core 0.2.2 → 0.2.4

package/README.md CHANGED
@@ -1,29 +1,52 @@
- # MD-Models
+ # MD-Models 🚀
 
- ![Crates.io Version](https://img.shields.io/crates/v/mdmodels) ![NPM Version](https://img.shields.io/npm/v/mdmodels-core)
- ![PyPI - Version](https://img.shields.io/pypi/v/mdmodels-core)
- ![Build Status](https://github.com/JR-1991/sdrdm.rs/actions/workflows/test.yml/badge.svg)
+ [![Crates.io Version](https://img.shields.io/crates/v/mdmodels)](https://crates.io/crates/mdmodels)
+ [![NPM Version](https://img.shields.io/npm/v/mdmodels-core)](https://www.npmjs.com/package/mdmodels-core)
+ [![PyPI - Version](https://img.shields.io/pypi/v/mdmodels-core)](https://pypi.org/project/mdmodels-core/)
+ [![Build Status](https://github.com/JR-1991/sdrdm.rs/actions/workflows/test.yml/badge.svg)](https://github.com/JR-1991/sdrdm.rs/actions/workflows/test.yml)
 
- Welcome to Markdown Models (MD-Models), a powerful framework for research data management that prioritizes narrative and readability for data models.
+ *Welcome to Markdown Models (MD-Models)!* 📝
 
- With an adaptable markdown-based schema language, MD-Models automatically generates schemas and programming language representations. This markdown schema forms the foundation for object-oriented models, enabling seamless cross-format compatibility and simplifying modifications to data structures.
+ We've created this framework to make research data management more intuitive and accessible while maintaining professional standards. Our approach uses markdown-based schema definitions to transform complex data modeling into something you'll actually enjoy working with.
 
- ## Core Philosophy
+ The framework does the heavy lifting for you - automatically generating technical schemas and programming language implementations from your markdown files. This means you can focus on designing your data structures in a format that makes sense, while we handle the technical translations. ⚙️
 
- The primary motivation behind MD-Models is to reduce cognitive overhead and maintenance burden by unifying documentation and structural definition into a single source of truth. Traditional approaches often require maintaining separate artifacts:
+ ## Core Philosophy 💡
 
- 1. Technical schemas (JSON Schema, XSD, ShEx, SHACL)
- 2. Programming language implementations
- 3. Documentation for domain experts
- 4. API documentation
+ We built MD-Models to solve a common frustration in data modeling: juggling multiple versions of the same information. Here's what typically happens in traditional approaches:
 
- This separation frequently leads to documentation drift and increases the cognitive load on both developers and domain experts.
+ 1. Technical Schema Definitions 📊
+    - You need JSON Schema, XSD, ShEx, or SHACL
+    - Each format has its own complexity
+    - Changes need to be replicated across formats
 
- Check out the [documentation and graph editor](https://mdmodels.vercel.app/?about) for more information.
+ 2. Language-Specific Implementations 💻
+    - Different programming languages need different implementations
+    - Each requires maintenance and updates
+    - Keeping everything in sync is challenging
 
- ### Example
+ 3. Documentation 📚
+    - Technical docs for developers
+    - Simplified explanations for domain experts
+    - API documentation that needs constant updates
 
- The schema syntax uses Markdown to define data models in a clear and structured way. Each object is introduced with a header, followed by its attributes. Attributes are described with their type, a brief explanation, and optional metadata like terms. Nested or related objects are represented using array types or references to other objects.
+ Instead of dealing with all these separate pieces, MD-Models gives you one clear source of truth. Write it once, use it everywhere!
+
+ Ready to see it in action? Check out our [book](https://fairchemistry.github.io/md-models/) for a deeper dive into the framework and [graph editor](https://mdmodels.vercel.app/?about) to get started.
+
+ ### Schema Design 🎨
+
+ Our schema syntax makes the most of markdown's natural readability. Here's what you can do:
+
+ - Define objects with clear, descriptive headers
+ - Specify attributes with all the details you need
+ - Add rich descriptions that everyone can understand
+ - Include semantic annotations when you need them
+ - Define relationships between objects easily
+
+ We've designed this approach to work for everyone on your team - whether they're technical experts or domain specialists. You get all the precision you need for automatic code generation, while keeping things clear and approachable. 🤝
+
+ Here is an example of a markdown model definition:
 
  ```markdown
  ---
@@ -52,33 +75,71 @@ prefixes:
  - Description: The street of the address
  ```
 
- ## Installation
+ Let's break down the example:
+
+ We define an object `Person` with two attributes: `name` and `age`. We also define an object `Address` with one attribute: `street`. An object can be defined as a list of attributes, which can be either primitive types, other objects, or lists of other objects.
+
+ Objects are defined by using the `###` header and a list of attributes. Each attribute is introduced with the `-` prefix, and its options are listed beneath it as nested `- Key: value` items, where the key (for example `Type`, `Description`, or `Term`) comes before the `:` and its value after it.
+
+ Attributes can hold any key-value pair as metadata. For instance, the `age` attribute has the following metadata:
+
+ ```markdown
+ - age
+   - Type: integer
+   - Description: The age of the person
+ ```
+
+ The `age` attribute is of type `integer` and has the following description: `The age of the person`. You could also add more metadata to the attribute, such as `minValue` and `maxValue` for JSON Schema. If your application needs more metadata, you can add it to the attribute as well; there are no restrictions on the metadata.
+
+ > [!NOTE]
+ > All JSON-Schema validation keywords are supported, except for `readOnly` and `writeOnly`.
+
+ ### Large Language Model Integration 🤖
+
+ Our framework also supports large language model guided extraction of information from natural language text into a structured format. Typically, you would use a JSON schema as an intermediate format for this or use specialized libraries such as [Instructor](https://github.com/jxnl/instructor) or [LangChain](https://github.com/langchain-ai/langchain) to accomplish this.
 
- In order to install the command line tool, you can use the following command:
+ We have wrapped all of this functionality into a single command:
 
  ```bash
- git clone https://github.com/JR-1991/md-models
- cd md-models
- cargo install --path .
+ export OPENAI_API_KEY="sk-..."
+ md-models extract -i text.txt -m mymodel.md -o structured.json
  ```
 
- ## Command line usage
+ This will read the input text file and extract the information into the structured format defined in the markdown model. The output will be written to the `structured.json` file. You can even pass an existing JSON dataset and let the LLM update the dataset with the new information. By utilizing JSON patch, we can ensure that the original dataset is kept intact and only the new information is added.
 
- The command line tool can be used to convert markdown files to various formats. The following command will convert a markdown file to Python code:
+ ## Installation 🛠️
+
+ MD-Models is primarily a command line tool. In order to install the command line tool, you can use the following command:
 
  ```bash
- md-models convert -i model.md -o lib.py -l python-dataclass
+ git clone https://github.com/FAIRChemistry/md-models
+ cd md-models
+ cargo install --path .
  ```
 
- This will read the input file `model.md` and write the output to `lib.py` using the Python dataclass template. Alternatively, you can also pass a URL as input to fetch the model remotely. For an overview of all available templates, you can use the following command:
+ Check out our releases, where you can find pre-compiled binaries for the command line tool!
+
+ ## Command line usage 📝
+
+ The command line tool can be used to convert markdown files to various formats. For instance, the following command will convert a markdown file to Python code:
 
  ```bash
- md-models --help
+ md-models convert -i model.md -o lib.py -t python-dataclass
  ```
 
+ This will read the input file `model.md` and write the output to `lib.py` using the Python dataclass template. Alternatively, you can also pass a URL as input to fetch the model remotely.
+
+ Here is a list of all available subcommands:
+
+ - `convert`: Convert a markdown file to a specific format
+ - `validate`: Validate and check if a markdown file conforms to our specification
+ - `pipeline`: Pipeline for generating multiple files
+ - `extract`: Large Language Model Extraction guided by a markdown model
+ - `dataset`: Validate a dataset against a markdown model
+
  ## Available templates
 
- The following templates are available:
+ The following templates are available for the `convert` command:
 
  - `python-dataclass`: Python dataclass implementation with JSON-LD support
  - `python-pydantic`: PyDantic implementation with JSON-LD support
@@ -99,9 +160,11 @@ The following templates are available:
  - `mkdocs`: MkDocs documentation format
  - `linkml`: LinkML schema definition
 
- ## Installation options
+ ## Installation options 📦
+
+ We've made our core Rust library incredibly versatile by compiling it to both Python and WebAssembly! This means you can use our model conversion tools not just from the command line, but directly in your Python applications or web browsers.
 
- The main Rust crate is compiled to Python and WebAssembly, allowing the usage beyond the command line tool. These are the main packages:
+ We provide several packages to make integration seamless:
 
  - **[Core Python Package](https://pypi.org/project/mdmodels-core/)**: Install via pip:
  ```bash
@@ -121,7 +184,7 @@ The main Rust crate is compiled to Python and WebAssembly, allowing the usage be
  npm install mdmodels-core
  ```
 
- ## Development
+ ## Development 🔧
 
  This project uses GitHub Actions for continuous integration. The tests can be run using the following command:
 
@@ -139,4 +202,4 @@ pip install pre-commit
  pre-commit install
  ```
 
- Once the pre-commit hooks are installed, they will run on every commit. This will ensure that the code is formatted and linted correctly. And the clippy CI will not complain about warnings.
+ Once the pre-commit hooks are installed, they will run on every commit. This will ensure that the code is formatted and linted correctly. And the clippy CI will not complain about warnings.
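
Review note: the new README text describes attributes as `-` list items carrying nested `Key: value` metadata. The layout can be illustrated with a minimal TypeScript sketch. This is not the MD-Models parser; `AttributeSketch` and `parseAttributes` are hypothetical names introduced only for illustration, and the sketch assumes nested options are indented beneath their attribute.

```typescript
// Hypothetical sketch, NOT the MD-Models parser: it only mirrors the
// "- name" / "  - Key: value" layout described in the README diff.
interface AttributeSketch {
  name: string;
  options: Record<string, string>;
}

function parseAttributes(markdown: string): AttributeSketch[] {
  const attributes: AttributeSketch[] = [];
  let current: AttributeSketch | undefined;

  for (const rawLine of markdown.split("\n")) {
    const match = /^(\s*)-\s+(.*)$/.exec(rawLine);
    if (!match) continue; // skip anything that is not a list item

    const indent = (match[1] ?? "").length;
    const body = (match[2] ?? "").trim();

    if (indent === 0) {
      // Top-level item: starts a new attribute.
      current = { name: body, options: {} };
      attributes.push(current);
    } else if (current) {
      // Nested item: "Key: value" metadata for the current attribute.
      const colon = body.indexOf(":");
      if (colon === -1) continue;
      const key = body.slice(0, colon).trim();
      const value = body.slice(colon + 1).trim();
      current.options[key] = value;
    }
  }
  return attributes;
}

const parsedAttributes = parseAttributes(
  ["- age", "  - Type: integer", "  - Description: The age of the person"].join("\n"),
);
console.log(JSON.stringify(parsedAttributes, null, 2));
```

Because every key is kept as a plain string, additional metadata keys pass through unchanged, which matches the README's statement that attribute metadata is unrestricted.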
@@ -59,27 +59,141 @@ export function validate(markdown_content: string): Validator;
   * Enumeration of available templates.
   */
  export enum Templates {
+   /**
+    * XML Schema
+    */
    XmlSchema = 0,
+   /**
+    * Markdown
+    */
    Markdown = 1,
+   /**
+    * Compact Markdown
+    */
    CompactMarkdown = 2,
+   /**
+    * SHACL
+    */
    Shacl = 3,
+   /**
+    * JSON Schema
+    */
    JsonSchema = 4,
+   /**
+    * JSON Schema All
+    */
    JsonSchemaAll = 5,
+   /**
+    * ShEx
+    */
    Shex = 6,
+   /**
+    * Python Dataclass
+    */
    PythonDataclass = 7,
+   /**
+    * Python Pydantic XML
+    */
    PythonPydanticXML = 8,
+   /**
+    * Python Pydantic
+    */
    PythonPydantic = 9,
+   /**
+    * MkDocs
+    */
    MkDocs = 10,
+   /**
+    * Internal
+    */
    Internal = 11,
+   /**
+    * Typescript (io-ts)
+    */
    Typescript = 12,
+   /**
+    * Typescript (Zod)
+    */
    TypescriptZod = 13,
+   /**
+    * Rust
+    */
    Rust = 14,
+   /**
+    * Protobuf
+    */
    Protobuf = 15,
+   /**
+    * Graphql
+    */
    Graphql = 16,
+   /**
+    * Golang
+    */
    Golang = 17,
+   /**
+    * Linkml
+    */
    Linkml = 18,
+   /**
+    * Julia
+    */
    Julia = 19,
+   /**
+    * Mermaid class diagram
+    */
+   Mermaid = 20,
  }
+ /**
+  * Represents different types of model imports.
+  *
+  * Can be either a remote URL or a local file path.
+  */
+ export type ImportType = { Remote: string } | { Local: string };
+
+ /**
+  * Represents the front matter data of a markdown file.
+  */
+ export interface FrontMatter {
+   /**
+    * Identifier field of the model.
+    */
+   id: string | undefined;
+   /**
+    * A boolean field with a default value, renamed from `id-field`.
+    */
+   "id-field"?: boolean;
+   /**
+    * Optional hashmap of prefixes.
+    */
+   prefixes: Map<string, string> | undefined;
+   /**
+    * Optional namespace map.
+    */
+   nsmap: Map<string, string> | undefined;
+   /**
+    * A string field with a default value representing the repository URL.
+    */
+   repo?: string;
+   /**
+    * A string field with a default value representing the prefix.
+    */
+   prefix?: string;
+   /**
+    * Import remote or local models.
+    */
+   imports?: Map<string, ImportType>;
+   /**
+    * Allow empty models.
+    */
+   "allow-empty"?: boolean;
+ }
+
+ /**
+  * Represents an XML type, either an attribute or an element.
+  */
+ export type XMLType = { Attribute: { is_attr: boolean; name: string } } | { Element: { is_attr: boolean; name: string } } | { Wrapped: { is_attr: boolean; name: string; wrapped: string[] | undefined } };
+
  /**
   * A raw key-value representation of an attribute option.
   *
@@ -165,24 +279,6 @@ export interface Object {
    position?: Position;
  }
 
- export interface PositionRange {
-   start: number;
-   end: number;
- }
-
- export interface Position {
-   line: number;
-   column: PositionRange;
-   offset: PositionRange;
- }
-
- export interface DataModel {
-   name?: string;
-   objects: Object[];
-   enums: Enumeration[];
-   config?: FrontMatter;
- }
-
  export type DataType = { Boolean: boolean } | { Integer: number } | { Float: number } | { String: string };
 
  /**
@@ -243,54 +339,15 @@ export interface Attribute {
    import_prefix?: string;
  }
 
- /**
-  * Represents an XML type, either an attribute or an element.
-  */
- export type XMLType = { Attribute: { is_attr: boolean; name: string } } | { Element: { is_attr: boolean; name: string } } | { Wrapped: { is_attr: boolean; name: string; wrapped: string[] | undefined } };
-
- /**
-  * Represents different types of model imports.
-  *
-  * Can be either a remote URL or a local file path.
-  */
- export type ImportType = { Remote: string } | { Local: string };
+ export interface PositionRange {
+   start: number;
+   end: number;
+ }
 
- /**
-  * Represents the front matter data of a markdown file.
-  */
- export interface FrontMatter {
-   /**
-    * Identifier field of the model.
-    */
-   id: string | undefined;
-   /**
-    * A boolean field with a default value, renamed from `id-field`.
-    */
-   "id-field"?: boolean;
-   /**
-    * Optional hashmap of prefixes.
-    */
-   prefixes: Map<string, string> | undefined;
-   /**
-    * Optional namespace map.
-    */
-   nsmap: Map<string, string> | undefined;
-   /**
-    * A string field with a default value representing the repository URL.
-    */
-   repo?: string;
-   /**
-    * A string field with a default value representing the prefix.
-    */
-   prefix?: string;
-   /**
-    * Import remote or local models.
-    */
-   imports?: Map<string, ImportType>;
-   /**
-    * Allow empty models.
-    */
-   "allow-empty"?: boolean;
+ export interface Position {
+   line: number;
+   column: PositionRange;
+   offset: PositionRange;
  }
 
  /**
@@ -319,3 +376,10 @@ export interface ValidationError {
    positions: Position[];
  }
 
+ export interface DataModel {
+   name?: string;
+   objects: Object[];
+   enums: Enumeration[];
+   config?: FrontMatter;
+ }
+
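
Review note: `ImportType` is declared as a union of two single-key object types, so consumers have to narrow it before reading the path or URL. A minimal sketch of that narrowing follows, with the type redeclared locally so the snippet is self-contained. `describeImport`, the map keys, and the example URL and path are hypothetical and not part of the package.

```typescript
// Local copy of the declaration shown in the diff.
type ImportType = { Remote: string } | { Local: string };

// Hypothetical helper (not part of mdmodels-core): narrow the union with
// the `in` operator and describe where the imported model comes from.
function describeImport(source: ImportType): string {
  if ("Remote" in source) {
    return `remote model at ${source.Remote}`;
  }
  return `local model at ${source.Local}`;
}

// FrontMatter.imports is declared as Map<string, ImportType>; the keys and
// values below are made-up illustration data.
const exampleImports = new Map<string, ImportType>([
  ["units", { Remote: "https://example.org/units.md" }],
  ["core", { Local: "./core.md" }],
]);

exampleImports.forEach((source, prefix) => {
  console.log(`${prefix}: ${describeImport(source)}`);
});
```

The same `in`-based narrowing would apply to `XMLType`, which follows the same single-key union pattern with `Attribute`, `Element`, and `Wrapped`.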
@@ -224,29 +224,93 @@ export function validate(markdown_content) {
 
  /**
   * Enumeration of available templates.
-  * @enum {0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19}
+  * @enum {0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20}
   */
  export const Templates = Object.freeze({
+     /**
+      * XML Schema
+      */
      XmlSchema: 0, "0": "XmlSchema",
+     /**
+      * Markdown
+      */
      Markdown: 1, "1": "Markdown",
+     /**
+      * Compact Markdown
+      */
      CompactMarkdown: 2, "2": "CompactMarkdown",
+     /**
+      * SHACL
+      */
      Shacl: 3, "3": "Shacl",
+     /**
+      * JSON Schema
+      */
      JsonSchema: 4, "4": "JsonSchema",
+     /**
+      * JSON Schema All
+      */
      JsonSchemaAll: 5, "5": "JsonSchemaAll",
+     /**
+      * ShEx
+      */
      Shex: 6, "6": "Shex",
+     /**
+      * Python Dataclass
+      */
      PythonDataclass: 7, "7": "PythonDataclass",
+     /**
+      * Python Pydantic XML
+      */
      PythonPydanticXML: 8, "8": "PythonPydanticXML",
+     /**
+      * Python Pydantic
+      */
      PythonPydantic: 9, "9": "PythonPydantic",
+     /**
+      * MkDocs
+      */
      MkDocs: 10, "10": "MkDocs",
+     /**
+      * Internal
+      */
      Internal: 11, "11": "Internal",
+     /**
+      * Typescript (io-ts)
+      */
      Typescript: 12, "12": "Typescript",
+     /**
+      * Typescript (Zod)
+      */
      TypescriptZod: 13, "13": "TypescriptZod",
+     /**
+      * Rust
+      */
      Rust: 14, "14": "Rust",
+     /**
+      * Protobuf
+      */
      Protobuf: 15, "15": "Protobuf",
+     /**
+      * Graphql
+      */
      Graphql: 16, "16": "Graphql",
+     /**
+      * Golang
+      */
      Golang: 17, "17": "Golang",
+     /**
+      * Linkml
+      */
      Linkml: 18, "18": "Linkml",
+     /**
+      * Julia
+      */
      Julia: 19, "19": "Julia",
+     /**
+      * Mermaid class diagram
+      */
+     Mermaid: 20, "20": "Mermaid",
  });
 
  export function __wbg_String_8f0eb39a4a4c2f66(arg0, arg1) {
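
Review note: the generated `Templates` object stores each entry twice, once as name to number and once as number to name, which is why the new `Mermaid` template appears as both `Mermaid: 20` and `"20": "Mermaid"`. The sketch below reproduces that pattern on a reduced copy with three entries. `ReducedTemplates`, `templateValue`, and `templateName` are hypothetical names for illustration, not exports of the package.

```typescript
// Reduced copy of the frozen object from the diff (three entries only).
const ReducedTemplates = Object.freeze({
  XmlSchema: 0, "0": "XmlSchema",
  Julia: 19, "19": "Julia",
  Mermaid: 20, "20": "Mermaid",
} as const);

type ReducedTemplateName = "XmlSchema" | "Julia" | "Mermaid";

// Forward lookup: template name to numeric value.
function templateValue(name: ReducedTemplateName): number {
  return ReducedTemplates[name];
}

// Reverse lookup: numeric value back to the template name, or undefined
// when the number does not belong to any entry.
function templateName(value: number): string | undefined {
  const table = ReducedTemplates as Readonly<Record<string, string | number>>;
  const entry = table[String(value)];
  return typeof entry === "string" ? entry : undefined;
}

console.log(templateValue("Mermaid"), templateName(20), templateName(99));
```

Because the object is frozen, both directions stay in sync at runtime, so a value read from one side can be used safely as a key on the other.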
Binary file
package/package.json CHANGED
@@ -5,7 +5,7 @@
      "Jan Range <jan.range@simtech.uni-stuttgart.de>"
    ],
    "description": "A tool to generate models, code and schemas from markdown files",
-   "version": "0.2.2",
+   "version": "0.2.4",
    "license": "MIT",
    "repository": {
      "type": "git",