RubyGems - structify - Versions diffs - 0.1.0 → 0.2.0 - Mend

structify 0.1.0 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11)

checksums.yaml +4 -4
data/CHANGELOG.md +14 -0
data/CLAUDE.md +27 -0
data/Gemfile.lock +7 -1
data/README.md +279 -144
data/lib/structify/model.rb +290 -58
data/lib/structify/schema_serializer.rb +165 -0
data/lib/structify/version.rb +1 -1
data/lib/structify.rb +67 -4
data/structify.gemspec +1 -1
metadata +5 -2

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: eb04b492770cd56dc4378bb73de9ff1e0ee525e89f30c942f87f20314990bbc9
-  data.tar.gz: 5061d34d693c475070680501b3bf78f0634437cefc6da8b914fcda8a56ef3470
+  metadata.gz: 56ca4a2c78aa18b382aa54c4ba88ef246bd2014f895db59fb877ecfd5cb12edf
+  data.tar.gz: 7cb8abbf4ebc23b68a7c19492d4a5fa648d89bfea4dd49a632418a3a962be1cc
 SHA512:
-  metadata.gz: 0a1ee0ace2f35d460d244902a2de7699ce346ce2629c14e4151fb3d58e00f2f81444739faddd267f5eef2f53a9ecaf4334a1334e975b549fbf66993b7577c56f
-  data.tar.gz: 11cf2d7c6bbd538c8bc7fd944d289c3c303faa4e4fa08736ba0099ab276a4131ee973c2fc86a353e75c5ec13f25bb6a4409e4ee3c0a54b28be15167dbe3e3cdd
+  metadata.gz: 16e43c43971e51405759fd3cdf1cd12759ffef115313c144938d5dea704583e04fff57dfd388682127321b71c527131ed3d6edade964db6bb55e9b1c43900744
+  data.tar.gz: 9de0c840ee85e0ac8722b518125461672006624d70d2d7317e5df9db4832a6bd454e030c2790fab8d6e408d4e694039c4a50f471bf4a45e3a523929d59c7473e
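
The digests above can be checked against a downloaded package. The following is a minimal sketch, assuming you have already unpacked the `.gem` archive and have `data.tar.gz` or `metadata.gz` on disk; the helper name `checksum_matches?` is hypothetical and not part of RubyGems or structify.

```ruby
require "digest"

# Hypothetical helper: compares the digest of raw bytes with an expected
# hex string such as one of the values listed in checksums.yaml.
def checksum_matches?(bytes, expected_hex, algorithm: Digest::SHA256)
  algorithm.hexdigest(bytes) == expected_hex.downcase
end

# Example call (the file path is an assumption about where you unpacked the gem):
#   checksum_matches?(File.binread("data.tar.gz"),
#                     "7cb8abbf4ebc23b68a7c19492d4a5fa648d89bfea4dd49a632418a3a962be1cc")
# Pass `algorithm: Digest::SHA512` to check the SHA512 entries instead.
```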

data/CHANGELOG.md ADDED

@@ -0,0 +1,14 @@
+# Changelog
+All notable changes to this project will be documented in this file.
+## [0.2.0] - 2025-03-12
+### Added
+- New `thinking` mode option to automatically add chain of thought reasoning to LLM schemas
+- When enabled, adds a `chain_of_thought` field as the first property in the generated schema
+## [0.1.0] - Initial Release
+- Initial release of Structify

data/CLAUDE.md ADDED

@@ -0,0 +1,27 @@
+# CLAUDE.md - Guidelines for Structify
+## Commands
+- Build: `bundle exec rake build`
+- Install: `bundle exec rake install`
+- Test all: `bundle exec rake spec`
+- Test single file: `bundle exec rspec spec/path/to/file_spec.rb`
+- Test specific example: `bundle exec rspec spec/path/to/file_spec.rb:LINE_NUMBER`
+- Lint: `bundle exec rubocop`
+## Code Style
+- Use `# frozen_string_literal: true` at the top of all Ruby files
+- Follow Ruby naming conventions (snake_case for methods/variables, CamelCase for classes)
+- Include YARD documentation for classes and methods
+- Group similar methods together
+- Include descriptive RSpec tests for all functionality
+- Keep methods short and focused on a single responsibility
+- Use specific error classes for error handling
+- Prefer explicit requires over auto-loading
+- Follow ActiveSupport::Concern patterns for modules
+- Keep DSL simple and intuitive for end users
+## Structure
+- Put core functionality in lib/structify/
+- Keep implementation details private when possible
+- Follow semantic versioning guidelines
+- Ensure proper test coverage for all public APIs

data/Gemfile.lock CHANGED

@@ -2,7 +2,7 @@ PATH
   remote: .
   specs:
     structify (0.1.0)
-      activesupport (>= 7.1)
+      activesupport (~> 7.1)
       attr_json (~> 2.1)
 GEM
@@ -43,6 +43,8 @@ GEM
       mutex_m
       securerandom (>= 0.3)
       tzinfo (~> 2.0)
+    addressable (2.8.7)
+      public_suffix (>= 2.0.2, < 7.0)
     ast (2.4.2)
     attr_json (2.5.0)
       activerecord (>= 6.0.0, < 8.1)
@@ -68,6 +70,8 @@ GEM
       rdoc (>= 4.0.0)
       reline (>= 0.4.2)
     json (2.9.1)
+    json-schema (4.3.1)
+      addressable (>= 2.8)
     language_server-protocol (3.17.0.4)
     logger (1.6.5)
     loofah (2.24.0)
@@ -87,6 +91,7 @@ GEM
     psych (5.2.3)
       date
       stringio
+    public_suffix (6.0.1)
     racc (1.8.1)
     rack (3.1.9)
     rack-session (2.1.0)
@@ -182,6 +187,7 @@ PLATFORMS
 DEPENDENCIES
   activerecord (~> 7.1.0)
   debug (>= 1.0.0)
+  json-schema (~> 4.1)
   rake (~> 13.0)
   rspec (~> 3.12)
   rspec-rails (~> 6.1)
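
The first hunk above changes the `activesupport` constraint from `>= 7.1` to `~> 7.1`. The practical difference can be checked with RubyGems' own `Gem::Requirement` class; this sketch only illustrates the two operators, and the sample version numbers are arbitrary.

```ruby
require "rubygems"

loose       = Gem::Requirement.new(">= 7.1")  # any version from 7.1 upward
pessimistic = Gem::Requirement.new("~> 7.1")  # >= 7.1 and < 8.0

%w[7.1.0 7.2.2 8.0.0].each do |v|
  version = Gem::Version.new(v)
  puts format("%-6s  >= 7.1: %-5s  ~> 7.1: %s",
              v, loose.satisfied_by?(version), pessimistic.satisfied_by?(version))
end
```

Under the new constraint, an 8.x release of `activesupport` no longer satisfies the dependency, whereas the old constraint accepted it.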

data/README.md CHANGED

@@ -2,220 +2,355 @@
 [![Gem Version](https://badge.fury.io/rb/structify.svg)](https://badge.fury.io/rb/structify)
-Structify is a Ruby gem that provides a simple DSL to define extraction schemas for LLM-powered models. It integrates seamlessly with Rails models, allowing you to specify versioning, assistant prompts, and field definitions—all in a clean, declarative syntax.
+A Ruby gem for extracting structured data from content using LLMs in Rails applications
-## Features
+## What is Structify?
-- 🎯 Simple DSL for defining LLM extraction schemas
-- 🔄 Built-in versioning for schema evolution
-- 📝 Support for custom assistant prompts
-- 🏗️ JSON Schema generation for LLM validation
-- 🔌 Seamless Rails/ActiveRecord integration
-- 💾 Automatic JSON attribute handling
+Structify helps you extract structured data from unstructured content in your Rails apps:
-## Installation
+- **Define extraction schemas** directly in your ActiveRecord models
+- **Generate JSON schemas** to use with OpenAI, Anthropic, or other LLM providers
+- **Store and validate** extracted data in your models
+- **Access structured data** through typed model attributes
-Add this line to your application's Gemfile:
+## Use Cases
+- Extract metadata, topics, and sentiment from articles or blog posts
+- Pull structured information from user-generated content
+- Organize unstructured feedback or reviews into categorized data
+- Convert emails or messages into actionable, structured formats
+- Extract entities and relationships from documents
 ```ruby
-gem 'structify'
+# 1. Define extraction schema in your model
+class Article < ApplicationRecord
+  include Structify::Model
+  schema_definition do
+    field :title, :string
+    field :summary, :text
+    field :category, :string, enum: ["tech", "business", "science"]
+    field :topics, :array, items: { type: "string" }
+  end
+end
+# 2. Get schema for your LLM API
+schema = Article.json_schema
+# 3. Store LLM response in your model
+article = Article.find(123)
+article.update(llm_response)
+# 4. Access extracted data
+article.title    # => "AI Advances in 2023"
+article.summary  # => "Recent developments in artificial intelligence..."
+article.topics   # => ["machine learning", "neural networks", "computer vision"]
 ```
-And then execute:
+## Install
+```ruby
+# Add to Gemfile
+gem 'structify'
+```
+Then:
 ```bash
-$ bundle install
+bundle install
 ```
-Or install it yourself as:
+## Database Setup
-```bash
-$ gem install structify
+Add a JSON column to store extracted data:
+```ruby
+add_column :articles, :extracted_data, :jsonb  # PostgreSQL
+# or
+add_column :articles, :extracted_data, :json   # MySQL
 ```
 ## Usage
-### Basic Example
-Here's a simple example of using Structify in a Rails model:
+### Define Your Schema
 ```ruby
 class Article < ApplicationRecord
   include Structify::Model
   schema_definition do
-    title "Article Extraction"
-    description "Extract key information from articles"
     version 1
-    assistant_prompt "Extract the following fields from the article content"
-    llm_model "gpt-4"
+    title "Article Extraction"
     field :title, :string, required: true
-    field :summary, :text, description: "A brief summary of the article"
+    field :summary, :text
     field :category, :string, enum: ["tech", "business", "science"]
+    field :topics, :array, items: { type: "string" }
+    field :metadata, :object, properties: {
+      "author" => { type: "string" },
+      "published_at" => { type: "string" }
+    }
   end
 end
 ```
-### Advanced Example
+### Get Schema for LLM API
-Here's a more complex example showing all available features:
+Structify generates the JSON schema that you'll need to send to your LLM provider:
 ```ruby
-class EmailSummary < ApplicationRecord
-  include Structify::Model
-  schema_definition do
-    version 2  # Increment this when making breaking changes
-    title "Email Thread Extraction"
-    description "Extracts key information from email threads"
+# Get JSON Schema to send to OpenAI, Anthropic, etc.
+schema = Article.json_schema
+```
-    assistant_prompt <<~PROMPT
-      You are an assistant that extracts concise metadata from email threads.
-      Focus on producing a clear summary, action items, and sentiment analysis.
-      If there are multiple participants, include their roles in the conversation.
-    PROMPT
+### Integration with LLM Services
-    llm_model "gpt-4"  # Supports any LLM model
+You need to implement the actual LLM integration. Here's how you can integrate with popular services:
-    # Required fields
-    field :subject, :string,
-      required: true,
-      description: "The main topic or subject of the email thread"
+#### OpenAI Integration Example
-    field :summary, :text,
-      required: true,
-      description: "A concise summary of the entire thread"
+```ruby
+require "openai"
-    # Optional fields with enums
-    field :sentiment, :string,
-      enum: ["positive", "neutral", "negative"],
-      description: "The overall sentiment of the conversation"
+class OpenAiExtractor
+  def initialize(api_key = ENV["OPENAI_API_KEY"])
+    @client = OpenAI::Client.new(access_token: api_key)
+  end
+  def extract(content, model_class)
+    # Get schema from Structify model
+    schema = model_class.json_schema
+    # Call OpenAI with structured outputs
+    response = @client.chat(
+      parameters: {
+        model: "gpt-4o",
+        response_format: { type: "json_object", schema: schema },
+        messages: [
+          { role: "system", content: "Extract structured information from the provided content." },
+          { role: "user", content: content }
+        ]
+      }
+    )
+    # Parse and return the structured data
+    JSON.parse(response.dig("choices", 0, "message", "content"), symbolize_names: true)
+  end
+end
-    field :priority, :string,
-      enum: ["high", "medium", "low"],
-      description: "The priority level based on content and tone"
+# Usage
+extractor = OpenAiExtractor.new
+article = Article.find(123)
+extracted_data = extractor.extract(article.content, Article)
+article.update(extracted_data)
+```
-    # Complex fields
-    field :participants, :json,
-      description: "List of participants and their roles"
+#### Anthropic Integration Example
-    field :action_items, :json,
-      description: "Array of action items extracted from the thread"
+```ruby
+require "anthropic"
-    field :next_steps, :string,
-      description: "Recommended next steps based on the thread"
+class AnthropicExtractor
+  def initialize(api_key = ENV["ANTHROPIC_API_KEY"])
+    @client = Anthropic::Client.new(api_key: api_key)
+  end
+  def extract(content, model_class)
+    # Get schema from Structify model
+    schema = model_class.json_schema
+    # Call Claude with tool use
+    response = @client.messages.create(
+      model: "claude-3-opus-20240229",
+      max_tokens: 1000,
+      system: "Extract structured data based on the provided schema.",
+      messages: [{ role: "user", content: content }],
+      tools: [{
+        type: "function",
+        function: {
+          name: "extract_data",
+          description: "Extract structured data from content",
+          parameters: schema
+        }
+      }],
+      tool_choice: { type: "function", function: { name: "extract_data" } }
+    )
+    # Parse and return structured data
+    JSON.parse(response.content[0].tools[0].function.arguments, symbolize_names: true)
   end
-  # You can still use regular ActiveRecord features
-  validates :subject, presence: true
-  validates :summary, length: { minimum: 10 }
 end
 ```
-### Accessing Schema Information
-Structify provides several helper methods to access schema information:
+### Store & Access Extracted Data
 ```ruby
-# Get the JSON Schema
-EmailSummary.json_schema
-# => {
-#   name: "Email Thread Extraction",
-#   description: "Extracts key information from email threads",
-#   parameters: {
-#     type: "object",
-#     required: ["subject", "summary"],
-#     properties: {
-#       subject: { type: "string" },
-#       summary: { type: "text" },
-#       sentiment: {
-#         type: "string",
-#         enum: ["positive", "neutral", "negative"]
-#       },
-#       # ...
-#     }
-#   }
-# }
-# Get the current version
-EmailSummary.extraction_version  # => 2
-# Get the assistant prompt
-EmailSummary.extraction_assistant_prompt
-# => "You are an assistant that extracts concise metadata..."
-# Get the LLM model
-EmailSummary.extraction_llm_model  # => "gpt-4"
+# Store LLM response in your model
+article.update(response)
+# Access via model attributes
+article.title        # => "How AI is Changing Healthcare"
+article.category     # => "tech"
+article.topics       # => ["machine learning", "healthcare"]
+# All data is in the JSON column
+article.extracted_data  # => The complete JSON
 ```
-### Working with Extracted Data
+## Field Types
-Structify uses the `attr_json` gem to handle JSON attributes. All fields are stored in the `extracted_data` JSON column:
+Structify supports all standard JSON Schema types:
 ```ruby
-# Create a new record with extracted data
-summary = EmailSummary.create(
-  subject: "Project Update",
-  summary: "Team discussed Q2 goals",
-  sentiment: "positive",
-  priority: "high",
-  participants: [
-    { name: "Alice", role: "presenter" },
-    { name: "Bob", role: "reviewer" }
-  ]
-)
-# Access fields directly
-summary.subject      # => "Project Update"
-summary.sentiment    # => "positive"
-summary.participants # => [{ name: "Alice", ... }]
-# Validate enum values
-summary.sentiment = "invalid"
-summary.valid?  # => false
+field :name, :string             # String values
+field :count, :integer           # Integer values
+field :price, :number            # Numeric values (float/int)
+field :active, :boolean          # Boolean values
+field :metadata, :object         # JSON objects
+field :tags, :array              # Arrays
 ```
-## Database Setup
+## Field Options
+```ruby
+# Required fields
+field :title, :string, required: true
+# Enum values
+field :status, :string, enum: ["draft", "published", "archived"]
+# Array constraints
+field :tags, :array,
+  items: { type: "string" },
+  min_items: 1,
+  max_items: 5,
+  unique_items: true
+# Nested objects
+field :author, :object, properties: {
+  "name" => { type: "string", required: true },
+  "email" => { type: "string" }
+}
+```
+## Chain of Thought Mode
-Ensure your model has a JSON column named `extracted_data`:
+Structify supports a "thinking" mode that automatically requests chain of thought reasoning from the LLM:
 ```ruby
-class CreateEmailSummaries < ActiveRecord::Migration[7.1]
-  def change
-    create_table :email_summaries do |t|
-      t.json :extracted_data  # Required by Structify
-      t.timestamps
-    end
-  end
+schema_definition do
+  version 1
+  thinking true  # Enable chain of thought reasoning
+  field :title, :string, required: true
+  # other fields...
 end
 ```
-## Development
+Chain of thought (COT) reasoning is beneficial because it:
+- Adds more context to the extraction process
+- Helps the LLM think through problems more systematically
+- Improves accuracy for complex extractions
+- Makes the reasoning process transparent and explainable
+- Reduces hallucinations by forcing step-by-step thinking
-After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
+This is especially useful when:
+- Answers need more detailed information
+- Questions require multi-step reasoning
+- Extractions involve complex decision-making
+- You need to understand how the LLM reached its conclusions
-To install this gem onto your local machine, run `bundle exec rake install`.
+For best results, include instructions for COT in your base system prompt:
-## Contributing
+```ruby
+system_prompt = "Extract structured data from the content.
+For each field, think step by step before determining the value."
+```
-1. Fork it
-2. Create your feature branch (`git checkout -b feature/my-new-feature`)
-3. Commit your changes (`git commit -am 'Add some feature'`)
-4. Push to the branch (`git push origin feature/my-new-feature`)
-5. Create a new Pull Request
+You can generate effective chain of thought prompts using tools like the [Claude Prompt Designer](https://console.anthropic.com/dashboard).
-Bug reports and pull requests are welcome on GitHub at https://github.com/kieranklaassen/structify.
+## Schema Versioning and Field Lifecycle
-## License
+Structify provides a simple field lifecycle management system using a `versions` parameter:
-The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
+```ruby
+schema_definition do
+  version 3
+  # Fields for specific version ranges
+  field :title, :string                       # Available in all versions (default behavior)
+  field :legacy, :string, versions: 1...3     # Only in versions 1-2 (removed in v3)
+  field :summary, :text, versions: 2          # Added in version 2 onwards
+  field :content, :text, versions: 2..        # Added in version 2 onwards (endless range)
+  field :temp_field, :string, versions: 2..3  # Only in versions 2-3
+  field :special, :string, versions: [1, 3, 5] # Only in versions 1, 3, and 5
+end
+```
-## Code of Conduct
+### Version Range Syntax
-Everyone interacting in the Structify project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](CODE_OF_CONDUCT.md).
+Structify supports several ways to specify which versions a field is available in:
-```
+| Syntax | Example | Meaning |
+|--------|---------|---------|
+| No version specified | `field :title, :string` | Available in all versions (default) |
+| Single integer | `versions: 2` | Available from version 2 onwards |
+| Range (inclusive) | `versions: 1..3` | Available in versions 1, 2, and 3 |
+| Range (exclusive) | `versions: 1...3` | Available in versions 1 and 2 (not 3) |
+| Endless range | `versions: 2..` | Available from version 2 onwards |
+| Array | `versions: [1, 4, 7]` | Only available in versions 1, 4, and 7 |
+### Handling Records with Different Versions
+```ruby
+# Create a record with version 1 schema
+article_v1 = Article.create(title: "Original Article")
+# Access with version 3 schema
+article_v3 = Article.find(article_v1.id)
+# Fields from v1 are still accessible
+article_v3.title  # => "Original Article"
+# Fields not in v1 raise errors
+article_v3.summary  # => VersionRangeError: Field 'summary' is not available in version 1.
+                    #    This field is only available in versions: 2 to 999.
+# Check version compatibility
+article_v3.version_compatible_with?(3)  # => false
+article_v3.version_compatible_with?(1)  # => true
+# Upgrade record to version 3
+article_v3.summary = "Added in v3"
+article_v3.save!  # Record version is automatically updated to 3
 ```
+## Understanding Structify's Role
+Structify is designed as a **bridge** between your Rails models and LLM extraction services:
+### What Structify Does For You
+- ✅ **Define extraction schemas** directly in your ActiveRecord models
+- ✅ **Generate compatible JSON schemas** for OpenAI, Anthropic, and other LLM providers
+- ✅ **Store and validate** extracted data against your schema
+- ✅ **Provide typed access** to extracted fields through your models
+- ✅ **Handle schema versioning** and backward compatibility
+- ✅ **Support chain of thought reasoning** with the thinking mode option
+### What You Need To Implement
+- 🔧 **API integration** with your chosen LLM provider (see examples above)
+- 🔧 **Processing logic** for when and how to extract data
+- 🔧 **Authentication** and API key management
+- 🔧 **Error handling and retries** for API calls
+This separation of concerns allows you to:
+1. Use any LLM provider and model you prefer
+2. Implement extraction logic specific to your application
+3. Handle API access in a way that fits your application architecture
+4. Change LLM providers without changing your data model
+## License
+[MIT License](https://opensource.org/licenses/MIT)
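
The "Version Range Syntax" table in the README hunk above can be restated as a small standalone function. This is a sketch of the documented semantics only: `field_available?` is a hypothetical name, not a method of the gem.

```ruby
# Hypothetical standalone helper mirroring the README table.
# `versions` is the value a field would pass as `versions:`; nil means
# the option was omitted.
def field_available?(record_version, versions = nil)
  case versions
  when nil     then true                               # available in all versions
  when Range   then versions.cover?(record_version)    # 1..3, 1...3, and endless 2..
  when Array   then versions.include?(record_version)  # explicit list of versions
  when Integer then record_version >= versions         # "from this version onwards"
  else raise ArgumentError, "unsupported versions spec: #{versions.inspect}"
  end
end

puts field_available?(2, 1...3)     # exclusive range: versions 1 and 2 only
puts field_available?(3, 1...3)
puts field_available?(7, (2..))     # endless range
puts field_available?(4, [1, 4, 7])
```

In the `model.rb` hunks as shown in this diff, the builder-level `version_in_range?` treats a bare Integer as "this version and onwards" (`>=`), while the instance-level method of the same name compares with `==`. The sketch follows the README table and the builder-level behaviour.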

data/lib/structify/model.rb CHANGED

@@ -3,10 +3,11 @@
 require "active_support/concern"
 require "active_support/core_ext/class/attribute"
 require "attr_json"
+require_relative "schema_serializer"
 module Structify
   # The Model module provides a DSL for defining LLM extraction schemas in your Rails models.
-  # It allows you to define fields, versioning, and assistant prompts for LLM-based data extraction.
+  # It allows you to define fields, versioning, and validation for LLM-based data extraction.
   #
   # @example
   #   class Article < ApplicationRecord
@@ -16,8 +17,6 @@ module Structify
   #       title "Article Extraction"
   #       description "Extract article metadata"
   #       version 1
-  #       assistant_prompt "Extract the following fields from the article"
-  #       llm_model "gpt-4"
   #
   #       field :title, :string, required: true
   #       field :summary, :text, description: "A brief summary of the article"
@@ -34,6 +33,30 @@ module Structify
       # Store all extracted data in the extracted_data JSON column
       attr_json_config(default_container_attribute: :extracted_data)
     end
+    # Instance methods
+    def version_compatible_with?(required_version)
+      record_version = self.extracted_data && self.extracted_data["version"] ?
+                       self.extracted_data["version"] : 1
+      record_version >= required_version
+    end
+    # Check if a version is within a given range/array of versions
+    # This is used in field accessors to check version compatibility
+    #
+    # @param version [Integer] The version to check
+    # @param range [Range, Array, Integer] The range, array, or single version to check against
+    # @return [Boolean] Whether the version is within the range
+    def version_in_range?(version, range)
+      case range
+      when Range
+        range.cover?(version)
+      when Array
+        range.include?(version)
+      else
+        version == range
+      end
+    end
     # Class methods added to the including class
     module ClassMethods
@@ -60,19 +83,6 @@ module Structify
         schema_builder&.version_number
       end
-      # Get the assistant prompt
-      #
-      # @return [String] The assistant prompt
-      def extraction_assistant_prompt
-        schema_builder&.assistant_prompt_str
-      end
-      # Get the LLM model name
-      #
-      # @return [String] The model name
-      def extraction_llm_model
-        schema_builder&.model_name
-      end
     end
   end
@@ -82,11 +92,9 @@ module Structify
     # @return [Array<Hash>] The field definitions
     # @return [String] The schema title
     # @return [String] The schema description
-    # @return [String] The assistant prompt
-    # @return [String] The LLM model name
     # @return [Integer] The schema version
-    attr_reader :model, :fields, :title_str, :description_str,
-                :assistant_prompt_str, :model_name, :version_number
+    # @return [Boolean] Whether thinking mode is enabled
+    attr_reader :model, :fields, :title_str, :description_str, :version_number, :thinking_enabled
     # Initialize a new SchemaBuilder
     #
@@ -94,9 +102,17 @@ module Structify
     def initialize(model)
       @model = model
       @fields = []
-      @assistant_prompt_str = nil
-      @model_name = nil
       @version_number = 1
+      @thinking_enabled = false
+    end
+    # Enable or disable thinking mode
+    # When enabled, the LLM will be asked to provide chain of thought reasoning
+    #
+    # @param enabled [Boolean] Whether to enable thinking mode
+    # @return [void]
+    def thinking(enabled)
+      @thinking_enabled = enabled
     end
     # Set the schema title
@@ -121,24 +137,15 @@ module Structify
     # @return [void]
     def version(num)
       @version_number = num
-      model.attribute :version, :integer, default: num
+      # Define version as an attr_json field so it's stored in extracted_data
+      model.attr_json :version, :integer, default: num
+      # Store mapping of fields to their introduction version
+      @fields_by_version ||= {}
+      @fields_by_version[num] ||= []
     end
-    # Set the assistant prompt
-    #
-    # @param prompt [String] The prompt text
-    # @return [void]
-    def assistant_prompt(prompt)
-      @assistant_prompt_str = prompt.strip
-    end
-    # Set the LLM model name
-    #
-    # @param name [String] The model name
-    # @return [void]
-    def llm_model(name)
-      @model_name = name
-    end
     # Define a field in the schema
     #
@@ -147,40 +154,265 @@ module Structify
     # @param required [Boolean] Whether the field is required
     # @param description [String] The field description
     # @param enum [Array] Possible values for the field
+    # @param items [Hash] For array type, defines the schema for array items
+    # @param properties [Hash] For object type, defines the properties of the object
+    # @param min_items [Integer] For array type, minimum number of items
+    # @param max_items [Integer] For array type, maximum number of items
+    # @param unique_items [Boolean] For array type, whether items must be unique
+    # @param versions [Range, Array, Integer] The versions this field is available in (default: current version onwards)
     # @return [void]
-    def field(name, type, required: false, description: nil, enum: nil)
-      fields << {
+    def field(name, type, required: false, description: nil, enum: nil,
+              items: nil, properties: nil, min_items: nil, max_items: nil,
+              unique_items: nil, versions: nil)
+      # Handle version information
+      version_range = if versions
+                        # Use the versions parameter if provided
+                        versions
+                      else
+                        # Default: field is available in all versions
+                        1..999
+                      end
+      # Check if the field is applicable for the current schema version
+      field_available = version_in_range?(@version_number, version_range)
+      # Skip defining the field in the schema if it's not applicable to the current version
+      unless field_available
+        # Still define an accessor that raises an appropriate error
+        define_version_range_accessor(name, version_range)
+        return
+      end
+      # Calculate a simple introduced_in for backward compatibility
+      effective_introduced_in = case version_range
+                               when Range
+                                 version_range.begin
+                               when Array
+                                 version_range.min
+                               else
+                                 version_range
+                               end
+      field_definition = {
         name: name,
         type: type,
         required: required,
         description: description,
-        enum: enum
+        version_range: version_range,
+        introduced_in: effective_introduced_in
       }
+      # Add enum if provided
+      field_definition[:enum] = enum if enum
+      # Array specific properties
+      if type == :array
+        field_definition[:items] = items if items
+        field_definition[:min_items] = min_items if min_items
+        field_definition[:max_items] = max_items if max_items
+        field_definition[:unique_items] = unique_items if unique_items
+      end
+      # Object specific properties
+      if type == :object
+        field_definition[:properties] = properties if properties
+      end
+      fields << field_definition
+      # Track field by its version range
+      @fields_by_version ||= {}
+      @fields_by_version[effective_introduced_in] ||= []
+      @fields_by_version[effective_introduced_in] << name
+      # Map JSON Schema types to Ruby/AttrJson types
+      attr_type = case type
+                  when :integer, :number
+                    :integer
+                  when :array
+                    :json
+                  when :object
+                    :json
+                  when :boolean
+                    :boolean
+                  else
+                    type # string, text stay the same
+                  end
+      # Define custom accessor that checks version compatibility
+      define_version_range_accessors(name, attr_type, version_range)
+    end
+    # Check if a version is within a given range/array of versions
+    #
+    # @param version [Integer] The version to check
+    # @param range [Range, Array, Integer] The range, array, or single version to check against
+    # @return [Boolean] Whether the version is within the range
+    def version_in_range?(version, range)
+      case range
+      when Range
+        # Handle endless ranges (Ruby 2.6+): 2.. means 2 and above
+        if range.end.nil?
+          version >= range.begin
+        else
+          range.cover?(version)
+        end
+      when Array
+        range.include?(version)
+      else
+        # A single integer means "this version and onwards"
+        version >= range
+      end
+    end
+    # Define accessor methods that check version compatibility using the new version ranges
+    #
+    # @param name [Symbol] The field name
+    # @param type [Symbol] The field type for attr_json
+    # @param version_range [Range, Array, Integer] The versions this field is available in
+    # @return [void]
+    def define_version_range_accessors(name, type, version_range)
+      # Define the attr_json normally first
       model.attr_json name, type
+      # Extract current version for error messages
+      schema_version = @version_number
+      # Then override the reader method to check versions
+      model.class_eval <<-RUBY, __FILE__, __LINE__ + 1
+        # Store original method
+        alias_method :_original_#{name}, :#{name}
+        # Override reader to check version compatibility
+        def #{name}
+          # Get the version from the record data
+          record_version = self.extracted_data && self.extracted_data["version"] ?
+                           self.extracted_data["version"] : 1
+          # Check if record version is compatible with field's version range
+          field_version_range = #{version_range.inspect}
+          # Handle field lifecycle based on version
+          unless version_in_range?(record_version, field_version_range)
+            # Check if this is a removed field (was valid in earlier versions but not current version)
+            if field_version_range.is_a?(Range) && field_version_range.begin <= record_version && field_version_range.end < #{schema_version}
+              raise Structify::RemovedFieldError.new(
+                "#{name}",
+                field_version_range.end
+              )
+            # Check if this is a new field (only valid in later versions)
+            elsif (field_version_range.is_a?(Range) && field_version_range.begin > record_version) ||
+                  (field_version_range.is_a?(Integer) && field_version_range > record_version)
+              raise Structify::VersionRangeError.new(
+                "#{name}",
+                record_version,
+                field_version_range
+              )
+            # Otherwise it's just not in the valid range
+            else
+              raise Structify::VersionRangeError.new(
+                "#{name}",
+                record_version,
+                field_version_range
+              )
+            end
+          end
+          # Check for deprecated fields and show warning
+          if field_version_range.is_a?(Range) &&
+             field_version_range.begin < #{schema_version} &&
+             field_version_range.end < 999 &&
+             field_version_range.cover?(record_version)
+            ActiveSupport::Deprecation.warn(
+              "Field '#{name}' is deprecated as of version #{schema_version} and will be removed in version \#{field_version_range.end}."
+            )
+          end
+          # Call original method
+          _original_#{name}
+        end
+      RUBY
+    end
+    # Define accessor for fields that are not in the current schema version
+    # These will raise an appropriate error when accessed
+    #
+    # @param name [Symbol] The field name
+    # @param version_range [Range, Array, Integer] The versions this field is available in
+    # @return [void]
+    def define_version_range_accessor(name, version_range)
+      # Capture schema version to use in the eval block
+      schema_version = @version_number
+      # Handle different version range types
+      version_range_type = case version_range
+                          when Range
+                            "range"
+                          when Array
+                            "array"
+                          else
+                            "integer"
+                          end
+      # Extract begin/end values for ranges
+      range_begin = case version_range
+                    when Range
+                      version_range.begin
+                    when Array
+                      version_range.min
+                    else
+                      version_range
+                    end
+      range_end = case version_range
+                  when Range
+                    version_range.end
+                  when Array
+                    version_range.max
+                  else
+                    version_range
+                  end
+      model.class_eval <<-RUBY, __FILE__, __LINE__ + 1
+        # Define an accessor that raises an error when accessed
+        def #{name}
+          # Based on the version_range type, create appropriate errors
+          case "#{version_range_type}"
+          when "range"
+            if #{range_begin} <= #{schema_version} && #{range_end} < #{schema_version}
+              # Removed field
+              raise Structify::RemovedFieldError.new("#{name}", #{range_end})
+            elsif #{range_begin} > #{schema_version}
+              # Field from future version
+              raise Structify::VersionRangeError.new("#{name}", #{schema_version}, #{version_range.inspect})
+            else
+              # Not in range for other reasons
+              raise Structify::VersionRangeError.new("#{name}", #{schema_version}, #{version_range.inspect})
+            end
+          when "array"
+            # For arrays, we can only check if the current version is in the array
+            raise Structify::VersionRangeError.new("#{name}", #{schema_version}, #{version_range.inspect})
+          else
+            # For integers, just report version mismatch
+            raise Structify::VersionRangeError.new("#{name}", #{schema_version}, #{version_range.inspect})
+          end
+        end
+        # Define a writer that raises an error too
+        def #{name}=(value)
+          # Use the same error logic as the reader
+          self.#{name}
+        end
+      RUBY
     end
     # Generate the JSON schema representation
     #
     # @return [Hash] The JSON schema
     def to_json_schema
-      required_fields = fields.select { |f| f[:required] }.map { |f| f[:name].to_s }
-      properties_hash = fields.each_with_object({}) do |f, hash|
-        prop = { type: f[:type].to_s }
-        prop[:description] = f[:description] if f[:description]
-        prop[:enum] = f[:enum] if f[:enum]
-        hash[f[:name].to_s] = prop
-      end
-      {
-        name: title_str,
-        description: description_str,
-        parameters: {
-          type: "object",
-          required: required_fields,
-          properties: properties_hash
-        }
-      }
+      serializer = SchemaSerializer.new(self)
+      serializer.to_json_schema
     end
   end
 end
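
Note on the hunk above: `define_version_range_accessors` uses an alias-and-override pattern. It aliases the generated reader, then redefines it through a string `class_eval` so that a version check runs before the original reader is called. The following is a minimal sketch of that pattern in plain Ruby, not the gem's code. `Record`, `FieldUnavailable`, and `guard_reader` are hypothetical names invented for illustration, and the check is reduced to a single minimum version.

```ruby
# Hypothetical stand-in for a model; not part of structify.
class Record
  attr_reader :data

  def initialize(data)
    @data = data
  end

  def summary
    data["summary"]
  end
end

# Hypothetical error class used only by this sketch.
class FieldUnavailable < StandardError; end

# Wrap an existing reader so it checks the record's version first.
# `name` and `min_version` are interpolated when the heredoc is built;
# `\#{...}` is escaped so it is interpolated later, when the reader runs.
def guard_reader(klass, name, min_version)
  klass.class_eval <<-RUBY, __FILE__, __LINE__ + 1
    alias_method :_original_#{name}, :#{name}

    def #{name}
      record_version = data["version"] || 1
      if record_version < #{min_version}
        raise FieldUnavailable, "#{name} needs version #{min_version}, record is v\#{record_version}"
      end
      _original_#{name}
    end
  RUBY
end

guard_reader(Record, :summary, 2)
```

With this in place, `Record.new("version" => 2, "summary" => "ok").summary` still returns the stored value, while a record without a `"version"` key is treated as version 1 and raises `FieldUnavailable`.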

data/lib/structify/schema_serializer.rb ADDED

@@ -0,0 +1,165 @@
+# frozen_string_literal: true
+module Structify
+  # Handles serialization of schema definitions to different formats
+  class SchemaSerializer
+    # @return [Structify::SchemaBuilder] The schema builder to serialize
+    attr_reader :schema_builder
+    # Initialize a new SchemaSerializer
+    #
+    # @param schema_builder [Structify::SchemaBuilder] The schema builder to serialize
+    def initialize(schema_builder)
+      @schema_builder = schema_builder
+    end
+    # Generate the JSON schema representation
+    #
+    # @return [Hash] The JSON schema
+    def to_json_schema
+      # Get current schema version
+      current_version = schema_builder.version_number
+      # Get fields that are applicable to the current schema version
+      fields = schema_builder.fields.select do |f|
+        # Check if the field has a version_range
+        if f[:version_range]
+          version_in_range?(current_version, f[:version_range])
+        # Legacy check for removed_in
+        elsif f[:removed_in]
+          f[:removed_in] > current_version
+        else
+          true
+        end
+      end
+      # Get required fields (excluding fields not in the current version)
+      required_fields = fields.select { |f| f[:required] }.map { |f| f[:name].to_s }
+      # Start with chain_of_thought if thinking mode is enabled
+      properties_hash = {}
+      if schema_builder.thinking_enabled
+        properties_hash["chain_of_thought"] = {
+          type: "string",
+          description: "Explain your thought process step by step before determining the final values."
+        }
+      end
+      # Add all other fields
+      fields.each_with_object(properties_hash) do |f, hash|
+        # Start with the basic type
+        prop = { type: f[:type].to_s }
+        # Add description if available
+        prop[:description] = f[:description] if f[:description]
+        # Add enum if available
+        prop[:enum] = f[:enum] if f[:enum]
+        # Handle array specific properties
+        if f[:type] == :array
+          # Add items schema
+          prop[:items] = f[:items] if f[:items]
+          # Add array constraints
+          prop[:minItems] = f[:min_items] if f[:min_items]
+          prop[:maxItems] = f[:max_items] if f[:max_items]
+          prop[:uniqueItems] = f[:unique_items] if f[:unique_items]
+        end
+        # Handle object specific properties
+        if f[:type] == :object && f[:properties]
+          prop[:properties] = {}
+          required_props = []
+          # Process each property
+          f[:properties].each do |prop_name, prop_def|
+            prop[:properties][prop_name] = prop_def.dup
+            # If a property is marked as required, add it to required list and remove from property definition
+            if prop_def[:required]
+              required_props << prop_name
+              prop[:properties][prop_name].delete(:required)
+            end
+          end
+          # Add required array if we have required properties
+          prop[:required] = required_props unless required_props.empty?
+        end
+        # Add version info to description only if requested by environment variable
+        # This allows for backward compatibility with existing tests
+        if ENV["STRUCTIFY_SHOW_VERSION_INFO"] && f[:version_range] && prop[:description]
+          version_info = format_version_range(f[:version_range])
+          prop[:description] = "#{prop[:description]} (Available in versions: #{version_info})"
+        elsif ENV["STRUCTIFY_SHOW_VERSION_INFO"] && f[:version_range]
+          prop[:description] = "Available in versions: #{format_version_range(f[:version_range])}"
+        end
+        # Legacy: Add a deprecation notice to description
+        if f[:deprecated_in] && f[:deprecated_in] <= current_version
+          deprecation_note = "Deprecated in v#{f[:deprecated_in]}. "
+          prop[:description] = if prop[:description]
+                                "#{deprecation_note}#{prop[:description]}"
+                              else
+                                deprecation_note
+                              end
+        end
+        hash[f[:name].to_s] = prop
+      end
+      {
+        name: schema_builder.title_str,
+        description: schema_builder.description_str,
+        parameters: {
+          type: "object",
+          required: required_fields,
+          properties: properties_hash
+        }
+      }
+    end
+    private
+    # Check if a version is within a given range/array of versions
+    #
+    # @param version [Integer] The version to check
+    # @param range [Range, Array, Integer] The range, array, or single version to check against
+    # @return [Boolean] Whether the version is within the range
+    def version_in_range?(version, range)
+      case range
+      when Range
+        # Handle endless ranges (Ruby 2.6+): 2.. means 2 and above
+        if range.end.nil?
+          version >= range.begin
+        else
+          range.cover?(version)
+        end
+      when Array
+        range.include?(version)
+      else
+        # A single integer means "this version and onwards"
+        version >= range
+      end
+    end
+    # Format a version range for display in error messages
+    #
+    # @param versions [Range, Array, Integer] The version range to format
+    # @return [String] A human-readable version range
+    def format_version_range(versions)
+      if versions.is_a?(Range)
+        if versions.end.nil?
+          "#{versions.begin} and above"
+        else
+          "#{versions.begin} to #{versions.end}#{versions.exclude_end? ? ' (exclusive)' : ''}"
+        end
+      elsif versions.is_a?(Array)
+        versions.join(", ")
+      else
+        "#{versions} and above"  # Single integer means this version and onwards
+      end
+    end
+  end
+end
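
Note on the file above: `version_in_range?` accepts three shapes of version specification, and a bare integer is treated as "this version and onwards", not as an exact match. The sketch below is a standalone copy of that logic (the method is private in `SchemaSerializer`, so it is reproduced here as a top-level method purely for illustration) with a few sample calls.

```ruby
# Standalone copy of the matching rules, for illustration only.
def version_in_range?(version, range)
  case range
  when Range
    # An endless range such as (2..) has a nil end.
    range.end.nil? ? version >= range.begin : range.cover?(version)
  when Array
    range.include?(version)
  else
    # A single integer means "this version and onwards".
    version >= range
  end
end

version_in_range?(3, 1..3)    # => true  (inclusive end)
version_in_range?(3, 1...3)   # => false (exclusive end)
version_in_range?(5, (2..))   # => true  (endless range, Ruby 2.6+)
version_in_range?(2, [1, 3])  # => false (arrays list exact versions)
version_in_range?(4, 2)       # => true  (integer is a lower bound)
version_in_range?(1, 2)       # => false
```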

data/lib/structify/version.rb CHANGED

@@ -1,3 +1,3 @@
 module Structify
-  VERSION = "0.1.0"
+  VERSION = "0.2.0"
 end

data/lib/structify.rb CHANGED

@@ -1,11 +1,12 @@
 # frozen_string_literal: true
 require_relative "structify/version"
+require_relative "structify/schema_serializer"
 require_relative "structify/model"
 # Structify is a DSL for defining extraction schemas for LLM-powered models.
 # It provides a simple way to integrate with Rails models for LLM extraction,
-# including versioning, assistant prompts, and more.
+# allowing for schema versioning and evolution.
 #
 # @example
 #   class Article < ApplicationRecord
@@ -15,8 +16,6 @@ require_relative "structify/model"
 #       title "Article Extraction"
 #       description "Extract article metadata"
 #       version 1
-#       assistant_prompt "Extract the following fields from the article"
-#       llm_model "gpt-4"
 #
 #       field :title, :string, required: true
 #       field :summary, :text, description: "A brief summary of the article"
@@ -24,6 +23,70 @@ require_relative "structify/model"
 #     end
 #   end
 module Structify
+  # Base error class for Structify
   class Error < StandardError; end
-  # Your code goes here...
+  # Error raised when trying to access a field that doesn't exist in the record's version
+  class MissingFieldError < Error
+    attr_reader :field_name, :record_version, :schema_version
+    def initialize(field_name, record_version, schema_version)
+      @field_name = field_name
+      @record_version = record_version
+      @schema_version = schema_version
+      message = "Field '#{field_name}' does not exist in version #{record_version}. " \
+                "It was introduced in version #{schema_version}. " \
+                "To access this field, upgrade the record by setting new field values and saving."
+      super(message)
+    end
+  end
+  # Error raised when trying to access a field that has been removed in the current schema version
+  class RemovedFieldError < Error
+    attr_reader :field_name, :removed_in_version
+    def initialize(field_name, removed_in_version)
+      @field_name = field_name
+      @removed_in_version = removed_in_version
+      message = "Field '#{field_name}' has been removed in version #{removed_in_version}. " \
+                "This field is no longer available in the current schema."
+      super(message)
+    end
+  end
+  # Error raised when trying to access a field outside its specified version range
+  class VersionRangeError < Error
+    attr_reader :field_name, :record_version, :valid_versions
+    def initialize(field_name, record_version, valid_versions)
+      @field_name = field_name
+      @record_version = record_version
+      @valid_versions = valid_versions
+      message = "Field '#{field_name}' is not available in version #{record_version}. " \
+                "This field is only available in versions: #{format_versions(valid_versions)}."
+      super(message)
+    end
+    private
+    def format_versions(versions)
+      if versions.is_a?(Range)
+        if versions.end.nil?
+          "#{versions.begin} and above"
+        else
+          "#{versions.begin} to #{versions.end}#{versions.exclude_end? ? ' (exclusive)' : ''}"
+        end
+      elsif versions.is_a?(Array)
+        versions.join(", ")
+      else
+        "#{versions} and above"  # Single integer means this version and onwards
+      end
+    end
+  end
 end
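
Note on the error classes above: each one stores its arguments in readers and builds its message in `initialize`, so a caller can rescue the shared base class and still inspect the details. The sketch below is a condensed copy of `VersionRangeError` placed under a hypothetical `Sketch` namespace so that it runs without the gem installed; the field name and versions are made-up sample values.

```ruby
module Sketch
  class Error < StandardError; end

  # Condensed copy of the VersionRangeError shown in the diff.
  class VersionRangeError < Error
    attr_reader :field_name, :record_version, :valid_versions

    def initialize(field_name, record_version, valid_versions)
      @field_name = field_name
      @record_version = record_version
      @valid_versions = valid_versions
      super("Field '#{field_name}' is not available in version #{record_version}. " \
            "This field is only available in versions: #{format_versions(valid_versions)}.")
    end

    private

    def format_versions(versions)
      case versions
      when Range
        if versions.end.nil?
          "#{versions.begin} and above"
        else
          "#{versions.begin} to #{versions.end}#{versions.exclude_end? ? ' (exclusive)' : ''}"
        end
      when Array then versions.join(", ")
      else "#{versions} and above"
      end
    end
  end
end

begin
  raise Sketch::VersionRangeError.new("summary", 1, 2..3)
rescue Sketch::Error => e
  # The handler can also read e.field_name, e.record_version, e.valid_versions.
  puts e.message
end
```

Rescuing the base `Error` class catches all three error types in one handler; rescuing a specific subclass is the narrower choice when only one case needs special handling.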

data/structify.gemspec CHANGED

@@ -19,7 +19,7 @@ Gem::Specification.new do |spec|
   # Specify which files should be added to the gem when it is released.
   # The `git ls-files -z` loads the files in the RubyGem that have been added into git.
   spec.files         = Dir.chdir(File.expand_path('..', __FILE__)) do
-    `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
+    `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) || f.end_with?('.gem') }
   end
   spec.bindir        = "exe"
   spec.executables   = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
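
Note on the hunk above: the added `f.end_with?('.gem')` clause keeps previously built gem archives that were committed to git out of the packaged file list. A small sketch of the filter applied to a hypothetical file list (the paths are invented sample values, not taken from the repository):

```ruby
# Hypothetical output of `git ls-files -z`, already split into paths.
files = [
  "lib/structify.rb",
  "spec/structify_spec.rb",
  "structify-0.1.0.gem",
  "README.md"
]

# Same reject condition as the updated gemspec line.
kept = files.reject { |f| f.match(%r{^(test|spec|features)/}) || f.end_with?('.gem') }

kept # => ["lib/structify.rb", "README.md"]
```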

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: structify
 version: !ruby/object:Gem::Version
-  version: 0.1.0
+  version: 0.2.0
 platform: ruby
 authors:
 - Kieran Klaassen
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2025-02-03 00:00:00.000000000 Z
+date: 2025-03-13 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: activesupport
@@ -49,6 +49,8 @@ files:
 - ".gitignore"
 - ".rspec"
 - ".travis.yml"
+- CHANGELOG.md
+- CLAUDE.md
 - CODE_OF_CONDUCT.md
 - Gemfile
 - Gemfile.lock
@@ -59,6 +61,7 @@ files:
 - bin/setup
 - lib/structify.rb
 - lib/structify/model.rb
+- lib/structify/schema_serializer.rb
 - lib/structify/version.rb
 - structify.gemspec
 homepage: https://github.com/kieranklaassen/structify