RubyGems - structify - Versions diffs - 0.1.0 → 0.3.0 - Mend

structify 0.1.0 → 0.3.0


Files changed (12)

checksums.yaml +4 -4
data/CHANGELOG.md +22 -0
data/CLAUDE.md +27 -0
data/Gemfile +2 -2
data/Gemfile.lock +26 -25
data/README.md +301 -139
data/lib/structify/model.rb +304 -60
data/lib/structify/schema_serializer.rb +165 -0
data/lib/structify/version.rb +1 -1
data/lib/structify.rb +90 -4
data/structify.gemspec +2 -2
metadata +7 -4

data/README.md CHANGED

@@ -2,220 +2,382 @@
 [![Gem Version](https://badge.fury.io/rb/structify.svg)](https://badge.fury.io/rb/structify)
-Structify is a Ruby gem that provides a simple DSL to define extraction schemas for LLM-powered models. It integrates seamlessly with Rails models, allowing you to specify versioning, assistant prompts, and field definitions—all in a clean, declarative syntax.
+A Ruby gem for extracting structured data from content using LLMs in Rails applications
-## Features
+## What is Structify?
-- 🎯 Simple DSL for defining LLM extraction schemas
-- 🔄 Built-in versioning for schema evolution
-- 📝 Support for custom assistant prompts
-- 🏗️ JSON Schema generation for LLM validation
-- 🔌 Seamless Rails/ActiveRecord integration
-- 💾 Automatic JSON attribute handling
+Structify helps you extract structured data from unstructured content in your Rails apps:
-## Installation
+- **Define extraction schemas** directly in your ActiveRecord models
+- **Generate JSON schemas** to use with OpenAI, Anthropic, or other LLM providers
+- **Store and validate** extracted data in your models
+- **Access structured data** through typed model attributes
-Add this line to your application's Gemfile:
+## Use Cases
+- Extract metadata, topics, and sentiment from articles or blog posts
+- Pull structured information from user-generated content
+- Organize unstructured feedback or reviews into categorized data
+- Convert emails or messages into actionable, structured formats
+- Extract entities and relationships from documents
 ```ruby
-gem 'structify'
+# 1. Define extraction schema in your model
+class Article < ApplicationRecord
+  include Structify::Model
+  schema_definition do
+    field :title, :string
+    field :summary, :text
+    field :category, :string, enum: ["tech", "business", "science"]
+    field :topics, :array, items: { type: "string" }
+  end
+end
+# 2. Get schema for your LLM API
+schema = Article.json_schema
+# 3. Store LLM response in your model
+article = Article.find(123)
+article.update(llm_response)
+# 4. Access extracted data
+article.title    # => "AI Advances in 2023"
+article.summary  # => "Recent developments in artificial intelligence..."
+article.topics   # => ["machine learning", "neural networks", "computer vision"]
 ```
-And then execute:
+## Install
+```ruby
+# Add to Gemfile
+gem 'structify'
+```
+Then:
 ```bash
-$ bundle install
+bundle install
 ```
-Or install it yourself as:
+## Database Setup
-```bash
-$ gem install structify
+Add a JSON column to store extracted data:
+```ruby
+add_column :articles, :json_attributes, :jsonb  # PostgreSQL (default column name)
+# or
+add_column :articles, :json_attributes, :json   # MySQL (default column name)
+# Or if you configure a custom column name:
+add_column :articles, :custom_json_column, :jsonb  # PostgreSQL
 ```
-## Usage
+## Configuration
+Structify can be configured in an initializer:
+```ruby
+# config/initializers/structify.rb
+Structify.configure do |config|
+  # Configure the default JSON container attribute (default: :json_attributes)
+  config.default_container_attribute = :custom_json_column
+end
+```
-### Basic Example
+## Usage
-Here's a simple example of using Structify in a Rails model:
+### Define Your Schema
 ```ruby
 class Article < ApplicationRecord
   include Structify::Model
   schema_definition do
-    title "Article Extraction"
-    description "Extract key information from articles"
     version 1
-    assistant_prompt "Extract the following fields from the article content"
-    llm_model "gpt-4"
+    title "Article Extraction"
     field :title, :string, required: true
-    field :summary, :text, description: "A brief summary of the article"
+    field :summary, :text
     field :category, :string, enum: ["tech", "business", "science"]
+    field :topics, :array, items: { type: "string" }
+    field :metadata, :object, properties: {
+      "author" => { type: "string" },
+      "published_at" => { type: "string" }
+    }
   end
 end
 ```
-### Advanced Example
+### Get Schema for LLM API
-Here's a more complex example showing all available features:
+Structify generates the JSON schema that you'll need to send to your LLM provider:
 ```ruby
-class EmailSummary < ApplicationRecord
-  include Structify::Model
-  schema_definition do
-    version 2  # Increment this when making breaking changes
-    title "Email Thread Extraction"
-    description "Extracts key information from email threads"
+# Get JSON Schema to send to OpenAI, Anthropic, etc.
+schema = Article.json_schema
+```
-    assistant_prompt <<~PROMPT
-      You are an assistant that extracts concise metadata from email threads.
-      Focus on producing a clear summary, action items, and sentiment analysis.
-      If there are multiple participants, include their roles in the conversation.
-    PROMPT
+### Integration with LLM Services
-    llm_model "gpt-4"  # Supports any LLM model
+You need to implement the actual LLM integration. Here's how you can integrate with popular services:
-    # Required fields
-    field :subject, :string,
-      required: true,
-      description: "The main topic or subject of the email thread"
+#### OpenAI Integration Example
-    field :summary, :text,
-      required: true,
-      description: "A concise summary of the entire thread"
+```ruby
+require "openai"
-    # Optional fields with enums
-    field :sentiment, :string,
-      enum: ["positive", "neutral", "negative"],
-      description: "The overall sentiment of the conversation"
+class OpenAiExtractor
+  def initialize(api_key = ENV["OPENAI_API_KEY"])
+    @client = OpenAI::Client.new(access_token: api_key)
+  end
+  def extract(content, model_class)
+    # Get schema from Structify model
+    schema = model_class.json_schema
+    # Call OpenAI with structured outputs
+    response = @client.chat(
+      parameters: {
+        model: "gpt-4o",
+        response_format: { type: "json_object", schema: schema },
+        messages: [
+          { role: "system", content: "Extract structured information from the provided content." },
+          { role: "user", content: content }
+        ]
+      }
+    )
+    # Parse and return the structured data
+    JSON.parse(response.dig("choices", 0, "message", "content"), symbolize_names: true)
+  end
+end
-    field :priority, :string,
-      enum: ["high", "medium", "low"],
-      description: "The priority level based on content and tone"
+# Usage
+extractor = OpenAiExtractor.new
+article = Article.find(123)
+extracted_data = extractor.extract(article.content, Article)
+article.update(extracted_data)
+```
-    # Complex fields
-    field :participants, :json,
-      description: "List of participants and their roles"
+#### Anthropic Integration Example
-    field :action_items, :json,
-      description: "Array of action items extracted from the thread"
+```ruby
+require "anthropic"
-    field :next_steps, :string,
-      description: "Recommended next steps based on the thread"
+class AnthropicExtractor
+  def initialize(api_key = ENV["ANTHROPIC_API_KEY"])
+    @client = Anthropic::Client.new(api_key: api_key)
+  end
+  def extract(content, model_class)
+    # Get schema from Structify model
+    schema = model_class.json_schema
+    # Call Claude with tool use
+    response = @client.messages.create(
+      model: "claude-3-opus-20240229",
+      max_tokens: 1000,
+      system: "Extract structured data based on the provided schema.",
+      messages: [{ role: "user", content: content }],
+      tools: [{
+        type: "function",
+        function: {
+          name: "extract_data",
+          description: "Extract structured data from content",
+          parameters: schema
+        }
+      }],
+      tool_choice: { type: "function", function: { name: "extract_data" } }
+    )
+    # Parse and return structured data
+    JSON.parse(response.content[0].tools[0].function.arguments, symbolize_names: true)
   end
-  # You can still use regular ActiveRecord features
-  validates :subject, presence: true
-  validates :summary, length: { minimum: 10 }
 end
 ```
-### Accessing Schema Information
-Structify provides several helper methods to access schema information:
+### Store & Access Extracted Data
 ```ruby
-# Get the JSON Schema
-EmailSummary.json_schema
-# => {
-#   name: "Email Thread Extraction",
-#   description: "Extracts key information from email threads",
-#   parameters: {
-#     type: "object",
-#     required: ["subject", "summary"],
-#     properties: {
-#       subject: { type: "string" },
-#       summary: { type: "text" },
-#       sentiment: {
-#         type: "string",
-#         enum: ["positive", "neutral", "negative"]
-#       },
-#       # ...
-#     }
-#   }
-# }
+# Store LLM response in your model
+article.update(response)
-# Get the current version
-EmailSummary.extraction_version  # => 2
+# Access via model attributes
+article.title        # => "How AI is Changing Healthcare"
+article.category     # => "tech"
+article.topics       # => ["machine learning", "healthcare"]
-# Get the assistant prompt
-EmailSummary.extraction_assistant_prompt
-# => "You are an assistant that extracts concise metadata..."
-# Get the LLM model
-EmailSummary.extraction_llm_model  # => "gpt-4"
+# All data is in the JSON column (default column name: json_attributes)
+article.json_attributes  # => The complete JSON
 ```
-### Working with Extracted Data
+## Field Types
-Structify uses the `attr_json` gem to handle JSON attributes. All fields are stored in the `extracted_data` JSON column:
+Structify supports all standard JSON Schema types:
 ```ruby
-# Create a new record with extracted data
-summary = EmailSummary.create(
-  subject: "Project Update",
-  summary: "Team discussed Q2 goals",
-  sentiment: "positive",
-  priority: "high",
-  participants: [
-    { name: "Alice", role: "presenter" },
-    { name: "Bob", role: "reviewer" }
-  ]
-)
+field :name, :string             # String values
+field :count, :integer           # Integer values
+field :price, :number            # Numeric values (float/int)
+field :active, :boolean          # Boolean values
+field :metadata, :object         # JSON objects
+field :tags, :array              # Arrays
+```
-# Access fields directly
-summary.subject      # => "Project Update"
-summary.sentiment    # => "positive"
-summary.participants # => [{ name: "Alice", ... }]
+## Field Options
-# Validate enum values
-summary.sentiment = "invalid"
-summary.valid?  # => false
+```ruby
+# Required fields
+field :title, :string, required: true
+# Enum values
+field :status, :string, enum: ["draft", "published", "archived"]
+# Array constraints
+field :tags, :array,
+  items: { type: "string" },
+  min_items: 1,
+  max_items: 5,
+  unique_items: true
+# Nested objects
+field :author, :object, properties: {
+  "name" => { type: "string", required: true },
+  "email" => { type: "string" }
+}
 ```
-## Database Setup
+## Chain of Thought Mode
-Ensure your model has a JSON column named `extracted_data`:
+Structify supports a "thinking" mode that automatically requests chain of thought reasoning from the LLM:
 ```ruby
-class CreateEmailSummaries < ActiveRecord::Migration[7.1]
-  def change
-    create_table :email_summaries do |t|
-      t.json :extracted_data  # Required by Structify
-      t.timestamps
-    end
-  end
+schema_definition do
+  version 1
+  thinking true  # Enable chain of thought reasoning
+  field :title, :string, required: true
+  # other fields...
 end
 ```
-## Development
+Chain of thought (COT) reasoning is beneficial because it:
+- Adds more context to the extraction process
+- Helps the LLM think through problems more systematically
+- Improves accuracy for complex extractions
+- Makes the reasoning process transparent and explainable
+- Reduces hallucinations by forcing step-by-step thinking
-After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
+This is especially useful when:
+- Answers need more detailed information
+- Questions require multi-step reasoning
+- Extractions involve complex decision-making
+- You need to understand how the LLM reached its conclusions
-To install this gem onto your local machine, run `bundle exec rake install`.
+For best results, include instructions for COT in your base system prompt:
-## Contributing
+```ruby
+system_prompt = "Extract structured data from the content.
+For each field, think step by step before determining the value."
+```
-1. Fork it
-2. Create your feature branch (`git checkout -b feature/my-new-feature`)
-3. Commit your changes (`git commit -am 'Add some feature'`)
-4. Push to the branch (`git push origin feature/my-new-feature`)
-5. Create a new Pull Request
+You can generate effective chain of thought prompts using tools like the [Claude Prompt Designer](https://console.anthropic.com/dashboard).
-Bug reports and pull requests are welcome on GitHub at https://github.com/kieranklaassen/structify.
+## Schema Versioning and Field Lifecycle
-## License
+Structify provides a simple field lifecycle management system using a `versions` parameter:
+```ruby
+schema_definition do
+  version 3
+  # Fields for specific version ranges
+  field :title, :string                       # Available in all versions (default behavior)
+  field :legacy, :string, versions: 1...3     # Only in versions 1-2 (removed in v3)
+  field :summary, :text, versions: 2          # Added in version 2 onwards
+  field :content, :text, versions: 2..        # Added in version 2 onwards (endless range)
+  field :temp_field, :string, versions: 2..3  # Only in versions 2-3
+  field :special, :string, versions: [1, 3, 5] # Only in versions 1, 3, and 5
+end
+```
+### Version Range Syntax
-The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).
+Structify supports several ways to specify which versions a field is available in:
-## Code of Conduct
+| Syntax | Example | Meaning |
+|--------|---------|---------|
+| No version specified | `field :title, :string` | Available in all versions (default) |
+| Single integer | `versions: 2` | Available from version 2 onwards |
+| Range (inclusive) | `versions: 1..3` | Available in versions 1, 2, and 3 |
+| Range (exclusive) | `versions: 1...3` | Available in versions 1 and 2 (not 3) |
+| Endless range | `versions: 2..` | Available from version 2 onwards |
+| Array | `versions: [1, 4, 7]` | Only available in versions 1, 4, and 7 |
-Everyone interacting in the Structify project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the [code of conduct](CODE_OF_CONDUCT.md).
+### Handling Records with Different Versions
+```ruby
+# Create a record with version 1 schema
+article_v1 = Article.create(title: "Original Article")
+# Access with version 3 schema
+article_v3 = Article.find(article_v1.id)
+# Fields from v1 are still accessible
+article_v3.title  # => "Original Article"
+# Fields not in v1 raise errors
+article_v3.summary  # => VersionRangeError: Field 'summary' is not available in version 1.
+                    #    This field is only available in versions: 2 to 999.
+# Check version compatibility
+article_v3.version_compatible_with?(3)  # => false
+article_v3.version_compatible_with?(1)  # => true
+# Upgrade record to version 3
+article_v3.summary = "Added in v3"
+article_v3.save!  # Record version is automatically updated to 3
 ```
+### Accessing the Container Attribute
+The JSON container attribute can be accessed directly:
+```ruby
+# Using the default container attribute :json_attributes
+article.json_attributes  # => { "title" => "My Title", "version" => 1, ... }
+# If you've configured a custom container attribute
+article.custom_json_column  # => { "title" => "My Title", "version" => 1, ... }
 ```
+## Understanding Structify's Role
+Structify is designed as a **bridge** between your Rails models and LLM extraction services:
+### What Structify Does For You
+- ✅ **Define extraction schemas** directly in your ActiveRecord models
+- ✅ **Generate compatible JSON schemas** for OpenAI, Anthropic, and other LLM providers
+- ✅ **Store and validate** extracted data against your schema
+- ✅ **Provide typed access** to extracted fields through your models
+- ✅ **Handle schema versioning** and backward compatibility
+- ✅ **Support chain of thought reasoning** with the thinking mode option
+### What You Need To Implement
+- 🔧 **API integration** with your chosen LLM provider (see examples above)
+- 🔧 **Processing logic** for when and how to extract data
+- 🔧 **Authentication** and API key management
+- 🔧 **Error handling and retries** for API calls
+This separation of concerns allows you to:
+1. Use any LLM provider and model you prefer
+2. Implement extraction logic specific to your application
+3. Handle API access in a way that fits your application architecture
+4. Change LLM providers without changing your data model
+## License
+[MIT License](https://opensource.org/licenses/MIT)
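
The "Version Range Syntax" table in the README diff above describes how a `versions:` option maps to the schema versions in which a field is available. The sketch below restates those rules in plain Ruby. It is not Structify's implementation: `field_available?` is a hypothetical helper invented for illustration, and it assumes only the semantics given in the table (no option means all versions, a single integer means "from that version onwards", ranges and arrays mean what they do in Ruby).

```ruby
# Hypothetical helper, not part of Structify's API: returns true when a field
# declared with the given `versions:` option is available at `schema_version`.
def field_available?(versions, schema_version)
  case versions
  when nil     then true                              # no option: all versions
  when Integer then schema_version >= versions        # single integer: that version onwards
  when Range   then versions.cover?(schema_version)   # 1..3, 1...3, and endless (2..)
  when Array   then versions.include?(schema_version) # explicit list of versions
  else raise ArgumentError, "unsupported versions spec: #{versions.inspect}"
  end
end

field_available?(nil, 1)        # => true
field_available?(2, 1)          # => false
field_available?(1...3, 3)      # => false
field_available?((2..), 7)      # => true
field_available?([1, 4, 7], 4)  # => true
```

`Range#cover?` handles the inclusive, exclusive, and endless forms with a single call, which is why the three range rows of the table collapse into one branch here.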