RubyGems - universal_document_processor - Versions diffs - 1.0.1 → 1.0.2 - Mend

universal_document_processor 1.0.1 → 1.0.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (11)

checksums.yaml +4 -4
data/CHANGELOG.md +20 -0
data/README.md +53 -1
data/lib/universal_document_processor/document.rb +40 -4
data/lib/universal_document_processor/processors/excel_processor.rb +719 -132
data/lib/universal_document_processor/processors/word_processor.rb +82 -4
data/lib/universal_document_processor/utils/file_detector.rb +1 -0
data/lib/universal_document_processor/version.rb +1 -1
metadata +15 -3
data/AI_USAGE_GUIDE.md +0 -404
data/GEM_RELEASE_GUIDE.md +0 -288

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: c33ef9db6830ddb62e98966d0bf3dd85106833e260303651df297bb3d46b9529
-  data.tar.gz: 280fa75bf3d842fc1af11dd95ca49e0526ed22512b5732aed0a7c25d4a57a7d9
+  metadata.gz: '06539a78d5cc253518f84b242dd7a8dcb71ce575614d7e8853bb9de70b031f75'
+  data.tar.gz: c58e957dfe6940c0cb16fb4c50f9ce0ee69aa7cf9a7587b2496a80da78bbdda8
 SHA512:
-  metadata.gz: 01d6c5129dc7ec1a7911a77ba69b8dbafd32bd9397e0374175a4cf09cf24f8c06225a0bc927b1e3bba8250e3d1e93d40bbbc1dbf3aa1a6ff0fbc92cb7f5f24a4
-  data.tar.gz: b445b8d773e2865e6e2c5593c123cc7d0da580d4d16822e4b9e45c851568f077f4c949d921f2f602c4ed3ff941ccf8a1d002eb14401c554b0f8355dc293bff62
+  metadata.gz: 81940e620b3dcff668493ae459e1287161ccbc56b8677acab3477913685d4f68580f0d4062ae0e777e83a000aa58a42619f3a96d0f19b2146e38fa2c3418587a
+  data.tar.gz: 61b98794dc95f0489a806d1dbb31b172f75570ac2fa140d0ff334347f58afdbfab370ffd72b003b948b7a4ec80656e6f29b196ca833e0b219b41316ebb729843

data/CHANGELOG.md CHANGED

@@ -7,6 +7,26 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
+## [1.2.0] - 2024-01-15
+### Added
+- **TSV (Tab-Separated Values) File Support**: Complete built-in TSV processing capabilities
+  - Native TSV parsing using Ruby CSV library with tab delimiter
+  - Text extraction with proper formatting
+  - Comprehensive metadata detection (format, delimiter, encoding)
+  - Table structure analysis and header detection
+  - Statistical analysis and data validation
+  - Format conversions: TSV ↔ CSV, TSV → JSON
+  - Cross-format compatibility with existing CSV and Excel features
+  - New `to_tsv()` method for converting other formats to TSV
+  - Enhanced file detector with TSV MIME type mapping
+  - Full integration with existing Document class API
+### Enhanced
+- **ExcelProcessor**: Extended to handle TSV files alongside CSV and Excel formats
+- **File Detection**: Added TSV MIME type support (`text/tab-separated-values`)
+- **Document Class**: Added `to_tsv()` method and TSV format support
+- **Supported Formats**: Updated to include TSV in format list
 ## [1.0.1] - 2025-06-23
 ### Fixed
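
Note on the TSV entries above: the changelog describes TSV parsing as Ruby's CSV library with a tab delimiter, plus TSV to CSV and TSV to JSON conversion. The sketch below shows that technique in isolation. It is not the gem's code; the helper names (`parse_tsv`, `rows_to_csv`, `rows_to_json`) and the sample data are hypothetical, and treating the first row as headers is an assumption.

```ruby
require "csv"
require "json"

# Hypothetical sample data; not taken from the gem.
TSV_SAMPLE = "name\tcity\nAda\tLondon\nLinus\tHelsinki\n"

# Parse TSV text into an array of rows (each row is an array of strings).
# The only difference from plain CSV parsing is the column separator.
def parse_tsv(text)
  CSV.parse(text, col_sep: "\t")
end

# Re-serialize the rows with the default comma delimiter.
# CSV.generate_line takes care of quoting fields that contain commas.
def rows_to_csv(rows)
  rows.map { |row| CSV.generate_line(row) }.join
end

# Assumption: the first row holds the headers.
# Emits a JSON array with one object per data row.
def rows_to_json(rows)
  headers, *data = rows
  JSON.generate(data.map { |row| headers.zip(row).to_h })
end

rows = parse_tsv(TSV_SAMPLE)
puts rows_to_csv(rows)
puts rows_to_json(rows)
```

The reverse direction (CSV to TSV) follows the same shape: parse with the default separator and call `CSV.generate_line(row, col_sep: "\t")` when writing.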

data/README.md CHANGED

@@ -16,7 +16,7 @@ A comprehensive Ruby gem that provides unified document processing capabilities
 ### **Supported File Formats**
 - **📄 Documents**: PDF, DOC, DOCX, RTF
-- **📊 Spreadsheets**: XLS, XLSX, CSV
+- **📊 Spreadsheets**: XLS, XLSX, CSV, TSV
 - **📺 Presentations**: PPT, PPTX
 - **🖼️ Images**: JPG, PNG, GIF, BMP, TIFF
 - **📁 Archives**: ZIP, RAR, 7Z
@@ -236,6 +236,58 @@ tables.each_with_index do |table, index|
 end
 ```
+### Processing TSV (Tab-Separated Values) Files
+```ruby
+# Process TSV files with built-in support
+result = UniversalDocumentProcessor.process('data.tsv')
+# TSV-specific metadata
+metadata = result[:metadata]
+puts "Format: #{metadata[:format]}"        # => "tsv"
+puts "Delimiter: #{metadata[:delimiter]}"  # => "tab"
+puts "Rows: #{metadata[:total_rows]}"
+puts "Columns: #{metadata[:total_columns]}"
+puts "Has headers: #{metadata[:has_headers]}"
+# Extract structured data
+tables = result[:tables]
+table = tables.first
+puts "Headers: #{table[:headers].join(', ')}"
+puts "Sample row: #{table[:data][1].join(' | ')}"
+# Format conversions
+document = UniversalDocumentProcessor::Document.new('data.tsv')
+# Convert TSV to CSV
+csv_output = document.to_csv
+puts "CSV conversion: #{csv_output.length} characters"
+# Convert TSV to JSON
+json_output = document.to_json
+puts "JSON conversion: #{json_output.length} characters"
+# Convert CSV to TSV
+csv_document = UniversalDocumentProcessor::Document.new('data.csv')
+tsv_output = csv_document.to_tsv
+puts "TSV conversion: #{tsv_output.length} characters"
+# Statistical analysis
+stats = document.extract_statistics
+sheet_stats = stats['Sheet1']
+puts "Total cells: #{sheet_stats[:total_cells]}"
+puts "Numeric cells: #{sheet_stats[:numeric_cells]}"
+puts "Text cells: #{sheet_stats[:text_cells]}"
+puts "Average value: #{sheet_stats[:average_value]}"
+# Data validation
+validation = document.validate_data
+sheet_validation = validation['Sheet1']
+puts "Data quality score: #{sheet_validation[:data_quality_score]}%"
+puts "Empty rows: #{sheet_validation[:empty_rows]}"
+puts "Duplicate rows: #{sheet_validation[:duplicate_rows]}"
+```
 ### Processing Word Documents
 ```ruby

data/lib/universal_document_processor/document.rb CHANGED

@@ -48,6 +48,42 @@ module UniversalDocumentProcessor
       []
     end
+    def extract_statistics
+      processor.respond_to?(:extract_statistics) ? processor.extract_statistics : {}
+    rescue => e
+      {}
+    end
+    def validate_data
+      processor.respond_to?(:validate_data) ? processor.validate_data : {}
+    rescue => e
+      {}
+    end
+    def extract_formulas
+      processor.respond_to?(:extract_formulas) ? processor.extract_formulas : []
+    rescue => e
+      []
+    end
+    def to_json
+      processor.respond_to?(:to_json) ? processor.to_json : process.to_json
+    rescue => e
+      process.to_json
+    end
+    def to_csv(sheet_name = nil)
+      processor.respond_to?(:to_csv) ? processor.to_csv(sheet_name) : ""
+    rescue => e
+      ""
+    end
+    def to_tsv(sheet_name = nil)
+      processor.respond_to?(:to_tsv) ? processor.to_tsv(sheet_name) : ""
+    rescue => e
+      ""
+    end
     def convert_to(target_format)
       case target_format.to_sym
       when :pdf
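
Note on the hunk above: each new `Document` method delegates to the processor only if it responds to the method, and falls back to an empty value both when the method is missing and when it raises. A minimal sketch of that pattern follows. All class names here (`DocumentSketch` and the three processors) are hypothetical stand-ins, not the gem's classes.

```ruby
# Hypothetical processor that supports statistics.
class SpreadsheetLikeProcessor
  def extract_statistics
    { "Sheet1" => { total_cells: 4 } }
  end
end

# Hypothetical processor without the optional method.
class PlainProcessor; end

# Hypothetical processor whose implementation fails at runtime.
class BrokenProcessor
  def extract_statistics
    raise IOError, "unreadable file"
  end
end

class DocumentSketch
  def initialize(processor)
    @processor = processor
  end

  # Delegate when supported; return {} when unsupported or on any StandardError.
  def extract_statistics
    @processor.respond_to?(:extract_statistics) ? @processor.extract_statistics : {}
  rescue StandardError
    {}
  end
end

p DocumentSketch.new(SpreadsheetLikeProcessor.new).extract_statistics
p DocumentSketch.new(PlainProcessor.new).extract_statistics
p DocumentSketch.new(BrokenProcessor.new).extract_statistics
```

One consequence of this design is that a caller cannot tell "not supported for this format" apart from "processing failed", since both yield the same empty value.
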
@@ -64,7 +100,7 @@ module UniversalDocumentProcessor
     end
     def supported_formats
-      %w[pdf docx doc xlsx xls pptx ppt txt rtf html xml csv jpg jpeg png gif bmp tiff zip rar 7z]
+      %w[pdf docx doc xlsx xls pptx ppt txt rtf html xml csv tsv jpg jpeg png gif bmp tiff zip rar 7z]
     end
     def supported?
@@ -139,11 +175,11 @@ module UniversalDocumentProcessor
       case @content_type
       when /pdf/
         Processors::PdfProcessor.new(@file_path, @options)
-      when /word/, /document/
+      when /wordprocessingml/, /msword/
         Processors::WordProcessor.new(@file_path, @options)
-      when /excel/, /spreadsheet/
+      when /spreadsheetml/, /ms-excel/, /csv/, /tab-separated/
         Processors::ExcelProcessor.new(@file_path, @options)
-      when /powerpoint/, /presentation/
+      when /presentationml/, /ms-powerpoint/
         Processors::PowerpointProcessor.new(@file_path, @options)
       when /image/
         Processors::ImageProcessor.new(@file_path, @options)
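
Note on the hunk above: the narrower patterns matter because the standard Office Open XML MIME types all contain the substring `officedocument`, so the old `/document/` branch would match XLSX and PPTX files before the spreadsheet and presentation branches were reached. The sketch below reproduces both `case` orderings with symbols in place of processor classes. The `route_old` and `route_new` helpers and the `:other` fallback are hypothetical, and it assumes `@content_type` holds standard MIME strings like these.

```ruby
XLSX_MIME = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
DOCX_MIME = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
PPTX_MIME = "application/vnd.openxmlformats-officedocument.presentationml.presentation"

# Branch order and patterns as in 1.0.1 (the removed lines of the hunk).
def route_old(content_type)
  case content_type
  when /pdf/ then :pdf
  when /word/, /document/ then :word
  when /excel/, /spreadsheet/ then :excel
  when /powerpoint/, /presentation/ then :powerpoint
  when /image/ then :image
  else :other # assumption: the real fallback is not shown in the diff
  end
end

# Branch order and patterns as in 1.0.2 (the added lines of the hunk).
def route_new(content_type)
  case content_type
  when /pdf/ then :pdf
  when /wordprocessingml/, /msword/ then :word
  when /spreadsheetml/, /ms-excel/, /csv/, /tab-separated/ then :excel
  when /presentationml/, /ms-powerpoint/ then :powerpoint
  when /image/ then :image
  else :other # assumption: the real fallback is not shown in the diff
  end
end

[XLSX_MIME, DOCX_MIME, PPTX_MIME, "text/csv", "text/tab-separated-values"].each do |mime|
  puts "#{mime}: old=#{route_old(mime)} new=#{route_new(mime)}"
end
```

Under these assumptions the old patterns send XLSX and PPTX to the Word branch and leave `text/csv` unmatched, while the new patterns route all three spreadsheet-like types to the Excel branch.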