fractor 0.1.4 → 0.1.7
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/.rubocop-https---raw-githubusercontent-com-riboseinc-oss-guides-main-ci-rubocop-yml +552 -0
- data/.rubocop.yml +14 -8
- data/.rubocop_todo.yml +284 -43
- data/README.adoc +111 -950
- data/docs/.lycheeignore +16 -0
- data/docs/Gemfile +24 -0
- data/docs/README.md +157 -0
- data/docs/_config.yml +151 -0
- data/docs/_features/error-handling.adoc +1192 -0
- data/docs/_features/index.adoc +80 -0
- data/docs/_features/monitoring.adoc +589 -0
- data/docs/_features/signal-handling.adoc +202 -0
- data/docs/_features/workflows.adoc +1235 -0
- data/docs/_guides/continuous-mode.adoc +736 -0
- data/docs/_guides/cookbook.adoc +1133 -0
- data/docs/_guides/index.adoc +55 -0
- data/docs/_guides/pipeline-mode.adoc +730 -0
- data/docs/_guides/troubleshooting.adoc +358 -0
- data/docs/_pages/architecture.adoc +1390 -0
- data/docs/_pages/core-concepts.adoc +1392 -0
- data/docs/_pages/design-principles.adoc +862 -0
- data/docs/_pages/getting-started.adoc +290 -0
- data/docs/_pages/installation.adoc +143 -0
- data/docs/_reference/api.adoc +1080 -0
- data/docs/_reference/error-reporting.adoc +670 -0
- data/docs/_reference/examples.adoc +181 -0
- data/docs/_reference/index.adoc +96 -0
- data/docs/_reference/troubleshooting.adoc +862 -0
- data/docs/_tutorials/complex-workflows.adoc +1022 -0
- data/docs/_tutorials/data-processing-pipeline.adoc +740 -0
- data/docs/_tutorials/first-application.adoc +384 -0
- data/docs/_tutorials/index.adoc +48 -0
- data/docs/_tutorials/long-running-services.adoc +931 -0
- data/docs/assets/images/favicon-16.png +0 -0
- data/docs/assets/images/favicon-32.png +0 -0
- data/docs/assets/images/favicon-48.png +0 -0
- data/docs/assets/images/favicon.ico +0 -0
- data/docs/assets/images/favicon.png +0 -0
- data/docs/assets/images/favicon.svg +45 -0
- data/docs/assets/images/fractor-icon.svg +49 -0
- data/docs/assets/images/fractor-logo.svg +61 -0
- data/docs/index.adoc +131 -0
- data/docs/lychee.toml +39 -0
- data/examples/api_aggregator/README.adoc +627 -0
- data/examples/api_aggregator/api_aggregator.rb +376 -0
- data/examples/auto_detection/README.adoc +407 -29
- data/examples/auto_detection/auto_detection.rb +9 -9
- data/examples/continuous_chat_common/message_protocol.rb +53 -0
- data/examples/continuous_chat_fractor/README.adoc +217 -0
- data/examples/continuous_chat_fractor/chat_client.rb +303 -0
- data/examples/continuous_chat_fractor/chat_common.rb +83 -0
- data/examples/continuous_chat_fractor/chat_server.rb +167 -0
- data/examples/continuous_chat_fractor/simulate.rb +345 -0
- data/examples/continuous_chat_server/README.adoc +135 -0
- data/examples/continuous_chat_server/chat_client.rb +303 -0
- data/examples/continuous_chat_server/chat_server.rb +359 -0
- data/examples/continuous_chat_server/simulate.rb +343 -0
- data/examples/error_reporting.rb +207 -0
- data/examples/file_processor/README.adoc +170 -0
- data/examples/file_processor/file_processor.rb +615 -0
- data/examples/file_processor/sample_files/invalid.csv +1 -0
- data/examples/file_processor/sample_files/orders.xml +24 -0
- data/examples/file_processor/sample_files/products.json +23 -0
- data/examples/file_processor/sample_files/users.csv +6 -0
- data/examples/hierarchical_hasher/README.adoc +629 -41
- data/examples/hierarchical_hasher/hierarchical_hasher.rb +12 -8
- data/examples/image_processor/README.adoc +610 -0
- data/examples/image_processor/image_processor.rb +349 -0
- data/examples/image_processor/processed_images/sample_10_processed.jpg.json +12 -0
- data/examples/image_processor/processed_images/sample_1_processed.jpg.json +12 -0
- data/examples/image_processor/processed_images/sample_2_processed.jpg.json +12 -0
- data/examples/image_processor/processed_images/sample_3_processed.jpg.json +12 -0
- data/examples/image_processor/processed_images/sample_4_processed.jpg.json +12 -0
- data/examples/image_processor/processed_images/sample_5_processed.jpg.json +12 -0
- data/examples/image_processor/processed_images/sample_6_processed.jpg.json +12 -0
- data/examples/image_processor/processed_images/sample_7_processed.jpg.json +12 -0
- data/examples/image_processor/processed_images/sample_8_processed.jpg.json +12 -0
- data/examples/image_processor/processed_images/sample_9_processed.jpg.json +12 -0
- data/examples/image_processor/test_images/sample_1.png +1 -0
- data/examples/image_processor/test_images/sample_10.png +1 -0
- data/examples/image_processor/test_images/sample_2.png +1 -0
- data/examples/image_processor/test_images/sample_3.png +1 -0
- data/examples/image_processor/test_images/sample_4.png +1 -0
- data/examples/image_processor/test_images/sample_5.png +1 -0
- data/examples/image_processor/test_images/sample_6.png +1 -0
- data/examples/image_processor/test_images/sample_7.png +1 -0
- data/examples/image_processor/test_images/sample_8.png +1 -0
- data/examples/image_processor/test_images/sample_9.png +1 -0
- data/examples/log_analyzer/README.adoc +662 -0
- data/examples/log_analyzer/log_analyzer.rb +579 -0
- data/examples/log_analyzer/sample_logs/apache.log +20 -0
- data/examples/log_analyzer/sample_logs/json.log +15 -0
- data/examples/log_analyzer/sample_logs/nginx.log +15 -0
- data/examples/log_analyzer/sample_logs/rails.log +29 -0
- data/examples/multi_work_type/README.adoc +576 -26
- data/examples/multi_work_type/multi_work_type.rb +30 -29
- data/examples/performance_monitoring.rb +120 -0
- data/examples/pipeline_processing/README.adoc +740 -26
- data/examples/pipeline_processing/pipeline_processing.rb +16 -16
- data/examples/priority_work_example.rb +155 -0
- data/examples/producer_subscriber/README.adoc +889 -46
- data/examples/producer_subscriber/producer_subscriber.rb +20 -16
- data/examples/scatter_gather/README.adoc +829 -27
- data/examples/scatter_gather/scatter_gather.rb +29 -28
- data/examples/simple/README.adoc +347 -0
- data/examples/simple/sample.rb +5 -5
- data/examples/specialized_workers/README.adoc +622 -26
- data/examples/specialized_workers/specialized_workers.rb +88 -45
- data/examples/stream_processor/README.adoc +206 -0
- data/examples/stream_processor/stream_processor.rb +284 -0
- data/examples/web_scraper/README.adoc +625 -0
- data/examples/web_scraper/web_scraper.rb +285 -0
- data/examples/workflow/README.adoc +406 -0
- data/examples/workflow/circuit_breaker/README.adoc +360 -0
- data/examples/workflow/circuit_breaker/circuit_breaker_workflow.rb +225 -0
- data/examples/workflow/conditional/README.adoc +483 -0
- data/examples/workflow/conditional/conditional_workflow.rb +215 -0
- data/examples/workflow/dead_letter_queue/README.adoc +374 -0
- data/examples/workflow/dead_letter_queue/dead_letter_queue_workflow.rb +217 -0
- data/examples/workflow/fan_out/README.adoc +381 -0
- data/examples/workflow/fan_out/fan_out_workflow.rb +202 -0
- data/examples/workflow/retry/README.adoc +248 -0
- data/examples/workflow/retry/retry_workflow.rb +195 -0
- data/examples/workflow/simple_linear/README.adoc +267 -0
- data/examples/workflow/simple_linear/simple_linear_workflow.rb +175 -0
- data/examples/workflow/simplified/README.adoc +329 -0
- data/examples/workflow/simplified/simplified_workflow.rb +222 -0
- data/exe/fractor +10 -0
- data/lib/fractor/cli.rb +288 -0
- data/lib/fractor/configuration.rb +307 -0
- data/lib/fractor/continuous_server.rb +183 -0
- data/lib/fractor/error_formatter.rb +72 -0
- data/lib/fractor/error_report_generator.rb +152 -0
- data/lib/fractor/error_reporter.rb +244 -0
- data/lib/fractor/error_statistics.rb +147 -0
- data/lib/fractor/execution_tracer.rb +162 -0
- data/lib/fractor/logger.rb +230 -0
- data/lib/fractor/main_loop_handler.rb +406 -0
- data/lib/fractor/main_loop_handler3.rb +135 -0
- data/lib/fractor/main_loop_handler4.rb +299 -0
- data/lib/fractor/performance_metrics_collector.rb +181 -0
- data/lib/fractor/performance_monitor.rb +215 -0
- data/lib/fractor/performance_report_generator.rb +202 -0
- data/lib/fractor/priority_work.rb +93 -0
- data/lib/fractor/priority_work_queue.rb +189 -0
- data/lib/fractor/result_aggregator.rb +33 -1
- data/lib/fractor/shutdown_handler.rb +168 -0
- data/lib/fractor/signal_handler.rb +80 -0
- data/lib/fractor/supervisor.rb +430 -144
- data/lib/fractor/supervisor_logger.rb +88 -0
- data/lib/fractor/version.rb +1 -1
- data/lib/fractor/work.rb +12 -0
- data/lib/fractor/work_distribution_manager.rb +151 -0
- data/lib/fractor/work_queue.rb +88 -0
- data/lib/fractor/work_result.rb +181 -9
- data/lib/fractor/worker.rb +75 -1
- data/lib/fractor/workflow/builder.rb +210 -0
- data/lib/fractor/workflow/chain_builder.rb +169 -0
- data/lib/fractor/workflow/circuit_breaker.rb +183 -0
- data/lib/fractor/workflow/circuit_breaker_orchestrator.rb +208 -0
- data/lib/fractor/workflow/circuit_breaker_registry.rb +112 -0
- data/lib/fractor/workflow/dead_letter_queue.rb +334 -0
- data/lib/fractor/workflow/execution_hooks.rb +39 -0
- data/lib/fractor/workflow/execution_strategy.rb +225 -0
- data/lib/fractor/workflow/execution_trace.rb +134 -0
- data/lib/fractor/workflow/helpers.rb +191 -0
- data/lib/fractor/workflow/job.rb +290 -0
- data/lib/fractor/workflow/job_dependency_validator.rb +120 -0
- data/lib/fractor/workflow/logger.rb +110 -0
- data/lib/fractor/workflow/pre_execution_context.rb +193 -0
- data/lib/fractor/workflow/retry_config.rb +156 -0
- data/lib/fractor/workflow/retry_orchestrator.rb +184 -0
- data/lib/fractor/workflow/retry_strategy.rb +93 -0
- data/lib/fractor/workflow/structured_logger.rb +30 -0
- data/lib/fractor/workflow/type_compatibility_validator.rb +222 -0
- data/lib/fractor/workflow/visualizer.rb +211 -0
- data/lib/fractor/workflow/workflow_context.rb +132 -0
- data/lib/fractor/workflow/workflow_executor.rb +669 -0
- data/lib/fractor/workflow/workflow_result.rb +55 -0
- data/lib/fractor/workflow/workflow_validator.rb +295 -0
- data/lib/fractor/workflow.rb +333 -0
- data/lib/fractor/wrapped_ractor.rb +66 -91
- data/lib/fractor/wrapped_ractor3.rb +161 -0
- data/lib/fractor/wrapped_ractor4.rb +242 -0
- data/lib/fractor.rb +93 -3
- metadata +192 -6
- data/tests/sample.rb.bak +0 -309
- data/tests/sample_working.rb.bak +0 -209
data/examples/web_scraper/web_scraper.rb
@@ -0,0 +1,285 @@
+#!/usr/bin/env ruby
+# frozen_string_literal: true
+
+require_relative "../../lib/fractor"
+require "net/http"
+require "uri"
+require "json"
+require "time"
+require "fileutils"
+
+module WebScraper
+  # Represents a URL to be scraped
+  class ScrapeWork < Fractor::Work
+    def initialize(url, attempt: 1)
+      super({ url: url, attempt: attempt })
+    end
+
+    def url
+      input[:url]
+    end
+
+    def attempt
+      input[:attempt]
+    end
+
+    def to_s
+      "ScrapeWork(url: #{url}, attempt: #{attempt})"
+    end
+  end
+
+  # Worker that scrapes URLs with rate limiting and retry logic
+  class WebScraperWorker < Fractor::Worker
+    MAX_RETRIES = 3
+    RETRY_DELAYS = [1, 2, 4].freeze # Exponential backoff in seconds
+    RATE_LIMIT_DELAY = 0.5 # 500ms between requests per domain
+
+    attr_reader :worker_id
+
+    def initialize
+      super()
+      @output_dir = "scraped_data"
+      @last_request_time = {}
+      @request_count = 0
+      @worker_id = object_id.to_s
+      FileUtils.mkdir_p(@output_dir)
+    end
+
+    def process(work)
+      return nil unless work.is_a?(ScrapeWork)
+
+      url = work.url
+      attempt = work.attempt
+
+      begin
+        # Rate limiting: ensure minimum delay between requests
+        enforce_rate_limit(url)
+
+        # Fetch the URL
+        puts "[Worker #{worker_id}] Scraping #{url} (attempt #{attempt}/#{MAX_RETRIES})"
+        response = fetch_url(url)
+
+        # Parse and save the data
+        data = parse_response(response, url)
+        save_data(url, data)
+
+        @request_count += 1
+        puts "[Worker #{worker_id}] ✓ Successfully scraped #{url}"
+
+        Fractor::WorkResult.new(
+          result: { url: url, status: "success", size: data[:content].length },
+          work: work
+        )
+      rescue StandardError => e
+        handle_error(work, e)
+      end
+    end
+
+    private
+
+    def enforce_rate_limit(url)
+      domain = extract_domain(url)
+      last_time = @last_request_time[domain]
+
+      if last_time
+        elapsed = Time.now - last_time
+        if elapsed < RATE_LIMIT_DELAY
+          sleep_time = RATE_LIMIT_DELAY - elapsed
+          puts "[Worker #{worker_id}] Rate limiting: sleeping #{sleep_time.round(2)}s for #{domain}"
+          sleep(sleep_time)
+        end
+      end
+
+      @last_request_time[domain] = Time.now
+    end
+
+    def fetch_url(url)
+      uri = URI.parse(url)
+      http = Net::HTTP.new(uri.host, uri.port)
+      http.use_ssl = (uri.scheme == "https")
+      http.open_timeout = 10
+      http.read_timeout = 10
+
+      request = Net::HTTP::Get.new(uri.request_uri)
+      request["User-Agent"] = "Fractor Web Scraper Example/1.0"
+
+      response = http.request(request)
+
+      unless response.is_a?(Net::HTTPSuccess)
+        raise "HTTP Error: #{response.code} #{response.message}"
+      end
+
+      response
+    end
+
+    def parse_response(response, url)
+      content = response.body
+      content_type = response["content-type"] || "unknown"
+
+      {
+        url: url,
+        content: content,
+        content_type: content_type,
+        size: content.length,
+        timestamp: Time.now.iso8601,
+        headers: response.to_hash
+      }
+    end
+
+    def save_data(url, data)
+      filename = generate_filename(url)
+      filepath = File.join(@output_dir, filename)
+
+      File.write("#{filepath}.json", JSON.pretty_generate(data))
+      File.write("#{filepath}.html", data[:content])
+
+      puts "[Worker #{worker_id}] Saved to #{filepath}"
+    end
+
+    def generate_filename(url)
+      uri = URI.parse(url)
+      sanitized = "#{uri.host}#{uri.path}".gsub(/[^a-zA-Z0-9_-]/, "_")
+      timestamp = Time.now.strftime("%Y%m%d_%H%M%S")
+      "#{sanitized}_#{timestamp}"
+    end
+
+    def extract_domain(url)
+      URI.parse(url).host
+    rescue StandardError
+      "unknown"
+    end
+
+    def handle_error(work, error)
+      puts "[Worker #{worker_id}] ✗ Error scraping #{work.url}: #{error.message}"
+
+      # Return error with context about retry potential
+      Fractor::WorkResult.new(
+        error: error,
+        work: work,
+        error_context: {
+          url: work.url,
+          attempt: work.attempt,
+          max_retries: MAX_RETRIES,
+          retriable: work.attempt < MAX_RETRIES
+        }
+      )
+    end
+  end
+
+  # Progress tracker for monitoring scraping progress
+  class ProgressTracker
+    def initialize(total_urls)
+      @total_urls = total_urls
+      @completed = 0
+      @successful = 0
+      @failed = 0
+      @start_time = Time.now
+      @mutex = Mutex.new
+    end
+
+    def update(result)
+      @mutex.synchronize do
+        @completed += 1
+        if result.success?
+          @successful += 1
+        else
+          @failed += 1
+        end
+
+        print_progress
+      end
+    end
+
+    def print_progress
+      percentage = (@completed.to_f / @total_urls * 100).round(1)
+      elapsed = Time.now - @start_time
+      rate = @completed / elapsed
+
+      puts "\n" + "=" * 60
+      puts "Progress: #{@completed}/#{@total_urls} (#{percentage}%)"
+      puts "Successful: #{@successful} | Failed: #{@failed}"
+      puts "Elapsed: #{elapsed.round(1)}s | Rate: #{rate.round(2)} URLs/s"
+      puts "=" * 60 + "\n"
+    end
+
+    def summary
+      elapsed = Time.now - @start_time
+
+      puts "\n" + "=" * 60
+      puts "SCRAPING COMPLETE"
+      puts "=" * 60
+      puts "Total URLs: #{@total_urls}"
+      puts "Successful: #{@successful}"
+      puts "Failed: #{@failed}"
+      puts "Total time: #{elapsed.round(2)}s"
+      puts "Average rate: #{(@total_urls / elapsed).round(2)} URLs/s"
+      puts "=" * 60 + "\n"
+    end
+  end
+
+  # Main execution
+  if __FILE__ == $PROGRAM_NAME
+    # Example URLs to scrape (using httpbin.org for testing)
+    urls = [
+      "https://httpbin.org/html",
+      "https://httpbin.org/json",
+      "https://httpbin.org/xml",
+      "https://httpbin.org/robots.txt",
+      "https://httpbin.org/deny", # Will return 403 to test error handling
+      "https://httpbin.org/status/500", # Will return 500 to test retries
+      "https://httpbin.org/delay/2", # Slow response
+      "https://httpbin.org/user-agent",
+      "https://httpbin.org/headers",
+      "https://httpbin.org/ip"
+    ]
+
+    puts "Starting Web Scraper Example"
+    puts "URLs to scrape: #{urls.length}"
+    puts "Workers: 3"
+    puts "Rate limit: 500ms between requests per domain"
+    puts "Max retries: 3 with exponential backoff"
+    puts "\n"
+
+    # Clear output from any previous run
+    output_dir = "scraped_data"
+    FileUtils.rm_rf(output_dir) if File.exist?(output_dir)
+
+    # Create progress tracker
+    tracker = WebScraper::ProgressTracker.new(urls.length)
+
+    # Create supervisor with 3 workers
+    supervisor = Fractor::Supervisor.new(
+      worker_pools: [
+        { worker_class: WebScraper::WebScraperWorker, num_workers: 3 }
+      ]
+    )
+
+    # Submit all URLs
+    work_items = urls.map { |url| WebScraper::ScrapeWork.new(url) }
+    supervisor.add_work_items(work_items)
+
+    # Start the supervisor
+    supervisor.run
+
+    # Collect results and update tracker
+    results = supervisor.results
+    (results.results + results.errors).each do |result|
+      tracker.update(result)
+    end
+
+    # Print summary
+    tracker.summary
+
+    # Print details of failures
+    failures = results.errors
+    if failures.any?
+      puts "\nFailed URLs:"
+      failures.each do |result|
+        puts "  - #{result.error_context[:url]}: #{result.error.message}"
+      end
+    end
+
+    puts "\nData saved to: #{output_dir}/"
+  end
+end
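Note that `RETRY_DELAYS` is declared in the scraper but the resubmission loop is not shown in this file; the `attempt` counter carried by `ScrapeWork` is what would drive it. A minimal sketch of how a delay table plus an attempt counter typically yield exponential backoff — illustrative only, `backoff_for` is a hypothetical helper and not part of the Fractor API:

```ruby
MAX_RETRIES = 3
RETRY_DELAYS = [1, 2, 4].freeze # seconds

# Given a failed attempt number (1-based), return how long to wait
# before the next attempt, or nil when retries are exhausted.
def backoff_for(attempt)
  return nil if attempt >= MAX_RETRIES

  # Fall back to the last delay if attempts outnumber table entries.
  RETRY_DELAYS[attempt - 1] || RETRY_DELAYS.last
end

p backoff_for(1) # => 1   (sleep 1s, resubmit as attempt 2)
p backoff_for(2) # => 2   (sleep 2s, resubmit as attempt 3)
p backoff_for(3) # => nil (give up, report the failure)
```

A caller would sleep for the returned delay and resubmit a new `ScrapeWork` with `attempt + 1`, stopping once `nil` comes back.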
data/examples/workflow/README.adoc
@@ -0,0 +1,406 @@
+= Workflow Examples
+
+This directory contains examples demonstrating Fractor's workflow system, which provides a GitHub Actions-style declarative DSL for orchestrating complex parallel processing pipelines.
+
+== Overview
+
+The Fractor workflow system allows you to:
+
+* Define workflows with a declarative DSL
+* Specify job dependencies and execution order
+* Enforce type-safe data flow between jobs
+* Automatically parallelize independent jobs
+* Build fan-out and fan-in patterns
+* Execute jobs conditionally
+* Run in both pipeline and continuous modes
+
+== Quick Start
+
+[source,ruby]
+----
+# Define data models
+class InputData
+  attr_accessor :text
+
+  def initialize(text:)
+    @text = text
+  end
+end
+
+class OutputData
+  attr_accessor :result
+
+  def initialize(result:)
+    @result = result
+  end
+end
+
+# Define workers with input/output types
+class ProcessWorker < Fractor::Worker
+  input_type InputData
+  output_type OutputData
+
+  def process(work)
+    output = OutputData.new(result: work.input.text.upcase)
+    Fractor::WorkResult.new(result: output, work: work)
+  end
+end
+
+# Define workflow
+class MyWorkflow < Fractor::Workflow
+  workflow "my-workflow" do
+    input_type InputData
+    output_type OutputData
+
+    start_with "process"
+    end_with "process"
+
+    job "process" do
+      runs_with ProcessWorker
+      inputs_from_workflow
+      outputs_to_workflow
+      terminates_workflow
+    end
+  end
+end
+
+# Execute
+input = InputData.new(text: "hello")
+workflow = MyWorkflow.new
+result = workflow.execute(input: input)
+puts result.output.result # => "HELLO"
+----
+
+== Examples
+
+Fractor provides three comprehensive workflow examples demonstrating different patterns:
+
+=== link:simple_linear/README.adoc[Simple Linear Workflow]
+
+Demonstrates sequential job processing with data transformation at each stage.
+
+*Location:* `examples/workflow/simple_linear/`
+
+*Key Concepts:*
+
+* Sequential dependencies using `needs`
+* Type-safe data flow with `input_type` and `output_type`
+* Workflow entry and exit points (`start_with` / `end_with`)
+* Job output mapping with `inputs_from_job`
+
+*Run:*
+[source,shell]
+----
+ruby examples/workflow/simple_linear/simple_linear_workflow.rb
+----
+
+=== link:fan_out/README.adoc[Fan-Out Workflow]
+
+Demonstrates parallel processing patterns with fan-out and fan-in aggregation.
+
+*Location:* `examples/workflow/fan_out/`
+
+*Key Concepts:*
+
+* Fan-out pattern: one job feeding multiple parallel jobs
+* Fan-in pattern: multiple jobs aggregating into one job
+* Multiple input aggregation with `inputs_from_multiple`
+* Input mapping syntax for aggregating job outputs
+
+*Run:*
+[source,shell]
+----
+ruby examples/workflow/fan_out/fan_out_workflow.rb
+----
+
+=== link:conditional/README.adoc[Conditional Workflow]
+
+Demonstrates runtime conditional execution based on data validation.
+
+*Location:* `examples/workflow/conditional/`
+
+*Key Concepts:*
+
+* Conditional job execution using `if_condition`
+* Multiple termination points with `terminates_workflow`
+* Runtime decision making with context access
+* Branching logic based on validation results
+
+*Run:*
+[source,shell]
+----
+ruby examples/workflow/conditional/conditional_workflow.rb
+----
+
+== Core Concepts
+
+=== Jobs
+
+Jobs are the basic units of work in a workflow. Each job:
+
+* Runs a specific Worker class
+* Declares its dependencies
+* Maps inputs from previous jobs or workflow input
+* Produces typed output
+
+[source,ruby]
+----
+job "my-job" do
+  needs "previous-job"           # Dependencies
+  runs_with MyWorker             # Worker class
+  parallel_workers 4             # Parallel execution
+  inputs_from_job "previous-job" # Input mapping
+  outputs_to_workflow            # Output to workflow
+end
+----
+
+=== Data Flow
+
+Data flows through the workflow via typed models:
+
+[source,ruby]
+----
+# Job A output
+class AOutput
+  attr_accessor :data
+end
+
+# Job B input (can use A's output)
+class BInput
+  attr_accessor :data
+end
+
+job "job-a" do
+  runs_with WorkerA # output_type AOutput
+end
+
+job "job-b" do
+  needs "job-a"
+  runs_with WorkerB # input_type BInput
+
+  # Map AOutput.data → BInput.data
+  inputs_from_job "job-a", select: {
+    data: :data
+  }
+end
+----
+
+=== Fan-Out Pattern
+
+One job's output feeds multiple parallel jobs:
+
+[source,ruby]
+----
+job "extract" do
+  runs_with ExtractWorker
+end
+
+# These three run in parallel, all using extract's output
+job "validate" do
+  needs "extract"
+  runs_with ValidateWorker
+  inputs_from_job "extract"
+end
+
+job "analyze" do
+  needs "extract"
+  runs_with AnalyzeWorker
+  inputs_from_job "extract"
+end
+
+job "stats" do
+  needs "extract"
+  runs_with StatsWorker
+  inputs_from_job "extract"
+end
+----
+
+=== Fan-In Pattern
+
+Multiple jobs feed one aggregator job:
+
+[source,ruby]
+----
+job "aggregate" do
+  needs ["validate", "analyze", "stats"]
+  runs_with AggregateWorker
+
+  # Combine outputs from multiple jobs
+  inputs_from_multiple(
+    "validate" => { validated: :data },
+    "analyze" => { analysis: :results },
+    "stats" => { statistics: :summary }
+  )
+end
+----
+
+=== Conditional Execution
+
+Jobs can execute conditionally:
+
+[source,ruby]
+----
+job "optional-job" do
+  needs "check"
+  runs_with OptionalWorker
+
+  # Only run if the condition is met
+  if_condition ->(ctx) {
+    ctx.job_output("check").should_process
+  }
+end
+----
+
+== Best Practices
+
+=== Define Clear Data Models
+
+Use separate classes for each job's input and output:
+
+[source,ruby]
+----
+# Good: clear, type-safe models
+class ExtractInput
+  attr_accessor :source_url, :batch_size
+end
+
+class ExtractOutput
+  attr_accessor :raw_data, :record_count, :metadata
+end
+
+# Better: use Lutaml::Model for validation
+class ExtractInput < Lutaml::Model::Serializable
+  attribute :source_url, :string
+  attribute :batch_size, :integer
+
+  validates :source_url, presence: true
+  validates :batch_size, numericality: { greater_than: 0 }
+end
+----
+
+=== Keep Jobs Focused
+
+Each job should have a single responsibility:
+
+[source,ruby]
+----
+# Good: focused jobs
+job "extract" do
+  runs_with ExtractWorker # Only extracts
+end
+
+job "validate" do
+  needs "extract"
+  runs_with ValidateWorker # Only validates
+end
+
+# Avoid: jobs that do too much
+job "extract-and-validate" do # Too many responsibilities
+  runs_with ExtractAndValidateWorker
+end
+----
+
+=== Use Descriptive Names
+
+Job names should clearly indicate their purpose:
+
+[source,ruby]
+----
+# Good
+job "extract-from-api"
+job "validate-schema"
+job "transform-data"
+job "load-to-database"
+
+# Avoid
+job "job1"
+job "process"
+job "do-stuff"
+----
+
+=== Leverage Parallelization
+
+Specify worker counts for CPU-intensive jobs:
+
+[source,ruby]
+----
+job "heavy-computation" do
+  runs_with ComputeWorker
+  parallel_workers 8 # Use 8 parallel workers
+end
+----
+
+=== Handle Errors Gracefully
+
+Workers should return error results rather than raising exceptions:
+
+[source,ruby]
+----
+class MyWorker < Fractor::Worker
+  def process(work)
+    if work.input.invalid?
+      return Fractor::WorkResult.new(
+        error: "Invalid input",
+        work: work
+      )
+    end
+
+    # Normal processing...
+    Fractor::WorkResult.new(result: output, work: work)
+  rescue StandardError => e
+    Fractor::WorkResult.new(
+      error: "Unexpected error: #{e.message}",
+      work: work
+    )
+  end
+end
+----
+
+== Architecture
+
+The workflow system builds on Fractor's existing components:
+
+[source]
+----
+Workflow (DSL)
+      ↓
+Jobs (Dependencies)
+      ↓
+Workers (Processing)
+      ↓
+Supervisor (Execution)
+      ↓
+Ractors (Parallelism)
+----
+
+Key components:
+
+* `Fractor::Workflow` - Workflow definition and DSL
+* `Fractor::Workflow::Job` - Job configuration
+* `Fractor::Workflow::WorkflowExecutor` - Orchestration
+* `Fractor::Workflow::WorkflowContext` - Data flow management
+* `Fractor::Workflow::WorkflowValidator` - Structure validation
+
+== Future Features
+
+Planned enhancements:
+
+* Continuous mode support
+* Pipeline stage grouping
+* Matrix strategies
+* Workflow visualization
+* State persistence
+* Resume from failure
+* Workflow composition
+
+== Contributing
+
+When adding new workflow examples:
+
+1. Keep examples simple and focused on one feature
+2. Include clear comments explaining each part
+3. Provide example output
+4. Document any prerequisites
+5. Update this README
+
+== Support
+
+For questions or issues with workflows:
+
+* Check the existing examples
+* Review the main Fractor documentation
+* Report issues via GitHub
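The README's "automatic parallelization of independent jobs" amounts to scheduling jobs in dependency order: any job whose `needs` are all satisfied can run concurrently with its peers. A minimal sketch of that idea over the fan-out/fan-in graph — illustrative only, not Fractor's actual executor:

```ruby
# Group jobs into "waves": every job in a wave has all of its
# dependencies satisfied by earlier waves, so jobs within a wave
# can run in parallel. Raises if the graph contains a cycle.
def execution_waves(needs)
  done = []
  waves = []
  remaining = needs.keys
  until remaining.empty?
    wave = remaining.select { |job| (needs[job] - done).empty? }
    raise "cycle detected among: #{remaining.join(', ')}" if wave.empty?

    waves << wave.sort
    done.concat(wave)
    remaining -= wave
  end
  waves
end

# The fan-out/fan-in graph from the README above:
needs = {
  "extract"   => [],
  "validate"  => ["extract"],
  "analyze"   => ["extract"],
  "stats"     => ["extract"],
  "aggregate" => ["validate", "analyze", "stats"]
}

p execution_waves(needs)
# => [["extract"], ["analyze", "stats", "validate"], ["aggregate"]]
```

The three middle jobs land in one wave, which is exactly why the fan-out example says they "run in parallel, all using extract's output".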