RubyGems - scout-essentials - Versions diffs - 1.7.1 → 1.8.0 - Mend

scout-essentials 1.7.1 → 1.8.0

Files changed (54)

checksums.yaml +4 -4
data/.vimproject +200 -47
data/README.md +136 -0
data/Rakefile +1 -0
data/VERSION +1 -1
data/doc/Annotation.md +352 -0
data/doc/CMD.md +203 -0
data/doc/ConcurrentStream.md +163 -0
data/doc/IndiferentHash.md +240 -0
data/doc/Log.md +235 -0
data/doc/NamedArray.md +174 -0
data/doc/Open.md +331 -0
data/doc/Path.md +217 -0
data/doc/Persist.md +214 -0
data/doc/Resource.md +229 -0
data/doc/SimpleOPT.md +236 -0
data/doc/TmpFile.md +154 -0
data/lib/scout/annotation/annotated_object.rb +8 -0
data/lib/scout/annotation/annotation_module.rb +1 -0
data/lib/scout/cmd.rb +19 -12
data/lib/scout/concurrent_stream.rb +3 -1
data/lib/scout/config.rb +2 -2
data/lib/scout/indiferent_hash/options.rb +2 -2
data/lib/scout/indiferent_hash.rb +16 -0
data/lib/scout/log/color.rb +5 -3
data/lib/scout/log/fingerprint.rb +8 -8
data/lib/scout/log/progress/report.rb +6 -6
data/lib/scout/log.rb +7 -7
data/lib/scout/misc/digest.rb +11 -13
data/lib/scout/misc/format.rb +2 -2
data/lib/scout/misc/system.rb +5 -0
data/lib/scout/open/final.rb +16 -1
data/lib/scout/open/remote.rb +0 -1
data/lib/scout/open/stream.rb +30 -5
data/lib/scout/open/util.rb +32 -0
data/lib/scout/path/digest.rb +12 -2
data/lib/scout/path/find.rb +19 -6
data/lib/scout/path/util.rb +37 -1
data/lib/scout/persist/open.rb +2 -0
data/lib/scout/persist.rb +7 -1
data/lib/scout/resource/path.rb +2 -2
data/lib/scout/resource/util.rb +18 -4
data/lib/scout/resource.rb +15 -1
data/lib/scout/simple_opt/parse.rb +2 -0
data/lib/scout/tmpfile.rb +1 -1
data/scout-essentials.gemspec +19 -6
data/test/scout/misc/test_hook.rb +2 -2
data/test/scout/open/test_stream.rb +43 -15
data/test/scout/path/test_find.rb +1 -1
data/test/scout/path/test_util.rb +11 -0
data/test/scout/test_path.rb +4 -4
data/test/scout/test_persist.rb +10 -1
metadata +31 -5
data/README.rdoc +0 -18

data/doc/NamedArray.md ADDED

@@ -0,0 +1,174 @@
+# NamedArray
+NamedArray is a small utility mixin built on top of the Annotation system that gives arrays named fields and name-based accessors. It lets you treat an Array like a record/tuple where elements can be accessed by name (symbol or string), supports fuzzy name matching, conversion to a hash (indifferent to string/symbol keys), and provides helpers for zipping/combining lists of named values.
+NamedArray extends Annotation and declares two annotation attributes:
+- fields — an ordered list of names for each position in the array
+- key — an optional primary key field name
+Since NamedArray extends Annotation, you can apply it to an array via NamedArray.setup(array, fields, key: ...), or by extending an instance.
+Examples:
+```ruby
+a = NamedArray.setup([1,2], [:a, :b])
+a[:a]    # => 1
+a["b"]   # => 2
+a.a      # => 1   # method_missing lookup
+a.to_hash  # => IndiferentHash { a: 1, b: 2 }
+```
+## Core instance API
+- fields, key
+  - Provided by Annotation (accessors). fields is an Array of field names associated with array positions.
+- all_fields
+  - Returns [key, fields].compact.flatten — useful if you want the key included with other fields.
+- [](name_or_index)
+  - Accepts a field name (symbol or string) or a numeric index. A name is resolved to a numeric position via identify_name; returns nil if the name cannot be resolved.
+  - Example: a[:a], a["a"], a[0]
+- []=(name_or_index, value)
+  - Sets element by name (resolved to a position) or index; returns nil if name not found.
+- positions(fields)
+  - Resolve one or many fields to their positions (delegates to NamedArray.identify_name).
+- values_at(*positions)
+  - Accepts named fields or positions; it will translate names to indices before calling Array#values_at.
+- concat(other)
+  - If other is a Hash: appends values of the hash in iteration order and adds the hash keys to this array's fields list.
+    Example:
+      a.concat({c: 3, d: 4})  # adds 3 and 4 to array and [:c, :d] to fields
+  - If other is another NamedArray: standard concat and fields from other are appended.
+  - Otherwise behaves like Array#concat.
+- to_hash
+  - Returns a hash mapping fields => value for each field position. The returned hash is extended with IndiferentHash (so both string and symbol lookups work).
+  - Example: a.to_hash[:a] => 1
+- prety_print
+  - Convenience pretty-print wrapper: uses Misc.format_definition_list(self.to_hash, sep: "\n").
+- method_missing(name, *args)
+  - If name resolves to a field (via identify_name) returns self[name]; otherwise calls super. This gives quick accessors like a.foo
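The accessors listed above can be approximated in plain Ruby. The following is a minimal sketch, not the gem's implementation: `SketchNamedArray` and its `position` helper are hypothetical names, only exact name matches are handled, and `to_hash` returns a plain Hash rather than an IndiferentHash.

```ruby
# Hypothetical stand-in for NamedArray, using only the Ruby standard library.
module SketchNamedArray
  attr_accessor :fields

  # Extend an existing array and attach its field names.
  def self.setup(array, fields)
    array.extend(self)
    array.fields = fields
    array
  end

  # Integers pass through; names are matched exactly against fields.
  def position(name)
    return name if name.is_a?(Integer)
    fields.index { |f| f.to_s == name.to_s }
  end

  # Name-or-index lookup; unresolved names give nil.
  def [](name)
    pos = position(name)
    pos.nil? ? nil : super(pos)
  end

  # Map each field name to the value at the same position.
  def to_hash
    fields.each_with_index.to_h { |f, i| [f, self[i]] }
  end

  # Getter-style access (a.foo) for known fields only.
  def method_missing(name, *args)
    pos = position(name)
    pos.nil? ? super : self[pos]
  end

  def respond_to_missing?(name, include_private = false)
    !position(name).nil? || super
  end
end
```

With this sketch, `SketchNamedArray.setup([1, 2], [:a, :b])` answers `a[:a]`, `a["b"]` and `a.a` in the same way as the documented examples, while unknown names still raise NoMethodError through `super`.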
+## Name resolution and matching
+NamedArray provides flexible name resolution via:
+- NamedArray.identify_name(names, selected, strict: false)
+  - names: array of field names (usually the NamedArray#fields)
+  - selected: value to resolve — may be nil, Range, Integer, Symbol, or String
+  - Returns:
+    - Integer index (position) for a single field selection
+    - Range (unchanged) if a Range is passed
+    - 0 for nil (treat nil as first field)
+    - :key for Symbol :key (special sentinel)
+    - nil if unresolved
+Resolution rules:
+- nil => 0
+- Range => returned as-is
+- Integer => returned as-is
+- Symbol:
+  - if :key => returns :key
+  - otherwise finds first field whose to_s equals the symbol name
+- String:
+  - exact string match first
+  - if string is numeric (^\d+$) it is treated as an index
+  - unless strict: fuzzy match using NamedArray.field_match
+    - field_match returns true if:
+      - exact equality
+      - one contains the other inside parentheses
+      - one starts with the other followed by a space
+  - returns the index found or nil if none
+Instance helper identify_name(selected) delegates to the class method using this NamedArray's fields.
+Note: identify_name accepts arrays for selected (returns an array of resolved positions), so values_at and other helpers can pass multiple names.
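The resolution rules above can be written out as a short plain-Ruby sketch. It is reconstructed from the prose in this section rather than from the gem's source, and `sketch_field_match` and `sketch_identify_name` are hypothetical names; the real methods may differ in details such as the order of the checks.

```ruby
# Fuzzy match: equality, "(other)" containment, or "other " prefix.
def sketch_field_match(field, name)
  field = field.to_s
  name = name.to_s
  return true if field == name
  return true if field.include?("(#{name})") || name.include?("(#{field})")
  field.start_with?("#{name} ") || name.start_with?("#{field} ")
end

# Resolve a selection to a position following the documented rules.
def sketch_identify_name(names, selected, strict: false)
  case selected
  when nil then 0                     # nil means the first field
  when Range, Integer then selected   # returned unchanged
  when Symbol
    return :key if selected == :key   # special sentinel
    names.index { |n| n.to_s == selected.to_s }
  when String
    pos = names.index { |n| n.to_s == selected }    # exact match first
    return pos if pos
    return selected.to_i if selected.match?(/\A\d+\z/) # numeric string is an index
    return nil if strict
    names.index { |n| sketch_field_match(n, selected) } # fuzzy fallback
  when Array
    selected.map { |s| sketch_identify_name(names, s, strict: strict) }
  end
end
```

Under these assumptions, `"ValueB"` resolves against `"ValueB (Entity type)"` through the space-prefix rule, and the same lookup with `strict: true` returns nil.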
+## Class-level helpers for lists
+- NamedArray.field_match(field, name)
+  - Helper used by identify_name for fuzzy matching of two strings (parentheses and prefix matching).
+- NamedArray._zip_fields(array, max = nil)
+  - Internal helper to zip together an array of lists, expanding single-element lists to match `max`.
+- NamedArray.zip_fields(array)
+  - Zips a list-of-lists into per-position combined lists. Optimized to slice large inputs when array length is huge.
+  Example:
+  ```ruby
+  NamedArray.zip_fields([ %w(a b), %w(1 1) ]) # => [["a","1"], ["b","1"]]
+  ```
+- NamedArray.add_zipped(source, new)
+  - Given two zipped-lists (source and new), concatenates each corresponding sub-array from `new` into `source` (skips nil entries).
+  - Useful to merge results incrementally.
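As an illustration of the zipping helpers described above, here is a minimal sketch in plain Ruby. The names `sketch_zip_fields` and `sketch_add_zipped` are hypothetical, and the sketch leaves out the slicing optimization for very large inputs.

```ruby
# Zip a list of lists position by position; single-element lists are
# repeated so that they line up with the longest list.
def sketch_zip_fields(lists)
  max = lists.map(&:length).max || 0
  expanded = lists.map { |l| l.length == 1 ? l * max : l }
  (0...max).map { |i| expanded.map { |l| l[i] } }
end

# Append each sub-array of `new` to the matching sub-array of `source`,
# skipping nil entries. Modifies and returns `source`.
def sketch_add_zipped(source, new)
  source.each_with_index do |list, i|
    addition = new[i]
    list.concat(addition) unless addition.nil?
  end
  source
end
```

The second helper mutates `source` in place, which matches the incremental-merge usage described above.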
+## Concatenation with Hash
+Calling concat with a Hash behaves like:
+```ruby
+a = NamedArray.setup([1,2], [:a, :b])
+a.concat({c: 3, d: 4})
+# resulting array becomes [1,2,3,4] and fields => [:a, :b, :c, :d]
+```
+This is handy when building named rows incrementally from keyed data.
+## Integration with Annotation
+Because NamedArray extends Annotation:
+- You can call NamedArray.setup(array, fields) to set the `@fields` annotation on the array and extend it with NamedArray behavior.
+- NamedArray.setup delegates to the Annotation::AnnotationModule.setup implementation for assigning @fields/@key values to the array.
+Example:
+```ruby
+a = NamedArray.setup([1,2], [:a, :b])
+a.fields  # => [:a, :b]
+```
+## Examples (from tests)
+Identify names:
+```ruby
+names = ["ValueA", "ValueB (Entity type)", "15"]
+NamedArray.identify_name(names, "ValueA")    # => 0
+NamedArray.identify_name(names, :key)        # => :key
+NamedArray.identify_name(names, nil)         # => 0
+NamedArray.identify_name(names, "ValueB")    # => 1  (fuzzy match)
+NamedArray.identify_name(names, 1)           # => 1
+```
+Basic named array usage:
+```ruby
+a = NamedArray.setup([1,2], [:a, :b])
+a[:a]      # => 1
+a[:c]      # => nil
+a.a        # => 1   (method_missing provides a getter)
+a.to_hash  # => IndiferentHash { a: 1, b: 2 }
+```
+Zipping and adding zipped:
+```ruby
+NamedArray.zip_fields([ %w(a b), %w(1 1) ]) # => [["a","1"], ["b","1"]]
+a = [%w(a b), %w(1 1)]
+NamedArray.add_zipped(a, [%w(c), %w(1)])
+NamedArray.add_zipped(a, [%w(d), %w(1)])
+# a => [%w(a b c d), %w(1 1 1 1)]
+```
+## Notes & caveats
+- Name matching is intentionally forgiving (parentheses and space-prefix checks). Use `identify_name(..., strict: true)` to force exact matches only.
+- The `fields` annotation must correspond to element positions in the array. If fields and array lengths differ, name resolution may return nil or indices outside current array bounds.
+- to_hash returns an IndiferentHash (so consumers can use either string or symbol keys).
+- method_missing exposes field getters only; it does not create setters (use []= to assign by name).
+NamedArray is small but convenient when you treat Arrays as records or rows with named columns and need flexible lookup and composition tools.

data/doc/Open.md ADDED

@@ -0,0 +1,331 @@
+# Open
+The Open module provides unified, high-level file/stream/remote I/O and filesystem utilities. It wraps plain File I/O, streaming helpers, remote fetching (wget/ssh), atomic/sensible writes, pipe/fifo helpers, gzip/bgzip/zip helpers, file-system operations (mkdir, mv, ln, cp, rm, etc.), and a lock wrapper (Lockfile). Use Open when you need robust file access, streaming, temporary/atomic writes, remote access and process-safe locking.
+Sections:
+- Opening / reading / writing files
+- Streams, pipes and tees
+- Sensible / atomic writes
+- Remote fetching, caching and downloads
+- File / filesystem helpers
+- Locking
+- Sync (rsync)
+- Utilities (gzip/bgzip/grep/sort/collapse)
+- Examples
+- Notes and edge cases
+---
+## Opening / reading / writing files
+Open unifies access and transparently handles compressed and remote files.
+- Open.open(file, options = {}) { |io| ... } yields the stream to the block; without a block it returns an IO-like stream
+  - Accepts File paths, Path objects, IO, StringIO.
+  - Options (via IndiferentHash):
+    - :mode (default 'r') — file open mode (e.g., 'r', 'w', 'rb', etc.)
+    - :grep, :invert_grep, :fixed_grep — pipe the stream through grep
+    - :noz — if true, do not auto-decompress zip/gzip/bgz
+    - :gzip / :bgzip / :zip — force decompression
+  - For compressed files: Open detects .gz, .bgz, .zip and pipes through gzip/bgzip/unzip, unless the mode includes "w", which is treated as if :noz were true.
+  - The returned stream is extended with NamedStream (has .filename and digest_str helper).
+- Open.file_open(file, grep = false, mode = 'r', invert_grep = false, fixed_grep = true, options = {})
+  - Returns the basic stream (no auto-decompress). Uses get_stream to handle remote/ssh/wget or open file.
+- Open.get_stream(file, mode = 'r', options = {})
+  - Low-level stream getter:
+    - If file is already a stream, returns it.
+    - If file responds to .stream, returns file.stream.
+    - If Path, resolves .find.
+    - If remote URL, delegates to Open.ssh or Open.wget.
+    - Finally falls back to File.open(File.expand_path(file), mode).
+- Open.read(file, options = {}) { |line| ... } or returns String
+  - If block given: yields lines from file (fixes UTF-8 by default; suppressed via :nofix).
+  - If no block: returns full file contents (UTF-8 fixed by default).
+  - Supports :grep and :invert_grep (see tests).
+- Open.write(file, content = nil, options = {})
+  - Atomic/robust writing wrapper:
+    - options default includes :mode => 'w'
+    - If mode includes 'w' ensures parent directory exists.
+    - If a block is given, yields a File object opened in mode and ensures close; on exception removes target file.
+    - If content is nil => writes empty file.
+    - If content is String => writes content.
+    - If content is an IO/StringIO => streams content into file with locking.
+    - On success calls Open.notify_write(file) and returns nil.
+  - Use for simple writes; for safer concurrency-sensitive writes prefer Open.sensible_write (below).
+---
+## Streams, pipes and tees
+Open contains multiple helpers to produce and manage streams:
+- Open.consume_stream(io, in_thread = false, into = nil, into_close = true)
+  - Reads `io` in blocks of BLOCK_SIZE and writes them into `into` (file or IO), or discards them if no `into` is given.
+  - If in_thread true, spawns a consumer thread and returns it (thread named and stored).
+  - Handles exceptions: closes and deletes partial files on failure; forwards abort to stream if supported.
+- Open.pipe
+  - Creates an IO.pipe pair [reader, writer] and records writer for management. Returns [sout, sin] (sout reader, sin writer).
+  - Caller typically uses sout as stream to read and sin to write.
+- Open.open_pipe(do_fork = false, close = true) { |sin| ... } -> returns sout
+  - Creates a pipe and runs the block with `sin` (writer) in a new thread, or in a forked child when do_fork is true. The returned `sout` is a ConcurrentStream configured to join/handle threads/pids.
+  - Example: use for producing a stream on-the-fly for other consumers.
+- Open.tee_stream_thread(stream) / Open.tee_stream_thread_multiple(stream, num)
+  - Duplicate input stream into multiple output pipes; returns out pipes.
+  - Uses a splitter thread that reads from the source and writes to each pipe; sets up abort callbacks and cleanup.
+  - Useful to fan-out a single stream to multiple consumers without re-reading source.
+- Open.tee_stream(stream) — convenience returns two outputs.
+- Open.read_stream(stream, size)
+  - Blocking read helper that ensures exactly `size` bytes are read; raises ClosedStream if the stream reaches EOF.
+- Open.with_fifo(path = nil, clean = true) { |path| ... }
+  - Creates a FIFO at a temporary path and yields the path; removes the FIFO after the block.
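The fan-out idea behind the tee helpers can be sketched with plain `IO.pipe` and a splitter thread. This is an illustration under stated assumptions, not the gem's code: `sketch_tee` is a hypothetical name, the 2048-byte chunk size is arbitrary, and abort forwarding is reduced to closing the writers.

```ruby
# Duplicate one readable stream into `num` pipes.
# Returns [readers, splitter_thread].
def sketch_tee(stream, num = 2)
  pipes = Array.new(num) { IO.pipe }
  writers = pipes.map(&:last)
  splitter = Thread.new do
    begin
      # Copy each chunk from the source to every writer.
      while (chunk = stream.read(2048))
        writers.each { |w| w.write(chunk) }
      end
    ensure
      # Closing the writers signals EOF to every reader.
      writers.each(&:close)
    end
  end
  [pipes.map(&:first), splitter]
end
```

Each reader should be consumed in its own thread. Reading the outputs one after another can stall once a pipe buffer fills, because the splitter blocks on the pipe nobody is reading yet.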
+---
+## Sensible / atomic writes
+Use `Open.sensible_write(path, content, options = {})` for safe writes that avoid overwriting existing targets and use temporary files + atomic rename.
+- Behavior:
+  - If path exists and :force is not true, consumes the source and skips the update.
+  - Writes to a temporary file in `Open.sensible_write_dir` then moves (Open.mv) into place.
+  - Supports lock options via `:lock` key (uses Open.lock). Accepts hash of lock settings or Lockfile instance.
+  - Ensures cleanup of temp files on exception; preserves existing target if write fails.
+  - On successful move, calls Open.notify_write(path).
+- Open.sensible_write_lock_dir / Open.sensible_write_dir are configurable directories (Paths) used for temporary files and lock state.
+- Open.sensible_write uses Open.lock to protect move operations.
+- Open.write does take a file lock while writing (f.flock File::LOCK_EX), but sensible_write adds the safer tmp->mv semantics and optional locking for concurrent processes.
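The write-to-temp-then-rename pattern described in this section can be sketched with the standard library alone. This is not the gem's implementation: `sketch_sensible_write` is a hypothetical name, the temporary file is placed next to the target instead of in `Open.sensible_write_dir`, and locking and notification are left out.

```ruby
require 'fileutils'
require 'securerandom'
require 'tmpdir'

# Write `content` (String or IO-like) to `path` via a temporary file and a
# rename. Returns true if the target was written, false if it was skipped.
def sketch_sensible_write(path, content, force: false)
  if File.exist?(path) && !force
    # Drain the source so an upstream producer is not left blocked.
    content.read if content.respond_to?(:read)
    return false
  end

  FileUtils.mkdir_p(File.dirname(path))
  tmp = File.join(File.dirname(path), ".tmp_write.#{SecureRandom.hex(4)}")
  begin
    File.open(tmp, 'w') do |f|
      if content.respond_to?(:read)
        IO.copy_stream(content, f)
      else
        f.write(content.to_s)
      end
    end
    # A rename within one directory replaces the target in a single step on
    # POSIX filesystems, so readers never see a half-written file.
    File.rename(tmp, path)
    true
  ensure
    # Remove the temp file if the write failed; no-op after a rename.
    FileUtils.rm_f(tmp)
  end
end
```

If the write raises midway, the existing target is left untouched and only the temporary file is removed.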
+---
+## Remote fetching, caching and downloads
+Open supports remote URLs and SSH-style access:
+- Open.remote?(file) -> true if file is URL-like (http|https|ftp|ssh)
+- Open.ssh?(file) -> ssh:// scheme detection
+- Open.ssh(file, options = {})
+  - Parses ssh://server:path and streams via `ssh server cat 'path'` (if server != 'localhost').
+  - For localhost returns Open.open(file) (local path handling).
+- Open.wget(url, options = {})
+  - Download via `wget` (through CMD.cmd), returns an IO-like stream (ConcurrentStream).
+  - Options:
+    - :pipe => true (default), :autojoin => true
+    - supports `--post-data=`, cookies, quiet mode, :force, :nocache
+    - caching: unless :nocache true, saves to remote_cache_dir under a digest filename (Open.add_cache) and returns Open.open on cache file.
+    - :nice / :nice_key for throttling repeated requests with a wait
+    - Raises OpenURLError on failure
+  - Example: Open.wget('http://example.com', quiet: true, nocache: true).read
+- Open.cache_file(url, options), Open.in_cache(url, options), Open.add_cache(url, data, options), Open.open_cache(url)
+  - Support caching remote requests to `Open.remote_cache_dir`.
+- Open.download(url, file) — wrapper to run wget into local file with logging.
+- Open.digest_url(url, options) — compute cache key based on url and post data/file.
+- Open.scp(source_file, target_file, target:, source:) — convenience wrapper for scp and remote mkdir.
+---
+## File / filesystem helpers
+Common filesystem operations with Path support:
+- Open.mkdir(path) — ensure directory exists (mkdir_p). Accepts Path.
+- Open.mkfiledir(target) — ensure parent dir exists for file target.
+- Open.mv(source, target) — move with tmp intermediate to reduce risk (move to .tmp_mv.* then rename).
+- Open.rm(file) — remove file if exists or broken symlink.
+- Open.rm_rf(file) — recursive remove
+- Open.touch(file) — create or update mtime (ensures parent dir).
+- Open.cp(source, target) — copy (uses cp_r, removes existing target).
+- Open.directory?(file)
+- Open.exists?(file) / Open.exist? alias — existence check (Path supported).
+- Open.ctime(file), Open.mtime(file) — time helpers; mtime has logic to follow symlinks and handle special Step info file cases.
+- Open.size(file)
+- Open.ln_s(source, target) — create symbolic link (ensures parent dir and remove existing).
+- Open.ln(source, target) — create hard link (removing target if present).
+- Open.ln_h(source, target) — attempt hard link via `ln -L`, fallback to copy on failure.
+- Open.link(source, target) — tries ln then ln_s as fallback.
+- Open.link_dir(source, target) — cp with hard-links (cp_lr).
+- Open.same_file(file1, file2) — File.identical?
+- Open.writable?(path) — checks writability handling symlinks and non-existing files.
+- Open.realpath(file) — returns canonical realpath (resolves symlinks).
+- Open.list(file) — returns file contents split on newline (convenience).
+---
+## Locking
+Open wraps Lockfile to provide safe locking primitives and a simpler interface.
+- Open.lock(file, unlock = true, options = {}) { |lockfile| ... }
+  - Acquire a lock (Lockfile) for a given path.
+  - `file` may be:
+    - a Lockfile instance (used directly),
+    - Path/String (lockfile path defaulting to `file + '.lock'`),
+    - nil with options[:lock] being a Lockfile instance or false.
+  - `unlock` default true; set false to keep lock after block (or raise KeepLocked inside block to keep lock and return payload).
+  - Options passed to Lockfile constructor (min_sleep, max_sleep, sleep_inc, max_age, refresh, timeout, etc.).
+  - Handles exceptions and unlocks safely in ensure.
+  - Example (from tests):
+    ```ruby
+    Open.lock lockfile_path, min_sleep: 0.01, max_sleep: 0.05 do
+      # critical section
+    end
+    ```
+- Lockfile class is included in `open/lock/lockfile.rb` — classic NFS-safe lockfile implementation (supports refreshing, stealing detection, sweeps, retries, timeouts, etc.). Use its options via Open.lock(..., options).
+---
+## Sync (rsync)
+- Open.rsync(source, target, options = {})
+  - Wrapper to build and execute an `rsync` command with common options.
+  - Options processed via IndiferentHash:
+    - :excludes, :files (list of files to transfer), :hard_link (use --link-dest), :test (dry-run), :print (return command), :delete, :source, :target (server strings), :other (extra args)
+  - Handles directory trailing slashes, remote server prefixes, ensures target dirs exist (remote mkdir via ssh when needed).
+  - Uses TMP files for --files-from when passing a list.
+  - Example:
+    ```ruby
+    Open.rsync(source_dir, target_dir, excludes: 'tmp_dir', delete: true)
+    ```
+- Open.sync is alias for rsync.
+---
+## Utilities
+- Compression helpers:
+  - Open.gzip?(file) / Open.bgzip?(file) / Open.zip?(file) — simple extension checks.
+  - Open.gunzip(stream), Open.gzip(stream), Open.bgzip(stream) — spawn subprocesses (zcat/gzip/bgzip) returning a piped IO.
+  - Open.gzip_pipe(file) — returns shell-friendly expression for gzip handling.
+- Open.grep(stream, grep, invert = false, fixed = nil, options = {})
+  - Uses system grep (GREP_CMD) to filter stream. Accepts Array of patterns (written to temporary file and used with -f) or single pattern.
+- Open.sort_stream(stream, header_hash: "#", cmd_args: nil, memory: false)
+  - Sort stream while preserving header lines (lines starting with header_hash).
+  - For memory=false runs external sort (env LC_ALL=C sort).
+  - Splits into substreams to avoid loading entire stream into memory for large inputs.
+- Open.collapse_stream(s, line: nil, sep: "\t", header: nil, compact: false, &block)
+  - Collapses consecutive lines with same key (first field) merging rest columns with `|` separators or processed by provided block.
+  - Useful for aggregating grouped data in streaming fashion.
+- Open.consume_stream described above.
+- Open.notify_write(file)
+  - If `<file>.notify` exists, reads its contents and sends notification (email or system notify) and removes .notify file.
+- Open.broken_link?(path) — true if symlink target missing
+- Open.exist_or_link?(file) — exists or symlink
+- Open.list(file) — read as lines
+- Lockfile utility: Lockfile.create(path) creates lock and opens file (used internally).
+---
+## Examples (from tests)
+Reading and line-wise processing:
+```ruby
+sum = 0
+Open.read(file) { |line| sum += line.to_i }
+```
+Open compressed file:
+```ruby
+Open.read("file.txt.gz")  # decompresses and returns content
+```
+Sensible write:
+```ruby
+Open.sensible_write(target_path, File.open(source))  # safe atomic write from stream
+```
+Pipe and open_pipe:
+```ruby
+sout = Open.open_pipe do |sin|
+  10.times { |i| sin.puts "line #{i}" }
+end
+# sout is a readable stream; consume:
+Open.consume_stream(sout, false, target_file)
+```
+Tee stream to two consumers:
+```ruby
+sout = Open.open_pipe do |sin|
+  2000.times { |i| sin.puts "line #{i}" }
+end
+s1, s2 = Open.tee_stream_thread(sout)
+t1 = Open.consume_stream(s1, true, tmp.file1)
+t2 = Open.consume_stream(s2, true, tmp.file2)
+t1.join; t2.join
+```
+Locking (concurrency safe):
+```ruby
+Open.lock(lockfile_path, min_sleep: 0.01, max_sleep: 0.05) do
+  # critical section
+end
+```
+Rsync:
+```ruby
+Open.rsync(source, target)
+Open.sync(source, target) # alias
+```
+Sorting a stream while preserving headers:
+```ruby
+sorted = Open.sort_stream(string_io)
+puts sorted.read
+```
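To make the header handling concrete, here is a small in-memory sketch of a header-preserving sort. `sketch_sort_stream` is a hypothetical name; it assumes that header lines appear only at the top of the stream and sorts in memory, whereas the documented helper can delegate to an external `sort`.

```ruby
require 'stringio'

# Sort the body lines of `io` while keeping the leading header lines
# (those starting with `header_hash`) in place at the top.
def sketch_sort_stream(io, header_hash: "#")
  header = []
  body = []
  in_header = true
  io.each_line do |line|
    if in_header && line.start_with?(header_hash)
      header << line
    else
      in_header = false
      body << line
    end
  end
  StringIO.new((header + body.sort).join)
end
```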
+Collapse grouped rows:
+```ruby
+stream = Open.collapse_stream(s, sep: " ") do |parts|
+  parts.map(&:upcase) # or aggregate
+end
+```
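The default collapsing behavior (no block) can be illustrated with a plain-Ruby sketch over an array of lines. `sketch_collapse` is a hypothetical name; it assumes non-empty, separator-delimited lines and covers only the `|` merge of consecutive rows that share a first field.

```ruby
# Merge consecutive lines that share the same first field, joining the
# values of each remaining column with "|".
def sketch_collapse(lines, sep: "\t")
  out = []
  current_key = nil
  columns = nil
  # Emit the group collected so far, if any.
  flush = lambda do
    out << ([current_key] + columns.map { |vals| vals.join('|') }).join(sep) if current_key
  end
  lines.each do |line|
    key, *rest = line.chomp.split(sep, -1)
    if key == current_key
      rest.each_with_index { |v, i| (columns[i] ||= []) << v }
    else
      flush.call
      current_key = key
      columns = rest.map { |v| [v] }
    end
  end
  flush.call
  out
end
```

Only adjacent rows are merged, so the input needs to be grouped or sorted by key first.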
+Remote fetch:
+```ruby
+io = Open.wget('http://example.com', quiet: true)
+puts io.read
+```
+---
+## Notes & edge cases
+- Many functions accept Path objects and will call `.find` or `.produce_and_find` where appropriate.
+- Remote functions rely on external commands (wget, ssh). Errors from those commands are wrapped/propagated (OpenURLError, ConcurrentStreamProcessFailed, etc.).
+- Open.sensible_write and Open.write try to avoid inconsistent partial files; sensible_write uses tmp-file + mv and optional Lockfile to avoid races.
+- Stream utilities use a ConcurrentStream abstraction (not documented here) to manage thread/pid/join semantics.
+- Tee/splitter threads forward aborts and exceptions to downstream consumers; callers must handle cleanup and join threads.
+- Open.lock relies on the included Lockfile implementation which supports NFS-safe locking, lock refreshing, stealing detection and sweeping stale locks.
+- gzip/bgzip/unzip operations spawn external processes and return piped IOs — ensure you consume/join and close these streams to avoid zombies.
+- Open.grep handles Array of patterns by writing them to a tmp file and using `-f` grep; fixed matching uses -F and -w by default.
+---
+This document covers the main public behaviors of the Open module: unified file/stream opening, robust writing, streaming utilities, remote fetching and caching, filesystem helpers, locking and synchronization, and convenience utilities for sorting, collapsing and grepping streams. Use Open for safe, composable I/O operations in scripts and concurrent code.