RubyGems - traject - Versions diffs - 2.1.0-java → 2.2.0-java - Mend

traject 2.1.0-java → 2.2.0-java

Files changed (18)

checksums.yaml +4 -4
data/.gitignore +2 -0
data/.travis.yml +8 -20
data/CHANGES.md +14 -0
data/README.md +35 -56
data/doc/extending.md +20 -27
data/doc/indexing_rules.md +46 -57
data/doc/settings.md +17 -48
data/lib/traject/debug_writer.rb +31 -5
data/lib/traject/indexer.rb +6 -4
data/lib/traject/marc_extractor.rb +37 -157
data/lib/traject/marc_extractor_spec.rb +229 -0
data/lib/traject/version.rb +1 -1
data/test/debug_writer_test.rb +41 -0
data/test/marc_extractor_test.rb +24 -24
data/test/test_support/demo_config.rb +1 -1
data/traject.gemspec +5 -5
metadata +74 -73

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: 1e497f1fbf0507bf5c427dc49caaf177eaf44cbe
-  data.tar.gz: e9822f0ab83b9645172aa56b4219709f302cad1b
+  metadata.gz: 31b0c8daf3b5365e6f76172e9af9d8ec7fd842fe
+  data.tar.gz: 9286f5626eb34bd4df3e89aa1197fc3b1810e601
 SHA512:
-  metadata.gz: 69e90b97d27248d62d17e7b3e6cd7088ff1f15e20f7205cd2e633965294efaa68ec991c86a181c0ad0b71ea3c601a165ce159acbac3eb9010682b8bc0d6d8e16
-  data.tar.gz: 6bab099581a57947ef8d14191bff72602c9c428fc836597b6e33a3baeec2f14ec5cc316d6d5f555e5c4c62d862966ff703992fedeed9307e6a45f6ef4d6f86c1
+  metadata.gz: 0495e94238704ab066c86c40e1ef65c86d4229178795031a59dfaf42ceb5efa01e00943b0447291a666d504b42f3cca62d56735240a56004541fa24b8efc138a
+  data.tar.gz: e002525a16a48c0897548f526df1c63bccdf5d23d888cd799afb2518b52a73b3832ba0f5e8a8748f8649ef45b43c6efe85b0f48fa883d9540480f15dd6d19423

data/.gitignore CHANGED

@@ -1,3 +1,5 @@
+.idea
+bench
 *.gem
 *.rbc
 .bundle

data/.travis.yml CHANGED

@@ -1,27 +1,15 @@
 language: ruby
+cache: bundler
+sudo: false
 rvm:
   - jruby-19mode
-  - jruby-head
+  - jruby-9.0.4.0
   - 1.9
-  - 2.1
   - 2.2
+  - 2.3.0
   - rbx-2
+before_install:
+  - gem update --system
+  - gem install bundler
 jdk:
-  - openjdk7
-  - openjdk6
-matrix:
-  exclude:
-    - rvm: 1.9
-      jdk: openjdk7
-    - rvm: 2.1
-      jdk: openjdk7
-    - rvm: rbx-2
-      jdk: openjdk7
-    - rvm: jruby-head
-      jdk: openjdk6
-    - rvm: 2.2
-      jdk: openjdk6
-  allow_failures:
-    - rvm: jruby-head
-bundler_args: --without debug
+  - oraclejdk8

data/CHANGES.md CHANGED

@@ -1,5 +1,19 @@
 # Changes
+## 2.2.0
+  * Change DebugWriter to be more forgiving (and informative) about missing record-id fields
+  * Automatically require DebugWriter for easier use on the command line
+  * Refactor MarcExtractor to be easier to read
+  * Fix .travis file to actually work, and target more recent rubies.
+## 2.1.0
+  * update some docs (typos)
+  * Make the indexer's `writer` r/w so it can be set at runtime (#110)
+  * Allow `extract_marc` to be callable from anywhere (#111)
+  * Add doc instructions/examples for programmatic Indexer use
+  * _Much_ better error reporting; easier to find which record went wrong
 ## 2.0.2
 * Guard against assumption of MARC data when indexing using SolrJsonWriter ([#94](https://github.com/traject-project/traject/issues/94))

data/README.md CHANGED

@@ -1,12 +1,10 @@
 # Traject
-An easy to use, high-performance, flexible and extensible MARC to Solr indexer.
+An easy to use, high-performance, flexible and extensible MARC to Solr indexer.
-You might use traject to index MARC data for a Solr-based discovery product like [Blacklight](https://github.com/projectblacklight/blacklight) or [VUFind](http://vufind.org/).
+You might use [traject](https://github.com/traject/traject) to index MARC data for a Solr-based discovery product like [Blacklight](https://github.com/projectblacklight/blacklight) or [VUFind](http://vufind.org/).
-Traject can also be generalized to a set of tools for getting structured data from a source, and transforming it to a hash-like object to send to a destination. In addition to sending data
-to solr, Traject can produce json or yaml files, tab-delimited files, CSV files, and output suitable
-for debugging by a human.
+Traject can also be generalized to a set of tools for getting structured data from a source, and transforming it to a hash-like object to send to a destination. In addition to sending data to Solr, Traject can produce json or yaml files, tab-delimited files, CSV files, and output suitable for debugging by a human.
 **Traject is stable, mature software, that is already being used in production by its authors.**
@@ -23,7 +21,7 @@ Initially by Jonathan Rochkind (Johns Hopkins Libraries) and Bill Dueber (Univer
 * Fast. Traject by default indexes using multiple threads, on multiple cpu cores, when the underlying
 ruby implementation (i.e., JRuby) allows it, and can use a separate thread for communication with
 solr even under MRI.
-* Composed of decoupled components, for flexibility and extensibility.
+* Composed of decoupled components, for flexibility and extensibility.
 * Designed to support local code and configuration that's maintainable and testable, and can be shared between projects as ruby gems.
 * Easy to split configuration between multiple files, for simple "pick-and-choose" command line options
 that can combine to deal with any of your local needs.
@@ -33,42 +31,36 @@ that can combine to deal with any of your local needs.
 Traject runs under jruby (1.7.x or higher), MRI ruby (1.9.3 or higher), or probably any other ruby platform.
-**Traject runs much faster on JRuby** where it can use multi-core parallelism, and the Java
-Marc4J marc reader. If performance is a concern, you should run traject on JRuby.
+**Traject runs much faster on JRuby** where it can use multi-core parallelism, and the Java Marc4J marc reader. If performance is a concern, you should run traject on JRuby.
 Some options for installing a ruby other than your system-provided one are [chruby](https://github.com/postmodern/chruby) and [ruby-install](https://github.com/postmodern/ruby-install#readme).
 Once you have ruby, just `$ gem install traject`.
-( **Note**: We might in the future provide an all-in-one .jar distribution, which does not require you to install jruby  on your system, for those who want the multi-threading of jruby without having to actually install it. Let us know if interested.).
+(**Note**: We might in the future provide an all-in-one .jar distribution, which will not require you to install jruby on your system, for those who want the multi-threading of jruby without having to actually install it. Let us know if interested.)
 ## Configuration files
-traject is configured using configuration files. To get a sense of what they look like, you can
-take a look at our sample basic configuration file,
-[demo_config.rb](./test/test_support/demo_config.rb). You could run traject with that configuration file
-as: `traject -c path/to/demo_config.rb marc_file.marc`.
+traject is configured using configuration files. To get a sense of what they look like, you can take a look at our sample basic configuration file,
+[demo_config.rb](./test/test_support/demo_config.rb). You could run traject with that configuration file as: `traject -c path/to/demo_config.rb marc_file.marc`.
 Configuration files are actually just ruby -- so by convention they end in `.rb`.
-We hope you can write basic useful configuration files without much ruby experience, since
-traject gives you some easy functions to use for common directives. But the full power
-of ruby is available to you if needed.
+We hope you can write basic useful configuration files without much ruby experience, since traject gives you some easy functions to use for common directives. But the full power of ruby is available to you if needed.
 **rubyist tip**: Technically, config files are executed with `instance_eval` in a Traject::Indexer instance, so the special commands you see are just methods on Traject::Indexer (or mixed into it). But you can
 call ordinary ruby `require` in config files, etc., too, to load
 external functionality. See more at Extending Logic below.
 You can keep your settings and indexing rules in one config file,
-or split them accross multiple config files however you like. (Connection details vs indexing? Common things vs environmental specific things?)
+or split them across multiple config files however you like. (Connection details vs indexing? Common things vs environmental specific things?)
 There are two main categories of directives in your configuration files: _Settings_, and _Indexing Rules_.
 ## Settings
-Settings are a flat list of key/value pairs, where the keys are always strings and the values usually are. They look like this
-in a config file:
+Settings are a flat list of key/value pairs, where the keys are always strings and the values usually are too. They look like this in a config file:
 ~~~ruby
 # configuration_file.rb
@@ -98,20 +90,17 @@ end
 `provide` will only set the key if it was previously unset, so first
 setting wins, and command-line comes first of all and overrides everything.
-You can also use `store` if you want to force-set, last set wins.
+You can also use `store` if you want to force-set: last set wins.
 See, docs page on [Settings](./doc/settings.md) for list
 of all standardized settings.
-## Indexing rules: Let's start with 'to_field' and 'extract_marc'
+## Indexing rules: 'to_field' and 'extract_marc'
-There are a few methods that can be used to create indexing rules, but the
-one you'll most common is called `to_field`, and establishes a rule
-to extract content to a particular named output field.
+There are a few methods that can be used to create indexing rules.  We will touch on the two most commonly used methods here.  More information is available in [Indexing Rules: Macros and Custom Logic](./doc/indexing_rules.md)
-A `to_field` extraction rule can use built-in 'macros', or, as we'll see later,
-entirely custom logic.
+`to_field` establishes a rule to extract content to a particular named output field.  A `to_field` extraction rule can use built-in 'macros', or, as we'll see later, entirely custom logic.
 The built-in macro you'll use the most is `extract_marc`, to extract
 data out of a MARC record according to a tag/subfield specification.
@@ -140,24 +129,18 @@ data out of a MARC record according to a tag/subfield specification.
     to_field "language_code", extract_marc("008[35-37]")
 ~~~
-`extract_marc` by default includes all 'alternate script' linked fields corresponding
-to matched specifications, but you can turn that off, or extract *only* corresponding
-880s.
+`extract_marc` by default includes all 'alternate script' linked fields corresponding to matched specifications, but you can turn that off, or extract *only* corresponding 880s.
 ~~~ruby
     to_field "title", extract_marc("245abc", :alternate_script => false)
     to_field "title_vernacular", extract_marc("245abc", :alternate_script => :only)
 ~~~
-By default, specifications with multiple subfields (like "240abc") will produce one single string of output per field (for each '240'), with the concatenation of each matched subfield. Specifications with single subfields (like "020a") will split subfields and produce an output string for each matching subfield.
+By default, specifications with multiple subfields (e.g. "240abc") will produce one single string of output per field (for each '240' field in the record), with the concatenation of each matched subfield. Specifications with single subfields (like "020a") will split subfields and produce an output string for each matching subfield (i.e. two output strings for a single '020' with two subfield 'a').
-For the syntax and complete possibilities of the specification
-string argument to extract_marc, see docs at the [MarcExtractor class](./lib/traject/marc_extractor.rb) ([rdoc](http://rdoc.info/gems/traject/Traject/MarcExtractor)).
+For the syntax and complete possibilities of the specification string argument to extract_marc, see docs at the [MarcExtractor class](./lib/traject/marc_extractor.rb) ([rdoc](http://rdoc.info/gems/traject/Traject/MarcExtractor)).
-`extract_marc` also supports `translation maps` similar
-to SolrMarc's. There are some translation maps provided by traject,
-and you can also define your own, in yaml or ruby. Translation maps are especially useful
-for mapping from MARC codes to user-displayable strings:
+`extract_marc` also supports `translation maps` similar to SolrMarc's. There are some translation maps provided by traject, and you can also define your own, in yaml or ruby. Translation maps are especially useful for mapping from MARC codes to user-displayable strings:
 ~~~ruby
     # "translation_map" will be passed to Traject::TranslationMap.new
@@ -165,18 +148,19 @@ for mapping form MARC codes to user-displayable strings:
     to_field "language", extract_marc("008[35-37]:041a:041d", :translation_map => "marc_language_code")
 ~~~
-To see all options for `extract_marc`, see the [method documentation](http://rdoc.info/gems/traject/Traject/Macros/Marc21:extract_marc)
+To see all options for `extract_marc`, see the [extract_marc](http://rdoc.info/gems/traject/Traject/Macros/Marc21:extract_marc) method documentation.
-## other built-in utility macros
+## Other built-in utility macros
-Other built-in methods that can be used with `to_field` include a hard-coded
-literal string:
+Other built-in methods that can be used with `to_field` include:
+a hard-coded literal string:
 ~~~ruby
     to_field "source", literal("LIB_CATALOG")
 ~~~
-The current record serialized back out as MARC, in binary, XML, or json:
+the current record serialized back out as MARC, in binary, XML, or json:
 ~~~ruby
     # or :format => "json" for marc-in-json
@@ -186,7 +170,7 @@ The current record serialized back out as MARC, in binary, XML, or json:
     to_field "marc_record_raw", serialized_marc(:format => "binary", :binary_escape => false, :allow_oversized => true)
 ~~~
-Text of all fields in a range:
+text of all fields in a range:
 ~~~ruby
     to_field "text", extract_all_marc_values(:from => "100", :to => "899")
@@ -194,11 +178,9 @@ Text of all fields in a range:
 All of these methods are defined at [Traject::Macros::Marc21](./lib/traject/macros/marc21.rb) ([rdoc](http://rdoc.info/gems/traject/Traject/Macros/Marc21))
-## more complex canned MARC semantic logic
+## More complex canned MARC semantic logic
-Some more complex (and opinionated/subjective) algorithms for deriving semantics
-from Marc are also packaged with Traject, but not available by default. To make
-them available to your indexing, you just need to use ruby `require` and `extend`.
+Some more complex (and opinionated/subjective) algorithms for deriving semantics from Marc are also packaged with Traject, but not available by default. To make them available to your indexing, you just need to use ruby `require` and `extend`.
 A number of methods are in [Traject::Macros::Marc21Semantics](./lib/traject/macros/marc21_semantics.rb) ([rdoc](http://rdoc.info/gems/traject/Traject/Macros/Marc21Semantics))
@@ -283,10 +265,10 @@ additional documentation page on [Indexing Rules: Macros and Custom Logic](./doc
 In addition to `to_field`, an `each_record` method is available, which,
 like `to_field`, is executed for every record, but without being tied
-to a specific field.
+to a specific output field.
-`each_record` can be used for logging or notifying; computing intermediate
-results; or writing to more than one field at once.
+`each_record` can be used for logging or notifying, computing intermediate
+results, or writing to more than one field at once.
 ~~~ruby
   each_record do |record|
@@ -294,12 +276,9 @@ results; or writing to more than one field at once.
   end
 ~~~
-For more on `each_record`, see documentation page on [Indexing Rules: Macros and Custom Logic](./doc/indexing_rules.md).
+For more on `each_record`, see [Indexing Rules: Macros and Custom Logic](./doc/indexing_rules.md).
-There is also an `after_processing` method that can be used to register
-logic that will be called after the entire has been processed. You can use it for whatever custom
-ruby code you might want for your app (send an email? Clean up a log file? Trigger
-a Solr replication?)
+There is also an `after_processing` method that can be used to register logic that will be called after the entire has been processed. You can use it for whatever custom ruby code you might want for your app (send an email? Clean up a log file? Trigger a Solr replication?)
 ~~~ruby
 after_processing do
@@ -310,8 +289,7 @@ end
 ## Readers and Writers
-Traject uses modular 'Writer' classes to take the output hashes from transformation, and
-send them somewhere or do something useful with them.
+Traject uses modular 'Writer' classes to take the output hashes from transformation and send them somewhere or do something useful with them.
 By default traject uses the [Traject::SolrJsonWriter](lib/traject/solr_json_writer.rb) ([rdoc](http://rdoc.info/gems/traject/Traject/SolrJsonWriter)) to send to Solr for indexing.
 Several other writers are also built-in:
@@ -419,6 +397,7 @@ Own Code](./doc/extending.md)
   * [traject-solrj_writer](https://github.com/traject/traject-solrj_writer): a jruby-only writer that uses the solrj .jar to talk directly to solr. Your only option for speaking to a solr version < 3.2, which is when the json handler was added to solr.
   * [traject_marc4j_reader](https://github.com/traject/traject-marc4j_reader): Packaged with traject automatically on jruby. A JRuby-only reader for
   reading marc records using the Marc4J library, fastest MARC reading on JRuby.
+  * [traject_sequel_writer](https://github.com/traject/traject_sequel_writer) A writer for sending to an rdbms via [Sequel](https://github.com/jeremyevans/sequel)
 # Development
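Reviewer note on the Settings passage in this README diff: the "first setting wins" behaviour of `provide` versus the "last set wins" behaviour of `store` can be sketched in plain Ruby. `DemoSettings` and `demo_settings` below are hypothetical stand-ins written only for illustration. They are not traject's own settings class, and the keys and URLs are placeholder values.

```ruby
# Hypothetical stand-in illustrating "first wins" vs "last wins".
# This is NOT traject's settings implementation.
class DemoSettings
  def initialize
    @values = {}
  end

  # Set the key only if it has not been set yet (first setting wins).
  def provide(key, value)
    @values[key.to_s] = value unless @values.key?(key.to_s)
  end

  # Always set the key, replacing any earlier value (last setting wins).
  def store(key, value)
    @values[key.to_s] = value
  end

  def [](key)
    @values[key.to_s]
  end
end

# Builds a settings object using placeholder keys and values.
def demo_settings
  settings = DemoSettings.new
  settings.provide("solr.url", "http://example.org/solr/first")
  settings.provide("solr.url", "http://example.org/solr/second") # ignored: already set
  settings.store("log.level", "info")
  settings.store("log.level", "debug")                           # replaces "info"
  settings
end
```

Under these assumptions, `demo_settings["solr.url"]` keeps the first URL while `demo_settings["log.level"]` ends up as `"debug"`.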

data/doc/extending.md CHANGED

@@ -13,13 +13,12 @@ of a couple traject features meant to make it easier.
 ## Expert Summary
-* Traject `-I` argument command line can be used to list directories to
+* Load Path options:
+  * Traject `-I` argument command line can be used to list directories to
   add to the load path, similar to the `ruby -I` argument. You
   can then 'require' local project files from the load path.
-  * Or modify the ruby `$LOAD_PATH` manually at the top of a traject config file you are loading.
-  * translation map files found in a
-    "./translation_maps" subdir on the load path will be found
-    for Traject translation maps.
+  * Modify the ruby `$LOAD_PATH` manually at the top of a traject config file you are loading.
+  * NOTE: translation map files in a "./translation_maps" subdir on the load path will be available to traject.
 * You can use Bundler with traject simply by creating a Gemfile with `bundler init`,
   and then running command line with `bundle exec traject` or
   even `BUNDLE_GEMFILE=path/to/Gemfile bundle exec traject`
@@ -114,11 +113,11 @@ a skeleton of your gem
 This will also make available rake commands to install your gem locally
 (`rake install`), or release it to the rubygems server (`rake release`).
-There are two main methods to use a gem in your traject project,
-with straight rubygems, or with bundler.
+There are two main methods to use a gem in your traject project: with straight rubygems, or with bundler.
-Without bundler is simpler. Simply `gem install some_gem` from the
-command line, and now you can `require` that gem in your traject
+### without bundler (straight rubygems):
+Without bundler may be simpler, at least at first. Simply `gem install some_gem` from the command line, and now you can `require` that gem in your traject
 config file, and use what it provides:
 ~~~ruby
@@ -129,25 +128,20 @@ require 'some_gem'
 SomeGem.whatever!
 ~~~
-A gem can provide traject translation map definitions
-in a `lib/translation_maps` sub-directory, and traject will be able to find those
-translation maps when the gem is loaded. (Because gems'
-`./lib` directories are by default added to the ruby load path.)
+A gem can provide traject translation map definitions in a `lib/translation_maps` sub-directory, and traject will be able to find those translation maps when the gem is loaded (because gems' `./lib` directories are by default added to the ruby load path).
-### Or, with bundler:
+### with bundler:
-However, if you then move your traject project to another system,
+If you move your traject project to another system,
 where you haven't yet installed the `some_gem`, then running
-traject with this config file will, of course, fail. Or if you
+traject with the above config file will, of course, fail. Or if you
 move your traject project to another system with a slightly
 different version of `some_gem`, your traject indexing could
 behave differently in confusing ways. As the number of gems
-you are using increases, managing this gets increasingly
+you are using increases, managing the gems and gem versions gets increasingly
 confusing.
-[bundler](http://bundler.io/) was invented to make this kind of dependency management
-more straightforward and reliable. We recommend you consider using
-bundler, especially for traject installations where traject will
+[bundler](http://bundler.io/) was invented to make this kind of dependency management in ruby more straightforward and reliable. We recommend you consider using bundler, especially for traject installations where traject will
 be run via automated batch jobs on production servers.
 Bundler's behavior is based on a `Gemfile` that lists your
@@ -156,15 +150,14 @@ by running `bundler init`, probably in the directory
 right next to your traject config files.
 Then specify what gems your traject project will use,
-possibly with version restrictions, in the [Gemfile](http://bundler.io/v1.3/gemfile.html) --
-**do** include `gem 'traject'` in the Gemfile.
+possibly with version restrictions, in the [Gemfile](http://bundler.io/v1.3/gemfile.html)
+Be sure to include `gem 'traject'` in the Gemfile.
 Run `bundle install` from the directory with the Gemfile, on any system
-at any time, to make sure specified gems are installed.
+at any time, to make sure specified gems are installed.  (The bundler gem must be already installed on the system.)
-**Run traject** with `bundle exec` to have bundler set up the environment
-from your Gemfile. You can `cd` into the directory containing the Gemfile,
-so bundler can find it:
+**Run traject** with `bundle exec` to have bundler set up the traject environment from your Gemfile. You can `cd` into the directory containing the Gemfile, so bundler can find it:
     $ cd /some/where
     $ bundle exec traject -c some_traject_config.rb ...
@@ -178,7 +171,7 @@ Bundler will make sure the specified versions of all gems are used by
 traject, and also make sure no gems except those specified in the gemfile
 are available to the program, for a reliable reproducible environment.
-You should still `require` the gem in your traject config file,
+You still need to `require` the gem in your traject config file;
 then just refer to what it provides in your config code as usual.
 You should check both the `Gemfile` and the `Gemfile.lock`

data/doc/indexing_rules.md CHANGED

@@ -16,21 +16,17 @@ That `do` is just ruby `block` syntax, whereby we can pass a block of ruby code
 The block is then stored by the Traject::Indexer, and called for each record indexed, with three arguments provided.
-#### record argument
+### record argument
 The record that gets passed to your block is a MARC::Record object (or, theoretically, any object that gets returned by a traject Reader). Your logic will usually examine the record to calculate the desired output.
 ### accumulator argument
-The accumulator argument is an array. At the end of your custom code, the accumulator
-array should hold the output you want to send off, to the field specified in the `to_field`.
+The accumulator argument is an Array. At the end of your custom code, the accumulator Array should hold the output you want to send off to the field specified in `to_field`.
-The accumulator is a reference to a ruby array, and you need to **modify** that array,
-manipulating it in place with Array methods that mutate the array, like `concat`, `<<`,
-`map!` or even `replace`.
+The accumulator is a reference to a ruby Array, and you need to **modify** that Array, manipulating it in place with Array methods that mutate the array, like `concat`, `<<`, `map!` or even `replace`.
-You can't simply assign the accumulator variable to a different array, that won't work,
-you need to modify the array in-place.
+You can't simply assign the accumulator variable to a different Array; you need to modify the Array *in place*.
     # Won't work, assigning variable
     to_field('foo') do |rec, acc|
@@ -50,21 +46,16 @@ you need to modify the array in-place.
     to_field('foo') do |rec, acc|
       acc << 'bill'
       acc << 'dueber'
-      acc = acc.map!{|str| str.upcase} #notice using "map!" not just "map"
+      acc.map!{|str| str.upcase} # NOTE: "map!" not "map"
     end
 ### context argument
-The third optional context argument
-The third optional argument is a
-[Traject::Indexer::Context](./lib/traject/indexer/context.rb)  ([rdoc](http://rdoc.info/github/traject/traject/Traject/Indexer/Context))
-object. Most of the time you don't need it, but you can use it for
-some sophisticated functionality, for example using these Context methods:
+The third optional argument is a [Traject::Indexer::Context](./lib/traject/indexer/context.rb)  ([rdoc](http://rdoc.info/github/traject/traject/Traject/Indexer/Context)) object. Most of the time you don't need it, but you can use it for some sophisticated functionality.  These are some useful methods available:
 * `context.clipboard` A hash into which you can stuff values that you want to pass from one indexing step to another. For example, if you go through a bunch of work to query a database and get a result you'll need more than once, stick the results somewhere in the clipboard. This clipboard is record-specific, and won't persist between records.
-* `context.position` The position of the record in the input file (e.g., was it the first record, seoncd, etc.). Useful for error reporting
-* `context.output_hash` A hash mapping the field names (generally defined in `to_field` calls) to an array of values to be sent to the writer associated with that field. This allows you to modify what goes to the writer without going through a `to_field` call -- you can just set `context.output_hash['myfield'] = ['my', 'values']` and you're set. See below for more examples
+* `context.position` The position of the record in the input file (e.g., was it the first record, second, etc.). Useful for error reporting.
+* `context.output_hash` A hash mapping the field names (generally defined in `to_field` calls) to an array of values to be sent to the writer associated with that field. This allows you to modify what goes to the writer without going through a `to_field` call -- you can just set `context.output_hash['myfield'] = ['my', 'values']` and you're set. See below for more examples.
 * `context.skip!(msg)` An assertion that this record should be ignored. No more indexing steps will be called, no results will be sent to the writer, and a `debug`-level log message will be written stating that the record was skipped.
@@ -102,28 +93,26 @@ end
 ```
 Certain built-in traject calls have been optimized to be high performance
-so it's safe to do them inside 'inner loop' blocks though.
-That includes `Traject::TranslationMap.new` and `Traject::MarcExtractor.cached("xxx")`
-(note #cached rather than #new there)
+so it's safe to do them inside 'inner loop' blocks. That includes `Traject::TranslationMap.new` and `Traject::MarcExtractor.cached("xxx")`
+(NOTE: #cached rather than #new there)
 ## From block to lambda
 In the ruby language, in addition to creating a code block as an argument
-to a method with `do |args| ... end` or `{|arg| ...  }, we can also create
+to a method with `do |args| ... end` or `{|arg| ...  }`, we can also create
 a code block to hold in a variable, with the `lambda` keyword:
     always_output_foo = lambda do |record, accumulator|
       accumulator << "FOO"
     end
-traject `to_field` is written so, as a convenience, it can take a lambda expression
-stored in a variable as an alternative to a block:
+In traject, `to_field` is written so that, as a convenience, it can take a lambda expression stored in a variable as an alternative to a block:
     to_field("always_has_foo"), always_output_foo
 Why is this a convenience? Well, ordinarily it's not something we
-need, but in fact it's what allows traject 'macros' as re-useable
+need, but in fact it's what allows traject 'macros' to be re-useable
 code templates.
@@ -131,10 +120,9 @@ code templates.
 A Traject macro is a way to automatically create indexing rules via re-usable "templates".
-Traject macros are simply methods that return ruby lambda/proc objects, possibly creating
-them based on parameters passed in.
+Traject macros are methods that return ruby lambda/proc objects, possibly creating them based on parameters passed in.
-Here is in fact how the `literal` function is implemented:
+For example, here is the implementation of the  `literal` method/macro:
 ~~~ruby
 def literal(value)
@@ -144,12 +132,12 @@ def literal(value)
      accumulator << value
   end
 end
-to_field("something"), literal("something")
+to_field("fieldname"), literal("my_fav_literal")
 ~~~
-It's really as simple as that, that's all a Traject macro is. A function that takes parameters, and based on those parameters returns a lambda; the lambda is then passed to the `to_field` indexing method, or similar methods.
+So a Traject macro is a method that may have parameters and, based on those parameters, returns a lambda; the lambda is then passed to the `to_field` indexing method, or similar methods.
-How do you make these methods available to the indexer?
+How do you make these methods available to the traject indexer?
 Define it in a module:
@@ -173,15 +161,15 @@ in one of your config files:
 require 'literal_macro'
 extend LiteralMacro
-to_field ...
+to_field("fieldname"), literal("my_fav_literal")
 ~~~
 That's it.  You can use the traject command line `-I` option to set the ruby load path, so your file will be findable via `require`.  Or you can distribute it in a gem, and use straight rubygems and the `gem` command in your configuration file, or Bundler with traject command-line `-g` option.
-## Using a lambda _and_ and block
+## Using a lambda _and_ a block
 Traject macros (such as `extract_marc`) create and return a lambda. If
-you include a lambda _and_ a block on a `to_field` call, the latter
+you include a lambda _and_ a block on a `to_field` call, the block
 gets the accumulator as it was filled in by the former.
 ```ruby
@@ -196,38 +184,42 @@ to_field('foo'), mylam do |rec, acc, context|
   acc << 'two'
 end #=> context.output_hash['foo'] == ['one', 'two']
 # You might also want to do something like this
-to_field('foo'), my_macro_that_doesn't_dedup_ do |rec, acc|
+to_field('foo'), macro_returning_dup_values do |rec, acc|
   acc.uniq!
 end
 ```
 ## Manipulating `context.output_hash` directly
-If you ask for the context argument, a [Traject::Indexer::Context](./lib/traject/indexer/context.rb) ([rdoc](http://rdoc.info/gems/traject/Traject/Indexer/Context)), you have access to context.output_hash, with is
-the hash of transformed output that will be sent to Solr (or any other Writer)
+If you ask for the context argument, a [Traject::Indexer::Context](./lib/traject/indexer/context.rb) ([rdoc](http://rdoc.info/gems/traject/Traject/Indexer/Context)), you have access to `context.output_hash`, which is
+the hash of already transformed output that will be sent to Solr (or any other Writer).
+You can examine `context.output_hash` to see any already transformed output and use it as the source for new output.
+You can *write* to `context.output_hash` directly, which can be useful for computations that affect more than one output field at once.
-You can look in there to see any already transformed output and use it as the source
-for new output. You can actually *write* to there manually, which can be useful
-to write routines that effect more than one output field at once.
+**Note**: Make sure you always assign an _Array_ to each `context.output_hash` value, e.g., `context.output_hash['foo']`, not a single value!
-**Note**: Make sure you always assign an _array_ to, e.g., `context.output_hash['foo']`, not a single value!
+```ruby
+# Wrong - do NOT assign a value of anything other than an Array
+context.output_hash['fieldname'] = 'fuzzy_wuzzies'
+# Correct
+context.output_hash['fieldname'] = ['fuzzy_wuzzies']
+```
 ## each_record
-All the previous discussion was in terms of `to_field` -- `each_record` is a similar
-routine, to define logic that is executed for each record, but isn't fixed to write
-to a single output field.
+`each_record` is similar to `to_field` in that it defines logic executed for each record.  It differs from `to_field` because the output of `each_record` is not associated with a specific output field.
-So `each_record` blocks have no `accumulator` argument, instead they either take a single
-`record` argument; or both a `record` and a `context`.
+Thus, `each_record` blocks have no `accumulator` argument: instead they either take a single `record` argument; or both a `record` and a `context`.
-`each_record` can be used for logging or notifying; computing intermediate
-results; or writing to more than one field at once.
+`each_record` is useful for logging or notifying, computing intermediate
+results, or writing to more than one field at once.
 ~~~ruby
 each_record do |record, context|
@@ -239,20 +231,17 @@ each_record do |record, context|
 end
 each_record do |record, context|
-  (one, two) = calculate_two_things_from(record)
+  (val1, val2) = calculate_two_things_from(record)
   context.output_hash["first_field"] ||= []
-  context.output_hash["first_field"] << one
+  context.output_hash["first_field"] << val1
   context.output_hash["second_field"] ||= []
-  context.output_hash["second_field"] << one
+  context.output_hash["second_field"] << val2
 end
 ~~~
-traject doesn't come with any macros written for use with
-`each_record`, but they could be created if useful --
-just methods that return lambda's taking the right
-args for `each_record`.
+traject doesn't come with any macros written for use with `each_record`, but they could be created:  such macros would be methods that return a lambda given the appropriate args from `each_record`.
 ## More tips and gotchas about indexing steps
@@ -262,4 +251,4 @@ args for `each_record`.
 * **Once you call `context.skip!(msg)` no more index steps will be run for that record**. So if you have any cleanup code, you'll need to make sure to call it yourself.
-* **By default, `trajcet` indexing runs multi-threaded**. In the current implementation, the indexing steps for one record are *not* split across threads, but different records can be processed simultaneously by more than one thread. That means you need to make sure your code is thread-safe (or always set `processing_thread_pool` to 0).
+* **By default, `traject` indexing runs multi-threaded**. In the current implementation, the indexing steps for one record are *not* split across threads, but different records can be processed simultaneously by more than one thread. That means you need to make sure your code is thread-safe (or always set `processing_thread_pool` to 0).
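Reviewer note on the indexing_rules.md diff: three ideas from that document can be sketched together in plain Ruby. These are a macro as a method returning a lambda, a lambda followed by a block sharing one accumulator, and in-place mutation versus reassignment. `DemoIndexer`, `demo_literal` and `demo_output` are hypothetical names written for this illustration. The class is a simplified stand-in and not how traject's indexer is implemented.

```ruby
# Hypothetical stand-in for an indexer: it stores the steps for each field and
# runs them in order against one shared accumulator Array.
# This is NOT traject's actual implementation.
class DemoIndexer
  def initialize
    @steps = []
  end

  # Accepts an optional lambda and/or a block for a named output field.
  def to_field(name, lam = nil, &block)
    @steps << [name, [lam, block].compact]
  end

  # Returns a hash of field name => Array of values for one record.
  def map_record(record)
    @steps.each_with_object({}) do |(name, procs), output|
      accumulator = []
      procs.each { |step| step.call(record, accumulator) }
      (output[name] ||= []).concat(accumulator)
    end
  end
end

# A macro in the sense described above: a method that returns a lambda.
def demo_literal(value)
  lambda { |_record, accumulator| accumulator << value }
end

def demo_output
  indexer = DemoIndexer.new

  # The lambda runs first; the block then sees what the lambda added.
  indexer.to_field("names", demo_literal("bill")) do |_record, acc|
    acc << "dueber"
    acc.map!(&:upcase) # mutates the shared Array in place, so the change is kept
  end

  # Reassigning the block parameter only rebinds the local name.
  indexer.to_field("lost") do |_record, acc|
    acc = ["never", "seen"] # the indexer's Array is untouched and stays empty
    acc
  end

  indexer.map_record(:fake_record)
end
```

In this sketch `demo_output` returns `{"names"=>["BILL", "DUEBER"], "lost"=>[]}`: the mutated accumulator is kept, while the reassigned one never reaches the output.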