traject 0.17.0 → 1.0.0.beta.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: ab462aadfb1252846b617cf1adb288eeb519b353
- data.tar.gz: 7eac38dd8ac32e1dbfd417686ff04f95c108f011
+ metadata.gz: 19fae7c4d428bc40fd5c568f13daa7583fcebb2a
+ data.tar.gz: 9c71c57d86f6b9585451e958596a757cc75c432d
  SHA512:
- metadata.gz: 331350a2a93083b10710943e71bdf31b30bb3c6aeed9dde97f05fd232eaa34681a7ac0bcdf0d7aae9e37fd6ea7b9d3e4da1c840f036ed9abc6d57a01aea02e12
- data.tar.gz: 381e2c56dc2b92e0b91330bf20275e47462b86c1301ef723f31297cae1702b1f2fb77e6b8a016cf9213879f4c593ce001b68e854649613f12ba9a96238dc9da2
+ metadata.gz: 6730cd527d1ea285a5d60ebddeb95950dae2fdb5c548b905627eddf7a4a985e1891a039f409577903f181c60613861a3798fede6d5ecc8a942140cbb7a2d63d4
+ data.tar.gz: ee66b43f7bed57fcb21d82e7cd82c6e49c973bf7780b7a9f9c95fb5afc28744ffdb03e63d9d9b28bf567123571fa760ea9fed30fa62d20cf0ebfd0ef95d5880a
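The checksums.yaml entries above are hex digests of two members of the `.gem` archive (`metadata.gz` and `data.tar.gz`), which is a plain tar file. Below is a minimal sketch of checking an extracted member against one of these digests with Ruby's standard `digest` library. It assumes you have already unpacked the archive yourself, for example with `tar -xf traject-1.0.0.beta.1.gem`. The `verify_member` helper is hypothetical, not part of RubyGems or traject.

```ruby
require "digest"

# Hypothetical helper: compare raw bytes against an expected hex digest.
# `algorithm` is "SHA1" or "SHA512", matching the keys used in checksums.yaml.
def verify_member(bytes, algorithm, expected_hex)
  digest_class = { "SHA1" => Digest::SHA1, "SHA512" => Digest::SHA512 }.fetch(algorithm)
  digest_class.hexdigest(bytes) == expected_hex.downcase
end

# Usage against an extracted member (the path is only an example):
#   bytes = File.binread("metadata.gz")
#   verify_member(bytes, "SHA1", "19fae7c4d428bc40fd5c568f13daa7583fcebb2a")
```

Reading the file with `File.binread` matters: a text-mode read could alter the bytes on some platforms, and the digest would no longer match.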
data/README.md CHANGED
@@ -6,10 +6,10 @@ Might be used to index MARC data for a Solr-based discovery product like [Blackl
  Traject might also be generalized to a set of tools for getting structured data from a source, and sending it to a destination.
 
 
- **Traject is nearing 1.0, it is robust, feature-rich and ready for trial use**
+ **Traject is nearing 1.0, it is robust, feature-rich and being used in production by authors -- feedback invited**
 
  [![Gem Version](https://badge.fury.io/rb/traject.png)](http://badge.fury.io/rb/traject)
- [![Build Status](https://travis-ci.org/jrochkind/traject.png)](https://travis-ci.org/jrochkind/traject)
+ [![Build Status](https://travis-ci.org/traject-project/traject.png)](https://travis-ci.org/traject-project/traject)
 
 
  ## Background/Goals
@@ -147,8 +147,10 @@ data out of a MARC record according to a tag/subfield specification.
  to matched specifications, but you can turn that off, or extract *only* corresponding
  880s.
 
+ ~~~ruby
  to_field "title", extract_marc("245abc", :alternate_script => false)
  to_field "title_vernacular", extract_marc("245abc", :alternate_script => :only)
+ ~~~
 
  By default, specifications with multiple subfields (like "240abc") will produce one single string of output for each matching field. Specifications with single subfields (like "020a") will split subfields and produce an output string for each matching subfield.
 
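To make the default joining rule concrete, here is a plain-Ruby sketch that models a MARC data field as a tag plus an array of `[code, value]` subfield pairs. It mimics only the rule stated in the README text above: several subfield codes produce one joined string per field, while a single code produces one string per subfield. It is not traject's `MarcExtractor`. The `Field` struct, the `extract` function, and the single-space separator are assumptions made for illustration.

```ruby
# Hypothetical stand-in for a MARC data field: a tag plus [code, value] pairs.
Field = Struct.new(:tag, :subfields)

# Sketch of the default joining rule; not traject's MarcExtractor.
# `spec` is a tag followed by subfield codes, e.g. "245abc" or "650x".
def extract(fields, spec, separator: " ")
  tag   = spec[0, 3]
  codes = spec[3..-1].chars
  results = []
  fields.each do |field|
    next unless field.tag == tag
    values = field.subfields.select { |code, _value| codes.include?(code) }.map(&:last)
    next if values.empty?
    if codes.length > 1
      results << values.join(separator) # one string per matching field
    else
      results.concat(values)            # one string per matching subfield
    end
  end
  results
end

fields = [
  Field.new("245", [["a", "Title :"], ["b", "subtitle /"], ["c", "author."]]),
  Field.new("650", [["a", "Dogs"], ["x", "Behavior"], ["x", "Research"]])
]

extract(fields, "245abc") # => ["Title : subtitle / author."]
extract(fields, "650ax")  # => ["Dogs Behavior Research"]
extract(fields, "650x")   # => ["Behavior", "Research"]
```

The 650 example shows why the rule matters. A repeated subfield such as `$x` yields separate values when requested on its own, but is folded into a single heading string when combined with `$a`.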
@@ -173,20 +175,25 @@ To see all options for `extract_marc`, see the [method documentation](http://rdo
  Other built-in methods that can be used with `to_field` include a hard-coded
  literal string:
 
+ ~~~ruby
  to_field "source", literal("LIB_CATALOG")
+ ~~~
 
  The current record serialized back out as MARC, in binary, XML, or json:
 
+ ~~~ruby
  # or :format => "json" for marc-in-json
  # or :format => "binary", by default Base64-encoded for Solr
  # 'binary' field, or, for more like what SolrMarc did, without
  # escaping:
  to_field "marc_record_raw", serialized_marc(:format => "binary", :binary_escape => false, :allow_oversized => true)
+ ~~~
 
  Text of all fields in a range:
 
+ ~~~ruby
  to_field "text", extract_all_marc_values(:from => 100, :to => 899)
-
+ ~~~
 
  All of these methods are defined at [Traject::Macros::Marc21](./lib/traject/macros/marc21.rb) ([rdoc](http://rdoc.info/gems/traject/Traject/Macros/Marc21))
 
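The `serialized_marc` comments above say that binary MARC is Base64-encoded by default so it can be stored in a Solr binary field, and that `:binary_escape => false` skips that step. The sketch below shows the encoding step using only Ruby core's `Array#pack`. The helper names are hypothetical. It is also an assumption that traject uses the strict variant (`"m0"`, no line breaks).

```ruby
# Hypothetical helper: Base64-encode raw MARC bytes for a Solr binary field.
# "m0" is strict Base64 (RFC 4648): the output contains no line breaks.
def encode_for_solr(marc_bytes)
  [marc_bytes].pack("m0")
end

# Reverse step, e.g. when reading the stored field back out of Solr.
def decode_from_solr(encoded)
  encoded.unpack1("m0")
end

# A fake fragment, not a valid record; 0x1E and 0x1D are the MARC
# field and record terminator bytes.
raw = "fake MARC bytes".b + "\x1E\x1D".b
encoded = encode_for_solr(raw)
decode_from_solr(encoded) == raw # => true
```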
@@ -198,6 +205,7 @@ them available to your indexing, you just need to use ruby `require` and `extend
 
  A number of methods are in [Traject::Macros::Marc21Semantics](./lib/traject/macros/marc21_semantics.rb) ([rdoc](http://rdoc.info/gems/traject/Traject/Macros/Marc21Semantics))
 
+ ~~~ruby
  require 'traject/macros/marc21_semantics'
  extend Traject::Macros::Marc21Semantics
 
@@ -205,15 +213,17 @@ A number of methods are in [Traject::Macros::Marc21Semantics](./lib/traject/macr
  to_field 'broad_subject', marc_lcc_to_broad_category
  to_field "geographic_facet", marc_geo_facet
  # And several more
+ ~~~
 
  And, there's a routine for classifying MARC to an internal
  format/genre/type vocabulary:
 
+ ~~~ruby
  require 'traject/macros/marc_format_classifier'
  extend Traject::Macros::MarcFormats
 
  to_field 'format_facet', marc_formats
-
+ ~~~
 
  ## Custom logic
 
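Both macro snippets above rely on the same Ruby mechanism. `extend` adds a module's methods to a single object, so that config code evaluated with `self` set to that object can call them by bare name. The sketch below is self-contained and shows only that mechanism. `MyMacros`, `first_letter_of`, and `ConfigContext` are hypothetical. The assumptions that a macro returns a lambda taking `(record, accumulator)` and that a config file is evaluated via `instance_eval` are illustrative, not a description of traject's internals.

```ruby
# Hypothetical macro module, standing in for e.g. Traject::Macros::Marc21Semantics.
module MyMacros
  # Assumption: a macro returns a callable taking (record, accumulator).
  def first_letter_of(key)
    lambda do |record, accumulator|
      value = record[key]
      accumulator << value[0] if value && !value.empty?
    end
  end
end

# Hypothetical stand-in for the object a config file is evaluated against.
class ConfigContext
  attr_reader :rules

  def initialize
    @rules = {}
  end

  def to_field(name, extractor)
    @rules[name] = extractor
  end
end

context = ConfigContext.new
# Run "config" code with self set to the context.
context.instance_eval do
  extend MyMacros # adds MyMacros' methods to this one object only
  to_field "lcc_letter", first_letter_of("lcc")
end

accumulator = []
context.rules["lcc_letter"].call({ "lcc" => "QA76.73" }, accumulator)
accumulator # => ["Q"]
```

Because `extend` changes only the receiving object, other `ConfigContext` instances do not gain the macro methods. This keeps each configuration's vocabulary separate.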
@@ -221,6 +231,7 @@ The built-in routines are there for your convenience, but if you need
  something local or custom, you can write ruby logic directly
  in a configuration file, using a ruby block, which looks like this:
 
+ ~~~ruby
  to_field "id" do |record, accumulator|
  # take the record's 001, prefix it with "bib_",
  # and then add it to the 'accumulator' argument,
@@ -229,6 +240,7 @@ in a configuration file, using a ruby block, which looks like this:
  value = "bib_#{value}"
  accumulator << value
  end
+ ~~~
 
  `do |record, accumulator|` is the definition of a ruby block taking
  two arguments. The first one passed in will be a MARC record. The
@@ -239,21 +251,25 @@ Here's a more realistic example that shows how you'd get the
  record type byte 06 out of a MARC leader, then translate it
  to a human-readable string with a TranslationMap
 
+ ~~~ruby
  to_field "marc_type" do |record, accumulator|
  leader06 = record.leader.byteslice(6)
  # this translation map doesn't actually exist, but could
  accumulator << TranslationMap.new("marc_leader")[ leader06 ]
  end
+ ~~~
 
  You can also add a block onto the end of a built-in 'macro', to
  further customize the output. The `accumulator` passed to your block
  will already have values in it from the first step, and you can
  use ruby methods like `map!` to modify it:
 
+ ~~~ruby
  to_field "big_title", extract_marc("245abcdefg") do |record, accumulator|
  # put it all in all uppercase, I don't know why.
  accumulator.map! {|v| v.upcase}
  end
+ ~~~
 
  There are many more things you can do with custom logic blocks like this too,
  including additional features we haven't discussed yet.
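The README's three `to_field` forms can be modelled in a few lines of plain Ruby: a bare block, a block that reads leader byte 06, and a macro followed by a block that post-processes the accumulator. `MiniIndexer`, `copy_value`, and `LEADER_06` are hypothetical and are not traject's `Indexer` or `TranslationMap`. Records here are plain hashes. The small hash of MARC 21 Leader/06 codes stands in for the translation map that, as the README notes, doesn't actually exist.

```ruby
# Hypothetical miniature of the to_field pattern; not traject's Indexer.
class MiniIndexer
  def initialize
    @steps = []
  end

  # A step may have a macro (a callable), a block, or both; the block runs
  # second and sees whatever the macro already put in the accumulator.
  def to_field(name, macro = nil, &block)
    @steps << [name, macro, block]
  end

  def map_record(record)
    output = {}
    @steps.each do |name, macro, block|
      accumulator = []
      macro.call(record, accumulator) if macro
      block.call(record, accumulator) if block
      output[name] = accumulator unless accumulator.empty?
    end
    output
  end
end

# Stand-in for a translation map: a subset of MARC 21 Leader/06 codes.
LEADER_06 = { "a" => "Language material", "c" => "Notated music", "j" => "Musical sound recording" }

# Hypothetical macro: copy one key of a hash-shaped "record".
def copy_value(key)
  lambda { |record, accumulator| accumulator << record[key] if record[key] }
end

indexer = MiniIndexer.new
indexer.to_field("id") do |record, accumulator|
  accumulator << "bib_#{record['001']}"
end
indexer.to_field("marc_type") do |record, accumulator|
  label = LEADER_06[record["leader"].byteslice(6)]
  accumulator << label if label
end
indexer.to_field("big_title", copy_value("245")) do |_record, accumulator|
  accumulator.map! { |v| v.upcase }
end

record = { "leader" => "00714cam a2200205 a 4500", "001" => "12345", "245" => "Traject primer" }
output = indexer.map_record(record)
output["big_title"] # => ["TRAJECT PRIMER"]
```

Running the block after the macro is what lets `map!` in the README's `big_title` example act on values that `extract_marc` has already collected.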
@@ -357,7 +373,6 @@ Also see `-I load_path` option and suggestions for Bundler use under Extending W
  See also [Hints for batch and cronjob use](./doc/batch_execution.md) of traject.
 
 
-
  ## Extending With Your Own Code
 
  Traject config files are full live ruby files, where you can do anything,
@@ -393,7 +408,11 @@ Own Code](./doc/extending.md)
 
  * [Other traject commands](./doc/other_commands.md) including `marcout`, and `commit`
  * [Hints for batch and cronjob use](./doc/batch_execution.md) of traject.
-
+ * Plugin extensions: Gems that add functionality to traject
+ * [traject_alephsequential_reader](https://github.com/traject-project/traject_alephsequential_reader/): read MARC files serialized in the AlephSequential format, as output by Ex Libris's Aleph ILS.
+ * [traject_horizon](https://github.com/jrochkind/traject_horizon): Export MARC records directly from a Horizon ILS rdbms, as serialized MARC or to index into Solr.
+ * [traject_umich_format](https://github.com/billdueber/traject_umich_format/): opinionated code and associated macros to extract format (book, audio file, etc.) and types (bibliography, conference report, etc.) from a MARC record. Code mirrors that used by the University of Michigan, and is an alternate approach to that taken by the `marc_formats` macro in `Traject::Macros::MarcFormatClassifier`.
+
 
  # Development
 
@@ -415,6 +434,8 @@ and/or extra files in ./docs -- as appropriate for what needs to be docs.
  **Inline api docs** Note that our [`.yardopts` file](./.yardopts) used by rdoc.info to generate
  online api docs has a `--markup markdown` specified -- inline class/method docs are in markdown, not rdoc.
 
+ Bundler rake tasks included for gem releases: `rake release`
+
  ## TODO
 
 
@@ -58,7 +58,7 @@ you need to modify the array in-place.
  The third optional context argument
 
  The third optional argument is a
- [Traject::Indexer::Context](./lib/traject/indexer/context.rb) ([rdoc](http://rdoc.info/github/jrochkind/traject/Traject/Indexer/Context))
+ [Traject::Indexer::Context](./lib/traject/indexer/context.rb) ([rdoc](http://rdoc.info/github/traject-project/traject/Traject/Indexer/Context))
  object. Most of the time you don't need it, but you can use it for
  some sophisticated functionality, for example using these Context methods:
 
@@ -36,8 +36,7 @@ module Traject
  # and includes a tag and a byte slice specification.
  #
  # "008[35-37]:007[5]"
- # => bytes 35-37 inclusive of any field 008, and byte 5 of any field 007 (TODO: Should we support
- # "LDR" as a pseudo-tag to take byte slices of leader?)
+ # => bytes 35-37 inclusive of any field 008, and byte 5 of any field 007
  #
  # * subfields and indicators can only be provided for marc data/variable fields
  # * byte slice can only be provided for marc control fields (generally tags less than 010)
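The comment above describes two spec shapes: a data-field tag followed by subfield codes, and a control-field tag with a bracketed, inclusive byte slice. Several specs can be joined with `:`. Below is a hypothetical regex-based parser for just those two shapes. It is a sketch, not traject's `MarcExtractor`, and it ignores indicator syntax and anything else the real spec language accepts. `Spec`, `SPEC_PATTERN`, and `parse_specs` are illustrative names.

```ruby
# Hypothetical parser for the two spec shapes in the comment above:
#   "245abc"     => data field 245, subfields a, b, c
#   "008[35-37]" => control field 008, bytes 35..37 inclusive
# Multiple specs are separated by ":".
Spec = Struct.new(:tag, :subfields, :bytes)

SPEC_PATTERN = /\A(\d{3})(?:\[(\d+)(?:-(\d+))?\]|([a-z0-9]*))\z/

def parse_specs(string)
  string.split(":").map do |part|
    match = SPEC_PATTERN.match(part) or raise ArgumentError, "unrecognized spec: #{part}"
    tag, first, last, codes = match.captures
    if first
      # A single byte "[5]" becomes the one-element range 5..5.
      Spec.new(tag, nil, first.to_i..(last || first).to_i)
    else
      Spec.new(tag, codes.empty? ? nil : codes.chars, nil)
    end
  end
end

specs = parse_specs("008[35-37]:007[5]")
specs.map(&:bytes) # => [35..37, 5..5]

# Applying a byte range to a (fake) 40-character 008 value:
value_008 = " " * 35 + "eng" + " d"
value_008[specs.first.bytes] # => "eng"
```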
data/lib/traject/util.rb CHANGED
@@ -56,8 +56,6 @@ module Traject
  Object.const_set("HttpSolrServer", org.apache.solr.client.solrj.impl.HttpSolrServer) unless defined? ::HttpSolrServer
  Object.const_set("SolrInputDocument", org.apache.solr.common.SolrInputDocument) unless defined? ::SolrInputDocument
  rescue NameError => e
- # /Users/jrochkind/code/solrj-gem/lib"
-
  included_jar_dir = File.expand_path("../../vendor/solrj/lib", File.dirname(__FILE__))
 
  jardir = settings["solrj.jar_dir"] || included_jar_dir
@@ -1,3 +1,3 @@
  module Traject
- VERSION = "0.17.0"
+ VERSION = "1.0.0.beta.1"
  end
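The new `VERSION` string contains letters, which RubyGems treats as a prerelease. A quick check with `Gem::Version` (part of RubyGems, loaded by default in standard Ruby) shows how it sorts relative to the old version and the eventual final release. Because it is a prerelease, a plain `gem install traject` will not pick it up unless `--pre` is given.

```ruby
require "rubygems" # already loaded in a standard Ruby; harmless otherwise

old_version = Gem::Version.new("0.17.0")
beta        = Gem::Version.new("1.0.0.beta.1")
final       = Gem::Version.new("1.0.0")

beta.prerelease?   # => true (the version contains letters)
old_version < beta # => true
beta < final       # => true (prereleases sort before the release)
beta.release.to_s  # => "1.0.0"
```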
@@ -157,6 +157,18 @@ describe "Traject::Macros::Marc21Semantics" do
  @record = MARC::Reader.new(support_file_path "date_type_r_missing_date2.marc").to_a.first
  assert_equal 1957, Marc21Semantics.publication_date(@record)
  end
+
+ it "works correctly with date type 'q'" do
+ val = @record['008'].value
+ val[6] = 'q'
+ val[7..10] = '191u'
+ val[11..14] = '192u'
+ @record['008'].value = val
+
+ # Date should be (date1 + date2) / 2 = (1910 + 1929) / 2 = 1919
+ estimate_tolerance = 30
+ assert_equal 1919, Marc21Semantics.publication_date(@record, estimate_tolerance)
+ end
  end
 
  describe "marc_lcc_to_broad_category" do
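The new test depends on the arithmetic given in its comment. For 008 date type `q` (questionable date), an unknown `u` digit is read as 0 in Date1 and as 9 in Date2, so `191u`/`192u` span 1910 to 1929. When that span is within the tolerance, the integer midpoint is used. The plain-Ruby sketch below re-creates only that arithmetic and is not traject's `publication_date`. Returning `nil` when the span exceeds the tolerance is an assumption, and `estimate_q_date` is a hypothetical name.

```ruby
# Hypothetical sketch of the date-type "q" estimate checked by the test above.
# 008/06 = date type, 008/07-10 = Date1, 008/11-14 = Date2.
def estimate_q_date(field_008, tolerance)
  return nil unless field_008[6] == "q"

  date1 = field_008[7..10].tr("u", "0").to_i  # earliest possible year
  date2 = field_008[11..14].tr("u", "9").to_i # latest possible year
  return nil if date2 - date1 > tolerance     # assumption: too vague to estimate

  (date1 + date2) / 2                         # integer division: 3839 / 2 == 1919
end

value = " " * 40
value[6] = "q"
value[7..10] = "191u"
value[11..14] = "192u"

estimate_q_date(value, 30) # => 1919
estimate_q_date(value, 10) # => nil (a 19-year span exceeds the tolerance)
```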
data/traject.gemspec CHANGED
@@ -9,7 +9,7 @@ Gem::Specification.new do |spec|
  spec.authors = ["Jonathan Rochkind", "Bill Dueber"]
  spec.email = ["none@nowhere.org"]
  spec.summary = %q{Index MARC to Solr; or generally process source records to hash-like structures}
- spec.homepage = "http://github.com/jrochkind/traject"
+ spec.homepage = "http://github.com/traject-project/traject"
  spec.license = "MIT"
 
  spec.files = `git ls-files`.split($/)
@@ -21,7 +21,7 @@ Gem::Specification.new do |spec|
 
 
  spec.add_dependency "marc", ">= 0.7.1"
- spec.add_dependency "marc-marc4j", ">=0.1.1"
+ spec.add_dependency "marc-marc4j", ">=0.1.1" # use and convert marc4j
  spec.add_dependency "hashie", ">= 2.0.5", "< 2.1" # used for Indexer#settings
  spec.add_dependency "slop", ">= 3.4.5", "< 4.0" # command line parsing
  spec.add_dependency "yell" # logging
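Two of these dependencies pair a lower bound with an upper bound: `hashie` with `">= 2.0.5", "< 2.1"` and `slop` with `">= 3.4.5", "< 4.0"`. A short check with `Gem::Requirement` shows which versions such a pair admits. The sample version numbers below are arbitrary.

```ruby
require "rubygems"

hashie = Gem::Requirement.new(">= 2.0.5", "< 2.1")
slop   = Gem::Requirement.new(">= 3.4.5", "< 4.0")

hashie.satisfied_by?(Gem::Version.new("2.0.5")) # => true
hashie.satisfied_by?(Gem::Version.new("2.1.0")) # => false (2.1.0 equals 2.1)
slop.satisfied_by?(Gem::Version.new("3.6.0"))   # => true
slop.satisfied_by?(Gem::Version.new("4.0.0"))   # => false
```

By contrast, `marc-marc4j` and `yell` carry no upper bound, so any future release of those gems would satisfy the gemspec.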
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: traject
  version: !ruby/object:Gem::Version
- version: 0.17.0
+ version: 1.0.0.beta.1
  platform: ruby
  authors:
  - Jonathan Rochkind
@@ -9,7 +9,7 @@ authors:
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2013-10-10 00:00:00.000000000 Z
+ date: 2013-10-14 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: marc
@@ -265,7 +265,7 @@ files:
  - vendor/solrj/lib/solr-solrj-4.3.1.jar
  - vendor/solrj/lib/wstx-asl-3.2.7.jar
  - vendor/solrj/lib/zookeeper-3.4.5.jar
- homepage: http://github.com/jrochkind/traject
+ homepage: http://github.com/traject-project/traject
  licenses:
  - MIT
  metadata: {}
@@ -280,9 +280,9 @@ required_ruby_version: !ruby/object:Gem::Requirement
  version: '0'
  required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
- - - '>='
+ - - '>'
  - !ruby/object:Gem::Version
- version: '0'
+ version: 1.3.1
  requirements: []
  rubyforge_project:
  rubygems_version: 2.1.5
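The `required_rubygems_version` change from `>= 0` to `> 1.3.1` coincides with the switch to a prerelease version. As far as we know, RubyGems adds this requirement automatically to specs whose version is a prerelease. That attribution is an inference, not something this diff states. The sketch below only checks the recorded requirement against the `rubygems_version: 2.1.5` listed above.

```ruby
require "rubygems"

required = Gem::Requirement.new("> 1.3.1")

required.satisfied_by?(Gem::Version.new("2.1.5")) # => true (the RubyGems that built this gem)
required.satisfied_by?(Gem::Version.new("1.3.1")) # => false (strictly greater is required)

# The old metadata value, ">= 0", accepted any RubyGems version.
Gem::Requirement.new(">= 0").satisfied_by?(Gem::Version.new("1.3.1")) # => true
```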