treat 0.2.0 → 0.2.5

Files changed (45)
  1. data/TODO +3 -0
  2. data/lib/economist/hungarys_troubles.txt +46 -0
  3. data/lib/economist/indias_slowdown.txt +15 -0
  4. data/lib/economist/merkozy_rides_again.txt +24 -0
  5. data/lib/economist/prada_is_not_walmart.txt +9 -0
  6. data/lib/ferret/_11.cfs +0 -0
  7. data/lib/ferret/_14.cfs +0 -0
  8. data/lib/ferret/_p.cfs +0 -0
  9. data/lib/ferret/_s.cfs +0 -0
  10. data/lib/ferret/_v.cfs +0 -0
  11. data/lib/ferret/_y.cfs +0 -0
  12. data/lib/ferret/segments +0 -0
  13. data/lib/ferret/segments_15 +0 -0
  14. data/lib/treat/buildable.rb +10 -4
  15. data/lib/treat/categories.rb +2 -1
  16. data/lib/treat/delegatable.rb +2 -1
  17. data/lib/treat/doable.rb +3 -2
  18. data/lib/treat/entities/collection.rb +2 -9
  19. data/lib/treat/entities/entity.rb +13 -14
  20. data/lib/treat/entities.rb +5 -4
  21. data/lib/treat/extractors/coreferences/stanford.rb +1 -0
  22. data/lib/treat/extractors/topic_words/lda.rb +2 -15
  23. data/lib/treat/formatters/readers/autoselect.rb +0 -1
  24. data/lib/treat/formatters/unserializers/yaml.rb +2 -0
  25. data/lib/treat/formatters.rb +0 -7
  26. data/lib/treat/group.rb +4 -2
  27. data/lib/treat/languages/english.rb +1 -1
  28. data/lib/treat/lexicalizers/tag/brill.rb +17 -15
  29. data/lib/treat/lexicalizers/tag/lingua.rb +11 -6
  30. data/lib/treat/lexicalizers/tag/stanford.rb +28 -36
  31. data/lib/treat/lexicalizers.rb +1 -2
  32. data/lib/treat/processors/parsers/enju.rb +7 -5
  33. data/lib/treat/processors/parsers/stanford.rb +3 -1
  34. data/lib/treat/retrievers/indexers/ferret.rb +28 -0
  35. data/lib/treat/retrievers/searchers/ferret.rb +53 -0
  36. data/lib/treat/retrievers.rb +17 -0
  37. data/lib/treat/visitable.rb +1 -1
  38. data/lib/treat.rb +17 -16
  39. metadata +50 -30
  40. data/lib/economist/hose_and_dry.doc +0 -0
  41. data/lib/economist/hungarys_troubles.abw +0 -70
  42. data/lib/economist/republican_nomination.pdf +0 -0
  43. data/lib/economist/saving_the_euro.odt +0 -0
  44. data/lib/economist/zero_sum.html +0 -91
  45. data/lib/treat/lexicalizers/tag/tagger.rb +0 -29
data/TODO CHANGED
@@ -1,6 +1,8 @@
  ## Todo

  - Chronic, ruby date
+ - YAML in folder
+ - Tag remove sentence
  - Stanford/Enju phrase in phrase
  - Date, time, roles structs
  - test wiki
@@ -19,6 +21,7 @@
  - Enju as a server
  - More information with categories
  - Gist languages convert
+ -

  ## Personal

data/lib/economist/hungarys_troubles.txt ADDED
@@ -0,0 +1,46 @@
+ Hungary's troubles
+ Not just a rap on the knuckles
+
+ THE pressure is piling up on the beleaguered Hungarian government. Today the European Commission threatened it with legal action over several new "cardinal" laws that would require a two-thirds majority in parliament to overturn.
+
+ The commission is still considering the laws, but today it highlighted concerns over three issues:
+
+ - The independence of the central bank. Late last year the Hungarian parliament passed a law which expands the monetary council and takes the power to nominate deputies away from the governor and hands it to the prime minister. A separate law opens the door to a merger between the bank and the financial regulator.
+
+ - The judiciary. More than 200 judges over the age of 62 have been forced into retirement and hundreds more face the sack. The new National Judicial Authority is headed by Tünde Handó, a friend of the family of Viktor Orban, the prime minister.
+
+ - The independence of the national data authority.
+
+ That wasn't all the commission had to say today. Hungary also received a ticking-off from Olli Rehn (pictured), the economic-affairs commissioner, for not doing enough to tackle its budget deficit. It may now lose access to EU funds.
+
+ Slammed in Brussels, the Hungarian government is also under pressure at home. Earlier this week Gordon Bajnai, who served as Socialist prime minister from 2009-10, fired off a broadside that sent shockwaves through the political and media establishments.
+
+ After a year and a half of government by the right-wing Fidesz party, wrote Mr Bajnai in a lengthy article on the website of the Patriotism and Progress Public Policy Foundation, democracy has been destroyed in Hungary. The country, he warned, is scarred by division and is drifting towards bankruptcy and away from Europe.
+
+ Mr Bajnai called for a radical change of government and a complete political re-orientation. “A new government must have a programme readily at hand that can be applied without delay: a programme that promotes the republic, reconciliation, and recovery.”
+
+ Fidesz is rattled by Mr Bajnai, who since leaving office has been teaching at Columbia University in New York. Understandably so. He headed a technocratic administration which stabilised the economy. Unlike his Socialist predecessor, Ferenc Gyurcsany, he was neither part of the old Communist elite nor connected to it by marriage, and so cannot be smeared as a "Komcsi". He is modern in outlook and well regarded internationally.
+
+ Moreover, say those how know him, Mr Bajnai has little patience for the narcissistic exceptionalism that shapes Fidesz’s worldview. Exhibit A: the plaintive cry of Janos Martonyi, the foreign minister, who lamented recently: “The world will never understand our pains and spiritual wounds.” Such self-pity is unlikely to endear the Hungarian government to Brussels or Washington DC (to where it has sent an envoy this week to negotiate with the IMF).
+
+ Fidesz won a two-thirds majority in 2010. But its support is evaporating, and analysts say there is a gap in the political market for a centrist pro-business party committed to democratic norms. Mr Bajnai, who has not ruled out a return to politics, would be an obvious candidate to lead it.
+
+ Meanwhile, as Hungarians watch the value of their assets vaporise, in large part thanks to the government’s increasingly erratic policies, Mr Orban smirks his way through press conferences. Here he is dodging questions from a reporter from HVG, an economics weekly, about his responsibility for the crisis and trying to shift the blame to his old enemy Andras Simor, president of the central bank. The interview ran as follows:
+
+ hvg.hu: Do you feel responsible for the falling/weakening forint?
+
+ Mr Orban: You mean the president of the central bank? He did not comment on it.
+
+ hvg.hu: No, you, Mr prime minister!
+
+ Mr Orban: The personal responsibility of the president of the central bank was not discussed over the meeting.
+
+ hvg.hu: You, your personal…!
+
+ Mr Orban: That neither.
+
+ Surrounded by yes-men and grinning flunkies, Mr Orban seems increasingly out of touch. His future will likely be decided not in the gilded corridors of the Hungarian parliament, but in Brussels and Washington DC.
+
+ What happens next? If his hand is forced Mr Orban can probably endure policy reversals on the independence of the central bank and the data ombudsman. Sorry, he would say to his loyal followers: national crisis, what can you do.
+
+ The dismantling of the judiciary would be another matter. If outsiders keep up the pressure and the judicial changes are judged to be in breach of the EU treaty, Mr Orban would be in a tricky spot. It’s hard to see how he could declare the 200-plus judges his government has forced into retirement ready for office after all, and still sit in his own.
data/lib/economist/indias_slowdown.txt ADDED
@@ -0,0 +1,15 @@
+ India’s slowdown
+ The case for the defence
+ Why officials think investors are too bearish about India’s economy
+
+ THE SMOG is so bad in Delhi right now that it seeps indoors. In one government building the far end of the corridor seems hazy. But the view of the mandarin working there is clear: India’s economic miracle is not over, regardless of the chatter among investors and howls about government paralysis from industrialists. He pokes fun at the latter. A year ago they were swanning around Davos proclaiming India could grow in its sleep, he says. Now, with growth dipping to 6.9% last quarter, from a peak of 10% (see chart), they are pleading for government action.
+
+ Bears in Mumbai, India’s financial capital, worry that GDP growth might slip below 6% as confidence and investment slip. That partly reflects global woes, and partly too the gumming up of the bureaucracy due to a wave of graft allegations. But it is also because no big reforms have taken place for years; and such is the dire state of India’s politics that it is hard to imagine any being imminent. Things reached a nadir at the end of last year when the ruling coalition announced it would allow foreign supermarkets into the country, only to do a U-turn in the face of protests from the opposition and its own coalition partners. Shortly afterwards it failed to carry a key anti-corruption bill through parliament.
+
+ The government is not blind to these concerns—in his new-year address the prime minister, Manmohan Singh, conceded that “it would be wrong to conclude that India is now unshakeably set on a process of rapid growth.” But officials in Delhi are more optimistic than the financial markets, for three reasons. First, they argue that growth is bottoming out. Inflation is showing signs of falling, which should allow the central bank to reverse its long series of interest-rate hikes. The recent drop in the rupee is a healthy adjustment, not cause for panic, they say. Meanwhile the euro zone is vaguely getting its act together and there are hints of a recovery in America. Growth, it is thought, will be about 7% for the fiscal year ending in March, respectable enough, and will pick up from there.
+
+ Second, the long-term drivers of India’s boom are intact. There is “not much reason to change your mind,” says the mandarin. The rise in the savings rate, which allows more investment, will continue, partly thanks to a demographic bulge of people reaching working age. Even if there is a drop in capital expenditure, it should remain above 30% of GDP—a “handsome level”, says another official, that will boost the country’s potential. The government’s 12th five-year plan, which is due out soon, was originally expected to forecast growth of 9% between 2012 and 2017. That might fall to 8.5%, officials say, but no further.
+
+ The final strut of the argument is that the politics are not as bad as they seem. Pessimists, the mandarin says, reckon “we’re just going to fiddle around and miss our opportunities.” But after important elections in February in Uttar Pradesh, the most populous state, the politicians may stop posturing and even co-operate to pass less contentious reforms such as a new national value-added tax. That would cut red tape and the fiscal deficit. And even if parliament stays gridlocked, there are lots of nuts-and-bolts reforms that do not require legislation. The government will try harder to tackle the bottlenecks that choke the power industry, for instance, and the paperwork that is snarling up big projects.
+
+ The nub of the official argument is “calm down—and trust us to do just enough.” The trouble is the government has been saying this for a year, and business folk and investors seem to have lost heart. Firms have cut investment and the stockmarket was one of the world’s worst performers in dollar terms last year. Perhaps they are being too jumpy, but India does not have the luxury of dismissing what firms and investors think. The fiscal deficit, including the states and off-balance-sheet items, is running at 9-10% of GDP for the fourth year in a row. The current-account deficit is drifting towards 4% of GDP, officials admit, well above the country’s traditional comfort zone. India needs to command the confidence of domestic and foreign investors. Unless the reform process starts moving there is a risk that the financing of these deficits will become an acute problem—and that India’s economic miracle recedes further into the Delhi haze.
data/lib/economist/merkozy_rides_again.txt ADDED
@@ -0,0 +1,24 @@
+ Saving the euro, part 473
+ Merkozy rides again
+
+ ANGELA MERKEL and Nicolas Sarkozy kicked off the 2012 season of the euro soap opera with a summit meeting in Berlin today. Neither said anything startling; certainly nothing that would betoken a swift and happy conclusion to the long-running saga.
+
+ The German chancellor and the French president muted their differences over such issues as how quickly to introduce a tax on financial transactions and what the role of the European Central Bank (ECB) should be in supporting shaky members of the euro zone. “Our analysis is the same,” said Mr Sarkozy at the post-summit press conference.
+
+ This did not calm markets’ nerves. The euro dropped to its lowest level against the dollar since September 2010 ($1.266) before the summit and recovered marginally as the two leaders met. Currency traders’ biggest worry is Greece’s failure to meet its fiscal targets, which means it may not get the fresh money it needs to avoid defaulting on its debt.
+
+ At the opposite end of the confidence spectrum, investors are so eager to finance Germany that they accepted a negative interest rate on an auction of six-month paper, in effect paying Germany’s government for the privilege of lending to it. Germans will see this as vindication of their prudent policies, but it also serves to underline the dangerous economic divergences within the euro zone.
+
+ The main significance of the Merkozy summit is that it seemed to signal a shift in emphasis. True, the austerity agenda—promoted by the Germans and grudgingly accepted by the French—is still there. Indeed, Mr Sarkozy boasted that France’s fiscal deficit was smaller than expected in 2011. Europe is making swift progress towards a “fiscal pact” to limit deficits, proclaimed Mrs Merkel, including German-style “debt brakes”. A new treaty should be signed by March.
+
+ But fiscal self-denial will now be supplemented by what Mrs Merkel called a “second leg”, meaning economic growth and job creation. This is partly meant to help Mr Sarkozy, who faces a tough re-election fight this spring.
+
+ All euro-zone countries, including Germany, are “prepared to do their homework” in this area, the chancellor promised, but it is not clear that much new is on offer. A big German stimulus package to boost growth in neighbouring countries is not in prospect (that would nobble the fiscal leg).
+
+ Mrs Merkel spoke of spreading best practice in labour-market regulation across the euro zone (which is German practice, Mr Sarkozy admits) and spending existing European funds more quickly and effectively. Both ideas make sense; neither will prevent further financial turmoil, or a European recession. In the latest sign of fragility, German industrial production dropped 1% in November.
+
+ The leaders tried to seem anything but complacent. Mr Sarkozy called the situation “very tense” and Mrs Merkel said they had “understood the needs of the hour.” The intention is to keep Greece from dropping out of the euro zone, but whatever happens Greece is an exceptional case, the leaders said (perhaps fearing that a Greek default or even an exit from the euro could not be avoided). As always, the chancellor dampened expectations of a quick “one-dimensional” solution to the crisis. The problem would be solved, she said, “step by step.”
+
+ The next steps involve Italy, an indebted giant that poses a far greater threat to the euro than Greece. Mrs Merkel will meet Italy’s unelected prime minister, Mario Monti, in Berlin on Wednesday; she and Mr Sarkozy will hold a three-way summit with him in Rome on January 20th. European heads of government are to gather, probably on January 30th, to put the finishing touches to the fiscal pact.
+
+ Also on the agenda, no doubt, will be a proposed financial-transactions tax. Britain is threatening a veto; Mr Sarkozy has said France will go it alone at first, if need be. Mrs Merkel wants the tax but her junior coalition partner, the Free Democrats, do not unless the British get on board. As the crisis sharpens, disagreements are likely to re-emerge over the role of the ECB and how to strengthen the euro zone's bail-out funds. The soap opera has a long way to run.
data/lib/economist/prada_is_not_walmart.txt ADDED
@@ -0,0 +1,9 @@
+ Prada is not Walmart
+
+ INDIA, if you believe the government, will be a land in which Starbucks and Prada thrive but where foreign firms will be prohibited from selling onions. It does not seem like much of a cause for celebration, but the announcement on January 11th that foreign “single brand” retailers could own 100% of their operations in India was meant to show the reform process was on track. It followed a debacle late last year when the government first announced that not only would single brand retailing be opened up, but foreign supermarkets would be allowed to operate in India too—and then was quickly forced into a U-turn on the latter promise after facing a rebellion within its own ranks and from the coalition parties it relies on in parliament.
+
+ By emphasising that at least the single brand bit of retail reform is still on track, the government hopes to show the world that India is still open for business. But this is a meek change indeed. Single brand retailers, such as fashion chains, were already allowed to own 51% of their operations. And the political stink of last month is likely to scare those who are not already present because swathes of the political class have been shown to be populist and hostile for foreign firms. Individual states may still choose to override the central government’s rules. Lastly, the reform comes with a large catch: 30% of what is sold must be supplied from cottage industries in India. If you are selling a uniform product worldwide—a sofa or handbag made in China—that is a major hassle.
+
+ The hope must be that India is on a journey to the right place, stumbling along the way. Perhaps the supplier rule will eventually be dropped, the argument goes. Maybe reluctant states will learn the error of their ways and open up too, after seeing the success of single brand retailers in other states. And maybe, after seeing an influx of investment from single brand retailers, the political climate will change and it will be easier to pass a reform that lets in supermarkets in too.
+
+ Interviewed in Delhi earlier in January a government mandarin insisted that the supermarket reform was not dead. Yet all of this seems half hearted. India is a hard enough place as it is for foreign firms to make profits. Adding in a fickle polity just makes things worse. And it is a rather sorry day for progress when a rule tweak to allow Starbucks or Prada to own not 51%, but 100%, of their shops is presented as a meaningful economic reform.
Binary file
Binary file
data/lib/ferret/_p.cfs ADDED
Binary file
data/lib/ferret/_s.cfs ADDED
Binary file
data/lib/ferret/_v.cfs ADDED
Binary file
data/lib/ferret/_y.cfs ADDED
Binary file
Binary file
Binary file
data/lib/treat/buildable.rb CHANGED
@@ -81,7 +81,7 @@ module Treat
  end
  Treat::Entities::Number.new(numeric.to_s)
  end
- def from_folder(folder)
+ def from_folder(folder, exclude = ['cfs'])
  unless FileTest.directory?(folder)
  raise Treat::Exception,
  "Path '#{folder}' does not point to a folder."
@@ -95,11 +95,15 @@ module Treat
  "Cannot create something else than a " +
  "collection from folder '#{folder}'."
  end
- c = Treat::Entities::Collection.new
+ c = Treat::Entities::Collection.new(folder)
  folder += '/' unless folder[-1] == '/'
  Dir[folder + '*'].each do |f|
- next if FileTest.directory?(f)
- c << Treat::Entities::Document.from_file(f)
+ if FileTest.directory?(f)
+ c2 = Treat::Entities::Collection.from_folder(f)
+ c << c2
+ else
+ c << Treat::Entities::Document.from_file(f)
+ end
  end
  c
  end
@@ -127,6 +131,8 @@ module Treat
  else
  from_raw_file(file)
  end
+ elsif ext == 'cfs'
+ return
  else
  from_raw_file(file)
  end
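The buildable.rb change above makes `from_folder` recurse into subfolders and adds an `exclude` list for index files such as `.cfs`. A minimal standalone sketch of that traversal follows. It does not use the Treat classes: `collection_from_folder` is a hypothetical helper, a collection is modelled as a plain Hash and a document as its path, and filtering on `exclude` inside the loop is an assumption (the hunks shown only return early from `from_file` for `cfs`).

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical stand-in for Collection.from_folder: a folder becomes
# { folder: ..., children: [...] }, a regular file becomes its path.
def collection_from_folder(folder, exclude = ['cfs'])
  unless File.directory?(folder)
    raise ArgumentError, "Path '#{folder}' does not point to a folder."
  end
  children = Dir[File.join(folder, '*')].sort.map do |path|
    if File.directory?(path)
      # Subfolders become nested collections.
      collection_from_folder(path, exclude)
    elsif exclude.include?(File.extname(path).delete('.'))
      # Skip excluded extensions (assumption: filtered here).
      nil
    else
      path
    end
  end
  { folder: folder, children: children.compact }
end
```

Dropping the `nil` entries with `compact` keeps excluded files out of the resulting collection altogether.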
data/lib/treat/categories.rb CHANGED
@@ -36,7 +36,8 @@ module Treat
  require 'treat/formatters'
  require 'treat/processors'
  require 'treat/lexicalizers'
- require 'treat/extractors'
  require 'treat/inflectors'
+ require 'treat/extractors'
+ require 'treat/retrievers'
  end
  end
data/lib/treat/delegatable.rb CHANGED
@@ -68,6 +68,7 @@ module Treat
  end
  if group.type == :annotator
  f = postprocessor.nil? ? m : postprocessor
+
  entity.features[f] = result unless result == nil
  end
  result
@@ -101,7 +102,7 @@ module Treat
  self.find_worker_for_language(entity.language, group) :
  group.default
  if worker == :none
- raise NAT::Exception,
+ raise Treat::Exception,
  "There is intentionally no default worker for #{group}."
  end
  worker
data/lib/treat/doable.rb CHANGED
@@ -18,6 +18,7 @@ module Treat
  end
  end
  end
+ DEBUG = true
  def do_task(task, worker, options)
  group = Categories.lookup(task)
  unless group
@@ -25,8 +26,8 @@ module Treat
  end
  entity_types = group.targets
  f = nil
- entity_types.each do |t|
- f = true if Treat::Entities.match_types[type][t]
+ entity_types.each do |t|
+ f = true if Treat::Entities.match_types[t][type]
  end
  if f || entity_types.include?(:entity)
  send(task, worker, options)
data/lib/treat/entities/collection.rb CHANGED
@@ -4,18 +4,11 @@ module Treat
  class Collection < Entity
  # Initialize the collection with a folder
  # containing the texts of the collection.
- def initialize(folder = nil, id = nil)
+ def initialize(folder = nil)
  super('', id)
  @type = :collection
- if folder
- set :folder, folder
- Dir.glob("#{folder}/*").each do |f|
- next if FileTest.directory?(f)
- self << Document.new(f)
- end
- end
+ set :folder, folder
  end
- def type; :collection; end
  end
  end
  end
data/lib/treat/entities/entity.rb CHANGED
@@ -33,6 +33,7 @@ module Treat
  id ||= object_id
  super(value, id)
  @type = :entity
+ # @match_types = Treat::Entities.match_types
  end
  # Catch missing methods to support method-like
  # access to features (e.g. entity.categoryinstead of
@@ -114,7 +115,7 @@ module Treat
  e.send($2.intern) == args[0]
  end
  a
- elsif method =~ /^#{@@entities_regexp}s_with_([a-z]*)$/
+ elsif method =~ /^#{@@entities_regexp}_with_([a-z]*)$/
  a = []
  each_entity($1.intern) do |e|
  a << e if e.has?($2.intern) &&
@@ -187,21 +188,9 @@ module Treat
  #
  # This function NEEDS to be ported to C (see source).
  def each_entity(*types)
- =begin
- # Replace with:
- inline do |builder|
-
- builder.c_raw <<-EOS, :arity => -1
- VALUE each_entity_c(int argc, VALUE *types, VALUE self)
- {
-
- }
- EOS
- end
- =end
  types = [:entity] if types.size == 0
  f = false
- types.each { |t2| f = true if Treat::Entities.match_types[type][t2] }
+ types.each { |t2| f = true if Treat::Entities.match_types[t2][type] }
  yield self if f
  unless @children.size == 0
  @children.each do |child|
@@ -209,6 +198,16 @@ module Treat
  end
  end
  end
+
+ # Replace with:
+ #inline do |builder|
+ #
+ # builder.c_raw <<-EOS, :arity => -1
+
+
+
+ #EOS
+ #end
  # Returns the first ancestor of this entity that has the given type.
  def ancestor_with_types(*types)
  ancestor = @parent
data/lib/treat/entities.rb CHANGED
@@ -46,11 +46,12 @@ module Treat
  list = (Treat::Entities.list + [:entity])
  @@match_types = {}
  list.each do |type1|
- @@match_types[type1] = {type1 => true}
  list.each do |type2|
- if Treat::Entities.const_get(cc(type1)) <
- Treat::Entities.const_get(cc(type2))
- @@match_types[type1][type2] = true
+ @@match_types[type2] ||= {}
+ if (type1 == type2) ||
+ (Treat::Entities.const_get(cc(type1)) <
+ Treat::Entities.const_get(cc(type2)))
+ @@match_types[type2][type1] = true
  end
  end
  end
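The entities.rb change above inverts the lookup table: it is now keyed first by the requested type and then by the entity's own type, which matches the `match_types[t][type]` calls in doable.rb and entity.rb. The sketch below reproduces that table under stated assumptions. The classes and the `build_match_types` helper are illustrative stand-ins rather than Treat's real hierarchy, and `<=` replaces the diff's `(type1 == type2) || (A < B)` test, which is equivalent for classes.

```ruby
# Toy hierarchy standing in for Treat::Entities (illustrative only).
class Entity; end
class Token < Entity; end
class Word < Token; end
class Sentence < Entity; end

TYPES = { entity: Entity, token: Token, word: Word, sentence: Sentence }

# match[target][actual] is true when an entity whose type is `actual`
# satisfies a request for `target`: the same class or a subclass of it.
def build_match_types(types)
  match = {}
  types.each do |actual, actual_class|
    types.each do |target, target_class|
      match[target] ||= {}
      match[target][actual] = true if actual_class <= target_class
    end
  end
  match
end

MATCH = build_match_types(TYPES)
MATCH[:token][:word]   # => true, a word counts as a token
MATCH[:word][:token]   # => nil, a token is not necessarily a word
```

With this orientation a caller asks "does this entity satisfy target `t`?" with one nested lookup, and every target key exists even when only the identity entry is set.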
data/lib/treat/extractors/coreferences/stanford.rb CHANGED
@@ -5,6 +5,7 @@ module Treat
  require 'stanford-core-nlp'
  @@pipeline = nil
  def self.coreferences(entity, options = {})
+ val = entity.to_s
  if entity.has_children?
  warn "The Stanford Coreference Resolver currently requires " +
  "an unsegmented, untokenized block of text to work with. " +
data/lib/treat/extractors/topic_words/lda.rb CHANGED
@@ -54,21 +54,8 @@ module Treat
  lda.load_vocabulary(options[:vocabulary])
  end

- # Get the topic words and annotate the section.
- topic_words = lda.top_words(options[:words_per_topic])
-
- collection.each_word do |word|
- topic_words.each do |i, words|
- if words.include?(word)
- word.set :is_topic_word?, true
- word.set :topic_id, i
- else
- word.set :is_topic_word?, false
- end
- end
- end
-
- topic_words
+ # Get the topic words.
+ lda.top_words(options[:words_per_topic])
  end
  end
  end
data/lib/treat/formatters/readers/autoselect.rb CHANGED
@@ -22,7 +22,6 @@ module Treat
  begin
  r = Treat::Formatters::Readers.const_get(cc(reader))
  rescue NameError
- puts e.message
  raise Treat::Exception,
  "Cannot find a reader for format: '#{ext}'."
  end
data/lib/treat/formatters/unserializers/yaml.rb CHANGED
@@ -6,6 +6,8 @@ module Treat
  class YAML
  # Require the Psych YAML parser.
  require 'psych'
+ # Require date to revive DateTime.
+ require 'date'
  # Unserialize a YAML file.
  #
  # Options: none.
data/lib/treat/formatters.rb CHANGED
@@ -31,13 +31,6 @@ module Treat
  self.targets = [:entity]
  self.default = :tree
  end
- # Cleaners strip a text from its mark up.
- module Cleaners
- extend Group
- self.type = :transformer
- self.targets = [:document]
- self.default = :html
- end
  extend Treat::Category
  end
  end
data/lib/treat/group.rb CHANGED
@@ -24,7 +24,7 @@ module Treat
  return @method if @method
  m = ucc(cl(self))
  if m[-3..-1] == 'ers'
- if ['k', 't', 'm', 'd', 'g', 'n'].include? m[-4]
+ if ['k', 't', 'm', 'd', 'g', 'n', 'x', 'h'].include? m[-4]
  n = m[0..-4]
  n = n[0..-2] if n[-1] == n[-2]
  else
@@ -81,7 +81,9 @@ module Treat
  end
  # Get constants in this module, excluding those
  # defined by parent modules.
- def const_get(const); super(const, false); end
+ def const_get(const)
+ super(const, false)
+ end
  # Lazy load the classes in the group.
  def const_missing(const)
  bits = self.ancestors[0].to_s.split('::')
data/lib/treat/languages/english.rb CHANGED
@@ -20,7 +20,7 @@ module Treat
  :chunkers => [:txt],
  :parsers => [:stanford, :enju],
  :segmenters => [:tactful, :punkt, :stanford],
- :tokenizers => [:macintyre, :multilingual, :perl, :punkt, :stanford, :tactful]
+ :tokenizers => [:macintyre, :multilingual, :perl, :punkt, :tactful, :stanford]
  }

  Lexicalizers = {
data/lib/treat/lexicalizers/tag/brill.rb CHANGED
@@ -21,7 +21,7 @@ module Treat
  # http://rbtagger.rubyforge.org/
  # Original Perl module site:
  # http://search.cpan.org/~kwilliams/Lingua-BrillTagger-0.02/lib/Lingua/BrillTagger.pm
- class Brill < Tagger
+ class Brill
  patch = false
  # Require the 'rbtagger' gem.
  require 'rbtagger'
@@ -50,9 +50,8 @@ module Treat
  end
  # Hold the tagger.
  @@tagger = nil
- # Hold the user-set options
- @@options = {}
  # Tag words using a native Brill tagger.
+ # Performs own tokenization.
  #
  # Options:
  #
@@ -60,24 +59,27 @@ module Treat
  # :lexical_rules => String (Lexical rule file to use)
  # :contextual_rules => String (Contextual rules file to use)
  def self.tag(entity, options = {})
- r = super(entity, options)
- return r if r && r != :isolated_word
- # Reinitialize the tagger if the options have changed.
- @@tagger = nil if options != @@options
+ if entity.has_children?
+ warn "The Brill tagger performs its own tokenization. " +
+ "Removing all children of #{entity.type} with value #{entity.short_value}."
+ entity.remove_all!
+ end
  # Create the tagger if necessary
  @@tagger ||= ::Brill::Tagger.new(options[:lexicon],
  options[:lexical_rules], options[:contextual_rules])
- words = (r == :isolated_word) ? [entity] : entity.tokens
- res = @@tagger.tag(words.join(' '))[1..-1]
+ res = @@tagger.tag(entity.to_s)
  res ||= []
+ isolated_word = entity.is_a?(Treat::Entities::Token)
  res.each do |info|
- words.each do |word|
- if word.value == info[0]
- word.set :tag_set, :penn
- word.set :tag, info[1]
- return info[1] if r == :isolated_word
- end
+ next if info[1] == ')'
+ token = Treat::Entities::Token.from_string(info[0])
+ token.set :tag_set, :penn
+ token.set :tag, info[1]
+ if isolated_word
+ entity.set :tag_set, :penn
+ return info[1]
  end
+ entity << token
  end
  entity.set :tag_set, :penn
  return 'P' if entity.is_a?(Treat::Entities::Phrase)
data/lib/treat/lexicalizers/tag/lingua.rb CHANGED
@@ -15,7 +15,7 @@ module Treat
  # Project website: http://engtagger.rubyforge.org/
  # Original Perl module site:
  # http://cpansearch.perl.org/src/ACOBURN/Lingua-EN-Tagger-0.15/
- class Lingua < Tagger
+ class Lingua
  # Require the 'engtagger' gem.
  silence_warnings { require 'engtagger' }
  # Hold the tagger.
@@ -38,9 +38,11 @@ module Treat
  # particularly words used polysemously.
  # - (String) :unknown_word_tag => Tag for unknown words.
  def self.tag(entity, options = {})
+ if !entity.has_children?
+ warn "The Lingua tagger requires prior tokenization."
+ warn "Tokenizing the entity #{entity.short_value}."
+ end
  options = DefaultOptions.merge(options)
- r = super(entity, options)
- return r if r && r != :isolated_word
  # Reinitialize the tagger if the options have changed.
  if options != @@options
  @@options = DefaultOptions.merge(options)
@@ -48,15 +50,18 @@ module Treat
  end
  @@tagger ||= ::EngTagger.new(@@options)
  left_tag = @@tagger.conf[:current_tag] = 'pp'
- tokens = (r == :isolated_word) ? [entity] : entity.tokens
- tokens.each do |token|
+ isolated_word = entity.is_a?(Treat::Entities::Token)
+ entity.tokens.each do |token|
  w = @@tagger.clean_word(token.to_s)
  t = @@tagger.assign_tag(left_tag, w)
  t = options[:unknown_word_tag] if t.nil? || t == ''
  @@tagger.conf[:current_tag] = left_tag = t
  token.set :tag, t.upcase
  token.set :tag_set, :penn
- return t.upcase if r == :isolated_word
+ if isolated_word
+ entity.set :tag_set, :penn
+ return t.upcase
+ end
  end
  entity.set :tag_set, :penn
  return 'P' if entity.is_a?(Treat::Entities::Phrase)
data/lib/treat/lexicalizers/tag/stanford.rb CHANGED
@@ -1,7 +1,7 @@
  module Treat
  module Lexicalizers
  module Tag
- class Stanford < Tagger
+ class Stanford
  require 'stanford-core-nlp'
  # Hold one tagger per language.
  @@taggers = {}
@@ -21,54 +21,46 @@ module Treat
  def self.tag(entity, options = {})
  # Handle options and set models.
  options = DefaultOptions.merge(options)
- r = super(entity, options)
- return r if r && r != :isolated_word
+ if entity.has_children?
+ warn "The Stanford tagger performs its own tokenization." +
+ "Removing all children of #{entity.type} with value #{entity.short_value}."
+ entity.remove_all!
+ end
  # Arrange options.
  lang = entity.language
- @@tag_set = LanguageToTagSet[lang]
- unless @@tag_set
- warn "The tag set for the tagger you are requiring is not supported."
- end
-
- if options[:tagger_model]
- ::StanfordCoreNLP.set_model(
- 'pos.model', options[:tagger_model]
- )
- end
- if options[:silence]
- options[:log_to_file] = '/dev/null'
- end
- if options[:log_to_file]
- ::StanfordCoreNLP.log_file =
- options[:log_to_file]
- end
+ tag_set = LanguageToTagSet[lang]
+ warn "The tag set for the Stanford tagger you are requiring is not supported." unless tag_set
+ ::StanfordCoreNLP.set_model('pos.model', options[:tagger_model]) if options[:tagger_model]
+ options[:log_to_file] = '/dev/null' if options[:silence]
+ ::StanfordCoreNLP.log_file = options[:log_to_file] if options[:log_to_file]

  # Load the tagger.
  StanfordCoreNLP.use(lang)
  @@taggers[lang] ||= ::StanfordCoreNLP.load(:tokenize, :ssplit, :pos)
+
  # Tag the text.
  text = ::StanfordCoreNLP::Text.new(entity.to_s)
+ isolated_word = entity.is_a?(Treat::Entities::Token)
  @@taggers[lang].annotate(text)
- # Realign the tags.
- entity.each_token do |t1|
- text.get(:sentences).each do |sentence|
- sentence.get(:tokens).each do |t2|
- if t2.value == t1.value
- tag = t2.get(:part_of_speech).to_s
- tag_s, tag_opt = *tag.split('-')
- tag_s ||= ''
- t1.set :tag, tag_s
- t1.set :tag_opt, tag_opt
- t1.set :tag_set, @@tag_set if @@tag_set
- return tag_s if r == :isolated_word
- break
- end
- end
+
+ text.get(:tokens).each do |token|
+ val = token.get(:value).to_s
+ tok = Treat::Entities::Token.from_string(val)
+ tag = token.get(:part_of_speech).to_s
+ tag_s, tag_opt = *tag.split('-')
+ tag_s ||= ''
+ tok.set :tag, tag_s
+ tok.set :tag_opt, tag_opt
+ tok.set :tag_set, tag_set if tag_set
+ if isolated_word
+ entity.set :tag_set, :penn
+ return tag_s
  end
+ entity << tok
  end

  # Handle tags for sentences and phrases.
- entity.set :tag_set, @@tag_set if @@tag_set
+ entity.set :tag_set, tag_set if tag_set
  return 'P' if entity.is_a?(Treat::Entities::Phrase)
  return 'S' if entity.is_a?(Treat::Entities::Sentence)
  end
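Both the old and new versions of the Stanford tagger split each part-of-speech tag on `-` into a base tag (`:tag`) and an optional suffix (`:tag_opt`), falling back to an empty string when the tag is empty. The sketch below isolates that step. `split_tag` is a hypothetical helper name, and the hyphenated input is only an illustration, not a claim about which tags the Stanford models emit.

```ruby
# Split a part-of-speech tag into [base, optional_suffix],
# mirroring the `tag.split('-')` / `tag_s ||= ''` lines in the diff.
def split_tag(tag)
  base, opt = *tag.to_s.split('-')
  base ||= ''   # ''.split('-') is [], so base would otherwise be nil
  [base, opt]
end

split_tag('NN')      # => ["NN", nil]
split_tag('NP-SBJ')  # => ["NP", "SBJ"]  (illustrative input)
split_tag('')        # => ["", nil]
```

Only the first two segments are kept, so a tag with more than one hyphen loses everything after the second segment. The diff behaves the same way.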
data/lib/treat/lexicalizers.rb CHANGED
@@ -6,9 +6,8 @@ module Treat
  # Taggers return the part of speech tag of a word.
  module Tag
  extend Group
- require 'treat/lexicalizers/tag/tagger'
  self.type = :annotator
- self.targets = [:word]
+ self.targets = [:sentence, :phrase, :token]
  end

  # Return the general category of a word.