opener-opinion-detector-basic 1.0.1 → 1.0.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: a69285829be3b605bef98b6e597078390744b5fa
-  data.tar.gz: 3997c28d2a29a10ab31a04c5b5c6c06852672c44
+  metadata.gz: 0fba3c1e1c36909714262f98b74b56d063e001ce
+  data.tar.gz: d9373a0f72ae696444c98c5a5fb164a64123618e
 SHA512:
-  metadata.gz: 107ac8a51fb4f3664f5d994731338c4fafdcb437beb4631274b58c8a26d5f37e9af23e0298db2f044fffb96ff1c0a459aa68bc55d6efdaea9e2da4b60679ee11
-  data.tar.gz: 0d35ab0aa4cc33e63be29f1457b5c65a2d137b740751c2657e2ac90a7ac425888e2ecbde3937c692c04406c52af265d86b266fba381e189ecba54da0ce86669d
+  metadata.gz: 59f04729557973091725070c69e4313fbc1466c3dafed71e8ef8ad92b987c264ea4beeb85ebfbc7128adfb6651dc5459189a6f49f78be537eb531f630dada5db
+  data.tar.gz: 26bcd2d87e98bdd6f8483aaffb42e39e04e07cd8d97f1fd822cc169c9ab81879148a305e804bcae8f1e7deb0b6fff02fc7af2fc88aba6fdb75903702caa947fe
data/README.md CHANGED
@@ -1,5 +1,5 @@
 Opinion Detector Basic
-======================
+---------------------
 
 This module implements an opinion detector for English (it also works for Dutch and
 German). The language is determined by the "xml:lang" attribute in the input KAF
@@ -10,13 +10,25 @@ be loaded. This module detects three elements of the opinions:
 * Target: what the expression is about
 * Holder: who is stating the expression
 
-Requirements
-------------
-* VUKafParserPy: parser in Python for KAF files
-* lxml: library for processing XML in Python
+### Confused by some terminology?
+
+This software is part of a larger collection of natural language processing
+tools known as "the OpeNER project". You can find more information about the
+project at [the OpeNER portal](http://opener-project.github.io). There you can
+also find references to terms like KAF (an XML standard to represent linguistic
+annotations in texts), components, cores, scenarios and pipelines.
+
+Quick Use Example
+-----------------
+
+Installing the opinion-detector-basic can be done by executing:
+
+    gem install opener-opinion-detector-basic
 
-Usage
------
+Please bear in mind that all components in OpeNER take KAF as input and
+output KAF by default.
+
+### Command line interface
 
 The input KAF file has to be annotated with at least the term layer, with
 polarity information. Correct input files for this module are the output KAF
@@ -28,3 +40,102 @@ To tag an input KAF file example.kaf with opinions you can run:
 
 The output will be the input KAF file extended with the opinion layer.
 
+Excerpt of example output:
+
+```
+<opinions>
+  <opinion oid="o1">
+    <opinion_target>
+      <!--hotel-->
+      <span>
+        <target id="t_6"/>
+      </span>
+    </opinion_target>
+    <opinion_expression polarity="positive" strength="2">
+      <!--heel mooi-->
+      <span>
+        <target id="t_4"/>
+        <target id="t_5"/>
+      </span>
+    </opinion_expression>
+  </opinion>
+</opinions>
+```
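The opinion layer shown in the excerpt can be read back programmatically. A minimal sketch, using the stdlib `xml.etree.ElementTree` rather than the lxml library the component itself depends on, with the excerpt inlined as a string and the function name (`read_opinions`) purely illustrative:

```python
import xml.etree.ElementTree as ET

# Opinion layer excerpt from the README above, inlined for illustration.
KAF_OPINIONS = """
<opinions>
  <opinion oid="o1">
    <opinion_target>
      <span><target id="t_6"/></span>
    </opinion_target>
    <opinion_expression polarity="positive" strength="2">
      <span><target id="t_4"/><target id="t_5"/></span>
    </opinion_expression>
  </opinion>
</opinions>
"""

def read_opinions(xml_str):
    """Collect polarity, strength and expression term ids per opinion."""
    root = ET.fromstring(xml_str)
    opinions = []
    for opinion in root.iter('opinion'):
        exp = opinion.find('opinion_expression')
        opinions.append({
            'oid': opinion.get('oid'),
            'polarity': exp.get('polarity'),
            'strength': int(exp.get('strength')),
            'terms': [t.get('id') for t in exp.iter('target')],
        })
    return opinions

print(read_opinions(KAF_OPINIONS))
```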
+
+### Webservices
+
+You can launch a webservice by executing:
+
+    opinion-detector-basic-server
+
+This will launch a mini webserver with the webservice. It defaults to port 9292,
+so you can access it at <http://localhost:9292>.
+
+To launch it on a different port, provide the `-p [port-number]` option like
+this:
+
+    opinion-detector-basic-server -p 1234
+
+It then launches at <http://localhost:1234>.
+
+Documentation on the webservice is available at the URLs provided above. For
+more information on how to launch a webservice, run the command with the `-h`
+option.
+
+### Daemon
+
+Last but not least, the opinion detector basic comes shipped with a daemon that
+can read jobs from and write jobs to Amazon SQS queues. For more information
+type:
+
+    opinion-detector-basic-daemon -h
+
+Description of dependencies
+---------------------------
+
+This component runs best in an environment suited for OpeNER components. You
+can find an installation guide and helper tools in the
+[OpeNER installer](https://github.com/opener-project/opener-installer) and an
+[installation guide on the OpeNER website](http://opener-project.github.io/getting-started/how-to/local-installation.html).
+
+At a minimum you need the following system setup:
+
+### Dependencies for normal use
+
+* Ruby 1.9.3 or newer
+* Python 2.6
+* lxml: library for processing XML in Python
+
+Domain Adaptation
+-----------------
+
+TODO
+
+Language Extension
+------------------
+
+TODO
+
+Where to go from here
+---------------------
+
+* [Check the project website](http://opener-project.github.io)
+* [Check out the webservice](http://opener.olery.com/opinion-detector-basic)
+
+Report a problem / Get help
+---------------------------
+
+If you encounter problems, please email <support@opener-project.eu> or leave an
+issue in the [issue tracker](https://github.com/opener-project/opinion-detector-basic/issues).
+
+Contributing
+------------
+
+1. Fork it (<http://github.com/opener-project/opinion-detector-basic/fork>)
+2. Create your feature branch (`git checkout -b my-new-feature`)
+3. Commit your changes (`git commit -am 'Add some feature'`)
+4. Push to the branch (`git push origin my-new-feature`)
+5. Create a new Pull Request
@@ -0,0 +1,10 @@
+#!/usr/bin/env ruby
+
+require 'opener/daemons'
+
+exec_path = File.expand_path('../../exec/opinion-detector-basic.rb', __FILE__)
+
+Opener::Daemons::Controller.new(
+  :name      => 'opinion-detector-basic',
+  :exec_path => exec_path
+)
@@ -8,7 +8,7 @@ this_folder = os.path.dirname(os.path.realpath(__file__))
 
 # This updates the load path to ensure that the local site-packages directory
 # can be used to load packages (e.g. a locally installed copy of lxml).
-sys.path.append(os.path.join(this_folder, 'site-packages/pre_build'))
+sys.path.append(os.path.join(this_folder, 'site-packages/pre_install'))
 
 from VUKafParserPy import KafParser
 from collections import defaultdict
@@ -26,7 +26,7 @@ def mix_lists(l1,l2):
     for x in range(min_l):
         newl.append(l1[x])
         newl.append(l2[x])
-
+
     if len(l1)>len(l2):
         newl.extend(l1[min_l:])
     elif len(l2)>len(l1):
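The hunk above touches only whitespace inside `mix_lists`; from the visible lines, the helper interleaves two lists pairwise and then appends the leftover tail of the longer one. A runnable reconstruction of that behavior (the body is re-typed here from the diff, so treat it as a sketch rather than the exact source):

```python
def mix_lists(l1, l2):
    # Interleave elements pairwise, then append the tail of the longer
    # list, matching the logic visible in the diff hunk above.
    min_l = min(len(l1), len(l2))
    newl = []
    for x in range(min_l):
        newl.append(l1[x])
        newl.append(l2[x])

    if len(l1) > len(l2):
        newl.extend(l1[min_l:])
    elif len(l2) > len(l1):
        newl.extend(l2[min_l:])
    return newl

print(mix_lists([1, 2, 3], ['a', 'b']))  # → [1, 'a', 2, 'b', 3]
```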
@@ -43,13 +43,13 @@ class OpinionExpression:
         self.candidates_r=[]
         self.candidates_l=[]
         self.holder = []
-
+
     def __repr__(self):
         r='Ids:'+'#'.join(self.ids)+' Sent:'+self.sentence+' Value:'+str(self.value)+' Target:'+'#'.join(self.target_ids)+'\n'
         r+='Right cand: '+str(self.candidates_r)+'\n'
         r+='Left cand: '+str(self.candidates_l)+'\n'
-        return r 
+        return r
-
+
 class MyToken:
     def __init__(self,id,lemma,pos,polarity,sent_mod,sent):
         self.id = id
@@ -61,39 +61,39 @@ class MyToken:
         self.use_it = True
         self.list_ids = [id]
         self.value = 0
-
-
+
+
         if polarity == 'positive':
             self.value = 1
         elif polarity == 'negative':
             self.value = -1
-
+
         if sent_mod == 'intensifier':
             self.value = 2
         elif sent_mod == 'shifter':
             self.value = -1
 
-
+
     def isNegator(self):
         return self.sent_mod == 'shifter'
-
 
-
+
+
     def isIntensifier(self):
         return self.sent_mod == 'intensifier'
-
-
+
+
     def is_opinion_expression(self):
         return self.use_it and self.polarity is not None
-
-
+
+
     def __repr__(self):
         if self.use_it:
             return self.id+' lemma:'+self.lemma.encode('utf-8')+'.'+self.pos.encode('utf-8')+' pol:'+str(self.polarity)+' sentmod:'+str(self.sent_mod)+' sent:'+self.sentence+' use:'+str(self.use_it)+' list:'+'#'.join(self.list_ids)+' val:'+str(self.value)
         else:
             return '\t'+self.id+' lemma:'+self.lemma.encode('utf-8')+'.'+self.pos.encode('utf-8')+' pol:'+str(self.polarity)+' sentmod:'+str(self.sent_mod)+' sent:'+self.sentence+' use:'+str(self.use_it)+' list:'+'#'.join(self.list_ids)+' val:'+str(self.value)
-
-
+
+
 
 def obtain_opinion_expressions(tokens,lang='nl'):
     logging.debug(' Obtaining opinion expressions')
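From the `MyToken` constructor visible in this hunk, each term's initial opinion value is derived from its polarity and sentiment modifier: positive maps to 1, negative to -1, and then an intensifier overrides the value with 2 while a shifter overrides it with -1. A compact restatement of that mapping (the function name is hypothetical; the logic is copied from the diff):

```python
def initial_token_value(polarity, sent_mod):
    # Mirrors the assignments in MyToken.__init__ shown above:
    # polarity sets the base value, then sent_mod overrides it.
    value = 0
    if polarity == 'positive':
        value = 1
    elif polarity == 'negative':
        value = -1

    if sent_mod == 'intensifier':
        value = 2
    elif sent_mod == 'shifter':
        value = -1
    return value

print(initial_token_value('positive', None))       # → 1
print(initial_token_value('positive', 'shifter'))  # → -1
print(initial_token_value(None, 'intensifier'))    # → 2
```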
@@ -118,7 +118,7 @@ def obtain_opinion_expressions(tokens,lang='nl'):
                 logging.debug(' Accumulating '+'-'.join(my_tokens[t+1].list_ids))
             t+=1
     ###########################################
-
+
     ##Apply intensifiers/negators over the next elements
     if apply_modifiers:
         logging.debug(' Applying modifiers')
@@ -133,7 +133,7 @@ def obtain_opinion_expressions(tokens,lang='nl'):
                 logging.debug(' Applied modifier over '+'-'.join(my_tokens[t+1].list_ids))
             t += 1
     ###########################################
-
+
     if apply_conjunctions:
         if lang=='nl':
             concat = [',','en']
@@ -148,8 +148,8 @@ def obtain_opinion_expressions(tokens,lang='nl'):
         elif lang == 'fr':
             concat=[',','et']
         logging.debug(' Applying conjunctions:'+str(concat))
-
-
+
+
         t = 0
         while t < len(my_tokens):
             if my_tokens[t].use_it and my_tokens[t].value!=0: ## Find the first one
@@ -160,12 +160,12 @@ def obtain_opinion_expressions(tokens,lang='nl'):
                 value_aux = my_tokens[t].value
                 my_tokens[t].use_it = False
                 #print 'Modified',my_tokens[t]
-
+
                 x = t+1
                 while True:
                     if x>=len(my_tokens):
                         break
-
+
                     if my_tokens[x].lemma in concat:
                         ## list_aux += my_tokens[x].list_ids  Don't use it as part of the OE
                         my_tokens[x].use_it = False
@@ -174,7 +174,7 @@ def obtain_opinion_expressions(tokens,lang='nl'):
                         #print '\tAlso ',my_tokens[x]
                         logging.debug(' Found token '+str(my_tokens[x]))
                         list_aux += my_tokens[x].list_ids
-
+
                         used.append(x)
                         my_tokens[x].use_it = False
                         value_aux += my_tokens[x].value
@@ -183,7 +183,7 @@ def obtain_opinion_expressions(tokens,lang='nl'):
                         break
             #print 'OUT OF THE WHILE'
             ##The last one in the list used is the one accumulating all
-
+
             last_pos = used[-1]
             my_tokens[last_pos].value = value_aux
             my_tokens[last_pos].list_ids = list_aux
@@ -193,8 +193,8 @@ def obtain_opinion_expressions(tokens,lang='nl'):
             #print
             #print
         t += 1
-
-
+
+
     ## Create OpinionExpression
     my_opinion_exps = []
     logging.debug(' Generating output')
@@ -205,7 +205,7 @@ def obtain_opinion_expressions(tokens,lang='nl'):
     return my_opinion_exps
 
 
-''' 
+'''
 def get_distance(id1, id2):
     pos1 = int(id1[id1.find('_')+1:])
     pos2 = int(id2[id2.find('_')+1:])
@@ -214,7 +214,7 @@ def get_distance(id1, id2):
     else:
         return pos2-pos1
 '''
-
+
 
 def obtain_holders(ops_exps,sentences,lang):
     if lang=='nl':
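The commented-out `get_distance` helper in the hunk above measures how far apart two term ids are by comparing their numeric suffixes (e.g. `t_4` vs `t_6`). The condition line falls outside the visible hunk, so the reconstruction below assumes the missing branch returns `pos1 - pos2`, i.e. an absolute distance:

```python
def get_distance(id1, id2):
    # Term ids look like 't_4'; the position is the integer after '_'.
    pos1 = int(id1[id1.find('_') + 1:])
    pos2 = int(id2[id2.find('_') + 1:])
    # Absolute distance regardless of which id comes first
    # (the first branch is assumed; only the else branch is in the hunk).
    if pos1 > pos2:
        return pos1 - pos2
    else:
        return pos2 - pos1

print(get_distance('t_4', 't_6'))   # → 2
print(get_distance('t_10', 't_3'))  # → 7
```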
@@ -229,9 +229,9 @@ def obtain_holders(ops_exps,sentences,lang):
         holders = ['ich','du','wir','ihr','sie','er']
     elif lang == 'fr':
         holders = ['je','tu','lui','elle','nous','vous','ils','elles']
-
+
     logging.debug('Obtaining holders with list: '+str(holders))
-
+
     for oe in ops_exps:
         sent = oe.sentence
         list_terms = sentences[str(sent)]
@@ -254,24 +254,24 @@ def obtain_targets_improved(ops_exps,sentences):
     logging.debug(' Obtaining targets improved')
     #print>>sys.stderr,'#'*40
     #print>>sys.stderr,'#'*40
-
+
     #print>>sys.stderr,'Beginning with obtain targets'
     ##sentences --> dict [str(numsent)] ==> list of (lemma, pos, term_id)
-
+
     all_ids_in_oe = []
     for oe in ops_exps:
         all_ids_in_oe.extend(oe.ids)
     #print>>sys.stderr,'All list of ids in oe',all_ids_in_oe
-
+
     for oe in ops_exps:
         #print>>sys.stderr,'\tOE:',oe
         logging.debug(' OpExp: '+str(oe))
-
+
         ids_in_oe = oe.ids
         sent = oe.sentence
         list_terms = sentences[str(sent)]
         #print>>sys.stderr,'\t\tTerms in sent:',list_terms
-
+
         ###########################################
         #First rule: noun to the right within max_distance tokens
         max_distance_right = 3
@@ -279,7 +279,7 @@ def obtain_targets_improved(ops_exps,sentences):
         for idx, (lemma,pos,term_id) in enumerate(list_terms):
             if term_id in ids_in_oe:
                 biggest_index = idx
-
+
         #print>>sys.stderr,'\t\tBI',biggest_index
         if biggest_index+1 >= len(list_terms): ## it is the last element and we shall skip it
             #print>>sys.stderr,'\t\tNot possible to apply 1st rule'
@@ -294,7 +294,7 @@ def obtain_targets_improved(ops_exps,sentences):
             #print>>sys.stderr,'\t\tCandidates for right rule no filter',oe.__candidates_right
 
         ######################################################################################
-
+
 
         ###########################################
         max_distance_left = 3
@@ -315,14 +315,14 @@ def obtain_targets_improved(ops_exps,sentences):
             oe.candidates_l = filter_candidates(candidates,all_ids_in_oe)
             logging.debug(' Candidates filtered left: '+str(oe.candidates_l))
 
-        ###################################################################################### 
-
+        ######################################################################################
+
     #print>>sys.stderr,'#'*40
     #print>>sys.stderr,'#'*40
-
+
     ## filling oe.target_ids
     assigned_as_targets = []
-
+
     # First assign to each the first candidate on the right, if any and not already assigned
     logging.debug(' Applying first to the right rule')
     for oe in ops_exps:
@@ -334,7 +334,7 @@ def obtain_targets_improved(ops_exps,sentences):
             ###assigned_as_targets.append(id) # Uncomment to avoid selection of the same target more than once
             logging.debug(' OpExp '+str(oe)+' selected '+id)
             #print>>sys.stderr,'Assigning',id
-
+
     logging.debug(' Applying closest rule')
     for oe in ops_exps:
         if len(oe.target_ids) == 0: # otherwise it's solved
@@ -346,7 +346,7 @@ def obtain_targets_improved(ops_exps,sentences):
                     logging.debug(' OpExp '+str(oe)+' selected '+id)
                     break
 
-######## MAIN ROUTINE ############ 
+######## MAIN ROUTINE ############
 
 ## Check if we are reading from a pipeline
 if sys.stdin.isatty():
@@ -384,8 +384,8 @@ except Exception as e:
     print>>sys.stderr,'Stream input must be a valid KAF file'
     print>>sys.stderr,'Error: ',str(e)
     sys.exit(-1)
-
-
+
+
 lang = my_kaf_tree.getLanguage()
 ## Creating data structure
 sentences = defaultdict(list)
@@ -410,7 +410,7 @@ for term in my_kaf_tree.getTerms():
         sent_mod = sentiment.getSentimentModifier()
     sentence = my_kaf_tree.getToken(list_span[0]).get('sent') ## The sentence of the first token element in span
     my_tokens.append(MyToken(term_id,lemma,kaf_pos,polarity,sent_mod,sentence))
-
+
     sentences[str(sentence)].append((lemma,kaf_pos,term_id))
 #############################
 
@@ -437,10 +437,10 @@ logging.debug('Generating KAF output')
 
 if remove_opinions:
     my_kaf_tree.remove_opinion_layer()
-
+
 for oe in my_ops_exps:
     op_ele = etree.Element('opinion')
-
+
     ## Holder
     if len(oe.holder)!=0:
         oe.holder.sort()
@@ -452,48 +452,48 @@ for oe in my_ops_exps:
         op_hol.append(span_op_hol)
         for id in oe.holder:
             span_op_hol.append(etree.Element('target',attrib={'id':id}))
-
+
     ## Target
     op_tar = etree.Element('opinion_target')
     op_ele.append(op_tar)
 
-
+
     if len(oe.target_ids)!=0: ## if there are no targets, there is no opinion element
         oe.target_ids.sort()
         c = ' '.join(lemma_for_tid[tid] for tid in oe.target_ids)
-        op_tar.append(etree.Comment(c)) 
+        op_tar.append(etree.Comment(c))
         span_op_tar = etree.Element('span')
         op_tar.append(span_op_tar)
         for id in oe.target_ids:
             span_op_tar.append(etree.Element('target',attrib={'id':id}))
-
+
     #Expression
     if oe.value > 0: pol = 'positive'
     elif oe.value < 0: pol = 'negative'
    else: pol = 'neutral'
-
+
     op_exp = etree.Element('opinion_expression')
     op_exp.set('polarity',pol)
     if opinion_strength:
         op_exp.set('strength',str(oe.value))
-
+
     op_ele.append(op_exp)
     oe.ids.sort()
-    c = ' '.join(lemma_for_tid[tid] for tid in oe.ids) 
-    op_exp.append(etree.Comment(c)) 
+    c = ' '.join(lemma_for_tid[tid] for tid in oe.ids)
+    op_exp.append(etree.Comment(c))
     span_exp = etree.Element('span')
     op_exp.append(span_exp)
     for id in oe.ids:
         span_exp.append(etree.Element('target',attrib={'id':id}))
-
+
     ##Append the op_ele to the opinions layer
     my_kaf_tree.addElementToLayer('opinions', op_ele)
-
-
-my_kaf_tree.addLinguisticProcessor('Basic opinion detector with Pos','1.0','opinions', my_time_stamp) 
+
+
+my_kaf_tree.addLinguisticProcessor('Basic opinion detector with Pos','1.0','opinions', my_time_stamp)
 my_kaf_tree.saveToFile(sys.stdout)
 logging.debug('Process finished')
 
 
-
+
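The final hunk shows how an expression's accumulated value becomes the `polarity` and `strength` attributes of the `opinion_expression` element: positive for a value above 0, negative below 0, neutral otherwise, with the raw value reused as strength. As a standalone sketch (the function name is hypothetical; the decision logic is taken from the hunk):

```python
def polarity_label(value):
    # Same three-way decision as in the output loop above.
    if value > 0:
        return 'positive'
    elif value < 0:
        return 'negative'
    else:
        return 'neutral'

print(polarity_label(2))   # → 'positive'
print(polarity_label(-1))  # → 'negative'
print(polarity_label(0))   # → 'neutral'
```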