hdo-storting-importer 0.0.1 → 0.0.2

data/Gemfile CHANGED
@@ -1,8 +1,3 @@
  source :rubygems
 
- gem "builder"
- gem "nokogiri"
- gem "socksify"
- gem "aruba"
- gem "rake"
- gem "rest-client"
+ gemspec
data/README.md CHANGED
@@ -1,14 +1,13 @@
  What
  ====
 
- Convert and import XML from data.stortinget.no
+ Convert XML from data.stortinget.no to HDO XML
 
  Usage
  =====
 
- $ bin/hdo-converter --app-root /src/hdo-site all
+ $ hdo-converter --help
+ $ hdo-converter categories folketingparser/rawdata/data.stortinget.no/eksport/emner/index.html
+ $ hdo-converter promises all-promises.csv | script/import promises
 
- Caveats
- =======
 
- Right now we're executing script/import from hdo-site to perform the import. That means running this with bundler may fail if the gems don't match. Don't do that.
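The new usage line pipes `hdo-converter promises all-promises.csv` into `script/import`. In 0.0.2, `read_promises` reads each CSV as ISO-8859-1 and transcodes it to UTF-8 before parsing. Below is a minimal standalone sketch of that step using only core Ruby. It assumes the input files really are Latin-1 encoded, and the helper names `read_latin1_as_utf8` and `concat_csvs` are hypothetical.

```ruby
require "tempfile"

# Hypothetical helper mirroring the transcoding step in read_promises:
# read the file tagged as ISO-8859-1, then convert the text to UTF-8.
def read_latin1_as_utf8(path)
  File.read(File.expand_path(path), encoding: "ISO-8859-1").encode("UTF-8")
end

# Concatenate several CSV files into one UTF-8 string, as read_promises does.
def concat_csvs(paths)
  paths.each_with_object(String.new(encoding: "UTF-8")) do |path, content|
    content << read_latin1_as_utf8(path)
  end
end

# Usage: write a Latin-1 encoded file and read it back as UTF-8.
file = Tempfile.new(["promises", ".csv"])
file.binmode
file.write("Sør-Trøndelag\n".encode("ISO-8859-1")) # "ø" is the single byte 0xF8 here
file.close

text = concat_csvs([file.path])
puts text          # prints "Sør-Trøndelag"
puts text.encoding # prints "UTF-8"
```

If a file were already UTF-8, this step would silently produce mojibake rather than fail. The fixed encoding is therefore a property of the input data, not something the code can detect.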
@@ -0,0 +1,189 @@
+ Feature: Import data
+ Scenario: Import districts
+ Given a file named "fylker.xml" with:
+ """
+ <?xml version="1.0" encoding="utf-8"?>
+ <fylker_oversikt xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://data.stortinget.no">
+ <versjon>1.0</versjon>
+ <fylker_liste>
+ <fylke>
+ <versjon>1.0</versjon>
+ <id>AA</id>
+ <navn>Aust-Agder</navn>
+ </fylke>
+ <fylke>
+ <versjon>1.0</versjon>
+ <id>VA</id>
+ <navn>Vest-Agder</navn>
+ </fylke>
+ <fylke>
+ <versjon>1.0</versjon>
+ <id>Ak</id>
+ <navn>Akershus</navn>
+ </fylke>
+ <fylke>
+ <versjon>1.0</versjon>
+ <id>Bu</id>
+ <navn>Buskerud</navn>
+ </fylke>
+ <fylke>
+ <versjon>1.0</versjon>
+ <id>Fi</id>
+ <navn>Finnmark</navn>
+ </fylke>
+ <fylke>
+ <versjon>1.0</versjon>
+ <id>He</id>
+ <navn>Hedmark</navn>
+ </fylke>
+ <fylke>
+ <versjon>1.0</versjon>
+ <id>Ho</id>
+ <navn>Hordaland</navn>
+ </fylke>
+ <fylke>
+ <versjon>1.0</versjon>
+ <id>MR</id>
+ <navn>Møre og Romsdal</navn>
+ </fylke>
+ <fylke>
+ <versjon>1.0</versjon>
+ <id>No</id>
+ <navn>Nordland</navn>
+ </fylke>
+ <fylke>
+ <versjon>1.0</versjon>
+ <id>Op</id>
+ <navn>Oppland</navn>
+ </fylke>
+ <fylke>
+ <versjon>1.0</versjon>
+ <id>Os</id>
+ <navn>Oslo</navn>
+ </fylke>
+ <fylke>
+ <versjon>1.0</versjon>
+ <id>Ro</id>
+ <navn>Rogaland</navn>
+ </fylke>
+ <fylke>
+ <versjon>1.0</versjon>
+ <id>SF</id>
+ <navn>Sogn og Fjordane</navn>
+ </fylke>
+ <fylke>
+ <versjon>1.0</versjon>
+ <id>Te</id>
+ <navn>Telemark</navn>
+ </fylke>
+ <fylke>
+ <versjon>1.0</versjon>
+ <id>Tr</id>
+ <navn>Troms</navn>
+ </fylke>
+ <fylke>
+ <versjon>1.0</versjon>
+ <id>NT</id>
+ <navn>Nord-Trøndelag</navn>
+ </fylke>
+ <fylke>
+ <versjon>1.0</versjon>
+ <id>ST</id>
+ <navn>Sør-Trøndelag</navn>
+ </fylke>
+ <fylke>
+ <versjon>1.0</versjon>
+ <id>Ve</id>
+ <navn>Vestfold</navn>
+ </fylke>
+ <fylke>
+ <versjon>1.0</versjon>
+ <id>Øs</id>
+ <navn>Østfold</navn>
+ </fylke>
+ </fylker_liste>
+ </fylker_oversikt>
+ """
+ When I run `hdo-converter districts fylker.xml`
+ Then the stdout should contain:
+ """
+ <?xml version="1.0" encoding="UTF-8"?>
+ <districts>
+ <district>
+ <externalId>AA</externalId>
+ <name>Aust-Agder</name>
+ </district>
+ <district>
+ <externalId>VA</externalId>
+ <name>Vest-Agder</name>
+ </district>
+ <district>
+ <externalId>Ak</externalId>
+ <name>Akershus</name>
+ </district>
+ <district>
+ <externalId>Bu</externalId>
+ <name>Buskerud</name>
+ </district>
+ <district>
+ <externalId>Fi</externalId>
+ <name>Finnmark</name>
+ </district>
+ <district>
+ <externalId>He</externalId>
+ <name>Hedmark</name>
+ </district>
+ <district>
+ <externalId>Ho</externalId>
+ <name>Hordaland</name>
+ </district>
+ <district>
+ <externalId>MR</externalId>
+ <name>Møre og Romsdal</name>
+ </district>
+ <district>
+ <externalId>No</externalId>
+ <name>Nordland</name>
+ </district>
+ <district>
+ <externalId>Op</externalId>
+ <name>Oppland</name>
+ </district>
+ <district>
+ <externalId>Os</externalId>
+ <name>Oslo</name>
+ </district>
+ <district>
+ <externalId>Ro</externalId>
+ <name>Rogaland</name>
+ </district>
+ <district>
+ <externalId>SF</externalId>
+ <name>Sogn og Fjordane</name>
+ </district>
+ <district>
+ <externalId>Te</externalId>
+ <name>Telemark</name>
+ </district>
+ <district>
+ <externalId>Tr</externalId>
+ <name>Troms</name>
+ </district>
+ <district>
+ <externalId>NT</externalId>
+ <name>Nord-Trøndelag</name>
+ </district>
+ <district>
+ <externalId>ST</externalId>
+ <name>Sør-Trøndelag</name>
+ </district>
+ <district>
+ <externalId>Ve</externalId>
+ <name>Vestfold</name>
+ </district>
+ <district>
+ <externalId>Øs</externalId>
+ <name>Østfold</name>
+ </district>
+ </districts>
+ """
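The scenario checks that each Storting `<fylke>` (`<id>`, `<navn>`) comes out as an HDO `<district>` (`<externalId>`, `<name>`). The real converter uses the `builder` gem, so the plain-Ruby sketch below only illustrates the target element structure. The `District` struct, `xml_escape` and `districts_xml` are hypothetical stand-ins, and the two-space indentation is an assumption because the diff view lost the original whitespace.

```ruby
# Hypothetical record type mirroring District's two attributes.
District = Struct.new(:external_id, :name)

# Escape the five XML special characters in text content.
def xml_escape(text)
  text.to_s.gsub(/[&<>"']/, "&" => "&amp;", "<" => "&lt;", ">" => "&gt;",
                            '"' => "&quot;", "'" => "&apos;")
end

# Emit the <districts> document shape expected by the scenario,
# mapping Storting's <id>/<navn> onto HDO's <externalId>/<name>.
def districts_xml(districts)
  out = [%(<?xml version="1.0" encoding="UTF-8"?>), "<districts>"]
  districts.each do |d|
    out << "  <district>"
    out << "    <externalId>#{xml_escape(d.external_id)}</externalId>"
    out << "    <name>#{xml_escape(d.name)}</name>"
    out << "  </district>"
  end
  out << "</districts>"
  out.join("\n")
end

puts districts_xml([District.new("AA", "Aust-Agder"), District.new("Øs", "Østfold")])
```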
@@ -14,4 +14,10 @@ Gem::Specification.new do |gem|
  gem.name = "hdo-storting-importer"
  gem.require_paths = ["lib"]
  gem.version = Hdo::StortingImporter::VERSION
+
+ gem.add_runtime_dependency "builder"
+ gem.add_runtime_dependency "nokogiri"
+ gem.add_runtime_dependency "aruba"
+ gem.add_runtime_dependency "rake"
+ gem.add_runtime_dependency "rest-client"
  end
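Now that the Gemfile is reduced to `gemspec`, Bundler reads the dependency list from the gemspec. Of the five gems added as runtime dependencies, `aruba` (a Cucumber helper for testing command-line apps) and `rake` are usually needed only during development. The sketch below shows how that split could look, assuming the converter does not load either gem at runtime (this diff does not confirm that). The version string is a placeholder.

```ruby
require "rubygems"

# Sketch only: "0.0.2" is a placeholder; the real gemspec reads
# Hdo::StortingImporter::VERSION.
spec = Gem::Specification.new do |gem|
  gem.name          = "hdo-storting-importer"
  gem.version       = "0.0.2"
  gem.require_paths = ["lib"]

  # Needed when the converter runs.
  gem.add_runtime_dependency "builder"
  gem.add_runtime_dependency "nokogiri"
  gem.add_runtime_dependency "rest-client"

  # Assumed to be needed only for the Cucumber/Aruba features and Rake tasks.
  gem.add_development_dependency "aruba"
  gem.add_development_dependency "rake"
end

p spec.runtime_dependencies.map(&:name)     # prints ["builder", "nokogiri", "rest-client"]
p spec.development_dependencies.map(&:name) # prints ["aruba", "rake"]
```

Bundler's `gemspec` directive also installs development dependencies (in the `:development` group). The features would therefore still run from a checkout, while users installing the gem would not pull in the test tooling.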
@@ -6,10 +6,45 @@ module Hdo
  attr_reader :external_id, :name
  attr_accessor :children
 
+ def self.type_name
+ 'category'
+ end
+
+ def self.description
+ 'a parliamentary category, used to categorize issues and promises'
+ end
+
+ def self.fields
+ [
+ EXTERNAL_ID_FIELD,
+ Field.new(:name, true, :string, 'The name of the category.'),
+ Field.new(:subcategories, false, 'list<category>', 'A list of subcategories.'),
+ ]
+ end
+
+ def self.xml_example(builder = Util.builder)
+ cat = new("5", "Employment")
+ cat.children << new("6", "Wages")
+
+ cat.to_hdo_xml(builder)
+ end
+
+ #
+ # Deserialize from a Storting XML document (<emne_liste><emne>...</emne></emne_liste>)
+ #
+ # @return [Array<Category>]
+ #
+
  def self.from_storting_doc(doc)
  doc.css("emne_liste > emne").map { |xt| from_storting_node(xt) }
  end
 
+ #
+ # Deserialize from a Storting XML node (<emne>...</emne>)
+ #
+ # @return [Category]
+ #
+
  def self.from_storting_node(node)
  cat = Category.new node.css("id").first.text, node.css("navn").first.text
 
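In 0.0.2 each type class describes itself through `type_name`, `description`, `fields` and `xml_example`. The `Field` class and the `EXTERNAL_ID_FIELD` constant are not part of this diff. The sketch below therefore assumes that `Field` takes positional arguments `(name, required, type, description)`, matching the `Field.new` calls above, and models it as a hypothetical `Struct`. It shows how that metadata could be rendered as a plain-text field reference. `CategoryDoc` and `describe_type` are also hypothetical.

```ruby
# Hypothetical stand-in for Field; argument order inferred from the Field.new calls.
Field = Struct.new(:name, :required, :type, :description)

# Hypothetical; the real EXTERNAL_ID_FIELD is defined outside this diff.
EXTERNAL_ID_FIELD = Field.new(:externalId, true, :string, "An external identifier.")

# Minimal stand-in exposing the same class-level metadata as Category.
class CategoryDoc
  def self.type_name
    "category"
  end

  def self.description
    "a parliamentary category, used to categorize issues and promises"
  end

  def self.fields
    [
      EXTERNAL_ID_FIELD,
      Field.new(:name, true, :string, "The name of the category."),
      Field.new(:subcategories, false, "list<category>", "A list of subcategories.")
    ]
  end
end

# Render a header line, then one line per field: name, type, required flag, description.
def describe_type(klass)
  lines = ["#{klass.type_name}: #{klass.description}"]
  klass.fields.each do |f|
    flag = f.required ? "required" : "optional"
    lines << "  #{f.name} (#{f.type}, #{flag}): #{f.description}"
  end
  lines.join("\n")
end

puts describe_type(CategoryDoc)
```

Keeping the metadata on the classes lets one generic routine document every type, instead of maintaining a separate description per type.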
@@ -21,6 +56,24 @@ module Hdo
  cat
  end
 
+ #
+ # Deserialize from a HDO XML document (<categories><category>...</category></categories>)
+ #
+ # @param [Nokogiri::XML::Document] doc
+ # @return [Array<Category>]
+ #
+
+
+ def self.from_hdo_doc(doc)
+ doc.css("categories > category").map { |node| from_hdo_node(node) }
+ end
+
+ #
+ # Deserialize from a HDO XML node
+ #
+ # @return [Category]
+ #
+
  def self.from_hdo_node(node)
  external_id = node.css("externalId").first.text
  name = node.css("name").first.text
@@ -37,6 +90,10 @@ module Hdo
  @children = []
  end
 
+ #
+ # Serialize as HDO XML
+ #
+
  def to_hdo_xml(builder = Util.builder)
  builder.category do |cat|
  cat.externalId external_id
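`Category` is recursive: `fields` declares `subcategories` as `list<category>`, and `xml_example` nests "Wages" under "Employment". The rest of `to_hdo_xml` is cut off in this diff, so the `<subcategories>` wrapper element in the plain-Ruby sketch below is an assumption, as are the `Cat` struct and `category_xml` helper. The sketch shows depth-first serialization of such a tree.

```ruby
# Hypothetical stand-in for Category: an id, a name and child categories.
Cat = Struct.new(:external_id, :name, :children) do
  def initialize(external_id, name)
    super(external_id, name, [])
  end
end

# Depth-first serialization (no XML escaping, for brevity). The <subcategories>
# wrapper is assumed, since the rest of Category#to_hdo_xml is not shown.
def category_xml(cat, depth = 0)
  pad = "  " * depth
  lines = ["#{pad}<category>",
           "#{pad}  <externalId>#{cat.external_id}</externalId>",
           "#{pad}  <name>#{cat.name}</name>"]
  unless cat.children.empty?
    lines << "#{pad}  <subcategories>"
    cat.children.each { |child| lines << category_xml(child, depth + 2) }
    lines << "#{pad}  </subcategories>"
  end
  lines << "#{pad}</category>"
  lines.join("\n")
end

employment = Cat.new("5", "Employment")
employment.children << Cat.new("6", "Wages")
puts category_xml(employment)
```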
@@ -3,139 +3,118 @@ module Hdo
  class CLI
  def initialize(args)
  @log = Logger.new(STDERR)
- @cmd, @options = parse(args)
+ @type, @files, @options = parse(args)
  end
 
  def execute
- import @cmd.to_sym
+ case @type
+ when :dld_issues
+ puts read_dld_issues
+ when :dld_votes
+ puts read_dld_votes
+ when :promises
+ puts read_promises
+ else
+ puts read_type(@type, class_for_type(@type))
+ end
+ rescue Errno::EPIPE
+ # ignored
  end
 
  private
 
+ TYPE_TO_CLASS = {
+ :categories => Category,
+ :issues => Issue,
+ :districts => District,
+ :committees => Committee,
+ :parties => Party,
+ :representatives => Representative,
+ :votes => Vote
+ }
+
+ def class_for_type(type)
+ TYPE_TO_CLASS[type] or raise ArgumentError, "unknown type: #{type.inspect}"
+ end
+
+ def read_type(plural, klass)
+ objs = @files.map do |e|
+ doc = Nokogiri::XML.parse(File.read(e))
+ doc.remove_namespaces!
+
+ klass.from_storting_doc(doc)
+ end.flatten
+
+ str = Util.builder do |xml|
+ xml.instruct!
+ xml.__send__(plural) do |builder|
+ objs.each { |e| e.to_hdo_xml(builder) }
+ end
+ end
+
+ str
+ end
+
  def parse(args)
- options = {:source => (has_submodule? ? 'disk' : 'api')}
+ options = {}
 
  parser = OptionParser.new do |opt|
- opt.banner = "Usage: #{$0} <import-type> [options]"
-
+ types = TYPE_TO_CLASS.keys + [:dld_issues, :dld_votes, :promises]
+ opt.banner = "Usage: #{$0} <#{types.join '|'}> <file(s)>"
  opt.on("--help", "You're looking at it.") { puts opt; exit; }
- opt.on("--only-print", "Don't run import, only print generated XML.") { options[:only_print] = true }
- opt.on("--except WHAT", 'Ignore this comma separated list of entities from import.') { |s| options[:ignore] = s.split(",").map(&:strip).map(&:to_sym) }
- opt.on("--app-root APP_ROOT", 'Path to clone of git://github.com/holderdeord/hdo-site.git') { |path| options[:app_root] = path }
- opt.on("--app-url APP_URL", 'URL to hdo-site') { |url| options[:app_url] = url }
- opt.on("--source SOURCE ", 'Where to take data from [disk|api]') { |source| options[:source] = source }
- opt.on("--socks PROXY", 'host:port for SOCKS proxy') do |proxy|
- require 'socksify'
- host, port = proxy.split ":"
- TCPSocket.socks_server = host
- TCPSocket.socks_port = Integer(port)
- end
  end
 
  parser.parse!(args)
 
- cmd = args.shift
+ type, files = args
 
- unless cmd
+ if type.nil?
  puts(parser)
  exit 1
  end
 
- [cmd, options]
+ [type.to_sym, Array(files), options]
  end
 
- def data_source
- @data_source ||= (
- ds = if @options[:source] == "disk"
- DiskDataSource.new(File.join(StortingImporter.root, 'folketingparser/rawdata/data.stortinget.no'))
- elsif @options[:source] == "api"
- ApiDataSource.new("http://data.stortinget.no/")
- else
- raise ArgumentError, "invalid source: #{@options[:source].inspect}"
- end
-
- ParsingDataSource.new(ds)
- )
- end
+ def read_dld_issues
+ doc = Nokogiri::XML.parse(File.read(File.join(StortingImporter.root, 'data/dld-issues.xml')))
+ issues = Issue.from_hdo_doc doc
 
- def importer
- @importer ||= (
- if @options[:app_root]
- ScriptImporter.new(@options[:app_root])
- else
- raise ArgumentError, "app-root not given, can't import"
+ Util.builder do |xml|
+ xml.instruct!
+ xml.issues do |issues_builder|
+ issues.each { |i| i.to_hdo_xml(issues_builder) }
  end
- )
- end
-
- def import(what)
- case what
- when :dld
- import_dld
- when :promises
- import_promises
- when :all
- import_all
- else
- import_docs converter.xml_for(what.to_sym)
  end
  end
 
- def import_all
- ignore = Array(@options[:ignore])
-
- import_docs converter.xml_for(:parties) unless ignore.include?(:parties)
- import_docs converter.xml_for(:committees) unless ignore.include?(:committees)
- import_docs converter.xml_for(:districts) unless ignore.include?(:districts)
- import_docs converter.xml_for(:representatives) unless ignore.include?(:representatives)
- import_docs converter.xml_for(:categories) unless ignore.include?(:categories)
- import_docs converter.xml_for(:issues) unless ignore.include?(:issues)
- import_docs converter.xml_for(:votes) unless ignore.include?(:votes)
+ def read_dld_votes
+ doc = Nokogiri::XML.parse(File.read(File.join(StortingImporter.root, 'folketingparser/data/votering-2011-04-04-dld-hdo.xml')))
+ votes = Vote.from_hdo_doc doc
 
- import_dld unless ignore.include?(:dld)
- import_promises unless ignore.include?(:promises)
- end
-
- def converter
- @converter ||= Converter.new(data_source)
- end
-
- def import_dld
- if has_submodule?
- print_or_import File.read(File.join(StortingImporter.root, 'data/dld-issues.xml'))
- print_or_import File.read(File.join(StortingImporter.root, 'folketingparser/data/votering-2011-04-04-dld-hdo.xml'))
- else
- $stderr.puts "folketingparser not found, skipping DLD votes and issues (run `git submodule update --init` if you need this)"
- end
- end
-
- def import_promises
- csvs = Dir[File.join(StortingImporter.root, 'data/promises-*.csv')].sort_by { |e| File.basename(e) }
- csvs.each do |path|
- print_or_import PromiseConverter.new(path).xml
+ Util.builder do |xml|
+ xml.instruct!
+ xml.votes do |votes_builder|
+ votes.each { |v| v.to_hdo_xml(votes_builder) }
+ end
  end
  end
 
- def import_docs(docs)
- docs = [docs] unless docs.kind_of?(Array)
-
- docs.each do |doc|
- print_or_import doc.to_s
+ def read_promises
+ csvs = @files.any? ? @files : Dir[File.join(StortingImporter.root, 'data/promises-*.csv')].sort_by { |e| File.basename(e) }
+ content = ''
+ csvs.each do |csv|
+ content << File.read(File.expand_path(csv), encoding: "ISO-8859-1").encode("UTF-8")
  end
- end
 
- def print_or_import(xml)
- if @options[:only_print]
- puts xml
- else
- importer.import xml
+ Util.builder do |xml|
+ xml.instruct!
+ xml.promises do |promises|
+ Promise.from_csv(content).each { |e| e.to_hdo_xml(promises) }
+ end
  end
  end
 
- def has_submodule?
- File.exist?(File.join(StortingImporter.root, 'folketingparser/data'))
- end
-
  end
  end
  end
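The rewritten CLI takes a type followed by input files, looks the type up in `TYPE_TO_CLASS` and prints XML to stdout. It rescues `Errno::EPIPE`, so piping into a consumer that exits early (for example `| head`) does not end in a stack trace. Note that `type, files = args` keeps only the first file even though the banner advertises `<file(s)>`; destructuring with a splat (`type, *files = args`) would collect all of them. The sketch below illustrates both points with the standard library's OptionParser. The `TYPES` list, `parse_args` and `write_output` are hypothetical and do not reproduce the gem's classes.

```ruby
require "optparse"

# Hypothetical list of supported types, standing in for TYPE_TO_CLASS.keys plus extras.
TYPES = [:categories, :districts, :promises].freeze

# Parse "<type> <file(s)>". The splat keeps every file argument,
# whereas `type, files = args` would keep only the first one.
def parse_args(argv)
  argv = argv.dup
  parser = OptionParser.new do |opt|
    opt.banner = "Usage: hdo-converter <#{TYPES.join '|'}> <file(s)>"
  end
  parser.parse!(argv)

  type, *files = argv
  raise ArgumentError, parser.banner if type.nil?

  type = type.to_sym
  raise ArgumentError, "unknown type: #{type.inspect}" unless TYPES.include?(type)

  [type, files]
end

# Write to an IO, treating a closed reader (EPIPE) as a normal stop.
def write_output(io, str)
  io.puts str
  true
rescue Errno::EPIPE
  false
end

type, files = parse_args(["districts", "a.xml", "b.xml"])
p type  # prints :districts
p files # prints ["a.xml", "b.xml"]

# Simulate a consumer that exits early: close the pipe's read end first.
reader, writer = IO.pipe
writer.sync = true
reader.close
p write_output(writer, "<districts/>") # prints false
```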
@@ -5,6 +5,22 @@ module Hdo
 
  attr_reader :external_id, :name
 
+ def self.type_name
+ 'committee'
+ end
+
+ def self.description
+ 'a parliamentary committee'
+ end
+
+ def self.xml_example(builder = Util.builder)
+ new("ARBSOS", "Arbeids- og sosialkomiteen").to_hdo_xml(builder)
+ end
+
+ def self.fields
+ [EXTERNAL_ID_FIELD, Field.new(:name, true, :string, 'The name of the committee.')]
+ end
+
  def self.from_storting_doc(doc)
  doc.css("komiteer_liste komite").map do |node|
  from_storting_node(node)
@@ -15,6 +31,10 @@ module Hdo
  new node.css("id").first.text, node.css("navn").first.text
  end
 
+ def self.from_hdo_doc(doc)
+ doc.css("committees > committee").map { |e| from_hdo_node e }
+ end
+
  def self.from_hdo_node(node)
  new node.css("externalId").first.text, node.css("name").first.text
  end
@@ -5,6 +5,22 @@ module Hdo
 
  attr_reader :external_id, :name
 
+ def self.type_name
+ 'district'
+ end
+
+ def self.description
+ 'an electoral district'
+ end
+
+ def self.xml_example(builder = Util.builder)
+ new("Db", "Duckburg").to_hdo_xml(builder)
+ end
+
+ def self.fields
+ [EXTERNAL_ID_FIELD, Field.new(:name, true, :string, 'The name of the electoral district.')]
+ end
+
  def self.from_storting_doc(doc)
  doc.css("fylker_liste fylke").map do |node|
  from_storting_node(node)
@@ -15,6 +31,10 @@ module Hdo
  new node.css("id").first.text, node.css("navn").first.text
  end
 
+ def self.from_hdo_doc(doc)
+ doc.css("districts > district").map { |e| from_hdo_node(e) }
+ end
+
  def self.from_hdo_node(node)
  new node.css("externalId").first.text, node.css("name").first.text
  end
@@ -1,3 +1,5 @@
+ # encoding: UTF-8
+
  module Hdo
  module StortingImporter
  class Issue
@@ -6,6 +8,48 @@ module Hdo
  attr_reader :external_id, :summary, :description, :type, :status, :last_update,
  :reference, :document_group, :committee, :categories
 
+ def self.type_name
+ 'issue'
+ end
+
+ def self.description
+ 'a parliamentary issue'
+ end
+
+ def self.example
+ new(
+ "53520",
+ "Inngåelse av avtale om opprettelse av sekretariatet for Den nordlige dimensjons partnerskap for helse og livskvalitet (NDPHS)",
+ "Samtykke til inngåelse av avtale av 25. november 2011 om opprettelse av sekretariatet for Den nordlige dimensjons partnerskap for helse og livskvalitet (NDPHS)",
+ "alminneligsak",
+ "mottatt",
+ "2012-04-20T00:00:00",
+ "Prop. 90 S (2011-2012)",
+ "proposisjon",
+ "Transport- og kommunikasjonskomiteen",
+ ['UTENRIKSSAKER', 'TRAKTATER', 'NORDISK SAMARBEID']
+ )
+ end
+
+ def self.xml_example(builder = Util.builder)
+ example.to_hdo_xml(builder)
+ end
+
+ def self.fields
+ [
+ EXTERNAL_ID_FIELD,
+ Field.new(:summary, true, :string, 'A (preferably one-line) summary of the issue.'),
+ Field.new(:description, true, :string, 'A longer description of the issue.'),
+ Field.new(:type, true, :string, 'The type of issue.'),
+ Field.new(:status, true, :string, 'The status of the issue.'),
+ Field.new(:lastUpdate, true, :string, 'The time the issue was last updated in the parliament.'),
+ Field.new(:reference, true, :string, 'A reference.'),
+ Field.new(:documentGroup, true, :string, 'What document group this issue belongs to.'),
+ Field.new(:committee, false, :string, "What committee this issue belongs to. Should match the 'name' field in the committee type."),
+ Field.new(:categories, false, 'list', "List of categories (matching the 'name' field of the category type).")
+ ]
+ end
+
  def self.from_storting_doc(doc)
  doc.css("saker_liste sak").map do |node|
  from_storting_node(node)
@@ -33,6 +77,13 @@ module Hdo
  end
 
  new(external_id, summary, description, type, status, last_update, reference, document_group, committee, categories)
+ rescue
+ puts lnode
+ raise
+ end
+
+ def self.from_hdo_doc(doc)
+ doc.css("issues > issue").map { |e| from_hdo_node(e) }
  end
 
  def self.from_hdo_node(node)
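The method ending in `new(external_id, summary, ...)` now has a method-level `rescue` that prints diagnostic context with `puts` and re-raises. The converter writes its XML to stdout, which the README pipes into `script/import`, so anything printed with `puts` ends up in that stream. The sketch below shows the same rescue-and-re-raise pattern reporting to stderr instead. `parse_record` and its `id;summary` input format are hypothetical.

```ruby
# Hypothetical parser: expects "id;summary" and raises on malformed input.
def parse_record(line)
  id, summary = line.split(";", 2)
  raise ArgumentError, "missing summary" if summary.nil?

  { id: Integer(id), summary: summary.strip }
rescue
  # Report the offending input on stderr so stdout (the XML stream) stays clean,
  # then re-raise the original exception unchanged.
  warn "failed to parse: #{line.inspect}"
  raise
end

p parse_record("53520;Inngåelse av avtale")

begin
  parse_record("not-a-number")
rescue ArgumentError => e
  puts "re-raised: #{e.message}" # prints "re-raised: missing summary"
end
```

A bare `raise` inside `rescue` re-raises the current exception with its original class, message and backtrace, so callers still see the real failure.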