hdo-storting-importer 0.0.1 → 0.0.2

data/Gemfile CHANGED
@@ -1,8 +1,3 @@
  source :rubygems
 
- gem "builder"
- gem "nokogiri"
- gem "socksify"
- gem "aruba"
- gem "rake"
- gem "rest-client"
+ gemspec
data/README.md CHANGED
@@ -1,14 +1,13 @@
  What
  ====
 
- Convert and import XML from data.stortinget.no
+ Convert XML from data.stortinget.no to HDO XML
 
  Usage
  =====
 
- $ bin/hdo-converter --app-root /src/hdo-site all
+ $ hdo-converter --help
+ $ hdo-converter categories folketingparser/rawdata/data.stortinget.no/eksport/emner/index.html
+ $ hdo-converter promises all-promises.csv | script/import promises
 
- Caveats
- =======
 
- Right now we're executing script/import from hdo-site to perform the import. That means running this with bundler may fail if the gems don't match. Don't do that.
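The new usage pipes the output of `hdo-converter promises all-promises.csv` into `script/import promises`. In this release the promises command (`CLI#read_promises`) reads each CSV as ISO-8859-1 and transcodes it to UTF-8 before parsing. A minimal sketch of that read step in core Ruby; `read_latin1_files` is a hypothetical helper name, not part of the gem:

```ruby
# Hypothetical helper mirroring the read step in CLI#read_promises:
# each file is decoded as ISO-8859-1 (Latin-1), transcoded to UTF-8,
# and the results are concatenated in the order given.
def read_latin1_files(paths)
  paths.map do |path|
    File.read(File.expand_path(path), encoding: "ISO-8859-1").encode("UTF-8")
  end.join
end
```

Decoding as ISO-8859-1 cannot fail, because every byte maps to a character, so a CSV that is actually saved as UTF-8 would come out garbled rather than raising an error.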
@@ -0,0 +1,189 @@
+ Feature: Import data
+ Scenario: Import districts
+ Given a file named "fylker.xml" with:
+ """
+ <?xml version="1.0" encoding="utf-8"?>
+ <fylker_oversikt xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://data.stortinget.no">
+ <versjon>1.0</versjon>
+ <fylker_liste>
+ <fylke>
+ <versjon>1.0</versjon>
+ <id>AA</id>
+ <navn>Aust-Agder</navn>
+ </fylke>
+ <fylke>
+ <versjon>1.0</versjon>
+ <id>VA</id>
+ <navn>Vest-Agder</navn>
+ </fylke>
+ <fylke>
+ <versjon>1.0</versjon>
+ <id>Ak</id>
+ <navn>Akershus</navn>
+ </fylke>
+ <fylke>
+ <versjon>1.0</versjon>
+ <id>Bu</id>
+ <navn>Buskerud</navn>
+ </fylke>
+ <fylke>
+ <versjon>1.0</versjon>
+ <id>Fi</id>
+ <navn>Finnmark</navn>
+ </fylke>
+ <fylke>
+ <versjon>1.0</versjon>
+ <id>He</id>
+ <navn>Hedmark</navn>
+ </fylke>
+ <fylke>
+ <versjon>1.0</versjon>
+ <id>Ho</id>
+ <navn>Hordaland</navn>
+ </fylke>
+ <fylke>
+ <versjon>1.0</versjon>
+ <id>MR</id>
+ <navn>Møre og Romsdal</navn>
+ </fylke>
+ <fylke>
+ <versjon>1.0</versjon>
+ <id>No</id>
+ <navn>Nordland</navn>
+ </fylke>
+ <fylke>
+ <versjon>1.0</versjon>
+ <id>Op</id>
+ <navn>Oppland</navn>
+ </fylke>
+ <fylke>
+ <versjon>1.0</versjon>
+ <id>Os</id>
+ <navn>Oslo</navn>
+ </fylke>
+ <fylke>
+ <versjon>1.0</versjon>
+ <id>Ro</id>
+ <navn>Rogaland</navn>
+ </fylke>
+ <fylke>
+ <versjon>1.0</versjon>
+ <id>SF</id>
+ <navn>Sogn og Fjordane</navn>
+ </fylke>
+ <fylke>
+ <versjon>1.0</versjon>
+ <id>Te</id>
+ <navn>Telemark</navn>
+ </fylke>
+ <fylke>
+ <versjon>1.0</versjon>
+ <id>Tr</id>
+ <navn>Troms</navn>
+ </fylke>
+ <fylke>
+ <versjon>1.0</versjon>
+ <id>NT</id>
+ <navn>Nord-Trøndelag</navn>
+ </fylke>
+ <fylke>
+ <versjon>1.0</versjon>
+ <id>ST</id>
+ <navn>Sør-Trøndelag</navn>
+ </fylke>
+ <fylke>
+ <versjon>1.0</versjon>
+ <id>Ve</id>
+ <navn>Vestfold</navn>
+ </fylke>
+ <fylke>
+ <versjon>1.0</versjon>
+ <id>Øs</id>
+ <navn>Østfold</navn>
+ </fylke>
+ </fylker_liste>
+ </fylker_oversikt>
+ """
+ When I run `hdo-converter districts fylker.xml`
+ Then the stdout should contain:
+ """
+ <?xml version="1.0" encoding="UTF-8"?>
+ <districts>
+ <district>
+ <externalId>AA</externalId>
+ <name>Aust-Agder</name>
+ </district>
+ <district>
+ <externalId>VA</externalId>
+ <name>Vest-Agder</name>
+ </district>
+ <district>
+ <externalId>Ak</externalId>
+ <name>Akershus</name>
+ </district>
+ <district>
+ <externalId>Bu</externalId>
+ <name>Buskerud</name>
+ </district>
+ <district>
+ <externalId>Fi</externalId>
+ <name>Finnmark</name>
+ </district>
+ <district>
+ <externalId>He</externalId>
+ <name>Hedmark</name>
+ </district>
+ <district>
+ <externalId>Ho</externalId>
+ <name>Hordaland</name>
+ </district>
+ <district>
+ <externalId>MR</externalId>
+ <name>Møre og Romsdal</name>
+ </district>
+ <district>
+ <externalId>No</externalId>
+ <name>Nordland</name>
+ </district>
+ <district>
+ <externalId>Op</externalId>
+ <name>Oppland</name>
+ </district>
+ <district>
+ <externalId>Os</externalId>
+ <name>Oslo</name>
+ </district>
+ <district>
+ <externalId>Ro</externalId>
+ <name>Rogaland</name>
+ </district>
+ <district>
+ <externalId>SF</externalId>
+ <name>Sogn og Fjordane</name>
+ </district>
+ <district>
+ <externalId>Te</externalId>
+ <name>Telemark</name>
+ </district>
+ <district>
+ <externalId>Tr</externalId>
+ <name>Troms</name>
+ </district>
+ <district>
+ <externalId>NT</externalId>
+ <name>Nord-Trøndelag</name>
+ </district>
+ <district>
+ <externalId>ST</externalId>
+ <name>Sør-Trøndelag</name>
+ </district>
+ <district>
+ <externalId>Ve</externalId>
+ <name>Vestfold</name>
+ </district>
+ <district>
+ <externalId>Øs</externalId>
+ <name>Østfold</name>
+ </district>
+ </districts>
+ """
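The scenario expects `hdo-converter districts` to map each `<fylke>` to a `<district>`, turning `<id>` into `<externalId>` and `<navn>` into `<name>`. A minimal sketch of that mapping, assuming REXML from Ruby's standard distribution in place of the Nokogiri and Builder gems the importer actually uses; `districts_xml`, `child_text` and `escape_xml` are hypothetical names, and the output indentation is illustrative only:

```ruby
require "rexml/document"

# Escapes the characters that are significant in XML text content.
def escape_xml(text)
  text.to_s.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
end

# Text of the first direct child element with the given local name, or nil.
def child_text(element, name)
  child = element.children.find { |c| c.is_a?(REXML::Element) && c.name == name }
  child && child.text
end

# Hypothetical stand-in for `hdo-converter districts`:
# Storting <fylke> elements in, HDO <district> elements out.
def districts_xml(storting_xml)
  doc = REXML::Document.new(storting_xml)

  # Match on local names so the default xmlns="http://data.stortinget.no"
  # does not get in the way (the CLI calls remove_namespaces! for this).
  districts = []
  doc.root.each_recursive do |el|
    next unless el.name == "fylke"
    districts << [child_text(el, "id"), child_text(el, "navn")]
  end

  lines = ['<?xml version="1.0" encoding="UTF-8"?>', "<districts>"]
  districts.each do |id, name|
    lines << "  <district>"
    lines << "    <externalId>#{escape_xml(id)}</externalId>"
    lines << "    <name>#{escape_xml(name)}</name>"
    lines << "  </district>"
  end
  lines << "</districts>"
  lines.join("\n") + "\n"
end
```

Elements such as `<versjon>` are simply not copied, which matches the expected output in the scenario.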
@@ -14,4 +14,10 @@ Gem::Specification.new do |gem|
    gem.name = "hdo-storting-importer"
    gem.require_paths = ["lib"]
    gem.version = Hdo::StortingImporter::VERSION
+
+   gem.add_runtime_dependency "builder"
+   gem.add_runtime_dependency "nokogiri"
+   gem.add_runtime_dependency "aruba"
+   gem.add_runtime_dependency "rake"
+   gem.add_runtime_dependency "rest-client"
  end
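With the Gemfile reduced to `gemspec`, Bundler now takes the dependency list from this gemspec. A small sketch using RubyGems' `Gem::Specification` API that builds an equivalent in-memory spec and lists its dependencies. As an assumption that differs from the 0.0.2 gemspec, it declares `aruba` and `rake` with `add_development_dependency`, since they presumably only serve the Cucumber features and rake tasks; the version, summary and author values are placeholders:

```ruby
require "rubygems"

# In-memory equivalent of the dependency block above.
# Assumption (not what 0.0.2 does): aruba and rake are development-only.
spec = Gem::Specification.new do |gem|
  gem.name    = "hdo-storting-importer"
  gem.version = "0.0.2"
  gem.summary = "placeholder summary"
  gem.authors = ["placeholder"]

  # add_dependency is equivalent to add_runtime_dependency.
  %w[builder nokogiri rest-client].each { |name| gem.add_dependency name }
  %w[aruba rake].each { |name| gem.add_development_dependency name }
end

puts "runtime:     #{spec.runtime_dependencies.map(&:name).join(', ')}"
puts "development: #{spec.development_dependencies.map(&:name).join(', ')}"
```

Bundler still installs development dependencies when working on the gem itself via `gemspec`, but `gem install` does not pull them in for end users.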
@@ -6,10 +6,45 @@ module Hdo
        attr_reader :external_id, :name
        attr_accessor :children
 
+       def self.type_name
+         'category'
+       end
+
+       def self.description
+         'a parliamentary category, used to categorize issues and promises'
+       end
+
+       def self.fields
+         [
+           EXTERNAL_ID_FIELD,
+           Field.new(:name, true, :string, 'The name of the category.'),
+           Field.new(:subcategories, false, 'list<category>', 'A list of subcategories.'),
+         ]
+       end
+
+       def self.xml_example(builder = Util.builder)
+         cat = new("5", "Employment")
+         cat.children << new("6", "Wages")
+
+         cat.to_hdo_xml(builder)
+       end
+
+       #
+       # Deserialize from a Storting XML document (<emne_liste><emne>...</emne></emne_liste>)
+       #
+       # @return [Array<Category>]
+       #
+
        def self.from_storting_doc(doc)
          doc.css("emne_liste > emne").map { |xt| from_storting_node(xt) }
        end
 
+       #
+       # Deserialize from a Storting XML node (<emne>...</emne>)
+       #
+       # @return [Category]
+       #
+
        def self.from_storting_node(node)
          cat = Category.new node.css("id").first.text, node.css("navn").first.text
 
@@ -21,6 +56,24 @@ module Hdo
          cat
        end
 
+       #
+       # Deserialize from an HDO XML document (<categories><category>...</category></categories>)
+       #
+       # @param [Nokogiri::XML::Document] doc
+       # @return [Array<Category>]
+       #
+
+
+       def self.from_hdo_doc(doc)
+         doc.css("categories > category").map { |node| from_hdo_node(node) }
+       end
+
+       #
+       # Deserialize from an HDO XML node (<category>...</category>)
+       #
+       # @return [Category]
+       #
+
        def self.from_hdo_node(node)
          external_id = node.css("externalId").first.text
          name = node.css("name").first.text
@@ -37,6 +90,10 @@ module Hdo
          @children = []
        end
 
+       #
+       # Serialize as HDO XML
+       #
+
        def to_hdo_xml(builder = Util.builder)
          builder.category do |cat|
            cat.externalId external_id
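The new class methods (`type_name`, `description`, `fields`, `xml_example`) make each type self-describing, presumably so that documentation of the HDO XML format can be generated from them. `Field` and `EXTERNAL_ID_FIELD` are defined outside this diff, so the sketch below substitutes a hypothetical `Struct` with the same argument order (name, required, type, description) and a placeholder external-id description; `CategorySketch` and `describe_type` are likewise illustrative names:

```ruby
# Hypothetical stand-ins: the gem's real Field class and EXTERNAL_ID_FIELD
# constant live outside this diff. Argument order follows the calls above.
Field = Struct.new(:name, :required, :type, :description)
EXTERNAL_ID_FIELD = Field.new(:externalId, true, :string, "Placeholder: id in the source system.")

# Mirrors the class-level metadata added to Category in this release.
class CategorySketch
  def self.type_name
    "category"
  end

  def self.description
    "a parliamentary category, used to categorize issues and promises"
  end

  def self.fields
    [
      EXTERNAL_ID_FIELD,
      Field.new(:name, true, :string, "The name of the category."),
      Field.new(:subcategories, false, "list<category>", "A list of subcategories.")
    ]
  end
end

# Renders a plain-text summary of any class exposing the same metadata.
def describe_type(klass)
  lines = ["#{klass.type_name}: #{klass.description}"]
  klass.fields.each do |f|
    presence = f.required ? "required" : "optional"
    lines << "  #{f.name} (#{f.type}, #{presence}): #{f.description}"
  end
  lines.join("\n")
end

puts describe_type(CategorySketch)
```

Because `describe_type` only relies on the four class methods, the same function would work unchanged for the Committee, District and Issue metadata added elsewhere in this release.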
@@ -3,139 +3,118 @@ module Hdo
      class CLI
        def initialize(args)
          @log = Logger.new(STDERR)
-         @cmd, @options = parse(args)
+         @type, @files, @options = parse(args)
        end
 
        def execute
-         import @cmd.to_sym
+         case @type
+         when :dld_issues
+           puts read_dld_issues
+         when :dld_votes
+           puts read_dld_votes
+         when :promises
+           puts read_promises
+         else
+           puts read_type(@type, class_for_type(@type))
+         end
+       rescue Errno::EPIPE
+         # ignored
        end
 
        private
 
+       TYPE_TO_CLASS = {
+         :categories => Category,
+         :issues => Issue,
+         :districts => District,
+         :committees => Committee,
+         :parties => Party,
+         :representatives => Representative,
+         :votes => Vote
+       }
+
+       def class_for_type(type)
+         TYPE_TO_CLASS[type] or raise ArgumentError, "unknown type: #{type.inspect}"
+       end
+
+       def read_type(plural, klass)
+         objs = @files.map do |e|
+           doc = Nokogiri::XML.parse(File.read(e))
+           doc.remove_namespaces!
+
+           klass.from_storting_doc(doc)
+         end.flatten
+
+         str = Util.builder do |xml|
+           xml.instruct!
+           xml.__send__(plural) do |builder|
+             objs.each { |e| e.to_hdo_xml(builder) }
+           end
+         end
+
+         str
+       end
+
        def parse(args)
-         options = {:source => (has_submodule? ? 'disk' : 'api')}
+         options = {}
 
          parser = OptionParser.new do |opt|
-           opt.banner = "Usage: #{$0} <import-type> [options]"
-
+           types = TYPE_TO_CLASS.keys + [:dld_issues, :dld_votes, :promises]
+           opt.banner = "Usage: #{$0} <#{types.join '|'}> <file(s)>"
            opt.on("--help", "You're looking at it.") { puts opt; exit; }
-           opt.on("--only-print", "Don't run import, only print generated XML.") { options[:only_print] = true }
-           opt.on("--except WHAT", 'Ignore this comma separated list of entities from import.') { |s| options[:ignore] = s.split(",").map(&:strip).map(&:to_sym) }
-           opt.on("--app-root APP_ROOT", 'Path to clone of git://github.com/holderdeord/hdo-site.git') { |path| options[:app_root] = path }
-           opt.on("--app-url APP_URL", 'URL to hdo-site') { |url| options[:app_url] = url }
-           opt.on("--source SOURCE ", 'Where to take data from [disk|api]') { |source| options[:source] = source }
-           opt.on("--socks PROXY", 'host:port for SOCKS proxy') do |proxy|
-             require 'socksify'
-             host, port = proxy.split ":"
-             TCPSocket.socks_server = host
-             TCPSocket.socks_port = Integer(port)
-           end
          end
 
          parser.parse!(args)
 
-         cmd = args.shift
+         type, *files = args
 
-         unless cmd
+         if type.nil?
            puts(parser)
            exit 1
          end
 
-         [cmd, options]
+         [type.to_sym, Array(files), options]
        end
 
-       def data_source
-         @data_source ||= (
-           ds = if @options[:source] == "disk"
-             DiskDataSource.new(File.join(StortingImporter.root, 'folketingparser/rawdata/data.stortinget.no'))
-           elsif @options[:source] == "api"
-             ApiDataSource.new("http://data.stortinget.no/")
-           else
-             raise ArgumentError, "invalid source: #{@options[:source].inspect}"
-           end
-
-           ParsingDataSource.new(ds)
-         )
-       end
+       def read_dld_issues
+         doc = Nokogiri::XML.parse(File.read(File.join(StortingImporter.root, 'data/dld-issues.xml')))
+         issues = Issue.from_hdo_doc doc
 
-       def importer
-         @importer ||= (
-           if @options[:app_root]
-             ScriptImporter.new(@options[:app_root])
-           else
-             raise ArgumentError, "app-root not given, can't import"
+         Util.builder do |xml|
+           xml.instruct!
+           xml.issues do |issues_builder|
+             issues.each { |i| i.to_hdo_xml(issues_builder) }
            end
-         )
-       end
-
-       def import(what)
-         case what
-         when :dld
-           import_dld
-         when :promises
-           import_promises
-         when :all
-           import_all
-         else
-           import_docs converter.xml_for(what.to_sym)
          end
        end
 
-       def import_all
-         ignore = Array(@options[:ignore])
-
-         import_docs converter.xml_for(:parties) unless ignore.include?(:parties)
-         import_docs converter.xml_for(:committees) unless ignore.include?(:committees)
-         import_docs converter.xml_for(:districts) unless ignore.include?(:districts)
-         import_docs converter.xml_for(:representatives) unless ignore.include?(:representatives)
-         import_docs converter.xml_for(:categories) unless ignore.include?(:categories)
-         import_docs converter.xml_for(:issues) unless ignore.include?(:issues)
-         import_docs converter.xml_for(:votes) unless ignore.include?(:votes)
+       def read_dld_votes
+         doc = Nokogiri::XML.parse(File.read(File.join(StortingImporter.root, 'folketingparser/data/votering-2011-04-04-dld-hdo.xml')))
+         votes = Vote.from_hdo_doc doc
 
-         import_dld unless ignore.include?(:dld)
-         import_promises unless ignore.include?(:promises)
-       end
-
-       def converter
-         @converter ||= Converter.new(data_source)
-       end
-
-       def import_dld
-         if has_submodule?
-           print_or_import File.read(File.join(StortingImporter.root, 'data/dld-issues.xml'))
-           print_or_import File.read(File.join(StortingImporter.root, 'folketingparser/data/votering-2011-04-04-dld-hdo.xml'))
-         else
-           $stderr.puts "folketingparser not found, skipping DLD votes and issues (run `git submodule update --init` if you need this)"
-         end
-       end
-
-       def import_promises
-         csvs = Dir[File.join(StortingImporter.root, 'data/promises-*.csv')].sort_by { |e| File.basename(e) }
-         csvs.each do |path|
-           print_or_import PromiseConverter.new(path).xml
+         Util.builder do |xml|
+           xml.instruct!
+           xml.votes do |votes_builder|
+             votes.each { |v| v.to_hdo_xml(votes_builder) }
+           end
          end
        end
 
-       def import_docs(docs)
-         docs = [docs] unless docs.kind_of?(Array)
-
-         docs.each do |doc|
-           print_or_import doc.to_s
+       def read_promises
+         csvs = @files.any? ? @files : Dir[File.join(StortingImporter.root, 'data/promises-*.csv')].sort_by { |e| File.basename(e) }
+         content = ''
+         csvs.each do |csv|
+           content << File.read(File.expand_path(csv), encoding: "ISO-8859-1").encode("UTF-8")
          end
-       end
 
-       def print_or_import(xml)
-         if @options[:only_print]
-           puts xml
-         else
-           importer.import xml
+         Util.builder do |xml|
+           xml.instruct!
+           xml.promises do |promises|
+             Promise.from_csv(content).each { |e| e.to_hdo_xml(promises) }
+           end
          end
        end
 
-       def has_submodule?
-         File.exist?(File.join(StortingImporter.root, 'folketingparser/data'))
-       end
-
      end
    end
  end
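After `OptionParser#parse!` strips the options, `parse` splits the remaining arguments into a type and its files, and `class_for_type` looks the type up in `TYPE_TO_CLASS`. A reduced, self-contained sketch of that flow; the `TYPES` table maps to strings instead of the real converter classes, and `parse_args` and `lookup_type` are hypothetical names. It uses a splat so that every file after the type is kept:

```ruby
require "optparse"

# Hypothetical lookup table; the real CLI maps to Category, District, etc.
TYPES = { categories: "Category", districts: "District", votes: "Vote" }.freeze

def lookup_type(type)
  TYPES[type] or raise ArgumentError, "unknown type: #{type.inspect}"
end

# Parses options, then splits the positional arguments into [type, files].
def parse_args(argv)
  args = argv.dup # parse! mutates its argument
  parser = OptionParser.new do |opt|
    opt.banner = "Usage: hdo-converter <#{TYPES.keys.join('|')}> <file(s)>"
  end
  parser.parse!(args)

  type, *files = args # the splat collects every remaining file
  raise ArgumentError, parser.banner if type.nil?

  [type.to_sym, files]
end
```

A plain two-variable assignment such as `type, files = args` binds only the second element, so with `hdo-converter districts a.xml b.xml` it would silently drop `b.xml`; the splat form returns `["a.xml", "b.xml"]`.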
@@ -5,6 +5,22 @@ module Hdo
 
        attr_reader :external_id, :name
 
+       def self.type_name
+         'committee'
+       end
+
+       def self.description
+         'a parliamentary committee'
+       end
+
+       def self.xml_example(builder = Util.builder)
+         new("ARBSOS", "Arbeids- og sosialkomiteen").to_hdo_xml(builder)
+       end
+
+       def self.fields
+         [EXTERNAL_ID_FIELD, Field.new(:name, true, :string, 'The name of the committee.')]
+       end
+
        def self.from_storting_doc(doc)
          doc.css("komiteer_liste komite").map do |node|
            from_storting_node(node)
@@ -15,6 +31,10 @@ module Hdo
          new node.css("id").first.text, node.css("navn").first.text
        end
 
+       def self.from_hdo_doc(doc)
+         doc.css("committees > committee").map { |e| from_hdo_node e }
+       end
+
        def self.from_hdo_node(node)
          new node.css("externalId").first.text, node.css("name").first.text
        end
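`from_hdo_doc` covers the reverse direction: it reads HDO XML (`<committees><committee>...`) back into objects via the CSS selector `committees > committee`. A sketch of the same selection using REXML's XPath support instead of Nokogiri's CSS selectors; `CommitteeRecord` and `committees_from_hdo` are hypothetical stand-ins for `Committee` and `Committee.from_hdo_doc`:

```ruby
require "rexml/document"

# Hypothetical value object standing in for Hdo::StortingImporter::Committee.
CommitteeRecord = Struct.new(:external_id, :name)

# "//committees/committee" is the XPath counterpart of "committees > committee":
# any <committee> that is a direct child of a <committees> element.
def committees_from_hdo(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//committees/committee").map do |node|
    CommitteeRecord.new(node.elements["externalId"].text, node.elements["name"].text)
  end
end
```

HDO XML carries no default namespace, so unlike the Storting input no namespace handling is needed here.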
@@ -5,6 +5,22 @@ module Hdo
 
        attr_reader :external_id, :name
 
+       def self.type_name
+         'district'
+       end
+
+       def self.description
+         'an electoral district'
+       end
+
+       def self.xml_example(builder = Util.builder)
+         new("Db", "Duckburg").to_hdo_xml(builder)
+       end
+
+       def self.fields
+         [EXTERNAL_ID_FIELD, Field.new(:name, true, :string, 'The name of the electoral district.')]
+       end
+
        def self.from_storting_doc(doc)
          doc.css("fylker_liste fylke").map do |node|
            from_storting_node(node)
@@ -15,6 +31,10 @@ module Hdo
          new node.css("id").first.text, node.css("navn").first.text
        end
 
+       def self.from_hdo_doc(doc)
+         doc.css("districts > district").map { |e| from_hdo_node(e) }
+       end
+
        def self.from_hdo_node(node)
          new node.css("externalId").first.text, node.css("name").first.text
        end
@@ -1,3 +1,5 @@
+ # encoding: UTF-8
+
  module Hdo
    module StortingImporter
      class Issue
@@ -6,6 +8,48 @@ module Hdo
        attr_reader :external_id, :summary, :description, :type, :status, :last_update,
                    :reference, :document_group, :committee, :categories
 
+       def self.type_name
+         'issue'
+       end
+
+       def self.description
+         'a parliamentary issue'
+       end
+
+       def self.example
+         new(
+           "53520",
+           "Inngåelse av avtale om opprettelse av sekretariatet for Den nordlige dimensjons partnerskap for helse og livskvalitet (NDPHS)",
+           "Samtykke til inngåelse av avtale av 25. november 2011 om opprettelse av sekretariatet for Den nordlige dimensjons partnerskap for helse og livskvalitet (NDPHS)",
+           "alminneligsak",
+           "mottatt",
+           "2012-04-20T00:00:00",
+           "Prop. 90 S (2011-2012)",
+           "proposisjon",
+           "Transport- og kommunikasjonskomiteen",
+           ['UTENRIKSSAKER', 'TRAKTATER', 'NORDISK SAMARBEID']
+         )
+       end
+
+       def self.xml_example(builder = Util.builder)
+         example.to_hdo_xml(builder)
+       end
+
+       def self.fields
+         [
+           EXTERNAL_ID_FIELD,
+           Field.new(:summary, true, :string, 'A (preferably one-line) summary of the issue.'),
+           Field.new(:description, true, :string, 'A longer description of the issue.'),
+           Field.new(:type, true, :string, 'The type of issue.'),
+           Field.new(:status, true, :string, 'The status of the issue.'),
+           Field.new(:lastUpdate, true, :string, 'The time the issue was last updated in the parliament.'),
+           Field.new(:reference, true, :string, 'A reference.'),
+           Field.new(:documentGroup, true, :string, 'What document group this issue belongs to.'),
+           Field.new(:committee, false, :string, "What committee this issue belongs to. Should match the 'name' field in the committee type."),
+           Field.new(:categories, false, 'list', "List of categories (matching the 'name' field of the category type).")
+         ]
+       end
+
        def self.from_storting_doc(doc)
          doc.css("saker_liste sak").map do |node|
            from_storting_node(node)
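The next hunk adds a method-level `rescue` to `from_storting_node` that prints a value (`lnode`) and then calls a bare `raise`, so the original exception and its backtrace still reach the caller. A minimal sketch of that pattern; `parse_issue_id` and its hash input are hypothetical, and the sketch writes to `$stderr` rather than using `puts` as the diff does, on the assumption that stdout should carry only the generated XML:

```ruby
# Hypothetical parser: any StandardError raised in the body is reported
# together with the offending input, then re-raised unchanged.
def parse_issue_id(node)
  Integer(node.fetch(:id)) # stand-in for the real field extraction
rescue
  $stderr.puts "could not parse issue: #{node.inspect}"
  raise
end
```

A bare `raise` inside `rescue` re-raises the exception currently being handled, so callers see the original `ArgumentError` (or `KeyError`) rather than a new error.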
@@ -33,6 +77,13 @@ module Hdo
          end
 
          new(external_id, summary, description, type, status, last_update, reference, document_group, committee, categories)
+       rescue
+         puts lnode
+         raise
+       end
+
+       def self.from_hdo_doc(doc)
+         doc.css("issues > issue").map { |e| from_hdo_node(e) }
        end
 
        def self.from_hdo_node(node)