statement 1.9.9 → 2.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
-   metadata.gz: b7585e5c311f08f415b5dd857db4a3868757e525
-   data.tar.gz: 335ef89e72907c1867c3edbeb0913f027879cc39
+   metadata.gz: 11dcba16755ef54dff1c0c48db50aca841485abd
+   data.tar.gz: dda7e3b05004b1d7bf59411c902ed2a670436914
  SHA512:
-   metadata.gz: b9c134ef362bfebf125f1f52e53ffe2c89dcd5b42eb6c58fb790489c9cc2f2d0b5460aacf2827ed1c4752b60172458c0dd848d01a0d10823dd8a83a57bf82484
-   data.tar.gz: 390ae95ce5433578b9a761373f32ae77e98b5ba9e8a75f1677ae2e7d144c192eb2a68ad4b29328e3b61aa98aa716f343902cf987fb34482010379c328f3e843c
+   metadata.gz: 20f0513c7aa7a3d70b9e3c3f8d8eb6c30e66b2b88a3dec326d0516e031d3450b201e7a2db8f98b90d29a1d2fb2a384ed096d9ea9f9940cfa41abf9ef79a84590
+   data.tar.gz: 358a8cfa517caf462ebe20119a81a898ac46b9806b1614e2ed6f34ceabc89bcd5dd40ecf43dc38cd989f8fd17449af2a32e63c2d636cd54f92830919bf569533
data/README.md CHANGED
@@ -1,10 +1,10 @@
  # Statement
 
- Statement parses RSS feeds and HTML pages containing press releases and other official statements from members of Congress, and produces hashes with information about those pages. It has been tested under Ruby 1.9.2, 1.9.3 and 2.0.0.
+ Statement parses RSS feeds and HTML pages containing press releases and other official statements from members of Congress, and produces hashes with information about those pages. It has been tested under Ruby 1.9.3 and 2.x.
 
  ## Coverage
 
- Statement currently parses press releases for members of the House and Senate. For members with RSS feeds, you can pass the feed URL into Statement. For members without RSS feeds, HTML scrapers are provided, as are methods for speciality groups, such as House Republicans. Suggestions are welcomed.
+ Statement currently parses press releases for members of the House and Senate. For members with RSS feeds, you can pass the feed URL into Statement. For members without RSS feeds (or with broken ones), HTML scrapers are provided, as are methods for special groups, such as House Republicans. Suggestions are welcomed.
 
  ## Installation
 
@@ -28,7 +28,7 @@ $ gem install statement
 
  ## Usage
 
- Statement provides access to press releases, Facebook status updates and tweets from members of Congress. Most congressional offices have RSS feeds but some require HTML scraping.
+ Statement provides access to press releases, Facebook status updates and tweets from members of Congress. Most congressional offices have RSS feeds but some require HTML scraping.
 
  To configure Statement to pull from the Twitter and Facebook APIs, you can pass in configuration values via a hash or a `config.yml` file:
 
@@ -48,7 +48,7 @@ To parse an RSS feed, simply pass the URL to Statement's Feed class:
  ```ruby
  require 'rubygems'
  require 'statement'
-
+
  results = Statement::Feed.from_rss('http://blumenauer.house.gov/index.php?option=com_bca-rss-syndicator&feed_id=1')
  puts results.first
  {:source=>"http://blumenauer.house.gov/index.php?option=com_bca-rss-syndicator&feed_id=1", :url=>"http://blumenauer.house.gov/index.php?option=com_content&amp;view=article&amp;id=2203:blumenauer-qwe-need-a-national-system-that-speaks-to-the-transportation-challenges-of-todayq&amp;catid=66:2013-press-releases", :title=>"Blumenauer: &quot;We need a national system that speaks to the transportation challenges of ...", :date=>#<Date: 2013-04-24 ((2456407j,0s,0n),+0s,2299161j)>, :domain=>"blumenauer.house.gov"}
@@ -121,6 +121,8 @@ $ rake test
 
  ## Contributing
 
+ Statement would not be nearly the library it is without our contributors, and we sincerely thank them for their generosity and interest in making congressional press release data more available.
+
  1. Fork it
  2. Create your feature branch (`git checkout -b my-new-feature`)
  3. Commit your changes (`git commit -am 'Add some feature'`)
@@ -131,6 +133,8 @@ If you write a new scraper, please use Nokogiri for parsing - see some of the ex
 
  ## Authors
 
- * Derek Willis
- * Jacob Harris
-
+ * [Derek Willis](https://github.com/dwillis)
+ * [Jacob Harris](https://github.com/harrisj)
+ * [Mick O'Brien](https://github.com/mickaobrien)
+ * [Tyler Pearson](https://github.com/tylerpearson)
+ * [Sam Sweeney](https://github.com/shubik22)
lib/statement/scraper.rb CHANGED
@@ -30,9 +30,9 @@ module Statement
 
  def self.member_methods
    [:crenshaw, :capuano, :cold_fusion, :conaway, :chabot, :freshman_senators, :klobuchar, :billnelson, :crapo, :boxer,
-    :vitter, :inhofe, :palazzo, :roe, :document_query, :swalwell, :fischer, :clark, :edwards, :culberson_chabot_grisham, :barton,
-    :sherman_mccaul, :welch, :sessions, :gabbard, :ellison, :costa, :farr, :mcclintock, :mcnerney, :olson, :schumer, :lamborn, :walden,
-    :bennie_thompson, :speier, :poe, :grassley]
+    :vitter, :inhofe, :document_query, :swalwell, :fischer, :clark, :edwards, :culberson_chabot_grisham, :barton,
+    :welch, :sessions, :gabbard, :costa, :farr, :mcclintock, :olson, :schumer, :lamborn, :walden,
+    :bennie_thompson, :speier, :poe, :grassley, :bennet, :shaheen, :keating, :drupal, :jenkins]
  end
 
  def self.committee_methods
@@ -41,21 +41,21 @@ module Statement
 
  def self.member_scrapers
    year = Date.today.year
-   results = [crenshaw, capuano, cold_fusion(year, nil), conaway, chabot, klobuchar(year), palazzo(page=1), roe(page=1), billnelson(year=year),
-     document_query(page=1), document_query(page=2), swalwell(page=1), crapo, boxer(start=1), grassley(page=0),
-     vitter(year=year), inhofe(year=year), fischer, clark(year=year), edwards, culberson_chabot_grisham(page=1), barton, sherman_mccaul, welch,
-     sessions(year=year), gabbard, ellison(page=0), costa, farr, olson, mcnerney, schumer, lamborn(limit=10), walden, bennie_thompson, speier,
-     poe(year=year, month=0)].flatten
+   results = [crenshaw, capuano, cold_fusion(year, nil), conaway, chabot, klobuchar(year), billnelson(page=0),
+     document_query(page=1), document_query(page=2), swalwell(page=1), crapo, boxer, grassley(page=0),
+     vitter(year=year), inhofe(year=year), fischer, clark(year=year), edwards, culberson_chabot_grisham(page=1), barton, welch,
+     sessions(year=year), gabbard, costa, farr, olson, schumer, lamborn(limit=10), walden, bennie_thompson, speier,
+     poe(year=year, month=0), bennet(page=1), shaheen(page=1), perlmutter, keating, drupal, jenkins].flatten
    results = results.compact
    Utils.remove_generic_urls!(results)
  end
 
  def self.backfill_from_scrapers
    results = [cold_fusion(2012, 0), cold_fusion(2011, 0), cold_fusion(2010, 0), billnelson(year=2012), document_query(page=3),
-     document_query(page=4), boxer(start=11), boxer(start=21), grassley(page=1), grassley(page=2), grassley(page=3),
-     boxer(start=31), boxer(start=41), vitter(year=2012), vitter(year=2011), swalwell(page=2), swalwell(page=3), clark(year=2013), culberson_chabot_grisham(page=2),
-     sherman_mccaul(page=1), sessions(year=2013), pryor(page=1), ellison(page=1), ellison(page=2), ellison(page=3), farr(year=2013), farr(year=2012), farr(year=2011),
-     mcnerney(page=2), mcnerney(page=3), mcnerney(page=4), mcnerney(page=5), mcnerney(page=6), olson(year=2013), schumer(page=2), schumer(page=3), poe(year=2015, month=2),
+     document_query(page=4), grassley(page=1), grassley(page=2), grassley(page=3),
+     vitter(year=2012), vitter(year=2011), swalwell(page=2), swalwell(page=3), clark(year=2013), culberson_chabot_grisham(page=2),
+     sessions(year=2013), pryor(page=1), farr(year=2013), farr(year=2012), farr(year=2011),
+     olson(year=2013), schumer(page=2), schumer(page=3), poe(year=2015, month=2),
      poe(year=2015, month=1)].flatten
    Utils.remove_generic_urls!(results)
  end
@@ -391,14 +391,14 @@ module Statement
    results
  end
 
- def self.billnelson(year=2013)
+ def self.billnelson(page=0)
    results = []
-   base_url = "http://www.billnelson.senate.gov/news/"
-   year_url = base_url + "media.cfm?year=#{year}"
-   doc = open_html(year_url)
+   url = "http://www.billnelson.senate.gov/newsroom/press-releases?page=#{page}"
+   doc = open_html(url)
    return if doc.nil?
-   doc.xpath('//li').each do |row|
-     results << { :source => year_url, :url => base_url + row.children[0]['href'], :title => row.children[0].text.strip, :date => Date.parse(row.children.last.text), :domain => "billnelson.senate.gov" }
+   dates = doc.xpath("//div[@class='date-box']").map{|d| Date.parse(d.children.map{|x| x.text.strip}.join(" "))}
+   (doc/:h3).each_with_index do |row, index|
+     results << { :source => url, :url => "http://www.billnelson.senate.gov" + row.children.first['href'], :title => row.children.first.text.strip, :date => dates[index], :domain => "billnelson.senate.gov" }
    end
    results
  end
@@ -451,14 +451,15 @@ module Statement
    results
  end
 
- def self.boxer(start=1)
+ def self.boxer
    results = []
-   url = "http://www.boxer.senate.gov/en/press/releases.cfm?start=#{start}"
+   url = "http://www.boxer.senate.gov/press/release"
    domain = 'www.boxer.senate.gov'
    doc = open_html(url)
    return if doc.nil?
-   doc.xpath("//div[@class='left']")[1..-1].each do |row|
-     results << { :source => url, :url => domain + row.next.next.children[1].children[0]['href'], :title => row.next.next.children[1].children[0].text, :date => Date.parse(row.text.strip), :domain => domain}
+   doc.css("tr")[1..-1].each do |row|
+     next if row.children[1].text == "Sat, January 1st 0000 "
+     results << { :source => url, :url => "http://"+domain + row.children[3].children[1]['href'], :title => row.children[3].children[1].text.strip, :date => Date.parse(row.children[1].text), :domain => domain}
    end
    results
  end
@@ -505,30 +506,6 @@ module Statement
    results
  end
 
- def self.palazzo(page=1)
-   results = []
-   domain = "palazzo.house.gov"
-   url = "http://palazzo.house.gov/news/documentquery.aspx?DocumentTypeID=2519&Page=#{page}"
-   doc = open_html(url)
-   return if doc.nil?
-   doc.xpath("//div[@class='middlecopy']//li").each do |row|
-     results << { :source => url, :url => "http://palazzo.house.gov/news/" + row.children[1]['href'], :title => row.children[1].text.strip, :date => Date.parse(row.children[3].text.strip), :domain => domain }
-   end
-   results
- end
-
- def self.roe(page=1)
-   results = []
-   domain = 'roe.house.gov'
-   url = "http://roe.house.gov/news/documentquery.aspx?DocumentTypeID=1532&Page=#{page}"
-   doc = open_html(url)
-   return if doc.nil?
-   doc.xpath("//div[@class='middlecopy']//li").each do |row|
-     results << { :source => url, :url => "http://roe.house.gov/news/" + row.children[1]['href'], :title => row.children[1].text.strip, :date => Date.parse(row.children[3].text.strip), :domain => domain }
-   end
-   results
- end
-
  def self.clark(year=Date.today.year)
    results = []
    domain = 'katherineclark.house.gov'
@@ -596,22 +573,6 @@ module Statement
    results
  end
 
- def self.sherman_mccaul(page=0)
-   results = []
-   domains = ['sherman.house.gov', 'mccaul.house.gov']
-   domains.each do |domain|
-     url = "http://#{domain}/media-center/press-releases?page=#{page}"
-     doc = open_html(url)
-     return if doc.nil?
-     dates = doc.xpath('//span[@class="field-content"]').map {|s| s.text if s.text.strip.include?("201")}.compact!
-     (doc/:h3).first(10).each_with_index do |row, i|
-       date = Date.parse(dates[i])
-       results << {:source => url, :url => "http://"+domain+row.children.first['href'], :title => row.children.first.text.strip, :date => date, :domain => domain}
-     end
-   end
-   results.flatten
- end
-
  def self.welch
    results = []
    domain = 'welch.house.gov'
@@ -636,19 +597,6 @@ module Statement
    results
  end
 
- def self.ellison(page=0)
-   results = []
-   domain = 'ellison.house.gov'
-   url = "http://ellison.house.gov/media-center/press-releases?page=#{page}"
-   doc = open_html(url)
-   return if doc.nil?
-   doc.xpath("//div[@class='views-field views-field-created datebar']").each do |row|
-     next if row.nil?
-     results << { :source => url, :url => "http://ellison.house.gov" + row.next.next.children[1].children[0]['href'], :title => row.next.next.children[1].children[0].text.strip, :date => Date.parse(row.text.strip), :domain => domain}
-   end
-   results
- end
-
  def self.costa
    results = []
    domain = 'costa.house.gov'
@@ -701,21 +649,9 @@ module Statement
    results
  end
 
- def self.mcnerney(page=1)
-   results = []
-   domain = 'mcnerney.house.gov'
-   url = "http://mcnerney.house.gov/media-center/press-releases"
-   doc = open_html(url)
-   return if doc.nil?
-   doc.xpath("//div[@class='views-field views-field-title']").each do |row|
-     results << {:source => url, :url => 'http://mcnerney.house.gov' + row.children[1].children[0]['href'], :title => row.children[1].children[0].text.strip, :date => Date.parse(row.next.next.text.strip), :domain => domain }
-   end
-   results
- end
-
  def self.document_query(page=1)
    results = []
-   domains = [{"thornberry.house.gov" => 1776}, {"wenstrup.house.gov" => 2491}, {"clawson.house.gov" => 2641}]
+   domains = [{"thornberry.house.gov" => 1776}, {"wenstrup.house.gov" => 2491}, {"clawson.house.gov" => 2641}, {"palazzo.house.gov" => 2519}, {"roe.house.gov" => 1532}, {"perry.house.gov" => 2608}, {"rodneydavis.house.gov" => 2427}, {"kevinbrady.house.gov" => 2657}]
    domains.each do |domain|
      doc = open_html("http://"+domain.keys.first+"/news/documentquery.aspx?DocumentTypeID=#{domain.values.first}&Page=#{page}")
      return if doc.nil?
@@ -739,6 +675,31 @@ module Statement
    results
  end
 
+ def self.bennet(page=1)
+   results = []
+   domain = 'www.bennet.senate.gov'
+   url = "http://www.bennet.senate.gov/?p=releases&pg=#{page}"
+   doc = open_html(url)
+   return if doc.nil?
+   (doc/:h2).each do |row|
+     results << {:source => url, :url => 'http://www.bennet.senate.gov' + row.children.first['href'], :title => row.text.strip, :date => Date.parse(row.previous.previous.text), :domain => domain }
+   end
+   results
+ end
+
+ def self.shaheen(page=1)
+   results = []
+   domain = 'www.shaheen.senate.gov'
+   url = "http://www.shaheen.senate.gov/news/press/index.cfm?PageNum_rs=#{page}"
+   doc = open_html(url)
+   return if doc.nil?
+   (doc/:ul)[3].children.each do |row|
+     next if row.text.strip == ''
+     results << {:source => url, :url => row.children[2].children[0]['href'], :title => row.children[2].text.strip, :date => Date.parse(row.children.first.text), :domain => domain }
+   end
+   results
+ end
+
  def self.lamborn(limit=nil)
    results = []
    domain = 'lamborn.house.gov'
@@ -756,6 +717,18 @@ module Statement
    results
  end
 
+ def self.jenkins
+   results = []
+   domain = 'lynnjenkins.house.gov/'
+   url = "http://lynnjenkins.house.gov/index.cfm?sectionid=186"
+   doc = open_html(url)
+   return if doc.nil?
+   doc.xpath("//ul[@class='sectionitems']//li").each do |row|
+     results << {:source => url, :url => 'http://lynnjenkins.house.gov' + row.children[3].children[1]['href'], :title => row.children[3].text.strip, :date => Date.parse(row.children[5].text), :domain => domain }
+   end
+   results
+ end
+
  def self.walden
    results = []
    domain = 'walden.house.gov'
@@ -812,5 +785,84 @@ module Statement
 
  end
 
+ def self.perlmutter
+   results = []
+   domain = "perlmutter.house.gov"
+   url = "http://#{domain}/index.php/media-center/press-releases-86821"
+   doc = open_html(url)
+   return if doc.nil?
+
+   doc.css("#adminForm tr")[0..-1].each do |row|
+     results << { :source => url, :url => "http://" + domain + row.children[1].children[1]['href'], :title => row.children[1].children[1].text.strip, :date => Date.parse(row.children[3].text), :domain => domain}
+   end
+   results
+ end
+
+ def self.keating
+   results = []
+   domain = "keating.house.gov"
+   source_url = "http://#{domain}/index.php?option=com_content&view=category&id=14&Itemid=13"
+   doc = open_html(source_url)
+   return if doc.nil?
+
+   doc.css("#adminForm tr")[0..-1].each do |row|
+     url = 'http://' + domain + row.children[1].children[1]['href']
+     title = row.children[1].children[1].text.strip
+     results << { :source => source_url, :url => url, :title => title, :date => Date.parse(row.children[3].text), :domain => domain}
+   end
+   results
+ end
+
+ def self.drupal(urls=[], page=0)
+   if urls.empty?
+     urls = [
+       "http://sherman.house.gov/media-center/press-releases",
+       "http://mccaul.house.gov/media-center/press-releases",
+       "https://ellison.house.gov/media-center/press-releases",
+       "http://mcnerney.house.gov/media-center/press-releases",
+       "http://sanford.house.gov/media-center/press-releases",
+       "http://butterfield.house.gov/media-center/press-releases",
+       "http://walz.house.gov/media-center/press-releases",
+       "https://pingree.house.gov/media-center/press-releases",
+       "http://sarbanes.house.gov/media-center/press-releases",
+       "http://wilson.house.gov/media-center/press-releases",
+       "https://bilirakis.house.gov/press-releases",
+       "http://quigley.house.gov/media-center/press-releases"
+     ]
+   end
+
+   results = []
+
+   urls.each do |url|
+     source_url = "#{url}?page=#{page}"
+
+     domain = URI.parse(source_url).host
+     doc = open_html(source_url)
+     return if doc.nil?
+
+     doc.css("#region-content .views-row").each do |row|
+       title_anchor = row.css("h3 a")
+       title = title_anchor.text
+       release_url = "http://#{domain + title_anchor.attr('href')}"
+       raw_date = row.css(".views-field-created").text
+       results << { :source => source_url,
+                    :url => release_url,
+                    :title => title,
+                    :date => begin Date.parse(raw_date) rescue nil end,
+                    :domain => domain }
+     end
+
+     # mike quigley's release page doesn't have dates, so we fetch those individually
+     if url == "http://quigley.house.gov/media-center/press-releases"
+       results.select{|r| r[:source] == source_url}.each do |result|
+         doc = open_html(result[:url])
+         result[:date] = Date.parse(doc.css(".pane-content").children[0].text.strip)
+       end
+     end
+   end
+   results
+ end
+
  end
+
  end
lib/statement/version.rb CHANGED
@@ -1,3 +1,3 @@
  module Statement
-   VERSION = "1.9.9"
+   VERSION = "2.0"
  end
data/scraper_guide.md ADDED
@@ -0,0 +1,49 @@
+ ## Contributing Scrapers
+
+ Some members of Congress either don't have RSS feeds of their press releases, or the ones they have are broken. That's where scraping comes in. Unfortunately, members also tend to change the layouts of their sites more often than you might think, so it's not always a matter of writing a single scraper and forgetting about it.
+
+ That doesn't mean that writing member-specific scrapers is particularly difficult. Many lawmakers have similar sites, so you can either build off an existing scraper or even add to an existing one. Here's the basic process:
+
+ ### Setup
+
+ 1. Ruby: if you don't have it, install Ruby (version 2.x) and run `gem install bundler` from the command line.
+ 2. Fork the [repository](https://github.com/TheUpshot/statement) and clone it to a directory on your computer.
+ 3. cd into that directory and run `bundle install` to install the gems used by Statement.
+ 4. Enter the Ruby console by typing `irb` and then require the libraries we'll need:
+
+ ```ruby
+ require 'uri'
+ require 'open-uri'
+ require 'american_date'
+ require 'nokogiri'
+ ```
+ Then pick a lawmaker that needs a scraper written from [our issues page](https://github.com/TheUpshot/statement/issues).
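Before writing any code, it can help to load a target page in the console and poke around. A quick sketch of that workflow (the Boxer URL is only an example target; with open-uri loaded, `open` can fetch web pages):

```ruby
# Fetch a press release listing and confirm it parses.
doc = Nokogiri::HTML(open("http://www.boxer.senate.gov/press/release"))
doc.title           # the page's <title> text
doc.css("tr").size  # how many table rows the release listing contains
```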
+
+ ### Scraper Design
+
+ Most lawmakers have press release sections of their sites that display the date, title and link of a press release. Take Barbara Boxer, the California Democratic senator. Her [press release page](http://www.boxer.senate.gov/press/release/) is somewhat typical in that it features a table of releases, 10 to a page. The goal is to scrape that page, and optionally others if the site is paginated (most congressional press release sites are), and to build an Array of Ruby hashes that contain each release's url, date and title, along with two other pieces of information: the source page of press release urls and the domain of the site (which helps to identify the lawmaker).
+
+ To do this, we use Nokogiri, a Ruby HTML and XML parser, rather than regular expressions. One of Nokogiri's strengths is that it can parse HTML documents based on CSS classes, XPath or via HTML entity search. Statement has a helper method, `open_html`, that loads the press release url into Nokogiri's parser (a sketch of such a helper follows the example below). Senator Boxer's scraper might look like this:
+
+ ```ruby
+ def self.boxer
+   results = []
+   url = "http://www.boxer.senate.gov/press/release"
+   domain = 'www.boxer.senate.gov'
+   doc = open_html(url)
+   return if doc.nil?
+   doc.css("tr")[1..-1].each do |row|
+     results << { :source => url, :url => "http://"+domain + row.children[3].children[1]['href'], :title => row.children[3].children[1].text.strip, :date => Date.parse(row.children[1].text), :domain => domain}
+   end
+   results
+ end
+ ```
+ For the first row that would produce the following hash:
+
+ ```ruby
+ => {:source=>"http://www.boxer.senate.gov/press/release", :url=>"http://www.boxer.senate.gov/press/release/boxer-feinstein-colleagues-introduces-bill-in-support-of-positive-train-control/", :title=>"Boxer, Feinstein, Colleagues Introduces Bill in Support of Positive Train Control", :date=>#<Date: 2015-04-17 ((2457130j,0s,0n),+0s,2299161j)>, :domain=>"www.boxer.senate.gov"}
+ ```
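The `open_html` helper above is provided by Statement itself. As a rough sketch of what such a helper might do (the gem's actual implementation may differ), it fetches a URL and hands the body to Nokogiri, returning nil on failure so scrapers can bail out with `return if doc.nil?`:

```ruby
require 'open-uri'
require 'nokogiri'

# Minimal open_html-style helper sketch: open-uri extends Kernel#open to
# fetch http URLs; any fetch or parse error yields nil instead of raising.
def open_html(url)
  Nokogiri::HTML(open(url).read)
rescue StandardError
  nil
end
```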
+
+ For people new to Nokogiri, perhaps the hardest part is navigating its nodes - a `tr` node will have children `td` nodes, for example. The best advice we can provide is to spend time in the console trying to navigate up and down an HTML document's nodes. Calling the `text` method on any Nokogiri object will return its text content.
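For instance, walking the Boxer table from the example above might look like this in the console (the child indices follow that particular page's markup):

```ruby
doc = Nokogiri::HTML(open("http://www.boxer.senate.gov/press/release"))
row = doc.css("tr")[1]              # first data row; row 0 is the header
row.children[1].text.strip          # the date cell's text
link = row.children[3].children[1]  # the <a> node inside the title cell
link['href']                        # the release's relative URL
link.text.strip                     # the release's title
```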
+
+ Beyond that, the best approach is to work off an existing [member scraper](https://github.com/TheUpshot/statement/blob/master/lib/statement/scraper.rb). You don't need to write anything except the scraper method; we'll take care of the rest once you submit your pull request.
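As a starting template, a new member scraper generally follows this shape (the method name, URL and selectors below are placeholders to adapt to the member's actual site, not a real scraper):

```ruby
# Hypothetical skeleton for a new member scraper. It assumes Statement's
# open_html helper and returns the array of hashes described above.
def self.lastname(page=1)
  results = []
  domain = 'lastname.house.gov'
  url = "http://#{domain}/press-releases?page=#{page}"
  doc = open_html(url)
  return if doc.nil?
  doc.css(".views-row").each do |row|  # selector depends on the site's markup
    link = row.css("a").first
    results << { :source => url,
                 :url => "http://" + domain + link['href'],
                 :title => link.text.strip,
                 :date => Date.parse(row.css(".date").first.text),
                 :domain => domain }
  end
  results
end
```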