wriggler 1.0.0 → 1.1.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: b1d083c4ddf26508f6ca9b68f5574bf1491eba46
- data.tar.gz: 2f347c815b35996afbec369bc0f39eb56cb2c739
+ metadata.gz: a0bec94fbdd0a26ab8850e55099a46cee2a7a7a3
+ data.tar.gz: a3213bec797d9e3b9f6de806289f2f6e3008a36e
  SHA512:
- metadata.gz: b3fab4d15b405776477b04317e34f757971ab4f6c14cb1041ba2040e513bbcb8a0cc837427372843e77365e322d82ae72f44aa54fbd0c724fa5e224c276b48b9
- data.tar.gz: 8d8a1aace53b38470d54f0dade3708e0645211aaaa25d6e09f78175bccb93c9cf24af2e44735e3311b929c98ff0a2cf34a9194b26342b063579921a4c896bc99
+ metadata.gz: 4dafba93323d6f876b9b80dcc103c908df0d8b87ca0bd40d3c7209d2dfb81d852d039f546976fe2cc2e23168a7ab87fc2c2511a6c179b83dbc1f7b39ad8f6589
+ data.tar.gz: 03f53045bc46845145ddc0642e3d76f3df98ed33d9a0d5ca8c4b76337ccef52d78c2e3ab69058517005ddb5d2a7ca8f9a02129f617be1f7dc759d93799e4f93d
data/README.md CHANGED
@@ -1,6 +1,6 @@
  # Wriggler
 
- Wriggler was created to serve and the crawler for a search engine, moving its way through HTML and/or XML files and grabbing data based on pre determined tags then making/storing the data in a specifically created CSV file. Wriggler acts similarly t0 a spider, but was designed to be used with any number of local files, not as an actual web scraper.
+ Wriggler was created to serve and the crawler for a search engine, moving its way through HTML and/or XML files and grabbing data based on pre determined tags then exporting it in a manipulatable format. Wriggler acts similarly to a spider, but was designed to be used with any number of local files, not as an actual web scraper.
 
  ## Installation
 
@@ -23,14 +23,28 @@ Or install it yourself as:
  You only need to run one command to use Wriggler, run:
 
  ```ruby
- Wriggler.crawl([array, of, HTML/XML, tags], directory)
+ Wriggler.crawl(["array", "of", "HTML/XML", "tags"], directory)
  ```
 
- Note: The directory in this should be the top level directory that your HTML/XML files are in. Wriggler will account for any nested directories within this directory that also contain HTML/XML files and at the end of it running will save a new file named "tag_content.csv" to this directory
+ Note: The directory in this should be the top level directory that your HTML/XML files are in. Wriggler will account for any nested directories within this directory that also contain HTML/XML files. At the end you will have a data structure that resembles this:
+
+ ```ruby
+ ===============
+ Files Found: 2
+ ===============
+ content = {
+ tag1: ["Content", "Found", "in", "the", "First", "Opened", "File"], ["Content", "Found", "in", "the", "Second", "Opened", "File"]
+ tag2: [], []
+ tag3: ["Content", "Found", "in", "the", "First", "Opened", "File"], []
+ tag4: [], ["Content", "Found", "in", "the", "Second", "Opened", "File"]
+ }
+ ```
+
+ Where tag2 has no content found between both files, tag3 only found content in the first of the two files, tag4 only found content in the second of two files, and tag1 found content in both.
 
  ## Contributing
 
- Bug reports and pull requests are welcome on GitHub at https://github.com/[USERNAME]/wriggler. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant](contributor-covenant.org) code of conduct.
+ Bug reports and pull requests are welcome on GitHub at https://github.com/elliottayoung/wriggler. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant](contributor-covenant.org) code of conduct.
 
  On top of that, please contribute. I built this for a very specific reason, but I would very much like to see it become something bigger, so if you can assist with that please do!
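
The result shown in the README's usage section is schematic rather than literal Ruby. Judging from the `@content.fetch(key) << arr` line in lib/wriggler.rb, each tag appears to map to an array holding one inner array per crawled file, with empty inner arrays kept so that positions line up with files. The following is a minimal sketch of walking such a structure. It assumes string keys, uses a hand-built hash in place of a real `Wriggler.crawl` call, and the `files_with_content` helper is hypothetical, not part of the gem:

```ruby
# Hand-built stand-in for the value returned by Wriggler.crawl.
# String keys are an assumption, since the tags are passed in as strings.
content = {
  "tag1" => [["Content", "First", "File"], ["Content", "Second", "File"]],
  "tag2" => [[], []],
  "tag3" => [["Content", "First", "File"], []],
  "tag4" => [[], ["Content", "Second", "File"]]
}

# Hypothetical helper: zero-based positions of the files in which
# the given tag had any content.
def files_with_content(content, tag)
  per_file = content.fetch(tag)
  per_file.each_index.reject { |i| per_file[i].empty? }
end

content.each_key do |tag|
  puts "#{tag}: found in files #{files_with_content(content, tag).inspect}"
end
```

Because every file contributes an entry for every tag, the same index refers to the same file across all tags.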
 
data/bin/console CHANGED
@@ -2,6 +2,7 @@
 
  require "bundler/setup"
  require "wriggler"
+ require "awesome_print"
 
  # You can add fixtures and/or initialization code here to make experimenting
  # with your gem easier. You can also use a different console, if you like.
data/dirtest/nested_fldr/test5.xml ADDED
@@ -0,0 +1 @@
+ <test>If this appears it works</test>
data/dirtest/tag_content.csv ADDED
@@ -0,0 +1,5 @@
+ character,test,name,sitcom
+ "[""Al Bundy"", ""Bud Bundy"", ""Marcy Darcy"", ""Larry Appleton"", ""Balki Bartokomous"", ""John 'Hannibal' Smith"", ""Templeton 'Face' Peck"", ""'B.A.' Baracus"", ""'Howling Mad' Murdock""]","[""Al Bundy"", ""Bud Bundy"", ""Marcy Darcy"", ""Larry Appleton"", ""Balki Bartokomous"", ""John 'Hannibal' Smith"", ""Templeton 'Face' Peck"", ""'B.A.' Baracus"", ""'Howling Mad' Murdock""]","[""Al Bundy"", ""Bud Bundy"", ""Marcy Darcy"", ""Larry Appleton"", ""Balki Bartokomous"", ""John 'Hannibal' Smith"", ""Templeton 'Face' Peck"", ""'B.A.' Baracus"", ""'Howling Mad' Murdock""]"
+ "[""If this appears it works""]","[""This is different""]"
+ "[""Married with Children"", ""Perfect Strangers"", ""The A-Team""]","[""Married with Children"", ""Perfect Strangers"", ""The A-Team""]","[""Married with Children"", ""Perfect Strangers"", ""The A-Team""]"
+ "[""This is different\n Married with Children\n \n Al Bundy\n Bud Bundy\n Marcy Darcy\n \n "", ""Perfect Strangers\n \n Larry Appleton\n Balki Bartokomous\n \n ""]","[""Married with Children\n \n Al Bundy\n Bud Bundy\n Marcy Darcy\n \n "", ""Perfect Strangers\n \n Larry Appleton\n Balki Bartokomous\n \n ""]","[""Married with Children\n \n Al Bundy\n Bud Bundy\n Marcy Darcy\n \n "", ""Perfect Strangers\n \n Larry Appleton\n Balki Bartokomous\n \n ""]"
data/dirtest/test1.xml ADDED
@@ -0,0 +1,31 @@
+ <root>
+ <sitcoms>
+ <sitcom>
+ <test>This is different</test>
+ <name>Married with Children</name>
+ <characters>
+ <character>Al Bundy</character>
+ <character>Bud Bundy</character>
+ <character>Marcy Darcy</character>
+ </characters>
+ </sitcom>
+ <sitcom>
+ <name>Perfect Strangers</name>
+ <characters>
+ <character>Larry Appleton</character>
+ <character>Balki Bartokomous</character>
+ </characters>
+ </sitcom>
+ </sitcoms>
+ <dramas>
+ <drama>
+ <name>The A-Team</name>
+ <characters>
+ <character>John "Hannibal" Smith</character>
+ <character>Templeton "Face" Peck</character>
+ <character>"B.A." Baracus</character>
+ <character>"Howling Mad" Murdock</character>
+ </characters>
+ </drama>
+ </dramas>
+ </root>
data/dirtest/test2.xml ADDED
@@ -0,0 +1,30 @@
+ <root>
+ <sitcoms>
+ <sitcom>
+ <name>Married with Children</name>
+ <characters>
+ <character>Al Bundy</character>
+ <character>Bud Bundy</character>
+ <character>Marcy Darcy</character>
+ </characters>
+ </sitcom>
+ <sitcom>
+ <name>Perfect Strangers</name>
+ <characters>
+ <character>Larry Appleton</character>
+ <character>Balki Bartokomous</character>
+ </characters>
+ </sitcom>
+ </sitcoms>
+ <dramas>
+ <drama>
+ <name>The A-Team</name>
+ <characters>
+ <character>John "Hannibal" Smith</character>
+ <character>Templeton "Face" Peck</character>
+ <character>"B.A." Baracus</character>
+ <character>"Howling Mad" Murdock</character>
+ </characters>
+ </drama>
+ </dramas>
+ </root>
data/dirtest/test3.xml ADDED
@@ -0,0 +1,30 @@
+ <root>
+ <sitcoms>
+ <sitcom>
+ <name>Married with Children</name>
+ <characters>
+ <character>Al Bundy</character>
+ <character>Bud Bundy</character>
+ <character>Marcy Darcy</character>
+ </characters>
+ </sitcom>
+ <sitcom>
+ <name>Perfect Strangers</name>
+ <characters>
+ <character>Larry Appleton</character>
+ <character>Balki Bartokomous</character>
+ </characters>
+ </sitcom>
+ </sitcoms>
+ <dramas>
+ <drama>
+ <name>The A-Team</name>
+ <characters>
+ <character>John "Hannibal" Smith</character>
+ <character>Templeton "Face" Peck</character>
+ <character>"B.A." Baracus</character>
+ <character>"Howling Mad" Murdock</character>
+ </characters>
+ </drama>
+ </dramas>
+ </root>
data/dirtest/test4.html ADDED
@@ -0,0 +1,7 @@
+ <div id = "buttons">
+ <button id="bye">Bye</button>
+ <button id="hello">Hello</button>
+ </div>
+ <div>
+ <p>1: <span id="greeting">Greeting</span></p>
+ </div>
data/lib/wriggler/version.rb CHANGED
@@ -1,3 +1,3 @@
  module Wriggler
- VERSION = "1.0.0"
+ VERSION = "1.1.0"
  end
data/lib/wriggler.rb CHANGED
@@ -10,7 +10,7 @@ module Wriggler
  @directory = directory #Current top-level directory
 
  navigate_directory
- Writer.write(@content)
+ @content
  end
 
  private
@@ -27,6 +27,9 @@ module Wriggler
  Find.find(@directory) do |file|
  file_array << file if file.match(/\.xml\Z/) || file.match(/\.html\Z/)
  end
+ puts "==============="
+ puts "Files Found: #{file_array.length}"
+ puts "==============="
  file_array
  end
 
@@ -71,14 +74,14 @@ module Wriggler
  end
 
  def self.crawl_file(doc)
- #Crawl the Nokogiri Object for the file
- @content.each_key do |key|
+ #Crawl the Nokogiri Object for the file
+ @content.each_key do |key|
  arr = []
- if !doc.xpath("//#{key}").empty? #Returns an empty array if tag is not present
- doc.xpath("//#{key}").map{ |tag| arr << sanitize(tag.text) }
- end
- fill_content(arr, key)
- end
+ if !doc.xpath("//#{key}").empty? #Returns an empty array if tag is not present
+ doc.xpath("//#{key}").map{ |tag| arr << sanitize(tag.text) }
+ end
+ @content.fetch(key) << arr
+ end
  end
 
  def self.sanitize(text)
@@ -86,24 +89,4 @@ module Wriggler
  text.gsub(/"/, "'").lstrip.chomp
  end
 
- def self.fill_content(arr, key)
- #Doesn't shovel if there is no content found for the specific tag
- !arr.empty? ? (@content.fetch(key) << arr) : nil
- end
- end
-
- require 'CSV'
-
- module Writer
- def self.write(content)
- #Write to a CSV file now
- column_names = content.keys
- s = CSV.generate do |csv|
- csv << column_names
- content.keys.each do |key|
- csv << content.fetch(key)
- end
- end
- File.write('tag_content.csv', s)
- end
  end
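
With the `Writer` module removed, `Wriggler.crawl` returns the collected content instead of saving `tag_content.csv`. A caller who still wants a CSV could build one from the returned value. The sketch below is one way to do that and is not part of the gem: the `content_to_csv` helper is hypothetical, the sample hash is hand-built, and string keys are an assumption. It requires the library as lowercase `csv`, since the removed `require 'CSV'` only resolves on case-insensitive file systems.

```ruby
require "csv"

# Hypothetical helper: one header row of tag names, then one row per tag,
# mirroring the layout that the removed Writer.write produced.
def content_to_csv(content)
  CSV.generate do |csv|
    csv << content.keys
    content.each_value { |per_file| csv << per_file }
  end
end

# Hand-built stand-in for the value returned by Wriggler.crawl.
content = {
  "name" => [["Perfect Strangers"], ["The A-Team"]],
  "test" => [["This is different"], []]
}

csv_text = content_to_csv(content)
puts csv_text

# Uncomment to save the result next to the crawled files:
# File.write("tag_content.csv", csv_text)
```

Each cell holds the stringified array for one file, which is the same shape seen in the bundled dirtest/tag_content.csv fixture.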
data/wriggler.gemspec CHANGED
@@ -10,7 +10,7 @@ Gem::Specification.new do |spec|
  spec.email = ["elliott.a.young@gmail.com"]
 
  spec.summary = "A Gem designed to crawl through a local directory of HTML/XML files and pull out content based on pre-specified tag"
- spec.description = "A Gem designed to crawl through a local directory of HTML/XML files and pull out content based on pre-specified tag, which will then be saved to a CSV file and exported. Originally designed to feed an indexer."
+ spec.description = "A Gem designed to crawl through a local directory of HTML/XML files and pull out content based on pre-specified tag, which will be exported as a manipulatable object"
  spec.homepage = "https://github.com/ElliottAYoung/wriggler"
  spec.license = "MIT"
 
@@ -31,4 +31,5 @@ Gem::Specification.new do |spec|
  spec.add_development_dependency "rake", "~> 10.0"
  spec.add_development_dependency "rspec"
  spec.add_development_dependency "nokogiri"
+ spec.add_development_dependency "awesome_print"
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: wriggler
  version: !ruby/object:Gem::Version
- version: 1.0.0
+ version: 1.1.0
  platform: ruby
  authors:
  - Elliott Young
@@ -66,9 +66,23 @@ dependencies:
  - - '>='
  - !ruby/object:Gem::Version
  version: '0'
+ - !ruby/object:Gem::Dependency
+ name: awesome_print
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - '>='
+ - !ruby/object:Gem::Version
+ version: '0'
+ type: :development
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - '>='
+ - !ruby/object:Gem::Version
+ version: '0'
  description: A Gem designed to crawl through a local directory of HTML/XML files and
- pull out content based on pre-specified tag, which will then be saved to a CSV file
- and exported. Originally designed to feed an indexer.
+ pull out content based on pre-specified tag, which will be exported as a manipulatable
+ object
  email:
  - elliott.a.young@gmail.com
  executables: []
@@ -85,6 +99,12 @@ files:
  - Rakefile
  - bin/console
  - bin/setup
+ - dirtest/nested_fldr/test5.xml
+ - dirtest/tag_content.csv
+ - dirtest/test1.xml
+ - dirtest/test2.xml
+ - dirtest/test3.xml
+ - dirtest/test4.html
  - lib/wriggler.rb
  - lib/wriggler/version.rb
  - wriggler.gemspec
@@ -115,4 +135,3 @@ specification_version: 4
  summary: A Gem designed to crawl through a local directory of HTML/XML files and pull
  out content based on pre-specified tag
  test_files: []
- has_rdoc: