bookmark_machine 0.0.1

@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 3f5f66298a831b464d86c98800b1ab5e71ad8e39
4
+ data.tar.gz: a0c95f2b85fc13d8554d37ad781012dbddac5ec8
5
+ SHA512:
6
+ metadata.gz: 27ab83813bde72786e43cb1fcf93445397ba2599dfb8fdb6f9f34d3520368612e8ed93e2bb0e0fc687044c5440e32763a645a9e2a8e6430d731f9b6256b31ece
7
+ data.tar.gz: f0b36f19b94810551431812fa40ad7cdd068f9be32bf8f820df6c8dab161c06ff858d2ddbf4b0ec9473fa41645ddb2db995e47b4bc5602afa47fea71934e00cd
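The checksums above are hex digests of the two archives packed inside the `.gem` file. As a rough sketch of how such values could be recomputed for comparison, assuming the archive bytes have already been read into a string (the helper name `digests_for` is made up for this note):

```ruby
require "digest/sha1"
require "digest/sha2"

# Hypothetical helper: returns the same two digest kinds that the
# checksums file records, computed over the given bytes.
def digests_for(bytes)
  {
    "SHA1"   => Digest::SHA1.hexdigest(bytes),
    "SHA512" => Digest::SHA512.hexdigest(bytes),
  }
end

# Usage idea (the path is an assumption, not part of the gem):
#   recorded = "3f5f66298a831b464d86c98800b1ab5e71ad8e39"
#   digests_for(File.binread("metadata.gz"))["SHA1"] == recorded
```

A mismatch would suggest the archive was altered or corrupted after the gem was built.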
@@ -0,0 +1,54 @@
1
+ BookmarkMachine
2
+ ===============
3
+
4
+ Parses Netscape bookmark files. For better or worse, this is the standard bookmark file format used by nearly every browser. It also happens to be roughly the file format that Delicious uses for liberating your bookmarks from their once adequate system.
5
+
6
+ Usage
7
+ -----
8
+
9
+ Pass `BookmarkMachine::NetscapeParser` a string, and ask it for bookmarks:
10
+
11
+ ```
12
+ html = IO.read("bookmarks.html")
13
+ parser = BookmarkMachine::NetscapeParser.new(html)
14
+
15
+ bookmarks = parser.bookmarks
16
+ ```
17
+
18
+ To write bookmarks back out, pass a collection of them to `BookmarkMachine::NetscapeFormatter`:
19
+
20
+ ```
21
+ bookmarks = [
22
+ BookmarkMachine::Bookmark.new("http://example.com", name: "Example")
23
+ ]
24
+
25
+ formatter = BookmarkMachine::NetscapeFormatter.new(bookmarks)
26
+ html = formatter.to_s
27
+ ```
28
+
29
+ BookmarkMachine represents bookmarks as an object containing:
30
+
31
+ * `url` - Bookmark's url
32
+ * `name` - Page's name, defaults to an empty string
33
+ * `created_at` - `Time` the bookmark was created
34
+ * `updated_at` - `Time` the bookmark was last updated
35
+ * `icon` - Either a data URI or a URL pointing to an icon
36
+ * `folders` - An `Array` of the bookmark's parent folder names
37
+ * `tags` - An `Array` of tags (less common)
38
+ * `description` - An extended description of the bookmark
39
+
40
+ Any fields that aren't present will be `nil` unless otherwise noted.
41
+
42
+ Warning
43
+ -------
44
+
45
+ Honestly, there are alternatives like [markio](https://github.com/spajus/markio) that are probably better. About half of this project was an exercise in doing some TDD. That said, it does support outputting nested folders, if that's important to you.
46
+
47
+ TODO
48
+ ----
49
+
50
+ It might be fun (in the questionable sense) to support other linked formats including Atom feeds and so forth.
51
+
52
+ Contact
53
+ -------
54
+ Adam Sanderson, netghost@gmail.com | http://monkeyandcrow.com
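For readers who have not seen the format the README refers to, here is a small hand-written sample plus a deliberately naive extraction. This is only a sketch of where the data lives in the markup; the sample file, the `extract_links` helper, and the regexp are all assumptions made for illustration, and this is not how BookmarkMachine itself parses.

```ruby
# A tiny, hand-written sample in the Netscape bookmark format (illustrative only).
SAMPLE = <<~HTML
  <!DOCTYPE NETSCAPE-Bookmark-file-1>
  <TITLE>Bookmarks</TITLE>
  <H1>Bookmarks</H1>
  <DL><p>
  <DT><H3>Reading</H3>
  <DL><p>
  <DT><A HREF="http://example.com" ADD_DATE="1490918400" TAGS="ruby,html">Example</A>
  <DD>An extended description
  </DL><p>
  </DL><p>
HTML

# Hypothetical helper: pulls [url, name] pairs out with a regexp.
# It ignores folders, dates, tags, and descriptions entirely.
def extract_links(html)
  html.scan(/<A\s+HREF="([^"]*)"[^>]*>([^<]*)<\/A>/i)
end

extract_links(SAMPLE).each { |url, name| puts "#{name}: #{url}" }
```

Note how the folder name sits in an `H3`, the nesting in `DL` elements, and the description in an unclosed `DD`, which is why a real parser has to track state rather than match single lines.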
@@ -0,0 +1,139 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ # This is a quick and dirty script for anonymizing HTML while keeping the
4
+ # same general shape.
5
+ #
6
+ # Doing this requires a handful of dirty tricks, and the output is not
7
+ # quite the same, but Nokogiri will roundtrip it back to the same structure
8
+ # anyways.
9
+ #
10
+ # It's actually pretty amazing that browsers all implement this wickedly
11
+ # broken format.
12
+
13
+ require 'nokogiri'
14
+ require 'base64'
15
+
16
+ # A useful sampling of words; the original lorem ipsum, by Lewis Carroll
17
+ #
18
+ WORDS = <<-JABBERWOCKY.split(/[\s\W]+/m).reject(&:empty?).map(&:downcase)
19
+ Twas brillig, and the slithy toves
20
+ Did gyre and gimble in the wabe:
21
+ All mimsy were the borogoves,
22
+ And the mome raths outgrabe.
23
+
24
+ Beware the Jabberwock, my son!
25
+ The jaws that bite, the claws that catch!
26
+ Beware the Jubjub bird, and shun
27
+ The frumious Bandersnatch!
28
+ He took his vorpal sword in hand:
29
+ Long time the manxome foe he sought --
30
+ So rested he by the Tumtum tree,
31
+ And stood awhile in thought.
32
+ And, as in uffish thought he stood,
33
+ The Jabberwock, with eyes of flame,
34
+ Came whiffling through the tulgey wood,
35
+ And burbled as it came!
36
+ One, two! One, two! And through and through
37
+ The vorpal blade went snicker-snack!
38
+ He left it dead, and with its head
39
+ He went galumphing back.
40
+ And, has thou slain the Jabberwock?
41
+ Come to my arms, my beamish boy!
42
+ O frabjous day! Callooh! Callay!'
43
+ He chortled in his joy.
44
+
45
+ Twas brillig, and the slithy toves
46
+ Did gyre and gimble in the wabe;
47
+ All mimsy were the borogoves,
48
+ And the mome raths outgrabe.
49
+
50
+ --Lewis Carroll
51
+
52
+ JABBERWOCKY
53
+
54
+ # Substitute attributes for the given element type in the document.
55
+ def sub_attr(tag, attr)
56
+ @doc.css("#{tag}[#{attr}]").each do |el|
57
+ el[attr] = yield el[attr]
58
+ end
59
+ end
60
+
61
+ # Substitute text for the given element type in the document.
62
+ def sub_text(tag)
63
+ @doc.css(tag).each do |el|
64
+ el.content = yield el.content
65
+ end
66
+ end
67
+
68
+ # Generate a fake url.
69
+ def fake_url
70
+ protocol = rand(10) > 8 ? "http" : "https"
71
+ domain = (rand(3)+2).times.map{ rand(2 ** 12).to_s(32) }.join(".")
72
+ path = words(rand(5)).join("/")
73
+
74
+ "#{protocol}://#{domain}/#{path}"
75
+ end
76
+
77
+ # Generates a URL safe base64 encoded String.
78
+ def fake_base_64
79
+ Base64.urlsafe_encode64(words(20).join)
80
+ end
81
+
82
+ # Samples from the excellent lexicon of the Jabberwocky, source
83
+ # of all good slithy toves.
84
+ def words(length)
85
+ Array.new(length) { WORDS.sample } # sample with repeats so long inputs keep their word count
86
+ end
87
+
88
+ # Replaces a collection of words (or tags), with a new collection
89
+ # containing the same number of beamish words.
90
+ def replace_words(text, delimiter=" ")
91
+ count = text.split(delimiter).count
92
+ words(count).join(delimiter)
93
+ end
94
+
95
+ # Converts most elements (except for `p` tags), and their attributes
96
+ # to uppercase. Why not `p` tags? I know not, but that's the convention.
97
+ def upcase_elements(node)
98
+ node.name = node.name.upcase unless node.name == "p"
99
+
100
+ # Make sure all the ATTRIBUTES ARE SHOUTING at you!
101
+ node.each do |key, value|
102
+ node.delete(key)
103
+ node[key.upcase] = value
104
+ end
105
+ node.elements.each{|el| upcase_elements(el) }
106
+ end
107
+
108
+ # Parses the input document either from the first argument or STDIN:
109
+ @doc = Nokogiri::HTML(ARGV.length > 0 ? IO.read(ARGV[0]) : STDIN.read)
110
+
111
+ # Substitutes meaningful information with borogoves and Jubjub birds.
112
+ sub_attr("a", "href") { fake_url }
113
+ sub_attr("a", "icon_uri") { fake_url }
114
+ sub_attr("a", "icon") { "data:image/png;base64,#{ fake_base_64 }" }
115
+ sub_attr("a", "tags") {|s| replace_words(s, ",") }
116
+ sub_text("a") {|s| replace_words(s) }
117
+ sub_text("h3") {|s| replace_words(s) }
118
+
119
+ # Deformats this back into the mess it started with:
120
+
121
+ # 1. Lower case tag names? What is this, 1998? UPCASE!
122
+ upcase_elements(@doc.root)
123
+
124
+ # 2. Strip out closing tags for DT and DD elements.
125
+ # 3. Remove the root HTML, HEAD, and BODY elements.
126
+ html = @doc.to_html(indent: 2)
127
+ .gsub(%r(\s*</p>\s*),"") # Do not close P tags
128
+ .gsub(%r(\s*</DT>\s*),"") # Do not close DT tags
129
+ .gsub(%r(\s*</DD>\s*),"") # Do not close DD tags
130
+ .gsub(%r(\s*</?HTML>\s*),"") # Strip HTML tags
131
+ .gsub(%r(\s*</?HEAD>\s*),"") # Strip HEAD tags
132
+ .gsub(%r(\s*</?BODY>\s*),"") # Strip BODY tags
133
+
134
+ # Yes, I know you shouldn't parse HTML with regexps, but this isn't exactly
135
+ # a "10 Best Practices You Should Be Using Today!" kind of script.
136
+
137
+ # All done!
138
+ # Enjoy the mome raths.
139
+ puts html
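The final `gsub` chain in the script is easier to follow on a tiny input. The sketch below applies the same substitutions to a hand-written string instead of Nokogiri output; the `strip_wrappers` name and the sample markup are assumptions made for this note.

```ruby
# Hypothetical helper: applies the same substitutions as the script's
# final step to any HTML string.
def strip_wrappers(html)
  html
    .gsub(%r(\s*</p>\s*), "")      # Do not close P tags
    .gsub(%r(\s*</DT>\s*), "")     # Do not close DT tags
    .gsub(%r(\s*</DD>\s*), "")     # Do not close DD tags
    .gsub(%r(\s*</?HTML>\s*), "")  # Strip HTML tags
    .gsub(%r(\s*</?HEAD>\s*), "")  # Strip HEAD tags
    .gsub(%r(\s*</?BODY>\s*), "")  # Strip BODY tags
end

tidy = <<~HTML
  <HTML>
  <BODY>
  <DL><p>
  <DT><A HREF="http://example.com">Example</A></DT>
  <DD>A description</DD>
  </p></DL>
  </BODY>
  </HTML>
HTML

puts strip_wrappers(tidy)
```

The closing `</A>` and `</DL>` tags survive, while the wrappers and the optional closing tags disappear along with the whitespace around them.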
@@ -0,0 +1,8 @@
1
+ require "bookmark_machine/version"
2
+ require "bookmark_machine/bookmark"
3
+ require "bookmark_machine/netscape_parser"
4
+ require "bookmark_machine/netscape_formatter"
5
+
6
+ module BookmarkMachine
7
+
8
+ end
@@ -0,0 +1,29 @@
1
+ module BookmarkMachine
2
+ class Bookmark
3
+ attr_accessor :url, :name, :created_at, :updated_at, :icon, :folders, :tags, :description
4
+
5
+ def initialize(url, attrs=nil)
6
+ self.url = url
7
+
8
+ if attrs
9
+ attrs.each{|key,value| self.send("#{key}=", value)}
10
+ end
11
+
12
+ self.name ||= ""
13
+ self.folders ||= []
14
+ end
15
+
16
+ # Bookmarks are considered equal if all attributes are equal.
17
+ # Which is probably what you would have expected.
18
+ def == other
19
+ other.is_a?(Bookmark) && url == other.url &&
20
+ name == other.name &&
21
+ created_at == other.created_at &&
22
+ updated_at == other.updated_at &&
23
+ icon == other.icon &&
24
+ folders == other.folders &&
25
+ tags == other.tags &&
26
+ description == other.description
27
+ end
28
+ end
29
+ end
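To make the defaulting and equality rules concrete, here is a trimmed-down stand-in. It is a sketch only: `SketchBookmark` is a hypothetical class with three of the eight fields, and the type check in `==` is an assumption about desirable behavior rather than something taken from the gem.

```ruby
# Hypothetical stand-in for BookmarkMachine::Bookmark, reduced to three fields.
class SketchBookmark
  ATTRS = [:url, :name, :folders].freeze
  attr_accessor(*ATTRS)

  def initialize(url, attrs = {})
    self.url = url
    attrs.each { |key, value| public_send("#{key}=", value) }

    # Same defaults as the class above: empty name, no folders.
    self.name    ||= ""
    self.folders ||= []
  end

  # Equal when the other object is the same kind of thing and every
  # attribute matches.
  def ==(other)
    other.is_a?(SketchBookmark) &&
      ATTRS.all? { |attr| public_send(attr) == other.public_send(attr) }
  end
end

a = SketchBookmark.new("http://example.com")
b = SketchBookmark.new("http://example.com", name: "")
puts a == b
```

Comparing against an unrelated object, such as a plain string, returns `false` here instead of raising `NoMethodError`.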
@@ -0,0 +1,184 @@
1
+ require 'stringio'
2
+
3
+ module BookmarkMachine
4
+ # Formatter for the Netscape Bookmark File format.
5
+ # Amusingly, the best documentation for the format comes from Microsoft.
6
+ #
7
+ # https://msdn.microsoft.com/en-us/library/aa753582(v=vs.85).aspx
8
+ #
9
+ # We live in interesting times.
10
+ class NetscapeFormatter
11
+ attr_reader :bookmarks
12
+
13
+ def initialize(bookmarks)
14
+ @bookmarks = bookmarks
15
+ end
16
+
17
+ # Returns the bookmarks as a Netscape bookmark file (an HTML String).
18
+ def to_html
19
+ writer = Writer.new(StringIO.new)
20
+
21
+ bookmarks.each{|b| writer << b }
22
+ writer.done
23
+
24
+ writer.io.string
25
+ end
26
+
27
+ alias_method :to_s, :to_html
28
+
29
+ # :nodoc:
30
+ # This is a simple writer for outputting bookmark appropriate HTML.
31
+ # Since the expected HTML doesn't have a root, uses a custom doctype,
32
+ # and doesn't close most tags, it's easier to just write the output
33
+ # manually rather than try to get Nokogiri to format it poorly for us.
34
+ #
35
+ # Plus this is just kind of fun in a bizarre un-fun kind of way.
36
+ class Writer
37
+ HEADER = <<-HTML.gsub(/^ /, "")
38
+ <!DOCTYPE NETSCAPE-Bookmark-file-1>
39
+ <!-- This is an automatically generated file.
40
+ It will be read and overwritten.
41
+ DO NOT EDIT! -->
42
+ <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
43
+ <TITLE>Bookmarks</TITLE>
44
+ <H1>Bookmarks</H1>
45
+ HTML
46
+
47
+ attr_reader :io
48
+ attr_reader :folders
49
+
50
+ def initialize(io)
51
+ @io = io
52
+ @folders = []
53
+ io.set_encoding(Encoding::UTF_8)
54
+
55
+ start
56
+ end
57
+
58
+ def done
59
+ close_all_folders
60
+ end
61
+
62
+ def << bookmark
63
+ adjust_folders(bookmark.folders)
64
+ write_bookmark(bookmark)
65
+ end
66
+
67
+ private
68
+
69
+ def start
70
+ io.puts HEADER
71
+ open_collection
72
+ end
73
+
74
+ def adjust_folders(new_folders)
75
+ # Find where the new and old folders differ:
76
+ diverge_index = folders.zip(new_folders).find_index{|a,b| a != b} || folders.length
77
+
78
+ # Close old folders after that point:
79
+ folders[diverge_index .. -1].reverse_each do |name|
80
+ close_folder(name)
81
+ end
82
+
83
+ # Open new folders after that point:
84
+ new_folders[diverge_index .. -1].each do |name|
85
+ open_folder(name)
86
+ end
87
+ end
88
+
89
+ def close_all_folders
90
+ folders.reverse_each{|name| close_folder(name)}
91
+ # Close root folder
92
+ close_collection
93
+ end
94
+
95
+ def write_bookmark(bookmark)
96
+ open_tag(:DT)
97
+
98
+ open_tag(:A) do
99
+ write_bookmark_attrs(bookmark)
100
+ end
101
+
102
+ write_text(bookmark.name)
103
+ close_tag(:A)
104
+
105
+ if bookmark.description
106
+ open_tag(:DD)
107
+ write_text(bookmark.description)
108
+ end
109
+ end
110
+
111
+ def write_bookmark_attrs(bookmark)
112
+ write_attr(:HREF, bookmark.url) if bookmark.url
113
+ write_attr(:ADD_DATE, bookmark.created_at.to_i) if bookmark.created_at
114
+ write_attr(:LAST_MODIFIED, bookmark.updated_at.to_i) if bookmark.updated_at
115
+ write_attr(:TAGS, bookmark.tags.join(",")) if bookmark.tags
116
+
117
+ icon = bookmark.icon
118
+ if icon
119
+ attr_name = icon.start_with?("data:") ? :ICON : :ICON_URI
120
+ write_attr(attr_name, icon)
121
+ end
122
+ end
123
+
124
+ def write_folder(name)
125
+ open_tag(:DT)
126
+ open_tag(:H3)
127
+ write_text(name)
128
+ close_tag(:H3)
129
+ end
130
+
131
+ def open_folder(name)
132
+ folders.push name
133
+ open_collection
134
+ write_folder(name)
135
+ end
136
+
137
+ def close_folder(name)
138
+ popped = folders.pop
139
+ close_collection
140
+ end
141
+
142
+ def open_collection
143
+ open_tag(:DL)
144
+ open_tag(:p)
145
+ end
146
+
147
+ def close_collection
148
+ close_tag(:DL)
149
+ end
150
+
151
+ def open_tag(tag, attributes={})
152
+ io.write("<#{tag} ")
153
+ yield if block_given?
154
+ io.write(">")
155
+ end
156
+
157
+ def write_attr(name, value)
158
+ io.write(name)
159
+ io.write('="')
160
+ io.write(encode(value))
161
+ io.write('" ')
162
+ end
163
+
164
+ def close_tag(tag)
165
+ io.write("</#{tag}>")
166
+ io.write("\n")
167
+ end
168
+
169
+ def write_text(str)
170
+ io.write(encode(str))
171
+ end
172
+
173
+ def encode(str)
174
+ str.to_s
175
+ .gsub('&', "&amp;")
176
+ .gsub('"', "&quot;")
177
+ .gsub("'", "&apos;")
178
+ .gsub("<", "&lt;")
179
+ .gsub(">", "&gt;")
180
+ end
181
+ end
182
+
183
+ end
184
+ end
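The folder bookkeeping in `adjust_folders` comes down to finding the shared leading path between the currently open folders and the next bookmark's folders, closing whatever lies beyond it, and opening the rest. A minimal sketch of that idea as a pure function, where the name `folder_transition` and its return shape are assumptions made for this note:

```ruby
# Hypothetical helper: returns [folders_to_close, folders_to_open] when
# moving from the `current` open folders to the `target` folders.
def folder_transition(current, target)
  # Count how many leading folder names the two paths share.
  shared = 0
  shared += 1 while shared < current.length &&
                    shared < target.length &&
                    current[shared] == target[shared]

  # Close innermost folders first, then open the new ones outermost first.
  [current[shared..-1].reverse, target[shared..-1]]
end

p folder_transition(["a", "b"], ["a", "c"])
p folder_transition(["a"], ["a", "b"])
p folder_transition(["a"], ["a"])
```

The last call is the case worth watching: two consecutive bookmarks in the same folder should close and open nothing, so they end up inside one folder in the output.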
@@ -0,0 +1,138 @@
1
+ require 'nokogiri'
2
+
3
+ module BookmarkMachine
4
+ # Parser for the Netscape Bookmark File format.
5
+ # Amusingly, the best documentation for the format comes from Microsoft.
6
+ #
7
+ # https://msdn.microsoft.com/en-us/library/aa753582(v=vs.85).aspx
8
+ #
9
+ # We live in interesting times.
10
+ class NetscapeParser
11
+
12
+ def initialize(html)
13
+ @html = html
14
+ end
15
+
16
+ # Returns an Array of Bookmark objects.
17
+ def bookmarks
18
+ @bookmarks ||= begin
19
+ doc = BookmarkDocument.new
20
+ parser = Nokogiri::HTML::SAX::Parser.new(doc)
21
+ parser.parse(@html)
22
+
23
+ doc.bookmarks
24
+ end
25
+ end
26
+
27
+ end
28
+
29
+ # :nodoc:
30
+ # BookmarkDocument implements SAX callbacks for parsing messy bookmark files.
31
+ # It turns out that a SAX parser is more resilient to bizarre inputs than the
32
+ # typical Nokogiri parser since it doesn't bother itself with the document
33
+ # structure.
34
+ class BookmarkDocument < Nokogiri::XML::SAX::Document
35
+ attr_reader :bookmarks
36
+
37
+ def initialize
38
+ super
39
+
40
+ @folders = []
41
+ @bookmarks = []
42
+ @current_bookmark = nil
43
+
44
+ reset_state
45
+ end
46
+
47
+ # Only three elements have semantic meaning: A, H3, and DD,
48
+ # representing Bookmarks, Folder names, and Descriptions.
49
+ def start_element(name, attrs = [])
50
+ case name
51
+ when "a" then start_bookmark(attrs)
52
+ when "h3" then start_folder(attrs)
53
+ when "dd" then start_description(attrs)
54
+ else done
55
+ end
56
+ end
57
+
58
+ # Only one closing element has semantic meaning, a closed DL,
59
+ # which indicates the end of a folder.
60
+ def end_element(name, attrs = [])
61
+ case name
62
+ when "dl" then pop_folder
63
+ else done
64
+ end
65
+ end
66
+
67
+ def end_document
68
+ done
69
+ end
70
+
71
+ def characters(string)
72
+ @text << string if @state
73
+ end
74
+
75
+ def start_bookmark(attrs)
76
+ attrs = Hash[attrs]
77
+
78
+ @current_bookmark = Bookmark.new(attrs['href'])
79
+ @current_bookmark.created_at = epoch_time(attrs['add_date'])
80
+ @current_bookmark.updated_at = epoch_time((attrs['last_modified'] || attrs['add_date']))
81
+ @current_bookmark.icon = attrs['icon'] || attrs['icon_uri']
82
+ @current_bookmark.tags = tagged_text(attrs['tags'])
83
+ @current_bookmark.folders = @folders.clone
84
+
85
+ @state = :bookmark
86
+ end
87
+
88
+ def start_folder(attrs)
89
+ @state = :folder
90
+ end
91
+
92
+ def start_description(attrs)
93
+ @state = :description
94
+ end
95
+
96
+ def done
97
+ case @state
98
+ when :bookmark
99
+ @current_bookmark.name = @text.strip
100
+ @bookmarks << @current_bookmark
101
+ @current_bookmark = nil
102
+ reset_state
103
+
104
+ when :folder
105
+ @folders << @text.strip
106
+ reset_state
107
+
108
+ when :description
109
+ description = @text.strip
110
+ @bookmarks.last.description = description unless description.empty? || @bookmarks.empty?
111
+ reset_state
112
+
113
+ end
114
+ end
115
+
116
+ def pop_folder
117
+ @folders.pop
118
+ done
119
+ end
120
+
121
+ def reset_state
122
+ @text = ""
123
+ @state = nil
124
+ end
125
+
126
+ # Converts from epoch seconds to a Time object.
127
+ # Returns nil on a nil input.
128
+ def epoch_time(seconds)
129
+ Time.at(seconds.to_i) if seconds
130
+ end
131
+
132
+ def tagged_text(str)
133
+ str.split(",").map{|t| t.strip} if str
134
+ end
135
+
136
+ end
137
+
138
+ end
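The state handling in `BookmarkDocument` can be tried without Nokogiri by feeding a hand-written event list through the same kind of loop. This is a simplified sketch: the `EVENTS` data and the `walk` method are made up for this note, descriptions and attributes are left out, and the flush ordering is not identical to the class above.

```ruby
# Hypothetical event stream, standing in for the SAX callbacks.
EVENTS = [
  [:start, "h3"], [:text, "Reading"],   [:end, "h3"],
  [:start, "a"],  [:text, "Example"],   [:end, "a"],
  [:end, "dl"],
  [:start, "a"],  [:text, "Top level"], [:end, "a"],
]

# Returns [name, folders] pairs, tracking open folders on a stack.
def walk(events)
  folders = []
  results = []
  state = nil
  text = ""

  # Commit whatever text has been collected for the current state.
  flush = lambda do
    case state
    when :folder   then folders << text.strip
    when :bookmark then results << [text.strip, folders.dup]
    end
    state = nil
    text = ""
  end

  events.each do |kind, value|
    case kind
    when :start
      flush.call
      state = { "h3" => :folder, "a" => :bookmark }[value]
    when :text
      text << value if state
    when :end
      flush.call
      folders.pop if value == "dl" # a closed DL ends the current folder
    end
  end

  flush.call
  results
end

p walk(EVENTS)
```

Because each bookmark copies the folder stack at the moment it is committed, later pushes and pops do not disturb bookmarks that were already recorded.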
@@ -0,0 +1,3 @@
1
+ module BookmarkMachine
2
+ VERSION = "0.0.1"
3
+ end
metadata ADDED
@@ -0,0 +1,107 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: bookmark_machine
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.0.1
5
+ platform: ruby
6
+ authors:
7
+ - Adam Sanderson
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2017-03-31 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: nokogiri
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - "~>"
18
+ - !ruby/object:Gem::Version
19
+ version: '1.6'
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - "~>"
25
+ - !ruby/object:Gem::Version
26
+ version: '1.6'
27
+ - !ruby/object:Gem::Dependency
28
+ name: minitest
29
+ requirement: !ruby/object:Gem::Requirement
30
+ requirements:
31
+ - - "~>"
32
+ - !ruby/object:Gem::Version
33
+ version: '5.9'
34
+ type: :development
35
+ prerelease: false
36
+ version_requirements: !ruby/object:Gem::Requirement
37
+ requirements:
38
+ - - "~>"
39
+ - !ruby/object:Gem::Version
40
+ version: '5.9'
41
+ - !ruby/object:Gem::Dependency
42
+ name: bundler
43
+ requirement: !ruby/object:Gem::Requirement
44
+ requirements:
45
+ - - "~>"
46
+ - !ruby/object:Gem::Version
47
+ version: '1.8'
48
+ type: :development
49
+ prerelease: false
50
+ version_requirements: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - "~>"
53
+ - !ruby/object:Gem::Version
54
+ version: '1.8'
55
+ - !ruby/object:Gem::Dependency
56
+ name: rake
57
+ requirement: !ruby/object:Gem::Requirement
58
+ requirements:
59
+ - - "~>"
60
+ - !ruby/object:Gem::Version
61
+ version: '10.0'
62
+ type: :development
63
+ prerelease: false
64
+ version_requirements: !ruby/object:Gem::Requirement
65
+ requirements:
66
+ - - "~>"
67
+ - !ruby/object:Gem::Version
68
+ version: '10.0'
69
+ description: Parse and format bookmarks
70
+ email:
71
+ - netghost@gmail.com
72
+ executables: []
73
+ extensions: []
74
+ extra_rdoc_files: []
75
+ files:
76
+ - README.markdown
77
+ - bin/anonymize_html.rb
78
+ - lib/bookmark_machine.rb
79
+ - lib/bookmark_machine/bookmark.rb
80
+ - lib/bookmark_machine/netscape_formatter.rb
81
+ - lib/bookmark_machine/netscape_parser.rb
82
+ - lib/bookmark_machine/version.rb
83
+ homepage: https://github.com/adamsanderson/bookmark_machine
84
+ licenses:
85
+ - MIT
86
+ metadata: {}
87
+ post_install_message:
88
+ rdoc_options: []
89
+ require_paths:
90
+ - lib
91
+ required_ruby_version: !ruby/object:Gem::Requirement
92
+ requirements:
93
+ - - ">="
94
+ - !ruby/object:Gem::Version
95
+ version: '0'
96
+ required_rubygems_version: !ruby/object:Gem::Requirement
97
+ requirements:
98
+ - - ">="
99
+ - !ruby/object:Gem::Version
100
+ version: '0'
101
+ requirements: []
102
+ rubyforge_project:
103
+ rubygems_version: 2.4.5.1
104
+ signing_key:
105
+ specification_version: 4
106
+ summary: Reads and writes netscape bookmark files.
107
+ test_files: []
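The `~>` constraints in the dependency list are pessimistic version constraints. A short sketch of what they admit, using RubyGems' own classes; the `allowed?` helper is an assumption made for this note:

```ruby
require "rubygems"

# Hypothetical helper: does `version` satisfy the `constraint` string?
def allowed?(constraint, version)
  Gem::Requirement.new(constraint).satisfied_by?(Gem::Version.new(version))
end

puts allowed?("~> 1.6", "1.8.5")   # inside the 1.x series
puts allowed?("~> 1.6", "2.0.0")   # next major version
puts allowed?("~> 10.0", "10.5")   # the rake constraint from the list above
```

In other words, `~> 1.6` accepts any 1.x release from 1.6 upward but stops short of 2.0.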