pismo 0.6.2 → 0.7.0

@@ -27,6 +27,7 @@ There's also a shorter "convenience" method which might be handy in IRB - it doe
  Pismo['http://www.rubyflow.com/items/4082'].title # => "Install Ruby as a non-root User"
 
  The current metadata methods are:
+
  * title
  * titles
  * author
@@ -50,11 +51,14 @@ The html_body and body methods will be of particular interest. They return the "
 
  There are some shortcomings or problems that I'm aware of and am going to pursue:
 
- * I do not know how Pismo fares on JRuby, Rubinius, or others yet.
- * The "Reader" content extraction algorithm is not perfect. It can sometimes return crap and can barf on certain types of characters for sentence extraction.
- * The author name extraction is quite poor.
- * The image extraction only handles images with absolute URLs.
- * The stopword list leaves a bit to be desired. It errs on the side of being too long rather than too short, though (1024 words long!)
+ * I do not know how Pismo fares on Rubinius or other versions of 1.9 (e.g. 1.9.2) yet
+ * pismo does not install on JRuby due to a problem in the fast-stemmer dependency
+ * Some users have had issues with using Pismo from irb. This appears to be related to Nokogiri use causing a segfault
+ * The "Reader" content extraction algorithm is not perfect. It can sometimes return crap and can barf on certain types of characters for sentence extraction
+ * The author name extraction isn't very strong and is best avoided for now
+ * The image extraction only deals with images with absolute URLs
+ * The stopword list is a little too long (~1000 words) and needs to be trimmed
+ * The corpus in test/corpus needs significantly extending
 
  ## OTHER GROOVY STUFF:
 
data/VERSION CHANGED
@@ -1 +1 @@
- 0.6.2
+ 0.7.0
data/bin/pismo CHANGED
@@ -32,6 +32,7 @@ if ARGV.empty?
  P = doc
  @p = doc
  puts "Pismo has loaded #{url} into @p and P"
+ puts "Note: There have been several reports of Nokogiri segfaulting while using Pismo from irb. If this happens, try the same code as a standalone Ruby app."
  IRB.start
  else
  output = { :url => doc.url }
@@ -33,7 +33,7 @@ module Pismo
  handle
  end
 
- @html = clean_html(@html)
+ @html = self.class.clean_html(@html)
 
  @doc = Nokogiri::HTML(@html)
  end
@@ -42,11 +42,18 @@ module Pismo
  @doc.match([*args], all)
  end
 
- def clean_html(html)
- html.gsub!('’', '\'')
- html.gsub!('”', '"')
+ def self.clean_html(html)
+ # Normalize stupid entities
+ # TODO: Optimize this so we don't need all these sequential gsubs
+ html.gsub!(" ", " ")
+ html.gsub!(" ", " ")
+ html.gsub!(" ", " ")
  html.gsub!('–', '-')
+ html.gsub!("‘", "'")
+ html.gsub!('’', "'")
  html.gsub!('“', '"')
+ html.gsub!('”', '"')
+ html.gsub!("…", '...')
  html.gsub!(' ', ' ')
  html
  end
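The TODO in `clean_html` asks for a way to avoid the chain of sequential `gsub!` calls. One option, sketched here under the assumption that only the typographic characters visible in this hunk need mapping and that the input is UTF-8, is a single `String#gsub` with a Hash as the replacement. The `gsub!(" ", " ")` lines appear to have lost their original characters in extraction, so they are not reproduced. `TYPOGRAPHIC`, `TYPOGRAPHIC_RE` and `clean_html_single_pass` are hypothetical names, not part of Pismo.

```ruby
# Typographic characters seen in clean_html, mapped to ASCII stand-ins.
TYPOGRAPHIC = {
  "\u2013" => "-",   # en dash
  "\u2018" => "'",   # left single quotation mark
  "\u2019" => "'",   # right single quotation mark
  "\u201C" => '"',   # left double quotation mark
  "\u201D" => '"',   # right double quotation mark
  "\u2026" => "..."  # horizontal ellipsis
}.freeze

# Regexp.union escapes each key and joins them into one alternation.
TYPOGRAPHIC_RE = Regexp.union(TYPOGRAPHIC.keys)

# Hypothetical single-pass replacement for the sequential gsub! chain.
# With a Hash argument, gsub looks up each matched substring and
# substitutes the corresponding value.
def clean_html_single_pass(html)
  html.gsub(TYPOGRAPHIC_RE, TYPOGRAPHIC)
end

puts clean_html_single_pass("\u201CHi\u201D \u2013 it\u2019s\u2026")
```

Unlike the original method, this returns a new string instead of mutating `html`; `html.gsub!(TYPOGRAPHIC_RE, TYPOGRAPHIC)` would keep the in-place behaviour.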
@@ -130,7 +130,6 @@ module Pismo
  '.byline',
  '.post_subheader_left a', # TechCrunch style
  '.byl', # BBC News style
- '.meta a',
  '.articledata .author a',
  '#owners a', # Google Code style
  '.author a',
@@ -147,7 +146,8 @@ module Pismo
  '.blog_meta a',
  'cite a',
  'cite',
- '.contributor_details h4 a'
+ '.contributor_details h4 a',
+ '.meta a'
  ], all)
 
  return unless author
@@ -8,7 +8,7 @@ module Pismo
  attr_reader :raw_content, :doc, :content_candidates
 
  # Elements to keep for /input/ sanitization
- OK_ELEMENTS = %w{a td br th tbody table tr div span img strong em b i body html head title p h1 h2 h3 h4 h5 h6 pre code tt ul li ol blockquote font big small section article abbr audio video cite dd dt figure caption sup form dl dt dd}
+ OK_ELEMENTS = %w{a td br th tbody table tr div span img strong em b i body html head title p h1 h2 h3 h4 h5 h6 pre code tt ul li ol blockquote font big small section article abbr audio video cite dd dt figure caption sup form dl dt dd center}
 
  # Build a tree of attributes that are allowed for each element.. doing it this messy way due to how Sanitize works, alas
  OK_ATTRIBUTES = {}
@@ -21,7 +21,7 @@ module Pismo
  GOOD_WORDS = %w{content post blogpost main story body entry text desc asset hentry single entrytext postcontent bodycontent}.uniq
 
  # Words that indicate crap in general
- BAD_WORDS = %w{reply metadata options commenting comments comment about footer header outer credit sidebar widget subscribe clearfix date social bookmarks links share video watch excerpt related supplement accessibility offscreen meta title signup blq secondary feedback featured clearfix small job jobs listing listings navigation nav byline addcomment postcomment trackback neighbor ads commentform fbfans login similar thumb link blogroll grid twitter wrapper container nav sitesub printfooter editsection visualclear catlinks hidden toc contentsub caption disqus rss shoutbox sponsor}.uniq
+ BAD_WORDS = %w{reply metadata options commenting comments comment about footer header outer credit sidebar widget subscribe clearfix date social bookmarks links share video watch excerpt related supplement accessibility offscreen meta title signup blq secondary feedback featured clearfix small job jobs listing listings navigation nav byline addcomment postcomment trackback neighbor ads commentform fbfans login similar thumb link blogroll grid twitter wrapper container nav sitesub printfooter editsection visualclear catlinks hidden toc contentsub caption disqus rss shoutbox sponsor blogcomments}.uniq
 
  # Words that kill a branch dead
  FATAL_WORDS = %w{comments comment bookmarks social links ads related similar footer digg totop metadata sitesub nav sidebar commenting options addcomment leaderboard offscreen job prevlink prevnext navigation reply-link hide hidden sidebox archives vcard}
@@ -39,7 +39,7 @@ module Pismo
 
  # Create a document object based on the raw HTML content provided
  def initialize(raw_content)
- @raw_content = raw_content
+ @raw_content = Pismo::Document.clean_html(raw_content)
  build_doc
  end
 
@@ -59,6 +59,17 @@ module Pismo
 
  # Remove scripts manually, Sanitize and/or Nokogiri seem to go a bit funny with them
  @raw_content.gsub!(/\<script .*?\<\/script\>/im, '')
+
+ # Get rid of bullshit "smart" quotes and other Unicode nonsense
+ @raw_content.force_encoding("ASCII-8BIT") if RUBY_VERSION > "1.9"
+ @raw_content.gsub!("\xe2\x80\x89", " ")
+ @raw_content.gsub!("\xe2\x80\x99", "'")
+ @raw_content.gsub!("\xe2\x80\x98", "'")
+ @raw_content.gsub!("\xe2\x80\x9c", '"')
+ @raw_content.gsub!("\xe2\x80\x9d", '"')
+ @raw_content.gsub!("\xe2\x80\xf6", '.')
+ @raw_content.force_encoding("UTF-8") if RUBY_VERSION > "1.9"
+
 
  # Sanitize the HTML
  @raw_content = Sanitize.clean(@raw_content,
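The new block in the reader switches the string to `ASCII-8BIT` and replaces raw UTF-8 byte sequences. The hypothetical helper below decodes each sequence to its code point: `\xe2\x80\x89` is U+2009 (thin space), `\xe2\x80\x98` and `\xe2\x80\x99` are the curly single quotes, and `\xe2\x80\x9c` and `\xe2\x80\x9d` the curly double quotes. The `\xe2\x80\xf6` line does not decode at all, because `0xF6` is not a valid UTF-8 continuation byte; the horizontal ellipsis U+2026 is `\xe2\x80\xa6`, so that replacement cannot match in valid UTF-8 input. As an alternative sketch, assuming valid UTF-8 input on Ruby 1.9 or later, the same characters can be translated with `String#tr` without changing the string's encoding; `code_point_for` and `normalize_quotes` are illustrative names, not Pismo methods.

```ruby
# Decode a raw byte sequence as UTF-8 and report its code point,
# or nil if the bytes are not a single valid UTF-8 character.
def code_point_for(bytes)
  s = bytes.dup.force_encoding(Encoding::UTF_8)
  return nil unless s.valid_encoding? && s.length == 1
  format("U+%04X", s.ord)
end

# The byte sequences used by the new gsub! calls in the reader.
["\xe2\x80\x89", "\xe2\x80\x99", "\xe2\x80\x98",
 "\xe2\x80\x9c", "\xe2\x80\x9d", "\xe2\x80\xf6"].each do |seq|
  puts "#{seq.unpack('H*').first} -> #{code_point_for(seq) || 'invalid UTF-8'}"
end

# UTF-8-aware alternative: translate thin space and curly quotes
# character by character, with no encoding round trip.
def normalize_quotes(text)
  text.tr("\u2009\u2018\u2019\u201C\u201D", " ''\"\"")
end
```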
@@ -70,8 +81,6 @@ module Pismo
 
  @doc = Nokogiri::HTML(@raw_content, nil, 'utf-8')
 
- #ap @raw_content
- #exit
  build_analysis_tree
  end
 
@@ -102,20 +111,34 @@ module Pismo
  # Assume that no content we'll want comes in a total package of fewer than 80 characters!
  next unless el.text.to_s.strip.length >= 80
 
- ids = (el['id'].to_s + ' ' + el['class'].to_s).downcase.strip.scan(/[a-z]+/)
  path_segments = el.path.scan(/[a-z]+/)[2..-1] || []
  depth = path_segments.length
+
+ local_ids = (el['id'].to_s + ' ' + el['class'].to_s).downcase.strip.scan(/[a-z]+/)
+ ids = local_ids
+
+ cp = el.parent
+ (depth - 1).times do
+ ids += (cp['id'].to_s + ' ' + cp['class'].to_s).downcase.strip.scan(/[a-z]+/)
+ cp = cp.parent
+ end if depth > 1
+
+ #puts "IDS"
+ #ap ids
+ #puts "LOCAL IDS"
+ #ap local_ids
 
  branch = {}
  branch[:ids] = ids
+ branch[:local_ids] = local_ids
  branch[:score] = -(BAD_WORDS & ids).size
- branch[:score] += (GOOD_WORDS & ids).size
- next if branch[:score] < 0
+ branch[:score] += ((GOOD_WORDS & ids).size * 2)
+ next if branch[:score] < -5
 
  #puts "#{ids.join(",")} - #{branch[:score].to_s} - #{el.text.to_s.strip.length}"
 
  # Elements that have an ID or class are more likely to be our winners
- branch[:score] += 2 unless ids.empty?
+ branch[:score] += 2 unless local_ids.empty?
 
  branch[:name] = el.name
  branch[:depth] = depth
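This hunk changes what the branch scorer sees. Instead of reading only an element's own `id` and `class`, it now collects words from up to `depth - 1` ancestors into `ids`, keeps the element's own words as `local_ids`, doubles the weight of `GOOD_WORDS` hits, lets branches scoring down to -5 survive instead of discarding every negative score, and grants the +2 bonus only when the element itself has an id or class. The sketch below mirrors those steps, plus the existing `+= depth` step and the new `*= 0.8` penalty from the following hunk. It assumes a hypothetical `Node` struct in place of a Nokogiri element, uses shortened samples of `GOOD_WORDS` and `BAD_WORDS`, and leaves out every other bonus and penalty the real method applies in between.

```ruby
# Hypothetical stand-in for a Nokogiri element: just id, class and parent.
Node = Struct.new(:id, :klass, :parent)

# Shortened samples of Pismo's GOOD_WORDS / BAD_WORDS lists.
GOOD_SAMPLE = %w{content post entry text}
BAD_SAMPLE  = %w{comments sidebar footer nav share related}

# Same tokenisation as the diff: lowercase id + class, split into a-z runs.
def words_for(node)
  (node.id.to_s + ' ' + node.klass.to_s).downcase.strip.scan(/[a-z]+/)
end

# Returns the branch score, or nil where the original code would `next`.
def score_branch(el, depth)
  local_ids = words_for(el)
  ids = local_ids

  # Walk up depth - 1 ancestors, appending their id/class words.
  cp = el.parent
  (depth - 1).times do
    break if cp.nil?     # guard added for the sketch; not in the diff
    ids += words_for(cp) # += builds a new array, so local_ids is untouched
    cp = cp.parent
  end if depth > 1

  score = -((BAD_SAMPLE & ids).size)
  score += (GOOD_SAMPLE & ids).size * 2
  return nil if score < -5

  score += 2 unless local_ids.empty? # only the element's own id/class counts
  score += depth                     # step from the following hunk
  score *= 0.8 if ids.length > 10    # new penalty in the following hunk
  score
end

root    = Node.new(nil, 'wrapper', nil)
article = Node.new('main', 'post entry', root)
para    = Node.new(nil, nil, article)
puts score_branch(article, 2) # own id/class plus inherited words
puts score_branch(para, 3)    # inherits "post entry" but gets no +2
```

The effect is that an anonymous child of a well-named container still benefits from its ancestors' words, while the +2 bonus stays reserved for elements that are themselves labelled.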
@@ -198,6 +221,7 @@ module Pismo
  branch[:score] -= 5 if branch[:bad_child_count] > 20
 
  branch[:score] += depth
+ branch[:score] *= 0.8 if ids.length > 10
 
 
 
@@ -212,8 +236,7 @@ module Pismo
  # Sort the branches by their score in reverse order
  @content_candidates = sorted_tree.reverse.first([5, sorted_tree.length].min)
 
- @content_candidates #.map { |i| [i[0], i[1][:name], i[1][:ids].join(','), i[1][:score] ]}
- #ap @content_candidates
+ #ap @content_candidates #.map { |i| [i[0], i[1][:name], i[1][:ids].join(','), i[1][:score] ]}
  #t2 = Time.now.to_i + (Time.now.usec.to_f / 1000000)
  #puts t2 - t1
  #exit
@@ -278,7 +301,7 @@ module Pismo
  next
  end
 
- if el.name == "p" && el.text !~ /\.(\s|$)/ && el.inner_html !~ /\<img/
+ if el.name == "p" && el.text !~ /(\.|\?|\!|\"|\')(\s|$)/ && el.inner_html !~ /\<img/
  el.remove
  next
  end
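This filter removes `<p>` elements that do not look like prose. The old test kept a paragraph only if some `.` was followed by whitespace or a line end. The new pattern also accepts `?`, `!`, `"` and `'` in that position, so questions, exclamations and text ending in a closing quote now survive. The sketch compares the two patterns on plain strings. `drop_paragraph?` is a hypothetical helper that takes the element's text and inner HTML instead of a Nokogiri node, and it writes the image check as `/<img/`, assuming the diff's `\<` is simply an escaped literal `<`.

```ruby
# Sentence-ending patterns from the old and new versions of the filter.
OLD_END = /\.(\s|$)/
NEW_END = /(\.|\?|\!|\"|\')(\s|$)/

# Hypothetical helper mirroring the new condition: a <p> is dropped when
# its text has no sentence-like ending and its HTML contains no <img tag.
def drop_paragraph?(text, inner_html)
  text !~ NEW_END && inner_html !~ /<img/
end

["Is this a question?", "Wow!", 'He said "stop"', "Read more"].each do |t|
  old_keeps = !(t =~ OLD_END).nil?
  new_keeps = !(t =~ NEW_END).nil?
  puts "#{t.inspect}: old keeps=#{old_keeps}, new keeps=#{new_keeps}"
end
```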
@@ -321,29 +344,15 @@ module Pismo
  # Remove empty tags
  clean_html.gsub!(/<(\w+)><\/\1>/, "")
 
- # Trim leading space from lines but without removing blank lines
- #clean_html.gsub!(/^\ +(?=\S)/, '')
-
  # Just a messy, hacky way to make output look nicer with subsequent paragraphs..
  clean_html.gsub!(/<\/(div|p|h1|h2|h3|h4|h5|h6)>/, '</\1>' + "\n\n")
-
- # Get rid of bullshit "smart" quotes
- clean_html.force_encoding("ASCII-8BIT") if RUBY_VERSION > "1.9"
- clean_html.gsub!("\xe2\x80\x89", " ")
- clean_html.gsub!("\xe2\x80\x99", "'")
- clean_html.gsub!("\xe2\x80\x98", "'")
- clean_html.gsub!("\xe2\x80\x9c", '"')
- clean_html.gsub!("\xe2\x80\x9d", '"')
- clean_html.force_encoding("UTF-8") if RUBY_VERSION > "1.9"
 
  @content[[clean, index]] = clean_html
  end
 
  def sentences(qty = 3)
- # ap content
  clean_content = Sanitize.clean(content, :elements => NON_HEADER_ELEMENTS, :attributes => OK_CLEAN_ATTRIBUTES, :remove_contents => %w{h1 h2 h3 h4 h5 h6})
- #ap clean_content
- #exit
+
  fodder = ''
  doc = Nokogiri::HTML(clean_content, nil, 'utf-8')
 
@@ -1,9 +1,3 @@
- 0
- 1
- 10
- 100
- 20
- a
  a's
  Aaliyah
  Aaron
@@ -17,8 +11,6 @@ accordingly
  across
  actually
  Adam
- add
- added
  Addison
  Adrian
  after
@@ -67,7 +59,6 @@ annual
  another
  Anthony
  Antonio
- any
  anybody
  anyhow
  anyone
@@ -107,7 +98,6 @@ Avery
  away
  awesome
  awfully
- b
  Bailey
  based
  basically
@@ -118,7 +108,6 @@ become
  becomes
  becoming
  been
- before
  beforehand
  behind
  being
@@ -130,16 +119,12 @@ beside
  besides
  best
  better
- between
  beyond
  big
  biggest
- bit
- bits
  Blake
  both
  bother
- box
  Brady
  Brandon
  Brayden
@@ -152,10 +137,7 @@ Brooke
  Brooklyn
  Bryan
  Bryce
- built
  but
- by
- c
  c'mon
  c's
  Caden
@@ -192,15 +174,14 @@ Cody
  Cole
  Colin
  Colton
- com
  come
  comes
  coming
  comment
  company
- compared
  compelling
  concerning
+ congratulations
  Connor
  consequently
  consider
@@ -220,7 +201,6 @@ covering
  cunt
  currently
  customizable
- d
  damn
  Daniel
  Danielle
@@ -258,7 +238,6 @@ driven
  drove
  during
  Dylan
- e
  each
  easier
  edu
@@ -278,8 +257,6 @@ end
  english
  enough
  entirely
- episodes
- equals
  Eric
  Erin
  es
@@ -305,12 +282,10 @@ existing
  extensive
  extra
  extremely
- f
  Faith
  false
  fame
  far
- favorite
  feb
  february
  feel
@@ -335,19 +310,20 @@ fuck
  full
  further
  furthermore
- g
  Gabriel
  Gabriella
  Gabrielle
  Garrett
+ gave
  Gavin
+ generally
  get
  gets
  getting
+ give
  given
  gives
  glory
- go
  goal
  goes
  going
@@ -358,7 +334,6 @@ gotten
  Grace
  great
  greetings
- h
  had
  hadn't
  Hailey
@@ -395,23 +370,18 @@ himself
  hire
  his
  hither
- homepage
  hopefully
- hour
- hours
  how
  howbeit
  however
  huge
  Hunter
- i
  i'd
  i'll
  i'm
  i've
  Ian
  ie
- if
  ignored
  imagine
  immediate
@@ -428,7 +398,6 @@ indicates
  informative
  inhibits
  inner
- inside
  insofar
  instead
  interest
@@ -448,9 +417,7 @@ it'll
  it's
  its
  itself
- itunes
  Ivan
- j
  Jack
  Jackson
  Jacob
@@ -492,7 +459,6 @@ jun
  june
  just
  Justin
- k
  Kaden
  Kaitlyn
  Kaleb
@@ -513,7 +479,6 @@ known
  knows
  Kyle
  Kylie
- l
  la
  Landon
  last
@@ -541,8 +506,6 @@ line
  listing
  listings
  little
- live
- loading
  Logan
  look
  looking
@@ -555,7 +518,6 @@ ltd
  Lucas
  Luis
  Luke
- m
  Mackenzie
  Madeline
  Madison
@@ -592,8 +554,6 @@ Michelle
  might
  Miguel
  mile
- minute
- minutes
  more
  moreover
  Morgan
@@ -601,11 +561,9 @@ most
  mostly
  moving
  much
- multiple
  must
  my
  myself
- n
  name
  namely
  Natalie
@@ -644,7 +602,6 @@ novel
  november
  now
  nowhere
- o
  Obie
  obviously
  oct
@@ -670,7 +627,6 @@ or
  org
  oriented
  Oscar
- other
  others
  otherwise
  ought
@@ -678,12 +634,9 @@ our
  ours
  ourselves
  out
- outside
- over
  overall
  Owen
  own
- p
  Paige
  par
  Parker
@@ -694,7 +647,6 @@ Patrick
  Paul
  peasy
  per
- perform
  perhaps
  piece
  placed
@@ -714,11 +666,9 @@ proud
  provide
  provides
  put
- q
  que
  quite
  qv
- r
  Rachel
  rather
  rd
@@ -744,7 +694,6 @@ Riley
  Robert
  run
  Ryan
- s
  safest
  said
  Samantha
@@ -778,7 +727,6 @@ september
  serious
  seriously
  set
- Seth
  settings
  seven
  several
@@ -822,6 +770,7 @@ step
  Stephanie
  Steven
  still
+ stuff
  sub
  subscribe
  such
@@ -831,7 +780,6 @@ sup
  sur
  sure
  Sydney
- t
  t's
  take
  taken
@@ -871,6 +819,8 @@ they'd
  they'll
  they're
  they've
+ thing
+ things
  think
  third
  this
@@ -909,12 +859,9 @@ twice
  two
  Tyler
  typically
- u
  ultra
  un
- under
  unfortunately
- unless
  unlikely
  unsurprisingly
  until
@@ -929,7 +876,6 @@ uses
  using
  usually
  uucp
- v
  value
  Vanessa
  various
@@ -940,7 +886,6 @@ Victoria
  Vincent
  viz
  vs
- w
  walks
  want
  wants
@@ -982,6 +927,8 @@ who's
  whoever
  whole
  whom
+ approximate
+ approximately
  whose
  why
  will
@@ -1000,11 +947,8 @@ would
  wouldn't
  wrapped
  Wyatt
- x
  Xavier
- y
  yeah
- years
  yes
  yet
  you
@@ -1016,9 +960,6 @@ your
  yours
  yourself
  yourselves
- generally
- z
  Zachary
  zero
- Zoe
- congratulations
+ Zoe
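This release drops single letters, bare numbers and common words such as `by`, `if`, `before` and `other` from the stopword list, and adds entries such as `stuff`, `thing` and `things`. Here is a minimal sketch of applying a one-word-per-line list like this one. It assumes case-insensitive matching, since the file mixes lowercase words with capitalised first names. `load_stopwords` and `content_words` are hypothetical helpers, not Pismo's API, and the sample list is a small excerpt of the 0.7.0 file.

```ruby
require 'set'

# Parse a one-word-per-line list into a lowercase Set for fast lookups.
def load_stopwords(text)
  text.each_line.map { |line| line.strip.downcase }.reject(&:empty?).to_set
end

# Keep only tokens that are not stopwords, comparing case-insensitively.
def content_words(sentence, stopwords)
  sentence.scan(/[A-Za-z']+/).reject { |w| stopwords.include?(w.downcase) }
end

# Excerpt of the 0.7.0 list; "by", "a" and "if" are no longer in it.
STOPWORDS = load_stopwords(<<~LIST)
  Aaron
  but
  stuff
  things
  yes
LIST

p content_words("Aaron bought things by a river but yes", STOPWORDS)
```

With `by` and `a` removed from the list, those tokens now pass through the filter, while newly added entries such as `things` are dropped.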
@@ -5,11 +5,11 @@
 
  Gem::Specification.new do |s|
  s.name = %q{pismo}
- s.version = "0.6.2"
+ s.version = "0.7.0"
 
  s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
  s.authors = ["Peter Cooper"]
- s.date = %q{2010-06-20}
+ s.date = %q{2010-07-27}
  s.default_executable = %q{pismo}
  s.description = %q{Pismo extracts and retrieves content-related metadata from HTML pages - you can use the resulting data in an organized way, such as a summary/first paragraph, body text, keywords, RSS feed URL, favicon, etc.}
  s.email = %q{git@peterc.org}
@@ -42,7 +42,7 @@
  :spolsky:
  :title: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) - Joel on Software
  :description: Haven't mastered the basics of Unicode and character sets? Please don't write another line of code until you've read this article.
- :lede: I've been dismayed to discover just how many software developers aren't really completely up to speed on the mysterious world of character sets, encodings, Unicode, all that stuff. A couple of years ago, a beta tester for FogBUGZ was wondering whether it could handle incoming email in Japanese. Japanese?
+ :lede: Ever wonder about that mysterious Content-Type tag? You know, the one you're supposed to put in HTML and you never quite know what it should be? Did you ever get an email from your friends in Bulgaria with the subject line "????
  :author: Joel Spolsky
  :favicon: /favicon.ico
  :feed: http://www.joelonsoftware.com/rss.xml
@@ -61,7 +61,6 @@
  :tweet:
  :lede: Gobsmacked that TeX/LaTeX (document formatting tools) for OS X is a 1.3GB (yes, GIGAbytes) download OS X. Wow..!
  :sentences: Gobsmacked that TeX/LaTeX (document formatting tools) for OS X is a 1.3GB (yes, GIGAbytes) download OS Wow..!
- :datetime: 2010-06-05 12:00:00 +01:00
  :cant_read:
  :sentences: "For those of us who grew up as weird kids in the 1980s, the work of Berkeley Breathed was as important as those twin eternal pillars of weird-kid-dom: Monty Python and Mad magazine. In a word: seminal. In two words: fucking seminal."
  :gmane:
@@ -27,16 +27,13 @@
  - "I'm just aching to know if the new Apple tablet (insert caveats, weasel words and qualifiers here) is a potential Cintiq competitor."
  - "I don't think it will be, but you never know."
  :spolsky:
- - "I've been dismayed to discover just how many software developers aren't really completely up to speed on the mysterious world of character sets, encodings, Unicode, all that stuff."
- - "A couple of years ago, a beta tester for FogBUGZ was wondering whether it could handle incoming email in Japanese."
+ - "Ever wonder about that mysterious Content-Type tag?"
+ - "You know, the one you're supposed to put in HTML and you never quite know what it should be?"
  :techcrunch:
  - "Last week, we covered Googlle opening a school in India."
  - "Googlle, not to be confused with Google."
  :tweet:
  - "Gobsmacked that TeX/LaTeX (document formatting tools) for OS X is a 1.3GB (yes, GIGAbytes) download OS Wow..!"
- :youtube:
- - "The location filter shows you popular videos from the selected country or region on lists like Most Viewed and in search results.If you would like to change either of these preferences, please use the links in the footer at the bottom of the page."
- - "Click \"OK\" to accept these settings or click \"Cancel\" to set your language preference to \"English (UK)\" and your location filter to \"Worldwide\"."
  :zefrank:
  - "If there's anyone who knows how to marshal an online audience, it's Ze Frank."
  - "Ze is best-known for his 2006 program \"The Show,\" in which he made a new 2-3 minute video every day for 1 year."
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: pismo
  version: !ruby/object:Gem::Version
- version: 0.6.2
+ version: 0.7.0
  platform: ruby
  authors:
  - Peter Cooper
@@ -9,7 +9,7 @@ autorequire:
  bindir: bin
  cert_chain: []
 
- date: 2010-06-20 00:00:00 +01:00
+ date: 2010-07-27 00:00:00 +01:00
  default_executable: pismo
  dependencies:
  - !ruby/object:Gem::Dependency