pismo 0.5.0 → 0.6.0
- data/LICENSE +19 -28
- data/NOTICE +4 -0
- data/README.markdown +37 -40
- data/Rakefile +3 -2
- data/VERSION +1 -1
- data/bin/pismo +15 -7
- data/lib/pismo/document.rb +2 -2
- data/lib/pismo/internal_attributes.rb +23 -16
- data/lib/pismo/reader.rb +390 -0
- data/lib/pismo.rb +3 -2
- data/pismo.gemspec +23 -15
- data/test/corpus/bbcnews2.html +1575 -0
- data/test/corpus/gmane.html +138 -0
- data/test/corpus/metadata_expected.yaml +20 -5
- data/test/corpus/queness.html +919 -0
- data/test/corpus/reader_expected.yaml +45 -0
- data/test/corpus/tweet.html +360 -0
- data/test/corpus/zefrank.html +535 -0
- data/test/test_corpus.rb +9 -1
- metadata +89 -34
- data/lib/pismo/readability.rb +0 -342
- data/test/test_readability.rb +0 -152
@@ -0,0 +1,138 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
+<html>
+<head>
+<title>Gmane -- Mail To News And Back Again</title>
+<link href="http://gmane.org/img/gmane.css" rel="stylesheet" type="text/css">
+<link rel="SHORTCUT ICON" href="http://gmane.org/favicon.ico">
+</head>
+<body bgcolor=white text=black class=main>
+<table cellpadding=5 cellspacing=10 class="main">
+<tr valign=top>
+<td class="maintd">
+<div class="tdiv">
+<div>
+<a href="http://gmane.org/">Home</a><br>
+<a href="http://dir.gmane.org/">Reading</a><br>
+<a href="http://search.gmane.org">Searching</a><br>
+<a href="http://gmane.org/subscribe.php">Subscribe</a><br>
+<a href="http://gmane.org/sponsors.php">Sponsors</a><br>
+<a href="http://gmane.org/stats.php">Statistics</a><br>
+<a href="http://gmane.org/post.php">Posting</a><br>
+<a href="http://gmane.org/contact.php">Contact</a><br>
+<a href="http://gmane.org/spam-control.php">Spam</a><br>
+<a href="http://gmane.org/find.php">Lists</a><br>
+<a href="http://gmane.org/links.php">Links</a><br>
+<a href="http://gmane.org/about.php">About</a><br>
+<a href="http://gmane.org/host.php">Hosting</a><br>
+<a href="http://gmane.org/filter.php">Filtering</a><br>
+<a href="http://gmane.org/features.php">Features</a>
+<a href="http://gmane.org/dist.php">Download</a><br>
+<a href="http://gmane.org/logo.php">Marketing</a><br>
+<a href="http://gmane.org/import.php">Archives</a><br>
+<a href="http://weaver.gmane.org/">Weaver</a><br>
+<a href="http://gmane.org/faq.php">FAQ</a><br>
+<br>
+</div>
+<div class="ltd">
+<!-- <img src="http://gmane.org/img/gmane-25.png" width=25 height=25 alt="Gmane"> -->
+</div>
+</div>
+</td>
+<td align=center class="maintd">
+<a href="http://gmane.org/"><img src="http://gmane.org/img/gmane-rot.png" alt="Gmane" border=0></a>
+</td>
+<td class="bodytd">
+<div class="headers">
+<div class="face">
+<img border=0 alt="Favicon" src="http://cache.gmane.org/gmane/comp/gcc/devel/114407-favicon.png">
+</div>
+From: Mark Mitchell <mark <at> codesourcery.com><br>
+Subject: <a target="_top" rel="nofollow" href="http://news.gmane.org/find-root.php?message_id=%3c4C030228.8020201%40codesourcery.com%3e">Using C++ in GCC is OK</a><br>
+Newsgroups: <a href="http://news.gmane.org/gmane.comp.gcc.devel" target="_top">gmane.comp.gcc.devel</a><br>
+Date: 2010-05-31 00:26:16 GMT
+(1 week, 5 days, 5 hours and 22 minutes ago)<br></div>
+<pre>
+I am pleased to report that the GCC Steering Committee and the FSF have
+approved the use of C++ in GCC itself. Of course, there's no reason for
+us to use C++ features just because we can. The goal is a better
+compiler for users, not a C++ code base for its own sake.
+
+Before we start to actually use C++, we need to determine a set of
+coding standards that will apply to use of C++ within GCC. At first, I
+believe that we should keep the set of C++ features permitted small, in
+part so that GCC developers not familiar with C++ are not rapidly
+overwhelmed by a major change in the implementation language for the
+compiler itself. We can always use more of C++ later if it seems
+appropriate to do so, then.
+
+For example, I think it goes without question that at this point we are
+limiting ourselves to C++98 (plus "long long" so that we have a 64-bit
+integer type); C++0x features should not be used. Using multiple
+inheritance, templates (other than when using the C++ standard library,
+e.g. std::list<X>), or exceptions also seems overly aggressive to me.
+We should use features that are relatively easy for C programmers to
+understand and relatively hard for new C++ programmers to misuse. (For
+example, I think constructors and destructors are pretty easy and hard
+to misuse.)
+
+Because C++ is a big language, I think we should try to enumerate what
+is OK, rather than what is not OK. But, at the same time, I don't think
+we should try to get overly legalistic about exactly what is in and what
+is out. We need information guidelines, not an ISO standard.
+
+Is there anyone who would like to volunteer to develop the C++ coding
+standards? I think that this could be done as a Wiki page. (If nobody
+volunteers, I will volunteer myself.) Whoever ends up doing this, I
+would urge the rest of us not to spend too much time in the C++
+coding-standards bikeshed; we're not going to win or lose too much
+because we do or do not permit default parameters.
+
+--
+Mark Mitchell
+CodeSourcery
+mark <at> codesourcery.com
+(650) 331-3385 x713
+
+</pre>
+<script type="text/javascript">
+document.domain = 'gmane.org';
+document.title = 'Using C in GCC is OK';
+</script>
+<td rowspan=2 class="listid">
+<a href="http://dir.gmane.org/gmane.comp.gcc.devel">
+<img border=0 rel=nofollow src="http://gmane.org/paint-list-id.php?group=gmane.comp.gcc.devel">
+</a>
+</td>
+</td>
+</tr>
+<tr>
+<td class="bads" colspan=2>
+
+<!--
+Get rid of ads<br><a href="http://gmane.org/donate.php">Donate to Gmane</a>
+-->
+</td>
+<td class="maintd" colspan=1 align=left>
+<script type="text/javascript"><!--
+google_ad_client = "pub-5884878215917141";
+google_alternate_ad_url = "http://gmane.org/blank.php";
+google_ad_width = 728;
+google_ad_height = 90;
+google_ad_format = "728x90_as";
+google_ad_channel ="";
+google_page_url = document.location;
+google_color_border = "FFFFFF";
+google_color_bg = "FFFFFF";
+google_color_link = "002390";
+google_color_url = "000000";
+google_color_text = "000000";
+google_ad_type = "text_image";
+//--></script>
+<script type="text/javascript"
+src="http://pagead2.googlesyndication.com/pagead/show_ads.js">
+</script>
+</td>
+</tr>
+</table>
+</body>
+</html>
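The `:gmane:` entry in `metadata_expected.yaml` expects `:sentences:` to be the first paragraph of this fixture's `<pre>` body, with its hard line wraps joined by spaces. The Ruby sketch below reproduces that expected string straight from the fixture. It only illustrates the expected output; it is not Pismo's `Reader` algorithm, and `first_pre_paragraph` is a hypothetical helper that assumes a bare `<pre>` tag with no attributes.

```ruby
# Illustration only, not Pismo's Reader: takes the text of the first bare
# <pre> element, keeps its first blank-line-separated paragraph, and joins
# the hard-wrapped lines with single spaces.
def first_pre_paragraph(html)
  match = html.match(%r{<pre>(.*?)</pre>}m)
  return nil unless match

  paragraph = match[1].strip.split(/\n[ \t]*\n/).first
  return nil unless paragraph

  paragraph.split("\n").map(&:strip).join(" ")
end

# The opening of the gmane.html <pre> block, copied from the fixture.
html = <<~HTML
  <pre>
  I am pleased to report that the GCC Steering Committee and the FSF have
  approved the use of C++ in GCC itself. Of course, there's no reason for
  us to use C++ features just because we can. The goal is a better
  compiler for users, not a C++ code base for its own sake.

  Before we start to actually use C++, we need to determine a set of
  </pre>
HTML

puts first_pre_paragraph(html)
```

The printed line matches the `:sentences:` value recorded for `:gmane:`. A real extractor has to cope with attributes on `<pre>`, HTML entities, and pages with no `<pre>` at all, which this sketch ignores.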
@@ -2,6 +2,7 @@
 :rww:
 :title: "Cartoon: Apple Tablet: Now With Barometer and Bird Call Generator"
 :feed: http://www.readwriteweb.com/rss.xml
+:lede: I'm just aching to know if the new Apple tablet (insert caveats, weasel words and qualifiers here) is a potential Cintiq competitor. I don't think it will be, but you never know.
 :feeds:
 - http://www.readwriteweb.com/rss.xml
 - http://www.readwriteweb.com/archives/2010/01/cartoon_apple_tablet_now_with_barometer_and_bird_c.xml
@@ -33,7 +34,6 @@
 :lede: "Separation of concerns between Factor VM and library codeThe Factor VM implements an abstract machine consisting of a data heap of objects, a code heap of machine code blocks, and a set of stacks. The VM loads an image file on startup, which becomes the data and code heap. "
 :ledes:
 - "Separation of concerns between Factor VM and library codeThe Factor VM implements an abstract machine consisting of a data heap of objects, a code heap of machine code blocks, and a set of stacks. The VM loads an image file on startup, which becomes the data and code heap. "
-- Slava Pestov's weblog, primarily about Factor.
 :youtube:
 :title: YMO - Rydeen (Official Video)
 :author: ymo1965
@@ -42,8 +42,7 @@
 :spolsky:
 :title: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) - Joel on Software
 :description: Haven't mastered the basics of Unicode and character sets? Please don't write another line of code until you've read this article.
-:
-- Ever wonder about that mysterious Content-Type tag? You know, the one you're supposed to put in HTML and you never quite know what it should be?
+:lede: I've been dismayed to discover just how many software developers aren't really completely up to speed on the mysterious world of character sets, encodings, Unicode, all that stuff. A couple of years ago, a beta tester for FogBUGZ was wondering whether it could handle incoming email in Japanese.
 :author: Joel Spolsky
 :favicon: /favicon.ico
 :feed: http://www.joelonsoftware.com/rss.xml
@@ -54,5 +53,21 @@
 :title: "CoffeeScript: A New Language With A Pure Ruby Compiler"
 :author: Peter Cooper
 :lede: CoffeeScript (GitHub repo) is a new programming language with a pure Ruby compiler. Creator Jeremy Ashkenas calls it "JavaScript's less ostentatious kid brother" - mostly because it compiles into JavaScript and shares most of the same constructs, but with a different, tighter syntax.
-:
-
+:feed: http://www.rubyinside.com/feed/
+:zefrank:
+:sentences: If there's anyone who knows how to marshal an online audience, it's Ze Frank. Ze is best-known for his 2006 program "The Show," in which he made a new 2-3 minute video every day for 1 year. Topics ranged from "fingers in food" to the mysteries of airport signage to a tour de force summary of creatives' addiction to un-executed ideas, aka brain crack.
+:title: "Ze Frank on Imaginary Audiences :: Articles :: The 99 Percent"
+:description: We chat with the Internet's most notorious mass-collaboration instigator Ze Frank about idea execution and how to build armies of sportsracers.
+:tweet:
+:lede: Gobsmacked that TeX/LaTeX (document formatting tools) for OS X is a 1.3GB (yes, GIGAbytes) download OS X. Wow..!
+:sentences: Gobsmacked that TeX/LaTeX (document formatting tools) for OS X is a 1.3GB (yes, GIGAbytes) download OS Wow..!
+:datetime: 2010-06-05 12:00:00 +01:00
+:cant_read:
+:sentences: "For those of us who grew up as weird kids in the 1980s, the work of Berkeley Breathed was as important as those twin eternal pillars of weird-kid-dom: Monty Python and Mad magazine. In a word: seminal. In two words: fucking seminal."
+:gmane:
+:sentences: I am pleased to report that the GCC Steering Committee and the FSF have approved the use of C++ in GCC itself. Of course, there's no reason for us to use C++ features just because we can. The goal is a better compiler for users, not a C++ code base for its own sake.
+:queness:
+:title: 18 Incredible CSS3 Effects You Have Never Seen Before
+:lede: "CSS3 is hot these days and will soon be available in most modern browser. Just recently, I started to become aware to the present of CSS3 around the web. "
+:sentences: CSS3 is hot these days and will soon be available in most modern browser. Just recently, I started to become aware to the present of CSS3 around the web. I can see some of the websites such as twitter and designer portfolios websites are using it.
+:datetime: 2010-06-02 12:00:00 +01:00
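The expected-values file keys each corpus fixture by a symbol (`:gmane:`, `:queness:`, and so on) and lists the attributes the extractor should return for that page. Below is a minimal sketch of loading expectations in this shape with Ruby's standard `yaml` library (Psych) and comparing them against extracted values. The two-space nesting in the excerpt, the `load_expected` and `mismatches` helpers, and the shape of the `extracted` hash are all assumptions for illustration. None of this is taken from `test_corpus.rb`. The `permitted_classes:` keyword needs Ruby 2.6 or later.

```ruby
require "yaml"

# Hypothetical excerpt in the shape of metadata_expected.yaml. The nesting
# under the fixture key is an assumption; the values are copied from the diff.
EXPECTED_YAML = <<~YAML
  :queness:
    :title: 18 Incredible CSS3 Effects You Have Never Seen Before
    :datetime: 2010-06-02 12:00:00 +01:00
YAML

# Psych turns plain ":name" scalars into Symbols and the datetime into a
# Time, so safe_load must be told to allow both classes.
def load_expected(text)
  YAML.safe_load(text, permitted_classes: [Symbol, Time])
end

# Hypothetical helper: returns [fixture, attribute, expected, actual] for
# every expected attribute whose extracted value differs (missing => nil).
def mismatches(expected, actual)
  expected.flat_map do |fixture, attrs|
    attrs.map do |attr, want|
      got = actual.dig(fixture, attr)
      [fixture, attr, want, got] unless got == want
    end.compact
  end
end

expected = load_expected(EXPECTED_YAML)

# Stand-in for whatever the extractor returned; 12:00 +01:00 is 11:00 UTC.
extracted = {
  queness: {
    title: "18 Incredible CSS3 Effects You Have Never Seen Before",
    datetime: Time.utc(2010, 6, 2, 11, 0, 0)
  }
}

p mismatches(expected, extracted) # prints []
```

Because `Time#==` compares instants, the `+01:00` value in the YAML matches the equivalent UTC time. The comparison only checks attributes that are listed for a fixture, so extra extracted attributes never count as failures.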