pismo 0.5.0 → 0.6.0
- data/LICENSE +19 -28
- data/NOTICE +4 -0
- data/README.markdown +37 -40
- data/Rakefile +3 -2
- data/VERSION +1 -1
- data/bin/pismo +15 -7
- data/lib/pismo/document.rb +2 -2
- data/lib/pismo/internal_attributes.rb +23 -16
- data/lib/pismo/reader.rb +390 -0
- data/lib/pismo.rb +3 -2
- data/pismo.gemspec +23 -15
- data/test/corpus/bbcnews2.html +1575 -0
- data/test/corpus/gmane.html +138 -0
- data/test/corpus/metadata_expected.yaml +20 -5
- data/test/corpus/queness.html +919 -0
- data/test/corpus/reader_expected.yaml +45 -0
- data/test/corpus/tweet.html +360 -0
- data/test/corpus/zefrank.html +535 -0
- data/test/test_corpus.rb +9 -1
- metadata +89 -34
- data/lib/pismo/readability.rb +0 -342
- data/test/test_readability.rb +0 -152
@@ -0,0 +1,138 @@
|
|
1
|
+
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
|
2
|
+
<html>
|
3
|
+
<head>
|
4
|
+
<title>Gmane -- Mail To News And Back Again</title>
|
5
|
+
<link href="http://gmane.org/img/gmane.css" rel="stylesheet" type="text/css">
|
6
|
+
<link rel="SHORTCUT ICON" href="http://gmane.org/favicon.ico">
|
7
|
+
</head>
|
8
|
+
<body bgcolor=white text=black class=main>
|
9
|
+
<table cellpadding=5 cellspacing=10 class="main">
|
10
|
+
<tr valign=top>
|
11
|
+
<td class="maintd">
|
12
|
+
<div class="tdiv">
|
13
|
+
<div>
|
14
|
+
<a href="http://gmane.org/">Home</a><br>
|
15
|
+
<a href="http://dir.gmane.org/">Reading</a><br>
|
16
|
+
<a href="http://search.gmane.org">Searching</a><br>
|
17
|
+
<a href="http://gmane.org/subscribe.php">Subscribe</a><br>
|
18
|
+
<a href="http://gmane.org/sponsors.php">Sponsors</a><br>
|
19
|
+
<a href="http://gmane.org/stats.php">Statistics</a><br>
|
20
|
+
<a href="http://gmane.org/post.php">Posting</a><br>
|
21
|
+
<a href="http://gmane.org/contact.php">Contact</a><br>
|
22
|
+
<a href="http://gmane.org/spam-control.php">Spam</a><br>
|
23
|
+
<a href="http://gmane.org/find.php">Lists</a><br>
|
24
|
+
<a href="http://gmane.org/links.php">Links</a><br>
|
25
|
+
<a href="http://gmane.org/about.php">About</a><br>
|
26
|
+
<a href="http://gmane.org/host.php">Hosting</a><br>
|
27
|
+
<a href="http://gmane.org/filter.php">Filtering</a><br>
|
28
|
+
<a href="http://gmane.org/features.php">Features</a>
|
29
|
+
<a href="http://gmane.org/dist.php">Download</a><br>
|
30
|
+
<a href="http://gmane.org/logo.php">Marketing</a><br>
|
31
|
+
<a href="http://gmane.org/import.php">Archives</a><br>
|
32
|
+
<a href="http://weaver.gmane.org/">Weaver</a><br>
|
33
|
+
<a href="http://gmane.org/faq.php">FAQ</a><br>
|
34
|
+
<br>
|
35
|
+
</div>
|
36
|
+
<div class="ltd">
|
37
|
+
<!-- <img src="http://gmane.org/img/gmane-25.png" width=25 height=25 alt="Gmane"> -->
|
38
|
+
</div>
|
39
|
+
</div>
|
40
|
+
</td>
|
41
|
+
<td align=center class="maintd">
|
42
|
+
<a href="http://gmane.org/"><img src="http://gmane.org/img/gmane-rot.png" alt="Gmane" border=0></a>
|
43
|
+
</td>
|
44
|
+
<td class="bodytd">
|
45
|
+
<div class="headers">
|
46
|
+
<div class="face">
|
47
|
+
<img border=0 alt="Favicon" src="http://cache.gmane.org/gmane/comp/gcc/devel/114407-favicon.png">
|
48
|
+
</div>
|
49
|
+
From: Mark Mitchell <mark <at> codesourcery.com><br>
|
50
|
+
Subject: <a target="_top" rel="nofollow" href="http://news.gmane.org/find-root.php?message_id=%3c4C030228.8020201%40codesourcery.com%3e">Using C++ in GCC is OK</a><br>
|
51
|
+
Newsgroups: <a href="http://news.gmane.org/gmane.comp.gcc.devel" target="_top">gmane.comp.gcc.devel</a><br>
|
52
|
+
Date: 2010-05-31 00:26:16 GMT
|
53
|
+
(1 week, 5 days, 5 hours and 22 minutes ago)<br></div>
|
54
|
+
<pre>
|
55
|
+
I am pleased to report that the GCC Steering Committee and the FSF have
|
56
|
+
approved the use of C++ in GCC itself. Of course, there's no reason for
|
57
|
+
us to use C++ features just because we can. The goal is a better
|
58
|
+
compiler for users, not a C++ code base for its own sake.
|
59
|
+
|
60
|
+
Before we start to actually use C++, we need to determine a set of
|
61
|
+
coding standards that will apply to use of C++ within GCC. At first, I
|
62
|
+
believe that we should keep the set of C++ features permitted small, in
|
63
|
+
part so that GCC developers not familiar with C++ are not rapidly
|
64
|
+
overwhelmed by a major change in the implementation language for the
|
65
|
+
compiler itself. We can always use more of C++ later if it seems
|
66
|
+
appropriate to do so, then.
|
67
|
+
|
68
|
+
For example, I think it goes without question that at this point we are
|
69
|
+
limiting ourselves to C++98 (plus "long long" so that we have a 64-bit
|
70
|
+
integer type); C++0x features should not be used. Using multiple
|
71
|
+
inheritance, templates (other than when using the C++ standard library,
|
72
|
+
e.g. std::list<X>), or exceptions also seems overly aggressive to me.
|
73
|
+
We should use features that are relatively easy for C programmers to
|
74
|
+
understand and relatively hard for new C++ programmers to misuse. (For
|
75
|
+
example, I think constructors and destructors are pretty easy and hard
|
76
|
+
to misuse.)
|
77
|
+
|
78
|
+
Because C++ is a big language, I think we should try to enumerate what
|
79
|
+
is OK, rather than what is not OK. But, at the same time, I don't think
|
80
|
+
we should try to get overly legalistic about exactly what is in and what
|
81
|
+
is out. We need information guidelines, not an ISO standard.
|
82
|
+
|
83
|
+
Is there anyone who would like to volunteer to develop the C++ coding
|
84
|
+
standards? I think that this could be done as a Wiki page. (If nobody
|
85
|
+
volunteers, I will volunteer myself.) Whoever ends up doing this, I
|
86
|
+
would urge the rest of us not to spend too much time in the C++
|
87
|
+
coding-standards bikeshed; we're not going to win or lose too much
|
88
|
+
because we do or do not permit default parameters.
|
89
|
+
|
90
|
+
--
|
91
|
+
Mark Mitchell
|
92
|
+
CodeSourcery
|
93
|
+
mark <at> codesourcery.com
|
94
|
+
(650) 331-3385 x713
|
95
|
+
|
96
|
+
</pre>
|
97
|
+
<script type="text/javascript">
|
98
|
+
document.domain = 'gmane.org';
|
99
|
+
document.title = 'Using C in GCC is OK';
|
100
|
+
</script>
|
101
|
+
<td rowspan=2 class="listid">
|
102
|
+
<a href="http://dir.gmane.org/gmane.comp.gcc.devel">
|
103
|
+
<img border=0 rel=nofollow src="http://gmane.org/paint-list-id.php?group=gmane.comp.gcc.devel">
|
104
|
+
</a>
|
105
|
+
</td>
|
106
|
+
</td>
|
107
|
+
</tr>
|
108
|
+
<tr>
|
109
|
+
<td class="bads" colspan=2>
|
110
|
+
|
111
|
+
<!--
|
112
|
+
Get rid of ads<br><a href="http://gmane.org/donate.php">Donate to Gmane</a>
|
113
|
+
-->
|
114
|
+
</td>
|
115
|
+
<td class="maintd" colspan=1 align=left>
|
116
|
+
<script type="text/javascript"><!--
|
117
|
+
google_ad_client = "pub-5884878215917141";
|
118
|
+
google_alternate_ad_url = "http://gmane.org/blank.php";
|
119
|
+
google_ad_width = 728;
|
120
|
+
google_ad_height = 90;
|
121
|
+
google_ad_format = "728x90_as";
|
122
|
+
google_ad_channel ="";
|
123
|
+
google_page_url = document.location;
|
124
|
+
google_color_border = "FFFFFF";
|
125
|
+
google_color_bg = "FFFFFF";
|
126
|
+
google_color_link = "002390";
|
127
|
+
google_color_url = "000000";
|
128
|
+
google_color_text = "000000";
|
129
|
+
google_ad_type = "text_image";
|
130
|
+
//--></script>
|
131
|
+
<script type="text/javascript"
|
132
|
+
src="http://pagead2.googlesyndication.com/pagead/show_ads.js">
|
133
|
+
</script>
|
134
|
+
</td>
|
135
|
+
</tr>
|
136
|
+
</table>
|
137
|
+
</body>
|
138
|
+
</html>
|
@@ -2,6 +2,7 @@
|
|
2
2
|
:rww:
|
3
3
|
:title: "Cartoon: Apple Tablet: Now With Barometer and Bird Call Generator"
|
4
4
|
:feed: http://www.readwriteweb.com/rss.xml
|
5
|
+
:lede: I'm just aching to know if the new Apple tablet (insert caveats, weasel words and qualifiers here) is a potential Cintiq competitor. I don't think it will be, but you never know.
|
5
6
|
:feeds:
|
6
7
|
- http://www.readwriteweb.com/rss.xml
|
7
8
|
- http://www.readwriteweb.com/archives/2010/01/cartoon_apple_tablet_now_with_barometer_and_bird_c.xml
|
@@ -33,7 +34,6 @@
|
|
33
34
|
:lede: "Separation of concerns between Factor VM and library codeThe Factor VM implements an abstract machine consisting of a data heap of objects, a code heap of machine code blocks, and a set of stacks. The VM loads an image file on startup, which becomes the data and code heap. "
|
34
35
|
:ledes:
|
35
36
|
- "Separation of concerns between Factor VM and library codeThe Factor VM implements an abstract machine consisting of a data heap of objects, a code heap of machine code blocks, and a set of stacks. The VM loads an image file on startup, which becomes the data and code heap. "
|
36
|
-
- Slava Pestov's weblog, primarily about Factor.
|
37
37
|
:youtube:
|
38
38
|
:title: YMO - Rydeen (Official Video)
|
39
39
|
:author: ymo1965
|
@@ -42,8 +42,7 @@
|
|
42
42
|
:spolsky:
|
43
43
|
:title: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) - Joel on Software
|
44
44
|
:description: Haven't mastered the basics of Unicode and character sets? Please don't write another line of code until you've read this article.
|
45
|
-
:
|
46
|
-
- Ever wonder about that mysterious Content-Type tag? You know, the one you're supposed to put in HTML and you never quite know what it should be?
|
45
|
+
:lede: I've been dismayed to discover just how many software developers aren't really completely up to speed on the mysterious world of character sets, encodings, Unicode, all that stuff. A couple of years ago, a beta tester for FogBUGZ was wondering whether it could handle incoming email in Japanese.
|
47
46
|
:author: Joel Spolsky
|
48
47
|
:favicon: /favicon.ico
|
49
48
|
:feed: http://www.joelonsoftware.com/rss.xml
|
@@ -54,5 +53,21 @@
|
|
54
53
|
:title: "CoffeeScript: A New Language With A Pure Ruby Compiler"
|
55
54
|
:author: Peter Cooper
|
56
55
|
:lede: CoffeeScript (GitHub repo) is a new programming language with a pure Ruby compiler. Creator Jeremy Ashkenas calls it "JavaScript's less ostentatious kid brother" - mostly because it compiles into JavaScript and shares most of the same constructs, but with a different, tighter syntax.
|
57
|
-
:
|
58
|
-
|
56
|
+
:feed: http://www.rubyinside.com/feed/
|
57
|
+
:zefrank:
|
58
|
+
:sentences: If there's anyone who knows how to marshal an online audience, it's Ze Frank. Ze is best-known for his 2006 program "The Show," in which he made a new 2-3 minute video every day for 1 year. Topics ranged from "fingers in food" to the mysteries of airport signage to a tour de force summary of creatives' addiction to un-executed ideas, aka brain crack.
|
59
|
+
:title: "Ze Frank on Imaginary Audiences :: Articles :: The 99 Percent"
|
60
|
+
:description: We chat with the Internet's most notorious mass-collaboration instigator Ze Frank about idea execution and how to build armies of sportsracers.
|
61
|
+
:tweet:
|
62
|
+
:lede: Gobsmacked that TeX/LaTeX (document formatting tools) for OS X is a 1.3GB (yes, GIGAbytes) download OS X. Wow..!
|
63
|
+
:sentences: Gobsmacked that TeX/LaTeX (document formatting tools) for OS X is a 1.3GB (yes, GIGAbytes) download OS Wow..!
|
64
|
+
:datetime: 2010-06-05 12:00:00 +01:00
|
65
|
+
:cant_read:
|
66
|
+
:sentences: "For those of us who grew up as weird kids in the 1980s, the work of Berkeley Breathed was as important as those twin eternal pillars of weird-kid-dom: Monty Python and Mad magazine. In a word: seminal. In two words: fucking seminal."
|
67
|
+
:gmane:
|
68
|
+
:sentences: I am pleased to report that the GCC Steering Committee and the FSF have approved the use of C++ in GCC itself. Of course, there's no reason for us to use C++ features just because we can. The goal is a better compiler for users, not a C++ code base for its own sake.
|
69
|
+
:queness:
|
70
|
+
:title: 18 Incredible CSS3 Effects You Have Never Seen Before
|
71
|
+
:lede: "CSS3 is hot these days and will soon be available in most modern browser. Just recently, I started to become aware to the present of CSS3 around the web. "
|
72
|
+
:sentences: CSS3 is hot these days and will soon be available in most modern browser. Just recently, I started to become aware to the present of CSS3 around the web. I can see some of the websites such as twitter and designer portfolios websites are using it.
|
73
|
+
:datetime: 2010-06-02 12:00:00 +01:00
|