RubyGems - feedparser - Versions diffs - 1.2.0 → 2.0.0 - Mend

feedparser 1.2.0 → 2.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (61)

checksums.yaml +4 -4
data/Manifest.txt +2 -50
data/README.md +71 -9
data/Rakefile +1 -1
data/lib/feedparser.rb +2 -0
data/lib/feedparser/builder/microformats.rb +264 -0
data/lib/feedparser/parser.rb +27 -0
data/lib/feedparser/version.rb +2 -2
data/test/helper.rb +3 -57
data/test/test_microformats.rb +52 -0
metadata +10 -56
data/test/feeds/books/nostarch.rss +0 -125
data/test/feeds/books/oreilly.feedburner.atom +0 -387
data/test/feeds/books/pragprog.rss +0 -148
data/test/feeds/byparker.json +0 -643
data/test/feeds/daringfireball.atom +0 -1873
data/test/feeds/daringfireball.json +0 -619
data/test/feeds/googlegroups.atom +0 -37
data/test/feeds/googlegroups2.atom +0 -27
data/test/feeds/headius.atom +0 -123
data/test/feeds/inessential.json +0 -182
data/test/feeds/intertwingly.atom +0 -1197
data/test/feeds/jsonfeed.json +0 -37
data/test/feeds/lambdatheultimate.rss +0 -288
data/test/feeds/learnenough.feedburner.atom +0 -747
data/test/feeds/news/nytimes-blogs-bits.rss +0 -333
data/test/feeds/news/nytimes-paul-krugman.rss +0 -60
data/test/feeds/news/nytimes-tech.rss +0 -653
data/test/feeds/news/nytimes-thomas-l-friedman.rss +0 -80
data/test/feeds/news/nytimes.rss +0 -607
data/test/feeds/news/washingtonpost-blogs-innovations.rss +0 -183
data/test/feeds/news/washingtonpost-politics.rss +0 -35
data/test/feeds/news/washingtonpost-world.rss +0 -29
data/test/feeds/ongoing.atom +0 -1619
data/test/feeds/osm/blog.openstreetmap.rss +0 -252
data/test/feeds/osm/blogs.openstreetmap.rss +0 -585
data/test/feeds/osm/mapbox.rss +0 -1883
data/test/feeds/railstutorial.feedburner.atom +0 -656
data/test/feeds/rubyflow.feedburner.rss +0 -120
data/test/feeds/rubymine.feedburner.rss +0 -314
data/test/feeds/rubyonrails.atom +0 -1241
data/test/feeds/scripting.rss +0 -881
data/test/feeds/sitepoint.rss +0 -218
data/test/feeds/spec/atom/author.atom +0 -48
data/test/feeds/spec/atom/authors.atom +0 -70
data/test/feeds/spec/atom/categories.atom +0 -66
data/test/feeds/spec/json/example.json +0 -36
data/test/feeds/spec/json/microblog.json +0 -43
data/test/feeds/spec/json/tags.json +0 -33
data/test/feeds/spec/rss/author.rss +0 -41
data/test/feeds/spec/rss/categories.rss +0 -64
data/test/feeds/spec/rss/creator.rss +0 -38
data/test/feeds/xkcd.atom +0 -48
data/test/feeds/xkcd.rss +0 -55
data/test/test_atom.rb +0 -27
data/test/test_authors.rb +0 -26
data/test/test_books.rb +0 -25
data/test/test_feeds.rb +0 -29
data/test/test_json.rb +0 -27
data/test/test_rss.rb +0 -26
data/test/test_tags.rb +0 -25

data/test/feeds/googlegroups.atom DELETED

@@ -1,37 +0,0 @@
-<feed xmlns="http://www.w3.org/2005/Atom">
-  <id>https://groups.google.com/d/forum/beerdb</id>
-  <title type="text">Open Beer &amp; Brewery Database (beer.db)</title>
-  <subtitle>Free open public domain beer database &amp;amp; schema (beer.db) for use in any (programming) language (e.g. uses plain text fixtures/data sets). Questions? Comments?</subtitle>
-  <link rel="self" href="https://groups.google.com/forum/feed/beerdb/topics/atom_v1_0.xml" title="beerdb feed"></link>
-  <updated></updated>
-  <generator>Google Groups</generator>
-  <entry>
-    <author>
-      <name>Joe Sixpack</name>
-    </author>
-    <updated>2014-12-17T11:54:43Z</updated>
-    <id>https://groups.google.com/d/topic/beerdb/KpQOUDYJ3J8</id>
-    <link href="https://groups.google.com/d/topic/beerdb/KpQOUDYJ3J8"></link>
-    <title type="text">Planet Beer (Austria, Belgium) - Feeds Incl. Craft Fest Wien, Beer-A-Day, proBier n Friends</title>
-    <summary type="html">Hello, I&apos;ve started putting together a planet site for beer, that is, Planet Beer [1]. The first feed lists include: - Austria [2] - Belgium [3] You&apos;re welcome and invited to suggest new countries and feeds. Cheers. Prost. [1] http://planetbeer.herokuapp.com [2] http://github.com/openbeer/planet/blob/master/</summary>
-  </entry>
-</feed>
----
-feed.format:    atom
-feed.title:     Open Beer & Brewery Database (beer.db)
-feed.url:       https://groups.google.com/d/forum/beerdb
-feed.generator.name: Google Groups
-feed.items[0].title: Planet Beer (Austria, Belgium) - Feeds Incl. Craft Fest Wien, Beer-A-Day, proBier n Friends
-feed.items[0].url:   https://groups.google.com/d/topic/beerdb/KpQOUDYJ3J8
-### todo: fix: &amp;amp;  => &amp;  -> always assume plain text? (by default) - auto-escape xml entities??
-feed.summary: Free open public domain beer database &amp; schema (beer.db) for use in any (programming) language (e.g. uses plain text fixtures/data sets). Questions? Comments?
-### todo: add check for datetime (use to_s ??)
-## feed.updated.to_s:  2014-12-31T15:33:00+00:00
-## feed.items[0].to_s: 2014-12-31T15:33:00+00:00
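
The googlegroups.atom fixture above puts the raw feed and its expected values in a single file. The XML comes first, then a line containing only `---`, then `key: value` expectation lines. Lines starting with `#` are notes or disabled checks. Below is a minimal Ruby sketch of how such a fixture could be split. It assumes exactly that layout. `split_fixture` is a hypothetical helper for illustration; it is not the code removed from test/helper.rb.

```ruby
# Hypothetical helper; assumes the layout seen in googlegroups.atom:
# raw feed text, a line consisting of "---", then "key: value" expectations.
def split_fixture(text)
  feed_lines   = []
  expectations = {}
  in_expectations = false

  text.each_line do |line|
    stripped = line.strip
    if !in_expectations && stripped == '---'
      in_expectations = true                 # everything after this line is expectations
    elsif !in_expectations
      feed_lines << line                     # still inside the raw feed
    elsif stripped.empty? || stripped.start_with?('#')
      next                                   # skip blank lines and "### todo" / "##" notes
    else
      key, value = stripped.split(':', 2)    # split on the first colon only; URLs contain colons
      expectations[key.strip] = value.to_s.strip
    end
  end

  [feed_lines.join, expectations]
end

fixture = <<~TXT
  <feed xmlns="http://www.w3.org/2005/Atom">
    <title type="text">Open Beer &amp; Brewery Database (beer.db)</title>
  </feed>
  ---
  feed.format:    atom
  feed.url:       https://groups.google.com/d/forum/beerdb
  ## feed.updated.to_s:  2014-12-31T15:33:00+00:00
TXT

xml, expected = split_fixture(fixture)
puts expected['feed.url']   # => https://groups.google.com/d/forum/beerdb
```

Splitting on the first colon only matters here because values such as `feed.url` contain colons themselves.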

data/test/feeds/googlegroups2.atom DELETED

@@ -1,27 +0,0 @@
-<feed xmlns="http://www.w3.org/2005/Atom">
-  <id>https://groups.google.com/d/forum/beerdb</id>
-  <title type="text">Open Beer &amp; Brewery Database (beer.db)</title>
-  <subtitle>Free open public domain beer database &amp;amp; schema (beer.db) for use in any (programming) language (e.g. uses plain text fixtures/data sets). Questions? Comments?</subtitle>
-  <link rel="self" href="https://groups.google.com/forum/feed/beerdb/topics/atom_v1_0.xml" title="beerdb feed"></link>
-  <updated></updated>
-  <generator>
-     Google Groups (w/ leading n trailing newlines stripped)
-  </generator>
-  <entry>
-    <author>
-      <name>Joe Sixpack</name>
-    </author>
-    <updated>2014-12-17T11:54:43Z</updated>
-    <id>https://groups.google.com/d/topic/beerdb/KpQOUDYJ3J8</id>
-    <link href="https://groups.google.com/d/topic/beerdb/KpQOUDYJ3J8"></link>
-    <title type="text">Planet Beer (Austria, Belgium) - Feeds Incl. Craft Fest Wien, Beer-A-Day, proBier n Friends</title>
-    <summary type="html">Hello, I&apos;ve started putting together a planet site for beer, that is, Planet Beer [1]. The first feed lists include: - Austria [2] - Belgium [3] You&apos;re welcome and invited to suggest new countries and feeds. Cheers. Prost. [1] http://planetbeer.herokuapp.com [2] http://github.com/openbeer/planet/blob/master/</summary>
-  </entry>
-</feed>
----
-feed.format:     atom
-feed.url:        https://groups.google.com/d/forum/beerdb
-feed.generator.name:  Google Groups (w/ leading n trailing newlines stripped)
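
The googlegroups2.atom fixture checks one behavior: the `<generator>` text is compared with its surrounding newlines and indentation removed. The title in both Google Groups fixtures also relies on ordinary XML entity decoding, so `&amp;` becomes `&`. The sketch below reproduces both effects with Ruby's REXML library. It uses a trimmed copy of the fixture, and `child_text` is a hypothetical helper. It does not claim to show how feedparser itself reads these elements.

```ruby
require 'rexml/document'

# Trimmed copy of the googlegroups2.atom fixture (only the elements used here).
FEED_XML = <<~XML
  <feed xmlns="http://www.w3.org/2005/Atom">
    <title type="text">Open Beer &amp; Brewery Database (beer.db)</title>
    <generator>
       Google Groups (w/ leading n trailing newlines stripped)
    </generator>
  </feed>
XML

# Hypothetical helper: text of the first direct child of root whose local
# name matches, or nil. REXML's Element#text decodes entities (&amp; -> &).
def child_text(root, name)
  root.each_element do |el|
    return el.text if el.name == name
  end
  nil
end

doc       = REXML::Document.new(FEED_XML)
title     = child_text(doc.root, 'title')
generator = child_text(doc.root, 'generator')&.strip  # drop leading/trailing newlines and indentation

puts title      # => Open Beer & Brewery Database (beer.db)
puts generator  # => Google Groups (w/ leading n trailing newlines stripped)
```

The same single decoding step explains the subtitle case in googlegroups.atom. There, `&amp;amp;` in the XML decodes to the literal text `&amp;`, which is the value recorded under `feed.summary`. That fixture's todo note asks whether to unescape it a second time.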

data/test/feeds/headius.atom DELETED

@@ -1,123 +0,0 @@
-<?xml version='1.0' encoding='UTF-8'?>
-<?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?>
-<feed
-   xmlns='http://www.w3.org/2005/Atom'
-   xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/'
-   xmlns:blogger='http://schemas.google.com/blogger/2008'
-   xmlns:georss='http://www.georss.org/georss'
-   xmlns:gd="http://schemas.google.com/g/2005"
-   xmlns:thr='http://purl.org/syndication/thread/1.0'>
-   <id>tag:blogger.com,1999:blog-4704664917418794835</id>
-   <updated>2015-01-16T15:31:29.613-08:00</updated>
-   <category term="jruby"/>
-   <category term="ruby"/>
-   <category term="java"/>
-   <category term="rails"/>
-   <category term="netbeans"/>
-   <category term="invokedynamic"/>
-   <category term="javapolis"/>
-   <category term="jvm"/>
-   <category term="jython"/>
-   <category term="python"/><category term="sun"/><category term="yarv"/><category term="applet"/><category term="application bundle"/><category term="compilation"/><category term="compiler"/><category term="dynamic dispatch"/><category term="dynamic languages"/><category term="eclipse"/><category term="enterprise"/><category term="file structure"/><category term="glassfish"/>
-   <category term="gpl"/><category term="grails"/><category term="groovy"/><category term="gsoc"/><category term="humor"/><category term="irb"/><category term="ironruby"/><category term="jdk"/><category term="jruby on rails"/><category term="jruby release"/><category term="jsr223"/><category term="jsr270"/><category term="jsr292"/><category term="keywords"/>
-   <category term="macruby"/><category term="magic"/><category term="mcgovern"/><category term="meetup"/><category term="methods"/><category term="mongrel"/><category term="open source"/><category term="optimization"/><category term="os x"/><category term="presentation"/><category term="programming languages"/><category term="quick outline"/><category term="rubinius"/>
-   <category term="ruby 2.0"/><category term="ruby compiler"/><category term="scripting"/><category term="tech days"/><category term="tiobe"/>
-   <title type='text'>Headius</title>
-   <subtitle type='html'>Helping the JVM Into the 21st Century</subtitle>
-   <link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://blog.headius.com/feeds/posts/default'/>
-   <link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default'/>
-   <link rel='alternate' type='text/html' href='http://blog.headius.com/'/>
-   <link rel='hub' href='http://pubsubhubbub.appspot.com/'/>
-   <link rel='next' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default?start-index=26&amp;max-results=25'/>
-   <author>
-     <name>Charles Nutter</name>
-     <uri>https://plus.google.com/101599370339210456684</uri>
-     <email>noreply@blogger.com</email>
-     <gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh5.googleusercontent.com/-VT5b8KsgHgQ/AAAAAAAAAAI/AAAAAAAAFB0/9d5SU9AcHNQ/s512-c/photo.jpg'/>
-   </author>
-   <generator version='7.00' uri='http://www.blogger.com'>Blogger</generator>
-   <openSearch:totalResults>315</openSearch:totalResults>
-   <openSearch:startIndex>1</openSearch:startIndex>
-   <openSearch:itemsPerPage>25</openSearch:itemsPerPage>
-   <entry>
-     <id>tag:blogger.com,1999:blog-4704664917418794835.post-3430080308857860963</id>
-     <published>2014-05-21T10:44:00.000-07:00</published>
-     <updated>2014-05-21T10:44:01.683-07:00</updated>
-     <title type='text'>JRubyConf.eu 2014!</title>
-     <content type='html'>
-       &lt;div dir=&quot;ltr&quot; style=&quot;text-align: left;&quot;
-       trbidi=&quot;on&quot;&gt;I&#39;m thrilled to announce that we&#39;ll have another edition
-       of &lt;a href=&quot;http://2014.jrubyconf.eu/&quot;&gt;JRubyConf.eu&lt;/a&gt;&amp;nbsp;this year!&lt;br /&gt;&lt;br /&gt;&lt;a href=&quot;http://2013.jrubyconf.eu/&quot;&gt;
-       Last year&#39;s event&lt;/a&gt; was a great success.
-       We had a two-day conference in Berlin immediately before &lt;a href=&quot;http://2013.eurucamp.org/&quot;&gt;Eurucamp 2013&lt;/a&gt;, with two speakers from the core team (myself and &lt;a href=&quot;http://twitter.com/tom_enebo&quot;&gt;Tom Enebo&lt;/a&gt;) and a whopping &lt;b&gt;fifteen&lt;/b&gt;&amp;nbsp;non-core speakers. A great event was had by all.&lt;br /&gt;&lt;br /&gt;This year, we&#39;ve decided to pull the event back to its roots, as part of &lt;a href=&quot;http://2014.eurucamp.org/&quot;&gt;Eurucamp 2014&lt;/a&gt;. We&#39;ll return to the single-track, single-day event co-located with and immediately preceding Eurucamp on 1st August. We really wanted to bring JRuby back to Rubyists, and we&#39;re looking forward to hanging out at Eurucamp the whole weekend!&lt;br /&gt;&lt;br /&gt;Why not visit Eurucamp early and spend a day learning about JRuby with the best JRubyists in Europe?&lt;br /&gt;&lt;br /&gt;If you&#39;re interested in attending, tickets are available for only €99 at the &lt;a href=&quot;http://tickets.eurucamp.org/&quot;&gt;Eurucamp ticket site&lt;/a&gt; now!&lt;br /&gt;&lt;br /&gt;We&#39;re also looking for speakers from the JRuby community. You can submit to the CFP (which ends Sunday 28 May) using the &lt;a href=&quot;http://cfp.eurucamp.org/&quot;&gt;Eurucamp CFP app&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Looking forward to seeing you at JRubyConf and Eurucamp this summer!&lt;/div&gt;
-     </content>
-     <link rel='replies' type='application/atom+xml' href='http://blog.headius.com/feeds/3430080308857860963/comments/default' title='Post Comments'/>
-     <link rel='replies' type='text/html' href='http://blog.headius.com/2014/05/jrubyconfeu-2014.html#comment-form' title='0 Comments'/>
-     <link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/3430080308857860963'/>
-     <link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/3430080308857860963'/>
-     <link rel='alternate' type='text/html' href='http://blog.headius.com/2014/05/jrubyconfeu-2014.html' title='JRubyConf.eu 2014!'/>
-     <author>
-       <name>Charles Nutter</name>
-       <uri>https://plus.google.com/101599370339210456684</uri>
-       <email>noreply@blogger.com</email>
-       <gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh5.googleusercontent.com/-VT5b8KsgHgQ/AAAAAAAAAAI/AAAAAAAAFB0/9d5SU9AcHNQ/s512-c/photo.jpg'/>
-     </author>
-     <thr:total>0</thr:total>
-     </entry>
-     <entry>
-       <id>tag:blogger.com,1999:blog-4704664917418794835.post-462657466694269626</id>
-       <published>2013-06-07T01:58:00.002-07:00</published>
-       <updated>2013-06-07T02:00:53.773-07:00</updated>
-       <title type='text'>The Pain of Broken Subprocess Management on JDK</title>
-       <content type='html'>&lt;script type=&quot;text/javascript&quot;&gt;SyntaxHighlighter.defaults.gutter = false;&lt;/script&gt;&lt;div dir=&quot;ltr&quot; style=&quot;text-align: left;&quot; trbidi=&quot;on&quot;&gt;I prefer to write happy posts...I really do. But tonight I&#39;m completely defeated by the JDK&#39;s implementation of subprocess launching, and I need to tell the world why.&lt;br /&gt;&lt;br /&gt;JRuby has always strived to mimic MRI&#39;s behavior as much as possible, which in many cases has meant we need to route around the JDK to get at true POSIX APIs and behaviors.&lt;br /&gt;&lt;br /&gt;For example, JRuby has provided the ability to manipulate symbolic links since well before Java 7 provided that capability, using a native POSIX subsystem built atop jnr-ffi, our Java-to-C FFI layer (courtesy of Wayne Meissner). Everyone in the Java world knew for years the lack of symlink support was a gross omission, but most folks just sucked it up and went about their business. We could not afford to do that.&lt;br /&gt;&lt;br /&gt;We&#39;ve repeated this process for many other Ruby features: UNIX sockets, libc-like IO, selectable stdin, filesystem attributes...on and on. And we&#39;ve been able to provide the best POSIX runtime on the JVM &lt;b&gt;bar none&lt;/b&gt;. Nobody has gone as far or done as much as JRuby has.&lt;br /&gt;&lt;br /&gt;Another area where we&#39;ve had to route around the JDK is in subprocess launching and management. The JDK provides java.lang.ProcessBuilder, an API for assembling the appropriate pieces of a subprocess launch, producing a java.lang.Process object. Process in turn provides methods to wait for the subprocess, get access to its streams, and destroy it forcibly. It works great, on the surface.&lt;br /&gt;&lt;br /&gt;Unfortunately, the cake is a lie.&lt;br /&gt;&lt;br /&gt;Under the covers, the JDK implements Process through a complicated series of tricks. 
We want to be able to interactively control the child process, monitor it for writes, govern its lifecycle exactly. The JDK attempts to provide a consistent experience across all platforms. Unfortunately, those two worlds are not currently compatible, and the resulting experience is consistently awful.&lt;br /&gt;&lt;br /&gt;We&#39;ll start at the bottom to see where things go wrong.&lt;br /&gt;&lt;br /&gt;&lt;h4 style=&quot;text-align: left;&quot;&gt;POSIX, POSIX, Everywhere&lt;/h4&gt;&lt;br /&gt;At the core of ProcessBuilder, inside the native code behind UNIXProcess, we do find somewhat standard POSIX calls to fork and exec, wrapped up in a native downcall forkAndExec:&lt;br /&gt;&lt;br /&gt;&lt;div&gt;&lt;script class=&quot;brush: java;&quot; type=&quot;syntaxhighlighter&quot;&gt;&lt;![CDATA[     /**      * Create a process using fork(2) and exec(2).      *      * @param std_fds array of file descriptors.  Indexes 0, 1, and      *        2 correspond to standard input, standard output and      *        standard error, respectively.  On input, a value of -1      *        means to create a pipe to connect child and parent      *        processes.  On output, a value which is not -1 is the      *        parent pipe fd corresponding to the pipe which has      *        been created.  An element of this array is -1 on input      *        if and only if it is &lt;em&gt;not&lt;/em&gt; -1 on output.      
* @return the pid of the subprocess      */     private native int forkAndExec(byte[] prog,                                    byte[] argBlock, int argc,                                    byte[] envBlock, int envc,                                    byte[] dir,                                    int[] std_fds,                                    boolean redirectErrorStream) ]]&gt;&lt;/script&gt;&lt;br /&gt;&lt;/div&gt;The C code behind this is a bit involved, so I&#39;ll summarize what it does.&lt;br /&gt;&lt;ol style=&quot;text-align: left;&quot;&gt;&lt;li&gt;Sets up pipes for in, out, err, and fail to communicate with the eventual child process.&lt;/li&gt;&lt;li&gt;Copies the parent&#39;s descriptors from the pipes into the &quot;fds&quot; array.&lt;/li&gt;&lt;li&gt;Launches the child through a fairly standard fork+exec sequence.&lt;/li&gt;&lt;li&gt;Waits for the child to write a byte to the fail pipe indicating success or failure.&lt;/li&gt;&lt;li&gt;Scrubs the unused sides of the pipes in parent and child.&lt;/li&gt;&lt;li&gt;Returns the child process ID.&lt;/li&gt;&lt;/ol&gt;&lt;div&gt;This is all pretty standard for subprocess launching, and if it proceeded to put those file descriptors into direct, selectable channels we&#39;d have no issues. Unfortunately, things immediately go awry once we return to the Java code.&lt;br /&gt;&lt;br /&gt;&lt;h4 style=&quot;text-align: left;&quot;&gt;Interactive?&lt;/h4&gt;&lt;br /&gt;The call to forkAndExec occurs inside the UNIXProcess constructor, as the very first thing it does. At that point, it has in hand the three standard file descriptors and the subprocess pid, and it knows that the subprocess has at least been successfully forked. 
The next step is to wrap the file descriptors in appropriate InputStream and OutputStream objects, and this is where we find the first flaw.&lt;br /&gt;&lt;br /&gt;&lt;div&gt;&lt;script class=&quot;brush: java;&quot; type=&quot;syntaxhighlighter&quot;&gt;&lt;![CDATA[             if (std_fds[0] == -1)                 stdin_stream = ProcessBuilder.NullOutputStream.INSTANCE;             else {                 FileDescriptor stdin_fd = new FileDescriptor();                 fdAccess.set(stdin_fd, std_fds[0]);                 stdin_stream = new BufferedOutputStream(                     new FileOutputStream(stdin_fd));             } ]]&gt;&lt;/script&gt;&lt;br /&gt;&lt;/div&gt;This is the code to set up an OutputStream for the input channel of the child process, so we can write to it. Now we know the operating system is going to funnel those written bytes directly to the subprocess&#39;s input stream, and ideally if we&#39;re launching a subprocess we intend to control it...perhaps by sending it interactive commands. Why, then, do we wrap the file descriptor with a BufferedOutputStream? &lt;br /&gt;This is where JRuby&#39;s hacks begin. In our process subsystem, we have the following piece of code, which attempts to unwrap buffering from any stream it is given. &lt;br /&gt;&lt;br /&gt;&lt;div&gt;&lt;script class=&quot;brush: java;&quot; type=&quot;syntaxhighlighter&quot;&gt;&lt;![CDATA[     /**      * Unwrap all filtering streams between the given stream and its actual      * unfiltered stream. This is primarily to unwrap streams that have      * buffers that would interfere with interactivity.      
*      * @param filteredStream The stream to unwrap      * @return An unwrapped stream, presumably unbuffered      */     public static OutputStream unwrapBufferedStream(OutputStream filteredStream) {         if (RubyInstanceConfig.NO_UNWRAP_PROCESS_STREAMS) return filteredStream;         while (filteredStream instanceof FilterOutputStream) {             try {                 filteredStream = (OutputStream)                     FieldAccess.getProtectedFieldValue(FilterOutputStream.class,                         &quot;out&quot;, filteredStream);             } catch (Exception e) {                 break; // break out if we&#39;ve dug as deep as we can             }         }         return filteredStream;     } ]]&gt;&lt;/script&gt;&lt;/div&gt;&lt;br /&gt;The FieldAccess.getProtectedFieldValue call there does what you think it does...attempt to read the &quot;out&quot; field from within FilteredOutputStream, which in this case will be the FileOutputStream from above. Unwrapping the stream in this way allows us to do two things:&lt;br /&gt;&lt;ol style=&quot;text-align: left;&quot;&gt;&lt;li&gt;We can do unbuffered writes to (or reads from, in the case of the child&#39;s out and err streams) the child process.&lt;/li&gt;&lt;li&gt;We can get access to the more direct FileChannel for the stream, to do direct ByteBuffer reads and writes or low-level stream copying.&lt;/li&gt;&lt;/ol&gt;&lt;div&gt;So we&#39;re in good shape, right? It&#39;s a bit of hackery, but we&#39;ve got our unbuffered Channel and can interact directly with the subprocess. Is this good enough?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I wish it were.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h4 style=&quot;text-align: left;&quot;&gt;Selectable?&lt;/h4&gt;&lt;div&gt;&lt;br /&gt;The second problem we run into is that users very often would like to select against the output streams of the child process, to perform nonblocking IO operations until the child has actually written some data. 
It gets reported as a JRuby bug over and over again because there&#39;s simply no way for us to implement it. Why? Because FileChannel is not selectable. &lt;br /&gt;&lt;br /&gt;&lt;script class=&quot;brush: java;&quot; type=&quot;syntaxhighlighter&quot;&gt;&lt;![CDATA[ public abstract class FileChannel     extends AbstractInterruptibleChannel     implements SeekableByteChannel, GatheringByteChannel, ScatteringByteChannel ]]&gt;&lt;/script&gt;&lt;br /&gt;FileChannel implements methods for random-access reads and writes (positioning) and blocking IO interruption (which NIO implements by closing the stream...that&#39;s a rant for another day), but it does not implement any of the logic necessary for doing nonblocking IO using an NIO Selector. This comes up in at least one other place: the JVM&#39;s own standard IO streams are also not selectable, which means you can&#39;t select for user input at the console. Consistent experience indeed...it seems that all interaction with the user or with processes must be treated as file IO, with no selection capabilities. &lt;br /&gt;&lt;br /&gt;(It is interesting to note that the JVM&#39;s standard IO streams are *also* wrapped in buffers, which we dutifully unwrap to provide a truly interactive console.) &lt;br /&gt;&lt;br /&gt;Why are inter-proces file descriptors, which would support selector operations just wonderfully, wrapped in an unselectable channel? I have no idea, and it&#39;s impossible for us to hack around. &lt;br /&gt;&lt;br /&gt;Let&#39;s not dwell on this item, since there&#39;s more to cover. &lt;br /&gt;&lt;br /&gt;&lt;h4 style=&quot;text-align: left;&quot;&gt;Fear the Reaper&lt;/h4&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;You may recall I also wanted to have direct control over the lifecycle of the subprocess, to be able to wait for it or kill it at my own discretion. And on the surface, Process appears to provide these capabilities via the waitFor() and destroy() methods. 
Again it&#39;s all smoke and mirrors.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Further down in the UNIXProcess constructor, you&#39;ll find this curious piece of code:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;script class=&quot;brush: java;&quot; type=&quot;syntaxhighlighter&quot;&gt;&lt;![CDATA[         /*          * For each subprocess forked a corresponding reaper thread          * is started.  That thread is the only thread which waits          * for the subprocess to terminate and it doesn&#39;t hold any          * locks while doing so.  This design allows waitFor() and          * exitStatus() to be safely executed in parallel (and they          * need no native code).          */          java.security.AccessController.doPrivileged(             new java.security.PrivilegedAction&lt;void&gt;() { public Void run() {                 Thread t = new Thread(&quot;process reaper&quot;) {                     public void run() {                         int res = waitForProcessExit(pid);                         synchronized (UNIXProcess.this) {                             hasExited = true;                             exitcode = res;                             UNIXProcess.this.notifyAll();                         }                     }                 };                 t.setDaemon(true);                 t.start();                 return null; }}); ]]&gt;&lt;/script&gt; &lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;For each subprocess started through this API, the JVM will spin up a &quot;process reaper&quot; thread. 
This thread is designed to monitor the subprocess for liveness and notify the parent UNIXProcess object when that process has died, so it can pass on that information to the user via the waitFor() and exitValue() API calls.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The interesting bit here is the waitForProcessExit(pid) call, which is another native downcall into C land:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;script class=&quot;brush: c;&quot; type=&quot;syntaxhighlighter&quot;&gt;&lt;![CDATA[ /* Block until a child process exits and return its exit code.    Note, can only be called once for any given pid. */ JNIEXPORT jint JNICALL Java_java_lang_UNIXProcess_waitForProcessExit(JNIEnv* env,                                               jobject junk,                                               jint pid) {     /* We used to use waitid() on Solaris, waitpid() on Linux, but      * waitpid() is more standard, so use it on all POSIX platforms. */     int status;     /* Wait for the child process to exit.  This returns immediately if        the child has already exited. */     while (waitpid(pid, &amp;status, 0) &lt; 0) {         switch (errno) {         case ECHILD: return 0;         case EINTR: break;         default: return -1;         }     }      if (WIFEXITED(status)) {         /*          * The child exited normally; get its exit code.          */         return WEXITSTATUS(status);     } else if (WIFSIGNALED(status)) {         /* The child exited because of a signal.          * The best value to return is 0x80 + signal number,          * because that is what all Unix shells do, and because          * it allows callers to distinguish between process exit and          * process death by signal.          * Unfortunately, the historical behavior on Solaris is to return          * the signal number, and we preserve this for compatibility. 
*/ #ifdef __solaris__         return WTERMSIG(status); #else         return 0x80 + WTERMSIG(status); #endif     } else {         /*          * Unknown exit code; pass it through.          */         return status;     } } ]]&gt;&lt;/script&gt;&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;There&#39;s nothing too peculiar here; this is how you&#39;d wait for the child process to exit if you were writing plain old C code. But there&#39;s a sinister detail you can&#39;t see just by looking at this code: waitpid can be called &lt;b&gt;exactly once&lt;/b&gt;&amp;nbsp;by the parent process.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Part of the Ruby Process API is the ability to get a subprocess PID and wait for it. The concept of a process ID has been around for a long time, and Rubyists (even amateur Rubyists who&#39;ve never written a line of C code) don&#39;t seem to have any problem calling Process.waitpid when they want to wait for a child to exit. JRuby is an implementation of Ruby, and we would ideally like to be able to run all Ruby code that exists, so we also must implement Process.waitpid in some reasonable way. Our choice was to literally call the C function waitpid(2) via our FFI layer.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Here&#39;s the subtle language from the wait(2) manpage (which includes waitpid):&lt;/div&gt;&lt;div&gt;&lt;pre&gt;&lt;br /&gt;RETURN VALUES&lt;br /&gt;     If wait() returns due to a stopped or terminated child&lt;br /&gt;     process, the process ID of the child is returned to the&lt;br /&gt;     calling process.  Otherwise, a value of -1 is returned&lt;br /&gt;     and errno is set to indicate the error.&lt;br /&gt;&lt;br /&gt;     If wait3(), wait4(), or waitpid() returns due to a&lt;br /&gt;     stopped or terminated child process, the process ID of&lt;br /&gt;     the child is returned to the calling process.  
If there&lt;br /&gt;     are no children not previously awaited, -1 is returned&lt;br /&gt;     with errno set to [ECHILD].  Otherwise, if WNOHANG is&lt;br /&gt;     specified and there are no stopped or exited children,&lt;br /&gt;     0 is returned. If an error is detected or a caught&lt;br /&gt;     signal aborts the call, a value of -1 is returned and&lt;br /&gt;     errno is set to indicate the error.&lt;br /&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div&gt;There&#39;s a lot of negatives and passives and conditions there, so I&#39;ll spell it out for you more directly: If you call waitpid for a given child PID and someone else in your process has already done so...bad things happen.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;We effectively have to race the JDK to the waitpid call. If we get there first, the reaper thread bails out immediately and does no further work. If we don&#39;t get their first, it becomes impossible for a Ruby user to waitpid for that child process.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Now you may be saying &quot;why don&#39;t you just wait on the Process object and let the JDK do its job, old man? The problem here is that Ruby&#39;s Process API behaves like a POSIX process API: you get a PID back, and you wait on that PID. We can&#39;t mimic that API without returning a PID and implementing Process.waitpid appropriately.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;(Interesting note: we also use reflection tricks to get the real PID out of the java.lang.Process object, since it is not normally exposed.)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Could we have some internal lookup table mapping PIDs to Process objects, and make our wait logic just call Process.waitFor? In order to do so, we&#39;d need to manage a weak-valued map from integers to Process objects...which is certainly doable, but it breaks if someone uses a native library or FFI call to launch a process themselves. 
Oh, but if it&#39;s not in our table we could do waitpid. And so the onion grows more layers, all because we can&#39;t simply launch a process, get a PID, and wait on it.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;It doesn&#39;t end here, though.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;h4 style=&quot;text-align: left;&quot;&gt;Keep Boiling That Ocean&lt;/h4&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;At this point we&#39;ve managed to at least get interactive streams to the child process, and even if they&#39;re not selectable that&#39;s a big improvement over the standard API. We&#39;ve managed to dig out a process ID and sometimes we can successfully wait for it with a normal waitpid function call. So out of our three goals (interactivity, selectability, lifecycle control) we&#39;re maybe close to halfway there.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Then the JDK engineers go and pull the rug out from under us.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The logic for UNIXProcess has changed over time. Here&#39;s the notable differences in the current JDK 7 codebase:&lt;/div&gt;&lt;div&gt;&lt;ul style=&quot;text-align: left;&quot;&gt;&lt;li&gt;An Executor is now used to avoid spinning up a new thread for each child process. I&#39;d&amp;nbsp;+1 this, if the reaping logic weren&#39;t already causing me headaches.&lt;/li&gt;&lt;li&gt;The streams are now instances of UNIXProcess.ProcessPipeOutputStream and ProcessPipeInputStream. 
Don&#39;t get excited...they&#39;re still just buffered wrappers around File streams.&lt;/li&gt;&lt;li&gt;The logic run when the child process exits has changed...with catastrophic consequences.&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;Here&#39;s the new stream setup and reaper logic:&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;script class=&quot;brush: java;&quot; type=&quot;syntaxhighlighter&quot;&gt;&lt;![CDATA[      void initStreams(int[] fds) throws IOException {         stdin = (fds[0] == -1) ?             ProcessBuilder.NullOutputStream.INSTANCE :             new ProcessPipeOutputStream(fds[0]);          stdout = (fds[1] == -1) ?             ProcessBuilder.NullInputStream.INSTANCE :             new ProcessPipeInputStream(fds[1]);          stderr = (fds[2] == -1) ?             ProcessBuilder.NullInputStream.INSTANCE :             new ProcessPipeInputStream(fds[2]);          processReaperExecutor.execute(new Runnable() {             public void run() {                 int exitcode = waitForProcessExit(pid);                 UNIXProcess.this.processExited(exitcode);             }});     } ]]&gt;&lt;/script&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Now instead of simply notifying the UNIXProcess that the child has died, there&#39;s a call to processExited().&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;script class=&quot;brush: java;&quot; type=&quot;syntaxhighlighter&quot;&gt;&lt;![CDATA[     void processExited(int exitcode) {         synchronized (this) {             this.exitcode = exitcode;             hasExited = true;             notifyAll();         }          if (stdout instanceof ProcessPipeInputStream)             ((ProcessPipeInputStream) stdout).processExited();          if (stderr instanceof ProcessPipeInputStream)             ((ProcessPipeInputStream) stderr).processExited();          if (stdin instanceof ProcessPipeOutputStream)             ((ProcessPipeOutputStream) stdin).processExited();     } 
]]&gt;&lt;/script&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Ok, doesn&#39;t look bad so far. Let&#39;s look at ProcessPipeInputStream, which handles output from the child process.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;script class=&quot;brush: java;&quot; type=&quot;syntaxhighlighter&quot;&gt;&lt;![CDATA[     /**      * A buffered input stream for a subprocess pipe file descriptor      * that allows the underlying file descriptor to be reclaimed when      * the process exits, via the processExited hook.      *      * This is tricky because we do not want the user-level InputStream to be      * closed until the user invokes close(), and we need to continue to be      * able to read any buffered data lingering in the OS pipe buffer.      */     static class ProcessPipeInputStream extends BufferedInputStream {         ProcessPipeInputStream(int fd) {             super(new FileInputStream(newFileDescriptor(fd)));         }          private static byte[] drainInputStream(InputStream in)                 throws IOException {             if (in == null) return null;             int n = 0;             int j;             byte[] a = null;             while ((j = in.available()) &gt; 0) {                 a = (a == null) ? new byte[j] : Arrays.copyOf(a, n + j);                 n += in.read(a, n, j);             }             return (a == null || n == a.length) ? a : Arrays.copyOf(a, n);         }          /** Called by the process reaper thread when the process exits. */         synchronized void processExited() {             // Most BufferedInputStream methods are synchronized, but close()             // is not, and so we have to handle concurrent racing close().             try {                 InputStream in = this.in;                 if (in != null) {                     byte[] stragglers = drainInputStream(in);                     in.close();                     this.in = (stragglers == null) ?                         
ProcessBuilder.NullInputStream.INSTANCE :                         new ByteArrayInputStream(stragglers);                     if (buf == null) // asynchronous close()?                         this.in = null;                 }             } catch (IOException ignored) {                 // probably an asynchronous close().             }         }     } ]]&gt;&lt;/script&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So when the child process exits, any data waiting to be read from its output stream is drained into a buffer. &lt;b&gt;All of it. In memory.&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Did you launch a process that writes a gigabyte of data to its output stream and then terminates? Well, friend, I sure hope you have a gigabyte of memory, because the JDK is going to read that sucker in and there&#39;s nothing you can do about it. And let&#39;s hope there&#39;s not more than 2GB of data, since this code basically just grows a byte[], which in Java can only grow to 2GB. If there&#39;s more than 2GB of data on that stream, this logic errors out and the data is lost forever.  Oh, and by the way...if you happened to be devilishly clever and managed to dig down to the real FileChannel attached to the child process, all the data from that stream has suddenly disappeared, and the channel itself is closed, even if you never got a chance to read from it. Thanks for the help, JDK.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The JDK has managed to both break our clever workarounds (for its previously broken logic) and break itself even more badly. 
It&#39;s almost like they want to make subprocess launching so dreadfully bad you just don&#39;t use it anymore.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;h4 style=&quot;text-align: left;&quot;&gt;Never Surrender&lt;/h4&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Of course I could cry into my beer over this, but these sorts of problems and challenges are exactly why I&#39;m involved in JRuby and OpenJDK. Obviously this API has gone off the deep end and can&#39;t be saved, so what&#39;s a hacker to do? In our case, we make our own API.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;At this point, that&#39;s our only option. The ProcessBuilder and Process APIs are so terribly broken that we can&#39;t rely on them anymore. Thankfully, JRuby ships with a solid, fast FFI layer called the Java Native Runtime (JNR) that should make it possible for us to write our own process API entirely in Java. We will of course do that in the open, and we are hoping you will help us.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;What&#39;s the moral of the story? I don&#39;t really know. Perhaps it&#39;s that lowest-common-denominator APIs usually trend toward uselessness. Perhaps it&#39;s that ignoring POSIX is an expressway to failure. Perhaps it&#39;s that I don&#39;t know when to quit. In any case, you can count on the JRuby team to continue bringing you the only true POSIX experience on the JVM, and you can count on me to keep pushing OpenJDK to follow our lead.&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;
-     </content>
-     <link rel='replies' type='application/atom+xml' href='http://blog.headius.com/feeds/462657466694269626/comments/default' title='Post Comments'/>
-     <link rel='replies' type='text/html' href='http://blog.headius.com/2013/06/the-pain-of-broken-subprocess.html#comment-form' title='12 Comments'/>
-     <link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/462657466694269626'/>
-     <link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/462657466694269626'/>
-     <link rel='alternate' type='text/html' href='http://blog.headius.com/2013/06/the-pain-of-broken-subprocess.html' title='The Pain of Broken Subprocess Management on JDK'/>
-     <author>
-       <name>Charles Nutter</name>
-       <uri>https://plus.google.com/101599370339210456684</uri>
-       <email>noreply@blogger.com</email>
-       <gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh5.googleusercontent.com/-VT5b8KsgHgQ/AAAAAAAAAAI/AAAAAAAAFB0/9d5SU9AcHNQ/s512-c/photo.jpg'/>
-      </author>
-      <thr:total>12</thr:total>
-    </entry>
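The waitpid race from the post above can be reproduced in a few lines of plain Ruby. This is a minimal sketch, not JRuby's or the JDK's reaper code. It assumes CRuby on a POSIX system with /bin/sh available. Whoever reaps the child first receives its exit status. Any later waitpid on the same PID has no child left to wait for, so the call returns -1 with ECHILD, which Ruby raises as Errno::ECHILD.

```ruby
# Sketch only: assumes CRuby on a POSIX system with /bin/sh available.
# Spawn a child that exits immediately with status 3.
pid = Process.spawn("/bin/sh", "-c", "exit 3")

# First waiter wins the race: it reaps the child and gets its status.
_, status = Process.waitpid2(pid)
first_exit = status.exitstatus

# Second waiter (think: a competing reaper thread got there first)
# finds nothing to reap; waitpid(2) fails with ECHILD.
second_result =
  begin
    Process.waitpid(pid)
    :reaped_again # not expected on a POSIX system
  rescue Errno::ECHILD
    :echild
  end

puts "first waiter exit status: #{first_exit}"
puts "second waiter result: #{second_result}"
```

In the JDK scenario the JDK reaper thread is the "first waiter," so a Ruby-level Process.waitpid on that PID lands in the ECHILD branch.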
-    <entry>
-      <id>tag:blogger.com,1999:blog-4704664917418794835.post-681101033932402497</id>
-      <published>2013-05-11T03:05:00.001-07:00</published>
-      <updated>2013-05-11T03:10:32.549-07:00</updated>
-      <title type='text'>On Languages, VMs, Optimization, and the Way of the World</title>
-      <content type='html'>&lt;div dir=&quot;ltr&quot; style=&quot;text-align: left;&quot; trbidi=&quot;on&quot;&gt;&lt;div style=&quot;color: #222222; font-family: arial; font-size: small;&quot;&gt;I shouldn&#39;t be up this late, but I&#39;ve been doing lots of thinking and exploring tonight.&lt;/div&gt;&lt;div style=&quot;color: #222222; font-family: arial; font-size: small;&quot;&gt;&lt;br /&gt;In studying various VMs over the past few years, I&#39;ve come up with a list of do&#39;s and don&#39;ts that make things optimize right. These apply to languages, the structures that back them, and the VMs that optimize those languages, and from what I&#39;ve seen there are a lot of immutable truths here given current optimization technology.&lt;/div&gt;&lt;div style=&quot;color: #222222; font-family: arial; font-size: small;&quot;&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style=&quot;color: #222222; font-family: arial; font-size: small;&quot;&gt;Let&#39;s dive in.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h3 style=&quot;text-align: left;&quot;&gt;#1: Types don&#39;t have to be static&lt;/h3&gt;&lt;div style=&quot;color: #222222; font-family: arial; font-size: small;&quot;&gt;&lt;br /&gt;JVM and other dynamic-optimizing runtimes have proven this out. At runtime, it&#39;s possible to gather the same information static types would provide you at compile time, leading to optimizations at least as good as fully statically-typed, statically-optimized code. In some cases, it may be possible to do a better job, since runtime profiling is based on real execution, real branch percentages, real behavior, rather than a guess at what a program might do. 
You could probably make the claim that static optimization is a halting problem, and dynamic optimization eventually can beat it by definition since it can optimize what the program is actually doing.&lt;/div&gt;&lt;div style=&quot;color: #222222; font-family: arial; font-size: small;&quot;&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style=&quot;color: #222222; font-family: arial; font-size: small;&quot;&gt;However, this requires one key thing to really work well.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h3 style=&quot;text-align: left;&quot;&gt;#2: Types need to be predictable&lt;/h3&gt;&lt;div style=&quot;color: #222222; font-family: arial; font-size: small;&quot;&gt;&lt;br /&gt;In order for runtime optimization to happen, objects need to have predictable types and those types need to have a predictable structure. This isn&#39;t to say that types must be statically declared...they just need to look the same on repeat visits. If objects can change type (smalltalk&#39;s become, perl&#39;s and C&#39;s weak typing) you&#39;re forced to include more guards against those changes, or you&#39;re forced to invalidate more code whenever something changes (or in the case of C, you just completely shit the bed when things aren&#39;t as expected). If change is possible and exposed at a language level, there may be nothing you can do to cope with all those different type shapes, and optimization can only go so far.&lt;/div&gt;&lt;div style=&quot;color: #222222; font-family: arial; font-size: small;&quot;&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style=&quot;color: #222222; font-family: arial; font-size: small;&quot;&gt;This applies both to the shape of a type&#39;s method table (methods remaining consistent once encountered) and the shape of the type&#39;s instances (predictable object layout). Many dynamically-typed languages impose dynamic type shape and object shape on VMs that run them, preventing those VMs from making useful predictions about how to optimize code. 
Optimistic predictions (generating synthetic types for known type shapes or preemptively allocating objects based on previously-seen shapes) still have to include fallback logic to maintain the mutable behavior, should it ever be needed. Again, optimization potential is limited, because the shape of the world can change on a whim and the VM has to be vigilant.&lt;/div&gt;&lt;div style=&quot;color: #222222; font-family: arial; font-size: small;&quot;&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style=&quot;color: #222222; font-family: arial; font-size: small;&quot;&gt;The alternative summation of #1 and #2 is that types don&#39;t have to be statically declared, but they need to be statically defined. Most popular dynamic languages do neither, but all they really need to do is the latter.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h3 style=&quot;text-align: left;&quot;&gt;#3: You can&#39;t cheat the CPU&lt;/h3&gt;&lt;div style=&quot;color: #222222; font-family: arial; font-size: small;&quot;&gt;&lt;br /&gt;Regardless of how clever you&#39;d like to be in your code or language or VM or JIT, the limiting factor is how modern CPUs actually run your code. There&#39;s a long list of expectations you must meet to squeeze every last drop of speed out of a system, and diverging from those guidelines will always impose a penalty. This is the end...the bottom turtle...the unifying theory. It is, at the end of the day, the CPU you must appease to get the best performance. All other considerations fall out of that, and anywhere performance does not live up to expectations you are guaranteed to discover that someone tried to cheat the CPU.&lt;/div&gt;&lt;div style=&quot;color: #222222; font-family: arial; font-size: small;&quot;&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style=&quot;color: #222222; font-family: arial; font-size: small;&quot;&gt;Traditionally, static typing was the best way to guarantee we produced good CPU instructions. 
It gave us a clear picture of the world we could ponder and meditate over, eventually boiling out the secrets of the universe and producing the fastest possible code. But that always assumed a narrow vision of a world with unlimited resources. It assumed we could make all the right decisions for a program ahead of time and that no limitations outside our target instruction set would ever affect us. In the real world, however, CPUs have limited cache sizes, multiple threads, bottlenecked memory pipelines, and basic physics to contend with (you can only push so many electrons through a given piece of matter without blowing it up). Language and VM authors ignore the expectations of their target systems only at great peril.&lt;/div&gt;&lt;div style=&quot;color: #222222; font-family: arial; font-size: small;&quot;&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style=&quot;color: #222222; font-family: arial; font-size: small;&quot;&gt;Let&#39;s look at a few languages and where they fit.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h3 style=&quot;text-align: left;&quot;&gt;Language Scorecard&lt;/h3&gt;&lt;div style=&quot;color: #222222; font-family: arial; font-size: small;&quot;&gt;&lt;br /&gt;Java is statically typed and types are of a fixed shape. This is the ideal situation mostly because of the type structure being predictable. Once encountered, a rose is just a rose. Given appropriate dynamic optimizations, there&#39;s no reason Java code can&#39;t compete with or surpass statically-typed and statically-compiled C/++, and in theory there&#39;s nothing preventing Java code from becoming optimal CPU instructions.&lt;/div&gt;&lt;div style=&quot;color: #222222; font-family: arial; font-size: small;&quot;&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style=&quot;color: #222222; font-family: arial; font-size: small;&quot;&gt;Dart is dynamically typed (or at least, types are optional and the VM doesn&#39;t care about them), but types are of a fixed shape. 
If programmers can tolerate fixed-shape types, Dart provides a very nice dynamic language that still can achieve the same optimizations as statically-typed Java or statically-compiled C/++.&lt;/div&gt;&lt;div style=&quot;color: #222222; font-family: arial; font-size: small;&quot;&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style=&quot;color: #222222; font-family: arial; font-size: small;&quot;&gt;Groovy is dynamically typed with some inference and optimization if you specify static types, but most (all?) types defined in Groovy are not guaranteed to be a fixed shape. As a result, even when specifying static types, guards must be inserted to check that those types&#39; shapes have not changed. Groovy does, however, guarantee object shape is consistent over time, which avoids overhead from being able to reshape objects at runtime.&lt;/div&gt;&lt;div style=&quot;color: #222222; font-family: arial; font-size: small;&quot;&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style=&quot;color: #222222; font-family: arial; font-size: small;&quot;&gt;Ruby and JavaScript are dynamically typed and types and objects can change shape at runtime. This is a confluence of all the hardest-to-optimize language characteristics. In both cases, the best we can do is to attempt to predict common type and object shapes and insert guards for when we&#39;re wrong, but it&#39;s not possible to achieve the performance of a system with fully-predictable type and object shapes. Prove me wrong.&lt;/div&gt;&lt;div style=&quot;color: #222222; font-family: arial; font-size: small;&quot;&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style=&quot;color: #222222; font-family: arial; font-size: small;&quot;&gt;Now of course when I say it&#39;s not possible, I mean it&#39;s not possible for the general case. Specific cases of a known closed-world application can indeed be optimized as though the types and objects involved had static shapes. 
I do something along these lines in my RubyFlux compiler, which statically analyzes incoming Ruby code and assumes the methods it sees defined and the fields it sees accessed will be the only methods and fields it ever needs to worry about. But that requires omitting features that can mutate type and object structure, or else you have to have a way to know which types and objects those features will affect. Sufficiently smart compiler indeed.&lt;/div&gt;&lt;div style=&quot;color: #222222; font-family: arial; font-size: small;&quot;&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style=&quot;color: #222222; font-family: arial; font-size: small;&quot;&gt;Python has similar structural complexities to Ruby and adds in the additional complexity of an introspectable call stack. Under those circumstances, even on-stack execution state is not safe; a VM can&#39;t even make guarantees about the values it has in hand or the shape of a given call&#39;s activation. PyPy does an admirable job of attacking this problem by rewriting currently-running code and lifting on-stack state to the heap when it is accessed, but this approach prevents dropping unused local state (since you can&#39;t predict who might want to see it) and also fails to work under parallel execution (since you can&#39;t rewrite code another thread might be executing). Again, the dynamicity of a &quot;cool&quot; feature brings with it intrinsic penalties that are reducible but not removable.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h3 style=&quot;text-align: left;&quot;&gt;Get to the Damn Point, Already&lt;/h3&gt;&lt;div style=&quot;color: #222222; font-family: arial; font-size: small;&quot;&gt;&lt;br /&gt;So what am I trying to say in all this? I started the evening by exploring a benchmark post comparing Dart&#39;s VM with JVM on the same benchmark. The numbers were not actually very exciting...with a line-by-line port from Dart to Java, Java came out slightly behind Dart. 
With a few modifications to the Java code, Java pulled slightly ahead. With additional modifications to the Dart code, it might leapfrog Java again. But this isn&#39;t interesting because Dart and Java can both rely on type and object shapes remaining consistent, and as a result the optimizations they perform can basically accomplish the same thing. Where it matters, they&#39;re similar enough that VMs don&#39;t care about the differences.&lt;/div&gt;&lt;div style=&quot;color: #222222; font-family: arial; font-size: small;&quot;&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style=&quot;color: #222222; font-family: arial; font-size: small;&quot;&gt;Where does this put languages I love, like Ruby? It&#39;s probably fair to concede that Ruby can&#39;t ever achieve the raw, straight-line performance of type-static (not statically-typed) languages like Dart or Java, regardless of the VM technologies involved. We&#39;ll be able to get close; JRuby can, with the help of invokedynamic, make method calls *nearly* as fast as Java calls, and by generating type shapes we can make object state *nearly* as predictable as Java types, but we can&#39;t go all the way. Regardless of how great the underlying VM is, if you can&#39;t hold to its immutable truths, you&#39;re walking against the wind. Ruby on Dart would probably not be any faster than Ruby on JVM, because you&#39;d still have to implement mutable types and growable objects in pretty much the same way. Ruby on PyPy might be able to go farther, since the VM is designed for mutable types and growable objects, but you might have to sacrifice parallelism or accept that straight-line object-manipulating performance won&#39;t go all the way to a Java or Dart. Conversely, languages that make those type-static guarantees might be able to beat dynamic languages when running on dynamic language VMs (e.g. 
dart2js) for exactly the same reasons that they excel on their own VMs: they provide a more consistent view of the world, and offer no surprises to the VM that would hinder optimization. You trade dynamicity at the language level for predictability at the VM level.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h3 style=&quot;text-align: left;&quot;&gt;The Actual Lesson&lt;/h3&gt;&lt;div style=&quot;color: #222222; font-family: arial; font-size: small;&quot;&gt;&lt;br /&gt;I guess the bottom line for me is realizing that there&#39;s always going to be a conflict between what programmers want out of programming languages and what&#39;s actually possible to give them. There&#39;s no magical fairy world where every language can be as fast as every other language, because there&#39;s no way to predict how every program is going to execute (or in truth, how a given program is going to execute given a general strategy). And that&#39;s ok; most of these languages can still get very close to each other in performance, and over time the dynamic type/object-shaped languages may offer ways to ratchet down some of that dynamism...or they might not care and just accept what limitations result. 
The important thing is for language users to recognize that nothing is free, and to understand the implications of language features and design decisions they make in their own programs.&lt;/div&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.headius.com/feeds/681101033932402497/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://blog.headius.com/2013/05/on-languages-vms-optimization-and-way.html#comment-form' title='15 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/681101033932402497'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/681101033932402497'/><link rel='alternate' type='text/html' href='http://blog.headius.com/2013/05/on-languages-vms-optimization-and-way.html' title='On Languages, VMs, Optimization, and the Way of the World'/><author><name>Charles Nutter</name><uri>https://plus.google.com/101599370339210456684</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh5.googleusercontent.com/-VT5b8KsgHgQ/AAAAAAAAAAI/AAAAAAAAFB0/9d5SU9AcHNQ/s512-c/photo.jpg'/></author><thr:total>15</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4704664917418794835.post-5392245422382118146</id><published>2013-01-05T08:47:00.000-08:00</published><updated>2013-01-05T08:47:32.452-08:00</updated><category scheme="http://www.blogger.com/atom/ns#" term="invokedynamic"/><category scheme="http://www.blogger.com/atom/ns#" term="jruby"/><category scheme="http://www.blogger.com/atom/ns#" term="jvm"/><category scheme="http://www.blogger.com/atom/ns#" term="optimization"/><category scheme="http://www.blogger.com/atom/ns#" term="ruby"/><title type='text'>Constant and Global Optimization in JRuby 1.7.1 and 1.7.2</title><content type='html'>&lt;div dir=&quot;ltr&quot; 
style=&quot;text-align: left;&quot; trbidi=&quot;on&quot;&gt;With every JRuby release, there&#39;s always at least a handful of optimizations. They range from tiny improvements in the compiler to perf-aware rewrites of core class methods, but they&#39;re almost always driven by real-world cases.&lt;br /&gt;&lt;br /&gt;In JRuby 1.7.1 and 1.7.2, I made several improvements to the performance of Ruby constants and global variables that might be of some interest to you, dear reader.&lt;br /&gt;&lt;br /&gt;&lt;h2 style=&quot;text-align: left;&quot;&gt;Constants&lt;/h2&gt;&lt;div&gt;In Ruby, a constant is a lexically and hierarchically accessed variable that starts with a capital letter. Class and module names like Object, Kernel, and String are all constants defined under the Object class. When I say constants are both lexically and hierarchically accessed, what I mean is that at access time we first search outward through lexically-enclosing scopes, and failing that we search through the class hierarchy of the innermost scope. For example:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;script src=&quot;https://gist.github.com/4459891.js?file=file1.rb&quot;&gt;&lt;/script&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Here, the first two constant accesses inside class B are successful; the first (IN_FOO) is located lexically in Foo, because it encloses the body of class B. The second (IN_A) is located hierarchically by searching B&#39;s ancestors. The third access fails, because the IN_BAR constant is only available within the Bar module&#39;s scope, so B can&#39;t see it.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Constants also...aren&#39;t. It is possible to redefine a constant, or define new constants deeper in a lexical or hierarchical structure that mask earlier ones. However in most code (i.e. &quot;good&quot; code) constants eventually stabilize. 
This makes it possible to perform a variety of optimizations against them, even though they&#39;re not necessarily static.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Constants are used heavily throughout Ruby, both for constant values like Float::MAX and for classes like Array or Hash. It is therefore especially important that they be as fast as possible.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h2 style=&quot;text-align: left;&quot;&gt;Global Variables&lt;/h2&gt;&lt;div&gt;Globals in Ruby are about like you&#39;d expect...name/value pairs in a global namespace. They start with a $ character. Several global variables are &quot;special&quot; and exist in a more localized scope, like $~ (last regular expression match in this call frame), $! (last exception raised in this thread), and so on. Use of these &quot;local globals&quot; mostly just amounts to special variable names that are always available; they&#39;re not really true global variables.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Everyone knows global variables should be discouraged, but that&#39;s largely referring to global variable use in normal program flow. Using global state across your application – potentially across threads – is a pretty nasty thing to do to yourself and your coworkers. But there are some valid uses of globals, like for logging state and levels, debugging flags, and truly global constructs like standard IO.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;script src=&quot;https://gist.github.com/4459891.js?file=file2.rb&quot;&gt;&lt;/script&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Here, we&#39;re using the global $DEBUG to specify whether logging should occur in MyApp#log. Those log messages are written to the stderr stream accessed via $stderr. 
Note also that $DEBUG can be set to true by passing -d at the JRuby command line.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h2 style=&quot;text-align: left;&quot;&gt;Optimizing Constant Access (pre-1.7.1)&lt;/h2&gt;&lt;div&gt;I&#39;ve posted in the past about how JRuby optimizes constant access, so I&#39;ll just quickly review that here.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;At a given access point, constant values are looked up from the current lexical scope and cached. Because constants can be modified, or new constants can be introduced that mask earlier ones, the JRuby runtime (org.jruby.Ruby) holds a global constant invalidator checked on each access to ensure the previous value is still valid.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;On non-invokedynamic JVMs, verifying the cache involves an object identity comparison every time, which means a non-final value must be accessed via a couple of levels of indirection. This adds a certain amount of overhead to constant access, and also makes it impossible for the JVM to fold multiple constant accesses away, or make static decisions based on a constant&#39;s value.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;On an invokedynamic JVM, the cache verification is in the form of a SwitchPoint. SwitchPoint is a type of on/off guard used at invokedynamic call sites to represent a hard failure. Because it can only be switched off, the JVM is able to optimize the SwitchPoint logic down to what&#39;s called a &quot;safe point&quot;, a very inexpensive ping back into the VM. As a result, constant accesses under invokedynamic can be folded away, and repeat access or unused accesses are not made at all.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;However, there&#39;s a problem. 
In JRuby 1.7.0 and earlier, the only way we could access the current lexical scope (in a StaticScope object) was via the current call frame&#39;s DynamicScope, a heap-based object created on each activation of a given body of code. In order to reduce the performance hit to methods containing constants, we introduced a one-time DynamicScope called the &quot;dummy scope&quot;, attached to the lexical scope and only created once. This avoided the huge hit of constructing a DynamicScope for every call, but caused constant-containing methods to be considerably slower than those without constants.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h2 style=&quot;text-align: left;&quot;&gt;Lifting Lexical Scope Into Code&lt;/h2&gt;&lt;div&gt;In JRuby 1.7.1, I decided to finally bite the bullet and make the lexical scope available to all method bodies, without requiring a DynamicScope intermediate. This was a&amp;nbsp;&lt;a href=&quot;https://github.com/jruby/jruby/compare/fb65c539a9b4f52d1d063dbe36de69217ab6a896...ad5d07291d09f57849f873d405607fbb6fed1544&quot;&gt;nontrivial piece of work&lt;/a&gt;&amp;nbsp;that took several days to get right, so although most of the work occurred before JRuby 1.7.0 was released, we opted to let it bake a bit before release.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The changes made it possible for all class, module, method, and block bodies to access their lexical scope essentially for free. It also helped us finally deliver on the promise of truly free constant access when running under invokedynamic.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So, does it work?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;script src=&quot;https://gist.github.com/4459891.js?file=file3.rb&quot;&gt;&lt;/script&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Assuming constant access is free, the three loops here should perform identically. 
The non-expression calls to foo and bar should disappear, since they both return a constant value that&#39;s never used. The calls for decrementing the &#39;a&#39; variable should produce a constant value &#39;1&#39; and perform the same as the literal decrement in the control loop.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Here&#39;s Ruby (MRI) 2.0.0 performance on this benchmark.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;script src=&quot;https://gist.github.com/4459891.js?file=file4.rb&quot;&gt;&lt;/script&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The method call itself adds a significant amount of overhead here, and the constant access adds another 50% of that overhead. Ruby 2.0.0 has done a lot of work on performance, but the cost of invoking Ruby methods and accessing constants remains high, and constant accesses do not fold away as you would like.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Here&#39;s JRuby 1.7.2 performance on the same benchmark.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;script src=&quot;https://gist.github.com/4459891.js?file=file5.rb&quot;&gt;&lt;/script&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;We obviously run all cases significantly faster than Ruby 2.0.0, but the important detail is that the method call adds only about 11% overhead to the control case, and constant access adds almost nothing.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;For comparison, here&#39;s JRuby 1.7.0, which did not have free access to lexical scopes.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;script src=&quot;https://gist.github.com/4459891.js?file=file6.rb&quot;&gt;&lt;/script&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So by avoiding the intermediate DynamicScope, methods containing constant accesses are somewhere around 7x faster than before. 
Not bad.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h2 style=&quot;text-align: left;&quot;&gt;Optimizing Global Variables&lt;/h2&gt;&lt;div&gt;Because global variables have a much simpler structure than constants, they&#39;re pretty easy to optimize. I had not done so up to JRuby 1.7.1 mostly because I didn&#39;t see a compelling use case and didn&#39;t want to encourage their use. However, after Tony Arcieri pointed out that invokedynamic-optimized global variables could be used to add logging and profiling to an application with zero impact when disabled, I was convinced. Let&#39;s look at the example from above again.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;script src=&quot;https://gist.github.com/4459891.js?file=file2.rb&quot;&gt;&lt;/script&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In this example, we would ideally like there to be no overhead at all when $DEBUG is untrue, so we&#39;re free to add optional logging throughout the application with no penalty. In order to support this, two improvements were needed.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;First, I modified our invokedynamic logic to cache global variables using a per-variable SwitchPoint. 
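&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;As a rough illustration of per-variable SwitchPoint caching, here is a plain-Ruby model. SketchSwitchPoint, GlobalTable, and GlobalAccessSite are hypothetical names invented for this sketch; JRuby itself relies on the JVM&#39;s java.lang.invoke.SwitchPoint at invokedynamic call sites, which this code only imitates. The idea it shows is that writing one global invalidates only that global&#39;s guard, so cached reads stay cheap until the value actually changes.&lt;/div&gt;&lt;pre&gt;&lt;code&gt;
```ruby
# A minimal model of per-variable SwitchPoint caching for global variables.
# All class names here are hypothetical; this is not JRuby's actual code.

# One-way guard: valid until invalidated, and never valid again afterwards.
class SketchSwitchPoint
  def initialize
    @valid = true
  end

  def valid?
    @valid
  end

  def invalidate!
    @valid = false
  end
end

# Global variable storage with a separate switch point for each variable.
class GlobalTable
  def initialize
    @values = {}
    @switch_points = Hash.new { |hash, name| hash[name] = SketchSwitchPoint.new }
  end

  # Returns the current value together with the guard that protects it.
  def read(name)
    [@values[name], @switch_points[name]]
  end

  # Writing one global flips only that global's switch point, leaving
  # cached reads of every other global untouched.
  def write(name, value)
    @values[name] = value
    stale = @switch_points[name]
    @switch_points[name] = SketchSwitchPoint.new
    stale.invalidate!
  end
end

# An access site that caches a global's value until its switch point flips.
class GlobalAccessSite
  attr_reader :lookups

  def initialize(table, name)
    @table = table
    @name = name
    @value = nil
    @switch_point = nil
    @lookups = 0
  end

  def value
    if @switch_point.nil? || !@switch_point.valid?
      @lookups += 1
      @value, @switch_point = @table.read(@name)
    end
    @value
  end
end
```
&lt;/code&gt;&lt;/pre&gt;&lt;div&gt;With this model, reading $DEBUG repeatedly through one access site performs a single table lookup, writing some other global leaves that cached value alone, and writing $DEBUG itself forces exactly one re-read.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;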
This makes access to mostly-static global variables as free as constant access, with the same performance improvements.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Second, I added some smarts into the compiler for conditional forms like &quot;if $DEBUG&quot; that would avoid re-checking the $DEBUG value at all if it were false the first time (and start checking it again if it were modified).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;It&#39;s worth noting I also made this second optimization for constants; code like &quot;if DEBUG_ENABLED&quot; will also have the same performance characteristics.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Let&#39;s see how it performs.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;script src=&quot;https://gist.github.com/4459891.js?file=file7.rb&quot;&gt;&lt;/script&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In this case, we should again expect that all three forms have identical performance. Both the constant and the global resolve to an untrue value, so they should ideally not introduce any overhead compared to the bare method.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Here&#39;s Ruby (MRI) 2.0.0:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;script src=&quot;https://gist.github.com/4459891.js?file=file8.rb&quot;&gt;&lt;/script&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Both the global and the constant add overhead here in the neighborhood of 25% over an empty method. 
This means you can&#39;t freely add globally-conditional logic to your application without accepting a performance hit.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;JRuby 1.7.2:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;script src=&quot;https://gist.github.com/4459891.js?file=file9.rb&quot;&gt;&lt;/script&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Again we see JRuby +&amp;nbsp;invokedynamic optimizing method calls considerably better than MRI, but additionally we see that the untrue global conditions add no overhead compared to the empty method. You can freely use globals as conditions for logging, profiling, and other code you&#39;d like to have disabled most of the time.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;And finally, JRuby 1.7.1, which optimized constants, did not optimize globals, and did not have specialized conditional logic for either:&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;script src=&quot;https://gist.github.com/4459891.js?file=file10.rb&quot;&gt;&lt;/script&gt;&lt;/div&gt;&lt;h2 style=&quot;text-align: left;&quot;&gt;Where Do We Go From Here?&lt;/h2&gt;&lt;div&gt;Hopefully I&#39;ve helped show that we&#39;re really just seeing the tip of the iceberg as far as optimizing JRuby using invokedynamic. More than anything we want you to report real-world use cases that could benefit from additional optimization, so we can target our work effectively. 
And as always, please try out your apps on JRuby, enable JRuby testing in Travis CI, and let us know what we can do to make your JRuby experience better!&lt;/div&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.headius.com/feeds/5392245422382118146/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://blog.headius.com/2013/01/constant-and-global-optimization-in.html#comment-form' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/5392245422382118146'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/5392245422382118146'/><link rel='alternate' type='text/html' href='http://blog.headius.com/2013/01/constant-and-global-optimization-in.html' title='Constant and Global Optimization in JRuby 1.7.1 and 1.7.2'/><author><name>Charles Nutter</name><uri>https://plus.google.com/101599370339210456684</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh5.googleusercontent.com/-VT5b8KsgHgQ/AAAAAAAAAAI/AAAAAAAAFB0/9d5SU9AcHNQ/s512-c/photo.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4704664917418794835.post-5309576998658669333</id><published>2012-11-19T06:38:00.001-08:00</published><updated>2012-11-19T10:36:04.211-08:00</updated><title type='text'>Refining Ruby</title><content type='html'>&lt;div dir=&quot;ltr&quot; style=&quot;text-align: left;&quot; trbidi=&quot;on&quot;&gt;What does the following code do?&lt;br /&gt;&lt;br /&gt;&lt;script src=&quot;https://gist.github.com/4110634.js?file=ref_1.rb&quot;&gt;&lt;/script&gt; If you answered &quot;it upcases two strings and adds them together, returning the result&quot; you might be wrong because of a new Ruby feature called &quot;refinements&quot;.&lt;br /&gt;&lt;br /&gt;Let&#39;s start with 
the problem refinements are supposed to solve: monkey-patching.&lt;br /&gt;&lt;br /&gt;&lt;h2 style=&quot;text-align: left;&quot;&gt;Monkey-patching&lt;/h2&gt;&lt;div&gt;In Ruby, all classes are mutable. Indeed, when you define a new class, you&#39;re really just creating an empty class and filling it with methods. The ability to mutate classes at runtime has been used (or abused) by many libraries and frameworks to decorate Ruby&#39;s core classes with additional (or replacement) behavior. For example, you might add a &quot;camelize&quot; method to String that knows how to convert under_score_names to camelCaseNames. This is lovingly called &quot;monkey-patching&quot; by the Ruby community.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Monkey-patching can be very useful, and many patterns in Ruby are built around the ability to modify classes. It can also cause problems if a library patches code in a way the user does not expect (or want), or if two libraries try to apply conflicting patches. Sometimes, you simply don&#39;t want patches to apply globally, and this is where refinements come in.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h2 style=&quot;text-align: left;&quot;&gt;Localizing Monkeypatches&lt;/h2&gt;&lt;div&gt;Refinements have been discussed as a feature for several years, sometimes under the name &quot;selector namespaces&quot;. In essence, refinements are intended to allow monkey-patching only within certain limited scopes, like within a library that wants to use altered or enhanced versions of core Ruby types without affecting code outside the library. This is the case within the ActiveSupport library that forms part of the core of Rails.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;ActiveSupport provides a number of extensions (patches) to the core Ruby classes like String#pluralize, Range#overlaps?, and Array#second. 
Some of these extensions are intended for use by Ruby developers, as conveniences that improve the readability or conciseness of code. Others exist mostly to support Rails itself. In both cases, it would be nice if we could prevent those extensions from leaking out of ActiveSupport into code that does not want or need them.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h2 style=&quot;text-align: left;&quot;&gt;Refinements&lt;/h2&gt;&lt;div&gt;In short, refinements provide a way to make class modifications that are only seen from within certain scopes. In the following example, I add a &quot;camelize&quot; method to the String class that&#39;s only seen from code within the Foo class.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;script src=&quot;https://gist.github.com/4110634.js?file=ref_2.rb&quot;&gt;&lt;/script&gt; &lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;With the Foo class refined, we can see that the &quot;camelize&quot; method is indeed available within the &quot;camelize_string&quot; method but not outside of the Foo class.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;script src=&quot;https://gist.github.com/4110634.js?file=ref_3.txt&quot;&gt;&lt;/script&gt; &lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;On the surface, this seems like exactly what we want. Unfortunately, there&#39;s a lot more complexity here than meets the eye.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h2 style=&quot;text-align: left;&quot;&gt;Ruby Method Dispatch&lt;/h2&gt;&lt;div&gt;In order to do a method call in Ruby, a runtime simply looks at the target object&#39;s class hierarchy, searches for the method from bottom to top, and upon finding it performs the call. 
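&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The bottom-to-top search can be sketched in a few lines of Ruby. find_method_owner, Animal, and Dog are hypothetical names for illustration, and the model deliberately ignores singleton classes, refinements, visibility rules, and method_missing; Ruby&#39;s own Method#owner is only a convenient way to check that the walk agrees with the real dispatcher for ordinary methods.&lt;/div&gt;&lt;pre&gt;&lt;code&gt;
```ruby
# Hypothetical helper: walk the receiver's class hierarchy from the most
# specific entry (the class itself) toward BasicObject, returning the first
# class or module that defines the method. Singleton classes, refinements,
# visibility, and method_missing are intentionally left out of this model.
def find_method_owner(receiver, name)
  receiver.class.ancestors.find do |mod|
    mod.instance_methods(false).include?(name) ||
      mod.private_instance_methods(false).include?(name)
  end
end

Animal = Class.new do
  def speak
    "..."
  end

  def describe
    "an animal"
  end
end

# Class.new(Animal) creates a subclass of Animal that overrides speak.
Dog = Class.new(Animal) do
  def speak
    "woof"
  end
end
```
&lt;/code&gt;&lt;/pre&gt;&lt;div&gt;Here, looking up describe on a Dog walks past Dog and stops at Animal; that search is exactly the work a call site would rather not repeat on every call.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;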
A smart runtime will cache the method to avoid performing this search every time, but in general the mechanics of looking up a method body are rather simple.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In an implementation like JRuby, we might cache the method at what&#39;s called the &quot;call site&quot;—the point in Ruby code where a method call is actually performed. In order to know that the method is valid for future calls, we perform two checks at the call site: that the incoming object is of the same type as for previous calls; and that the type&#39;s hierarchy has not been mutated since the method was cached.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Up to now, method dispatch in Ruby has depended solely on the type of the target object. The calling context has not been important to the method lookup process, other than to confirm that visibility restrictions are enforced (primarily for protected methods, since private methods are rejected for non–self calls). That simplicity has allowed Ruby implementations to optimize method calls and Ruby programmers to understand code by simply determining the target object and methods available on it.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Refinements change everything.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h2 style=&quot;text-align: left;&quot;&gt;Refinements Basics&lt;/h2&gt;&lt;div&gt;Let&#39;s revisit the camelize example again.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;script src=&quot;https://gist.github.com/4110634.js?file=ref_2.rb&quot;&gt;&lt;/script&gt; &lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The visible manifestation of refinements comes via the &quot;refine&quot; and &quot;using&quot; methods.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The &quot;refine&quot; method takes a class or module (the String class, in this case) and a block. 
Within the block, methods defined (camelize) are added to what might be called a patch set (a la monkey-patching) that can be applied to specific scopes in the future. The methods are not actually added to the refined class (String) except in a &quot;virtual&quot; sense when a body of code activates the refinement via the &quot;using&quot; method.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The &quot;using&quot; method takes a refinement-containing module and applies it to the current scope. Methods within that scope should see the refined version of the class, while methods outside that scope do not.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Where things get a little weird is in defining exactly what that scope should be and in implementing refined method lookup in such a way that does not negatively impact the performance of unrefined method lookup. In the current implementation of refinements, a &quot;using&quot; call affects all of the following scopes related to where it is called:&lt;/div&gt;&lt;div&gt;&lt;ul style=&quot;text-align: left;&quot;&gt;&lt;li&gt;The direct scope, such as the top-level of a script, the body of a class, or the body of a method or block&lt;/li&gt;&lt;li&gt;Classes down-hierarchy from a refined class or module body&lt;/li&gt;&lt;li&gt;Bodies of code run via eval forms that change the &quot;self&quot; of the code, such as module_eval&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;It&#39;s worth emphasizing at this point that refinements can affect code far away from the original &quot;using&quot; call site. It goes without saying that refined method calls must now be aware of both the target type and the calling scope, but what of unrefined calls?&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;h2 style=&quot;text-align: left;&quot;&gt;Dynamic Scoping of Method Lookup&lt;/h2&gt;&lt;div&gt;Refinements (in their current form) basically cause method lookup to be dynamically scoped. 
In order to properly do a refined call, we need to know what refinements are active for the context in which the call is occurring and the type of the object we&#39;re calling against. The latter is simple, obviously, but determining the former turns out to be rather tricky.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h3 style=&quot;text-align: left;&quot;&gt;Locally-applied refinements&lt;/h3&gt;&lt;div&gt;In the simple case, where a &quot;using&quot; call appears alongside the methods we want to affect, the immediate calling scope contains everything we need. Calls in that scope (or in child scopes like method bodies) would perform method lookup based on the target class, a method name, and the hierarchy of scopes that surrounds them. The key for method lookup expands from a simple name to a name plus a call context.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h3 style=&quot;text-align: left;&quot;&gt;Hierarchically-applied refinements&lt;/h3&gt;&lt;div&gt;Refinements applied to a class must also affect subclasses, so even when we don&#39;t have a &quot;using&quot; call present we still may need to do refined dispatch. The following example illustrates this with a subclass of Foo (building off the previous example).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;script src=&quot;https://gist.github.com/4110634.js?file=ref_4.rb&quot;&gt;&lt;/script&gt; &lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Here, the camelize method is used within a &quot;map&quot; call, showing that refinements used by the Foo class apply to Bar, its method definitions, and any subscopes like blocks within those methods. It should be apparent now why my first example might not do what you expect. 
Here&#39;s my first example again, this time with the Quux class visible.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;script src=&quot;https://gist.github.com/4110634.js?file=ref_5.rb&quot;&gt;&lt;/script&gt; &lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The Quux class uses refinements from the BadRefinement module, effectively changing String#upcase to actually do String#reverse. By looking at the Baz class alone you can&#39;t tell what&#39;s supposed to happen, even if you are certain that str1 and str2 are always going to be Strings. Refinements have effectively localized the changes applied by the BadRefinement module, but they&#39;ve also made the code more difficult to understand; the programmer (or the reader of the code) must know everything about the calling hierarchy to reason about method calls and expected results.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h3 style=&quot;text-align: left;&quot;&gt;Dynamically-applied refinements&lt;/h3&gt;&lt;div&gt;One of the key goals of refinements is to allow block-based DSLs (domain-specific languages) to decorate various types of objects without affecting code outside the DSL. Consider, for example, an RSpec spec.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;script src=&quot;https://gist.github.com/4110634.js?file=ref_6.rb&quot;&gt;&lt;/script&gt; &lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;There are several calls here that we&#39;d like to refine.&lt;/div&gt;&lt;div&gt;&lt;ul style=&quot;text-align: left;&quot;&gt;&lt;li&gt;The &quot;describe&quot; method is called at the top of the script against the &quot;toplevel&quot; object (essentially a singleton Object instance). We&#39;d like to apply a refinement at this level so &quot;describe&quot; does not have to be defined on Object itself.&lt;/li&gt;&lt;li&gt;The &quot;it&quot; method is called within the block passed to &quot;describe&quot;.
We&#39;d like whatever self object is live inside that block to have an &quot;it&quot; method without modifying self&#39;s type directly.&lt;/li&gt;&lt;li&gt;The &quot;should&quot; method is called against an instance of MyClass, presumably a user-created class that does not define such a method. We would like to refine MyClass to have the &quot;should&quot; method only within the context of the block we pass to &quot;it&quot;.&lt;/li&gt;&lt;li&gt;Finally, the &quot;be_awesome&quot; method—which RSpec translates into a call to MyClass#awesome?—should be available on the self object active in the &quot;it&quot; block without actually adding be_awesome to self&#39;s type.&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;In order to do this without having a &quot;using&quot; present in the spec file itself, we need to be able to dynamically apply refinements to code that might otherwise not be refined. The current implementation does this via Module#module_eval (or its argument-receiving brother, Module#module_exec).&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;A block of code passed to &quot;module_eval&quot; or &quot;instance_eval&quot; will see its self object changed from that of the original surrounding scope (the self at block creation time) to the target class or module. This is frequently used in Ruby to run a block of code as if it were within the body of the target class, so that method definitions affect the &quot;module_eval&quot; target rather than the code surrounding the block.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;We can leverage this behavior to apply refinements to any block of code in the system. Because refined calls must look at the hierarchy of classes in the surrounding scope, every call in every block in every piece of code can potentially become refined in the future, if the block is passed via module_eval to a refined hierarchy. 
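&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The self-switching behavior that makes this possible is easy to observe in ordinary Ruby, with no refinements involved. Greeter is a made-up class for this sketch: the block passed to module_eval runs with Greeter as self, so its def adds an instance method to Greeter, and a receiverless call inside instance_eval is dispatched against the new self rather than against the object that created the block.&lt;/div&gt;&lt;pre&gt;&lt;code&gt;
```ruby
Greeter = Class.new

# At the top level of a script, self is the special "main" object.
self_at_block_creation = self

self_inside_block = Greeter.module_eval do
  # module_eval runs this block with self (and the target of def) set to
  # Greeter, so greet becomes an instance method of Greeter.
  def greet
    "hello from #{self.class}"
  end
  self
end

# instance_eval switches self to the string, so the receiverless
# upcase call is dispatched against "abc", not against main.
shouted = "abc".instance_eval { upcase }
```
&lt;/code&gt;&lt;/pre&gt;&lt;div&gt;Nothing in either block says which self it will run against; that decision belongs entirely to the method receiving the block.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;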
The following simple case might not do what you expect, even if the String class has not been modified directly.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;script src=&quot;https://gist.github.com/4110634.js?file=ref_7.rb&quot;&gt;&lt;/script&gt; &lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Because the &quot;+&quot; method is called within a block, all bets are off. The str_ary passed in might not be a simple Array; it could be any user class that implements the &quot;inject&quot; method. If that implementation chooses, it can force the incoming block of code to be refined. Here&#39;s a longer version with such an implementation visible.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;script src=&quot;https://gist.github.com/4110634.js?file=ref_8.rb&quot;&gt;&lt;/script&gt; &lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Suddenly, what looks like a simple addition of two strings produces a distinctly different result.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;script src=&quot;https://gist.github.com/4110634.js?file=ref_9.txt&quot;&gt;&lt;/script&gt; &lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Now that you know how refinements work, let&#39;s discuss the problems they create.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h2 style=&quot;text-align: left;&quot;&gt;Implementation Challenges&lt;/h2&gt;&lt;div&gt;Because I know that most users don&#39;t care if a new, useful feature makes my life as a Ruby implementer harder, I&#39;m not going to spend a great deal of time here.&amp;nbsp;My concerns revolve around the complexities of knowing when to do a refined call and how to discover those refinements.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Current Ruby implementations are all built around method dispatch depending solely on the target object&#39;s type, and much of the caching and optimization we do depends on that. 
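&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;A monomorphic call-site cache of the kind described earlier, guarded only by the receiver&#39;s class and a check that the hierarchy has not been mutated, can be modeled as follows. CallSite and HierarchySerial are hypothetical names, and a single global serial number stands in for the finer-grained invalidation a real VM would likely use. Note that nothing about the caller enters the guard.&lt;/div&gt;&lt;pre&gt;&lt;code&gt;
```ruby
# Hypothetical global counter, bumped whenever any class hierarchy changes.
module HierarchySerial
  @value = 0

  def self.value
    @value
  end

  def self.bump!
    @value += 1
  end
end

# Hypothetical monomorphic call site: caches one method for one receiver class.
class CallSite
  attr_reader :misses

  def initialize(name)
    @name = name
    @cached_class = nil
    @cached_serial = nil
    @cached_method = nil
    @misses = 0
  end

  def call(receiver, *args)
    klass = receiver.class
    # The only guards: same receiver class as last time, and no hierarchy
    # change since the method was cached. The caller is never consulted.
    if !klass.equal?(@cached_class) || @cached_serial != HierarchySerial.value
      @misses += 1
      @cached_method = klass.instance_method(@name)
      @cached_class = klass
      @cached_serial = HierarchySerial.value
    end
    @cached_method.bind(receiver).call(*args)
  end
end
```
&lt;/code&gt;&lt;/pre&gt;&lt;div&gt;Calling such a site twice with Strings costs one lookup; bumping the serial or switching to a Symbol receiver forces another.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;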
With refinements in play, we must also search and guard against types in the caller&#39;s context, which makes lookup much more complicated. Ideally we&#39;d be able to limit this complexity to only refined calls, but because &quot;using&quot; can affect code far away from where it is called, we often have no way to know whether a given call might be refined in the future. This is especially pronounced in the &quot;module_eval&quot; case, where code that isn&#39;t even in the same class hierarchy as a refinement must still observe it.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;There are numerous ways to address the implementation challenges.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h3 style=&quot;text-align: left;&quot;&gt;Eliminate the &quot;module_eval&quot; Feature&lt;/h3&gt;&lt;div&gt;At present, nobody knows of an easy way to implement the &quot;module_eval&quot; aspect of refinements. The current implementation in MRI does it in a brute-force way, flushing the global method cache on every execution and generating a new, refined, anonymous module for every call. Obviously this is not a feasible direction to go; block dispatch will happen very frequently at runtime, and we can&#39;t allow refined blocks to destroy performance for code elsewhere in the system.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The basic problem here is that in order for &quot;module_eval&quot; to work, every block in the system must be treated as a refined body of code all the time. That means that calls inside blocks throughout the system need to search and guard against the calling context even if no refinements are ever applied to them. The end result is that those calls suffer complexity and performance hits across the board.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;At the moment, I do not see (nor does anyone else see) an efficient way to handle the &quot;module_eval&quot; case. 
It should be removed.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h3 style=&quot;text-align: left;&quot;&gt;Localize the &quot;using&quot; Call&lt;/h3&gt;&lt;div&gt;No new Ruby feature should cause across-the-board performance hits; one solution is for refinements to be recognized at parse time. This makes it easy to keep existing calls the way they are and only impose refinement complexity upon method calls that are actually refined.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The simplest way to do this is also the most limiting and the most cumbersome: force &quot;using&quot; to only apply to the immediate scope. This would require every body of code to &quot;using&quot; a refinement if method calls in that body should be refined. Here&#39;s a couple of our previous examples with this modification.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;script src=&quot;https://gist.github.com/4110634.js?file=ref10.rb&quot;&gt;&lt;/script&gt; &lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This is obviously pretty ugly, but it makes implementation much simpler. In every scope where we see a &quot;using&quot; call, we simply force all future calls to honor refinements. Calls appearing outside &quot;using&quot; scopes do not get refined and perform calls as normal.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;We can improve this by making &quot;using&quot; apply to child scopes as well. This still provides the same parse-time &quot;pseudo-keyword&quot; benefit without the repetition.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;script src=&quot;https://gist.github.com/4110634.js?file=ref11.rb&quot;&gt;&lt;/script&gt; &lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Even better would be to officially make &quot;using&quot; a keyword and have it open a refined scope; that results in a clear delineation between refined and unrefined code. 
I show two forms of this below; the first opens a scope like &quot;class&quot; or &quot;module&quot;, and the second uses a &quot;do...end&quot; block form.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;script src=&quot;https://gist.github.com/4110634.js?file=ref12.rb&quot;&gt;&lt;/script&gt; &lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;It would be fair to say that requiring more explicit scoping of &quot;using&quot; would address my concern about knowing when to do a refined call. It does not, however, address the issue of locating active refinements at call time.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h3 style=&quot;text-align: left;&quot;&gt;Locating Refinements&lt;/h3&gt;&lt;div&gt;In each of the above examples, we still must pass some state from the calling context through to the method dispatch logic. Ideally we&#39;d only need to pass in the calling object, which is already passed through for visibility checking. This works for refined class hierarchies, but it does not work for the RSpec case, since the calling object in some cases is just the top-level Object instance (and remember we don&#39;t want to decorate Object).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;It turns out that there&#39;s already a feature in Ruby that follows lexical scoping: constant lookup. When Ruby code accesses a constant, the runtime must first search all lexically enclosing scopes for a definition of that constant. Failing that, the runtime will walk the class hierarchy of the innermost enclosing class or module. This is similar to what we want for the simplified version of refinements.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;If we assume we&#39;ve localized refinements to only calls within &quot;using&quot; scopes, then at parse time we can emit something like a RefinedCall for every method call in the code. A RefinedCall would be special in that it uses both the containing scope and the target class to look up a target method.
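&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;A rough plain-Ruby model of such a RefinedCall follows, with comments keyed to the four lookup steps listed below. Scope and RefinedCall are hypothetical; refinements are represented as hashes of lambdas rather than real refine blocks, and only exact-class matches are considered. The aim is simply to show lexical scopes being searched first, with ordinary dispatch as the fallback. The reversing upcase in the usage note mirrors the BadRefinement example from earlier in this post.&lt;/div&gt;&lt;pre&gt;&lt;code&gt;
```ruby
# Hypothetical lexical scope: a link to the enclosing scope plus an optional
# refinement table of the form { RefinedClass => { method_name => lambda } }.
Scope = Struct.new(:parent, :refinements)

# Hypothetical call node, emitted only for calls inside "using" scopes.
class RefinedCall
  def initialize(scope, name)
    @scope = scope
    @name = name
  end

  def call(receiver, *args)
    refined = find_refined_method(receiver.class)
    # Step 3: a refined method was found, so use it for the call.
    return refined.call(receiver, *args) if refined

    # Step 4: otherwise, proceed with normal lookup on the receiver's class.
    receiver.public_send(@name, *args)
  end

  private

  # Steps 1 and 2: walk the lexical scopes only (innermost first), looking
  # for a refinement of the receiver's exact class that defines the method.
  def find_refined_method(klass)
    scope = @scope
    while scope
      table = scope.refinements || {}
      impl = table.fetch(klass, {})[@name]
      return impl if impl

      scope = scope.parent
    end
    nil
  end
end
```
&lt;/code&gt;&lt;/pre&gt;&lt;div&gt;For instance, with an outer scope mapping String to { upcase: -&gt;(s) { s.reverse } }, a call made from a nested scope reverses &quot;abc&quot;, while the same call from an unrefined scope, or against a Symbol, falls through to the normal upcase.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;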
The lookup process would proceed as follows:&lt;/div&gt;&lt;div&gt;&lt;ol style=&quot;text-align: left;&quot;&gt;&lt;li&gt;Search the call&#39;s context for refinements, walking lexical scopes only&lt;/li&gt;&lt;li&gt;If refinements are found, search for the target method&lt;/li&gt;&lt;li&gt;If a refined method is found, use it for the call&lt;/li&gt;&lt;li&gt;Otherwise, proceed with normal lookup against the target object&#39;s class&lt;/li&gt;&lt;/ol&gt;&lt;div&gt;Because the parser has already isolated refinement logic to specific calls, the only change needed is to pass the caller&#39;s context through to method dispatch.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;h2 style=&quot;text-align: left;&quot;&gt;Usability Concerns&lt;/h2&gt;&lt;div&gt;There are indeed flavors of refinements that can be implemented reasonably efficiently, or at least implemented in such a way that unrefined code will not pay a price. I believe this is a requirement of any new feature: do no harm. But harm can come in a different form if a new feature makes Ruby code harder to reason about. I have some concerns here.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Let&#39;s go back to our &quot;module_eval&quot; case.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;script src=&quot;https://gist.github.com/4110634.js?file=ref_7.rb&quot;&gt;&lt;/script&gt; &lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Because there&#39;s no &quot;using&quot; anywhere in the code, and we&#39;re not extending some other class, most folks will assume we&#39;re simply concatenating strings here. After all, why would I expect my &quot;+&quot; call to do something else? Why &lt;b&gt;should&lt;/b&gt;&amp;nbsp;my &quot;+&quot; call ever do something else here?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Ruby has many features that might be considered a little &quot;magical&quot;. 
In most cases, they&#39;re only magic because the programmer doesn&#39;t have a good understanding of how they work. Constant lookup, for example, is actually rather simple...but if you don&#39;t know it searches both lexical and hierarchical contexts, you may be confused where values are coming from.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The &quot;module_eval&quot; behavior of refinements simply goes too far. It forces every Ruby programmer to second-guess every block of code they pass into someone else&#39;s library or someone else&#39;s method call. The guarantees of standard method dispatch no longer apply; you need to know if the method you&#39;re calling will change what calls your code makes. You need to understand the internal details of the target method. That&#39;s a terrible, terrible thing to do to Rubyists.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The same goes for refinements that are active down a class hierarchy. You can no longer extend a class and know that methods you call actually do what you expect. Instead, you have to know whether your parent classes or their ancestors refine some call you intend to make. I would argue this is considerably &lt;b&gt;worse&lt;/b&gt;&amp;nbsp;than directly monkey-patching some class, since at least in that case every piece of code has a uniform view.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The problems are compounded over time, too. As libraries you use change, you need to again review them to see if refinements are in play. You need to understand all those refinements just to be able to reason about your own code. 
And you need to hope and pray two libraries you&#39;re using don&#39;t define different refinements, causing one half of your application to behave one way and the other half of your application to behave another way.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I believe the current implementation of refinements introduces more complexity than it solves, mostly due to the lack of a strict lexical &quot;using&quot;. Rubyists should be able to look at a piece of code and know what it does based solely on the types of objects it calls. Refinements make that impossible.&lt;br /&gt;&lt;br /&gt;&lt;i style=&quot;font-weight: bold;&quot;&gt;Update:&lt;/i&gt;&amp;nbsp;Josh Ballanco points out another usability problem: &quot;using&quot; only affects method bodies defined temporally after it is called. For example, the following code only refines the &quot;bar&quot; method, not the &quot;foo&quot; method.&lt;br /&gt;&lt;br /&gt;&lt;script src=&quot;https://gist.github.com/4110634.js?file=ref13.rb&quot;&gt;&lt;/script&gt;&lt;br /&gt;This may simply be an artifact of the current implementation, or it may be specified behavior; it&#39;s hard to tell since there&#39;s no specification of any kind other than the implementation and a handful of tests. In any case, it&#39;s yet another confusing aspect, since it means the order in which code is loaded can actually change which refinements are active.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;h2 style=&quot;text-align: left;&quot;&gt;tl;dr&lt;/h2&gt;&lt;div&gt;My point here is not to beat down refinements. I agree there are cases where they&#39;d be very useful, especially given the sort of monkey-patching I&#39;ve seen in the wild. But the current implementation overreaches; it provides several features of questionable value, while simultaneously making both performance and understandability harder to achieve. 
Hopefully we&#39;ll be able to work with Matz and ruby-core to come up with a more reasonable, limited version of refinements...or else convince them not to include refinements in Ruby 2.0.&lt;/div&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.headius.com/feeds/5309576998658669333/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://blog.headius.com/2012/11/refining-ruby.html#comment-form' title='19 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/5309576998658669333'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/5309576998658669333'/><link rel='alternate' type='text/html' href='http://blog.headius.com/2012/11/refining-ruby.html' title='Refining Ruby'/><author><name>Charles Nutter</name><uri>https://plus.google.com/101599370339210456684</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh5.googleusercontent.com/-VT5b8KsgHgQ/AAAAAAAAAAI/AAAAAAAAFB0/9d5SU9AcHNQ/s512-c/photo.jpg'/></author><thr:total>19</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4704664917418794835.post-6804148115098747648</id><published>2012-10-15T11:26:00.001-07:00</published><updated>2012-10-15T11:40:18.324-07:00</updated><title type='text'>So You Want To Optimize Ruby</title><content type='html'>&lt;div dir=&quot;ltr&quot; style=&quot;text-align: left;&quot; trbidi=&quot;on&quot;&gt;I was recently asked for a list of &quot;hard problems&quot; a Ruby implementation really needs to solve before reporting benchmark numbers. You know...the sort of problems that might invalidate early perf numbers because they impact how you optimize Ruby. 
This post is a rework of my response...I hope you find it informative!&lt;br /&gt;&lt;h4 style=&quot;text-align: left;&quot;&gt;Fixnum to Bignum promotion&lt;/h4&gt;In Ruby, Fixnum math can promote to Bignum when the result is out of Fixnum&#39;s range. On implementations that use tagged pointers to represent Fixnum (MRI, Rubinius, MacRuby), the Fixnum range is somewhat less than the base CPU bits (32/64). On JRuby, Fixnum is always a straight 64-bit signed value.&lt;br /&gt;&lt;br /&gt;This promotion is a performance concern for a couple of reasons:&lt;br /&gt;&lt;ul style=&quot;text-align: left;&quot;&gt;&lt;li&gt;Every math operation that returns a new Fixnum must be range-checked. This slows all Fixnum operations.&lt;/li&gt;&lt;li&gt;It is difficult (if not impossible) to predict whether a Fixnum math operation will return a Fixnum or a Bignum. Since Bignum is always represented as a full object (not a primitive or a tagged pointer), this impacts optimizing Fixnum math call sites.&lt;/li&gt;&lt;/ul&gt;&lt;h4 style=&quot;text-align: left;&quot;&gt;Floating-point performance&lt;/h4&gt;A similar concern is the performance of floating-point values. Most of&amp;nbsp;the native implementations have tagged values for Fixnum, but only one&amp;nbsp;I know of (MacRuby) uses tagged values for Float. This can skew&amp;nbsp;expectations because an implementation may perform very well on integer math and&amp;nbsp;considerably worse on floating-point math due to the objects created (and collected). JRuby uses objects for both Fixnum and Float, so performance is roughly equivalent (and slower than I&#39;d like).&lt;br /&gt;&lt;h4 style=&quot;text-align: left;&quot;&gt;Closures&lt;/h4&gt;&lt;div&gt;Any language that supports closures (&quot;blocks&quot; in Ruby) has to deal with efficiently accessing frame-local data from calls down-stack. 
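To make the mutability requirement concrete, here is a minimal Ruby sketch (the method name count_calls is purely illustrative, not from the post): the lambda writes to a local variable owned by the method that created it, so a runtime cannot just copy the value into the closure; the method body and the lambda must share the same storage.

```ruby
# Hypothetical example: a closure that mutates a local of its defining frame.
def count_calls
  counter = 0
  # The lambda captures the variable counter itself, not a snapshot of its value.
  increment = lambda { counter += 1 }
  3.times { increment.call }
  # Writes made inside the lambda are visible here in the enclosing frame.
  counter
end

puts count_calls # prints 3
```

If counter had been copied into the lambda by value, count_calls would return 0; returning 3 is what forces a runtime without arbitrary frame access to keep such locals in a separate, shared structure.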
In Java, both anonymous inner classes and the upcoming lambda feature treat frame-local values (local variables, basically) as immutable...so their values can simply be copied into the closure object or carried along in some other way. In Ruby, local variables are always mutable, so an eventual activation of a closure body needs to be able to write into its containing frame. If a runtime does not support arbitrary frame access (as is the case on the JVM), it may have to allocate a separate data structure to represent those frame locals...and that impacts performance.&lt;/div&gt;&lt;h4 style=&quot;text-align: left;&quot;&gt;Bindings and eval&lt;/h4&gt;The eval methods in Ruby can usually accept an optional binding under which to run. This means any call to binding must return a fully functional execution environment, and in JRuby this means both eval and binding force a full deoptimization of the surrounding method body.&lt;br /&gt;&lt;br /&gt;There&#39;s an even more unpleasant aspect to this, however: every block can be used as a binding too.&lt;br /&gt;&lt;br /&gt;All blocks can be&amp;nbsp;turned into a Proc and used as bindings, which means every block in the&amp;nbsp;system has to have full access to values in the containing call frame. Most implementers hate this feature, since it means that optimizing call frames in the presence of blocks is much more difficult. Because they can be used as a binding, that of course means&amp;nbsp;literally all frame data must be accessible: local variables;&amp;nbsp;frame-local $ variables like $~; constant lookup environment; method visibility; and so on.&lt;br /&gt;&lt;h4 style=&quot;text-align: left;&quot;&gt;callcc and Continuation&lt;/h4&gt;JRuby doesn&#39;t implement callcc since the JVM doesn&#39;t support continuations, but any implementation hoping to optimize Ruby will have to take a stance here. 
Continuations obviously make optimization more difficult since you can branch into and out of execution contexts in rather unusual ways.&lt;br /&gt;&lt;h4 style=&quot;text-align: left;&quot;&gt;Fiber implementation&lt;/h4&gt;In JRuby, each Fiber runs on its own thread (though we pool the native thread to reduce Fiber spin-up costs). Other than that, they&amp;nbsp;operate pretty much like closures.&lt;br /&gt;&lt;br /&gt;A Ruby implementer needs to decide whether it will use C-style native stack juggling (which makes optimizations like frame elimination trickier to implement) or give Fibers their own stacks in which to execute independently.&lt;br /&gt;&lt;h4 style=&quot;text-align: left;&quot;&gt;Thread/frame/etc local $globals&lt;/h4&gt;Thread globals are easy, obviously. All(?) host systems already have some representation of thread-local values.&amp;nbsp;The tricky ones are explicit frame&amp;nbsp;globals like $~ and $_ and implicit frame-local values like&amp;nbsp;visibility, etc.&lt;br /&gt;&lt;br /&gt;In the case of $~ and $_, the challenge is not in representing accesses of them directly but in handling implicit reads and writes of them that cross call boundaries. For example, calling [] on a String and passing a Regexp will cause the caller&#39;s frame-local $~ (and related values) to be updated to the MatchData for the pattern match that happens inside [].
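A minimal Ruby sketch of that behavior (the method names grab_digits and outer_frame are hypothetical, chosen only for illustration): the match runs inside String#[], yet it is the calling method whose $~ is updated, while a frame one level further up keeps its own, untouched $~.

```ruby
# Hypothetical helper: the regexp match happens inside String#[],
# but $~ is set in this method's frame.
def grab_digits(str)
  str[/\d+/]
  $~ # MatchData produced by the match inside String#[]
end

# $~ is frame-local: the match made in grab_digits does not leak upward.
def outer_frame
  grab_digits("order 42")
  $~ # still nil, since no match ran in this frame
end

m = grab_digits("order 42")
puts m.class  # prints MatchData
puts m[0]     # prints 42
p outer_frame # prints nil
```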
There are a number of core Ruby methods like this that can reach back into the caller&#39;s frame and read or write these values. This obviously makes reducing or eliminating call frames very tricky.&lt;br /&gt;&lt;br /&gt;In JRuby, we track all core methods that read or write these values, and if we see those methods called in a body of code (the names, mind you...this is a static inspection), we will stand up a call frame for that body. This is not ideal. We would like to move these values into a separate stack that&#39;s lazily allocated only when actually needed, since methods that cross frames like String#[] force other methods like Array#[] to deoptimize too.&lt;br /&gt;&lt;h4 style=&quot;text-align: left;&quot;&gt;C extension support&lt;/h4&gt;If a given Ruby implementation is likely to fit into the &quot;native&quot; side of Ruby&amp;nbsp;implementations (as opposed to implementations like JRuby or IronRuby that target an existing managed runtime), it will need to have a C extension story.&lt;br /&gt;&lt;br /&gt;Ruby&#39;s C&amp;nbsp;extension API is easier to support than some languages&#39; native APIs (e.g. no reference-counting as in Python)&amp;nbsp;but it still very much impacts how a runtime optimizes. Because the API needs to return forever-valid object references, implementations that don&#39;t give out pointers will have to maintain a handle table. The API includes a number of macros that provide access to object internals; they&#39;ll need to be simulated or explicitly unsupported. And the API makes no guarantees about concurrency and provides few primitives for controlling concurrent execution, so most implementations will need to lock around native downcalls.&lt;br /&gt;&lt;br /&gt;An alternative for a new Ruby implementation is to expect extensions to be written in the host runtime&#39;s native language (Java or other JVM languages for JRuby; C# or other .NET languages for IronRuby, etc). 
However, this imposes a burden on folks implementing language extensions, since they&#39;ll have to support yet another language to cover all Ruby implementations.&lt;br /&gt;&lt;br /&gt;Ultimately, though, the unfortunate fact for most &quot;native&quot; impls is that regardless of how fast&amp;nbsp;you can run Ruby code, the choke point is often going to be the C API&amp;nbsp;emulation, since it will require a lot of handle-juggling and indirection&amp;nbsp;compared to MRI. So without supporting the C API, there&#39;s a very large&amp;nbsp;part of the story missing...a part of the story that accesses frame&amp;nbsp;locals, closure bodies, bindings, and so on.&lt;br /&gt;&lt;br /&gt;Of course, if you can run Ruby code as fast as C, maybe it won&#39;t&amp;nbsp;matter. :) Users can just implement their extensions in Ruby.&amp;nbsp;JRuby is starting to approach that kind of performance for non-numeric,&amp;nbsp;non-closure cases, but that sort of perf is not yet widespread enough to&amp;nbsp;bank on.&lt;br /&gt;&lt;h4 style=&quot;text-align: left;&quot;&gt;Ruby 1.9 encoding support&lt;/h4&gt;Any benchmark that touches anything relating to binary text&amp;nbsp;data must have encoding support, or you&#39;re really fudging the&amp;nbsp;numbers. Encoding touches damn near everything, and can add a significant amount of overhead to String-manipulating benchmarks.&lt;br /&gt;&lt;h4 style=&quot;text-align: left;&quot;&gt;Garbage collection and object allocation&lt;/h4&gt;It&#39;s easy for a new impl to show good performance on benchmarks that&amp;nbsp;do no allocation (or little allocation) and require no GC, like raw numerics (fib, tak, etc).&amp;nbsp;MacRuby and Rubinius, for example, really shine here. 
But many impls&amp;nbsp;have drastically different performance when an algorithm starts&amp;nbsp;allocating objects.&amp;nbsp;Very&amp;nbsp;few applications are doing pure integer numeric algorithms, so object allocation and GC performance are an absolutely critical part of the performance story.&lt;br /&gt;&lt;h4 style=&quot;text-align: left;&quot;&gt;Concurrency / Parallelism&lt;/h4&gt;If you intend to be an impl that supports parallel thread execution,&amp;nbsp;you&#39;re going to have to deal with various issues before publishing&amp;nbsp;numbers. For example, threads can #kill or #raise each other, which in a truly parallel runtime requires periodic safepoints/pings to know&amp;nbsp;whether a cross-thread event has fired. If you&#39;re not handling those&amp;nbsp;safepoints, you&#39;re not telling the whole story, since they impact execution.&lt;br /&gt;&lt;br /&gt;There&#39;s also the thread-safety of runtime structures to be considered. As an example,&amp;nbsp;Rubinius until recently had a hard lock around a data structure responsible for invalidating call sites, which&amp;nbsp;meant that its simple inline cache could see a severe performance&amp;nbsp;degradation at polymorphic call sites (they&#39;ve since added polymorphic caching to ameliorate this case). The thread-safety of a Ruby implementation&#39;s core runtime structures can drastically impact even straight-line, non-concurrent performance.&lt;br /&gt;&lt;br /&gt;Of course, for an impl that doesn&#39;t support parallel execution (which&amp;nbsp;would put it in the somewhat more limited realm of MRI), you can get away with GIL&amp;nbsp;scheduling tricks. You just won&#39;t have a very good in-process scaling story.&lt;br /&gt;&lt;h4 style=&quot;text-align: left;&quot;&gt;Tracing/debugging&lt;/h4&gt;All current impls support tracing or debugging APIs, though some (like JRuby) require you to enable support for them via command-line or compile-time flags. 
A Ruby implementation needs to have an answer for&amp;nbsp;this, since the runtime-level hooks required will have an impact...and may&amp;nbsp;require users to opt-in.&lt;br /&gt;&lt;h4&gt;ObjectSpace&lt;/h4&gt;ObjectSpace#each_object needs to be addressed before talking about&amp;nbsp;performance. In JRuby, supporting each_object over arbitrary types was&amp;nbsp;a major performance issue, since we had to track all objects in a&amp;nbsp;separate data structure in case they were needed. We ultimately&amp;nbsp;decided each_object would only work with Class and Module, since those&amp;nbsp;were the major practical use cases (and tracking Class/Module hierarchies is far easier than tracking all objects in the system).&lt;br /&gt;&lt;br /&gt;Depending on how a Ruby implementation tracks in-memory objects (and depending on the level of accuracy expected from ObjectSpace#each_object) this can impact how allocation logic and GC are optimized.&lt;br /&gt;&lt;h4 style=&quot;text-align: left;&quot;&gt;Method invalidation&lt;/h4&gt;Several implementations can see severe global effects due to methods like Object#extend&amp;nbsp;blowing all global caches (or at least several caches), so you need to be&amp;nbsp;able to support #extend in a reasonable way before talking about&amp;nbsp;performance. Singleton objects also have a similar effect, since they&amp;nbsp;alter the character of method caches by introducing new anonymous types at&amp;nbsp;any time (and sometimes, in rapid succession).&lt;br /&gt;&lt;br /&gt;In JRuby, singleton and #extend effects are limited to the call sites that see them. I also have an experimental branch that&#39;s smarter about type identity, so simple anonymous types (that have only had modules included or extended into them) will not damage caches at all. 
Hopefully we&#39;ll land that in a future release.&lt;br /&gt;&lt;h4 style=&quot;text-align: left;&quot;&gt;Constant lookup and invalidation&lt;/h4&gt;I believe all implementations have implemented constant cache&amp;nbsp;invalidation as a global invalidation, though there are other, more&amp;nbsp;complicated ways to do it. The main challenge is the fact that constant lookup is tied to both lexical scope and class hierarchy, so invalidating individual constant lookup sites is usually infeasible. Constant lookup is also rather tricky&amp;nbsp;and must be implemented correctly&amp;nbsp;before talking about the performance of any benchmark that references&amp;nbsp;constants.&lt;br /&gt;&lt;h4 style=&quot;text-align: left;&quot;&gt;Rails&lt;/h4&gt;&lt;div&gt;Finally, regardless of how awesome a new Ruby implementation claims to be, most users will simply ask &quot;but does it run Rails?&quot; You can substitute your favorite framework or library, if you like...the bottom line is that an awesome Ruby implementation that doesn&#39;t run any Ruby applications is basically useless. 
Beware of crowing about your victory over Ruby performance before you can run code people actually care about.&lt;/div&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.headius.com/feeds/6804148115098747648/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://blog.headius.com/2012/10/so-you-want-to-optimize-ruby.html#comment-form' title='17 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/6804148115098747648'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/6804148115098747648'/><link rel='alternate' type='text/html' href='http://blog.headius.com/2012/10/so-you-want-to-optimize-ruby.html' title='So You Want To Optimize Ruby'/><author><name>Charles Nutter</name><uri>https://plus.google.com/101599370339210456684</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh5.googleusercontent.com/-VT5b8KsgHgQ/AAAAAAAAAAI/AAAAAAAAFB0/9d5SU9AcHNQ/s512-c/photo.jpg'/></author><thr:total>17</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4704664917418794835.post-5284790001981835990</id><published>2012-09-26T14:03:00.003-07:00</published><updated>2012-09-26T14:06:40.428-07:00</updated><title type='text'>Explanation of Warnings From MRI&#39;s Test Suite</title><content type='html'>&lt;div dir=&quot;ltr&quot; style=&quot;text-align: left;&quot; trbidi=&quot;on&quot;&gt;JRuby has, for some time now, run the same &lt;a href=&quot;https://github.com/jruby/jruby/tree/master/test/externals/ruby1.9&quot;&gt;test suite as MRI&lt;/a&gt; (C Ruby, Matz&#39;s Ruby). 
Because not all tests pass, we use &lt;a href=&quot;https://github.com/seattlerb/minitest-excludes&quot;&gt;minitest-excludes&lt;/a&gt; to mask out the failures, and over time we unmask stuff as we fix it.&lt;br /&gt;&lt;br /&gt;However, there are a number of warnings we get from the suite that are nonfatal and unmaskable. I thought I&#39;d show them to you and tell their stories.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;JRuby 1.9 mode only supports the `psych` YAML engine; ignoring `syck`&lt;/b&gt;&lt;br /&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;div&gt;When we started implementing support for the new &quot;psych&quot; YAML engine that Aaron Patterson created (atop libyaml) for Ruby 1.9, we decided that we would not support the broken &quot;syck&quot; engine anymore. The libyaml version is strictly YAML spec compliant, and this is our contribution to ridding the world of &quot;syck&quot;&#39;s broken YAML forever.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;GC.stress= does nothing on JRuby&lt;/b&gt;&lt;br /&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/div&gt;&lt;div&gt;JRuby does not have direct control over the JVM&#39;s GC, and so we can&#39;t implement things like GC.stress=, which MRI uses to put the GC into &quot;stress&quot; mode (GCing much more frequently to better test GC stability and behavior). There are flags for the JVM to do this sort of testing, but since we don&#39;t really need to test the JVM&#39;s GC for correctness and stability, we have not exposed those flags directly.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This flag is used in a number of MRI tests to force GC to happen more often and/or to actually test GC behaviors.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;SAFE levels are not supported in JRuby&lt;/b&gt;&lt;br /&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/div&gt;&lt;div&gt;JRuby does not support standard Ruby&#39;s security model, &quot;safe levels&quot;, because we believe safe levels are a flawed, too-coarse mechanism. 
On JRuby, you can use standard Java security policies.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;We have debated mapping the various Ruby safe levels to equivalent sets of Java security permissions, but have never gotten around to it.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;GC.enable does nothing on JRuby / GC.disable does nothing on JRuby&lt;/b&gt;&lt;br /&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/div&gt;&lt;div&gt;There&#39;s no standard API on the JVM to disable the garbage collector completely, so GC.enable and GC.disable do nothing in JRuby.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;It&#39;s also interesting to note that while you &lt;b&gt;can&lt;/b&gt;&amp;nbsp;request a GC run from the JVM by calling System.gc, JRuby also stubs out Ruby&#39;s GC.start. We opted to do this because GC.start is used in some Ruby libraries as a band-aid around Ruby&#39;s sometimes-slow GC, but the same call on JRuby is both unnecessary (because GC overhead is rarely a problem) and a major performance hit (because it triggers a full GC over the entire heap).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.headius.com/feeds/5284790001981835990/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://blog.headius.com/2012/09/explanation-of-warnings-from-mris-test.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/5284790001981835990'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/5284790001981835990'/><link rel='alternate' type='text/html' href='http://blog.headius.com/2012/09/explanation-of-warnings-from-mris-test.html' title='Explanation of Warnings From MRI&#39;s Test Suite'/><author><name>Charles 
Nutter</name><uri>https://plus.google.com/101599370339210456684</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh5.googleusercontent.com/-VT5b8KsgHgQ/AAAAAAAAAAI/AAAAAAAAFB0/9d5SU9AcHNQ/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4704664917418794835.post-4242550722639887199</id><published>2012-09-16T21:36:00.000-07:00</published><updated>2012-09-16T21:46:42.441-07:00</updated><title type='text'>An experiment in static compilation of Ruby: FASTRUBY!</title><content type='html'>&lt;div dir=&quot;ltr&quot; style=&quot;text-align: left;&quot; trbidi=&quot;on&quot;&gt;While at GoGaRuCo this weekend, I finally made good on an experiment I had been thinking about for a while: &lt;a href=&quot;https://github.com/headius/fastruby&quot;&gt;a static compiler for Ruby&lt;/a&gt;. I thought I&#39;d share it with you good people today.&lt;br /&gt;&lt;br /&gt;First we have a simple Ruby script with a class in it:&lt;br /&gt;&lt;br /&gt;&lt;script src=&quot;https://gist.github.com/3735400.js?file=hello.rb&quot;&gt;&lt;/script&gt;&lt;br /&gt;&lt;br /&gt;We compile it with fastruby, and it produces two .java source files: Hello.java and RObject.java.&lt;br /&gt;&lt;br /&gt;Hello.java implements the methods the Ruby class does in the script, and calls the same methods (with some mangling for invalid Java method names like _plus_ and _lt_).&lt;br /&gt;&lt;br /&gt;&lt;script src=&quot;https://gist.github.com/3735400.js?file=Hello.java&quot;&gt;&lt;/script&gt;&lt;br /&gt;&lt;br /&gt;RObject.java implements stubs for &lt;u&gt;all&lt;/u&gt; method names seen in the script. As a result, all dynamic calls can just be virtual invocations against RObject. Classes that implement one of the methods will just work and the call is direct. 
Classes that don&#39;t implement the called method will raise an error.&lt;br /&gt;&lt;br /&gt;&lt;script src=&quot;https://gist.github.com/3735400.js?file=RObject.java&quot;&gt;&lt;/script&gt;&lt;br /&gt;&lt;br /&gt;RKernel comes with fastruby, and provides Kernel-level methods like &quot;puts&quot;, plus methods for coercing to Java types like toBoolean and toString. It also caches some built-in singleton values like nil.&lt;br /&gt;&lt;br /&gt;&lt;script src=&quot;https://gist.github.com/3735400.js?file=RKernel.java&quot;&gt;&lt;/script&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;And there are a few other classes needed for this script to work. It should be easy to see how we could fill them out to do everything the equivalent Ruby classes do.&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;script src=&quot;https://gist.github.com/3735400.js?file=RFixnum.java&quot;&gt;&lt;/script&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;script src=&quot;https://gist.github.com/3735400.js?file=RString.java&quot;&gt;&lt;/script&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;script src=&quot;https://gist.github.com/3735400.js?file=RBoolean.java&quot;&gt;&lt;/script&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I don&#39;t have any support for a &quot;main&quot; method yet, so I wrote a little runner script to test it.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;script src=&quot;https://gist.github.com/3735400.js?file=HelloRunner.java&quot;&gt;&lt;/script&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;And away we go!&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;script src=&quot;https://gist.github.com/3735400.js?file=output.txt&quot;&gt;&lt;/script&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;This is about 30% faster than JRuby with invokedynamic. 
It is not doing any bounds checking (for rolling over to Bignum), but it is also not caching 1...256 Fixnum objects like JRuby does, nor caching them in any calls along the way (note that it creates three new RFixnums for every recursion that JRuby would not recreate). I call that pretty good.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Obviously, because this is designed to compile the whole system at once, we could also emit optimized versions of methods that look like they&#39;re doing math. That is yet to come, if I continue this little experiment at all.&lt;br /&gt;&lt;br /&gt;There are also some fun possibilities here. By specifying Java types, the compiler could add normal Java methods. Implementing interfaces could be done directly. And Android applications built with this tool would be entirely statically optimizable, only shipping the small amount of code they actually call and having a very minimal runtime.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Pretty neat?&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.headius.com/feeds/4242550722639887199/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://blog.headius.com/2012/09/an-experiment-in-static-compilation-of.html#comment-form' title='24 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/4242550722639887199'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/4242550722639887199'/><link rel='alternate' type='text/html' href='http://blog.headius.com/2012/09/an-experiment-in-static-compilation-of.html' title='An experiment in static compilation of Ruby: FASTRUBY!'/><author><name>Charles Nutter</name><uri>https://plus.google.com/101599370339210456684</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' 
width='32' height='32' src='//lh5.googleusercontent.com/-VT5b8KsgHgQ/AAAAAAAAAAI/AAAAAAAAFB0/9d5SU9AcHNQ/s512-c/photo.jpg'/></author><thr:total>24</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4704664917418794835.post-7858745721921352272</id><published>2012-09-04T01:00:00.002-07:00</published><updated>2012-09-04T01:02:34.459-07:00</updated><title type='text'>Avoiding Hash Lookups in a Ruby Implementation</title><content type='html'>&lt;div dir=&quot;ltr&quot; style=&quot;text-align: left;&quot; trbidi=&quot;on&quot;&gt;I had an interesting realization tonight: I&#39;m terrified of hash tables. Specifically, my work on JRuby (and even more directly, my work optimizing JRuby) has made me terrified to ever consider using a hash table in the hot path of any program or piece of code if there&#39;s any possibility of eliminating it. And what I&#39;ve learned over the years is that the vast majority of execution-related (as opposed to data-related, purely dynamic-sourced lookup tables) hash tables are totally unnecessary.&lt;br /&gt;&lt;br /&gt;Some background might be interesting here.&lt;br /&gt;&lt;h2 style=&quot;text-align: left;&quot;&gt;Hashes are a Language Designer&#39;s First Tool&lt;/h2&gt;&lt;div&gt;Anyone who&#39;s ever designed a simple language knows that pretty much everything you do is trivial to implement as a hash table. Dynamically-expanding tables of functions or methods? Hash table! Variables? Hash table! Globals? Hash table!&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In fact, some languages never graduate beyond this phase and remain essentially gobs and gobs of hash tables even in fairly recent implementations. 
I won&#39;t name your favorite language here, but I will name one of mine: Ruby.&lt;/div&gt;&lt;h2 style=&quot;text-align: left;&quot;&gt;Ruby: A Study in Hashes All Over the Freaking Place&lt;/h2&gt;&lt;div&gt;As with many dynamic languages, early (for some definition of &quot;early&quot;) implementations of Ruby used hash tables all over the place. Let&#39;s just take a brief tour through the many places hash tables are used in Ruby 1.8.7.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;(Author&#39;s note: 1.8.7 is now, by most measures, the &quot;old&quot; Ruby implementation, having been largely supplanted by the 1.9 series, which boasts a &quot;real&quot; VM and optimizations to avoid most hot-path hash lookup.)&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In Ruby (1.8.7), all of the following are (usually) implemented using hash lookups (and of these, many are hash lookups nearly every time, without any caching constructs):&lt;/div&gt;&lt;div&gt;&lt;ul style=&quot;text-align: left;&quot;&gt;&lt;li&gt;Method Lookup: Ruby&#39;s class hierarchy is essentially a tree of hash tables that contain, among other things, methods. Searching for a method involves searching the target object&#39;s class. If that fails, you must search the parent class, and so on. In the absence of any sort of caching, this can mean you search all the way up to the root of the hierarchy (Object or Kernel, depending on what you consider root) to find the method you need to invoke. This is also known as &quot;slow&quot;.&lt;/li&gt;&lt;li&gt;Instance Variables: In Ruby, you do not declare ahead of time what variables a given class&#39;s object instances will contain. Instead, instance variables are allocated as they&#39;re assigned, like a hash table. 
And in fact, most Ruby implementations still use a hash table for variables under some circumstances, even though most of these variables can be statically determined ahead of time or dynamically determined (to static ends) at runtime.&lt;/li&gt;&lt;li&gt;Constants: Ruby&#39;s constants are actually &quot;mostly&quot; constant. They&#39;re a bit more like &quot;const&quot; in C, assignable once and never assignable again. Except that they &lt;b&gt;are&lt;/b&gt;&amp;nbsp;assignable again through various mechanisms. In any case, constants are also not declared ahead of time and are not purely a hierarchically-structured construct (they are both lexically and hierarchically scoped), and as a result the simplest implementation is a hash table (or chains of hash tables), once again.&lt;/li&gt;&lt;li&gt;Global Variables: Globals are frequently implemented as a top-level hash table even in modern, optimized languages. They&#39;re also evil and you shouldn&#39;t use them, so most implementations don&#39;t even bother making them anything other than a hash table.&lt;/li&gt;&lt;li&gt;Local Variables: Oh yes, Ruby has not been immune to the greatest evil of all: purely hash table-based local variables. A &quot;pure&quot; version of Python would have to do the same, although in practice no implementations really support that (and yes, you can manipulate the execution frame to gain &quot;hash-like&quot; behavior for Python locals, but you must surrender your Good Programmer&#39;s Card if you do). 
In Ruby&#39;s defense, however, hash tables were only ever used for closure scopes (blocks, etc), and no modern implementations of Ruby use hash tables for locals in any way.&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;There are other cases (like class variables) that are less interesting than these, but this list serves to show how easy it is for a language implementer to fall into the &quot;everything&#39;s a hash, dude!&quot; hole, only to find they have an incredibly flexible and totally useless language. Ruby is not such a language, and almost all of these cases can be optimized into largely static, predictable code paths with nary a hash calculation or lookup to be found.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;How? I&#39;m glad you asked.&lt;/div&gt;&lt;h2 style=&quot;text-align: left;&quot;&gt;JRuby: The Quest For Fewer Hashes&lt;/h2&gt;&lt;div&gt;If I were to sum up the past 6 years I&#39;ve spent optimizing JRuby (and learning how to optimize dynamic languages) it would be with the following phrase: Get Rid Of Hash Lookups.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;When I tweeted about this realization yesterday, I got a few replies back about better hashing algorithms (e.g. 
&quot;perfect&quot; hashes) and a few replies from puzzled folks (&quot;what&#39;s wrong with hashes?&quot;), which made me realize that it&#39;s not always apparent how unnecessary most (execution-related) hash lookups really are (and from now on, when I talk about unnecessary or optimizable hash lookups, I&#39;m talking about execution-related hash lookups; you data folks can get off my back right now).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So perhaps we should talk a little about why hashes are bad in the first place.&lt;/div&gt;&lt;h2 style=&quot;text-align: left;&quot;&gt;What&#39;s Wrong With a Little Hash, Bro?&lt;/h2&gt;&lt;div&gt;The most obvious problem with using hash tables is the mind-crunching frustration of finding THE PERFECT HASH ALGORITHM. Every year there&#39;s a new way to calculate String hashes, for example, that&#39;s [ better | faster | securer | awesomer ] than all precedents. JRuby, along with many other languages, actually released a security fix last year to patch the great hash collision DoS exploit so many folks made a big deal about (while us language implementers just sighed and said &quot;maybe you don&#39;t actually want a hash table here, kids&quot;). Now, the implementation we put in place has again been &quot;exploited&quot; and we&#39;re told we need to move to cryptographic hashing. Srsly? How about we just give you a crypto-awesome-mersenne-randomized hash impl you can use for all your outward-facing hash tables and you can leave us the hell alone?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;But I digress.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Obviously the cost of calculating hash codes is the first sin of a hash table. The second sin is deciding how, based on that hash code, you will distribute buckets. Too many buckets and you&#39;re wasting space. Too few and you&#39;re more likely to have a collision. 
Ahh, the intricate dance of space and time plagues us forever.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Ok, so let&#39;s say we&#39;ve got some absolutely smashing hash algorithm and foresight enough to balance our buckets so well we make Lady Justice shed a tear. We&#39;re still screwed, my friends, because we&#39;ve almost certainly defeated the prediction and optimization capabilities of our VM or our M, and we&#39;ve permanently signed over performance in exchange for ease of implementation.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;It is conceivable that a really good machine can learn our hash algorithm really well, but in the case of string hashing we still have to walk &lt;b&gt;some&lt;/b&gt;&amp;nbsp;memory to give us reasonable assurance of unique hash codes. So there&#39;s performance sin #1 violated: never read from memory.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Even if we ignore the cost of calculating a hash code, which at worst requires reading some object data from memory and at best requires reading a cached hash code from elsewhere in memory, we have to contend with how the buckets are implemented. Most hash tables implement the buckets as either of the typical list forms: an array (contiguous memory locations in a big chunk, so each element must be dereferenced...O(1) complexity) or a linked list (one entry chaining to the next through some sort of memory dereference, leading to O(N) complexity for searching collided entries).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Assuming we&#39;re using simple arrays, we&#39;re &lt;b&gt;still&lt;/b&gt;&amp;nbsp;making life hard for the machine since it has to see through at least one and possibly several mostly-opaque memory references. By the time we&#39;ve got the data we&#39;re after, we&#39;ve done a bunch of memory-driven calculations to find a chain of memory dereferences. 
And you wanted this to be fast?&lt;/div&gt;&lt;h2 style=&quot;text-align: left;&quot;&gt;Get Rid Of The Hash&lt;/h2&gt;&lt;div&gt;Early attempts (of mine and others) to optimize JRuby centered around making hashing as cheap as possible. We made sure our tables only accepted interned strings, so we could guarantee they&#39;d already calculated and cached their hash values. We used the &quot;programmer&#39;s hash&quot;, switch statements, to localize hash lookups closer to the code performing them, rather than trying to balance buckets. We explored complicated implementations of hierarchical hash tables that &quot;saw through&quot; to parents, so we could represent hierarchical method table relationships in (close to) O(1) complexity.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;But we were missing the point. The problem was in our representing any of these language features as hash tables to begin with. And so we started working toward the implementation that has made JRuby actually become the fastest Ruby implementation: eliminate all hash lookups from hot execution paths.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;How? Oh right, that&#39;s what we were talking about. I&#39;ll tell you.&lt;/div&gt;&lt;h2 style=&quot;text-align: left;&quot;&gt;Method Tables&lt;/h2&gt;&lt;div&gt;I mentioned earlier that in Ruby, each class contains a method table (a hash table from method name to a piece of code that it binds) and method lookup proceeds up the class hierarchy. What I didn&#39;t tell you is that both the method tables and the hierarchy are mutable at runtime.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Hear that sound? It&#39;s the static-language fanatics&#39; heads exploding. Or maybe the &quot;everything must be mutable always forever or you are a very bad monkey&quot; fanatics. 
Whatever.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Ruby is what it is, and the ability to mix in new method tables and patch existing method tables at runtime is part of what makes it attractive. Indeed, it&#39;s a huge part of what made frameworks like Rails possible, and also a huge reason why other more static (or more reasonable, depending on how you look at it) languages have had such difficulty replicating Rails&#39; success.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Mine is not to reason why. Mine is but to do and die. I have to make it fast.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Proceeding from the naive implementation, there are certain truths we can hold at various times during execution:&lt;/div&gt;&lt;div&gt;&lt;ul style=&quot;text-align: left;&quot;&gt;&lt;li&gt;Most method table and hierarchy manipulation will happen early in execution. This was true when I started working on JRuby and it&#39;s largely true now, in no small part due to the fact that optimizing method tables and hierarchies that are wildly different all the time is really, really hard (so no implementer does it, so no user should do it). Before you say it: even prototype-based languages like JavaScript that appear to have no fixed structure do indeed settle into a finite set of predictable, optimizable &quot;shapes&quot; which VMs like V8 can take advantage of.&lt;/li&gt;&lt;li&gt;When changes do happen, they only affect a limited set of observers. Specifically, only call sites (the places where you actually make calls in code) need to know about the changes, and even they only need to know about them if they&#39;ve already made some decision based on the old structure.&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;So we can assume method hierarchy structure is mostly static, and when it isn&#39;t, there&#39;s only a limited set of cases where we care. 
How can we exploit that?&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;First, we implement what&#39;s called an &quot;inline cache&quot; at the call sites. In other words, every place where Ruby code makes a method call, we keep a slot in memory for the most recent method we looked up. In another quirk of fate, it turns out most calls are &quot;monomorphic&quot; (&quot;one shape&quot;) so caching more than one is &lt;b&gt;usually&lt;/b&gt;&amp;nbsp;not beneficial.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;When we revisit the cache, we need to know we&#39;ve still got the right method. Obviously it would be stupid to do a full search of the target object&#39;s class hierarchy all over again, so what we want is to simply be able to examine the type of the object and know we&#39;re ok to use the same method. In JRuby, this is (usually) done by assigning a unique serial number to every class in the system, and caching that serial number along with the method at the call site.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Oh, but wait...how do we know if the class or its ancestors have been modified?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;A simple implementation would be to keep a single global serial number that gets spun every time any method table or class hierarchy anywhere in the system is modified. If we assume that those changes eventually stop, this is good enough; the system stabilizes, the global serial number never changes, and all our cached methods are safely tucked away for the machine to branch-predict and optimize to death. This is how Ruby 1.9.3 optimizes inline caches (and I believe Ruby 2.0 works the same way).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Unfortunately, our perfect world isn&#39;t quite so perfect. 
Methods do get defined at runtime, especially in Ruby where people often create one-off &quot;singleton methods&quot; that only redefine a couple methods for very localized use. We don&#39;t want such changes to blow all inline caches everywhere, do we?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Let&#39;s split up the serial number by method name. That way, if you are only redefining the &quot;foobar&quot; method on your singletons, only inline caches for &quot;foobar&quot; calls will be impacted. Much better! This is how Rubinius implements cache invalidation.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Unfortunately again, it turns out that the methods people override on singletons are very often common methods like &quot;hash&quot; or &quot;to_s&quot; or &quot;inspect&quot;, which means that a purely name-based invalidator still causes a large number of call sites to fail. Bummer.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In JRuby, we went through the above mechanisms and several others, finally settling on one that allows us to only ever invalidate the call sites that &lt;b&gt;actually&lt;/b&gt;&amp;nbsp;called a given method against a given type. And it&#39;s actually pretty simple: we spin the serial numbers on the individual classes, rather than in any global location.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Every Ruby class has one parent and zero or more children. The parent connection is obviously a hard link, since at various points during execution we need to be able to walk up the class hierarchy. In JRuby, we also add a &lt;b&gt;weak&lt;/b&gt;&amp;nbsp;link from parents to children, updated whenever the hierarchy changes. 
This allows changes anywhere in a class hierarchy to cascade down to all children, localizing changes to just that subhierarchy rather than inflicting the damage upon more global scopes.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Essentially, by actively invalidating down-hierarchy classes&#39; serial numbers, we automatically know that matching serial numbers at call sites mean the cached method is 100% ok to use. We have reduced O(N) hierarchically-oriented hash table lookups to a single identity check. Victory!&lt;/div&gt;&lt;h2 style=&quot;text-align: left;&quot;&gt;Instance Variables&lt;/h2&gt;&lt;div&gt;Optimizing method lookups actually turned out to be the easiest trick we had to pull. Instance variables defied optimization for a good while. Oddly enough, most Ruby implementations stumbled on a reasonably simple mechanism at the same time.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Ruby instance variables can be thought of as C++ or Java fields that only come into existence at runtime, when code actually starts using them. And where C++ and Java fields can be optimized right into the object&#39;s structure, Ruby instance variables have typically been implemented as a hash table that can grow and adapt to a program as it runs.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Using a hash table for instance variables has some obvious issues:&lt;/div&gt;&lt;div&gt;&lt;ul style=&quot;text-align: left;&quot;&gt;&lt;li&gt;The aforementioned performance costs of using hashes&lt;/li&gt;&lt;li&gt;Space concerns; a collection of buckets already consumes space for some sort of table, and too many buckets means you are using &lt;b&gt;way&lt;/b&gt;&amp;nbsp;more space per object than you want&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;At first you might think this problem can be tackled exactly the same way as method lookup, but you&#39;d be wrong. What do we cache at the call site? 
It&#39;s not code we need to keep close to the point of use; it&#39;s the steps necessary to reach a point in a given object where a value is stored (ok, that could be considered code...just bear with me for a minute).&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;There are, however, truths we can exploit in this case as well.&lt;/div&gt;&lt;div&gt;&lt;ul style=&quot;text-align: left;&quot;&gt;&lt;li&gt;A given class of objects will generally reference a small, finite number of variable names during the lifetime of a given program.&lt;/li&gt;&lt;li&gt;If a variable is accessed once, it is very likely to be accessed again.&lt;/li&gt;&lt;li&gt;The set of variables used by a particular class of objects is largely unique to that class of objects.&lt;/li&gt;&lt;li&gt;The majority of the variables ever to be accessed can be determined by inspecting the code contained in that class and its superclasses.&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;This gives us a lot to work with. Because we can localize the set of variables to a given class, we can store something at the class level. How about the actual layout of the values in object instances of that class?&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This is how most current implementations of Ruby actually work.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In JRuby, as instance variables are first assigned, we bump a counter on the class that indicates an offset into an instance variable table associated with instances of that class. Eventually, all variables have been encountered and that table and that counter stop changing. Future instances of those objects, then, know exactly how large the table needs to be and which variables are located where.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Invalidation of a given instance variable &quot;call site&quot; is then once again a simple class identity check. 
If we have the same class in hand, we know the offset into the object is guaranteed to be the same, and therefore we can go straight in without doing any hash lookup whatsoever.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Rubinius does things a little differently here. Instead of tracking the offsets at runtime, the Rubinius VM will examine all code associated with a class and use that to make a guess about how many variables will be needed. It sets up a table on the class ahead of time for those statically-determined names, and allocates exactly as much space for the object&#39;s header + those variables in memory (as opposed to JRuby, where the object and its table are two separate objects). This allows Rubinius to pack those known variables into a tighter space without hopping through the extra dereference JRuby has, and in many cases, this can translate to faster access.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;However, both cases have their failures. In JRuby&#39;s version, we pay the cost of a second object (an array of values) and a pointer dereference to reach it, even if we can cache the offset 100% successfully at the call site. This translates to larger memory footprints and somewhat slower access times. In Rubinius, variables that are dynamically allocated fall back on a simple hash table, so dynamically-generated (or dynamically-mutated) classes may end up accessing some values in a much slower way than others.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The quest for perfect Ruby instance variable tables continues, but at least we have the tools to almost completely eliminate hashes right now.&lt;/div&gt;&lt;h2 style=&quot;text-align: left;&quot;&gt;Constants&lt;/h2&gt;&lt;div&gt;The last case I&#39;m going to cover in depth is that of &quot;constant&quot; values in Ruby.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Constants are, as I mentioned earlier, stored on classes in another hash table. 
If that were their only means of access, they would be uninteresting; we could use exactly the same mechanism for caching them as we do for methods, since they&#39;d follow the same structure and behavior (other than being somewhat more static than method tables). Unfortunately, that&#39;s not the case; constants are located based on both lexical and hierarchical searches.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In Ruby, if you define a class or module, all constants lexically contained in that type&#39;s enclosing scopes are also visible within the type. This makes it possible to define new lexically-scoped aliases for values that might otherwise be difficult to retrieve without walking a class hierarchy or requiring a parent/child relationship to make those aliases visible. It also defeats nearly all reasonable mechanisms for eliminating hash lookups.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;When you access a constant in Ruby, the implementation must first search all lexically-enclosing scopes. Each scope has an associated type (class or module), and we check that type (and not its parents) for the constant name in question. Failing that, we fall back on the current type&#39;s class hierarchy, searching all the way up to the root type. Obviously, this could be far more searching than even method lookup, and we want to eliminate it.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;If we had all the space in the world and no need to worry about dangling references, using our down-hierarchy method table invalidation would actually work very well here. We&#39;d simply add another hierarchy for invalidation: lexical scopes. 
In practice, however, this is not feasible (or at least I have not found a way to make it feasible) since there are &lt;b&gt;many times&lt;/b&gt;&amp;nbsp;more lexical scopes in a given system than there are types, and a large number of those scopes are transient; we&#39;d be tracking thousands or tens of thousands of parent/child relationships weakly all over the codebase. Even worse, invalidation due to constant updates or hierarchy changes would have to proceed both down the class hierarchy and throughout all lexically-enclosing scopes in the entire system. Ouch!&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The current state of the art for Ruby implementations is basically our good old global serial number. Change a constant anywhere in Ruby 1.9.3, Rubinius, or JRuby, and you have just caused all constant access sites to invalidate (or they&#39;ll invalidate next time they&#39;re encountered). Now this sounds bad, perhaps because I told you it was bad above for method caching. But remember that the majority of Ruby programmers advise and practice the art of keeping constants...constant. Most of the big-name Ruby folks would call it a bug if your code is continually assigning or reassigning constants at runtime; there are other structures you could be using that are better suited to mutation, they might say. And in general, most modern Ruby libraries and frameworks do keep constants constant.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I&#39;ll admit we could do better here, especially if the world changed such that mutating constants was considered proper and advisable. 
But until that happens, we have again managed to eliminate hash lookups by caching values based on a (hopefully rarely modified) global serial number.&lt;/div&gt;&lt;h2 style=&quot;text-align: left;&quot;&gt;The Others&lt;/h2&gt;&lt;div&gt;I did not go into the others because the solutions are either simple or not particularly interesting.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Local variables in any sane language (flame on!) are statically determinable at parse/compile time (rather than being dynamically scoped or determined at runtime). In JRuby, Ruby 1.9.3, and Rubinius, local variables are in all cases a simple tuple of offset into an execution frame and some depth at which to find the appropriate frame in the case of closures.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Global variables are largely discouraged, and usually only accessed at boot time to prepare more locally-defined values (e.g. configuration or environment variable access). In JRuby, we have experimented with mechanisms to cache global variable accessor logic in a way similar to instance variable accessors, but it turned out to be so rarely useful that we never shipped it.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Ruby also has another type of variable called a &quot;class variable&quot;, which follows lookup rules almost identical to methods. We don&#39;t currently optimize these in JRuby, but it&#39;s on my to-do list.&lt;/div&gt;&lt;h2 style=&quot;text-align: left;&quot;&gt;Final Words&lt;/h2&gt;&lt;div&gt;There are of course many other ways to avoid hash lookups, with probably the most robust and ambitious being code generation. Ruby developers, JIT compiler writers, and library authors have all used code generation to take what is a mostly-static lookup table and turn it into actually-static code. 
But you must be careful here to not fall into the trap of simply stuffing your hash logic into a switch table; you&#39;re still doing a calculation and some kind of indirection (memory dereference or code jump) to get to your target. Analyze the situation and figure out what immutable truths there are you can exploit, and you too can avoid the evils of hashes.&lt;/div&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.headius.com/feeds/7858745721921352272/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://blog.headius.com/2012/09/avoiding-hash-lookups-in-ruby.html#comment-form' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/7858745721921352272'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/7858745721921352272'/><link rel='alternate' type='text/html' href='http://blog.headius.com/2012/09/avoiding-hash-lookups-in-ruby.html' title='Avoiding Hash Lookups in a Ruby Implementation'/><author><name>Charles Nutter</name><uri>https://plus.google.com/101599370339210456684</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh5.googleusercontent.com/-VT5b8KsgHgQ/AAAAAAAAAAI/AAAAAAAAFB0/9d5SU9AcHNQ/s512-c/photo.jpg'/></author><thr:total>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4704664917418794835.post-1798238886261285117</id><published>2011-10-14T13:25:00.000-07:00</published><updated>2011-10-14T13:40:29.975-07:00</updated><title type='text'>Why Clojure Doesn&#39;t Need Invokedynamic (Unless You Want It to be More Awesome)</title><content type='html'>This was originally posted as a comment on &lt;a href=&quot;http://twitter.com/fogus&quot;&gt;@fogus&lt;/a&gt;&#39;s blog post &quot;&lt;a 
href=&quot;http://blog.fogus.me/2011/10/14/why-clojure-doesnt-need-invokedynamic-but-it-might-be-nice/&quot;&gt;Why Clojure doesn’t need invokedynamic, but it might be nice&lt;/a&gt;&quot;. I figured it&#39;s worth a top-level post here.&lt;br /&gt;&lt;br /&gt;Ok, there are some good points here and a few misguided/misinformed positions. I&#39;ll try to cover everything.&lt;br /&gt;&lt;br /&gt;First, I need to point out a key detail of invokedynamic that may have escaped notice: any case where you must bounce through a generic piece of code to do dispatch -- regardless of how fast that bounce may be -- prevents a whole slew of optimizations from happening. This might affect Java dispatch, if there&#39;s any argument-twiddling logic shared between call sites. It would definitely affect multimethods, which use a hand-implemented PIC. Any case where there&#39;s intervening code between the call site and the target would benefit from invokedynamic, since invokedynamic could be used to plumb that logic and let it inline straight through. This is, indeed, the primary benefit of using invokedynamic: arbitrarily complex dispatch logic folds away, allowing the dispatch to optimize as if it were direct.&lt;br /&gt;&lt;br /&gt;Your point about inference in Java dispatch is a fair one...if Clojure is able to infer all cases, then there&#39;s no need to use invokedynamic at all. But unless Clojure is able to infer all cases, then you&#39;ve got this little performance time bomb just waiting to happen. Tweak some code path and obscure the inference, and kablam, you&#39;re back on a slow reflective impl. 
Invokedynamic would provide a measure of consistency; the only unforeseen perf impact would be when the dispatch turns out to *actually* be polymorphic, in which case even a direct call wouldn&#39;t do much better.&lt;br /&gt;&lt;br /&gt;For multimethods, the benefit should be clear: the MM selection logic would be mostly implemented using method handles and &quot;leaf&quot; logic, allowing hotspot to inline it everywhere it is used. That means for small-morphic MM call sites, all targets could potentially inline too. That&#39;s impossible without invokedynamic unless you generate every MM path immediately around the eventual call.&lt;br /&gt;&lt;br /&gt;Now, on to defs and Var lookup. Depending on the cost of Var lookup, using a SwitchPoint-based invalidation plus invokedynamic could be a big win. In Java 7u2, SwitchPoint-based invalidation is essentially free until invalidated, and as you point out that&#39;s a rare case. There would essentially be *no* cost in indirecting through a var until that var changes...and then it would settle back into no cost until it changes again. Frequently-changing vars could gracefully degrade to a PIC.&lt;br /&gt;&lt;br /&gt;It&#39;s also dangerous to understate the impact code size has on JVM optimization. The usual recommendation on the JVM is to move code into many small methods, possibly using call-through logic as in multimethods to reuse the same logic in many places. As I&#39;ve mentioned, that defeats many optimizations, so the next approach is often to hand-inline logic everywhere it&#39;s used, to let the JVM have a more optimizable view of the system. But now we&#39;re stepping on our own feet...by adding more bytecode, we&#39;re almost certainly impacting the JVM&#39;s optimization and inlining budgets.&lt;br /&gt;&lt;br /&gt;OpenJDK (and probably the other VMs too) has various limits on how far it will go to optimize code. A large number of these limits are based on the bytecoded size of the target methods. 
Methods that get too big won&#39;t inline, and sometimes won&#39;t compile. Methods that inline a lot of code might not get inlined into other methods. Methods that inline one path and eat up too much budget might push out more important calls later on. The only way around this is to reduce bytecode size, which is where invokedynamic comes in.&lt;br /&gt;&lt;br /&gt;As of OpenJDK 7u2, MethodHandle logic is not included when calculating inlining budgets. In other words, if you push all the Java dispatch logic or multimethod dispatch logic or var lookup into mostly MethodHandles, you&#39;re getting that logic *for free*. That has had a tremendous impact on JRuby performance; I had previous versions of our compiler that did indeed infer static target methods from the interpreter, but they were often *slower* than call site caching solely because the code was considerably larger. With invokedynamic, a call is a call is a call, and the intervening plumbing is not counted against you.&lt;br /&gt;&lt;br /&gt;Now, what about negative impacts to Clojure itself...&lt;br /&gt;&lt;br /&gt;#0 is a red herring. JRuby supports Java 5, 6, and 7 with only a few hundred lines of changes in the compiler. Basically, the compiler has abstract interfaces for doing things like constant lookup, literal loading, and dispatch that we simply reimplement to use invokedynamic (extending the old non-indy logic for non-indified paths). In order to compile our uses of invokedynamic, we use Rémi Forax&#39;s JSR-292 backport, which includes a &quot;mock&quot; jar with all the invokedynamic APIs stubbed out. In our release, we just leave that library out, reflectively load the invokedynamic-based compiler impls, and we&#39;re off to the races.&lt;br /&gt;&lt;br /&gt;#1 would be fair if the Oracle Java 7u2 early-access drops did not already include the optimizations that gave JRuby those awesome numbers. 
The biggest of those optimizations was making SwitchPoint free, but also important are the inlining discounting and MutableCallSite improvements. The perf you see for JRuby there can apply to any indirected behavior in Clojure, with the same perf benefits as of 7u2.&lt;br /&gt;&lt;br /&gt;For #2, to address the apparent vagueness in &lt;a href=&quot;http://blog.headius.com/2011/08/invokedynamic-in-jruby-constant-lookup.html&quot;&gt;my blog post&lt;/a&gt;...the big perf gain was largely from using SwitchPoint to invalidate constants rather than pinging a global serial number. Again, indirection folds away if you can shove it into MethodHandles. And it&#39;s pretty easy to do it.&lt;br /&gt;&lt;br /&gt;#3 is just plain FUD. Oracle has committed to making invokedynamic work well for Java too. The current thinking is that &quot;lambda&quot;, the support for closures in Java 7, will use invokedynamic under the covers to implement &quot;function-like&quot; constructs. Oracle has also committed to Nashorn, a fully invokedynamic-based JavaScript implementation, which has many of the same challenges as languages like Ruby or Python. I talked with Adam Messinger at Oracle, who explained to me that Oracle chose JavaScript in part because it&#39;s so far away from Java...as I put it (and he agreed) it&#39;s going to &quot;keep Oracle honest&quot; about optimizing for non-Java languages. 
Invokedynamic is driving the future of the JVM, and Oracle knows it all too well.&lt;br /&gt;&lt;br /&gt;As for #4...well, all good things take a little effort :) I think the effort required is far lower than you suspect, though.</content><link rel='replies' type='application/atom+xml' href='http://blog.headius.com/feeds/1798238886261285117/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://blog.headius.com/2011/10/why-clojure-doesnt-need-invokedynamic.html#comment-form' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/1798238886261285117'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/1798238886261285117'/><link rel='alternate' type='text/html' href='http://blog.headius.com/2011/10/why-clojure-doesnt-need-invokedynamic.html' title='Why Clojure Doesn&#39;t Need Invokedynamic (Unless You Want It to be More Awesome)'/><author><name>Charles Nutter</name><uri>https://plus.google.com/101599370339210456684</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh5.googleusercontent.com/-VT5b8KsgHgQ/AAAAAAAAAAI/AAAAAAAAFB0/9d5SU9AcHNQ/s512-c/photo.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4704664917418794835.post-3395301669254719136</id><published>2011-08-10T11:00:00.000-07:00</published><updated>2011-08-10T11:59:32.843-07:00</updated><title type='text'>Invokedynamic in JRuby: Constant Lookup</title><content type='html'>&lt;i&gt;This is the first of a set (not a series...there&#39;s no particular order) of articles I&#39;ll write on how JRuby is using invokedynamic. 
Hopefully they will show Rubyists how drastically invokedynamic is going to improve JRuby, and show other JVM language folks how to use invokedynamic effectively.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Hello friends!&lt;br /&gt;&lt;br /&gt;I figured it&#39;s about time for me to start writing a bit on how JRuby is actually using invokedynamic.&lt;br /&gt;&lt;br /&gt;As of today, JRuby utilizes invokedynamic far more than any other mainstream JVM language. We have worked very closely with the JSR leads and the OpenJDK developers to make sure invokedynamic runs well. And we have been advocating invokedynamic as a game-changer for the JVM and for JVM languages.&lt;br /&gt;&lt;br /&gt;Let&#39;s explore one area where JRuby is using invokedynamic: Ruby&#39;s &quot;constant&quot; lookup.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Non-constant &quot;Constants&quot;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;A constant in Ruby is defined on a class or module, and is subject to Ruby&#39;s typical namespacing logic. Constants start with a capital letter.&lt;br /&gt;&lt;br /&gt;I often put &quot;constants&quot; in quotes because constant values can be reassigned. This will usually produce a warning...but not an error. This means we can&#39;t simply look up constant values once and never look them up again (without special tricks I&#39;ll get into later).&lt;br /&gt;&lt;br /&gt;Constant lookup is also a bit more complicated than method lookup. When retrieving a constant, Ruby first scans lexically-enclosing scopes&#39; classes and modules for the constant. If the constant can&#39;t be found, the next search walks the current class&#39;s inheritance hierarchy. If we still can&#39;t find the constant, const_missing is called on the current class.&lt;br /&gt;&lt;br /&gt;In order to make constant lookup fast, we want to do some sort of caching. In classic JRuby, Ruby 1.9 (YARV), Rubinius, and probably most other modern Ruby implementations, this is done with a global serial number. 
Whenever a constant is updated or a module is included (changing the inheritance hierarchy), all cached constants everywhere are forced to look up again.&lt;br /&gt;&lt;br /&gt;I have played with mechanisms for reducing the global impact of constant invalidation, but because constants can be looked up lexically it&#39;s simply too complicated to localize (since we need to invalidate classes down-hierarchy from the change &lt;b&gt;and&lt;/b&gt; we also need to update all lexical scopes that might see the change).&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Constant Invalidation in JRuby 1.6&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;The logic in JRuby 1.6 goes something like this:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;If cache is empty or invalid, retrieve the constant value in the usual way (lexical, hierarchical search). Store the value with the current global constant serial number.&lt;/li&gt;
- &lt;li&gt;On subsequent lookups, check cache for validity against the global constant serial number. If we have a value cached and the cache is still valid, return it.&lt;/li&gt;
- &lt;li&gt;If any constant in the system is updated, or if a module is included into an existing class hierarchy, flip the serial number and force future constant lookups to re-cache.&lt;/li&gt;
- &lt;/ul&gt;&lt;div&gt;This turns out to work fairly well. The same mechanism in Ruby 1.9 produced drastically faster constant lookups, and JRuby&#39;s performance is even better than 1.9.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;But there&#39;s a problem here. Because there&#39;s this constant pinging of the global constant serial number, every constant access can potentially produce a new value. So we&#39;re paying the cost to check that serial number as well as interfering with optimizations that want to see constant values actually be constant.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Can we do better?&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Quick Invokedynamic Primer&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The main atom of invokedynamic is the MethodHandle. Method handles are essentially function pointers, which can point at Java methods or fields, constructors, constant values, or other method handles. Invokedynamic also provides the MethodHandles utility class, which lets us juggle method handles in various ways:&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;adapting method signatures by casting, adding, moving, or dropping arguments&lt;/li&gt;
- &lt;li&gt;combining three handles (&quot;test&quot;, &quot;target&quot;, and &quot;fallback&quot;) to form a new &quot;guard with test&quot; if-statement-like handle&lt;/li&gt;
- &lt;li&gt;wrapping handles with exception handling or argument/return pre/post-processing&lt;/li&gt;
- &lt;/ul&gt;&lt;div&gt;You can think of method handles and the chains of adapter handles that stitch them together as a special sort of functional language the JVM knows how to optimize. Given a chain of handles, you should usually get a piece of code that optimizes as well as (or better, in some cases) writing the same logic by hand in Java.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The invokedynamic bytecode simply provides a place to plug a method handle chain into code. When the JVM encounters an invokedynamic bytecode, it calls a &quot;bootstrap method&quot; associated with that bytecode for further instructions.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The bootstrap method returns a CallSite object, provided in java.lang.invoke. There are constant call sites for constant values, mutable call sites for when the target handle chain may have to change, and volatile call sites for when those changes must immediately be reflected across threads.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Once a CallSite has been installed for a given invokedynamic, subsequent hits skip the bootstrapping process, and we&#39;re off to the races.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;SwitchPoint&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/div&gt;&lt;div&gt;I mentioned that the MethodHandles class provides a &quot;guardWithTest&quot; method for combining a test, a target (the &quot;then&quot; branch), and a fallback (the &quot;else&quot; branch). SwitchPoint, also in java.lang.invoke, acts like an on/off guardWithTest that once turned off can never be turned on again. You provide a target and fallback, and until the &quot;switch&quot; is thrown the target will be invoked. 
After the switch is thrown the fallback will be called.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;What&#39;s the difference between this and a guardWithTest where the test just pings some global value? The difference is that SwitchPoint doesn&#39;t need to check anything.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Optimization and Deoptimization in the JVM&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/div&gt;&lt;div&gt;When the JVM decides to optimize a piece of code, it does so in an &lt;b&gt;optimistic&lt;/b&gt;&amp;nbsp;way. In very broad terms, this means it assumes its information up to this point is perfect: no new methods or classes will be introduced, profiling information is accurate, etc. Based on this &quot;perfect&quot; view of the world, it aggressively optimizes code.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Of course, the world isn&#39;t perfect. The JVM has to give up profiling and monitoring at some point, so it always has an imperfect view of the system. In order to avoid its aggressive optimizations triggering a fatal error later on, JVMs like OpenJDK (Hotspot) do something called &lt;b&gt;deoptimization&lt;/b&gt;.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Deoptimization is the process by which running, optimized code can adapt on-the-fly to a changing system. In OpenJDK, there&#39;s several ways this is accomplished:&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;Branches out of compiled code back into the interpreter, when compiled code is determined to be invalid.&lt;/li&gt;
- &lt;li&gt;Guards around inlined virtual method accesses, to ensure we&#39;re still calling against the same class.&lt;/li&gt;
- &lt;li&gt;On-stack replacement, for fixing up a running method already on the native call stack&lt;/li&gt;
- &lt;li&gt;...&lt;/li&gt;
- &lt;/ul&gt;&lt;div&gt;Because of this ability to deoptimize, it&#39;s possible to support zero-cost guards at the JVM level. Returning to SwitchPoint, we can see how this new form of &quot;guardWithTest&quot; can be basically free: we&#39;re explicitly telling the JVM this switch is a rare occurrence it can optimize aggressively.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;SwitchPoint for Constant Lookup&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;JRuby on invokedynamic uses SwitchPoint for constant lookup, as you&#39;d expect. Instead of actively pinging that global constant serial number, we instead use a global SwitchPoint object to guard all cached constant accesses. When it comes time to invalidate the system&#39;s constants, we just flip the SwitchPoint off and create a new one. All SwitchPoint-guarded constant accesses in the system must then recache and use the new SwitchPoint.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In a well-behaved system, we should reach a steady state where no new constants are being defined and no new modules are being introduced. Because we&#39;re using SwitchPoint, the stable state means all constant accesses are treated as truly constant by the JVM, allowing optimizations that were impossible before. And of course this also means that we&#39;ve achieved constant lookup performance very near a theoretical maximum.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Numbers&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/div&gt;&lt;div&gt;First, a caveat: SwitchPoint is implemented in a fairly naïve way in the released OpenJDK 7, using a volatile field as the switch value. As a result, SwitchPoint guardWithTest is very slow currently, and JRuby&#39;s SwitchPoint-based constant logic must be enabled. 
I show numbers below based on leading-edge Hotspot compiler patches that will go into the first update release (numbers provided by one of the Hotspot devs, Christian Thalinger...thanks Christian!)&lt;br /&gt;&lt;br /&gt;The benchmark we&#39;re running is a modified version of &lt;a href=&quot;https://github.com/jruby/jruby/blob/master/bench/language/bench_const_lookup.rb&quot;&gt;bench_const_lookup&lt;/a&gt; in JRuby&#39;s benchmark suite. The modification here runs more iterations (10M instead of 1M) with more constant lookups (50 instead of 10) to get a better idea of optimized performance.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Here&#39;s JRuby running our constant-lookup benchmark without SwitchPoint-based constants on Java 7:&lt;/div&gt;&lt;div&gt;&lt;script src=&quot;https://gist.github.com/1137667.js&quot;&gt; &lt;/script&gt;&lt;br /&gt;&lt;br /&gt;As I said before, this is pretty good. JRuby&#39;s existing constant lookup performance is roughly 2x faster than Ruby 1.9.2.&lt;br /&gt;&lt;br /&gt;Next, we&#39;ll try JRuby with SwitchPoint constants on Java 7 (released version, so we expect this to be slow):&lt;/div&gt;&lt;br /&gt;&lt;div&gt;&lt;script src=&quot;https://gist.github.com/1137655.js&quot;&gt; &lt;/script&gt;&lt;br /&gt;&lt;br /&gt;The perf hit of purely volatile SwitchPoint is apparent.&lt;br /&gt;&lt;br /&gt;And finally, JRuby with SwitchPoint constants on a dev build of Hotspot, which uses deoptimization rather than a volatile field:&lt;/div&gt;&lt;br /&gt;&lt;div&gt;&lt;script src=&quot;https://gist.github.com/1137650.js&quot;&gt; &lt;/script&gt;&lt;br /&gt;&lt;br /&gt;This is basically the performance of the 10M iteration loop alone. In fact, if you look at the resulting optimized assembly, the constant accesses have been &lt;b&gt;eliminated entirely&lt;/b&gt;&amp;nbsp;since they&#39;re optimistically inlined and never used. 
Of course this would normally not happen in real code, but it shows how much better the JVM can optimize Ruby&#39;s behavior using invokedynamic.&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.headius.com/feeds/3395301669254719136/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://blog.headius.com/2011/08/invokedynamic-in-jruby-constant-lookup.html#comment-form' title='13 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/3395301669254719136'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/3395301669254719136'/><link rel='alternate' type='text/html' href='http://blog.headius.com/2011/08/invokedynamic-in-jruby-constant-lookup.html' title='Invokedynamic in JRuby: Constant Lookup'/><author><name>Charles Nutter</name><uri>https://plus.google.com/101599370339210456684</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh5.googleusercontent.com/-VT5b8KsgHgQ/AAAAAAAAAAI/AAAAAAAAFB0/9d5SU9AcHNQ/s512-c/photo.jpg'/></author><thr:total>13</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4704664917418794835.post-5727286770456885866</id><published>2011-08-02T10:27:00.000-07:00</published><updated>2011-08-02T21:03:30.529-07:00</updated><title type='text'>JRuby and Java 7: What to Expect</title><content type='html'>Java 7 has landed, with a modest set of new features and a few major improvements as well. What can you expect from JRuby running on Java 7?&lt;br /&gt;&lt;br /&gt;&lt;b&gt;What&#39;s In Java 7&lt;/b&gt;&lt;br /&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;br /&gt;The biggest changes in Java 7 are not related to the Java language at all. 
Sure, there&#39;s the &quot;project coin&quot; enhancements to the Java language, which add some exception-handling shortcuts, new literals for numbers, arrays, hashes, the oft-requested &quot;strings in switch&quot; support, and a few other things. But they&#39;re modest incremental changes; the real revolution is at the JVM and JDK level.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Invokedynamic&lt;/b&gt;&lt;br /&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;br /&gt;The most important change in Java 7 is the incorporation of a new bytecode -- invokedynamic -- and an API for building chains of &quot;method handles&quot; to back that bytecode up.&lt;br /&gt;&lt;br /&gt;You can look at invokedynamic as a way for JVM users to communicate directly with the optimizing backend of the JVM. Method handles act as both function pointers and as function combinators, allowing a built-in way to construct a call protocol flow from a caller to a callee. You can move arguments around, insert new arguments, process existing arguments and return values, catch exceptions, and perform fast guarded branches between two (or more) paths. The invokedynamic bytecode itself provides a bytecode-level hook to which you attach your method handle chain, with the assumption that the JVM can optimize that chain directly into the invokedynamic caller.&lt;br /&gt;&lt;br /&gt;The tl;dr is that invokedynamic makes it possible for the JVM to see through complicated method call logic, such as that found in dynamic languages, and optimize that logic like it would for regular &quot;static&quot; calls.&lt;br /&gt;&lt;br /&gt;JRuby&#39;s master branch already takes heavy advantage of invokedynamic, by routing most Ruby calls through invokedynamic operations. For simple paths and those that have been optimized by the Hotspot guys (Hotspot is the VM at the core of OpenJDK), invokedynamic often provides performance improvements of 150-200%, with work ongoing to make it even faster. 
Other paths may not be as well-optimized by the &quot;dot zero&quot; version of OpenJDK 7, so there&#39;s opportunity to improve them.&lt;br /&gt;&lt;br /&gt;Because JRuby is already well along the road to utilizing invokedynamic, you can try it out today.&lt;br /&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Build your own JRuby from master or grab a snapshot from&amp;nbsp;&lt;a href=&quot;http://ci.jruby.org/snapshots&quot;&gt;our CI server&lt;/a&gt;.&lt;/li&gt;&lt;li&gt;Grab a build of &lt;a href=&quot;http://www.oracle.com/technetwork/java/javase/downloads/java-se-jdk-7-download-432154.html&quot;&gt;OpenJDK 7 from Oracle&lt;/a&gt; (or a build of &lt;a href=&quot;http://code.google.com/p/openjdk-osx-build/&quot;&gt;OpenJDK 7 for OS X&lt;/a&gt;).&lt;/li&gt;&lt;li&gt;Point JAVA_HOME at the new JDK and try out JRuby!&lt;/li&gt;&lt;/ol&gt;&lt;div&gt;We&#39;re looking for small benchmarks that show the performance of invokedynamic (good or bad), so please contact me, the JRuby team, or the JRuby users mailing list with your reports from the field. Also, feel free to open performance bugs on the &lt;a href=&quot;http://bugs.jruby.org/&quot;&gt;JRuby bug tracker&lt;/a&gt; if invokedynamic performs &lt;b&gt;worse&lt;/b&gt;&amp;nbsp;than non-invokedynamic. Pass&amp;nbsp;-Xcompile.invokedynamic=false to JRuby to revert to the old non-invokedynamic logic.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;NIO.2&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/div&gt;&lt;div&gt;NIO is Java&#39;s &quot;New IO&quot; APIs, a set of wrappers around low-level file-descriptor logic and memory buffers. NIO has been around since Java 1.4, but the recent update -- dubbed &lt;a href=&quot;http://nio.2/&quot;&gt;NIO.2&lt;/a&gt; -- brings a sorely-needed update to the functionality provided:&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;Filesystem operations (like symlinks, permissions, etc) are now almost all available through NIO.2&#39;s filesystem APIs. 
This also includes standard, cross-platform support for filesystem events, such as watching a directory for changes (using efficient OS-level operations, rather than polling).&lt;/li&gt;&lt;li&gt;File and directory walking now comes with considerably less overhead and more options for filtering directory lists &lt;b&gt;before&lt;/b&gt;&amp;nbsp;handing filenames off to user code. There&#39;s also support for opening a directory directly and walking its contents as you would a file.&lt;/li&gt;&lt;li&gt;Most IO channel types now have asynchronous versions. Asynchronous in this case means &quot;punt my IO operation to a built-in thread pool&quot;, with subsequent code checking on the status of those operations and getting results from a &quot;future&quot; handle.&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;For JRuby, the new IO APIs will mean we can support more filesystem operations across platforms without resorting to native code. It will also provide JRuby users a means of handling filesystem events and asynchronous IO operations without using a platform-specific library. We have not yet started adding NIO.2 support to JRuby&#39;s core classes, but that will come soon.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;General Improvements&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/div&gt;&lt;div&gt;There&#39;s lots of smaller, less flashy changes in OpenJDK that also appear to help JRuby.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Even without invokedynamic, the latest OpenJDK 7 builds usually perform better than OpenJDK 6. Some benchmarks have proven to be as much as 2x faster, just by upgrading the JVM! General perf improvements will be more modest, but in almost every case we&#39;ve tested OpenJDK 7 definitely performs better.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The release of OpenJDK 7 also brings improvements to the &quot;tiered&quot; compilation mode. 
Tiered compilation aims to merge the benefits of the &quot;client&quot; mode (fast startup) with those of the &quot;server&quot; mode (maximum peak performance). You can turn on tiered compilation using -XX:+TieredCompilation (in JAVA_OPTS or at the &quot;java&quot; command line, or prefixed with -J when passed to JRuby). We&#39;re looking for user reports about how well &quot;tiered&quot; mode works, too.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This general improvement means that even JRuby 1.6.x users can take advantage of OpenJDK 7 today, with the promise of even bigger improvements in JRuby 1.7 (our target release for pervasive invokedynamic support).&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Consistency&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/div&gt;&lt;div&gt;As with previous Java releases, a great deal of care has been taken to ensure existing applications work properly. That applies as well to Java 7. We have been testing against Java 7 for over a year, on and off, and recently started running tests &quot;green&quot; with even heavy invokedynamic use.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;We have made no major Java 7-specific fixes in JRuby...it should generally &quot;just work&quot;.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Let Us Know!&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/div&gt;&lt;div&gt;As always, we really want to hear from you bleeding-edge users that are playing around with JRuby on Java 7. Please don&#39;t be shy...let us know how it works for you!&lt;br /&gt;&lt;br /&gt;&lt;i style=&quot;font-weight: bold;&quot;&gt;Update:&lt;/i&gt;&amp;nbsp;The Hotspot guys have been helping me find invokedynamic bottlenecks in a few JRuby microbenchmarks, and discovered that a flaw in invokedynamic was causing &lt;b&gt;too much&lt;/b&gt;&amp;nbsp;code to inline, forcing out more important optimizations. 
The details belong in another post, but they offered me a long Hotspot flag to accomplish basically what their fix does:&amp;nbsp;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Bitstream Vera Sans Mono&#39;, Courier, monospace; font-size: 12px; line-height: 16px; white-space: pre;&quot;&gt;-XX:CompileCommand=dontinline,org.jruby.runtime.invokedynamic.InvokeDynamicSupport::invocationFallback&lt;/span&gt;&amp;nbsp;... With this flag, performance on e.g. &quot;tak&quot; easily beats stock JRuby (see the third benchmark run here: &lt;a href=&quot;https://gist.github.com/1121880&quot;&gt;https://gist.github.com/1121880&lt;/a&gt;).&lt;br /&gt;&lt;br /&gt;I would recommend trying this flag if you are finding invokedynamic slowdowns in JRuby.&lt;br /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.headius.com/feeds/5727286770456885866/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://blog.headius.com/2011/08/jruby-and-java-7-what-to-expect.html#comment-form' title='22 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/5727286770456885866'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/5727286770456885866'/><link rel='alternate' type='text/html' href='http://blog.headius.com/2011/08/jruby-and-java-7-what-to-expect.html' title='JRuby and Java 7: What to Expect'/><author><name>Charles Nutter</name><uri>https://plus.google.com/101599370339210456684</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='//lh5.googleusercontent.com/-VT5b8KsgHgQ/AAAAAAAAAAI/AAAAAAAAFB0/9d5SU9AcHNQ/s512-c/photo.jpg'/></author><thr:total>22</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4704664917418794835.post-1913573676049543893</id><published>2011-07-21T14:17:00.000-07:00</published><updated>2011-07-21T14:17:38.436-07:00</updated><title type='text'>Next July, Last Friday, This Tuesday</title><content type='html'>So after months of not blogging anything technical, I&#39;m going to blog something non-technical. Hopefully tech posts will pick up soon once my new baby boy Elliott is a bit older and less needy :)&lt;br /&gt;&lt;br /&gt;&lt;b&gt;When Is &quot;This Friday&quot;?&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;The most confusing time-oriented statements (for me) are when people use &quot;this&quot;, &quot;next&quot;, and &quot;last&quot; to describe a specific day or month. Some people consider &quot;this&quot; to always be &quot;the day/month coming soon&quot;, and others have different meanings. This short post will describe what &lt;i style=&quot;font-weight: bold;&quot;&gt;I&lt;/i&gt;&amp;nbsp;mean, in a way that hopefully convinces you to do the same.&lt;br /&gt;&lt;br /&gt;If we look at what &quot;this&quot;, &quot;next&quot;, and &quot;last&quot; are modifying, a simple pattern emerges. For days of the week, they&#39;re indicating what week contains the given day. &quot;This week&quot; means the week we&#39;re in right now, &quot;next week&quot; means the week that follows this one, and &quot;last week&quot; means the week that preceded this one. Taking that to days of the week, then, &quot;this Friday&quot; should always mean &quot;Friday of this week&quot;. Similarly, when those modifiers are applied to a month, they usually indicate what year contains the given month. 
&quot;This August&quot; would mean the August of this year, and so on.&lt;br /&gt;&lt;br /&gt;There&#39;s no perfect way to interpret these modifiers, and even my system has some mildly confusing points.&lt;br /&gt;&lt;br /&gt;Let&#39;s say today is Thursday. The following day is &quot;this Friday&quot;, as you&#39;d expect. A week from tomorrow would be &quot;next Friday&quot;, the Friday of next week...not tomorrow, even though that&#39;s the Friday that comes &quot;next&quot; in time. Perhaps a bit more confusing is using &quot;this&quot; to describe days in the past; &quot;this Wednesday&quot; would mean the day before today, since that&#39;s the Wednesday of this week. A proper sentence would be &quot;this Wednesday I went to the store.&quot; Note the past-tense there.&lt;br /&gt;&lt;br /&gt;Sunday and Saturday are peculiarities too, and in almost any system they are the source of the most confusion. By my system, &quot;this Sunday&quot; would almost always mean a day in the past, since that&#39;s the Sunday of this week (and it would be weird to say &quot;this Sunday&quot; on Sunday). Similarly, &quot;next Saturday&quot; will almost always mean two Saturdays from now.&lt;br /&gt;&lt;br /&gt;Confusion about days in the past or in the future can be avoided with additional modifiers &quot;coming&quot; and &quot;past&quot;. &quot;This past Saturday&quot; always means the Saturday nearest in the past, and &quot;this coming Sunday&quot; always means the next Sunday in the future. My system is not ambiguous, but adding these additional modifiers can help smooth over places where it might confuse folks unfamiliar with it.&lt;br /&gt;&lt;br /&gt;One alternative would be to always have &quot;this&quot; mean the day/month next in time, &quot;next&quot; to always mean the one after that, and &quot;last&quot; to be the one nearest in the past. 
But that ends up ambiguous, since if tomorrow is Friday it&#39;s unclear if &quot;next Friday&quot; is tomorrow or the Friday of next week.&lt;br /&gt;&lt;br /&gt;So, what do you think? Does this system make sense? Is there a better way to disambiguate these modifiers?</content><link rel='replies' type='application/atom+xml' href='http://blog.headius.com/feeds/1913573676049543893/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://blog.headius.com/2011/07/next-july-last-friday-this-tuesday.html#comment-form' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/1913573676049543893'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/1913573676049543893'/><link rel='alternate' type='text/html' href='http://blog.headius.com/2011/07/next-july-last-friday-this-tuesday.html' title='Next July, Last Friday, This Tuesday'/><author><name>Charles Nutter</name><uri>https://plus.google.com/101599370339210456684</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh5.googleusercontent.com/-VT5b8KsgHgQ/AAAAAAAAAAI/AAAAAAAAFB0/9d5SU9AcHNQ/s512-c/photo.jpg'/></author><thr:total>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4704664917418794835.post-1383525904261169024</id><published>2011-03-07T11:05:00.000-08:00</published><updated>2011-03-07T11:05:23.832-08:00</updated><title type='text'>Differing java.util.regex.Matcher Unmatched Group Results on Android</title><content type='html'>Android is really an amazing little platform, but occasionally you will run into API differences. 
Some of these are actual bugs (like a number of reflection and enum issues in early releases), and others are just weakly-specified APIs.&lt;br /&gt;&lt;br /&gt;Today, I worked on &lt;a href=&quot;http://ira.codehaus.org/browse/JRUBY-5541&quot;&gt;JRUBY-5541&lt;/a&gt;:&amp;nbsp;Problem with java_import on Android (Ruboto)&lt;br /&gt;&lt;br /&gt;The issue boiled down to how we turn Java&#39;s camelCased method names into Ruby&#39;s snake_cased form. We were using the following code:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;private static final Pattern CAMEL_CASE_SPLITTER = Pattern.compile(&quot;(([a-z0-9])([A-Z])|([A-Za-z0-9])([A-Z][a-z]))&quot;);&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;public static String getRubyCasedName(String javaCasedName) {&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Matcher m = CAMEL_CASE_SPLITTER.matcher(javaCasedName);&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;return m.replaceAll(&quot;$2$4_$3$5&quot;).toLowerCase();&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: inherit;&quot;&gt;The 
logic here is to basically attempt two matches ORed together: methods of the form getName in the first half, and methods of the form getURLHandler in the second half. Given the resulting match, we &quot;cleverly&quot; did a replaceAll for both matches at the same time, combining what would be &quot;$2_$3&quot; for the first half and &quot;$4_$5&quot; for the second half.&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: inherit;&quot;&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: inherit;&quot;&gt;This works fine against Hotspot/OpenJDK and any JVMs that use its class libraries. But Android uses Harmony&#39;s class libraries, and behaves differently. On OpenJDK, unmatched groups produce an empty string &quot;&quot; in the replacement, properly turning &quot;getName&quot; into &quot;get_name&quot; and &quot;getURLHandler&quot; into &quot;get_url_handler&quot;. On Android, however, unmatched groups produce null for the $ variables in replaceAll, turning &quot;getName&quot; into &quot;getnull_nnullame&quot; and &quot;getURLHandler&quot; into something awful like &quot;getnull_unullrlnull_hnullandler&quot;. 
Subsequent logic in JRuby that tried to turn methods of the form &quot;get_name&quot; into &quot;name&quot; attributes then failed to execute, causing the issue in the bug report.&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: inherit;&quot;&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: inherit;&quot;&gt;The fix is a bit cumbersome, but not too difficult to understand: manually walk the matches and appendReplacement using only the groups that matched:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;public static String getRubyCasedName(String javaCasedName) {&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;Matcher m = CAMEL_CASE_SPLITTER.matcher(javaCasedName);&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;// We do this replace loop manually because Android&#39;s Matcher produces null for unmatched $ groups.&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;// See JRUBY-5541&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;if (m.find()) {&lt;/span&gt;&lt;br /&gt;&lt;span 
class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;StringBuffer buffer = new StringBuffer();&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;m.reset();&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;while (m.find()) {&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;if (m.group(2) != null) {&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;// first part matched&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;m.appendReplacement(buffer, &quot;$2_$3&quot;);&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;} else {&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; 
&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;// second part matched&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;m.appendReplacement(buffer, &quot;$4_$5&quot;);&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;m.appendTail(buffer);&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;return buffer.toString().toLowerCase();&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;} else {&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;return javaCasedName;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, 
monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I&#39;m not sure whether Android (Harmony) or OpenJDK is &quot;right&quot; in this case, since the API for Matcher.group &lt;b&gt;does&lt;/b&gt;&amp;nbsp;say it will return null for unmatched groups, but nowhere is it specified if $ variables in replace calls should do the same.</content><link rel='replies' type='application/atom+xml' href='http://blog.headius.com/feeds/1383525904261169024/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://blog.headius.com/2011/03/differing-javautilregexmatcher.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/1383525904261169024'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/1383525904261169024'/><link rel='alternate' type='text/html' href='http://blog.headius.com/2011/03/differing-javautilregexmatcher.html' title='Differing java.util.regex.Matcher Unmatched Group Results on Android'/><author><name>Charles Nutter</name><uri>https://plus.google.com/101599370339210456684</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh5.googleusercontent.com/-VT5b8KsgHgQ/AAAAAAAAAAI/AAAAAAAAFB0/9d5SU9AcHNQ/s512-c/photo.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4704664917418794835.post-6627578193167913100</id><published>2011-02-02T10:14:00.000-08:00</published><updated>2011-02-02T15:12:27.669-08:00</updated><title type='text'>Working Around the Java Double.parseDouble Bug</title><content type='html'>You may have seen recently that Java 
suffers from a similar &lt;a href=&quot;http://www.exploringbinary.com/java-hangs-when-converting-2-2250738585072012e-308/&quot;&gt;floating-point parsing bug&lt;/a&gt; to the one that recently affected PHP users. The basic gist of it is that for this special 64-bit floating point value, the Java call Double.parseDouble(&quot;2.2250738585072012e-308&quot;) will get stuck in an infinite loop. Read the link above to understand what&#39;s happening.&lt;br /&gt;&lt;br /&gt;Naturally, this affects all JVM languages too, since we all use Double.parseDouble for something or another. In fact, it affects almost all the JVM language parsers and compilers (including javac itself), since they need to turn strings into doubles.&lt;br /&gt;&lt;br /&gt;Being the upright citizens we are on the JRuby team, we figured we&#39;d try to beat Oracle to the punch and patch around the bug, at least for Ruby-land conversions of String to Float.&lt;br /&gt;&lt;br /&gt;I started by looking for calls to Double.parseDouble in JRuby. It turned out there were only two: one for the lexer, and one used by String#to_f, BigDecimal#new, and so on. That was a relief; I expected to find dozens of calls.&lt;br /&gt;&lt;br /&gt;It also turned out all cases had already parsed out Ruby float literal oddities, like underscores, using &#39;d&#39; or &#39;D&#39; for the exponentiation marker, allowing ill-formatted exponents to be treated as zero, and so on.&lt;br /&gt;&lt;br /&gt;My first attempt was to simply normalize the cleaned-up string and pass it to new java.math.BigDecimal(), converting that result back to a primitive double. Unfortunately, BigDecimal&#39;s constructor *also* passes through the offending Double.parseDouble code, and we&#39;re back where we started.&lt;br /&gt;&lt;br /&gt;Ultimately, I ended up with the following code. 
I make no claims that this is efficient, but it appears to pass all the Float tests and specs for JRuby and does not DOS like the bad code in Double.parseDouble:&lt;br /&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;public static double parseDouble(String value) {&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;String normalString = normalizeDoubleString(value);&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;int offset = normalString.indexOf(&#39;E&#39;);&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;BigDecimal base;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;int exponent;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;if (offset == -1) {&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;base = new BigDecimal(normalString);&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; 
&amp;nbsp;exponent = 0;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;} else {&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;base = new BigDecimal(normalString.substring(0, offset));&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;exponent = Integer.parseInt(normalString.charAt(offset + 1) == &#39;+&#39; ?&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;normalString.substring(offset + 2) :&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;normalString.substring(offset + 1));&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;return base.scaleByPowerOfTen(exponent).doubleValue();&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&amp;nbsp;&amp;nbsp; 
&amp;nbsp;}&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: &#39;Courier New&#39;, Courier, monospace;&quot;&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class=&quot;Apple-style-span&quot; style=&quot;font-family: inherit;&quot;&gt;I didn&#39;t say it was particularly clever or efficient...but there you have it. A few notes:&lt;/span&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Do I really need UNLIMITED precision here? I almost used it to ensure there are no peculiarities passing through BigDecimal on the way to double, but are any such peculiarities outside 128-bit precision?&lt;/li&gt;&lt;li&gt;It might have been more efficient to normalize the decimal position and exponent and then see if it matched the magic value. But of course this magic value was not known until recently, so why risk there being another one?&lt;/li&gt;&lt;li&gt;Using BigDecimal is also lazy. I am lazy.&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;I welcome improvements. Everyone will probably need to start using code like this, since there will be a lot of unpatched JVMs out there for a long time.&lt;br /&gt;&lt;br /&gt;I&#39;m happy to say JRuby will be the first JVM language to route around the Double.parseDouble bug :)&lt;br /&gt;&lt;br /&gt;&lt;i style=&quot;font-weight: bold;&quot;&gt;Update: &lt;/i&gt;The JRuby commit with this logic is&amp;nbsp;&lt;a href=&quot;https://github.com/jruby/jruby/commit/4c712963885c0117b95066d927520a6a738c2a65&quot;&gt;4c71296&lt;/a&gt;, and the JRuby bug is at&amp;nbsp;&lt;a href=&quot;http://jira.codehaus.org/browse/JRUBY-5441&quot;&gt;http://jira.codehaus.org/browse/JRUBY-5441&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;i style=&quot;font-weight: bold;&quot;&gt;Update:&lt;/i&gt;&amp;nbsp;A commenter on Hacker News pointed out that BigDecimal.doubleValue actually just converts to a string and calls Double.parseDouble. 
So unfortunately, the mechanism above only worked in an earlier version where I was losing some precision by calling Math.pow(10, exponent) rather than scaleByPowerOfTen. The version above unfortunately does &lt;b&gt;not work&lt;/b&gt;, so it&#39;s back to the drawing board. C&#39;est la vie!&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.headius.com/feeds/6627578193167913100/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://blog.headius.com/2011/02/working-around-java-doubleparsedouble.html#comment-form' title='13 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/6627578193167913100'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/6627578193167913100'/><link rel='alternate' type='text/html' href='http://blog.headius.com/2011/02/working-around-java-doubleparsedouble.html' title='Working Around the Java Double.parseDouble Bug'/><author><name>Charles Nutter</name><uri>https://plus.google.com/101599370339210456684</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh5.googleusercontent.com/-VT5b8KsgHgQ/AAAAAAAAAAI/AAAAAAAAFB0/9d5SU9AcHNQ/s512-c/photo.jpg'/></author><thr:total>13</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4704664917418794835.post-8236105658793522179</id><published>2011-01-19T12:31:00.000-08:00</published><updated>2011-01-25T21:44:30.183-08:00</updated><title type='text'>JRuby on Rails on Amazon Elastic Beanstalk</title><content type='html'>Amazon this week announced &lt;a href=&quot;http://aws.amazon.com/elasticbeanstalk/&quot;&gt;Elastic Beanstalk&lt;/a&gt;, a managed &lt;a href=&quot;http://tomcat.apache.org/&quot;&gt;Apache Tomcat&lt;/a&gt; service for AWS. 
Naturally, I had to try JRuby on it.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;First, the bad:&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;AWSEB is really slow to deploy stuff. Several times it got &quot;stuck&quot; and I waited for more than 30 minutes for it to recover. It did not appear to be an app issue, since the app came up just fine.&lt;/li&gt;&lt;li&gt;The default instance size is t1.micro. I was able to get a Rails app to boot there, but it&#39;s a very underpowered size.&lt;/li&gt;&lt;li&gt;It appears to start up JVMs with 256MB of memory max and 64MB of permgen. For a larger app, or one with many Rails instances, that might not be enough. For a &quot;threadsafe&quot; Rails app, though, it&#39;s plenty.&lt;/li&gt;&lt;li&gt;The default EC2 load balancer for the new Beanstalk instance is set to ping the &quot;/&quot; URL. If you don&#39;t rig up a / route in your Rails app (like I forgot to do), the app will come up for a few minutes and immediately get taken out.&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;And the good news: it works great once you get past the hassles! Here&#39;s the process that worked for my app (assuming the app is already built and ready to deploy).&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Preparing the app:&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;Ensure jruby-openssl is in the Gemfile. 
Rails seems to want it in production mode.&lt;/li&gt;&lt;li&gt;Edit config/environments/production.rb to enable threadsafe mode.&lt;/li&gt;&lt;li&gt;`warble`&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;Preparing Elastic Beanstalk:&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;Create a new instance, specifying the .war file Warbler created above as the app to deploy&lt;/li&gt;&lt;li&gt;There is no step two&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;Once the instance has been prepared, you may want to resize it to something larger than t1.micro if it&#39;s meant to be a real app...but it should boot ok.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;div&gt;&lt;a onblur=&quot;try {parent.deselectBloggerImageGracefully();} catch(e) {}&quot; href=&quot;http://i.min.us/ieh0Hc.png&quot;&gt;&lt;img style=&quot;margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 452px; height: 420px;&quot; src=&quot;http://i.min.us/ieh0Hc.png&quot; border=&quot;0&quot; alt=&quot;&quot; /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Have fun!&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.headius.com/feeds/8236105658793522179/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://blog.headius.com/2011/01/jruby-on-rails-on-amazon-elastic.html#comment-form' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/8236105658793522179'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/8236105658793522179'/><link rel='alternate' type='text/html' href='http://blog.headius.com/2011/01/jruby-on-rails-on-amazon-elastic.html' title='JRuby on Rails on Amazon Elastic Beanstalk'/><author><name>Charles Nutter</name><uri>https://plus.google.com/101599370339210456684</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' 
height='32' src='//lh5.googleusercontent.com/-VT5b8KsgHgQ/AAAAAAAAAAI/AAAAAAAAFB0/9d5SU9AcHNQ/s512-c/photo.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4704664917418794835.post-6160623388426443665</id><published>2011-01-05T16:04:00.000-08:00</published><updated>2011-01-25T21:44:30.202-08:00</updated><title type='text'>Representing Non-Unicode Text on the JVM</title><content type='html'>JRuby is an implementation of Ruby, and in order to achieve the high level of compatibility we boast we&#39;ve had to put in some extra work. Probably the biggest area is in management of String data.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;Strings in Ruby 1.8&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Ruby 1.8 did not differentiate between strings of bytes or strings of characters. A String was just an array of bytes, and the representation of those bytes was dependent on how you used them. You could do regular expression matches against non-text binary sequences (used by some to parse binary formats like PNG). You could treat them as UTF-8 text by setting global encoding variables to assume UTF-8 in literal strings. And you could use the same methods for dealing with strings of bytes you would with strings of characters...split, gsub, and so on. The lack of a character abstraction was painful, but the ability to do character-string operations against byte-strings was frequently useful.&lt;br /&gt;&lt;br /&gt;In order to support all this, we were forced in JRuby to also represent strings as byte[]. This was not an easy decision. Java&#39;s strings are all UTF-16 internally. By moving to byte[]-based strings, we lost many benefits of being on the JVM, like built-in regular expression support, seamless passing of strings to Java methods, and easy interaction with any Java libraries that accept, return, or manipulate Java strings. 
We eventually had to implement our own regexp engines (or the byte[]-to-char[]-to-byte[] overhead would kill us) and calls from Ruby to Java still pay a cost to pass Strings.&lt;br /&gt;&lt;br /&gt;But we got a lot out of it too. We would not have been able to represent binary content easily with a char[]-based string, since it would either get garbled (when Java tried to decode it) or we&#39;d have to force the data into only the lower byte of each char, doubling the size of all strings in memory. We have some of the fastest String IO capabilities of any JVM language, since we never have to decode text. And most importantly, we&#39;ve achieved an incredibly high level of compatibility with C Ruby that would have been difficult or impossible forcing String data into char[].&lt;br /&gt;&lt;br /&gt;There&#39;s also another major benefit: we can support Ruby 1.9&#39;s &quot;multilingualization&quot;.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;Strings in Ruby 1.9&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Ruby 1.9 still represents all string data as byte[] internally, but it adds an additional field to all strings: Encoding (there&#39;s also &quot;code range&quot;, but it&#39;s merely an internal optimization).&lt;br /&gt;&lt;br /&gt;The biggest problem with encoded text in MRI was that you never knew what a string&#39;s encoding was supposed to be; the String object itself only aggregated byte[], and if you ever ended up with mixed encodings in a given app, you&#39;d be in trouble. Rails even introduced its own &quot;Chars&quot; type specifically to deal with the lack of encoding support.&lt;br /&gt;&lt;br /&gt;In Ruby 1.9, however, Strings know their own encoding. Strings can be forced to a specific encoding or transcoded to another. IO streams are aware of (and configurable for) external and internal encodings, and there&#39;s also default external and internal encodings. 
And you can still deal with raw binary data in the same structure and with the same String-manipulating features. For a full discussion of encoding support in Ruby 1.9, see &lt;a href=&quot;http://yehudakatz.com/&quot;&gt;Yehuda Katz&lt;/a&gt;&#39;s excellent post on &lt;a href=&quot;http://yehudakatz.com/2010/05/05/ruby-1-9-encodings-a-primer-and-the-solution-for-rails/&quot;&gt;Ruby 1.9 Encodings: A Primer&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;As part of JRuby 1.6, we&#39;ve been working on getting much closer to 100% compatibility with Ruby 1.9. Of course this has meant working on encoding support. Luckily, we had a hacking machine some years ago in Marcin Mielzynski, who implemented not only our encoding-agnostic regexp engine (a port of Oniguruma from C), but also our byte[]-based String logic and almost all of our Encoding support. The remaining work has trickled in over the subsequent years, leading up to the last few months of heavy activity on 1.9 support.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;How It Works&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;You might find it interesting to know how all this works, since JRuby is a JVM-based language where Strings are supposed to be UTF-16 always.&lt;br /&gt;&lt;br /&gt;First off, String is implemented by the &lt;a href=&quot;https://github.com/jruby/jruby/blob/master/src/org/jruby/RubyString.java&quot;&gt;RubyString&lt;/a&gt; class. RubyString aggregates an Encoding and an array of bytes, using a structure we call &lt;a href=&quot;https://github.com/jruby/bytelist/blob/master/src/org/jruby/util/ByteList.java&quot;&gt;ByteList&lt;/a&gt;. ByteList represents an arbitrary array of bytes, a view into them, and an encoding. 
All operations against a String operate within RubyString&#39;s code against ByteLists.&lt;br /&gt;&lt;br /&gt;IO streams, implemented by &lt;a href=&quot;https://github.com/jruby/jruby/blob/master/src/org/jruby/RubyIO.java&quot;&gt;RubyIO&lt;/a&gt; (and subclasses) and ChannelStream/ChannelDescriptor, accept and return ByteList instances. ByteList is the text/binary currency of JRuby...our java.lang.String.&lt;br /&gt;&lt;br /&gt;Regexp is implemented in &lt;a href=&quot;http://&quot;&gt;RubyRegexp&lt;/a&gt; using &lt;a href=&quot;https://github.com/jruby/joni&quot;&gt;Joni&lt;/a&gt;, our Java port of the Oniguruma regular expression library. Oniguruma accepts byte arrays and uses encoding-specific information at match time to know what constitutes a character in that byte array. It is the only regular expression engine on the JVM capable of dealing with encodings other than UTF-16.&lt;br /&gt;&lt;br /&gt;The JRuby parser also ties into encoding, using it on a per-file basis to know how to encode each literal string it encounters. Literal strings in the AST are held in &lt;a href=&quot;https://github.com/jruby/jruby/blob/master/src/org/jruby/ast/StrNode.java&quot;&gt;StrNode&lt;/a&gt;, which aggregates a ByteList and constructs new String instances from it.&lt;br /&gt;&lt;br /&gt;The compiler is an interesting case. Ideally we would like all literal strings to still go into the class file&#39;s constant pool, so that they can be loaded quickly and live as part of the class metadata. In order to do this, the byte[]-based string content is forced into a char[], which is forced into a java.lang.String that goes in the constant pool. At load time, we unpack the bytes and return them to a ByteList that knows their actual encoding. 
Dare I claim that JRuby is the first JVM language to allow representing literal strings in encodings other than UTF-16?&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;What It Means For You&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;At the end of the day, how are you affected by all this? How is your life improved?&lt;br /&gt;&lt;br /&gt;If you are a Ruby user, you can count on JRuby having solid support for Ruby 1.9&#39;s &quot;M17N&quot; strings. That may not be complete for JRuby 1.6, but we intend to take it as far as possible. JRuby 1.6 *will* have the lion&#39;s share of M17N in-place and working.&lt;br /&gt;&lt;br /&gt;If you are a JVM user...JRuby represents the *only* way you can deal with arbitrarily-encoded text without converting it to UTF-16 Unicode. At a minimum, this means JRuby has the potential to deal with raw wire data much more efficiently than libraries that have to up-convert to UTF-16 and downconvert back to UTF-8. It may also mean encodings without complete representation in Unicode (like Japanese &quot;emoji&quot; characters) can *only* be losslessly processed using JRuby, since forcing them into UTF-16 would either destroy them or mangle characters. And of course no other JVM language provides JRuby&#39;s capabilities for using String-like operations against arbitrary binary data. 
That&#39;s gotta be worth something!&lt;br /&gt;&lt;br /&gt;I want to also take this opportunity to again thank Marcin for his work on JRuby in the past; Tom Enebo for his suffering through encoding-related parser work the past few weeks; Yukihiro &quot;Matz&quot; Matsumoto for adding encoding support to Ruby; and all JRuby committers and contributors who have helped us sort out M17N for JRuby 1.6.</content><link rel='replies' type='application/atom+xml' href='http://blog.headius.com/feeds/6160623388426443665/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://blog.headius.com/2011/01/representing-non-unicode-text-on-jvm.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/6160623388426443665'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/6160623388426443665'/><link rel='alternate' type='text/html' href='http://blog.headius.com/2011/01/representing-non-unicode-text-on-jvm.html' title='Representing Non-Unicode Text on the JVM'/><author><name>Charles Nutter</name><uri>https://plus.google.com/101599370339210456684</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh5.googleusercontent.com/-VT5b8KsgHgQ/AAAAAAAAAAI/AAAAAAAAFB0/9d5SU9AcHNQ/s512-c/photo.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4704664917418794835.post-2769363753545887472</id><published>2011-01-04T03:31:00.000-08:00</published><updated>2011-01-25T21:44:30.216-08:00</updated><title type='text'>Flat and Graph Profiles for JRuby 1.6</title><content type='html'>&lt;div&gt;Sometimes it&#39;s the little things that make all the difference in the world.&lt;/div&gt;&lt;br /&gt;&lt;div&gt;For a long time, we&#39;ve extolled the virtues of the amazing JVM tool ecosystem. 
There are literally dozens of profiling, debugging, and monitoring tools, making JRuby perhaps the best Ruby tool ecosystem you can get. But there&#39;s a surprising lack of tools for command-line use, and that&#39;s an area many Rubyists take for granted.&lt;/div&gt;&lt;br /&gt;&lt;div&gt;To help improve the situation, we recently got the ruby-debug maintainers to ship our JRuby version, so &lt;a href=&quot;http://blog.headius.com/2010/12/jruby-finally-installs-ruby-debug-gem.html&quot;&gt;JRuby has easy-to-use command-line Ruby debugging support&lt;/a&gt;. You can simply &quot;gem install ruby-debug&quot; now, so we&#39;ll stop shipping it in JRuby 1.6.&lt;/div&gt;&lt;br /&gt;&lt;div&gt;We also shipped a basic &quot;flat&quot; instrumented profiler for JRuby 1.5.6. It&#39;s almost shocking how few command-line profiling tools are available for JVM users; most require you to boot up a GUI and click a bunch of buttons to get any information at all. Even when there are tools for profiling, they&#39;re often bad at reporting results for non-Java languages like JRuby. So we decided to whip up a --profile flag that gives you a basic, flat, instrumented profile of your code.&lt;/div&gt;&lt;br /&gt;&lt;div&gt;To continue this trend, we enlisted the help of &lt;a href=&quot;http://danlucraft.com/&quot;&gt;Dan Lucraft&lt;/a&gt; of the &lt;a href=&quot;http://redcareditor.com/&quot;&gt;RedCar&lt;/a&gt; project to expand our profiler to include &quot;graph&quot; profiling results. Dan previously implemented JRuby support for the &quot;ruby-prof&quot; project, a native extension to C Ruby, in the form of &quot;&lt;a href=&quot;https://github.com/danlucraft/jruby-prof&quot;&gt;jruby-prof&lt;/a&gt;&quot; (which you can install and use today on any recent JRuby release). 
He was a natural to work on the built-in profiling support.&lt;/div&gt;&lt;br /&gt;&lt;div&gt;For the uninitiated, &quot;flat&quot; profiles just show how much time each method body takes, possibly with downstream aggregate times and total aggregate times. This is what you usually get from built-in command-line profilers like the &quot;hprof&quot; profiler that ships with Hotspot/OpenJDK. Here&#39;s a &quot;flat&quot; profile for a simple piece of code.&lt;/div&gt;&lt;br /&gt;&lt;div&gt;&lt;pre&gt;~/projects/jruby ➔ jruby --profile.flat -e &quot;def foo; 100000.times { (2 ** 200).to_s }; end; foo&quot;&lt;br /&gt;Total time: 0.99&lt;br /&gt;&lt;br /&gt;    total        self    children       calls  method&lt;br /&gt;----------------------------------------------------------------&lt;br /&gt;     0.99        0.00        0.99           1  Object#foo&lt;br /&gt;     0.99        0.08        0.90           1  Fixnum#times&lt;br /&gt;     0.70        0.70        0.00      100000  Bignum#to_s&lt;br /&gt;     0.21        0.21        0.00      100000  Fixnum#**&lt;br /&gt;     0.00        0.00        0.00         145  Class#inherited&lt;br /&gt;     0.00        0.00        0.00           1  Module#method_added&lt;/pre&gt;&lt;/div&gt;&lt;br /&gt;&lt;div&gt;A &quot;graph&quot; profile shows the top N call stacks from your application&#39;s run, breaking them down by how much time is spent in each method. It gives you a more complete picture of where time is being spent while running your application. 
Here&#39;s a &quot;graph&quot; profile (abbreviated) for the same code.&lt;/div&gt;&lt;br /&gt;&lt;div&gt;&lt;pre&gt;~/projects/jruby ➔ jruby --profile.graph -e &quot;def foo; 100000.times { (2 ** 200).to_s }; end; foo&quot;&lt;br /&gt;%total   %self    total        self    children                 calls  name&lt;br /&gt;---------------------------------------------------------------------------------------------------------&lt;br /&gt;100%     0%        1.00        0.00        1.00                     0  (top)&lt;br /&gt;                   1.00        0.00        1.00                   1/1  Object#foo&lt;br /&gt;                   0.00        0.00        0.00               145/145  Class#inherited&lt;br /&gt;                   0.00        0.00        0.00                   1/1  Module#method_added&lt;br /&gt;---------------------------------------------------------------------------------------------------------&lt;br /&gt;                   1.00        0.00        1.00                   1/1  (top)&lt;br /&gt; 99%     0%        1.00        0.00        1.00                     1  Object#foo&lt;br /&gt;                   1.00        0.09        0.91                   1/1  Fixnum#times&lt;br /&gt;---------------------------------------------------------------------------------------------------------&lt;br /&gt;                   1.00        0.09        0.91                   1/1  Object#foo&lt;br /&gt; 99%     8%        1.00        0.09        0.91                     1  Fixnum#times&lt;br /&gt;                   0.70        0.70        0.00         100000/100000  Bignum#to_s&lt;br /&gt;                   0.21        0.21        0.00         100000/100000  Fixnum#**&lt;br /&gt;---------------------------------------------------------------------------------------------------------&lt;br /&gt;                   0.70        0.70        0.00         100000/100000  Fixnum#times&lt;br /&gt; 69%    69%        0.70        0.70        0.00                100000  
Bignum#to_s&lt;br /&gt;---------------------------------------------------------------------------------------------------------&lt;br /&gt;                   0.21        0.21        0.00         100000/100000  Fixnum#times&lt;br /&gt; 21%    21%        0.21        0.21        0.00                100000  Fixnum#**&lt;br /&gt;---------------------------------------------------------------------------------------------------------&lt;br /&gt;                   0.00        0.00        0.00               145/145  (top)&lt;br /&gt;  0%     0%        0.00        0.00        0.00                   145  Class#inherited&lt;br /&gt;---------------------------------------------------------------------------------------------------------&lt;br /&gt;                   0.00        0.00        0.00                   1/1  (top)&lt;br /&gt;  0%     0%        0.00        0.00        0.00                     1  Module#method_added&lt;/pre&gt;&lt;/div&gt;&lt;br /&gt;&lt;div&gt;As you can see, you get a much better picture of why certain methods are taking up time and what component calls are contributing most to that time.&lt;/div&gt;&lt;br /&gt;&lt;div&gt;We haven&#39;t settled on the final command-line flags, but look for the new graph profiling (and the cleaned-up flat profile) to ship with JRuby 1.6 (real soon now!)&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://blog.headius.com/feeds/2769363753545887472/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://blog.headius.com/2011/01/flat-and-graph-profiles-for-jruby-16.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/2769363753545887472'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/2769363753545887472'/><link rel='alternate' type='text/html' 
href='http://blog.headius.com/2011/01/flat-and-graph-profiles-for-jruby-16.html' title='Flat and Graph Profiles for JRuby 1.6'/><author><name>Charles Nutter</name><uri>https://plus.google.com/101599370339210456684</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh5.googleusercontent.com/-VT5b8KsgHgQ/AAAAAAAAAAI/AAAAAAAAFB0/9d5SU9AcHNQ/s512-c/photo.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4704664917418794835.post-8270769560000473811</id><published>2010-12-23T14:01:00.000-08:00</published><updated>2011-01-25T21:44:30.233-08:00</updated><title type='text'>Improved JRuby Startup by Deferring Gem Plugins</title><content type='html'>Another present for you JRubyists out there!&lt;br /&gt;&lt;br /&gt;JRuby has had notoriously bad startup times. Not as bad as, say, IronRuby (sorry guys!), but definitely a big fat hit every time you need to run some Ruby code from the command line. Some of this overhead was related to JRuby, and we&#39;ve steadily worked to improve that over the years. Some of it is due to the JVM, most commonly due to running on the &quot;server&quot; Hotspot VM or another JVM that does not have an interpreter (both of which start up considerably slower than Hotspot/OpenJDK&#39;s &quot;client&quot; mode). I&#39;ve blogged &lt;a href=&quot;http://blog.headius.com/2010/03/jruby-startup-time-tips.html&quot;&gt;tips and tricks for JRuby startup&lt;/a&gt; before, and these mostly apply to vanilla JRuby startup performance.&lt;br /&gt;&lt;br /&gt;However, a large part of the overhead was not specifically due to JRuby or the JVM, but to RubyGems. RubyGems in version 1.3 added support for &quot;plugins&quot;, whereby gems could include a specially-named file to extend the functionality of RubyGems itself. 
Most of these plugins added command-line tools like &quot;gem push&quot; for pushing a new gem to gemcutter.org (now built-in for pushing to rubygems.org). Unfortunately, the feature was originally added by having RubyGems do a full scan of all installed gems on every startup. If you only had a few gems, this was a minor problem. If you had more than a few, it became a big fat O(N) problem, where each of those N gems could be arbitrarily complex in itself.&lt;br /&gt;&lt;br /&gt;The good news is that my proposed change, &lt;a href=&quot;http://rubyforge.org/pipermail/rubygems-developers/2010-December/005898.html&quot;&gt;making plugin scanning happen *only* when using the &quot;gem&quot; command&lt;/a&gt;, appears likely to be approved for RubyGems 1.4, due out reasonably soon.&lt;br /&gt;&lt;br /&gt;Here&#39;s the &lt;a href=&quot;https://gist.github.com/751969&quot;&gt;patch&lt;/a&gt;; its impact on RubyGems startup times is shown below. The first two times are without the patch, with the first time against a &quot;cold&quot; filesystem. The final time is with the patch in place. 
In all cases, it&#39;s against my local JRuby working copy, which has around 500 gems installed.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;~/projects/jruby ➔ jruby -e &quot;t = Time.now; require &#39;rubygems&#39;; puts Time.now - t&quot;&lt;br /&gt;17.09&lt;br /&gt;&lt;br /&gt;~/projects/jruby ➔ jruby -e &quot;t = Time.now; require &#39;rubygems&#39;; puts Time.now - t&quot;&lt;br /&gt;6.959&lt;br /&gt;&lt;br /&gt;~/projects/jruby ➔ git stash pop&lt;br /&gt;# On branch master&lt;br /&gt;# Changed but not updated:&lt;br /&gt;#   (use &quot;git add &amp;lt;file&amp;gt;...&quot; to update what will be committed)&lt;br /&gt;#   (use &quot;git checkout -- &amp;lt;file&amp;gt;...&quot; to discard changes in working directory)&lt;br /&gt;#&lt;br /&gt;# modified:   lib/ruby/site_ruby/1.8/rubygems.rb&lt;br /&gt;# modified:   lib/ruby/site_ruby/1.8/rubygems/gem_runner.rb&lt;br /&gt;...&lt;br /&gt;&lt;br /&gt;~/projects/jruby ➔ jruby -e &quot;t = Time.now; require &#39;rubygems&#39;; puts Time.now - t&quot;&lt;br /&gt;0.481&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;It&#39;s truly a shocking difference, and it&#39;s easy to see why JRuby (plus RubyGems) has had such a bad startup-time reputation.&lt;br /&gt;&lt;br /&gt;I&#39;ve already made this change locally to JRuby&#39;s copy of RubyGems, which should help any users working against JRuby master. The change will almost certainly ship in JRuby 1.6, with RCs showing up in the next couple weeks. 
So with this change and my &lt;a href=&quot;http://blog.headius.com/2010/03/jruby-startup-time-tips.html&quot;&gt;JRuby startup tips&lt;/a&gt;, we&#39;re on the road to a much more pleasant JRuby experience.&lt;br /&gt;&lt;br /&gt;Happy Hacking!</content><link rel='replies' type='application/atom+xml' href='http://blog.headius.com/feeds/8270769560000473811/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://blog.headius.com/2010/12/improved-jruby-startup-by-deferring-gem.html#comment-form' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/8270769560000473811'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/8270769560000473811'/><link rel='alternate' type='text/html' href='http://blog.headius.com/2010/12/improved-jruby-startup-by-deferring-gem.html' title='Improved JRuby Startup by Deferring Gem Plugins'/><author><name>Charles Nutter</name><uri>https://plus.google.com/101599370339210456684</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh5.googleusercontent.com/-VT5b8KsgHgQ/AAAAAAAAAAI/AAAAAAAAFB0/9d5SU9AcHNQ/s512-c/photo.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4704664917418794835.post-518658072475549538</id><published>2010-12-23T13:31:00.000-08:00</published><updated>2011-01-25T21:44:30.245-08:00</updated><title type='text'>JRuby Finally Installs ruby-debug Gem</title><content type='html'>This should be a great Christmas present for many of you.&lt;br /&gt;&lt;br /&gt;After over three years, the &quot;ruby-debug&quot; gem finally installs properly on JRuby.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;~/projects/jruby ➔ gem install ruby-debug&lt;br /&gt;Successfully installed ruby-debug-base-0.10.4-java&lt;br /&gt;Successfully installed ruby-debug-0.10.4&lt;br /&gt;2 
gems installed&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;Back in 2007, folks working on NetBeans, Eclipse, and IntelliJ support for Ruby came together to build a new version of the ruby-debug backend that would work on JRuby. They shared the effort, we JRuby guys added features they needed to do a clean port of the ruby-debug C code, and the ultimate result was a ruby-debug-base gem that isolated the platform-specific bits.&lt;br /&gt;&lt;br /&gt;For whatever reason, the JRuby version of ruby-debug-base never got pushed as a real, live &quot;-java&quot; gem. This meant that you had to download ruby-debug-base-VERSION-java.gem yourself to get ruby-debug to install. To make this process easier, we even shipped ruby-debug and ruby-debug-base preinstalled in JRuby 1.5.&lt;br /&gt;&lt;br /&gt;Unfortunately, this was only a partial answer. Many libraries and applications want to install all their dependencies cleanly. If ruby-debug was among those dependencies, installation would fail. Rails even includes special JRuby-specific lines in its default Bundler Gemfile to exclude ruby-debug when bundling on JRuby.&lt;br /&gt;&lt;br /&gt;All that nonsense ends today. Rocky Bernstein, one of the maintainers of the ruby-debug gem, agreed to push our ruby-debug-base to the canonical rubygems.org repository. As a result, ruby-debug now installs properly on JRuby. 
It only took three years to get that gem pushed (through nobody&#39;s fault...I think everyone expected everyone else to follow through on it).&lt;br /&gt;&lt;br /&gt;Merry Christmas, Happy Chanukah, Joyous Kwanzaa, and enjoy your holiday season!</content><link rel='replies' type='application/atom+xml' href='http://blog.headius.com/feeds/518658072475549538/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://blog.headius.com/2010/12/jruby-finally-installs-ruby-debug-gem.html#comment-form' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/518658072475549538'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/518658072475549538'/><link rel='alternate' type='text/html' href='http://blog.headius.com/2010/12/jruby-finally-installs-ruby-debug-gem.html' title='JRuby Finally Installs ruby-debug Gem'/><author><name>Charles Nutter</name><uri>https://plus.google.com/101599370339210456684</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh5.googleusercontent.com/-VT5b8KsgHgQ/AAAAAAAAAAI/AAAAAAAAFB0/9d5SU9AcHNQ/s512-c/photo.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4704664917418794835.post-445107713335760346</id><published>2010-12-23T11:43:00.000-08:00</published><updated>2011-01-25T21:44:30.258-08:00</updated><title type='text'>Quick Thoughts on Oracle/Apache and the Java TCK</title><content type='html'>This was going to be a reply to a friend on Twitter, when I realized it would be several tweets and I might as well put them in one place.&lt;br /&gt;&lt;br /&gt;Grant Michaels (@grantmichaels) sent me this link to the &lt;a href=&quot;http://jcp.org/aboutJava/communityprocess/summaries/2010/October2010-public-minutes.html&quot;&gt;October JCP (Java Community Process) EC (Executive 
Committee) meeting notes&lt;/a&gt;. The Apache/&lt;a href=&quot;http://en.wikipedia.org/wiki/Technology_Compatibility_Kit&quot;&gt;TCK (Technology Compatibility Kit)&lt;/a&gt; issue was discussed at length.&lt;br /&gt;&lt;br /&gt;For those of you in the dark, Apache recently resigned from the JCP because of the ongoing dispute over their &quot;&lt;a href=&quot;http://harmony.apache.org/&quot;&gt;Harmony&lt;/a&gt;&quot; OSS (Open-Source Software) Java implementation&#39;s inability to get an unencumbered license to the Java TCK. Passing the TCK is a requirement for an implementation to officially be accepted as &quot;Java&quot;.&lt;br /&gt;&lt;br /&gt;I had heard about this problem from a distance, but only recently started to understand its complexity. The TCK includes FOU (Field of Use) clauses preventing TCK-tested implementations other than OpenJDK from being released as open source. Only implementations &quot;largely&quot; based on OpenJDK (Open Java Development Kit, Sun&#39;s GPLed &quot;Hotspot&quot; VM and class libraries) are allowed to get around this requirement. Apache&#39;s Harmony, being entirely independent and &lt;a href=&quot;http://www.apache.org/licenses/LICENSE-2.0.html&quot;&gt;Apache-licensed&lt;/a&gt;, does not qualify.&lt;br /&gt;&lt;br /&gt;If that were all there was to it, Apache would not have any real grounds for complaining. But the &lt;a href=&quot;http://jcp.org/aboutJava/communityprocess/JSPA2.pdf&quot;&gt;JSPA (Java Specification Participation Agreement, PDF)&lt;/a&gt; requires that only unencumbered specifications, reference implementations, and test kits be submitted to the JCP. This sets the stage for an ugly licensing battle that has stymied Java&#39;s progress for over three years.&lt;br /&gt;&lt;br /&gt;I&#39;m not going to do a tl;dr post on this like I did with &lt;a href=&quot;http://blog.headius.com/2010/08/my-thoughts-on-oracle-v-google.html&quot;&gt;Oracle/Google&lt;/a&gt;. 
Instead, just a few quick thoughts as I read through the notes.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;Overview&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The first thing you will notice is that this isn&#39;t a simple cut-and-dried issue. The meeting notes express Oracle&#39;s position as outwardly fearful of Harmony leading to many downstream forks, with no recourse for asserting they fulfill the requirements of the Java specification. There seems to be an implied stab at Android here, which uses Harmony&#39;s class libraries atop the Dalvik VM to implement a substantial portion (but not all) of what looks like Java SE 5. Oracle states this decision is final; Apache will not be granted an unencumbered TCK license. Oracle is not known for changing their minds.&lt;br /&gt;&lt;br /&gt;Several EC members, including Doug Lea and Josh Bloch, point out that it&#39;s fairly clear the encumbered TCK violates the JSPA&#39;s openness clauses. Oracle refuses to comment on this &quot;legal&quot; matter. Doug suggests that EC members might be able to vote to move Java forward with a clearer conscience if the JSPA were amended to make the encumbered TCK &quot;legal&quot;.&lt;br /&gt;&lt;br /&gt;Another point brought up by several members is the frustration that they have to deal with licensing at all. They recall a golden age of the JCP where it actually voted on technical matters rather than arguing over licensing.&lt;br /&gt;&lt;br /&gt;IBM declares they are unhappy with this decision, but even more unhappy that the Java platform has stagnated because of it for so long. IBM would eventually go on to &lt;a href=&quot;http://www.jcp.org/en/jsr/results?id=5111&quot;&gt;vote &quot;yes&quot; to the disputed Java 7 JSR&lt;/a&gt;, even in the presence of the apparent JSPA violation.&lt;br /&gt;&lt;br /&gt;Apache representative and longtime Harmony advocate Geir Magnusson also weighed in. 
He argued that the health of the platform would only be bolstered by allowing for many independent open-source implementations, and damaged by disallowing them. When asked why OpenJDK gets a free pass, Adam Messinger (Oracle) stated that he didn&#39;t want to answer a legal question, but that OpenJDK&#39;s GPL (GNU General Public License) requires reciprocity from downstream forks, reducing the damage and confusion they might cause if released publicly without full spec compliance (I&#39;m paraphrasing based on the notes here).&lt;br /&gt;&lt;br /&gt;Toward the end of the licensing discussion, Adam again called for all member organizations to participate in OpenJDK. It&#39;s fairly clear from these notes and from previous announcements and discussions that Oracle intends for OpenJDK to be the &quot;one true OSS Java&quot;, and for all comers to contribute to it. They even managed to get Apple and IBM, longtime GPL foes, to join the family. Apache doesn&#39;t do GPL, and so Apache will not contribute to or base projects on OpenJDK.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;No Right Answer&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Now we enter entirely into the realm of my own opinions.&lt;br /&gt;&lt;br /&gt;At a glance, I hate Oracle for disallowing free use of the TCK. Nobody should be disallowed from implementing their own OSS Java. Harmony is a promising project that would have a promising future if the cloud of licensing, patent protection, and &quot;compliance&quot; could be lifted. It seems that&#39;s nearly impossible now.&lt;br /&gt;&lt;br /&gt;I also hate to see a good project like Harmony &quot;die&quot; or be stuck in legal limbo. I&#39;m sure dozens of developers have poured their hearts into Harmony, and they deserve to see it thrive.&lt;br /&gt;&lt;br /&gt;On the other hand, I applaud Oracle for so vigorously promoting OpenJDK. 
OpenJDK is certainly a more mature JVM and class library than Harmony, having been Sun&#39;s official Java implementation for years and years. Bringing IBM and Apple into OpenJDK will help ensure it moves forward on all platforms of note. If you look only at OpenJDK, OSS Java is stronger now than it ever has been.&lt;br /&gt;&lt;br /&gt;Oracle may have a point with the forking concerns. Because Apache&#39;s license is so open that anyone can create and release binary-only forks, it would in theory be possible to speckle the Java landscape with Harmony derivatives that are incompatible in subtle ways. Possible, but perhaps unlikely.&lt;br /&gt;&lt;br /&gt;Ultimately, I feel sad about the direction Oracle has taken, but I&#39;m bolstered by hopes that OpenJDK is going to really thrive and grow, and as a result &quot;Java&quot; will continue to see widespread use and adoption across a variety of platforms.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;Other Ways Out&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Now I go into pure speculation-mode.&lt;br /&gt;&lt;br /&gt;A key problem with the TCK restriction is the fact that the Java specification provides patent grants only to compliant implementations. In other words, if you don&#39;t want an Oracle/Google-esque lawsuit once you have billions in the bank, you need to be compliant with the one true TCK. In order to be compliant with the TCK, you can&#39;t release your implementation as open source. The patent grant is obviously intended to ensure that third-party implementations are &quot;really&quot; Java.&lt;br /&gt;&lt;br /&gt;There&#39;s a bit of a chicken-and-egg thing here. What if Harmony could pass the TCK? In theory, they may be at that point right now. Does the ability to pass the TCK but the inability to run it without tainting mean they&#39;re Java or not? 
If an unrelated third party forked Harmony into &quot;Barmony&quot;, acquired a TCK license, and proved that it passed...would that mean Harmony could be considered compliant without ever having run the TCK?&lt;br /&gt;&lt;br /&gt;What if, as in Android, Harmony simply moved forward without claiming they were compliant? Oracle could eventually club them to death with patent bats. Perhaps such a legal battle would force the legal ramifications of the JSPA violation to be addressed in court? Could Oracle survive a legal test of their violation in attempting to back up a patent suit?&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;What It Means For You&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The ultimate question is how this affects Java developers today and going forward.&lt;br /&gt;&lt;br /&gt;By most estimates, 99% of the world&#39;s Java developers run on one of the standard Java implementations from Oracle, IBM, and other licensees. A massive number of them run atop Hotspot, either in an old closed-source Java 5/6 form or in an OpenJDK 6/7 form. &quot;Nobody&quot; runs Harmony, and so few if any day-to-day Java developers will be affected. That doesn&#39;t excuse the situation, but it does soften the actual damage caused.&lt;br /&gt;&lt;br /&gt;Android is another peculiar case. It would be difficult for it to be tested compliant, even if there weren&#39;t FOU restrictions on doing so. It uses Harmony&#39;s class libraries but the Dalvik VM. It is also a massive force in mobile development now, and killing it would likely put the final nail in mobile Java&#39;s coffin. Oracle has to know this. Oracle also has to know that killing Android would hand the mobile keys over to Apple and Microsoft forever. Could Android switch to using OpenJDK-based class libraries? Would that qualify it as being &quot;largely&quot; based on OpenJDK (noting that the class libraries are the vast majority of the code in OpenJDK)? 
Oracle/Google is likely to be stuck in court for a long time, while Android continues to expand into televisions and tablets along with telephones.&lt;br /&gt;&lt;br /&gt;How about projects that build atop Java, like JRuby? An even higher percentage of JRuby users are probably already running atop OpenJDK or an OpenJDK derivative like IcedTea or SoyLatte. Oracle pushing OpenJDK will only benefit those users. &lt;a href=&quot;http://ruboto.org/&quot;&gt;Ruboto (JRuby on Android)&lt;/a&gt; will follow whatever path Android itself ends up following, and nobody can see that future yet...but it seems unlikely Ruboto will ever die since it&#39;s unlikely Android will ever die.&lt;br /&gt;&lt;br /&gt;In closing...I encourage everyone to read the EC notes and gather as much information as they can before claiming this is now the final &quot;death&quot; of Java. I also encourage everyone to contribute thoughts, clarifications, and speculation in the comments here.&lt;br /&gt;&lt;br /&gt;Have a happy holiday!</content><link rel='replies' type='application/atom+xml' href='http://blog.headius.com/feeds/445107713335760346/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://blog.headius.com/2010/12/quick-thoughts-on-oracleapache-and-java.html#comment-form' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/445107713335760346'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/445107713335760346'/><link rel='alternate' type='text/html' href='http://blog.headius.com/2010/12/quick-thoughts-on-oracleapache-and-java.html' title='Quick Thoughts on Oracle/Apache and the Java TCK'/><author><name>Charles Nutter</name><uri>https://plus.google.com/101599370339210456684</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='//lh5.googleusercontent.com/-VT5b8KsgHgQ/AAAAAAAAAAI/AAAAAAAAFB0/9d5SU9AcHNQ/s512-c/photo.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4704664917418794835.post-4895478538024829801</id><published>2010-09-23T01:39:00.000-07:00</published><updated>2011-01-25T21:44:30.273-08:00</updated><title type='text'>Predator and Prey</title><content type='html'>(This is a repost of an article I wrote in 2004, which I stumbled upon this evening and thought worthy of a reprint. Feel free to rip it up and offer your own commentary. I think it is still 100% valid.)&lt;br /&gt;&lt;br /&gt;I came up with the most compelling idea for a Disney-style film the other day. (OK, perhaps not the most compelling idea, but certainly a fair shot at one.)&lt;br /&gt;&lt;br /&gt;Over the years I&#39;ve heard a number of biologists (ecologists, environmentalists, what have you) comment on (as in expound endlessly upon) something called the &quot;Bambi Syndrome.&quot; Simply put, the &quot;Bambi Syndrome&quot; is brought about by cutesy, utopian images of nature, where only unexpected, amorphous entities (usually accompanied by menacing percussion or something equally non-musical) can embody &quot;evil&quot;; it is a view that, in all its splendor and glory, &quot;nature&quot; is &quot;good,&quot; while &quot;man&quot; is &quot;bad.&quot; The parallel between this viewpoint and several (all?) nature-based Disney films is apparent (although it should be said that Disney is far from being the only perpetrator of &quot;Bambiism&quot;).&lt;br /&gt;&lt;br /&gt;So then, you ask, if nature isn&#39;t &quot;good&quot;, then what is it? Evil and good are purely human constructions. Truth be told, nothing that exists is innately &quot;good&quot; or &quot;evil&quot;. 
These concepts exist only in the eye of the beholder: to the prey, the successful predator is evil; to the predator, the successful prey is evil.&lt;br /&gt;&lt;br /&gt;It could then be considered a great disservice to continue teaching these false ideals to our children, no? This has been my opinion, and I have tried to take an approach with my own son of presenting these facts of nature in as unbiased a way as possible--whence springs the compelling idea.&lt;br /&gt;&lt;br /&gt;Take a typical Disney movie; its clear definition of &quot;good&quot; and &quot;evil&quot; and its even clearer illustration of which roles fall into which category. This movie would begin the same. Also typically, it would be based in nature, perhaps at a very low stratum of the animal kingdom. Predator and prey would be represented by species A (the &quot;good&quot; prey) and B (the &quot;evil&quot; predator). A typical scene ensues, a contest between good and evil, predator and prey. The predator&#39;s evil nature is clearly illustrated here, but atypically, the predator wins.&lt;br /&gt;&lt;br /&gt;Just as people in the audience are questioning their faith in Hollywood, we move up one stratum. The evil predator, returning home with the spoils of war, becomes a gentle, caring mother. She was not simply an &quot;evil&quot; aggressor, bent on death and destruction, but a doting, protective mother, expending her own effort, at risk of her life, to care for her children. In this way, stratum after stratum, &quot;evil&quot; becomes &quot;good&quot;, and the elaborate network that makes up our natural system becomes more recognizable for the purity, neutrality, simplicity of its form.&lt;br /&gt;&lt;br /&gt;Finally, as you would expect in such a movie, we would arrive at the most prolific of the Great Apes: man. 
Illustrating that all kingdoms on earth are becoming man&#39;s prey, with as much tree-hugging, granola-chomping tripe as possible to make sure we, the lords of creation, masters of destiny, killers of all, Shiva to nature&#39;s Brahma, are shown--incontrovertibly--as the only pure &quot;evil&quot; on earth, the movie careens ever faster toward some measure of certainty: &quot;Ahh, now I understand the film&#39;s message.&quot;&lt;br /&gt;&lt;br /&gt;But man is just another spoke in the wheel. We can easily flip the coin, showing mothers feeding, defending children, innocents preyed upon by murderers, hunters taking prey not for food, but for the feeding of other hungers. We do what we do not out of pure evil, but because it is our capacity to do so to further our own species, further our goals, perpetuate. But we also have a capacity no other species possesses: the ability to create our own destinies. The only true evil we encounter in a world where we nearly reign supreme is ourselves. We daily pit our most animal desires--acquisition of resources and destruction of usurpers--against our knowledge that such desires run rampant will complicate our path through history, perhaps even terminating it. Can such a machine be affected by the changing opinions of a few small components? That is the question we leave for the viewers.&lt;br /&gt;&lt;br /&gt;The challenge in such a film would almost certainly be not overplaying the hand. No evil must ever appear to be of any different motivation than its antithesis; and man must, in the end, appear as the most schizophrenic creature on Earth. Our &quot;evil&quot; predatory instincts must be tempered by the &quot;good&quot; effects of our fear of intimate and ultimate mortality for us to continue indefinitely. In this, man has another trait not found among the animals: Our system balances on our own decisions alone. 
With the capacity we will soon possess to control nature completely, without fear of predators, we can only undo ourselves. The balance comes from within.&lt;br /&gt;&lt;br /&gt;Where will the viewer lie?&lt;br /&gt;&lt;br /&gt;I&#39;d hope every kid was as confused as possible by then; and eventually a bit more suspicious of being told what is &quot;good&quot; or &quot;evil&quot;.</content><link rel='replies' type='application/atom+xml' href='http://blog.headius.com/feeds/4895478538024829801/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://blog.headius.com/2010/09/predator-and-prey.html#comment-form' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/4895478538024829801'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/4895478538024829801'/><link rel='alternate' type='text/html' href='http://blog.headius.com/2010/09/predator-and-prey.html' title='Predator and Prey'/><author><name>Charles Nutter</name><uri>https://plus.google.com/101599370339210456684</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh5.googleusercontent.com/-VT5b8KsgHgQ/AAAAAAAAAAI/AAAAAAAAFB0/9d5SU9AcHNQ/s512-c/photo.jpg'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4704664917418794835.post-4344974410425965514</id><published>2010-08-15T18:53:00.000-07:00</published><updated>2011-01-25T21:44:30.290-08:00</updated><title type='text'>My Thoughts on Oracle v Google</title><content type='html'>As you&#39;ve probably heard by now, Oracle has decided to file suit against Google, claiming multiple counts of infringement against Java or JVM patents and copyrights they acquired when they assimilated Sun Microsystems this past year. 
Since I&#39;m unlikely to keep my mouth shut about even trivial matters, something this big obviously requires at least a couple thousand words.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:x-large;&quot;&gt;Who Am I?&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Any post of this nature really requires an author to identify where they stand, so their unavoidable biases can be taken with the appropriate dosage of salt. Rather than having you dig through my past and learn who and what I am, I&#39;ll just lay it out here.&lt;br /&gt;&lt;br /&gt;I am a Java developer. I&#39;ve been a Java developer since 1996 or so, when I got my first University job writing stupid little applets in this new-fangled web language. That job expanded into a web development position, also using Java, and culminated with me joining a few senior developers for a 6-month shared development gig with IBM&#39;s then-nascent Pacific Development Center in Vancouver, BC. Since then I&#39;ve had a string of Java-related jobs...some as a trenches developer, some as a lead, some as &quot;architect&quot;, but all of them heavily wrapped up in this thing called Java. And I can&#39;t say that I&#39;ve ever been particularly annoyed with Java as a language or a platform. Perhaps I haven&#39;t spent enough time on other runtimes, or perhaps I&#39;ve got tunnel-vision after being a Java developer for so many years. But I&#39;d like to think that I&#39;ve become &quot;seasoned&quot; enough as a developer to realize no platform is perfect, and the manifold benefits of the JVM and the Java platform vastly outweigh the troublesome aspects.&lt;br /&gt;&lt;br /&gt;I am an open-source developer. In the late 90s, I worked in earnest on my first open-source project: the LiteStep desktop replacement for Windows. At the time, the LiteStep project was a loosely-confederated glob of C code and amateur C hackers. 
Being a Windows user at the time, I was looking to improve my situation...specifically, I had worked for years on a small application called Hack-It that exposed aspects of the win32 API normally unavailable through standard Windows UI elements, and I was interested in taking that further. LiteStep was not my creation. It had many developers before me and many after, but my small contribution to the project was an almost complete rewrite in amateur-friendly C++ and a decoupling of the core LiteStep &quot;kernel&quot; from the various plugin mechanisms. I was also interviewed for a Wired article on the then-new domain of &quot;skinning&quot; computers, desktops, applications, and so on, though none of my quotes made it into the article. After LiteStep, I fell back into mostly anonymous corporate software development, all still using Java and many open-source technologies, but not much of a visible presence in the OSS world. Then, in 2004 while working as the lead &quot;Java EE Architect&quot; for a multi-million-dollar US government contract, I found JRuby.&lt;br /&gt;&lt;br /&gt;I am a JRuby developer. Since 2004 (or really since late 2005, when I started helping out in earnest) I&#39;ve been partially responsible for turning JRuby from an interesting novelty project into one of the top Ruby implementations. We&#39;ve become well known as one of the best-performing – if not the best-performing – Ruby implementations, even faced with increasing competition from the young upstarts. We&#39;re also increasingly popular (and perhaps the easiest path) for bringing Ruby and its many paradigm-shifting libraries and frameworks (like Rails) to Java and JVM users around the world – without them having to change platforms or leave any of their legacy code behind. Part of my interest in JRuby has been to bring Ruby to the JVM, plain and simple. 
I like Ruby, I like the Ruby community, and on most days I like the cockiness and enthusiasm of those community members toward trying crazy new things. But another large part of my interest in JRuby is more sinister: I want to prove to naysayers what a great platform the JVM actually is, and perhaps make them think twice about knee-jerk biases they&#39;ve carried and cultivated for so many years.&lt;br /&gt;&lt;br /&gt;You&#39;ll notice I refer to JRuby not as &quot;it&quot; or &quot;she&quot; or &quot;he&quot;, but as &quot;we&quot;. &quot;We&#39;ve become well known...We&#39;re also increasingly popular...&quot; That&#39;s not an accident. There&#39;s now over five years of my efforts in JRuby, and I consider it to be as much a part of me as I am a part of it. And so because of that, I have a much deeper, emotional investment in the platform upon which JRuby rests.&lt;br /&gt;&lt;br /&gt;I am a Java developer. I am an open-source developer. I am a JRuby developer and a Ruby fan.&lt;br /&gt;&lt;br /&gt;I am not a lawyer.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:x-large;&quot;&gt;The Facts, According to Me&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;These are the facts as I see them. You&#39;re free to disagree with my interpretation of the world, and I encourage you to do so in the comments, on other forums, over email, or to my face (but buy me a beer first).&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:large;&quot;&gt;On Java&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The Java platform is big. Really big. You just won&#39;t believe how vastly hugely mindbogglingly big it is. 
And by big, I mean it&#39;s everywhere.&lt;br /&gt;&lt;br /&gt;There are three mainstream JVMs people know about: JRockit (WebLogic&#39;s first and then Oracle&#39;s after it acquired them), Hotspot (Which came to Sun through an acquisition and eventually became OpenJDK), and J9 (IBM&#39;s own JVM, fully-licensed and with all its shots). Upon those three JVMs lives a gigantic world. If you want the details, there&#39;s numerous studies and reports about the use of Java in all manner of business, from the hippest new startups (Twitter recently switched much of their stack to the JVM) to the oldest of the old financial concerns. It&#39;s the favored choice for government server applications, the strongest not-quite-completely-Free managed runtime for open-source libraries and applications, and now with Android it&#39;s rapidly becoming one of the strongest (if not the strongest) mobile OS platform (even though Android isn&#39;t *really* Java, as I&#39;ll get into later). You may love or hate Java, but I guarantee it&#39;s part of your life in some way or another.&lt;br /&gt;&lt;br /&gt;There are a few open-source implementations of Java. The most well-known is OpenJDK, the Hotspot JVM that Sun relicensed under the GPL and set Free into the world. There&#39;s also Apache Harmony, whose class libraries form part of Dalvik&#39;s (Android&#39;s VM) Java-compatibility layer. There&#39;s GNU Classpath, a GPL-based implementation of the Java class libraries used for the ahead-of-time Java compiler GCJ. There&#39;s JamVM, which leverages Classpath to provide a very light, very minimal, (and very simple) JVM implementation. And there&#39;s others of varying qualities and relevance like IKVM (Java for .NET), VMKit (a Java compiler atop LLVM), and so on. 
OpenJDK is certainly the big daddy, though, and its release as GPL guarantees we&#39;ll at least have a solid Java 6 implementation forever.&lt;br /&gt;&lt;br /&gt;Java is not an entirely open platform, what with the now-obvious encumbrances of patents and copyrights (not to mention draconian policies toward Java&#39;s various specifications, which are often very slow to evolve due to the JCP quagmire). That&#39;s not a great state of affairs, and if nothing else you have to recognize that folks at Sun at least tried to release the platform from its shackles by chasing OpenJDK. But the process of &quot;freeing&quot; Java has been pretty rocky; OpenJDK itself took years to gain acceptance from OSS purists, and the choice of the GPL has meant that folks afeared of the GPL&#39;s &quot;viral&quot; side still had to look for other options (which is a large part of why Apache Harmony was used as part of the basis for Android). Perhaps the biggest nail in the coffin is that Sun&#39;s Java test kit, the gold standard of whether an implementation is &quot;compliant&quot; or not, has never been released in open-source form, ultimately binding the hands of developers who wished to build a fully-compatible open-source Java.&lt;br /&gt;&lt;br /&gt;Java is not an entirely closed platform, either. OpenJDK was a huge step in the direction of Freeing Java, and the Java community in general has a very strong OSS ethos. There&#39;s no piece of Java software that isn&#39;t at least partially based on open-source components, and most Java library, framework, or application developers either initially or eventually open-source some or all of their works. Open-source development and the Java platform go hand-in-hand, and without that relationship the platform would not be where it is today. 
Contrast that to other popular environments like Microsoft&#39;s .NET – which has been admirably Freed through open standards, but which has not yet become synonymous with or popular for OSS development – or Apple&#39;s various platforms – which aren&#39;t based on open-standards *or* open-source, but which have managed to become many OSS developers&#39; environment of choice...for writing or consuming non-Apple open-source software. Among the corporation-controlled runtimes, the Java platform has more OSS in its blood than all others combined...many times more.&lt;br /&gt;&lt;br /&gt;Java is not perfect, but it&#39;s pretty darn good. Every platform has its warts. The Java platform represents a decade and a half of tradeoffs, and it&#39;s impossible in that amount of time to make everyone happy all the time. One of the big contentious items is the addition in Java 5 of parametric polymorphism as a compile-time trick without also adding VM-level support for reifying per-type specializations as .NET can do. But ask most Java developers if they&#39;d rather have nothing at all, and you&#39;ll get mixed responses. The sad, crippled version of generics in Java 5 doesn&#39;t do everything static-typing purists want, nor does it really extend to runtime at all (making reflective introspection almost impossible), but it does provide some nice surface-level sugar for Java developers. The same can be said of many Java &quot;features&quot; and tradeoffs. JavaEE became an abortively complicated jumble of mistakes (tradeoffs that went bad), but even upstarts that arguably made better decisions initially have themselves graduated into chaos (I believe the Spring framework has now grown even larger than the largest Java EE conglomerate, and Microsoft periodically reboots its blessed-framework-of-the-week, resulting in an even more disruptive environment than a slow-moving, bulky standard like JavaEE). Designing good software is hard. 
Designing good *big* software is exponentially harder. Designing good *big* software that pleases everyone is impossible.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:large;&quot;&gt;Why People Hate Java&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Java is second only to Microsoft&#39;s platforms for being both wildly successful and almost universally hated by the self-sure software elite. The reasons for this are manifold and complex.&lt;br /&gt;&lt;br /&gt;First of all, back in the 90s Java started getting shoved down everyone&#39;s throat. Developers were increasingly told to investigate this new platform, since their managers and long-disconnected tech leads kept hearing how great it was from Sun Microsystems, then a big deal in server applications and hardware. So developers that were happily using other environments (many of which exist to this day) often found themselves forced to suck it up and become Java developers. Making matters worse, Java itself was designed to be a fairly limited language...or at least limited in how easily a developer could paint themselves into a corner. Many features those reluctant developers had become used to in other environments were explicitly rejected for Java on the grounds that they added too much complexity, too much confusion, and too little value to trenches developers. So people that were happily doing Perl or C++ or Smalltalk or what have you were suddenly forced into a little J-shaped box and forced to write all those same applications upon Java and the JVM at a time when both were still poorly-suited to those domains. Those folks have had a white-hot hate for anything relating to Java ever since, and many will stop at nothing to see the entire platform ejected into space.&lt;br /&gt;&lt;br /&gt;Second, as mentioned quickly above, Java in the 90s was simply not that great a platform. 
It had most of the current warts (classpath issues, VM limitations, poor system-level integration, a very limited language) on top of the fact that it was slow (optimizing JVMs didn&#39;t come around until the 2000s), marketed for highly-visible, highly-fickle application domains like desktop and browser-based applications (everyone&#39;s cursed a Java app or applet at some point in their life), and still largely driven and controlled by a single company (at a time when many developers were trying to get out from under Microsoft&#39;s thumb). It wasn&#39;t until Java 1.2 that we started to get a large and diverse update to Java&#39;s core libraries. Java 1.3 was the first release to ship Hotspot, which started to get the performance monkey off our backs. Java 1.5 brought the first major changes to the Java language, all designed to aid developers in expressing what they meant in standard ways (like using type-safe enums instead of static final ints, or generics for compiler-level assurances of collection homogeneity). And Java 6, the last major version, made great strides in improving startup time, overall performance, and manageability of JVM processes. Java 7, should it ever ship, will bring new changes to the Java language like support for closures and other syntactic sugar, better system-level integration features as found in NIO.2, and the feather in the cap: VM-level support for function objects and non-standard invocation sequences via Method Handles and InvokeDynamic. But unless you&#39;ve been a Java developer for the past decade, all you remember is the roaring 90s and the pain Java caused you as a developer or a user.&lt;br /&gt;&lt;br /&gt;Third, the Java language and environment has stagnated. 
Given years of declining fortunes at Sun Microsystems, disagreement among JCP members about the direction the platform should go, and a year of uncertainty triggered by Sun&#39;s collapse and rescue at the hands of Oracle, it&#39;s surprising anything&#39;s managed to get done at all. Java 7 is now many years overdue; they were talking about it when I joined Sun in 2006, and hoped to have preview releases within a year. For both technical and political reasons, it&#39;s taken a long time to bring the platform to the next level, and as a result many of the truly excellent improvements have remained on the shelf (much to my dismay...we really could use them in JRuby). For fast-moving technology hipsters, that&#39;s as good as dying on the vine; you need to shift paradigms on a regular schedule or you&#39;re yesterday&#39;s news.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;&lt;span style=&quot;font-style:italic;&quot;&gt;Update:&lt;/span&gt;&lt;/span&gt; At least one commenter also pointed out that it took a long time for Java to be &quot;everywhere&quot;, and even today most users still need to download and install it at least once on any newly-installed OS. Notable exceptions include Mac OS X, which ships a cracker-jack Java 6 based on Hotspot, and some flavors of Linux that come with some sort of Java installed out of the box. But this was definitely a very real problem; developers were being pushed to write apps and applets in Java, and users were forced to download a multi-megabyte installer just to run them...at a time when downloading multi-megabyte software was often a very painful ordeal. That would put a bad taste in anyone&#39;s mouth.&lt;br /&gt;&lt;br /&gt;It&#39;s because of these and similar reasons that folks like Google finally said &quot;enough is enough,&quot; and opted to start doing their own things. 
On the JRuby project, we&#39;ve routinely hacked around the limitations of the JVM, be they related to its piss-poor process management APIs, its cumbersome support for binding native libraries, or its stubborn reluctance to become the world&#39;s greatest dynamic language VM. I&#39;ve thought on numerous occasions how awesome it would be to spin off a company that took OpenJDK and made it &quot;right&quot; for the kinds of development people want to do today (and I&#39;d love to be a part of that company), but such ventures are both expensive and light on profitability. Nobody pays for platforms or runtimes...they pay for services around those platforms or runtimes, services which are often anathema to the developers of those platforms and runtimes. So it required someone &quot;bigger&quot; to make that happen...someone who could write off the costs of the platform by funding it in creative new ways. Someone with a massive existing investment in Java. Someone with deep pockets and an army of the best developers in the business who love nothing more than a challenge. Someone like Google.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:large;&quot;&gt;Why Android?&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;(Note that a lot of this is based on what information I&#39;ve managed to glean from various conversations. Clarifications or corrections are welcome.)&lt;br /&gt;&lt;br /&gt;There&#39;s an incredibly successful mobile Java platform out there. One that boasts millions of devices from almost all the major manufacturers, in form factors ranging from crappy mid-00s clamshells to high-end smartphones. A platform with hundreds or thousands of games and applications and freely-available development tools. 
That platform is Java ME.&lt;br /&gt;&lt;br /&gt;Java ME started out as an effort to bring Java back to its original roots: as a language and environment for writing embedded applications. The baseline ME profiles are pretty bare; I did some CLDC development years ago and had to implement my own buffered streams and various data structures just to get by. Even the biggest profiles are still fairly restricted, and I don&#39;t believe any of them have ever graduated beyond Java 1.3-level featuresets. So Sun did a great job of getting Java ME on devices, back when people cared about Sun...and then they let mobile Java stagnate to a terrible degree while they spent all resources trying to get people to use Java EE and trying to get Java EE to suck less. So while resources were getting poured into EE, people started to form the same opinions of mobile Java they had formed about desktop and server Java years earlier.&lt;br /&gt;&lt;br /&gt;At the same time, Java ME was one of the few Java-related technologies that brought in money. You see, in order for handset manufacturers to ship (and boast about) Java ME support, they had to license the technology from Sun. It wasn&#39;t a huge cash cow, but it was a cow nonetheless. Java ME actually made money for Sun. So in true Sun form, they loused it up terribly.&lt;br /&gt;&lt;br /&gt;Fast forward to a few years ago. Google, recognizing that mobile devices finally were becoming the next great technology market, decided that leaving the mobile world in the hands of proprietary platforms was a bad idea. Java ME seemed like it could be an answer, but Sun was starting to get desperate for both revenue and relevance...and they&#39;d started to back a completely new horse-that-would-be-cow called JavaFX, which they hoped to pimp as the next great development environment for in-browser and on-device apps alike. 
They weren&#39;t interested in making Java ME be what Google wanted it to be.&lt;br /&gt;&lt;br /&gt;Google decided to take the hard route: they&#39;d fund development of a new platform, building it entirely from open-source components, and leveraging two of the best platform technologies available: Linux, for the kernel, and Java, for the runtime environment. However there was a problem with Java: it was encumbered by all sorts of patents and copyrights and specifications and restrictions. Hell, even OpenJDK itself, the most complete and competitive OSS implementation of Java, could not be customized and shipped in binary-only form by hardware manufacturers and service providers due to it being GPL. So the answer was to build a new VM, use unencumbered versions of the core Java class libraries, and basically remake the world in a new, copyright and patent-free image. Android was born.&lt;br /&gt;&lt;br /&gt;There&#39;s many parts to Android, several of which I&#39;m not really qualified to talk about. But the application environment that runs atop the Dalvik VM needs some explanation.&lt;br /&gt;&lt;br /&gt;First, there&#39;s the VM. Dalvik is *not* a JVM. It doesn&#39;t run JVM bytecode, and you can&#39;t ship JVM bytecode expecting it to work on Dalvik. You must recompile it to Dalvik&#39;s own bytecode using one of the provided translation tools. This is similar to how IKVM gets Java code to run on .NET: you&#39;re not actually running a JVM, you&#39;re transforming your code into a different form so it will run on someone else&#39;s VM. So it bears repeating, lest anyone get confused: Dalvik is not a JVM...it just plays one on TV.&lt;br /&gt;&lt;br /&gt;Second, there&#39;s the core Java class libraries. Android supports a rough (but large) subset of the Java 1.5 class libraries. 
That subset is large enough that projects as complicated as JRuby can basically run unmodified on Android, with very few restrictions (a notable one is the fact that since we can&#39;t generate JVM bytecode, we can&#39;t reoptimize Ruby code at runtime right now). In order to do this without licensing Sun&#39;s class libraries (as most other mainstream Java runtimes like JRockit and J9 do), Google opted to go with the not-quite-complete-but-pretty-close Apache Harmony class libraries, which had for years been developed independent of Sun or OpenJDK but never really tested against the Java compatibility kits (and there&#39;s a long and storied history behind this situation).&lt;br /&gt;&lt;br /&gt;So by building their own non-JVM VM and using translated versions of non-Sun, non-encumbered class libraries, Google hoped to avoid (or at least blunt) the possibility that their &quot;unofficial&quot;, &quot;unlicensed&quot; mobile Java platform might face a legal test. In short, they hoped to build the open mobile Java platform developers wanted without the legal and financial encumbrances of Java ME.&lt;br /&gt;&lt;br /&gt;At first, they seemed to be on a gravy train with biscuit wheels.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:large;&quot;&gt;Splitting Up the Pie&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Sun Microsystems was not amused. A little over a year ago, when several Sun developers started to take an eager interest in Android, we were all told to back off. It wasn&#39;t yet clear whether Android stood on solid legal ground, and Sun execs didn&#39;t want egg on their face if a bunch of their own employees turned out to be supporting a platform they&#39;d eventually have to attack. 
Furthermore, it was an embarrassment to see Android drawing in the same developers Sun really, really wanted to look at JavaFX or PersonalJava or whatever the latest attempt to bring developers back might be. Android actually *was* a great platform that supported existing Java developers and libraries incredibly well (without actually being a Java environment), and for the first time there was a serious contender to &quot;standard&quot; Java that Sun had absolutely no control over.&lt;br /&gt;&lt;br /&gt;To make matters worse, handset manufacturers started to sign on in droves to this new non-Java ME platform, which meant all that technology licensing revenue was reaching a dead end. Nobody (including me) wanted to do Java ME development anymore. Everyone wanted to do Android development.&lt;br /&gt;&lt;br /&gt;Now we must say one thing to Sun&#39;s credit: they didn&#39;t do what Oracle is now attempting to do. As James Gosling blogged recently, patent litigation just wasn&#39;t in Sun&#39;s blood...even if there might have been legal ground to file suit. So while we Sun employees were still quietly discouraged from looking at or talking about Android, the rest of the world took Sun&#39;s silence as carte blanche to stuff Android into everything from phones to TVs, and mobile app developers started to think there might be hope for a real competitor to Apple&#39;s iPhone. 
Things might have proceeded in this way indefinitely, with Android continuing to grab market share (it recently passed iPhone in raw numbers with no slowing in sight) and mindshare (Android is far more approachable than almost any other mobile development environment, especially if you&#39;re one of the millions of developers who know Java.)&lt;br /&gt;&lt;br /&gt;And then it all started to go wrong.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:large;&quot;&gt;The Mantle of Java Passes to Oracle&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;If for nothing else, Jonathan Schwartz will be remembered as the man who broke open the Sun piñata, simultaneously releasing more open-source software than any company in history and killing Sun in the process. Either Jonathan had no &quot;step 2&quot; or the inertia of a company built on closed-source products was too great to overcome. In either case, by spring of 2009 Sun was hemorrhaging. Many reports claim that Jonathan had started shopping Sun around to possible buyers as early as 2008, but it wasn&#39;t until 2009 that the first candidates started lining up. Initially, it was IBM, hoping to gobble up its former competitor along with the IP, patents, and copyrights they carried. That deal ultimately went south when Sun refused to consider any deal that IBM wouldn&#39;t promise to carry to completion, even in the face of regulatory roadblocks sure to come up. Many of us breathed a sigh of relief; if there&#39;s any Java company even more firmly stuck in the old world than Sun, it&#39;s IBM...and we weren&#39;t looking forward to dealing with that.&lt;br /&gt;&lt;br /&gt;Once that deal fell through, folks like me became resigned to the fact that Sun was nearing the end of its independent life. Years of platform negligence, management incompetence, and resting on laurels had dug a hole far too deep for anyone to climb out of. 
Would it be Cisco, who had recently started building up an interesting new portfolio of application server hardware and virtualization software? What about VMware, who had recently gobbled up SpringSource and seemed to be making all the right moves toward a large-scale virtualized &quot;everything cloud&quot;? Or perhaps Oracle, a long-time partner to Sun, whose software was either Java-based or widely deployed on Sun hardware and operating systems. Dear god, please don&#39;t let it be Oracle.&lt;br /&gt;&lt;br /&gt;Don&#39;t get me wrong...Oracle&#39;s a highly successful company. They&#39;ve managed to turn almost every acquisition into gold while coaxing profitability out of just about every one of their divisions. But Oracle&#39;s not a developer-oriented company (like Sun)...it&#39;s a profit-oriented company (unlike Sun, sadly), and you need to either feed the bottom line or feed others in the company that do. So when it turned out that Oracle would gobble up Sun, many of us OSS folks started to get a little nervous.&lt;br /&gt;&lt;br /&gt;You see, many of us at Sun had been actively trying to change the perception of the platform from that of a corporate, enterprisey, closed world to that of a great VM with a great OSS ecosystem and an open-source reference implementation. Folks like Jonathan believed that by freeing Java we&#39;d free the platform, and both the platform and the developer community would be better for it. We were half right...the OpenJDK genie is out of the bottle, and there&#39;s basically no way to put it back now (and for that, the world owes Sun a great debt). 
But only part of the platform was Freed...the patents and copyrights surrounding Hotspot and Java itself remained in place, carefully tucked away in the vault of a company that just didn&#39;t mount patent- or copyright-driven legal attacks.&lt;br /&gt;&lt;br /&gt;Oracle, now in control of those patents and copyrights, obviously has different plans.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:x-large;&quot;&gt;The Suit&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;So now, after spending 4000 words of your time, we come to the meat of the article: the actual Oracle v Google suit. The full text is available in various places online, though the &lt;a href=&quot;http://en.swpat.org/wiki/Oracle_v._Google_(2010,_USA)&quot;&gt;Software Patents Wiki&lt;/a&gt; probably has the best collection of related facts (though the wiki-driven discussions of the actual patents are woefully inaccurate).&lt;br /&gt;&lt;br /&gt;The suit largely comes down to a patent-infringement battle. Oracle claims that by developing and distributing Android, Google is in violation of seven patents. There&#39;s also an amorphous copyright claim without much backing information (&quot;Google probably stole something copyrighted so we&#39;ll list a bunch of stuff commonly stolen in that way&quot;), so we&#39;ll skip that one today.&lt;br /&gt;&lt;br /&gt;Before looking at the actual patents involved, I want to make one thing absolutely clear: Oracle has not already won this suit. Even after a couple of days of analysis, nobody has any idea whether they *can* win such a suit, given that Google seems to have taken great pains to avoid legal entanglements when designing Android. 
So everybody needs to take a deep breath and let things progress as they should, and either trust that things will go in the right direction or start doing your damnedest to make sure they do.&lt;br /&gt;&lt;br /&gt;With that said, let&#39;s take a peek at the patents, one by one. And as always, the &quot;facts&quot; here are based on my reading of the patents and my understanding of the related systems.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;&lt;a href=&quot;http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&amp;amp;Sect2=HITOFF&amp;amp;p=1&amp;amp;u=/netahtml/PTO/search-bool.html&amp;amp;r=1&amp;amp;f=G&amp;amp;l=50&amp;amp;co1=AND&amp;amp;d=PTXT&amp;amp;s1=6,125,447.PN.&amp;amp;OS=PN/6,125,447&amp;amp;RS=PN/6,125,447&quot;&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:large;&quot;&gt;Protection Domains To Provide Security In A Computer System (6,125,447)&lt;/span&gt;&lt;/a&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:large;&quot;&gt; and &lt;/span&gt;&lt;a href=&quot;http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&amp;amp;Sect2=HITOFF&amp;amp;p=1&amp;amp;u=/netahtml/PTO/search-bool.html&amp;amp;r=1&amp;amp;f=G&amp;amp;l=50&amp;amp;co1=AND&amp;amp;d=PTXT&amp;amp;s1=6,192,476.PN.&amp;amp;OS=PN/6,192,476&amp;amp;RS=PN/6,192,476&quot;&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:large;&quot;&gt;Controlling Access To A Resource (6,192,476)&lt;/span&gt;&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The first two patents describe the Java Security Policy system, for controlling access to resources. One of the least-interesting but most-important aspects of the Java platform is its approach to security. Code under a specific classloader or thread can be forced to comply with a specific security policy by installing a security manager. 
The permissions granted by that policy control just about every aspect of the code&#39;s interaction with the JVM and with the host operating system: loading new code, reflectively accessing existing classes, accessing system-level resources like devices and filesystems, and so on. It&#39;s even easy for you to build up security policies of your own by checking for custom-named permissions and only granting them when appropriate. It&#39;s a pretty good system, and one of the reasons Java has a much stronger security track record than other runtimes that weren&#39;t designed with pervasive security in mind from the beginning.&lt;br /&gt;&lt;br /&gt;In order to host applications written for the Java platform, and to sandbox them in a compatible way, Android necessarily had to support the same security mechanisms. The problem here is the same problem that plagues many patents: what boils down to a fairly simple and obvious way to solve a problem (associate pieces of code with sets of permissions, don&#39;t let that code do anything outside those permissions) becomes so far-reaching that almost any reasonable *implementation* of that idea would violate these patents. 
In this case the &#39;447 and &#39;476 patents do describe mechanisms for implementing Java security policies, but even that simple implementation is very vague and would be hard to avoid with even a clean-room implementation.&lt;br /&gt;&lt;br /&gt;Now I do not know exactly how Android implements security policies, but it&#39;s probably pretty close to what&#39;s described in these patents...since just about every implementation of security policies would be pretty close to what&#39;s described.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;&lt;a href=&quot;http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&amp;amp;Sect2=HITOFF&amp;amp;p=1&amp;amp;u=/netahtml/PTO/search-bool.html&amp;amp;r=1&amp;amp;f=G&amp;amp;l=50&amp;amp;co1=AND&amp;amp;d=PTXT&amp;amp;s1=5,966,702.PN.&amp;amp;OS=PN/5,966,702&amp;amp;RS=PN/5,966,702&quot;&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:large;&quot;&gt;Method And Apparatus For Preprocessing And Packaging Class Files (5,966,702)&lt;/span&gt;&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This is basically the patent governing the &quot;Pack200&quot; compression format provided as part of the JDK and used to better-compress class file archives.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;&lt;span style=&quot;font-style:italic;&quot;&gt;Update:&lt;/span&gt;&lt;/span&gt; Alex Blewitt has posted a &lt;a href=&quot;http://alblue.bandlem.com/2010/08/more-details-on-5966702-and-pack200.html&quot;&gt;discussion of the Pack200 specification&lt;/a&gt;. He says this patent isn&#39;t nearly as comprehensive, but that it may touch upon how Pack200 works. His post is a more complete treatment of the details of the class file format and how Pack200 improves compression ratios for class archives. 
It also occurs to me now that this patent could be related to mobile/embedded Java too, where better compression would obviously yield enormous savings.&lt;br /&gt;&lt;br /&gt;Java class files are filled with redundant data. For example, every class that contains code that calls PrintStream.println (as in System.out.println) contains the same &quot;constant pool&quot; entry identifying that method by name, a la &quot;java/io/PrintStream.println:(Ljava/lang/String;)V&quot;. Every field lookup, class reference, literal string, or method invocation will have some sort of entry in the constant pool. Pack200 takes advantage of this fact by compressing all class files as a single unit, batching duplicate data into one place so that the actual unique class data boils down to just the unique class, method, and code structure.&lt;br /&gt;&lt;br /&gt;The reason for having a separate compression format is that &quot;zip&quot; files, which include Java&#39;s &quot;jar&quot; files, are notoriously bad at compressing many small files with redundant data. Because one of the features of the &quot;zip&quot; format is that you can easily pull a single file out, each file is compressed independently; there are no interdependencies between files and no global table of shared data, so redundancy across files is never eliminated. This is a large part of why compression formats like &quot;tar.gz&quot; do a better job of compressing many small files: tar turns many files into one file, and gzip or bzip2 compress that one large file as a single unit (conversely, this is why you can&#39;t easily get a single file out of a tarball).&lt;br /&gt;&lt;br /&gt;On Android, this is accomplished in a similar way by the &quot;dex&quot; tool, which in the process of translating JVM bytecode into Dalvik bytecode also consolidates all duplicate class data in a single place. 
The general technique is standard data compression theory, so presumably the novelty lies in applying decades-old compression theory specifically to Java classfile structure.&lt;br /&gt;&lt;br /&gt;If I&#39;ve lost you at this point, we can summarize it this way: part of Oracle&#39;s suit lies in a patent for a better compression mechanism for archives containing many class files that takes advantage of redundant data in those files.&lt;br /&gt;&lt;br /&gt;Are you laughing yet?&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;&lt;a href=&quot;http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&amp;amp;Sect2=HITOFF&amp;amp;p=1&amp;amp;u=/netahtml/PTO/search-bool.html&amp;amp;r=1&amp;amp;f=G&amp;amp;l=50&amp;amp;co1=AND&amp;amp;d=PTXT&amp;amp;s1=7,426,720.PN.&amp;amp;OS=PN/7,426,720&amp;amp;RS=PN/7,426,720&quot;&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:large;&quot;&gt;System And Method For Dynamic Preloading Of Classes Through Memory Space Cloning Of A Master Runtime System Process (7,426,720)&lt;/span&gt;&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I&#39;m not sure this patent ever saw the light of day in a mainstream JVM implementation. It describes a mechanism by which a master parent process could pre-load and pre-initialize code for a managed system, and then new processes that need to boot quickly would basically be memory-copied (plus copy-on-write friendly) &quot;forks&quot; of that master process, with the master maintaining overall control of those child processes through some sort of IPC.&lt;br /&gt;&lt;br /&gt;Ignore for the moment the obvious prior art of &quot;fork&quot; itself as applied to pre-initializing application state for many children. 
Anyone who&#39;s ever used fork to initialize a heavy process or runtime to avoid the cost of reinitializing children has either violated this patent (if done since 2003) or has a compelling case for prior art (if done before 2003).&lt;br /&gt;&lt;br /&gt;It&#39;s likely that this patent was formulated as an answer to the poor semantics of running many applications under the same JVM. Java servlets and later Java EE made it possible to consider deploying all of your company&#39;s applications in a single process, isolated by classloaders and security policies. What they never really addressed was the fact that code isn&#39;t the only thing you&#39;re sharing in this model; you&#39;re also sharing memory space, CPU time, and process resources like file descriptors. No amount of Java classloader or security trickery could make this a seamless multiapp environment, and so work like this patent hoped to find a lightweight way for all those child applications to actually live as their own processes.&lt;br /&gt;&lt;br /&gt;On Android, this manifests in the fact that each application runs as its own process, and each of those processes (as on most operating systems) is forked from either the kernel process or some master process.&lt;br /&gt;&lt;br /&gt;In this case, Oracle&#39;s banking on being able to litigate with a patent for a very common application of &quot;fork&quot;.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;&lt;a href=&quot;http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&amp;amp;Sect2=HITOFF&amp;amp;p=1&amp;amp;u=/netahtml/PTO/search-bool.html&amp;amp;r=1&amp;amp;f=G&amp;amp;l=50&amp;amp;co1=AND&amp;amp;d=PTXT&amp;amp;s1=RE38,104.PN.&amp;amp;OS=PN/RE38,104&amp;amp;RS=PN/RE38,104&quot;&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:large;&quot;&gt;Method And Apparatus For Resolving Data References In Generated Code (RE38,104)&lt;/span&gt;&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This patent, invented by James Gosling himself, 
basically describes a mechanism by which symbolic data references in code (e.g. Java field references) can be resolved dynamically at runtime into actual direct memory accesses, eliminating the symbolic lookup overhead. It&#39;s part of standard JIT optimization techniques, and there are a lot of references in this patent to many great JIT patents and papers of the past.&lt;br /&gt;&lt;br /&gt;Here there may actually be merit, or as much merit as can be found in a software patent to begin with. The patent itself is tiny, as most of these patents are. The techniques seem obvious to me, but perhaps they&#39;re obvious because this patent helped make them standard. I&#39;m not qualified to judge. What I can say is that I can&#39;t imagine a VM in existence that doesn&#39;t violate the spirit – if not the letter – of this patent as well. All systems with symbolic references will seek to eliminate the symbolic references in favor of direct access. The novelty of this patent may be in doing that translation on the fly...not even at a decidedly coarse-grained per-method level, but by rewriting code while the method is actually executing.&lt;br /&gt;&lt;br /&gt;I would guess that this is a patent filed during the development of Java&#39;s earlier JIT technologies, before systems like Hotspot came along to do a much better large-scale, cross-method job of optimization. 
It doesn&#39;t seem like it would be hard to debunk the novelty of the patent, or at least show prior art that makes it irrelevant.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;&lt;span style=&quot;font-style:italic;&quot;&gt;Update:&lt;/span&gt;&lt;/span&gt; I actually found a reference in the article &lt;a href=&quot;http://www.netmite.com/android/mydroid/dalvik/docs/dexopt.html&quot;&gt;Dalvik Optimization and Verification with dexopt&lt;/a&gt; to the technique described here (about 3/4 down the page, under &quot;Optimization&quot;):&lt;br /&gt;&lt;br /&gt;&quot;The Dalvik optimizer does the following: ... &lt;span style=&quot;font-weight:bold;&quot;&gt;For instance field get/put, replace the field index with a byte offset.&lt;/span&gt; ...&quot;&lt;br /&gt;&lt;br /&gt;But Dalvik still does this only once, before running the code (actually, at install time), not *while* running the code as described in the patent.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;&lt;a href=&quot;http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&amp;amp;Sect2=HITOFF&amp;amp;p=1&amp;amp;u=/netahtml/PTO/search-bool.html&amp;amp;r=1&amp;amp;f=G&amp;amp;l=50&amp;amp;co1=AND&amp;amp;d=PTXT&amp;amp;s1=6,910,205.PN.&amp;amp;OS=PN/6,910,205&amp;amp;RS=PN/6,910,205&quot;&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:large;&quot;&gt;Interpreting Functions Utilizing A Hybrid Of Virtual And Native Machine Instructions (6,910,205)&lt;/span&gt;&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This patent, invented by Lars Bak of V8 fame, describes a mechanism for building a &quot;mixed mode&quot; VM that can execute interpreted code and compiled (presumably optimized) code in the same VM process, and flip between the two over time to produce better-optimized compiled code. 
This describes the basic underpinnings of VMs like Hotspot, which alternate between interpreting virtual machine code and executing real machine code even within the same thread of execution (and sometimes, even branching from virtual code to real code and back within the same method body). Any other VMs that are mixed mode would probably violate this patent, so its impact could reach much farther than Android. (In a sense, even JRuby might violate this patent, though our two mixed modes are both virtual instruction sets.)&lt;br /&gt;&lt;br /&gt;Now you might think the other mainstream JVMs would violate this patent, but they don&#39;t. Neither JRockit nor J9 has an interpreter; they both go immediately to native code with various tiers of instrumentation to do the runtime profile data gathering. They iteratively regenerate native code with successively more and better optimizations. Lars&#39;s most recent VM, the V8 JavaScript VM at the heart of Chrome, also goes straight to native code.&lt;br /&gt;&lt;br /&gt;Now here&#39;s where it gets weird: Up until Froyo (Android 2.2), Dalvik did a once-only compilation to native code before anything started executing, which means by definition that it was not mixed-mode. And even in Froyo, I believe it still does its initial execution in native code form with instrumentation to allow subsequent compiles to do a better job. Dalvik does not have an interpreter; Dalvik does not interpret Dalvik bytecode.&lt;br /&gt;&lt;br /&gt;Perhaps someone can explain how this patent even applies to Dalvik or Android?&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight: bold; &quot;&gt;&lt;span style=&quot;font-style: italic; &quot;&gt;Update:&lt;/span&gt;&lt;/span&gt; A couple of commenters correct me here: Dalvik actually was 100% interpreted before Froyo, and is now a standard mixed-mode environment post-Froyo. 
So if this suit had been filed a year ago, this patent might not have been applicable, but it probably is now.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;&lt;a href=&quot;http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&amp;amp;Sect2=HITOFF&amp;amp;p=1&amp;amp;u=/netahtml/PTO/search-bool.html&amp;amp;r=1&amp;amp;f=G&amp;amp;l=50&amp;amp;co1=AND&amp;amp;d=PTXT&amp;amp;s1=6,061,520.PN.&amp;amp;OS=PN/6,061,520&amp;amp;RS=PN/6,061,520&quot;&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:large;&quot;&gt;Method And System for Performing Static Initialization (6,061,520)&lt;/span&gt;&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Sigh. This patent appears to revolve completely around a mechanism by which the static initialization of arrays could be &quot;play executed&quot; in a preloader and then rewritten to do static initialization in one shot, or at least more efficiently than running dozens of class initializers that just construct arrays and populate them. Of all the patents, this is probably the narrowest, and the mechanisms described are again not very unusual, but there&#39;s a good chance that the &quot;dex&quot; tool does something along these lines to tidy up static initializers in Android applications.&lt;br /&gt;&lt;br /&gt;Given the &quot;preloader&quot; aspect of this patent, I&#39;d surmise that it was formulated in part to simplify static initialization of code on embedded devices or in applet environments (because on servers...the boot time of static initialization is probably of little concern). Because of the much more limited nature of embedded environments (especially in 1998, when this patent was filed) it would be very beneficial to turn programmatic data initialization into a simple copy operation or a specialized virtual machine instruction. 
And this may be why it could apply to Android; it&#39;s another sort of embedded Java, with a preloader (either dex or the dexopt tool that optimizes and verifies your app on the device) and resource limitations that would warrant optimizing static initialization.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:large;&quot;&gt;So, Does the Suit Have Merit?&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I&#39;ll reiterate that I&#39;m not a lawyer. I&#39;m just a Java developer with a logical mind and a penchant for debunking myths about the Java platform.&lt;br /&gt;&lt;br /&gt;The collection of patents specified by the suit seems pretty laughable to me. If I were Google, I wouldn&#39;t be particularly worried about showing prior art for the patents in question or demonstrating how Android/Dalvik don&#39;t actually violate them. Some, like the &quot;mixed mode&quot; patent, don&#39;t actually seem to apply at all. It feels very much like a bunch of Sun engineers got together in a room with a bunch of lawyers and started digging for patents that Google might have violated without actually knowing much about Android or Dalvik to begin with.&lt;br /&gt;&lt;br /&gt;But does the suit have merit? It depends on whether you consider baseless or over-general patents to have merit. The most substantial patent listed here is the &quot;mixed mode&quot; patent, and unless I&#39;m wrong that one doesn&#39;t apply. The others are all variations on prior art, usually specialized for a Java runtime environment (and therefore with some question as to whether they can apply to a non-Java runtime environment that happens to have a translator from Java code). Having read through the suit and scanned the patents, I have to say I&#39;m not particularly worried. 
But then again, I don&#39;t know what sort of magic David Boies and company might be able to pull off.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:x-large;&quot;&gt;What Might Happen?&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;In the unlikely event of a total victory by Oracle, there are probably a lot of possible outcomes. I don&#39;t see the &quot;death of Java&quot; among them. There&#39;s also the possibility that Google could win a convincing victory. What might happen in each case?&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:large;&quot;&gt;The Nuclear Option&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The worst-case scenario would be that Android is completely destroyed, all Android handsets are confiscated by the Oracle mafia and burned in the city square, and all hope for a Free and Open Java is forever laid to rest. Long live Mono.&lt;br /&gt;&lt;br /&gt;To understand why this won&#39;t happen we need to explore Oracle&#39;s possible motives.&lt;br /&gt;&lt;br /&gt;As I mentioned above, Java ME actually did bring licensing revenue to Sun. There are a lot of handset manufacturers, millions of handsets, and every one put a couple cents (or a couple bucks?) in Sun&#39;s pocket. In the heady high times of Java ME, it was the only managed mobile runtime in town, with sound and graphics and standard UI elements. It wasn&#39;t always pretty, but it worked well and it was really easy to write for.&lt;br /&gt;&lt;br /&gt;Now with Android rapidly becoming the preferred mobile and embedded Java, it&#39;s become apparent that there&#39;s no future for Java ME - or at least no future in the expanding &quot;smart&quot; consumer electronics business. 
Java ME lives on in BlackBerry phones, in some other low-end phones, in most Blu-ray devices (BD-J is a standard for writing Java apps that run on Blu-ray systems, utilizing one of the richer class libraries available for Java ME), and in some sub-micro devices like Ajile&#39;s AJ-200 Java-based multimedia CPU. If you want Java on a phone or in your TV, Android is taking that world by storm. That means Java ME licensing revenue is rapidly drying up.&lt;br /&gt;&lt;br /&gt;So why wouldn&#39;t Oracle want to take a bite of the rapidly growing Android pie? Would they turn down a portion of that revenue and instead completely destroy a very popular and successful mobile Java, or would they just strongarm a few bucks out of Google and Android handset manufacturers? Remember we&#39;re talking about a profit-driven company here. Java ME is never going to come back to smartphones; that much is certain, and I don&#39;t think even Oracle could argue otherwise. There&#39;s no profit in filing this suit just to kill Android, since that would simply mean competing mobile platforms like Windows Phone, RIM, Symbian, or iOS would cannibalize their younger brother. Instead of getting a slice of the fastest-growing segment of Java developers, you&#39;d kill off the entire segment and force those developers to non-Java, non-Oracle-friendly platforms.&lt;br /&gt;&lt;br /&gt;Oracle may be big and evil, but they&#39;re not stupid.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:large;&quot;&gt;Google Licensing Deal&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;A more likely outcome might be that Google would be forced to license the patents or pay royalties on Android revenue. I honestly believe this is the goal of this lawsuit; Oracle wants to get their foot in the door of the smartphone world, and they know they can&#39;t innovate enough to make up for the collapse of Java ME. 
So they&#39;re hoping that by sabre-rattling a few patents, Google will be forced (or scared) into sharing the harvest.&lt;br /&gt;&lt;br /&gt;Given the contents of the suit and the patents, I think this one is pretty unlikely too. Much of Android&#39;s and Dalvik&#39;s design is specifically crafted to avoid Java entanglements, and I think it&#39;s unlikely, if this suit goes to trial, that Oracle&#39;s lawyers would be able to make a convincing argument that the patents were both novel and violated by Google. But let&#39;s not put anything past either the lawyers or the US federal court system, eh?&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:large;&quot;&gt;Nothing At All&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;There&#39;s a good chance that either Oracle or the court will realize quickly that the case has no merit, and the whole thing will be dropped. I&#39;m obviously hoping for this one, but it&#39;s likely to take the longest of all. First, the court would need to gather all facts in the case, which could take months (especially given the highly technical nature of some of the complaints). Then there are the rebuttals of those facts, the sorting of the wheat from the chaff, the decision that there&#39;s not enough there to proceed, and finally either Oracle backs out or the court tosses the case. In the latter case, there&#39;s the possibility of appeals, and things could start to get very expensive.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:large;&quot;&gt;Total Collapse of Software Patents&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This is probably one of the highest-profile cases involving software patents in recent years. Another would be Apple&#39;s recent suit against HTC for design elements of the iPhone. 
Several other bloggers and analysts have called out the possibility that this could lead to the death of software patents in general. I think that&#39;s a bit optimistic, but both Google *and* Oracle have come down officially against patents in the past (though perhaps Oracle&#39;s had a change of heart since acquiring Sun&#39;s portfolio).&lt;br /&gt;&lt;br /&gt;As much as I&#39;d like to see it happen, software patents probably won&#39;t be dead in the next year or two. But this might be a nail in the coffin.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:x-large;&quot;&gt;What Does This Mean for Java?&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Now we come to the biggest question of all: how does this suit affect the Java world, regardless of outcome?&lt;br /&gt;&lt;br /&gt;Well, it&#39;s obviously not great to have two Java heavyweights bickering like schoolchildren, and it would be positively devastating if Android were obliterated because of this. But I think the real damage will be in how the developer community perceives Java, rather than in any lasting impact on the platform itself.&lt;br /&gt;&lt;br /&gt;Let&#39;s return to some of our facts. First off, nothing in this suit would apply to any of the three mainstream JVMs that 99% of the world&#39;s Java runs on. Hotspot and JRockit are both owned by Oracle, and J9 is subject to the Java specification&#39;s patent grant for compliant implementations. The lesson here is that Android is the first Java-like environment since Microsoft&#39;s J++ to attempt to unilaterally subset or superset the platform (with the difference in Android&#39;s case being that it doesn&#39;t claim to be a Java environment, and it may not actually need the patent grant). Other Java implementations that &quot;follow the Rules&quot; are in the clear, and so 99% of the world&#39;s use of Java is in the clear. 
Sorry, Java haters...this isn&#39;t your moment.&lt;br /&gt;&lt;br /&gt;This certainly does some damage to the notion of open-source Java implementations, but only those that are not (or cannot be) compliant with the specification. As the Apache Harmony folks know all too well, it&#39;s really hard to build a clean-room implementation of Java and expect to get the &quot;spec compliance patent grant&quot; if you don&#39;t actually have the tools necessary to show spec compliance. Tossing the code over to Sun to run compliance testing is a nonstarter; the actual test kit is enormous and requires a huge time investment to set up and run (and Sun/Oracle have better things to do with their time than help out a competing OSS Java implementation). If the test kit had been open-sourced before Sun foundered, there would be no problem; everyone that wanted to make an open-source Java would just aim for 100% compliance with the spec and all would be well. As it stands, independently implemented (i.e. non-OpenJDK) open-source Java is a really hard thing to create, especially if you have to clean-room implement all the class libraries yourself. Android has neatly dodged this issue by simply being what it is: a subset of a Java-like platform that doesn&#39;t actually run Java bytecode and doesn&#39;t use any code from OpenJDK.&lt;br /&gt;&lt;br /&gt;How will it affect Android if this case drags on? It could certainly hurt Android&#39;s adoption by hardware manufacturers, but they&#39;re already getting such an outstanding deal on the platform that they might not even care. Android is the first platform that has the potential to unify all hardware profiles, freeing manufacturers from the drudgery of building their own OSes or licensing OSes from someone else. Hell, HTC rose from zero to Hero largely because of their backing of Android and shipping of Android devices. 
Are they going to back off from that platform now just because Oracle&#39;s throwing lawyerbombs at Google? Probably not.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:x-large;&quot;&gt;What Does This Mean For You?&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;If you&#39;re a non-Android Java developer...don&#39;t lose sleep over this. The details are going to take months to play out, and regardless of the outcome you&#39;re probably not going to be affected. Be happy, do great things, and keep making the Java platform a better place.&lt;br /&gt;&lt;br /&gt;If you&#39;re an Android developer...don&#39;t lose sleep over this. Even if things go the way of the &quot;Nuclear Option&quot;, you&#39;ve still got a lot of time to build and sell apps and improve yourself as a developer. For a bit of novelty, start considering what a migration path might look like and turn that into a nice Android-agnostic application layer, something that&#39;s largely lacking in the current Android APIs. Or explore Android development in languages like JRuby, which are based on off-platform ecosystems that will survive regardless of Android&#39;s fate. Whatever you do, don&#39;t panic and run for the hills, and don&#39;t tell your friends to panic.&lt;br /&gt;&lt;br /&gt;If you&#39;re mad as hell about this...I sympathize. I&#39;m personally going to do whatever I can to keep people informed and keep pushing Android, including but not limited to writing 8000-word essays with my moderately-educated analysis of the &quot;facts&quot;. 
I welcome your help in that fight, and I think it&#39;s a damn good time for people who want an open Java and an open mobile platform to show their quality by standing up and letting the world know we&#39;re here.&lt;br /&gt;&lt;br /&gt;&quot;All that is necessary for the triumph of evil is for good men to do nothing.&quot;&lt;br /&gt;&lt;br /&gt;Do something, and we&#39;ll get through this together.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;&lt;span class=&quot;Apple-style-span&quot;  style=&quot;font-size:x-large;&quot;&gt;Footnote: Java Copyrights&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I&#39;d love for someone versed in copyright law to provide a brief analysis of how the Java copyrights described (vaguely) in the lawsuit might play out in Android. Java is certainly not ignored as a concept in Android docs, tools, and libraries, but it&#39;s unclear to me whether those copyrights amount to something enforceable when it comes to Android or Dalvik.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;&lt;span style=&quot;font-style:italic;&quot;&gt;Update:&lt;/span&gt;&lt;/span&gt; &quot;Crazy&quot; Bob Lee emailed me to clear up a few facts. First off, Android and OpenJDK first came out at roughly the same time, so there was never really time to consider using OpenJDK&#39;s GPL&#39;ed class libraries in Android. Bob also claims that Dalvik&#39;s design decisions were all technical and not made to circumvent IP, but it seems impossible to me that IP, patent, and licensing issues didn&#39;t have *some* influence on those decisions. He goes on to say that Android relies on process separation to sandbox applications, rather than leveraging Java security policies or similar mechanisms (which Bob insists are badly designed anyway, and I might agree). Finally, he believes that in the worst-case scenario, Dalvik would probably only require minor modifications to address the complaints in this suit. 
The &quot;nuclear option&quot; is, according to Bob, out of the realm of possibility.&lt;br /&gt;&lt;br /&gt;Thanks for the clarifications, Bob!</content><link rel='replies' type='application/atom+xml' href='http://blog.headius.com/feeds/4344974410425965514/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://blog.headius.com/2010/08/my-thoughts-on-oracle-v-google.html#comment-form' title='74 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/4344974410425965514'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/4344974410425965514'/><link rel='alternate' type='text/html' href='http://blog.headius.com/2010/08/my-thoughts-on-oracle-v-google.html' title='My Thoughts on Oracle v Google'/><author><name>Charles Nutter</name><uri>https://plus.google.com/101599370339210456684</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh5.googleusercontent.com/-VT5b8KsgHgQ/AAAAAAAAAAI/AAAAAAAAFB0/9d5SU9AcHNQ/s512-c/photo.jpg'/></author><thr:total>74</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4704664917418794835.post-5562710616240372388</id><published>2010-07-19T14:16:00.000-07:00</published><updated>2011-01-25T21:44:30.319-08:00</updated><title type='text'>What JRuby C Extension Support Means to You</title><content type='html'>As part of the Ruby Summer of Code, &lt;a href=&quot;http://twitter.com/timfelgentreff&quot;&gt;Tim Felgentreff&lt;/a&gt; has been building out C extension support for JRuby. He&#39;s already made great progress, with simple libraries like Thin and Mongrel working now and larger libraries like RMagick and Yajl starting to function. And we haven&#39;t even reached the mid-term evaluation yet. 
I&#39;d say he gets an &quot;A&quot; so far.&lt;br /&gt;&lt;br /&gt;I figured it was time I talked a bit about C extensions, what they mean (or don&#39;t mean) for JRuby, and how you can help.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;The Promise of C Extensions&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;One of the &quot;last mile&quot; features keeping people from migrating to JRuby has been their dependence on C extensions that only work on regular Ruby. In some cases, these extensions have been written to improve performance, like the various json libraries. Some of that performance could be less of a concern under Ruby 1.9, but it&#39;s hard to claim that any implementation will be able to run Ruby as fast as C for general-purpose libraries any time soon.&lt;br /&gt;&lt;br /&gt;However, a large number of extensions – perhaps a majority of extensions – exist only to wrap a well-known and well-trusted C library. Nokogiri, for example, wraps the excellent libxml. RMagick wraps ImageMagick. For these cases, there&#39;s no alternative on regular Ruby...it&#39;s the C library or nothing (or in the case of Nokogiri, your alternatives are only slow and buggy pure-Ruby XML libraries).&lt;br /&gt;&lt;br /&gt;For the performance case, C extensions on JRuby don&#39;t mean a whole lot. In most cases, it would be easier and just as performant to write that code in Java, and many pure-Ruby libraries perform well enough to reduce the need for native code. In addition, there are often libraries that already do what the perf-driven extensions were written for, and it&#39;s trivial to just call those libraries directly from Ruby code.&lt;br /&gt;&lt;br /&gt;But the library case is a bit stickier. Nokogiri does have an FFI version, but it&#39;s a maintenance headache for them and a bug report headache for us, due to the lack of a C compiler tying the two halves together. 
There&#39;s a pure-Java Nokogiri in progress, but both building the Ruby bindings and emulating libxml behavior take a long time to get right. For libraries like RMagick or the native MySQL and SQLite drivers, there are basically no options on the JVM. The Google Summer of Code project RMagick4J, by Sergio Arbeo, was a monumental effort that still has a lot of work left to be done. JDBC libraries work for databases, but they provide a very different interface from the native drivers and don&#39;t support things like UNIX domain sockets.&lt;br /&gt;&lt;br /&gt;There&#39;s a very good chance that JRuby C extension support won&#39;t perform as well as C extensions on C Ruby, but in many cases that won&#39;t matter. Where there&#39;s no equivalent library now, having something that&#39;s only 5-10x slower to call – but still runs fast and matches the API – may be just fine. Think about the coarse-grained operations you feed to MySQL or SQLite, and you&#39;ll get the picture.&lt;br /&gt;&lt;br /&gt;So ultimately, I think C extensions will be a good thing for JRuby, even if they only serve as a stopgap measure to help people migrate small applications over to native Java equivalents. Why should the end goal be native Java equivalents, you ask?&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;The Peril of C Extensions&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Now that we&#39;re done with the happy, glowing discussion of how great C extension support will be, I can make a confession: I hate C extensions. No feature of C Ruby has done more to hold it back than the desire for backward compatibility with C extensions. Because they have direct pointer access, there&#39;s no easy way to build a better garbage collector or to support multiple runtimes in the same VM, even though various research efforts have tried. 
I&#39;ve talked with Koichi Sasada, the creator of Ruby 1.9&#39;s &quot;YARV&quot; VM, and there are many things he would have liked to do with YARV that he couldn&#39;t because of C extension backward compatibility.&lt;br /&gt;&lt;br /&gt;For JRuby, supporting C extensions will limit many features that make JRuby compelling in the first place. For example, because C extensions often use a lot of global variables, you can&#39;t use them from multiple JRuby runtimes in the same process. Because they expect a Ruby-like threading model, we need to restrict concurrency when calling out from Java to C. And all the great memory tooling I&#39;ve blogged about recently won&#39;t see C extensions or the libraries they call, so they introduce an unknown.&lt;br /&gt;&lt;br /&gt;All that said, I think it&#39;s a good milestone to show that we can support C extensions, and it may make for a &quot;better JNI&quot; for people who really just want to write C or who simply need to wrap a native library.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;How You Can Help&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;There are a few things I think users like you can help with.&lt;br /&gt;&lt;br /&gt;First off, we&#39;d love to know what extensions you are using today, so we can explore what it would take to run them under JRuby (and so we can start exploring pure-Java alternatives, too). Post your list in the comments, and we&#39;ll see what we can come up with.&lt;br /&gt;&lt;br /&gt;Second, anyone who knows C and the Ruby C API (like folks who work on extensions) could help us fill out bits and pieces that are missing. Set up the JRuby cext branch (I&#39;ll show you how in a moment), and try to get your extensions to build and load. 
Tim has already done the heavy lifting of making &quot;gem install xyz&quot; attempt to build the extension and &quot;require &#39;xyz&#39;&quot; try to load the resulting native library, so you can follow the usual processes (including extconf.rb/mkmf.rb for non-gem building and testing). If it doesn&#39;t build, help us figure out what&#39;s missing or incorrect. If it builds but doesn&#39;t run, help us figure out what it&#39;s doing incorrectly.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;Building JRuby with C Extension Support&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Like building JRuby proper, building the cext work is probably the easiest thing you&#39;ll do all day (assuming the C compiler/build/toolchain doesn&#39;t bite you).&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Check out (or fork and check out) the JRuby repository from &lt;a href=&quot;http://github.com/jruby/jruby&quot;&gt;http://github.com/jruby/jruby&lt;/a&gt;:&lt;br /&gt;&lt;pre&gt;git clone git://github.com/jruby/jruby.git&lt;/pre&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Switch to the &quot;cext&quot; branch:&lt;br /&gt;&lt;pre&gt;git checkout -b cext origin/cext&lt;/pre&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Do a clean build of JRuby plus the cext subsystem:&lt;br /&gt;&lt;pre&gt;ant clean build-jruby-cext-native&lt;/pre&gt;&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;At this point you should have a JRuby build (run with bin/jruby) that can gem install and load native extensions.</content><link rel='replies' type='application/atom+xml' href='http://blog.headius.com/feeds/5562710616240372388/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://blog.headius.com/2010/07/what-jruby-c-extension-support-means-to.html#comment-form' title='14 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/5562710616240372388'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/4704664917418794835/posts/default/5562710616240372388'/><link rel='alternate' type='text/html' href='http://blog.headius.com/2010/07/what-jruby-c-extension-support-means-to.html' title='What JRuby C Extension Support Means to You'/><author><name>Charles Nutter</name><uri>https://plus.google.com/101599370339210456684</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh5.googleusercontent.com/-VT5b8KsgHgQ/AAAAAAAAAAI/AAAAAAAAFB0/9d5SU9AcHNQ/s512-c/photo.jpg'/></author><thr:total>14</thr:total></entry><entry><id>tag:blogger.com,1999:blog-4704664917418794835.post-6351925351093074122</id><published>2010-07-17T14:58:00.000-07:00</published><updated>2011-01-25T21:44:30.373-08:00</updated><title type='text'>Browsing Memory with Ruby and Java Debug Interface</title><content type='html'>This is the third post in a series. The first two were on &lt;a href=&quot;http://blog.headius.com/2010/07/browsing-memory-jruby-way.html&quot;&gt;Browsing Memory the JRuby Way&lt;/a&gt; and &lt;a href=&quot;http://blog.headius.com/2010/07/finding-leaks-in-ruby-apps-with-eclipse.html&quot;&gt;Finding Leaks in Ruby Apps with Eclipse Memory Analyzer&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Hello again, friends! I&#39;m back with more exciting memory analysis tips and tricks! Ready? Here we go!&lt;br /&gt;&lt;br /&gt;After my previous two posts, several folks asked if it&#39;s possible to do all this stuff from Ruby, rather than using Java or C-based apps shipped with the JVM. The answer is yes! Because of the maturity of the Java platform, there are standard Java APIs you can use to access all the same information the previous tools consumed. 
And since we&#39;re talking about JRuby, that means you have Ruby APIs you can use to access that information.&lt;br /&gt;&lt;br /&gt;That&#39;s what I&#39;m going to show you today.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;Introducing JDI&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The APIs we&#39;ll be using are part of the &lt;a href=&quot;http://download.oracle.com/docs/cd/E17409_01/javase/6/docs/jdk/api/jpda/jdi/index.html&quot;&gt;Java Debug Interface (JDI)&lt;/a&gt;, a set of Java APIs for remotely inspecting a running application. It&#39;s part of the &lt;a href=&quot;http://download.oracle.com/docs/cd/E17409_01/javase/6/docs/technotes/guides/jpda/&quot;&gt;Java Platform Debugger Architecture&lt;/a&gt;, which also includes a &lt;a href=&quot;http://download.oracle.com/docs/cd/E17409_01/javase/6/docs/technotes/guides/jvmti/index.html&quot;&gt;C/C++ API&lt;/a&gt;, a &lt;a href=&quot;http://download.oracle.com/docs/cd/E17409_01/javase/6/docs/technotes/guides/jpda/jdwp-spec.html&quot;&gt;wire protocol&lt;/a&gt;, and a raw &lt;a href=&quot;http://download.oracle.com/docs/cd/E17409_01/javase/6/docs/technotes/guides/jpda/jdwpTransport.html&quot;&gt;wire protocol API&lt;/a&gt;. Exploring those is left as an exercise for the reader...but they&#39;re also pretty cool.&lt;br /&gt;&lt;br /&gt;We&#39;ll use the Rails app from before, inspecting it immediately after boot. JDI provides a number of ways to connect to a running VM, using &lt;a href=&quot;http://download.oracle.com/docs/cd/E17409_01/javase/6/docs/jdk/api/jpda/jdi/com/sun/jdi/VirtualMachineManager.html&quot;&gt;VirtualMachineManager&lt;/a&gt;; you can either have the debugger make the connection or the target VM make the connection, and optionally have the target VM launch the debugger or the debugger launch the target VM. 
For our example, we&#39;ll have the debugger attach to a target VM listening for connections.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;Preparing the Target VM&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The first step is to start up the application with the appropriate debugger endpoint installed. This new flag is a bit of a mouthful (and we should make a standard flag for JRuby users), but we&#39;re simply setting up a socket-based listener on port 12345 (address=12345), running as a server (server=y), and telling the JVM not to suspend at startup while it waits for a debugger to connect (suspend=n).&lt;br /&gt;&lt;pre&gt;jruby -J-agentlib:jdwp=transport=dt_socket,server=y,address=12345,suspend=n -J-Djruby.reify.classes=true script/server -e production&lt;/pre&gt;&lt;br /&gt;I covered the -J-Djruby.reify.classes flag in my first post; it makes Ruby classes show up as Java classes for purposes of heap inspection.&lt;br /&gt;&lt;br /&gt;The rest is just running the server in production mode.&lt;br /&gt;&lt;br /&gt;As you can see, remote debugging is already baked into the JVM, which means we didn&#39;t have to write it or debug it. And that&#39;s pretty awesome.&lt;br /&gt;&lt;br /&gt;Let&#39;s connect to our Rails process and see what we can do.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;Connecting to the Target VM&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;In order to connect to the target VM, you need to do the Java factory dance. 
We start with the &lt;a href=&quot;http://download.oracle.com/docs/cd/E17409_01/javase/6/docs/jdk/api/jpda/jdi/com/sun/jdi/Bootstrap.html&quot;&gt;com.sun.jdi.Bootstrap class&lt;/a&gt;, get a &lt;a href=&quot;http://download.oracle.com/docs/cd/E17409_01/javase/6/docs/jdk/api/jpda/jdi/com/sun/jdi/VirtualMachineManager.html&quot;&gt;com.sun.jdi.VirtualMachineManager&lt;/a&gt;, and then connect to a target VM to get a &lt;a href=&quot;http://download.oracle.com/docs/cd/E17409_01/javase/6/docs/jdk/api/jpda/jdi/com/sun/jdi/VirtualMachine.html&quot;&gt;com.sun.jdi.VirtualMachine&lt;/a&gt; object.&lt;br /&gt;&lt;pre&gt;require &#39;java&#39;&lt;br /&gt;java_import &#39;com.sun.jdi.Bootstrap&#39;&lt;br /&gt;&lt;br /&gt;vmm = Bootstrap.virtual_machine_manager&lt;br /&gt;sock_conn = vmm.attaching_connectors[0] # not guaranteed to be the socket connector&lt;br /&gt;args = sock_conn.default_arguments&lt;br /&gt;args[&#39;hostname&#39;].value = &quot;localhost&quot;&lt;br /&gt;args[&#39;port&#39;].value = &quot;12345&quot;&lt;br /&gt;vm = sock_conn.attach(args)&lt;/pre&gt;&lt;br /&gt;Notice that I didn&#39;t dig out the socket connector explicitly here, because on my system, the first connector always appears to be the socket connector. Here&#39;s the full list for me on OS X:&lt;br /&gt;&lt;pre&gt;➔ jruby -rjava -e &quot;puts com.sun.jdi.Bootstrap.virtual_machine_manager.attaching_connectors&quot;&lt;br /&gt;[com.sun.jdi.SocketAttach (defaults: timeout=, hostname=charles-nutters-macbook-pro.local, port=),&lt;br /&gt;com.sun.jdi.ProcessAttach (defaults: pid=, timeout=)]&lt;/pre&gt;&lt;br /&gt;The ProcessAttach connector there isn&#39;t as magical as it looks; all it does is query the target process to find out what transport it&#39;s using (dt_socket in our case) and then call the right connector (e.g. SocketAttach in the case of dt_socket or SharedMemoryAttach if you use dt_shmem on Windows). 
In our case, we know it&#39;s listening on a socket, so we&#39;re using the SocketAttach connector directly.&lt;br /&gt;&lt;br /&gt;The rest is pretty simple: we get the default arguments from the connector, twiddle them to have the right hostname and port number, and attach to the VM. Now we have a VirtualMachine object we can query and twiddle; we&#39;re inside the matrix.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;With Great Power...&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;So, what can we do with this VirtualMachine object? We can:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;walk all classes and objects on the heap&lt;/li&gt;&lt;li&gt;install breakpoints and step-debug any running code&lt;/li&gt;&lt;li&gt;inspect and modify the current state of any running thread, even manipulating in-flight arguments and variables&lt;/li&gt;&lt;li&gt;replace already-loaded classes with new definitions (such as to install custom instrumentation)&lt;/li&gt;&lt;/ul&gt;Here&#39;s the output from JRuby&#39;s ri command when we ask about VirtualMachine:&lt;br /&gt;&lt;pre&gt;➔ ri --java com.sun.jdi.VirtualMachine&lt;br /&gt;-------------------------------------- Class: com.sun.jdi.VirtualMachine&lt;br /&gt;     (no description...)&lt;br /&gt;------------------------------------------------------------------------&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Instance methods:&lt;br /&gt;-----------------&lt;br /&gt;     allClasses, allThreads, canAddMethod, canBeModified,&lt;br /&gt;     canForceEarlyReturn, canGetBytecodes, canGetClassFileVersion,&lt;br /&gt;     canGetConstantPool, canGetCurrentContendedMonitor,&lt;br /&gt;     canGetInstanceInfo, canGetMethodReturnValues,&lt;br /&gt;     canGetMonitorFrameInfo, canGetMonitorInfo, canGetOwnedMonitorInfo,&lt;br /&gt;     canGetSourceDebugExtension, canGetSyntheticAttribute, canPopFrames,&lt;br /&gt;     canRedefineClasses, canRequestMonitorEvents,&lt;br /&gt;     canRequestVMDeathEvent, canUnrestrictedlyRedefineClasses,&lt;br /&gt;  
   canUseInstanceFilters, canUseSourceNameFilters,&lt;br /&gt;     canWatchFieldAccess, canWatchFieldModification, classesByName,&lt;br /&gt;     description, dispose, eventQueue, eventRequestManager, exit,&lt;br /&gt;     getDefaultStratum, instanceCounts, mirrorOf, mirrorOfVoid, name,&lt;br /&gt;     process, redefineClasses, resume, setDebugTraceMode,&lt;br /&gt;     setDefaultStratum, suspend, toString, topLevelThreadGroups,&lt;br /&gt;     version, virtualMachine&lt;/pre&gt;&lt;br /&gt;We can basically make the target VM dance any way we want, even going so far as to write our own debugger entirely in Ruby code. But that&#39;s a topic for another day. Right now, we&#39;re going to do some memory inspection.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;Creating a Histogram of the Heap&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The simplest heap inspection we might do is to produce a histogram of all objects on the heap. And as you might expect, this is one of the easiest things to do, because it&#39;s the first thing everyone looks for when debugging a memory issue.&lt;br /&gt;&lt;pre&gt;# vm is the VirtualMachine we attached to above&lt;br /&gt;classes = vm.all_classes&lt;br /&gt;counts = vm.instance_counts(classes)&lt;br /&gt;classes.zip(counts)&lt;/pre&gt;&lt;br /&gt;VirtualMachine.all_classes gives you a list (a java.util.List, but we make those behave mostly like a Ruby Array) of every class the JVM has loaded, including Ruby classes, JRuby core and runtime classes, and other Java classes that JRuby and the JVM use. VirtualMachine.instance_counts takes that list of classes and returns a matching array of instance counts, in the same order. Zip the two together, and we have an array of [class, count] pairs. 
So easy!&lt;br /&gt;&lt;br /&gt;Let&#39;s take these two pieces and put them together in an easy-to-use class:&lt;br /&gt;&lt;pre&gt;require &#39;java&#39;&lt;br /&gt;&lt;br /&gt;module JRuby&lt;br /&gt;  class Debugger&lt;br /&gt;    VMM = com.sun.jdi.Bootstrap.virtual_machine_manager&lt;br /&gt;    &lt;br /&gt;    attr_accessor :vm&lt;br /&gt;    &lt;br /&gt;    def initialize(options = {})&lt;br /&gt;      connectors = VMM.attaching_connectors&lt;br /&gt;      if options[:port]&lt;br /&gt;        connector = connectors.find {|ac| ac.name =~ /Socket/}&lt;br /&gt;      elsif options[:pid]&lt;br /&gt;        connector = connectors.find {|ac| ac.name =~ /Process/}&lt;br /&gt;      else&lt;br /&gt;        raise ArgumentError, &quot;specify :port or :pid&quot;&lt;br /&gt;      end&lt;br /&gt;&lt;br /&gt;      # copy the options (hostname, port, pid...) onto the connector&#39;s arguments&lt;br /&gt;      args = connector.default_arguments&lt;br /&gt;      for k, v in options&lt;br /&gt;        args[k.to_s].value = v.to_s&lt;br /&gt;      end&lt;br /&gt;      &lt;br /&gt;      @vm = connector.attach(args)&lt;br /&gt;    end&lt;br /&gt;&lt;br /&gt;    # Generate a histogram of all classes in the system&lt;br /&gt;    def histogram&lt;br /&gt;      classes = @vm.all_classes&lt;br /&gt;      counts = @vm.instance_counts(classes)&lt;br /&gt;      classes.zip(counts)&lt;br /&gt;    end&lt;br /&gt;  end&lt;br /&gt;end&lt;/pre&gt;&lt;br /&gt;I&#39;ve taken the liberty of expanding the connection process to handle pids and other arguments passed in. So to get a histogram from a VM listening on localhost port 12345, we can simply do:&lt;br /&gt;&lt;pre&gt;JRuby::Debugger.new(:hostname =&gt; &#39;localhost&#39;, :port =&gt; 12345).histogram&lt;/pre&gt;&lt;br /&gt;Now of course this list is going to have a lot of JRuby and Java classes that we might not be interested in, so we&#39;ll want to filter it to just the Ruby classes. On JRuby master, all the generated Ruby classes start with the package name &quot;ruby&quot;. 
Unfortunately, jitted Ruby methods start with a package of &quot;ruby.jit&quot; right now, so we&#39;ll want to filter those out too (unless you&#39;re interested in them, of course...JRuby is an open book!)&lt;br /&gt;&lt;pre&gt;require &#39;jruby_debugger&#39;&lt;br /&gt;&lt;br /&gt;# connect to the VM&lt;br /&gt;debugr = JRuby::Debugger.new(:hostname =&gt; &#39;localhost&#39;, :port =&gt; 12345)&lt;br /&gt;histo = debugr.histogram&lt;br /&gt;# sort by count&lt;br /&gt;histo.sort! {|a,b| b[1] &lt;=&gt; a[1]}&lt;br /&gt;# filter to only user-created Ruby classes with &gt;0 instances&lt;br /&gt;histo.each do |cls,num|&lt;br /&gt;  next if num == 0 || cls.name[0..4] != &#39;ruby.&#39; || cls.name[5..7] == &#39;jit&#39;&lt;br /&gt;  puts &quot;#{num} instances of #{cls.name[5..-1].gsub(&#39;.&#39;, &#39;::&#39;)}&quot;&lt;br /&gt;end&lt;/pre&gt;&lt;br /&gt;If we run this short script against our Rails application, we see similar results to the previous posts (but it&#39;s cooler, because we&#39;re doing it all from Ruby!)&lt;br /&gt;&lt;pre&gt;➔ jruby ruby_histogram.rb | head -10&lt;br /&gt;11685 instances of TZInfo::TimezoneTransitionInfo&lt;br /&gt;1071 instances of Gem::Version&lt;br /&gt;1012 instances of Gem::Requirement&lt;br /&gt;592 instances of TZInfo::TimezoneOffsetInfo&lt;br /&gt;432 instances of Gem::Dependency&lt;br /&gt;289 instances of Gem::Specification&lt;br /&gt;142 instances of ActiveSupport::TimeZone&lt;br /&gt;118 instances of TZInfo::DataTimezoneInfo&lt;br /&gt;118 instances of TZInfo::DataTimezone&lt;br /&gt;45 instances of Gem::Platform&lt;/pre&gt;&lt;br /&gt;Just so we&#39;re all on the same page, it&#39;s important to know what we&#39;re actually dealing with here. VirtualMachine.all_classes returns a list of &lt;a href=&quot;http://download.oracle.com/docs/cd/E17409_01/javase/6/docs/jdk/api/jpda/jdi/com/sun/jdi/ReferenceType.html&quot;&gt;com.sun.jdi.ReferenceType&lt;/a&gt; objects. 
Let&#39;s ri that.&lt;br /&gt;&lt;pre&gt;➔ ri --java com.sun.jdi.ReferenceType&lt;br /&gt;--------------------------------------- Class: com.sun.jdi.ReferenceType&lt;br /&gt;     (no description...)&lt;br /&gt;------------------------------------------------------------------------&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Instance methods:&lt;br /&gt;-----------------&lt;br /&gt;     allFields, allLineLocations, allMethods, availableStrata,&lt;br /&gt;     classLoader, classObject, compareTo, constantPool,&lt;br /&gt;     constantPoolCount, defaultStratum, equals, failedToInitialize,&lt;br /&gt;     fieldByName, fields, genericSignature, getValue, getValues,&lt;br /&gt;     hashCode, instances, isAbstract, isFinal, isInitialized,&lt;br /&gt;     isPackagePrivate, isPrepared, isPrivate, isProtected, isPublic,&lt;br /&gt;     isStatic, isVerified, locationsOfLine, majorVersion, methods,&lt;br /&gt;     methodsByName, minorVersion, modifiers, name, nestedTypes,&lt;br /&gt;     signature, sourceDebugExtension, sourceName, sourceNames,&lt;br /&gt;     sourcePaths, toString, virtualMachine, visibleFields,&lt;br /&gt;     visibleMethods&lt;/pre&gt;&lt;br /&gt;You can see there&#39;s quite a bit more you can do with a ReferenceType. Let&#39;s try something.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;Digging Deeper Into TimezoneTransitionInfo&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Let&#39;s actually take some time to explore our old friend TimezoneTransitionInfo (hereafter referred to as TTI). Instead of walking all classes in the system, we&#39;ll want to just grab TTI directly. For that we use VirtualMachine.classes_by_name, which returns a list of classes on the target VM of that name. 
There should be only one, since we only have a single JRuby instance in our server, so we&#39;ll grab that class and request exactly one instance of it...any old instance.&lt;br /&gt;&lt;pre&gt;tti_class = debugr.vm.classes_by_name(&#39;ruby.TZInfo.TimezoneTransitionInfo&#39;)[0]&lt;br /&gt;tti_obj = tti_class.instances(1)[0]&lt;br /&gt;puts tti_obj&lt;/pre&gt;&lt;br /&gt;Running this, we can see we&#39;ve got the reference we&#39;re looking for.&lt;br /&gt;&lt;pre&gt;➔ jruby tti_digger.rb&lt;br /&gt;instance of ruby.TZInfo.TimezoneTransitionInfo(id=2)&lt;/pre&gt;&lt;br /&gt;ReferenceType.instances returns a list (no larger than the specified size, or all instances if you specify 0) of &lt;a href=&quot;http://download.oracle.com/docs/cd/E17409_01/javase/6/docs/jdk/api/jpda/jdi/com/sun/jdi/ObjectReference.html&quot;&gt;com.sun.jdi.ObjectReference&lt;/a&gt; objects.&lt;br /&gt;&lt;pre&gt;➔ ri --java com.sun.jdi.ObjectReference&lt;br /&gt;------------------------------------- Class: com.sun.jdi.ObjectReference&lt;br /&gt;     (no description...)&lt;br /&gt;------------------------------------------------------------------------&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Instance methods:&lt;br /&gt;-----------------&lt;br /&gt;     disableCollection, enableCollection, entryCount, equals, getValue,&lt;br /&gt;     getValues, hashCode, invokeMethod, isCollected, owningThread,&lt;br /&gt;     referenceType, referringObjects, setValue, toString, type,&lt;br /&gt;     uniqueID, virtualMachine, waitingThreads&lt;/pre&gt;&lt;br /&gt;Among the weirder things, like disabling garbage collection for this object or listing all threads waiting on this object&#39;s monitor (à la &#39;synchronized&#39; in Java), we can access the object&#39;s fields through getValue and setValue.&lt;br /&gt;&lt;br /&gt;Let&#39;s examine the instance variables TTI contains. 
You may recall from previous posts that all Ruby objects in JRuby store their instance variables in an array, to avoid the large memory and cpu cost of storing them in a map. We can grab a reference to that array and display its contents.&lt;br /&gt;&lt;pre&gt;var_table_field = tti_class.field_by_name(&#39;varTable&#39;)&lt;br /&gt;tti_vars = tti_obj.get_value(var_table_field)&lt;br /&gt;puts &quot;varTable: #{tti_vars}&quot;&lt;br /&gt;puts tti_vars.values.map(&amp;:to_s)&lt;/pre&gt;&lt;br /&gt;And the new output:&lt;br /&gt;&lt;pre&gt;➔ jruby tti_digger.rb&lt;br /&gt;varTable: instance of java.lang.Object[7] (id=13)&lt;br /&gt;instance of ruby.TZInfo.TimezoneOffsetInfo(id=15)&lt;br /&gt;instance of ruby.TZInfo.TimezoneOffsetInfo(id=16)&lt;br /&gt;instance of org.jruby.RubyFixnum(id=17)&lt;br /&gt;instance of org.jruby.RubyFixnum(id=18)&lt;br /&gt;instance of org.jruby.RubyNil(id=19)&lt;br /&gt;instance of org.jruby.RubyNil(id=19)&lt;br /&gt;instance of org.jruby.RubyNil(id=19)&lt;/pre&gt;&lt;br /&gt;Since the varTable field is a simple Object[] in Java, the reference we get to it is of type &lt;a href=&quot;http://download.oracle.com/docs/cd/E17409_01/javase/6/docs/jdk/api/jpda/jdi/com/sun/jdi/ArrayReference.html&quot;&gt;com.sun.jdi.ArrayReference&lt;/a&gt;.&lt;br /&gt;&lt;pre&gt;➔ ri --java com.sun.jdi.ArrayReference&lt;br /&gt;-------------------------------------- Class: com.sun.jdi.ArrayReference&lt;br /&gt;     (no description...)&lt;br /&gt;------------------------------------------------------------------------&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Instance methods:&lt;br /&gt;-----------------&lt;br /&gt;     disableCollection, enableCollection, entryCount, equals, getValue,&lt;br /&gt;     getValues, hashCode, invokeMethod, isCollected, length,&lt;br /&gt;     owningThread, referenceType, referringObjects, setValue, setValues,&lt;br /&gt;     toString, type, uniqueID, virtualMachine, waitingThreads&lt;/pre&gt;&lt;br /&gt;Of course each of these references 
can be further explored, but already we can see that this TTI instance has seven instance variables: two TimezoneOffsetInfo objects, two Fixnums, and three nils. But we don&#39;t have instance variable names!&lt;br /&gt;&lt;br /&gt;Instance variable names are only stored on the object&#39;s class. There, a table of names to offsets is kept up-to-date as new instance variable names are discovered. We can access this from the TTI class reference and combine it with the variable table to get the output we want to see.&lt;br /&gt;&lt;pre&gt;# get the metaclass object and class reference&lt;br /&gt;metaclass_field = tti_class.field_by_name(&#39;metaClass&#39;)&lt;br /&gt;tti_class_obj = tti_obj.get_value(metaclass_field)&lt;br /&gt;tti_class_class = tti_class_obj.reference_type&lt;br /&gt;&lt;br /&gt;# get the variable names from the metaclass object&lt;br /&gt;var_names_field = tti_class_class.field_by_name(&#39;variableNames&#39;)&lt;br /&gt;var_names = tti_class_obj.get_value(var_names_field)&lt;br /&gt;&lt;br /&gt;# splice the names and values together&lt;br /&gt;table = var_names.values.zip(tti_vars.values)&lt;br /&gt;puts table&lt;/pre&gt;&lt;br /&gt;This looks a bit complicated, but there&#39;s actually a lot of boilerplate here we could put into a utility class. For example, the metaClass and variableNames fields are standard on all (J)Ruby objects and classes, respectively. 
But considering that we&#39;re actually walking a remote VM&#39;s *live* heap...this is pretty simple code.&lt;br /&gt;&lt;br /&gt;Here&#39;s what our script outputs now:&lt;br /&gt;&lt;pre&gt;➔ jruby tti_digger.rb&lt;br /&gt;&quot;@offset&quot;&lt;br /&gt;instance of ruby.TZInfo.TimezoneOffsetInfo(id=25)&lt;br /&gt;&quot;@previous_offset&quot;&lt;br /&gt;instance of ruby.TZInfo.TimezoneOffsetInfo(id=26)&lt;br /&gt;&quot;@numerator_or_time&quot;&lt;br /&gt;instance of org.jruby.RubyFixnum(id=27)&lt;br /&gt;&quot;@denominator&quot;&lt;br /&gt;instance of org.jruby.RubyFixnum(id=28)&lt;br /&gt;&quot;@at&quot;&lt;br /&gt;instance of org.jruby.RubyNil(id=29)&lt;br /&gt;&quot;@local_end&quot;&lt;br /&gt;instance of org.jruby.RubyNil(id=29)&lt;br /&gt;&quot;@local_start&quot;&lt;br /&gt;instance of org.jruby.RubyNil(id=29)&lt;/pre&gt;&lt;br /&gt;We could go even deeper, but I think you get the idea.&lt;br /&gt;&lt;br /&gt;&lt;span style=&quot;font-weight:bold;&quot;&gt;Your Turn&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Here&#39;s a &lt;a href=&quot;http://gist.github.com/481102&quot;&gt;gist of the three scripts we&#39;ve created&lt;/a&gt;, so you can refer to and build off of them. And of course the javadocs and ri docs will help you as well, plus everything we&#39;ve done here you can do in a jirb session.&lt;br /&gt;&lt;br /&gt;There&#39;s a lot to the JDI API, but once you&#39;ve got the VirtualMachine object in hand it&#39;s pretty easy to follow. As you&#39;d expect from any debugger API, you need to know a bit about how things work on the inside, but through the magic of JRuby it&#39;s actually possible to write most of those fancy memory and debugging tools entirely in Ruby. Perhaps this article has peaked your interest in exploring JRuby internals using JDI and you might start to write debugging tools. Perhaps we can ship a few utilities to make some of the boilerplate go away. 
In any case, I hope this series of articles shows that JRuby users have an amazing library of tools available to them, and you don&#39;t even have to leave your comfort zone if you don&#39;t want to.&lt;br /&gt;&lt;br /&gt;Note: The variableNames field is a recent addition to JRuby master, so if you&#39;d like to play with that you&#39;ll probably want to build JRuby yourself or wait for a nightly build that picks it up. But you can certainly do a lot of exploring even without that patch.</content><link rel='replies' type='application/atom+xml' href='http://blog.headius.com/feeds/6351925351093074122/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://blog.headius.com/2010/07/browsing-memory-with-ruby-and-java.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/6351925351093074122'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/4704664917418794835/posts/default/6351925351093074122'/><link rel='alternate' type='text/html' href='http://blog.headius.com/2010/07/browsing-memory-with-ruby-and-java.html' title='Browsing Memory with Ruby and Java Debug Interface'/><author><name>Charles Nutter</name><uri>https://plus.google.com/101599370339210456684</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh5.googleusercontent.com/-VT5b8KsgHgQ/AAAAAAAAAAI/AAAAAAAAFB0/9d5SU9AcHNQ/s512-c/photo.jpg'/></author><thr:total>1</thr:total>
- </entry>
-</feed>
----
-feed.format:            atom
-feed.title:             Headius
-feed.summary:           Helping the JVM Into the 21st Century
-feed.url:               http://blog.headius.com/
-feed.generator.name:         Blogger
-feed.generator.url:          http://www.blogger.com
-feed.generator.version:      7.00
-feed.items[0].title: JRubyConf.eu 2014!
-feed.items[0].url:   http://blog.headius.com/2014/05/jrubyconfeu-2014.html
-feed.items[0].id:  tag:blogger.com,1999:blog-4704664917418794835.post-3430080308857860963
-feed.items[1].title: The Pain of Broken Subprocess Management on JDK
-feed.items[1].id:  tag:blogger.com,1999:blog-4704664917418794835.post-462657466694269626
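The deleted `headius.atom` fixture ends with a `---` separator followed by `key: value` lines such as `feed.format: atom` and `feed.items[0].title: ...`. These lines record the values a test expects the parser to extract. Below is a minimal Ruby sketch of how a fixture in this layout could be split into raw feed text and a hash of expectations. The helper name `split_fixture` is hypothetical and is not feedparser's actual test helper, which this release removed (`data/test/helper.rb +3 -57`). The sketch splits each expectation line on its first colon only, so values that contain colons, such as URLs and `tag:` IDs, stay intact.

```ruby
# Hypothetical helper (not part of feedparser): split a fixture file into the
# raw feed text and the trailing expectations block after a "---" line.
def split_fixture(text)
  # Split at the first line consisting solely of "---"; ^ and $ are line anchors in Ruby.
  feed_text, expected_text = text.split(/^---$/, 2)
  expected = {}
  (expected_text || '').each_line do |line|
    line = line.strip
    next if line.empty?
    # Split on the first colon only, so "http://..." and "tag:..." values survive.
    key, value = line.split(':', 2)
    next if value.nil?
    expected[key.strip] = value.strip
  end
  [feed_text, expected]
end

sample = <<~FIXTURE
  <feed></feed>
  ---
  feed.format:            atom
  feed.url:               http://blog.headius.com/
  feed.items[0].id:  tag:blogger.com,1999:blog-4704664917418794835.post-3430080308857860963
FIXTURE

feed_text, expected = split_fixture(sample)
puts expected['feed.url']
```

A test could then parse `feed_text` with the gem and compare each attribute path against `expected`. How those dotted paths map onto parser objects is left open here.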

data/test/feeds/inessential.json DELETED

@@ -1,182 +0,0 @@
-{
-  "version": "https://jsonfeed.org/version/1",
-  "title": "inessential.com",
-  "description": "Brent Simmons’s weblog.",
-  "home_page_url": "http://inessential.com/",
-  "feed_url": "http://inessential.com/feed.json",
-  "user_comment": "This feed allows you to read the posts from this site in any feed reader that supports the JSON Feed format. To add this feed to your reader, copy the following URL — http://inessential.com/feed.json — and add it your reader.",
-  "favicon": "http://inessential.com/favicon.ico",
-  "author": {
-    "name": "Brent Simmons",
-    "url": "http://inessential.com/",
-    "avatar": "http://ranchero.com/downloads/brent_avatar.png"
-  },
-  "items": [
-    {
-      "id": "http://inessential.com/2017/05/17/json_feed",
-      "url": "http://inessential.com/2017/05/17/json_feed",
-      "title": "JSON Feed",
-      "content_html": "<p>I was hesitant, even up to this morning, to publish the <a href=\"https://jsonfeed.org/version/1\">JSON Feed spec</a>.</p>\n\n<p>If you read Dave Winer’s <a href=\"http://scripting.com/2017/05/09/rulesForStandardsmakers.html\">Rules for standards-makers</a>, you’ll see that we did a decent job with some of the rules — the spec is written in plain English, for example — but a strict application of the rules would have meant not publishing at all, since “Fewer formats is better.”</p>\n\n<p>I agree completely — but I also believe that developers (particularly Mac and iOS developers, the group I know best) are so loath to work with XML that they won’t even consider building software that needs an XML parser. Which says to me that JSON Feed is needed for the survival of syndication.</p>\n\n<p>I could be wrong, of course. I admit.</p>\n\n<h4>Feed Reader Starter Kit</h4>\n\n<p>See my <a href=\"https://github.com/brentsimmons/RSXML\">RSXML repository</a> for Objective-C code that reads RSS, Atom, and OPML. I’ve done the work for you of supporting those formats. Go write a feed reader! Seriously. Do it.</p>\n\n<p>I planned to have a JSON Feed parser for Swift done for today, but other things got in the way. It’s coming soon. But you probably don’t actually need any sample code, since JSON is so easy to handle.</p>\n\n<h4>Feedback so far</h4>\n\n<p>Feedback has been interesting so far. 
Some <a href=\"https://github.com/brentsimmons/JSONFeed\">questions</a> on the GitHub repo need answering.</p>\n\n<p>Some people have said this should have happened ten years ago, and other people have said that they hate how developers jump on the latest fad (JSON).</p>\n\n<p>And some people really like the icon:</p>\n\n<p><img src=\"http://jsonfeed.org/graphics/icon.png\" height=70 width=70 /></p>\n\n<h4>Microformats</h4>\n\n<p>One of the more serious criticisms was this: why not just support the <a href=\"http://microformats.org/wiki/hatom\">hAtom microformat</a> instead? Why do another side-file?</p>\n\n<p>My thinking:</p>\n\n<p>My experience as a feed reader author tells me that people screw up XML, badly, all the time — and they do even less well with HTML. So embedding info in HTML is just plain too difficult. In practice it would be even buggier than XML-based feeds.</p>\n\n<p>And there are other advantages to decoupling: a side-file can have 100 entries where there are only 10 on an HTML page, for instance. A side-file can have extra information that you wouldn’t put on an HTML page. And yet, despite the extra information, a side-file can be much smaller than an HTML page, and it can often be easier to cache (since it’s not different based on a logged-in user, for instance).</p>\n\n<p>Microformats sounds elegant, but I don’t prize elegance as much as I value things that work well.</p>",
-      "date_published": "2017-05-17T13:22:14-07:00"
-    },
-    {
-      "id": "http://inessential.com/2017/05/01/frontier_diary_8_when_worlds_collide",
-      "url": "http://inessential.com/2017/05/01/frontier_diary_8_when_worlds_collide",
-      "title": "Frontier Diary #8: When Worlds Collide",
-      "content_html": "<p>I spent the weekend making a bunch of progress on the compiler. It has two pieces: a <a href=\"https://github.com/brentsimmons/Frontier/blob/master/UserTalk/UserTalk/Compiler/Tokenizer.swift\">tokenizer</a>, which I created by rewriting the original C code (<a href=\"https://github.com/brentsimmons/Frontier/blob/master/FrontierOrigFork/Common/source/langscan.c\">langscan.c</a>) in Swift, and a parser.</p>\n\n<p>The parser in OrigFrontier was generated by MacYacc, which is similar to Yacc, which is similar to <a href=\"https://www.gnu.org/software/bison/\">Bison</a>, which is on my Mac. The thing about the parser is that it’s C code, and the rest of the app is Swift.</p>\n\n<p>How do you bridge the two worlds? Easy answer: with Objective-C, which is a superset of C and which plays nicely (enough) with Swift.</p>\n\n<p>So I renamed langparser.y — the rules file that the parser generator uses — to <a href=\"https://github.com/brentsimmons/Frontier/blob/master/UserTalk/UserTalk/Compiler/langparser.ym\">langparser.ym</a> so that Xcode would know to treat the generated parser source as Objective-C. 
I edited it slightly, not to change the grammar rules but to change how nodes are created (as return values rather than via inout).</p>\n\n<p>I also made my <a href=\"https://github.com/brentsimmons/Frontier/blob/master/UserTalk/UserTalk/CodeTreeNode.swift\">CodeTreeNode</a> class, written in Swift, an Objective-C class so that it would be visible to my Objective-C code.</p>\n\n<p>And then, finally, I started a build…</p>\n\n<p>…and then it stopped with an error because the parser places my <code>CodeTreeNode</code> in a C union, which isn’t allowed in ARC.</p>\n\n<p>Crushed.</p>\n\n<p style=\"text-align:center\">* * *</p>\n\n\n<p>I think I have three options:</p>\n\n<ol>\n<li>Go down the rabbit hole of figuring out how to get the parser to work with ARC.</li>\n<li>Go with the flow: have the parser generate nodes that are, as in OrigFrontier, C structs. The last compilation step would be Objective-C code that translates that tree of C structs into a tree of <code>CodeTreeNode</code> objects, and then disposes the C-struct-node-tree.</li>\n<li>Write the parser by hand, in Swift.</li>\n</ol>\n\n\n<p>My thinking:</p>\n\n<p>I could waste a ton of time on #1, and bending tools in that way can be pretty frustrating work when they refuse to bend.</p>\n\n<p>With #2 I’d feel a bit weird about the redundancy: building a tree and then building a copy of that tree with a different type of object.</p>\n\n<p>My heart tells me #3 is the answer. After all, I’ve already done the tokenizer. How hard would it be to parse those tokens into a code tree? I could skip C and Objective-C altogether and stay in Swift. And it would be <em>so fun</em>. (Because that’s precisely the style of weirdo I am.)</p>\n\n<p style=\"text-align:center\">* * *</p>\n\n\n<p>But the real answer is #2. Writing a parser by hand would take way longer than I think. 
Given enough tests, it shouldn’t be a huge source of bugs, but still.</p>\n\n<p>The thing about #2 is that yes, it’s redundant, it’s doing more work than it needs to, ideally — but my bet is that it would still be so fast that you wouldn’t be able to tell the difference. Computers are so good at this kind of thing. It’s not like reading files or networking; it’s just in-memory traversal and creating/releasing things.</p>\n\n<p>You remember in Indiana Jones that guy with the twirling swords, and Indy gives that look and then just shoots him? The second option is the Indiana Jones solution.</p>\n\n<p><i>Update 2:05 pm</i>: Two people have already written me to recommend <a href=\"http://www.antlr.org\">ANTLR</a>. So I will definitely give that a look. It might be exactly what I need.</p>",
-      "date_published": "2017-05-01T13:34:23-07:00"
-    },
-    {
-      "id": "http://inessential.com/2017/04/27/frontier_diary_7_pretty_much_everythin",
-      "url": "http://inessential.com/2017/04/27/frontier_diary_7_pretty_much_everythin",
-      "title": "Frontier Diary #7: Pretty Much Everything Throws",
-      "content_html": "<p>A script can throw an error, either intentionally (via the <code>scriptError</code> verb) or by doing something, such as referencing an undefined object, that generates an error.</p>\n\n<p>OrigFrontier was written in C, which has no error-throwing mechanism, and so it worked like this: most runtime functions returned a boolean (for success or failure), and the return value was passed in by reference. If there was an error, the function would set a global error variable and return false. The caller would then have to check that global to see if there was an error, and then do the right thing.</p>\n\n<p>This was not unreasonable, given the language and the times (early ’90s) and also given the need to be very careful about unwinding memory allocations.</p>\n\n<p>But, these days, it seems to me that Swift’s error system is the way to go. There’s just one downside to that, and it’s that I have to do that do/try/catch dance all over the place, since pretty much any runtime function can throw an error.</p>\n\n<p>Even the coercions can throw, so last night I changed the <a href=\"https://github.com/brentsimmons/Frontier/blob/master/FrontierData/FrontierData/Value/ValueProtocol.swift\">Value</a> protocol so that <code>asInt</code> and so on are now functions, since properties can’t throw (at least not yet).</p>\n\n<p>The extra housekeeping — the do/try/catch stuff — kind of bugs me, but it’s honest. I considered making script errors just another type of Value — but that meant that all those callers have to check the returned Value to see if it’s an error, and then do the right thing. Better to just use Swift’s error system, because it makes for more consistent code, and it makes sure I’m catching errors in every case.</p>\n\n<p>It also means I’m not multiplying entities. 
A Swift error is a script error, and vice versa.</p>\n\n<p style=\"text-align:center\">* * *</p>\n\n\n<p>Working on this code is like applying the last 25 years of programming history all at once.</p>\n\n<p>A completely different type of error is a <em>bug</em>, and I’m certain to write a bunch of them, because that’s how programming goes.</p>\n\n<p>That’s where unit tests come in. Frontier has long had a stress-test suite of scripts — you’d launch the app, run that suite, wait a while, and see if there are any errors. This was critically helpful.</p>\n\n<p>But OrigFontier didn’t have unit tests at the C code level. The new version does. (Well, <a href=\"https://github.com/brentsimmons/Frontier/blob/master/FrontierVerbs/FrontierVerbsTests/Math.swift\">I’ve started them anyway</a>.) This means I can more easily follow <a href=\"http://scripting.com/2002/09/29.html#rule1\">Rule 1</a> — the no-breakage rule — and can also more easily follow Rule 1b — the don’t-break-Dave rule.</p>\n\n<p>PS I’ve added a <a href=\"http://inessential.com/frontierdiary\">collection page for the Frontier Diary</a>, as I did with earlier diaries. There’s a link to it in the footer of every page on the blog.</p>",
-      "date_published": "2017-04-27T13:30:42-07:00"
-    },
-    {
-      "id": "http://inessential.com/2017/04/26/frontier_diary_6_ballard_from_the_par",
-      "url": "http://inessential.com/2017/04/26/frontier_diary_6_ballard_from_the_par",
-      "title": "Frontier Diary #6: Ballard, from the Parallel Universe",
-      "content_html": "<p>In another universe I didn’t decide to port Frontier — instead, I started over from scratch on an app <em>inspired</em> by Frontier.</p>\n\n<p>In that universe, the new scripting language, descended from UserTalk, is called Ballard. <a href=\"http://inessential.com/ballard_lang\">And it’s documented</a>.</p>",
-      "date_published": "2017-04-26T13:04:04-07:00"
-    },
-    {
-      "id": "http://inessential.com/2017/04/25/my_microblog",
-      "url": "http://inessential.com/2017/04/25/my_microblog",
-      "title": "My Microblog",
-      "content_html": "<p>I’m on Manton‘s cool new microblogs system. Here’s where you can follow me, once you’re on the system: <a href=\"http://micro.blog/brentsimmons\">http://micro.blog/brentsimmons</a>.</p>\n\n<p>And here’s my microblog: <a href=\"http://brent.micro.blog/\">http://brent.micro.blog/</a>. (Which you can read using RSS, whether you’re on the system or not.)</p>\n\n<p>I wrote about three-quarters of my own single-user microblog system — and then stopped because I didn’t feel like running a server and because Manton’s service is so good.</p>",
-      "date_published": "2017-04-25T14:27:28-07:00"
-    },
-    {
-      "id": "http://inessential.com/2017/04/25/frontier_diary_5_values_and_progress_o",
-      "url": "http://inessential.com/2017/04/25/frontier_diary_5_values_and_progress_o",
-      "title": "Frontier Diary #5: Values and Progress on the Language",
-      "content_html": "<p>I put the <a href=\"https://github.com/brentsimmons/Frontier\">Frontier repository</a> up on GitHub.</p>\n\n<p>(The build is currently broken. This is bad discipline, but since it’s still just me, I forgive myself. Sometimes I run out of time and I just commit what I have.)</p>\n\n<p>The repo has my new code and it also contains <a href=\"https://github.com/brentsimmons/Frontier/tree/master/FrontierOrigFork\">FrontierOrigFork</a>, which is the original Frontier source with a bunch of deletions and some changes. The point is to give me 1) code to read and 2) a project that builds and runs on <a href=\"http://inessential.com/2017/04/03/frontier_diary_1_vm_life\">my 10.6.8 virtual machine</a>.</p>\n\n<p>The original code is in C, and the port is, at least so far, all in Swift. In the end it should be <em>almost</em> all in Swift, but I anticipate a couple places where I may need to use Objective-C.</p>\n\n<p>Here’s one of the Swift wins:</p>\n\n<h4>Values</h4>\n\n<p>Since Frontier contains a database and scripting language, there’s a need for some kind of value object that could be a boolean, integer, string, date, and so on.</p>\n\n<p>Original Frontier used a <a href=\"https://github.com/brentsimmons/Frontier/blob/master/FrontierOrigFork/Common/headers/lang.h\">tyvaluedata</a> union, with fields for the various types of values.</p>\n\n<p>This is a perfectly reasonably approach in C. It’s great because you can pass the same type of value object everywhere.</p>\n\n<p>Were I writing this in Objective-C, however, I’d create a <code>Value</code> protocol, and then create new value objects for some types and also extend existing objects (<code>NSNumber</code>, <code>NSString</code>, etc.) to conform to the <code>Value</code> protocol. This would still give me the upside — passing a <code>Value</code> type everywhere — while reducing the amount of boxing.</p>\n\n<p>But: this still means I have an <code>NSNumber</code> when I really want a BOOL. 
Luckily, in Swift I can go one better: I can extend types such as <a href=\"https://github.com/brentsimmons/Frontier/blob/master/FrontierData/FrontierData/Value/ValueBool.swift\">Bool</a> and <code>Int</code> to conform to a <a href=\"https://github.com/brentsimmons/Frontier/blob/master/FrontierData/FrontierData/Value/ValueProtocol.swift\">Value</a> protocol.</p>\n\n<p>This means passing around an <em>actual</em> <code>Bool</code> rather than a boxed boolean. I like this a ton. It feels totally right.</p>\n\n<p>Other topic:</p>\n\n<h4>Language Progress</h4>\n\n<p>I’m still in architectural mode, where I’m writing just enough code to validate and refine my decisions. A couple days ago I started on the <a href=\"https://github.com/brentsimmons/Frontier/blob/master/UserTalk/UserTalk/LangEvaluator.swift\">language evaluator</a> — the thing that actually runs scripts.</p>\n\n<p>It works as you expect: it takes a compiled code tree and recursively evaluates it. It’s not difficult — it’s just that it’s going to end up being a fair amount of code.</p>\n\n<p>I’ve done just enough to know that I’m on the right path. (The Swift code looks a lot like the C code in OrigFrontier’s <a href=\"https://github.com/brentsimmons/Frontier/blob/master/FrontierOrigFork/Common/source/langevaluate.c\">langevaluate.c</a>. See <code>evaluateList</code>, for instance.)</p>\n\n<p>The next step is for me to build the parser. 
I thought about writing a parser by hand, because it sounds like fun, and it would give me some extra control — but, really, it would slow me way down, so forget it.</p>\n\n<p>OrigFrontier generated its parser by passing a grammar file — <a href=\"https://github.com/brentsimmons/Frontier/blob/master/FrontierOrigFork/Common/source/langparser.y\">langparser.y</a> — to MacYacc (there was such a thing!), which generated <a href=\"https://github.com/brentsimmons/Frontier/blob/master/FrontierOrigFork/Common/source/langparser.c\">langparser.c</a>.</p>\n\n<p>I’ll do a similar thing, except using <a href=\"https://www.gnu.org/software/bison/\">Bison</a> (which is compatible with Yacc). Or, possibly, using the <a href=\"http://www.hwaci.com/sw/lemon/\">Lemon parser generator</a> instead. Either way, I’ll want the generated code to be Objective-C. (Well, mostly C, but with Objective-C objects instead of structs.) (I don’t know of a generator that would create Swift code.)</p>\n\n<p>This is completely new territory for me, and is exciting.</p>\n\n<p>(Almost forgot to mention: I’ll need to write a tokenizer. This means porting <a href=\"https://github.com/brentsimmons/Frontier/blob/master/FrontierOrigFork/Common/source/langscan.c\">langscan.c</a>. I’ll need to do this first, since the parser generator needs it. So this is the real next step.)</p>",
-      "date_published": "2017-04-25T13:26:33-07:00"
-    },
-    {
-      "id": "http://inessential.com/2017/04/14/save_300_on_coccoaconf_next_door",
-      "url": "http://inessential.com/2017/04/14/save_300_on_coccoaconf_next_door",
-      "title": "Save $300 on CocoaConf Next Door",
-      "content_html": "<p>My pals at CocoaConf asked me to remind you that the <a href=\"https://twitter.com/cocoaconf/status/852898192035282944\">Early Bird sale ends in two weeks</a> for CocoaConf Next Door — the one taking place in San Jose during WWDC.</p>\n\n<p>I’ll be there. At least in the afternoons.</p>\n\n<p>Check out the <a href=\"http://cocoaconf.com/nextdoor/speakers\">speakers list</a>. Yummy, chewy, nutty speakers list.</p>",
-      "date_published": "2017-04-14T13:53:02-07:00"
-    },
-    {
-      "id": "http://inessential.com/2017/04/14/frontier_diary_4_the_quickdraw_problem",
-      "url": "http://inessential.com/2017/04/14/frontier_diary_4_the_quickdraw_problem",
-      "title": "Frontier Diary #4: The QuickDraw Problem and Where It Led Me",
-      "content_html": "<p>In my fork of Frontier there are still over 600 deprecation warnings. A whole bunch of these are due to <a href=\"https://en.wikipedia.org/wiki/QuickDraw\">QuickDraw</a> calls.</p>\n\n<p>For those who don’t know: QuickDraw was how, in the old days, you drew things to the Mac’s screen. It was amazing for its time and pretty easy to work with. Functions included things like <code>MoveTo</code>, <code>LineTo</code>, <code>DrawLine</code>, <code>FrameOval</code>, and so on. All pretty straightforward.</p>\n\n<p>These days we have Core Graphics instead, and we have higher-level things like <code>NSBezierPath</code>. QuickDraw was simpler — though yes, sure, that was partly because it did less.</p>\n\n<p style=\"text-align:center\">* * *</p>\n\n\n<p>I was looking at all these deprecation warnings for QuickDraw functions and wondering how I’m ever going to get through them.</p>\n\n<p>I could, after all, convert all or most of them to the equivalent Core Graphics thing. But sheesh, what a bunch of work.</p>\n\n<p>And, in the end, it would still be a Carbon app, but with modern drawing.</p>\n\n<p style=\"text-align:center\">* * *</p>\n\n\n<p>So I thought about it from another angle. The goal is to get to the point where it’s a 64-bit Cocoa app. All these QuickDraw calls are in the service of UI — so why not just start over with a Cocoa UI?</p>\n\n<p>The app has some outlines (database browser, script editor, etc.), a basic text editor, and a handful of small dialogs. <em>And all of that is super-easy in Cocoa.</em></p>\n\n<p>Use an <code>NSOutlineView</code>, <code>NSTextView</code>, and some xibs for the dialogs, and we’re done. 
(Well, after <em>some</em> work, but not nearly the same amount of work as actually writing an outliner from scratch.)</p>\n\n<p>In other words, instead of going from the bottom up — porting the existing source code — I decided to start from the top down.</p>\n\n<p>I started a new workspace and started a new Frontier project: a Cocoa app with Swift as the default language.</p>\n\n<p>Then I looked at the existing source and thought about how to organize things. I came up with this:</p>\n\n<ul>\n<li>Frontier — App UI</li>\n<li>UserTalk.framework — the language</li>\n<li>FrontierVerbs.framework - the standard library</li>\n<li>FrontierDB.framework — the object database</li>\n<li>FrontierCore.framework — common utility functions and extensions</li>\n</ul>\n\n\n<p>I like using frameworks, because it helps enforce separation, and it helps in doing unit testing. And frameworks are so easy with Swift these days.</p>\n\n<p>Hardly any of this is filled-in yet. I’ve got the barest start on FrontierVerbs. <a href=\"https://twitter.com/tedchoward\">Ted Howard</a>, my partner in all this, is taking UserTalk.framework and FrontierDB.framework.</p>\n\n<p>In the end, it’s possible that no code from the original code base survives. Which is totally fine. But it also means that this is no quick project.</p>\n\n<p>At this point I should probably put it up on GitHub, since it’s easier to write about it if I can link to the code. I’ll do that soon, possibly on the weekend.</p>",
-      "date_published": "2017-04-14T13:14:20-07:00"
-    },
-    {
-      "id": "http://inessential.com/2017/04/13/frontier_diary_3_built-in_verbs_config",
-      "url": "http://inessential.com/2017/04/13/frontier_diary_3_built-in_verbs_config",
-      "title": "Frontier Diary #3: Built-in Verbs Configuration",
-      "content_html": "<p>Frontier’s standard library is known as its built-in verbs. There are a number of different tables: <code>file</code>, <code>clock</code>, <code>xml</code>, and so on. Each contains a number of verbs: <code>file.readWholeFile</code>, <code>clock.now</code>, and so on.</p>\n\n<p>Most of these verbs are implemented in C, in the kernel, rather than as scripts. At the moment, to add one of these kernel verbs, you have to jump through a few hoops: edit a resource, add an integer ID, add to a switch statement, etc. It’s a pain and is error-prone.</p>\n\n<p>So I want to re-do this in Swift, because I’m all about Swift. And I want adding verbs to be fool-proof: I don’t want to remember how to configure this every single time I add a verb. Adding a verb needs to be <em>easy</em>.</p>\n\n<p>My thinking:</p>\n\n<ul>\n<li>Give each table its own class: ClockVerbs, FileVerbs, etc.</li>\n<li>Have each class report the names of the verbs it supports. These need to be strings, because we get a string at runtime.</li>\n<li>Run a verb simply by looking up the selector, performing it, and returning the result.</li>\n</ul>\n\n\n<p>To make things easy and obvious, I think it should work like this: the selector for a given verb is its name plus a parameter. Then there’s not even a lookup step.</p>\n\n<p>Each verb will take a VerbParameters object and return a VerbResult object.</p>\n\n<pre><code>dynamic func readWholeFile(_ params: VerbParameters) -&gt; VerbResult\n</code></pre>\n\n<p>The flow goes like this:</p>\n\n<ol>\n<li>We have the string <code>file.readWholeFile</code>.</li>\n<li>We see the <code>file</code> suffix and so we know we need a <code>FileVerbs</code> object.</li>\n<li>We check <code>fileVerbs.supportedVerbs</code> (an array) to see if <code>readWholeFile</code> is in the list. 
It is.</li>\n<li>We construct a selector using the <code>readWholeFile</code> part of the string and we add a <code>:</code> character: <code>NSSelectorFromString(verbName + \":\")</code></li>\n</ol>\n\n\n<p>This is great! We’re almost home free. Then we run the verb:</p>\n\n<pre><code>if let result = perform(selector, with: params) as? VerbResult {\n    return result\n}\n</code></pre>\n\n<p>That doesn’t work. We get:</p>\n\n<pre><code>Cast from 'Unmanaged&lt;AnyObject&gt;! to unrelated type 'VerbResult' always fails\n</code></pre>\n\n<p>Nuts.</p>\n\n<p style=\"text-align:center\">* * *</p>\n\n\n<p>It was <em>so</em> close.</p>\n\n<p>In Objective-C this would have worked. And obviously, apparently, I still think in Objective-C.</p>\n\n<p>I investigated some other options. At one point enums were abused, because there’s <em>always</em>, in Swift, an enum-abuse step. But everything I tried was more code and was more error-prone, and my goal here is to improve the situation.</p>\n\n<p>I think, in the end, I’m going to do something that looks kind of ugly: a switch statement where the cases are string literals.</p>\n\n<pre><code>switch(verbName) {\ncase \"readWholeFile\":\n    return readWholeFile(params)\n…\n}\n</code></pre>\n\n<p>“Nooooo!” you cry. I hear ya.</p>\n\n<p>My experience as an object-oriented programmer tells me this: if I write a <code>switch</code> statement, I blew it.</p>\n\n<p>And my experience as a programmer tells me that string literals are a bad idea.</p>\n\n<p>But the above may actually be the easiest to configure and maintain. Each string literal appears only in that one switch statement and nowhere else in the code. And the mapping between a verb name and its function couldn’t be more clear — it’s right there.</p>\n\n<p>(Yes, instead of using a string literal, I could create a String enum and switch on that. But that’s actually more code and more room for error. 
I’m going to have to type those string literals <em>somewhere</em>, so why not right where they’re used?)</p>\n\n<p>It does mean that <code>readWholeFile</code> appears three times in the code (the string literal, the call, and the function itself), and in an Objective-C version it would appear only twice (in a <code>supportedVerbs</code> array and the method itself).</p>\n\n<p>But. Well.</p>\n\n<p>I’m torn between shuddering in abject and complete horror at this solution and thinking, “Hey, that’s pretty straightforward. Anybody could read it. Anybody could edit it.” Which was the plan all along.</p>\n\n<p>And I get to stick with Swift, so there’s that.</p>\n\n<p>But, sure as shootin’, some day someone’s going to come across this code and say, “Brent, dude, are ya <em>new</em>?” And I’ll send them the link to this page.</p>\n\n<p style=\"text-align:center\">* * *</p>\n\n\n<p><i>Update the next day:</i> well, the <code>performSelector</code> thing <em>would</em> work, if only I’d known about Swift Unmanaged objects.</p>\n\n<p><a href=\"https://twitter.com/jckarter\">Joe Groff</a> told me how this works.</p>\n\n<p>Here’s the gist: the <code>Unmanaged&lt;AnyObject&gt;</code> just needs to be unwrapped by calling <code>takeRetainedValue</code> or <code>takeUnretainedValue</code>. Once unwrapped, it can be cast to <code>VerbResult</code>.</p>\n\n<p>All this means that I can use my original design, which is great news.</p>\n\n<p style=\"text-align:center\">* * *</p>\n\n\n<p><i>Update April 25, 2017:</i> I ended up using enums after all. See <a href=\"https://github.com/brentsimmons/Frontier/blob/master/FrontierVerbs/FrontierVerbs/VerbTables/MathVerbs.swift\">MathVerbs.swift</a> for an example.</p>",
-      "date_published": "2017-04-13T22:25:41-07:00"
-    },
-    {
-      "id": "http://inessential.com/2017/04/11/frontier_diary_2_two_good_ideas_that_a",
-      "url": "http://inessential.com/2017/04/11/frontier_diary_2_two_good_ideas_that_a",
-      "title": "Frontier Diary #2: Two Good Ideas that Aren’t Good Anymore",
-      "content_html": "<p>Strings in <a href=\"http://inessential.com/2017/04/03/frontier_diary_1_vm_life\">Frontier</a> are usually either Pascal strings or Handles.</p>\n\n<p>You probably don’t know what I’m talking about. I’ll explain.</p>\n\n<h4>Pascal Strings</h4>\n\n<p>Frontier is a Mac Toolbox app that’s been Carbonized just enough to run on OS X. You may recall that the Mac Toolbox was written so long ago that the <a href=\"https://developer.apple.com/legacy/library/documentation/mac/pdf/MacintoshToolboxEssentials.pdf\">original API</a> was in Pascal. That Pascal heritage lived on in many ways, even after everyone switched to C — and one of those ways was Pascal strings.</p>\n\n<p>A Pascal string is n bytes long, and the first byte specifies the length of the string, which leaves the rest of the bytes for the actual string. <code>Str255</code> was probably most common, and certainly is most common in Frontier, but there are also smaller sizes: <code>Str63</code> and <code>Str31</code>, for instance.</p>\n\n<p>Unlike C strings, they’re not zero-terminated, since there’s no need to calculate the length: you always know it from that first byte.</p>\n\n<p>You create a literal Pascal string like this…</p>\n\n<pre><code>Str255 s = \"\\pThis is a string\";\n</code></pre>\n\n<p>…and the compiler turns the <code>\\p</code> into the correct length (16 in this case).</p>\n\n<p>Now, I bet you’re saying to yourself, “Self, those Pascal strings are too small to be useful.”</p>\n\n<p>But consider this: every menu item name can fit into a Pascal string. You can fit a window title or a file name into a Pascal string (in fact, memory suggests that file names were even shorter, were <code>Str31</code> Pascal strings). Any label or message on any bit of UI is probably short enough to fit into a Pascal string. (Especially if you assume English.)</p>\n\n<p>So for GUI apps these were terrifically useful, and the 255-byte limit was no problem. 
(You can fit a tweet in a Pascal string, after all, with a bunch of room left over. [Well, depending on the size of the characters.])</p>\n\n<p>Frontier still uses them internally a ton. (For some reason, in the Frontier code, <code>Str255</code> strings are called <code>bigstring</code>, which sounds ironic, since they’re so small, but I think it was to differentiate them from even smaller Pascal strings such as <code>Str31</code>.)</p>\n\n<p>You might ask what the text encoding was for these strings.</p>\n\n<p>“Text whatzit?” I’d reply. “Oh, I see. Just regular.” (<a href=\"https://en.wikipedia.org/wiki/Mac_OS_Roman\">MacRoman</a>.)</p>\n\n<p>It was a good idea, but its time has come and gone. We have better strings these days.</p>\n\n<h4>Handles</h4>\n\n<p>Frontier includes a scripting language and a database, which means it certainly has a need for strings much larger than 255 bytes.</p>\n\n<p>It also needs heap storage for other things — binary data, structs, etc. — that could be much larger than 255 bytes.</p>\n\n<p>Enter the Handle. A Handle points to a pointer <em>that might move</em>: the memory you access via a Handle is <em>relocatable</em>.</p>\n\n<p>Which sounds awful, I know, but it was a smart optimization in the days when your Mac’s memory would be a single-digit number of megabytes, or even less than that.</p>\n\n<p>Here’s the problem: your application’s heap space can become fragmented. It could have a whole bunch of gaps in it after a while. So, to regain that memory, the system could compact the heap — it would remove those gaps, which means relocating the memory pointed to via a Handle.</p>\n\n<p>This is better than running out of memory, obviously. But it means that you have to be careful when dereferencing a Handle: you have to actually lock it first — <code>HLock(h)</code> — so that it can’t be moved while you’re using it. 
(And then you unlock it — <code>HUnlock(h)</code> — when finished.)</p>\n\n<p>Handles are also resizable — <code>SetHandleSize(h, size)</code> — and resizing a Handle can result in it needing to move, if there’s not enough space where it is. Or other Handles might move. You don’t ever know, and don’t care, and you think this is elegant because the system handles it all for you.</p>\n\n<p>All you have to deal with is an additional level of indirection (<code>**h</code> instead of <code>*p</code>), locking and unlocking it when needed, and disposing of it — <code>DisposeHandle(h)</code> — when finished. (No, there’s no reference counting, slacker.)</p>\n\n<p>Nowadays, on OS X, Handles don’t ever move and there’s no heap compaction. So there’s no reason for them whatsoever. And they are, as expected, deprecated.</p>\n\n<p>Nevertheless, Frontier, a Mac Toolbox app written in C, uses Handles everywhere.</p>\n\n<p>(I remember being shocked, when I first started learning Cocoa 15 years ago, that there were no Handles. It seemed <em>incredibly</em> daring that objects were just pointers. It made me nervous!)</p>\n\n<h4>The Size of the Job</h4>\n\n<p>Almost all the Mac APIs that Frontier uses are deprecated. That’s one thing.</p>\n\n<p>But it’s worse than just that: the ways Frontier handles strings and <em>pretty much every single thing it stores on the heap</em> are also deprecated.</p>\n\n<p>So: what to do?</p>\n\n<p>The end goal is a Cocoa app, which means I’ll be able to use Foundation, CoreFoundation, and Swift data types: NSString and Swift String, for instance. There are a number of different structs in the code, and those will be turned into Objective-C and Swift objects and Swift structs.</p>\n\n<p>The tricky part, though, is getting from here to there. I think the first step is to start with Objective-C and Foundation types and use them where possible. 
I can do that without actually turning it into a Cocoa app (the app will still have its own WaitNextEvent event loop and Carbon windows) — which means I’ll have to bracket all Objective-C code in autorelease pools, and I’ll have to use manual retains and releases. I’m not sure how far that will get me, but it will get me closer.</p>\n\n<p>PS Here are a couple articles by Gwynne Raskind on the Mac Toolbox you might enjoy: <a href=\"https://mikeash.com/pyblog/friday-qa-2012-01-13-the-mac-toolbox.html\">Friday Q&amp;A 2012-01-13: The Mac Toolbox</a> and <a href=\"https://mikeash.com/pyblog/the-mac-toolbox-followup.html\">The Mac Toolbox: Followup</a>.</p>",
-      "date_published": "2017-04-11T13:01:55-07:00"
-    },
-    {
-      "id": "http://inessential.com/2017/04/05/two_little-known_and_completely_unrelate",
-      "url": "http://inessential.com/2017/04/05/two_little-known_and_completely_unrelate",
-      "title": "Two Little-Known and Completely Unrelated Facts",
-      "content_html": "<p>One. <a href=\"https://www.omnigroup.com/omnioutliner\">OmniOutliner</a>’s outline view is implemented as CALayers rather than as a view with subviews. (I don’t think I’m giving away a trade secret here.)</p>\n\n<p>Two. If you eat fenugreek, your <a href=\"https://www.theatlantic.com/health/archive/2010/06/the-mystery-of-the-maple-syrup-smell/57980/\">armpits will smell like maple syrup</a>.</p>",
-      "date_published": "2017-04-05T16:57:59-07:00"
-    },
-    {
-      "id": "http://inessential.com/2017/04/05/ios_javascript_and_object_hierarchies",
-      "url": "http://inessential.com/2017/04/05/ios_javascript_and_object_hierarchies",
-      "title": "iOS, JavaScript, and Object Hierarchies",
-      "content_html": "<p><a href=\"http://iam.fahrni.me/2017/03/25/scripting-ios/\">Rob Fahrni</a>:</p>\n\n<blockquote><p>Given x-callback-url and App URL schemes in general it would be extremely cool to use those to create object hierarchies using JavaScript. Why JavaScript? Well, it’s native to iOS and applications can use the runtime.</p></blockquote>",
-      "date_published": "2017-04-05T14:53:01-07:00"
-    },
-    {
-      "id": "http://inessential.com/2017/04/05/cocoaconf_near_wwdc",
-      "url": "http://inessential.com/2017/04/05/cocoaconf_near_wwdc",
-      "title": "CocoaConf Near WWDC",
-      "content_html": "<p>There are a bunch of things happening near WWDC this year. Me, I’ll be at <a href=\"http://cocoaconf.com/blog/nextdoor\">CocoaConf Next Door</a>. I’m not preparing a talk, but I’ll probably be on a panel. And hanging out.</p>\n\n<p>Check out the <a href=\"http://cocoaconf.com/nextdoor/speakers\">speakers list</a>, which includes Omni’s own <a href=\"http://cocoaconf.com/nextdoor/speakers/162\">Liz Marley</a>. And a bunch of other people you totally want to see — Manton Reece, Jean MacDonald, Laura Savino, and plenty more.</p>\n\n<p>Also… <a href=\"http://altconf.com/\">AltConf</a> and <a href=\"https://layers.is/\">Layers</a> will be near WWDC. If you could be in three places at once, you would. Well, four, including WWDC itself, I suppose. :)</p>",
-      "date_published": "2017-04-05T14:35:05-07:00"
-    },
-    {
-      "id": "http://inessential.com/2017/04/05/omnioutliner_5_0_for_mac",
-      "url": "http://inessential.com/2017/04/05/omnioutliner_5_0_for_mac",
-      "title": "OmniOutliner 5.0 for Mac",
-      "content_html": "<p>I’ve been on the OmniOutliner team for over a year now. Though we don’t have positions like junior and senior developer, I enjoy calling myself the junior developer on the Outliner team, since I’m newest.</p>\n\n<p>I may be a new developer, but I’m not a new user — I’ve been using the app since the days when OmniOutliner 3 came installed on every Mac.</p>\n\n<p>Every time I start a talk, I outline it first. I organize the work I need to do in my side-project apps in OmniOutliner. And — don’t tell the OmniFocus guys, who are literally right here — sometimes I even use it for to-do management in general. I’d be lost without a great outliner.</p>\n\n<p>Anyway… <a href=\"https://www.omnigroup.com/blog/omnioutliner-5-is-now-available\">there’s a new version: OmniOutliner 5.0</a>. It’s my first dot-oh release at Omni, and I’m proud of it and proud of the team.</p>\n\n<p>As is common with our apps, we have two levels: a regular level and a Pro level. The regular level is called “Essentials” and is just $9.99. There’s a demo so you can try it out first.</p>\n\n<p>It syncs with iOS and with other Macs, by the way. Sync is free. And of course it comes with extensive documentation, and Omni’s awesome support humans are standing by.</p>\n\n<p><a href=\"https://www.omnigroup.com/omnioutliner/\">Get it while it’s hot</a>!</p>",
-      "date_published": "2017-04-05T10:44:45-07:00"
-    },
-    {
-      "id": "http://inessential.com/2017/04/03/frontier_diary_1_vm_life",
-      "url": "http://inessential.com/2017/04/03/frontier_diary_1_vm_life",
-      "title": "Frontier Diary #1: VM Life",
-      "content_html": "<p>It’s been years since I could build the <a href=\"http://frontierkernel.org\">Frontier kernel</a> — but I finally got it building.</p>\n\n<p>It’s really a ’90s Mac app that’s been Carbonized just enough to run on MacOS, but it’s by no means modern: it uses QuickDraw and early Carbon APIs. It’s written entirely in C.</p>\n\n<p>I got it building by installing MacOS 10.6.8 Server in VMWare. Installed Xcode 3.2.6. And now, finally, I can build and run it.</p>\n\n<h4>What is Frontier?</h4>\n\n<p>Frontier — as some of you know — was a UserLand Software product in the ’90s and 2000s. I worked there for about six years.</p>\n\n<p>The app is a development environment and runtime: a persistent, hierarchical database with a scripting language and a GUI for browsing and editing the database and for writing, debugging, and running scripts.</p>\n\n<p>The <a href=\"http://scripting.com/frontier/snippets/nerdsguide.html\">Nerd’s Guide to Frontier</a> gives some idea of what it’s like, though it was written before many of the later advances.</p>\n\n<p>Maybe you’ve never heard of it. But here’s the thing: it was in Frontier that the following were either invented or popularized and fleshed-out: scripted and templated websites, weblogs, hosted weblogs, web services over http, RSS, RSS readers, and OPML. (And things I’m forgetting.)</p>\n\n<p>Those innovations were due to the person — <a href=\"http://scripting.com/\">Dave Winer</a> — and to the times, the relatively early web days. But they were also in part due to the tool: Frontier was a fantastic tool for implementing and iterating quickly.</p>\n\n<h4>The Goal</h4>\n\n<p>The high-level goal is to make that tool available again, because I think we need it.</p>\n\n<p>The plan is to turn it into a modern Mac app, a 64-bit Cocoa app, and then add new features that make sense these days. (There are so many!) 
But that first step is a big one.</p>\n\n<p>The first part of the first step is simple, and it’s where I am now: mass deletions of code. Every reference to THINK_C and MPWC has to go. All references to the 68K and PPC versions must go. There was a Windows port, and all that code is getting tossed. And then I’ll see the scale of what needs to be done.</p>\n\n<p>(Note: my repo is a fork, and it’s not even on the web yet. The code I’m deleting is never really gone.)</p>\n\n<p>I’m doing a blog diary on it because it helps keep me focused. Otherwise I’m jumping around on my side projects. But if I have to write about it, then I’ll stay on target.</p>",
-      "date_published": "2017-04-03T13:44:34-07:00"
-    },
-    {
-      "id": "http://inessential.com/2017/03/31/the_goal",
-      "url": "http://inessential.com/2017/03/31/the_goal",
-      "title": "The Goal",
-      "content_html": "<p>The goal isn’t specifically impeachment and conviction. It’s for Trump to leave office.</p>\n\n<p>The stretch goal is that he dies broke and in prison.</p>\n\n<p>But we could settle for him going down in history as our worst President, as the worst person ever to become President, with the name Trump held in less esteem than that of Benedict Arnold, with Trumpism — that pseudo-populist white nationalism for the benefit of the super-rich — thoroughly loathed and seen for the brutish scam that it is.</p>\n\n<p>I think there comes a point before an actual trial in the Senate where Republican leaders — in Congress, in the Cabinet, wherever — realize that Trump can no longer govern, and they tell him so and urge him to resign.</p>\n\n<p>And I think he actually does resign at that point. He’s been through bankruptcy, and he’s shown that when there’s no path to winning, he’ll take the easiest route out of the situation, the route that leaves him the most status. He doesn’t have the stick-to-it-iveness to go to trial in the Senate: he’d quit.</p>\n\n<p>I don’t know what it will take to bring Republican leaders to this point. Their ongoing cowardice is the real scandal — when faced with a threat to our democracy, they play along because they’re hoping for some goodies.</p>\n\n<p>I don’t think they get to this point unless the public gets to this point, and so I look to the approval polls. If it gets below 30%, it’s probably there because of further revelations in the Russia affair, and it’s probably at the point where even cowards feel safe in doing the right thing — even if only to save their own necks, which will need saving.</p>\n\n<p>But right now Speaker Ryan won’t even replace Devin Nunes as chair of the house intelligence committee. So there’s still a long way to go.</p>",
-      "date_published": "2017-03-31T13:47:44-07:00"
-    },
-    {
-      "id": "http://inessential.com/2017/03/25/my_cocoaconf_yosemite_2017_talk",
-      "url": "http://inessential.com/2017/03/25/my_cocoaconf_yosemite_2017_talk",
-      "title": "My CocoaConf Yosemite 2017 Talk",
-      "content_html": "<p><a href=\"http://cocoaconf.com/yosemite\">Yosemite 2017</a> was so great. It always is.</p>\n\n<p>Below is the rough draft of my first-night talk. A few notes…</p>\n\n<p>The actual spoken version is probably not even close to the text, which was written before any rehearsal, and of course it’s never my intent to memorize it exactly.</p>\n\n<p>The bit with Laura Savino was a quick three-chord rock medley. We both played acoustic guitar and sang. It went like this:</p>\n\n<p>B: Louie Louie, oh baby, we gotta go<br />\nL: Yeah yeah yeah yeah yeah<br />\nB: Louie Louie, oh baby, we gotta go<br />\nL: Yeah yeah yeah yeah yeah<br />\nB: I live on an apartment on the 99th floor of my block<br />\nL: Hang on Sloopy, Sloopy hang on<br />\nB: I look out my window imagining the world has stopped</br />\nL: Hang on Sloopy, Sloopy hang on<br />\n[Slight change of chords]<br />\nB &amp; L: Teenage wasteland, oh yeah, only teenage wasteland [repeated]</p>\n\n<p>Here’s my <a href=\"https://www.youtube.com/watch?v=RzBz7p0A3-Y\">favorite video for Brimful of Asha</a>.</p>\n\n<p>During the Squirrel Picture interlude (slide #3) I told the <a href=\"http://inessential.com/2001/06/07/2001_06_07\">Squirrel Story</a>, which wasn’t planned or recently rehearsed, but I’ve told it often enough that it didn’t really need rehearsal.</p>\n\n<p>I dedicated the performance of Hallelujah to <a href=\"https://twitter.com/dori\">Dori Smith</a>.</p>\n\n<p>The talk was meant to be about 20 minutes long. Afterward I went around the room with a microphone and each person introduced themselves. (The talk’s job is to be a first-night ice-breaker talk.)</p>\n\n<p>I spent about 10 hours on rehearsal for those 20 minutes.</p>\n\n<p>Here’s the talk:</p>\n\n<h4>Slide #1: Three Chord Rock</h4>\n\n<p>Hi. I’m Brent.</p>\n\n<p>Before I get started — seeing my friend Brad Ellis reminded me of the most rock-n-roll moment of my life. Where’s Brad? Hi Brad. 
Anyway — I was at a party at my friend Chris’s house, and he let me borrow his guitar and do a sing-along. I think we did White Rabbit and Me and Bobby McGee and Hotel California.</p>\n\n<p>Well, here’s the problem — I have a hard time hanging on to a guitar pick. Especially after a few beers. So at one point the pick goes flying, and I’m strumming with my fingers.</p>\n\n<p>But I had a hangnail, and it got a bit aggravated as I was strumming. At the end I noticed that there was my actual blood on the guitar. I felt bad about it, but Chris was gracious, of course, and I thought that right then: that’s rock and roll.</p>\n\n<p>You can use this as metaphor. Bleeding? Keep right on playing. Maybe you won’t even notice that you’re bleeding, at least not until you stop.</p>\n\n<p>Chris told me later that the guitar cleaned up fine, so all was well.</p>\n\n<p>Okay. On to the actual talk…</p>\n\n<p>I bet most of you have heard the phrase “three chord rock n roll.” Or have heard that “rock is so great because you only need three chords.”</p>\n\n<p>What you may not realize is that it’s even easier than that: it’s three specific chords. Always the same three chords.</p>\n\n<p>They might be in any key but they’re the first, fourth, and fifth. In the key of C, the first is C, the fourth is F, and the fifth is G. In the key of A it’s A, D, and E.</p>\n\n<p>And when a song <em>does</em> have more than those three chords, it has at least those three chords. They’re the foundation for almost all pop and rock.</p>\n\n<p>One part of music is building tension and then resolving it. I’ll demonstrate on guitar.</p>\n\n<p>[On guitar] Play the first .... and you’re fine. You’re home. Play the fourth .... and there’s a little tension. Not a ton, but some. But you want to go back to the first, to home.</p>\n\n<p>Then play the fifth ... and you have maximum tension. 
You definitely want to go back home to the first.</p>\n\n<p>So with those three chords you have everything you need to write a thousand songs.</p>\n\n<p>Now for a little demo, I’d like to invite Laura Savino up to help me out.</p>\n\n<p>[music]</p>\n\n<p>Thanks, Laura!</p>\n\n<p>SO LET ME MAKE TWO POINTS VERY CLEAR.</p>\n\n<ol>\n<li><p>ONE. If you’re writing apps or a website or doing a podcast or whatever — if you’re just starting out and only know the equivalent of three chords, don’t worry — you can create a masterpiece with just three chords.</p></li>\n<li><p>TWO. If you do know more than three chords, you might want to consider just using those three chords anyway. People <em>love</em> those three chords. They’re appealing. They’re accessible and intimate. They work.</p></li>\n</ol>\n\n\n<h4>Slide #2: “Brimful of Asha“ by Cornershop, Asha Bhosle, and You</h4>\n\n<p>One of my personal favorite three-chord-rock songs came out in the mid-90s. Brimful of Asha by Cornershop.</p>\n\n<p>Who here knows this song?</p>\n\n<p>Let me explain what it’s about:</p>\n\n<p>Asha Bhosle sang songs for Bollywood musicals. The actresses would lip-sync, but it was her singing. She did this for over a thousand movies. Over 12,000 songs.</p>\n\n<p>Some of those songs would be released as singles. Years ago a single would come out on vinyl, as a 45. A 45 is smaller than a regular album, and it has one song on each side. The number 45 means 45 revolutions-per-minute — you’d have to set your turntable to 45 instead of the usual 33 1/3. So: a 45 is a single.</p>\n\n<p>So here’s a little bit from the song:</p>\n\n<p>[There’s dancing, behind movie screens…]</p>\n\n<p>I <em>love</em> that image. That Asha is not just singing but <em>dancing</em> as she’s singing. We never see her dancing, but that joy and engagement shows up in her performance.</p>\n\n<p>And so this song is about hope. 
It’s about how a song can bring some consolation and hope when people need it.</p>\n\n<p>And her name Asha actually <em>means</em> hope. Brimful of Asha — brimful of hope.</p>\n\n<p>HERE’S MY POINT.</p>\n\n<p>We're in the same business. People form an emotional connection to whatever we’re making. The things we make can bring hope to other people. Knowing that, it’s our job to be as engaged and joyful as she is as we make our things. Maybe we’re not literally dancing, but it should be the metaphorical equivalent.</p>\n\n<h4>Slide #3: Squirrel Picture</h4>\n\n<p>Squirrel!</p>\n\n<p>When I was a kid we went to a Methodist church. I haven’t been to church hardly at all since I was a kid, but I remember one cool thing from church services: the minister would pause and ask people to shake hands with the people around them.</p>\n\n<p>So here are the rules. Tell people to have a good conference, and shake hands with at least one person from another table. Stand up!</p>\n\n<h4>Slide #4: “Hallelujah” by Leonard Cohen, with Singing by James Dempsey</h4>\n\n<p>A few weeks ago I found myself in a hotel bar with a bunch of other nerds. I also found a piano. If there’s a piano, I’m going to play it. So I talked a few people — James Dempsey, Jean McDonald, Curt Clifton, and Jim Correia, into singing some songs.</p>\n\n<p>I forget who suggested Hallelujah. Might have been James. I didn’t know it very well, but I did my best. James sang, and he was awesome.</p>\n\n<p>So when I was thinking about this talk, I was thinking of doing the most beautiful possible thing I could do. So I remembered James singing this song.</p>\n\n<p>I may not be religious, but I think it’s plain that there is awesome magnificence greater than anything any human could make. It’s right outside.</p>\n\n<p>I’m not sure bears feel humble at the sight of these mountains; I’m not sure birds are awed at the vistas they fly over.</p>\n\n<p>But we do. Humans do. 
And knowing that we can’t measure up, it doesn’t stop us. Intead, we’re <em>inspired</em>.</p>\n\n<p>So here’s what I love about Hallelujah. It’s about trying and failing, and loving and losing — and singing Hallelujah anyway. In Cohen’s words, it may be a broken Hallelujah, but it’s still on our lips.</p>\n\n<p>James Dempsey please report to the stage.</p>\n\n<p>Everybody is encouraged to sing along. Especially to the chorus.</p>\n\n<p>[Hallelujah]</p>\n\n<h4>Slide #5: Picture of my cat Papa</h4>\n\n<p>I’m going to go around the room and have everyone introduce themselves. RULE: if anyone can’t hear, yell out.</p>",
-      "date_published": "2017-03-25T11:55:21-07:00"
-    },
-    {
-      "id": "http://inessential.com/2017/03/07/thems_thats_got_shall_get",
-      "url": "http://inessential.com/2017/03/07/thems_thats_got_shall_get",
-      "title": "Them That’s Got Shall Get",
-      "content_html": "<p>I try — earnestly, with good faith — to understand the Republican ideologies.</p>\n\n<p>And I think I’ve figured out one of them: they want to make life harder for poor people so that they have more incentive to become rich, and they want to make life better for rich people to reward success, since it <em>should</em> be rewarded, and since doing so provides even more incentive for poor people to become rich.</p>\n\n<p>If you look at it just the right way, you can see it’s not entirely wrong. If the government made material life pretty sweet for everybody, then some people wouldn’t bother to work to earn a living. <em>I</em> wouldn’t bother — I’d just make software and give it away for free.</p>\n\n<p>If the government made life semi-sweet — well, anybody who wants the full sweet would want a job. But some people would be fine with semi-sweet, and they wouldn’t work.</p>\n\n<p>I think that’s where Republicans stand: they think the government has made life semi-sweet, enough so that a bunch of people just <em>take</em> and don’t work. Republicans think: we need to give them an incentive to work.</p>\n\n<p>This explains the health care bill: it takes from the poor, who need incentives to work, and gives to the wealthy, who need rewards for their success. (So the Republicans think.)</p>\n\n<p style=\"text-align:center\">* * *</p>\n\n\n<p>It’s as if the Republicans have no realistic conception of what it’s like to be poor. The choice isn’t <a href=\"http://www.politico.com/story/2017/03/jason-chaffetz-new-gop-health-care-plan-235762\">between health care and an iPhone</a>, as one Republican suggested — it’s between food and rent, or worse, and forget health care and iPhones entirely.</p>\n\n<p>I was “poor” in my very early 20s. I put that in quotes because I was never in danger of starving or becoming homeless — my parents would have helped me. (They did plenty, in fact.)</p>\n\n<p>But still, even this small experience gives me some insight. 
I remember buying generic macaroni and cheese because I literally didn’t have enough money for Kraft. And forget hot dogs. And forget vegetables.</p>\n\n<p>I don’t mean that I had some money lying around that I’d put aside; I mean that I had a few dollars to last a week, and if I bought Kraft, which was a few dimes more, I would run out of money before the week was over.</p>\n\n<p>(My bank had a $5 minimum balance for my account. I could withdraw as little as $5 — and in those days ATMs were free — but that would have meant having more than $10 in my account to get that $5. I got so angry because I had, as I recall, $6.91 but couldn’t get at it. I remember thinking that another $5 would change my life.)</p>\n\n<p>I’m not complaining about this, or saying that I had things particularly tough. Not at all.</p>\n\n<p>I’m saying that if you take that experience, and take away any possibility of help from family, and then stretch it out for years and decades — with the inevitable issues, health and otherwise, that happen to everybody — then you have a life where getting ahead is really, really difficult. I can’t imagine; I can only try.</p>\n\n<p>But it’s no semi-sweet life. Not even close.</p>",
-      "date_published": "2017-03-07T18:29:27-08:00"
-    },
-    {
-      "id": "http://inessential.com/2017/02/23/dont_be_scared_if_you_have_to_get_an_mr",
-      "url": "http://inessential.com/2017/02/23/dont_be_scared_if_you_have_to_get_an_mr",
-      "title": "Don’t Be Scared If You Have to Get an MRI",
-      "content_html": "<p>“Totally normal,” said my neurologist of the results of the MRI on my head. No worries.</p>\n\n<p>I was afraid to get an MRI in the first place.</p>\n\n<p>I got a crown last week, and that didn’t worry me — it’s my ninth. Breathe the gas and just chill for a while. No big deal. It’s almost sad when it’s over.</p>\n\n<p>But I was afraid to get the MRI, because I’m slightly claustrophobic, and all I knew was that they’d put me in a big tube and then walk away.</p>\n\n<h4>How It Went</h4>\n\n<p>I didn’t have any dietary restrictions in advance. They didn’t inject me with anything. I was told to wear comfortable clothes with no metal — so I wore sweatpants, a T-shirt, and a sweatshirt. I was able to leave my rings (gold, two small diamonds) on.</p>\n\n<p>Beforehand I did a three-sixty in front of a ferrous metal detector. Then I was led through the doors with the giant warnings about extremely powerful magnets.</p>\n\n<p>I put in earplugs that the technician gave me, and then put on headphones. He asked me what music I’d like, and I replied, “80s. Bowie.” I lied down on the thing. There was a firm but not painful thing to hold my head still and give it something to rest on. Under the lower half of my legs was a foam thing that kept them elevated a little. It was comfortable.</p>\n\n<p>He told me it would take about 20 minutes. He also gave me a bulb to hold onto and to squeeze as an alert, and he said they could pause the tests if needed.</p>\n\n<p>Then he slid me in. The tube was more narrow than I expected. And for the first couple seconds I did feel panic rising a little bit, and I thought about squeezing the bulb — but I didn’t. I oriented myself and took some deep breaths.</p>\n\n<p>I was staring up at the top of the tube (I was on my back), but there was this mirror contraption (two mirrors? hard to tell) that I was looking at, and so I was looking out through the end of the tube. 
What I was actually seeing was a nice, calm painting on the wall — a river and some trees — and I could see the length of my body and my feet, which were free of the tube. I told myself I could scramble out on my own if I had to.</p>\n\n<p>The music started with a Bowie song — “<a href=\"https://www.youtube.com/watch?v=v--IqqusnNQ\">Life on Mars</a>.” Later there were songs by Talking Heads and similar bands. It was good to have music because I could note the passage of time that way. (I guess I was listening to a Pandora station or something similar.)</p>\n\n<p>The machine was noisy, but I had plenty enough ear protection, and the different scans had different patterns. One scan near the end included a bit of vibration. The technician talked to me through the headphones a couple times to let me know how much time was remaining. I just kept my eyes on that painting the whole time.</p>\n\n<p>I had no trouble being still, except when I had to swallow. I just did. It was otherwise comfortable. And I could have gone another 20 minutes, easy.</p>\n\n<p style=\"text-align:center\">* * *</p>\n\n\n<p>Of course, I’m lucky. I have very good insurance through Omni, and it paid for this. And, even luckier, the results were totally normal.</p>\n\n<p>Hear that, world? The inside of my head is totally normal. I don’t mind feeling good about some good news for a change.</p>\n\n<p><i>Update 4:15 pm</i>: I’ve heard that not all MRIs are so nice. They might not have the mirrors and the music. In that case, well, I’m sorry. Just remember that they won’t forget you’re in there, and they’ll let you out at the end. Stay cool.</p>",
-      "date_published": "2017-02-23T13:37:39-08:00"
-    },
-    {
-      "id": "http://inessential.com/2017/02/22/omnioutliner_essentials",
-      "url": "http://inessential.com/2017/02/22/omnioutliner_essentials",
-      "title": "OmniOutliner Essentials",
-      "content_html": "<p>Omni <a href=\"https://www.omnigroup.com/blog/introducing-omnioutliner-essentials-an-outliner-for-everyone\">introduces OmniOutliner Essentials</a>:</p>\n\n<blockquote><p>We didn’t want to just reach out to our existing audience; we wanted to introduce the joys and benefits of outlining to a much larger audience. We decided that meant two things: we needed to make the app much simpler, and we needed to make it much more affordable.</p></blockquote>\n\n<p>It’s in public preview now. <a href=\"https://www.omnigroup.com/omnioutliner/preview/essentials\">You can check it out</a>.</p>\n\n<p>I’ve been the junior developer on the OmniOutliner team for a couple years, and it’s a joy to work on an app that I’ve loved for years as a user. We’re not finished yet with this release, but I’m very happy with how it’s turning out.</p>\n\n<p>PS I like that Ken mentions MORE in the blog post:</p>\n\n<blockquote><p>We shipped the first beta of OmniOutliner while Mac OS X was still in beta, and doing so introduced us to a passionate community of outliners who had been using great outlining tools like <a href=\"https://en.wikipedia.org/wiki/MORE_(application)\">MORE</a> for over a decade.</p></blockquote>\n\n<p>MORE was by Living Videotext, which was <a href=\"http://scripting.com\">Dave Winer</a>’s company. Later I went to work at Dave’s company UserLand Software, which also included an outliner in its app Frontier, which I worked on. So there is a sort-of family tree connection from OmniOutliner back to MORE.</p>",
-      "date_published": "2017-02-22T10:17:48-08:00"
-    }
-  ]
-}
----
-feed.format:     json
-feed.title:      inessential.com
-feed.url:        http://inessential.com/
-feed.feed_url:   http://inessential.com/feed.json
-feed.summary:    Brent Simmons’s weblog.
-feed.authors[0].name:   Brent Simmons
-feed.authors[0].url:    http://inessential.com/
-feed.authors[0].avatar: http://ranchero.com/downloads/brent_avatar.png
-feed.items[0].url:       http://inessential.com/2017/05/17/json_feed
-feed.items[0].id:        http://inessential.com/2017/05/17/json_feed
-feed.items[0].title:     JSON Feed
-feed.items[0].published:       DateTime.new( 2017, 5, 17, 13, 22, 14, '-7').utc
-feed.items[0].published_local: DateTime.new( 2017, 5, 17, 13, 22, 14, '-7')
-feed.items[1].url:       http://inessential.com/2017/05/01/frontier_diary_8_when_worlds_collide
-feed.items[1].id:        http://inessential.com/2017/05/01/frontier_diary_8_when_worlds_collide
-feed.items[1].title:     Frontier Diary #8: When Worlds Collide
-feed.items[1].published:       DateTime.new( 2017, 5, 1, 13, 34, 23, '-7').utc
-feed.items[1].published_local: DateTime.new( 2017, 5, 1, 13, 34, 23, '-7')

data/test/feeds/intertwingly.atom DELETED

@@ -1,1197 +0,0 @@
-<?xml version="1.0" encoding="utf-8"?>
-<feed xmlns="http://www.w3.org/2005/Atom"
-  xmlns:thr="http://purl.org/syndication/thread/1.0">
-  <link rel="self" href="http://intertwingly.net/blog/index.atom"/>
-  <link rel="hub" href="http://pubsubhubbub.appspot.com/"/>
-  <id>http://intertwingly.net/blog/index.atom</id>
-  <icon>../favicon.ico</icon>
-  <title>Sam Ruby</title>
-  <subtitle>It’s just data</subtitle>
-  <author>
-    <name>Sam Ruby</name>
-    <email>rubys@intertwingly.net</email>
-    <uri>/blog/</uri>
-  </author>
-  <updated>2017-05-26T03:36:44-07:00</updated>
-  <link href="/blog/"/>
-  <link rel="license" href="http://creativecommons.org/licenses/BSD/"/>
-  <entry>
-    <id>tag:intertwingly.net,2004:3356</id>
-    <link href="/blog/2017/04/07/Badges-We-dont-need-no-stinkin-badges"/>
-    <link rel="replies" href="3356.atom" thr:count="3" thr:updated="2017-05-25T05:04:09-07:00"/>
-    <title>Badges? We don't need no stinkin' badges!</title>
-    <summary type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml">I found myself included in an IBM Resource Action ("RA").  I’m fine, nothing has changed.  I’m already working with a non-profit, namely the <a href="https://www.apache.org/">Apache Software Foundation</a>, and find my work there to be very rewarding.</div></summary>
-    <content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><svg style="float:right" xmlns="http://www.w3.org/2000/svg" width="158" height="61" viewBox="0 0 158 61">
-  <path d="M0,0v5h31v-5M35,0v5h45c0,0-4-4-9-5M88,0v5h27l-2-5M133,0l-2,5h27v-5M0,8v5h31v-5M35,8v5h49c0,0,0-3-2-5M88,8v5h30l-2-5M130,8l-2,5h30v-5M9,16v5h13v-5M44,16v5h13v-5M70,16v5h13s1,-2,1,-5M96,16v5h25l-2-5M127,16l-2,5h24v-5M9,24v5h13v-5M44,24v5h34s3-3,4-5M96,24v5h14v-3l1,3h24l1-3v3h13v-5h-25l-1,3l-1-3M9,32v5h13v-5M44,32v5h39s-2-4-4-5M96,32v5h14v-5M112,32l2,5h18l2-5M136,32v5h13v-5M9,40v5h13v-5M44,40v5h13v-5M70,40v5h15s0-3-1-5M96,40v5h14v-5M115,40l2,5h12l2-5M136,40v5h13v-5M0,48v5h31v-5M35,48v5h47s1-0,2.5-5M88,48v5h22v-5M118,48l2,5h6l2-5M136,48v5h22v-5M0,56v5h31v-5M35,56v5h38s4-1,7-5M88,56v5h22v-5M121,56l2,5l2-5M136,56v5h22v-5" fill="#1f70c1"></path>
-</svg>
-<p>I’ve worked from home since the late 90s.  When IBM made me go in a few years back to replace my badge, I joked that the next time I would need it was when it was time for me to turned it in.</p>
-<p>Well, I was close.  I used it for the first time yesterday to go to a seminar describing what options are available to those like me who are part of an IBM Resource Action ("RA").  Which is IBM’s way of saying that my job no longer exists, and I have until June 29th to find another job within IBM or I will be offered a modest severance package, and can pick from an array of options varying from helping me find a new job, connecting me with a non-profit organization, and retraining.</p>
-<p>TL;DR: I’m fine, nothing has changed.  I’m already working with a non-profit, namely the <a href="https://www.apache.org/">Apache Software Foundation</a>, and find my work there to be very rewarding.</p>
-<p>And, by the way, the key advice from the seminar is to network. That happens to be something that I’m fairly good at.</p>
-<p>In fact, now that I’ve told my family, my book editor, many people within IBM, and several hundred of my closest friends at the ASF — many of which want to spread the word and help me out — the inescapable conclusion is that I can’t tell all of these people without the word getting out.  So I might as well do it myself, in order to ensure that everybody gets the correct message.</p>
-<p>For starters, the most likely outcome is that I’m going to simply retire.  My wife and I have planned for this for several years. This may be the nudge that was needed to make it happen.  And like many retirees, I will donate my time to work for a non-profit. I’m just ahead of the curve as I am already doing that.</p>
-<p>The second most likely outcome is that I will find an equivalent job within IBM.  By equivalent, I mean an opportunity that lets me work full time on open source and open standards in general; and in particular lets me devote the time I feel necessary to the role of ASF President.  I would need to feel comfortable about that before accepting, as retiring later would mean that I would have lost the opportunity for the severance package.  The good news for those who are predisposed to root for this option is that that job has already been identified, and the management team there is working through what it takes to make it happen.  There is no guarantee that they will get HR approval, however, which is why this is listed as the second most likely outcome rather than the first.</p>
-<p>And finally, the third most likely outcome is that I take a job outside of IBM.  I have a number of people saying that they will shop my résumé around.  Based on these requests, I have now produced <a href="https://intertwingly.net/resume.html">one</a>.  I am <b>not</b> looking for a headhunter, but if somebody feels that they have a perfect opportunity for me, I am willing to listen.</p>
-<p>Again, whatever happens, I’m fine and nothing has changed.</p></div></content>
-    <updated>2017-04-07T05:07:22-07:00</updated>
-  </entry>
-  <entry>
-    <id>tag:intertwingly.net,2004:3355</id>
-    <link href="/blog/2016/07/11/Service-Workers-First-Impressions"/>
-    <link rel="replies" href="3355.atom" thr:count="8" thr:updated="2017-05-24T09:56:16-07:00"/>
-    <title>Service Workers - First Impressions</title>
-    <summary type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml">Cache <code>put</code> and <code>match</code> worked right
-the first time; cache <code>keys</code> not so much.  Authentication is a mystery.  Outline of future plans, and a call for help.</div></summary>
-    <content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><svg style="float:right" xmlns="http://www.w3.org/2000/svg" width="100" height="100" viewBox="0 0 100 100">
-<metadata>
-Created by potrace 1.13, written by Peter Selinger 2001-2015
-</metadata>
-<g transform="translate(-5,100) scale(0.035,-0.035)" fill="#000" stroke="none">
-<path d="M1041 2503 c-32 -92 -63 -139 -111 -163 -61 -31 -120 -26 -213 17
-l-78 37 -79 -79 c-44 -44 -80 -85 -80 -91 0 -7 16 -45 35 -85 40 -84 44 -133
-16 -195 -25 -54 -57 -77 -152 -111 l-79 -28 0 -120 0 -120 79 -28 c95 -34 127
--57 152 -111 28 -62 24 -111 -16 -195 -19 -40 -35 -78 -35 -85 0 -6 36 -47 80
--91 l79 -79 78 37 c93 43 152 48 213 17 48 -24 79 -71 111 -163 l23 -68 119 3
-119 3 28 78 c59 166 152 205 313 130 l78 -37 85 86 86 85 -35 69 c-44 87 -51
-122 -37 174 18 68 68 114 160 146 l80 27 0 122 0 122 -80 27 c-92 32 -142 78
--160 146 -14 52 -7 87 37 174 l35 69 -86 85 -85 86 -78 -37 c-161 -75 -254
--36 -313 130 l-28 78 -119 3 -119 3 -23 -68z m286 -533 c63 -31 112 -80 149
--150 27 -50 27 -220 0 -270 -38 -71 -86 -119 -153 -152 -132 -65 -274 -38
--375 70 -113 121 -116 309 -5 429 106 115 245 141 384 73z"></path>
-<path d="M2306 1346 c-13 -30 -33 -59 -44 -65 -31 -17 -78 -13 -121 8 l-38 20
--43 -44 -43 -44 16 -43 c35 -91 12 -144 -72 -169 l-42 -12 3 -65 3 -65 40 -14
-c60 -20 88 -57 82 -107 -2 -23 -10 -56 -18 -73 -13 -32 -12 -34 26 -73 21 -22
-44 -40 51 -40 6 0 31 9 55 19 74 32 124 8 152 -74 l14 -40 61 -3 62 -3 25 56
-c19 44 32 59 59 70 32 14 40 13 92 -6 l56 -21 43 42 44 43 -20 38 c-40 79 -18
-138 62 169 l44 18 3 64 3 65 -42 12 c-85 26 -111 89 -70 170 l20 38 -47 46
--47 46 -35 -19 c-79 -43 -145 -16 -171 69 l-12 41 -64 0 -63 0 -24 -54z m161
--262 c86 -41 119 -153 69 -238 -60 -103 -193 -114 -273 -23 -120 137 38 339
-204 261z"></path>
-<path d="M1586 820 c-7 -40 -31 -48 -60 -21 -27 25 -34 26 -64 3 -19 -15 -21
--22 -13 -43 14 -38 0 -53 -43 -46 -33 5 -38 3 -52 -25 -15 -28 -14 -31 6 -54
-28 -29 20 -52 -20 -61 -28 -5 -31 -9 -28 -41 3 -31 7 -37 36 -44 38 -10 42
--32 11 -64 -19 -21 -20 -24 -5 -52 15 -28 18 -30 48 -21 42 12 62 -9 47 -49
--10 -26 -8 -29 34 -53 15 -8 24 -6 41 10 31 29 45 26 62 -14 12 -29 19 -35 44
--35 25 0 32 6 44 35 17 40 31 43 62 14 21 -19 24 -20 52 -5 27 14 30 19 25 54
--6 46 3 55 46 42 29 -8 32 -6 47 22 15 29 14 31 -7 54 -27 29 -18 55 22 64 19
-5 25 13 27 42 3 32 1 36 -28 41 -41 9 -48 32 -19 63 22 24 22 26 7 52 -15 24
--22 27 -58 24 l-42 -4 4 42 c3 36 0 43 -24 58 -26 15 -28 15 -52 -7 -31 -29
--54 -22 -62 19 -6 27 -10 30 -44 30 -34 0 -38 -3 -44 -30z m91 -175 c12 -3 34
--20 48 -36 30 -36 35 -111 9 -147 -49 -71 -170 -67 -209 8 -36 70 -3 159 68
-179 32 9 41 8 84 -4z"></path>
-</g>
-</svg>
-<p>
-  Successes, progress, and stumbling blocks encountered while exploring
-  Service Workers.
-</p>
-<h3 id="preface">Preface</h3>
-<p>
-  The <a href="https://github.com/apache/whimsy/tree/master/www/board/agenda#readme">Apache
-  Whimsy Board Agenda tool</a> is designed to make ASF Board meetings run
-  more smoothly.  It does this by downloading all of the provided reports and
-  collating them with comments, prior comments, action items, minutes, links to
-  prior reports, links to committee information, and the like.  It provides a UI
-  to allow Directors and guests to enter comments.  It provides a UI to allow
-  the Secretary to take minutes.
-</p>
-<p>
-  The tool itself is built using
-  <a href="https://facebook.github.io/react/">React.JS</a>.  It starts by
-  downloading all of the reports.  Navigation between reports can be done via
-  mouse clicks or cursor keys and doesn't involve any server interaction.  As
-  new data is received, the window is updated.
-</p>
-<p>
-  Finally, I'm new to Service Workers so I may be doing things in a profoundly
-  dumb way.  Any pointers would be appreciated.  I am capable of RTFM and
-  following examples.
-</p>
-<h3 id="caching-json">First step - caching JSON</h3>
-<p>
-  Some of the data (e.g., the list of ASF JIRA projects) is fetched on demand.
-  Generally the page is first rendered using an empty list, and then updated
-  once the actual list is received.
-</p>
-<p>
-  This process could be improved by caching the results received and using that
-  data until fresh data arrives.  As the Cache API is built on promises, and
-  therefore asynchronous, this generally means rendering three times: once with
-  a empty list, then with the cache, and finally with live data.
-</p>
-<pre><span style="background-color:hsla(300,100%,50%,0.06)"><span style="color:#404">/</span><span style="color:#404">/</span></span> retrieve an cached object.  Note: block may be dispatched twice,
-<span style="background-color:hsla(300,100%,50%,0.06)"><span style="color:#404">/</span><span style="color:#404">/</span></span> once with slightly stale data <span style="color:#080;font-weight:bold">and</span> once with current data
-<span style="background-color:hsla(300,100%,50%,0.06)"><span style="color:#404">/</span><span style="color:#404">/</span></span>
-<span style="background-color:hsla(300,100%,50%,0.06)"><span style="color:#404">/</span><span style="color:#404">/</span></span> <span style="color:#606">Note</span>: caches only work currently on <span style="color:#036;font-weight:bold">Firefox</span> <span style="color:#080;font-weight:bold">and</span> <span style="color:#036;font-weight:bold">Chrome</span>.  All
-<span style="background-color:hsla(300,100%,50%,0.06)"><span style="color:#404">/</span><span style="color:#404">/</span></span> other browsers fall back to <span style="color:#036;font-weight:bold">XMLHttpRequest</span> (<span style="color:#036;font-weight:bold">AJAX</span>).
-JSONStorage.fetch = function(name, block) {
-  <span style="color:#080;font-weight:bold">if</span> (typeof fetch !== <span style="background-color:hsla(0,100%,50%,0.05)"><span style="color:#710">'</span><span style="color:#D20">undefined</span><span style="color:#710">'</span></span> &amp;&amp; typeof caches !== <span style="background-color:hsla(0,100%,50%,0.05)"><span style="color:#710">'</span><span style="color:#D20">undefined</span><span style="color:#710">'</span></span> &amp;&amp;
-     (location.protocol == <span style="background-color:hsla(0,100%,50%,0.05)"><span style="color:#710">"</span><span style="color:#D20">https:</span><span style="color:#710">"</span></span> || location.hostname == <span style="background-color:hsla(0,100%,50%,0.05)"><span style="color:#710">"</span><span style="color:#D20">localhost</span><span style="color:#710">"</span></span>)) {
-    caches.open(<span style="background-color:hsla(0,100%,50%,0.05)"><span style="color:#710">"</span><span style="color:#D20">board/agenda</span><span style="color:#710">"</span></span>).then(function(cache) {
-      var fetched = null;
-      clock_counter++;
-      <span style="background-color:hsla(300,100%,50%,0.06)"><span style="color:#404">/</span><span style="color:#404">/</span></span> construct arguments to fetch
-      var args = {
-        <span style="color:#606">method</span>: <span style="background-color:hsla(0,100%,50%,0.05)"><span style="color:#710">"</span><span style="color:#D20">get</span><span style="color:#710">"</span></span>,
-        <span style="color:#606">credentials</span>: <span style="background-color:hsla(0,100%,50%,0.05)"><span style="color:#710">"</span><span style="color:#D20">include</span><span style="color:#710">"</span></span>,
-        <span style="color:#606">headers</span>: {<span style="color:#606">Accept</span>: <span style="background-color:hsla(0,100%,50%,0.05)"><span style="color:#710">"</span><span style="color:#D20">application/json</span><span style="color:#710">"</span></span>}
-      };
-      <span style="background-color:hsla(300,100%,50%,0.06)"><span style="color:#404">/</span><span style="color:#404">/</span></span> dispatch request
-      fetch(<span style="background-color:hsla(0,100%,50%,0.05)"><span style="color:#710">"</span><span style="color:#D20">../json/</span><span style="color:#710">"</span></span> + name, args).then(function(response) {
-        cache.put(name, response.clone());
-        response.json().then(function(json) {
-          <span style="color:#080;font-weight:bold">if</span> (!fetched || <span style="color:#036;font-weight:bold">JSON</span>.stringify(fetched) != <span style="color:#036;font-weight:bold">JSON</span>.stringify(json)) {
-            <span style="color:#080;font-weight:bold">if</span> (!fetched) clock_counter--;
-            fetched = json;
-            <span style="color:#080;font-weight:bold">if</span> (json) block(json);
-            <span style="color:#036;font-weight:bold">Main</span>.refresh()
-          }
-        })
-      });
-      <span style="background-color:hsla(300,100%,50%,0.06)"><span style="color:#404">/</span><span style="color:#404">/</span></span> check cache
-      cache.match(name).then(function(response) {
-        <span style="color:#080;font-weight:bold">if</span> (response &amp;&amp; !fetched) {
-          response.json().then(function(json) {
-            clock_counter--;
-            fetched = json;
-            <span style="color:#080;font-weight:bold">if</span> (json) block(json);
-            <span style="color:#036;font-weight:bold">Main</span>.refresh()
-          })
-        }
-      })
-    })
-  } <span style="color:#080;font-weight:bold">else</span> <span style="color:#080;font-weight:bold">if</span> (typeof <span style="color:#036;font-weight:bold">XMLHttpRequest</span> !== <span style="background-color:hsla(0,100%,50%,0.05)"><span style="color:#710">'</span><span style="color:#D20">undefined</span><span style="color:#710">'</span></span>) {
-    <span style="background-color:hsla(300,100%,50%,0.06)"><span style="color:#404">/</span><span style="color:#404">/</span></span> retrieve from the network only
-    retrieve(name, <span style="background-color:hsla(0,100%,50%,0.05)"><span style="color:#710">"</span><span style="color:#D20">json</span><span style="color:#710">"</span></span>, function(item) {<span style="color:#080;font-weight:bold">return</span> item.block})
-  }
-}</pre>
-<p>
-  All in all remarkably painless and completely transparent to the calling
-  application.  Doesn't involve the activation of Service Workers, but it
-  doesn't have to.
-</p>
-<h3 id="caching-html">Second step - caching HTML</h3>
-<p>
-  What's true for JSON should also be true for HTML.  Prior to the caching
-  logic introduced above, and continuing to be true for browsers that don't
-  support the service workers caching interface, data that should appear on the
-  page would be missing briefly and show up a second or two later.  In the case
-  of HTML, that data would be the entire page.  This would typically be seen
-  both on the initial page load as well as any time a link is opened in a new
-  tab.
-</p>
-<p>
-  The HTML case is both simpler and more difficult.  Fetching the HTML from
-  cache and then replacing it wholesale from the network, while possible, would
-  be jarring.  Fortunately, there already is logic in place to update the
-  content of the pages based on updates received by XHR.  So initially
-  displaying where the user last left off, as well as updating the cache,
-  is sufficient.
-</p>
-<p>
-  Unfortunately, it isn't quite so simple.  I've included the current code below
-  complete with log statements and dead ends.
-</p>
-<pre><span style="background-color:hsla(300,100%,50%,0.06)"><span style="color:#404">/</span><span style="color:#404">/</span></span> simple hashcode to prevent authorization from leaking
-var hashcode = function(s) {
-  <span style="color:#080;font-weight:bold">return</span> s &amp;&amp; s.split(<span style="background-color:hsla(0,100%,50%,0.05)"><span style="color:#710">"</span><span style="color:#710">"</span></span>).reduce(
-    function(a, b) {
-      <span style="color:#080;font-weight:bold">return</span> ((a &lt;&lt; <span style="color:#00D">5</span>) - a) + b.charCodeAt(<span style="color:#00D">0</span>)
-    },
-    <span style="color:#00D">0</span>
-  )
-};
-var status = {<span style="color:#606">auth</span>: null};
-this.addEventListener(<span style="background-color:hsla(0,100%,50%,0.05)"><span style="color:#710">"</span><span style="color:#D20">fetch</span><span style="color:#710">"</span></span>, function(event) {
-  var scope = this.registration.scope;
-  var url = event.request.url;
-  var path = url.slice(scope.length);
-  var auth = hashcode(event.request.headers.get(<span style="background-color:hsla(0,100%,50%,0.05)"><span style="color:#710">"</span><span style="color:#D20">Authorization</span><span style="color:#710">"</span></span>));
-  <span style="color:#080;font-weight:bold">if</span> (<span style="background-color:hsla(300,100%,50%,0.06)"><span style="color:#404">/</span><span style="color:#808">^</span><span style="color:#D20">\d</span><span style="color:#D20">\d</span><span style="color:#D20">\d</span><span style="color:#D20">\d</span><span style="color:#808">-</span><span style="color:#D20">\d</span><span style="color:#D20">\d</span><span style="color:#808">-</span><span style="color:#D20">\d</span><span style="color:#D20">\d</span><span style="color:#404">/</span></span>/.test(path) &amp;&amp; event.request.method == <span style="background-color:hsla(0,100%,50%,0.05)"><span style="color:#710">"</span><span style="color:#D20">GET</span><span style="color:#710">"</span></span>) {
-    console.log(<span style="background-color:hsla(0,100%,50%,0.05)"><span style="color:#710">"</span><span style="color:#D20">Handling fetch event for</span><span style="color:#710">"</span></span>, event.request.url);
-    event.respondWith(caches.open(<span style="background-color:hsla(0,100%,50%,0.05)"><span style="color:#710">"</span><span style="color:#D20">board/agenda</span><span style="color:#710">"</span></span>).then(function(cache) {
-      <span style="color:#080;font-weight:bold">return</span> cache.match(path).then(function(cached) {
-        <span style="color:#080;font-weight:bold">if</span> (cached) console.log(<span style="background-color:hsla(0,100%,50%,0.05)"><span style="color:#710">"</span><span style="color:#D20">matched</span><span style="color:#710">"</span></span>);
-        console.log(<span style="background-color:hsla(0,100%,50%,0.05)"><span style="color:#710">"</span><span style="color:#D20">auth</span><span style="color:#710">"</span></span>, auth, status.auth);
-        <span style="color:#080;font-weight:bold">if</span> (!auth || auth != status.auth) {
-          <span style="background-color:hsla(300,100%,50%,0.06)"><span style="color:#404">/</span><span style="color:#404">/</span></span> the following doesn't work
-          cached = new Response(<span style="background-color:hsla(0,100%,50%,0.05)"><span style="color:#710">"</span><span style="color:#D20">Unauthorized</span><span style="color:#710">"</span></span>, {
-            <span style="color:#606">status</span>: <span style="color:#00D">401</span>,
-            <span style="color:#606">statusText</span>: <span style="background-color:hsla(0,100%,50%,0.05)"><span style="color:#710">"</span><span style="color:#D20">Unauthorized</span><span style="color:#710">"</span></span>,
-            <span style="color:#606">headers</span>: {<span style="color:#606"><span style="color:#404">"</span><span>WWW-Authenticate</span><span style="color:#404">"</span></span>: <span style="background-color:hsla(0,100%,50%,0.05)"><span style="color:#710">"</span><span style="color:#D20">Basic realm=</span><span style="color:#710">"</span></span><span style="color:#036;font-weight:bold">ASF</span> <span style="color:#036;font-weight:bold">Members</span> <span style="color:#080;font-weight:bold">and</span> <span style="color:#036;font-weight:bold">Officers</span><span style="background-color:hsla(0,100%,50%,0.05)"><span style="color:#710">"</span><span style="color:#710">"</span></span>}
-          });
-          <span style="background-color:hsla(300,100%,50%,0.06)"><span style="color:#404">/</span><span style="color:#404">/</span></span> <span style="color:#606">fallback</span>: ignore cache <span style="color:#080;font-weight:bold">unless</span> authorized
-          cached = null
-        };
-        <span style="color:#080;font-weight:bold">if</span> (cached) console.log(<span style="background-color:hsla(0,100%,50%,0.05)"><span style="color:#710">"</span><span style="color:#D20">serving from cache</span><span style="color:#710">"</span></span>);
-        var network = fetch(event.request).then(function(response) {
-          <span style="color:#080;font-weight:bold">if</span> (!cached) console.log(<span style="background-color:hsla(0,100%,50%,0.05)"><span style="color:#710">"</span><span style="color:#D20">fetching from network</span><span style="color:#710">"</span></span>);
-          <span style="color:#080;font-weight:bold">if</span> (cached) console.log(<span style="background-color:hsla(0,100%,50%,0.05)"><span style="color:#710">"</span><span style="color:#D20">updating cache</span><span style="color:#710">"</span></span>);
-          console.log(response);
-          <span style="color:#080;font-weight:bold">if</span> (response.ok) cache.put(path, response.clone());
-          status.auth = auth;
-          <span style="color:#080;font-weight:bold">return</span> response
-        });
-        <span style="color:#080;font-weight:bold">return</span> cached || network
-      })
-    }))
-  } <span style="color:#080;font-weight:bold">else</span> <span style="color:#080;font-weight:bold">if</span> (auth) {
-    <span style="background-color:hsla(300,100%,50%,0.06)"><span style="color:#404">/</span><span style="color:#404">/</span></span> capture authorization from other pages, <span style="color:#080;font-weight:bold">if</span> provided
-    status.auth = auth
-  }
-})</pre>
-<p>
-  The primary problem is that the board agenda tool requires authentication to
-  use as the data presented may contain Apache Software Foundation confidential
-  information.
-</p>
-<p>
-  Without accounting for this, what often would be placed into the cache would
-  be the HTTP <code>401</code> challenge response.  That's not desirable.
-</p>
-<p>
-  Attempting to force the return of a challenge when an Authorization header is not present results in the display of the challenge response.  Again, not what we want.
-</p>
-<p>
-  Falling back to only providing the cached data when the Authorization header
-  is present (and matches the one used for the cache) results in the cache being
-  used sometimes with Firefox.  And, unfortunately, never with Chrome.
-</p>
-<p>
-  A secondary problem, of lesser importance, is that the cache never gets
-  updated if the service worker responds with a cache copy.  Of if it does,
-  the <code>console.log</code> messages aren't getting executed or aren't
-  producing output.
-</p>
-<h3 id="monitoring">Third step - monitoring</h3>
-<p>
-  To help with debugging, it occurred to me that it would make sense to produce
-  a page that shows Service Worker and Cache status.
-</p>
-<ul>
-  <li>
-    <p>
-      For service workers, there was no problems, but the results were
-      underwhelming.  I only got information back about my service worker even
-      though I had several activated by this point by virtue of running
-      various demos.  That's not a problem, as that's all I needed.  The only
-      information I could get was the state of the service worker.  But even
-      so, I could use this as a building block to enable users to send a
-      message to the service worker and/or unregister it.  See plans below for
-      more details.
-    </p>
-  </li>
-  <li>
-    <p>
-      For caches, I simply couldn't get it to work.  For example, I tried
-      adding the following line immediate after the <code>cache.put</code>
-      line in the first code snippet:
-    </p>
-    <pre>console.log cache.keys()</pre>
-    <p>
-      The result was an empty list (<code>[]</code>) on both Firefox and
-      Chrome.  This is problematic on a number of levels, not the least of
-      which being that the interface is defined to return a promise and Arrays
-      in JavaScript don't respond to then.
-    </p>
-    <p>References:</p>
-    <ul>
-      <li>
-        <a href="https://slightlyoff.github.io/ServiceWorker/spec/service_worker/#cache-keys">Service Workers Nightly</a>
-      </li>
-      <li>
-        <a href="https://developer.mozilla.org/en-US/docs/Web/API/Cache/keys">Cache.keys() - Web APIs | MDN</a>
-      </li>
-    </ul>
-  </li>
-</ul>
-<h3 id="plans">Plans</h3>
-<p>
-  One thing I haven't explored yet is replacing the fetch call with one with
-  different values for the request mode and credentials mode.  I figured I would
-  ask for guidance before proceeding down that path.
-</p>
-<p>
-  Once caching HTML is mastered, caching related artifacts like stylesheets and
-  javascripts would be in order.  An online fallfack approach would likely be
-  the best match.
-</p>
-<p>
-  After that, the next order of business would be queuing of updates while
-  offline.  While in general, that would be a hard problem, in this case as user
-  operations are limited by role and generaally to editing their own changes,
-  it should be manageable.
-</p></div></content>
-    <updated>2016-07-11T11:27:29-07:00</updated>
-  </entry>
-  <entry>
-    <id>tag:intertwingly.net,2004:3354</id>
-    <link href="/blog/2015/09/24/FacePalm"/>
-    <link rel="replies" href="3354.atom" thr:count="2" thr:updated="2017-05-26T02:31:05-07:00"/>
-    <title>FacePalm</title>
-    <content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><svg style="float:right" xmlns="http://www.w3.org/2000/svg" width="100" height="100" viewBox="0 0 100 100">
-  <rect fill="#D22" x="0" y="3" height="95" width="95" rx="15"></rect>
-  <circle cx="18" cy="81" r="9" fill="#FFF"></circle>
-  <path d="M48,84s0-33-33-33 M75,84s0-60-60-60"
-    stroke-linecap="round" stroke-width="15" stroke="#FFF" fill="none"></path>
-</svg>
-<p><a href="https://developers.facebook.com/docs/instant-articles/automated-publishing">Automated Publishing with Instant Articles</a></p>
-<p><code>&lt;description&gt;</code> A summary of your article, in <b>plain text</b> form.</p>
-<p><code>&lt;pubDate&gt;</code> The date of the article’s publication, in <a href="http://en.wikipedia.org/wiki/ISO_8601">ISO-8601 format.</a></p>
-<p>Related: <a href="http://www.intertwingly.net/blog/2006/03/28/plaintext">plaintext</a>, <a href="http://intertwingly.net/blog/2006/05/01/May-Day">May Day</a>, <a href="http://intertwingly.net/blog/2006/06/01/June-Bug">June Bug</a>, <a href="http://intertwingly.net/blog/2006/07/14/Another-Month">Another Month</a>, and numerous others.</p></div></content>
-    <updated>2015-09-24T08:44:23-07:00</updated>
-  </entry>
-  <entry>
-    <id>tag:intertwingly.net,2004:3353</id>
-    <link href="/blog/2015/05/18/Brief-history-of-the-ASF-Board-Agenda-tool"/>
-    <link rel="replies" href="3353.atom" thr:count="6" thr:updated="2017-05-25T18:58:38-07:00"/>
-    <title>Brief history of the ASF Board Agenda tool</title>
-    <summary type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>the current implementation is a lot more fun to develop and easier to maintain than prior versions.  As an example, if it were decided that the moment the secretary clicked the ‘timestamp` button on the 'Call to order’ page, all comment buttons are to be removed from all windows and all comment modal dialogs are to be closed, this could be implemented using a single if statement as the event is already propagated, and a re-render is already triggered.  All that would be required is to change the conditions under which the comment button appears.</p>
-<p>The <a href="https://github.com/rubys/whimsy-agenda#readme">board agenda tool</a> has been tested on Linux, Mac OS/X, Vagrant, and Docker.  It contains a suite of tests.</p></div></summary>
-    <content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><svg style="float:right" xmlns='http://www.w3.org/2000/svg' width="100" height="100" viewBox="0 0 100 100">
-  <path d="M34,38a16,16,0,1,0,0,24l32-24a16,16,0,1,1,0,24M40,43l20,15"
-    stroke="#44B74A" stroke-width="4" fill="none"></path>
-</svg>
-<p>The gold standard of server side web applications is Model, View, Controller.  Early versions of this tool was not written that way: it was a CGI script that grew like a weed.  Over time, some <a href="https://jquery.com/">JQuery</a> effects were added.</p>
-<p>The first major rewrite was done using <a href="https://angularjs.org/">Angular.js</a> and <a href="http://getbootstrap.com/">Bootstrap</a>.  These frameworks enabled me to do things I had never done before.  It also required me to write code that watched for changes, and to ensure that changes were applied in place (specifically arrays and hashes could not be replaced, they had to be updated).</p>
-<p>While Angular.js used terms like Directives, Filters, and Services, the overall effect was to impose a structure on the client side application.  As with most things, this structure was both constraining and freeing.</p>
-<p>The current rewrite replaces Angular.js with <a href="https://facebook.github.io/react/">React.js</a>.  Gone are all the watches and the need to update things in place.  In their place is a policy of “rerender everything” whenever an event (a keystroke, a mouse click, a server side event) occurs.  With React.js, rerendering everything is efficient, as React computes a delta and then applies only that delta to the DOM.  React.js does provide a suggested architecture, namely Flux, that minimizes the need to rerender everything, but in practice I have not found that necessary.</p>
-<p>To illustrate, if you bring up the “Call to order” page and press and hold down the right arrow key, every page of the agenda will be flashed up and promptly replaced.</p>
-<p>The overall resulting flow is as follows: when a page is fetched, the response starts out with a pre-rendered representation (simple HTML), followed by the scripts needed to produce that page, followed by the data used by those scripts.  This ensures that the data is presented promptly and that the page then becomes reactive to input and events.</p>
-<p>The resulting architecture isn’t MVC on either the client or the server.  Instead, V and C get mushed together, and a unified client/server event stream is added.</p>
-<p>Events are received from the server using <a href="http://www.w3.org/TR/eventsource/">Server Sent Events</a>.  This is <a href="http://caniuse.com/#feat=eventsource">widely implemented</a>, and has a solid <a href="https://github.com/Yaffle/EventSource/">polyfill</a> for browsers (most notably, IE) that haven’t implemented this standard.  Its one way data flow is a good fit for React.js.</p>
-<p>Events are generally triggered by actions on a client browser window somewhere (typically a mouse click), resulting in an HTTP GET or POST request being sent to the server, but can also be triggered by file system changes on the server (example: a cron job does an svn update, which causes the agenda to contain new data).</p>
-<p>A single event-stream is maintained per browser, and that process is responsible for propagating updates to all tabs and windows.  Events can be sent to all clients, or only clients authenticated with a given user id.  This enables my pending updates to be immediately reflected on all of my tabs and windows but not affect others.  The result of an event is to update one or more models, and then trigger a re-render.</p>
-<p>Models on both the client and server are simple classes.  Class methods operate on the entity as a whole (example: write the whole agenda to disk on the server, or provide an index for the agenda on the client).  Instance methods refer to an individual item (example: an agenda item).</p>
-<p>What’s left is React Components on the client and actions on the server.</p>
-<p>React components have a render method.  That method has full (read-only) access to client models, and can contain if statements, iterate over results, and perform (generally minor) computations.  More extensive computations should be refactored into other methods in the component when limited in scope to a single component, or into the client model otherwise.  The one limitation that is enforced is that render methods cannot directly or indirectly change state.  A predefined <a href="https://facebook.github.io/react/docs/component-specs.html">life-cycle</a> is provided.  Other methods can be added, for example methods to handle onClick events.</p>
-<p>These methods can trigger HTTP POST and GET requests (the convenience method I provide for the latter is called fetch instead).  These run small scripts on the server that may update models, generate events, and return JSON.</p>
-<p>Taken together, the current implementation is a lot more fun to develop and easier to maintain than prior versions.  As an example, if it were decided that the moment the secretary clicked the ‘timestamp’ button on the ‘Call to order’ page, all comment buttons are to be removed from all windows and all comment modal dialogs are to be closed, this could be implemented using a single if statement as the event is already propagated, and a re-render is already triggered.  All that would be required is to change the conditions under which the comment button appears.</p>
-<p>The <a href="https://github.com/rubys/whimsy-agenda#readme">board agenda tool</a> has been tested on Linux, Mac OS/X, Vagrant, and Docker.  It contains a suite of tests.</p></div></content>
-    <updated>2015-05-18T09:15:15-07:00</updated>
-  </entry>
-  <entry>
-    <id>tag:intertwingly.net,2004:3352</id>
-    <link href="/blog/2015/04/02/Spartan-Test-Results"/>
-    <link rel="replies" href="3352.atom" thr:count="3" thr:updated="2017-05-24T10:09:45-07:00"/>
-    <title>Spartan Test Results</title>
-    <content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><svg style="float:right" xmlns="http://www.w3.org/2000/svg" width="100" height="100" viewBox="0 0 100 100">
-  <g stroke-width="5" stroke="#030092" fill="none">
-    <circle cx="50" cy="50" r="46"></circle>
-    <path d="M8,35h84M8,65h84"></path>
-    <ellipse cx="50" cy="50" rx="20" ry="43"></ellipse>
-  </g>
-</svg>
-<p>I replaced IE results with Spartan results in my <a href="https://url.spec.whatwg.org/interop/test-results/">urltests</a>.  Other than the user agent string, nothing changed.</p>
-<p>Following are selected examples where three out of four of the top browsers agree, identified by the odd browser out:</p>
-<ul>
-<li><a href="https://url.spec.whatwg.org/interop/test-results/53d49202f1?select=current&amp;baseline=chrome">Chrome</a></li>
-<li><a href="https://url.spec.whatwg.org/interop/test-results/15341d9fab?select=current&amp;baseline=firefox">Firefox</a></li>
-<li><a href="https://url.spec.whatwg.org/interop/test-results/d7d52bebd0?select=current&amp;baseline=safari">Safari</a></li>
-<li><a href="https://url.spec.whatwg.org/interop/test-results/8bb3c95bce?select=current&amp;baseline=spartan">Spartan</a></li>
-</ul></div></content>
-    <updated>2015-04-02T16:54:22-07:00</updated>
-  </entry>
-  <entry>
-    <id>tag:intertwingly.net,2004:3351</id>
-    <link href="/blog/2015/04/01/Ruby2JS-2-0"/>
-    <link rel="replies" href="3351.atom" thr:count="3" thr:updated="2017-05-24T03:49:35-07:00"/>
-    <title>Ruby2JS 2.0</title>
-    <content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><svg style="float:right" xmlns='http://www.w3.org/2000/svg' width="100" height="100" viewBox="0 0 100 100">
-<path d='M20,100l74-5l6-75zM61,35l37-2l-29-24z' fill='#b11'></path>
-<path d='M21,100l74-5l-47-4zM98,33c4-12,5-29-14-33l-15,9l29,24z' fill='#811'></path>
-<path d='M7,67l14,33l11-38z' fill='#d44'></path>
-<path d='M29,61l42,13l-10-42zM56,0h28l-16,10zM1,51l-1,29l7-13z' fill='#c22'></path>
-<path d='M32,61l39,13c-14,13-30,24-50,26z' fill='#a00'></path>
-<path d='M61,35l10,39l17-23zM32,61l16,30c9-5,16-11,23-17l-39-13z' fill='#900'></path>
-<path d='M61,35l27,17l10-20l-37,3z' fill='#800'></path>
-<path d='M71,74l23,21l-6-44zM0,80c1,19,15,20,21,20l-14-33l-7,13zM7,67l-2,26c4,6,9,7,15,6c-4-11-13-32-13-32zM69,9l30,4c-1-7-6-11-15-13l-15,9z' fill='#911'></path>
-<path
-d='M1,51l6,16l25-5l29-27l8-26l-13-9l-22,8c-6,7-20,19-20,19c-1,1-9,16-13,24z'
-fill='#f84'></path>
-<path d='M21,21c15-14,34-23,42-16c7,8-1,26-16,40c-14,15-33,24-41,17c-7-7,1-26,15-41z' fill='#F0DB4F'></path>
-<g transform="rotate(307,33,12),scale(0.45)">
-<path d='M26,84l8-5c1,3,3,5,6,5c3,0,5-1,5-6v-32h9v32c0,10-5,14-14,14c-7,0-11-4-14-8' id='j'></path>
-<path d='M60,83l7-5c2,3,5,6,9,6c4,0,7-2,7-5c0-3-3-4-7-6l-2-1c-7-3-12-7-12-14c0-7,6-13,14-13c6,0,10,2,13,8l-7,5c-1-3-3-4-6-4c-3,0-4,1-4,4c0,2,1,4,5,5l3,1c8,4,12,7,12,15c0,9-6,13-15,13c-9,0-15-4-17-9' id='s'></path>
-</g>
-</svg>
-<p>I’ve released <a href="https://github.com/rubys/ruby2js#readme">Ruby2JS</a> version 2.0.  Key new features:</p>
-<ul>
-<li>Line comment support.  More specifically, comments associated with statements are copied to the output.  Comments within statements are still omitted.</li>
-<li><a href="https://docs.google.com/document/d/1U1RGAehQwRypUTovF1KRlpiOFze0b-_2gc6fAH0KY0k/edit">Source Map</a> support.  This enables debugging of generated JavaScript using the Ruby source.</li>
-</ul>
-<p>The <a href="https://github.com/rubys/whimsy-agenda#readme">Whimsy Agenda</a> rewrite-in-progress (previously based on Angular.js, now being rebased on React.js) can be used to explore both of these features.</p></div></content>
-    <updated>2015-04-01T07:26:31-07:00</updated>
-  </entry>
-  <entry>
-    <id>tag:intertwingly.net,2004:3350</id>
-    <link href="/blog/2015/02/11/React-rb-updates"/>
-    <link rel="replies" href="3350.atom" thr:count="2" thr:updated="2017-05-26T03:36:17-07:00"/>
-    <title>React.rb updates</title>
-    <summary type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I’ve made a number of updates to the demos.  The <a href="http://facebook.github.io/react/docs/tutorial.html">tutorial</a> demo has been updated to do server side rendering.  This means that it can be used by clients that either don’t support JavaScript or have it turned off.</p>
-<p>The second demo is a calendar.  Unlike the tutorial which is a single file, this application is organized in a manner more consistent with how I expect projects to be organized.</p></div></summary>
-    <content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><svg style="float:right" xmlns="http://www.w3.org/2000/svg" width="100" height="100" viewBox="0 0 100 100">
-  <g transform="translate(50,50)">
-    <circle fill="#00D8FF" r="8"></circle>
-    <g fill="none" stroke="#00D8FF" stroke-width="4">
-      <ellipse rx="45" ry="17"></ellipse>
-      <ellipse rx="45" ry="17" transform="rotate(60)"></ellipse>
-      <ellipse rx="45" ry="17" transform="rotate(120)"></ellipse>
-    </g>
-  </g>
-</svg>
-<p>I’ve made a number of updates to the demos.  The <a href="http://facebook.github.io/react/docs/tutorial.html">tutorial</a> demo has been updated to do server side rendering.  This means that it can be used by clients that either don’t support JavaScript or have it turned off.  To run:</p>
-<pre class="code">git clone https://github.com/rubys/ruby2js.git
-cd ruby2js/demo
-bundle update
-ruby react-tutorial.rb</pre>
-<p>Visit the URL (typically <a href="http://localhost:4567/">http://localhost:4567/</a>) and enter a comment.  Visit the same URL in a different tab or a different browser and enter another comment.  Switch back to the original browser/tab.  If you have client side JavaScript disabled, you will need to hit refresh.</p>
-<p>The second demo is a calendar.  To get started:</p>
-<pre class="code">git clone https://github.com/rubys/wunderbar.git
-cd wunderbar/demo/calendar
-bundle update
-rackup</pre>
-<p>Visit the URL (typically <a href="http://localhost:9292/">http://localhost:9292/</a>). This will take you to the current month.  Left and right arrows will take you to different months (and update the URL).  Unlike the tutorial, which is a single file, this application is organized in a manner more consistent with how I expect projects to be organized.</p>
-    <updated>2015-02-11T15:10:31-08:00</updated>
-  </entry>
-  <entry>
-    <id>tag:intertwingly.net,2004:3349</id>
-    <link href="/blog/2015/02/03/DSL-for-JavaScript"/>
-    <link rel="replies" href="3349.atom" thr:count="0"/>
-    <title>DSL for JavaScript</title>
-    <summary type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="https://twitter.com/jashkenas/status/562635888753377281">Jeremy Ashkenas</a>: <em>“work towards building a language that is to ES6 as CoffeeScript is to ES5”… close, but—do it for [ES6+HTML+CSS], and you’ll win ;)</em></p>
-<p>It occurs to me that there is a shortcut available.  Let a library like React replace [ES6+HTML+CSS].  Then build a <a href="http://en.wikipedia.org/wiki/Domain-specific_language">DSL</a> for that library.</p></div></summary>
-    <content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><svg style="float:right" xmlns="http://www.w3.org/2000/svg" width="100" height="100" viewBox="0 0 100 100">
-  <path d="M4,14h92" stroke="#4682b4" stroke-width="5"></path>
-  <text x="50" y="90" font-size="90" fill="#5f9ea0" font-family="serif" text-anchor="middle"><![CDATA[W]]></text>
-</svg>
-<p><a href="https://twitter.com/jashkenas/status/562635888753377281"><cite>Jeremy Ashkenas</cite></a>: <em>“work towards building a language that is to ES6 as CoffeeScript is to ES5”… close, but—do it for [ES6+HTML+CSS], and you’ll win ;)</em></p>
-<p>It occurs to me that there is a shortcut available.  Let a library like React replace [ES6+HTML+CSS].  Then build a <a href="http://en.wikipedia.org/wiki/Domain-specific_language">DSL</a> for that library.</p>
-<p>JavaScript isn’t exactly known for its ability to build DSLs.  Ruby, however, <a href="http://jroller.com/rolsen/entry/building_a_dsl_in_ruby">is</a>.  And has an excellent <a href="https://github.com/whitequark/parser">parser</a> library.  By <a href="https://github.com/rubys/ruby2js#filters">transforming</a> the <a href="http://en.wikipedia.org/wiki/Abstract_syntax_tree">AST</a>, I can convert <a href="https://svn.apache.org/repos/infra/infrastructure/trunk/projects/whimsy/www/calendar-demo/views/calendar.js.rb">calendar.js.rb</a> into <a href="http://intertwingly.net/stories/2015/02/02/calendar-demo/calendar.js">calendar.js</a>.</p>
-<p>In the process, I start by replacing <a href="http://facebook.github.io/react/docs/jsx-in-depth.html">JSX</a> with a <a href="https://github.com/rubys/wunderbar/#wunderbar-easy-html5-applications">library</a> which was inspired by <a href="https://github.com/jimweirich/builder#readme">Builder</a>, <a href="http://markaby.rubyforge.org/">Markaby</a>, and <a href="https://github.com/ahoward/tagz">Tagz</a>.  These libraries, in turn, were presumably inspired by earlier works like <a href="http://perldoc.perl.org/CGI.html#CREATING-STANDARD-HTML-ELEMENTS:">Perl’s CGI</a>.</p>
-<p>But there is more.  JSX can’t directly express iteration.  Look at <a href="https://github.com/reactjs/react-tutorial/blob/85a92a09a9dbfbde6c74bf6fbc9cfa2919775d61/public/scripts/example.js#L81">CommentList</a> from the <a href="http://facebook.github.io/react/docs/tutorial.html">React tutorial</a>.  Instead you build up a list, and then subsequently wrap that list.  For nested lists, it appears worthwhile to split out separate components.  There is nothing wrong with doing that, but I will suggest that the primary reason to split out a component shouldn’t be to pander to the limitations of the programming language syntax.</p>
-<p>In Ruby you <b>can</b> directly express iteration.  So where a comment box in the tutorial takes four classes, an entire calendar month can be expressed in one.</p>
-<p>And there is even more.  <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Functions">Functions</a> in JavaScript are the Swiss Army knives of programming language features.  They can be used to express classes, blocks, lambdas, and procs.  But this flexibility comes at a <a href="http://alistapart.com/article/getoutbindingsituations">price</a>.  Ruby2JS can detect when idioms like <a href="http://stackoverflow.com/questions/962033/what-underlies-this-javascript-idiom-var-self-this">var self=this</a> are needed and automatically apply them.</p>
-<p>The net is that I can write smaller, more understandable code.  And in the process focus more on the problem I’m trying to solve.</p>
-<p>Like with <a href="http://coffeescript.org/">CoffeeScript</a>, <em>"It’s just JavaScript"</em>. The code compiles one-to-one into the equivalent JS, and there is no interpretation at runtime.  You can use any existing JavaScript library seamlessly from Ruby2JS (and vice-versa). The compiled output is readable and pretty-printed, will work in every JavaScript runtime, and tends to run as fast or faster than the equivalent handwritten JavaScript.</p>
-<p>Now I don’t expect to have the success or <a href="https://twitter.com/raganwald/status/555386257233027073">impact</a> that CoffeeScript has had.  But I can say that I’m having fun.  And in the process, I’m seeing the benefits with applications I write.</p></div></content>
-    <updated>2015-02-03T16:50:18-08:00</updated>
-  </entry>
-  <entry>
-    <id>tag:intertwingly.net,2004:3348</id>
-    <link href="/blog/2015/02/02/Web-Components"/>
-    <link rel="replies" href="3348.atom" thr:count="17" thr:updated="2017-05-20T09:37:23-07:00"/>
-    <title>Web Components</title>
-    <summary type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="https://twitter.com/brianleroux/status/561594569913950208">Brian Leroux</a>: <em>ES6 and Web Components</em></p>
-<p>My take is that this talk lumps React in with others based on when it was introduced, but that it is as fundamentally different from, say, Angular.js as Angular.js is from jQuery.</p></div></summary>
-    <content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><svg style="float:right" xmlns="http://www.w3.org/2000/svg" width="100" height="100" viewBox="0 0 100 100">
-  <g transform="translate(50,50)">
-    <circle fill="#00D8FF" r="8"></circle>
-    <g fill="none" stroke="#00D8FF" stroke-width="4">
-      <ellipse rx="45" ry="17"></ellipse>
-      <ellipse rx="45" ry="17" transform="rotate(60)"></ellipse>
-      <ellipse rx="45" ry="17" transform="rotate(120)"></ellipse>
-    </g>
-  </g>
-</svg>
-<p><a href="https://twitter.com/brianleroux/status/561594569913950208"><cite>Brian Leroux</cite></a>: <em>ES6 and Web Components</em></p>
-<p>Good overview.  Issues:</p>
-<ul>
-<li>YUI is an example of a key problem w/ corp stewardship; Angular, Polymer, React all OK though?</li>
-<li>HTML Imports in trouble as Mozilla doesn’t want to implement; Custom Elements OK even though Chrome is the only implementation?</li>
-<li>Overall, Brian mentions four specifications, and crosses off three.  Why not all four?</li>
-</ul>
-<p>My take is that this talk lumps React in with others based on when it was introduced, but that it is as fundamentally different from, say, Angular.js as Angular.js is from jQuery.  Compared to the alternatives, React is more imperative, and is based on a virtual DOM.  It can also run on both the server and the client.</p>
-<p>Brian suggests that you view source on <a href="http://brian.io/date-today/">http://brian.io/date-today/</a>.  What you don’t see when you do that is today’s date.  I’d suggest that the ideal would be a page where you do see today’s date — even if JavaScript is disabled.  And for you to be able to interact with that page in ways that involve the server.</p>
-<p>I have my own page on which I would suggest that you view source: <del><a href="https://whimsy.apache.org/calendar-demo">calendar-demo</a></del> (<strong>Update:</strong> that site is down; try <a href="http://intertwingly.net/stories/2015/02/02/calendar-demo/2015/02">this static snapshot</a>).  Use the left and right arrow buttons to go to the previous and next months.  Viewing source reveals that the page is delivered pre-rendered, and only after the content is delivered are script libraries loaded.  Traversing to the next and previous months is pretty snappy despite the fact that there has been no optimization: in particular, there are no anticipatory prefetches.  Nor is data retained should you go back to a previous month.  Neither of these changes would be hard to implement.</p>
-<p>Source is available in <a href="https://svn.apache.org/repos/infra/infrastructure/trunk/projects/whimsy/www/calendar-demo">svn</a>.  Check it out, do a bundle update to get the dependencies, run rake if you want to run a few tests, and run rackup to start a local server.</p>
-<p>I must say that being able to define a component with all of the rendering, client, and server logic in one place is very appealing to me.</p>
-<p>Brian suggests authoring source in ES6, and targeting ES5.  My preference would be to work towards building a language that is to ES6 as CoffeeScript is to ES5.  At the moment, my experimentation along those lines is happening in <a href="https://rubygems.org/gems/ruby2js">Ruby2JS</a>.</p>
-<p><a href="https://www.youtube.com/watch?v=7rDsRXj9-cU">React Native</a> looks worth watching.  Perhaps, as my calendar uses flexbox, I will be able to quickly build an Android or iOS equivalent.</p></div></content>
-    <updated>2015-02-02T14:28:32-08:00</updated>
-  </entry>
-  <entry>
-    <id>tag:intertwingly.net,2004:3347</id>
-    <link href="/blog/2015/01/28/Email-addresses"/>
-    <link rel="replies" href="3347.atom" thr:count="1" thr:updated="2017-05-17T22:08:32-07:00"/>
-    <title>Email addresses</title>
-    <summary>I have been telling all non-IBMers not to use my ibm.com email address for years, but this advice is routinely ignored.  I’ve repeated the reasons behind this request enough times that it makes sense for me to post them in one place so that I can point to it.</summary>
-    <content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><svg style="float:right" xmlns='http://www.w3.org/2000/svg' width="167" height="60" viewBox="0 0 167 60">
-  <rect x='32' y='15' fill='#f3b457' rx='3' height='37' width='113'></rect>
-  <g stroke='#FFF' stroke-width='2'>
-    <path d='M38,9c-3,17-11,31-12,33c11,4,21,9,31,15c3-8,5-16,7-24-8-11-17-17-26-24' fill='#64a15a'></path>
-    <path d='M38,9c5,12,8,20,11,30l15-6' fill='#64a15a'></path>
-     <path d='M53,14c10,12,20,24,24,38c10-8,20-16,29-22-1-15-8-23-17-29z' fill='#57a295'></path>
-     <path d='M53,14c13,0,26,2,38,6c0-6,1-13-2-20' fill='#57a295'></path>
-     <path d='M91,33c11-7,22-13,38-15c17,6,16,11,21,17-14,3-25,14-35,23-7-16-16-18-24-25z' fill='#d37736'></path>
-     <path d='M91,33c14-2,26-1,39,0v-14' fill='#d37736'></path>
-   </g>
-   <path d='M4,24l5,4-5,4h7v-8z' fill='#FFF200'></path>
-   <path d='M25,27l-5-3h-16l9,4-9,4h16l5-3z' fill='#d4477e'></path>
-   <path d='M27,28l-4-2h-14l4,2-4,2h14l5-3z' fill='#e55d9c'></path>
-   <path d='M61,27h38l-4,2h-32zM31,27h-28l-3,1l4,1h27zM122,27h33v2h-31z' fill='#303f7a'></path>
-   <path d='M151,31l17-3-17-3c4,2,4,4,0,6' fill='#303f7a'></path>
-</svg>
-<p>I have been telling all non-IBMers not to use my ibm.com email address for years, but this advice is routinely ignored.  I’ve repeated the reasons behind this request enough times that it makes sense for me to post them in one place so that I can point to it.</p>
-<p>The back story is that 15 years ago I wrote some open source code in a programming language called Java.  I don’t use that language much any more, but I understand that it remains popular in some circles.  In any case, javadoc style comments encouraged sharing your email address, and my employer discouraged me from doing anything that would hide my relationship with them, so my email address was put out on the web.</p>
-<p>The inevitable result is that I’m deluged with spam, most in languages I am not familiar with.</p>
-<p>My personal email I have control over and the spam tools (all open source) I use are largely effective.  I don’t have that option with my corporate email.  As others within IBM don’t have this problem, I am clearly an outlier.</p>
-<p>Over time, I was missing enough important work-related emails that I taught myself enough LotusScript to write a script that I can invoke as an ‘Action’.  This script identifies emails that were sent from outside of Lotus Notes and places them into a separate folder.  If I am alerted to the presence of a single email, and given enough information (like the sender’s name and the time it was sent), I can generally find the email; but in general people should assume that emails sent to my corporate email address from outside of IBM are never seen by me.</p>
-<p>Another downside of this is that some of my IBM email is sent from service machines that don’t interface directly with Lotus Notes.  That means that I miss some important updates.  And important reminders.  Eventually such reminders copy my manager, who sends them on to me.</p>
-<p>Apparently there are plans in the works to migrate corporate email to the “cloud”.  Perhaps that will be better.  Perhaps I will need to find a way to reimplement my filter or an equivalent.  Or perhaps it will be something that I <a href="http://www.cringely.com/2015/01/22/ibms-reorg-hell-launches-next-week/">won’t need to worry about any more</a>.</p></div></content>
-    <updated>2015-01-28T08:48:39-08:00</updated>
-  </entry>
-  <entry>
-    <id>tag:intertwingly.net,2004:3346</id>
-    <link href="/blog/2015/01/22/React-rb"/>
-    <link rel="replies" href="3346.atom" thr:count="9" thr:updated="2017-05-26T01:55:48-07:00"/>
-    <title>React.rb</title>
-    <summary type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml">Having determined that Angular.js is overkill for my <a href="http://intertwingly.net/blog/2014/12/19/Weblog-Software-Rewrite-Underway">blog rewrite</a>, I started looking more closely at <a href="http://facebook.github.io/react/">React</a>.  It occurred to me that I could do better than <a href="http://facebook.github.io/jsx/">JSX</a>, so I wrote a <a href="https://github.com/rubys/ruby2js">Ruby2JS</a> filter.  Compare for yourself.</div></summary>
-    <content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><svg style="float:right" xmlns="http://www.w3.org/2000/svg" width="100" height="100" viewBox="0 0 100 100">
-  <g transform="translate(50,50)">
-    <circle fill="#00D8FF" r="8"></circle>
-    <g fill="none" stroke="#00D8FF" stroke-width="4">
-      <ellipse rx="45" ry="17"></ellipse>
-      <ellipse rx="45" ry="17" transform="rotate(60)"></ellipse>
-      <ellipse rx="45" ry="17" transform="rotate(120)"></ellipse>
-    </g>
-  </g>
-</svg>
-<p>Having determined that Angular.js is overkill for my <a href="http://intertwingly.net/blog/2014/12/19/Weblog-Software-Rewrite-Underway">blog rewrite</a>, I started looking more closely at <a href="http://facebook.github.io/react/">React</a>.  It occurred to me that I could do better than <a href="http://facebook.github.io/jsx/">JSX</a>, so I wrote a <a href="https://github.com/rubys/ruby2js">Ruby2JS</a> filter.  Compare for yourself.  Excerpt from the <a href="http://facebook.github.io/react/docs/tutorial.html">React tutorial</a>:</p>
-<pre class="code">var CommentList = React.createClass({
-  render: function() {
-    var commentNodes = this.props.data.map(function (comment) {
-      return (
-        &lt;Comment author={comment.author}&gt;
-          {comment.text}
-        &lt;/Comment&gt;
-      );
-    });
-    return (
-      &lt;div className="commentList"&gt;
-        {commentNodes}
-      &lt;/div&gt;
-    );
-  }
-});</pre>
-<p>Equivalent using the Ruby2JS filter:</p>
-<pre class="code">class CommentList &lt; React
-  def render
-    _div.commentList do
-      @@data.forEach do |comment|
-        _CommentBlock comment.text, author: comment.author
-      end
-    end
-  end
-end</pre>
-<p>Note: I renamed the <code>Comment</code> class to <code>CommentBlock</code> to avoid a conflict with the existing <a href="https://developer.mozilla.org/en-US/docs/Web/API/Comment">Comment</a> API.  I wouldn’t have thought that would be necessary, but things didn’t work until I made this change.</p>
-<p><a href="https://github.com/rubys/ruby2js/blob/master/demo/react-tutorial.rb">Full source</a> for the tutorial reimplemented in Ruby is available.</p></div></content>
-    <updated>2015-01-22T17:54:56-08:00</updated>
-  </entry>
-  <entry>
-    <id>tag:intertwingly.net,2004:3345</id>
-    <link href="/blog/2015/01/17/RFC-3986bis"/>
-    <link rel="replies" href="3345.atom" thr:count="1" thr:updated="2017-05-13T05:51:06-07:00"/>
-    <title>RFC 3986bis</title>
-    <summary type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml">URL parsers consume URLs and generate URIs.  Such URIs are not <a href="http://www.ietf.org/rfc/rfc3986.txt">RFC 3986</a> compliant.  I’d like to fix that.</div></summary>
-    <content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><svg style="float:right" xmlns='http://www.w3.org/2000/svg' width="132" height="76" viewBox="0 0 132 76">
-  <path d='M57,29c0-9,18-12,17-2c0,7-12,10-12,18v5h8v-3c0-7,13-9,13-21c-1-16-34-16-34,3zM62,53h8v8h-8z' fill='#371'></path>
-  <circle cy='38' stroke='#371' fill='none' r='33' stroke-width='10' cx='66'></circle>
-  <path d='M45,17l9,9l-9,10l-9-10zM67,17l9,9l-9,10l-10-10zM88,17l9,9l-9,10l-9-10zM14,28l9,9l-9,9l-9-9zM35,28l9,9l-9,9l-9-9zM56,28l9,9l-9,9l-9-9zM77,28l9,9l-9,9l-9-9zM98,28l9,9l-9,9l-9-9zM119,28l10,9l-10,9l-9-9zM45,39l9,9l-9,9l-9-9zM67,39l9,9l-9,9l-10-9zM88,39l9,9l-9,9l-9-9z' fill='#bdbdc5'></path>
-  <path d='M44,13l9,31l9-31h25v3l-10,16c23,7,2,52-16,21l6-2c11,21,24-16,2-14v-3l9-15h-11l-13,44h-1l-10-31l-9,31h-1l-15-50h7l9,31l6-21l-3-10z' fill='#005A9C'></path>
-  <path stroke='#000' d='M5,36h20l10,10l10-10l11,10l21-21l11,11l10-11l12,11h19v3h-20l-11-11l-10,11l-11-11l-21,21l-11-10l-10,10l-11-10h-19z' fill='#ffd652' stroke-width='0.5'></path>
-  <path d='M88,49c11,24,22,11,26,5l-1-5c-12,20-24,2-25,0M109,21c-8-16-26,0-16,23c-4-23,12-29,17-16l4-8l-1-6'></path>
-  <path d='M2,35h5v5h-5zM127,35h5v5h-5z'></path>
-  <path d='M57,29c0-9,18-12,17-2c0,7-12,10-12,18v5h8v-3c0-7,13-9,13-21c-1-16-34-16-34,3zM62,53h8v8h-8z' fill='#371'></path>
-</svg>
-<p>TL;DR: URL parsers consume URLs and generate URIs.  Such URIs are not <a href="http://www.ietf.org/rfc/rfc3986.txt">RFC 3986</a> compliant.  I’d like to fix that.</p>
-<p> - - -</p>
-<p>Let’s talk a bit about nomenclature.</p>
-<p>On the web, particularly in places like values of attributes named <a href="http://www.w3.org/TR/html5/links.html#links-created-by-a-and-area-elements">href</a>, there are things that people have, at various times, attempted to call <a href="http://en.wikipedia.org/wiki/Uniform_resource_locator">web addresses</a> or <a href="http://www.ietf.org/rfc/rfc3987.txt">IRIs</a>.  Neither term has stuck.  In common uses these are called <a href="https://url.spec.whatwg.org/">URLs</a>.</p>
-<p>In between the markup and servers, there are user agents.  One such user agent is a browser.  Browsers don’t passively send URLs along, they reject some outright, and transform others.  There should be a name for the set of outputs of the various cleanups that browsers perform.</p>
-<p>Since browsers are programmable, you can directly observe this transformation.  The WHATWG URL specification defines an <a href="https://url.spec.whatwg.org/#api">API</a> which has already been implemented by Firefox and Chrome, and is being evaluated by Microsoft and Apple.  Create a JavaScript console and enter the following:</p>
-<pre class="code">new URL("hTtP:/EXamPLe.COM/").href</pre>
-<p>The output you will see is:</p>
-<pre class="code">"http://example.com/"</pre>
-<p>The output is clearly much cleaner and more consistent than the input.  In fact, in this case the output is RFC 3986 compliant.</p>
-<p>Unfortunately, in the general case, this isn’t true.  Browsers (and more generally: other libraries like the ones found in pretty much every modern programming language) can produce things that aren’t RFC 3986 compliant.</p>
-<p>I’m <a href="https://url.spec.whatwg.org/interop/test-results/">looking</a> at every browser and every library I can.  I’m specifically looking for differences.  In some cases, I’m pointing out where such outputs are clearly wrong and need to be fixed.</p>
-<p>In other cases, the output may not be RFC 3986 compliant, but is actually useful and actually works.  What this means in practice is that the set of things that consumers need to be able to correctly process is defined not by RFC 3986 but by these tools.</p>
-<p>People can learn this the hard way by setting out to implement RFC 3986 and then finding that they need to reverse engineer other tools.  We can do better.  We can set out to update RFC 3986 or otherwise document the actual set of inputs that can be expected to be processed interoperably.</p>
-<p>In general, I have found that it isn’t difficult to talk about places where RFC 3986 can be tightened up.  Where there has been push-back is exploring any notion of loosening the definition.  The reaction generally is expressed along the lines of “doing so would break things”.</p>
-<p>I can see how some see such a position as reasonable.  I don’t, and I’ll tell you why.  What is effectively being said is that documenting how things actually work will break things, which is clearly untrue.</p>
-<p>What such an effort will do is not break things, but uncover uncomfortable truths.  To build upon an <a href="http://www.ietf.org/mail-archive/web/apps-discuss/current/msg13827.html">example</a> from Dave Cridland, one such uncomfortable truth may be that the set of things that everybody except LDAP schemas can handle is different from the set of things LDAP schemas can handle.</p>
-<p>There are three ways to handle that.  One would be to change everybody to conform to what LDAP can handle.  One would be to change LDAP.  And one would be to document clearly that the set of things LDAP can handle and the set of things that everybody else expects to be handled are separate sets.  Largely overlapping, yes, but not identical sets.</p>
-<p>Documenting three sets (the set of things Chrome and other browsers support, the set of things HTTP and other protocols support, and the set of things LDAP supports) would not be my first choice, but it may be the only option available given the constraints.</p>
-<p>If you look at those three sets, ideally each would be a proper subset of those that precede it.  That’s not the case at the moment, but as I mentioned, proposals to tighten up RFC 3986 that come with a clear rationale don’t seem to be getting much push-back.</p>
-<p>What we need, then, is three names.  URIs seem to be the obvious choice for the name of the set of “things LDAP schemas support”.  For better or worse, URLs seem to be the name that has stuck for the first set.</p>
-<p>At this point, a number of people seeing an opening suggest IRIs as the name for the set in the middle.  Um, no.  Except for fragments, this set is 100% pure ASCII.  The name for what IRIs attempted to define is URLs.</p>
-<p>So this means that we need to define a new name.  That’s not so bad, is it?  It could be worse, at least we don’t have to define a <a href="http://martinfowler.com/bliki/TwoHardThings.html">cache invalidation</a> strategy.</p></div></content>
-    <updated>2015-01-17T10:55:26-08:00</updated>
-  </entry>
-  <entry>
-    <id>tag:intertwingly.net,2004:3344</id>
-    <link href="/blog/2015/01/11/URL-Work-Status"/>
-    <link rel="replies" href="3344.atom" thr:count="17" thr:updated="2017-05-25T19:31:54-07:00"/>
-    <title>URL Work Status</title>
-    <summary type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I have <a
-href="https://url.spec.whatwg.org/interop/test-results/">test results</a> that
-show that there is much work to be done.</p> <p>The most likely path forward
-at this point is to get representatives from browser vendors into a room and
-go through these results and make recommendations.  This likely will happen in
-the spring, and in the SF Bay Area.  With that in place, I can work with
-authors of libraries in popular programming languages to produce
-web-compatible versions.  This work will take the form of bug reports,
-patches, or — when required — authoring new libraries.</p></div></summary>
-    <content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><svg style="float:right" xmlns='http://www.w3.org/2000/svg' width="132" height="76" viewBox="0 0 132 76">
-  <path d='M57,29c0-9,18-12,17-2c0,7-12,10-12,18v5h8v-3c0-7,13-9,13-21c-1-16-34-16-34,3zM62,53h8v8h-8z' fill='#371'></path>
-  <circle cy='38' stroke='#371' fill='none' r='33' stroke-width='10' cx='66'></circle>
-  <path d='M45,17l9,9l-9,10l-9-10zM67,17l9,9l-9,10l-10-10zM88,17l9,9l-9,10l-9-10zM14,28l9,9l-9,9l-9-9zM35,28l9,9l-9,9l-9-9zM56,28l9,9l-9,9l-9-9zM77,28l9,9l-9,9l-9-9zM98,28l9,9l-9,9l-9-9zM119,28l10,9l-10,9l-9-9zM45,39l9,9l-9,9l-9-9zM67,39l9,9l-9,9l-10-9zM88,39l9,9l-9,9l-9-9z' fill='#bdbdc5'></path>
-  <path d='M44,13l9,31l9-31h25v3l-10,16c23,7,2,52-16,21l6-2c11,21,24-16,2-14v-3l9-15h-11l-13,44h-1l-10-31l-9,31h-1l-15-50h7l9,31l6-21l-3-10z' fill='#005A9C'></path>
-  <path stroke='#000' d='M5,36h20l10,10l10-10l11,10l21-21l11,11l10-11l12,11h19v3h-20l-11-11l-10,11l-11-11l-21,21l-11-10l-10,10l-11-10h-19z' fill='#ffd652' stroke-width='0.5'></path>
-  <path d='M88,49c11,24,22,11,26,5l-1-5c-12,20-24,2-25,0M109,21c-8-16-26,0-16,23c-4-23,12-29,17-16l4-8l-1-6'></path>
-  <path d='M2,35h5v5h-5zM127,35h5v5h-5z'></path>
-  <path d='M57,29c0-9,18-12,17-2c0,7-12,10-12,18v5h8v-3c0-7,13-9,13-21c-1-16-34-16-34,3zM62,53h8v8h-8z' fill='#371'></path>
-</svg>
-<p>I have <a href="https://url.spec.whatwg.org/interop/test-results/">test
-results</a> that show that there is much work to be done.</p>
-<p>The most likely path forward at this point is to get representatives from
-browser vendors into a room and go through these results and make
-recommendations.  This likely will happen in the spring, and in the SF Bay
-Area.  With that in place, I can work with authors of libraries in popular
-programming languages to produce web-compatible versions.  This work will take
-the form of bug reports, patches, or — when required — authoring new
-libraries.</p>
-<p>Status by venue:</p>
-<dl>
-<dt><b>WHATWG</b></dt>
-<dd><p>At the WHATWG, I’m limited only by my own ability to do the work
-required.  My biggest complaint remains that the barrier to entry to
-participate is too high.  This, however, is something entirely under my
-control to fix for the specifications I’m working on.  I’m hopeful that
-leading by example will cause others in the WHATWG to do likewise.</p></dd>
-<dt><b>WebPlatform</b></dt>
-<dd><p>I’ve had <a href="https://github.com/webspecs/url/issues">some success</a>,
-but virtually all of this is attributable to GitHub, not WebPlatform.  At the
-moment, technical issues prevent me from updating the spec there.  These
-issues started on December 24th and were promptly reported.  If this
-continues, I’ll push the webspecs develop branch to a whatwg develop branch
-and <a href="https://github.com/IQAndreas/github-issues-import">migrate the
-issues</a>.</p></dd>
-<dt><b>W3C</b></dt>
-<dd><p>There has been no demonstrable progress in the WebApps WG.  The <a
-href="http://www.w3.org/2001/tag/">TAG</a> seems generally supportive.  I
-briefed the <a href="http://www.w3.org/2002/ab/">AB</a>, but nothing has come
-of that.  Same is <a
-href="http://www.w3.org/community/w3process/track/issues/150">true</a> for the
-process CG.  I’m willing to try proposing a <a
-href="http://rawgit.com/webspecs/url/develop/docs/url-charter.html">new
-working group</a>.  Failing this, I believe that I have all the evidence I
-need to convince the W3C Director that <a
-href="http://www.w3.org/2013/09/normative-references">normative references</a>
-to the Living Standard are the only viable alternative.  As Sherlock Holmes
-was known to say: <em>when you have eliminated the impossible, whatever
-remains, however improbable, must be the truth</em>.</p></dd>
-<dt><b>IETF</b></dt>
-<dd><p>I’ve <a
-href="https://lists.w3.org/Archives/Public/public-whatwg-archive/2014Nov/0000.html">met
-with</a> Area Directors.  I’ve participated on the <a
-href="http://www.ietf.org/mail-archive/web/apps-discuss/current/maillist.html">apps-discuss
-mailing list</a>.  With the help of <a href="http://larry.masinter.net/">Larry
-Masinter</a>, I’ve produced and published a <a
-href="http://xml2rfc.tools.ietf.org/cgi-bin/xml2rfc.cgi?url=https%3A%2F%2Fraw.githubusercontent.com%2Fwebspecs%2Furl%2Fdevelop%2Fdocs%2Furl-problem-statement.xml&amp;modeAsFormat=html%2Fascii">problem
-statement</a>.  Sadly, this seems like a clear case of <em>you can lead a
-horse to water, but you can’t make it drink</em>.  Should this change, I have
-until <a href="http://www.ietf.org/meeting/important-dates-2015.html">February
-5th</a> to propose a BOF.</p></dd>
-</dl>
-<p>More details and links are available in the
-<a href="https://github.com/webspecs/url#the-url-standard">README</a>.</p></div></content>
-    <updated>2015-01-11T06:46:06-08:00</updated>
-  </entry>
-  <entry>
-    <id>tag:intertwingly.net,2004:3343</id>
-    <link href="/blog/2015/01/08/Ununzippable-Modern-IE"/>
-    <link rel="replies" href="3343.atom" thr:count="7" thr:updated="2017-05-12T22:09:12-07:00"/>
-    <title>Ununzippable Modern.IE</title>
-    <summary type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml">I’ve downloaded the multi-part zip archive for IE11 on Win10 for VirtualBox on OS/X from <a href="https://www.modern.ie/">modern.ie</a>.  I’ve downloaded the single-file archive on both OS/X and Linux.  I’ve verified the md5 signatures for each.  Yet each time, when I try to unzip the result, I fail.</div></summary>
-    <content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><svg style="float:right" xmlns="http://www.w3.org/2000/svg" width="100" height="100" viewBox="0 0 100 100">
-  <path d="M57,11c40-22,42-2,35,12c8-27-15-20-30-11z" fill="#47b"></path>
-  <path d="M36,56h56c4-60-83-60-86-6c13-16,26-26,36-30l-29,53c20,23,64,26,79-12h-30c0,14-26,12-25-5zM37,43c0-17,26-17,26,0zM39,89c-10,7-42,15-26-16l29-52c-15,6-36,40-37,48c-12,35,14,37,37,20" fill="#47b"></path>
-</svg>
-<p>I’ve downloaded the multi-part zip archive for IE11 on Win10 for VirtualBox on OS/X from <a href="https://www.modern.ie/">modern.ie</a>.  I’ve downloaded the single-file archive on both OS/X and Linux.  I’ve verified the md5 signatures for each.  Yet each time, when I try to unzip the result, I get the following:</p>
-<pre class="code">$ unzip IE11.Win10.For.LinuxVirtualBox.zip
-Archive:  IE11.Win10.For.LinuxVirtualBox.zip
-warning [IE11.Win10.For.LinuxVirtualBox.zip]:  4294967296 extra bytes at beginning or within zipfile
-  (attempting to process anyway)
-file #1:  bad zipfile offset (local header sig):  4294967296
-  (attempting to re-compensate)
-  inflating: IE11 - Win10.ova
-  error:  invalid compressed data to inflate</pre>
-<p>I’ve even tried <a href="http://serverfault.com/a/434537">jar xf</a> with no luck:</p>
-<pre class="code">$ jar xf IE11.Win10.For.LinuxVirtualBox.zip
-java.util.zip.ZipException: invalid entry size (expected 5632888297048912 but got 4801961472 bytes)
-	at java.util.zip.ZipInputStream.readEnd(ZipInputStream.java:403)
-	at java.util.zip.ZipInputStream.read(ZipInputStream.java:195)
-	at java.util.zip.ZipInputStream.closeEntry(ZipInputStream.java:139)
-	at sun.tools.jar.Main.extractFile(Main.java:961)
-	at sun.tools.jar.Main.extract(Main.java:877)
-	at sun.tools.jar.Main.run(Main.java:263)
-	at sun.tools.jar.Main.main(Main.java:1177)</pre>
-<p>This shows signs of <a href="http://googology.wikia.com/wiki/4294967296">integer overflow</a>, so it seems likely that the problem is client side.  Even so, choosing to make this content available in a format for which there aren’t working client libraries available to unpack it isn’t helpful.</p>
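<p>The 4294967296 figure in the unzip output is exactly 2<sup>32</sup>, which is what makes overflow a plausible explanation.  The Ruby sketch below illustrates an assumed failure mode, not a confirmed diagnosis: it shows how an entry size of 4801961472 bytes (the figure jar reported actually reading) would wrap around if it were stored in a 32-bit unsigned field, such as the size fields of a zip header that lacks the Zip64 extensions.  The helper name <code>truncate_to_u32</code> is hypothetical.</p>

```ruby
# Illustration of an assumed failure mode: a value needing more than
# 32 bits silently wraps when squeezed into a 32-bit unsigned field.
TWO_POW_32 = 2**32  # 4294967296, the "extra bytes" figure unzip reported

# Hypothetical helper: keep only the low 32 bits, as a u32 field would.
def truncate_to_u32(n)
  n % TWO_POW_32
end

actual_size = 4_801_961_472          # entry size jar reported actually reading
puts truncate_to_u32(actual_size)    # prints 506994176
puts actual_size - TWO_POW_32        # prints 506994176 (wrapped exactly once)
```

A reader that trusts the truncated value, or that misjudges offsets by exactly 2<sup>32</sup>, would see symptoms much like the ones above.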
-<p>I’m submitting this link as <a href="https://www.modern.ie/en-us/feedback">feedback</a>.</p></div></content>
-    <updated>2015-01-08T03:55:41-08:00</updated>
-  </entry>
-  <entry>
-    <id>tag:intertwingly.net,2004:3342</id>
-    <link href="/blog/2015/01/06/New-PhantomJS-and-Capybara-fan"/>
-    <link rel="replies" href="3342.atom" thr:count="1" thr:updated="2015-02-24T19:58:33-08:00"/>
-    <title>New PhantomJS and Capybara fan</title>
-    <content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><svg style="float:right" xmlns='http://www.w3.org/2000/svg' width="96" height="104" viewBox="0 0 96 104">
-  <path d='M4,88c4,3,10,1,16,3c4,1,6,11,11,9c5-1,14-6,20-4c6,1,13,12,17,7c4-5,6-16,16-15c10,1,12,1,11-12c-1-7,5-13-3-18c-13-9-34-3-46,5c-14,10-47,20-42,25' fill='#000' fill-opacity='0.23'></path>
-  <path d='M82,43c0,22,9,27,9,37c0,5-12,1-17,4c-4,3-2,7-9,9c-4,2-10-6-17-6c-5,0-14,7-19,4c-4-2-4-9-10-10c-6,0-19,7-19,1c0-7,8-14,8-39c0-23,17-43,37-42c21,0,37,20,37,44' fill='#ccc' fill-opacity='0.63' stroke='#000'></path>
-  <path d='M33,22c-5,0-9,4-9,10c0,6,4,10,9,10c4,0,7-2,9-6c1,4,4,6,8,6c5,0,9-4,9-10c0-6-4-10-9-10c-4,0-7,2-8,6c-2-4-5-6-9-6' fill='#fff' stroke='#000'></path>
-  <circle cx="36" cy="34" r="4"></circle>
-  <circle cx="48" cy="34" r="4"></circle>
-  <path d='M69,15c2,0,9,9,10,19l2,23c1,6,10,22,6,21c-8,0-13-13-12-33c3-21-9-29-6-30M73,82c-2,2-6,11-7,7c-2-4-1-11-2-22c-1-8-5-18-2-16c3,1,5,5,7,13c3,7,6,16,4,18M45,85c-2,2-6,4-9,3c-3,0-3-10-2-17c1-6,4-16,6-17c2-2,0,6,2,18c2,8,5,11,3,13
-M20,79c-2,0-5,0-7,1c-3,1,0-4,4-11c3-6,4-12,5-14c2-2,0,6-1,13c0,7,1,11-1,11' fill-opacity='0.12'></path>
-</svg>
-<p>While I’m clearly late to the party, I’ve already become a huge fan of <a href="http://jnicklas.github.io/capybara/">capybara</a> and <a href="http://phantomjs.org/">phantomjs</a>.  I’m now using both with my <a href="http://intertwingly.net/blog/2014/12/19/Weblog-Software-Rewrite-Underway">previously mentioned</a> <a href="https://github.com/rubys/wicker">blogging software</a> rewrite.</p>
-<p>My original intent was to aggressively prune unnecessary functionality with the goal of producing a more maintainable result, but now that I have automated acceptance tests, this is less of a concern.</p></div></content>
-    <updated>2015-01-06T11:47:40-08:00</updated>
-  </entry>
-  <entry>
-    <id>tag:intertwingly.net,2004:3341</id>
-    <link href="/blog/2015/01/05/Apple-Apostasy"/>
-    <link rel="replies" href="3341.atom" thr:count="8" thr:updated="2015-01-06T16:33:30-08:00"/>
-    <title>Apple Apostasy</title>
-    <content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><svg style="float:right" xmlns='http://www.w3.org/2000/svg' width="90" height="100" viewBox="0 0 90 100">
-  <path d='M62,0c2,10-9,24-20,24c-3-14,9-22,20-24M5,36c5-8,13-12,21-12c7,0,12,4,19,4c6,0,10-4,19-4c6,0,14,3,19,10c-16,4-15,35,3,39c-7,17-18,27-24,27c-7,0-8-5-17-5c-9,0-11,5-17,5c-7-1-13-7-17-13c-9-10-15-40-6-51' fill='#AAA'></path>
-</svg>
-<p>Looks like <a href="http://wozniak.ca/why-i-quit-os-x">Why I quit OS X</a> struck a nerve — it is currently down (see <a href="http://web.archive.org/web/20150105063342/http://wozniak.ca/why-i-quit-os-x">web archive</a>).  Also good: <a href="http://www.marco.org/2015/01/04/apple-lost-functional-high-ground">Apple has lost the functional high ground</a>.</p>
-<p>I particularly like the comment that <em>“It just works” was never completely true</em>.  My experience is that when working with open source codebases, doing so on a Linux operating system comes much closer to “It just works” than doing so on any other.</p></div></content>
-    <updated>2015-01-05T12:09:46-08:00</updated>
-  </entry>
-  <entry>
-    <id>tag:intertwingly.net,2004:3340</id>
-    <link href="/blog/2015/01/03/Rack-broke-Sinatra"/>
-    <link rel="replies" href="3340.atom" thr:count="4" thr:updated="2017-05-25T22:32:58-07:00"/>
-    <title>Rack broke Sinatra</title>
-    <content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><svg style="float:right" xmlns="http://www.w3.org/2000/svg" width="90" height="111" viewBox="0 0 90 111">
-  <g stroke-linejoin="bevel" stroke-linecap="square" fill="none" stroke="#000">
-    <path d="M6,15l30-10l49,11v82v-82l-24,8v83v-83l-55-9v83l56,8l23-8" stroke-width="4"></path>
-    <path d="M6,98l27-9v-13l-26-4l27-8v-20l-27-5l28-8v-11v11l49,10l-24,8l-26-5v20l50,8l-24,9l-27-5v13l51,8" stroke-width="2"></path>
-  </g>
-</svg>
-<p>Not rack’s fault, but Sinatra hasn’t had a release in a while.  The problem has been known since <a href="https://github.com/sinatra/sinatra/pull/907">July</a>, and a fix was merged into master in <a href="https://github.com/sinatra/sinatra/commit/a43ba2c65a79bf58adc1291b0079ea889310e072">August</a>.  One <a href="https://github.com/honeybadger-io/honeybadger-ruby/commit/0e1d652992160fcf1bb3f2e53fbfafdad4d9047d">possible workaround</a> has been posted.  An alternate workaround:</p>
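-<pre class="code">module Rack
-  class ShowExceptions
-    # Keep a handle on the original implementation before overriding it.
-    alias_method :old_pretty, :pretty
-    def pretty(*args)
-      result = old_pretty(*args)
-      # Sinatra treats this return value as an Array; give the returned
-      # String #join and #each so it behaves like a one-element Array body.
-      def result.join; self; end
-      def result.each(&amp;block); block.call(self); end
-      result
-    end
-  end
-end</pre></div></content>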
-    <updated>2015-01-03T17:31:33-08:00</updated>
-  </entry>
-  <entry>
-    <id>tag:intertwingly.net,2004:3339</id>
-    <link href="/blog/2014/12/19/Weblog-Software-Rewrite-Underway"/>
-    <link rel="replies" href="3339.atom" thr:count="3" thr:updated="2017-05-23T01:57:34-07:00"/>
-    <title>Weblog Software Rewrite Underway</title>
-    <summary type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I’ve clearly been neglecting my little spot on the web.</p>
-<p>It has gotten so bad that <a href="https://twitter.com/BrendanEich/status/544975709404282881">Brendan Eich</a> had to link to a web archive copy of a page of mine.  I must say, however, that it is very ironic and amusing that it was <a href="http://www.intertwingly.net/blog/2005/03/15/Dont-Panic">that particular page</a>.  General outline of my current approach:</p></div></summary>
-    <content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><svg style="float:right" xmlns="http://www.w3.org/2000/svg" width="100" height="100" viewBox="0 0 100 100">
-  <defs xmlns:xlink="http://www.w3.org/1999/xlink">
-    <radialGradient id="s1" fx=".4" fy=".2" r=".7">
-      <stop stop-color="#FE8"></stop><stop stop-color="#D70" offset="1"></stop>
-    </radialGradient>
-    <radialGradient id="s2" fx=".8" fy=".5" xlink:href="#s1"></radialGradient>
-    <radialGradient id="s3" fx=".5" fy=".9" xlink:href="#s1"></radialGradient>
-    <radialGradient id="s4" fx=".1" fy=".5" xlink:href="#s1"></radialGradient>
-  </defs>
-  <g stroke="#940">
-    <path d="M73,29c-37-40-62-24-52,4l6-7c-8-16,7-26,42,9z" fill="url(#s1)"></path>
-    <path d="M47,8c33-16,48,21,9,47l-6-5c38-27,20-44,5-37z" fill="url(#s2)"></path>
-    <path d="M77,32c22,30,10,57-39,51l-1-8c3,3,67,5,36-36z" fill="url(#s3)"></path>
-    <path d="M58,84c-4,20-38-4-8-24l-6-5c-36,43,15,56,23,27z" fill="url(#s4)"></path>
-    <path d="M40,14c-40,37-37,52-9,68l1-8c-16-13-29-21,16-56z" fill="url(#s1)"></path>
-    <path d="M31,33c19,23,20,7,35,41l-9,1.7c-4-19-8-14-31-37z" fill="url(#s2)"></path>
-  </g>
-</svg>
-<p>I’ve clearly been neglecting my little spot on the web.</p>
-<p>It has gotten so bad that <a href="https://twitter.com/BrendanEich/status/544975709404282881">Brendan Eich</a> had to link to a web archive copy of a page of mine.  I must say, however, that it is very ironic and amusing that it was <a href="http://www.intertwingly.net/blog/2005/03/15/Dont-Panic">that particular page</a>.  The problem turned out not to be a software problem, but rather a (presumably inadvertent) DoS attack on <a href="http://feedvalidator.org/about.html">feedvalidator.org</a>, causing CGI processes to fail.  Blocking the IP address in question caused the problem to clear up.</p>
-<p>General outline of my current approach:</p>
-<ul>
-<li>My interface to my weblog will no longer be a Python/CGI application on a hosted server.  Instead it will be a Ruby/Sinatra application on my private home server, where keeping things up to date is much easier for me.  That application will produce static HTML, CSS, StyleSheet, and a single feed, all of which will be <a href="http://linux.die.net/man/1/rsync">rsync</a>'ed to the public server.</li>
-<li>The only services exposed will be search and comments.  Comments will initially be disabled, and when they return they will likely be moderated, though I may make the moderation queue publicly visible.</li>
-<li>My current focus is a software update.  The overall look and feel will (at least initially) remain the same.  </li>
-<li>The pages produced will be HTML5, though all pages may not always pass <a href="http://html5doctor.com/html5-check-it-before-you-wreck-it-with-miketm-smith/">validation</a>.  Mike is 100% correct: <em>different people can make different judgment calls</em>.  In particular, I continue to find that explicitly quoting all attributes and explicitly closing all elements both reduces authoring errors and enables a wider variety of user agents to parse the pages correctly.</li>
-<li>I’ll likely drop many features that were popular at one time, but no longer appear to be.  An example of this: <a href="http://openid.net/">OpenID</a>.</li>
-</ul></div></content>
-    <updated>2014-12-19T06:56:29-08:00</updated>
-  </entry>
-  <entry>
-    <id>tag:intertwingly.net,2004:3338</id>
-    <link href="/blog/2014/11/20/WHATWG-W3C-Collaboration"/>
-    <link rel="replies" href="3338.atom" thr:count="6" thr:updated="2017-05-25T23:59:27-07:00"/>
-    <title>WHATWG/W3C Collaboration</title>
-    <summary type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml">I’ve been having fun working on the <a href="https://url.spec.whatwg.org/">URL Living Standard</a>.  All good things must come to an end.  Now it is time to spell out a path forward.</div></summary>
-    <content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><svg style="float:right" xmlns='http://www.w3.org/2000/svg' width="132" height="76" viewBox="0 0 132 76">
-  <path d='M57,29c0-9,18-12,17-2c0,7-12,10-12,18v5h8v-3c0-7,13-9,13-21c-1-16-34-16-34,3zM62,53h8v8h-8z' fill='#371'></path>
-  <circle cy='38' stroke='#371' fill='none' r='33' stroke-width='10' cx='66'></circle>
-  <path d='M45,17l9,9l-9,10l-9-10zM67,17l9,9l-9,10l-10-10zM88,17l9,9l-9,10l-9-10zM14,28l9,9l-9,9l-9-9zM35,28l9,9l-9,9l-9-9zM56,28l9,9l-9,9l-9-9zM77,28l9,9l-9,9l-9-9zM98,28l9,9l-9,9l-9-9zM119,28l10,9l-10,9l-9-9zM45,39l9,9l-9,9l-9-9zM67,39l9,9l-9,9l-10-9zM88,39l9,9l-9,9l-9-9z' fill='#bdbdc5'></path>
-  <path d='M44,13l9,31l9-31h25v3l-10,16c23,7,2,52-16,21l6-2c11,21,24-16,2-14v-3l9-15h-11l-13,44h-1l-10-31l-9,31h-1l-15-50h7l9,31l6-21l-3-10z' fill='#005A9C'></path>
-  <path stroke='#000' d='M5,36h20l10,10l10-10l11,10l21-21l11,11l10-11l12,11h19v3h-20l-11-11l-10,11l-11-11l-21,21l-11-10l-10,10l-11-10h-19z' fill='#ffd652' stroke-width='0.5'></path>
-  <path d='M88,49c11,24,22,11,26,5l-1-5c-12,20-24,2-25,0M109,21c-8-16-26,0-16,23c-4-23,12-29,17-16l4-8l-1-6'></path>
-  <path d='M2,35h5v5h-5zM127,35h5v5h-5z'></path>
-  <path d='M57,29c0-9,18-12,17-2c0,7-12,10-12,18v5h8v-3c0-7,13-9,13-21c-1-16-34-16-34,3zM62,53h8v8h-8z' fill='#371'></path>
-</svg>
-<p>I’ve been having fun working on the <a
-href="https://url.spec.whatwg.org/">URL Living Standard</a>.  The first change
-I landed was to convert the spec from <a
-href="https://wiki.whatwg.org/wiki/Anolis">Anolis</a> to <a
-href="https://github.com/tabatkins/bikeshed#readme">Bikeshed</a>.  Here’s the
-<a href="https://rawgit.com/whatwg/url/8be4726f53/url.html">before</a> and
-<a href="https://rawgit.com/whatwg/url/bd3f0ce38f/url.html">after</a>.
-And just for fun, here is <a
-href="https://rawgit.com/whatwg/url/232157a284/url.html">the beginning of
-2014</a> and <a
-href="https://rawgit.com/whatwg/url/bdaff0591b/url.html">the beginning of 2013</a>.
-The point being that arbitrary snapshots of living standards do exist.</p>
-<p>Along the way, I’ve been named by my employer’s AC member to be a member of
-the <a href="http://www.w3.org/2008/webapps/">W3C WebApps Working Group</a>,
-and invited to become a member of the <a href="https://whatwg.org/">WHATWG</a>
-organization on GitHub.
-I’ve been named as co-editor of the spec in both organizations, and at that
-point the fun abruptly stopped.  Apparently, the larger political issues that I
-had successfully avoided in the past moved front and center.</p>
-<p>Here’s what I <a
-href="http://intertwingly.net/blog/2014/09/16/The-URL-Mess">said in
-September</a>:</p>
-<blockquote>
-  <p>While I am optimistic that at some point in the future the W3C will
-  feel comfortable referencing stable and consensus driven specifications
-  produced by the WHATWG, it is likely that some changes will be required to
-  one or both organizations for this to occur; meanwhile I encourage the W3C
-  to continue on the path of standardizing a snapshot version of the WHATWG
-  URL specification, and for HTML5 to reference the W3C version of the
-  specification.</p>
-</blockquote>
-<p>Now it is time for me to spell out how I see that happening.</p>
-<p>I’ll start out by saying that I continue to want the WebApps WG to follow
-through on its <a
-href="http://www.w3.org/2014/06/webapps-charter.html#deliverables">charter
-obligation</a> to continue to publish updates to the <a
-href="http://www.w3.org/TR/url/">URL Working Draft</a>.  And once updates
-resume, I want to work on making doing so entirely unnecessary.  While this may
-sound puzzling, there is a method to my madness.  I want to establish an
-environment where an open discussion of this matter can be held without anybody
-feeling that there are options that are closed to them or that there is a gun
-to their head.</p>
-<p>Next I’ll state an observable fact: there exist people who value the output
-of the <a href="http://www.w3.org/2014/Process-20140801/">W3C process</a>.  The
-fact that there are people who don’t doesn’t make the first set of people go
-away or become any less important.  Note that I said the output of the W3C
-process.  People who value that don’t necessarily (or even generally) want to
-observe or participate in the making of the sausage.</p>
-<p>What they value instead is <a
-href="http://lists.w3.org/Archives/Public/public-html-admin/2014Nov/0000.html">regular
-releases and making the bleeding edge publicly available</a>.  And for
-releases, what they care most about are the items that are covered during a
-W3C Transition (<a
-href="http://www.w3.org/html/wg/cr/html5/transition-request.html">example</a>).
-In particular, they are interested in evidence of wide review, evidence that
-issues have been addressed, evidence that there are implementations, and the
-IPR commitments that are captured along the way.</p>
-<p>Some have argued (and still argue) that these needs can be met in other ways.  Not
-everybody is convinced of this.  I’m not convinced.  In particular, the
-existence of a bugzilla database with numerous bugs closed as WORKS4ME
-without explanation doesn’t satisfy me.</p>
-<p>To date, those needs have intentionally not been met by the WHATWG.  And
-an uneasy arrangement has been created where specs have been republished at
-the W3C with additional editors listed, in many cases in name only.  Those
-copies were then shepherded through the W3C process.  Many are not happy
-with this process.  I personally can live with it, but I’d rather not.</p>
-<p>I said that this will require changes by one or both organizations.  I
-will now say that I expect this to require cooperation and changes by both.
-I’ll start by describing the changes I feel are needed by the WHATWG, of
-which there are three.</p>
-<ol>
-  <li>
-    <p>Agree to the production of planned snapshots.  And by that I mean
-    byte-for-byte copies.  As a part of this that would mean the
-    identification of "items at risk" at early stages of the process, and
-    the potential removal of these items later in the process.  These
-    snapshots will need to meet the needs of the W3C, primarily <a
-    href="http://www.w3.org/2005/07/pubrules">pubrules</a>,
-    and only linking to W3C-approved references.  Even though it should
-    go without saying, <a
-    href="https://whatwg.org/specs/url/2014-07-30/">apparently it needs to
-    be said</a>: those specs need to be snark free.  Finally I'll go further
-    and suggest that those snapshots be hosted by the W3C, much in the way
-    that the W3C hosts WHATWG's bugzilla database and mailing list
-    archives.</p>
-  </li>
-  <li>
-    <p>Participation in the production of <a
-    href="http://www.w3.org/2005/08/01-transitions.html#transreq">Transition
-    Requests</a>.  That would involve providing evidence of wide review and
-    evidence that issues are addressed.  It also could include, but doesn't
-    necessarily require, direct participation in the transition calls.
-    </p>
-  </li>
-  <li>
-    <p>Understanding and internalizing the notion that the combination of an
-    open license coupled with being unwilling or unable to address a
-    perceived need by others is a valid reason for a fork.  Yes, I know that
-    the W3C hasn't adopted an open license themselves, and I believe that is
-    wrong too.  But that doesn't change the fact that an open license plus
-    an unmet need is sufficient justification for a fork.</p>
-  </li>
-</ol>
-<p>I’ll close my discussion on the WHATWG changes I envision with a statement
-that participation in the W3C process (to the extent described by #1 and #2
-above) is optional and will likely be done on a spec-by-spec basis.  Editors of
-some WHATWG specs may choose not to participate in this process, and that’s
-OK; I simply ask that those who don’t participate recognize the implications
-of this choice (specifically #3 above).</p>
-<p>Responsibility for advancing specs for which the WHATWG editors
-voluntarily elect to participate in the process would fall to a sponsoring
-W3C Working Group.  Starting to sponsor, ceasing to sponsor, and forking a
-spec would require explicit W3C Working Group decisions.  As a general rule,
-Working Groups should only consider sponsoring focused, modular
-specifications.</p>
-<p>Here’s what sponsoring would (and most importantly, would <em>not</em>)
-involve:</p>
-<ol>
-  <li>
-    <p>
-      No editing.  As suggested above, snapshots produced by the WHATWG
-      would be archived, but these archives would be byte-for-byte copies,
-      apart from the changes involved in archiving itself (example: updating
-      stylesheet links to point to captured snapshots of stylesheets).  The one
-      possible exception to this would be in the updating of normative
-      references, but this would only be done with the concurrence of the
-      WHATWG editors.
-    </p>
-  </li>
-  <li>
-    <p>Participation would be limited to the production of <a
-    href="http://www.w3.org/2005/08/01-transitions.html#transreq">Transition
-    Requests</a>.  This would include providing evidence of <a
-    href="http://www.w3.org/2014/Process-20140801/#wide-review">wide
-    review</a>, evidence that issues are <a
-    href="http://www.w3.org/2014/Process-20140801/#formal-address">formally
-    addressed</a>, <a
-    href="http://www.w3.org/2014/Process-20140801/#WGArchiveMinorityViews">recording
-    and reporting of Formal Objections</a>, collecting patent <a
-    href="http://www.w3.org/Consortium/Patent-Policy-20040205/#sec-Disclosure">disclosures</a>,
-    etc.</p>
-  </li>
-</ol>
-<p>That’s it.  Of course, the process will remain the same for documents that
-are copied and shepherded instead, but I see no reason that <a
-href="http://www.w3.org/2008/webapps/">WebApps WG</a> couldn't sponsor the
-WHATWG <a href="https://url.spec.whatwg.org/">URL standard</a> through this
-process, the <a href="http://www.w3.org/html/wg/">HTML WG</a> couldn't do the
-same for the <a href="https://dom.spec.whatwg.org/">DOM standard</a>, the <a
-href="http://www.w3.org/International/core/">I18N WG</a> couldn't do the same
-for the <a href="https://encoding.spec.whatwg.org/">Encoding standard</a>,
-etc.</p>
-<p>While everybody may come into a sponsorship collaboration with the best
-intentions, we need to realize that things may not always go as planned.
-There may be disagreements.  It has been known to happen.  When such
-occurs:</p>
-<ol>
-<li><p>Everyone involved should work very hard to resolve the dispute as
-the consequence of breakage is very bad all around.</p></li>
-<li><p>If no agreement can be reached, the W3C Working Group will likely
-stop the sponsorship of the specific spec involved in the dispute.</p></li>
-<li><p>If a Working Group stops sponsoring a spec, the Working Group could
-still fork that spec - but that would be a suboptimal solution for both W3C and
-WHATWG.  It would also re-inflame the debates between organizations.</p></li>
-<li><p>Nonetheless, since each organization has different criteria, we must
-recognize that this could happen; especially for large, broad, complex
-specs.  Accordingly it makes sense for both organizations to continue the
-trend towards smaller and more modular specifications</p></li>
-</ol>
-<p>I have no idea if others are willing to go along with this, but I hope
-that this concrete proposal helps anchor this discussion.  I invite others
-that are inclined to do so to suggest revisions or to create proposals of
-their own.  As an example, since the above describes an environment of
-collaboration and sharing of work, perhaps co-branding may be worth
-exploring?</p>
-<p>This clearly will take time.  As an editor of the URL specification, I’d
-like to propose that it be the first test of this proposal.  In the
-meanwhile, I plan to spend my time coding.</p>
-<p>For those that wish to dig further, a few links:</p>
-<small><ul>
-  <li>
-    <a href="http://www.w3.org/blog/2014/10/decision-by-consensus-or-by-informed-editor-which-is-better/">http://www.w3.org/blog/2014/10/decision-by-consensus-or-by-informed-editor-which-is-better/</a>
-  </li>
-  <li>
-    <a href="http://lists.w3.org/Archives/Public/www-archive/2014Nov/0023.html">http://lists.w3.org/Archives/Public/www-archive/2014Nov/0023.html</a>
-  </li>
-  <li>
-    <a href="http://lists.w3.org/Archives/Public/public-webapps/2014OctDec/0437.html">http://lists.w3.org/Archives/Public/public-webapps/2014OctDec/0437.html</a>
-  </li>
-  <li>
-    <a href="https://url.spec.whatwg.org/#acknowledgments">https://url.spec.whatwg.org/#acknowledgments</a>
-  </li>
-  <li>
-    <a href="http://lists.w3.org/Archives/Public/public-webapps/2014JulSep/0492.html">http://lists.w3.org/Archives/Public/public-webapps/2014JulSep/0492.html</a>
-  </li>
-  <li>
-    <a href="http://lists.w3.org/Archives/Public/public-whatwg-archive/2014Nov/0000.html">http://lists.w3.org/Archives/Public/public-whatwg-archive/2014Nov/0000.html</a>
-  </li>
-  <li>
-    <a href="http://lists.w3.org/Archives/Public/public-webapps/2014OctDec/0315.html">http://lists.w3.org/Archives/Public/public-webapps/2014OctDec/0315.html</a>
-  </li>
-  <li>
-    <a href="http://lists.w3.org/Archives/Public/public-html-admin/2014Nov/0036.html">http://lists.w3.org/Archives/Public/public-html-admin/2014Nov/0036.html</a>
-  </li>
-  <li>
-    <a href="http://intertwingly.net/blog/2014/10/02/WHATWG-URL-vs-IETF-URI">http://intertwingly.net/blog/2014/10/02/WHATWG-URL-vs-IETF-URI</a>
-  </li>
-</ul></small></div></content>
-    <updated>2014-11-20T08:55:43-08:00</updated>
-  </entry>
-  <entry>
-    <id>tag:intertwingly.net,2004:3337</id>
-    <link href="/blog/2014/10/21/pegurl-js"/>
-    <link rel="replies" href="3337.atom" thr:count="3" thr:updated="2014-11-03T11:38:30-08:00"/>
-    <title>pegurl.js</title>
-    <summary type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://intertwingly.net/projects/pegurl/">pegurl.js</a> is the result of two days worth of work.  While it is undoubtedly buggy and incomplete, it does pass 255 out of <a href="https://raw.githubusercontent.com/w3c/web-platform-tests/master/url/urltestdata.txt">256 tests</a> and that <a href="http://krijnhoetmer.nl/irc-logs/whatwg/20141021#l-399">last test is wrong</a>.  For comparison: <a href="http://intertwingly.net/stories/2014/10/16/urltest-results/">results from other user agents</a>.</p>
-<p>Current work products and future work</p></div></summary>
-    <content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><svg style="float:right" xmlns="http://www.w3.org/2000/svg" width="100" height="100" viewBox="0 0 100 100">
-  <path d="M38,38c0-12,24-15,23-2c0,9-16,13-16,23v7h11v-4c0-9,17-12,17-27c-2-22-45-22-45,3zM45,70h11v11h-11z" fill="#371"></path>
-  <circle cx="50" cy="50" r="45" fill="none" stroke="#371" stroke-width="10"></circle>
-</svg>
-<p><a href="http://intertwingly.net/projects/pegurl/">pegurl.js</a> is the result of two days worth of work.  While it is undoubtedly buggy and incomplete, it does pass 255 out of <a href="https://raw.githubusercontent.com/w3c/web-platform-tests/master/url/urltestdata.txt">256 tests</a> and that <a href="http://krijnhoetmer.nl/irc-logs/whatwg/20141021#l-399">last test is wrong</a>.  For comparison: <a href="http://intertwingly.net/stories/2014/10/16/urltest-results/">results from other user agents</a>.</p>
-<p>Current work products:</p>
-<ul>
-<li>Source: <a href="http://intertwingly.net/projects/pegurl/url.js">API</a>, <a href="http://intertwingly.net/projects/pegurl/url.pegjs">grammar</a>; the latter based on <a href="http://pegjs.majda.cz/">PEG.js</a></li>
-<li><a href="http://intertwingly.net/projects/pegurl/liveview.html">LiveViewer</a>.  Differences mean that either or both of the following are true: (a) pegurl.js doesn’t match the Url Standard or (b) the Url Standard doesn’t match your browser.</li>
-<li><a href="http://intertwingly.net/stories/2014/10/20/Url.xhtml">Grammar expressed in the form of railroad diagrams</a>.  Produced using <a href="https://twitter.com/peg_js/status/329493915881320448">Gunther Rademacher’s converter</a>.</li>
-</ul>
-<p>Future work:</p>
-<ul>
-<li>The implementation is incomplete, in particular, much of the character encoding logic and IP address parsing is just roughed id at this point.</li>
-<li>I’d like to propose a number of changes to the test results; mostly to more closely match existing browser behavior, and perhaps where possible to make the implementation logic less convoluted.  Meanwhile, I felt that it was important to have a faithful baseline implemented so that I could experiment with changes and see if there were any unintended consequences to those changes.</li>
-<li>More tests!  There’s no such thing as too many tests.</li>
-<li><a href="https://www.w3.org/Bugs/Public/show_bug.cgi?id=25946">Rewrite URL parser</a>.  I suspect that the railroad diagrams (converted to <a href="https://github.com/tabatkins/bikeshed">bikeshed</a>?) plus the parts of the grammar contained in curly braces expressed in prose would be more comprehensible and maintainable than the current state machine approach.</li>
-</ul></div></content>
-    <updated>2014-10-21T08:17:36-07:00</updated>
-  </entry>
-</feed>
----
-feed.format:    atom
-feed.title:     Sam Ruby
-feed.subtitle:  It’s just data
-feed.url:       /blog/
-feed.feed_url:  http://intertwingly.net/blog/index.atom
-feed.updated:   >>> DateTime.new( 2017, 5, 26, 3, 36, 44, '-7')
-feed.items[0].title:    Badges? We don't need no stinkin' badges!
-feed.items[0].url:      /blog/2017/04/07/Badges-We-dont-need-no-stinkin-badges
-feed.items[0].guid:     tag:intertwingly.net,2004:3356
-feed.items[0].updated:  >>> DateTime.new( 2017, 4, 7, 5, 7, 22, '-7' )