rakali 0.1.2 → 0.1.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: fbcaba4904aa189e1f6d70fb4b4c1f988c1d79bc
- data.tar.gz: 2d606db038c01b720c0842efaa821f90ca0893a5
+ metadata.gz: 6229934eddc63689dd10fc63358f666ba58142e6
+ data.tar.gz: 1da7b0b6f1ab42b40fb6b4383f087875dc2f6b4e
  SHA512:
- metadata.gz: a81348d6aedbc7680d92e752d2eb594265d401def51cb9d810f17f8fa2b7f52ccb0d1e2e3531123ea7e4a161831aaf1343a5f120263b7c5912b20c5fc4936277
- data.tar.gz: 75cff22509e12081aab80da77f6a72b77ffe4bb10e017795a6a960fb0de473584093fa925f8dce997067f096911b74a04a3fef6b3d14351c7cf71531afd91ccb
+ metadata.gz: f3e7dc039b3f3b5ba71dc636f94015b200d448dba911704189a8cc8764bae43f775e1ef9c7f7ea892a32eab896202f1041b6124d2745e08a37704d1141cf51cb
+ data.tar.gz: 20c6bf3092e95f428846792bdea1d44b3e884a7415663e9be50625a7dfa1b092acf05a57648b273fd7a1910d6c6cbf268dfc0710e185d61c1c62fcf478ef8705
@@ -1,11 +1,15 @@
  from:
  folder: examples
- format: docx
+ format: md
  to:
+ folder: examples
  format: html
+ schema: title_block.json
  merge: false
+ strict: true
  options:
  latex-engine: xelatex
  variables:
- mainfont: 'Minion Pro'
- sansfont: 'Myriad Pro'
+ documentclass: article
+ filters:
+ - behead2.hs
@@ -1,7 +1,7 @@
  PATH
  remote: .
  specs:
- rakali (0.1.2)
+ rakali (0.1.3)
  colorator (~> 0.1)
  json-schema (~> 2.2)
  safe_yaml (~> 1.0)
data/README.md CHANGED
@@ -29,7 +29,7 @@ rakali convert .rakali.yml

  The default configuration looks like this:

- ```
+ ```yaml
  from:
  folder:
  format: md
@@ -52,6 +52,27 @@ Validation against **JSON Schema** also works directly with Pandoc, generate a J
 
  To integrate rakali into a continuous integration environment such as [Travis CI](https://travis-ci.org), add a configuration file (e.g. `.rakali.yml`) into the root folder of your repo, install Pandoc and the rakali gem and run `rakali convert .rakali.yml`. Look at `.travis.yml`, `.rakali.yml` and the `examples` folder in this repo for a working example.
 
+ ## Options and variables
+
+ Include Pandoc [options and variables](http://johnmacfarlane.net/pandoc/README.html) in the yaml input file:
+
+ ```yaml
+ options:
+ latex-engine: xelatex
+ variables:
+ documentclass: article
+ ```
+
+ ## Filters
+
+ Rakali can use Pandoc [filters](http://johnmacfarlane.net/pandoc/scripting.html) and uses the same conventions: filters can be written in any language as long as the files are executable and use Pandoc JSON as input and output format. Rakali includes the `behead2.hs` example from the Pandoc documentation; for filters that are not part of rakali, include the path. Filters are processed in the order they are listed.
+
+ ```yaml
+ filters:
+ - behead2.hs
+ - your_folder/caps.py
+ ```
+
  ## Feedback
 
  This is an early release version. Please provide feedback via the [issue tracker](https://github.com/rakali/rakali.rb/issues).
@@ -64,5 +85,8 @@ This is an early release version. Please provide feedback via the [issue tracker
  4. Push to the branch (`git push origin my-new-feature`)
  5. Create new Pull Request

+ ## Release notes
+ Release notes are [here](releases.md).
+
  ## License
  [MIT License](LICENSE).
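As a note on the new Filters section above: below is a minimal sketch of what an executable Pandoc JSON filter could look like in Ruby. The file name `caps.rb` and the uppercasing behaviour are illustrative (modelled on the pandocfilters `caps.py` example referenced by the removed `filters/caps.rb`), not code shipped with rakali, and it assumes the pre-2.0 Pandoc JSON layout (`[meta, blocks]`, elements as `{"t" => ..., "c" => ...}`) that rakali drives via pandoc.

```ruby
#!/usr/bin/env ruby
# caps.rb - hypothetical Pandoc JSON filter sketch (not part of rakali).
# Uppercases regular text; code spans, link URLs etc. are left untouched.
require 'json'

# Recursively walk the Pandoc AST and apply the block to every element hash.
def walk(node, &action)
  case node
  when Array
    node.map { |child| walk(child, &action) }
  when Hash
    replacement = action.call(node)
    node = replacement if replacement
    node.each_with_object({}) { |(key, value), out| out[key] = walk(value, &action) }
  else
    node
  end
end

doc = JSON.parse(STDIN.read)
transformed = walk(doc) do |element|
  # Str elements carry their text in "c"; return a replacement hash for them.
  { 't' => 'Str', 'c' => element['c'].upcase } if element['t'] == 'Str'
end
puts JSON.generate(transformed)
```

Made executable and listed under `filters:` in the config, a file like this would be handed to pandoc as a `--filter` argument.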
data/Rakefile CHANGED
@@ -20,7 +20,7 @@ namespace :repo do
  # Commit and push to github
  sh "git add --all ."
  sh "git commit -m 'Committing converted files.'"
- sh "git push https://${GH_TOKEN}@github.com/rakali/rakali.rb master --quiet"
+ sh "git push https://${GH_TOKEN}@github.com/rakali/rakali.rb master"
  puts "Pushed converted files to repo"
  end
  end
@@ -0,0 +1,41 @@
+ <p>Authoring of scholarly articles is a recurring theme in this blog since it started in 2008. Authoring is still in desperate need for improvement, and nobody has convincingly figured out how to solve this problem.<!--more--> Authoring involves several steps, and it helps to think about them separately:</p>
+ <ul>
+ <li><strong>Writing</strong>. Manuscript writing, including formatting, collaborative authoring</li>
+ <li><strong>Submission</strong>. Formatting a manuscript according to a publisher's author guidelines, and handing it over to a publishing platform</li>
+ <li><strong>Revision</strong>. Changes made to a manuscript in the peer review process, or after publication</li>
+ </ul>
+ <p>Although authoring typically involves text, similar issues arise for other research outputs, e.g research data. And these considerations are also relevant for other forms of publishing, whether it is self-publication on a blog or website, or publishing of preprints and white papers.</p>
+ <div class="figure">
+ <img src="images/grammar.jpg" alt="Flickr photo by citnaj." /><p class="caption">Flickr photo by <a href="http://www.flickr.com/photos/citnaj/1278021067/">citnaj</a>.</p>
+ </div>
+ <p>For me the main challenge in authoring is to go from human-readable unstructured content to highly structured machine-readable content. We could make authoring simpler by either forgoing any structure and just publishing in any format we want, or we can force authors to structure their manuscripts according to a very specific set of rules. The former doesn't seem to be an option, not only do we have a set of community standards that have evolved for a very long time (research articles for example have title, authors, results, references, etc.), but it also makes it hard to find and reuse scholarly research by others.</p>
+ <p>The latter option is also not really viable since most researchers haven't learned to produce their research outputs in machine-readable highly standardized formats. There are some exceptions, e.g. <a href="http://www.consort-statement.org/">CONSORT</a> and other reporting standards in clinical medicine or the <a href="http://blogs.ch.cam.ac.uk/pmr/2012/01/23/brian-mcmahon-publishing-semantic-crystallography-every-science-data-publisher-should-watch-this-all-the-way-through/">semantic publishing in Crystallography</a>, but for the most part research outputs are to diverse to easily find a format that works for all of them. The current trend is certainly towards machine-readable rather than towards human-readable, but there is still a significant gap - scholarly articles are transformed from documents in Microsoft Word (or sometimes LaTeX) format into XML (for most biomedical research that means <a href="http://jats.nlm.nih.gov/publishing/">JATS</a>) using kludgy tools and lots of manual labor.</p>
+ <p>What solutions have been tried to overcome the limitations of our current authoring tools, and to make the process more enjoyable for authors and more productive for publishers?</p>
+ <ol style="list-style-type: decimal">
+ <li>Do the conversion manually, still a common workflow.</li>
+ <li>Tools for publishers such as <a href="http://blogs.plos.org/mfenner/2009/05/01/extyles_interview_with_elizabeth_blake_and_bruce_rosenblum/">eXtyles</a>, <a href="http://www.shabash.net/merops/">Merops</a> - both commercial - or the evolving Open Source <a href="http://www.lib.umich.edu/mpach/modules">mPach</a> that convert Microsoft Word documents into JATS XML and do a lot of automated checks along the way.</li>
+ <li>Tools for authors that directly generate JATS XML, either as a Microsoft Word plugin (the <a href="http://blogs.nature.com/mfenner/2008/11/07/interview-with-pablo-fernicola">Article Authoring Add-In</a>, not actively maintained) in the browser (e.g. <a href="http://blogs.plos.org/mfenner/2009/02/27/lemon8_xml_interview_with_mj_suhonos/">Lemon8-XML</a>, not actively maintained), or directly in a publishing platform such as Wordpress (<a href="http://annotum.org/">Annotum</a>).</li>
+ <li>Forget about XML and use HTML5 has the canocical file format, e.g. as <a href="http://blogs.plos.org/mfenner/2011/03/19/a-very-brief-history-of-scholarly-html/">Scholarly HTML</a> or HTML5 specifications such as <a href="https://github.com/oreillymedia/HTMLBook/blob/master/specification.asciidoc">HTMLBook</a>. Please read Molly Sharp's <a href="http://blogs.plos.org/tech/structured-documents-for-science-jats-xml-as-canonical-content-format/">blog post</a> for background information about HTML as an alternative to XML.</li>
+ <li>Use file formats for authoring that are a better fit for the requirements of scholarly authors, in particular <a href="http://blog.martinfenner.org/2012/12/13/a-call-for-scholarly-markdown/">Scholarly Markdown</a>.</li>
+ <li>Build online editors for scientific content that hide the underlying file format, and guide users towards a structured format, e.g. by not allowing input that doesn't conform to specifications.</li>
+ </ol>
+ <p><strong>Solution 1.</strong> isn't really an option, as it makes scholarly publishing unnecessarily slow and expensive. Typesetter Kaveh Bazergan has gone on record at the <a href="http://www.nature.com/spoton/2012/11/spoton-london-2012-a-global-conference/">SpotOn London Conference 2012</a> by saying that the current process is insane and that he wants to be &quot;put out of business&quot;.</p>
+ <p><strong>Solution 2.</strong> is probably the most commonly used workflow used by larger publishers today, but is very much centered around a Microsoft Word to XML workflow. LaTeX is a popular authoring environment in some disciplines, but still requires work to convert documents into web-friendly formats such as HTML and XML.</p>
+ <p><strong>Solutions 3. to 5.</strong> have never picked up any significant traction. Overall the progress in this area has been modest at best, and the mainstream of authoring today isn't too different from 20 years ago. Although I have gone on record for saying that <a href="/tags.html#markdown-ref">Scholarly Markdown</a> has a lot of potential, the problem is much bigger than finding a single file format, and markdown will never be the solution for all authoring needs.</p>
+ <p><strong>Solution 6.</strong> is an area where a lot of exciting development is currently happening, examples include <a href="https://www.authorea.com/">Authorea</a>, <a href="https://www.writelatex.com/">WriteLateX</a>, <a href="https://www.sharelatex.com/">ShareLaTeX</a>. Although the future of scholarly authoring will certainly include online authoring tools (making it much easier to collaborate, one of the authoring pain points), we run the risk of locking in users into one particular authoring environment.</p>
+ <p><em>Going Forward</em></p>
+ <p>How can we move forward? I would suggest the following:</p>
+ <ol style="list-style-type: decimal">
+ <li>Publishers should accept manuscripts in any reasonable file format, which means at least Microsoft Word, Open Office, LaTeX, Markdown, HTML and PDF, but possibly more. This will create a lot of extra work for publishers, but will open the doors for innovation, both in the academic and commercial sector. We will never see significant progress in scholarly authoring tools if the submission step requires manuscripts to be in a single file format (Microsoft Word) - in particular since this file format is a general purpose word processsing format and not something designed specifically for scholarly content. And we want researchers to spend their time doing research and writing up their research, not formatting documents.</li>
+ <li>To handle this avalanche of unstructured documents, publishers need conversion tools that can transform all these documents into a format that can feed into their editorial and publishing workflows. A limited number of these tools exist already, but this will require a significant development effort. Again, opening up submissions to a variety of file formats will not only foster innovation in authoring tools, but also in document conversion tools.</li>
+ <li>We should think beyond XML. Many of the workflows designed today center around conversions from one XML format to another, e.g. Microsoft Word to JATS or <a href="http://www.tei-c.org/index.xml">TEI</a> (popular in the humanities), often using XLST transforms. Not only is XML difficult for humans to read or edit, but the web and many of the technologies built around it are moving away from XML towards HTML5 and JSON. XML is fine as an important output format for publishing, but maybe not the best format to hold everything together.</li>
+ <li>As we haven't come up with a canoical file format for scholarly documents by now, we should give up that idea. XML is great for publisher workflows, but is not something humans can easily edit or read. PDF is still the most widely read format by humans, but is not a good intermediary format. LaTeX is too complex for authors outside of mathematics, physics and related fields, and is not built with web standards in mind. Markdown is promising, but doesn't easily support highly structured content. And HTML5 and the related ePub are widely popular, but can be hard to edit without a visual editor, and currently don't include enough standard metadata to support scholarly content out of the box.</li>
+ <li>The focus should not be on canonical file formats for scholarly documents, but on tools that understand the manuscripts created by researchers and can transform them into something more structured. As we have learned from document conversion tools such as <a href="http://johnmacfarlane.net/pandoc/">Pandoc</a>, we can't do this with a simple find and replace using regular expressions, but need a more structured approach. Pandoc is taking the input document (markdown, LaTeX or HTML) apart and is constructing an abstract syntax tree (<a href="http://en.wikipedia.org/wiki/Abstract_syntax_tree">AST</a>) of the document, using parsing expression grammar (<a href="http://en.wikipedia.org/wiki/Parsing_expression_grammar">PEG</a>), which includes a set of parsing rules. Parsing expression grammars are fairly new, <a href="http://bford.info/pub/lang/peg">first described by Bryan Ford</a> about 10 years ago, but in my mind are a very good fit for the formal grammar of scientific documents. It should be fairly straightforward to generate a variety of output formats from the AST (Pandoc can convert into more than 30 document formats), the hard part is the parsing of the input.</li>
+ </ol>
+ <p>All this requires a lot of work. Pandoc is a good model to start, but is written in Haskell, a functional programming language that not many people are familar with. For small changes Pandoc allows you to directly manipulate the AST (represented as JSON) using <a href="http://johnmacfarlane.net/pandoc/scripting.html">filters</a> written in Haskell or Python. And <a href="https://github.com/jgm/pandoc">custom writers</a> for other document formats can be written using <a href="http://www.lua.org/">Lua</a>, another interesting programming language that not many people know about. Lua is a fast and relatively easy to learn scripting language that can be easily embedded into other languages, and for similar reasons is also used to <a href="http://en.wikipedia.org/wiki/Wikipedia:Lua">extend the functionality of Wikipedia</a>. PEG parsers in other languages include <a href="http://treetop.rubyforge.org/">Treetop</a> (Ruby), <a href="http://pegjs.majda.cz/">PEG.js</a> (Javascript), and <a href="http://www.antlr.org/">ANTLR</a>, a popular parser generator that also includes PEG features.</p>
+ <p>But I think the effort to build a solid open source conversion tool for scholarly documents is worth it, in particular for smaller publishers and publishing platforms who can't afford the commercial Microsoft Word to JATS conversion tools. We shouldn't take any shortcuts - e.g. by focussing on XML and XLST transforms - and we can improve this tool over time, e.g. by starting with a few input and output formats. This tool will be valuable beyond authoring, as it can also be very helpful to convert published scholarly content into other formats such as ePub, and in text mining, which in many ways tries to solve many of the same problems. The <a href="http://johnmacfarlane.net/pandoc/scripting.html">Pandoc documentation</a> includes an example of extracting all URLs out of a document, and this can be modified to extract other content. In case you wonder whether I gave up on the idea of <a href="/tags.html#markdown-ref">Scholarly Markdown</a> - not at all. To me this is a logical next step, opening up journal submission systems to Scholarly Markdown and other evolving file formats. And Pandoc, one of the most interesting tools in this space, is a markdown conversion tool at its heart. The next steps could be the following:</p>
+ <ul>
+ <li>write a custom writer in Lua that generates JATS output from Pandoc</li>
+ <li>explore how difficult it would be to add Microsoft Word .docx as Pandoc input format</li>
+ <li>develop Pandoc filters relevant for scholarly documents (e.g. <a href="/2013/07/02/auto-generating-links-to-data-and-resources/">auto-linking accession numbers of biomedical databases</a>)</li>
+ </ul>
@@ -1,6 +1,8 @@
  ---
  layout: post
  title: "The Grammar of Scholarly Communication"
+ author: Martin Fenner
+ date: November 17, 2013
  tags: [markdown, authoring]
  ---

@@ -12,7 +14,7 @@ Authoring of scholarly articles is a recurring theme in this blog since it start
 
  Although authoring typically involves text, similar issues arise for other research outputs, e.g research data. And these considerations are also relevant for other forms of publishing, whether it is self-publication on a blog or website, or publishing of preprints and white papers.
 
- ![Flickr photo by [citnaj](http://www.flickr.com/photos/citnaj/1278021067/).](/images/grammar.jpg)
+ ![Flickr photo by [citnaj](http://www.flickr.com/photos/citnaj/1278021067/).](images/grammar.jpg)
 
  For me the main challenge in authoring is to go from human-readable unstructured content to highly structured machine-readable content. We could make authoring simpler by either forgoing any structure and just publishing in any format we want, or we can force authors to structure their manuscripts according to a very specific set of rules. The former doesn't seem to be an option, not only do we have a set of community standards that have evolved for a very long time (research articles for example have title, authors, results, references, etc.), but it also makes it hard to find and reuse scholarly research by others.
 
@@ -51,4 +53,4 @@ But I think the effort to build a solid open source conversion tool for scholarl

  * write a custom writer in Lua that generates JATS output from Pandoc
  * explore how difficult it would be to add Microsoft Word .docx as Pandoc input format
- * develop Pandoc filters relevant for scholarly documents (e.g. [auto-linking accession numbers of biomedical databases](/2013/07/02/auto-generating-links-to-data-and-resources/))
+ * develop Pandoc filters relevant for scholarly documents (e.g. [auto-linking accession numbers of biomedical databases](/2013/07/02/auto-generating-links-to-data-and-resources/))
@@ -0,0 +1,12 @@
+ <p>The Journal Article Tag Suite (<a href="http://jats.nlm.nih.gov/">JATS</a>) is a NISO standard that defines a set of XML elements and attributes for tagging journal articles. JATS is not only used for fulltext content at PubMed Central (and JATS has evolved from the NLM Archiving and Interchange Tag Suite originally developed for PubMed Central), but is also increasinly used by publishers.<!--more--></p>
+ <p>For many publishers the <em>version of record</em> of an article is stored in XML, and other formats (currently HTML, PDF and increasingly ePub) are generated from this XML. Unfortunately the process of converting author-submitted manuscripts into JATS-compliant XML is time-consuming and costly, and this is a problem in particular for small publishers.</p>
+ <p>In a recent blog post (<a href="/2013/11/17/the-grammar-of-scholarly-communication/">The Grammar of Scholarly Communication</a>) I argued that publishers should accept manuscripts in any reasonable file format, including Microsoft Word, Open Office, LaTeX, Markdown, HTML and PDF. Readers of this blog know that I am a big fan of <a href="/tags.html#markdown-ref">markdown</a> for scholarly documents, but I am of course well aware that at the end of the day these documents have to be converted into JATS.</p>
+ <p>As a small step towards that goal I have today released the first public version of <a href="https://github.com/mfenner/pandoc-jats">pandoc-jats</a>, a <a href="http://johnmacfarlane.net/pandoc/README.html#custom-writers">custom writer for Pandoc</a> that converts markdown documents into JATS XML with a single command, e.g.</p>
+ <pre><code>pandoc -f example.md --filter pandoc-citeproc --bibliography=example.bib --csl=apa.csl -t JATS.lua -o example.xml</code></pre>
+ <p>Please see the <a href="https://github.com/mfenner/pandoc-jats">pandoc-jats</a> Github repository for more detailed information, but using this custom writer is as simple as downloading a single <code>JATS.lua</code>file. The big challenge is of course to make this custom writer work with as many documents as possible, and that will be my job the next few weeks. Two example JATS documents are below (both markdown versions of scholarly articles and posted on this blog as HTML):</p>
+ <ul>
+ <li>Nine simple ways to make it easier to (re)use your data (<a href="/2013/06/25/nine-simple-ways-to-make-it-easier-to-reuse-your-data/">HTML</a>, <a href="/files/10.7287.peerj.preprints.7v2.xml">JATS</a>)</li>
+ <li>What Can Article Level Metrics Do for You? (<a href="/2013/12/11/what-can-article-level-metrics-do-for-you/">HTML</a>, <a href="/files/10.1371.journal.pbio.1001687.xml">JATS</a>)</li>
+ </ul>
+ <p>Both JATS files were validated against the JATS DTD and XSD and showed no errors with the NLM XML StyleChecker - using the excellent <a href="https://github.com/PeerJ/jats-conversion">jats-conversion</a> conversion and validation tools written by Alf Eaton. Markdown is actually a nice file format to convert to XML - in contrast to HTML authors can't for example put closing tags at the wrong places. And a Pandoc custom writer written in the Lua scripting language is an interesting alternative to XSLT transformations, the more common way to create JATS XML. The custom writer has not been tested with other Pandoc input formats besides markdown, of particular interest are of course HTML and LaTeX - Microsoft Word .docx is unfortunately only a Pandoc output format.</p>
+ <p>This is the first public release and there is of course a lot of room for improvement. Many elements and attributes are not yet supported - although <a href="http://orcid.org/blog/2013/03/22/orcid-how-more-specifying-orcid-ids-document-metadata">ORCID author identifiers</a> are of course included. Please help me improve this tool using the Github <a href="https://github.com/mfenner/pandoc-jats/issues">Issue Tracker</a>.</p>
@@ -1,6 +1,8 @@
  ---
  layout: post
  title: "From Markdown to JATS XML in one Step"
+ author: Martin Fenner
+ date: December 12, 2013
  tags: [markdown, jats, pandoc]
  ---

@@ -21,4 +23,4 @@ Please see the [pandoc-jats](https://github.com/mfenner/pandoc-jats) Github repo
 
  Both JATS files were validated against the JATS DTD and XSD and showed no errors with the NLM XML StyleChecker - using the excellent [jats-conversion](https://github.com/PeerJ/jats-conversion) conversion and validation tools written by Alf Eaton. Markdown is actually a nice file format to convert to XML - in contrast to HTML authors can't for example put closing tags at the wrong places. And a Pandoc custom writer written in the Lua scripting language is an interesting alternative to XSLT transformations, the more common way to create JATS XML. The custom writer has not been tested with other Pandoc input formats besides markdown, of particular interest are of course HTML and LaTeX - Microsoft Word .docx is unfortunately only a Pandoc output format.
 
- This is the first public release and there is of course a lot of room for improvement. Many elements and attributes are not yet supported - although [ORCID author identifiers](http://orcid.org/blog/2013/03/22/orcid-how-more-specifying-orcid-ids-document-metadata) are of course included. Please help me improve this tool using the Github [Issue Tracker](https://github.com/mfenner/pandoc-jats/issues).
+ This is the first public release and there is of course a lot of room for improvement. Many elements and attributes are not yet supported - although [ORCID author identifiers](http://orcid.org/blog/2013/03/22/orcid-how-more-specifying-orcid-ids-document-metadata) are of course included. Please help me improve this tool using the Github [Issue Tracker](https://github.com/mfenner/pandoc-jats/issues).
@@ -0,0 +1,35 @@
+ <p>In a <a href="/2014/07/18/roads-not-stagecoaches/">post last week</a> I talked about roads and stagecoaches, and how work on scholarly infrastructure can often be more important than building customer-facing apps. One important aspect of that infrastruture work is to not duplicate efforts.<!--more--></p>
+ <div class="figure">
+ <img src="images/5673321593_e6a7faa36d_z.jpg" alt="Image by Cocoabiscuit on Flickr" /><p class="caption">Image by Cocoabiscuit <a href="http://www.flickr.com/photos/jfgallery/5673321593/">on Flickr</a></p>
+ </div>
+ <p>A good example is information (or metadata) about scholarly publications. I am the technical lead for the open source <a href="http://articlemetrics.github.io/">article-level metrics (ALM) software</a>. This software can be used in different ways, but most people use it for tracking the metrics of scholarly articles, with articles that have DOIs issued by CrossRef. The ALM software needs three pieces of information for every article: <strong>DOI</strong>, <strong>publication date</strong>, and <strong>title</strong>. This information can be entered via a web interface, but that is of course not very practical for adding dozens or hundreds of articles at a time. The ALM software has therefore long supported the import of multiple articles via a text file and the command line.</p>
+ <p>This approach is working fine for the ALM software <a href="http://articlemetrics.github.io/plos/">running at PLOS since 2009</a>, but is for example a problem if the ALM software runs as a service for multiple publishers. A more flexible approach is to provide an API to upload articles, and I've <a href="http://articlemetrics.github.io/docs/api/">added an API</a> for creating, updating and deleting articles in January 2014.</p>
+ <p>While the API is an improvement, it still requires the integration into a number of possibly very different publisher workflows, and you have to deal with setting up the permissions, e.g. so that publisher A can't delete an article from publisher B.</p>
+ <p>The next ALM release (3.3) will therefore add a third approach to importing articles: using the <a href="http://api.crossref.org">CrossRef API</a> to look up article information. Article-level metrics is about tracking already published works, so we really only care about articles that have DOIs registered with CrossRef and are therefore published. ALM is now talking to a single API, and this makes it much easier to do this for a number of publishers without writing custom code. Since ALM is an open source application already used by several publishers that aspect is important. And because we are importing, we have don't have to worry about permissions. The only requirement is that CrossRef has the correct article information, and has this information as soon as possible after publication.</p>
+ <p>At this point I have a confession to make: I regularly use other CrossRef APIs, but wasn't aware of <strong>api.crossref.org</strong> until fairly recently. That is sort of understandable since the reference platform was deployed only September last year. The documentation to get you started is on <a href="https://github.com/CrossRef/rest-api-doc/blob/master/rest_api.md">Github</a> and the version history shows frequent API updates (now at v22). The API will return all kinds of information, e.g.</p>
+ <ul>
+ <li>how many articles has publisher x published in 2012</li>
+ <li>percentage of DOIs of publisher Y that include at least one ORCID identifier</li>
+ <li>list all books with a Creative Commons CC-BY license that were published this year</li>
+ </ul>
+ <p>Funder (via FundRef) information is also included, but is still incomplete. Another interesting result is the number of <a href="http://blogs.plos.org/mfenner/2011/03/26/direct-links-to-figures-and-tables-using-component-dois/">component DOIs</a> (DOIs for figures, tables or other parts of a document) per year:</p>
+ <iframe src="http://cf.datawrapper.de/Ze7et/1/" frameborder="0" allowtransparency="true" allowfullscreen="allowfullscreen" webkitallowfullscreen="webkitallowfullscreen" mozallowfullscreen="mozallowfullscreen" oallowfullscreen="oallowfullscreen" msallowfullscreen="msallowfullscreen" width="640" height="480">
+ </iframe>
+ <p>For my specific use case I wanted an API call that returns all articles published by PLOS (or any other publisher) in the last day which I can then run regularly. To get all DOIs from a specific publisher, use their CrossRef member ID - DOI prefixes don't work, as publishers can own more than one DOI prefix. To make this task a little easier I built a CrossRef member search interface into the ALM application:</p>
+ <div class="figure">
+ <img src="images/crossref_api.png" />
+ </div>
+ <p>We can filter API responses by publication date, but it is a better idea to use the update date, as it is possible that the metadata have changed, e.g. a correction of the title. We also want to increase the number of results per page (using the <code>rows</code> parameter). The final API call for all DOIs updated by PLOS since the beginning of the week would be</p>
+ <pre><code>http://api.crossref.org/members/340/works?filter=from-update-date:2014-07-21,until-update-date:2014-07-24&amp;rows=1000</code></pre>
+ <p>The next step is of course to parse the JSON of the API response, and you will notice that CrossRef is using <a href="http://gsl-nagoya-u.net/http/pub/citeproc-doc.html">Citeproc JSON</a>. This is a standard JSON format for bibliographic information used internally by several reference managers for citation styles, but increasingly also by APIs and other places where you encounter bibliographic information.</p>
+ <p>Citeproc JSON is helpful for one particular problem with CrossRef metadata: the exact publication date for an article is not always known, and CrossRef (and similarly DataCite) only requires the publication year. Citeproc JSON can nicely handle partial dates, e.g. year-month:</p>
+ <pre><code>issued: {
+ date-parts: [
+ [
+ 2014,
+ 7
+ ]
+ ]
+ },</code></pre>
+ <p>I think that a similar approach will work for many other systems that require bibliographic information about scholarly content with CrossRef DOIs. If are not already using <strong>api.crossref.org</strong>, consider integrating with it, I find the API fast, well documented, easy to use - and CrossRef is very responsive to feedback. As you can always wish for more, I would like to see the following: fix the problem were some journal articles are missing the publication date (a required field, even if only the year), and consider adding the canonical URL to the article metadata (which ALM currently has to look up itself, and which is needed to track social media coverage of an article).</p>
+ <p><em>Update July 24, 2014: added chart with number of component DOIs per year</em></p>
@@ -1,11 +1,13 @@
  ---
  layout: post
  title: Don't Reinvent the Wheel
+ author: Martin Fenner
+ date: July 24, 2014
  tags: [citeproc, crossref]
  ---
  In a [post last week](/2014/07/18/roads-not-stagecoaches/) I talked about roads and stagecoaches, and how work on scholarly infrastructure can often be more important than building customer-facing apps. One important aspect of that infrastruture work is to not duplicate efforts.<!--more-->
 
- ![Image by Cocoabiscuit [on Flickr](http://www.flickr.com/photos/jfgallery/5673321593/)](/images/5673321593_e6a7faa36d_z.jpg)
+ ![Image by Cocoabiscuit [on Flickr](http://www.flickr.com/photos/jfgallery/5673321593/)](images/5673321593_e6a7faa36d_z.jpg)
 
  A good example is information (or metadata) about scholarly publications. I am the technical lead for the open source [article-level metrics (ALM) software](http://articlemetrics.github.io/). This software can be used in different ways, but most people use it for tracking the metrics of scholarly articles, with articles that have DOIs issued by CrossRef. The ALM software needs three pieces of information for every article: **DOI**, **publication date**, and **title**. This information can be entered via a web interface, but that is of course not very practical for adding dozens or hundreds of articles at a time. The ALM software has therefore long supported the import of multiple articles via a text file and the command line.
 
@@ -27,7 +29,7 @@ Funder (via FundRef) information is also included, but is still incomplete. Anot
 
  For my specific use case I wanted an API call that returns all articles published by PLOS (or any other publisher) in the last day which I can then run regularly. To get all DOIs from a specific publisher, use their CrossRef member ID - DOI prefixes don't work, as publishers can own more than one DOI prefix. To make this task a little easier I built a CrossRef member search interface into the ALM application:
 
- ![](/images/crossref_api.png)
+ ![](images/crossref_api.png)
 
  We can filter API responses by publication date, but it is a better idea to use the update date, as it is possible that the metadata have changed, e.g. a correction of the title. We also want to increase the number of results per page (using the `rows` parameter). The final API call for all DOIs updated by PLOS since the beginning of the week would be
 
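As a side note to the CrossRef import workflow described in the post above, here is a rough Ruby sketch of the API call it quotes (member 340 is PLOS, and the filter string is taken from the text). The response handling follows the api.crossref.org conventions mentioned in the post, but treat the details as illustrative rather than as rakali or ALM code.

```ruby
#!/usr/bin/env ruby
# Sketch: list works updated by a CrossRef member in a given date window.
require 'net/http'
require 'json'
require 'uri'

uri = URI("http://api.crossref.org/members/340/works")
uri.query = URI.encode_www_form(
  filter: "from-update-date:2014-07-21,until-update-date:2014-07-24",
  rows: 1000
)
response = Net::HTTP.get_response(uri)
works = JSON.parse(response.body).fetch("message", {}).fetch("items", [])
works.each do |work|
  # "DOI", "title" and "issued" (a Citeproc-style partial date) are the
  # pieces of information the article import described above needs.
  puts [work["DOI"], Array(work["title"]).first].compact.join(" ")
end
```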
@@ -0,0 +1,46 @@
+ <p>This Sunday <a href="https://twitter.com/ianmulvany">Ian Mulvany</a> and I will do a presentation on <a href="http://wikimania2014.wikimedia.org/wiki/Submissions/Open_Scholarship_Tools_-_a_whirlwind_tour.">Open Scholarship Tools</a> at <em>Wikimania 2014</em> in London.<!--more--> From the abstract:</p>
+ <blockquote>
+ <p>This presentation will give a broad overview of tools and standards that are helping with Open Scholarship today.</p>
+ </blockquote>
+ <p>One of the four broad topics we have picked are <em>digital object identifiers (DOI)s</em>. We want to introduce them to people new to them, and we want to show some tricks and cool things to people who already now them. Along the way we will also try to debunk some myths about DOIs.</p>
+ <p><em>What a DOI looks like</em></p>
+ <p>DOIs - or better DOI names - start with a prefix in the format <code>10.x</code> where x is 4-5 digits. The suffix is determined by the organization registering the DOI, and there is no consistent pattern across organizations. The DOI name is typically expressed as a URL (see below). An example DOI would look like: <a href="http://dx.doi.org/10.5555/12345678" class="uri">http://dx.doi.org/10.5555/12345678</a>. Something in the format <strong>10/hvx</strong> or <a href="http://doi.org/hvx" class="uri">http://doi.org/hvx</a> is a <a href="http://shortdoi.org/">shortDOI</a>, and <strong>1721.1/26698</strong> or <a href="http://hdl.handle.net/1721.1/26698" class="uri">http://hdl.handle.net/1721.1/26698</a> is a handle. BTW, all DOIs names are also handles, so <a href="http://hdl.handle.net/10/hvx" class="uri">http://hdl.handle.net/10/hvx</a> for the shortDOI example above will resolve correctly.</p>
+ <p><em>DOIs are persistent identifiers</em></p>
+ <p>Links to resources can change, particularly over long periods of time. Persistent identifiers are needed so that readers can still find the content we reference in a scholarly work (or anything else where persistent linking is important) 10 or 50 years later. There are many kinds of persistent identifiers, one of the key concepts - and a major difference to URLs - is to separate the identifier for the resource from its location. Persistent identifiers require technical infrastructure to resolve identifiers (DOIs use the <a href="http://www.handle.net/">Handle System</a>) and to allow long-term archiving of resources. DOI registration agencies such as DataCite or CrossRef are required to provide that persistence. Other persistent identifier schemes besides DOIs include <a href="http://en.wikipedia.org/wiki/PURL">persistent uniform resource locators (PURLs)</a> and <a href="http://en.wikipedia.org/wiki/Archival_Resource_Key">Archival Resource Keys (ARKs)</a>.</p>
+ <p><em>DOIs have attached metadata</em></p>
+ <p>All DOIs have metadata attached to them. The metadata are supplied by the resource provider, e.g. publisher, and exposed in services run by registration agencies, for example metadata search and content negotiation (see below). There is a minimal set of required metadata for every DOI, but beyond that, different registration agencies will use different metadata schemata, and most metadata are optional. Metadata are important to build centralized discovery services, making it easier to describe a resource, e.g. journal article citing another article. Some of the more recent additions to metadata schemata include persistent identifiers for people (<a href="http://orcid.org/">ORCID</a>) and funding agencies (<a href="http://www.crossref.org/fundref/">FundRef</a>), and license information. The following API call will retrieve all publications registered with CrossRef that use a <a href="http://creativecommons.org/licenses/by/3.0/deed.en_US">Creative Commons Attribution license</a> (and where this information has been provided by the publisher):</p>
+ <pre><code>http://api.crossref.org/funders/10.13039/100000001/works?filter=license.url:http://creativecommons.org/licenses/by/3.0/deed.en_US</code></pre>
+ <p><em>DOIs support link tracking</em></p>
+ <p>Links to other resources are an important part of the metadata, and describing all citations between a large number scholarly documents is a task that can only really be accomplished by a central resource. To solve this very problem DOIs were invented and the CrossRef organization started around 15 years ago.</p>
+ <p><em>Not every DOI is the same</em></p>
+ <p>The DOI system <a href="http://www.doi.org/doi_handbook/1_Introduction.html">originated from an initiative by scholarly publishers</a> (first announced at the Frankfurt Book Fair in 1997), with citation linking of journal articles its first application. This citation linking system is managed by <a href="http://www.crossref.org/">CrossRef</a>, a non-profit member organization of scholarly publishers, and <a href="http://search.crossref.org/help/status">more than half</a> of the about <a href="http://www.doi.org/faq.html">100 million DOIs</a> that have been assigned to date are managed by them.</p>
+ <p>But many DOIs are assigned by one of the other 8 <a href="http://www.doi.org/RA_Coverage.html">registration agencies</a>. You probably know <a href="http://www.datacite.org/">DataCite</a>, but did you know that the <a href="http://publications.europa.eu/index_en.htm">Publications Office of the European Union (OP)</a> and the <a href="http://www.eidr.org/">Entertainment Identifier Registry (EIDR)</a> also assign DOIs? The distinction is important, because some of the functionality is a service of the registration agency - metadata search for example is offered by CrossRef (<a href="http://search.crossref.org" class="uri">http://search.crossref.org</a>) and DataCite (<a href="http://search.datacite.org" class="uri">http://search.datacite.org</a>), but you can't search for a DataCite DOI in the CrossRef metadata search. There is an API to find out the registration agency behind a DOI so that you know what services to expect:</p>
+ <pre><code>http://api.crossref.org/works/10.6084/m9.figshare.821213/agency
+
+ {
+ &quot;status&quot;: &quot;ok&quot;,
+ &quot;message-type&quot;: &quot;work-agency&quot;,
+ &quot;message-version&quot;: &quot;1.0.0&quot;,
+ &quot;message&quot;: {
+ &quot;DOI&quot;: &quot;10.6084/m9.figshare.821213&quot;,
+ &quot;agency&quot;: {
+ &quot;id&quot;: &quot;datacite&quot;,
+ &quot;label&quot;: &quot;DataCite&quot;
+ }
+ }
+ }</code></pre>
+ <p><em>DOIs are URLs</em></p>
+ <p><a href="http://www.doi.org/faq.html">DOI names may be expressed as URLs (URIs) through a HTTP proxy server</a> - e.g. <a href="http://dx.doi.org/10.5555/12345679" class="uri">http://dx.doi.org/10.5555/12345679</a>, and this is how DOIs are typically resolved. For this reason the <a href="http://www.crossref.org/02publishers/doi_display_guidelines.htm">CrossRef DOI Display Guidelines</a> recommend that <em>CrossRef DOIs should always be displayed as permanent URLs in the online environment</em>. Because DOIs can be expressed as URLs, they also have their features:</p>
+ <p><em>Special characters</em></p>
+ <p>Because DOIs can be expressed as URLs, DOIs <a href="http://www.crossref.org/02publishers/15doi_guidelines.html">should only include characters allowed in URLs</a>, something that wasn't always true in the past and can cause problems, e.g. when using SICIs (<a href="https://en.wikipedia.org/wiki/Serial_Item_and_Contribution_Identifier">Serial Item and Contribution Identifier</a>), an extension of the ISSN for journals:</p>
+ <pre><code>10.4567/0361-9230(1997)42:&lt;OaEoSR&gt;2.0.TX;2-B</code></pre>
+ <p><em>Content negotiation</em></p>
+ <p>The DOI resolver at <em>doi.org</em> (or <em>dx.doi.org</em>) normally resolves to the resource location, e.g. a landing page at a publisher website. Requests that are not for content type <code>text/html</code> are redirected to the registration agency metadata service (currently for CrossRef, DataCite and mEDRA DOIs). Using <a href="http://www.crosscite.org/cn/">content negotiation</a>, we can ask the metadata service to send us the metadata in a format we specify (e.g. Citeproc JSON, bibtex or even a formatted citation in one of thousands of citation styles) instead of getting redirected to the resource. This is a great way to collect bibliographic information, e.g. to format citations for a manuscript. In theory we could also use content negotiation to get a particular representation of a resource, e.g. <code>application/pdf</code> for a PDF of a paper or <code>text/csv</code> for a dataset in CSV format. This is not widely support and I don't know the details of the implementation in the DOI resolver, but you can try this (content negotation is easier with the command line than with a browser):</p>
+ <pre><code>curl -LH &quot;Accept: application/pdf&quot; http://dx.doi.org/10.7717/peerj.500 &gt;peerj.500.pdf</code></pre>
+ <p>This will save the PDF of the 500th PeerJ paper published last week.</p>
+ <p><em>Fragment identifiers</em></p>
+ <p>As discussed in <a href="http://blog.martinfenner.org/2014/08/02/fragment-identifiers-and-dois/">my last blog post</a>, we can use frament identifiers to subsections of a document with DOIs, e.g. <a href="http://dx.doi.org/10.1371/journal.pone.0103437#s2" class="uri">http://dx.doi.org/10.1371/journal.pone.0103437#s2</a> or <a href="http://doi.org/10.5446/12780#t=00:20,00:27" class="uri">http://doi.org/10.5446/12780#t=00:20,00:27</a>, just as we can with every other URL. This is a nice way to directly link to a specific document section, e.g. when discussing a paper on Twitter. Fragment identifiers are implemented by the client (typically web browser) and depend on the document type, but for DOIs that resolve to fulltext HTML documents they can add granularity to the DOI without much effort.</p>
+ <p><em>Queries</em></p>
+ <p>URLs obviously support queries, but that is a feature I haven't yet seen with DOIs. Queries would allow interesting features, partly overlapping with what is possible with fragment identifiers and content negotiation, e.g. <code>http://dx.doi.org/10.7717/peerj.500?format=pdf</code>. II hope to find out more until Sunday.</p>
+ <p><em>Outlook</em></p>
+ <p>My biggest wish? Make DOIs more machine-readable. They are primarily intended for human users, enabling them to find the content associated with a DOI. But they sometimes don't work as well as they could with automated tools, one example are the <a href="http://blog.martinfenner.org/2013/10/13/broken-dois/">challenges automatically resolving a DOI</a> that I described in a blog post last year. Thinking about DOIs as URLs - and using them this way - is the right direction.</p>
@@ -1,6 +1,8 @@
  ---
  layout: post
  title: What is a DOI?
+ author: Martin Fenner
+ date: August 6, 2014
  tags: [doi, wikimania]
  ---

@@ -0,0 +1,8 @@
+ #!/usr/bin/env runhaskell
+ -- behead2.hs
+ import Text.Pandoc.JSON
+
+ main :: IO ()
+ main = toJSONFilter behead
+ where behead (Header n _ xs) | n >= 2 = Para [Emph xs]
+ behead x = x
@@ -3,7 +3,7 @@
  module Rakali
  class Document

- attr_accessor :config, :source, :destination, :content, :schema, :errors, :options, :variables, :to_folder
+ attr_accessor :config, :source, :destination, :content, :schema, :errors, :options, :variables, :filters, :to_folder

  def initialize(document, config)
  begin
@@ -19,11 +19,12 @@ module Rakali
  if document.is_a?(Array)
  @source = document.map { |file| File.basename(file) }.join(" ")
  @destination = "#{File.basename(@from_folder)}.#{@to_format}"
- puts @destination
+ content = document.map { |file| IO.read(file) }.join("\n\n")
  else
  # otherwise use source name with new extension for destination filename
  @source = File.basename(document)
  @destination = @source.sub(/\.#{@from_format}$/, ".#{@to_format}")
+ content = IO.read(document)
  end

  # add pandoc options from config
@@ -34,11 +35,15 @@ module Rakali
  variables = @config.fetch('variables', nil) || {}
  @variables = variables.map { |k,v| "--variable #{k}='#{v}'" }.join(" ")

+ # add pandoc filters from config
+ @filters = filter
+
  # use citeproc-pandoc if citations flag is set
  bibliography = @config.fetch('citations') ? "-f citeproc-pandoc " : ""

  # convert source document into JSON version of native AST
- @content = convert(nil, @from_folder, "#{@source} #{bibliography}-t json #{@options} #{@variables}")
+ # read in document and parse to Pandoc via STDIN to allow filenames with spaces
+ @content = convert(content, @from_folder, "#{@source} #{bibliography}-t json #{@options} #{@variables} #{@filters}")

  # read in JSON schema, use included schemata folder if no folder is given
  @schema = scheme
@@ -47,7 +52,7 @@ module Rakali
  @errors = validate

  # convert to destination document from JSON version of native AST
- @output = convert(@content, @to_folder, "-f json #{bibliography}-o #{@destination} #{@options} #{@variables}")
+ @output = convert(@content, @to_folder, "-f json #{bibliography}-o #{@destination} #{@options} #{@variables} #{@filters}")
  Rakali.logger.abort_with "Fatal:", "Writing file #{@destination} failed" unless created?

  if @errors.empty?
@@ -94,6 +99,18 @@ module Rakali
  end
  end

+ def filter
+ filters = @config.fetch('filters', nil) || []
+ filters.map do |f|
+ if f.include?("/")
+ "--filter=#{f}"
+ else
+ filters_folder = File.expand_path("../../../filters", __FILE__)
+ "--filter=#{filters_folder}/#{f}"
+ end
+ end.join(" ")
+ end
+
  def validate
  errors = JSON::Validator.fully_validate(@schema, @content)
  return [] if errors.empty?
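For illustration, the new `filter` method above turns the `filters:` entries from the config into pandoc `--filter` flags. The sketch below re-applies that logic to a made-up config; the expanded path for bundled filters depends on where the gem (or this snippet) lives, so the result shown is only indicative.

```ruby
# Hypothetical worked example of Document#filter with two filters configured.
config = { 'filters' => ['behead2.hs', 'your_folder/caps.py'] }

filters = config.fetch('filters', nil) || []
flags = filters.map do |f|
  if f.include?("/")
    "--filter=#{f}"                    # external filter: path used as given
  else
    filters_folder = File.expand_path("../../../filters", __FILE__)
    "--filter=#{filters_folder}/#{f}"  # bundled filter: resolved against filters/
  end
end.join(" ")
# flags => "--filter=.../filters/behead2.hs --filter=your_folder/caps.py"
```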
@@ -1,3 +1,3 @@
  module Rakali
- VERSION = "0.1.2"
+ VERSION = "0.1.3"
  end
@@ -0,0 +1,13 @@
+ ### [Rakali 0.1.3](https://github.com/rakali/rakali.rb/releases/tag/v0.1.3)
+
+ Rakali 0.1.3 was released on August 19, 2014 with the following new features:
+
+ * handle file names with spaces
+ * added support for [filters](http://johnmacfarlane.net/pandoc/scripting.html)
+ * additional JSON schema `title_block.json` to validate `title`, `author` and `date`
+
+ ### [Rakali 0.1.0](https://github.com/rakali/rakali.rb/releases/tag/v0.1.0)
+
+ Rakali 0.1.0 was released on August 18, 2014 with the following new features:
+
+ * first public release
@@ -21,10 +21,12 @@
  "type": "object",
  "properties": {
  "title": {"type": "object"},
+ "author": {"type": "object"},
+ "date": {"type": "object"},
  "layout": {"type": "object"},
  "tags": { "$ref": "#/definitions/tags" }
  },
- "required": ["title","layout"]
+ "required": ["title", "date", "layout"]
  }
  }
  },
@@ -0,0 +1,35 @@
+ {
+ "$schema": "http://json-schema.org/draft-04/schema#",
+ "title": "Pandoc title block",
+ "description": "Metadata in Pandoc title block",
+
+ "definitions": {
+ "author": {
+ "type": "object",
+ "properties": {
+ "c": {"type": "array"}
+ }
+ }
+ },
+
+ "type": "array",
+ "items": [
+ {
+ "type": "object",
+ "properties": {
+ "unMeta": {
+ "type": "object",
+ "properties": {
+ "title": {"type": "object"},
+ "author": { "$ref": "#/definitions/author" },
+ "date": {"type": "object"}
+ },
+ "required": ["title", "author", "date"]
+ }
+ }
+ },
+ {
+ "type": "array"
+ }
+ ]
+ }
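As a quick sketch, the new `title_block.json` schema can be exercised directly with the json-schema gem (already a rakali dependency, and the same `JSON::Validator.fully_validate` call used in `Document#validate`). The metadata values below are hand-made placeholders standing in for Pandoc's `unMeta` block, not real pandoc output.

```ruby
# Sketch: validate a minimal Pandoc-style document against title_block.json.
require 'json'
require 'json-schema'

schema = JSON.parse(File.read('schemata/title_block.json'))
document = [
  { 'unMeta' => {
      'title'  => { 't' => 'MetaInlines', 'c' => [] },
      'author' => { 'c' => [] },
      'date'   => { 't' => 'MetaInlines', 'c' => [] } } },
  [] # the (empty) list of blocks
]
errors = JSON::Validator.fully_validate(schema, document)
puts errors.empty? ? 'valid' : errors
```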
@@ -39,9 +39,10 @@ describe Rakali::Document do
  { 'from' => { 'folder' => fixture_path }, 'schema' => 'jekyll.json' })
  subject = Rakali::Document.new(document, config)
  subject.valid?.should be_falsey
- subject.errors.length.should == 2
- subject.errors.first.should match("The property '#/0/unMeta' did not contain a required property of 'title'")
- subject.errors.last.should match("The property '#/0/unMeta' did not contain a required property of 'layout'")
+ subject.errors.length.should == 3
+ subject.errors[0].should match("The property '#/0/unMeta' did not contain a required property of 'title'")
+ subject.errors[1].should match("The property '#/0/unMeta' did not contain a required property of 'date'")
+ subject.errors[2].should match("The property '#/0/unMeta' did not contain a required property of 'layout'")
  end

  it "should not validate with empty input and extended schema and raise error" do
@@ -1,8 +1,9 @@
  ---
  layout: post
  title: "Nine simple ways to make it easier to (re)use your data"
+ date: June 12, 2013
  ---

- # Title
+ # Heading

  This is a **test**.
@@ -1,4 +1,7 @@
  require "codeclimate-test-reporter"
+ CodeClimate::TestReporter.configure do |config|
+ config.logger.level = Logger::WARN
+ end
  CodeClimate::TestReporter.start

  require 'bundler/setup'
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: rakali
  version: !ruby/object:Gem::Version
- version: 0.1.2
+ version: 0.1.3
  platform: ruby
  authors:
  - Martin Fenner
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2014-08-17 00:00:00.000000000 Z
+ date: 2014-08-19 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: thor
@@ -152,16 +152,20 @@ files:
  - README.md
  - Rakefile
  - bin/rakali
+ - examples/2013-11-17-the-grammar-of-scholarly-communication.html
  - examples/2013-11-17-the-grammar-of-scholarly-communication.md
+ - examples/2013-12-12-from-markdown-to-jats-xml-in-one-step.html
  - examples/2013-12-12-from-markdown-to-jats-xml-in-one-step.md
+ - examples/2014-07-24-dont-reinvent-the-wheel.html
  - examples/2014-07-24-dont-reinvent-the-wheel.md
+ - examples/2014-08-06-what-is-doi.html
  - examples/2014-08-06-what-is-doi.md
  - examples/fenner_2011.docx
- - examples/fenner_2011.epub
  - examples/fenner_2013.docx
- - examples/fenner_2013.epub
- - filters/caps.rb
- - filters/default.rb
+ - examples/images/5673321593_e6a7faa36d_z.jpg
+ - examples/images/crossref_api.png
+ - examples/images/grammar.jpg
+ - filters/behead2.hs
  - lib/rakali.rb
  - lib/rakali/cli.rb
  - lib/rakali/converter.rb
@@ -170,10 +174,12 @@ files:
  - lib/rakali/utils.rb
  - lib/rakali/version.rb
  - rakali.gemspec
+ - releases.md
  - schemata/citeproc.json
  - schemata/default.json
  - schemata/jats.json
  - schemata/jekyll.json
+ - schemata/title_block.json
  - spec/converter_spec.rb
  - spec/document_spec.rb
  - spec/fixtures/docx.yml
Binary file
Binary file
@@ -1,12 +0,0 @@
- # Pandoc filter to convert all regular text to uppercase.
- # Code, link URLs, etc. are not affected.
- # Adapted from Python example at https://github.com/jgm/pandocfilters/blob/master/examples/caps.py
-
- module Rakali::Filters::Caps
-
- def caps(key, value, format, meta)
- if key == 'Str'
- value.upcase
- end
- end
- end
@@ -1,12 +0,0 @@
- # Pandoc filter to convert all regular text to uppercase.
- # Code, link URLs, etc. are not affected.
- # Adapted from Python example at https://github.com/jgm/pandocfilters/blob/master/examples/caps.py
-
- module Rakali::Filters::Default
-
- def default(key, value, format, meta)
- if key == 'Str'
- value.upcase
- end
- end
- end