cirneco 0.9.16 → 0.9.17

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (94)
  1. checksums.yaml +5 -5
  2. data/Gemfile.lock +70 -67
  3. data/cirneco.gemspec +3 -3
  4. data/lib/cirneco/data_center.rb +0 -2
  5. data/lib/cirneco/doi.rb +2 -90
  6. data/lib/cirneco/media.rb +1 -3
  7. data/lib/cirneco/metadata.rb +0 -2
  8. data/lib/cirneco/utils.rb +3 -3
  9. data/lib/cirneco/version.rb +1 -1
  10. data/lib/cirneco/work.rb +0 -2
  11. data/spec/api_spec.rb +8 -7
  12. data/spec/doi_spec.rb +4 -63
  13. data/spec/fixtures/vcr_cassettes/Cirneco_Work/media/includes_media.yml +42 -56
  14. data/spec/fixtures/vcr_cassettes/Cirneco_Work/schema/BlogPosting.yml +42 -56
  15. data/spec/utils_spec.rb +9 -10
  16. data/spec/work_spec.rb +2 -2
  17. metadata +10 -93
  18. data/data/authors.yml +0 -0
  19. data/data/references.yaml +0 -0
  20. data/data/site.yml +0 -0
  21. data/lib/cirneco/file_utils.rb +0 -371
  22. data/resources/jats-1.1/JATS-journalpublishing1-elements.xsd +0 -8608
  23. data/resources/jats-1.1/JATS-journalpublishing1-mathml3-elements.xsd +0 -8608
  24. data/resources/jats-1.1/JATS-journalpublishing1-mathml3.xsd +0 -48
  25. data/resources/jats-1.1/JATS-journalpublishing1.xsd +0 -59
  26. data/resources/jats-1.1/module-ali.xsd +0 -46
  27. data/resources/jats-1.1/standard-modules/mathml2/common/common-attribs.xsd +0 -44
  28. data/resources/jats-1.1/standard-modules/mathml2/common/math.xsd +0 -126
  29. data/resources/jats-1.1/standard-modules/mathml2/common/xlink-href.xsd +0 -20
  30. data/resources/jats-1.1/standard-modules/mathml2/content/arith.xsd +0 -90
  31. data/resources/jats-1.1/standard-modules/mathml2/content/calculus.xsd +0 -146
  32. data/resources/jats-1.1/standard-modules/mathml2/content/common-attrib.xsd +0 -30
  33. data/resources/jats-1.1/standard-modules/mathml2/content/constants.xsd +0 -83
  34. data/resources/jats-1.1/standard-modules/mathml2/content/constructs.xsd +0 -260
  35. data/resources/jats-1.1/standard-modules/mathml2/content/elementary-functions.xsd +0 -117
  36. data/resources/jats-1.1/standard-modules/mathml2/content/functions.xsd +0 -73
  37. data/resources/jats-1.1/standard-modules/mathml2/content/linear-algebra.xsd +0 -173
  38. data/resources/jats-1.1/standard-modules/mathml2/content/logic.xsd +0 -53
  39. data/resources/jats-1.1/standard-modules/mathml2/content/relations.xsd +0 -55
  40. data/resources/jats-1.1/standard-modules/mathml2/content/semantics.xsd +0 -85
  41. data/resources/jats-1.1/standard-modules/mathml2/content/sets.xsd +0 -236
  42. data/resources/jats-1.1/standard-modules/mathml2/content/statistics.xsd +0 -136
  43. data/resources/jats-1.1/standard-modules/mathml2/content/tokens.xsd +0 -120
  44. data/resources/jats-1.1/standard-modules/mathml2/content/vector-calculus.xsd +0 -88
  45. data/resources/jats-1.1/standard-modules/mathml2/content/zzz.tokens.xsd.from.zip +0 -120
  46. data/resources/jats-1.1/standard-modules/mathml2/mathml2.xsd +0 -59
  47. data/resources/jats-1.1/standard-modules/mathml2/presentation/action.xsd +0 -44
  48. data/resources/jats-1.1/standard-modules/mathml2/presentation/characters.xsd +0 -37
  49. data/resources/jats-1.1/standard-modules/mathml2/presentation/common-attribs.xsd +0 -113
  50. data/resources/jats-1.1/standard-modules/mathml2/presentation/common-types.xsd +0 -103
  51. data/resources/jats-1.1/standard-modules/mathml2/presentation/error.xsd +0 -40
  52. data/resources/jats-1.1/standard-modules/mathml2/presentation/layout.xsd +0 -195
  53. data/resources/jats-1.1/standard-modules/mathml2/presentation/scripts.xsd +0 -186
  54. data/resources/jats-1.1/standard-modules/mathml2/presentation/space.xsd +0 -52
  55. data/resources/jats-1.1/standard-modules/mathml2/presentation/style.xsd +0 -69
  56. data/resources/jats-1.1/standard-modules/mathml2/presentation/table.xsd +0 -216
  57. data/resources/jats-1.1/standard-modules/mathml2/presentation/tokens.xsd +0 -124
  58. data/resources/jats-1.1/standard-modules/xlink.xsd +0 -100
  59. data/resources/jats-1.1/standard-modules/xml.xsd +0 -287
  60. data/spec/file_utils_spec.rb +0 -203
  61. data/spec/fixtures/apa.csl +0 -621
  62. data/spec/fixtures/authors.yml +0 -19
  63. data/spec/fixtures/cool-dois/index.html +0 -404
  64. data/spec/fixtures/cool-dois-minted/index.html +0 -359
  65. data/spec/fixtures/cool-dois-minted.html.md +0 -99
  66. data/spec/fixtures/cool-dois-missing-metadata/index.html +0 -356
  67. data/spec/fixtures/cool-dois-no-accession-number.html.md +0 -97
  68. data/spec/fixtures/cool-dois-no-json-ld/index.html +0 -352
  69. data/spec/fixtures/cool-dois.html.md +0 -100
  70. data/spec/fixtures/cool-dois.yml +0 -10
  71. data/spec/fixtures/index-minted.html +0 -271
  72. data/spec/fixtures/index.html +0 -320
  73. data/spec/fixtures/index.html.erb +0 -42
  74. data/spec/fixtures/references.bib +0 -506
  75. data/spec/fixtures/references.yaml +0 -1060
  76. data/spec/fixtures/site.yml +0 -8
  77. data/spec/fixtures/vcr_cassettes/Cirneco_DataCenter/jats/should_generate_jats_for_all_urls.yml +0 -38
  78. data/spec/fixtures/vcr_cassettes/Cirneco_DataCenter/mint_and_hide_DOIs/should_hide_for_all_urls.yml +0 -38
  79. data/spec/fixtures/vcr_cassettes/Cirneco_DataCenter/mint_and_hide_DOIs/should_mint_and_hide_for_all_urls.yml +0 -38
  80. data/spec/fixtures/vcr_cassettes/Cirneco_DataCenter/mint_and_hide_DOIs/should_mint_for_all_urls.yml +0 -38
  81. data/spec/fixtures/vcr_cassettes/Cirneco_Doi/jats/writes_jats_for_list_of_urls.yml +0 -38
  82. data/spec/fixtures/vcr_cassettes/Cirneco_Doi/mint_and_hide_DOIs/hides_dois_for_list_of_urls.yml +0 -38
  83. data/spec/fixtures/vcr_cassettes/Cirneco_Doi/mint_and_hide_DOIs/mints_and_hides_dois_for_list_of_urls.yml +0 -38
  84. data/spec/fixtures/vcr_cassettes/Cirneco_Doi/mint_and_hide_DOIs/mints_dois_for_list_of_urls.yml +0 -38
  85. data/spec/fixtures/vcr_cassettes/Cirneco_Work/DOI_API/get/should_get_all_dois.yml +0 -121
  86. data/spec/fixtures/vcr_cassettes/Cirneco_Work/DOI_API/get/should_get_doi.yml +0 -121
  87. data/spec/fixtures/vcr_cassettes/Cirneco_Work/DOI_API/get/should_get_doi_not_found.yml +0 -121
  88. data/spec/fixtures/vcr_cassettes/Cirneco_Work/DOI_API/get/username_missing.yml +0 -121
  89. data/spec/fixtures/vcr_cassettes/Cirneco_Work/DOI_API/put/should_put_doi.yml +0 -121
  90. data/spec/fixtures/vcr_cassettes/Cirneco_Work/Media_API/get/should_get_media.yml +0 -121
  91. data/spec/fixtures/vcr_cassettes/Cirneco_Work/Media_API/post/should_post_media.yml +0 -121
  92. data/spec/fixtures/vcr_cassettes/Cirneco_Work/Metadata_API/delete/should_delete_metadata.yml +0 -121
  93. data/spec/fixtures/vcr_cassettes/Cirneco_Work/Metadata_API/get/should_get_metadata.yml +0 -121
  94. data/spec/fixtures/vcr_cassettes/Cirneco_Work/Metadata_API/post/should_post_metadata.yml +0 -121
@@ -1,352 +0,0 @@
- <!DOCTYPE html>
- <html>
- <head>
- <meta charset="utf-8">
- <!-- (1) Optimize for mobile versions: http://goo.gl/EOpFl -->
- <meta name="viewport" content="width=device-width, initial-scale=1.0">
- <!-- (1) force latest IE rendering engine: bit.ly/1c8EiC9 -->
- <meta http-equiv="X-UA-Compatible" content="IE=edge">
-
-
- <title>Cool DOI's</title>
- <meta name="description" content="In 1998 Tim Berners-Lee coined the term cool URIs (1998), that is URIs that don’t change. We know that URLs referenced in the scholarly literature are often not cool, leading to link rot (Klein et al., 2014) and making it hard or impossible to find..." />
-
- <meta name="HandheldFriendly" content="True" />
- <meta name="MobileOptimized" content="320" />
- <meta name="apple-mobile-web-app-capable" content="yes">
- <meta name="apple-mobile-web-app-status-bar-style" content="black-translucent">
-
- <!-- DublinCore Metadata -->
- <meta property="dc:title" content="Cool DOI's" />
- <meta property="dc:format" content="text/html" />
- <meta property="dc:language" content="en" />
- <meta property="dc:rights" content="CC-BY" />
- <meta property="dc:source" content="DataCite Blog" />
- <meta property="dc:subject" content="Scholarly Communication" />
- <meta property="dc:type" content="website" />
-
- <meta name="twitter:card" content="summary" />
- <meta name="twitter:site" content="datacite" />
- <meta name="twitter:title" content="Cool DOI's" />
- <meta name="twitter:image" content="https://blog.datacite.org/images/2016/12/cool-dois.png" />
- <meta name="twitter:description" content="In 1998 Tim Berners-Lee coined the term cool URIs (1998), that is URIs that don’t change. We know that URLs referenced in the scholarly literature are often not cool, leading to link rot (Klein et al., 2014) and making it hard or impossible to find..." />
-
- <meta property="og:site_name" content="Cool DOI's" />
- <meta property="og:description" content="In 1998 Tim Berners-Lee coined the term cool URIs (1998), that is URIs that don’t change. We know that URLs referenced in the scholarly literature are often not cool, leading to link rot (Klein et al., 2014) and making it hard or impossible to find..." />
- <meta property="og:image" content="https://blog.datacite.org/images/2016/12/cool-dois.png" />
- <meta property="og:type" content="blog" />
-
- <link href="//fonts.googleapis.com/css?family=Libre+Baskerville:400,400i,700" rel="stylesheet">
- <link href='//fonts.googleapis.com/css?family=Raleway:400,600,400italic,600italic' rel='stylesheet' type='text/css'>
- <link href="//maxcdn.bootstrapcdn.com/font-awesome/4.6.1/css/font-awesome.min.css" rel="stylesheet" type='text/css'>
- <link href="//localhost:4568/stylesheets/datacite.css" rel='stylesheet' type='text/css'>
-
- <link href="/images/favicon.ico" rel="icon" type="image/ico" />
- </head>
- <body>
-
- <!-- header start -->
-
- <div class="header" id="navtop">
- <div class="navbar navbar-white navbar-static-top" role="navigation">
- <div class="container-fluid">
- <div class="navbar-header"
- <button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-collapse">
- <span class="sr-only">Toggle navigation</span>
- <span class="icon-bar"></span>
- <span class="icon-bar"></span>
- <span class="icon-bar"></span>
- </button>
- <a class="navbar-brand" href="/">DataCite Blog</a>
- </div>
- <div class="navbar-collapse collapse">
- <ul class="nav navbar-nav navbar-right">
- <li class="dropdown">
- <a href="#" class="dropdown-toggle" data-toggle="dropdown" id="support">Support <span class="caret"></a>
- <ul class="dropdown-menu" role="menu">
- <li><a href="mailto:support@datacite.org">Email</a></li>
- <li><a href="https://github.com/datacite/blog">Source Code</a></li>
- </ul>
- </li>
- <li class="dropdown">
- <a href="#" class="dropdown-toggle" data-toggle="dropdown" id="sites"><i class='fa fa-th'></i> <span class="caret"></span></a>
- <ul class="dropdown-menu" role="menu">
- <li>
- <a href='https://api.datacite.org'>
- <i class='fa fa-cogs fa-fw'></i>
- API
- </a>
- </li>
- <li>
- <a href='https://blog.datacite.org'>
- <i class='fa fa-rss fa-fw'></i>
- Blog
- </a>
- </li>
- <li>
- <a href='http://citation.crosscite.org'>
- <i class='fa fa-file-text-o fa-fw'></i>
- Citation Formatter
- </a>
- </li>
- <li>
- <a href='https://data.datacite.org'>
- <i class='fa fa-repeat fa-fw'></i>
- Content Resolver
- </a>
- </li>
- <li>
- <a href='https://www.datacite.org'>
- <i class='fa fa-globe fa-fw'></i>
- Homepage
- </a>
- </li>
- <li>
- <a href='https://mds.datacite.org'>
- <i class='fa fa-database fa-fw'></i>
- MDS
- </a>
- </li>
- <li>
- <a href='https://oai.datacite.org'>
- <i class='fa fa-table fa-fw'></i>
- OAI-PMH
- </a>
- </li>
- <li>
- <a href='https://profiles.datacite.org'>
- <i class='fa fa-user fa-fw'></i>
- Profiles
- </a>
- </li>
- <li>
- <a href='https://schema.datacite.org'>
- <i class='fa fa-file-code-o fa-fw'></i>
- Schema
- </a>
- </li>
- <li>
- <a href='https://search.datacite.org'>
- <i class='fa fa-search fa-fw'></i>
- Search
- </a>
- </li>
- <li>
- <a href='https://stats.datacite.org'>
- <i class='fa fa-bar-chart fa-fw'></i>
- Statistics
- </a>
- </li>
- <li>
- <a href='http://status.datacite.org'>
- <i class='fa fa-calendar-check-o fa-fw'></i>
- Status
- </a>
- </li>
- </ul>
- </li>
- </ul>
- </div>
- </div>
- </div>
- </div>
-
- <!-- header end -->
-
- <div class="wrapper">
- <div class="section section-white">
- <div class="container-fluid">
- <div class="row row-section">
- <div class="col-md-8 col-md-offset-2 post-content">
- <a name="topofpage"></a>
- <div class="post-meta">
- <h1>Cool DOI's</h1>
- December 15, 2016 by Martin Fenner
- • <span class="post-reading-time"></span> read
- <p class="doi"><a href="https://doi.org/10.5438/55E5-T5C0">https://doi.org/10.5438/55E5-T5C0</a></p>
- </div>
-
- <p>In 1998 Tim Berners-Lee coined the term cool URIs <span class="citation">(<a href="#ref-https://www.w3.org/Provider/Style/URI">1998</a>)</span>, that is URIs that don’t change. We know that URLs referenced in the scholarly literature are often not cool, leading to link rot <span class="citation">(Klein et al., <a href="#ref-https://doi.org/10.1371/journal.pone.0115253">2014</a>)</span> and making it hard or impossible to find the referenced resource.</p>
- <p>Cool URIs are, of course, a fundamental principle behind DOIs, with the two important concepts <a href="https://www.doi.org/doi_handbook/3_Resolution.html"><em>resolution</em></a> (it is very hard to maintain a URL directly pointing at a resource) and <a href="https://www.doi.org/doi_handbook/6_Policies.html"><em>policies</em></a> (that all DOI registration agencies and organizations minting DOIs agree to maintain the redirection). The third essential element for DOIs, their <a href="https://www.doi.org/doi_handbook/4_Data_Model.html"><em>data model</em></a>, is not directly about persistent linking, but about the discoverability of the linked resources via standard metadata in a central index.</p>
- <p>All DOIs, expressed as HTTP URI, are therefore cool URIs. So what is a cool DOI? And, furthermore, how to create and use them? To understand what a cool DOI is, we have to explain the three parts that make up a DOI:</p>
- <div class="figure">
- <img src="/images/2016/12/doi-parts.png" />
-
- </div>
- <h3 id="proxy">Proxy</h3>
- <p>The proxy is not part of the DOI specification, but almost all scholarly DOIs that users encounter today will be expressed as HTTP URLs. DataCite recommends that all DOIs are displayed as permanent URLs, consistent with the recommendations of other DOI registration agencies, e.g. the <a href="http://www.crossref.org/02publishers/doi_display_guidelines.html">Crossref DOI display guidelines</a>. When the DOI system was originally designed, it was thought that the DOI protocol would become widely used, but that clearly has not happened and displaying DOIs as <strong>doi:10.5281/ZENODO.31780</strong> is therefore not recommended.</p>
- <p>The DOI proxy enables the functionality of expressing DOIs as HTTP URIs. Users should also be aware of two these two recommendations:</p>
- <ul>
- <li>Use <a href="https://www.doi.org/doi_proxy/proxy_policies.html">doi.org</a> instead of dx.doi.org as DNS name</li>
- <li>Use the HTTPS protocol instead of HTTP protocol</li>
- </ul>
- <p>Ed Pentz from Crossref makes the case for HTTPS in a <a href="http://blog.crossref.org/2016/09/new-crossref-doi-display-guidelines.html">September blog post</a>. The web, and therefore also the scholarly web, is moving to HTTPS as the default. It is important that the DOI proxy redirects to HTTPS URLs, and it will take some time until all DataCite data centers use HTTPS for the landing pages their DOIs redirects to.</p>
- <p>What many users don’t know is that doi.org is not the only proxy server for DOIs. DOIs use the handle system and any handle server will resolve a DOI, just as doi.org will resolve any handle. This means that <a href="https://hdl.handle.net/10.5281/ZENODO.31780" class="uri">https://hdl.handle.net/10.5281/ZENODO.31780</a> will resolve to the landing page for that DOI and that <a href="http://doi.org/10273/BGRB5054RX05201" class="uri">http://doi.org/10273/BGRB5054RX05201</a> is a handle (for a <a href="http://www.igsn.org/">IGSN</a>) and not a DOI.</p>
- <h3 id="prefix">Prefix</h3>
- <p>The DOI prefix is used as a namespace so that DOIs are globally unique without requiring global coordination for every new identifier. Prefixes in the handle system and therefore for DOIs are numbers without any semantic meaning. One lesson learned with persistent identifiers is that adding meaning to the identifier (e.g. by using a prefix with the name of the data repository) is always dangerous, because – despite best intentions – all names can change over time.</p>
- <p>Since the DOI prefix is a namespace to keep DOIs globally unique, there is usually no need for multiple prefixes for one organization managing DOI assignment. The tricky part is that these responsibilities can change, e.g. when an organization manages multiple repositories and one of them is migrated to another organization. It therefore makes sense to assign one prefix per list of resources that always stays together, e.g. one repository. It is possible that one prefix is managed by multiple organizations (as long as they use the same DOI registration agency), but that makes DOI management more complex.</p>
- <h3 id="suffix">Suffix</h3>
- <p>The suffix for a DOI can be (almost) any string. Which is both a feature and a curse. It is a feature because it gives maximal flexibility, for example when migrating existing identifiers to the DOI system. And it is a curse because it not always works well in the web context, as the list of characters allowed in a URL is limited. A good example of this are SICIs (<a href="https://en.wikipedia.org/wiki/Serial_Item_and_Contribution_Identifier">Serial Item and Contribution Identifier</a>), they were defined in 1996 before the DOI system was implemented, and could then be migrated to DOIs. Unfortunately they can contain many characters that are problematic in a URL or make it difficult to validate the DOI, as in <a href="https://doi.org/10.1002/(sici)1099-1409(199908/10)3:6/7%3C672::aid-jpp192%3E3.0.co;2-8" class="uri">https://doi.org/10.1002/(sici)1099-1409(199908/10)3:6/7&lt;672::aid-jpp192&gt;3.0.co;2-8</a>. A Crossref <a href="http://blog.crossref.org/2015/08/doi-regular-expressions.html">blog post</a> by Andrew Gilmartin gives a good overview about the characters found in DOIs and suggests the following regular expression to check for valid DOIs:</p>
- <pre><code>/^10.\d{4,9}/[-._;()/:A-Z0-9]+$/i</code></pre>
- <p>SICIs demonstrate two other pitfalls:</p>
- <ul>
- <li>they contain semantic information (ISSN, volume, number, etc.) that may change over time, and</li>
- <li>they are long, difficult to transcribe, with characters not allowed in URLs, and not very human-readable.</li>
- </ul>
- <p>Semantic information might also lead users to expect certain functionalities. A common pattern that we see at DataCite is to include information about the version or parent in the suffix, e.g. <a href="https://doi.org/10.6084/M9.FIGSHARE.3501629.V1" class="uri">https://doi.org/10.6084/M9.FIGSHARE.3501629.V1</a> or <a href="https://doi.org/10.5061/DRYAD.0SN63/7" class="uri">https://doi.org/10.5061/DRYAD.0SN63/7</a>. While the decision on what to put into the suffix is up to each data center, we should make sure users don’t think that these are functionalities of the DOI system (e.g. that adding <strong>.V2</strong> to any DOI name will resolve to version 2 of that resource).</p>
- <p>Another issue to keep in mind when assigning suffixes is that DOIs – in contrast to HTTP URIs – are case-insensitive, <a href="https://doi.org/10.5281/ZENODO.31780" class="uri">https://doi.org/10.5281/ZENODO.31780</a> and <a href="https://doi.org/10.5281/zenodo.31780" class="uri">https://doi.org/10.5281/zenodo.31780</a> are the same DOI. All DOIs are <a href="https://www.doi.org/doi_handbook/2_Numbering.html#2.4">converted to upper case</a> upon registration and DOI resolution, but DOIs are not consistently displayed in such a way.</p>
- <h3 id="generating-cool-dois">Generating cool DOIs</h3>
- <p>With all that, what should the ideal DOI look like? Its suffix should be:</p>
- <ul>
- <li>opaque without semantic information</li>
- <li>work well in a web environment, avoiding characters problematic in URLs</li>
- <li>short and human-readable</li>
- <li>Resistant to transcription errors</li>
- <li>easy to generate</li>
- </ul>
- <p>On Tuesday DataCite released a tool that helps generating such a suffix, an open source command line tool called <a href="https://github.com/datacite/cirneco">cirneco</a> (a lot of our open source software uses Italian dog breed names). Cirneco is a Ruby gem that can be installed via</p>
- <pre><code>gem install cirneco</code></pre>
- <p>Cirneco uses base32 encoding, as <a href="http://www.crockford.com/wrmg/base32.html">described</a> by Douglas Crockford. The encoding starts with a randomly generated number to guarantee uniqueness of the identifier, and then encodes the number into a string that uses all numbers and uppercase letters. It avoids the letters I, O and L as they can be confused with the letter 1 and 0, using 32 characters (and 5 checksum characters) in total. The last character is a checksum. The resulting string from cirneco always has a length of 8 characters, in groups of 4 separated by a hyphen to help with readability. The advantage of base32 encoding over using only numbers (as for example ORCID is doing) is that the resulting string becomes much more compact, the available 7 characters (plus one for the checksum) can encode 34,359,738,367 strings, compared to 10 million when only using numbers. This number is large enough that the resulting suffix will not only be unique for a given prefix, but also unique for all DOIs (there is a very small chance to get the same random number twice, but this will be rejected when trying to register the DOI).</p>
- <p>Another common way to generate random strings would have been universally unique identifiers (<a href="https://en.wikipedia.org/wiki/Universally_unique_identifier">UUID</a>), but they are long and not very human-readable, e.g. <a href="https://doi.org/10.4233/UUID:6D192FE2-DE18-4556-873A-D3CD56AB96A6" class="uri">https://doi.org/10.4233/UUID:6D192FE2-DE18-4556-873A-D3CD56AB96A6</a>.</p>
- <p>An example DOI generated by cirneco would be</p>
- <pre><code>cirneco doi generate --prefix 10.5555
- 10.5555/KVTD-VPWM</code></pre>
- <p>The generated DOI is short enough that it should work well in places where space is limited, providing an alternative to the <a href="http://shortdoi.org/">ShortDOI</a> service which shortens existing DOIs, but does this by adding another layer on top of the DOI proxy.</p>
- <p>Another cirneco command checks that this is a valid bas32 string using the checksum</p>
- <pre><code>cirneco doi check 10.5555/KVTD-VPWM
- Checksum for 10.5555/KVTD-VPWM is valid</code></pre>
- <p>This can be used to quickly verify a DOI, e.g. in a web form or API. The Ruby base32 encoding library used by cirneco is open source (<a href="https://github.com/datacite/base32" class="uri">https://github.com/datacite/base32</a>. I added the checksum to the existing library), and implementations of the Crockford base32 encoding pattern are available in many other languages, including <a href="https://github.com/jbittel/base32-crockford">Python</a>, <a href="https://github.com/dflydev/dflydev-base32-crockford">PHP</a>, <a href="https://www.npmjs.com/package/base32-crockford">Javascript</a>, <a href="http://stackoverflow.com/questions/22385467/crockford-base32-encoding-for-large-number-java-implementation">Java</a>, <a href="https://github.com/richardlehane/crock32">Go</a> and <a href="https://crockfordbase32.codeplex.com/">.NET</a>.</p>
- <p>To answer the question raised at the beginning: a cool DOI is a DOI expressed as HTTPS URI using the doi.org proxy and using a base32-encoded suffix, for example <strong>https://doi.org/10.5555/KVTD-VPWM</strong>. This DOI works well in a web environment, is human readable, easy to parse and detect (e.g. in text mining), and can be generated using an algorithm that is well understood and supported.</p>
- <div class="figure">
- <img src="/images/2016/12/cool-dois.svg" />
-
- </div>
- <h3 id="references" class="unnumbered">References</h3>
- <div id="refs" class="references">
- <div id="ref-https://www.w3.org/Provider/Style/URI">
- <p>Berners-Lee, T. (1998). Hypertext Style: Cool URIs don’t change. Retrieved from <a href="https://www.w3.org/Provider/Style/URI" class="uri">https://www.w3.org/Provider/Style/URI</a></p>
- </div>
- <div id="ref-https://doi.org/10.1371/journal.pone.0115253">
- <p>Klein, M., Sompel, H. V. de, Sanderson, R., Shankar, H., Balakireva, L., Zhou, K., &amp; Tobin, R. (2014). Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot. <em>PLOS ONE</em>, <em>9</em>(12), e115253. <a href="http://doi.org/10.1371/journal.pone.0115253" class="uri">http://doi.org/10.1371/journal.pone.0115253</a></p>
- </div>
- </div>
-
- <hr width="80%">
- </div>
- </div>
- <div class="row">
- <div class="col-md-5 col-md-offset-2 post-content">
- <div class="bottom-teaser cf">
- <div class="isLeft">
- <section class="author">
- <div class="author-image" style="background-image: url(https://www.gravatar.com/avatar/434592a097e91261792ebd6b492042bc?s=250&d=mm&r=x)">Blog Logo</div>
- <h4>Martin Fenner</h4>
- <p class="bio">DataCite Technical Director</p>
- <p class="orcid"><a href="http://orcid.org/0000-0003-1419-2405">http://orcid.org/0000-0003-1419-2405</a></p>
- <div class="clearfix"></div>
- <h4>Cool DOI's</h4>
- <p class="published"><a href="https://doi.org/10.5438/55E5-T5C0">https://doi.org/10.5438/55E5-T5C0</a>
- <p class="published"><i class="fa fa-calendar"></i> <time datetime="2016-12-15 00:00">December 15, 2016</time></p>
- <p class="published"><i class="fa fa-history"></i> <a href="https://github.com/datacite/blog/commits/master/source/posts/cool-dois.html.md">History</a></p>
- <p class="published">© 2016 Martin Fenner. Distributed under the terms of the <a href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution license</a>.</p>
- <p class="published">
- <i class="fa fa-tags"></i>
- <a href="/index.html?tag=doi">doi</a>, <a href="/index.html?tag=featured">featured</a>
- </p>
- </section>
- </div>
- </div>
-
- </div>
- <div class="col-md-2 col-md-offset-1">
- <div class="bottom-teaser cf">
- <div class="isLeft">
- <h5 class="index-headline featured"><span>Share on</span></h5>
- <a class="icon-twitter" href="http://twitter.com/share?text=On the @datacite blog: Cool DOI's&amp;url=http://localhost:4567/cool-dois/"
- onclick="window.open(this.href, 'twitter-share', 'width=550,height=255');return false;">
- <i class="fa fa-twitter fa-2x"></i><span class="hidden">twitter</span>
- </a>
- <a class="icon-facebook" href="https://www.facebook.com/sharer.php?t=On the @datacite blog: Cool DOI's&amp;u=http://localhost:4567/cool-dois/"
- onclick="window.open(this.href, 'facebook-share', 'width=550,height=255');return false;">
- <i class="fa fa-facebook fa-2x"></i><span class="hidden">facebook</span>
- </a>
- </div>
- </div>
-
- </div>
- </div>
- </div>
- </div>
- </div>
-
- <!-- footer start -->
-
- <footer class='row footer'>
- <div class="container-fluid">
- <div class='col-md-3 col-sm-4'>
- <h4>About DataCite</h4>
- <ul>
- <li><a href="https://www.datacite.org/mission.html">What we do</a></a></li>
- <li><a href="https://www.datacite.org/board.html">Board</a></a></li>
- <li><a href="https://www.datacite.org/steering.html">Steering groups</a></a></li>
- <li><a href="https://www.datacite.org/staff.html">Staff</a></a></li>
- <li><a href="https://www.datacite.org/jobopportunities.html">Job opportunities</a></a></li>
- </ul>
- </div>
- <div class='col-md-3 col-sm-4'>
- <h4>Services</h4>
- <ul>
- <li><a href="https://www.datacite.org/dois.html">Assign DOIs</a></a></li>
- <li><a href="https://www.datacite.org/search.html">Metadata search</a></a></li>
- <li><a href="https://www.datacite.org/eventdata.html">Event data</a></a></li>
- <li><a href="https://www.datacite.org/profiles.html">Profiles</a></a></li>
- <li><a href="https://www.datacite.org/re3data.html">re3data</a></a></li>
- <li><a href="https://www.datacite.org/citation.html">Citation formatter</a></a></li>
- <li><a href="https://www.datacite.org/stats.html">Statistics</a></a></li>
- <li><a href="https://www.datacite.org/service.html">Service status</a></a></li>
- <li><a href="https://www.datacite.org/content.html">Content negotiation</a></a></li>
- <li><a href="https://www.datacite.org/oaipmh.html">OAI-PMH</a></a></li>
- <li><a href="https://www.datacite.org/test.html">Test environment</a></a></li>
- </ul>
- </div>
- <div class='col-md-3 col-sm-4'>
- <h4>Resources</h4>
- <ul>
- <li><a href="https://schema.datacite.org">Metadata schema</a></a></li>
- <li><a href="https://www.datacite.org/technical.html">Technical documentation</a></a></li>
- <li><a href="https://www.datacite.org/outreach.html">Outreach material</a></a></li>
- <li><a href="https://www.datacite.org/events.html">Events</a></a></li>
- </ul>
- <h4>Community</h4>
- <ul>
- <li><a href="https://www.datacite.org/members.html">Members</a></a></li>
- <li><a href="https://www.datacite.org/partners.html">Partners</a></a></li>
- <li><a href="https://www.datacite.org/steering.html">Steering groups</a></a></li>
- <li><a href="https://www.datacite.org/events.html">Events</a></a></li>
- </ul>
- </div>
- <div class='col-md-3'>
- <h4 class="share">Contact us</h4>
- <a href='mailto:support@datacite.org' class="share">
- <i class='fa fa-at'></i>
- </a>
- <a href='https://blog.datacite.org' class="share">
- <i class='fa fa-rss'></i>
- </a>
- <a href='https://twitter.com/datacite' class="share">
- <i class='fa fa-twitter'></i>
- </a>
- <a href='https://www.linkedin.com/company/datacite' class="share">
- <i class='fa fa-linkedin'></i>
- </a>
- <ul class="share">
- <li><a href="https://www.datacite.org/terms.html">Terms and conditions</a></a></li>
- <li><a href="https://www.datacite.org/privacy.html">Privacy policy</a></a></li>
- <li><a href="https://www.datacite.org/acknowledgments.html">Acknowledgements</a></a></li>
- </ul>
- </div>
- </div>
- </div>
-
- </body>
- </html>
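The deleted fixture above describes how cirneco builds a DOI suffix: a random number, encoded in Crockford base32 (which leaves out the letters I, L, O and U), followed by a check character, and shown as two groups of four. The Ruby sketch below follows Crockford's published scheme, in which the check symbol is the number modulo 37, taken from the 32 symbols plus `*~$=U`. The method names (`encode_suffix`, `generate_suffix`, `valid_suffix?`) are hypothetical and not cirneco's API. It is also an assumption that cirneco's base32 library computes its checksum this way, although both suffixes that appear in the fixture, `KVTD-VPWM` and `55E5-T5C0`, pass this check.

```ruby
require "securerandom"

# Hypothetical sketch of a Crockford base32 DOI suffix; not cirneco's API.

# Crockford's 32 symbols: digits and letters, leaving out I, L, O and U.
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ".freeze
# Check symbols: the same 32 plus five extras for the values 32..36.
CHECK_ALPHABET = (ALPHABET + "*~$=U").freeze
# Seven data symbols; the check symbol makes eight.
DATA_LENGTH = 7

# Base32-encode a non-negative integer, left-padded with "0" to DATA_LENGTH.
def encode_base32(number)
  symbols = []
  loop do
    number, remainder = number.divmod(32)
    symbols.unshift(ALPHABET[remainder])
    break if number.zero?
  end
  symbols.join.rjust(DATA_LENGTH, "0")
end

# Decode a string of ALPHABET symbols back into an integer.
def decode_base32(string)
  string.each_char.reduce(0) { |acc, char| acc * 32 + ALPHABET.index(char) }
end

# Append the mod-37 check symbol and split into two groups of four.
def encode_suffix(number)
  unless number.between?(0, 32**DATA_LENGTH - 1)
    raise ArgumentError, "number must fit in #{DATA_LENGTH} base32 symbols"
  end
  raw = encode_base32(number) + CHECK_ALPHABET[number % 37]
  raw.scan(/.{4}/).join("-")
end

# Draw a random number that fits in seven symbols and encode it.
def generate_suffix
  encode_suffix(SecureRandom.random_number(32**DATA_LENGTH))
end

# Recompute the check symbol. Case is ignored, hyphens are dropped, and
# I/L are read as 1 and O as 0, following Crockford's decoding rules.
def valid_suffix?(suffix)
  normalized = suffix.upcase.delete("-").tr("ILO", "110")
  return false unless normalized.length == DATA_LENGTH + 1

  data = normalized[0, DATA_LENGTH]
  check = normalized[-1]
  return false unless data.each_char.all? { |char| ALPHABET.include?(char) }

  CHECK_ALPHABET.index(check) == decode_base32(data) % 37
end

puts valid_suffix?("KVTD-VPWM") # prints true
```

`SecureRandom.random_number(32**7)` draws from the 34,359,738,368 values that fit in seven symbols. One consequence of a mod-37 check is that roughly 5 in 37 suffixes end in `*`, `~`, `$`, `=` or `U`, and the first four fall outside the Crossref character class quoted in the fixture; this sketch keeps such suffixes, and a production tool might reasonably reject and redraw them instead.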
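The same fixture recommends displaying DOIs as `https://doi.org/` URLs, notes that DOIs are case-insensitive and stored in upper case, and quotes the Crossref pattern `/^10.\d{4,9}/[-._;()/:A-Z0-9]+$/i`. A minimal Ruby sketch of normalizing user input along those lines follows; `normalize_doi` and `doi_url` are hypothetical helpers, not part of cirneco. Two departures from the quoted pattern are deliberate assumptions: the dot after `10` is escaped (unescaped, it matches any character), and `\A`/`\z` replace `^`/`$`, which in Ruby anchor at line boundaries rather than at the ends of the string.

```ruby
# Hypothetical helpers; not part of cirneco.

# Crossref-suggested DOI pattern as a Ruby regexp: the dot after "10" is
# escaped, and \A / \z anchor the whole string instead of a single line.
DOI_PATTERN = %r{\A10\.\d{4,9}/[-._;()/:A-Z0-9]+\z}i

# Proxy or scheme prefixes stripped before validation.
PROXY_PREFIX = %r{\A(?:https?://(?:dx\.)?doi\.org/|doi:)}i

# Returns the bare DOI in upper case, or nil if it does not match the pattern.
def normalize_doi(input)
  doi = input.strip.sub(PROXY_PREFIX, "")
  return nil unless DOI_PATTERN.match?(doi)

  # DOIs are case-insensitive, so upper-casing gives one canonical form.
  doi.upcase
end

# Display form recommended in the fixture: HTTPS and the doi.org host.
def doi_url(input)
  doi = normalize_doi(input)
  doi && "https://doi.org/#{doi}"
end

puts doi_url("http://dx.doi.org/10.5281/zenodo.31780")
# prints https://doi.org/10.5281/ZENODO.31780
```

The SICI-style DOI from the fixture, `10.1002/(sici)1099-1409(199908/10)3:6/7<672::aid-jpp192>3.0.co;2-8`, returns `nil` here because `<` and `>` fall outside the character class, so a pattern this strict will reject some DOIs that are in fact registered. The IGSN handle `10273/BGRB5054RX05201` is also rejected, since it lacks the `10.` prefix.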
@@ -1,100 +0,0 @@
- ---
- layout: post
- title: Cool DOI's
- author: mfenner
5
- tags:
6
- - doi
7
- - featured
8
- accession_number: MS-123
9
- image: https://blog.datacite.org/images/2016/12/cool-dois.png
10
- published: false
11
- date: '2016-12-15'
12
- doi: 10.5438/55e5-t5c0
13
- ---
14
- In 1998 Tim Berners-Lee coined the term cool URIs [-@https://www.w3.org/Provider/Style/URI], that is URIs that don’t change. We know that URLs referenced in the scholarly literature are often not cool, leading to link rot [@https://doi.org/10.1371/journal.pone.0115253] and making it hard or impossible to find the referenced resource.READMORE
15
-
16
- Cool URIs are, of course, a fundamental principle behind DOIs, with the two important concepts [*resolution*](https://www.doi.org/doi_handbook/3_Resolution.html) (it is very hard to maintain a URL directly pointing at a resource) and [*policies*](https://www.doi.org/doi_handbook/6_Policies.html) (that all DOI registration agencies and organizations minting DOIs agree to maintain the redirection). The third essential element for DOIs, their [*data model*](https://www.doi.org/doi_handbook/4_Data_Model.html), is not directly about persistent linking, but about the discoverability of the linked resources via standard metadata in a central index.
17
-
- All DOIs, expressed as HTTP URIs, are therefore cool URIs. So what is a cool DOI? And how do we create and use them? To understand what a cool DOI is, we have to explain the three parts that make up a DOI:
-
- ![](/images/2016/12/doi-parts.png)
-
- ### Proxy
-
- The proxy is not part of the DOI specification, but almost all scholarly DOIs that users encounter today are expressed as HTTP URLs. DataCite recommends that all DOIs be displayed as permanent URLs, consistent with the recommendations of other DOI registration agencies, e.g. the [Crossref DOI display guidelines](http://www.crossref.org/02publishers/doi_display_guidelines.html). When the DOI system was originally designed, it was expected that the DOI protocol would become widely used, but that clearly has not happened, and displaying DOIs as **doi:10.5281/ZENODO.31780** is therefore not recommended.
-
- The DOI proxy makes it possible to express DOIs as HTTP URIs. Users should also be aware of these two recommendations:
-
- * Use [doi.org](https://www.doi.org/doi_proxy/proxy_policies.html) instead of dx.doi.org as the DNS name
- * Use the HTTPS protocol instead of the HTTP protocol
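A minimal Ruby sketch of applying both recommendations, assuming the input is a single DOI in one of the common notations. The helper `doi_to_url` is hypothetical (not part of cirneco): it strips a `doi:` prefix or an existing `http(s)://doi.org/` or `dx.doi.org` proxy and always returns the HTTPS doi.org form. It does not percent-encode characters that are unsafe in URLs.

```ruby
# Hypothetical helper (not part of cirneco): normalize common DOI notations
# to the recommended https://doi.org/ form.
def doi_to_url(input)
  doi = input.strip
  doi = doi.sub(/\Adoi:/i, "")                           # e.g. doi:10.5281/ZENODO.31780
  doi = doi.sub(%r{\Ahttps?://(?:dx\.)?doi\.org/}i, "")  # e.g. http://dx.doi.org/10.5281/...
  "https://doi.org/#{doi}"
end

p doi_to_url("doi:10.5281/ZENODO.31780")                # => "https://doi.org/10.5281/ZENODO.31780"
p doi_to_url("http://dx.doi.org/10.5281/ZENODO.31780")  # => "https://doi.org/10.5281/ZENODO.31780"
```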
-
- Ed Pentz from Crossref makes the case for HTTPS in a [September blog post](http://blog.crossref.org/2016/09/new-crossref-doi-display-guidelines.html). The web, and therefore also the scholarly web, is moving to HTTPS as the default. It is important that the DOI proxy redirects to HTTPS URLs, and it will take some time until all DataCite data centers use HTTPS for the landing pages their DOIs redirect to.
-
- What many users don’t know is that doi.org is not the only proxy server for DOIs. DOIs use the Handle System, and any handle server will resolve a DOI, just as doi.org will resolve any handle. This means that [https://hdl.handle.net/10.5281/ZENODO.31780](https://hdl.handle.net/10.5281/ZENODO.31780) will resolve to the landing page for that DOI, and that [http://doi.org/10273/BGRB5054RX05201](http://doi.org/10273/BGRB5054RX05201) is a handle (for an [IGSN](http://www.igsn.org/)) and not a DOI.
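The distinction in the last example comes from the prefix: DOIs are handles whose prefix starts with the `10.` directory code, while `10273` is a different handle prefix. A small sketch of that rule; the method names `doi_handle?` and `resolver_urls` are hypothetical and only illustrate that either proxy can be used for either kind of handle.

```ruby
# Hypothetical helper: a handle is a DOI if its prefix starts with "10."
def doi_handle?(handle)
  prefix = handle.split("/", 2).first
  prefix.start_with?("10.")
end

# Hypothetical helper: both proxies resolve both DOIs and other handles.
def resolver_urls(handle)
  ["https://doi.org/#{handle}", "https://hdl.handle.net/#{handle}"]
end

p doi_handle?("10.5281/ZENODO.31780")   # => true
p doi_handle?("10273/BGRB5054RX05201")  # => false (an IGSN handle)
```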
-
- ### Prefix
-
- The DOI prefix is used as a namespace so that DOIs are globally unique without requiring global coordination for every new identifier. Prefixes in the Handle System, and therefore for DOIs, are numbers without any semantic meaning. One lesson learned with persistent identifiers is that adding meaning to the identifier (e.g. by using a prefix with the name of the data repository) is always dangerous, because – despite best intentions – all names can change over time.
-
- Since the DOI prefix is a namespace to keep DOIs globally unique, there is usually no need for multiple prefixes for one organization managing DOI assignment. The tricky part is that these responsibilities can change, e.g. when an organization manages multiple repositories and one of them is migrated to another organization. It therefore makes sense to assign one prefix per set of resources that always stays together, e.g. one repository. It is possible for one prefix to be managed by multiple organizations (as long as they use the same DOI registration agency), but that makes DOI management more complex.
-
- ### Suffix
-
- The suffix for a DOI can be (almost) any string, which is both a feature and a curse. It is a feature because it gives maximal flexibility, for example when migrating existing identifiers to the DOI system. And it is a curse because it does not always work well in the web context, as the set of characters allowed in a URL is limited. A good example are SICIs ([Serial Item and Contribution Identifier](https://en.wikipedia.org/wiki/Serial_Item_and_Contribution_Identifier)): they were defined in 1996, before the DOI system was implemented, and could then be migrated to DOIs. Unfortunately, they can contain many characters that are problematic in a URL or make it difficult to validate the DOI, as in [https://doi.org/10.1002/(sici)1099-1409(199908/10)3:6/7<672::aid-jpp192>3.0.co;2-8](https://doi.org/10.1002/(sici)1099-1409(199908/10)3:6/7<672::aid-jpp192>3.0.co;2-8). A Crossref [blog post](http://blog.crossref.org/2015/08/doi-regular-expressions.html) by Andrew Gilmartin gives a good overview of the characters found in DOIs and suggests the following regular expression to check for valid DOIs:
-
- ```
- /^10.\d{4,9}/[-._;()/:A-Z0-9]+$/i
- ```
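In Ruby the expression has to be adapted before it can be used: the `/` characters inside the pattern would end a `/.../` literal, the unescaped `.` after `10` matches any character, and `^`/`$` match at line boundaries rather than at the start and end of the string. A sketch using `%r{...}` with `\A`/`\z` anchors; the names `DOI_REGEX` and `valid_doi?` are illustrative, not from cirneco.

```ruby
# Adapted from the regular expression quoted above:
# %r{} avoids escaping "/", "\." matches a literal dot,
# and \A ... \z anchor the whole string instead of a single line.
DOI_REGEX = %r{\A10\.\d{4,9}/[-._;()/:A-Z0-9]+\z}i

def valid_doi?(doi)
  DOI_REGEX.match?(doi)
end

p valid_doi?("10.5281/ZENODO.31780")   # => true
p valid_doi?("10.5555/KVTD-VPWM")      # => true
p valid_doi?("10273/BGRB5054RX05201")  # => false (a handle, not a DOI)
p valid_doi?("10.1002/(sici)1099-1409(199908/10)3:6/7<672::aid-jpp192>3.0.co;2-8") # => false
```

The SICI example is rejected because `<` and `>` fall outside the allowed character class.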
-
- SICIs demonstrate two other pitfalls:
-
- * they contain semantic information (ISSN, volume, number, etc.) that may change over time, and
- * they are long, difficult to transcribe, contain characters not allowed in URLs, and are not very human-readable.
-
- Semantic information might also lead users to expect certain functionality. A common pattern that we see at DataCite is to include information about the version or parent in the suffix, e.g. [https://doi.org/10.6084/M9.FIGSHARE.3501629.V1](https://doi.org/10.6084/M9.FIGSHARE.3501629.V1) or [https://doi.org/10.5061/DRYAD.0SN63/7](https://doi.org/10.5061/DRYAD.0SN63/7). While the decision on what to put into the suffix is up to each data center, we should make sure users don't think that these are functionalities of the DOI system (e.g. that adding **.V2** to any DOI name will resolve to version 2 of that resource).
-
- Another issue to keep in mind when assigning suffixes is that DOIs, in contrast to HTTP URIs, are case-insensitive: [https://doi.org/10.5281/ZENODO.31780](https://doi.org/10.5281/ZENODO.31780) and [https://doi.org/10.5281/zenodo.31780](https://doi.org/10.5281/zenodo.31780) are the same DOI. All DOIs are [converted to upper case](https://www.doi.org/doi_handbook/2_Numbering.html#2.4) upon registration and DOI resolution, but DOIs are not consistently displayed in upper case.
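Software that stores or compares DOIs therefore has to normalize case first; a plain string comparison treats the two Zenodo DOIs above as different. A minimal sketch, with hypothetical method names, that mirrors the upper-case rule before comparing:

```ruby
# Hypothetical helpers: DOIs are case-insensitive, so compare a canonical
# upper-case form rather than the raw strings.
def canonical_doi(doi)
  doi.strip.upcase
end

def same_doi?(a, b)
  canonical_doi(a) == canonical_doi(b)
end

p same_doi?("10.5281/ZENODO.31780", "10.5281/zenodo.31780")  # => true
p "10.5281/ZENODO.31780" == "10.5281/zenodo.31780"           # => false
```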
-
- ### Generating cool DOIs
-
- With all that, what should the ideal DOI look like? Its suffix should be:
-
- * opaque, without semantic information
- * well suited to a web environment, avoiding characters that are problematic in URLs
- * short and human-readable
- * resistant to transcription errors
- * easy to generate
-
- On Tuesday DataCite released a tool that helps generate such a suffix, an open source command line tool called [cirneco](https://github.com/datacite/cirneco) (a lot of our open source software uses Italian dog breed names). Cirneco is a Ruby gem that can be installed via
-
- ```
- gem install cirneco
- ```
-
- Cirneco uses base32 encoding, as [described](http://www.crockford.com/wrmg/base32.html) by Douglas Crockford. The encoding starts with a randomly generated number, which makes it very unlikely that the same identifier is generated twice, and then encodes the number into a string that uses all digits and uppercase letters except I, L, O and U; I, L and O are avoided because they can be confused with the digits 1 and 0. This leaves 32 characters, plus 5 additional characters used only for the checksum. The last character is a checksum. The resulting string from cirneco always has a length of 8 characters, in groups of 4 separated by a hyphen to help with readability. The advantage of base32 encoding over using only digits (as ORCID does, for example) is that the resulting string becomes much more compact: the available 7 characters (plus one for the checksum) can encode 34,359,738,368 different values, compared to 10 million when using only digits. This number is large enough that the resulting suffix will very likely be unique, not only for a given prefix but across all DOIs (there is a very small chance to get the same random number twice, but the duplicate will be rejected when trying to register the DOI).
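The capacity figures can be checked directly: seven base32 characters give $N = 32^7$ values (the check character is derived from the number and adds none). For the "very small chance" of a duplicate, the standard birthday-problem approximation applies if suffixes are drawn uniformly at random under one prefix; that is an assumption for illustration, and $n = 10{,}000$ is an arbitrary example count.

$$
N = 32^7 = 2^{35} = 34{,}359{,}738{,}368 \qquad \text{vs.} \qquad 10^7 = 10{,}000{,}000
$$

$$
P_{\text{duplicate}}(n) \approx 1 - \exp\!\left(-\frac{n(n-1)}{2N}\right),
\qquad
P_{\text{duplicate}}(10{,}000) \approx 1 - e^{-0.00146} \approx 0.15\%
$$

Because this probability grows roughly with $n^2$, rejecting duplicates at registration time remains a necessary safeguard.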
-
- Another common way to generate random strings would have been universally unique identifiers ([UUID](https://en.wikipedia.org/wiki/Universally_unique_identifier)), but they are long and not very human-readable, e.g. [https://doi.org/10.4233/UUID:6D192FE2-DE18-4556-873A-D3CD56AB96A6](https://doi.org/10.4233/UUID:6D192FE2-DE18-4556-873A-D3CD56AB96A6).
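To put "long" in numbers: Ruby's standard library generates random (version 4) UUIDs with `SecureRandom.uuid`. Comparing its length with the example suffix `KVTD-VPWM` that cirneco generates in this post:

```ruby
require "securerandom"

uuid   = SecureRandom.uuid  # random version 4 UUID: 32 hex digits plus 4 hyphens
suffix = "KVTD-VPWM"        # example cirneco suffix from this post

p uuid.length    # => 36
p suffix.length  # => 9
```

A UUID suffix is four times as long as the base32 suffix, while carrying more randomness (122 random bits) than a single prefix is likely to need.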
-
- An example DOI generated by cirneco would be:
-
- ```
- cirneco doi generate --prefix 10.5555
- 10.5555/KVTD-VPWM
- ```
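To make the encoding step concrete, here is a minimal Ruby sketch of Crockford-style suffix generation. It is not cirneco's implementation: the constants and method names are hypothetical, and it assumes the checksum is Crockford's mod-37 check symbol (the example suffix `KVTD-VPWM` is consistent with that scheme). It draws a random number below 32^7, encodes it as 7 base32 characters, appends the check symbol and splits the result into two groups of 4.

```ruby
require "securerandom"

# Crockford base32 alphabet: digits and uppercase letters without I, L, O, U.
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
# Check symbols: the 32 above plus Crockford's extra symbols for values 32..36.
CHECK_SYMBOLS = ALPHABET + '*~$=U'
DATA_LENGTH = 7 # 7 data characters + 1 check character = 8

# Encode a non-negative integer as exactly `length` base32 characters (zero-padded).
def encode_base32(number, length)
  raise ArgumentError, "number out of range" unless number >= 0 && number < 32**length

  # Collect digits least-significant first, then reverse.
  digits = Array.new(length) do
    digit = ALPHABET[number % 32]
    number /= 32
    digit
  end
  digits.reverse.join
end

# Hypothetical helper: 7 base32 characters, a mod-37 check symbol, split 4-4.
def suffix_for(number)
  encoded = encode_base32(number, DATA_LENGTH) + CHECK_SYMBOLS[number % 37]
  encoded.scan(/.{4}/).join("-")
end

def generate_doi(prefix)
  "#{prefix}/#{suffix_for(SecureRandom.random_number(32**DATA_LENGTH))}"
end

puts generate_doi("10.5555") # output is random on every run
```

Keeping the random number below 32^7 guarantees the 7-character body never overflows, so every suffix has the same fixed length.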
-
- The generated DOI is short enough that it should work well in places where space is limited, providing an alternative to the [ShortDOI](http://shortdoi.org/) service, which shortens existing DOIs but does so by adding another layer on top of the DOI proxy.
-
- Another cirneco command uses the checksum to check that this is a valid base32 string:
-
- ```
- cirneco doi check 10.5555/KVTD-VPWM
- Checksum for 10.5555/KVTD-VPWM is valid
- ```
-
- This can be used to quickly verify a DOI, e.g. in a web form or API. The Ruby base32 encoding library used by cirneco is open source ([https://github.com/datacite/base32](https://github.com/datacite/base32); I added the checksum to the existing library), and implementations of the Crockford base32 encoding pattern are available in many other languages, including [Python](https://github.com/jbittel/base32-crockford), [PHP](https://github.com/dflydev/dflydev-base32-crockford), [JavaScript](https://www.npmjs.com/package/base32-crockford), [Java](http://stackoverflow.com/questions/22385467/crockford-base32-encoding-for-large-number-java-implementation), [Go](https://github.com/richardlehane/crock32) and [.NET](https://crockfordbase32.codeplex.com/).
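A sketch of such a check, for example behind a web form, assuming a 7-character base32 body followed by Crockford's mod-37 check symbol. The method names are hypothetical, not the API of cirneco or the base32 gem. Following Crockford's decoding rules, it accepts lowercase input and reads I and L as 1 and O as 0 before recomputing the check symbol.

```ruby
# Crockford base32 alphabet and the check-symbol alphabet (values 0..36).
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
CHECK_SYMBOLS = ALPHABET + '*~$=U'

# Hypothetical helper: verify the check symbol of a suffix such as "KVTD-VPWM".
def valid_suffix?(suffix)
  # Normalize: uppercase, drop hyphens, map confusable letters to digits.
  s = suffix.upcase.delete("-").tr("ILO", "110")
  return false unless s.length == 8

  body, check = s[0, 7], s[7]
  return false unless body.each_char.all? { |c| ALPHABET.include?(c) }

  number = body.each_char.reduce(0) { |acc, c| acc * 32 + ALPHABET.index(c) }
  CHECK_SYMBOLS.index(check) == number % 37
end

# Check the suffix part of a DOI (everything after the first "/").
def valid_doi_checksum?(doi)
  valid_suffix?(doi.split("/", 2).last.to_s)
end

p valid_doi_checksum?("10.5555/KVTD-VPWM")  # => true
p valid_doi_checksum?("10.5555/KVTD-VPWA")  # => false
```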
-
- To answer the question raised at the beginning: a cool DOI is a DOI expressed as an HTTPS URI using the doi.org proxy, with a base32-encoded suffix, for example **https://doi.org/10.5555/KVTD-VPWM**. Such a DOI works well in a web environment, is human-readable, is easy to parse and detect (e.g. in text mining), and can be generated using an algorithm that is well understood and supported.
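As an illustration of the "easy to detect" point, a sketch of a pattern that finds cool DOIs in running text. The pattern is an assumption built from the format described in this post (doi.org over HTTPS, a `10.` prefix, two groups of four Crockford base32 characters, the last of which may be a check symbol); it deliberately ignores other DOI styles.

```ruby
# Hypothetical pattern for "cool DOIs": https://doi.org/, a 10.xxxx prefix,
# then 4 + 4 Crockford base32 characters (the final one may be a check symbol).
COOL_DOI = %r{https://doi\.org/10\.\d{4,9}/[0-9A-HJKMNP-TV-Z]{4}-[0-9A-HJKMNP-TV-Z]{3}[0-9A-HJKMNP-TV-Z*~$=U](?![0-9A-Z])}i

text = "See https://doi.org/10.5555/KVTD-VPWM and https://doi.org/10.5281/ZENODO.31780."
p text.scan(COOL_DOI)  # => ["https://doi.org/10.5555/KVTD-VPWM"]
```

The Zenodo DOI is skipped because its suffix contains `O` and lacks the 4-4 hyphenated shape.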
-
- ![](/images/2016/12/cool-dois.png)
-
- ### References
@@ -1,10 +0,0 @@
- ---
- layout: post
- title: Cool DOIs
- author: mfenner
- date: 2016-12-15
- tags:
- - doi
- - featured
- image: https://blog.datacite.org/images/2016/12/cool-dois.png
- ---