commonmeta-ruby 3.3.2 → 3.3.4
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/Gemfile.lock +3 -1
- data/bin/commonmeta +1 -1
- data/commonmeta.gemspec +1 -0
- data/lib/commonmeta/cli.rb +10 -0
- data/lib/commonmeta/readers/json_feed_reader.rb +1 -1
- data/lib/commonmeta/utils.rb +55 -1
- data/lib/commonmeta/version.rb +1 -1
- data/lib/commonmeta.rb +1 -0
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/change_metadata_as_datacite_xml/with_data_citation.yml +14 -14
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/doi_registration_agency/crossref.yml +5 -5
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/doi_registration_agency/datacite.yml +5 -5
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/doi_registration_agency/jalc.yml +5 -5
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/doi_registration_agency/kisti.yml +5 -5
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/doi_registration_agency/medra.yml +5 -5
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/doi_registration_agency/not_found.yml +5 -5
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/doi_registration_agency/op.yml +5 -5
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/find_from_format_by_ID/crossref.yml +5 -5
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/find_from_format_by_ID/crossref_doi_not_url.yml +5 -5
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/find_from_format_by_ID/datacite.yml +5 -5
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/find_from_format_by_ID/datacite_doi_http.yml +5 -5
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/find_from_format_by_ID/unknown_DOI_registration_agency.yml +5 -5
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_cff_metadata/cff-converter-python.yml +9 -7
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_cff_metadata/ruby-cff.yml +16 -14
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_cff_metadata/ruby-cff_repository_url.yml +14 -12
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_codemeta_metadata/maremma.yml +10 -8
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_codemeta_metadata/metadata_reports.yml +9 -7
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/DOI_with_ORCID_ID.yml +74 -74
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/DOI_with_SICI_DOI.yml +73 -73
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/DOI_with_data_citation.yml +70 -70
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/JaLC.yml +159 -159
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/KISTI.yml +128 -128
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/OP.yml +72 -72
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/affiliation_is_space.yml +73 -73
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/another_book.yml +109 -109
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/another_book_chapter.yml +71 -71
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/article_id_as_page_number.yml +74 -74
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/author_literal.yml +82 -82
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/book.yml +71 -71
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/book_chapter.yml +72 -72
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/book_chapter_with_RDF_for_container.yml +70 -70
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/book_oup.yml +69 -69
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/component.yml +91 -91
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/dataset.yml +101 -102
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/dataset_usda.yml +133 -133
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/date_in_future.yml +78 -78
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/dissertation.yml +100 -100
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/empty_given_name.yml +72 -72
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/invalid_date.yml +73 -73
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/journal_article.yml +72 -72
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/journal_article_original_language_title.yml +70 -70
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/journal_article_with.yml +76 -514
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/journal_article_with_RDF_for_container.yml +70 -70
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/journal_article_with_funding.yml +73 -73
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/journal_issue.yml +69 -69
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/mEDRA.yml +69 -69
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/markup.yml +78 -78
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/missing_creator.yml +73 -73
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/multiple_issn.yml +72 -72
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/multiple_titles.yml +71 -70
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/multiple_titles_with_missing.yml +716 -716
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/not_found_error.yml +63 -63
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/peer_review.yml +74 -74
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/posted_content.yml +71 -71
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/posted_content_copernicus.yml +73 -73
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/report_osti.yml +117 -117
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/vor_with_url.yml +75 -75
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/yet_another_book.yml +69 -69
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_metadata/yet_another_book_chapter.yml +70 -70
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_crossref_raw/journal_article.yml +10 -10
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_datacite_metadata/dissertation.yml +9 -9
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_datacite_metadata/funding_references.yml +11 -11
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_datacite_metadata/subject_scheme.yml +20 -20
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_doi_prefix_for_blog/by_blog_id.yml +6 -419
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_doi_prefix_for_blog/by_blog_post_uuid.yml +7 -260
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_doi_prefix_for_blog/by_blog_post_uuid_specific_prefix.yml +3 -136
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_json_feed/by_blog_id.yml +225 -1432
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_json_feed/not_indexed_posts.yml +1380 -2112
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_json_feed/unregistered_posts.yml +6 -172
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_json_feed_item_metadata/blog_post_with_non-url_id.yml +7 -7
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_json_feed_item_metadata/blogger_post.yml +12 -12
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_json_feed_item_metadata/ghost_post_with_author_name_suffix.yml +8 -8
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_json_feed_item_metadata/ghost_post_with_doi.yml +7 -7
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_json_feed_item_metadata/ghost_post_with_organizational_author.yml +3 -3
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_json_feed_item_metadata/ghost_post_without_doi.yml +8 -8
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_json_feed_item_metadata/jekyll_post.yml +8 -8
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_json_feed_item_metadata/substack_post_with_broken_reference.yml +90 -176
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_json_feed_item_metadata/syldavia_gazette_post_with_references.yml +25 -25
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_json_feed_item_metadata/upstream_post_with_references.yml +61 -61
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_json_feed_item_metadata/wordpress_post.yml +8 -8
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_json_feed_item_metadata/wordpress_post_with_references.yml +20 -20
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_one_author/has_familyName.yml +9 -9
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_one_author/has_name_in_display-order_with_ORCID.yml +9 -9
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_one_author/name_with_affiliation_crossref.yml +14 -14
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_one_author/only_familyName_and_givenName.yml +43 -36
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_schema_org_metadata/BlogPosting.yml +158 -158
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_schema_org_metadata/BlogPosting_with_new_DOI.yml +162 -162
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_schema_org_metadata/get_schema_org_metadata_front_matter/BlogPosting.yml +178 -180
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_schema_org_metadata/harvard_dataverse.yml +226 -230
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_schema_org_metadata/pangaea.yml +43 -36
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_schema_org_metadata/upstream_blog.yml +94 -94
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_schema_org_metadata/zenodo.yml +14 -14
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/handle_input/DOI_RA_not_Crossref_or_DataCite.yml +5 -5
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/handle_input/unknown_DOI_prefix.yml +5 -5
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/json_schema_errors/is_valid.yml +13 -13
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_bibtex/BlogPosting.yml +4 -4
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_bibtex/Dataset.yml +6 -6
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_bibtex/authors_with_affiliations.yml +14 -14
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_bibtex/climate_data.yml +6 -6
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_bibtex/from_schema_org.yml +159 -159
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_bibtex/keywords_subject_scheme.yml +6 -6
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_bibtex/maremma.yml +12 -10
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_bibtex/text.yml +4 -4
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_bibtex/with_data_citation.yml +14 -14
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_bibtex/with_pages.yml +12 -12
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_cff/Collection_of_Jupyter_notebooks.yml +9 -9
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_cff/SoftwareSourceCode_Zenodo.yml +17 -17
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_cff/SoftwareSourceCode_also_Zenodo.yml +12 -12
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_cff/ruby-cff.yml +16 -14
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_citation/Dataset.yml +6 -6
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_citation/Journal_article.yml +14 -14
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_citation/Journal_article_vancouver_style.yml +19 -19
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_citation/Missing_author.yml +12 -12
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_citation/interactive_resource_without_dates.yml +4 -4
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_citation/software_w/version.yml +6 -6
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_codemeta/SoftwareSourceCode_DataCite.yml +4 -4
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_codemeta/SoftwareSourceCode_DataCite_check_codemeta_v2.yml +4 -4
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_crossref/another_schema_org_from_front-matter.yml +27 -27
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_crossref/journal_article.yml +4 -4
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_crossref/journal_article_from_datacite.yml +4 -4
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_crossref/json_feed_item_from_rogue_scholar_with_doi.yml +8 -54
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_crossref/json_feed_item_from_rogue_scholar_with_organizational_author.yml +3 -3
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_crossref/json_feed_item_from_upstream_blog.yml +10 -53
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_crossref/json_feed_item_with_references.yml +62 -62
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_crossref/posted_content.yml +15 -15
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_crossref/schema_org_from_another_science_blog.yml +6 -6
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_crossref/schema_org_from_front_matter.yml +29 -29
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_crossref/schema_org_from_upstream_blog.yml +4 -4
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_csl/Another_dataset.yml +28 -28
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_csl/BlogPosting.yml +4 -4
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_csl/BlogPosting_schema_org.yml +158 -158
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_csl/Dataset.yml +6 -6
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_csl/container_title.yml +11 -11
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_csl/interactive_resource_without_dates.yml +4 -4
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_csl/journal_article.yml +14 -14
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_csl/keywords_subject_scheme.yml +6 -6
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_csl/maremma.yml +9 -7
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_csl/missing_creator.yml +12 -12
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_csl/multiple_abstracts.yml +6 -6
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_csl/organization_author.yml +19 -19
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_csl/software.yml +4 -4
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_csl/software_w/version.yml +6 -6
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_csl/with_only_first_page.yml +13 -13
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_csl/with_pages.yml +12 -12
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_csv/climate_data.yml +6 -6
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_csv/maremma.yml +10 -8
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_csv/text.yml +4 -4
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_csv/with_data_citation.yml +14 -14
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_csv/with_pages.yml +12 -12
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_datacite/dissertation.yml +17 -17
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_datacite/from_schema_org.yml +158 -158
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_datacite/journal_article.yml +18 -18
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_datacite/maremma.yml +10 -8
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_datacite/with_ORCID_ID.yml +12 -12
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_datacite/with_data_citation.yml +14 -14
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_jats_xml/Dataset_in_schema_4_0.yml +6 -6
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_jats_xml/Text_pass-thru.yml +4 -4
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_jats_xml/book_chapter.yml +15 -13
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_jats_xml/from_schema_org.yml +158 -158
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_jats_xml/interactive_resource_without_dates.yml +4 -4
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_jats_xml/maremma.yml +12 -10
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_jats_xml/with_ORCID_ID.yml +12 -12
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_jats_xml/with_data_citation.yml +14 -14
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_jats_xml/with_editor.yml +13 -13
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_ris/BlogPosting.yml +4 -4
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_ris/BlogPosting_schema_org.yml +159 -159
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_ris/Dataset.yml +6 -6
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_ris/alternate_name.yml +4 -4
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_ris/journal_article.yml +8 -8
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_ris/keywords_with_subject_scheme.yml +6 -6
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_ris/maremma.yml +9 -7
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_ris/with_pages.yml +7 -7
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_schema_org/Another_Schema_org_JSON.yml +6 -6
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_schema_org/Funding.yml +9 -9
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_schema_org/Funding_OpenAIRE.yml +9 -9
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_schema_org/Schema_org_JSON.yml +17 -17
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_schema_org/Schema_org_JSON_Cyark.yml +33 -33
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_schema_org/alternate_identifiers.yml +9 -9
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_schema_org/data_catalog.yml +9 -9
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_schema_org/geo_location_box.yml +12 -12
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_schema_org/interactive_resource_without_dates.yml +9 -9
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_schema_org/journal_article.yml +14 -14
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_schema_org/maremma_schema_org_JSON.yml +10 -8
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_schema_org/series_information.yml +9 -9
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_schema_org/subject_scheme.yml +11 -11
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_schema_org/subject_scheme_multiple_keywords.yml +11 -11
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_turtle/BlogPosting.yml +4 -4
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_turtle/Dataset.yml +6 -6
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_turtle/journal_article.yml +14 -14
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_turtle/with_pages.yml +12 -12
- data/spec/readers/cff_reader_spec.rb +6 -6
- data/spec/readers/crossref_reader_spec.rb +3 -3
- data/spec/readers/crossref_xml_reader_spec.rb +7 -7
- data/spec/readers/json_feed_reader_spec.rb +13 -13
- data/spec/readers/schema_org_reader_spec.rb +2 -3
- data/spec/spec_helper.rb +1 -0
- data/spec/utils_spec.rb +1 -1
- data/spec/writers/cff_writer_spec.rb +3 -3
- data/spec/writers/ris_writer_spec.rb +2 -2
- data/spec/writers/schema_org_writer_spec.rb +1 -1
- metadata +21 -423
- data/spec/fixtures/vcr_cassettes/Briard_CLI/convert_file/crossref/default.yml +0 -110
- data/spec/fixtures/vcr_cassettes/Briard_CLI/convert_file/crossref/to_bibtex.yml +0 -110
- data/spec/fixtures/vcr_cassettes/Briard_CLI/convert_file/crossref/to_crossref_xml.yml +0 -110
- data/spec/fixtures/vcr_cassettes/Briard_CLI/convert_file/crossref/to_datacite.yml +0 -110
- data/spec/fixtures/vcr_cassettes/Briard_CLI/convert_file/crossref/to_schema_org.yml +0 -110
- data/spec/fixtures/vcr_cassettes/Briard_CLI/convert_file/crossref_xml/default.yml +0 -55
- data/spec/fixtures/vcr_cassettes/Briard_CLI/convert_file/crossref_xml/to_bibtex.yml +0 -55
- data/spec/fixtures/vcr_cassettes/Briard_CLI/convert_file/crossref_xml/to_crossref_xml.yml +0 -55
- data/spec/fixtures/vcr_cassettes/Briard_CLI/convert_file/crossref_xml/to_datacite.yml +0 -55
- data/spec/fixtures/vcr_cassettes/Briard_CLI/convert_file/crossref_xml/to_schema_org.yml +0 -55
- data/spec/fixtures/vcr_cassettes/Briard_CLI/convert_from_id/crossref/default.yml +0 -299
- data/spec/fixtures/vcr_cassettes/Briard_CLI/convert_from_id/crossref/to_bibtex.yml +0 -299
- data/spec/fixtures/vcr_cassettes/Briard_CLI/convert_from_id/crossref/to_citation.yml +0 -299
- data/spec/fixtures/vcr_cassettes/Briard_CLI/convert_from_id/crossref/to_crossref_xml.yml +0 -299
- data/spec/fixtures/vcr_cassettes/Briard_CLI/convert_from_id/crossref/to_datacite.yml +0 -299
- data/spec/fixtures/vcr_cassettes/Briard_CLI/convert_from_id/crossref/to_jats.yml +0 -299
- data/spec/fixtures/vcr_cassettes/Briard_CLI/convert_from_id/crossref/to_schema_org.yml +0 -299
- data/spec/fixtures/vcr_cassettes/Briard_CLI/convert_from_id/datacite/default.yml +0 -172
- data/spec/fixtures/vcr_cassettes/Briard_CLI/convert_from_id/datacite/to_bibtex.yml +0 -172
- data/spec/fixtures/vcr_cassettes/Briard_CLI/convert_from_id/datacite/to_citation.yml +0 -172
- data/spec/fixtures/vcr_cassettes/Briard_CLI/convert_from_id/datacite/to_datacite.yml +0 -172
- data/spec/fixtures/vcr_cassettes/Briard_CLI/convert_from_id/datacite/to_jats.yml +0 -172
- data/spec/fixtures/vcr_cassettes/Briard_CLI/convert_from_id/datacite/to_schema_org.yml +0 -172
- data/spec/fixtures/vcr_cassettes/Briard_CLI/convert_from_id/schema_org/default.yml +0 -1098
- data/spec/fixtures/vcr_cassettes/Briard_CLI/convert_from_id/schema_org/to_datacite.yml +0 -1098
- data/spec/fixtures/vcr_cassettes/Briard_CLI/convert_from_id/schema_org/to_schema_org.yml +0 -1100
- data/spec/fixtures/vcr_cassettes/Briard_CLI/find_from_format_by_id/crossref.yml +0 -55
- data/spec/fixtures/vcr_cassettes/Briard_CLI/find_from_format_by_id/datacite.yml +0 -55
- data/spec/fixtures/vcr_cassettes/Briard_CLI/find_from_format_by_id/jalc.yml +0 -55
- data/spec/fixtures/vcr_cassettes/Briard_CLI/find_from_format_by_id/kisti.yml +0 -55
- data/spec/fixtures/vcr_cassettes/Briard_CLI/find_from_format_by_id/medra.yml +0 -55
- data/spec/fixtures/vcr_cassettes/Briard_CLI/find_from_format_by_id/op.yml +0 -55
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/authors_as_string/author.yml +0 -164
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/authors_as_string/no_author.yml +0 -164
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/authors_as_string/single_author.yml +0 -164
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/authors_as_string/with_organization.yml +0 -164
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/change_metadata_as_datacite_xml/with_data_citation.yml +0 -247
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/doi_registration_agency/crossref.yml +0 -55
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/doi_registration_agency/datacite.yml +0 -55
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/doi_registration_agency/jalc.yml +0 -55
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/doi_registration_agency/kisti.yml +0 -55
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/doi_registration_agency/medra.yml +0 -55
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/doi_registration_agency/not_found.yml +0 -55
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/doi_registration_agency/op.yml +0 -55
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/find_from_format_by_ID/crossref.yml +0 -55
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/find_from_format_by_ID/crossref_doi_not_url.yml +0 -55
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/find_from_format_by_ID/datacite.yml +0 -55
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/find_from_format_by_ID/datacite_doi_http.yml +0 -55
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/find_from_format_by_ID/unknown_DOI_registration_agency.yml +0 -55
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/fos/hsh_to_fos_for_match.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/fos/hsh_to_fos_for_with_schemeUri_in_hash.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/fos/hsh_to_fos_match.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/fos/hsh_to_fos_no_match.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/fos/name_to_fos_for_match.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/fos/name_to_fos_match.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/fos/name_to_fos_no_match.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/from_schema_org/with_id.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/from_schema_org_creators/with_affiliation.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/from_schema_org_creators/without_affiliation.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_cff_metadata/cff-converter-python.yml +0 -200
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_cff_metadata/ruby-cff.yml +0 -154
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_cff_metadata/ruby-cff_repository_url.yml +0 -154
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_codemeta_metadata/maremma.yml +0 -86
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_codemeta_metadata/metadata_reports.yml +0 -93
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/DOI_with_ORCID_ID.yml +0 -337
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/DOI_with_SICI_DOI.yml +0 -347
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/DOI_with_data_citation.yml +0 -359
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/JaLC.yml +0 -384
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/KISTI.yml +0 -330
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/OP.yml +0 -969
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/affiliation_is_space.yml +0 -358
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/another_book.yml +0 -312
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/another_book_chapter.yml +0 -465
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/article_id_as_page_number.yml +0 -276
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/author_literal.yml +0 -492
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/book.yml +0 -523
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/book_chapter.yml +0 -377
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/book_chapter_with_RDF_for_container.yml +0 -336
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/book_oup.yml +0 -289
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/component.yml +0 -289
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/dataset.yml +0 -299
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/dataset_usda.yml +0 -341
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/date_in_future.yml +0 -570
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/dissertation.yml +0 -301
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/empty_given_name.yml +0 -303
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/invalid_date.yml +0 -307
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/journal_article.yml +0 -461
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/journal_article_original_language_title.yml +0 -276
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/journal_article_with.yml +0 -470
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/journal_article_with_RDF_for_container.yml +0 -519
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/journal_article_with_funding.yml +0 -456
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/journal_issue.yml +0 -270
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/mEDRA.yml +0 -310
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/markup.yml +0 -329
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/missing_creator.yml +0 -307
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/multiple_issn.yml +0 -393
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/multiple_titles.yml +0 -265
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/multiple_titles_with_missing.yml +0 -860
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/not_found_error.yml +0 -209
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/peer_review.yml +0 -287
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/posted_content.yml +0 -326
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/posted_content_copernicus.yml +0 -297
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/report_osti.yml +0 -315
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/vor_with_url.yml +0 -451
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/yet_another_book.yml +0 -816
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_metadata/yet_another_book_chapter.yml +0 -324
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_crossref_raw/journal_article.yml +0 -110
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_datacite_metadata/dissertation.yml +0 -152
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_datacite_metadata/funding_references.yml +0 -175
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_datacite_metadata/subject_scheme.yml +0 -328
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_date/publication_date.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_date_from_date_parts/date.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_date_from_date_parts/year-month.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_date_from_date_parts/year.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_date_from_parts/date.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_date_from_parts/year-month.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_date_from_parts/year.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_date_parts/date.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_date_parts/year-month.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_date_parts/year.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_datetime_from_time/future.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_datetime_from_time/invalid.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_datetime_from_time/nil.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_datetime_from_time/past.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_datetime_from_time/present.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_one_author/has_familyName.yml +0 -133
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_one_author/has_name_in_display-order_with_ORCID.yml +0 -153
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_one_author/is_organization.yml +0 -164
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_one_author/name_with_affiliation_crossref.yml +0 -247
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_one_author/only_familyName_and_givenName.yml +0 -468
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_schema_org_metadata/BlogPosting.yml +0 -530
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_schema_org_metadata/BlogPosting_with_new_DOI.yml +0 -530
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_schema_org_metadata/get_schema_org_metadata_front_matter/BlogPosting.yml +0 -534
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_schema_org_metadata/harvard_dataverse.yml +0 -1838
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_schema_org_metadata/pangaea.yml +0 -468
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_schema_org_metadata/upstream_blog.yml +0 -885
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_schema_org_metadata/zenodo.yml +0 -583
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_series_information/only_title.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_series_information/title_and_pages.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_series_information/title_volume_and_pages.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/get_series_information/title_volume_issue_and_pages.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/github/github_as_cff_url.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/github/github_as_codemeta_url.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/github/github_from_url.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/github/github_from_url_cff_file.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/github/github_from_url_file.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/handle_input/DOI_RA_not_Crossref_or_DataCite.yml +0 -55
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/handle_input/unknown_DOI_prefix.yml +0 -55
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/is_personal_name_/has_comma.yml +0 -164
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/is_personal_name_/has_family_name.yml +0 -164
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/is_personal_name_/has_id.yml +0 -164
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/is_personal_name_/has_known_given_name.yml +0 -164
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/is_personal_name_/has_no_info.yml +0 -164
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/is_personal_name_/has_orcid_id.yml +0 -164
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/is_personal_name_/has_type_organization.yml +0 -164
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/json_schema_errors/is_valid.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/normalize_cc_url/not_found.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/normalize_cc_url/with_trailing_slash.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/normalize_cc_url/with_trailing_slash_and_to_https.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/normalize_id/doi.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/normalize_id/doi_as_url.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/normalize_id/filename.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/normalize_id/ftp.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/normalize_id/invalid_url.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/normalize_id/sandbox_via_options.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/normalize_id/sandbox_via_url.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/normalize_id/string.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/normalize_id/url.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/normalize_id/url_with_utf-8.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/normalize_ids/doi.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/normalize_ids/url.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/normalize_issn/from_array.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/normalize_issn/from_empty_array.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/normalize_issn/from_hash.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/normalize_issn/from_string.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/normalize_url/uri.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/normalize_url/with_trailing_slash.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/normalize_url/with_trailing_slash_and_to_https.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/parse_attributes/array.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/parse_attributes/array_of_strings.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/parse_attributes/first.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/parse_attributes/hash.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/parse_attributes/hash_with_array_value.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/parse_attributes/nil.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/parse_attributes/string.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/random_doi/decode_anothe_doi.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/random_doi/decode_doi.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/random_doi/encode_doi.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/sanitize/onlies_keep_specific_tags.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/sanitize/removes_a_tags.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/spdx/hsh_to_spdx_id.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/spdx/hsh_to_spdx_not_found.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/spdx/hsh_to_spdx_url.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/spdx/name_to_spdx_exists.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/spdx/name_to_spdx_id.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/to_schema_org/with_id.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/to_schema_org_identifiers/with_identifiers.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/validate_orcid/validate_orcid.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/validate_orcid/validate_orcid_https.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/validate_orcid/validate_orcid_id.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/validate_orcid/validate_orcid_sandbox.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/validate_orcid/validate_orcid_sandbox_https.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/validate_orcid/validate_orcid_with_spaces.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/validate_orcid/validate_orcid_wrong_id.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/validate_orcid/validate_orcid_www.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/validate_orcid_scheme/validate_orcid_scheme.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/validate_orcid_scheme/validate_orcid_scheme_https.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/validate_orcid_scheme/validate_orcid_scheme_trailing_slash.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/validate_orcid_scheme/validate_orcid_scheme_www.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/validate_url/DOI.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/validate_url/ISSN.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/validate_url/URL.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/validate_url/string.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_bibtex/BlogPosting.yml +0 -81
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_bibtex/Dataset.yml +0 -120
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_bibtex/authors_with_affiliations.yml +0 -186
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_bibtex/climate_data.yml +0 -74
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_bibtex/from_schema_org.yml +0 -530
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_bibtex/keywords_subject_scheme.yml +0 -149
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_bibtex/maremma.yml +0 -86
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_bibtex/text.yml +0 -100
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_bibtex/with_data_citation.yml +0 -247
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_bibtex/with_pages.yml +0 -228
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_cff/Collection_of_Jupyter_notebooks.yml +0 -143
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_cff/SoftwareSourceCode_Zenodo.yml +0 -150
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_cff/SoftwareSourceCode_also_Zenodo.yml +0 -93
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_cff/ruby-cff.yml +0 -154
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_citation/Dataset.yml +0 -120
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_citation/Journal_article.yml +0 -247
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_citation/Journal_article_vancouver_style.yml +0 -299
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_citation/Missing_author.yml +0 -199
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_citation/interactive_resource_without_dates.yml +0 -75
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_citation/software_w/version.yml +0 -86
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_codemeta/SoftwareSourceCode_DataCite.yml +0 -76
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_codemeta/SoftwareSourceCode_DataCite_check_codemeta_v2.yml +0 -76
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_crossref/another_schema_org_from_front-matter.yml +0 -541
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_crossref/journal_article.yml +0 -55
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_crossref/journal_article_from_datacite.yml +0 -85
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_crossref/posted_content.yml +0 -283
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_crossref/schema_org_from_another_science_blog.yml +0 -123
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_crossref/schema_org_from_front_matter.yml +0 -477
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_crossref/schema_org_from_upstream_blog.yml +0 -1025
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_csl/Another_dataset.yml +0 -110
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_csl/BlogPosting.yml +0 -81
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_csl/BlogPosting_schema_org.yml +0 -530
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_csl/Dataset.yml +0 -120
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_csl/container_title.yml +0 -153
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_csl/interactive_resource_without_dates.yml +0 -75
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_csl/journal_article.yml +0 -247
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_csl/keywords_subject_scheme.yml +0 -149
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_csl/maremma.yml +0 -86
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_csl/missing_creator.yml +0 -199
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_csl/multiple_abstracts.yml +0 -101
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_csl/organization_author.yml +0 -314
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_csl/software.yml +0 -90
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_csl/software_w/version.yml +0 -86
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_csl/with_only_first_page.yml +0 -333
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_csl/with_pages.yml +0 -228
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_csv/climate_data.yml +0 -74
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_csv/maremma.yml +0 -86
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_csv/text.yml +0 -100
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_csv/with_data_citation.yml +0 -247
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_csv/with_pages.yml +0 -228
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_datacite/from_schema_org.yml +0 -530
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_datacite/maremma.yml +0 -86
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_datacite/with_ORCID_ID.yml +0 -228
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_datacite/with_data_citation.yml +0 -247
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_datacite_json/from_schema_org.yml +0 -530
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_datacite_json/maremma.yml +0 -86
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_datacite_json/with_ORCID_ID.yml +0 -228
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_datacite_json/with_data_citation.yml +0 -247
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_jats_xml/Dataset_in_schema_4_0.yml +0 -120
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_jats_xml/Text_pass-thru.yml +0 -106
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_jats_xml/book_chapter.yml +0 -163
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_jats_xml/from_schema_org.yml +0 -530
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_jats_xml/interactive_resource_without_dates.yml +0 -75
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_jats_xml/maremma.yml +0 -86
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_jats_xml/with_ORCID_ID.yml +0 -228
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_jats_xml/with_data_citation.yml +0 -247
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_jats_xml/with_editor.yml +0 -355
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_rdf_xml/BlogPosting.yml +0 -81
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_rdf_xml/BlogPosting_schema_org.yml +0 -530
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_rdf_xml/journal_article.yml +0 -247
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_rdf_xml/maremma.yml +0 -86
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_rdf_xml/with_pages.yml +0 -228
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_ris/BlogPosting.yml +0 -81
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_ris/BlogPosting_schema_org.yml +0 -530
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_ris/Dataset.yml +0 -120
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_ris/alternate_name.yml +0 -138
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_ris/journal_article.yml +0 -115
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_ris/keywords_with_subject_scheme.yml +0 -149
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_ris/maremma.yml +0 -86
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_ris/with_pages.yml +0 -112
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_schema_org/Another_Schema_org_JSON.yml +0 -120
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_schema_org/Funding.yml +0 -192
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_schema_org/Funding_OpenAIRE.yml +0 -150
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_schema_org/Schema_org_JSON.yml +0 -98
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_schema_org/Schema_org_JSON_Cyark.yml +0 -160
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_schema_org/Schema_org_JSON_IsSupplementTo.yml +0 -153
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_schema_org/alternate_identifiers.yml +0 -131
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_schema_org/data_catalog.yml +0 -136
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_schema_org/geo_location_box.yml +0 -181
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_schema_org/interactive_resource_without_dates.yml +0 -127
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_schema_org/journal_article.yml +0 -247
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_schema_org/maremma_schema_org_JSON.yml +0 -86
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_schema_org/series_information.yml +0 -174
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_schema_org/subject_scheme.yml +0 -199
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_schema_org/subject_scheme_multiple_keywords.yml +0 -201
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_turtle/BlogPosting.yml +0 -81
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_turtle/BlogPosting_schema_org.yml +0 -530
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_turtle/Dataset.yml +0 -120
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_turtle/journal_article.yml +0 -247
- data/spec/fixtures/vcr_cassettes/Briard_Metadata/write_metadata_as_turtle/with_pages.yml +0 -228
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/fos/hsh_to_fos_for_match.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/fos/hsh_to_fos_for_with_schemeUri_in_hash.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/fos/hsh_to_fos_match.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/fos/hsh_to_fos_no_match.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/fos/name_to_fos_for_match.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/fos/name_to_fos_match.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/fos/name_to_fos_no_match.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/from_schema_org/with_id.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_date/publication_date.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_date_from_date_parts/date.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_date_from_date_parts/year-month.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_date_from_date_parts/year.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_date_from_parts/date.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_date_from_parts/year-month.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_date_from_parts/year.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_date_parts/date.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_date_parts/year-month.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_date_parts/year.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_datetime_from_time/future.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_datetime_from_time/invalid.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_datetime_from_time/nil.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_datetime_from_time/past.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_datetime_from_time/present.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_json_feed/all_posts.yml +0 -3602
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_json_feed/behind_the_science.yml +0 -1176
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_json_feed/citation_style_language.yml +0 -360
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_json_feed/citation_style_language_blog.yml +0 -360
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_json_feed/front-matter_blog.yml +0 -1034
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_json_feed/upstream.yml +0 -2438
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_json_feed/upstream_blog.yml +0 -2438
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_json_feed_item/by_uuid.yml +0 -136
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_link/license.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_link/url.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_series_information/only_title.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_series_information/title_and_pages.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_series_information/title_volume_and_pages.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/get_series_information/title_volume_issue_and_pages.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/github/github_as_cff_url.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/github/github_as_codemeta_url.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/github/github_from_url.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/github/github_from_url_cff_file.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/github/github_from_url_file.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/json_feed_unregistered_url/all_posts.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/normalize_cc_url/not_found.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/normalize_cc_url/with_trailing_slash.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/normalize_cc_url/with_trailing_slash_and_to_https.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/normalize_id/doi.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/normalize_id/doi_as_url.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/normalize_id/filename.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/normalize_id/ftp.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/normalize_id/invalid_url.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/normalize_id/sandbox_via_options.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/normalize_id/sandbox_via_url.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/normalize_id/string.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/normalize_id/url.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/normalize_id/url_with_utf-8.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/normalize_issn/from_array.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/normalize_issn/from_empty_array.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/normalize_issn/from_hash.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/normalize_issn/from_string.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/normalize_url/uri.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/normalize_url/with_trailing_slash.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/normalize_url/with_trailing_slash_and_to_https.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/parse_attributes/array.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/parse_attributes/array_of_strings.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/parse_attributes/first.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/parse_attributes/hash.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/parse_attributes/hash_with_array_value.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/parse_attributes/nil.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/parse_attributes/string.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/random_doi/decode_anothe_doi.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/random_doi/decode_another_doi.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/random_doi/decode_doi.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/random_doi/encode_doi.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/random_id/decode_another_id.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/random_id/decode_id.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/random_id/encode_id.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/sanitize/onlies_keep_specific_tags.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/sanitize/removes_a_tags.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/spdx/hsh_to_spdx_id.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/spdx/hsh_to_spdx_not_found.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/spdx/hsh_to_spdx_url.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/spdx/name_to_spdx_exists.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/spdx/name_to_spdx_id.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/to_schema_org/with_id.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/to_schema_org_identifiers/with_identifiers.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/validate_orcid/validate_orcid.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/validate_orcid/validate_orcid_https.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/validate_orcid/validate_orcid_id.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/validate_orcid/validate_orcid_sandbox.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/validate_orcid/validate_orcid_sandbox_https.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/validate_orcid/validate_orcid_with_spaces.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/validate_orcid/validate_orcid_wrong_id.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/validate_orcid/validate_orcid_www.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/validate_orcid_scheme/validate_orcid_scheme.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/validate_orcid_scheme/validate_orcid_scheme_https.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/validate_orcid_scheme/validate_orcid_scheme_trailing_slash.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/validate_orcid_scheme/validate_orcid_scheme_www.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/validate_url/DOI.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/validate_url/ISSN.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/validate_url/URL.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/validate_url/string.yml +0 -221
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_rdf_xml/BlogPosting.yml +0 -81
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_rdf_xml/BlogPosting_schema_org.yml +0 -530
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_rdf_xml/journal_article.yml +0 -247
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_rdf_xml/maremma.yml +0 -86
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_rdf_xml/with_pages.yml +0 -228
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_schema_org/Schema_org_JSON_IsSupplementTo.yml +0 -153
- data/spec/fixtures/vcr_cassettes/Commonmeta_Metadata/write_metadata_as_turtle/BlogPosting_schema_org.yml +0 -530
|
@@ -23,13 +23,13 @@ http_interactions:
|
|
|
23
23
|
Cache-Control:
|
|
24
24
|
- public, max-age=0, must-revalidate
|
|
25
25
|
Content-Length:
|
|
26
|
-
- '
|
|
26
|
+
- '49530'
|
|
27
27
|
Content-Type:
|
|
28
28
|
- application/json; charset=utf-8
|
|
29
29
|
Date:
|
|
30
|
-
- Sun, 18 Jun 2023
|
|
30
|
+
- Sun, 18 Jun 2023 15:24:19 GMT
|
|
31
31
|
Etag:
|
|
32
|
-
- '"
|
|
32
|
+
- '"xv42bhvvc21253"'
|
|
33
33
|
Server:
|
|
34
34
|
- Vercel
|
|
35
35
|
Strict-Transport-Security:
|
|
@@ -39,7 +39,7 @@ http_interactions:
|
|
|
39
39
|
X-Vercel-Cache:
|
|
40
40
|
- MISS
|
|
41
41
|
X-Vercel-Id:
|
|
42
|
-
- fra1::iad1::
|
|
42
|
+
- fra1::iad1::6gh26-1687101859043-0548b5ea306e
|
|
43
43
|
Connection:
|
|
44
44
|
- close
|
|
45
45
|
body:
|
|
@@ -50,420 +50,7 @@ http_interactions:
|
|
|
50
50
|
feed</a>.<br>ISSN 2051-8188. Written content on this site is licensed under
|
|
51
51
|
a <a href=\"https://creativecommons.org/licenses/by/4.0/\">Creative Commons
|
|
52
52
|
Attribution 4.0 International license</a>.","language":"en","favicon":null,"feed_url":"https://iphylo.blogspot.com/feeds/posts/default","feed_format":"application/atom+xml","home_page_url":"https://iphylo.blogspot.com/","indexed_at":"2023-02-06","modified_at":"2023-05-31T17:26:00+00:00","license":"https://creativecommons.org/licenses/by/4.0/legalcode","generator":"Blogger
|
|
53
|
-
7.00","category":"Natural Sciences","backlog":true,"prefix":"10.59350","items":[{"id":"https://doi.org/10.59350/
|
|
54
|
-
markdown, and taxonomic trees","summary":"Returning to the subject of personal
|
|
55
|
-
knowledge graphs Kyle Scheer has an interesting repository of Markdown files
|
|
56
|
-
that describe academic disciplines at https://github.com/kyletscheer/academic-disciplines
|
|
57
|
-
(see his blog post for more background). If you add these files to Obsidian
|
|
58
|
-
you get a nice visualisation of a taxonomy of academic disciplines. The applications
|
|
59
|
-
of this to biological taxonomy seem obvious, especially as a tool like Obsidian
|
|
60
|
-
enables all sorts of interesting links to be added...","date_published":"2022-04-07T21:07:00Z","date_modified":"2022-04-07T21:15:34Z","date_indexed":"1909-06-16T09:41:45+00:00","authors":[{"url":null,"name":"Roderic
|
|
61
|
-
Page"}],"image":null,"content_html":"<p>Returning to the subject of <a href=\"https://iphylo.blogspot.com/2020/08/personal-knowledge-graphs-obsidian-roam.html\">personal
|
|
62
|
-
knowledge graphs</a> Kyle Scheer has an interesting repository of Markdown
|
|
63
|
-
files that describe academic disciplines at <a href=\"https://github.com/kyletscheer/academic-disciplines\">https://github.com/kyletscheer/academic-disciplines</a>
|
|
64
|
-
(see <a href=\"https://kyletscheer.medium.com/on-creating-a-tree-of-knowledge-f099c1028bf6\">his
|
|
65
|
-
blog post</a> for more background).</p>\n\n<p>If you add these files to <a
|
|
66
|
-
href=\"https://obsidian.md/\">Obsidian</a> you get a nice visualisation of
|
|
67
|
-
a taxonomy of academic disciplines. The applications of this to biological
|
|
68
|
-
taxonomy seem obvious, especially as a tool like Obsidian enables all sorts
|
|
69
|
-
of interesting links to be added (e.g., we could add links to the taxonomic
|
|
70
|
-
research behind each node in the taxonomic tree, the people doing that research,
|
|
71
|
-
etc. - although that would mean we''d no longer have a simple tree).</p>\n\n<p>The
|
|
72
|
-
more I look at these sort of simple Markdown-based tools the more I wonder
|
|
73
|
-
whether we could make more use of them to create simple but persistent databases.
|
|
74
|
-
Text files seem the most stable, long-lived digital format around, maybe this
|
|
75
|
-
would be a way to minimise the inevitable obsolescence of database and server
|
|
76
|
-
software. Time for some experiments I feel... can we take a taxonomic group,
|
|
77
|
-
such as mammals, and create a richly connected database purely in Markdown?</p>\n\n<div
|
|
78
|
-
class=\"separator\" style=\"clear: both; text-align: center;\"><iframe allowfullscreen=''allowfullscreen''
|
|
79
|
-
webkitallowfullscreen=''webkitallowfullscreen'' mozallowfullscreen=''mozallowfullscreen''
|
|
80
|
-
width=''400'' height=''322'' src=''https://www.blogger.com/video.g?token=AD6v5dxZtweOTJTdg6aqvICq_tKF0la1QZuDAEpwPPCVQKtG5vjB-DzuQv-ApL8JnpyZ1FffYtWo6ymizNQ''
|
|
81
|
-
class=''b-hbp-video b-uploaded'' frameborder=''0''></iframe></div>","tags":["markdown","obsidian"],"language":"en","references":[]},{"id":"https://doi.org/10.59350/m48f7-c2128","uuid":"8aea47e4-f227-45f4-b37b-0454a8a7a3ff","url":"https://iphylo.blogspot.com/2023/04/chatgpt-semantic-search-and-knowledge.html","title":"ChatGPT,
|
|
82
|
-
semantic search, and knowledge graphs","summary":"One thing about ChatGPT
|
|
83
|
-
is it has opened my eyes to some concepts I was dimly aware of but am only
|
|
84
|
-
now beginning to fully appreciate. ChatGPT enables you ask it questions, but
|
|
85
|
-
the answers depend on what ChatGPT “knows”. As several people have noted,
|
|
86
|
-
what would be even better is to be able to run ChatGPT on your own content.
|
|
87
|
-
Indeed, ChatGPT itself now supports this using plugins. Paul Graham GPT However,
|
|
88
|
-
it’s still useful to see how to add ChatGPT functionality to your own content
|
|
89
|
-
from...","date_published":"2023-04-03T15:30:00Z","date_modified":"2023-04-03T15:32:04Z","date_indexed":"1909-06-16T09:02:34+00:00","authors":[{"url":null,"name":"Roderic
|
|
90
|
-
Page"}],"image":null,"content_html":"<p>One thing about ChatGPT is it has
|
|
91
|
-
opened my eyes to some concepts I was dimly aware of but am only now beginning
|
|
92
|
-
to fully appreciate. ChatGPT enables you ask it questions, but the answers
|
|
93
|
-
depend on what ChatGPT “knows”. As several people have noted, what would be
|
|
94
|
-
even better is to be able to run ChatGPT on your own content. Indeed, ChatGPT
|
|
95
|
-
itself now supports this using <a href=\"https://openai.com/blog/chatgpt-plugins\">plugins</a>.</p>\n<h4
|
|
96
|
-
id=\"paul-graham-gpt\">Paul Graham GPT</h4>\n<p>However, it’s still useful
|
|
97
|
-
to see how to add ChatGPT functionality to your own content from scratch.
|
|
98
|
-
A nice example of this is <a href=\"https://paul-graham-gpt.vercel.app/\">Paul
|
|
99
|
-
Graham GPT</a> by <a href=\"https://twitter.com/mckaywrigley\">Mckay Wrigley</a>.
|
|
100
|
-
Mckay Wrigley took essays by Paul Graham (a well known venture capitalist)
|
|
101
|
-
and built a question and answer tool very like ChatGPT.</p>\n<iframe width=\"560\"
|
|
102
|
-
height=\"315\" src=\"https://www.youtube.com/embed/ii1jcLg-eIQ\" title=\"YouTube
|
|
103
|
-
video player\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write;
|
|
104
|
-
encrypted-media; gyroscope; picture-in-picture; web-share\" allowfullscreen></iframe>\n<p>Because
|
|
105
|
-
you can send a block of text to ChatGPT (as part of the prompt) you can get
|
|
106
|
-
ChatGPT to summarise or transform that information, or answer questions based
|
|
107
|
-
on that information. But there is a limit to how much information you can
|
|
108
|
-
pack into a prompt. You can’t put all of Paul Graham’s essays into a prompt
|
|
109
|
-
for example. So a solution is to do some preprocessing. For example, given
|
|
110
|
-
a question such as “How do I start a startup?” we could first find the essays
|
|
111
|
-
that are most relevant to this question, then use them to create a prompt
|
|
112
|
-
for ChatGPT. A quick and dirty way to do this is simply do a text search over
|
|
113
|
-
the essays and take the top hits. But we aren’t searching for words, we are
|
|
114
|
-
searching for answers to a question. The essay with the best answer might
|
|
115
|
-
not include the phrase “How do I start a startup?”.</p>\n<h4 id=\"semantic-search\">Semantic
|
|
116
|
-
search</h4>\n<p>Enter <a href=\"https://en.wikipedia.org/wiki/Semantic_search\">Semantic
|
|
117
|
-
search</a>. The key concept behind semantic search is that we are looking
|
|
118
|
-
for documents with similar meaning, not just similarity of text. One approach
|
|
119
|
-
to this is to represent documents by “embeddings”, that is, a vector of numbers
|
|
120
|
-
that encapsulate features of the document. Documents with similar vectors
|
|
121
|
-
are potentially related. In semantic search we take the query (e.g., “How
|
|
122
|
-
do I start a startup?”), compute its embedding, then search among the documents
|
|
123
|
-
for those with similar embeddings.</p>\n<p>To create Paul Graham GPT Mckay
|
|
124
|
-
Wrigley did the following. First he sent each essay to the OpenAI API underlying
|
|
125
|
-
ChatGPT, and in return he got the embedding for that essay (a vector of 1536
|
|
126
|
-
numbers). Each embedding was stored in a database (Mckay uses Postgres with
|
|
127
|
-
<a href=\"https://github.com/pgvector/pgvector\">pgvector</a>). When a user
|
|
128
|
-
enters a query such as “How do I start a startup?” that query is also sent
|
|
129
|
-
to the OpenAI API to retrieve its embedding vector. Then we query the database
|
|
130
|
-
of embeddings for Paul Graham’s essays and take the top five hits. These hits
|
|
131
|
-
are, one hopes, the most likely to contain relevant answers. The original
|
|
132
|
-
question and the most similar essays are then bundled up and sent to ChatGPT
|
|
133
|
-
which then synthesises an answer. See his <a href=\"https://github.com/mckaywrigley/paul-graham-gpt\">GitHub
|
|
134
|
-
repo</a> for more details. Note that we are still using ChatGPT, but on a
|
|
135
|
-
set of documents it doesn’t already have.</p>\n<h4 id=\"knowledge-graphs\">Knowledge
|
|
136
|
-
graphs</h4>\n<p>I’m a fan of knowledge graphs, but they are not terribly easy
|
|
137
|
-
to use. For example, I built a knowledge graph of Australian animals <a href=\"https://ozymandias-demo.herokuapp.com\">Ozymandias</a>
|
|
138
|
-
that contains a wealth of information on taxa, publications, and people, wrapped
|
|
139
|
-
up in a web site. If you want to learn more you need to figure out how to
|
|
140
|
-
write queries in SPARQL, which is not fun. Maybe we could use ChatGPT to write
|
|
141
|
-
the SPARQL queries for us, but it would be much more fun to be simply ask
|
|
142
|
-
natural language queries (e.g., “who are the experts on Australian ants?”).
|
|
143
|
-
I made some naïve notes on these ideas <a href=\"https://iphylo.blogspot.com/2015/09/possible-project-natural-language.html\">Possible
|
|
144
|
-
project: natural language queries, or answering “how many species are there?”</a>
|
|
145
|
-
and <a href=\"https://iphylo.blogspot.com/2019/05/ozymandias-meets-wikipedia-with-notes.html\">Ozymandias
|
|
146
|
-
meets Wikipedia, with notes on natural language generation</a>.</p>\n<p>Of
|
|
147
|
-
course, this is a well known problem. Tools such as <a href=\"http://rdf2vec.org\">RDF2vec</a>
|
|
148
|
-
can take RDF from a knowledge graph and create embeddings which could in tern
|
|
149
|
-
be used to support semantic search. But it seems to me that we could simply
|
|
150
|
-
this process a bit by making use of ChatGPT.</p>\n<p>Firstly we would generate
|
|
151
|
-
natural language statements from the knowledge graph (e.g., “species x belongs
|
|
152
|
-
to genus y and was described in z”, “this paper on ants was authored by x”,
|
|
153
|
-
etc.) that cover the basic questions we expect people to ask. We then get
|
|
154
|
-
embeddings for these (e.g., using OpenAI). We then have an interface where
|
|
155
|
-
people can ask a question (“is species x a valid species?”, “who has published
|
|
156
|
-
on ants”, etc.), we get the embedding for that question, retrieve natural
|
|
157
|
-
language statements that the closest in embedding “space”, package everything
|
|
158
|
-
up and ask ChatGPT to summarise the answer.</p>\n<p>The trick, of course,
|
|
159
|
-
is to figure out how t generate natural language statements from the knowledge
|
|
160
|
-
graph (which amounts to deciding what paths to traverse in the knowledge graph,
|
|
161
|
-
and how to write those paths is something approximating English). We also
|
|
162
|
-
want to know something about the sorts of questions people are likely to ask
|
|
163
|
-
so that we have a reasonable chance of having the answers (for example, are
|
|
164
|
-
people going to ask about individual species, or questions about summary statistics
|
|
165
|
-
such as numbers of species in a genus, etc.).</p>\n<p>What makes this attractive
|
|
166
|
-
is that it seems a straightforward way to go from a largely academic exercise
|
|
167
|
-
(build a knowledge graph) to something potentially useful (a question and
|
|
168
|
-
answer machine). Imagine if something like the defunct BBC wildlife site (see
|
|
169
|
-
<a href=\"https://iphylo.blogspot.com/2017/12/blue-planet-ii-bbc-and-semantic-web.html\">Blue
|
|
170
|
-
Planet II, the BBC, and the Semantic Web: a tale of lessons forgotten and
|
|
171
|
-
opportunities lost</a>) revived <a href=\"https://aspiring-look.glitch.me\">here</a>
|
|
172
|
-
had a question and answer interface where we could ask questions rather than
|
|
173
|
-
passively browse.</p>\n<h4 id=\"summary\">Summary</h4>\n<p>I have so much
|
|
174
|
-
more to learn, and need to think about ways to incorporate semantic search
|
|
175
|
-
and ChatGPT-like tools into knowledge graphs.</p>\n<blockquote>\n<p>Written
|
|
176
|
-
with <a href=\"https://stackedit.io/\">StackEdit</a>.</p>\n</blockquote>","tags":[],"language":"en","references":[]},{"id":"https://doi.org/10.59350/zc4qc-77616","uuid":"30c78d9d-2e50-49db-9f4f-b3baa060387b","url":"https://iphylo.blogspot.com/2022/09/does-anyone-cite-taxonomic-treatments.html","title":"Does
|
|
177
|
-
anyone cite taxonomic treatments?","summary":"Taxonomic treatments have come
|
|
178
|
-
up in various discussions I''m involved in, and I''m curious as to whether
|
|
179
|
-
they are actually being used, in particular, whether they are actually being
|
|
180
|
-
cited. Consider the following quote: The taxa are described in taxonomic treatments,
|
|
181
|
-
well defined sections of scientific publications (Catapano 2019). They include
|
|
182
|
-
a nomenclatural section and one or more sections including descriptions, material
|
|
183
|
-
citations referring to studied specimens, or notes ecology and...","date_published":"2022-09-01T16:49:00Z","date_modified":"2022-09-01T16:49:51Z","date_indexed":"1909-06-16T09:31:50+00:00","authors":[{"url":null,"name":"Roderic
|
|
184
|
-
Page"}],"image":null,"content_html":"<div class=\"separator\" style=\"clear:
|
|
185
|
-
both;\"><a href=\"https://zenodo.org/record/5731100/thumb100\" style=\"display:
|
|
186
|
-
block; padding: 1em 0; text-align: center; clear: right; float: right;\"><img
|
|
187
|
-
alt=\"\" border=\"0\" height=\"128\" data-original-height=\"106\" data-original-width=\"100\"
|
|
188
|
-
src=\"https://zenodo.org/record/5731100/thumb250\"/></a></div>\nTaxonomic
|
|
189
|
-
treatments have come up in various discussions I''m involved in, and I''m
|
|
190
|
-
curious as to whether they are actually being used, in particular, whether
|
|
191
|
-
they are actually being cited. Consider the following quote:\n\n<blockquote>\nThe
|
|
192
|
-
taxa are described in taxonomic treatments, well defined sections of scientific
|
|
193
|
-
publications (Catapano 2019). They include a nomenclatural section and one
|
|
194
|
-
or more sections including descriptions, material citations referring to studied
|
|
195
|
-
specimens, or notes ecology and behavior. In case the treatment does not describe
|
|
196
|
-
a new discovered taxon, previous treatments are cited in the form of treatment
|
|
197
|
-
citations. This citation can refer to a previous treatment and add additional
|
|
198
|
-
data, or it can be a statement synonymizing the taxon with another taxon.
|
|
199
|
-
This allows building a citation network, and ultimately is a constituent part
|
|
200
|
-
of the catalogue of life. - Taxonomic Treatments as Open FAIR Digital Objects
|
|
201
|
-
<a href=\"https://doi.org/10.3897/rio.8.e93709\">https://doi.org/10.3897/rio.8.e93709</a>\n</blockquote>\n\n<p>\n
|
|
202
|
-
\"Traditional\" academic citation is from article to article. For example,
|
|
203
|
-
consider these two papers:\n\n<blockquote>\nLi Y, Li S, Lin Y (2021) Taxonomic
|
|
204
|
-
study on fourteen symphytognathid species from Asia (Araneae, Symphytognathidae).
|
|
205
|
-
ZooKeys 1072: 1-47. https://doi.org/10.3897/zookeys.1072.67935\n</blockquote>\n\n<blockquote>\nMiller
|
|
206
|
-
J, Griswold C, Yin C (2009) The symphytognathoid spiders of the Gaoligongshan,
|
|
207
|
-
Yunnan, China (Araneae: Araneoidea): Systematics and diversity of micro-orbweavers.
|
|
208
|
-
ZooKeys 11: 9-195. https://doi.org/10.3897/zookeys.11.160\n</blockquote>\n</p>\n\n<p>Li
|
|
209
|
-
et al. 2021 cites Miller et al. 2009 (although Pensoft seems to have broken
|
|
210
|
-
the citation such that it does appear correctly either on their web page or
|
|
211
|
-
in CrossRef).</p>\n\n<p>So, we have this link: [article]10.3897/zookeys.1072.67935
|
|
212
|
-
--cites--> [article]10.3897/zookeys.11.160. One article cites another.</p>\n\n<p>In
|
|
213
|
-
their 2021 paper Li et al. discuss <i>Patu jidanweishi</i> Miller, Griswold
|
|
214
|
-
& Yin, 2009:\n\n<div class=\"separator\" style=\"clear: both;\"><a href=\"https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgPavMHXqQNX1ls_zXo9kIcMLHPxc7ZpV9NCSof5wLumrg3ovPoi6nYKzZsINuqFtYoEvW1QrerePD-MEf2DJaYUXlT37d81x3L6ILls7u229rg0_Nc0uUmgW-ICzr6MI_QCZfgQbYGTxuofu-fuPVoygbCnm3vQVYOhLDLtp1EtQ9jRZHDvw/s1040/Screenshot%202022-09-01%20at%2017.12.27.png\"
|
|
215
|
-
style=\"display: block; padding: 1em 0; text-align: center; \"><img alt=\"\"
|
|
216
|
-
border=\"0\" width=\"400\" data-original-height=\"314\" data-original-width=\"1040\"
|
|
217
|
-
src=\"https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgPavMHXqQNX1ls_zXo9kIcMLHPxc7ZpV9NCSof5wLumrg3ovPoi6nYKzZsINuqFtYoEvW1QrerePD-MEf2DJaYUXlT37d81x3L6ILls7u229rg0_Nc0uUmgW-ICzr6MI_QCZfgQbYGTxuofu-fuPVoygbCnm3vQVYOhLDLtp1EtQ9jRZHDvw/s400/Screenshot%202022-09-01%20at%2017.12.27.png\"/></a></div>\n\n<p>There
|
|
218
|
-
is a treatment for the original description of <i>Patu jidanweishi</i> at
|
|
219
|
-
<a href=\"https://doi.org/10.5281/zenodo.3792232\">https://doi.org/10.5281/zenodo.3792232</a>,
|
|
220
|
-
which was created by Plazi with a time stamp \"2020-05-06T04:59:53.278684+00:00\".
|
|
221
|
-
The original publication date was 2009, the treatments are being added retrospectively.</p>\n\n<p>In
|
|
222
|
-
an ideal world my expectation would be that Li et al. 2021 would have cited
|
|
223
|
-
the treatment, instead of just providing the text string \"Patu jidanweishi
|
|
224
|
-
Miller, Griswold & Yin, 2009: 64, figs 65A–E, 66A, B, 67A–D, 68A–F, 69A–F,
|
|
225
|
-
70A–F and 71A–F (♂♀).\" Isn''t the expectation under the treatment model that
|
|
226
|
-
we would have seen this relationship:</p>\n\n<p>[article]10.3897/zookeys.1072.67935
|
|
227
|
-
--cites--> [treatment]https://doi.org/10.5281/zenodo.3792232</p>\n\n<p>Furthermore,
|
|
228
|
-
if it is the case that \"[i]n case the treatment does not describe a new discovered
|
|
229
|
-
taxon, previous treatments are cited in the form of treatment citations\"
|
|
230
|
-
then we should also see a citation between treatments, in other words Li et
|
|
231
|
-
al.''s 2021 treatment of <i>Patu jidanweishi</i> (which doesn''t seem to have
|
|
232
|
-
a DOI but is available on Plazi'' web site as <a href=\"https://tb.plazi.org/GgServer/html/1CD9FEC313A35240938EC58ABB858E74\">https://tb.plazi.org/GgServer/html/1CD9FEC313A35240938EC58ABB858E74</a>)
|
|
233
|
-
should also cite the original treatment? It doesn''t - but it does cite the
|
|
234
|
-
Miller et al. paper.</p>\n\n<p>So in this example we don''t see articles citing
|
|
235
|
-
treatments, nor do we see treatments citing treatments. Playing Devil''s advocate,
|
|
236
|
-
why then do we have treatments? Does''t the lack of citations suggest that
|
|
237
|
-
- despite some taxonomists saying this is the unit that matters - they actually
|
|
238
|
-
don''t. If we pay attention to what people do rather than what they say they
|
|
239
|
-
do, they cite articles.</p>\n\n<p>Now, there are all sorts of reasons why
|
|
240
|
-
we don''t see [article] -> [treatment] citations, or [treatment] -> [treatment]
|
|
241
|
-
citations. Treatments are being added after the fact by Plazi, not by the
|
|
242
|
-
authors of the original work. And in many cases the treatments that could
|
|
243
|
-
be cited haven''t appeared until after that potentially citing work was published.
|
|
244
|
-
In the example above the Miller et al. paper dates from 2009, but the treatment
|
|
245
|
-
extracted only went online in 2020. And while there is a long standing culture
|
|
246
|
-
of citing publications (ideally using DOIs) there isn''t an equivalent culture
|
|
247
|
-
of citing treatments (beyond the simple text strings).</p>\n\n<p>Obviously
|
|
248
|
-
this is but one example. I''d need to do some exploration of the citation
|
|
249
|
-
graph to get a better sense of citations patterns, perhaps using <a href=\"https://www.crossref.org/documentation/event-data/\">CrossRef''s
|
|
250
|
-
event data</a>. But my sense is that taxonomists don''t cite treatments.</p>\n\n<p>I''m
|
|
251
|
-
guessing Plazi would respond by saying treatments are cited, for example (indirectly)
|
|
252
|
-
in GBIF downloads. This is true, although arguably people aren''t citing the
|
|
253
|
-
treatment, they''re citing specimen data in those treatments, and that specimen
|
|
254
|
-
data could be extracted at the level of articles rather than treatments. In
|
|
255
|
-
other words, it''s not the treatments themselves that people are citing.</p>\n\n<p>To
|
|
256
|
-
be clear, I think there is value in being able to identify those \"well defined
|
|
257
|
-
sections\" of a publication that deal with a given taxon (i.e., treatments),
|
|
258
|
-
but it''s not clear to me that these are actually the citable units people
|
|
259
|
-
might hope them to be. Likewise, journals such as <i>ZooKeys</i> have DOIs
|
|
260
|
-
for individual figures. Does anyone actually cite those?</p>","tags":[],"language":"en","references":[]},{"id":"https://doi.org/10.59350/pmhat-5ky65","uuid":"5891c709-d139-440f-bacb-06244424587a","url":"https://iphylo.blogspot.com/2021/10/problems-with-plazi-parsing-how.html","title":"Problems
|
|
261
|
-
with Plazi parsing: how reliable are automated methods for extracting specimens
|
|
262
|
-
from the literature?","summary":"The Plazi project has become one of the major
|
|
263
|
-
contributors to GBIF with some 36,000 datasets yielding some 500,000 occurrences
|
|
264
|
-
(see Plazi''s GBIF page for details). These occurrences are extracted from
|
|
265
|
-
taxonomic publication using automated methods. New data is published almost
|
|
266
|
-
daily (see latest treatments). The map below shows the geographic distribution
|
|
267
|
-
of material citations provided to GBIF by Plazi, which gives you a sense of
|
|
268
|
-
the size of the dataset. By any metric Plazi represents a...","date_published":"2021-10-25T11:10:00Z","date_modified":"2021-10-28T16:08:18Z","date_indexed":"1970-01-01T00:00:00+00:00","authors":[{"url":null,"name":"Roderic
|
|
269
|
-
Page"}],"image":null,"content_html":"<div class=\"separator\" style=\"clear:
|
|
270
|
-
both;\"><a href=\"https://1.bp.blogspot.com/-oiqSkA53FI4/YXaRAUpRgFI/AAAAAAAAgzY/PbPEOh_VlhIE0aaqqhVQPLAOE5pRULwJACLcBGAsYHQ/s240/Rf7UoXTw_400x400.jpg\"
|
|
271
|
-
style=\"display: block; padding: 1em 0; text-align: center; clear: right;
|
|
272
|
-
float: right;\"><img alt=\"\" border=\"0\" width=\"128\" data-original-height=\"240\"
|
|
273
|
-
data-original-width=\"240\" src=\"https://1.bp.blogspot.com/-oiqSkA53FI4/YXaRAUpRgFI/AAAAAAAAgzY/PbPEOh_VlhIE0aaqqhVQPLAOE5pRULwJACLcBGAsYHQ/s200/Rf7UoXTw_400x400.jpg\"/></a></div><p>The
|
|
274
|
-
<a href=\"http://plazi.org\">Plazi</a> project has become one of the major
|
|
275
|
-
contributors to GBIF with some 36,000 datasets yielding some 500,000 occurrences
|
|
276
|
-
(see <a href=\"https://www.gbif.org/publisher/7ce8aef0-9e92-11dc-8738-b8a03c50a862\">Plazi''s
|
|
277
|
-
GBIF page</a> for details). These occurrences are extracted from taxonomic
|
|
278
|
-
publication using automated methods. New data is published almost daily (see
|
|
279
|
-
<a href=\"https://tb.plazi.org/GgServer/static/newToday.html\">latest treatments</a>).
|
|
280
|
-
The map below shows the geographic distribution of material citations provided
|
|
281
|
-
to GBIF by Plazi, which gives you a sense of the size of the dataset.</p>\n\n<div
|
|
282
|
-
class=\"separator\" style=\"clear: both;\"><a href=\"https://1.bp.blogspot.com/-DCJ4HR8eej8/YXaRQnz22bI/AAAAAAAAgz4/AgRcree6jVgjtQL2ch7IXgtb_Xtx7fkngCPcBGAYYCw/s1030/Screenshot%2B2021-10-24%2Bat%2B20.43.23.png\"
|
|
283
|
-
style=\"display: block; padding: 1em 0; text-align: center; \"><img alt=\"\"
|
|
284
|
-
border=\"0\" width=\"400\" data-original-height=\"514\" data-original-width=\"1030\"
|
|
285
|
-
src=\"https://1.bp.blogspot.com/-DCJ4HR8eej8/YXaRQnz22bI/AAAAAAAAgz4/AgRcree6jVgjtQL2ch7IXgtb_Xtx7fkngCPcBGAYYCw/s400/Screenshot%2B2021-10-24%2Bat%2B20.43.23.png\"/></a></div>\n\n<p>By
|
|
286
|
-
any metric Plazi represents a considerable achievement. But often when I browse
|
|
287
|
-
individual records on Plazi I find records that seem clearly incorrect. Text
|
|
288
|
-
mining the literature is a challenging problem, but at the moment Plazi seems
|
|
289
|
-
something of a \"black box\". PDFs go in, the content is mined, and data comes
|
|
290
|
-
up to be displayed on the Plazi web site and uploaded to GBIF. Nowhere does
|
|
291
|
-
there seem to be an evaluation of how accurate this text mining actually is.
|
|
292
|
-
Anecdotally it seems to work well in some cases, but in others it produces
|
|
293
|
-
what can only be described as bogus records.</p>\n\n<h2>Finding errors</h2>\n\n<p>A
|
|
294
|
-
treatment in Plazi is a block of text (and sometimes illustrations) that refers
|
|
295
|
-
to a single taxon. Often that text will include a description of the taxon,
|
|
296
|
-
and list one or more specimens that have been examined. These lists of specimens
|
|
297
|
-
(\"material citations\") are one of the key bits of information that Plaza
|
|
298
|
-
extracts from a treatment as these citations get fed into GBIF as occurrences.</p>\n\n<p>To
|
|
299
|
-
help explore treatments I''ve constructed a simple web site that takes the
|
|
300
|
-
Plazi identifier for a treatment and displays that treatment with the material
|
|
301
|
-
citations highlighted. For example, for the Plazi treatment <a href=\"https://tb.plazi.org/GgServer/html/03B5A943FFBB6F02FE27EC94FABEEAE7\">03B5A943FFBB6F02FE27EC94FABEEAE7</a>
|
|
302
|
-
you can view the marked up version at <a href=\"https://plazi-tester.herokuapp.com/?uri=622F7788-F0A4-449D-814A-5B49CD20B228\">https://plazi-tester.herokuapp.com/?uri=622F7788-F0A4-449D-814A-5B49CD20B228</a>.
|
|
303
|
-
Below is an example of a material citation with its component parts tagged:</p>\n\n<div
|
|
304
|
-
class=\"separator\" style=\"clear: both;\"><a href=\"https://1.bp.blogspot.com/-NIGuo9BggdA/YXaRQQrv0QI/AAAAAAAAgz4/SZDcA1jZSN47JMRTWDwSMRpHUShrCeOdgCPcBGAYYCw/s693/Screenshot%2B2021-10-24%2Bat%2B20.59.56.png\"
|
|
305
|
-
style=\"display: block; padding: 1em 0; text-align: center; \"><img alt=\"\"
|
|
306
|
-
border=\"0\" width=\"400\" data-original-height=\"94\" data-original-width=\"693\"
|
|
307
|
-
src=\"https://1.bp.blogspot.com/-NIGuo9BggdA/YXaRQQrv0QI/AAAAAAAAgz4/SZDcA1jZSN47JMRTWDwSMRpHUShrCeOdgCPcBGAYYCw/s400/Screenshot%2B2021-10-24%2Bat%2B20.59.56.png\"/></a></div>\n\n<p>This
|
|
308
|
-
is an example where Plazi has successfully parsed the specimen. But I keep
|
|
309
|
-
coming across cases where specimens have not been parsed correctly, resulting
|
|
310
|
-
in issues such as single specimens being split into multiple records (e.g., <a
|
|
311
|
-
href=\"https://plazi-tester.herokuapp.com/?uri=5244B05EFFC8E20F7BC32056C178F496\">https://plazi-tester.herokuapp.com/?uri=5244B05EFFC8E20F7BC32056C178F496</a>),
|
|
312
|
-
geographical coordinates being misinterpreted (e.g., <a href=\"https://plazi-tester.herokuapp.com/?uri=0D228E6AFFC2FFEFFF4DE8118C4EE6B9\">https://plazi-tester.herokuapp.com/?uri=0D228E6AFFC2FFEFFF4DE8118C4EE6B9</a>),
|
|
313
|
-
or collector''s initials being confused with codes for natural history collections
|
|
314
|
-
(e.g., <a href=\"https://plazi-tester.herokuapp.com/?uri=252C87918B362C05FF20F8C5BFCB3D4E\">https://plazi-tester.herokuapp.com/?uri=252C87918B362C05FF20F8C5BFCB3D4E</a>).</p>\n\n<p>Parsing
|
|
315
|
-
specimens is a hard problem so it''s not unexpected to find errors. But they
|
|
316
|
-
do seem common enough to be easily found, which raises the question of just
|
|
317
|
-
what percentage of these material citations are correct? How much of the
|
|
318
|
-
data Plazi feeds to GBIF is correct? How would we know?</p>\n\n<h2>Systemic
|
|
319
|
-
problems</h2>\n\n<p>Some of the errors I''ve found concern the interpretation
|
|
320
|
-
of the parsed data. For example, it is striking that despite including marine
|
|
321
|
-
taxa <b>no</b> Plazi record has a value for depth below sea level (see <a
|
|
322
|
-
href=\"https://www.gbif.org/occurrence/map?depth=0,9999&publishing_org=7ce8aef0-9e92-11dc-8738-b8a03c50a862&advanced=1\">GBIF
|
|
323
|
-
search on depth range 0-9999 for Plazi</a>). But <a href=\"https://www.gbif.org/occurrence/map?elevation=0,9999&publishing_org=7ce8aef0-9e92-11dc-8738-b8a03c50a862&advanced=1\">many
|
|
324
|
-
records do have an elevation</a>, including records from marine environments.
|
|
325
|
-
Any record that has a depth value is interpreted by Plazi as being elevation,
|
|
326
|
-
so we have aerial crustacea and fish.</p>\n\n<h3>Map of Plazi records with
|
|
327
|
-
depth 0-9999m</h3>\n<div class=\"separator\" style=\"clear: both;\"><a href=\"https://1.bp.blogspot.com/-GD4pPtPCxVc/YXaRQ9bdn1I/AAAAAAAAgz8/A9YsypSvHfwWKAjDxSdeFVUkou88LGItACPcBGAYYCw/s673/Screenshot%2B2021-10-25%2Bat%2B12.03.51.png\"
|
|
328
|
-
style=\"display: block; padding: 1em 0; text-align: center; \"><img alt=\"\"
|
|
329
|
-
border=\"0\" width=\"400\" data-original-height=\"258\" data-original-width=\"673\"
|
|
330
|
-
src=\"https://1.bp.blogspot.com/-GD4pPtPCxVc/YXaRQ9bdn1I/AAAAAAAAgz8/A9YsypSvHfwWKAjDxSdeFVUkou88LGItACPcBGAYYCw/s320/Screenshot%2B2021-10-25%2Bat%2B12.03.51.png\"/></a></div>\n\n<h3>Map
|
|
331
|
-
of Plazi records with elevation 0-9999m </h3>\n<div class=\"separator\" style=\"clear:
|
|
332
|
-
both;\"><a href=\"https://1.bp.blogspot.com/-BReSHtXTOkA/YXaRRKFW7dI/AAAAAAAAg0A/-FcBkMwyswIp0siWGVX3MNMANs7UkZFtwCPcBGAYYCw/s675/Screenshot%2B2021-10-25%2Bat%2B12.04.06.png\"
|
|
333
|
-
style=\"display: block; padding: 1em 0; text-align: center; \"><img alt=\"\"
|
|
334
|
-
border=\"0\" width=\"400\" data-original-height=\"256\" data-original-width=\"675\"
|
|
335
|
-
src=\"https://1.bp.blogspot.com/-BReSHtXTOkA/YXaRRKFW7dI/AAAAAAAAg0A/-FcBkMwyswIp0siWGVX3MNMANs7UkZFtwCPcBGAYYCw/s320/Screenshot%2B2021-10-25%2Bat%2B12.04.06.png\"/></a></div>\n\n<p>Anecdotally
|
|
336
|
-
I''ve also noticed that Plazi seems to do well on zoological data, especially
|
|
337
|
-
journals like <i>Zootaxa</i>, but it often struggles with botanical specimens.
|
|
338
|
-
Botanists tend to cite specimens rather differently to zoologists (botanists
|
|
339
|
-
emphasise collector numbers rather than specimen codes). Hence data quality
|
|
340
|
-
in Plazi is likely to taxonomic biased.</p>\n\n<p>Plazi is <a href=\"https://github.com/plazi/community/issues\">using
|
|
341
|
-
GitHub to track issues with treatments</a> so feedback on erroneous records
|
|
342
|
-
is possible, but this seems inadequate to the task. There are tens of thousands
|
|
343
|
-
of data sets, with more being released daily, and hundreds of thousands of
|
|
344
|
-
occurrences, and relying on GitHub issues devolves the responsibility for
|
|
345
|
-
error checking onto the data users. I don''t have a measure of how many records
|
|
346
|
-
in Plazi have problems, but because I suspect it is a significant fraction
|
|
347
|
-
because for any given day''s output I can typically find errors.</p>\n\n<h2>What
|
|
348
|
-
to do?</h2>\n\n<p>Faced with a process that generates noisy data there are
|
|
349
|
-
several of things we could do:</p>\n\n<ol>\n<li>Have tools to detect and flag
|
|
350
|
-
errors made in generating the data.</li>\n<li>Have the data generator give
|
|
351
|
-
estimates the confidence of its results.</li>\n<li>Improve the data generator.</li>\n</ol>\n\n<p>I
|
|
352
|
-
think a comparison with the problem of parsing bibliographic references might
|
|
353
|
-
be instructive here. There is a long history of people developing tools to
|
|
354
|
-
parse references (<a href=\"https://iphylo.blogspot.com/2021/05/finding-citations-of-specimens.html\">I''ve
|
|
355
|
-
even had a go</a>). State-of-the art tools such as <a href=\"https://anystyle.io\">AnyStyle</a>
|
|
356
|
-
feature machine learning, and are tested against <a href=\"https://github.com/inukshuk/anystyle/blob/master/res/parser/core.xml\">human
|
|
357
|
-
curated datasets</a> of tagged bibliographic records. This means we can evaluate
|
|
358
|
-
the performance of a method (how well does it retrieve the same results as
|
|
359
|
-
human experts?) and also improve the method by expanding the corpus of training
|
|
360
|
-
data. Some of these tools can provide a measures of how confident they are
|
|
361
|
-
when classifying a string as, say, a person''s name, which means we could
|
|
362
|
-
flag potential issues for anyone wanting to use that record.</p>\n\n<p>We
|
|
363
|
-
don''t have equivalent tools for parsing specimens in the literature, and
|
|
364
|
-
hence have no easy way to quantify how good existing methods are, nor do we
|
|
365
|
-
have a public corpus of material citations that we can use as training data.
|
|
366
|
-
I <a href=\"https://iphylo.blogspot.com/2021/05/finding-citations-of-specimens.html\">blogged
|
|
367
|
-
about this</a> a few months ago and was considering using Plazi as a source
|
|
368
|
-
of marked up specimen data to use for training. However based on what I''ve
|
|
369
|
-
looked at so far Plazi''s data would need to be carefully scrutinised before
|
|
370
|
-
it could be used as training data.</p>\n\n<p>Going forward, I think it would
|
|
371
|
-
be desirable to have a set of records that can be used to benchmark specimen
|
|
372
|
-
parsers, and ideally have the parsers themselves available as web services
|
|
373
|
-
so that anyone can evaluate them. Even better would be a way to contribute
|
|
374
|
-
to the training data so that these tools improve over time.</p>\n\n<p>Plazi''s
|
|
375
|
-
data extraction tools are mostly desktop-based, that is, you need to download
|
|
376
|
-
software to use their methods. However, there are experimental web services
|
|
377
|
-
available as well. I''ve created a simple wrapper around the material citation
|
|
378
|
-
parser, you can try it at <a href=\"https://plazi-tester.herokuapp.com/parser.php\">https://plazi-tester.herokuapp.com/parser.php</a>.
|
|
379
|
-
It takes a single material citation and returns a version with elements such
|
|
380
|
-
as specimen code and collector name tagged in different colours.</p>\n\n<h2>Summary</h2>\n\n<p>Text
|
|
381
|
-
mining the taxonomic literature is clearly a gold mine of data, but at the
|
|
382
|
-
same time it is potentially fraught as we try and extract structured data
|
|
383
|
-
from semi-structured text. Plazi has demonstrated that it is possible to extract
|
|
384
|
-
a lot of data from the literature, but at the same time the quality of that
|
|
385
|
-
data seems highly variable. Even minor issues in parsing text can have big
|
|
386
|
-
implications for data quality (e.g., marine organisms apparently living above
|
|
387
|
-
sea level). Historically in biodiversity informatics we have favoured data
|
|
388
|
-
quantity over data quality. Quantity has an obvious metric, and has milestones
|
|
389
|
-
we can celebrate (e.g., <a href=\"GBIF at 1 billion - what''s next?\">one
|
|
390
|
-
billion specimens</a>). There aren''t really any equivalent metrics for data
|
|
391
|
-
quality.</p>\n\n<p>Adding new types of data can sometimes initially result
|
|
392
|
-
in a new set of quality issues (e.g., <a href=\"https://iphylo.blogspot.com/2019/12/gbif-metagenomics-and-metacrap.html\">GBIF
|
|
393
|
-
metagenomics and metacrap</a>) that take time to resolve. In the case of Plazi,
|
|
394
|
-
I think it would be worthwhile to quantify just how many records have errors,
|
|
395
|
-
and develop benchmarks that we can use to test methods for extracting specimen
|
|
396
|
-
data from text. If we don''t do this then there will remain uncertainty as
|
|
397
|
-
to how much trust we can place in data mined from the taxonomic literature.</p>\n\n<h2>Update</h2>\n\nPlazi
|
|
398
|
-
has responded, see <a href=\"http://plazi.org/posts/2021/10/liberation-first-step-toward-quality/\">Liberating
|
|
399
|
-
material citations as a first step to more better data</a>. My reading of
|
|
400
|
-
their repsonse is that it essentially just reiterates Plazi''s approach and
|
|
401
|
-
doesn''t tackle the underlying issue: their method for extracting material
|
|
402
|
-
citations is error prone, and many of those errors end up in GBIF.","tags":["data
|
|
403
|
-
quality","parsing","Plazi","specimen","text mining"],"language":"en","references":[]},{"id":"https://doi.org/10.59350/j77nc-e8x98","uuid":"c6b101f4-bfbc-4d01-921d-805c43c85757","url":"https://iphylo.blogspot.com/2022/08/linking-taxonomic-names-to-literature.html","title":"Linking
|
|
404
|
-
taxonomic names to the literature","summary":"Just some thoughts as I work
|
|
405
|
-
through some datasets linking taxonomic names to the literature. In the diagram
|
|
406
|
-
above I''ve tried to capture the different situatios I encounter. Much of
|
|
407
|
-
the work I''ve done on this has focussed on case 1 in the diagram: I want
|
|
408
|
-
to link a taxonomic name to an identifier for the work in which that name
|
|
409
|
-
was published. In practise this means linking names to DOIs. This has the
|
|
410
|
-
advantage of linking to a citable indentifier, raising questions such as whether
|
|
411
|
-
citations...","date_published":"2022-08-22T17:19:00Z","date_modified":"2022-08-22T17:19:08Z","date_indexed":"1909-06-16T08:21:41+00:00","authors":[{"url":null,"name":"Roderic
|
|
412
|
-
Page"}],"image":null,"content_html":"Just some thoughts as I work through
|
|
413
|
-
some datasets linking taxonomic names to the literature.\n\n<div class=\"separator\"
|
|
414
|
-
style=\"clear: both;\"><a href=\"https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi5ZkbnumEWKZpf0isei_hUJucFlOOK08-SKnJknD8B4qhAX36u-vMQRnZdhRJK5rPb7HNYcnqB7qE4agbeStqyzMkWHrzUj14gPkz2ohmbOVg8P_nHo0hM94PD1wH3SPJsaLAumN8vih3ch9pjH2RaVWZBLwwhGhNu0FS1m5z6j5xt2NeZ4w/s2140/linking%20to%20names144.jpg\"
|
|
415
|
-
style=\"display: block; padding: 1em 0; text-align: center; \"><img alt=\"\"
|
|
416
|
-
border=\"0\" height=\"600\" data-original-height=\"2140\" data-original-width=\"1604\"
|
|
417
|
-
src=\"https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi5ZkbnumEWKZpf0isei_hUJucFlOOK08-SKnJknD8B4qhAX36u-vMQRnZdhRJK5rPb7HNYcnqB7qE4agbeStqyzMkWHrzUj14gPkz2ohmbOVg8P_nHo0hM94PD1wH3SPJsaLAumN8vih3ch9pjH2RaVWZBLwwhGhNu0FS1m5z6j5xt2NeZ4w/s600/linking%20to%20names144.jpg\"/></a></div>\n\n<p>In
|
|
418
|
-
the diagram above I''ve tried to capture the different situatios I encounter.
|
|
419
|
-
Much of the work I''ve done on this has focussed on case 1 in the diagram:
|
|
420
|
-
I want to link a taxonomic name to an identifier for the work in which that
|
|
421
|
-
name was published. In practise this means linking names to DOIs. This has
|
|
422
|
-
the advantage of linking to a citable indentifier, raising questions such
|
|
423
|
-
as whether citations of taxonmic papers by taxonomic databases could become
|
|
424
|
-
part of a <a href=\"https://iphylo.blogspot.com/2022/08/papers-citing-data-that-cite-papers.html\">taxonomist''s
|
|
425
|
-
Google Scholar profile</a>.</p>\n\n<p>In many taxonomic databases full work-level
|
|
426
|
-
citations are not the norm, instead taxonomists cite one or more pages within
|
|
427
|
-
a work that are relevant to a taxonomic name. These \"microcitations\" (what
|
|
428
|
-
the U.S. legal profession refer to as \"point citations\" or \"pincites\", see
|
|
429
|
-
<a href=\"https://rasmussen.libanswers.com/faq/283203\">What are pincites,
|
|
430
|
-
pinpoints, or jump legal references?</a>) require some work to map to the
|
|
431
|
-
work itself (which is typically the thing that has a citatble identifier such
|
|
432
|
-
as a DOI).</p>\n\n<p>Microcitations (case 2 in the diagram above) can be quite
|
|
433
|
-
complex. Some might simply mention a single page, but others might list a
|
|
434
|
-
series of (not necessarily contiguous) pages, as well as figures, plates etc.
|
|
435
|
-
Converting these to citable identifiers can be tricky, especially as in most
|
|
436
|
-
cases we don''t have page-level identifiers. The Biodiversity Heritage Library
|
|
437
|
-
(BHL) does have URLs for each scanned page, and we have a standard for referring
|
|
438
|
-
to pages in a PDF (<code>page=<pageNum></code>, see <a href=\"https://datatracker.ietf.org/doc/html/rfc8118\">RFC
|
|
439
|
-
8118</a>). But how do we refer to a set of pages? Do we pick the first page?
|
|
440
|
-
Do we try and represent a set of pages, and if so, how?</p>\n\n<p>Another
|
|
441
|
-
issue with page-level identifiers is that not everything on a given page may
|
|
442
|
-
be relevant to the taxonomic name. In case 2 above I''ve shaded in the parts
|
|
443
|
-
of the pages and figure that refer to the taxonomic name. An example where
|
|
444
|
-
this can be problematic is the recent test case I created for BHL where a
|
|
445
|
-
page image was included for the taxonomic name <a href=\"https://www.gbif.org/species/195763322\"><i>Aphrophora
|
|
446
|
-
impressa</i></a>. The image includes the species description and a illustration,
|
|
447
|
-
as well as text that relates to other species.</p>\n\n<div class=\"separator\"
|
|
448
|
-
style=\"clear: both;\"><a href=\"https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg1SAv1XiVBPHXeMLPfdsnTh4Sj4m0AQyqjXM0faXyvbNhBXFDl7SawjFIkMo3qz4RQhDEyZXkKAh5nF4gtfHbVXA6cK3NJ46CuIWECC6HmJKtTjoZ0M3QQXvLY_X-2-RecjBqEy68M0cEr0ph3l6KY51kA9BvGt9d4id314P71PBitpWMATg/s3467/https---www.biodiversitylibrary.org-pageimage-29138463.jpeg\"
|
|
449
|
-
style=\"display: block; padding: 1em 0; text-align: center; \"><img alt=\"\"
|
|
450
|
-
border=\"0\" height=\"400\" data-original-height=\"3467\" data-original-width=\"2106\"
|
|
451
|
-
src=\"https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg1SAv1XiVBPHXeMLPfdsnTh4Sj4m0AQyqjXM0faXyvbNhBXFDl7SawjFIkMo3qz4RQhDEyZXkKAh5nF4gtfHbVXA6cK3NJ46CuIWECC6HmJKtTjoZ0M3QQXvLY_X-2-RecjBqEy68M0cEr0ph3l6KY51kA9BvGt9d4id314P71PBitpWMATg/s400/https---www.biodiversitylibrary.org-pageimage-29138463.jpeg\"/></a></div>\n\n<p>Given
|
|
452
|
-
that not everything on a page need be relevant, we could extract just the
|
|
453
|
-
relevant blocks of text and illustrations (e.g., paragraphs of text, panels
|
|
454
|
-
within a figure, etc.) and treat that set of elements as the thing to cite.
|
|
455
|
-
This is, of course, what <a href=\"http://plazi.org\">Plazi</a> are doing.
|
|
456
|
-
The set of extracted blocks is glued together as a \"treatment\", assigned
|
|
457
|
-
an identifier (often a DOI), and treated as a citable unit. It would be interesting
|
|
458
|
-
to see to what extent these treatments are actually cited, for example, do
|
|
459
|
-
subsequent revisions that cite work that include treatments cite those treatments,
|
|
460
|
-
or just the work itself? Put another way, are we creating <a href=\"https://iphylo.blogspot.com/2012/09/decoding-nature-encode-ipad-app-omg-it.html\">\"threads\"</a>
|
|
461
|
-
between taxonomic revisions?</p>\n\n<p>One reason for these notes is that
|
|
462
|
-
I''m exploring uploading taxonomic name - literature links to <a href=\"https://www.checklistbank.org\">ChecklistBank</a>
|
|
463
|
-
and case 1 above is easy, as is case 3 (if we have treatment-level identifiers).
|
|
464
|
-
But case 2 is problematic because we are linking to a set of things that may
|
|
465
|
-
not have an identifier, which means a decision has to be made about which
|
|
466
|
-
page to link to, and how to refer to that page.</p>","tags":[],"language":"en","references":[]},{"id":"https://doi.org/10.59350/ymc6x-rx659","uuid":"0807f515-f31d-4e2c-9e6f-78c3a9668b9d","url":"https://iphylo.blogspot.com/2022/09/dna-barcoding-as-intergenerational.html","title":"DNA
|
|
53
|
+
7.00","category":"Natural Sciences","backlog":true,"prefix":"10.59350","items":[{"id":"https://doi.org/10.59350/ymc6x-rx659","uuid":"0807f515-f31d-4e2c-9e6f-78c3a9668b9d","url":"https://iphylo.blogspot.com/2022/09/dna-barcoding-as-intergenerational.html","title":"DNA
|
|
467
54
|
barcoding as intergenerational transfer of taxonomic knowledge","summary":"I
|
|
468
55
|
tweeted about this but want to bookmark it for later as well. The paper “A
|
|
469
56
|
molecular-based identification resource for the arthropods of Finland” doi:10.1111/1755-0998.13510
|
|
@@ -993,5 +580,5 @@ http_interactions:
|
|
|
993
580
|
simpler overlay on top of SPARQL so that we can retrieve the data we want
|
|
994
581
|
without having to learn the intricacies of SPARQL, nor how Wikidata models
|
|
995
582
|
publications and people.</p>","tags":["GraphQL","SPARQL","WikiCite","Wikidata"],"language":"en","references":[]}]}'
|
|
996
|
-
recorded_at: Sun, 18 Jun 2023
|
|
583
|
+
recorded_at: Sun, 18 Jun 2023 15:24:19 GMT
|
|
997
584
|
recorded_with: VCR 6.1.0
|