PyPI - academic-refchecker - Versions diffs - 1.2.44__tar.gz → 1.2.46__tar.gz - Mend

academic-refchecker 1.2.44tar.gz → 1.2.46tar.gz


Files changed (57)

{academic_refchecker-1.2.44/src/academic_refchecker.egg-info → academic_refchecker-1.2.46}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: academic-refchecker
-Version: 1.2.44
+Version: 1.2.46
 Summary: A comprehensive tool for validating reference accuracy in academic papers
 Author-email: Mark Russinovich <markrussinovich@hotmail.com>
 License-Expression: MIT
@@ -78,7 +78,9 @@ A comprehensive tool for validating reference accuracy in academic papers, usefu
        Verified URL: https://www.semanticscholar.org/paper/5f4ac1ac7ca4b17d3db1b52d9aafd9e8b26c0d7
        ArXiv URL: https://arxiv.org/abs/1610.10099
        DOI URL: https://doi.org/10.48550/arxiv.1610.10099
-      ⚠️  Warning: Year mismatch: cited as 2017 but actually 2016
+      ⚠️  Warning: Year mismatch:
+               cited:  '2017'
+               actual: '2016'
 [2/45] Effective approaches to attention-based neural machine translation
        Minh-Thang Luong, Hieu Pham, Christopher D. Manning
@@ -87,7 +89,9 @@ A comprehensive tool for validating reference accuracy in academic papers, usefu
        Verified URL: https://www.semanticscholar.org/paper/93499a7c7f699b6630a86fad964536f9423bb6d0
        ArXiv URL: https://arxiv.org/abs/1508.04025
        DOI URL: https://doi.org/10.18653/v1/d15-1166
-      ❌ Error: First author mismatch: 'Minh-Thang Luong' vs 'Thang Luong'
+      ❌ Error: First author mismatch:
+               cited:  'Minh-Thang Luong'
+               actual: 'Thang Luong'
 [3/45] Deep Residual Learning for Image Recognition
        Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
@@ -98,7 +102,9 @@ A comprehensive tool for validating reference accuracy in academic papers, usefu
        Verified URL: https://www.semanticscholar.org/paper/2c03df8b48bf3fa39054345bafabfeff15bfd11d
        ArXiv URL: https://arxiv.org/abs/1512.03385
        DOI URL: https://doi.org/10.1109/CVPR.2016.90
-      ❌ Error: DOI mismatch: cited as '10.1109/CVPR.2016.91' but actually '10.1109/CVPR.2016.90'
+      ❌ Error: DOI mismatch:
+               cited:  '10.1109/CVPR.2016.91'
+               actual: '10.1109/CVPR.2016.90'
 ============================================================
 📋 SUMMARY
@@ -382,7 +388,9 @@ This enhanced URL display helps users access multiple authoritative sources for
            Verified URL: https://www.semanticscholar.org/paper/a1b2c3d4e5f6789012345678901234567890abcd
            ArXiv URL: https://arxiv.org/abs/2312.02119
            DOI URL: https://doi.org/10.48550/arxiv.2312.02119
-          ❌ Error: First author mismatch: 'T. Xie' vs 'Zhao Xu'
+          ❌ Error: First author mismatch:
+                   cited:  'T. Xie'
+                   actual: 'Zhao Xu'
     ```
   - `title`: Title discrepancies
     ```
@@ -392,7 +400,9 @@ This enhanced URL display helps users access multiple authoritative sources for
            Verified URL: https://www.semanticscholar.org/paper/df2b0e26d0599ce3e70df8a9da02e51594e0e992
            ArXiv URL: https://arxiv.org/abs/1810.04805
            DOI URL: https://doi.org/10.18653/v1/n19-1423
-          ❌ Error: Title mismatch: cited as 'BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding' but actually 'BERT: Pre-training of Deep Bidirectional Transformers for Language Comprehension'
+          ❌ Error: Title mismatch:
+                   cited:  'BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding'
+                   actual: 'BERT: Pre-training of Deep Bidirectional Transformers for Language Comprehension'
     ```
   - `arxiv_id`: Incorrect URLs or arXiv IDs
     ```
@@ -415,7 +425,9 @@ This enhanced URL display helps users access multiple authoritative sources for
            Verified URL: https://www.semanticscholar.org/paper/204e3073870fae3d05bcbc2f6a8e263d9b72e776
            ArXiv URL: https://arxiv.org/abs/1706.03762
            DOI URL: https://doi.org/10.48550/arXiv.1706.03762
-          ❌ Error: DOI mismatch: cited as '10.5555/3295222.3295349' but actually '10.48550/arXiv.1706.03762'
+          ❌ Error: DOI mismatch:
+                   cited:  '10.5555/3295222.3295349'
+                   actual: '10.48550/arXiv.1706.03762'
     ```
 - **⚠️ Warnings**: Minor issues that may need attention
@@ -428,7 +440,9 @@ This enhanced URL display helps users access multiple authoritative sources for
            Verified URL: https://www.semanticscholar.org/paper/f1a2b3c4d5e6f7890123456789012345678901ab
            ArXiv URL: https://arxiv.org/abs/2310.03684
            DOI URL: https://doi.org/10.48550/arxiv.2310.03684
-          ⚠️  Warning: Year mismatch: cited as 2024 but actually 2023
+          ⚠️  Warning: Year mismatch:
+                   cited:  '2024'
+                   actual: '2023'
     ```
   - `venue`: Venue format variations
     ```
@@ -439,7 +453,9 @@ This enhanced URL display helps users access multiple authoritative sources for
            Verified URL: https://www.semanticscholar.org/paper/c1d2e3f4a5b6c7d8e9f0123456789012345678ab
            ArXiv URL: https://arxiv.org/abs/2403.02151
            DOI URL: https://doi.org/10.48550/arxiv.2403.02151
-          ⚠️  Warning: Venue mismatch: cited as 'arXiv, 2024' but actually 'Neural Information Processing Systems'
+          ⚠️  Warning: Venue mismatch:
+                   cited:  'arXiv, 2024'
+                   actual: 'Neural Information Processing Systems'
     ```
 - **❓ Unverified**: References that couldn't be verified with any of the checker APIs

{academic_refchecker-1.2.44 → academic_refchecker-1.2.46}/README.md RENAMED

@@ -17,7 +17,9 @@ A comprehensive tool for validating reference accuracy in academic papers, usefu
        Verified URL: https://www.semanticscholar.org/paper/5f4ac1ac7ca4b17d3db1b52d9aafd9e8b26c0d7
        ArXiv URL: https://arxiv.org/abs/1610.10099
        DOI URL: https://doi.org/10.48550/arxiv.1610.10099
-      ⚠️  Warning: Year mismatch: cited as 2017 but actually 2016
+      ⚠️  Warning: Year mismatch:
+               cited:  '2017'
+               actual: '2016'
 [2/45] Effective approaches to attention-based neural machine translation
        Minh-Thang Luong, Hieu Pham, Christopher D. Manning
@@ -26,7 +28,9 @@ A comprehensive tool for validating reference accuracy in academic papers, usefu
        Verified URL: https://www.semanticscholar.org/paper/93499a7c7f699b6630a86fad964536f9423bb6d0
        ArXiv URL: https://arxiv.org/abs/1508.04025
        DOI URL: https://doi.org/10.18653/v1/d15-1166
-      ❌ Error: First author mismatch: 'Minh-Thang Luong' vs 'Thang Luong'
+      ❌ Error: First author mismatch:
+               cited:  'Minh-Thang Luong'
+               actual: 'Thang Luong'
 [3/45] Deep Residual Learning for Image Recognition
        Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
@@ -37,7 +41,9 @@ A comprehensive tool for validating reference accuracy in academic papers, usefu
        Verified URL: https://www.semanticscholar.org/paper/2c03df8b48bf3fa39054345bafabfeff15bfd11d
        ArXiv URL: https://arxiv.org/abs/1512.03385
        DOI URL: https://doi.org/10.1109/CVPR.2016.90
-      ❌ Error: DOI mismatch: cited as '10.1109/CVPR.2016.91' but actually '10.1109/CVPR.2016.90'
+      ❌ Error: DOI mismatch:
+               cited:  '10.1109/CVPR.2016.91'
+               actual: '10.1109/CVPR.2016.90'
 ============================================================
 📋 SUMMARY
@@ -321,7 +327,9 @@ This enhanced URL display helps users access multiple authoritative sources for
            Verified URL: https://www.semanticscholar.org/paper/a1b2c3d4e5f6789012345678901234567890abcd
            ArXiv URL: https://arxiv.org/abs/2312.02119
            DOI URL: https://doi.org/10.48550/arxiv.2312.02119
-          ❌ Error: First author mismatch: 'T. Xie' vs 'Zhao Xu'
+          ❌ Error: First author mismatch:
+                   cited:  'T. Xie'
+                   actual: 'Zhao Xu'
     ```
   - `title`: Title discrepancies
     ```
@@ -331,7 +339,9 @@ This enhanced URL display helps users access multiple authoritative sources for
            Verified URL: https://www.semanticscholar.org/paper/df2b0e26d0599ce3e70df8a9da02e51594e0e992
            ArXiv URL: https://arxiv.org/abs/1810.04805
            DOI URL: https://doi.org/10.18653/v1/n19-1423
-          ❌ Error: Title mismatch: cited as 'BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding' but actually 'BERT: Pre-training of Deep Bidirectional Transformers for Language Comprehension'
+          ❌ Error: Title mismatch:
+                   cited:  'BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding'
+                   actual: 'BERT: Pre-training of Deep Bidirectional Transformers for Language Comprehension'
     ```
   - `arxiv_id`: Incorrect URLs or arXiv IDs
     ```
@@ -354,7 +364,9 @@ This enhanced URL display helps users access multiple authoritative sources for
            Verified URL: https://www.semanticscholar.org/paper/204e3073870fae3d05bcbc2f6a8e263d9b72e776
            ArXiv URL: https://arxiv.org/abs/1706.03762
            DOI URL: https://doi.org/10.48550/arXiv.1706.03762
-          ❌ Error: DOI mismatch: cited as '10.5555/3295222.3295349' but actually '10.48550/arXiv.1706.03762'
+          ❌ Error: DOI mismatch:
+                   cited:  '10.5555/3295222.3295349'
+                   actual: '10.48550/arXiv.1706.03762'
     ```
 - **⚠️ Warnings**: Minor issues that may need attention
@@ -367,7 +379,9 @@ This enhanced URL display helps users access multiple authoritative sources for
            Verified URL: https://www.semanticscholar.org/paper/f1a2b3c4d5e6f7890123456789012345678901ab
            ArXiv URL: https://arxiv.org/abs/2310.03684
            DOI URL: https://doi.org/10.48550/arxiv.2310.03684
-          ⚠️  Warning: Year mismatch: cited as 2024 but actually 2023
+          ⚠️  Warning: Year mismatch:
+                   cited:  '2024'
+                   actual: '2023'
     ```
   - `venue`: Venue format variations
     ```
@@ -378,7 +392,9 @@ This enhanced URL display helps users access multiple authoritative sources for
            Verified URL: https://www.semanticscholar.org/paper/c1d2e3f4a5b6c7d8e9f0123456789012345678ab
            ArXiv URL: https://arxiv.org/abs/2403.02151
            DOI URL: https://doi.org/10.48550/arxiv.2403.02151
-          ⚠️  Warning: Venue mismatch: cited as 'arXiv, 2024' but actually 'Neural Information Processing Systems'
+          ⚠️  Warning: Venue mismatch:
+                   cited:  'arXiv, 2024'
+                   actual: 'Neural Information Processing Systems'
     ```
 - **❓ Unverified**: References that couldn't be verified with any of the checker APIs

{academic_refchecker-1.2.44 → academic_refchecker-1.2.46}/src/__version__.py RENAMED

@@ -1,3 +1,3 @@
 """Version information for RefChecker."""
-__version__ = "1.2.44"
+__version__ = "1.2.46"

{academic_refchecker-1.2.44 → academic_refchecker-1.2.46/src/academic_refchecker.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: academic-refchecker
-Version: 1.2.44
+Version: 1.2.46
 Summary: A comprehensive tool for validating reference accuracy in academic papers
 Author-email: Mark Russinovich <markrussinovich@hotmail.com>
 License-Expression: MIT
@@ -78,7 +78,9 @@ A comprehensive tool for validating reference accuracy in academic papers, usefu
        Verified URL: https://www.semanticscholar.org/paper/5f4ac1ac7ca4b17d3db1b52d9aafd9e8b26c0d7
        ArXiv URL: https://arxiv.org/abs/1610.10099
        DOI URL: https://doi.org/10.48550/arxiv.1610.10099
-      ⚠️  Warning: Year mismatch: cited as 2017 but actually 2016
+      ⚠️  Warning: Year mismatch:
+               cited:  '2017'
+               actual: '2016'
 [2/45] Effective approaches to attention-based neural machine translation
        Minh-Thang Luong, Hieu Pham, Christopher D. Manning
@@ -87,7 +89,9 @@ A comprehensive tool for validating reference accuracy in academic papers, usefu
        Verified URL: https://www.semanticscholar.org/paper/93499a7c7f699b6630a86fad964536f9423bb6d0
        ArXiv URL: https://arxiv.org/abs/1508.04025
        DOI URL: https://doi.org/10.18653/v1/d15-1166
-      ❌ Error: First author mismatch: 'Minh-Thang Luong' vs 'Thang Luong'
+      ❌ Error: First author mismatch:
+               cited:  'Minh-Thang Luong'
+               actual: 'Thang Luong'
 [3/45] Deep Residual Learning for Image Recognition
        Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
@@ -98,7 +102,9 @@ A comprehensive tool for validating reference accuracy in academic papers, usefu
        Verified URL: https://www.semanticscholar.org/paper/2c03df8b48bf3fa39054345bafabfeff15bfd11d
        ArXiv URL: https://arxiv.org/abs/1512.03385
        DOI URL: https://doi.org/10.1109/CVPR.2016.90
-      ❌ Error: DOI mismatch: cited as '10.1109/CVPR.2016.91' but actually '10.1109/CVPR.2016.90'
+      ❌ Error: DOI mismatch:
+               cited:  '10.1109/CVPR.2016.91'
+               actual: '10.1109/CVPR.2016.90'
 ============================================================
 📋 SUMMARY
@@ -382,7 +388,9 @@ This enhanced URL display helps users access multiple authoritative sources for
            Verified URL: https://www.semanticscholar.org/paper/a1b2c3d4e5f6789012345678901234567890abcd
            ArXiv URL: https://arxiv.org/abs/2312.02119
            DOI URL: https://doi.org/10.48550/arxiv.2312.02119
-          ❌ Error: First author mismatch: 'T. Xie' vs 'Zhao Xu'
+          ❌ Error: First author mismatch:
+                   cited:  'T. Xie'
+                   actual: 'Zhao Xu'
     ```
   - `title`: Title discrepancies
     ```
@@ -392,7 +400,9 @@ This enhanced URL display helps users access multiple authoritative sources for
            Verified URL: https://www.semanticscholar.org/paper/df2b0e26d0599ce3e70df8a9da02e51594e0e992
            ArXiv URL: https://arxiv.org/abs/1810.04805
            DOI URL: https://doi.org/10.18653/v1/n19-1423
-          ❌ Error: Title mismatch: cited as 'BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding' but actually 'BERT: Pre-training of Deep Bidirectional Transformers for Language Comprehension'
+          ❌ Error: Title mismatch:
+                   cited:  'BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding'
+                   actual: 'BERT: Pre-training of Deep Bidirectional Transformers for Language Comprehension'
     ```
   - `arxiv_id`: Incorrect URLs or arXiv IDs
     ```
@@ -415,7 +425,9 @@ This enhanced URL display helps users access multiple authoritative sources for
            Verified URL: https://www.semanticscholar.org/paper/204e3073870fae3d05bcbc2f6a8e263d9b72e776
            ArXiv URL: https://arxiv.org/abs/1706.03762
            DOI URL: https://doi.org/10.48550/arXiv.1706.03762
-          ❌ Error: DOI mismatch: cited as '10.5555/3295222.3295349' but actually '10.48550/arXiv.1706.03762'
+          ❌ Error: DOI mismatch:
+                   cited:  '10.5555/3295222.3295349'
+                   actual: '10.48550/arXiv.1706.03762'
     ```
 - **⚠️ Warnings**: Minor issues that may need attention
@@ -428,7 +440,9 @@ This enhanced URL display helps users access multiple authoritative sources for
            Verified URL: https://www.semanticscholar.org/paper/f1a2b3c4d5e6f7890123456789012345678901ab
            ArXiv URL: https://arxiv.org/abs/2310.03684
            DOI URL: https://doi.org/10.48550/arxiv.2310.03684
-          ⚠️  Warning: Year mismatch: cited as 2024 but actually 2023
+          ⚠️  Warning: Year mismatch:
+                   cited:  '2024'
+                   actual: '2023'
     ```
   - `venue`: Venue format variations
     ```
@@ -439,7 +453,9 @@ This enhanced URL display helps users access multiple authoritative sources for
            Verified URL: https://www.semanticscholar.org/paper/c1d2e3f4a5b6c7d8e9f0123456789012345678ab
            ArXiv URL: https://arxiv.org/abs/2403.02151
            DOI URL: https://doi.org/10.48550/arxiv.2403.02151
-          ⚠️  Warning: Venue mismatch: cited as 'arXiv, 2024' but actually 'Neural Information Processing Systems'
+          ⚠️  Warning: Venue mismatch:
+                   cited:  'arXiv, 2024'
+                   actual: 'Neural Information Processing Systems'
     ```
 - **❓ Unverified**: References that couldn't be verified with any of the checker APIs

{academic_refchecker-1.2.44 → academic_refchecker-1.2.46}/src/checkers/crossref.py RENAMED

@@ -31,6 +31,7 @@ import re
 from typing import Dict, List, Tuple, Optional, Any, Union
 from urllib.parse import quote_plus
 from utils.text_utils import normalize_text, clean_title_basic, find_best_match, is_name_match, compare_authors, clean_title_for_search
+from utils.error_utils import format_year_mismatch, format_doi_mismatch
 from config.settings import get_config
 # Set up logging
@@ -478,21 +479,19 @@ class CrossRefReferenceChecker:
         if year and work_year and year != work_year:
             errors.append({
                 'warning_type': 'year',
-                'warning_details': f"Year mismatch: cited as {year} but actually {work_year}",
+                'warning_details': format_year_mismatch(year, work_year),
                 'ref_year_correct': work_year
             })
         # Verify DOI
         work_doi = work_data.get('DOI')
         if doi and work_doi:
-            # Normalize DOIs for comparison (remove URL prefix and trailing periods)
-            cited_doi_clean = doi.replace('https://doi.org/', '').replace('http://doi.org/', '').strip().rstrip('.')
-            work_doi_clean = work_doi.replace('https://doi.org/', '').replace('http://doi.org/', '').strip().rstrip('.')
-            if cited_doi_clean.lower() != work_doi_clean.lower():
+            # Compare DOIs using the proper comparison function
+            from utils.doi_utils import compare_dois
+            if not compare_dois(doi, work_doi):
                 errors.append({
                     'error_type': 'doi',
-                    'error_details': f"DOI mismatch: cited as {doi} but actually {work_doi}",
+                    'error_details': format_doi_mismatch(doi, work_doi),
                     'ref_doi_correct': work_doi
                 })
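The new `format_year_mismatch` and `format_doi_mismatch` helpers live in `utils/error_utils.py`, whose body is not part of this excerpt. Judging only from the README output above (a label line followed by `cited:` and `actual:` lines), a formatter of this kind might look like the sketch below. The function names, the `indent` parameter, and the exact whitespace are assumptions for illustration, not the package's actual implementation.

```python
def three_line_mismatch(label: str, cited: object, actual: object, indent: str = "    ") -> str:
    """Hypothetical formatter: label on line 1, then quoted cited/actual values.

    The real helper in utils/error_utils.py is not shown in this diff;
    the layout here only mirrors the README sample output.
    """
    return (
        f"{label}:\n"
        f"{indent}cited:  '{cited}'\n"
        f"{indent}actual: '{actual}'"
    )


def year_mismatch(cited: object, actual: object) -> str:
    # Values are interpolated via str(), so ints and strings render the same way.
    return three_line_mismatch("Year mismatch", cited, actual)


def doi_mismatch(cited: str, actual: str) -> str:
    return three_line_mismatch("DOI mismatch", cited, actual)


if __name__ == "__main__":
    print(year_mismatch(2017, 2016))
    print(doi_mismatch("10.1109/CVPR.2016.91", "10.1109/CVPR.2016.90"))
```

Routing every checker through one formatter keeps the cited and actual values vertically aligned, which makes long titles and DOIs easier to compare than the old single-line messages.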

{academic_refchecker-1.2.44 → academic_refchecker-1.2.46}/src/checkers/github_checker.py RENAMED

@@ -169,9 +169,14 @@ class GitHubChecker:
             if cited_title:
                 title_match = self._check_title_match(cited_title, actual_name, actual_description)
                 if not title_match:
+                    from utils.error_utils import format_title_mismatch
+                    details = format_title_mismatch(cited_title, actual_name)
+                    if actual_description:
+                        snippet = actual_description[:100] + ('...' if len(actual_description) > 100 else '')
+                        details += f" ({snippet})"
                     errors.append({
                         "warning_type": "title",
-                        "warning_details": f"Title mismatch: cited as '{cited_title}' but repository is '{actual_name}' ({actual_description[:100]}{'...' if len(actual_description) > 100 else ''})"
+                        "warning_details": details
                     })
             # Verify authors
@@ -180,9 +185,13 @@ class GitHubChecker:
                 author_str = ', '.join(cited_authors) if isinstance(cited_authors, list) else str(cited_authors)
                 author_match = self._check_author_match(author_str, actual_owner, actual_owner_name)
                 if not author_match:
+                    from utils.error_utils import format_three_line_mismatch
+                    left = author_str
+                    right = f"{actual_owner} ({actual_owner_name})" if actual_owner_name else actual_owner
+                    details = format_three_line_mismatch("Author mismatch", left, right)
                     errors.append({
                         "warning_type": "author",
-                        "warning_details": f"Author mismatch: cited as '{author_str}' but repository owner is '{actual_owner}' ({actual_owner_name})"
+                        "warning_details": details
                     })
             # Verify year
@@ -191,9 +200,10 @@ class GitHubChecker:
                 try:
                     cited_year_int = int(cited_year)
                     if cited_year_int < creation_year:
+                        from utils.error_utils import format_year_mismatch
                         errors.append({
                             "warning_type": "year",
-                            "warning_details": f"Year mismatch: cited as {cited_year} but repository created in {creation_year}",
+                            "warning_details": format_year_mismatch(cited_year, creation_year),
                             "ref_year_correct": str(creation_year)
                         })
                 except (ValueError, TypeError):
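One behavioral detail in the title hunk above: the old f-string sliced `actual_description` unconditionally, while the new code only appends a snippet when a description exists. A minimal standalone sketch of that guard follows; the function name and the `limit` parameter are hypothetical, and only the 100-character cutoff and the `...` suffix come from the diff.

```python
from typing import Optional


def append_description_snippet(details: str, description: Optional[str], limit: int = 100) -> str:
    """Append a truncated description in parentheses, or return details unchanged.

    Hypothetical helper mirroring the guarded logic in the hunk above.
    """
    if not description:
        # None or empty string: nothing to append, and no slicing of None.
        return details
    snippet = description[:limit] + ("..." if len(description) > limit else "")
    return f"{details} ({snippet})"


if __name__ == "__main__":
    print(append_description_snippet("Title mismatch", None))
    print(append_description_snippet("Title mismatch", "A short description"))
```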

{academic_refchecker-1.2.44 → academic_refchecker-1.2.46}/src/checkers/openalex.py RENAMED

@@ -33,6 +33,7 @@ import re
 from typing import Dict, List, Tuple, Optional, Any, Union
 from urllib.parse import quote_plus
 from utils.text_utils import normalize_text, clean_title_basic, find_best_match, is_name_match, compare_authors, clean_title_for_search
+from utils.error_utils import format_year_mismatch, format_doi_mismatch
 from config.settings import get_config
 # Set up logging
@@ -448,7 +449,7 @@ class OpenAlexReferenceChecker:
         if year and work_year and year != work_year:
             errors.append({
                 'warning_type': 'year',
-                'warning_details': f"Year mismatch: cited as {year} but actually {work_year}",
+                'warning_details': format_year_mismatch(year, work_year),
                 'ref_year_correct': work_year
             })
@@ -458,14 +459,12 @@ class OpenAlexReferenceChecker:
             work_doi = work_data['ids']['doi']
         if doi and work_doi:
-            # Normalize DOIs for comparison (remove URL prefix and trailing periods)
-            cited_doi_clean = doi.replace('https://doi.org/', '').replace('http://doi.org/', '').strip().rstrip('.')
-            work_doi_clean = work_doi.replace('https://doi.org/', '').replace('http://doi.org/', '').strip().rstrip('.')
-            if cited_doi_clean.lower() != work_doi_clean.lower():
+            # Compare DOIs using the proper comparison function
+            from utils.doi_utils import compare_dois
+            if not compare_dois(doi, work_doi):
                 errors.append({
                     'error_type': 'doi',
-                    'error_details': f"DOI mismatch: cited as {doi} but actually {work_doi}",
+                    'error_details': format_doi_mismatch(doi, work_doi),
                     'ref_doi_correct': work_doi
                 })
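Both this file and `crossref.py` replace inline DOI normalization with `compare_dois` from `utils/doi_utils.py`, which is not shown in this excerpt. The sketch below combines the normalization steps visible in the removed lines across the checkers (URL prefix removal, whitespace and trailing-period stripping, hash-fragment removal, case-insensitive comparison). The names `normalize_doi` and `dois_match` are hypothetical, and the real `compare_dois` may apply different or additional rules.

```python
DOI_URL_PREFIXES = ("https://doi.org/", "http://doi.org/")


def normalize_doi(doi: str) -> str:
    """Reduce a DOI or DOI URL to a bare, lowercase identifier (illustrative only)."""
    cleaned = doi.strip()
    for prefix in DOI_URL_PREFIXES:
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    # Drop any hash fragment, then trailing periods left over from citation text.
    cleaned = cleaned.split("#")[0].rstrip(".")
    return cleaned.lower()


def dois_match(cited: str, actual: str) -> bool:
    """Return True when two DOIs refer to the same identifier after normalization."""
    return normalize_doi(cited) == normalize_doi(actual)


if __name__ == "__main__":
    print(dois_match("https://doi.org/10.1109/CVPR.2016.90.", "10.1109/cvpr.2016.90"))
    print(dois_match("10.1109/CVPR.2016.91", "10.1109/CVPR.2016.90"))
```

Centralizing the comparison means the three checkers can no longer drift apart in which prefixes or suffixes they ignore, which the removed lines suggest had already happened.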

{academic_refchecker-1.2.44 → academic_refchecker-1.2.46}/src/checkers/openreview_checker.py RENAMED

@@ -425,9 +425,11 @@ class OpenReviewReferenceChecker:
         if cited_title and paper_title:
             similarity = calculate_title_similarity(cited_title, paper_title)
             if similarity < 0.7:  # Using a reasonable threshold
+                from utils.error_utils import format_title_mismatch
+                details = format_title_mismatch(cited_title, paper_title) + f" (similarity: {similarity:.2f})"
                 errors.append({
                     "warning_type": "title",
-                    "warning_details": f"Title mismatch: cited as '{cited_title}' but OpenReview shows '{paper_title}' (similarity: {similarity:.2f})"
+                    "warning_details": details
                 })
         # Check authors
@@ -460,9 +462,10 @@ class OpenReviewReferenceChecker:
                 is_different, year_message = is_year_substantially_different(cited_year_int, paper_year_int)
                 if is_different and year_message:
+                    from utils.error_utils import format_year_mismatch
                     errors.append({
                         "warning_type": "year",
-                        "warning_details": year_message
+                        "warning_details": format_year_mismatch(cited_year_int, paper_year_int)
                     })
             except (ValueError, TypeError):
                 pass  # Skip year validation if conversion fails
@@ -473,10 +476,10 @@ class OpenReviewReferenceChecker:
         if cited_venue and paper_venue:
             if are_venues_substantially_different(cited_venue, paper_venue):
-                from utils.error_utils import clean_venue_for_comparison
+                from utils.error_utils import format_venue_mismatch
                 errors.append({
                     "warning_type": "venue",
-                    "warning_details": f"Venue mismatch: cited as '{clean_venue_for_comparison(cited_venue)}' but OpenReview shows '{clean_venue_for_comparison(paper_venue)}'"
+                    "warning_details": format_venue_mismatch(cited_venue, paper_venue)
                 })
         # Create verified data structure

{academic_refchecker-1.2.44 → academic_refchecker-1.2.46}/src/checkers/semantic_scholar.py RENAMED

@@ -29,6 +29,7 @@ import logging
 import re
 from typing import Dict, List, Tuple, Optional, Any, Union
 from utils.text_utils import normalize_text, clean_title_basic, find_best_match, is_name_match, are_venues_substantially_different, calculate_title_similarity, compare_authors, clean_title_for_search
+from utils.error_utils import format_title_mismatch
 from config.settings import get_config
 # Set up logging
@@ -471,7 +472,7 @@ class NonArxivReferenceChecker:
         if found_title and title_similarity < SIMILARITY_THRESHOLD:
             errors.append({
                 'error_type': 'title',
-                'error_details': f"Title mismatch: cited as '{title}' but actually '{found_title}'",
+                'error_details': format_title_mismatch(title, found_title),
                 'ref_title_correct': paper_data.get('title', '')
             })
@@ -525,9 +526,10 @@ class NonArxivReferenceChecker:
             is_different, warning_message = is_year_substantially_different(year, paper_year, context)
             if is_different and warning_message:
+                from utils.error_utils import format_year_mismatch
                 errors.append({
                     'warning_type': 'year',
-                    'warning_details': warning_message,
+                    'warning_details': format_year_mismatch(year, paper_year),
                     'ref_year_correct': paper_year
                 })
@@ -541,49 +543,50 @@ class NonArxivReferenceChecker:
         elif paper_venue and not isinstance(paper_venue, str):
             paper_venue = str(paper_venue)
+        # Check venue mismatches
         if cited_venue and paper_venue:
             # Use the utility function to check if venues are substantially different
             if are_venues_substantially_different(cited_venue, paper_venue):
                 from utils.error_utils import create_venue_warning
                 errors.append(create_venue_warning(cited_venue, paper_venue))
         elif not cited_venue and paper_venue:
-            # Check if this is an arXiv paper first
-            external_ids = paper_data.get('externalIds', {})
-            arxiv_id = external_ids.get('ArXiv') if external_ids else None
-            if arxiv_id:
-                # For arXiv papers, suggest including the arXiv URL instead of venue
-                arxiv_url = f"https://arxiv.org/abs/{arxiv_id}"
-                # Check if the reference already includes this ArXiv URL or equivalent DOI
-                reference_url = reference.get('url', '')
-                # Check for direct arXiv URL match
-                has_arxiv_url = arxiv_url in reference_url
-                # Also check for arXiv DOI URL (e.g., https://doi.org/10.48550/arxiv.2505.11595)
-                arxiv_doi_url = f"https://doi.org/10.48550/arxiv.{arxiv_id}"
-                has_arxiv_doi = arxiv_doi_url.lower() in reference_url.lower()
-                if not (has_arxiv_url or has_arxiv_doi):
+            # Original reference has the venue in raw text but not parsed correctly
+            raw_text = reference.get('raw_text', '')
+            if raw_text and '#' in raw_text:
+                # Check if venue might be in the raw text format (author#title#venue#year#url)
+                parts = raw_text.split('#')
+                if len(parts) >= 3 and parts[2].strip():
+                    # Venue is present in raw text but missing from parsed reference
                     errors.append({
                         'warning_type': 'venue',
-                        'warning_details': f"Reference should include arXiv URL: {arxiv_url}",
-                        'ref_url_correct': arxiv_url
+                        'warning_details': f"Venue missing: should include '{paper_venue}'",
+                        'ref_venue_correct': paper_venue
                     })
-            else:
-                # Original reference has the venue in raw text but not parsed correctly
-                raw_text = reference.get('raw_text', '')
-                if raw_text and '#' in raw_text:
-                    # Check if venue might be in the raw text format (author#title#venue#year#url)
-                    parts = raw_text.split('#')
-                    if len(parts) >= 3 and parts[2].strip():
-                        # Venue is present in raw text but missing from parsed reference
-                        errors.append({
-                            'warning_type': 'venue',
-                            'warning_details': f"Venue missing: should include '{paper_venue}'",
-                            'ref_venue_correct': paper_venue
-                        })
+        # Always check for missing arXiv URLs when paper has arXiv ID
+        external_ids = paper_data.get('externalIds', {})
+        arxiv_id = external_ids.get('ArXiv') if external_ids else None
+        if arxiv_id:
+            # For arXiv papers, check if reference includes the arXiv URL
+            arxiv_url = f"https://arxiv.org/abs/{arxiv_id}"
+            # Check if the reference already includes this ArXiv URL or equivalent DOI
+            reference_url = reference.get('url', '')
+            # Check for direct arXiv URL match
+            has_arxiv_url = arxiv_url in reference_url
+            # Also check for arXiv DOI URL (e.g., https://doi.org/10.48550/arxiv.2505.11595)
+            arxiv_doi_url = f"https://doi.org/10.48550/arxiv.{arxiv_id}"
+            has_arxiv_doi = arxiv_doi_url.lower() in reference_url.lower()
+            if not (has_arxiv_url or has_arxiv_doi):
+                errors.append({
+                    'warning_type': 'url',
+                    'warning_details': f"Reference could include arXiv URL: {arxiv_url}",
+                    'ref_url_correct': arxiv_url
+                })
         # Verify DOI
         paper_doi = None
@@ -591,14 +594,13 @@ class NonArxivReferenceChecker:
         if external_ids and 'DOI' in external_ids:
             paper_doi = external_ids['DOI']
-            # Compare DOIs, but strip hash fragments and trailing periods for comparison
-            cited_doi_clean = doi.split('#')[0].rstrip('.') if doi else ''
-            paper_doi_clean = paper_doi.split('#')[0].rstrip('.') if paper_doi else ''
-            if cited_doi_clean and paper_doi_clean and cited_doi_clean.lower() != paper_doi_clean.lower():
+            # Compare DOIs using the proper comparison function
+            from utils.doi_utils import compare_dois
+            if doi and paper_doi and not compare_dois(doi, paper_doi):
+                from utils.error_utils import format_doi_mismatch
                 errors.append({
                     'error_type': 'doi',
-                    'error_details': f"DOI mismatch: cited as {doi} but actually {paper_doi}",
+                    'error_details': format_doi_mismatch(doi, paper_doi),
                     'ref_doi_correct': paper_doi
                 })
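
This hunk replaces the inline DOI cleanup with `compare_dois` from `utils.doi_utils`, whose body is not shown in this diff. As a rough illustration only, the sketch below reproduces the normalization that the removed lines performed (drop the `#` fragment, strip trailing periods, compare case-insensitively). The names `normalize_doi_sketch` and `compare_dois_sketch` are hypothetical, and the real `compare_dois` may normalize differently.

```python
def normalize_doi_sketch(doi: str) -> str:
    """Hypothetical helper mirroring the inline cleanup removed in this hunk."""
    if not doi:
        return ''
    # Drop any hash fragment, then trailing periods, then fold case.
    return doi.split('#')[0].rstrip('.').lower()


def compare_dois_sketch(cited: str, actual: str) -> bool:
    """Return True when both DOIs are non-empty and equal after normalization."""
    cited_clean = normalize_doi_sketch(cited)
    actual_clean = normalize_doi_sketch(actual)
    return bool(cited_clean) and bool(actual_clean) and cited_clean == actual_clean
```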

{academic_refchecker-1.2.44 → academic_refchecker-1.2.46}/src/checkers/webpage_checker.py RENAMED

@@ -71,7 +71,8 @@ class WebPageChecker:
         doc_indicators = [
             'docs', 'documentation', 'readthedocs.io', 'help', 'guide', 'tutorial',
             'reference', 'manual', 'wiki', 'blog', 'api', 'developer', 'platform',
-            'index', 'research', 'news', 'insights', 'whitepaper', 'brief', 'develop'
+            'index', 'research', 'news', 'insights', 'whitepaper', 'brief', 'develop',
+            'posts'  # For blog posts and forum posts like LessWrong
         ]
         return any(indicator in url.lower() for indicator in doc_indicators) or self._is_likely_webpage(url)
@@ -84,7 +85,8 @@ class WebPageChecker:
         doc_domains = [
             'pytorch.org', 'tensorflow.org', 'readthedocs.io', 'onnxruntime.ai',
             'deepspeed.ai', 'huggingface.co', 'openai.com', 'microsoft.com',
-            'google.com', 'nvidia.com', 'intel.com', 'langchain.com'
+            'google.com', 'nvidia.com', 'intel.com', 'langchain.com',
+            'lesswrong.com'  # LessWrong rationality and AI safety blog platform
         ]
         return any(domain in parsed.netloc for domain in doc_domains)
@@ -182,9 +184,10 @@ class WebPageChecker:
             # Check title match
             if cited_title and page_title:
                 if not self._check_title_match(cited_title, page_title, page_description):
+                    from utils.error_utils import format_title_mismatch
                     errors.append({
                         "warning_type": "title",
-                        "warning_details": f"Title mismatch: cited as '{cited_title}' but page title is '{page_title}'"
+                        "warning_details": format_title_mismatch(cited_title, page_title)
                     })
             # Check if this is a documentation page for the cited topic
@@ -201,9 +204,13 @@ class WebPageChecker:
             if cited_authors:
                 author_str = ', '.join(cited_authors) if isinstance(cited_authors, list) else str(cited_authors)
                 if not self._check_author_match(author_str, site_info, web_url):
+                    from utils.error_utils import format_three_line_mismatch
+                    left = author_str
+                    right = site_info.get('organization', 'unknown')
+                    details = format_three_line_mismatch("Author/organization mismatch", left, right)
                     errors.append({
                         "warning_type": "author",
-                        "warning_details": f"Author/organization mismatch: cited as '{author_str}' but page is from '{site_info.get('organization', 'unknown')}'"
+                        "warning_details": details
                     })
             logger.debug(f"Web page verification completed for: {web_url}")
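
The hunks above route mismatch messages through helpers in `utils.error_utils` (`format_title_mismatch`, `format_three_line_mismatch`), which this diff does not show. Judging from the sample output in the PKG-INFO changes (a label line followed by `cited:` and `actual:` lines), a formatter of roughly this shape would produce that layout. The function name, the `indent` parameter, and its default width are assumptions for illustration.

```python
def format_three_line_mismatch_sketch(label: str, cited: str, actual: str,
                                      indent: int = 7) -> str:
    """Hypothetical formatter: label line, then aligned cited/actual lines."""
    pad = ' ' * indent  # Assumed indentation; the package may use another width.
    return (
        f"{label}:\n"
        f"{pad}cited:  '{cited}'\n"
        f"{pad}actual: '{actual}'"
    )


if __name__ == "__main__":
    print(format_three_line_mismatch_sketch(
        "Author/organization mismatch", "Jane Doe", "unknown"))
```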
@@ -390,6 +397,14 @@ class WebPageChecker:
         organization = site_info.get('organization', '').lower()
         domain = site_info.get('domain', '').lower()
+        # Accept generic web resource terms - these are valid for any web URL
+        generic_web_terms = [
+            'web resource', 'web site', 'website', 'online resource',
+            'online', 'web', 'internet resource', 'web page', 'webpage'
+        ]
+        if cited_lower in generic_web_terms:
+            return True
         # Direct matches
         if cited_lower in organization or organization in cited_lower:
             return True
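
The last hunk accepts generic author strings such as "Web resource" before falling back to substring matching against the organization. The sketch below isolates that logic under stated assumptions: the function name is hypothetical, `cited_lower` is assumed to be the lowercased, stripped author string, and the empty-organization guard is an addition that does not appear in the diff.

```python
GENERIC_WEB_TERMS = {
    'web resource', 'web site', 'website', 'online resource',
    'online', 'web', 'internet resource', 'web page', 'webpage',
}


def author_matches_site_sketch(cited: str, organization: str) -> bool:
    """Hypothetical reduction of the author check to its first two rules."""
    cited_lower = cited.lower().strip()   # Assumed derivation of cited_lower.
    org = organization.lower().strip()

    # Generic web terms are accepted for any URL, as in the added lines.
    if cited_lower in GENERIC_WEB_TERMS:
        return True

    # Assumption: guard against an empty organization, because
    # '' in some_string is always True in Python.
    if not org:
        return False

    # Direct matches in either direction, as in the context lines.
    return cited_lower in org or org in cited_lower
```

In the context lines shown, `organization` defaults to an empty string, so `organization in cited_lower` would be true for any author unless an earlier check in the method handles that case.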
