openalex-local 0.1.0__tar.gz

@@ -0,0 +1,152 @@
1
+ Metadata-Version: 2.4
2
+ Name: openalex-local
3
+ Version: 0.1.0
4
+ Summary: Local OpenAlex database with 284M+ works, abstracts, and semantic search
5
+ Author-email: Yusuke Watanabe <ywatanabe@alumni.u-tokyo.ac.jp>
6
+ License: AGPL-3.0
7
+ Project-URL: Homepage, https://github.com/ywatanabe1989/openalex-local
8
+ Project-URL: Repository, https://github.com/ywatanabe1989/openalex-local
9
+ Keywords: openalex,academic,research,abstracts,semantic-search
10
+ Classifier: Development Status :: 3 - Alpha
11
+ Classifier: Intended Audience :: Science/Research
12
+ Classifier: License :: OSI Approved :: GNU Affero General Public License v3
13
+ Classifier: Programming Language :: Python :: 3
14
+ Classifier: Programming Language :: Python :: 3.10
15
+ Classifier: Programming Language :: Python :: 3.11
16
+ Classifier: Programming Language :: Python :: 3.12
17
+ Classifier: Topic :: Scientific/Engineering
18
+ Requires-Python: >=3.10
19
+ Description-Content-Type: text/markdown
20
+ Requires-Dist: click>=8.0
21
+ Requires-Dist: awscli>=1.0
22
+ Provides-Extra: dev
23
+ Requires-Dist: pytest>=7.0; extra == "dev"
24
+ Requires-Dist: pytest-cov>=4.0; extra == "dev"
25
+
26
+ # OpenAlex Local
27
+
28
+ Local OpenAlex database with 284M+ scholarly works, abstracts, and semantic search.
29
+
30
+ [![Python](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
31
+ [![License](https://img.shields.io/badge/license-AGPL--3.0-blue.svg)](LICENSE)
32
+
33
+ <details>
34
+ <summary><strong>Why OpenAlex Local?</strong></summary>
35
+
36
+ **Built for the LLM era** - features that matter for AI research assistants:
37
+
38
+ | Feature | Benefit |
+ |---------|---------|
+ | 📚 **284M Works** | More coverage than CrossRef |
+ | 📝 **Abstracts** | ~45-60% availability for semantic search |
+ | 🏷️ **Concepts & Topics** | Built-in classification |
+ | 👤 **Author Disambiguation** | Linked to institutions |
+ | 🔓 **Open Access Info** | OA status and URLs |
45
+
46
+ Perfect for: RAG systems, research assistants, literature review automation.
47
+
48
+ </details>
49
+
50
+ <details>
51
+ <summary><strong>Installation</strong></summary>
52
+
53
+ ```bash
54
+ pip install openalex-local
55
+ ```
56
+
57
+ From source:
58
+ ```bash
59
+ git clone https://github.com/ywatanabe1989/openalex-local
60
+ cd openalex-local && make install
61
+ ```
62
+
63
+ Database setup (~300 GB, ~1-2 days to build):
64
+ ```bash
65
+ # Check system status
66
+ make status
67
+
68
+ # 1. Download OpenAlex Works snapshot (~300GB)
69
+ make download-screen # runs in background
70
+
71
+ # 2. Build SQLite database
72
+ make build-db
73
+
74
+ # 3. Build FTS5 index
75
+ make build-fts
76
+ ```
77
+
78
+ </details>
79
+
80
+ <details>
81
+ <summary><strong>Python API</strong></summary>
82
+
83
+ ```python
+ from openalex_local import search, get, count
+
+ # Full-text search (title + abstract)
+ results = search("machine learning neural networks")
+ for work in results:
+     print(f"{work.title} ({work.year})")
+     # abstract is Optional[str]; fall back to "" so slicing never fails
+     print(f" Abstract: {(work.abstract or '')[:200]}...")
+     print(f" Concepts: {[c['name'] for c in work.concepts]}")
+
+ # Get by OpenAlex ID or DOI
+ work = get("W2741809807")
+ work = get("10.1038/nature12373")
+
+ # Count matches
+ n = count("CRISPR")
+ ```
100
+
101
+ </details>
102
+
103
+ <details>
104
+ <summary><strong>CLI</strong></summary>
105
+
106
+ ```bash
107
+ openalex-local search "CRISPR genome editing" -n 5
108
+ openalex-local get W2741809807
109
+ openalex-local get 10.1038/nature12373
110
+ openalex-local count "machine learning"
111
+ ```
112
+
113
+ </details>
114
+
115
+ <details>
116
+ <summary><strong>Related Projects</strong></summary>
117
+
118
+ **[crossref-local](https://github.com/ywatanabe1989/crossref-local)** - Sister project with CrossRef data:
119
+
120
+ | Feature | crossref-local | openalex-local |
+ |---------|----------------|----------------|
+ | Works | 167M | 284M |
+ | Abstracts | ~21% | ~45-60% |
+ | Update frequency | Real-time | Monthly |
+ | DOI authority | ✓ (source) | Uses CrossRef |
+ | Citations | Raw references | Linked works |
+ | Concepts/Topics | ❌ | ✓ |
+ | Author IDs | ❌ | ✓ |
+ | Best for | DOI lookup, raw refs | Semantic search |
130
+
131
+ **When to use CrossRef**: Real-time DOI updates, raw reference parsing, authoritative metadata.
132
+ **When to use OpenAlex**: Semantic search, citation analysis, topic discovery.
133
+
134
+ </details>
135
+
136
+ <details>
137
+ <summary><strong>Data Source</strong></summary>
138
+
139
+ Data from [OpenAlex](https://openalex.org/), an open catalog of scholarly works.
140
+ Updated monthly from their [snapshot](https://docs.openalex.org/download-all-data/openalex-snapshot).
141
+
142
+ </details>
143
+
144
+ ---
145
+
146
+ <p align="center">
147
+ <a href="https://scitex.ai"><img src="docs/scitex-icon-navy-inverted.png" alt="SciTeX" width="40"/></a>
148
+ <br>
149
+ AGPL-3.0 · ywatanabe@scitex.ai
150
+ </p>
151
+
152
+ <!-- EOF -->
@@ -0,0 +1,127 @@
1
+ # OpenAlex Local
2
+
3
+ Local OpenAlex database with 284M+ scholarly works, abstracts, and semantic search.
4
+
5
+ [![Python](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
6
+ [![License](https://img.shields.io/badge/license-AGPL--3.0-blue.svg)](LICENSE)
7
+
8
+ <details>
9
+ <summary><strong>Why OpenAlex Local?</strong></summary>
10
+
11
+ **Built for the LLM era** - features that matter for AI research assistants:
12
+
13
+ | Feature | Benefit |
+ |---------|---------|
+ | 📚 **284M Works** | More coverage than CrossRef |
+ | 📝 **Abstracts** | ~45-60% availability for semantic search |
+ | 🏷️ **Concepts & Topics** | Built-in classification |
+ | 👤 **Author Disambiguation** | Linked to institutions |
+ | 🔓 **Open Access Info** | OA status and URLs |
20
+
21
+ Perfect for: RAG systems, research assistants, literature review automation.
22
+
23
+ </details>
24
+
25
+ <details>
26
+ <summary><strong>Installation</strong></summary>
27
+
28
+ ```bash
29
+ pip install openalex-local
30
+ ```
31
+
32
+ From source:
33
+ ```bash
34
+ git clone https://github.com/ywatanabe1989/openalex-local
35
+ cd openalex-local && make install
36
+ ```
37
+
38
+ Database setup (~300 GB, ~1-2 days to build):
39
+ ```bash
40
+ # Check system status
41
+ make status
42
+
43
+ # 1. Download OpenAlex Works snapshot (~300GB)
44
+ make download-screen # runs in background
45
+
46
+ # 2. Build SQLite database
47
+ make build-db
48
+
49
+ # 3. Build FTS5 index
50
+ make build-fts
51
+ ```
52
+
53
+ </details>
54
+
55
+ <details>
56
+ <summary><strong>Python API</strong></summary>
57
+
58
+ ```python
+ from openalex_local import search, get, count
+
+ # Full-text search (title + abstract)
+ results = search("machine learning neural networks")
+ for work in results:
+     print(f"{work.title} ({work.year})")
+     # abstract is Optional[str]; fall back to "" so slicing never fails
+     print(f" Abstract: {(work.abstract or '')[:200]}...")
+     print(f" Concepts: {[c['name'] for c in work.concepts]}")
+
+ # Get by OpenAlex ID or DOI
+ work = get("W2741809807")
+ work = get("10.1038/nature12373")
+
+ # Count matches
+ n = count("CRISPR")
+ ```
75
+
76
+ </details>
77
+
78
+ <details>
79
+ <summary><strong>CLI</strong></summary>
80
+
81
+ ```bash
82
+ openalex-local search "CRISPR genome editing" -n 5
83
+ openalex-local get W2741809807
84
+ openalex-local get 10.1038/nature12373
85
+ openalex-local count "machine learning"
86
+ ```
87
+
88
+ </details>
89
+
90
+ <details>
91
+ <summary><strong>Related Projects</strong></summary>
92
+
93
+ **[crossref-local](https://github.com/ywatanabe1989/crossref-local)** - Sister project with CrossRef data:
94
+
95
+ | Feature | crossref-local | openalex-local |
+ |---------|----------------|----------------|
+ | Works | 167M | 284M |
+ | Abstracts | ~21% | ~45-60% |
+ | Update frequency | Real-time | Monthly |
+ | DOI authority | ✓ (source) | Uses CrossRef |
+ | Citations | Raw references | Linked works |
+ | Concepts/Topics | ❌ | ✓ |
+ | Author IDs | ❌ | ✓ |
+ | Best for | DOI lookup, raw refs | Semantic search |
105
+
106
+ **When to use CrossRef**: Real-time DOI updates, raw reference parsing, authoritative metadata.
107
+ **When to use OpenAlex**: Semantic search, citation analysis, topic discovery.
108
+
109
+ </details>
110
+
111
+ <details>
112
+ <summary><strong>Data Source</strong></summary>
113
+
114
+ Data from [OpenAlex](https://openalex.org/), an open catalog of scholarly works.
115
+ Updated monthly from their [snapshot](https://docs.openalex.org/download-all-data/openalex-snapshot).
116
+
117
+ </details>
118
+
119
+ ---
120
+
121
+ <p align="center">
122
+ <a href="https://scitex.ai"><img src="docs/scitex-icon-navy-inverted.png" alt="SciTeX" width="40"/></a>
123
+ <br>
124
+ AGPL-3.0 · ywatanabe@scitex.ai
125
+ </p>
126
+
127
+ <!-- EOF -->
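
Note on the README above: it describes a SQLite database with an FTS5 index that backs `search` and `count`, but this release ships no build or query code. The following is a minimal sketch of how such an index could be queried, assuming Python's `sqlite3` module was built with FTS5 (common, but not guaranteed). The table name `works_fts`, its columns, and the helper names are hypothetical and are not taken from the package.

```python
import sqlite3

# Hypothetical schema: the real table and column names are not part of this release.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE works_fts USING fts5(openalex_id UNINDEXED, title, abstract)"
)
conn.executemany(
    "INSERT INTO works_fts (openalex_id, title, abstract) VALUES (?, ?, ?)",
    [
        ("W1", "CRISPR genome editing in plants", "A review of CRISPR-Cas9 methods."),
        ("W2", "Deep learning for vision", "Neural networks and machine learning."),
    ],
)


def search_titles(query: str, limit: int = 5) -> list[tuple[str, str]]:
    """Return (openalex_id, title) pairs ordered by FTS5's built-in rank."""
    cur = conn.execute(
        "SELECT openalex_id, title FROM works_fts "
        "WHERE works_fts MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return cur.fetchall()


def count_matches(query: str) -> int:
    """Count rows whose title or abstract matches the FTS5 query."""
    cur = conn.execute(
        "SELECT count(*) FROM works_fts WHERE works_fts MATCH ?", (query,)
    )
    return cur.fetchone()[0]


print(search_titles("CRISPR genome"))
print(count_matches("machine learning"))
```

FTS5 treats space-separated terms as an implicit AND, so this is keyword matching rather than embedding-based retrieval. Whether the project's "semantic search" adds anything on top of it cannot be determined from this release.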
@@ -0,0 +1,49 @@
1
+ [build-system]
2
+ requires = ["setuptools>=61.0", "wheel"]
3
+ build-backend = "setuptools.build_meta"
4
+
5
+ [project]
6
+ name = "openalex-local"
7
+ version = "0.1.0"
8
+ description = "Local OpenAlex database with 284M+ works, abstracts, and semantic search"
9
+ readme = "README.md"
10
+ license = {text = "AGPL-3.0"}
11
+ authors = [
12
+ {name = "Yusuke Watanabe", email = "ywatanabe@alumni.u-tokyo.ac.jp"}
13
+ ]
14
+ requires-python = ">=3.10"
15
+ classifiers = [
16
+ "Development Status :: 3 - Alpha",
17
+ "Intended Audience :: Science/Research",
18
+ "License :: OSI Approved :: GNU Affero General Public License v3",
19
+ "Programming Language :: Python :: 3",
20
+ "Programming Language :: Python :: 3.10",
21
+ "Programming Language :: Python :: 3.11",
22
+ "Programming Language :: Python :: 3.12",
23
+ "Topic :: Scientific/Engineering",
24
+ ]
25
+ keywords = ["openalex", "academic", "research", "abstracts", "semantic-search"]
26
+ dependencies = [
27
+ "click>=8.0",
28
+ "awscli>=1.0",
29
+ ]
30
+
31
+ [project.optional-dependencies]
32
+ dev = [
33
+ "pytest>=7.0",
34
+ "pytest-cov>=4.0",
35
+ ]
36
+
37
+ [project.scripts]
38
+ openalex-local = "openalex_local.cli:main"
39
+
40
+ [project.urls]
41
+ Homepage = "https://github.com/ywatanabe1989/openalex-local"
42
+ Repository = "https://github.com/ywatanabe1989/openalex-local"
43
+
44
+ [tool.setuptools.packages.find]
45
+ where = ["src"]
46
+
47
+ [tool.pytest.ini_options]
48
+ testpaths = ["tests"]
49
+ python_files = ["test_*.py"]
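
Note on `[project.scripts]` above: the entry point targets `openalex_local.cli:main`, but the SOURCES.txt in this release lists no `cli.py`, so the installed `openalex-local` command would presumably fail at import time. Below is a minimal sketch of the shape such a `main` could take, matching the three subcommands shown in the README. It uses the standard library's `argparse` in place of the declared `click` dependency so that it runs without extra installs. The function names, the meaning of `-n` as a result limit, and its default are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the search/get/count subcommands shown in the README."""
    parser = argparse.ArgumentParser(prog="openalex-local")
    sub = parser.add_subparsers(dest="command", required=True)

    p_search = sub.add_parser("search", help="full-text search")
    p_search.add_argument("query")
    # Assumption: -n limits the number of results; the default is invented.
    p_search.add_argument("-n", "--limit", type=int, default=10)

    p_get = sub.add_parser("get", help="look up one work by OpenAlex ID or DOI")
    p_get.add_argument("identifier")

    p_count = sub.add_parser("count", help="count matching works")
    p_count.add_argument("query")
    return parser


def main(argv=None) -> int:
    """Parse arguments and echo them; no backend exists in 0.1.0 to dispatch to."""
    args = build_parser().parse_args(argv)
    print(f"{args.command}: {vars(args)}")
    return 0
```

With a module like this at `src/openalex_local/cli.py`, the `openalex-local = "openalex_local.cli:main"` entry point would resolve.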
@@ -0,0 +1,4 @@
1
+ [egg_info]
2
+ tag_build =
3
+ tag_date = 0
4
+
@@ -0,0 +1,14 @@
+ """
+ OpenAlex Local - Local OpenAlex database with 284M+ works and semantic search.
+
+ Example (planned API; these names are not yet exported in 0.1.0):
+     >>> from openalex_local import search, get
+     >>> results = search("machine learning neural networks")
+     >>> work = get("W2741809807")  # OpenAlex ID
+     >>> work = get("10.1038/nature12373")  # or DOI
+ """
+
+ __version__ = "0.1.0"
+
+ # API will be exposed here after implementation
+ # from .api import search, get, count, info
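
Note on `__init__.py` above: because the `from .api import ...` line is commented out, the imports shown in the README (`search`, `get`, `count`) should raise `ImportError` against this release. A small sketch of how a caller might check which names a package actually exports before relying on them follows. The helper `missing_api` is hypothetical and is not part of the package.

```python
import importlib


def missing_api(module_name: str, names: tuple[str, ...]) -> list[str]:
    """Return the subset of names that cannot be found on the given module."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module itself is not installed, so every name is missing.
        return list(names)
    return [name for name in names if not hasattr(module, name)]


# Demonstrated on a stdlib module so the example runs anywhere.
print(missing_api("json", ("loads", "no_such_function")))
```

Calling `missing_api("openalex_local", ("search", "get", "count"))` with 0.1.0 installed would be expected to report all three names as missing.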
@@ -0,0 +1,73 @@
+ """Configuration for openalex_local."""
+
+ import os
+ from pathlib import Path
+ from typing import Optional
+
+ # Default database locations (checked in order)
+ DEFAULT_DB_PATHS = [
+     Path("/home/ywatanabe/proj/openalex-local/data/openalex.db"),
+     Path("/home/ywatanabe/proj/openalex_local/data/openalex.db"),
+     Path("/mnt/nas_ug/openalex_local/data/openalex.db"),
+     Path.home() / ".openalex_local" / "openalex.db",
+     Path.cwd() / "data" / "openalex.db",
+ ]
+
+
+ def get_db_path() -> Path:
+     """
+     Get database path from environment or auto-detect.
+
+     Priority:
+     1. OPENALEX_LOCAL_DB environment variable
+     2. First existing path from DEFAULT_DB_PATHS
+
+     Returns:
+         Path to the database file
+
+     Raises:
+         FileNotFoundError: If no database found
+     """
+     # Check environment variable first
+     env_path = os.environ.get("OPENALEX_LOCAL_DB")
+     if env_path:
+         path = Path(env_path)
+         if path.exists():
+             return path
+         raise FileNotFoundError(f"OPENALEX_LOCAL_DB path not found: {env_path}")
+
+     # Auto-detect from default locations
+     for path in DEFAULT_DB_PATHS:
+         if path.exists():
+             return path
+
+     raise FileNotFoundError(
+         "OpenAlex database not found. Set OPENALEX_LOCAL_DB environment variable "
+         f"or place database at one of: {[str(p) for p in DEFAULT_DB_PATHS]}"
+     )
+
+
+ class Config:
+     """Configuration container."""
+
+     _db_path: Optional[Path] = None
+
+     @classmethod
+     def get_db_path(cls) -> Path:
+         """Get or auto-detect database path."""
+         if cls._db_path is None:
+             cls._db_path = get_db_path()
+         return cls._db_path
+
+     @classmethod
+     def set_db_path(cls, path: str | Path) -> None:
+         """Set database path explicitly."""
+         path = Path(path)
+         if not path.exists():
+             raise FileNotFoundError(f"Database not found: {path}")
+         cls._db_path = path
+
+     @classmethod
+     def reset(cls) -> None:
+         """Reset configuration (for testing)."""
+         cls._db_path = None
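
Note on `config.py` above: the lookup order is the environment variable first, then the default list, and a set-but-invalid `OPENALEX_LOCAL_DB` raises instead of falling through to the defaults. The sketch below mirrors that order in a standalone function so it can run without the package installed. `resolve_db_path` and its `candidates` parameter are illustrative stand-ins for `get_db_path` and `DEFAULT_DB_PATHS`, not the package's API.

```python
import os
import tempfile
from pathlib import Path


def resolve_db_path(candidates: list[Path], env_var: str = "OPENALEX_LOCAL_DB") -> Path:
    """Mirror the lookup order: environment variable first, then candidates."""
    env_path = os.environ.get(env_var)
    if env_path:
        path = Path(env_path)
        if path.exists():
            return path
        # A set-but-wrong variable is an error, not a fallthrough.
        raise FileNotFoundError(f"{env_var} path not found: {env_path}")
    for path in candidates:
        if path.exists():
            return path
    raise FileNotFoundError("database not found in any candidate location")


with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / "openalex.db"
    db.touch()  # empty placeholder file standing in for a real database
    os.environ["OPENALEX_LOCAL_DB"] = str(db)
    try:
        resolved = resolve_db_path([Path(tmp) / "missing.db"])
    finally:
        del os.environ["OPENALEX_LOCAL_DB"]

print(resolved.name)
```

Since the first three default paths are specific to the author's machine, setting `OPENALEX_LOCAL_DB` or calling `Config.set_db_path` is likely the practical route on other systems.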
@@ -0,0 +1,187 @@
+ """Data models for openalex_local."""
+
+ from dataclasses import dataclass, field
+ from typing import List, Optional, Dict, Any
+
+
+ @dataclass
+ class Work:
+     """
+     Represents a scholarly work from OpenAlex.
+
+     Attributes:
+         openalex_id: OpenAlex ID (e.g., W2741809807)
+         doi: Digital Object Identifier
+         title: Work title
+         abstract: Abstract text (reconstructed from inverted index)
+         authors: List of author names
+         year: Publication year
+         source: Journal/venue name
+         issn: Journal ISSN
+         volume: Volume number
+         issue: Issue number
+         pages: Page range
+         publisher: Publisher name
+         type: Work type (journal-article, book-chapter, etc.)
+         concepts: List of OpenAlex concepts
+         topics: List of OpenAlex topics
+         cited_by_count: Number of citations
+         referenced_works: List of referenced OpenAlex IDs
+         is_oa: Is open access
+         oa_url: Open access URL
+     """
+
+     openalex_id: str
+     doi: Optional[str] = None
+     title: Optional[str] = None
+     abstract: Optional[str] = None
+     authors: List[str] = field(default_factory=list)
+     year: Optional[int] = None
+     source: Optional[str] = None
+     issn: Optional[str] = None
+     volume: Optional[str] = None
+     issue: Optional[str] = None
+     pages: Optional[str] = None
+     publisher: Optional[str] = None
+     type: Optional[str] = None
+     concepts: List[Dict[str, Any]] = field(default_factory=list)
+     topics: List[Dict[str, Any]] = field(default_factory=list)
+     cited_by_count: Optional[int] = None
+     referenced_works: List[str] = field(default_factory=list)
+     is_oa: bool = False
+     oa_url: Optional[str] = None
+
+     @classmethod
+     def from_openalex(cls, data: dict) -> "Work":
+         """
+         Create Work from OpenAlex API/snapshot JSON.
+
+         Args:
+             data: OpenAlex work dictionary
+
+         Returns:
+             Work instance
+         """
+         # Extract OpenAlex ID ("or" guards against an explicit null in the JSON)
+         openalex_id = (data.get("id") or "").replace("https://openalex.org/", "")
+
+         # Extract DOI
+         doi = data.get("doi", "").replace("https://doi.org/", "") if data.get("doi") else None
+
+         # Extract authors
+         authors = []
+         for authorship in data.get("authorships") or []:
+             author = authorship.get("author") or {}
+             name = author.get("display_name")
+             if name:
+                 authors.append(name)
+
+         # Reconstruct abstract from inverted index (word -> list of positions)
+         abstract = None
+         inv_index = data.get("abstract_inverted_index")
+         if inv_index:
+             words = sorted(
+                 (pos, word) for word, positions in inv_index.items() for pos in positions
+             )
+             abstract = " ".join(word for _, word in words)
+
+         # Extract source info
+         primary_location = data.get("primary_location") or {}
+         source_info = primary_location.get("source") or {}
+         source = source_info.get("display_name")
+         issns = source_info.get("issn") or []
+         issn = issns[0] if issns else None
+
+         # Extract biblio; build the documented page range when both ends are known
+         biblio = data.get("biblio") or {}
+         first_page = biblio.get("first_page")
+         last_page = biblio.get("last_page")
+         if first_page and last_page and last_page != first_page:
+             pages = f"{first_page}-{last_page}"
+         else:
+             pages = first_page
+
+         # Extract concepts (top 5)
+         concepts = [
+             {"name": c.get("display_name"), "score": c.get("score")}
+             for c in (data.get("concepts") or [])[:5]
+         ]
+
+         # Extract topics (top 3); "subfield" may be null, so avoid .get() on None
+         topics = [
+             {"name": t.get("display_name"), "subfield": (t.get("subfield") or {}).get("display_name")}
+             for t in (data.get("topics") or [])[:3]
+         ]
+
+         # Extract OA info
+         oa_info = data.get("open_access") or {}
+
+         return cls(
+             openalex_id=openalex_id,
+             doi=doi,
+             title=data.get("title") or data.get("display_name"),
+             abstract=abstract,
+             authors=authors,
+             year=data.get("publication_year"),
+             source=source,
+             issn=issn,
+             volume=biblio.get("volume"),
+             issue=biblio.get("issue"),
+             pages=pages,
+             publisher=source_info.get("host_organization_name"),
+             type=data.get("type"),
+             concepts=concepts,
+             topics=topics,
+             cited_by_count=data.get("cited_by_count"),
+             referenced_works=[
+                 r.replace("https://openalex.org/", "")
+                 for r in (data.get("referenced_works") or [])
+             ],
+             is_oa=bool(oa_info.get("is_oa")),
+             oa_url=oa_info.get("oa_url"),
+         )
+
+     def to_dict(self) -> dict:
+         """Convert to dictionary."""
+         return {
+             "openalex_id": self.openalex_id,
+             "doi": self.doi,
+             "title": self.title,
+             "abstract": self.abstract,
+             "authors": self.authors,
+             "year": self.year,
+             "source": self.source,
+             "issn": self.issn,
+             "volume": self.volume,
+             "issue": self.issue,
+             "pages": self.pages,
+             "publisher": self.publisher,
+             "type": self.type,
+             "concepts": self.concepts,
+             "topics": self.topics,
+             "cited_by_count": self.cited_by_count,
+             "referenced_works": self.referenced_works,
+             "is_oa": self.is_oa,
+             "oa_url": self.oa_url,
+         }
+
+
+ @dataclass
+ class SearchResult:
+     """
+     Container for search results with metadata.
+
+     Attributes:
+         works: List of Work objects
+         total: Total number of matches
+         query: Original search query
+         elapsed_ms: Search time in milliseconds
+     """
+
+     works: List[Work]
+     total: int
+     query: str
+     elapsed_ms: float
+
+     def __len__(self) -> int:
+         return len(self.works)
+
+     def __iter__(self):
+         return iter(self.works)
+
+     def __getitem__(self, idx):
+         return self.works[idx]
1
+ Metadata-Version: 2.4
2
+ Name: openalex-local
3
+ Version: 0.1.0
4
+ Summary: Local OpenAlex database with 284M+ works, abstracts, and semantic search
5
+ Author-email: Yusuke Watanabe <ywatanabe@alumni.u-tokyo.ac.jp>
6
+ License: AGPL-3.0
7
+ Project-URL: Homepage, https://github.com/ywatanabe1989/openalex-local
8
+ Project-URL: Repository, https://github.com/ywatanabe1989/openalex-local
9
+ Keywords: openalex,academic,research,abstracts,semantic-search
10
+ Classifier: Development Status :: 3 - Alpha
11
+ Classifier: Intended Audience :: Science/Research
12
+ Classifier: License :: OSI Approved :: GNU Affero General Public License v3
13
+ Classifier: Programming Language :: Python :: 3
14
+ Classifier: Programming Language :: Python :: 3.10
15
+ Classifier: Programming Language :: Python :: 3.11
16
+ Classifier: Programming Language :: Python :: 3.12
17
+ Classifier: Topic :: Scientific/Engineering
18
+ Requires-Python: >=3.10
19
+ Description-Content-Type: text/markdown
20
+ Requires-Dist: click>=8.0
21
+ Requires-Dist: awscli>=1.0
22
+ Provides-Extra: dev
23
+ Requires-Dist: pytest>=7.0; extra == "dev"
24
+ Requires-Dist: pytest-cov>=4.0; extra == "dev"
25
+
26
+ # OpenAlex Local
27
+
28
+ Local OpenAlex database with 284M+ scholarly works, abstracts, and semantic search.
29
+
30
+ [![Python](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
31
+ [![License](https://img.shields.io/badge/license-AGPL--3.0-blue.svg)](LICENSE)
32
+
33
+ <details>
34
+ <summary><strong>Why OpenAlex Local?</strong></summary>
35
+
36
+ **Built for the LLM era** - features that matter for AI research assistants:
37
+
38
+ | Feature | Benefit |
+ |---------|---------|
+ | 📚 **284M Works** | More coverage than CrossRef |
+ | 📝 **Abstracts** | ~45-60% availability for semantic search |
+ | 🏷️ **Concepts & Topics** | Built-in classification |
+ | 👤 **Author Disambiguation** | Linked to institutions |
+ | 🔓 **Open Access Info** | OA status and URLs |
45
+
46
+ Perfect for: RAG systems, research assistants, literature review automation.
47
+
48
+ </details>
49
+
50
+ <details>
51
+ <summary><strong>Installation</strong></summary>
52
+
53
+ ```bash
54
+ pip install openalex-local
55
+ ```
56
+
57
+ From source:
58
+ ```bash
59
+ git clone https://github.com/ywatanabe1989/openalex-local
60
+ cd openalex-local && make install
61
+ ```
62
+
63
+ Database setup (~300 GB, ~1-2 days to build):
64
+ ```bash
65
+ # Check system status
66
+ make status
67
+
68
+ # 1. Download OpenAlex Works snapshot (~300GB)
69
+ make download-screen # runs in background
70
+
71
+ # 2. Build SQLite database
72
+ make build-db
73
+
74
+ # 3. Build FTS5 index
75
+ make build-fts
76
+ ```
77
+
78
+ </details>
79
+
80
+ <details>
81
+ <summary><strong>Python API</strong></summary>
82
+
83
+ ```python
+ from openalex_local import search, get, count
+
+ # Full-text search (title + abstract)
+ results = search("machine learning neural networks")
+ for work in results:
+     print(f"{work.title} ({work.year})")
+     # abstract is Optional[str]; fall back to "" so slicing never fails
+     print(f" Abstract: {(work.abstract or '')[:200]}...")
+     print(f" Concepts: {[c['name'] for c in work.concepts]}")
+
+ # Get by OpenAlex ID or DOI
+ work = get("W2741809807")
+ work = get("10.1038/nature12373")
+
+ # Count matches
+ n = count("CRISPR")
+ ```
100
+
101
+ </details>
102
+
103
+ <details>
104
+ <summary><strong>CLI</strong></summary>
105
+
106
+ ```bash
107
+ openalex-local search "CRISPR genome editing" -n 5
108
+ openalex-local get W2741809807
109
+ openalex-local get 10.1038/nature12373
110
+ openalex-local count "machine learning"
111
+ ```
112
+
113
+ </details>
114
+
115
+ <details>
116
+ <summary><strong>Related Projects</strong></summary>
117
+
118
+ **[crossref-local](https://github.com/ywatanabe1989/crossref-local)** - Sister project with CrossRef data:
119
+
120
+ | Feature | crossref-local | openalex-local |
+ |---------|----------------|----------------|
+ | Works | 167M | 284M |
+ | Abstracts | ~21% | ~45-60% |
+ | Update frequency | Real-time | Monthly |
+ | DOI authority | ✓ (source) | Uses CrossRef |
+ | Citations | Raw references | Linked works |
+ | Concepts/Topics | ❌ | ✓ |
+ | Author IDs | ❌ | ✓ |
+ | Best for | DOI lookup, raw refs | Semantic search |
130
+
131
+ **When to use CrossRef**: Real-time DOI updates, raw reference parsing, authoritative metadata.
132
+ **When to use OpenAlex**: Semantic search, citation analysis, topic discovery.
133
+
134
+ </details>
135
+
136
+ <details>
137
+ <summary><strong>Data Source</strong></summary>
138
+
139
+ Data from [OpenAlex](https://openalex.org/), an open catalog of scholarly works.
140
+ Updated monthly from their [snapshot](https://docs.openalex.org/download-all-data/openalex-snapshot).
141
+
142
+ </details>
143
+
144
+ ---
145
+
146
+ <p align="center">
147
+ <a href="https://scitex.ai"><img src="docs/scitex-icon-navy-inverted.png" alt="SciTeX" width="40"/></a>
148
+ <br>
149
+ AGPL-3.0 · ywatanabe@scitex.ai
150
+ </p>
151
+
152
+ <!-- EOF -->
@@ -0,0 +1,11 @@
1
+ README.md
2
+ pyproject.toml
3
+ src/openalex_local/__init__.py
4
+ src/openalex_local/config.py
5
+ src/openalex_local/models.py
6
+ src/openalex_local.egg-info/PKG-INFO
7
+ src/openalex_local.egg-info/SOURCES.txt
8
+ src/openalex_local.egg-info/dependency_links.txt
9
+ src/openalex_local.egg-info/entry_points.txt
10
+ src/openalex_local.egg-info/requires.txt
11
+ src/openalex_local.egg-info/top_level.txt
@@ -0,0 +1,2 @@
1
+ [console_scripts]
2
+ openalex-local = openalex_local.cli:main
@@ -0,0 +1,6 @@
1
+ click>=8.0
2
+ awscli>=1.0
3
+
4
+ [dev]
5
+ pytest>=7.0
6
+ pytest-cov>=4.0
@@ -0,0 +1 @@
1
+ openalex_local