instapaper-scraper 1.1.1__tar.gz → 1.2.0rc1__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {instapaper_scraper-1.1.1/src/instapaper_scraper.egg-info → instapaper_scraper-1.2.0rc1}/PKG-INFO +48 -28
- {instapaper_scraper-1.1.1 → instapaper_scraper-1.2.0rc1}/README.md +46 -26
- {instapaper_scraper-1.1.1 → instapaper_scraper-1.2.0rc1}/pyproject.toml +2 -2
- {instapaper_scraper-1.1.1 → instapaper_scraper-1.2.0rc1}/src/instapaper_scraper/api.py +41 -6
- {instapaper_scraper-1.1.1 → instapaper_scraper-1.2.0rc1}/src/instapaper_scraper/cli.py +30 -5
- {instapaper_scraper-1.1.1 → instapaper_scraper-1.2.0rc1}/src/instapaper_scraper/constants.py +1 -0
- {instapaper_scraper-1.1.1 → instapaper_scraper-1.2.0rc1}/src/instapaper_scraper/output.py +50 -10
- {instapaper_scraper-1.1.1 → instapaper_scraper-1.2.0rc1/src/instapaper_scraper.egg-info}/PKG-INFO +48 -28
- {instapaper_scraper-1.1.1 → instapaper_scraper-1.2.0rc1}/src/instapaper_scraper.egg-info/SOURCES.txt +1 -0
- {instapaper_scraper-1.1.1 → instapaper_scraper-1.2.0rc1}/src/instapaper_scraper.egg-info/requires.txt +1 -1
- {instapaper_scraper-1.1.1 → instapaper_scraper-1.2.0rc1}/tests/test_api.py +108 -9
- {instapaper_scraper-1.1.1 → instapaper_scraper-1.2.0rc1}/tests/test_cli.py +69 -20
- instapaper_scraper-1.2.0rc1/tests/test_cli_config_flags.py +367 -0
- {instapaper_scraper-1.1.1 → instapaper_scraper-1.2.0rc1}/tests/test_output.py +37 -3
- {instapaper_scraper-1.1.1 → instapaper_scraper-1.2.0rc1}/LICENSE +0 -0
- {instapaper_scraper-1.1.1 → instapaper_scraper-1.2.0rc1}/setup.cfg +0 -0
- {instapaper_scraper-1.1.1 → instapaper_scraper-1.2.0rc1}/src/instapaper_scraper/__init__.py +0 -0
- {instapaper_scraper-1.1.1 → instapaper_scraper-1.2.0rc1}/src/instapaper_scraper/auth.py +0 -0
- {instapaper_scraper-1.1.1 → instapaper_scraper-1.2.0rc1}/src/instapaper_scraper/exceptions.py +0 -0
- {instapaper_scraper-1.1.1 → instapaper_scraper-1.2.0rc1}/src/instapaper_scraper.egg-info/dependency_links.txt +0 -0
- {instapaper_scraper-1.1.1 → instapaper_scraper-1.2.0rc1}/src/instapaper_scraper.egg-info/entry_points.txt +0 -0
- {instapaper_scraper-1.1.1 → instapaper_scraper-1.2.0rc1}/src/instapaper_scraper.egg-info/top_level.txt +0 -0
- {instapaper_scraper-1.1.1 → instapaper_scraper-1.2.0rc1}/tests/test_auth.py +0 -0
- {instapaper_scraper-1.1.1 → instapaper_scraper-1.2.0rc1}/tests/test_cli_priority.py +0 -0
- {instapaper_scraper-1.1.1 → instapaper_scraper-1.2.0rc1}/tests/test_init.py +0 -0
{instapaper_scraper-1.1.1/src/instapaper_scraper.egg-info → instapaper_scraper-1.2.0rc1}/PKG-INFO
RENAMED
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: instapaper-scraper
-Version: 1.1.1
+Version: 1.2.0rc1
 Summary: A tool to scrape articles from Instapaper.
 Project-URL: Homepage, https://github.com/chriskyfung/InstapaperScraper
 Project-URL: Source, https://github.com/chriskyfung/InstapaperScraper
@@ -21,7 +21,7 @@ Requires-Python: >=3.9
 Description-Content-Type: text/markdown
 License-File: LICENSE
 Requires-Dist: beautifulsoup4~=4.14.2
-Requires-Dist: certifi
+Requires-Dist: certifi<2026.2.0,>=2025.11.12
 Requires-Dist: charset-normalizer~=3.4.3
 Requires-Dist: cryptography~=46.0.3
 Requires-Dist: guara~=0.0.14
@@ -141,9 +141,9 @@ The script authenticates using one of the following methods, in order of priorit

 > **Note on Security:** Your session file (`.instapaper_session`) and the encryption key (`.session_key`) are stored with secure permissions (read/write for the owner only) to protect your credentials.

-### 📁 Folder Configuration
+### 📁 Folder and Field Configuration

-You can define and quickly access your Instapaper folders using a `config.toml` file. The scraper will look for this file in the following locations (in order of precedence):
+You can define and quickly access your Instapaper folders and set default output fields using a `config.toml` file. The scraper will look for this file in the following locations (in order of precedence):

 1. The path specified by the `--config-path` argument.
 2. `config.toml` in the current working directory.
@@ -155,6 +155,12 @@ Here is an example of `config.toml`:
 # Default output filename for non-folder mode
 output_filename = "home-articles.csv"

+# Optional fields to include in the output.
+# These can be overridden by command-line flags.
+[fields]
+read_url = false
+article_preview = false
+
 [[folders]]
 key = "ml"
 id = "1234567"
@@ -169,10 +175,14 @@ output_filename = "python-articles.db"
 ```

 - **output_filename (top-level)**: The default output filename to use when not in folder mode.
-- **
--
--
-- **
+- **[fields]**: A section to control which optional data fields are included in the output.
+  - `read_url`: Set to `true` to include the Instapaper read URL for each article.
+  - `article_preview`: Set to `true` to include the article's text preview.
+- **[[folders]]**: Each `[[folders]]` block defines a specific folder.
+  - **key**: A short alias for the folder.
+  - **id**: The folder ID from the Instapaper URL.
+  - **slug**: The human-readable part of the folder URL.
+  - **output_filename (folder-specific)**: A preset output filename for scraped articles from this specific folder.

 When a `config.toml` file is present and no `--folder` argument is provided, the scraper will prompt you to select a folder. You can also specify a folder directly using the `--folder` argument with its key, ID, or slug. Use `--folder=none` to explicitly disable folder mode and scrape all articles.

@@ -186,7 +196,8 @@ When a `config.toml` file is present and no `--folder` argument is provided, the
 | `--output <filename>` | Specify a custom output filename. The file extension will be automatically corrected to match the selected format. |
 | `--username <user>` | Your Instapaper account username. |
 | `--password <pass>` | Your Instapaper account password. |
-| `--
+| `--[no-]read-url` | Includes the Instapaper read URL. (Old flag `--add-instapaper-url` is deprecated but supported). Can be set in `config.toml`. Overrides config. |
+| `--[no-]article-preview` | Includes the article preview text. (Old flag `--add-article-preview` is deprecated but supported). Can be set in `config.toml`. Overrides config. |

 ### 📄 Output Formats

@@ -204,10 +215,10 @@ When using `--output <filename>`, the file extension is automatically corrected

 The output data includes a unique `id` for each article. You can use this ID to construct a URL to the article's reader view: `https://www.instapaper.com/read/<article_id>`.

-For convenience, you can use the `--
+For convenience, you can use the `--read-url` flag to have the script include a full, clickable URL in the output.

 ```sh
-instapaper-scraper --
+instapaper-scraper --read-url
 ```

 This adds a `instapaper_url` field to each article in the JSON output and a `instapaper_url` column in the CSV and SQLite outputs. The original `id` field is preserved.
@@ -223,15 +234,15 @@ The tool is designed with a modular architecture for reliability and maintainabi

 ## 📊 Example Output

-### 📄 CSV (`output/bookmarks.csv`) (with --add-instapaper-url)
+### 📄 CSV (`output/bookmarks.csv`) (with --add-instapaper-url and --add-article-preview)

 ```csv
-"id","instapaper_url","title","url"
-"999901234","https://www.instapaper.com/read/999901234","Article 1","https://www.example.com/page-1/"
-"999002345","https://www.instapaper.com/read/999002345","Article 2","https://www.example.com/page-2/"
+"id","instapaper_url","title","url","article_preview"
+"999901234","https://www.instapaper.com/read/999901234","Article 1","https://www.example.com/page-1/","This is a preview of article 1."
+"999002345","https://www.instapaper.com/read/999002345","Article 2","https://www.example.com/page-2/","This is a preview of article 2."
 ```

-### 📄 JSON (`output/bookmarks.json`) (with --add-instapaper-url)
+### 📄 JSON (`output/bookmarks.json`) (with --add-instapaper-url and --add-article-preview)

 ```json
 [
@@ -239,13 +250,15 @@ The tool is designed with a modular architecture for reliability and maintainabi
 "id": "999901234",
 "title": "Article 1",
 "url": "https://www.example.com/page-1/",
-"instapaper_url": "https://www.instapaper.com/read/999901234"
+"instapaper_url": "https://www.instapaper.com/read/999901234",
+"article_preview": "This is a preview of article 1."
 },
 {
 "id": "999002345",
 "title": "Article 2",
 "url": "https://www.example.com/page-2/",
-"instapaper_url": "https://www.instapaper.com/read/999002345"
+"instapaper_url": "https://www.instapaper.com/read/999002345",
+"article_preview": "This is a preview of article 2."
 }
 ]
 ```
@@ -274,7 +287,18 @@ Please read the **[Contribution Guidelines](CONTRIBUTING.md)** before you start.

 ## 🧑‍💻 Development & Testing

-This project uses `pytest` for testing, `ruff` for code formatting and linting, and `mypy` for static type checking.
+This project uses `pytest` for testing, `ruff` for code formatting and linting, and `mypy` for static type checking. A `Makefile` is provided to simplify common development tasks.
+
+### 🚀 Using the Makefile
+
+The most common commands are:
+- `make install`: Installs development dependencies.
+- `make format`: Formats the entire codebase.
+- `make check`: Runs the linter, type checker, and test suite.
+- `make test`: Runs the test suite.
+- `make build`: Builds the distributable packages.
+
+Run `make help` to see all available commands.

 ### 🔧 Setup

@@ -300,13 +324,13 @@ python -m src.instapaper_scraper.cli

 ### ✅ Testing

-To run the tests, execute the following command from the project root:
+To run the tests, execute the following command from the project root (or use `make test`):

 ```sh
 pytest
 ```

-To check test coverage:
+To check test coverage (or use `make test-cov`):

 ```sh
 pytest --cov=src/instapaper_scraper --cov-report=term-missing
@@ -314,6 +338,8 @@ pytest --cov=src/instapaper_scraper --cov-report=term-missing

 ### ✨ Code Quality

+You can use the `Makefile` for convenience (e.g., `make format`, `make lint`).
+
 To format the code with `ruff`:

 ```sh
@@ -326,12 +352,6 @@ To check for linting errors with `ruff`:
 ruff check .
 ```

-To automatically fix linting errors:
-
-```sh
-ruff check . --fix
-```
-
 To run static type checking with `mypy`:

 ```sh
@@ -341,7 +361,7 @@ mypy src
 To run license checks:

 ```sh
-licensecheck --
+licensecheck --zero
 ```

{instapaper_scraper-1.1.1 → instapaper_scraper-1.2.0rc1}/README.md
RENAMED
@@ -93,9 +93,9 @@ The script authenticates using one of the following methods, in order of priorit

 > **Note on Security:** Your session file (`.instapaper_session`) and the encryption key (`.session_key`) are stored with secure permissions (read/write for the owner only) to protect your credentials.

-### 📁 Folder Configuration
+### 📁 Folder and Field Configuration

-You can define and quickly access your Instapaper folders using a `config.toml` file. The scraper will look for this file in the following locations (in order of precedence):
+You can define and quickly access your Instapaper folders and set default output fields using a `config.toml` file. The scraper will look for this file in the following locations (in order of precedence):

 1. The path specified by the `--config-path` argument.
 2. `config.toml` in the current working directory.
@@ -107,6 +107,12 @@ Here is an example of `config.toml`:
 # Default output filename for non-folder mode
 output_filename = "home-articles.csv"

+# Optional fields to include in the output.
+# These can be overridden by command-line flags.
+[fields]
+read_url = false
+article_preview = false
+
 [[folders]]
 key = "ml"
 id = "1234567"
@@ -121,10 +127,14 @@ output_filename = "python-articles.db"
 ```

 - **output_filename (top-level)**: The default output filename to use when not in folder mode.
-- **
--
--
-- **
+- **[fields]**: A section to control which optional data fields are included in the output.
+  - `read_url`: Set to `true` to include the Instapaper read URL for each article.
+  - `article_preview`: Set to `true` to include the article's text preview.
+- **[[folders]]**: Each `[[folders]]` block defines a specific folder.
+  - **key**: A short alias for the folder.
+  - **id**: The folder ID from the Instapaper URL.
+  - **slug**: The human-readable part of the folder URL.
+  - **output_filename (folder-specific)**: A preset output filename for scraped articles from this specific folder.

 When a `config.toml` file is present and no `--folder` argument is provided, the scraper will prompt you to select a folder. You can also specify a folder directly using the `--folder` argument with its key, ID, or slug. Use `--folder=none` to explicitly disable folder mode and scrape all articles.

@@ -138,7 +148,8 @@ When a `config.toml` file is present and no `--folder` argument is provided, the
 | `--output <filename>` | Specify a custom output filename. The file extension will be automatically corrected to match the selected format. |
 | `--username <user>` | Your Instapaper account username. |
 | `--password <pass>` | Your Instapaper account password. |
-| `--
+| `--[no-]read-url` | Includes the Instapaper read URL. (Old flag `--add-instapaper-url` is deprecated but supported). Can be set in `config.toml`. Overrides config. |
+| `--[no-]article-preview` | Includes the article preview text. (Old flag `--add-article-preview` is deprecated but supported). Can be set in `config.toml`. Overrides config. |

 ### 📄 Output Formats

@@ -156,10 +167,10 @@ When using `--output <filename>`, the file extension is automatically corrected

 The output data includes a unique `id` for each article. You can use this ID to construct a URL to the article's reader view: `https://www.instapaper.com/read/<article_id>`.

-For convenience, you can use the `--
+For convenience, you can use the `--read-url` flag to have the script include a full, clickable URL in the output.

 ```sh
-instapaper-scraper --
+instapaper-scraper --read-url
 ```

 This adds a `instapaper_url` field to each article in the JSON output and a `instapaper_url` column in the CSV and SQLite outputs. The original `id` field is preserved.
@@ -175,15 +186,15 @@ The tool is designed with a modular architecture for reliability and maintainabi

 ## 📊 Example Output

-### 📄 CSV (`output/bookmarks.csv`) (with --add-instapaper-url)
+### 📄 CSV (`output/bookmarks.csv`) (with --add-instapaper-url and --add-article-preview)

 ```csv
-"id","instapaper_url","title","url"
-"999901234","https://www.instapaper.com/read/999901234","Article 1","https://www.example.com/page-1/"
-"999002345","https://www.instapaper.com/read/999002345","Article 2","https://www.example.com/page-2/"
+"id","instapaper_url","title","url","article_preview"
+"999901234","https://www.instapaper.com/read/999901234","Article 1","https://www.example.com/page-1/","This is a preview of article 1."
+"999002345","https://www.instapaper.com/read/999002345","Article 2","https://www.example.com/page-2/","This is a preview of article 2."
 ```

-### 📄 JSON (`output/bookmarks.json`) (with --add-instapaper-url)
+### 📄 JSON (`output/bookmarks.json`) (with --add-instapaper-url and --add-article-preview)

 ```json
 [
@@ -191,13 +202,15 @@ The tool is designed with a modular architecture for reliability and maintainabi
 "id": "999901234",
 "title": "Article 1",
 "url": "https://www.example.com/page-1/",
-"instapaper_url": "https://www.instapaper.com/read/999901234"
+"instapaper_url": "https://www.instapaper.com/read/999901234",
+"article_preview": "This is a preview of article 1."
 },
 {
 "id": "999002345",
 "title": "Article 2",
 "url": "https://www.example.com/page-2/",
-"instapaper_url": "https://www.instapaper.com/read/999002345"
+"instapaper_url": "https://www.instapaper.com/read/999002345",
+"article_preview": "This is a preview of article 2."
 }
 ]
 ```
@@ -226,7 +239,18 @@ Please read the **[Contribution Guidelines](CONTRIBUTING.md)** before you start.

 ## 🧑‍💻 Development & Testing

-This project uses `pytest` for testing, `ruff` for code formatting and linting, and `mypy` for static type checking.
+This project uses `pytest` for testing, `ruff` for code formatting and linting, and `mypy` for static type checking. A `Makefile` is provided to simplify common development tasks.
+
+### 🚀 Using the Makefile
+
+The most common commands are:
+- `make install`: Installs development dependencies.
+- `make format`: Formats the entire codebase.
+- `make check`: Runs the linter, type checker, and test suite.
+- `make test`: Runs the test suite.
+- `make build`: Builds the distributable packages.
+
+Run `make help` to see all available commands.

 ### 🔧 Setup

@@ -252,13 +276,13 @@ python -m src.instapaper_scraper.cli

 ### ✅ Testing

-To run the tests, execute the following command from the project root:
+To run the tests, execute the following command from the project root (or use `make test`):

 ```sh
 pytest
 ```

-To check test coverage:
+To check test coverage (or use `make test-cov`):

 ```sh
 pytest --cov=src/instapaper_scraper --cov-report=term-missing
@@ -266,6 +290,8 @@ pytest --cov=src/instapaper_scraper --cov-report=term-missing

 ### ✨ Code Quality

+You can use the `Makefile` for convenience (e.g., `make format`, `make lint`).
+
 To format the code with `ruff`:

 ```sh
@@ -278,12 +304,6 @@ To check for linting errors with `ruff`:
 ruff check .
 ```

-To automatically fix linting errors:
-
-```sh
-ruff check . --fix
-```
-
 To run static type checking with `mypy`:

 ```sh
@@ -293,7 +313,7 @@ mypy src
 To run license checks:

 ```sh
-licensecheck --
+licensecheck --zero
 ```

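The `[fields]` table documented in the README changes above is ordinary TOML. For reference, a minimal sketch of loading it and applying the documented defaults with the standard-library `tomllib` is shown below; the helper name `read_fields_config` is illustrative only, and the package itself (which supports Python 3.9) may load its configuration differently.

```python
try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # assumed backport for older interpreters; not a package dependency

from pathlib import Path
from typing import Dict


def read_fields_config(path: str = "config.toml") -> Dict[str, bool]:
    """Return the optional output-field toggles, defaulting both to False."""
    config_file = Path(path)
    if not config_file.is_file():
        return {"read_url": False, "article_preview": False}
    with config_file.open("rb") as fh:
        config = tomllib.load(fh)
    fields = config.get("fields", {})
    return {
        "read_url": bool(fields.get("read_url", False)),
        "article_preview": bool(fields.get("article_preview", False)),
    }


if __name__ == "__main__":
    print(read_fields_config())
```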
{instapaper_scraper-1.1.1 → instapaper_scraper-1.2.0rc1}/pyproject.toml
RENAMED
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

 [project]
 name = "instapaper-scraper"
-version = "1.1.1"
+version = "1.2.0rc1"
 description = "A tool to scrape articles from Instapaper."
 readme = "README.md"
 requires-python = ">=3.9"
@@ -25,7 +25,7 @@ classifiers = [
 license-files = ["LICEN[CS]E*"]
 dependencies = [
     "beautifulsoup4~=4.14.2",
-    "certifi
+    "certifi>=2025.11.12,<2026.2.0",
     "charset-normalizer~=3.4.3",
     "cryptography~=46.0.3",
     "guara~=0.0.14",
{instapaper_scraper-1.1.1 → instapaper_scraper-1.2.0rc1}/src/instapaper_scraper/api.py
RENAMED
@@ -8,7 +8,13 @@ from bs4 import BeautifulSoup
 from bs4.element import Tag

 from .exceptions import ScraperStructureChanged
-from .constants import
+from .constants import (
+    INSTAPAPER_BASE_URL,
+    KEY_ID,
+    KEY_TITLE,
+    KEY_URL,
+    KEY_ARTICLE_PREVIEW,
+)


 class InstapaperClient:
@@ -34,6 +40,7 @@ class InstapaperClient:
     PAGINATE_OLDER_CLASS = "paginate_older"
     ARTICLE_TITLE_CLASS = "article_title"
     TITLE_META_CLASS = "title_meta"
+    ARTICLE_PREVIEW_CLASS = "article_preview"

     # URL paths
     URL_PATH_USER = "/u/"
@@ -102,12 +109,14 @@ class InstapaperClient:
         self,
         page: int = DEFAULT_PAGE_START,
         folder_info: Optional[Dict[str, str]] = None,
+        add_article_preview: bool = False,
     ) -> Tuple[List[Dict[str, str]], bool]:
         """
         Fetches a single page of articles and determines if there are more pages.
         Args:
             page: The page number to fetch.
             folder_info: A dictionary containing 'id' and 'slug' of the folder to fetch articles from.
+            add_article_preview: Whether to include the article preview.
         Returns:
             A tuple containing:
             - A list of article data (dictionaries with id, title, url).
@@ -147,7 +156,9 @@ class InstapaperClient:
                 article_id_val.replace(self.ARTICLE_ID_PREFIX, "")
             )

-        data = self._parse_article_data(
+        data = self._parse_article_data(
+            soup, article_ids, page, add_article_preview
+        )
         has_more = soup.find(class_=self.PAGINATE_OLDER_CLASS) is not None

         return data, has_more
@@ -185,13 +196,17 @@ class InstapaperClient:
         raise Exception(self.MSG_SCRAPING_FAILED_UNKNOWN)

     def get_all_articles(
-        self,
+        self,
+        limit: Optional[int] = None,
+        folder_info: Optional[Dict[str, str]] = None,
+        add_article_preview: bool = False,
     ) -> List[Dict[str, str]]:
         """
         Iterates through pages and fetches articles up to a specified limit.
         Args:
             limit: The maximum number of pages to scrape. If None, scrapes all pages.
             folder_info: A dictionary containing 'id' and 'slug' of the folder to fetch articles from.
+            add_article_preview: Whether to include the article preview.
         """
         all_articles = []
         page = self.DEFAULT_PAGE_START
@@ -202,7 +217,11 @@ class InstapaperClient:
                 break

             logging.info(self.MSG_SCRAPING_PAGE.format(page=page))
-            data, has_more = self.get_articles(
+            data, has_more = self.get_articles(
+                page=page,
+                folder_info=folder_info,
+                add_article_preview=add_article_preview,
+            )
             if data:
                 all_articles.extend(data)
             page += 1
@@ -217,7 +236,11 @@ class InstapaperClient:
         return f"{INSTAPAPER_BASE_URL}{self.URL_PATH_USER}{page}"

     def _parse_article_data(
-        self,
+        self,
+        soup: BeautifulSoup,
+        article_ids: List[str],
+        page: int,
+        add_article_preview: bool = False,
     ) -> List[Dict[str, Any]]:
         """Parses the raw HTML to extract structured data for each article."""
         data = []
@@ -249,7 +272,19 @@ class InstapaperClient:
                     raise AttributeError(self.MSG_LINK_ELEMENT_NOT_FOUND)
                 link = link_element["href"]

-
+                article_data = {KEY_ID: article_id, KEY_TITLE: title, KEY_URL: link}
+
+                if add_article_preview:
+                    preview_element = article_element.find(
+                        class_=self.ARTICLE_PREVIEW_CLASS
+                    )
+                    article_data[KEY_ARTICLE_PREVIEW] = (
+                        preview_element.get_text().strip()
+                        if isinstance(preview_element, Tag)
+                        else ""
+                    )
+
+                data.append(article_data)
             except AttributeError as e:
                 logging.warning(
                     self.MSG_PARSE_ARTICLE_WARNING.format(
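The `_parse_article_data` change above reduces to a class-based lookup plus a guarded `get_text()`. A self-contained sketch of that pattern follows; the sample HTML and the standalone `extract_preview` helper are illustrative stand-ins, not the client's actual markup or code path.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup shaped like the fields the scraper reads (id, title link, preview).
SAMPLE_HTML = """
<article id="article_999901234">
  <a class="article_title" href="https://www.example.com/page-1/">Article 1</a>
  <div class="article_preview">This is a preview of article 1.</div>
</article>
"""


def extract_preview(article_element: Tag) -> str:
    """Return the text of the article_preview element, or "" if it is missing."""
    preview_element = article_element.find(class_="article_preview")
    # Mirror the diff's isinstance guard: find() can return None instead of a Tag.
    return preview_element.get_text().strip() if isinstance(preview_element, Tag) else ""


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
article = soup.find("article")
assert isinstance(article, Tag)
print(extract_preview(article))  # -> "This is a preview of article 1."
```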
{instapaper_scraper-1.1.1 → instapaper_scraper-1.2.0rc1}/src/instapaper_scraper/cli.py
RENAMED
@@ -102,9 +102,18 @@ def main() -> None:
     parser.add_argument("--username", help="Instapaper username.")
     parser.add_argument("--password", help="Instapaper password.")
     parser.add_argument(
-        "--
-
-
+        "--read-url",  # New, preferred flag
+        "--add-instapaper-url",  # Old, for backward compatibility
+        dest="add_instapaper_url",
+        action=argparse.BooleanOptionalAction,
+        help="Include the Instapaper read URL. Overrides config.",
+    )
+    parser.add_argument(
+        "--article-preview",  # New, preferred flag
+        "--add-article-preview",  # Old, for backward compatibility
+        dest="add_article_preview",
+        action=argparse.BooleanOptionalAction,
+        help="Include the article preview text. Overrides config.",
     )
     parser.add_argument(
         "--limit",
@@ -120,8 +129,21 @@ def main() -> None:

     config = load_config(args.config_path)
     folders = config.get("folders", []) if config else []
+    fields_config = config.get("fields", {}) if config else {}
     selected_folder = None

+    # Resolve boolean flags, giving CLI priority over config
+    final_add_instapaper_url = (
+        args.add_instapaper_url
+        if args.add_instapaper_url is not None
+        else fields_config.get("read_url", False)
+    )
+    final_add_article_preview = (
+        args.add_article_preview
+        if args.add_article_preview is not None
+        else fields_config.get("article_preview", False)
+    )
+
     if args.folder:
         if args.folder.lower() == "none":
             selected_folder = None
@@ -196,7 +218,9 @@ def main() -> None:
     try:
         folder_info = selected_folder if selected_folder else None
         all_articles = client.get_all_articles(
-            limit=args.limit,
+            limit=args.limit,
+            folder_info=folder_info,
+            add_article_preview=final_add_article_preview,
         )
     except ScraperStructureChanged as e:
         logging.error(f"Stopping scraper due to an unrecoverable error: {e}")
@@ -214,7 +238,8 @@ def main() -> None:
         all_articles,
         args.format,
         output_filename,
-        add_instapaper_url=
+        add_instapaper_url=final_add_instapaper_url,
+        add_article_preview=final_add_article_preview,
     )
     logging.info("Articles scraped and saved successfully.")
     except Exception as e:
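The cli.py hunks combine `argparse.BooleanOptionalAction` (which auto-generates `--read-url`/`--no-read-url`), a deprecated alias, and a fallback to the `[fields]` config. A trimmed-down, self-contained sketch of that resolution order is below; the `config` dict stands in for the parsed `config.toml`, and `resolve_read_url` is an illustrative helper, not the package's `main()`.

```python
import argparse


def resolve_read_url(argv: list[str], config: dict) -> bool:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--read-url",            # new, preferred flag
        "--add-instapaper-url",  # old alias kept for backward compatibility
        dest="add_instapaper_url",
        action=argparse.BooleanOptionalAction,  # also provides --no-read-url
        help="Include the Instapaper read URL. Overrides config.",
    )
    args = parser.parse_args(argv)
    fields = config.get("fields", {})
    # None means the flag was not given on the command line, so the config value
    # (defaulting to False) is used instead.
    if args.add_instapaper_url is not None:
        return args.add_instapaper_url
    return bool(fields.get("read_url", False))


print(resolve_read_url([], {"fields": {"read_url": True}}))                 # True (from config)
print(resolve_read_url(["--no-read-url"], {"fields": {"read_url": True}}))  # False (CLI overrides)
print(resolve_read_url(["--read-url"], {}))                                 # True
```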