instagram-archiver 0.3.0__tar.gz → 0.3.2__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Potentially problematic release: this version of instagram-archiver might be problematic.
- {instagram_archiver-0.3.0 → instagram_archiver-0.3.2}/PKG-INFO +2 -2
- {instagram_archiver-0.3.0 → instagram_archiver-0.3.2}/README.md +1 -1
- {instagram_archiver-0.3.0 → instagram_archiver-0.3.2}/instagram_archiver/__init__.py +1 -1
- {instagram_archiver-0.3.0 → instagram_archiver-0.3.2}/instagram_archiver/client.py +23 -19
- {instagram_archiver-0.3.0 → instagram_archiver-0.3.2}/instagram_archiver/main.py +7 -0
- {instagram_archiver-0.3.0 → instagram_archiver-0.3.2}/instagram_archiver/profile_scraper.py +23 -19
- {instagram_archiver-0.3.0 → instagram_archiver-0.3.2}/instagram_archiver/saved_scraper.py +2 -1
- {instagram_archiver-0.3.0 → instagram_archiver-0.3.2}/instagram_archiver/typing.py +3 -1
- {instagram_archiver-0.3.0 → instagram_archiver-0.3.2}/man/instagram-archiver.1 +2 -2
- {instagram_archiver-0.3.0 → instagram_archiver-0.3.2}/pyproject.toml +1 -1
- {instagram_archiver-0.3.0 → instagram_archiver-0.3.2}/LICENSE.txt +0 -0
- {instagram_archiver-0.3.0 → instagram_archiver-0.3.2}/instagram_archiver/__main__.py +0 -0
- {instagram_archiver-0.3.0 → instagram_archiver-0.3.2}/instagram_archiver/constants.py +0 -0
- {instagram_archiver-0.3.0 → instagram_archiver-0.3.2}/instagram_archiver/py.typed +0 -0
- {instagram_archiver-0.3.0 → instagram_archiver-0.3.2}/instagram_archiver/utils.py +0 -0
{instagram_archiver-0.3.0 → instagram_archiver-0.3.2}/PKG-INFO

@@ -1,6 +1,6 @@
 Metadata-Version: 2.3
 Name: instagram-archiver
-Version: 0.3.0
+Version: 0.3.2
 Summary: Save Instagram content you have access to.
 License: MIT
 Keywords: command line,instagram
@@ -33,7 +33,7 @@ Description-Content-Type: text/markdown
 [](https://pypi.org/project/instagram-archiver/)
 [](https://github.com/Tatsh/instagram-archiver/tags)
 [](https://github.com/Tatsh/instagram-archiver/blob/master/LICENSE.txt)
-[](https://img.shields.io/github/commits-since/Tatsh/instagram-archiver/v0.3.0/master)](https://github.com/Tatsh/instagram-archiver/compare/v0.3.0...master)
+[](https://img.shields.io/github/commits-since/Tatsh/instagram-archiver/v0.3.2/master)](https://github.com/Tatsh/instagram-archiver/compare/v0.3.2...master)
 [](https://github.com/Tatsh/instagram-archiver/actions/workflows/qa.yml)
 [](https://github.com/Tatsh/instagram-archiver/actions/workflows/tests.yml)
 [](https://coveralls.io/github/Tatsh/instagram-archiver?branch=master)
{instagram_archiver-0.3.0 → instagram_archiver-0.3.2}/README.md

@@ -4,7 +4,7 @@
 [](https://pypi.org/project/instagram-archiver/)
 [](https://github.com/Tatsh/instagram-archiver/tags)
 [](https://github.com/Tatsh/instagram-archiver/blob/master/LICENSE.txt)
-[](https://img.shields.io/github/commits-since/Tatsh/instagram-archiver/v0.3.0/master)](https://github.com/Tatsh/instagram-archiver/compare/v0.3.0...master)
+[](https://img.shields.io/github/commits-since/Tatsh/instagram-archiver/v0.3.2/master)](https://github.com/Tatsh/instagram-archiver/compare/v0.3.2...master)
 [](https://github.com/Tatsh/instagram-archiver/actions/workflows/qa.yml)
 [](https://github.com/Tatsh/instagram-archiver/actions/workflows/tests.yml)
 [](https://coveralls.io/github/Tatsh/instagram-archiver?branch=master)
{instagram_archiver-0.3.0 → instagram_archiver-0.3.2}/instagram_archiver/client.py

@@ -8,12 +8,11 @@ from typing import TYPE_CHECKING, Any, Self, TypeVar, cast
 import json
 import logging
 
-from bs4 import BeautifulSoup as Soup
 from requests import HTTPError
 from yt_dlp_utils import setup_session
 import requests
 
-from .constants import API_HEADERS,
+from .constants import API_HEADERS, SHARED_HEADERS
 from .typing import (
     CarouselMedia,
     Comments,
@@ -31,7 +30,7 @@ if TYPE_CHECKING:
 
     from .typing import BrowserName
 
-__all__ = ('CSRFTokenNotFound', 'InstagramClient')
+__all__ = ('CSRFTokenNotFound', 'InstagramClient', 'UnexpectedRedirect')
 
 T = TypeVar('T')
 log = logging.getLogger(__name__)
@@ -41,6 +40,10 @@ class CSRFTokenNotFound(RuntimeError):
     """CSRF token not found in cookies."""
 
 
+class UnexpectedRedirect(RuntimeError):
+    """Unexpected redirect in a request."""
+
+
 class InstagramClient:
     """Generic client for Instagram."""
     def __init__(self, browser: BrowserName = 'chrome', browser_profile: str = 'Default') -> None:
@@ -59,7 +62,6 @@ class InstagramClient:
             browser_profile,
             SHARED_HEADERS,
             domains={'instagram.com'},
-            setup_retry=True,
             status_forcelist=(413, 429, 500, 502, 503, 504))
         self.failed_urls: set[str] = set()
         """Set of failed URLs."""
@@ -193,27 +195,29 @@ class InstagramClient:
             json.dump(top_comment_data, f, sort_keys=True, indent=2)
 
     def save_media(self, edge: Edge) -> None:
-        """
-
-
+        """
+        Save media for an edge node.
+
+        Raises
+        ------
+        UnexpectedRedirect
+            If a redirect occurs unexpectedly.
+        """
+        media_info_url = f'https://www.instagram.com/api/v1/media/{edge["node"]["pk"]}/info/'
+        log.info('Saving media at URL: %s', media_info_url)
         if self.is_saved(media_info_url):
             return
-        r = self.session.get(media_info_url, headers=
+        r = self.session.get(media_info_url, headers=API_HEADERS, allow_redirects=False)
         if r.status_code != HTTPStatus.OK:
+            if r.status_code in {HTTPStatus.MOVED_PERMANENTLY, HTTPStatus.FOUND}:
+                raise UnexpectedRedirect
             log.warning('GET request failed with status code %s.', r.status_code)
+            log.debug('Content: %s', r.text)
             return
         if 'image_versions2' not in r.text or 'taken_at' not in r.text:
             log.warning('Invalid response. image_versions2 dict not found.')
             return
-
-        media_info_embedded = next(
-            json.loads(s) for s in (''.join(
-                getattr(c, 'text', '') for c in getattr(script, 'contents', ''))
-                                    for script in soup.select('script[type="application/json"]'))
-            if 'image_versions2' in s and 'taken_at' in s)
-        media_info: MediaInfo = (
-            media_info_embedded['require'][0][3][0]['__bbox']['require'][0][3][1]['__bbox']
-            ['result']['data']['xdt_api__v1__media__shortcode__web_info'])
+        media_info: MediaInfo = r.json()
         timestamp = media_info['items'][0]['taken_at']
         id_json_file = f'{edge["node"]["id"]}.json'
         media_info_json_file = f'{edge["node"]["id"]}-media-info-0000.json'
@@ -246,7 +250,7 @@ class InstagramClient:
         else:
             log.exception('Unknown shortcode.')
         if edge['node'].get('video_dash_manifest'):
-            self.add_video_url(f'https://www.instagram.com/p/{shortcode}')
+            self.add_video_url(f'https://www.instagram.com/p/{shortcode}/')
         else:
             try:
                 self.save_comments(edge)
@@ -259,7 +263,7 @@ class InstagramClient:
                 'Unknown type: `%s`. Item %s will not be processed.',
                 edge['node']['__typename'], edge['node']['id'])
             shortcode = edge['node']['code']
-            self.failed_urls.add(f'https://www.instagram.com/p/{shortcode}')
+            self.failed_urls.add(f'https://www.instagram.com/p/{shortcode}/')
 
     def get_json(self, url: str, *, cast_to: type[T], params: Mapping[str, str] | None = None) -> T:
         """Get JSON data from a URL."""
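For readers skimming the client.py hunk above: the new save_media path disables redirect following and treats a 301/302 response as a signal that Instagram is deflecting the request. Below is a minimal standalone sketch of that pattern using plain requests; the function name and headers are illustrative, not the package's API.

from http import HTTPStatus

import requests


class UnexpectedRedirect(RuntimeError):
    """Raised when the server redirects instead of returning JSON."""


def fetch_media_info(session: requests.Session, url: str) -> dict | None:
    # Illustrative headers; the real client sends its own API_HEADERS.
    response = session.get(url, headers={'Accept': 'application/json'}, allow_redirects=False)
    if response.status_code != HTTPStatus.OK:
        # A 301/302 here usually means a login or rate-limit page, not media data.
        if response.status_code in {HTTPStatus.MOVED_PERMANENTLY, HTTPStatus.FOUND}:
            raise UnexpectedRedirect
        return None
    return response.json()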
{instagram_archiver-0.3.0 → instagram_archiver-0.3.2}/instagram_archiver/main.py

@@ -6,6 +6,7 @@ from typing import TYPE_CHECKING
 
 import click
 
+from .client import UnexpectedRedirect
 from .constants import BROWSER_CHOICES
 from .profile_scraper import ProfileScraper
 from .saved_scraper import SavedScraper
@@ -55,6 +56,9 @@ def main(output_dir: str,
                      if '%(username)s' in output_dir else Path(output_dir)),
                  username=username) as client:
             client.process()
+    except UnexpectedRedirect as e:
+        click.echo('Unexpected redirect. Assuming request limit has been reached.', err=True)
+        raise click.Abort from e
     except Exception as e:
         if isinstance(e, KeyboardInterrupt) or debug:
             raise
@@ -91,6 +95,9 @@ def save_saved_main(output_dir: str,
     setup_logging(debug=debug)
     try:
         SavedScraper(browser, profile, output_dir, comments=include_comments).process(unsave=unsave)
+    except UnexpectedRedirect as e:
+        click.echo('Unexpected redirect. Assuming request limit has been reached.', err=True)
+        raise click.Abort from e
     except Exception as e:
         if isinstance(e, KeyboardInterrupt) or debug:
             raise
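Both entry points now route the new exception through click and exit cleanly. A minimal sketch of that error-handling shape follows, assuming a click command; run_scraper is a hypothetical stand-in for the real scraper call.

import click


class UnexpectedRedirect(RuntimeError):
    """Stand-in for instagram_archiver.client.UnexpectedRedirect."""


def run_scraper() -> None:
    """Placeholder for the real scraping call."""
    raise UnexpectedRedirect


@click.command()
def main() -> None:
    try:
        run_scraper()
    except UnexpectedRedirect as e:
        # Report on stderr and exit non-zero without dumping a traceback.
        click.echo('Unexpected redirect. Assuming request limit has been reached.', err=True)
        raise click.Abort from e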
{instagram_archiver-0.3.0 → instagram_archiver-0.3.2}/instagram_archiver/profile_scraper.py

@@ -116,22 +116,26 @@ class ProfileScraper(SaveCommentsCheckDisabledMixin, InstagramClient):
         r = self.get_json('https://i.instagram.com/api/v1/users/web_profile_info/',
                           params={'username': self._username},
                           cast_to=WebProfileInfo)
-
-        json.
-
-
-
-
-
-
-
-
-        self.
-
-
-
-
+        if 'data' in r:
+            with Path('web_profile_info.json').open('w', encoding='utf-8') as f:
+                json.dump(r, f, indent=2, sort_keys=True)
+            user_info = r['data']['user']
+            if not self.is_saved(user_info['profile_pic_url_hd']):
+                with Path('profile_pic.jpg').open('wb') as f:
+                    f.writelines(
+                        self.session.get(user_info['profile_pic_url_hd'],
+                                         stream=True).iter_content(chunk_size=512))
+                self.save_to_log(user_info['profile_pic_url_hd'])
+            try:
+                for item in self.highlights_tray(user_info['id'])['tray']:
+                    self.add_video_url('https://www.instagram.com/stories/highlights/'
+                                       f'{item["id"].split(":")[-1]}/')
+            except HTTPError:
+                log.exception('Failed to get highlights data.')
+            self.save_edges(user_info['edge_owner_to_timeline_media']['edges'])
+        else:
+            log.warning(
+                'Failed to get user info. Profile information and image will not be saved.')
         d = self.graphql_query(
             {
                 'data': {
@@ -180,15 +184,15 @@ class ProfileScraper(SaveCommentsCheckDisabledMixin, InstagramClient):
         with get_configured_yt_dlp() as ydl:
             while self.video_urls and (url := self.video_urls.pop()):
                 if self.is_saved(url):
-                    log.info('
+                    log.info('%s is already saved.', url)
                     continue
                 if ydl.extract_info(url):
-                    log.info('
+                    log.info('Downloading video: %s', url)
                     self.save_to_log(url)
                 else:
                     self.failed_urls.add(url)
         if self.failed_urls:
-            log.warning('Some
+            log.warning('Some URIs failed. Check failed.txt.')
             with Path('failed.txt').open('w', encoding='utf-8') as f:
                 for url in self.failed_urls:
                     f.write(f'{url}\n')
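The rewritten profile path above streams the profile picture to disk in 512-byte chunks rather than buffering the whole image. A minimal sketch of that download pattern with requests; the URL and filename below are placeholders.

from pathlib import Path

import requests


def download_image(url: str, dest: Path) -> None:
    # stream=True defers the body download; iter_content yields it in chunks.
    response = requests.get(url, stream=True, timeout=30)
    response.raise_for_status()
    with dest.open('wb') as f:
        f.writelines(response.iter_content(chunk_size=512))


# Example with a hypothetical URL:
# download_image('https://example.com/profile_pic.jpg', Path('profile_pic.jpg'))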
{instagram_archiver-0.3.0 → instagram_archiver-0.3.2}/instagram_archiver/saved_scraper.py

@@ -20,7 +20,7 @@ __all__ = ('SavedScraper',)
 log = logging.getLogger(__name__)
 
 
-class SavedScraper(InstagramClient, SaveCommentsCheckDisabledMixin):
+class SavedScraper(SaveCommentsCheckDisabledMixin, InstagramClient):
     """Scrape saved posts."""
     def __init__(
         self,
@@ -69,6 +69,7 @@ class SavedScraper(InstagramClient, SaveCommentsCheckDisabledMixin):
             'id': item['media']['id'],
             'code': item['media']['code'],
             'owner': item['media']['owner'],
+            'pk': item['media']['pk'],
             'video_dash_manifest': item['media'].get('video_dash_manifest')
         }
     } for item in feed['items'])
{instagram_archiver-0.3.0 → instagram_archiver-0.3.2}/instagram_archiver/typing.py

@@ -131,6 +131,8 @@ class XDTMediaDict(TypedDict):
     """Media ID."""
     owner: Owner
     """Owner information."""
+    pk: str
+    """Primary key. Also carousel ID."""
     video_dash_manifest: NotRequired[str | None]
     """Video dash manifest URL, if available."""
 
@@ -161,7 +163,7 @@ class WebProfileInfoData(TypedDict):
 
 class WebProfileInfo(TypedDict):
     """Profile information container."""
-    data: WebProfileInfoData
+    data: NotRequired[WebProfileInfoData]
     """Profile data."""
 
 
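Marking data as NotRequired is what lets the scraper's new `if 'data' in r:` guard type-check cleanly while still documenting that the response may omit the key. A simplified sketch of the pattern; these TypedDict shapes are stand-ins, not the package's full definitions.

from typing import NotRequired, TypedDict


class WebProfileInfoData(TypedDict):
    user: dict  # Simplified; the real definition nests a full user mapping.


class WebProfileInfo(TypedDict):
    data: NotRequired[WebProfileInfoData]


def handle(r: WebProfileInfo) -> None:
    if 'data' in r:  # Narrow before indexing the optional key.
        print(r['data']['user'])
    else:
        print('No profile data returned.')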
{instagram_archiver-0.3.0 → instagram_archiver-0.3.2}/man/instagram-archiver.1

@@ -27,9 +27,9 @@ level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
 .\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
 .in \\n[rst2man-indent\\n[rst2man-indent-level]]u
 ..
-.TH "INSTAGRAM-ARCHIVER" "1" "May 12, 2025" "0.3.0" "instagram-archiver"
+.TH "INSTAGRAM-ARCHIVER" "1" "May 12, 2025" "0.3.2" "instagram-archiver"
 .SH NAME
-instagram-archiver \- instagram-archiver v0.3.0
+instagram-archiver \- instagram-archiver v0.3.2
 .SH COMMANDS
 .SS instagram\-archiver
 .sp