mapillary-downloader 0.2.0__tar.gz → 0.3.1__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: mapillary_downloader
3
- Version: 0.2.0
3
+ Version: 0.3.1
4
4
  Summary: Download your Mapillary data before it's gone
5
5
  Author-email: Gareth Davidson <gaz@bitplane.net>
6
6
  Requires-Python: >=3.10
@@ -34,52 +34,50 @@ Download your Mapillary data before it's gone.
34
34
 
35
35
  ## Installation
36
36
 
37
- ```bash
38
- pip install mapillary-downloader
39
- ```
40
-
41
- Or from source:
37
+ Installation is optional: you can prefix the command with `uvx` or `pipx` to
38
+ download and run it. Or if you're oldskool, you can do:
42
39
 
43
40
  ```bash
44
- make install
41
+ pip install mapillary-downloader
45
42
  ```
46
43
 
47
44
  ## Usage
48
45
 
49
- First, get your Mapillary API access token from https://www.mapillary.com/dashboard/developers
46
+ First, get your Mapillary API access token from
47
+ [the developer dashboard](https://www.mapillary.com/dashboard/developers).
50
48
 
51
49
  ```bash
52
- mapillary-downloader --token YOUR_TOKEN --username YOUR_USERNAME --output ./downloads
50
+ # Set token via environment variable
51
+ export MAPILLARY_TOKEN=YOUR_TOKEN
52
+ mapillary-downloader --username SOME_USERNAME --output ./downloads
53
+
54
+ # Or pass token directly, and have it in your shell history 💩👀
55
+ mapillary-downloader --token YOUR_TOKEN --username SOME_USERNAME --output ./downloads
53
56
  ```
54
57
 
55
58
  | option | because | default |
56
59
  | ------------- | ------------------------------------- | ------------------ |
57
- | `--token` | Your Mapillary API access token | None (required) |
58
- | `--username` | Your Mapillary username | None (required) |
60
+ | `--username` | Mapillary username | None (required) |
61
+ | `--token` | Mapillary API token (or env var) | `$MAPILLARY_TOKEN` |
59
62
  | `--output` | Output directory | `./mapillary_data` |
60
63
  | `--quality` | 256, 1024, 2048 or original | `original` |
61
64
  | `--bbox` | `west,south,east,north` | `None` |
62
65
  | `--webp` | Convert to WebP (saves ~70% space) | `False` |
66
+ | `--workers` | Number of parallel download workers | Half of CPU count |
67
+ | `--no-tar` | Don't tar sequence directories | `False` |
63
68
 
64
69
  The downloader will:
65
70
 
66
- * 💾 Fetch all your uploaded images from Mapillary
67
- * 📷 Download full-resolution images organized by sequence
71
+ * 📷 Download a user's images organized by sequence
68
72
  * 📜 Inject EXIF metadata (GPS coordinates, camera info, timestamps,
69
73
  compass direction)
70
74
  * 🛟 Save progress so you can safely resume if interrupted
71
- * 🗜️ Optionally convert to WebP format for massive space savings
75
+ * 🗜️ Optionally convert to WebP to save space
76
+ * 📦 Tar sequence directories for faster uploads
72
77
 
73
78
  ## WebP Conversion
74
79
 
75
- Use the `--webp` flag to convert images to WebP format after download:
76
-
77
- ```bash
78
- mapillary-downloader --token YOUR_TOKEN --username YOUR_USERNAME --webp
79
- ```
80
-
81
- This reduces storage by approximately 70% while preserving all EXIF metadata
82
- including GPS coordinates. Requires the `cwebp` binary to be installed:
80
+ You'll need `cwebp` to use the `--webp` flag. So install it:
83
81
 
84
82
  ```bash
85
83
  # Debian/Ubuntu
@@ -89,20 +87,46 @@ sudo apt install webp
89
87
  brew install webp
90
88
  ```
91
89
 
90
+ ## Sequence Tarball Creation
91
+
92
+ By default, sequence directories are automatically tarred after download because
93
+ if they weren't, you'd spend more time setting up upload metadata than actually
94
+ uploading files to the Internet Archive (IA).
95
+
96
+ To keep individual files instead of creating tars, use the `--no-tar` flag:
97
+
98
+ ```bash
99
+ mapillary-downloader --username WHOEVER --no-tar
100
+ ```
101
+
102
+ ## Internet Archive upload
103
+
104
+ I've written a bash tool to rip media and then tag, queue, and upload it to the
105
+ Internet Archive. This downloader writes its metadata in the same format. If you copy completed
106
+ download dirs into the `4.ship` dir, they'll find their way into an
107
+ appropriately named item.
108
+
109
+ See inlay for details:
110
+
111
+ * [📀 rip](https://bitplane.net/dev/sh/rip)
112
+
113
+
92
114
  ## Development
93
115
 
94
116
  ```bash
95
117
  make dev # Setup dev environment
96
118
  make test # Run tests
97
- make coverage # Run tests with coverage
119
+ make dist # Build the distribution
120
+ make help # See other make options
98
121
  ```
99
122
 
100
123
  ## Links
101
124
 
102
125
  * [🏠 home](https://bitplane.net/dev/python/mapillary_downloader)
103
- * [📖 pydoc](https://bitplane.net/dev/python/mapillary_downloader/pydoc)
126
+ * [📖 pydoc](https://bitplane.net/dev/python/mapillary_downloader/pydoc)
104
127
  * [🐍 pypi](https://pypi.org/project/mapillary-downloader)
105
128
  * [🐱 github](https://github.com/bitplane/mapillary_downloader)
129
+ * [📀 rip](https://bitplane.net/dev/sh/rip)
106
130
 
107
131
  ## License
108
132
 
@@ -0,0 +1,108 @@
1
+ # 🗺️ Mapillary Downloader
2
+
3
+ Download your Mapillary data before it's gone.
4
+
5
+ ## Installation
6
+
7
+ Installation is optional: you can prefix the command with `uvx` or `pipx` to
8
+ download and run it. Or if you're oldskool, you can do:
9
+
10
+ ```bash
11
+ pip install mapillary-downloader
12
+ ```
13
+
14
+ ## Usage
15
+
16
+ First, get your Mapillary API access token from
17
+ [the developer dashboard](https://www.mapillary.com/dashboard/developers).
18
+
19
+ ```bash
20
+ # Set token via environment variable
21
+ export MAPILLARY_TOKEN=YOUR_TOKEN
22
+ mapillary-downloader --username SOME_USERNAME --output ./downloads
23
+
24
+ # Or pass token directly, and have it in your shell history 💩👀
25
+ mapillary-downloader --token YOUR_TOKEN --username SOME_USERNAME --output ./downloads
26
+ ```
27
+
28
+ | option | because | default |
29
+ | ------------- | ------------------------------------- | ------------------ |
30
+ | `--username` | Mapillary username | None (required) |
31
+ | `--token` | Mapillary API token (or env var) | `$MAPILLARY_TOKEN` |
32
+ | `--output` | Output directory | `./mapillary_data` |
33
+ | `--quality` | 256, 1024, 2048 or original | `original` |
34
+ | `--bbox` | `west,south,east,north` | `None` |
35
+ | `--webp` | Convert to WebP (saves ~70% space) | `False` |
36
+ | `--workers` | Number of parallel download workers | Half of CPU count |
37
+ | `--no-tar` | Don't tar sequence directories | `False` |
38
+
39
+ The downloader will:
40
+
41
+ * 📷 Download a user's images organized by sequence
42
+ * 📜 Inject EXIF metadata (GPS coordinates, camera info, timestamps,
43
+ compass direction)
44
+ * 🛟 Save progress so you can safely resume if interrupted (format sketched below)
45
+ * 🗜️ Optionally convert to WebP to save space
46
+ * 📦 Tar sequence directories for faster uploads
47
+
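For the curious, resuming is driven by a tiny JSON ledger kept next to the images. A minimal sketch of the pattern used in `downloader.py` (the collection path is illustrative):

```python
# progress.json holds the IDs already fetched; it's written to a temp file,
# flushed and fsync'd, then atomically renamed, so an interrupted run never
# leaves a half-written ledger behind.
import json
import os
from pathlib import Path

progress_file = Path("./mapillary_data/mapillary-someone-original/progress.json")

def load_downloaded():
    """Return the set of already-downloaded image IDs (empty on first run)."""
    if progress_file.exists():
        with open(progress_file) as f:
            return set(json.load(f).get("downloaded", []))
    return set()

def save_downloaded(downloaded):
    """Write progress atomically: temp file, flush + fsync, then rename."""
    tmp = progress_file.with_suffix(".json.tmp")
    with open(tmp, "w") as f:
        json.dump({"downloaded": list(downloaded)}, f)
        f.flush()
        os.fsync(f.fileno())
    tmp.replace(progress_file)
```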
48
+ ## WebP Conversion
49
+
50
+ You'll need `cwebp` to use the `--webp` flag. So install it:
51
+
52
+ ```bash
53
+ # Debian/Ubuntu
54
+ sudo apt install webp
55
+
56
+ # macOS
57
+ brew install webp
58
+ ```
59
+
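Under the hood the conversion is one `cwebp` subprocess call per image. A rough sketch of the approach in `webp_converter.py`; the exact flags here (`-metadata all`, default quality) are an assumption, not a copy of the shipped command:

```python
# Hedged sketch: convert one JPEG to WebP while keeping its EXIF block
# (GPS, camera info, heading). "-metadata all" is a real cwebp flag; the
# shipped converter may use different quality settings.
import subprocess
from pathlib import Path

def jpg_to_webp(jpg_path):
    jpg = Path(jpg_path)
    webp = jpg.with_suffix(".webp")
    result = subprocess.run(
        ["cwebp", "-metadata", "all", str(jpg), "-o", str(webp)],
        capture_output=True, text=True, timeout=60,
    )
    if result.returncode != 0:
        raise RuntimeError(f"cwebp failed for {jpg}: {result.stderr}")
    return webp
```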
60
+ ## Sequence Tarball Creation
61
+
62
+ By default, sequence directories are automatically tarred after download because
63
+ if they weren't, you'd spend more time setting up upload metadata than actually
64
+ uploading files to the Internet Archive (IA).
65
+
66
+ To keep individual files instead of creating tars, use the `--no-tar` flag:
67
+
68
+ ```bash
69
+ mapillary-downloader --username WHOEVER --no-tar
70
+ ```
71
+
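The shipped code shells out to `tar` (with `-C` for relative paths and `--` so sequence IDs starting with a dash aren't read as options; see `tar_sequences.py` further down this diff), but the core idea fits in a few lines of stdlib. A sketch, assuming one directory per sequence:

```python
# One uncompressed .tar per sequence directory; the JPEG/WebP payloads are
# already compressed, so gzip would only burn CPU for no gain.
import tarfile
from pathlib import Path

def tar_sequence(collection_dir, seq_name):
    collection = Path(collection_dir)
    tar_path = collection / f"{seq_name}.tar"
    with tarfile.open(tar_path, "w") as tf:  # "w" = plain tar, no compression
        tf.add(collection / seq_name, arcname=seq_name)
    return tar_path
```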
72
+ ## Internet Archive upload
73
+
74
+ I've written a bash tool to rip media and then tag, queue, and upload it to the
75
+ Internet Archive. This downloader writes its metadata in the same format. If you copy completed
76
+ download dirs into the `4.ship` dir, they'll find their way into an
77
+ appropriately named item.
78
+
79
+ See inlay for details:
80
+
81
+ * [📀 rip](https://bitplane.net/dev/sh/rip)
82
+
83
+
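Concretely, "the same format" is rip's one-file-per-value layout: each tag is a directory under `.meta/`, and each value lives in a numbered file. This mirrors `write_meta_tag()` in `ia_meta.py` later in this diff:

```python
# Layout: .meta/<tag>/<index>, e.g. .meta/subject/0, .meta/subject/1, ...
from pathlib import Path

def write_meta_tag(meta_dir, tag, values):
    tag_dir = Path(meta_dir) / tag
    tag_dir.mkdir(parents=True, exist_ok=True)
    if not isinstance(values, list):
        values = [values]
    for idx, value in enumerate(values):
        (tag_dir / str(idx)).write_text(str(value))

write_meta_tag(".meta", "subject", ["mapillary", "street-view"])
# -> .meta/subject/0 contains "mapillary", .meta/subject/1 "street-view"
```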
84
+ ## Development
85
+
86
+ ```bash
87
+ make dev # Setup dev environment
88
+ make test # Run tests
89
+ make dist # Build the distribution
90
+ make help # See other make options
91
+ ```
92
+
93
+ ## Links
94
+
95
+ * [🏠 home](https://bitplane.net/dev/python/mapillary_downloader)
96
+ * [📖 pydoc](https://bitplane.net/dev/python/mapillary_downloader/pydoc)
97
+ * [🐍 pypi](https://pypi.org/project/mapillary-downloader)
98
+ * [🐱 github](https://github.com/bitplane/mapillary_downloader)
99
+ * [📀 rip](https://bitplane.net/dev/sh/rip)
100
+
101
+ ## License
102
+
103
+ WTFPL with one additional clause
104
+
105
+ 1. Don't blame me
106
+
107
+ Do wtf you want, but don't blame me if it makes jokes about the size of your
108
+ disk drive.
@@ -1,7 +1,7 @@
1
1
  [project]
2
2
  name = "mapillary_downloader"
3
3
  description = "Download your Mapillary data before it's gone"
4
- version = "0.2.0"
4
+ version = "0.3.1"
5
5
  authors = [
6
6
  { name = "Gareth Davidson", email = "gaz@bitplane.net" }
7
7
  ]
@@ -1,6 +1,7 @@
1
1
  """CLI entry point."""
2
2
 
3
3
  import argparse
4
+ import os
4
5
  import sys
5
6
  from mapillary_downloader.client import MapillaryClient
6
7
  from mapillary_downloader.downloader import MapillaryDownloader
@@ -14,8 +15,12 @@ def main():
14
15
  logger = setup_logging()
15
16
 
16
17
  parser = argparse.ArgumentParser(description="Download your Mapillary data before it's gone")
17
- parser.add_argument("--token", required=True, help="Mapillary API access token")
18
- parser.add_argument("--username", required=True, help="Your Mapillary username")
18
+ parser.add_argument(
19
+ "--token",
20
+ default=os.environ.get("MAPILLARY_TOKEN"),
21
+ help="Mapillary API access token (or set MAPILLARY_TOKEN env var)",
22
+ )
23
+ parser.add_argument("--username", required=True, help="Mapillary username")
19
24
  parser.add_argument("--output", default="./mapillary_data", help="Output directory (default: ./mapillary_data)")
20
25
  parser.add_argument(
21
26
  "--quality",
@@ -29,9 +34,25 @@ def main():
29
34
  action="store_true",
30
35
  help="Convert images to WebP format (saves ~70%% disk space, requires cwebp binary)",
31
36
  )
37
+ parser.add_argument(
38
+ "--workers",
39
+ type=int,
40
+ default=None,
41
+ help="Number of parallel workers (default: half of CPU cores)",
42
+ )
43
+ parser.add_argument(
44
+ "--no-tar",
45
+ action="store_true",
46
+ help="Don't tar sequence directories (keep individual files)",
47
+ )
32
48
 
33
49
  args = parser.parse_args()
34
50
 
51
+ # Check for token
52
+ if not args.token:
53
+ logger.error("Error: Mapillary API token required. Use --token or set MAPILLARY_TOKEN environment variable")
54
+ sys.exit(1)
55
+
35
56
  bbox = None
36
57
  if args.bbox:
37
58
  try:
@@ -51,8 +72,10 @@ def main():
51
72
 
52
73
  try:
53
74
  client = MapillaryClient(args.token)
54
- downloader = MapillaryDownloader(client, args.output)
55
- downloader.download_user_data(args.username, args.quality, bbox, convert_webp=args.webp)
75
+ downloader = MapillaryDownloader(
76
+ client, args.output, args.username, args.quality, workers=args.workers, tar_sequences=not args.no_tar
77
+ )
78
+ downloader.download_user_data(bbox=bbox, convert_webp=args.webp)
56
79
  except KeyboardInterrupt:
57
80
  logger.info("\nInterrupted by user")
58
81
  sys.exit(1)
@@ -0,0 +1,218 @@
1
+ """Main downloader logic."""
2
+
3
+ import json
4
+ import logging
5
+ import os
6
+ import time
7
+ from pathlib import Path
8
+ from concurrent.futures import ProcessPoolExecutor, as_completed
9
+ from mapillary_downloader.utils import format_size, format_time
10
+ from mapillary_downloader.ia_meta import generate_ia_metadata
11
+ from mapillary_downloader.worker import download_and_convert_image
12
+ from mapillary_downloader.tar_sequences import tar_sequence_directories
13
+
14
+ logger = logging.getLogger("mapillary_downloader")
15
+
16
+
17
+ class MapillaryDownloader:
18
+ """Handles downloading Mapillary data for a user."""
19
+
20
+ def __init__(self, client, output_dir, username=None, quality=None, workers=None, tar_sequences=True):
21
+ """Initialize the downloader.
22
+
23
+ Args:
24
+ client: MapillaryClient instance
25
+ output_dir: Base directory to save downloads
26
+ username: Mapillary username (for collection directory)
27
+ quality: Image quality (for collection directory)
28
+ workers: Number of parallel workers (default: half of cpu_count)
29
+ tar_sequences: Whether to tar sequence directories after download (default: True)
30
+ """
31
+ self.client = client
32
+ self.base_output_dir = Path(output_dir)
33
+ self.username = username
34
+ self.quality = quality
35
+ self.workers = workers if workers is not None else max(1, (os.cpu_count() or 2) // 2)
36
+ self.tar_sequences = tar_sequences
37
+
38
+ # If username and quality provided, create collection directory
39
+ if username and quality:
40
+ collection_name = f"mapillary-{username}-{quality}"
41
+ self.output_dir = self.base_output_dir / collection_name
42
+ else:
43
+ self.output_dir = self.base_output_dir
44
+
45
+ self.output_dir.mkdir(parents=True, exist_ok=True)
46
+
47
+ self.metadata_file = self.output_dir / "metadata.jsonl"
48
+ self.progress_file = self.output_dir / "progress.json"
49
+ self.downloaded = self._load_progress()
50
+
51
+ def _load_progress(self):
52
+ """Load previously downloaded image IDs."""
53
+ if self.progress_file.exists():
54
+ with open(self.progress_file) as f:
55
+ return set(json.load(f).get("downloaded", []))
56
+ return set()
57
+
58
+ def _save_progress(self):
59
+ """Save progress to disk atomically."""
60
+ temp_file = self.progress_file.with_suffix(".json.tmp")
61
+ with open(temp_file, "w") as f:
62
+ json.dump({"downloaded": list(self.downloaded)}, f)
63
+ f.flush()
64
+ os.fsync(f.fileno())
65
+ temp_file.replace(self.progress_file)
66
+
67
+ def download_user_data(self, bbox=None, convert_webp=False):
68
+ """Download all images for a user.
69
+
70
+ Args:
71
+ bbox: Optional bounding box [west, south, east, north]
72
+ convert_webp: Convert images to WebP format after download
73
+ """
74
+ if not self.username or not self.quality:
75
+ raise ValueError("Username and quality must be provided during initialization")
76
+
77
+ quality_field = f"thumb_{self.quality}_url"
78
+
79
+ logger.info(f"Downloading images for user: {self.username}")
80
+ logger.info(f"Output directory: {self.output_dir}")
81
+ logger.info(f"Quality: {self.quality}")
82
+ logger.info(f"Using {self.workers} parallel workers")
83
+
84
+ processed = 0
85
+ downloaded_count = 0
86
+ skipped = 0
87
+ total_bytes = 0
88
+ failed_count = 0
89
+
90
+ start_time = time.time()
91
+
92
+ # Track which image IDs we've seen in metadata to avoid re-fetching
93
+ seen_ids = set()
94
+
95
+ # Collect images to download from existing metadata
96
+ images_to_download = []
97
+
98
+ if self.metadata_file.exists():
99
+ logger.info("Processing existing metadata file...")
100
+ with open(self.metadata_file) as f:
101
+ for line in f:
102
+ if line.strip():
103
+ image = json.loads(line)
104
+ image_id = image["id"]
105
+ seen_ids.add(image_id)
106
+ processed += 1
107
+
108
+ if image_id in self.downloaded:
109
+ skipped += 1
110
+ continue
111
+
112
+ # Queue for download
113
+ if image.get(quality_field):
114
+ images_to_download.append(image)
115
+
116
+ # Download images from existing metadata in parallel
117
+ if images_to_download:
118
+ logger.info(f"Downloading {len(images_to_download)} images from existing metadata...")
119
+ downloaded_count, total_bytes, failed_count = self._download_images_parallel(
120
+ images_to_download, convert_webp
121
+ )
122
+
123
+ # Always check API for new images (will skip duplicates via seen_ids)
124
+ logger.info("Checking for new images from API...")
125
+ new_images = []
126
+
127
+ with open(self.metadata_file, "a") as meta_f:
128
+ for image in self.client.get_user_images(self.username, bbox=bbox):
129
+ image_id = image["id"]
130
+
131
+ # Skip if we already have this in our metadata file
132
+ if image_id in seen_ids:
133
+ continue
134
+
135
+ seen_ids.add(image_id)
136
+ processed += 1
137
+
138
+ # Save new metadata
139
+ meta_f.write(json.dumps(image) + "\n")
140
+ meta_f.flush()
141
+
142
+ # Skip if already downloaded
143
+ if image_id in self.downloaded:
144
+ skipped += 1
145
+ continue
146
+
147
+ # Queue for download
148
+ if image.get(quality_field):
149
+ new_images.append(image)
150
+
151
+ # Download new images in parallel
152
+ if new_images:
153
+ logger.info(f"Downloading {len(new_images)} new images...")
154
+ new_downloaded, new_bytes, new_failed = self._download_images_parallel(new_images, convert_webp)
155
+ downloaded_count += new_downloaded
156
+ total_bytes += new_bytes
157
+ failed_count += new_failed
158
+
159
+ self._save_progress()
160
+ elapsed = time.time() - start_time
161
+ logger.info(
162
+ f"Complete! Processed {processed} images, downloaded {downloaded_count} ({format_size(total_bytes)}), "
163
+ f"skipped {skipped}, failed {failed_count}"
164
+ )
165
+ logger.info(f"Total time: {format_time(elapsed)}")
166
+
167
+ # Tar sequence directories for efficient IA uploads
168
+ if self.tar_sequences:
169
+ tar_sequence_directories(self.output_dir)
170
+
171
+ # Generate IA metadata
172
+ generate_ia_metadata(self.output_dir)
173
+
174
+ def _download_images_parallel(self, images, convert_webp):
175
+ """Download images in parallel using worker pool.
176
+
177
+ Args:
178
+ images: List of image metadata dicts
179
+ convert_webp: Whether to convert to WebP
180
+
181
+ Returns:
182
+ Tuple of (downloaded_count, total_bytes, failed_count)
183
+ """
184
+ downloaded_count = 0
185
+ total_bytes = 0
186
+ failed_count = 0
187
+
188
+ with ProcessPoolExecutor(max_workers=self.workers) as executor:
189
+ # Submit all tasks
190
+ future_to_image = {}
191
+ for image in images:
192
+ future = executor.submit(
193
+ download_and_convert_image,
194
+ image,
195
+ str(self.output_dir),
196
+ self.quality,
197
+ convert_webp,
198
+ self.client.access_token,
199
+ )
200
+ future_to_image[future] = image["id"]
201
+
202
+ # Process results as they complete
203
+ for future in as_completed(future_to_image):
204
+ image_id, bytes_dl, success, error_msg = future.result()
205
+
206
+ if success:
207
+ self.downloaded.add(image_id)
208
+ downloaded_count += 1
209
+ total_bytes += bytes_dl
210
+
211
+ if downloaded_count % 10 == 0:
212
+ logger.info(f"Downloaded: {downloaded_count}/{len(images)} ({format_size(total_bytes)})")
213
+ self._save_progress()
214
+ else:
215
+ failed_count += 1
216
+ logger.warning(f"Failed to download {image_id}: {error_msg}")
217
+
218
+ return downloaded_count, total_bytes, failed_count
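Driving the downloader from Python mirrors what `cli.py` above does; a minimal sketch (token, username, and paths are placeholders):

```python
import os
from mapillary_downloader.client import MapillaryClient
from mapillary_downloader.downloader import MapillaryDownloader

client = MapillaryClient(os.environ["MAPILLARY_TOKEN"])
downloader = MapillaryDownloader(
    client, "./mapillary_data", "some_username", "original",
    workers=4, tar_sequences=True,
)
downloader.download_user_data(bbox=None, convert_webp=True)
```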
@@ -0,0 +1,182 @@
1
+ """Internet Archive metadata generation for Mapillary collections."""
2
+
3
+ import json
4
+ import logging
5
+ import re
6
+ from datetime import datetime
7
+ from pathlib import Path
8
+ from importlib.metadata import version
9
+
10
+ logger = logging.getLogger("mapillary_downloader")
11
+
12
+
13
+ def parse_collection_name(directory):
14
+ """Parse username and quality from directory name.
15
+
16
+ Args:
17
+ directory: Path to collection directory (e.g., mapillary-username-original)
18
+
19
+ Returns:
20
+ Tuple of (username, quality) or (None, None) if parsing fails
21
+ """
22
+ match = re.match(r"mapillary-(.+)-(256|1024|2048|original)$", Path(directory).name)
23
+ if match:
24
+ return match.group(1), match.group(2)
25
+ return None, None
26
+
27
+
28
+ def get_date_range(metadata_file):
29
+ """Get first and last captured_at dates from metadata.jsonl.
30
+
31
+ Args:
32
+ metadata_file: Path to metadata.jsonl file
33
+
34
+ Returns:
35
+ Tuple of (first_date, last_date) as ISO format strings, or (None, None)
36
+ """
37
+ if not Path(metadata_file).exists():
38
+ return None, None
39
+
40
+ timestamps = []
41
+ with open(metadata_file) as f:
42
+ for line in f:
43
+ if line.strip():
44
+ data = json.loads(line)
45
+ if "captured_at" in data:
46
+ timestamps.append(data["captured_at"])
47
+
48
+ if not timestamps:
49
+ return None, None
50
+
51
+ # Convert from milliseconds to seconds, then to datetime
52
+ first_ts = min(timestamps) / 1000
53
+ last_ts = max(timestamps) / 1000
54
+
55
+ first_date = datetime.fromtimestamp(first_ts).strftime("%Y-%m-%d")
56
+ last_date = datetime.fromtimestamp(last_ts).strftime("%Y-%m-%d")
57
+
58
+ return first_date, last_date
59
+
60
+
61
+ def count_images(metadata_file):
62
+ """Count number of images in metadata.jsonl.
63
+
64
+ Args:
65
+ metadata_file: Path to metadata.jsonl file
66
+
67
+ Returns:
68
+ Number of images
69
+ """
70
+ if not Path(metadata_file).exists():
71
+ return 0
72
+
73
+ count = 0
74
+ with open(metadata_file) as f:
75
+ for line in f:
76
+ if line.strip():
77
+ count += 1
78
+ return count
79
+
80
+
81
+ def write_meta_tag(meta_dir, tag, values):
82
+ """Write metadata tag files in rip format.
83
+
84
+ Args:
85
+ meta_dir: Path to .meta directory
86
+ tag: Tag name
87
+ values: Single value or list of values
88
+ """
89
+ tag_dir = meta_dir / tag
90
+ tag_dir.mkdir(parents=True, exist_ok=True)
91
+
92
+ if not isinstance(values, list):
93
+ values = [values]
94
+
95
+ for idx, value in enumerate(values):
96
+ (tag_dir / str(idx)).write_text(str(value))
97
+
98
+
99
+ def generate_ia_metadata(collection_dir):
100
+ """Generate Internet Archive metadata for a Mapillary collection.
101
+
102
+ Args:
103
+ collection_dir: Path to collection directory (e.g., ./mapillary_data/mapillary-username-original)
104
+
105
+ Returns:
106
+ True if successful, False otherwise
107
+ """
108
+ collection_dir = Path(collection_dir)
109
+ username, quality = parse_collection_name(collection_dir)
110
+
111
+ if not username or not quality:
112
+ logger.error(f"Could not parse username/quality from directory: {collection_dir.name}")
113
+ return False
114
+
115
+ metadata_file = collection_dir / "metadata.jsonl"
116
+ if not metadata_file.exists():
117
+ logger.error(f"metadata.jsonl not found in {collection_dir}")
118
+ return False
119
+
120
+ logger.info(f"Generating IA metadata for {collection_dir.name}...")
121
+
122
+ # Get date range and image count
123
+ first_date, last_date = get_date_range(metadata_file)
124
+ image_count = count_images(metadata_file)
125
+
126
+ if not first_date or not last_date:
127
+ logger.warning("Could not determine date range from metadata")
128
+ first_date = last_date = "unknown"
129
+
130
+ # Create .meta directory
131
+ meta_dir = collection_dir / ".meta"
132
+ meta_dir.mkdir(exist_ok=True)
133
+
134
+ # Generate metadata tags
135
+ write_meta_tag(
136
+ meta_dir,
137
+ "title",
138
+ f"Mapillary images by {username} ({quality} quality)",
139
+ )
140
+
141
+ description = (
142
+ f"Street-level imagery from Mapillary user '{username}'. "
143
+ f"Contains {image_count:,} images captured between {first_date} and {last_date}. "
144
+ f"Images are organized by sequence ID and include EXIF metadata with GPS coordinates, "
145
+ f"camera information, and compass direction.\n\n"
146
+ f"Downloaded using mapillary_downloader (https://bitplane.net/dev/python/mapillary_downloader/). "
147
+ f"Uploaded using rip (https://bitplane.net/dev/sh/rip)."
148
+ )
149
+ write_meta_tag(meta_dir, "description", description)
150
+
151
+ # Subject tags
152
+ write_meta_tag(
153
+ meta_dir,
154
+ "subject",
155
+ ["mapillary", "street-view", "computer-vision", "geospatial", "photography"],
156
+ )
157
+
158
+ write_meta_tag(meta_dir, "creator", username)
159
+ write_meta_tag(meta_dir, "date", first_date)
160
+ write_meta_tag(meta_dir, "coverage", f"{first_date} - {last_date}")
161
+ write_meta_tag(meta_dir, "licenseurl", "https://creativecommons.org/licenses/by-sa/4.0/")
162
+ write_meta_tag(meta_dir, "mediatype", "data")
163
+ write_meta_tag(meta_dir, "collection", "opensource_media")
164
+
165
+ # Source and scanner metadata
166
+ write_meta_tag(meta_dir, "source", f"https://www.mapillary.com/app/user/{username}")
167
+
168
+ downloader_version = version("mapillary_downloader")
169
+ write_meta_tag(
170
+ meta_dir,
171
+ "scanner",
172
+ [
173
+ f"mapillary_downloader {downloader_version} https://bitplane.net/dev/python/mapillary_downloader/",
174
+ "rip https://bitplane.net/dev/sh/rip",
175
+ ],
176
+ )
177
+
178
+ # Add searchable tag for batch collection management
179
+ write_meta_tag(meta_dir, "mapillary_downloader", downloader_version)
180
+
181
+ logger.info(f"IA metadata generated in {meta_dir}")
182
+ return True
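The downloader calls `generate_ia_metadata()` itself at the end of a run, but it also works standalone on an existing collection directory (path illustrative):

```python
from mapillary_downloader.ia_meta import generate_ia_metadata

# Writes .meta/title/0, .meta/description/0, .meta/subject/0..4, and so on.
ok = generate_ia_metadata("./mapillary_data/mapillary-someone-original")
```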
@@ -0,0 +1,112 @@
1
+ """Tar sequence directories for efficient Internet Archive uploads."""
2
+
3
+ import logging
4
+ import subprocess
5
+ from pathlib import Path
6
+
7
+ logger = logging.getLogger("mapillary_downloader")
8
+
9
+
10
+ def tar_sequence_directories(collection_dir):
11
+ """Tar all sequence directories in a collection for faster IA uploads.
12
+
13
+ Args:
14
+ collection_dir: Path to collection directory (e.g., mapillary-user-quality/)
15
+
16
+ Returns:
17
+ Tuple of (tarred_count, total_files_tarred)
18
+ """
19
+ collection_dir = Path(collection_dir)
20
+
21
+ if not collection_dir.exists():
22
+ logger.error(f"Collection directory not found: {collection_dir}")
23
+ return 0, 0
24
+
25
+ # Find all sequence directories (skip special dirs)
26
+ skip_dirs = {".meta", "__pycache__"}
27
+ sequence_dirs = []
28
+
29
+ for item in collection_dir.iterdir():
30
+ if item.is_dir() and item.name not in skip_dirs:
31
+ sequence_dirs.append(item)
32
+
33
+ if not sequence_dirs:
34
+ logger.info("No sequence directories to tar")
35
+ return 0, 0
36
+
37
+ logger.info(f"Tarring {len(sequence_dirs)} sequence directories...")
38
+
39
+ tarred_count = 0
40
+ total_files = 0
41
+
42
+ for seq_dir in sequence_dirs:
43
+ seq_name = seq_dir.name
44
+ tar_path = collection_dir / f"{seq_name}.tar"
45
+
46
+ # Handle naming collision - find next available name
47
+ counter = 1
48
+ while tar_path.exists():
49
+ counter += 1
50
+ tar_path = collection_dir / f"{seq_name}.{counter}.tar"
51
+
52
+ # Count files in sequence
53
+ files = list(seq_dir.glob("*"))
54
+ file_count = len([f for f in files if f.is_file()])
55
+
56
+ if file_count == 0:
57
+ logger.warning(f"Skipping empty directory: {seq_name}")
58
+ continue
59
+
60
+ try:
61
+ # Create uncompressed tar (WebP already compressed)
62
+ # Use -C to change directory so paths in tar are relative
63
+ # Use -- to prevent sequence IDs starting with - from being interpreted as options
64
+ result = subprocess.run(
65
+ ["tar", "-cf", str(tar_path), "-C", str(collection_dir), "--", seq_name],
66
+ capture_output=True,
67
+ text=True,
68
+ timeout=300, # 5 minute timeout per tar
69
+ )
70
+
71
+ if result.returncode != 0:
72
+ logger.error(f"Failed to tar {seq_name}: {result.stderr}")
73
+ continue
74
+
75
+ # Verify tar was created and has size
76
+ if tar_path.exists() and tar_path.stat().st_size > 0:
77
+ # Remove original directory
78
+ for file in seq_dir.rglob("*"):
79
+ if file.is_file():
80
+ file.unlink()
81
+
82
+ # Remove empty subdirs and main dir
83
+ for subdir in list(seq_dir.rglob("*")):
84
+ if subdir.is_dir():
85
+ try:
86
+ subdir.rmdir()
87
+ except OSError:
88
+ pass # Not empty yet
89
+
90
+ seq_dir.rmdir()
91
+
92
+ tarred_count += 1
93
+ total_files += file_count
94
+
95
+ if tarred_count % 10 == 0:
96
+ logger.info(f"Tarred {tarred_count}/{len(sequence_dirs)} sequences...")
97
+ else:
98
+ logger.error(f"Tar file empty or not created: {tar_path}")
99
+ if tar_path.exists():
100
+ tar_path.unlink()
101
+
102
+ except subprocess.TimeoutExpired:
103
+ logger.error(f"Timeout tarring {seq_name}")
104
+ if tar_path.exists():
105
+ tar_path.unlink()
106
+ except Exception as e:
107
+ logger.error(f"Error tarring {seq_name}: {e}")
108
+ if tar_path.exists():
109
+ tar_path.unlink()
110
+
111
+ logger.info(f"Tarred {tarred_count} sequences ({total_files:,} files total)")
112
+ return tarred_count, total_files
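Like the metadata step, tarring can be run standalone, for example after an earlier run that used `--no-tar` (path illustrative):

```python
from mapillary_downloader.tar_sequences import tar_sequence_directories

tarred, files = tar_sequence_directories("./mapillary_data/mapillary-someone-original")
print(f"packed {tarred} sequences ({files} files)")
```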
@@ -17,17 +17,25 @@ def check_cwebp_available():
17
17
  return shutil.which("cwebp") is not None
18
18
 
19
19
 
20
- def convert_to_webp(jpg_path):
20
+ def convert_to_webp(jpg_path, output_path=None, delete_original=True):
21
21
  """Convert a JPG image to WebP format, preserving EXIF metadata.
22
22
 
23
23
  Args:
24
24
  jpg_path: Path to the JPG file
25
+ output_path: Optional path for the WebP output. If None, uses jpg_path with .webp extension
26
+ delete_original: Whether to delete the original JPG after conversion (default: True)
25
27
 
26
28
  Returns:
27
29
  Path object to the new WebP file, or None if conversion failed
28
30
  """
29
31
  jpg_path = Path(jpg_path)
30
- webp_path = jpg_path.with_suffix(".webp")
32
+
33
+ if output_path is None:
34
+ webp_path = jpg_path.with_suffix(".webp")
35
+ else:
36
+ webp_path = Path(output_path)
37
+ # Ensure output directory exists
38
+ webp_path.parent.mkdir(parents=True, exist_ok=True)
31
39
 
32
40
  try:
33
41
  # Convert with cwebp, preserving all metadata
@@ -42,8 +50,9 @@ def convert_to_webp(jpg_path):
42
50
  logger.error(f"cwebp conversion failed for {jpg_path}: {result.stderr}")
43
51
  return None
44
52
 
45
- # Delete original JPG after successful conversion
46
- jpg_path.unlink()
53
+ # Delete original JPG after successful conversion if requested
54
+ if delete_original:
55
+ jpg_path.unlink()
47
56
  return webp_path
48
57
 
49
58
  except subprocess.TimeoutExpired:
@@ -0,0 +1,95 @@
1
+ """Worker process for parallel image download and conversion."""
2
+
3
+ import tempfile
4
+ from pathlib import Path
5
+ import requests
6
+ from requests.exceptions import RequestException
7
+ from mapillary_downloader.exif_writer import write_exif_to_image
8
+ from mapillary_downloader.webp_converter import convert_to_webp
9
+
10
+
11
+ def download_and_convert_image(image_data, output_dir, quality, convert_webp, access_token):
12
+ """Download and optionally convert a single image.
13
+
14
+ This function is designed to run in a worker process.
15
+
16
+ Args:
17
+ image_data: Image metadata dict from API
18
+ output_dir: Base output directory path
19
+ quality: Quality level (256, 1024, 2048, original)
20
+ convert_webp: Whether to convert to WebP
21
+ access_token: Mapillary API access token
22
+
23
+ Returns:
24
+ Tuple of (image_id, bytes_downloaded, success, error_msg)
25
+ """
26
+ image_id = image_data["id"]
27
+ quality_field = f"thumb_{quality}_url"
28
+
29
+ temp_dir = None
30
+ try:
31
+ # Get image URL
32
+ image_url = image_data.get(quality_field)
33
+ if not image_url:
34
+ return (image_id, 0, False, f"No {quality} URL")
35
+
36
+ # Determine final output directory
37
+ output_dir = Path(output_dir)
38
+ sequence_id = image_data.get("sequence")
39
+ if sequence_id:
40
+ img_dir = output_dir / sequence_id
41
+ img_dir.mkdir(parents=True, exist_ok=True)
42
+ else:
43
+ img_dir = output_dir
44
+
45
+ # If converting to WebP, use /tmp for intermediate JPEG
46
+ # Otherwise write JPEG directly to final location
47
+ if convert_webp:
48
+ temp_dir = tempfile.mkdtemp(prefix="mapillary_downloader_")
49
+ jpg_path = Path(temp_dir) / f"{image_id}.jpg"
50
+ final_path = img_dir / f"{image_id}.webp"
51
+ else:
52
+ jpg_path = img_dir / f"{image_id}.jpg"
53
+ final_path = jpg_path
54
+
55
+ # Download image
56
+ # No retries for CDN images - they're cheap, just skip failures and move on
57
+ session = requests.Session()
58
+ session.headers.update({"Authorization": f"OAuth {access_token}"})
59
+
60
+ bytes_downloaded = 0
61
+
62
+ try:
63
+ # 60s timeout applies to connect and each read; requests can't cap total download time
64
+ response = session.get(image_url, stream=True, timeout=60)
65
+ response.raise_for_status()
66
+
67
+ with open(jpg_path, "wb") as f:
68
+ for chunk in response.iter_content(chunk_size=8192):
69
+ f.write(chunk)
70
+ bytes_downloaded += len(chunk)
71
+ except RequestException as e:
72
+ return (image_id, 0, False, f"Download failed: {e}")
73
+
74
+ # Write EXIF metadata
75
+ write_exif_to_image(jpg_path, image_data)
76
+
77
+ # Convert to WebP if requested
78
+ if convert_webp:
79
+ webp_path = convert_to_webp(jpg_path, output_path=final_path, delete_original=False)
80
+ if not webp_path:
81
+ return (image_id, bytes_downloaded, False, "WebP conversion failed")
82
+
83
+ return (image_id, bytes_downloaded, True, None)
84
+
85
+ except Exception as e:
86
+ return (image_id, 0, False, str(e))
87
+ finally:
88
+ # Clean up temp directory if it was created
89
+ if temp_dir and Path(temp_dir).exists():
90
+ try:
91
+ for file in Path(temp_dir).glob("*"):
92
+ file.unlink()
93
+ Path(temp_dir).rmdir()
94
+ except Exception:
95
+ pass # Best effort cleanup
@@ -1,84 +0,0 @@
1
- # 🗺️ Mapillary Downloader
2
-
3
- Download your Mapillary data before it's gone.
4
-
5
- ## Installation
6
-
7
- ```bash
8
- pip install mapillary-downloader
9
- ```
10
-
11
- Or from source:
12
-
13
- ```bash
14
- make install
15
- ```
16
-
17
- ## Usage
18
-
19
- First, get your Mapillary API access token from https://www.mapillary.com/dashboard/developers
20
-
21
- ```bash
22
- mapillary-downloader --token YOUR_TOKEN --username YOUR_USERNAME --output ./downloads
23
- ```
24
-
25
- | option | because | default |
26
- | ------------- | ------------------------------------- | ------------------ |
27
- | `--token` | Your Mapillary API access token | None (required) |
28
- | `--username` | Your Mapillary username | None (required) |
29
- | `--output` | Output directory | `./mapillary_data` |
30
- | `--quality` | 256, 1024, 2048 or original | `original` |
31
- | `--bbox` | `west,south,east,north` | `None` |
32
- | `--webp` | Convert to WebP (saves ~70% space) | `False` |
33
-
34
- The downloader will:
35
-
36
- * 💾 Fetch all your uploaded images from Mapillary
37
- * 📷 Download full-resolution images organized by sequence
38
- * 📜 Inject EXIF metadata (GPS coordinates, camera info, timestamps,
39
- compass direction)
40
- * 🛟 Save progress so you can safely resume if interrupted
41
- * 🗜️ Optionally convert to WebP format for massive space savings
42
-
43
- ## WebP Conversion
44
-
45
- Use the `--webp` flag to convert images to WebP format after download:
46
-
47
- ```bash
48
- mapillary-downloader --token YOUR_TOKEN --username YOUR_USERNAME --webp
49
- ```
50
-
51
- This reduces storage by approximately 70% while preserving all EXIF metadata
52
- including GPS coordinates. Requires the `cwebp` binary to be installed:
53
-
54
- ```bash
55
- # Debian/Ubuntu
56
- sudo apt install webp
57
-
58
- # macOS
59
- brew install webp
60
- ```
61
-
62
- ## Development
63
-
64
- ```bash
65
- make dev # Setup dev environment
66
- make test # Run tests
67
- make coverage # Run tests with coverage
68
- ```
69
-
70
- ## Links
71
-
72
- * [🏠 home](https://bitplane.net/dev/python/mapillary_downloader)
73
- * [📖 pydoc](https://bitplane.net/dev/python/mapillary_downloader/pydoc)
74
- * [🐍 pypi](https://pypi.org/project/mapillary-downloader)
75
- * [🐱 github](https://github.com/bitplane/mapillary_downloader)
76
-
77
- ## License
78
-
79
- WTFPL with one additional clause
80
-
81
- 1. Don't blame me
82
-
83
- Do wtf you want, but don't blame me if it makes jokes about the size of your
84
- disk drive.
@@ -1,206 +0,0 @@
1
- """Main downloader logic."""
2
-
3
- import json
4
- import logging
5
- import os
6
- import time
7
- from pathlib import Path
8
- from collections import deque
9
- from mapillary_downloader.exif_writer import write_exif_to_image
10
- from mapillary_downloader.utils import format_size, format_time
11
- from mapillary_downloader.webp_converter import convert_to_webp
12
-
13
- logger = logging.getLogger("mapillary_downloader")
14
-
15
-
16
- class MapillaryDownloader:
17
- """Handles downloading Mapillary data for a user."""
18
-
19
- def __init__(self, client, output_dir):
20
- """Initialize the downloader.
21
-
22
- Args:
23
- client: MapillaryClient instance
24
- output_dir: Directory to save downloads
25
- """
26
- self.client = client
27
- self.output_dir = Path(output_dir)
28
- self.output_dir.mkdir(parents=True, exist_ok=True)
29
-
30
- self.metadata_file = self.output_dir / "metadata.jsonl"
31
- self.progress_file = self.output_dir / "progress.json"
32
- self.downloaded = self._load_progress()
33
-
34
- def _load_progress(self):
35
- """Load previously downloaded image IDs."""
36
- if self.progress_file.exists():
37
- with open(self.progress_file) as f:
38
- return set(json.load(f).get("downloaded", []))
39
- return set()
40
-
41
- def _save_progress(self):
42
- """Save progress to disk atomically."""
43
- temp_file = self.progress_file.with_suffix(".json.tmp")
44
- with open(temp_file, "w") as f:
45
- json.dump({"downloaded": list(self.downloaded)}, f)
46
- f.flush()
47
- os.fsync(f.fileno())
48
- temp_file.replace(self.progress_file)
49
-
50
- def download_user_data(self, username, quality="original", bbox=None, convert_webp=False):
51
- """Download all images for a user.
52
-
53
- Args:
54
- username: Mapillary username
55
- quality: Image quality to download (256, 1024, 2048, original)
56
- bbox: Optional bounding box [west, south, east, north]
57
- convert_webp: Convert images to WebP format after download
58
- """
59
- quality_field = f"thumb_{quality}_url"
60
-
61
- logger.info(f"Downloading images for user: {username}")
62
- logger.info(f"Output directory: {self.output_dir}")
63
- logger.info(f"Quality: {quality}")
64
-
65
- processed = 0
66
- downloaded_count = 0
67
- skipped = 0
68
- total_bytes = 0
69
-
70
- # Track download times for adaptive ETA (last 50 downloads)
71
- download_times = deque(maxlen=50)
72
- start_time = time.time()
73
-
74
- # Track which image IDs we've seen in metadata to avoid re-fetching
75
- seen_ids = set()
76
-
77
- # First, process any existing metadata without re-fetching from API
78
- if self.metadata_file.exists():
79
- logger.info("Processing existing metadata file...")
80
- with open(self.metadata_file) as f:
81
- for line in f:
82
- if line.strip():
83
- image = json.loads(line)
84
- image_id = image["id"]
85
- seen_ids.add(image_id)
86
- processed += 1
87
-
88
- if image_id in self.downloaded:
89
- skipped += 1
90
- continue
91
-
92
- # Download this un-downloaded image
93
- image_url = image.get(quality_field)
94
- if not image_url:
95
- logger.warning(f"No {quality} URL for image {image_id}")
96
- continue
97
-
98
- sequence_id = image.get("sequence")
99
- if sequence_id:
100
- img_dir = self.output_dir / sequence_id
101
- img_dir.mkdir(exist_ok=True)
102
- else:
103
- img_dir = self.output_dir
104
-
105
- output_path = img_dir / f"{image_id}.jpg"
106
-
107
- download_start = time.time()
108
- bytes_downloaded = self.client.download_image(image_url, output_path)
109
- if bytes_downloaded:
110
- download_time = time.time() - download_start
111
- download_times.append(download_time)
112
-
113
- write_exif_to_image(output_path, image)
114
-
115
- # Convert to WebP if requested
116
- if convert_webp:
117
- webp_path = convert_to_webp(output_path)
118
- if webp_path:
119
- output_path = webp_path
120
-
121
- self.downloaded.add(image_id)
122
- downloaded_count += 1
123
- total_bytes += bytes_downloaded
124
-
125
- progress_str = (
126
- f"Processed: {processed}, Downloaded: {downloaded_count} ({format_size(total_bytes)})"
127
- )
128
- logger.info(progress_str)
129
-
130
- if downloaded_count % 10 == 0:
131
- self._save_progress()
132
-
133
- # Always check API for new images (will skip duplicates via seen_ids)
134
- logger.info("Checking for new images from API...")
135
- with open(self.metadata_file, "a") as meta_f:
136
- for image in self.client.get_user_images(username, bbox=bbox):
137
- image_id = image["id"]
138
-
139
- # Skip if we already have this in our metadata file
140
- if image_id in seen_ids:
141
- continue
142
-
143
- seen_ids.add(image_id)
144
- processed += 1
145
-
146
- # Save new metadata
147
- meta_f.write(json.dumps(image) + "\n")
148
- meta_f.flush()
149
-
150
- # Skip if already downloaded
151
- if image_id in self.downloaded:
152
- skipped += 1
153
- continue
154
-
155
- # Download image
156
- image_url = image.get(quality_field)
157
- if not image_url:
158
- logger.warning(f"No {quality} URL for image {image_id}")
159
- continue
160
-
161
- # Use sequence ID for organization
162
- sequence_id = image.get("sequence")
163
- if sequence_id:
164
- img_dir = self.output_dir / sequence_id
165
- img_dir.mkdir(exist_ok=True)
166
- else:
167
- img_dir = self.output_dir
168
-
169
- output_path = img_dir / f"{image_id}.jpg"
170
-
171
- download_start = time.time()
172
- bytes_downloaded = self.client.download_image(image_url, output_path)
173
- if bytes_downloaded:
174
- download_time = time.time() - download_start
175
- download_times.append(download_time)
176
-
177
- # Write EXIF metadata to the downloaded image
178
- write_exif_to_image(output_path, image)
179
-
180
- # Convert to WebP if requested
181
- if convert_webp:
182
- webp_path = convert_to_webp(output_path)
183
- if webp_path:
184
- output_path = webp_path
185
-
186
- self.downloaded.add(image_id)
187
- downloaded_count += 1
188
- total_bytes += bytes_downloaded
189
-
190
- # Calculate progress
191
- progress_str = (
192
- f"Processed: {processed}, Downloaded: {downloaded_count} ({format_size(total_bytes)})"
193
- )
194
-
195
- logger.info(progress_str)
196
-
197
- # Save progress every 10 images
198
- if downloaded_count % 10 == 0:
199
- self._save_progress()
200
-
201
- self._save_progress()
202
- elapsed = time.time() - start_time
203
- logger.info(
204
- f"Complete! Processed {processed} images, downloaded {downloaded_count} ({format_size(total_bytes)}), skipped {skipped}"
205
- )
206
- logger.info(f"Total time: {format_time(elapsed)}")