PyPI - megadetector - Versions diffs - 5.0.19__tar.gz → 5.0.20__tar.gz - Mend

megadetector 5.0.19__tar.gz → 5.0.20__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Potentially problematic release.

This version of megadetector might be problematic.

Files changed (211)

{megadetector-5.0.19/megadetector.egg-info → megadetector-5.0.20}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: megadetector
-Version: 5.0.19
+Version: 5.0.20
 Summary: MegaDetector is an AI model that helps conservation folks spend less time doing boring things with camera trap images.
 Author-email: Your friendly neighborhood MegaDetector team <cameratraps@lila.science>
 Maintainer-email: Your friendly neighborhood MegaDetector team <cameratraps@lila.science>
@@ -39,7 +39,7 @@ Requires-Dist: Pillow>=9.5
 Requires-Dist: tqdm>=4.64.0
 Requires-Dist: jsonpickle>=3.0.2
 Requires-Dist: humanfriendly>=10.0
-Requires-Dist: numpy>=1.26.0
+Requires-Dist: numpy<1.24,>=1.22
 Requires-Dist: matplotlib>=3.8.0
 Requires-Dist: opencv-python>=4.8.0
 Requires-Dist: requests>=2.31.0
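
The numpy requirement in this hunk changes from a lower bound only (`>=1.26.0`) to a bounded range (`<1.24,>=1.22`), so no numpy version satisfies both the old and the new constraint. A minimal sketch of that check follows, assuming plain dotted numeric versions; real installers follow PEP 440, which this does not implement. The helper names `parse_version`, `satisfies`, `old_requirement`, and `new_requirement` are hypothetical and exist only for this illustration.

```python
def parse_version(text, width=3):
    """Turn '1.26' into (1, 26, 0). Numeric dotted versions only (assumption)."""
    parts = [int(p) for p in text.split('.')]
    # Pad short versions with zeros so '1.24' and '1.24.0' compare equal
    parts += [0] * (width - len(parts))
    return tuple(parts)

def satisfies(version, lower=None, upper=None):
    """Return True if lower <= version < upper; either bound may be None."""
    v = parse_version(version)
    if lower is not None and v < parse_version(lower):
        return False
    if upper is not None and v >= parse_version(upper):
        return False
    return True

def old_requirement(version):
    # 5.0.19: numpy>=1.26.0
    return satisfies(version, lower='1.26.0')

def new_requirement(version):
    # 5.0.20: numpy<1.24,>=1.22
    return satisfies(version, lower='1.22', upper='1.24')

for candidate in ('1.22.4', '1.23.5', '1.24.0', '1.26.4'):
    print(candidate, old_requirement(candidate), new_requirement(candidate))
```

An environment that already has a numpy release accepted by 5.0.19 would therefore need a different numpy version to meet the 5.0.20 metadata.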

{megadetector-5.0.19 → megadetector-5.0.20}/README.md RENAMED

@@ -68,6 +68,8 @@ Here are a few of the organizations that have used MegaDetector... we're only li
 * [New Zealand Department of Conservation](https://www.doc.govt.nz)
 * [Habitat NZ](https://habitatnz.co.nz/)
 * [Research Institute of Organic Agriculture](https://www.fibl.org/en/) (FiBL)
+* [A/Vian Ecological Consulting](https://avianeco.com/)
+* [Wildlife Insights](https://www.wildlifeinsights.org/)
 * [Applied Conservation Macro Ecology Lab](http://www.acmelab.ca/), University of Victoria
 * [Banff National Park Resource Conservation](https://www.pc.gc.ca/en/pn-np/ab/banff/nature/conservation), Parks Canada
@@ -96,6 +98,7 @@ Here are a few of the organizations that have used MegaDetector... we're only li
 * [Northern Great Plains Program](https://nationalzoo.si.edu/news/restoring-americas-prairie), Smithsonian
 * [Polar Ecology Group](https://polarecologygroup.wordpress.com), University of Gdansk
 * [Quantitative Ecology Lab](https://depts.washington.edu/sefsqel/), University of Washington
+* [San Diego Field Station](https://www.usgs.gov/centers/werc/science/san-diego-field-station), U.S. Geological Survey
 * [Santa Monica Mountains Recreation Area](https://www.nps.gov/samo/index.htm), National Park Service
 * [Seattle Urban Carnivore Project](https://www.zoo.org/seattlecarnivores), Woodland Park Zoo
 * [Serra dos Órgãos National Park](https://www.icmbio.gov.br/parnaserradosorgaos/), ICMBio

{megadetector-5.0.19 → megadetector-5.0.20}/megadetector/data_management/importers/bellevue_to_json.py RENAMED

@@ -270,4 +270,3 @@ html_output_file,image_db = visualize_db.visualize_db(output_filename,
                                                         os.path.join(output_base,'preview'),
                                                         base_dir,viz_options)
 os.startfile(html_output_file)

megadetector-5.0.20/megadetector/data_management/importers/osu-small-animals-to-json.py ADDED

@@ -0,0 +1,364 @@
+"""
+Prepare the OSU Small Animals dataset for LILA release:
+1. Convert metadata to COCO
+2. Extract location, datestamp, and sequence information
+3. Remove redundant or excluded images
+"""
+#%% Imports and constants
+import os
+input_folder = r'G:\temp\osu-small-animals'
+assert os.path.isdir(input_folder)
+output_folder = r'G:\temp\osu-small-animals-lila'
+os.makedirs(output_folder,exist_ok=True)
+output_file = os.path.join(output_folder,'osu-small-animals.json')
+preview_folder = r'G:\temp\osu-small-animals-preview'
+os.makedirs(preview_folder,exist_ok=True)
+common_to_latin_file = r'c:\git\agentmorrisprivate\camera-traps\osu-small-animals-common-to-latin.txt'
+assert os.path.isfile(common_to_latin_file)
+#%% Support functions
+def custom_relative_path_to_location(relative_path):
+    bn = os.path.basename(relative_path).upper()
+    # This only impacted six images
+    if bn.startswith('RCNX'):
+        site = 'OSTN'
+        return site
+    # FCS1__2019-07-08__10-37-46(1).JPG
+    # BIWA4S2020-06-25_16-19-56.JPG
+    # GRN3c__2019-05-05__01-39-23(1).JPG
+    tokens = bn.split('_')
+    site = tokens[0]
+    if '2020' in site:
+        site = site.split('2020')[0]
+    assert len(site) <= 8
+    assert site.isalnum()
+    return site
+#%% Read EXIF data from all images
+from megadetector.data_management.read_exif import \
+    ReadExifOptions, read_exif_from_folder
+import json
+exif_cache_file = os.path.join(input_folder,'exif_info.json')
+if os.path.isfile(exif_cache_file):
+    print('Reading EXIF data from cache')
+    with open(exif_cache_file,'r') as f:
+        exif_info = json.load(f)
+else:
+    read_exif_options = ReadExifOptions()
+    read_exif_options.n_workers = 8
+    exif_info = read_exif_from_folder(input_folder=input_folder,
+                                      output_file=exif_cache_file,
+                                      options=read_exif_options,
+                                      filenames=None,
+                                      recursive=True)
+#%% Verify that no GPS data is present
+from megadetector.data_management.read_exif import has_gps_info
+missing_exif_tags = []
+# im = exif_info[0]
+for im in exif_info:
+    if im['exif_tags'] is None:
+        missing_exif_tags.append(im['file_name'])
+        continue
+    else:
+        assert not has_gps_info(im)
+#%% Read common --> latin mapping
+with open(common_to_latin_file,'r') as f:
+    lines = f.readlines()
+common_to_latin = {}
+# s = lines[0]
+for s in lines:
+    s = s.strip()
+    tokens = s.split('\t')
+    assert len(tokens) == 2
+    common = tokens[0].lower().replace(' ','_')
+    latin = tokens[1].replace('_',' ').lower()
+    assert common not in common_to_latin.keys()
+    assert latin not in common_to_latin.values()
+    common_to_latin[common] = latin
+#%% Convert non-excluded, non-split images to COCO format
+from datetime import datetime
+from tqdm import tqdm
+# One-off typo fix
+name_replacements = \
+{
+    'common_five-linked_skink':'common_five-lined_skink'
+}
+category_name_to_category = {}
+# Force the empty category to be ID 0
+empty_category = {}
+empty_category['id'] = 0
+empty_category['name'] = 'empty'
+category_name_to_category['empty'] = empty_category
+next_category_id = 1
+images = []
+annotations = []
+error_images = []
+excluded_images = []
+# exif_im = exif_info[0]
+for exif_im in tqdm(exif_info):
+    fn_relative = exif_im['file_name']
+    assert '\\' not in fn_relative
+    if 'Split_images' in fn_relative or 'Exclusions' in fn_relative:
+        excluded_images.append(fn_relative)
+        continue
+    if 'error' in exif_im:
+        assert exif_im['error'] is not None
+        error_images.append(fn_relative)
+        continue
+    location_name = custom_relative_path_to_location(fn_relative)
+    exif_tags = exif_im['exif_tags']
+    # Convert '2021:05:27 14:42:00' to '2021-05-27 14:42:00'
+    datestamp = exif_tags['DateTime']
+    datestamp_tokens = datestamp.split(' ')
+    assert len(datestamp_tokens) == 2
+    date_string = datestamp_tokens[0]
+    time_string = datestamp_tokens[1]
+    assert len(date_string) == 10 and len(date_string.split(':')) == 3
+    date_string = date_string.replace(':','-')
+    assert len(time_string) == 8 and len(time_string.split(':')) == 3
+    datestamp_string = date_string + ' ' + time_string
+    datestamp_object = datetime.strptime(datestamp_string, '%Y-%m-%d %H:%M:%S')
+    assert str(datestamp_object) == datestamp_string
+    # E.g.:
+    #
+    # Images/Sorted_by_species/Testudines/Snapping Turtle/CBG10__2021-05-27__14-42-00(1).JPG'
+    common_name = os.path.basename(os.path.dirname(fn_relative)).lower().replace(' ','_')
+    if common_name in name_replacements:
+        common_name = name_replacements[common_name]
+    if common_name == 'blanks':
+        common_name = 'empty'
+    else:
+        assert common_name in common_to_latin
+    if common_name in category_name_to_category:
+        category = category_name_to_category[common_name]
+    else:
+        category = {}
+        category['name'] = common_name
+        category['latin_name'] = common_to_latin[common_name]
+        category['id'] = next_category_id
+        next_category_id += 1
+        category_name_to_category[common_name] = category
+    im = {}
+    im['id'] = fn_relative
+    im['file_name'] = fn_relative
+    im['datetime'] = datestamp_object
+    im['location'] = location_name
+    annotation = {}
+    annotation['id'] = 'ann_' + fn_relative
+    annotation['image_id'] = im['id']
+    annotation['category_id'] = category['id']
+    annotation['sequence_level_annotation'] = False
+    images.append(im)
+    annotations.append(annotation)
+# ...for each image
+cct_dict = {}
+cct_dict['images'] = images
+cct_dict['annotations'] = annotations
+cct_dict['categories'] = list(category_name_to_category.values())
+cct_dict['info'] = {}
+cct_dict['info']['version'] = '2024.10.03'
+cct_dict['info']['description'] = 'OSU small animals dataset'
+print('\nExcluded {} of {} images ({} errors)'.format(
+    len(excluded_images),
+    len(exif_info),
+    len(error_images)))
+assert len(images) == len(exif_info) - (len(error_images) + len(excluded_images))
+#%% Create sequences from timestamps
+from megadetector.data_management import cct_json_utils
+print('Assembling images into sequences')
+cct_json_utils.create_sequences(cct_dict)
+# Convert datetimes to strings so we can serialize to json
+for im in cct_dict['images']:
+    im['datetime'] = str(im['datetime'])
+#%% Write COCO data
+with open(output_file,'w') as f:
+    json.dump(cct_dict,f,indent=1)
+#%% Copy images (prep)
+from megadetector.utils.path_utils import parallel_copy_files
+input_file_to_output_file = {}
+# im = cct_dict['images'][0]
+for im in tqdm(cct_dict['images']):
+    fn_relative = im['file_name']
+    fn_source_abs = os.path.join(input_folder,fn_relative)
+    assert os.path.isfile(fn_source_abs)
+    fn_dest_abs = os.path.join(output_folder,fn_relative)
+    assert fn_source_abs not in input_file_to_output_file
+    input_file_to_output_file[fn_source_abs] = fn_dest_abs
+#%% Copy images (execution)
+parallel_copy_files(input_file_to_output_file, max_workers=10,
+                    use_threads=True, overwrite=False, verbose=False)
+#%% Validate .json file
+from megadetector.data_management.databases import integrity_check_json_db
+options = integrity_check_json_db.IntegrityCheckOptions()
+options.baseDir = input_folder
+options.bCheckImageSizes = False
+options.bCheckImageExistence = True
+options.bFindUnusedImages = True
+options.bRequireLocation = True
+sorted_categories, data, _ = integrity_check_json_db.integrity_check_json_db(output_file, options)
+#%% Preview labels
+from megadetector.visualization import visualize_db
+viz_options = visualize_db.DbVizOptions()
+viz_options.num_to_visualize = 5000
+viz_options.parallelize_rendering = True
+viz_options.htmlOptions['maxFiguresPerHtmlFile'] = 2500
+viz_options.parallelize_rendering_with_threads = True
+html_output_file, image_db = visualize_db.visualize_db(db_path=output_file,
+                                                       output_dir=preview_folder,
+                                                       image_base_dir=input_folder,
+                                                       options=viz_options)
+os.startfile(html_output_file)
+#%% Print unique locations
+all_locations = set()
+for im in cct_dict['images']:
+    all_locations.add(im['location'])
+all_locations = sorted(list(all_locations))
+#%% Notes
+"""
+ 31899 eastern_gartersnake
+ 14567 song_sparrow
+ 14169 meadow_vole
+ 11448 empty
+ 10548 white-footed_mouse
+  5934 northern_house_wren
+  5075 invertebrate
+  5045 common_five-lined_skink
+  4242 masked_shrew
+  3263 eastern_cottontail
+  2325 long-tailed_weasel
+  1510 woodland_jumping_mouse
+  1272 plains_gartersnake
+  1189 eastern_massasauga
+   985 virginia_opossum
+   802 common_yellowthroat
+   746 n._short-tailed_shrew
+   529 dekay's_brownsnake
+   425 american_mink
+   340 american_toad
+   293 eastern_racer_snake
+   264 smooth_greensnake
+   198 eastern_chipmunk
+   193 northern_leopard_frog
+   160 meadow_jumping_mouse
+   155 butler's_gartersnake
+   133 eastern_ribbonsnake
+   121 northern_watersnake
+   111 star-nosed_mole
+   104 striped_skunk
+    72 eastern_milksnake
+    68 gray_ratsnake
+    67 eastern_hog-nosed_snake
+    62 raccoon
+    47 green_frog
+    44 woodchuck
+    44 kirtland's_snake
+    23 indigo_bunting
+    23 painted_turtle
+    13 sora
+    12 american_bullfrog
+    10 gray_catbird
+     9 red-bellied_snake
+     8 brown_rat
+     6 snapping_turtle
+     1 eastern_bluebird
+"""
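
The added script derives a site code from each file name and normalizes the EXIF datestamp before building COCO records. The sketch below restates those two steps as standalone functions for illustration only. `site_from_filename` and `parse_exif_datetime` are hypothetical names, not part of the package. The datestamp step uses a single `strptime` call with the EXIF-style format rather than the script's string replacement, and the site step raises `ValueError` where the script uses assertions.

```python
import os
from datetime import datetime

def site_from_filename(relative_path):
    """Extract a site code such as 'FCS1' from an image path (sketch)."""
    bn = os.path.basename(relative_path).upper()
    # Per the script, files starting with 'RCNX' map to site 'OSTN'
    if bn.startswith('RCNX'):
        return 'OSTN'
    # The site code is everything before the first underscore...
    site = bn.split('_')[0]
    # ...unless the year is glued to it, e.g. 'BIWA4S2020-06-25_...'
    if '2020' in site:
        site = site.split('2020')[0]
    if len(site) > 8 or not site.isalnum():
        raise ValueError('Unexpected site code: {!r}'.format(site))
    return site

def parse_exif_datetime(datestamp):
    """Parse an EXIF-style 'YYYY:MM:DD HH:MM:SS' string into a datetime."""
    return datetime.strptime(datestamp, '%Y:%m:%d %H:%M:%S')

# File names taken from the comments in the script above
for name in ('FCS1__2019-07-08__10-37-46(1).JPG',
             'BIWA4S2020-06-25_16-19-56.JPG',
             'GRN3c__2019-05-05__01-39-23(1).JPG'):
    print(name, '->', site_from_filename(name))

print(parse_exif_datetime('2021:05:27 14:42:00'))
```

Note that the site code is upper-cased along with the rest of the file name, so `GRN3c` becomes `GRN3C`.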

{megadetector-5.0.19 → megadetector-5.0.20}/megadetector/data_management/lila/generate_lila_per_image_labels.py RENAMED

@@ -511,6 +511,6 @@ open_file(html_filename)
 #%% Zip output file
-zipped_output_file = zip_file(output_file,verbose=True)
+zipped_output_file = zip_file(output_file,verbose=True,overwrite=True)
 print('Zipped {} to {}'.format(output_file,zipped_output_file))

{megadetector-5.0.19 → megadetector-5.0.20}/megadetector/data_management/lila/get_lila_annotation_counts.py RENAMED

@@ -86,6 +86,8 @@ for ds_name in metadata_table.keys():
 #%% Get category names and counts for each dataset
+# Takes ~5 minutes
 from collections import defaultdict
 dataset_to_categories = {}

{megadetector-5.0.19 → megadetector-5.0.20}/megadetector/data_management/lila/lila_common.py RENAMED

@@ -50,12 +50,14 @@ for url in lila_base_urls.values():
 #%% Common functions
-def read_wildlife_insights_taxonomy_mapping(metadata_dir):
+def read_wildlife_insights_taxonomy_mapping(metadata_dir, force_download=False):
     """
     Reads the WI taxonomy mapping file, downloading the .json data (and writing to .csv) if necessary.
     Args:
         metadata_dir (str): folder to use for temporary LILA metadata files
+        force_download (bool, optional): download the taxonomy mapping file
+            even if the local file exists.
     Returns:
         pd.dataframe: A DataFrame with taxonomy information
@@ -67,7 +69,8 @@ def read_wildlife_insights_taxonomy_mapping(metadata_dir):
         df = pd.read_csv(wi_taxonomy_csv_path)
     else:
         wi_taxonomy_json_path = os.path.join(metadata_dir,wildlife_insights_taxonomy_local_json_filename)
-        download_url(wildlife_insights_taxonomy_url, wi_taxonomy_json_path)
+        download_url(wildlife_insights_taxonomy_url, wi_taxonomy_json_path,
+                     force_download=force_download)
         with open(wi_taxonomy_json_path,'r') as f:
             d = json.load(f)
@@ -93,12 +96,14 @@ def read_wildlife_insights_taxonomy_mapping(metadata_dir):
     return df
-def read_lila_taxonomy_mapping(metadata_dir):
+def read_lila_taxonomy_mapping(metadata_dir, force_download=False):
     """
     Reads the LILA taxonomy mapping file, downloading the .csv file if necessary.
     Args:
         metadata_dir (str): folder to use for temporary LILA metadata files
+        force_download (bool, optional): download the taxonomy mapping file
+            even if the local file exists.
     Returns:
         pd.DataFrame: a DataFrame with one row per identification
@@ -106,19 +111,22 @@ def read_lila_taxonomy_mapping(metadata_dir):
     p = urlparse(lila_taxonomy_mapping_url)
     taxonomy_filename = os.path.join(metadata_dir,os.path.basename(p.path))
-    download_url(lila_taxonomy_mapping_url, taxonomy_filename)
+    download_url(lila_taxonomy_mapping_url, taxonomy_filename,
+                 force_download=force_download)
     df = pd.read_csv(lila_taxonomy_mapping_url)
     return df
-def read_lila_metadata(metadata_dir):
+def read_lila_metadata(metadata_dir, force_download=False):
     """
     Reads LILA metadata (URLs to each dataset), downloading the .csv file if necessary.
     Args:
         metadata_dir (str): folder to use for temporary LILA metadata files
+        force_download (bool, optional): download the metadata file even if
+            the local file exists.
     Returns:
         dict: a dict mapping dataset names (e.g. "Caltech Camera Traps") to dicts
@@ -130,7 +138,6 @@ def read_lila_metadata(metadata_dir):
         - country
         - region
         - image_base_url_relative
-        - metadata_url_relative
         - bbox_url_relative
         - image_base_url_gcp
         - metadata_url_gcp
@@ -150,7 +157,7 @@ def read_lila_metadata(metadata_dir):
     # Put the master metadata file in the same folder where we're putting images
     p = urlparse(lila_metadata_url)
     metadata_filename = os.path.join(metadata_dir,os.path.basename(p.path))
-    download_url(lila_metadata_url, metadata_filename)
+    download_url(lila_metadata_url, metadata_filename, force_download=force_download)
     df = pd.read_csv(metadata_filename)
@@ -174,13 +181,15 @@ def read_lila_metadata(metadata_dir):
     return metadata_table
-def read_lila_all_images_file(metadata_dir):
+def read_lila_all_images_file(metadata_dir, force_download=False):
     """
     Downloads if necessary - then unzips if necessary - the .csv file with label mappings for
     all LILA files, and opens the resulting .csv file as a Pandas DataFrame.
     Args:
         metadata_dir (str): folder to use for temporary LILA metadata files
+        force_download (bool, optional): download the metadata file even if
+            the local file exists.
     Returns:
         pd.DataFrame: a DataFrame containing one row per identification in a LILA camera trap image
@@ -188,7 +197,8 @@ def read_lila_all_images_file(metadata_dir):
     p = urlparse(lila_all_images_url)
     lila_all_images_zip_filename = os.path.join(metadata_dir,os.path.basename(p.path))
-    download_url(lila_all_images_url, lila_all_images_zip_filename)
+    download_url(lila_all_images_url, lila_all_images_zip_filename,
+                 force_download=force_download)
     with zipfile.ZipFile(lila_all_images_zip_filename,'r') as z:
         files = z.namelist()
@@ -209,7 +219,8 @@ def read_metadata_file_for_dataset(ds_name,
                                    metadata_dir,
                                    metadata_table=None,
                                    json_url=None,
-                                   preferred_cloud='gcp'):
+                                   preferred_cloud='gcp',
+                                   force_download=False):
     """
     Downloads if necessary - then unzips if necessary - the .json file for a specific dataset.
@@ -222,6 +233,8 @@ def read_metadata_file_for_dataset(ds_name,
         json_url (str, optional): the URL of the metadata file, if None will be retrieved
             via read_lila_metadata()
         preferred_cloud (str, optional): 'gcp' (default), 'azure', or 'aws'
+        force_download (bool, optional): download the metadata file even if
+            the local file exists.
     Returns:
         str: the .json filename on the local disk
@@ -239,7 +252,7 @@ def read_metadata_file_for_dataset(ds_name,
     p = urlparse(json_url)
     json_filename = os.path.join(metadata_dir,os.path.basename(p.path))
-    download_url(json_url, json_filename)
+    download_url(json_url, json_filename, force_download=force_download)
     # Unzip if necessary
     if json_filename.endswith('.zip'):
@@ -266,7 +279,10 @@ if False:
     #%% Verify that all base URLs exist
     # LILA camera trap primary metadata file
-    urls = (lila_metadata_url,lila_taxonomy_mapping_url,lila_all_images_url,wildlife_insights_taxonomy_url)
+    urls = (lila_metadata_url,
+            lila_taxonomy_mapping_url,
+            lila_all_images_url,
+            wildlife_insights_taxonomy_url)
     from megadetector.utils import url_utils
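
The hunks above thread a `force_download` flag from each reader function through to `download_url`. Per the added docstrings, the flag requests a download even if the local file exists. A minimal sketch of that caching pattern follows. It is not the package's implementation: `fetch_with_cache` and its `fetch` callback are hypothetical, and the callback stands in for a real network request so the example runs offline.

```python
import os
import tempfile

def fetch_with_cache(url, destination, fetch, force_download=False):
    """
    Write fetch(url) to destination unless a cached copy can be reused.
    Returns True if the file was (re)downloaded, False if the cache was used.
    """
    if os.path.isfile(destination) and not force_download:
        return False
    data = fetch(url)
    os.makedirs(os.path.dirname(destination) or '.', exist_ok=True)
    with open(destination, 'wb') as f:
        f.write(data)
    return True

# Stand-in for a network call; records each URL it is asked to fetch
calls = []
def fake_fetch(url):
    calls.append(url)
    return b'payload %d' % len(calls)

url = 'https://example.invalid/metadata.csv'  # placeholder URL (assumption)
with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, 'metadata.csv')
    first = fetch_with_cache(url, target, fake_fetch)
    second = fetch_with_cache(url, target, fake_fetch)
    third = fetch_with_cache(url, target, fake_fetch, force_download=True)

print(first, second, third, len(calls))
```

Defaulting the flag to `False`, as the changed signatures do, keeps existing callers on the cached path while letting a caller opt in to a refresh.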

{megadetector-5.0.19 → megadetector-5.0.20}/megadetector/data_management/lila/test_lila_metadata_urls.py RENAMED

@@ -35,15 +35,17 @@ md_results_keys = ['mdv4_results_raw','mdv5a_results_raw','mdv5b_results_raw','m
 preferred_cloud = 'gcp' # 'azure', 'aws'
+force_download = True
 #%% Load category and taxonomy files
-taxonomy_df = read_lila_taxonomy_mapping(metadata_dir)
+taxonomy_df = read_lila_taxonomy_mapping(metadata_dir, force_download=force_download)
 #%% Download and parse the metadata file
-metadata_table = read_lila_metadata(metadata_dir)
+metadata_table = read_lila_metadata(metadata_dir, force_download=force_download)
 print('Loaded metadata URLs for {} datasets'.format(len(metadata_table)))
@@ -52,17 +54,24 @@ print('Loaded metadata URLs for {} datasets'.format(len(metadata_table)))
 for ds_name in metadata_table.keys():
-    metadata_table[ds_name]['json_filename'] = read_metadata_file_for_dataset(ds_name=ds_name,
-                                                                         metadata_dir=metadata_dir,
-                                                                         metadata_table=metadata_table)
+    # Download the main metadata file for this dataset
+    metadata_table[ds_name]['json_filename'] = \
+        read_metadata_file_for_dataset(ds_name=ds_name,
+                                       metadata_dir=metadata_dir,
+                                       metadata_table=metadata_table,
+                                       force_download=force_download)
+    # Download MD results for this dataset
     for k in md_results_keys:
         md_results_url = metadata_table[ds_name][k]
         if md_results_url is None:
             metadata_table[ds_name][k + '_filename'] = None
         else:
-            metadata_table[ds_name][k + '_filename'] = read_metadata_file_for_dataset(ds_name=ds_name,
-                                                                        metadata_dir=md_results_dir,
-                                                                        json_url=md_results_url)
+            metadata_table[ds_name][k + '_filename'] = \
+                read_metadata_file_for_dataset(ds_name=ds_name,
+                                               metadata_dir=md_results_dir,
+                                               json_url=md_results_url,
+                                               force_download=force_download)
 #%% Build up a list of URLs to test
