SpeciesNet

Created 27/03/2025, 22:26

Last updated 15/04/2025, 22:15

AI models trained by Google to classify species in images from motion-triggered wildlife cameras.

Effective wildlife monitoring relies heavily on motion-triggered wildlife cameras, or “camera traps”, which generate vast quantities of image data. Manual processing of these images is a significant bottleneck. AI can accelerate that processing, helping conservation practitioners spend more time on conservation and less time reviewing images.

This repository hosts code for running an ensemble of two AI models: (1) an object detector that finds objects of interest in wildlife camera images, and (2) an image classifier that classifies those objects to the species level. This ensemble is used for species recognition in the Wildlife Insights platform.

The object detector used in this ensemble is MegaDetector, which finds animals, humans, and vehicles in camera trap images, but does not classify animals to species level.

The species classifier (SpeciesNet) was trained at Google using a large dataset of camera trap images and an EfficientNet V2 M architecture. It is designed to classify images into one of more than 2000 labels, covering diverse animal species, higher-level taxa (like "mammalia" or "felidae"), and non-animal classes ("blank", "vehicle"). SpeciesNet was trained on a geographically diverse dataset of over 65M images, including curated images from the Wildlife Insights user community as well as images from publicly available repositories.

The SpeciesNet ensemble combines these two models using a set of heuristics and, optionally, geographic information to assign each image to a single category. See the "ensemble decision-making" section for more information about how the ensemble combines information for each image to make a single prediction.
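
To make the idea of combining a detector and a classifier concrete, here is a minimal Python sketch. It is not the actual SpeciesNet ensemble logic, which is described in the "ensemble decision-making" section and in the paper. The `Detection` and `Classification` types, the `combine` function, the threshold values, and the fallback order are all hypothetical and chosen only for illustration. The sketch assumes the classifier returns scored labels, the detector returns scored categories ("animal", "human", "vehicle"), and geographic information arrives as an optional set of labels plausible for the image's location.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Set

# Hypothetical: non-animal labels are never removed by the geographic filter.
NON_ANIMAL_LABELS = {"blank", "vehicle"}


@dataclass
class Detection:
    category: str      # "animal", "human", or "vehicle"
    confidence: float  # detector confidence in [0, 1]


@dataclass
class Classification:
    label: str    # species, higher-level taxon, or non-animal class
    score: float  # classifier score in [0, 1]


def combine(
    detections: Sequence[Detection],
    classifications: Sequence[Classification],
    allowed_labels: Optional[Set[str]] = None,
    det_threshold: float = 0.5,   # assumed value, not from SpeciesNet
    cls_threshold: float = 0.65,  # assumed value, not from SpeciesNet
) -> str:
    """Assign one category to an image using illustrative heuristics."""
    # Step 1: optionally drop labels that are implausible for the location.
    candidates = [
        c for c in classifications
        if allowed_labels is None
        or c.label in allowed_labels
        or c.label in NON_ANIMAL_LABELS
    ]

    # Step 2: trust the classifier when its best remaining label is confident.
    top = max(candidates, key=lambda c: c.score, default=None)
    if top is not None and top.score >= cls_threshold:
        return top.label

    # Step 3: otherwise fall back to the most confident detector category.
    confident = [d for d in detections if d.confidence >= det_threshold]
    best = max(confident, key=lambda d: d.confidence, default=None)
    if best is not None:
        return best.category

    # Step 4: nothing confident from either model.
    return "blank"
```

For example, with one confident "animal" detection and a top classifier label of "felidae" at 0.8, this sketch returns "felidae". If a location filter excludes "felidae", it falls back to the detector's "animal" category rather than committing to a species.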

The full details of the models and the ensemble process are discussed in this research paper:

Gadot T, Istrate Ș, Kim H, Morris D, Beery S, Birch T, Ahumada J. To crop or not to crop: Comparing whole-image and cropped classification on a large dataset of camera trap images. IET Computer Vision. 2024 Dec;18(8):1193-208.