This is a brief overview of the most common image-processing tasks that can be performed using the PyTorch Torchvision package. These tasks include image classification, localisation, object detection, instance segmentation, and semantic segmentation. Several models, such as ResNet-50, Faster R-CNN, and FCN, are used to demonstrate these tasks. The weights used in these models are the default weights available in Torchvision. The code examples in this overview are based on the examples provided in the PyTorch documentation and use Torchvision version 0.16.
Classification
Image classification assigns a single class to a whole image using a model trained with predefined classes. The model predicts the probability of all classes it has been trained on and classifies based on the highest probability (and/or on a minimum threshold).
The image of a cat shown is classified with a 51.9% probability of being a “tiger cat”. The other classes in the top five are tabby (4.9%), Egyptian cat (1.1%), Persian cat (0.2%), and Leonberg (0.1%).
Image Classification Example
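The step from raw model scores to a predicted class can be sketched in plain Python. This is a minimal illustration rather than the Torchvision API: the function names `softmax` and `classify`, the `min_prob` parameter, and the logits are all made up for the example. Torchvision classification models return raw scores (logits), which are usually converted to probabilities with a softmax before the top classes are picked.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, class_names, top_k=5, min_prob=0.0):
    """Return (label, top_k list). The label is None if the best
    probability falls below the hypothetical min_prob threshold."""
    probs = softmax(logits)
    ranked = sorted(zip(class_names, probs), key=lambda pair: pair[1], reverse=True)
    top = ranked[:top_k]
    best_name, best_prob = top[0]
    label = best_name if best_prob >= min_prob else None
    return label, top


# Made-up logits for three classes, for illustration only.
names = ["tiger cat", "tabby", "Egyptian cat"]
logits = [2.0, 1.0, 0.1]

label, top = classify(logits, names, top_k=2, min_prob=0.5)
print(label)
for name, prob in top:
    print(f"{name}: {prob:.1%}")
```

Raising `min_prob` makes the classifier abstain (return `None`) when no class is confident enough, which is the "minimum threshold" option mentioned above.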
Image Classification + Localisation
Localisation extends image classification by locating the object in the image and drawing a bounding box around the object.
Image Localisation with Bounding Box
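One way to think about localisation is as a detector whose output is reduced to the single most confident box. The sketch below assumes that approach and is not necessarily how the figure above was produced. The dictionary mimics the `boxes`, `labels`, and `scores` keys that Torchvision detection models return, with boxes as `(xmin, ymin, xmax, ymax)`, but it uses plain lists and made-up values. The `localise` function is a hypothetical helper, and the labels are assumed to be already mapped from class indices to names.

```python
def localise(detection):
    """Return (label, score, box) for the highest-scoring detection,
    or None if nothing was detected."""
    scores = detection["scores"]
    if not scores:
        return None
    best = max(range(len(scores)), key=lambda i: scores[i])
    return detection["labels"][best], scores[best], detection["boxes"][best]


# Made-up detector output; boxes are (xmin, ymin, xmax, ymax) in pixels.
sample = {
    "boxes": [[10, 20, 110, 220], [5, 5, 50, 50]],
    "labels": ["cat", "dog"],
    "scores": [0.98, 0.40],
}

result = localise(sample)
print(result)
```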
Object Detection
Object detection finds multiple objects in an image and classifies each one. In the example below, a bounding box is drawn around each detected object.
Object Detection Example
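Detectors typically return many candidate boxes, each with a confidence score, and low-scoring ones are discarded before drawing. A minimal sketch of that filtering step follows. The data is made up, `filter_detections` and `min_score` are hypothetical names, and the dictionary only imitates the `boxes`/`labels`/`scores` layout of Torchvision detection output using plain lists.

```python
from collections import Counter


def filter_detections(detection, min_score=0.5):
    """Keep only detections whose score is at least min_score."""
    return [
        (label, score, box)
        for label, score, box in zip(
            detection["labels"], detection["scores"], detection["boxes"]
        )
        if score >= min_score
    ]


# Made-up detector output; boxes are (xmin, ymin, xmax, ymax) in pixels.
sample = {
    "boxes": [[10, 20, 110, 220], [120, 30, 300, 240], [0, 0, 15, 15]],
    "labels": ["cat", "dog", "dog"],
    "scores": [0.98, 0.91, 0.12],
}

kept = filter_detections(sample, min_score=0.5)
counts = Counter(label for label, _, _ in kept)
print(kept)
print(counts)
```

The threshold is a trade-off: a lower value shows more objects but also more false positives.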
Instance Segmentation
Instance segmentation goes beyond the bounding boxes of object detection and outlines the boundary of each object pixel by pixel. This is useful in image editing or enhancement, where you want to cut out a particular object or apply a filter to it.
Instance Segmentation Example
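The "cut out an object" use case can be sketched with a per-instance mask. Instance segmentation models such as Mask R-CNN produce a soft mask per object with values between 0 and 1, which is commonly thresholded at 0.5 to get a binary mask. The example below uses nested lists and made-up values instead of tensors, and `binarise` and `cut_out` are hypothetical helper names.

```python
def binarise(soft_mask, threshold=0.5):
    """Turn a soft mask (values in 0..1) into a boolean mask."""
    return [[value >= threshold for value in row] for row in soft_mask]


def cut_out(image, mask, background=0):
    """Keep pixels where the mask is True; replace the rest with background."""
    return [
        [pixel if keep else background for pixel, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


# A made-up 3x3 greyscale image and a soft mask for one object instance.
image = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
]
soft_mask = [
    [0.1, 0.9, 0.2],
    [0.8, 0.95, 0.7],
    [0.0, 0.6, 0.3],
]

mask = binarise(soft_mask)
isolated = cut_out(image, mask)
for row in isolated:
    print(row)
```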
Semantic Segmentation
Semantic segmentation assigns a class to every pixel in the image. One application is categorising land use in aerial images, where each pixel is assigned a class such as building, water, or forest.
Semantic Segmentation Example
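Assigning a class to every pixel usually comes down to taking, for each pixel, the class with the highest score. The sketch below shows that per-pixel argmax on nested lists. The scores and class names are made up, and `label_map` is a hypothetical helper. A Torchvision segmentation model such as FCN returns one score map per class, laid out as classes by height by width for each image, which is the layout assumed here.

```python
def label_map(class_scores):
    """Given scores indexed as [class][row][col], return a [row][col]
    grid holding the index of the highest-scoring class per pixel."""
    num_classes = len(class_scores)
    height = len(class_scores[0])
    width = len(class_scores[0][0])
    return [
        [
            max(range(num_classes), key=lambda c: class_scores[c][y][x])
            for x in range(width)
        ]
        for y in range(height)
    ]


# Made-up scores for a 2x2 image and three classes.
class_names = ["background", "cat", "dog"]
scores = [
    [[0.90, 0.20], [0.10, 0.30]],  # background
    [[0.05, 0.70], [0.20, 0.30]],  # cat
    [[0.05, 0.10], [0.70, 0.40]],  # dog
]

labels = label_map(scores)
named = [[class_names[c] for c in row] for row in labels]
print(named)
```

Unlike instance segmentation, this output does not separate two objects of the same class: every "cat" pixel gets the same label.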
The example uses a model trained on a fixed set of classes, so only the dog and cat are identified. A second example, generated with Meta AI’s Segment Anything, better illustrates the idea of segmenting every pixel in the image.
Segment Anything Example
Summary
In this post, we have explored various image-processing tasks and provided code examples using the Torchvision library. The classification and localisation examples were for a single class, while the object detection and segmentation examples were for multiple classes in a single image.
We are already using AI models whether we are conscious of it or not. Models are increasingly being embedded in all sorts of applications including those making high impact decisions. Trusting these models will be a key to their adoption.