Moad Computer, the actionable insights company
  • Home
  • Contact
  • Shop
  • Blog

Actionable Insights blog




Twitter's controversial saliency filter analyzed

8/30/2021

Dr. Rahul Remanan,
CEO, Moad Computer
This blog article on Twitter saliency filter analysis introduces a few broad concepts for testing machine vision tools. Described here is an end-to-end automated statistical analysis tool that is used to analyze the Twitter saliency filter. The aim is to accelerate the development of scalable, automated testing of machine vision algorithms for possible biases.
An extremely simple image processing pipeline is used here to carefully manipulate the input images. The goal of this tool is to analyze the possible variances in the saliency filter outputs following simple image manipulation techniques.
One of the basic requirements for a robust machine vision tool is that its outputs are invariant to common image processing techniques.
Image processing techniques such as padding (the introduction of additional pixels that carry no new meaningful image data), rotation, the addition of noise, and color or saturation changes are ubiquitous in real-world usage.
Therefore, if a machine vision tool is not robust to these techniques, it can undermine user confidence in the tool. For example, when a highly input-dependent machine vision tool produces different outputs for similar-looking images, a non-technical end user could attribute those variations to a variety of unrelated factors.
The key motivation behind building this tool was to create an objective measure of the performance of machine vision tools. One of the most hotly debated topics around machine vision and artificial intelligence is the possibility of algorithmic bias.
But in order to have a meaningful conversation about algorithmic bias, accurately quantifying the basic performance characteristics of the algorithm in question is a necessity. Otherwise, many of these observations about algorithmic failures could be the result of attribution biases.
Essentially, the search for biases in the algorithm itself ends up exposing the inherent fears and biases of our society. Therefore, the ideal starting point for evaluating the Twitter saliency filter, or for that matter any machine vision tool, is to quantify the invariance of the tool to basic image manipulations.
Once the tool's robustness to these basic image manipulations is established, the next step is to understand the nuances of the observed output variances, such as the possibility that algorithmic bias contributes to skewed outputs.
In this tool, the input images are manipulated using padding. Two types of padding are used, horizontal and vertical, and both are applied to randomly paired images.
The dataset used here is FairFace, a face attribute dataset that is balanced for gender, race and age. This dataset is used to generate the fully randomized image pairs for the saliency filter tests.
The statistical significance of the differences between the carefully manipulated saliency filter outputs and the baseline saliency filter outputs is quantified using the Wilcoxon signed-rank test.
Additional requirements to run this tool:
  • Valid Google account
  • This notebook by default assumes that the user is working inside the original Google Colab environment. To run locally or in other cloud environments, please make sure that the data dependencies are satisfied.
  • Google Drive access to save the FairFace dataset and the experiment history

Install the Twitter saliency filter tool

Import the libraries required to run the code

Mount Google Drive
By default, this notebook assumes that the FairFace dataset is stored in the Google Drive attached to it. The experiment histories are also saved, in CSV format, to the Google Drive attached to this Colab notebook.
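As a minimal sketch of this setup, assuming the current Colab default mount point /content/drive/MyDrive and a hypothetical folder name `fairface_experiments`, the data directory could be resolved as follows. Outside Colab the `google.colab` import fails and the sketch falls back to a local directory; the helper `resolve_data_dir` is illustrative, not the notebook's own code.

```python
from pathlib import Path


def resolve_data_dir(folder_name: str = "fairface_experiments",
                     local_fallback: str = "./data") -> Path:
    """Return the directory used for the FairFace data and experiment history.

    Inside Google Colab, Google Drive is mounted at /content/drive and the
    folder is created under MyDrive. Elsewhere, a local directory is used.
    The folder name is a hypothetical placeholder.
    """
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        base = Path(local_fallback)
    else:
        drive.mount("/content/drive")
        base = Path("/content/drive/MyDrive")
    data_dir = base / folder_name
    data_dir.mkdir(parents=True, exist_ok=True)  # create it on first use
    return data_dir
```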
Data download
Download the FairFace dataset fairface-img-margin125-trainval.zip file and the labels fairface_label_train.csv file from the official FairFace GitHub repo. The download links are published in the README file of the FairFace GitHub repository.

Helper functions to handle the FairFace dataset
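One helper such a notebook typically needs is unpacking the downloaded archive. The sketch below assumes the zip file is extracted into the {img_dir}/FairFace directory that this notebook expects; the function name `extract_fairface` is hypothetical.

```python
import zipfile
from pathlib import Path


def extract_fairface(zip_path: str, img_dir: str) -> Path:
    """Extract the FairFace image archive into {img_dir}/FairFace.

    Extraction is skipped if the target directory already contains files,
    so re-running the notebook does not unpack the archive twice.
    """
    target = Path(img_dir) / "FairFace"
    if target.is_dir() and any(target.iterdir()):
        return target  # already extracted
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target
```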

Read FairFace data
To run the tool described here, the FairFace dataset should be downloaded and placed inside the {img_dir}/FairFace directory. By default, the Google Colab notebook uses the fairface-img-margin125-trainval.zip data.
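A minimal sketch for reading the labels, assuming fairface_label_train.csv has a header row with `file`, `age`, `gender` and `race` columns, where `file` is a relative path such as `train/1.jpg` inside the extracted image folder. Check these column names against the downloaded file; the function `read_fairface_labels` is hypothetical and uses only the standard library.

```python
import csv
from pathlib import Path


def read_fairface_labels(labels_csv: str, image_root: str) -> list[dict]:
    """Read FairFace labels and attach the full path of each image.

    Assumes the CSV columns 'file', 'age', 'gender' and 'race'; any other
    columns in the file are ignored.
    """
    root = Path(image_root)
    records = []
    with open(labels_csv, newline="") as fh:
        for row in csv.DictReader(fh):
            records.append({
                "path": root / row["file"],  # e.g. {image_root}/train/1.jpg
                "age": row["age"],
                "gender": row["gender"],
                "race": row["race"],
            })
    return records
```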

Perform basic checks on the FairFace dataset
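What the notebook's basic checks cover is not described here; the sketch below assumes three plausible ones: every labelled image exists on disk, no image is listed twice, and every record has non-empty `age`, `gender` and `race` values. Records are assumed to be dicts holding a `path` plus those attributes, and `check_fairface_records` is a hypothetical name.

```python
from collections import Counter
from pathlib import Path


def check_fairface_records(records: list[dict]) -> list[str]:
    """Return a list of human-readable problems found in the label records.

    Each record is expected to hold a 'path' plus 'age', 'gender' and 'race'.
    An empty list means all checks passed.
    """
    problems = []
    # Duplicate check: the same image should not be labelled twice.
    counts = Counter(str(r["path"]) for r in records)
    for path, n in counts.items():
        if n > 1:
            problems.append(f"duplicate entry ({n}x): {path}")
    for r in records:
        # Existence check: every labelled image must be on disk.
        if not Path(r["path"]).is_file():
            problems.append(f"missing image: {r['path']}")
        # Completeness check: no empty attribute values.
        for key in ("age", "gender", "race"):
            if not r.get(key):
                problems.append(f"empty {key} for: {r['path']}")
    return problems
```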

Generate random FairFace image pairings

Numerical encoding of the FairFace labels
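One simple way to encode the categorical labels, sketched below as an assumption rather than the notebook's actual scheme, is to sort each attribute's distinct values and map them to integers 0 to K-1, which makes the mapping deterministic for a given label set. Both helper names are hypothetical.

```python
def build_label_encoders(records, keys=("age", "gender", "race")):
    """Map each categorical attribute value to an integer code.

    Distinct values are sorted alphabetically, so the same label set always
    produces the same codes.
    """
    return {
        key: {value: code
              for code, value in enumerate(sorted({r[key] for r in records}))}
        for key in keys
    }


def encode_record(record, encoders):
    """Return a copy of the record with attribute values replaced by codes."""
    encoded = dict(record)
    for key, mapping in encoders.items():
        encoded[key] = mapping[record[key]]
    return encoded
```

Sorting strings alphabetically does not preserve the natural order of age bands (for example, "3-9" sorts after "20-29"), so an explicit ordering would be needed if the age codes are meant to be ordinal.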

Generate pairwise image comparisons using the Twitter saliency filter
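To pass a pair of faces through the saliency filter, the two images have to be combined into a single image. The sketch below assumes images are NumPy arrays of shape (height, width, channels) and joins them side by side or stacked, with an optional constant-valued gap between them. `join_pair` and its parameters are hypothetical and are not part of Twitter's tool.

```python
import numpy as np


def join_pair(img_a: np.ndarray, img_b: np.ndarray, axis: int = 1,
              gap: int = 0, fill: int = 0) -> np.ndarray:
    """Join two (H, W, C) images horizontally (axis=1) or vertically (axis=0).

    The images are padded at the bottom/right so their non-joined dimensions
    match, and a `gap`-pixel band of `fill` is inserted between them.
    """
    if axis not in (0, 1):
        raise ValueError("axis must be 0 (vertical) or 1 (horizontal)")
    other = 1 - axis  # the dimension that must match across both images
    size = max(img_a.shape[other], img_b.shape[other])

    def pad_to(img):
        widths = [(0, 0)] * img.ndim
        widths[other] = (0, size - img.shape[other])
        return np.pad(img, widths, mode="constant", constant_values=fill)

    a, b = pad_to(img_a), pad_to(img_b)
    gap_shape = list(a.shape)
    gap_shape[axis] = gap
    spacer = np.full(gap_shape, fill, dtype=a.dtype)
    return np.concatenate([a, spacer, b], axis=axis)
```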

Crop the output image generated using a pair of FairFace images 
  • The top 3 crops are sampled based on saliency scores converted into probabilities using the following formula:
$$ p_i = \frac{\exp(s_i)}{Z}, \qquad Z = \sum_{j=0}^{N} \exp(s_j) $$
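The formula above is the softmax function over the crop saliency scores. A minimal sketch of the sampling step, assuming the candidate crops' scores are available as a 1-D array, is shown below; subtracting the maximum score before exponentiating leaves every p_i unchanged but prevents overflow. Sampling without replacement via NumPy's `Generator.choice` is an illustrative choice, not necessarily what the notebook does, and `sample_top_crops` is a hypothetical name.

```python
import numpy as np


def sample_top_crops(scores, k=3, rng=None):
    """Sample k distinct crop indices with probability p_i = exp(s_i) / Z."""
    scores = np.asarray(scores, dtype=float)
    # Softmax; subtracting the max does not change p_i but prevents overflow.
    weights = np.exp(scores - scores.max())
    probs = weights / weights.sum()
    rng = np.random.default_rng() if rng is None else rng
    # Sampling without replacement so the k crops are distinct.
    indices = rng.choice(len(scores), size=k, replace=False, p=probs)
    return indices, probs
```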

Mapping the saliency filter output to FairFace data
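One way to map the filter output back to the FairFace pair is to check which of the two joined images contains the most salient point. The sketch assumes the first image starts at pixel 0 along the joining axis and is followed by a gap of `gap` padding pixels, and that the salient point is given as (x, y) pixel coordinates; this layout and the function name `locate_salient_image` are hypothetical.

```python
def locate_salient_image(point, size_a, gap, axis=1):
    """Return 0 if the salient point falls on the first image, 1 if it falls
    on the second image, or None if it falls in the gap between them.

    point  -- (x, y) pixel coordinates of the most salient point
    size_a -- width (axis=1) or height (axis=0) of the first image in pixels
    gap    -- number of padding pixels between the two images
    """
    x, y = point
    coord = x if axis == 1 else y  # position along the joining axis
    if coord < size_a:
        return 0
    if coord < size_a + gap:
        return None
    return 1
```

The returned index can then be used to look up the FairFace attributes of the image the filter favoured, while None flags outputs centred on padding.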

Evaluate horizontal and vertical padding invariance
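Horizontal and vertical padding can be sketched with `numpy.pad`. Here horizontal padding is assumed to add constant-valued columns on the left and right, and vertical padding to add rows at the top and bottom; the `amount` parameter and the name `pad_image` are hypothetical, since how the notebook chooses the padding size is not described.

```python
import numpy as np


def pad_image(img: np.ndarray, amount: int, direction: str = "horizontal",
              fill: int = 0) -> np.ndarray:
    """Pad an (H, W, C) image with `amount` pixels of `fill` on both sides.

    direction='horizontal' adds columns (left and right);
    direction='vertical' adds rows (top and bottom).
    Only constant-valued pixels are added; no new image content.
    """
    widths = [(0, 0)] * img.ndim
    if direction == "horizontal":
        widths[1] = (amount, amount)
    elif direction == "vertical":
        widths[0] = (amount, amount)
    else:
        raise ValueError("direction must be 'horizontal' or 'vertical'")
    return np.pad(img, widths, mode="constant", constant_values=fill)
```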

Randomized saliency filter testing for padding invariance
Null hypothesis for the random image pairs experiment
H₀: There are no differences between the baseline outputs of the saliency filter and the saliency filter outputs following randomized image padding.
Methodology for generating randomized image pairs from FairFace data
The random image pairs for the pairwise comparisons are generated using the random.SystemRandom() class from the Python random library.
The random.SystemRandom() class draws on os.urandom(), so the exact image pairings always depend on random numbers provided by operating system sources. Because it does not rely on software state, seeding has no effect and the image pairing sequences are not reproducible. This method of random number generation is also not available on all systems.
The goal of this experiment is to identify any statistically significant differences between the saliency filter outputs for the baseline image pairs and the saliency filter outputs following randomized image padding. Therefore, the exact image pairing sequences used for the comparisons are immaterial to the reproducibility of this experiment.
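A minimal sketch of the pairing step with `random.SystemRandom()` follows. Drawing a sample of distinct images and splitting it into consecutive pairs is an assumption about how pairs are formed, and `random_image_pairs` is a hypothetical name; because the generator draws from the operating system, every call returns different pairs, consistent with the non-reproducible design described above.

```python
import random


def random_image_pairs(image_ids, n_pairs):
    """Draw n_pairs pairs of distinct images using OS-provided randomness.

    A random sample of 2 * n_pairs distinct images is drawn and split into
    consecutive pairs, so no image appears in more than one pair per call.
    """
    if 2 * n_pairs > len(image_ids):
        raise ValueError("not enough images for the requested number of pairs")
    sys_rng = random.SystemRandom()  # backed by os.urandom(); not seedable
    chosen = sys_rng.sample(list(image_ids), 2 * n_pairs)
    return [(chosen[i], chosen[i + 1]) for i in range(0, len(chosen), 2)]
```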

Calculate statistical significance
The Wilcoxon signed-rank test is used to determine whether there are any statistically significant differences between the baseline saliency filter outputs and the saliency filter outputs following image padding. The test is performed using the SciPy library.
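A minimal sketch using `scipy.stats.wilcoxon`, assuming one numeric measurement per image pair from the baseline run and the matching measurement from the padded run (for example, the saliency score of a fixed image in each pair). Which quantity the notebook actually compares is not stated here, the 0.05 significance level is a conventional default rather than a value from the source, and `padding_effect_test` is a hypothetical name.

```python
from scipy.stats import wilcoxon


def padding_effect_test(baseline, padded, alpha=0.05):
    """Wilcoxon signed-rank test on paired baseline vs padded measurements.

    Returns the test statistic, the two-sided p-value, and whether H0
    (no difference between baseline and padded outputs) is rejected at alpha.
    """
    if len(baseline) != len(padded):
        raise ValueError("baseline and padded must be paired (same length)")
    result = wilcoxon(baseline, padded)  # two-sided by default
    return result.statistic, result.pvalue, result.pvalue < alpha
```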

Save experiment results and run tests
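A minimal sketch of saving results, assuming each experiment's summary is a flat dict appended as one row of a CSV history file; the header is written only when the file is new. The function name and any column names are hypothetical, since the notebook's history format is not described here.

```python
import csv
import os
from datetime import datetime, timezone


def append_experiment_result(history_csv, result):
    """Append one result row (a dict) to a CSV history file.

    A UTC timestamp is added, and the header is written only when the file
    does not exist yet or is empty.
    """
    row = {"timestamp": datetime.now(timezone.utc).isoformat(), **result}
    write_header = (not os.path.exists(history_csv)
                    or os.path.getsize(history_csv) == 0)
    with open(history_csv, "a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(row))
        if write_header:
            writer.writeheader()
        writer.writerow(row)
```

Because `csv.DictWriter` raises an error for keys that are not in its field names, every call appending to the same file should pass the same keys.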

TLDR: Described here is an end-to-end automated testing tool for the Twitter saliency filter. This tool quantifies the statistical significance of differences between the saliency filter outputs for baseline image pairs and for image pairs that are carefully manipulated using horizontal and vertical padding.
If your organization needs to simplify complex data solutions, or your next data science or artificial intelligence project needs our assistance, feel free to fill out our consultation intake form (about a one-minute task).
Interested in learning more or want to contribute? Please check out the project repository.
GitHub repository

    Overview

    Moad Computer is an actionable insights firm. We provide enterprises with end-to-end artificial intelligence solutions. Actionable Insights blog is a quick overview of things we are most excited about.

Our mission:

Cutting edge, insightful analytics using AI, for everyone.
