In this post, the attention mechanism for neural networks is explained in two parts. The first part builds an intuitive understanding of the attention mechanism; the second part implements an attention function using TensorFlow-Keras in Python 3.
Part 01 -- The intuition behind attention mechanism in neural networks
The attention mechanism in neural networks is often explained by comparing it to the select operation in a database. Therefore, to understand the attention mechanism, let us take a closer look at the database select operation.
Consider a one-hot encoded vector of length 'n' as the input query. In this example, the vector has exactly one non-zero value, and the position of that non-zero value is what conveys meaning.
Now, consider a simple key-value database, where the keys are all the possible one-hot encoded vectors of length 'n'. Since each vector has a single non-zero entry, there are exactly 'n' such keys.
The goal of the select operation is to find the key that is the closest match to the input query and return the corresponding values associated with that key.
For purposes of clarity, consider this concrete example:
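The matrices below are one possible illustrative example (the specific numbers are assumptions, not taken from the original post); they are chosen so that the query matches the 4th key, consistent with the walkthrough that follows.

```python
import numpy as np

# Illustrative example (assumed values): a one-hot query of length 4,
# a keys matrix whose rows are all 4 possible one-hot vectors,
# and a values matrix with one value row per key.
query = np.array([1, 0, 0, 0])

keys = np.array([[0, 1, 0, 0],
                 [0, 0, 1, 0],
                 [0, 0, 0, 1],
                 [1, 0, 0, 0]])   # the 4th row matches the query

values = np.array([[7, 2],
                   [5, 9],
                   [3, 8],
                   [4, 6]])       # the 4th row is the value we want back
```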
One of the easiest ways of matching the example query with the example keys is to matrix-multiply the query with the transpose of the keys matrix. Let us call the output of this similarity function the score.
If the input query matches the 'i'th key, the operation returns a vector with a 1 at the 'i'th position and zeros everywhere else.
In the example above, the input query matches with the 4th row in the keys matrix.
This is how the query-keys similarity operation described above looks, using NumPy:
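A minimal NumPy sketch of the score computation (the example matrices are illustrative assumptions):

```python
import numpy as np

query = np.array([1, 0, 0, 0])

keys = np.array([[0, 1, 0, 0],
                 [0, 0, 1, 0],
                 [0, 0, 0, 1],
                 [1, 0, 0, 0]])

# Similarity: matrix-multiply the query with the transpose of the keys.
score = np.matmul(query, keys.T)
print(score)  # [0 0 0 1] -- a 1 at the 4th position, zeros elsewhere
```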
Now that we have defined a mathematical operation that returns the location of the row in the keys matrix that matches the input query, the next step is to extract the value that corresponds to that key from the values matrix.
Since the score already encodes which key matched the query, multiplying the score with the values matrix extracts the value that corresponds to the matched key.
The operation above returns the 'i'th row of the values matrix if the 'i'th row of the keys matrix matches the input query.
In the example above, the output is the 4th row in the values matrix.
This is how the operation to extract the value that corresponds to a matched key looks, using NumPy:
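A minimal NumPy sketch of the value-extraction step, continuing with the same illustrative matrices:

```python
import numpy as np

# score from the similarity step: a 1 at the 4th position
score = np.array([0, 0, 0, 1])

values = np.array([[7, 2],
                   [5, 9],
                   [3, 8],
                   [4, 6]])

# Multiplying the score with the values matrix picks out the matched row.
output = np.matmul(score, values)
print(output)  # [4 6] -- the 4th row of the values matrix
```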
Part 02 -- Example function implementing the attention mechanism in Keras
The attention mechanism in neural networks closely resembles the simple select operation described above. The key difference is that a neural network implements these operations in a probabilistic (soft) fashion: the hard 0/1 score is replaced by a softmax over the similarity scores, so the output is a weighted average of all the values rather than a single selected row. This soft selection is what we call attention.
Here is an example implementation of the neural network attention mechanism using Tensorflow-Keras:
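A minimal sketch of such a function using TensorFlow ops (the function name and example tensors are assumptions; this implements standard scaled dot-product attention, where a softmax turns the scores into a probability distribution over the keys):

```python
import tensorflow as tf

def attention(query, keys, values):
    """Scaled dot-product attention.

    Instead of a hard 0/1 match, a softmax over the similarity scores
    produces a probability distribution over the keys, and the output
    is the corresponding weighted average of the values.
    """
    d_k = tf.cast(tf.shape(keys)[-1], tf.float32)
    # Similarity scores, scaled by sqrt(key dimension) for numerical stability.
    scores = tf.matmul(query, keys, transpose_b=True) / tf.sqrt(d_k)
    weights = tf.nn.softmax(scores, axis=-1)  # the soft "select"
    return tf.matmul(weights, values)

# Usage with the one-hot example from Part 01:
query = tf.constant([[1., 0., 0., 0.]])
keys = tf.constant([[0., 1., 0., 0.],
                    [0., 0., 1., 0.],
                    [0., 0., 0., 1.],
                    [1., 0., 0., 0.]])
values = tf.constant([[7., 2.],
                      [5., 9.],
                      [3., 8.],
                      [4., 6.]])
output = attention(query, keys, values)
# output is a weighted average of all value rows, with the 4th row
# receiving the largest weight
```

Because the softmax never produces exact zeros, the output blends every value row; as the match score for one key grows, the output converges to the hard select of Part 01.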
Moad Computer is an actionable insights firm. We provide enterprises with end-to-end artificial intelligence solutions. Actionable Insights blog is a quick overview of things we are most excited about.