Question Answering with context using Google's Universal Sentence Encoder

Code is made available on Github
In this project, I explore how Google's Universal Sentence Encoder can be used for question answering.
Programming Languages, Tools & Platforms

Universal Sentence Encoder model


Pdf can be found here

The transformer family of models has been a huge hit in the NLP domain for many reasons. These state-of-the-art variations of the transformer model have proven to be very successful for question answering, inference, sentiment analysis, and more. While initially exploring this area, I was amazed by the performance and simplicity of use of the Universal Sentence Encoder for a recommendation system that I was building. Please see Google's colab notebook here.

One particular application of this model that I had not seen was using the Universal Sentence Encoder against a dataset of questions and answers to find similar questions that can be used to provide answers with context. Hence, this is something I explored, and I have shared my code in a repository on Github.


Components


The Stanford Question Answering Dataset (SQuAD) is a great resource, covering more than 400 topic areas with questions, answers, and context already prepared, and it is the dataset used in this project. The notebook in my repository on Github only looks at the first 15 topic areas, which comes to 2,685 questions and answers. All of these questions are then fed through the Universal Sentence Encoder to produce a vector representation of size (1 x 512) each. This vector representation converts complex text into a numeric form that can later be used for cosine similarity, classification, clustering, and more.
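
As a rough sketch of this step (the file name and batching are assumptions; the field access follows the standard SQuAD v1.1 JSON layout, and the notebook's actual code may differ), loading the dataset and embedding the questions might look like this:

```python
# A minimal sketch, assuming the SQuAD v1.1 training file ("train-v1.1.json")
# and the standard SQuAD JSON layout.
import json

import tensorflow_hub as hub

# Load the Universal Sentence Encoder from TensorFlow Hub.
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

# Read the SQuAD file and keep only the first 15 topic areas.
with open("train-v1.1.json") as f:
    squad = json.load(f)["data"][:15]

questions, answers, contexts = [], [], []
for topic in squad:
    for paragraph in topic["paragraphs"]:
        for qa in paragraph["qas"]:
            questions.append(qa["question"])
            answers.append(qa["answers"][0]["text"] if qa["answers"] else "")
            contexts.append(paragraph["context"])

# Each question becomes a 512-dimensional vector; the whole matrix is
# (num_questions x 512). For very large lists, embed in batches.
question_embeddings = embed(questions).numpy()
print(question_embeddings.shape)  # e.g. (2685, 512)
```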

Referencing the example in the original Google colab notebook: if two sentences both talk about the sun shining outside, they will be much closer to each other in the vector space than a sentence about a rainy day is to a sentence about a sunny day.

vector similarity graphic

From Wikipedia, Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space. It is defined to equal the cosine of the angle between them, which is also the same as the inner product of the same vectors normalized to both have length 1. The cosine of 0° is 1, and it is less than 1 for any angle in the interval (0, π] radians. It is thus a judgment of orientation and not magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors oriented at 90° relative to each other have a similarity of 0, and two vectors diametrically opposed have a similarity of -1, independent of their magnitude.

Thus, the higher the similarity score, the more alike the two vectors are.

cosine similarity equation:

similarity = cos(θ) = (A · B) / (‖A‖ ‖B‖)
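
To make the measure concrete, here is a small, self-contained sketch of cosine similarity on Universal Sentence Encoder embeddings, reusing the sunny/rainy example above; the exact sentence wording is illustrative rather than taken from the notebook:

```python
# A small sketch of cosine similarity on USE embeddings, using the
# sunny/rainy example from above (sentence wording is illustrative).
import numpy as np
import tensorflow_hub as hub

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (||a|| * ||b||)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

sunny_1, sunny_2, rainy = embed([
    "The sun is shining outside today.",
    "It is a bright and sunny day.",
    "It has been raining all afternoon.",
]).numpy()

print(cosine_similarity(sunny_1, sunny_2))  # close to 1: very similar meaning
print(cosine_similarity(sunny_1, rainy))    # noticeably lower
```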

Notebook


Performance review

Finding the 10 closest results to a question from a list of 2,685 questions took 5.3 ms on average. This could be optimized further by using a dedicated vector database instead of an in-memory mechanism, but overall I found the performance very impressive. Moreover, the forward pass through the network, although not timed, was also very fast.
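
For reference, below is a rough sketch of the kind of in-memory lookup being timed. It continues from the embedding sketch earlier (reusing embed, questions, and question_embeddings), and the query text is illustrative; this is an assumption about the mechanics rather than the exact notebook code. Normalizing the embeddings once lets cosine similarity reduce to a single matrix-vector product.

```python
# Continues from the embedding sketch above (embed, questions,
# question_embeddings); a rough sketch of the in-memory lookup, not the
# exact notebook code.
import time

import numpy as np

# Normalize once so cosine similarity becomes a plain dot product.
normalized = question_embeddings / np.linalg.norm(
    question_embeddings, axis=1, keepdims=True)

# Embed the incoming question (forward pass, not included in the timing).
query = embed(["When was the university founded?"]).numpy()[0]
query /= np.linalg.norm(query)

start = time.perf_counter()
scores = normalized @ query          # cosine similarity against every stored question
top_10 = np.argsort(-scores)[:10]    # indices of the 10 most similar questions
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"lookup took {elapsed_ms:.2f} ms")
for i in top_10:
    print(f"{scores[i]:.3f}  {questions[i]}")
```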