Build Question Answering API with Flask and Python

When you work on an end-to-end project, building the API is always a special step. In this tutorial, I will guide you through making your own Question Answering API using the Flask library for Python.

I always try to build an application after completing a project. It can be a web, mobile, or desktop application written in Python. It makes the work feel complete.

In my last tutorial, I showed you how you can make a question-answering system with BERT and Python. I also developed a simple API for that question-answering project using Flask and Python. In this tutorial, I will show you how I did that.

Setup Development Environment

Before we start coding the question answering Flask application, I recommend creating a virtual environment in Python. It keeps the libraries we install for this project from affecting your other projects.
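
If you are not sure whether your virtual environment is active, a quick check from Python can help. This is only a small sketch using the standard library: for environments created with the venv module, sys.prefix differs from sys.base_prefix. The helper name in_virtual_env is my own, chosen for illustration.

```python
import sys


def in_virtual_env() -> bool:
    """Return True when Python is running inside a venv-style environment."""
    # In a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Virtual environment active:", in_virtual_env())
```

If this prints False, activate the environment first and then install the libraries.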

Once you create the virtual environment, you need to install all the required libraries mentioned in this post: Question Answering with BERT

Along with that, you just need to install the Flask library. To do that, run the command below in your terminal (add a leading ! only if you are running it inside a Jupyter notebook cell):

pip install Flask

Implement Question Answering Logic

Note: I am using BERT for question answering, but you can use any algorithm, or even train your own custom question answering model.

In this tutorial, I will be using the same code as in my previous tutorial. I recommend reading that tutorial to understand each and every part of the code and the underlying concept of building a question answering system with BERT.

Must Read Last Post: Build Question Answering System with BERT model

Create Flask Application

Now let’s write the code to build an API for our question answering application using Flask. There are two main steps: writing the backend code and writing the frontend code. Let’s explore those.

In this tutorial, I will not explain the basics of Flask, so I am assuming that you already have basic knowledge of it. Okay, so let’s now build our Flask question answering API using Python.

Step1: Write Backend Code

In this section, we need to write a function that returns the output of the BERT question answering model, followed by a Flask route function. Save the code below in a .py file.

from flask import Flask, render_template, request

import torch
from transformers import BertForQuestionAnswering
from transformers import BertTokenizer

def Answer_Me(question):
    # Step: 1 ----
    # Load model from saved folder
    bert_model = BertForQuestionAnswering.from_pretrained('D:/Question_Answering/model_files')

    bert_tokenizer = BertTokenizer.from_pretrained('D:/Question_Answering/model_files')

    # Step: 2 ----
    paragraph = '''

    Cristiano Ronaldo dos Santos Aveiro was born on February 5, 1985, in Funchal, the capital of Madeira, Portugal. 
    He spent his early years in the neighboring parish of Santo António. 
    Ronaldo is the youngest of four children born to Maria Dolores dos Santos Viveiros da Aveiro, a cook, 
    and José Dinis Aveiro, a municipal gardener and part-time kit man. His family heritage includes his great-grandmother, 
    Isabel da Piedade, who hailed from São Vicente, Cape Verde. 
    Ronaldo has an older brother named Hugo and two older sisters, Elma and Liliana Cátia "Katia." 
    Ronaldo's mother revealed that she considered aborting him due to their challenging circumstances, 
    including poverty, his father's alcoholism, and already having a large family. However, 
    her doctor declined to perform the procedure as abortions were illegal in Portugal at that time. 
    Ronaldo grew up in a modest Catholic Christian household, sharing a room with his siblings, amidst financial difficulties.
    '''

    # Step: 3 ----
    # Encode the question and paragraph using BERT tokenizer
    encoding = bert_tokenizer.encode_plus(text=question, text_pair=paragraph)

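    # Token IDs (the model looks up the token embeddings from these)
    token_ids = encoding['input_ids']
    # Input tokens as readable strings
    tokens = bert_tokenizer.convert_ids_to_tokens(token_ids)

    # Segment IDs: 0 for question tokens, 1 for paragraph tokens
    sentence_embedding = encoding['token_type_ids']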

    # Step: 4 ----
    # Pass the encoded input to the BERT model
    bert_out = bert_model(torch.tensor([token_ids]), token_type_ids=torch.tensor([sentence_embedding]))

    # Step: 5 ----
    # Retrieve the start and end indices of the answer
    start_index = torch.argmax(bert_out['start_logits'])

    end_index = torch.argmax(bert_out['end_logits'])

    # print('start_index: ', start_index, 'end_index: ', end_index)

    # Step: 6 ----
    # Extract the predicted answer
    answer = ' '.join(tokens[start_index:end_index + 1])
    # print(answer)

    # Step: 7 ----
    # Clean the answer
    # An empty span or one that starts with a special token means no answer was found
    if not answer or answer.startswith("[CLS]") or answer.startswith("[SEP]"):
        return "Unable to find the answer to your question."

    final_answer = ''

    for word in answer.split():

        # If it's a subword token, glue it onto the previous word
        if word[0:2] == '##':
            final_answer += word[2:]

        else:
            final_answer += ' ' + word

    return final_answer.strip()


app = Flask(__name__)

@app.route('/', methods=['GET', 'POST'])
def index():
    if request.method == 'POST':
        question_text = request.form['question']
        try:
            result = Answer_Me(question_text)
            return render_template('index.html', question=question_text, result=result)
        except (ValueError, IndexError):
            error_msg = 'Invalid input! Please enter a valid Question'
            return render_template('index.html', error_msg=error_msg)
    return render_template('index.html')

if __name__ == '__main__':
    # Start Flask's built-in development server
    app.run()

First part of the Code

In the first part of the code (the Answer_Me function), we implement the question answering logic with BERT. In the above code:

  • def Answer_Me(question): creates a function that returns the output of the BERT model. Later we will call this function inside the route function
  • Step 1: loads the saved model and tokenizer from a local directory. Read this tutorial if you want to learn more: how to download and save huggingface models to custom path
  • Step 2: defines the paragraph from which the answer will be picked
  • Steps 3 to 7: encode the question and paragraph, run the model, and extract and clean the answer, which the function then returns
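
The cleaning step (Step 7) can be tried on its own without loading the model. Below is a minimal sketch of the same idea as a standalone helper. The name clean_answer and the sample tokens are made up for illustration and are not part of the code above. BERT's WordPiece tokenizer marks continuation pieces with a leading ##, so the helper glues those pieces onto the previous word.

```python
def clean_answer(tokens):
    """Merge WordPiece tokens back into a readable answer string."""
    words = []
    for token in tokens:
        if token.startswith("##") and words:
            # Subword piece: glue it onto the previous word
            words[-1] += token[2:]
        else:
            words.append(token)
    return " ".join(words)


# Example token span (made up for illustration)
sample = ["fun", "##cha", "##l", ",", "the", "capital", "of", "madeira"]
print(clean_answer(sample))
```

Punctuation stays as a separate token here, just as it does in the loop inside Answer_Me.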
Second part of the Code

The second part of the code handles incoming HTTP requests to the root URL (“/”) of the Flask application. Here, @app.route('/', methods=['GET', 'POST']) makes the index function handle both GET and POST requests to this URL.

A GET request loads the page in the browser, where the user can enter their question in a textbox. A POST request is sent when the user submits the form; it carries the question to the server, which responds with the page showing the answer.

Inside index, request.form['question'] reads the submitted question. We then generate the answer by calling Answer_Me, the BERT model function. Finally, render_template renders “index.html” (the frontend) to display the question and the result to the user in the browser.

In that render_template call, question and result are the two variables that allow the question and its answer to be displayed dynamically in the browser when the template is rendered. They are basically the connectors between the index.html code and the backend Python code.
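
To see the GET and POST flow without loading BERT, you can exercise a route like this with Flask's built-in test client. The sketch below is only an illustration under a few assumptions: fake_answer is a hypothetical stub standing in for Answer_Me, and the PAGE string is an inline stand-in for templates/index.html, so the snippet runs on its own.

```python
from flask import Flask, render_template_string, request

# Inline stand-in for templates/index.html so the sketch runs on its own
PAGE = """
{% if result %}
<h2>Question: {{ question }}</h2>
<h2>Answer: {{ result }}</h2>
{% endif %}
"""


def fake_answer(question):
    # Hypothetical stub used instead of the BERT-backed Answer_Me
    return "funchal"


app = Flask(__name__)


@app.route("/", methods=["GET", "POST"])
def index():
    if request.method == "POST":
        question_text = request.form["question"]
        return render_template_string(
            PAGE, question=question_text, result=fake_answer(question_text)
        )
    return render_template_string(PAGE)


# Flask's test client sends requests without starting a server
client = app.test_client()
get_response = client.get("/")
post_response = client.post("/", data={"question": "Where was Ronaldo born?"})

print(get_response.status_code)
print(post_response.get_data(as_text=True))
```

The GET response contains no question or answer, while the POST response contains both, which mirrors what the browser does when the form is submitted.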

Step2: Write Frontend Code

In this frontend part, we will write a simple HTML code. We want only three things:

  1. A text box where user can type the question
  2. A place to display the question
  3. A place to display the final output answer from BERT model

So let’s write basic HTML code to make the user interface of our question answering application.

First, create a folder named “templates” next to your .py file and save the code below inside that folder as index.html. Your folder structure should now look like this:

    ├── templates
    │       └── index.html

<!DOCTYPE html>
<html>
<head>
    <title>Question Answering Web App</title>
</head>
<body>
    <h1>Question Answering Web App</h1>
    {% if error_msg %}
        <p>{{ error_msg }}</p>
    {% endif %}
    <form method="post">
        <label for="question">Type your Question here:</label>
        <input type="text" name="question" id="question" required>
        <input type="submit" value="Ask">
    </form>
    {% if result %}
        <h2>Question: {{ question }}</h2>
        <h2>Answer: {{ result }}</h2>
    {% endif %}
</body>
</html>

Here, the <form> element creates a form with a text input where the user can type their question. Once they click the submit button, the question is sent to the Python backend, and the page is rendered again with the {% if result %} block filled in.

The {% if result %} block is the most important part of this HTML code, because it connects the HTML with the Python backend code. The text input is named question, so Flask can read the user's question from request.form['question']. The backend then passes the question and result variables back to the template.

The Python code then generates the answer from the BERT model and stores it in the result variable. Finally, the result variable is displayed by the HTML code.

To run this Flask application, execute the command below in your command prompt, replacing app.py with the name of the .py file that contains your backend code:

python app.py

It will show you a localhost link in the terminal. Open that link in your browser to enjoy your application.


Output Demo

Deploy the QnA API

So you have made your Flask application. Now, how can you deploy this question answering API?

Deploying an API makes it accessible to users over the internet. Once you have developed and tested your API locally, deploying it allows others to interact with it from anywhere.

There are several platforms you can choose from to deploy your BERT question answering application. Some popular deployment platforms are listed below. Select one based on your requirements, budget, and familiarity.

  • Heroku
  • AWS (Amazon Web Services)
  • Google Cloud Platform
  • Microsoft Azure

Once your API gains users, you may need to scale up its resources to handle the increased traffic. All of the deployment platforms listed above provide tools to monitor your API’s performance, such as tracking request/response times, error rates, and resource usage.


Just as a demo, I made a simple Flask application for a question answering system using the BERT model. You can take this to the next level when implementing it in your project.
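
One small step in that direction: Answer_Me reloads the model and tokenizer on every request, which is slow. A common pattern is to load them once and reuse them. The sketch below shows the idea with functools.lru_cache from the standard library. Note that load_model is a hypothetical stand-in that only simulates the two from_pretrained calls, and answer_me returns a placeholder string, so the snippet runs without the model files.

```python
from functools import lru_cache

load_count = 0  # counts how often the expensive load actually runs


@lru_cache(maxsize=1)
def load_model(model_dir):
    """Hypothetical stand-in for the two from_pretrained(model_dir) calls."""
    global load_count
    load_count += 1
    # In the real app this would return (bert_model, bert_tokenizer)
    return ("model from " + model_dir, "tokenizer from " + model_dir)


def answer_me(question):
    # The first call loads the model; later calls reuse the cached objects
    model, tokenizer = load_model("model_files")
    return f"answer to {question!r}"  # placeholder, no real inference here


answer_me("Where was Ronaldo born?")
answer_me("Who is Ronaldo's father?")
print(load_count)
```

Even after two questions the loader has run only once. In the real application, the same effect can also be had by loading the model at module level, outside the function.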

I used simple HTML code to take the input question and display the answer from BERT. You can add some CSS to this Flask web application to improve its look and feel. You can also add a login system to your application.

I used a simple BERT model for question answering, but you can train your own custom BERT model or use other pre-trained question answering models such as RoBERTa.

This is it for this tutorial. If you have any questions or suggestions regarding this tutorial, please let me know in the comment section below.
