Fine-tune T5 to Make Custom ChatBot


In this tutorial, I will show you how you can finetune a generative model named t5 to make our own chatbot. We will finetune t5 model using Openassist dataset.

Why to fine tune T5?

In my last tutorial, I explained how you can train transformer models like: BERT, RoBERTa, etc. But those are extractive models. It will pick answer from the provided document.

I wanted to use a model which can answer my question like ChatGPT. Generative model like ChatGPT is called closed book question answering systems. This kind of generative model learns things and stores them inside their weights as a memory. You can ask anything and it will generate answers based on their knowledge.

Knowledge of these models depends on the kind of data you used to fine-tune that model. You can use your business or domain data to fine-tune these models to perform any downstream task like question answering, chatbot, text summarization, etc.

In this tutorial, I am going to use a generative model named T5. It is developed by the Google AI team. You can find T5 model paper at this link.

T5 model Architecture

T5 is the short form of Text-to-Text Transfer Transformer. Here in this model name, the letter “T” comes five times. This is the reason researchers call it t5 (five times “T”).

T5 is trained using Colossal Clean Crawled Corpus (or C4 corpus) and achieves state-of-the-art results on many NLP benchmarks. T5 is a text-to-text transformer model, which means the input and output of this model is always text string.

Transformer models like BERT, Roberta, etc. only output class (like positive, negative, text classification, etc.) or span of input (start and end token of input).

But in T5 you can use the same model, hyperparameters, loss function to use any NLP downstream tasks like: document summarization, question answering, machine translation, chatbot, text classification (i.e.: sentiment analysis), etc.

So you can understand how flexible this T5 model is. This is the reason I am going to fine tune this T5 model to make my own chatbot.

Fine-tune T5 Model for Chatbot

Fine-tuning T5 model is so simple like a piece of cake. You can fine tune this T5 model using CPU and GPU both. Let me break down the entire process into some steps:

Step1: Install Libraries

To finetune T5 model for chatbot, you only need to install two packages. Below is the command to install those libraries.

pip install simplet5
pip install pandas

Note: It is good practice to create a fresh virtual environment to isolate this project’s dependencies from other projects.

Step2: Data Collection

In this tutorial, I am going to use open assist dataset. You can find this dataset on Kaggle. It is a human-generated assistant-style conversation data.

Also Read:  Automatic Keyword extraction using Topica in Python

Once you download and unzip the data you will find two datasets: training and test data. Let’s read those data using below Python code.

import pandas as pd

# Read kaggle open assist dataset
df_train = pd.read_csv('data/oasst1-train.csv')
df_val = pd.read_csv('data/oasst1-val.csv')

# Display some data

Step3: Data Pre-Processing

In the openAssist data, we are going to use mainly two columns. Those are: text and role. The text column contains the conversation chat text and the role column contains whether this command is from the customer (prompter) or agent (assistant).

But our t5 model needs data in question-answer format. That means we need two columns: first column should contain questions(comments from the prompter) and the second column should contain answers (comments from the assistant).

Below is the Python code to modify or convert OpenAssist Data to t5 input data format.

# Function to modify open assist dataset
def prep_data(df):
    df_assistant = df[(df.role=="assistant") & (df["rank"]==0.0)]
    df_prompter = df[(df.role=="prompter")]
    df_prompter = df_prompter.set_index("message_id")
    df_assistant["output"] = df_assistant["text"].values

    inputs = []
    for idx, row in df_assistant.iterrows():
        input = df_prompter.loc[row.parent_id]

    df_assistant["instruction"] = inputs

    df_assistant = df_assistant[df_assistant.lang=="en"]

    df_assistant = df_assistant[["instruction", "output"]]

    return df_assistant

# Convert open assist data to question answering format
df_train = prep_data(df_train)
df_val = prep_data(df_val)

# Display some data

After that conversion, we got two columns: instruction and output. The instruction column contains questions and output column contains the answer to those questions.

Now simpleT5 framework expects dataframe to have two columns: source_text and target_text. So we need to rename those columns (instruction and output).

Also, T5 model expects a task-related prefix: since it is a question-answering task, we will add a prefix “answer: “.

Now let’s write a Python script to convert our train and test data to match simpleT5 format using above two methods.

# For training data
# -----------------------------
# simpleT5 expects dataframe to have 2 columns: "source_text" and "target_text"
df_train = df_train.rename(columns={"output":"target_text", "instruction":"source_text"})
df_train = df_train[['source_text', 'target_text']]

# T5 model expects a task related prefix: since it is a question answering task, we will add a prefix "answer: "
df_train['source_text'] = "answer: " + df_train['source_text']
# Display some data to see final data to finetune t5 model
# df_train.head()

# For Test data
# -----------------------------
# simpleT5 expects dataframe to have 2 columns: "source_text" and "target_text"
df_val = df_val.rename(columns={"output":"target_text", "instruction":"source_text"})
df_val = df_val[['source_text', 'target_text']]

# T5 model expects a task related prefix: since it is a question answering task, we will add a prefix "answer: "
df_val['source_text'] = "answer: " + df_val['source_text']
# Display some data to see final data to finetune t5 model

Let’s now show you the length of training and test dataset of open assist.

print('Training data length: ' + str(len(df_train)) + '\n' + 'Test data length: ' + str(len(df_val)))
Training data length: 7856
Test data length: 418

Step4: Finetune T5 model

So our input data is ready. Now finally we can finetune our t5 model for our custom chatbot.

Also Read:  Word similarity matching using Soundex algorithm in python

As I said this technique is the simplest way to finetune t5 for making a chatbot. You just need to run below Python code to finetune the t5 model for our chatbot dataset (open assist).

from simplet5 import SimpleT5

model = SimpleT5()
model.from_pretrained(model_type="t5", model_name="t5-base")
            batch_size=16, max_epochs=3, use_gpu=False,
            outputdir = 'trained_t5')

In this code, I am using 3 epochs and 16 batch sizes. With this configuration, it took around 5 hours to complete the training of T5 model in CPU.

I used t5-base model to finetune chatbot. But you can also use the below-supported models by simpleT5 library.

model_type (str, optional): "t5" or "mt5" . Defaults to "t5".
model_name (str, optional): exact model architecture name, "t5-base" or "t5-large". Defaults to "t5-base".

Step5: Test our T5 model

So after long hours of waiting, we trained our T5 model. After completion of each epoch model got saved inside trained_t5 folder in our working directory.

Let us now read our trained t5 model and test how it is performing to answer any question.

from simplet5 import SimpleT5

model = SimpleT5()

# let's load the trained t5 model for inferencing:
model.load_model("t5", "trained_t5/simplet5-epoch-2-train-loss-3.3569-val-loss-3.3652", use_gpu=False)

# Ask a questin to the model to answer
question_to_answer = 'answer: hi, how are you doing?'
['Hello! How are you doing today?']

Let’s try some other questions:

question_to_answer = 'answer: What is your name?'
["I am a language model, so I don't have any personal experiences or opinions. However, I can provide you with some general information about my background and interests."]
question_to_answer = 'answer: Write a python code to print hello world?'
["Here's a Python code to print hello world: python import helloworld.htm from the library: import helloworld.htm import helloworld.htm import hello"]

I know, it is not able to generate the correct code to print “Hello world”. There are two reasons behind this. First, the training data might not contain much information about this type of coding.

Second, you should remember that we are using the base version of T5 model (t5-base). To get better results use a larger version of T5 for example: t5-large, t5-3b, t5-11b, etc.


Still, some questions may be poking into your mind. Let me answer those in this frequently asked section.

What is T5 model used for?

In this article, I showed you how to fine tune t5 model to make your own chatbot. But you can use T5 model for other NLP tasks like: Text Classification, Text Summarization, Named Entity Recognition, etc.

Difference between t5 and mt5 Model type

T5 model is trained on C4 corpus. It supports only English language. Where Mt5 is a multilingual model. It is pre-trained on the mC4 corpus, which has over 101 languages.

Also Read:  Understand LSTM Neural Network Model from Scratch

Is T5 better than Bert?

The main difference between Bert and T5 is the size of the words they use for prediction. Bert predicts a target that is made up of a single word, which means it focuses on one word at a time. This is called single-token masking.

On the other hand, T5 can predict multiple words together, as shown in the figure. This gives T5 more flexibility in learning the structure of the model.

In short, while Bert looks at one word at a time, T5 can look at and predict multiple words together, which helps it understand the overall structure better.

Along with that, BERT is an Encoder model, it can not generate anything on its own. So if you want to build an application like ChatBot, you will not be able to do it. In this case, T5 raises his hand. Since T5 is a generative model, it learns things and generates from its knowledge.

So to answer the question which is better, well it depends on what kind of application you are working on. If you are working on a context-based question answering, you can consider BERT. But if you are thinking to build a chatbot then you must consider a generative model like T5.


In this tutorial, I showed you how you can make your own chatbot using Google t5 model with a few lines of code using the simplet5 library. We used OpenAssist data to finetune t5-base model. There are other pre-trained models like: mt5, Flant5 which you can also try to finetune.

This is it for this tutorial. If you have any questions or suggestions regarding this tutorial, please let me know in the comment section below.

Similar Read:

1 thought on “Fine-tune T5 to Make Custom ChatBot”

  1. May I know which chatbot framework best suits with t5 model as rasa requires pytorchligtning with updated version but t5 requires lesser version?


Leave a comment