Generate sentences from keywords using Python

In this post, I will show you how to generate text from a few keywords. To generate sentences from keywords, I will use a Python library that is built on top of Google’s T5 Transformer model. This kind of application is an advanced NLP task.

About the keytotext Python library

For text generation from keywords, I am going to use the keytotext Python library. This library is built on top of Google’s Text-to-Text Transfer Transformer model. The author of the library has made three pre-trained models available for making sentences from words, namely:

k2t (small T5 model)
k2t-tiny (tiny T5 model)
k2t-base (base T5 model)

Here T5 refers to the Text-To-Text Transfer Transformer model. The letter T appears five times in that name, which is why the model is called T5.

The data used to train all the models in the keytotext library is taken from the DART and WebNLG datasets. This data covers various domains such as food, places, cities, politics, sports, etc.

The data comes in JSON or XML format with a lot of extra information. The raw data looks like this:

[Figure: Raw data in WebNLG format]

The final pre-processed data used to train all models of the keytotext package looks like this:

[Figure: Final data for the model, with the input on the left and the output on the right]
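
As a rough illustration of that pre-processing step, the sketch below turns WebNLG-style triples into a (keywords, sentence) pair. The record layout, the sample record, and the helper name `triples_to_keywords` are all assumptions made for this example. This is not the actual script used by the author of keytotext.

```python
def triples_to_keywords(triples):
    """Collect the distinct subject/object entities from
    'subject | property | object' strings, in order of first appearance."""
    keywords = []
    for triple in triples:
        subject, _prop, obj = [part.strip() for part in triple.split("|")]
        for entity in (subject, obj):
            # Entities in this assumed format use underscores instead of spaces
            word = entity.replace("_", " ")
            if word not in keywords:
                keywords.append(word)
    return keywords


# Hypothetical record, shaped like a simplified WebNLG entry
record = {
    "triples": [
        "Aarhus_Airport | cityServed | Aarhus",
        "Aarhus_Airport | location | Tirstrup",
    ],
    "text": "Aarhus Airport, located in Tirstrup, serves the city of Aarhus.",
}

# Input (keywords) on the left, output (sentence) on the right
pair = (triples_to_keywords(record["triples"]), record["text"])
print(pair)
```

The idea is only to show the shape of the training pairs: a short list of keywords as input and one fluent sentence as output.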

Keytotext library in Action

To use the keytotext library, we just need to run the command below to install the package.

pip install keytotext

Note: keytotext requires Python 3.7 or higher. If you are using a lower version of Python, you can create a virtual environment for Python 3.7 and then install the keytotext package inside that virtual environment.
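
If you are not sure which Python version you are running, a quick check like the one below can help before installing. This is just a minimal sketch. The function name `python_is_supported` is my own and is not part of keytotext.

```python
import sys

MIN_VERSION = (3, 7)  # minimum version stated in the note above


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True when the given interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if python_is_supported():
    print("Python version is fine for keytotext")
else:
    print("Python %d.%d is too old, create a 3.7+ virtual environment first"
          % tuple(sys.version_info[:2]))
```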

Generate text from keywords using Python

Loading a pre-trained model in keytotext is similar to spaCy. Just follow the code below to make sentences from keywords.

from keytotext import pipeline

# Load the base pre-trained T5 model
# It will download three files: 1. config.json, 2. tokenizer.json, 3. pytorch_model.bin (~850 MB)
nlp = pipeline("k2t-base")

# Configure the model parameters
config = {"do_sample": True, "num_beams": 4, "no_repeat_ngram_size": 3, "early_stopping": True}

# Provide list of keywords into the model as input
print(nlp(['Ronaldo', 'football', 'Portuguese'], **config))

Output for Run 1

Ronaldo plays for the Portuguese football club.

Because sampling is enabled (do_sample is True), the code can generate a different sentence every time you run it. For example:

Output for Run 2

Ronaldo is a Portuguese football player.

You can also experiment with different parameter values to get suitable results for your keywords or phrases. Also, the more keywords you feed into the model, the better your chance of getting accurate results.
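
One simple way to experiment is to run the same keywords through several parameter sets and compare the sentences side by side. The sketch below assumes a hypothetical helper named `compare_configs`. To keep it runnable without downloading a model, it uses a stand-in function in place of the real pipeline, so the printed strings are not real model output.

```python
def compare_configs(generate, keywords, configs):
    """Call generate(keywords, **config) once per named config
    and return a dict mapping each config name to its output."""
    results = {}
    for name, config in configs.items():
        results[name] = generate(keywords, **config)
    return results


# Hypothetical parameter sets; the keys are the same ones used earlier in this post
configs = {
    "beam_search": {"do_sample": False, "num_beams": 4,
                    "no_repeat_ngram_size": 3, "early_stopping": True},
    "sampling": {"do_sample": True, "num_beams": 4,
                 "no_repeat_ngram_size": 3, "early_stopping": True},
}


def fake_generate(keywords, **config):
    # Stand-in for the keytotext pipeline so this sketch runs offline
    return "keywords=%s do_sample=%s" % (",".join(keywords), config.get("do_sample"))


results = compare_configs(fake_generate, ["Ronaldo", "football", "Portuguese"], configs)
for name, sentence in results.items():
    print(name, "->", sentence)
```

To try this with the real model, pass the `nlp` object loaded earlier in place of `fake_generate`.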

Conclusion

In this post, you learned how to generate sentences from keywords using a Python library named “keytotext”. Make sure that you are using Python 3.7 or above before installing this package. After installing the library, the entire process of text generation from keywords is pretty straightforward.

In this post I used the k2t-base T5 model. You can also try the k2t (small) and k2t-tiny models on your own and let me know the quality of the results you get in the comment section below.

Anindya

Hi there, I’m Anindya Naskar, Data Science Engineer. I created this website to show you what I believe is the best possible way to get your start in the field of Data Science.