Natural Language Processing Using TextBlob

Interest in Natural Language Processing (NLP) is growing due to an increasing number of interesting applications such as machine translation, chatbots, image captioning, and more.

There are many tools for working with NLP. Some of the popular ones are:

  • NLTK
  • Spacy
  • Stanford Core NLP
  • TextBlob

In this topic, I will show you how to use TextBlob in Python.

What is TextBlob?

TextBlob is a Python (2 and 3) library for processing textual data. Built on the shoulders of NLTK and Pattern, it provides a simple API for common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.
Because it wraps several lower-level libraries and services (NLTK, Pattern, Google Translate, etc.) behind one consistent interface, it is easy to learn while still offering a lot of functionality. It has become my go-to library for everyday NLP tasks.

Features:
  • Part-of-speech tagging
  • Noun phrase extraction
  • Sentiment analysis
  • Classification (using Naive Bayes or Decision Tree classifiers)
  • Language translation and detection powered by Google Translate
  • Spelling correction
  • Tokenization (splitting text into words and sentences)
  • Word and phrase frequencies
  • Parsing
  • n-grams
  • Word inflection (pluralization and singularization) and lemmatization
  • Add new models or languages through extensions
  • WordNet integration

Install and setup TextBlob for Python

As TextBlob is built on the shoulders of NLTK and Pattern, we need to download the necessary NLTK corpora along with TextBlob itself:
$ pip install -U textblob
$ python -m textblob.download_corpora
Now let’s explore some key features of TextBlob and implement them in Python.
To do any kind of text processing with TextBlob, we follow the two steps listed below:

  • Convert any string into a TextBlob object
  • Call TextBlob's functions or properties to perform a specific task

Tokenization with TextBlob

Tokenization refers to dividing text into a sequence of tokens.

from textblob import TextBlob
text = '''
TextBlob is a Python (2 and 3) library for processing textual data.
It provides API to do natural language processing (NLP)
such as part-of-speech tagging, noun phrase extraction, sentiment analysis, etc.
'''
blob_obj = TextBlob(text)
# Divide into sentences
blob_obj.sentences


Output:
[Sentence("
 TextBlob is a Python (2 and 3) library for processing textual data."),
 Sentence("It provides API to do natural language processing (NLP)
 such as part-of-speech tagging, noun phrase extraction, sentiment analysis, etc.")]

 

# Print tokens/words
blob_obj.tokens

Output:
WordList(['TextBlob', 'is', 'a', 'Python', '(', '2', 'and', '3', ')', 'library', 'for', 'processing', 'textual', 'data', '.', 'It', 'provides', 'API', 'to', 'do', 'natural', 'language', 'processing', '(', 'NLP', ')', 'such', 'as', 'part-of-speech', 'tagging', ',', 'noun', 'phrase', 'extraction', ',', 'sentiment', 'analysis', ',', 'etc', '.'])
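Internally, TextBlob delegates tokenization to NLTK's tokenizers. To see the basic idea without any library, here is a naive regex-based sketch (an illustration only, not TextBlob's actual implementation):

```python
import re

def tokenize(text):
    # Words (including hyphenated ones) or single punctuation marks.
    # A crude stand-in for NLTK's tokenizers, which handle many more cases.
    return re.findall(r"\w+(?:-\w+)*|[^\w\s]", text)

print(tokenize("It provides API to do NLP."))
# ['It', 'provides', 'API', 'to', 'do', 'NLP', '.']
```

A real tokenizer also has to deal with contractions, abbreviations, URLs, and so on, which is why TextBlob reuses NLTK's instead of a regex like this.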

POS tagging with TextBlob

TextBlob has two types of POS tagger:
  • PatternTagger (uses the same implementation as the pattern library)
  • NLTKTagger (uses NLTK's TreeBank tagger)
Which one is the default depends on the TextBlob version; either can be selected explicitly via the pos_tagger argument, as shown below.

 

# By using TreeBank tagger
from textblob.taggers import NLTKTagger
nltk_tagger = NLTKTagger()
blob_obj = TextBlob(text, pos_tagger=nltk_tagger)
blob_obj.pos_tags

# By using Pattern Tagger
from textblob.taggers import PatternTagger
pattern_tagger = PatternTagger()
blob_obj = TextBlob(text, pos_tagger=pattern_tagger)
blob_obj.pos_tags

Output:
[('TextBlob', 'NN'),
 ('is', 'VBZ'),
 ('a', 'DT'),
 ('Python', 'NNP'),
 ('2', 'IN'),
 ('and', 'CC'),
 ('3', 'CD'),
 ('library', 'NN'),
 ('for', 'IN'),
 ('processing', 'NN'),
 ('textual', 'JJ'),
 ('data', 'NNS'),
 ('It', 'PRP'),
 ('provides', 'VBZ'),
 ('API', 'NNP'),
 ('to', 'TO'),
 ('do', 'VBP'),
 ('natural', 'JJ'),
 ('language', 'NN'),
 ('processing', 'NN'),
 ('NLP', 'NN'),
 ('such', 'JJ'),
 ('as', 'IN'),
 ('part-of-speech', 'JJ'),
 ('tagging', 'VBG'),
 ('noun', 'NN'),
 ('phrase', 'NN'),
 ('extraction', 'NN'),
 ('sentiment', 'NN'),
 ('analysis', 'NN'),
 ('etc.', 'FW')]

Noun Phrase Extraction using TextBlob

Noun phrase extraction is important in NLP when you want to analyze the "who" or "what" in a sentence. Let's see an example below.
TextBlob uses NLTK data to do this job.
for np in blob_obj.noun_phrases:
    print(np)


Output:
textblob
python
processing textual data
api
natural language processing
nlp
noun phrase extraction
sentiment analysis
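TextBlob's extractors chunk the POS-tagged words with a grammar. A minimal pure-Python illustration of the idea (not TextBlob's actual grammar): group runs of consecutive adjective/noun tags into phrases.

```python
def noun_phrases(tagged):
    # Collect runs of adjective (JJ*) / noun (NN*) tags; keep runs of 2+ words.
    # The ("", "END") sentinel flushes any run that reaches the end of input.
    phrases, current = [], []
    for word, tag in tagged + [("", "END")]:
        if tag.startswith(("JJ", "NN")):
            current.append(word)
        else:
            if len(current) > 1:
                phrases.append(" ".join(current).lower())
            current = []
    return phrases

tagged = [("natural", "JJ"), ("language", "NN"), ("processing", "NN"),
          ("is", "VBZ"), ("fun", "JJ")]
print(noun_phrases(tagged))  # ['natural language processing']
```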

Word Inflection and Lemmatization using TextBlob

Inflection changes a word's form (for example, singular to plural), and lemmatization converts a word into its base form. For both of these, TextBlob uses NLTK (WordNet) data.
# Singularize
print('Previous: ', blob_obj.words[13], ' After: ', blob_obj.words[13].singularize())
# Pluralize
print('Previous: ', blob_obj.words[7], ' After: ', blob_obj.words[7].pluralize())

Output:
Previous:  provides  After:  provide

Previous:  library  After:  libraries
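The pluralize() and singularize() methods come from pattern.en's inflection rules. A toy version of a few of those rules (the real rule set handles many irregular forms that this sketch does not):

```python
def pluralize(word):
    # Three common English pluralization rules; a tiny subset of pattern.en's.
    if word.endswith(("s", "x", "z", "ch", "sh")):
        return word + "es"
    if word.endswith("y") and word[-2] not in "aeiou":
        return word[:-1] + "ies"
    return word + "s"

print(pluralize("library"))  # libraries
print(pluralize("phone"))    # phones
```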

N-grams using TextBlob

N-Gram is combination of multiple words together. N grams can be used as features for language modelling.
By using “ngrams” function we can easily generate N gram words.
## 3-gram
blob_obj.ngrams(n=3)

Output:
[WordList(['TextBlob', 'is', 'a']),
 WordList(['is', 'a', 'Python']),
 WordList(['a', 'Python', '2']),
 WordList(['Python', '2', 'and']),
 WordList(['2', 'and', '3']),
 WordList(['and', '3', 'library']),
 WordList(['3', 'library', 'for']),
 WordList(['library', 'for', 'processing']),
 WordList(['for', 'processing', 'textual']),
 WordList(['processing', 'textual', 'data']),
 WordList(['textual', 'data', 'It']),
 WordList(['data', 'It', 'provides']),
 WordList(['It', 'provides', 'API']),
 WordList(['provides', 'API', 'to']),
 WordList(['API', 'to', 'do']),
 WordList(['to', 'do', 'natural']),
 WordList(['do', 'natural', 'language']),
 WordList(['natural', 'language', 'processing']),
 WordList(['language', 'processing', 'NLP']),
 WordList(['processing', 'NLP', 'such']),
 WordList(['NLP', 'such', 'as']),
 WordList(['such', 'as', 'part-of-speech']),
 WordList(['as', 'part-of-speech', 'tagging']),
 WordList(['part-of-speech', 'tagging', 'noun']),
 WordList(['tagging', 'noun', 'phrase']),
 WordList(['noun', 'phrase', 'extraction']),
 WordList(['phrase', 'extraction', 'sentiment']),
 WordList(['extraction', 'sentiment', 'analysis']),
 WordList(['sentiment', 'analysis', 'etc'])]
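Under the hood, n-gram generation is just a sliding window over the token list. A minimal sketch, independent of TextBlob:

```python
def ngrams(tokens, n=3):
    # Slide a window of length n over the token list.
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]

tokens = "TextBlob is a Python library".split()
print(ngrams(tokens, 3))
# [['TextBlob', 'is', 'a'], ['is', 'a', 'Python'], ['a', 'Python', 'library']]
```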

Sentiment Analysis using TextBlob

Sentiment analysis is the process of determining the emotion (positive, negative, or neutral) of a text.
The sentiment property of TextBlob returns two values:

  • Polarity (range -1 to 1)

  • Subjectivity (range 0 to 1)

By default, TextBlob scores polarity and subjectivity with a lexicon-based analyzer from the pattern library. It also ships an optional NaiveBayesAnalyzer, trained on a corpus of pre-classified movie reviews, which classifies new text into positive and negative probabilities.
text = "I hate this phone"
blob_obj = TextBlob(text)
blob_obj.sentiment

Output:
Sentiment(polarity=-0.8, subjectivity=0.9)


text = "I love this phone"
blob_obj = TextBlob(text)
blob_obj.sentiment

Output:
Sentiment(polarity=0.5, subjectivity=0.6)

Note: subjectivity = 0.6 indicates that the text expresses a personal opinion rather than general factual information.
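The default (pattern-based) analyzer works roughly like this: look up each word in a polarity lexicon and combine the scores. A toy sketch with a made-up lexicon (the scores below are chosen to mirror the outputs above, not taken from pattern's real lexicon, which has thousands of entries and also adjusts for negation and intensifiers):

```python
# Made-up lexicon scores, for illustration only.
LEXICON = {"love": 0.5, "hate": -0.8, "great": 0.8}

def polarity(text):
    # Average the scores of the lexicon words found in the text.
    scores = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

print(polarity("I hate this phone"))  # -0.8
print(polarity("I love this phone"))  # 0.5
```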

Spelling Correction using TextBlob

In NLP, spelling correction is often required to normalize text data. TextBlob offers a spelling corrector with roughly 80-90% accuracy at a processing speed of at least 10 words per second.
The spelling corrector is based on Peter Norvig's "How to Write a Spelling Corrector"
(http://norvig.com/spell-correct.html), as implemented in the pattern library.
blob_obj = TextBlob("speling")
blob_obj.words[0].spellcheck()

Output:
[('spelling', 1.0)]

So the corrected word is 'spelling', with a probability of 100%.

How the spelling corrector works in TextBlob

Step 1 => From a big text file, count the number of occurrences of each word.
Step 2 => Calculate the probability of each word: the number of times that word appears divided by the total number of words.
# def P(word, N=sum(WORDS.values())):
#     "Probability of `word`."
#     return WORDS[word] / N

Step 3 => Generate variations of the (possibly misspelled) input word: deletes, transposes, replaces, and inserts.
In our example, for the word 'speling', the candidates include (truncated):
#  'spsling',
#  'spteling',
#  'spueling',
#  'spweling',
#  ...
#  'yspeling',
#  'zpeling',
#  'zspeling'


Step 4 => Look each candidate up among the words counted from the big text file (step 1).
Step 5 => The correct word is the candidate with the highest probability (from step 2).
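The five steps above can be sketched in a few lines of pure Python, following Norvig's article. A tiny corpus stands in for the "big text file", and only one-edit candidates are generated:

```python
import re
from collections import Counter

# Step 1: count words in a (here, tiny) corpus.
CORPUS = "spelling is hard but spelling practice makes spelling easy"
WORDS = Counter(re.findall(r"\w+", CORPUS.lower()))

def P(word, N=sum(WORDS.values())):
    # Step 2: probability of `word` in the corpus.
    return WORDS[word] / N

def edits1(word):
    # Step 3: all strings one edit away (deletes, transposes, replaces, inserts).
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correction(word):
    # Steps 4-5: keep candidates that occur in the corpus, pick the most probable.
    candidates = [w for w in edits1(word) if w in WORDS] or [word]
    return max(candidates, key=P)

print(correction("speling"))  # spelling
```

The real corrector also considers candidates two edits away and uses a much larger word-count file, but the logic is the same.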

Language Detection and Translation using TextBlob

Language translation and detection are powered by the Google Translate API. (Note that newer TextBlob releases have deprecated these methods; the examples below apply to older versions.)
## Detect language
text = "I hate this phone"
blob_obj = TextBlob(text)
blob_obj.detect_language()

>> 'en'
If you try this code on an office computer behind a proxy, you may get a timeout error like:

URLError: <urlopen error [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond>

In this case, you have to define your proxy address before the code above (as it fetches the result from a web API):

## Detect language with a proxy
import nltk
# Set up your proxy address
nltk.set_proxy('http://111.199.236.103:8080')
text = "I hate this phone"
blob_obj = TextBlob(text)
blob_obj.detect_language()

## Translate to Bengali
blob_obj.translate(to="bn")

>> TextBlob("আমি এই ফোন ঘৃণা করি")

Here 'bn' is the language code you need to provide to TextBlob, so you should know the supported language codes, listed below.


     

Language Name          Code     Language Name          Code
Afrikaans              af       Irish                  ga
Albanian               sq       Italian                it
Arabic                 ar       Japanese               ja
Azerbaijani            az       Kannada                kn
Basque                 eu       Korean                 ko
Bengali                bn       Latin                  la
Belarusian             be       Latvian                lv
Bulgarian              bg       Lithuanian             lt
Catalan                ca       Macedonian             mk
Chinese Simplified     zh-CN    Malay                  ms
Chinese Traditional    zh-TW    Maltese                mt
Croatian               hr       Norwegian              no
Czech                  cs       Persian                fa
Danish                 da       Polish                 pl
Dutch                  nl       Portuguese             pt
English                en       Romanian               ro
Esperanto              eo       Russian                ru
Estonian               et       Serbian                sr
Filipino               tl       Slovak                 sk
Finnish                fi       Slovenian              sl
French                 fr       Spanish                es
Galician               gl       Swahili                sw
Georgian               ka       Swedish                sv
German                 de       Tamil                  ta
Greek                  el       Telugu                 te
Gujarati               gu       Thai                   th
Haitian Creole         ht       Turkish                tr
Hebrew                 iw       Ukrainian              uk
Hindi                  hi       Urdu                   ur
Hungarian              hu       Vietnamese             vi
Icelandic              is       Welsh                  cy
Indonesian             id       Yiddish                yi

Conclusion

TextBlob is built on top of various NLP tools such as NLTK, Pattern, and the Google Translate API.
There is nothing fundamentally new in this package, but if you want several important NLP functions together in one place, it is a good choice.
In this tutorial I have discussed:

• What is TextBlob?
• Install and setup TextBlob for Python
• Tokenization with TextBlob
• POS tagging with TextBlob
• Noun Phrase Extraction using TextBlob
• Word Inflection and Lemmatization using TextBlob
• N-grams using TextBlob
• Sentiment Analysis using TextBlob
• Spelling Correction using TextBlob
• How the spelling corrector works in TextBlob
• Language Detection and Translation using TextBlob

If you have any questions or suggestions regarding this topic, please let me know in the comment section; I will try my best to answer.
