Interest in Natural Language Processing (NLP) is growing due to an increasing number of interesting applications like machine translation, chatbots, and image captioning.
There are many tools for working with NLP. In this post I will show you how to use one of the popular ones, TextBlob, in Python.
What is TextBlob?
TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.
TextBlob is built on the shoulders of NLTK and Pattern, and it pulls in other services such as Google Translate for translation. A big advantage of this is that it is easy to learn while still offering a lot of features, and it has become my go-to library for quick NLP work.
—
Features:
- Tokenization (splitting text into words and sentences)
- Part-of-speech tagging
- Noun phrase extraction
- Sentiment analysis
- Classification (Naive Bayes, Decision Tree)
- Language translation and detection powered by Google Translate
- Word and phrase frequencies
- Word inflection (pluralization and singularization) and lemmatization
- Spelling correction
- Add new models or languages through extensions
Install and set up TextBlob for Python
As TextBlob is built on top of NLTK and Pattern, we need to download the necessary NLTK corpora along with TextBlob itself.
$ pip install -U textblob
$ python -m textblob.download_corpora
Now let’s explore some key features of TextBlob and implement them in Python.
To do any kind of text processing with TextBlob, we follow two steps:
- Convert the string into a TextBlob object
- Call TextBlob methods on that object to perform the specific task
Tokenization with TextBlob
Tokenization refers to dividing text into a sequence of tokens (sentences and words).
from textblob import TextBlob
text = '''
TextBlob is a Python (2 and 3) library for processing textual data.
It provides API to do natural language processing (NLP)
such as part-of-speech tagging, noun phrase extraction, sentiment analysis, etc.
'''
blob_obj = TextBlob(text)
# Split into sentences
blob_obj.sentences
—
Output:
[Sentence("
TextBlob is a Python (2 and 3) library for processing textual data."),
 Sentence("It provides API to do natural language processing (NLP)
such as part-of-speech tagging, noun phrase extraction, sentiment analysis, etc.")]
# Print tokens/words
blob_obj.tokens
Output:
WordList(['TextBlob', 'is', 'a', 'Python', '(', '2', 'and', '3', ')', 'library', 'for', 'processing', 'textual', 'data', '.', 'It', 'provides', 'API', 'to', 'do', 'natural', 'language', 'processing', '(', 'NLP', ')', 'such', 'as', 'part-of-speech', 'tagging', ',', 'noun', 'phrase', 'extraction', ',', 'sentiment', 'analysis', ',', 'etc', '.'])
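Besides tokens, a TextBlob object also exposes a words property, which returns the word tokens with the punctuation stripped; the inflection examples later in this post index into it.
# Print words only (punctuation is dropped)
blob_obj.words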
—
POS tagging with TextBlob
TextBlob has two types of POS taggers:
- PatternTagger (uses the same implementation as the Pattern library)
- NLTKTagger (uses NLTK's Treebank tagger)
By default TextBlob uses the PatternTagger; if you want the NLTK Treebank tagger instead, you can always pass it in explicitly.
# By using TreeBank tagger
from textblob.taggers import NLTKTagger
nltk_tagger = NLTKTagger()
blob_obj = TextBlob(text, pos_tagger=nltk_tagger)
blob_obj.pos_tags
# By using Pattern Tagger
from textblob.taggers import PatternTagger
pattern_tagger = PatternTagger()
blob_obj = TextBlob(text, pos_tagger=pattern_tagger)
blob_obj.pos_tags
Output:
[('TextBlob', 'NN'),
 ('is', 'VBZ'),
 ('a', 'DT'),
 ('Python', 'NNP'),
 ('2', 'IN'),
 ('and', 'CC'),
 ('3', 'CD'),
 ('library', 'NN'),
 ('for', 'IN'),
 ('processing', 'NN'),
 ('textual', 'JJ'),
 ('data', 'NNS'),
 ('It', 'PRP'),
 ('provides', 'VBZ'),
 ('API', 'NNP'),
 ('to', 'TO'),
 ('do', 'VBP'),
 ('natural', 'JJ'),
 ('language', 'NN'),
 ('processing', 'NN'),
 ('NLP', 'NN'),
 ('such', 'JJ'),
 ('as', 'IN'),
 ('part-of-speech', 'JJ'),
 ('tagging', 'VBG'),
 ('noun', 'NN'),
 ('phrase', 'NN'),
 ('extraction', 'NN'),
 ('analysis', 'NN'),
 ('etc.', 'FW')]
Noun Phrase Extraction using TextBlob
Noun phrase extraction is important in NLP when you want to analyze the "who" or "what" in a sentence. TextBlob uses NLTK data to do this job. Let's see an example below.
for np in blob_obj.noun_phrases:
    print(np)
—
Output:
textblob
python
processing textual data
api
natural language processing
nlp
noun phrase extraction
Word Inflection and Lemmatization using TextBlob
Word inflection converts a word between its singular and plural forms, while lemmatization converts a word into its base (dictionary) form. For lemmatization TextBlob uses NLTK's WordNet data.
# Singularize form
print('Previous: ', blob_obj.words[13], ' After: ', blob_obj.words[13].singularize())
# Pluralize form
print('Previous: ', blob_obj.words[7], ' After: ', blob_obj.words[7].pluralize())
Output:
Previous:  provides  After:  provide
Previous:  library  After:  libraries
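The lemmatization part works through the Word.lemmatize() method, which is backed by WordNet. A small example, where the "v" argument tells WordNet to treat the word as a verb:
from textblob import Word
print(Word("octopi").lemmatize())    # -> 'octopus'
print(Word("went").lemmatize("v"))   # -> 'go'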
N-grams using TextBlob
An n-gram is a combination of n consecutive words. N-grams can be used as features for language modelling.
Using the ngrams() method we can easily generate n-grams.
## 3-gram
blob_obj.ngrams(n=3)
Output:
[WordList(['TextBlob', 'is', 'a']),
 WordList(['is', 'a', 'Python']),
 WordList(['a', 'Python', '2']),
 WordList(['Python', '2', 'and']),
 WordList(['2', 'and', '3']),
 WordList(['and', '3', 'library']),
 WordList(['3', 'library', 'for']),
 WordList(['library', 'for', 'processing']),
 WordList(['for', 'processing', 'textual']),
 WordList(['processing', 'textual', 'data']),
 WordList(['textual', 'data', 'It']),
 WordList(['data', 'It', 'provides']),
 WordList(['It', 'provides', 'API']),
 WordList(['provides', 'API', 'to']),
 WordList(['API', 'to', 'do']),
 WordList(['to', 'do', 'natural']),
 WordList(['do', 'natural', 'language']),
 WordList(['natural', 'language', 'processing']),
 WordList(['language', 'processing', 'NLP']),
 WordList(['processing', 'NLP', 'such']),
 WordList(['NLP', 'such', 'as']),
 WordList(['such', 'as', 'part-of-speech']),
 WordList(['as', 'part-of-speech', 'tagging']),
 WordList(['part-of-speech', 'tagging', 'noun']),
 WordList(['tagging', 'noun', 'phrase']),
 WordList(['noun', 'phrase', 'extraction']),
 WordList(['phrase', 'extraction', 'sentiment']),
 WordList(['extraction', 'sentiment', 'analysis']),
 WordList(['sentiment', 'analysis', 'etc'])]
—
Sentiment Analysis using TextBlob
Sentiment analysis is the process of determining the emotion (positive, negative, or neutral) of a piece of text.
The sentiment property of TextBlob returns two values:
- Polarity (range -1 to 1, where -1 is very negative and 1 is very positive)
- Subjectivity (range 0 to 1, where 0 is objective fact and 1 is personal opinion)
By default TextBlob uses a lexicon-based analyzer (from Pattern). It also ships a NaiveBayesAnalyzer with a training set of pre-classified movie reviews; when you provide new text, that analyzer classifies its polarity as positive and negative probabilities (see the example at the end of this section).
text = "I hate this phone"
blob_obj = TextBlob(text)
blob_obj.sentiment
Output: a negative polarity with a high subjectivity, since "hate" carries strong negative sentiment in the lexicon.
—
text = "I love this phone"
blob_obj = TextBlob(text)
blob_obj.sentiment
Output:
Sentiment(polarity=0.5, subjectivity=0.6)
Note: a subjectivity of 0.6 indicates that the text is a personal opinion rather than factual information.
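To use the movie-review-trained NaiveBayesAnalyzer mentioned above instead of the default lexicon-based analyzer, pass it in explicitly; it returns a classification plus positive and negative probabilities rather than polarity and subjectivity:
from textblob import TextBlob
from textblob.sentiments import NaiveBayesAnalyzer
blob_obj = TextBlob("I love this phone", analyzer=NaiveBayesAnalyzer())
blob_obj.sentiment
# -> Sentiment(classification='pos', p_pos=..., p_neg=...)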
—
Spelling Correction using TextBlob
In NLP, spelling correction is often required to normalize text data. TextBlob's spelling corrector is based on Peter Norvig's "How to Write a Spelling Corrector", which reports roughly 80-90% accuracy at a processing speed of at least 10 words per second.
blob_obj = TextBlob("speling")
blob_obj.words[0].spellcheck()
Output:
[('spelling', 1.0)]
So the corrected word is 'spelling', with a probability of 100%.
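Word.spellcheck() returns (candidate, confidence) pairs; to correct a whole blob in one go you can use the correct() method:
blob_obj = TextBlob("I havv goood speling!")
print(blob_obj.correct())
Output:
I have good spelling!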
How the spelling corrector works in TextBlob
Step 1 => Read a big text file (a corpus) and count how many times each word appears in it.
Step 2 => Calculate the probability of each word: the number of times that word appears in the whole document divided by the total number of words.
# def P(word, N=sum(WORDS.values())):
#     "Probability of `word`."
#     return WORDS[word] / N
Step 3 => Generate candidate words by rearranging the (incorrect) word in various ways: deleting, transposing, replacing, and inserting letters.
In our example, for the word 'speling' the candidates include:
# 'spsling', 'spteling', 'sptling', 'spueling', 'spuling',
# 'spveling', 'spvling', 'spweling', 'spwling', 'spxeling',
# ...
# 'xpeling', 'xspeling', 'ypeling', 'yspeling', 'zpeling', 'zspeling'
# ... and many more single-edit variants
—
Step 4 => Look up each candidate in the word counts built from the big text file in Step 1, keeping only the candidates that are known words.
Step 5 => The correct word is the known candidate with the highest probability (from Step 2).
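Putting the steps together, here is a minimal runnable sketch of this Norvig-style corrector (not TextBlob's actual code; the tiny corpus_text below is a stand-in for the big text file of Step 1):
from collections import Counter
import re

# Step 1: count word frequencies from a big text file (tiny stand-in corpus here)
corpus_text = "spelling errors in spelling are fixed by a spelling corrector"
WORDS = Counter(re.findall(r'\w+', corpus_text.lower()))

# Step 2: probability of a word
def P(word, N=sum(WORDS.values())):
    return WORDS[word] / N

# Step 3: all strings one edit (delete, transpose, replace, insert) away from `word`
def edits1(word):
    letters = 'abcdefghijklmnopqrstuvwxyz'
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

# Steps 4 and 5: keep candidates that are known words, pick the most probable one
def correction(word):
    candidates = [w for w in edits1(word) if w in WORDS] or [word]
    return max(candidates, key=P)

print(correction('speling'))   # -> 'spelling'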
Language detection and Translation using TextBlob
## Detect Language
text = "I hate this phone"
blob_obj = TextBlob(text)
blob_obj.detect_language()
>>‘en’
If you are running this code on an office computer behind a proxy, you may get a timeout error like:
URLError: <urlopen error [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time or established connection failed because connected host has failed to respond>
In this case you have to set your proxy address before running the code above (since the result is fetched from a web API), like below:
## Detect Language with proxy
import nltk
# Set up your proxy address
nltk.set_proxy('http://111.199.236.103:8080')
text = "I hate this phone"
blob_obj = TextBlob(text)
blob_obj.detect_language()
## Translate to Bengali
blob_obj.translate(to="bn")
>> TextBlob("আমি এই ফোন ঘৃণা করি")
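You can also give the source language explicitly instead of relying on auto-detection by passing the from_lang parameter. Note that detect_language() and translate() call the Google Translate web service, so they need an internet connection, and newer TextBlob releases have deprecated them in favour of the official Google Translate API.
## Translate with an explicit source language
blob_obj.translate(from_lang="en", to="es")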
Here 'bn' is the language code you need to provide to TextBlob, so it helps to know the supported language codes listed below.
—
Language Name | Code | Language Name | Code
--- | --- | --- | ---
Afrikaans | af | Irish | ga
Albanian | sq | Italian | it
Arabic | ar | Japanese | ja
Azerbaijani | az | Kannada | kn
Basque | eu | Korean | ko
Bengali | bn | Latin | la
Belarusian | be | Latvian | lv
Bulgarian | bg | Lithuanian | lt
Catalan | ca | Macedonian | mk
Chinese Simplified | zh-CN | Malay | ms
Chinese Traditional | zh-TW | Maltese | mt
Croatian | hr | Norwegian | no
Czech | cs | Persian | fa
Danish | da | Polish | pl
Dutch | nl | Portuguese | pt
English | en | Romanian | ro
Esperanto | eo | Russian | ru
Estonian | et | Serbian | sr
Filipino | tl | Slovak | sk
Finnish | fi | Slovenian | sl
French | fr | Spanish | es
Galician | gl | Swahili | sw
Georgian | ka | Swedish | sv
German | de | Tamil | ta
Greek | el | Telugu | te
Gujarati | gu | Thai | th
Haitian Creole | ht | Turkish | tr
Hebrew | iw | Ukrainian | uk
Hindi | hi | Urdu | ur
Hungarian | hu | Vietnamese | vi
Icelandic | is | Welsh | cy
Indonesian | id | Yiddish | yi
Conclusion
TextBlob is built using various NLP tools like NLTK, Pattern, and the Google Translate service.
There is nothing fundamentally new in this package, but if you want several important NLP functions together in one place, TextBlob is a good choice.
In this tutorial I have discussed:
- Install and set up TextBlob for Python
- Tokenization with TextBlob
- POS tagging with TextBlob
- Noun Phrase Extraction using TextBlob
- Sentiment Analysis using TextBlob
- Spelling Correction using TextBlob
- How spelling corrector works in TextBlob
- Language detection and Translation using TextBlob
If you have any questions or suggestions regarding this topic, please let me know in the comments section; I will try my best to answer.