Natural Language Processing Using TextBlob

Interest in Natural Language Processing (NLP) is growing due to an increasing number of interesting applications such as machine translation, chatbots, image captioning, and more.

There are many tools for working with NLP. Some of the popular ones are:

  • NLTK
  • Spacy
  • Stanford Core NLP
  • TextBlob

In this topic, I will show you how to use TextBlob in Python.

What is TextBlob?

TextBlob is a Python (2 and 3) library for processing textual data. Built on the shoulders of NLTK and Pattern, it provides a simple API for common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.
Because it wraps several lower-level libraries and services (NLTK, Pattern, Google Translate, etc.) behind one consistent interface, it is easy to learn while still offering a lot of functionality. It has become my go-to library for everyday NLP tasks.

Features:
  • Part-of-speech tagging
  • Noun phrase extraction
  • Sentiment analysis
  • Classification (using Naive Bayes or Decision Tree classifiers)
  • Language translation and detection powered by Google Translate
  • Spelling correction
  • Tokenization (splitting text into words and sentences)
  • Word and phrase frequencies
  • Parsing
  • n-grams
  • Word inflection (pluralization and singularization) and lemmatization
  • Add new models or languages through extensions
  • WordNet integration

Install and setup TextBlob for Python

As TextBlob is built on the shoulders of NLTK and Pattern, we need to download the necessary NLTK corpora along with TextBlob itself:
$ pip install -U textblob
$ python -m textblob.download_corpora
Now let’s explore some key features of TextBlob and implement them in Python.
To do any kind of text processing with TextBlob, we follow the two steps listed below:

  • Convert any string into a TextBlob object
  • Call TextBlob's functions or properties to perform a specific task

Tokenization with TextBlob

Tokenization refers to dividing text into a sequence of tokens.

from textblob import TextBlob
text = '''
TextBlob is a Python (2 and 3) library for processing textual data.
It provides API to do natural language processing (NLP)
such as part-of-speech tagging, noun phrase extraction, sentiment analysis, etc.
'''
blob_obj = TextBlob(text)
# Divide into sentences
blob_obj.sentences


Output:
[Sentence("
 TextBlob is a Python (2 and 3) library for processing textual data."),
 Sentence("It provides API to do natural language processing (NLP)
 such as part-of-speech tagging, noun phrase extraction, sentiment analysis, etc.")]

 

# Print tokens/words
blob_obj.tokens

Output:
WordList(['TextBlob', 'is', 'a', 'Python', '(', '2', 'and', '3', ')', 'library', 'for', 'processing', 'textual', 'data', '.', 'It', 'provides', 'API', 'to', 'do', 'natural', 'language', 'processing', '(', 'NLP', ')', 'such', 'as', 'part-of-speech', 'tagging', ',', 'noun', 'phrase', 'extraction', ',', 'sentiment', 'analysis', ',', 'etc', '.'])
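Internally, TextBlob delegates tokenization to NLTK's tokenizers. To see the basic idea without any library, here is a naive regex-based sketch (an illustration only, not TextBlob's actual implementation):

```python
import re

def tokenize(text):
    # Words (including hyphenated ones) or single punctuation marks.
    # A crude stand-in for NLTK's tokenizers, which handle many more cases.
    return re.findall(r"\w+(?:-\w+)*|[^\w\s]", text)

print(tokenize("It provides API to do NLP."))
# ['It', 'provides', 'API', 'to', 'do', 'NLP', '.']
```

A real tokenizer also has to deal with contractions, abbreviations, URLs, and so on, which is why TextBlob reuses NLTK's instead of a regex like this.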

POS tagging with TextBlob

TextBlob has two types of POS tagger:
  • PatternTagger (uses the same implementation as the pattern library)
  • NLTKTagger (uses NLTK's TreeBank tagger)
Which one is the default depends on the TextBlob version; either can be selected explicitly via the pos_tagger argument, as shown below.

 

# By using TreeBank tagger
from textblob.taggers import NLTKTagger
nltk_tagger = NLTKTagger()
blob_obj = TextBlob(text, pos_tagger=nltk_tagger)
blob_obj.pos_tags

# By using Pattern Tagger
from textblob.taggers import PatternTagger
pattern_tagger = PatternTagger()
blob_obj = TextBlob(text, pos_tagger=pattern_tagger)
blob_obj.pos_tags

Output:
[('TextBlob', 'NN'),
 ('is', 'VBZ'),
 ('a', 'DT'),
 ('Python', 'NNP'),
 ('2', 'IN'),
 ('and', 'CC'),
 ('3', 'CD'),
 ('library', 'NN'),
 ('for', 'IN'),
 ('processing', 'NN'),
 ('textual', 'JJ'),
 ('data', 'NNS'),
 ('It', 'PRP'),
 ('provides', 'VBZ'),
 ('API', 'NNP'),
 ('to', 'TO'),
 ('do', 'VBP'),
 ('natural', 'JJ'),
 ('language', 'NN'),
 ('processing', 'NN'),
 ('NLP', 'NN'),
 ('such', 'JJ'),
 ('as', 'IN'),
 ('part-of-speech', 'JJ'),
 ('tagging', 'VBG'),
 ('noun', 'NN'),
 ('phrase', 'NN'),
 ('extraction', 'NN'),
 ('sentiment', 'NN'),
 ('analysis', 'NN'),
 ('etc.', 'FW')]

Noun Phrase Extraction using TextBlob

Noun phrase extraction is important in NLP when you want to analyze the "who" or "what" in a sentence. Let's see an example below.
TextBlob uses NLTK data to do this job.
for np in blob_obj.noun_phrases:
    print(np)


Output:
textblob
python
processing textual data
api
natural language processing
nlp
noun phrase extraction
sentiment analysis
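TextBlob's extractors chunk the POS-tagged words with a grammar. A minimal pure-Python illustration of the idea (not TextBlob's actual grammar): group runs of consecutive adjective/noun tags into phrases.

```python
def noun_phrases(tagged):
    # Collect runs of adjective (JJ*) / noun (NN*) tags; keep runs of 2+ words.
    # The ("", "END") sentinel flushes any run that reaches the end of input.
    phrases, current = [], []
    for word, tag in tagged + [("", "END")]:
        if tag.startswith(("JJ", "NN")):
            current.append(word)
        else:
            if len(current) > 1:
                phrases.append(" ".join(current).lower())
            current = []
    return phrases

tagged = [("natural", "JJ"), ("language", "NN"), ("processing", "NN"),
          ("is", "VBZ"), ("fun", "JJ")]
print(noun_phrases(tagged))  # ['natural language processing']
```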

Word Inflection and Lemmatization using TextBlob

Inflection changes a word's form (for example, singular to plural), and lemmatization converts a word into its base form. For both of these, TextBlob uses NLTK (WordNet) data.
# Singularize
print('Previous: ', blob_obj.words[13], ' After: ', blob_obj.words[13].singularize())
# Pluralize
print('Previous: ', blob_obj.words[7], ' After: ', blob_obj.words[7].pluralize())

Output:
Previous:  provides  After:  provide

Previous:  library  After:  libraries
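The pluralize() and singularize() methods come from pattern.en's inflection rules. A toy version of a few of those rules (the real rule set handles many irregular forms that this sketch does not):

```python
def pluralize(word):
    # Three common English pluralization rules; a tiny subset of pattern.en's.
    if word.endswith(("s", "x", "z", "ch", "sh")):
        return word + "es"
    if word.endswith("y") and word[-2] not in "aeiou":
        return word[:-1] + "ies"
    return word + "s"

print(pluralize("library"))  # libraries
print(pluralize("phone"))    # phones
```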

N-grams using TextBlob

N-Gram is combination of multiple words together. N grams can be used as features for language modelling.
By using “ngrams” function we can easily generate N gram words.
## 3-gram
blob_obj.ngrams(n=3)

Output:
[WordList(['TextBlob', 'is', 'a']),
 WordList(['is', 'a', 'Python']),
 WordList(['a', 'Python', '2']),
 WordList(['Python', '2', 'and']),
 WordList(['2', 'and', '3']),
 WordList(['and', '3', 'library']),
 WordList(['3', 'library', 'for']),
 WordList(['library', 'for', 'processing']),
 WordList(['for', 'processing', 'textual']),
 WordList(['processing', 'textual', 'data']),
 WordList(['textual', 'data', 'It']),
 WordList(['data', 'It', 'provides']),
 WordList(['It', 'provides', 'API']),
 WordList(['provides', 'API', 'to']),
 WordList(['API', 'to', 'do']),
 WordList(['to', 'do', 'natural']),
 WordList(['do', 'natural', 'language']),
 WordList(['natural', 'language', 'processing']),
 WordList(['language', 'processing', 'NLP']),
 WordList(['processing', 'NLP', 'such']),
 WordList(['NLP', 'such', 'as']),
 WordList(['such', 'as', 'part-of-speech']),
 WordList(['as', 'part-of-speech', 'tagging']),
 WordList(['part-of-speech', 'tagging', 'noun']),
 WordList(['tagging', 'noun', 'phrase']),
 WordList(['noun', 'phrase', 'extraction']),
 WordList(['phrase', 'extraction', 'sentiment']),
 WordList(['extraction', 'sentiment', 'analysis']),
 WordList(['sentiment', 'analysis', 'etc'])]
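Under the hood, n-gram generation is just a sliding window over the token list. A minimal sketch, independent of TextBlob:

```python
def ngrams(tokens, n=3):
    # Slide a window of length n over the token list.
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]

tokens = "TextBlob is a Python library".split()
print(ngrams(tokens, 3))
# [['TextBlob', 'is', 'a'], ['is', 'a', 'Python'], ['a', 'Python', 'library']]
```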

Sentiment Analysis using TextBlob

Sentiment analysis is the process of determining the emotion (positive, negative, or neutral) of a text.
The sentiment property of TextBlob returns two values:

  • Polarity (range -1 to 1)

  • Subjectivity (range 0 to 1)

By default, TextBlob scores polarity and subjectivity with a lexicon-based analyzer from the pattern library. It also ships an optional NaiveBayesAnalyzer, trained on a corpus of pre-classified movie reviews, which classifies new text into positive and negative probabilities.
text = "I hate this phone"
blob_obj = TextBlob(text)
blob_obj.sentiment

Output:
Sentiment(polarity=-0.8, subjectivity=0.9)


text = "I love this phone"
blob_obj = TextBlob(text)
blob_obj.sentiment

Output:
Sentiment(polarity=0.5, subjectivity=0.6)

Note: subjectivity = 0.6 indicates that the text expresses a personal opinion rather than general factual information.
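The default (pattern-based) analyzer works roughly like this: look up each word in a polarity lexicon and combine the scores. A toy sketch with a made-up lexicon (the scores below are chosen to mirror the outputs above, not taken from pattern's real lexicon, which has thousands of entries and also adjusts for negation and intensifiers):

```python
# Made-up lexicon scores, for illustration only.
LEXICON = {"love": 0.5, "hate": -0.8, "great": 0.8}

def polarity(text):
    # Average the scores of the lexicon words found in the text.
    scores = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

print(polarity("I hate this phone"))  # -0.8
print(polarity("I love this phone"))  # 0.5
```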

Spelling Correction using TextBlob

In NLP, spelling correction is often required to normalize text data. TextBlob offers a spelling corrector with roughly 80-90% accuracy at a processing speed of at least 10 words per second.
The spelling corrector is based on Peter Norvig's "How to Write a Spelling Corrector"
(http://norvig.com/spell-correct.html), as implemented in the pattern library.
blob_obj = TextBlob("speling")
blob_obj.words[0].spellcheck()

Output:
[('spelling', 1.0)]

So the corrected word is 'spelling', with a probability of 100%.

How the spelling corrector works in TextBlob

Step 1 => From a big text file, count the number of occurrences of each word.
Step 2 => Calculate the probability of each word: the number of times that word appears divided by the total number of words.
# def P(word, N=sum(WORDS.values())):
#     "Probability of `word`."
#     return WORDS[word] / N

Step 3 => Generate variations of the (possibly misspelled) input word: deletes, transposes, replaces, and inserts.
In our example, for the word 'speling', the candidates include (truncated):
#  'spsling',
#  'spteling',
#  'spueling',
#  'spweling',
#  ...
#  'yspeling',
#  'zpeling',
#  'zspeling'


Step 4 => Look each candidate up among the words counted from the big text file (step 1).
Step 5 => The correct word is the candidate with the highest probability (from step 2).
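The five steps above can be sketched in a few lines of pure Python, following Norvig's article. A tiny corpus stands in for the "big text file", and only one-edit candidates are generated:

```python
import re
from collections import Counter

# Step 1: count words in a (here, tiny) corpus.
CORPUS = "spelling is hard but spelling practice makes spelling easy"
WORDS = Counter(re.findall(r"\w+", CORPUS.lower()))

def P(word, N=sum(WORDS.values())):
    # Step 2: probability of `word` in the corpus.
    return WORDS[word] / N

def edits1(word):
    # Step 3: all strings one edit away (deletes, transposes, replaces, inserts).
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correction(word):
    # Steps 4-5: keep candidates that occur in the corpus, pick the most probable.
    candidates = [w for w in edits1(word) if w in WORDS] or [word]
    return max(candidates, key=P)

print(correction("speling"))  # spelling
```

The real corrector also considers candidates two edits away and uses a much larger word-count file, but the logic is the same.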

Language Detection and Translation using TextBlob

Language translation and detection are powered by the Google Translate API. (Note that newer TextBlob releases have deprecated these methods; the examples below apply to older versions.)
## Detect language
text = "I hate this phone"
blob_obj = TextBlob(text)
blob_obj.detect_language()

>> 'en'
If you try this code on an office computer behind a proxy, you may get a timeout error like:

URLError: <urlopen error [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond>

In this case, you have to define your proxy address before the code above (as it fetches the result from a web API):

## Detect language with a proxy
import nltk
# Set up your proxy address
nltk.set_proxy('http://111.199.236.103:8080')
text = "I hate this phone"
blob_obj = TextBlob(text)
blob_obj.detect_language()

## Translate to Bengali
blob_obj.translate(to="bn")

>> TextBlob("আমি এই ফোন ঘৃণা করি")

Here 'bn' is the language code you need to provide to TextBlob, so you should know the supported language codes, listed below.


     

Language Name          Code     Language Name          Code
Afrikaans              af       Irish                  ga
Albanian               sq       Italian                it
Arabic                 ar       Japanese               ja
Azerbaijani            az       Kannada                kn
Basque                 eu       Korean                 ko
Bengali                bn       Latin                  la
Belarusian             be       Latvian                lv
Bulgarian              bg       Lithuanian             lt
Catalan                ca       Macedonian             mk
Chinese Simplified     zh-CN    Malay                  ms
Chinese Traditional    zh-TW    Maltese                mt
Croatian               hr       Norwegian              no
Czech                  cs       Persian                fa
Danish                 da       Polish                 pl
Dutch                  nl       Portuguese             pt
English                en       Romanian               ro
Esperanto              eo       Russian                ru
Estonian               et       Serbian                sr
Filipino               tl       Slovak                 sk
Finnish                fi       Slovenian              sl
French                 fr       Spanish                es
Galician               gl       Swahili                sw
Georgian               ka       Swedish                sv
German                 de       Tamil                  ta
Greek                  el       Telugu                 te
Gujarati               gu       Thai                   th
Haitian Creole         ht       Turkish                tr
Hebrew                 iw       Ukrainian              uk
Hindi                  hi       Urdu                   ur
Hungarian              hu       Vietnamese             vi
Icelandic              is       Welsh                  cy
Indonesian             id       Yiddish                yi

Conclusion

TextBlob is built on top of various NLP tools such as NLTK, Pattern, and the Google Translate API.
There is nothing fundamentally new in this package, but if you want several important NLP functions together in one place, it is a good choice.
In this tutorial I have discussed:

• What is TextBlob?
• Install and setup TextBlob for Python
• Tokenization with TextBlob
• POS tagging with TextBlob
• Noun Phrase Extraction using TextBlob
• Word Inflection and Lemmatization using TextBlob
• N-grams using TextBlob
• Sentiment Analysis using TextBlob
• Spelling Correction using TextBlob
• How the spelling corrector works in TextBlob
• Language Detection and Translation using TextBlob

If you have any questions or suggestions regarding this topic, please let me know in the comment section; I will try my best to answer.
