Advanced Natural Language Processing with Stanford CoreNLP

I hope you already know what Natural Language Processing (NLP) is, what it is used for, and where to apply it.

There are lots of tools to work with NLP. Some popular ones are:

  • NLTK
  • spaCy
  • Stanford CoreNLP
  • TextBlob

These are among the most widely used tools for Natural Language Processing.

In this post I will show you how to use Stanford CoreNLP in Python.

What is Stanford CoreNLP?

The Stanford CoreNLP suite was released by the NLP research group at Stanford University. It offers Java-based modules for solving a range of basic NLP tasks like POS tagging (part-of-speech tagging), NER (Named Entity Recognition), dependency parsing, sentiment analysis, etc.

Apart from Python or Java, you can also test the service on the CoreNLP demo page.

Set up Stanford CoreNLP for Python

Install pycorenlp

pip install pycorenlp

(or install it manually from https://pypi.org/project/pycorenlp/)

Download the Stanford CoreNLP Model

Download the CoreNLP package (stanford-corenlp-full-2018-10-05.zip) from the official Stanford CoreNLP download page.
Install Package

Unzip the downloaded file (stanford-corenlp-full-2018-10-05.zip) and open a command prompt inside the unzipped folder:

cd stanford-corenlp-full-2018-10-05

I kept the unzipped folder inside the C drive.



Start Server
Copy and paste the command below into cmd and hit Enter:

java -mx5g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -timeout 10000

Keep this local server running while using Stanford CoreNLP from Python. Do not close the cmd window. That's it.
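If you want to confirm the server is up before moving on, you can send it a tiny request from Python. Below is a minimal sketch using the requests library (the annotator choice is arbitrary; tokenize is just the cheapest):

import requests

# Ping the local server with a one-word annotation request
resp = requests.post(
    'http://localhost:9000',
    params={'properties': '{"annotators": "tokenize", "outputFormat": "json"}'},
    data='Hello CoreNLP'.encode('utf-8'))
print(resp.status_code)  # 200 means the server is up and annotating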


Notes:

    • The timeout is in milliseconds; I set it to 10 seconds above. You should increase it if you want to pass huge blobs of text to the server.
    • There are more options; you can list them with the --help flag.
    • -mx5g should allocate enough memory, but you may need to modify this option if your machine is underpowered. An example with adjusted options is shown below.
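For example, to start the server explicitly on the default port with a longer 30-second timeout (both -port and -timeout are standard server flags):

java -mx5g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 30000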

Now we are all done with setting up Stanford CoreNLP for Python. We can start using Stanford CoreNLP to do different kinds of NLP tasks.

POS (Part of Speech) Tagging using Stanford CoreNLP

from pycorenlp import StanfordCoreNLP

# Connect to the local CoreNLP server started above
nlp = StanfordCoreNLP('http://localhost:9000')

text = "I love you. I hate him. You are nice. He is dumb"

# Request POS tags; the server adds prerequisite annotators (tokenize, ssplit)
res = nlp.annotate(text,
                   properties={
                       'annotators': 'pos',
                       'outputFormat': 'json',
                       'timeout': 1000,
                   })

# Each sentence contains tokens, and each token carries its POS tag
for sen in res['sentences']:
    for tok in sen['tokens']:
        print('Word:', tok['word'], '==> POS:', tok['pos'])

Output:

Word: I ==> POS: PRP
Word: love ==> POS: VBP
Word: you ==> POS: PRP
Word: . ==> POS: .
Word: I ==> POS: PRP
Word: hate ==> POS: VBP
Word: him ==> POS: PRP
Word: . ==> POS: .
Word: You ==> POS: PRP
Word: are ==> POS: VBP
Word: nice ==> POS: JJ
Word: . ==> POS: .
Word: He ==> POS: PRP
Word: is ==> POS: VBZ
Word: dumb ==> POS: JJ
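Once you have the tags, filtering tokens by them is straightforward. Here is a small sketch, reusing the res variable from the snippet above, that collects all adjectives (tokens whose Penn Treebank tag starts with JJ):

# Collect all adjectives from the annotated result
adjectives = [tok['word']
              for sen in res['sentences']
              for tok in sen['tokens']
              if tok['pos'].startswith('JJ')]
print(adjectives)  # ['nice', 'dumb']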


Named Entity Recognition (NER) using Stanford CoreNLP

from pycorenlp import StanfordCoreNLP

# Connect to the local CoreNLP server
nlp = StanfordCoreNLP('http://localhost:9000')

text = "Pele was an excellent football player."

res = nlp.annotate(text,
                   properties={
                       'annotators': 'ner',
                       'outputFormat': 'json',
                       'timeout': 1000,
                   })

# 'entitymentions' lists each detected entity span with its NER label
for sen in res['sentences']:
    for tok in sen['entitymentions']:
        print('Word:', tok['text'], '==> Entity:', tok['ner'])

Output:
Word: Pele ==> Entity: PERSON
Word: football player ==> Entity: TITLE
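If you need the mentions grouped by entity type rather than printed one by one, here is a minimal sketch reusing the res variable from above:

from collections import defaultdict

# Group entity mentions by their NER label
entities = defaultdict(list)
for sen in res['sentences']:
    for mention in sen['entitymentions']:
        entities[mention['ner']].append(mention['text'])

print(dict(entities))  # {'PERSON': ['Pele'], 'TITLE': ['football player']}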

Sentiment polarity score using Stanford CoreNLP

from pycorenlp import StanfordCoreNLP

# Connect to the local CoreNLP server
nlp = StanfordCoreNLP('http://localhost:9000')

text = "I love you. I hate him. You are nice. He is dumb"

res = nlp.annotate(text,
                   properties={
                       'annotators': 'sentiment',
                       'outputFormat': 'json',
                       'timeout': 1000,
                   })

# Print the sentence index, the sentence itself, and its sentiment label
for sen in res["sentences"]:
    print("%d: '%s': %s %s" % (
        sen["index"],
        " ".join([tok["word"] for tok in sen["tokens"]]),
        sen["sentimentValue"], sen["sentiment"]))

Output:
0: 'I love you .': 3 Positive
1: 'I hate him .': 1 Negative
2: 'You are nice .': 3 Positive
3: 'He is dumb': 1 Negative
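The sentimentValue field is returned as a string on a 0 (very negative) to 4 (very positive) scale, so a crude document-level polarity can be computed by averaging it across sentences. A minimal sketch reusing res from above:

# Average the per-sentence sentiment values for a rough document score
scores = [int(sen['sentimentValue']) for sen in res['sentences']]
print(sum(scores) / len(scores))  # (3 + 1 + 3 + 1) / 4 = 2.0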

How Stanford CoreNLP works

Tasks like POS tagging, dependency parsing, and NER are handled by statistical models trained on labeled corpora and parse trees. For sentiment analysis, the corpus is based on the dataset introduced by Pang and Lee (2005) and consists of 11,855 single sentences extracted from movie reviews.

The sentiment analysis implementation is based on the paper Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank by Richard Socher et al. Here a recursive neural network is trained on the Sentiment Treebank, a novel kind of dataset the authors introduced, which associates sentiment labels with the individual phrases of various sizes that make up each sentence.
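Because the model composes sentiment bottom-up over the parse tree, it can handle effects like negation that flip the label of a whole phrase. Here is a minimal sketch illustrating this with the same pycorenlp setup as above (the exact labels depend on the model shipped with your CoreNLP version):

from pycorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP('http://localhost:9000')

# Negation is composed over the tree, flipping the phrase-level label
for phrase in ["The movie was very good.", "The movie was not very good."]:
    res = nlp.annotate(phrase,
                       properties={'annotators': 'sentiment',
                                   'outputFormat': 'json',
                                   'timeout': 1000})
    print(phrase, '==>', res['sentences'][0]['sentiment'])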



Conclusion:

In this post I have covered:

• Different tools to work with NLP
• What Stanford CoreNLP is
• Setting up Stanford CoreNLP for Python
• Part-of-speech tagging with Stanford CoreNLP
• Named entity recognition using Stanford CoreNLP
• Sentiment analysis using Stanford CoreNLP
• How POS tagging, dependency parsing, NER, and sentiment analysis work in Stanford CoreNLP

If you have any questions or suggestions regarding this topic, see you in the comment section. I will try my best to answer.

           
