Abzooba's core product, Xpresso™, is a knowledge distillation engine that processes unstructured text fetched from online review sites, social media, blogs and similar sources, and extracts knowledge relevant to decision making. Aspect-based sentiment analysis, statement type determination and emotion detection are some of the key capabilities of the engine.
Abzooba recognized the potential of social media analysis and launched Xpresso, its core text analytics engine, in 2011. Over the past year, we have added many features for enterprise-class quality, ease of use and customizability. With firsts such as aspect-based sentiment analysis, domain-specific customization, control of negation and intensification, buzz detection and resume analytics, Xpresso continues to be at the forefront of text analytics.
Whenever one encounters a text, the obvious questions come to mind: Is the text positive or negative, or does it carry no sentiment at all? Does it say something good or point out something bad? How will you know if you don't look? And how can you look by hand when there are millions of comments per hour?
The obvious limitations of manual sentiment analysis led to the development of machine learning based sentiment analysis. Once you have reliable, consistent machine-based sentiment analysis, a number of applications follow easily: reputation management (the problem every marketing person faces), “voice of the customer” (listen to how customers say what they say, rather than constraining them to closed-ended questions) and so on.
Abzooba has been working on the problem of aspect-based sentiment analysis for quite some time. As a result, we have substantial experience and know where it works well. Unfortunately, many vendors do not perform aspect-based sentiment analysis; they are mostly concerned with overall sentiment. You therefore need to be wary of over-reaching claims and confusing descriptions from vendors rolling out sentiment analysis engines.
The descriptions below demystify sentiment extraction and other text analytics tasks, and explain how the Xpresso engine works. This includes a discussion of how and why we have extended the basic concept of document sentiment to the aspect and entity level, and how this technology is being further extended to measure domain-specific contextual indicators.
How Does Aspect-Based Sentiment Analysis Work?
Humans seemingly have no trouble reading a sentence and mentally assessing the aspects it talks about and the sentiment associated with each of those aspects. As humans, we do this by reading and understanding the descriptors attached to the subject of a sentence.
Consider these posts:
We went to a restaurant last night. The food was delicious but the service was slow. The music in the background was a bit too loud.
The beds in the hotel room were comfortable. However the shower in the bathroom was not working.
The aforementioned posts belong to the “food” and “hospitality” domains respectively. One talks about a restaurant and the other about a hotel room. The cues that humans use to discern this are the emotive phrases: "delicious" is associated with food and "slow" with service. Moreover, humans can discriminate between the sentiments expressed about different aspects. The sentiment analysis module in Xpresso leverages the association between sentiment-bearing (emotive) phrases and aspects, and determines the sentiment from it.
Xpresso identifies the emotive phrases within a post and the aspects with which these emotive phrases are associated. It also maps the aspects to the aspect classes listed in the domain-specific knowledge bases. Finally, it reports the aspect-sentiment pair for each sentence in a post and aggregates the sentiment at the aspect level. Xpresso also reports the overall sentiment of the post. For example, for the first post Xpresso would report <FOOD,POSITIVE>, <SERVICE,NEGATIVE> and <AMBIENCE,NEGATIVE>, where FOOD, SERVICE and AMBIENCE are the aspect classes pre-defined for the hospitality domain.
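The pipeline described above (find emotive phrases, associate them with aspects, map aspects to aspect classes, aggregate per aspect) can be illustrated with a minimal sketch. This is not Xpresso's implementation: the tiny lexicon, the aspect table, the clause-splitting heuristic and the way the overall sentiment is derived (a plain sum of aspect scores) are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical toy knowledge base: aspect term -> aspect class.
ASPECT_CLASSES = {"food": "FOOD", "service": "SERVICE", "music": "AMBIENCE"}

# Hypothetical toy lexicon: emotive phrase -> polarity score.
EMOTIVE_PHRASES = {"delicious": 1, "slow": -1, "too loud": -1}


def split_clauses(post):
    """Split a post into rough clauses on sentence ends and the word 'but'."""
    parts = re.split(r"[.!?]|\bbut\b", post.lower())
    return [part.strip() for part in parts if part.strip()]


def aspect_sentiment_pairs(post):
    """Pair each aspect class with the emotive score found in the same clause."""
    pairs = []
    for clause in split_clauses(post):
        aspects = [cls for term, cls in ASPECT_CLASSES.items()
                   if re.search(rf"\b{re.escape(term)}\b", clause)]
        score = sum(value for phrase, value in EMOTIVE_PHRASES.items()
                    if re.search(rf"\b{re.escape(phrase)}\b", clause))
        if score != 0:
            pairs.extend((aspect, score) for aspect in aspects)
    return pairs


def label(score):
    """Turn a numeric score into a sentiment label."""
    if score > 0:
        return "POSITIVE"
    if score < 0:
        return "NEGATIVE"
    return "NEUTRAL"


def summarize(post):
    """Aggregate scores per aspect class and derive an overall label (assumed: sum)."""
    totals = defaultdict(int)
    for aspect, score in aspect_sentiment_pairs(post):
        totals[aspect] += score
    by_aspect = {aspect: label(total) for aspect, total in totals.items()}
    return by_aspect, label(sum(totals.values()))


post = ("We went to a restaurant last night. The food was delicious but the "
        "service was slow. The music in the background was a bit too loud.")
print(summarize(post))
```

For the restaurant post, the sketch yields the same three aspect-sentiment pairs listed above. A real engine would need far richer lexicons and a proper way of linking phrases to aspects than same-clause co-occurrence.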
Deeper Semantic Understanding
In Xpresso, we believe in Natural Language Understanding on top of base Natural Language Processing. The engine has the potential to understand sentences beyond just keywords.
For example, “Walmart needs to have better employees” contains the concept “better employees”, which on its own is biased towards positive polarity. Xpresso understands the context of “needs” here and hence correctly classifies the text as negative. Similarly, for the sentence “Staff could not be slower”, a typical state-of-the-art sentiment analysis system would treat the negation of a negative concept as positive, whereas Xpresso correctly understands it to be negative. The sentence “This washer uses very little power” has no polar keywords, yet it is a positive sentence.
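To make the contrast concrete, here is a small sketch that compares a naive keyword-plus-negation scorer with a few context patterns for the three sentences above. The word list, the negators and the three patterns are hypothetical and hand-written for this illustration; they say nothing about how Xpresso actually models context.

```python
import re

# Hypothetical polarity lexicon and negator list for the naive baseline.
POLAR_WORDS = {"better": 1, "great": 1, "slow": -1, "slower": -1}
NEGATORS = {"not", "no", "never"}


def keyword_baseline(text):
    """Naive scorer: sum word polarities, flipping the sign after a negator."""
    score, flip = 0, False
    for token in re.findall(r"[a-z']+", text.lower()):
        if token in NEGATORS:
            flip = True
        elif token in POLAR_WORDS:
            score += -POLAR_WORDS[token] if flip else POLAR_WORDS[token]
            flip = False
    return score


# Hypothetical context patterns, checked before falling back to the baseline.
CONTEXT_RULES = [
    # "needs ... better X" implies the current X is lacking.
    (re.compile(r"\bneeds?\b.*\b(?:better|improved)\b"), -1),
    # "could not be slower" is an intensified negative, not a negated one.
    (re.compile(r"\bcould\s*(?:not|n't)\s+be\s+(?:slower|worse)\b"), -1),
    # Low resource consumption is desirable even without polar keywords.
    (re.compile(r"\buses\s+(?:very\s+)?little\s+(?:power|energy|water)\b"), 1),
]


def contextual_score(text):
    """Return the polarity of the first matching context rule, else the baseline."""
    lowered = text.lower()
    for pattern, polarity in CONTEXT_RULES:
        if pattern.search(lowered):
            return polarity
    return keyword_baseline(text)


for sentence in ("Walmart needs to have better employees",
                 "Staff could not be slower",
                 "This washer uses very little power"):
    print(sentence, keyword_baseline(sentence), contextual_score(sentence))
```

The baseline scores the first two sentences as positive and the third as neutral, while the context patterns give negative, negative and positive. Hand-written patterns like these do not scale, which is why deeper language understanding matters.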
Domain Knowledge and Customization
In Xpresso, we believe domain knowledge and contextual information play a very important role in better classification of text. Domain modelling is an additional layer of customization that we apply on top of generic NLP classification.
For example, in the sales domain “velocity of sales” is an attribute of sales performance and is unrelated to “velocity” in physics. “Covered parking” is a positive attribute of parking and a very favourable facility.
Whatever the domain, Xpresso has automated algorithms for domain modelling and ontology building. The resulting models are incrementally refined and rebuilt by high-precision human annotators.
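One simple way to picture a domain layer on top of a generic one is a lookup in which domain-specific phrases take precedence over generic entries. The sketch below is only an illustration of that idea using the “velocity of sales” and “covered parking” examples; the dictionaries, field names and the `lookup` function are hypothetical and not part of Xpresso.

```python
# Hypothetical generic entries: phrase -> interpretation.
GENERIC_LEXICON = {
    "velocity": {"concept": "physical speed", "polarity": 0},
}

# Hypothetical domain knowledge bases layered on top of the generic entries.
DOMAIN_LEXICONS = {
    "sales": {
        "velocity of sales": {"concept": "sales performance", "polarity": 0},
    },
    "parking": {
        "covered parking": {"concept": "parking facility", "polarity": 1},
    },
}


def lookup(text, domain=None):
    """Return (phrase, entry) matches, trying longer phrases first.

    Matching is plain substring search, which is enough for a sketch
    but too crude for real text.
    """
    lowered = text.lower()
    merged = {**GENERIC_LEXICON, **DOMAIN_LEXICONS.get(domain, {})}
    matches = []
    for phrase in sorted(merged, key=len, reverse=True):
        if phrase in lowered:
            matches.append((phrase, merged[phrase]))
            # Consume the matched phrase so shorter entries do not re-match it.
            lowered = lowered.replace(phrase, " ")
    return matches


print(lookup("The velocity of sales improved this quarter", domain="sales"))
print(lookup("The velocity of the ball doubled"))
print(lookup("Covered parking is available", domain="parking"))
```

With the sales domain selected, “velocity of sales” resolves to sales performance and the generic physics reading of “velocity” is never reached. Without a domain, the generic entry applies.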
Detecting the emotional state of a person by analyzing a text document written by him or her is challenging but often essential, because textual expressions of emotion are not only direct (using emotion words) but also result from interpreting the meaning of the concepts, and the interaction of the concepts, described in the document. Recognizing the emotion in text plays a key role in human-computer interaction. Emotions may be expressed through a person's speech, facial expressions and written text, known as speech-based, facial and text-based emotion respectively. Emotion detection adds an extra dimension to Xpresso and places it a notch above contemporary text analytics engines.
Xpresso classifies emotion into one of the following 6 classes:
Named Entity Recognition
Xpresso provides the functionality of extracting the named entities from text. The named entities include names of companies, people, places, organizations, dates, phone numbers, currency amounts and more.
Named entities are important because Xpresso uses them to compute the sentiment associated with an entity, or the emotion of the text.
Part Of Speech Tagging
A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text and assigns a part of speech, such as noun, verb or adjective, to each word (and other tokens), although computational applications generally use more fine-grained POS tags like 'noun-plural'. Part-of-speech tags are important because Xpresso uses them to compute the sentiment or the emotion of the text. The tags used are those of the Penn Treebank: https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
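As a rough illustration of how fine-grained Penn Treebank tags relate to coarse parts of speech, the sketch below groups tags by their prefix (for example NN, NNS, NNP and NNPS are all nouns). The tagged input is assumed to come from some POS tagger. Treating nouns as aspect candidates and adjectives as emotive candidates is an illustrative assumption, not a description of Xpresso's internals.

```python
# Penn Treebank tag prefixes mapped to coarse parts of speech.
COARSE_BY_PREFIX = (
    ("NN", "noun"),       # NN, NNS, NNP, NNPS
    ("VB", "verb"),       # VB, VBD, VBG, VBN, VBP, VBZ
    ("JJ", "adjective"),  # JJ, JJR, JJS
    ("RB", "adverb"),     # RB, RBR, RBS
)


def coarse_pos(tag):
    """Collapse a fine-grained Penn Treebank tag into a coarse category."""
    for prefix, coarse in COARSE_BY_PREFIX:
        if tag.startswith(prefix):
            return coarse
    return "other"


def candidates(tagged_tokens):
    """Split (word, tag) pairs into noun and adjective candidates."""
    nouns = [word for word, tag in tagged_tokens if coarse_pos(tag) == "noun"]
    adjectives = [word for word, tag in tagged_tokens
                  if coarse_pos(tag) == "adjective"]
    return nouns, adjectives


# Assumed tagger output for the sentence "Food is great".
tagged = [("Food", "NNP"), ("is", "VBZ"), ("great", "JJ")]
print(candidates(tagged))
```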
Texts use different forms of a word, such as organize, organizes, and organizing. Additionally, there are families of derivationally related words with similar meanings, such as democracy, democratic, and democratization. In many situations, it seems as if it would be useful for a search for one of these words to return documents that contain another word in the set.
The goal of lemmatization is to reduce inflectional forms and sometimes derivationally related forms of a word to a common base form. For instance:
am, are, is -> be
car, cars, car's, cars' -> car
Unlike stemming, which is a crude heuristic process that chops off the ends of words, lemmatization usually relies on a vocabulary and morphological analysis of words, normally aiming to remove inflectional endings only and to return the base or dictionary form of a word, which is known as the lemma. Lemmatized forms of words help to identify aspects better and make the engine robust to different word forms.
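The following sketch shows the general shape of a vocabulary-driven lemmatizer using the word forms from the examples above. The vocabulary, the irregular-form table and the suffix list are tiny and hypothetical. A production lemmatizer would rely on a full dictionary and proper morphological analysis.

```python
# Hypothetical irregular forms that cannot be derived by removing an ending.
IRREGULAR_FORMS = {"am": "be", "are": "be", "is": "be"}

# Hypothetical vocabulary of known base (dictionary) forms.
VOCABULARY = {"be", "car", "organize"}

# Inflectional and possessive endings to try, possessives first.
SUFFIXES = ("s'", "'s", "ing", "ed", "es", "s")


def lemmatize(word):
    """Return the dictionary form of a word, or the word itself if unknown."""
    lowered = word.lower()
    if lowered in IRREGULAR_FORMS:
        return IRREGULAR_FORMS[lowered]
    if lowered in VOCABULARY:
        return lowered
    for suffix in SUFFIXES:
        if lowered.endswith(suffix):
            stem = lowered[: -len(suffix)]
            # Accept a candidate only if the vocabulary confirms it,
            # also trying a restored final "e" (organiz -> organize).
            for candidate in (stem, stem + "e"):
                if candidate in VOCABULARY:
                    return candidate
    return lowered


for form in ("am", "are", "is", "car", "cars", "car's", "cars'",
             "organizes", "organizing"):
    print(form, "->", lemmatize(form))
```

Because every candidate is checked against the vocabulary, the sketch never returns a truncated non-word. A form it does not know, such as a derivationally related word like "democratic", is returned unchanged.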
Xpresso provides the functionality of lexical parsing. A natural language parser is a program that works out the grammatical structure of sentences, for instance, which groups of words go together (as "phrases") and which words are the subject or object of a verb. Probabilistic parsers use knowledge of language gained from hand-parsed sentences to try to produce the most likely analysis of new sentences. This approach is richer and more informative, and has all the advantages that a machine learning based model has over a rule-based system. It helps the engine understand the underlying semantics of the sentence.
For example, "Food is great" is parsed as: "(ROOT (S (NP (NNP Food)) (VP (VBZ is) (ADJP (JJ great)))))"
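A bracketed parse like the one above is easy to consume programmatically. The sketch below reads the bracketed string into nested tuples and pulls out the words under a given phrase label. The function names are hypothetical helpers written for this illustration. It only reads parser output and does not perform any parsing of raw text itself.

```python
import re


def parse_tree(text):
    """Read a bracketed parse string into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    position = 0

    def read_node():
        nonlocal position
        if tokens[position] != "(":
            raise ValueError("expected '(' at token %d" % position)
        position += 1
        label = tokens[position]
        position += 1
        children = []
        while tokens[position] != ")":
            if tokens[position] == "(":
                children.append(read_node())
            else:
                children.append(tokens[position])  # a leaf word
                position += 1
        position += 1  # skip the closing ")"
        return (label, children)

    return read_node()


def leaves(node):
    """Collect the words under a node, left to right."""
    words = []
    for child in node[1]:
        if isinstance(child, tuple):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words


def phrases(node, wanted_label):
    """Return the text of every subtree whose label equals wanted_label."""
    found = []
    if node[0] == wanted_label:
        found.append(" ".join(leaves(node)))
    for child in node[1]:
        if isinstance(child, tuple):
            found.extend(phrases(child, wanted_label))
    return found


tree = parse_tree("(ROOT (S (NP (NNP Food)) (VP (VBZ is) (ADJP (JJ great)))))")
print(phrases(tree, "NP"), phrases(tree, "ADJP"))
```

Here the noun phrase "Food" and the adjective phrase "great" fall out directly from the tree. This is the kind of structural grouping that makes it possible to link an emotive phrase to the subject it describes.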