Proposal Title: Sentiment analysis of YouTube videos with joint models of text and speech
Principal Investigator: Alessandro Moschitti
Google Sponsors: Katja Filippova and Massimiliano Ciaramita
The SenTube corpus is available for research and commercial purposes. The comments corpus can be downloaded from here (16MB). Video files are available on request.
In recent years, automatic sentiment analysis of text has attracted attention from both the academic and industrial worlds. On the one hand, focusing on one well-defined aspect of text semantics, e.g., opinion polarity, simplifies the semantic model required to extract the desired information. On the other hand, sentiment data is very appealing for defining interesting business models, e.g., when it quantifies users' appreciation of a product or represents the opinions of market experts. This interest has produced a large body of research, mainly focused on machine learning algorithms for opinion classification.
The use of advanced linguistic information, such as syntactic structures and shallow semantics, has also been investigated to carry out deep linguistic analysis. In contrast, most real-world applications extract opinion polarity about products or facts, which can be easily implemented as simple bag-of-words text categorization.
However, the methods above do not scale to more complex tasks such as sentiment analysis of comments and videos, e.g., those on YouTube. Two likely reasons are: (i) a comment may contain many polarity words yet be unrelated to the video, and thus be uninteresting for the target analysis; and (ii) a comment may refer to different aspects of the video content, each evaluated differently. A promising solution is to jointly model the information in videos, which contain the targets of the sentiment analysis, e.g., entities, and the information expressed in the text, which can be used to measure the impact of the video on its viewers.
This project studies and defines models for sentiment analysis of YouTube video comments. In particular, we pursue the following tasks:
- carry out semantic and sentiment analysis on comments to identify the targets of videos;
- annotate such concepts in comments to build structural representations, e.g. using shallow syntactic trees or full parsing and shallow semantics, when possible;
- apply machine learning algorithms to such representations to design automatic classifiers of several sentiment dimensions in comments: positive, negative, boring, exciting and so on; and
- use features from comments to improve the model for opinion polarity.
Given the complexity and the novelty of the task, we will rely on kernel methods to describe the joint information. We will exploit structural kernels (e.g., for sequences and trees) to encode structures in powerful algorithms, e.g., Support Vector Machines (SVMs), where the structures represent dependencies between words and concepts of comments.
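As a concrete illustration of the structural kernels mentioned above, the following is a minimal sketch of the classic Collins-Duffy syntactic tree kernel, which counts shared tree fragments between two parses and can be plugged into kernel machines such as SVMs. The nested-tuple tree format and the decay parameter value are illustrative assumptions, not the project's actual implementation.

```python
# Sketch of the Collins-Duffy syntactic tree kernel. Trees are hypothetical
# nested tuples: (label, child1, child2, ...); leaves are plain word strings.

def production(node):
    """Grammar production rooted at this node, e.g. ('NP', 'DT', 'NN')."""
    return (node[0],) + tuple(c if isinstance(c, str) else c[0] for c in node[1:])

def nodes(tree):
    """All (pre)terminal and internal nodes of the tree, top-down."""
    out = [tree]
    for child in tree[1:]:
        if not isinstance(child, str):
            out.extend(nodes(child))
    return out

def delta(n1, n2, lam=0.4):
    """Decayed count of common tree fragments rooted at n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    if all(isinstance(c, str) for c in n1[1:]):   # pre-terminal node
        return lam
    score = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        score *= 1.0 + delta(c1, c2, lam)
    return score

def tree_kernel(t1, t2, lam=0.4):
    """K(T1, T2) = sum of delta over all node pairs (Collins & Duffy, 2001)."""
    return sum(delta(n1, n2, lam) for n1 in nodes(t1) for n2 in nodes(t2))
```

Because the kernel decomposes over node pairs, it implicitly compares the exponentially many tree fragments of the two structures in quadratic time, which is what makes rich structural representations tractable inside SVM training.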
Definition of Annotation Guidelines
As a first step towards building a framework for supervised learning from user-generated content, we have developed a set of annotation guidelines, which serve as a general reference for our annotators. They are available here.
To support human annotators, we rolled out an in-house web annotation system (available only inside the UNITN network; from outside, one should connect through a VPN). Below is an example image of the annotation interface used to collect the labeled data for each video comment.
Each comment is assigned to one of the following categories: product related and/or video related (some comments, indeed, can fall under both), or spam, off-topic, or not English; in addition, it receives a positive, negative, or neutral sentiment label.
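The annotation scheme above can be captured in a small data structure. The following is a hypothetical encoding for illustration (the class name, label strings, and validation logic are our assumptions, not the project's annotation tool):

```python
# Hypothetical encoding of the SenTube annotation scheme: one or more type
# labels per comment (product/video related, or spam, off-topic, not-English)
# plus exactly one sentiment polarity.
from dataclasses import dataclass, field

TYPE_LABELS = {"product", "video", "spam", "off-topic", "not-english"}
SENTIMENT_LABELS = {"positive", "negative", "neutral"}

@dataclass
class CommentAnnotation:
    comment_id: str
    text: str
    types: set = field(default_factory=set)   # may contain both "product" and "video"
    sentiment: str = "neutral"

    def validate(self):
        if not self.types <= TYPE_LABELS:
            raise ValueError(f"unknown type label in {self.types}")
        if self.sentiment not in SENTIMENT_LABELS:
            raise ValueError(f"unknown sentiment {self.sentiment!r}")
        return self
```

For example, a comment praising the advertised product in a product-review video would be annotated as `CommentAnnotation("c1", "Love this phone!", {"product"}, "positive")`.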
At the moment we have over 1,000 annotated videos, with a total of 20,000 comments.
To support the task of constructing syntactic/semantic graphs that jointly represent question/answer pairs, we have built an NLP pipeline within the Apache UIMA framework. UIMA is a modular and flexible architecture for managing unstructured data, which makes it easy to plug in the many NLP components available as open-source projects. Currently, our pipeline includes components for basic tokenization, lemmatization, named entity recognition, constituency and dependency parsing, semantic role labeling, topic modeling with LDA, coreference resolution, question focus detection, and question classification. We plan to extend this list to support easy construction of richer syntactic/semantic graphs for relational learning.
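The pipeline idea can be sketched compactly: in UIMA, each annotator enriches a shared Common Analysis Structure (CAS) and passes it on. The toy Python stand-in below mimics that pattern with a plain dictionary; the two components shown are hypothetical placeholders, not the project's actual UIMA annotators.

```python
# Toy, UIMA-style pipeline sketch: each component reads and enriches a
# shared analysis structure (here, a dict standing in for a UIMA CAS).

def tokenizer(cas):
    # naive whitespace tokenization stand-in
    cas["tokens"] = cas["text"].split()
    return cas

def lemmatizer(cas):
    # naive lowercasing/punctuation-stripping stand-in for real lemmatization
    cas["lemmas"] = [t.lower().rstrip(".,!?") for t in cas["tokens"]]
    return cas

def run_pipeline(text, components):
    """Thread one analysis structure through a configurable component chain."""
    cas = {"text": text}
    for component in components:
        cas = component(cas)
    return cas
```

The value of the design is that components are interchangeable: a parser or named-entity recognizer can be appended to the chain without touching the existing stages, which is what makes assembling a rich pipeline from open-source parts practical.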
We use this rich set of NLP components to extract a large number of features, ranging from simple word-match metrics, e.g., n-gram overlap and longest common subsequence, to much more involved knowledge-based features, e.g., explicit semantic analysis, as well as various syntactic and shallow semantic similarity metrics.
To test the efficacy of our current feature-based system, we evaluated its performance on the Semantic Textual Similarity (STS) 2012 task. Following the setup of the 2012 challenge, our system improved the Pearson correlation from 83.3% (achieved by the best participating system) to 88.0%.
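STS systems are scored by the Pearson correlation between predicted and gold similarity scores; for reference, the metric itself is a few lines:

```python
# Pearson correlation coefficient, the official STS evaluation metric:
# covariance of the two score vectors divided by the product of their
# standard deviations, giving a value in [-1, 1].
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```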
Motivated by these results, we took part in the STS 2013 challenge.
Relational models for classification
The tasks we plan to target range from spam filtering to sentiment analysis, information aggregation for recommendation, and summarization. One core challenge across these tasks is the ability to capture the salient features needed to build accurate models.
Simple text representations, e.g., the bag-of-words model, have been shown to yield reasonable results on simple tasks, e.g., classification.
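To make the baseline concrete, here is the bag-of-words approach in miniature: comments become sparse word-count vectors, and a linear classifier is trained on them. The toy perceptron, feature extraction, and training data below are illustrative assumptions, not the project's actual system.

```python
# Bag-of-words polarity classification in miniature: sparse count vectors
# plus a toy averaged-free perceptron as the linear classifier.
from collections import Counter

def bow(text):
    """Bag-of-words count vector as a sparse Counter."""
    return Counter(text.lower().split())

class Perceptron:
    """Toy linear classifier over sparse bag-of-words features."""
    def __init__(self):
        self.w = Counter()

    def score(self, feats):
        return sum(self.w[f] * v for f, v in feats.items())

    def fit(self, texts, labels, epochs=5):
        # standard perceptron update: adjust weights on every mistake
        for _ in range(epochs):
            for text, y in zip(texts, labels):  # y in {+1, -1}
                feats = bow(text)
                if y * self.score(feats) <= 0:
                    for f, v in feats.items():
                        self.w[f] += y * v
        return self
```

This baseline ignores word order and structure entirely, which is precisely the limitation the richer relational representations below are meant to address.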
In this project, our goal is to enable richer syntactic/semantic representations, which represent a promising step towards building more accurate models. We bring our expertise in encoding structural relationships to build an efficient yet semantically rich representation of YouTube comments.
Currently, we explore the relational representations previously defined for modeling structural relations in question/answer pairs (Severyn & Moschitti, 2012) to model YouTube comments.
We explore basic shallow relational trees derived from chunks and part-of-speech tags, as well as dependency tree representations. An example of such a relational model for representing a q/a pair is shown below.
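The shallow relational tree can be sketched as follows: chunk nodes dominate POS-tag preterminals over words, and nodes whose words occur in both texts of a pair are marked with a relational prefix, in the spirit of Severyn & Moschitti (2012). The tree encoding, function names, and `REL-` prefix below are illustrative assumptions.

```python
# Hypothetical sketch of a shallow relational tree: a chunk-level tree with
# POS-tag preterminals, where a REL- prefix marks nodes whose words are
# shared between the two texts of a pair.

def shallow_tree(chunks):
    """chunks: list of (chunk_label, [(word, pos), ...]) -> nested tuple tree."""
    return ("S",) + tuple(
        (label,) + tuple((pos, word) for word, pos in words)
        for label, words in chunks
    )

def mark_relations(tree, shared_words):
    """Prefix 'REL-' to preterminals whose word occurs in the other text."""
    if isinstance(tree, str):
        return tree
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):  # preterminal
        if children[0].lower() in shared_words:
            label = "REL-" + label
        return (label, children[0])
    return (label,) + tuple(mark_relations(c, shared_words) for c in children)
```

Once marked, such trees can be fed directly to a tree kernel, so the learner sees which fragments of one text align lexically with the other.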
- Severyn, A. and Moschitti, A. (2012) Structural relationships for large-scale learning of answer re-ranking. In Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2012).
- Severyn, A., Moschitti, A., Uryupina, O., Plank, B., Filippova, K. (2014) Opinion Mining on YouTube. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 2014).
- Uryupina, O., Plank, B., Severyn, A., Rotondi, A., Moschitti, A. (2014) SenTube: A corpus for sentiment analysis on YouTube social media. In Proceedings of the Language Resources and Evaluation Conference (LREC’14).