Machine Learning and NLP group at Trento.


Proposal Title: Sentiment analysis of Youtube videos with joint models of text and speech

Principal Investigator: Alessandro Moschitti

Google Sponsors: Katja Filippova and Massimiliano Ciaramita

Team: Aliaksei Severyn, Barbara Plank, Olga Uryupina, Agata Rotondi


The SenTube corpus is available for research and commercial purposes. The comments corpus can be downloaded from here (16MB). Video files are available on request.

Brief Introduction

In recent years, automatic sentiment analysis of text has attracted the attention of both the academic and industrial worlds. On the one hand, focusing on one well-defined aspect of text semantics, e.g., opinion polarity, simplifies the semantic model required for extracting the desired piece of information. On the other hand, sentiment data is very appealing for defining interesting business models, e.g., when it quantifies users' appreciation of a certain product or represents the opinion of market experts. Such interest has produced a large body of research, mainly focused on the use of machine learning algorithms for opinion classification.

The use of advanced linguistic information such as syntactic structures and shallow semantics has also been investigated to carry out deep linguistic analysis. In contrast, most real-world applications are based on extracting opinion polarity about products or facts, which can be easily implemented as simple text categorization using bag-of-words.

However, the methods above do not scale to more complex tasks such as sentiment analysis of comments and videos, e.g., those on YouTube. Some potential reasons are: (i) polarity words may be heavily present in a comment that is unrelated to the video and thus not interesting for the target analysis; (ii) a comment may refer to different aspects of the video content, which may be evaluated differently. Promising solutions to such problems involve jointly modeling the information in videos, which contain the target of the sentiment analysis, e.g., entities, and the information expressed in the text, which can be used to measure the impact of the video on the viewers.

Research Goals

This project studies and defines models for sentiment analysis of YouTube video comments. In particular, we pursue the following tasks:

  • carry out semantic and sentiment analysis on comments to identify the targets of videos;
  • annotate such concepts in comments to build structural representations, e.g. using shallow syntactic trees or full parsing and shallow semantics, when possible;
  • apply machine learning algorithms to such representations to design automatic classifiers of several sentiment dimensions in comments: positive, negative, boring, exciting and so on; and
  • use features from comments to improve the model for opinion polarity.

Given the complexity and the novelty of the task, we will rely on kernel methods to describe the joint information. We will exploit structural kernels (e.g., for sequences and trees) to encode structures in powerful algorithms, e.g., Support Vector Machines (SVMs), where the structures represent dependencies between words and concepts of comments.
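
As a rough illustration of what a structural kernel over trees computes, the sketch below counts the tree fragments shared by two small parse trees, in the style of the subset tree kernel of Collins and Duffy. It is a minimal sketch and not the implementation used in the project: the nested-tuple tree encoding, the function names and the decay parameter `lam` are assumptions made for this example.

```python
from itertools import product


def nodes(tree):
    """Yield every internal node of a tree encoded as (label, child, ...).

    Leaves (words) are plain strings and are not yielded.
    """
    if isinstance(tree, str):
        return
    yield tree
    for child in tree[1:]:
        yield from nodes(child)


def production(node):
    """Return the production at a node: its label plus typed child labels."""
    children = tuple(
        ("word", c) if isinstance(c, str) else ("node", c[0]) for c in node[1:]
    )
    return (node[0], children)


def delta(n1, n2, lam=1.0):
    """Weighted count of the common fragments rooted at n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    score = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        if not isinstance(c1, str):  # recurse only into internal children
            score *= 1.0 + delta(c1, c2, lam)
    return score


def tree_kernel(t1, t2, lam=1.0):
    """Sum delta over all pairs of internal nodes of the two trees."""
    return sum(delta(a, b, lam) for a, b in product(nodes(t1), nodes(t2)))


# Hand-written toy trees (assumed for illustration only).
dog = ("NP", ("D", "the"), ("N", "dog"))
cat = ("NP", ("D", "the"), ("N", "cat"))
print(tree_kernel(dog, dog))  # 6.0: all six fragments of the tree
print(tree_kernel(dog, cat))  # 3.0: fragments that do not contain "dog"/"cat"
```

Kernel values computed this way for every pair of training examples form a Gram matrix, which is the form in which a kernel machine such as an SVM consumes structured inputs.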

Progress Summary

Definition of Annotation guidelines

As a first step towards building a framework for supervised learning from user-generated content, we have developed a set of annotation guidelines, which serve as a general reference for our annotators. They are available here.

Annotation interface

[Figure: annotation interface example]

To support human annotators, we rolled out an in-house web annotation system (available only inside the UNITN network; from outside, one must connect through a VPN). Here is an example image of the annotation interface used to collect the labeled data for each video comment.

Comment annotation

Each comment is assigned to one of the following categories: product-related and/or video-related (some comments can indeed fall under both categories), spam, off-topic, or not-English. Comments are also labeled with positive, negative or neutral sentiment.
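
The labeling scheme can be pictured as a small record type. The sketch below is hypothetical and does not reproduce the actual SenTube file format: the class name, field names and label strings are invented for illustration. It also assumes that spam, off-topic and not-English exclude every other category, which is one reading of the description above.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

TOPIC_LABELS = {"product", "video"}  # a comment may carry both
EXCLUSION_LABELS = {"spam", "off-topic", "not-english"}
SENTIMENT_LABELS = {"positive", "negative", "neutral"}


@dataclass(frozen=True)
class CommentAnnotation:
    """Hypothetical record for one annotated comment."""

    comment_id: str
    categories: FrozenSet[str]
    sentiment: Optional[str] = None

    def __post_init__(self):
        allowed = TOPIC_LABELS | EXCLUSION_LABELS
        if not self.categories or not self.categories <= allowed:
            raise ValueError(f"unknown or missing category: {set(self.categories)}")
        # Assumption: spam / off-topic / not-english cannot be combined
        # with any other category.
        if self.categories & EXCLUSION_LABELS and len(self.categories) > 1:
            raise ValueError("exclusion labels cannot be combined")
        if self.sentiment is not None and self.sentiment not in SENTIMENT_LABELS:
            raise ValueError(f"unknown sentiment: {self.sentiment}")


example = CommentAnnotation("c1", frozenset({"product", "video"}), "positive")
```

Validating labels at construction time keeps malformed annotations out of the training data early, which matters when several annotators contribute.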

At the moment we have over 1k annotated videos, with a total of 20k comments.

NLP pipeline

[Figure: UIMA pipeline]

To support the task of constructing syntactic/semantic graphs to jointly represent question/answer pairs, we have constructed an NLP pipeline within the Apache UIMA framework. UIMA is a modular and flexible architecture for managing unstructured data; it makes it easy to plug in a great number of NLP components available as open-source projects. Currently, our pipeline includes components for basic tokenization, lemmatization, named entity recognition, constituency and dependency parsing, semantic role labeling, topic modeling with LDA, co-reference resolution, question focus detection and question classification. We plan to extend this list to support easy construction of richer syntactic/semantic graphs for relational learning.


We use this rich set of NLP components to extract a large number of features, ranging from simple word-match metrics, e.g., n-gram overlap and longest common subsequence, to much more involved knowledge-based features, e.g., explicit semantic analysis, as well as various syntactic and shallow semantic similarity metrics.
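
The two simplest word-match metrics mentioned above can be sketched in a few lines. This is an illustrative version, not the project's feature extractor: whitespace tokenization, the Jaccard normalization of the n-gram overlap and the function names are all assumptions.

```python
def ngrams(tokens, n):
    """Return the set of contiguous n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(tokens_a, tokens_b, n=2):
    """Jaccard overlap between the n-gram sets of two token lists."""
    a, b = ngrams(tokens_a, n), ngrams(tokens_b, n)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def lcs_length(tokens_a, tokens_b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(tokens_b) + 1)
    for x in tokens_a:
        curr = [0]
        for j, y in enumerate(tokens_b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


a = "the video is really great".split()
b = "the video is great".split()
print(ngram_overlap(a, b, n=2))  # 0.4 (2 shared bigrams out of 5 distinct)
print(lcs_length(a, b))          # 4  ("the video is great")
```

Unlike n-gram overlap, the longest common subsequence tolerates inserted words, so the two metrics give complementary signals for a pair of texts.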

To test the efficacy of our current feature-based system, we evaluated its performance on the Semantic Textual Similarity (STS) 2012 task. Following the setup of the 2012 challenge, we observed an improvement from 83.3% (the Pearson correlation achieved by the best system) to 88.0% obtained by our system.

Motivated by these results, we took part in the STS 2013 challenge.
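
For reference, the Pearson correlation used as the STS evaluation measure compares system-predicted similarity scores with human gold scores. The sketch below computes it from scratch; the function name is an assumption and the example scores are invented, they are not taken from the STS data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Invented scores on a 0-5 similarity scale, for illustration only.
gold = [0.0, 1.5, 3.0, 4.5, 5.0]
predicted = [0.4, 1.0, 3.2, 4.0, 4.9]
print(round(pearson(gold, predicted), 3))
```

The measure ranges from -1 to 1, so a correlation reported as a percentage, as above, is this value multiplied by 100.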

Relational models for classification

The tasks we plan to target range from spam filtering to sentiment analysis, information aggregation for recommendation and summarization. One of the core challenges across these various tasks is the ability to capture salient features to build accurate models.

Simple approaches to text representation, e.g., the bag-of-words model, have been shown to yield reasonable results in simple tasks, e.g., classification.
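
A toy example shows what a bag-of-words representation keeps and what it discards. The tokenizer (lowercasing and whitespace splitting) and the two sentences are assumptions chosen for illustration.

```python
from collections import Counter


def bag_of_words(text):
    """Map a text to lowercase token counts, discarding word order."""
    return Counter(text.lower().split())


a = bag_of_words("the video is better than the product")
b = bag_of_words("the product is better than the video")
print(a == b)  # True: opposite opinions, identical features
```

Both sentences yield the same counts even though they express opposite preferences, which is the kind of structural information a bag-of-words model cannot capture.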

In this project, our goal is to enable richer syntactic/semantic representations, which are a promising step towards building more accurate models. We attempt to apply our expertise in encoding structural relationships to build an efficient yet semantically rich representation of YouTube comments.

Currently, we are exploring the relational representations previously defined for modeling structural relations in question/answer pairs (Severyn & Moschitti, 2012) as a way to model YouTube comments.

We explore basic shallow relational trees derived from chunks and part-of-speech tags, as well as dependency tree representations. An example of such relational models for representing a q/a pair is shown below.

[Figure: dependency tree]

[Figure: focus dependency tree]
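
One possible way to write down a shallow tree built from chunks and part-of-speech tags is a bracketed string with words as leaves, part-of-speech tags above them and chunk labels one level higher. The sketch below only formats pre-annotated input: the function name, the `S` root label and the hand-written tags are assumptions, and no tagger or chunker is run.

```python
def shallow_tree(chunks):
    """Render chunks as a bracketed shallow tree string.

    `chunks` is a list of (chunk_label, [(pos_tag, word), ...]) pairs.
    """
    parts = []
    for label, tokens in chunks:
        leaves = " ".join(f"({pos} {word})" for pos, word in tokens)
        parts.append(f"({label} {leaves})")
    return "(S " + " ".join(parts) + ")"


# Hand-annotated toy comment (assumed tags, for illustration only).
comment = [
    ("NP", [("DT", "this"), ("NN", "phone")]),
    ("VP", [("VBZ", "looks")]),
    ("ADJP", [("JJ", "great")]),
]
print(shallow_tree(comment))
# (S (NP (DT this) (NN phone)) (VP (VBZ looks)) (ADJP (JJ great)))
```

Bracketed strings of this kind are a common input format for tree kernel tools, which is why the tree is serialized rather than kept as nested objects.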


  1. Severyn, A. and Moschitti, A. (2012) Structural relationships for large-scale learning of answer re-ranking. In SIGIR, 2012.
  2. Severyn, A., Moschitti, A., Uryupina, O., Plank, B., Filippova, K. (2014) Opinion Mining on YouTube. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 2014).
  3. Uryupina, O., Plank, B., Severyn, A., Rotondi, A., Moschitti, A. (2014) SenTube: A corpus for sentiment analysis on YouTube social media. In Proceedings of the Language Resources and Evaluation Conference (LREC’14).