From IDSS Group at INESC-ID


POPSTAR (Public Opinion and Sentiment Tracking, Analysis, and Research) is a collaborative research initiative by ICS-UL, INESC-ID, FEUP and NIPE-UM. It applies text mining to web-based conventional and social media and generates indicators of both the frequency and the polarity (positivity/negativity) of mentions of political actors, the economy, and economic policies, across sources, types of sources, and time.

A website at www.popstar.pt provides real-time graphics.

POPSTAR was under active development while funded by FCT (until October 2014). Since then it has been updated sporadically, depending on the availability of its research team.

New features are continually being added.





The project has two main goals:

  1. design an opinion mining system capable of measuring, almost in real-time, sentiments vis-à-vis parties, political actors, and the economy in the content of both conventional web-based media (online newspapers) and the so-called social media (blogs and micro-blogs).
  2. use the collected data to explore and explain the relationship between trends in sentiments as expressed in the conventional media, the social media, and public opinion polls and surveys in Portugal.

There are three crucial components of our research plan: raw data collection, data analysis, and dissemination.

Data collection

The "Big Picture" of the information flow within the POPSTAR information system

Two types of raw data need to be collected for our purposes. The first is public opinion data:

  1. results of public opinion polls
  2. surveys on consumer confidence.

Data from opinion polls measuring voting intentions and approval rates of party leaders, cabinet ministers, and the President of the Republic have been systematically collected at ICS-UL since 2005. They can be continually updated from the polls’ technical documentation, which has been made public since 2009 on the website of the Entidade Reguladora para a Comunicação Social. Monthly data since the 1980s from consumer sentiment surveys are available on the Instituto Nacional de Estatística (INE) website.

The second type of raw data to be collected consists of politically relevant texts in Portuguese. These will be obtained from media partners (the SAPO content aggregator and the Público newspaper are our partners in the REACTION project), namely from newspaper articles and editorials, blog posts and tweets. Our software detects implicit connections between related comments and discussions linked to news articles, in order to contextualize such messages and determine their relevance.

Data analysis

The collected public opinion data will be analyzed with the goal of obtaining summary measures, at least monthly, of voting intention in parties, approval/disapproval of political leaders, and consumer confidence. The collected text data will be analyzed with the goal of obtaining measures of the frequency of mention, polarity, and intensity of views about political actors, political parties, and the Portuguese economy and economic policy in web-based mainstream and social media texts.
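As a rough illustration of the text indicators described above, the following Python sketch aggregates already-classified mentions into a monthly frequency count and a net polarity score per entity. The input format, the function name `monthly_indicators`, and the net polarity formula (positive minus negative, divided by all mentions) are assumptions made for this example and are not taken from the POPSTAR system.

```python
from collections import defaultdict


def monthly_indicators(mentions):
    """Aggregate classified mentions into per-month, per-entity indicators.

    `mentions` is an iterable of (month, entity, polarity) tuples, where
    polarity is -1 (negative), 0 (neutral) or 1 (positive). This input
    format is hypothetical.
    """
    counts = defaultdict(lambda: {"total": 0, "positive": 0, "negative": 0})
    for month, entity, polarity in mentions:
        cell = counts[(month, entity)]
        cell["total"] += 1
        if polarity > 0:
            cell["positive"] += 1
        elif polarity < 0:
            cell["negative"] += 1

    indicators = {}
    for key, cell in counts.items():
        # Net polarity in [-1, 1]: share of positive minus share of negative.
        net = (cell["positive"] - cell["negative"]) / cell["total"]
        indicators[key] = {"frequency": cell["total"], "net_polarity": net}
    return indicators


example = [
    ("2013-01", "Party A", 1),
    ("2013-01", "Party A", -1),
    ("2013-01", "Party A", 1),
    ("2013-02", "Party A", 0),
]
for key, value in sorted(monthly_indicators(example).items()):
    print(key, value)
```

Intensity is left out of this sketch; it could be handled the same way by replacing the three-valued polarity with a graded score.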

Opinion polls typically use individuals’ responses on scales to measure evaluations of the performance of political actors, consumer confidence, or evaluations of the economy. Although question wording and scales vary, statistical analysis of long time series data can produce reliable and valid summary measures associated with time points and allow for the detection of trends. The generation of such measures for mainstream and social media texts, particularly if the goal is automation, presents challenges of an entirely different nature.
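One very simple way to turn several polls into a summary measure per time point is a sample-size-weighted monthly average. The sketch below assumes each poll is given as a (month, share, sample size) tuple; this format, the function name `monthly_poll_average`, and the weighting scheme are illustrative assumptions, not the statistical procedure used by the project.

```python
from collections import defaultdict


def monthly_poll_average(polls):
    """Return the sample-size-weighted mean share per month.

    `polls` is an iterable of (month, share, sample_size) tuples
    (a hypothetical input format).
    """
    weighted_sum = defaultdict(float)
    total_sample = defaultdict(int)
    for month, share, sample_size in polls:
        weighted_sum[month] += share * sample_size
        total_sample[month] += sample_size
    return {month: weighted_sum[month] / total_sample[month]
            for month in weighted_sum}


polls = [
    ("2013-01", 30.0, 1000),
    ("2013-01", 34.0, 3000),
    ("2013-02", 32.0, 500),
]
print(monthly_poll_average(polls))
```

A weighted average ignores differences in question wording and scales between pollsters, which the text above notes as a source of variation; a fuller treatment would model those explicitly.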

In order to classify sentiments and opinions according to their polarity and intensity, we will select and evaluate publicly available language resources for Portuguese. These resources will be complemented by the ones developed in the REACTION project:

This ensemble is being adapted to fit the task of political opinion mining, by adding syntactic-semantic attributes to sentiment predicates, which are more robust to domain dependency.

We experiment with several possibilities for choosing the best algorithm both for separating opinions from factual information, and for determining the opinion polarity and intensity. A research question being addressed is the selection of the best feature set. Given that we will classify opinions about political entities on complex issues (as opposed to very specific opinion-mining scenarios), a combination of high-level (syntactic information, previous information about named-entities), medium-level (e.g. polarity classes of words as predicted by the sentiment lexicon) and low-level (words, relative positions) features will be necessary for the classification task.
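To make the three feature levels concrete, here is a minimal Python sketch that builds a feature dictionary for one target mention in a tokenized sentence. The tiny lexicon, the entity list, the feature names, and the function `extract_features` are all hypothetical and chosen only for illustration; the resources and features actually used in the project are not specified here.

```python
# Toy resources, hypothetical and for illustration only.
TOY_LEXICON = {"good": 1, "excellent": 1, "bad": -1, "disastrous": -1}
KNOWN_ENTITIES = {"minister", "government"}


def extract_features(tokens, target_index):
    """Combine low-, medium- and high-level features for one target mention."""
    words = [token.lower() for token in tokens]
    features = {"lex_pos": 0, "lex_neg": 0}
    for i, word in enumerate(words):
        if i == target_index:
            continue
        # Low-level: the word itself.
        features["word=" + word] = 1
        polarity = TOY_LEXICON.get(word, 0)
        if polarity != 0:
            # Medium-level: polarity class predicted by the sentiment lexicon.
            features["lex_pos" if polarity > 0 else "lex_neg"] += 1
            # Low-level: position of the polar word relative to the target.
            side = "before" if i < target_index else "after"
            name = "polar_word_" + side
            features[name] = features.get(name, 0) + 1
    # High-level: previous knowledge about the named entity.
    features["target_is_known_entity"] = int(words[target_index] in KNOWN_ENTITIES)
    return features


sentence = "The minister made a disastrous decision".split()
print(extract_features(sentence, target_index=1))
```

A dictionary of named features like this can be fed to most off-the-shelf classifiers after vectorization, which makes it easy to compare feature sets by switching individual levels on and off.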

A final data analysis task is exploring the relationships between mentions and opinion polarity and intensity indicators for different political actors and the economy, both between media sources themselves (mainstream, blogs, and micro-blogs) and between those sources and survey-based indicators (polls and consumer confidence index). Besides the generic indicators, a number of important categorizations and sub-divisions can be used, distinguishing between different media sources, A-List blogs and others, and between different profiles of micro-blog users. And since we have the ability to use archived data to produce those indicators both for polls and texts retrospectively, our time series will begin in 2008, well before the moment of completion of the first opinion mining software prototype.

Time series econometrics, specifically Vector Auto Regressions and Wavelet Analysis, will be used to detect co-movements in the different series and explore their potential causal relationship.
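Vector autoregressions and wavelet analysis normally require a statistics library. As a much simpler stand-in that only illustrates the idea of co-movement between two series, the Python sketch below computes lagged Pearson correlations between a media indicator and a poll indicator. This is explicitly not a VAR or a wavelet method, it says nothing about causality, and the function names and data are invented for the example.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def lagged_correlations(media, polls, max_lag):
    """Correlate media[t] with polls[t + lag] for lag = 0..max_lag."""
    if len(media) != len(polls):
        raise ValueError("series must have the same length")
    result = {}
    for lag in range(max_lag + 1):
        x = media[:len(media) - lag]  # media values that have a poll `lag` steps later
        y = polls[lag:]               # the corresponding later poll values
        result[lag] = pearson(x, y)
    return result


# Invented data: the poll series repeats the media series one period later.
media = [1, 3, 2, 5, 4, 6, 5, 8]
polls = [0] + media[:-1]
for lag, r in lagged_correlations(media, polls, max_lag=2).items():
    print(lag, round(r, 3))
```

In this constructed example the correlation peaks at a lag of one period, which is the kind of lead-lag pattern that a proper VAR would then test formally.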

Data dissemination

The research carried out by the project will allow the production of:

Users will be able to visit this website and observe the indicators generated by the data analysis process in near-real time, providing a permanent resource that tracks the evolution of public sentiment vis-à-vis political actors and the economy.

Three of the project’s researchers (Pedro Magalhães, Luís Aguiar-Conraria, and Nina Wiesehomeier) will produce regular comments and analysis of the data for the general public, together with invited experts who will contribute on a less regular basis.


POPSTAR is organized in six complementary research tasks which jointly address the problem areas identified above:

  1. Opinion Mining, led by Carlos Soares.
  2. Information System, led by Mário J. Silva.
  3. Trend Analysis, led by Luís Aguiar-Conraria.
  4. Website, led by Mário J. Silva.
  5. Social Diffusion, led by Pedro Magalhães.
  6. Project Management, led by Pedro Magalhães.



POPSTAR is funded by FCT.

Research Team

The POPSTAR project focuses on a challenging interdisciplinary problem that involves the application of knowledge and methods from information retrieval, text mining, computational linguistics, and political and social sciences.

Senior Researchers:

Pedro Magalhães (Principal Investigator) 
a political scientist who has conducted extensive research on electoral behavior and forecasting, political attitudes and public opinion. He is one of the coordinators of the Portuguese Electoral Study at ICS-UL, twice funded by FCT.
Carlos Soares 
brings to the project his experience in data mining.
Luís Aguiar-Conraria 
a political economist who has researched dynamic macroeconomic theory and modelling, electoral behaviour and forecasting, and advanced econometrics.
Mário J. Silva 
brings expertise on text mining.
Nina Wiesehomeier 
a political scientist who has done research on political institutions, party placements and evaluations, and expert surveys.
Paula Carvalho 
brings her experience in sentiment analysis for Portuguese.

Student Researchers:

Silvio Moreira
João Filgueiras
Pedro Saleiro
Miguel Maria


Manuel Távora

Former Researchers:

Eduarda Mendes Rodrigues


POPSTAR Poster May 2012

POPSTAR Workshop. Lisboa July 12, 2013

POPSTAR Workshop. Porto Feb 01, 2013


See also the publications of the REACTION project. As POPSTAR builds on REACTION, some of them are quite relevant.

S. Amir, M. Almeida, B. Martins, J. Filgueiras, and M. J. Silva. TUGAS: Exploiting Unlabelled Data for Twitter Sentiment Analysis. In proceedings of the 8th International Workshop on Semantic Evaluation, SemEval ’14, Dublin, Ireland.

P. Saleiro, L. Rei, A. Pasquali, C. Soares, et al. POPSTAR at RepLab 2013: Name ambiguity resolution on Twitter. In proceedings of the 4th International Conference of the CLEF initiative, CLEF 2013, Valencia, Spain.

J. Filgueiras, S. Amir. POPSTAR at RepLab 2013: Polarity for Reputation Classification. In proceedings of the 4th International Conference of the CLEF initiative, CLEF 2013, Valencia, Spain.

S. Moreira, J. Filgueiras, B. Martins, F. Couto, M. J. Silva. REACTION: A naive machine learning approach for sentiment classification. In proceedings of the 7th International Workshop on Semantic Evaluation (SemEval 2013), in conjunction with the Second Joint Conference on Lexical and Computational Semantics (*SEM 2013).
