Corpora for Sentiment Analysis

Our recent paper on “Potential and Limitations of Commercial Sentiment Detection Tools” (see this blog post) received a lot of attention in the community. In fact, we got several requests to provide access to our data and the test corpora.

You can find our results and data at our sentiment analysis site. Unfortunately, we cannot provide the corpora directly for legal reasons, but you can download them from the following sources:

Hope this simplifies your work!


Sentiment Analysis Tools are Good – but not Perfect

How good are commercial sentiment analysis tools? Our research team recently tackled this question and evaluated the quality of 9 state-of-the-art commercial sentiment detection tools. We applied them to 30,000 short texts from various sources (tweets, news headlines, reviews, etc.). The best tools reach an accuracy of 75% for some document types (tweets), but the average accuracy over all documents is at best 60%. This means that even with the best tool, 4 out of 10 documents will be misclassified.
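
To make the arithmetic behind "4 out of 10" concrete, here is a minimal Python sketch of how accuracy is computed. The ten labels are made up for illustration and are not taken from our corpora; the function name `accuracy` is likewise just a placeholder.

```python
def accuracy(predicted, gold):
    """Fraction of documents whose predicted label equals the gold label."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Hypothetical gold labels and tool output for ten short texts.
gold = ["pos", "neg", "neu", "pos", "neg", "neu", "pos", "neg", "pos", "neu"]
predicted = ["pos", "neg", "neu", "pos", "neg", "neu", "neg", "pos", "neu", "pos"]

acc = accuracy(predicted, gold)            # 6 of 10 labels match
errors_per_ten = round((1 - acc) * 10)     # misclassified documents per ten

print(f"accuracy: {acc:.0%}, misclassified per 10 documents: {errors_per_ten}")
```

An accuracy of 60% is the same statement as an error rate of 40%, which is where the 4 misclassified documents out of 10 come from.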

Because we were convinced that there was still some “potential” for improvement, we combined all tools with a meta-classifier. It turned out that a random forest classifier can improve accuracy by up to 9 percentage points compared with the best single tool.
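
The idea of a meta-classifier can be sketched in a few lines of Python. The following is a toy illustration, not the setup from our paper: it assumes that each tool outputs one label per text (-1, 0 or 1), simulates 9 hypothetical tools on synthetic data, and trains a deliberately simplified random forest written from scratch on those outputs. All names, the 60% per-tool accuracy and the forest parameters are assumptions made for this example; in practice you would more likely use a library implementation such as scikit-learn's `RandomForestClassifier`.

```python
import random
from collections import Counter

LABELS = (-1, 0, 1)  # negative, neutral, positive


def gini(labels):
    """Gini impurity of a list of labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels, rng, depth=0, max_depth=4):
    """Grow one tree; each split considers a random subset of the tools."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority  # leaf: a plain label
    n_features = len(rows[0])
    candidates = rng.sample(range(n_features), max(1, int(n_features ** 0.5)))
    best = None
    for f in candidates:
        groups = {}
        for row, y in zip(rows, labels):
            groups.setdefault(row[f], []).append(y)
        if len(groups) < 2:
            continue  # this tool's output does not split the node
        score = sum(len(g) * gini(g) for g in groups.values()) / len(labels)
        if best is None or score < best[0]:
            best = (score, f)
    if best is None:
        return majority
    f = best[1]
    children = {}
    for value in {row[f] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[f] == value]
        children[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx],
                                     rng, depth + 1, max_depth)
    return (f, children, majority)  # inner node


def predict_tree(tree, row):
    while isinstance(tree, tuple):
        f, children, majority = tree
        tree = children.get(row[f], majority)  # unseen value: fall back
    return tree


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample of the training data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]


def simulate(n_texts, n_tools, tool_accuracy, rng):
    """Synthetic data: every tool is right with probability tool_accuracy."""
    rows, gold = [], []
    for _ in range(n_texts):
        y = rng.choice(LABELS)
        row = [y if rng.random() < tool_accuracy
               else rng.choice([l for l in LABELS if l != y])
               for _ in range(n_tools)]
        rows.append(row)
        gold.append(y)
    return rows, gold


data_rng = random.Random(42)
train_rows, train_gold = simulate(400, 9, 0.6, data_rng)
test_rows, test_gold = simulate(200, 9, 0.6, data_rng)

forest = fit_forest(train_rows, train_gold)
forest_acc = sum(predict_forest(forest, r) == y
                 for r, y in zip(test_rows, test_gold)) / len(test_gold)
best_tool_acc = max(sum(r[t] == y for r, y in zip(test_rows, test_gold))
                    for t in range(9)) / len(test_gold)

print(f"best single tool: {best_tool_acc:.2f}, meta-classifier: {forest_acc:.2f}")
```

The simulated tools make independent errors, which is the most favourable case for combining them. Real tools tend to fail on the same difficult texts, so the gain on real data can be considerably smaller than in this toy setting.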

Our results were published at ESSEM 2013. For more details, please see our paper.