Our recent paper on “Potential and Limitations of Commercial Sentiment Detection Tools” (see this blog post) received a lot of attention in the community. In fact, we got several requests to provide access to our data and the test corpora.
You can find our results and data at our sentiment analysis site. Unfortunately, we cannot provide the corpora directly for legal reasons, but you can download them from the following sources:
- DAI (4093 tweets): http://data.dai-labor.de/corpus/sentiment/
- JRC (1290 speech quotations): http://langtech.jrc.ec.europa.eu/Resources/2010_JRC_1590-Quotes-annotated-for-sentiment.zip
- TAC (2689 product review sentences): https://raw.github.com/oscartackstrom/sentence-sentiment-data/master/data/finegrained.txt
- SEM (1250 news headlines): http://lit.csci.unt.edu/~rada/downloads/AffectiveText.Semeval.2007.tar.gz
- HUL (3945 product review sentences): http://www.cs.uic.edu/~liub/FBS/CustomerReviewData.zip
- DIL (4275 product review sentences): http://www.cs.uic.edu/~liub/FBS/Reviews-9-products.rar
- MPQ (11111 news sentences): http://mpqa.cs.pitt.edu/corpora/mpqa_corpus/mpqa_corpus_2_0/
Hope this simplifies your work!