Social multimedia datasets @ MMSys 2013

posted Mar 1, 2013, 12:16 AM by Symeon Papadopoulos   [ updated Mar 1, 2013, 12:17 AM ]
This year's dataset session at MMSys featured a host of interesting datasets for the research community, some of them collected directly from popular social multimedia platforms. Here, I provide a short overview of the latter:
  • Fashion-focused Creative Commons Social dataset. The authors first collected a fashion taxonomy from Wikipedia and then used the respective keywords to collect CC-licensed photos from Flickr. They then validated the photos (ensuring that they are indeed related to fashion) with the help of Mechanical Turk workers. The photo licenses are quite liberal, i.e. they permit commercial use. Together with the photos and annotations, the authors provide rich metadata from Flickr.
  • Blip10000: A Social Video Dataset containing SPUG Content for Tagging and Retrieval. This contains a set of semi-professional user-generated (SPUG) videos from blip.tv (which was selected due to the licensing terms of its videos), together with user-contributed metadata, automatically generated transcripts and shot boundaries. What makes the dataset unique is that it also contains rich social information. In particular, the authors used the Topsy Twitter search engine to collect a set of Twitter accounts that tweeted about these videos, along with historical information on those accounts.
  • The 2012 Social Event Detection dataset. This is a set of more than 160 thousand CC-licensed photos from Flickr, collected through the Flickr API search method with five European cities as centers and a period of two years (2009-2011) as the time interval of interest. Together with the photos and some of their Flickr metadata, the dataset contains manually generated annotations for 149 events of interest, namely technical events in Germany, soccer events in Madrid and Hamburg, and Indignados movement events in Madrid. This makes the dataset an ideal playground for assessing event detection approaches. The dataset is available for download. There is also a state-of-the-art event detection approach available for use.
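
To give a feel for the kind of collection step described for the Social Event Detection dataset, here is a minimal sketch of how a geo- and time-bounded query against the Flickr API `flickr.photos.search` method might be built. This is not the authors' actual collection code. The function name `build_search_url`, the 10 km radius, the exact date bounds, the requested extras and the Madrid coordinates are all assumptions made for illustration; the post only states that five European cities and the 2009-2011 period were used.

```python
from urllib.parse import urlencode

FLICKR_REST = "https://api.flickr.com/services/rest/"
# Flickr license ids 1-6 correspond to the Creative Commons licenses.
CC_LICENSE_IDS = "1,2,3,4,5,6"


def build_search_url(api_key, lat, lon, min_taken, max_taken,
                     radius_km=10, page=1):
    """Build a flickr.photos.search URL for CC photos around a city center.

    Hypothetical helper: radius and extras are illustrative choices,
    not values reported for the dataset.
    """
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "lat": lat,
        "lon": lon,
        "radius": radius_km,
        "radius_units": "km",
        "min_taken_date": min_taken,
        "max_taken_date": max_taken,
        "license": CC_LICENSE_IDS,
        "extras": "date_taken,geo,tags,owner_name",
        "per_page": 250,  # geo queries return at most 250 photos per page
        "page": page,
        "format": "json",
        "nojsoncallback": 1,
    }
    return FLICKR_REST + "?" + urlencode(params)


# Example: approximate coordinates of central Madrid (illustrative only).
url = build_search_url("YOUR_API_KEY", 40.4168, -3.7038,
                       "2009-01-01", "2011-12-31")
```

Fetching the resulting URL page by page (incrementing `page`) would return photo records with the requested metadata; a real crawl would also need a valid API key, rate limiting and error handling.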
All in all, these new datasets offer exciting opportunities for researchers in the area of social networks and multimedia to test new approaches and identify new research problems.