Crisis-related Sub-Event Detection Based on Clustering

Daniela Pohl (*), Abdelhamid Bouchachia (**) and Hermann Hellwagner (*)
*) Institute of Information Technology, Alpen-Adria Universität Klagenfurt, Universitätsstr. 65-67, Klagenfurt, Austria.
**) Smart Technology Research Center, Fern Barrow Poole BH12 5BB, Bournemouth University, Bournemouth, UK.


ABSTRACT
This contribution summarizes our research in the context of social media analysis and crisis management. It focuses on the detection of sub-events, which are specific hotspots of a crisis that emergency management teams must be aware of. We give an overview of our investigations in offline sub-event detection and summarize the corresponding evaluation results. We also provide some insights into our current work on online sub-event detection (i.e., online feature selection and incremental clustering).


1 INTRODUCTION
In large-scale disasters, emergency agencies have to work closely together with different parties, including citizens affected by the incident. Social media turns out to be a very important source of information, especially when emergency agencies cannot be on-site immediately or the phone lines are overloaded. The present contribution summarizes some of the work done by the authors in the area of crisis management (see [1], [2], [3], [4], [5], [6]).

Several case studies have examined the importance of social media platforms (e.g., Twitter, Flickr) during various kinds of crises (e.g., the 2007 California wildfires) [7], [8], [9]. There are also studies that describe social media in the context of specific emergency forces, i.e., analyzing their attitude to social media in crisis management [10], [11]. The results show a positive attitude toward using social media in crisis management practice.

Currently, the successful incorporation of social media relies on hundreds (or more) of volunteers who monitor social media and structure the items they find. Our idea was to find mechanisms that allow emergency management personnel to use social media directly, without waiting for such intermediation. Therefore, we examined the suitability of clustering algorithms for event monitoring, grouping media items according to the hotspots (i.e., sub-events) they describe. We concentrate on offline and online sub-event detection, where the former focuses on static analysis of the data and the latter on dynamic (data stream) analysis. The sub-events identified by the detection processes need special attention from crisis management personnel in order to fully capture the ongoing crisis situation.

For better use of our clustering results, the gained sub-events are shared with other existing emergency management systems developed in the BRIDGE project. Our framework [12] is part of the overall BRIDGE System that allows the communication of data to other BRIDGE components requiring the data (see Section 3-D). Currently, the data is transferred to the BRIDGE Master Table [13].

This contribution is structured as follows. Section 2 gives an overview of related work. Section 3 describes the general idea behind the event monitoring based on sub-event detection. Section 4 summarizes the data sets we used in the evaluation. Section 5 and Section 6 summarize our investigation in offline and online detection mechanisms, respectively. In Section 7 the work is concluded.


2 RELATED WORK
Social media analysis has attracted considerable research interest, in particular microblogging platforms such as Twitter. For instance, Marcus et al. [14] extract trends describing events from Twitter data. Mathioudakis and Koudas [15] and Petrović et al. [16] present analysis approaches for trend detection and streaming analysis based on tweets.

Twitter has also attracted considerable attention in the context of several emergencies (see Vieweg et al. [8] and Palen [9]). Terpstra et al. [17] perform real-time analysis of Twitter data using keyword-based filters. Ireson [18] implemented an approach to analyze city-related blogs containing crisis information. Besides Twitter, Liu et al. [7] show the usage and importance of Flickr during crises. Fontugne et al. [19] analyze Flickr items and related tags in the context of the Tohoku earthquake (the tsunami in Japan).

In several approaches, both geo-data and temporal information are of importance. Geo-information is important to identify relevant locations [20]. Yin et al. [21] and Jaffe et al. [22] use geo-tagging for analyzing images. Becker et al. [23] show a classification approach for event detection considering location and time information of social media items. Petkos et al. [24] use time stamps in their clustering approach. In other similar works [25], [26], [19], time and location are considered for social media analysis.

The goal of our work is to identify sub-events describing specific hotspots of an emergency to which crisis management teams have to pay attention. To this end, we use clustering algorithms (offline and online approaches) to identify these sub-events. This choice is motivated by the fact that clustering algorithms do not need any preparation steps (i.e., pre-labeling) before they can be applied.


3 SUB-EVENT DETECTION
This section describes several aspects of sub-event detection, including the definition of sub-events, the general framework (i.e., suitable for offline and online sub-event detection) and the relation to the BRIDGE system.

A Sub-Events
A crisis (i.e., the parent event) consists of several sub-events. Sub-events describe different emergency-related hotspots, e.g., flooding, power outage, or infrastructure damage in different affected areas. Examples of events and sub-events can be found in Table 1. The meanings of those terms are similar to the definitions used in 'topic detection and tracking' (TDT) [27], where topics (events) consist of sub-topics (sub-events). The main difference to TDT is the more detailed focus on the location and temporal facts of the fast-moving crisis.

Sub-events can be detected by considering social media items (posts, tweets, pictures, etc.) describing the same incident. Through aggregation, related or similar items are grouped together to identify sub-events.

Table 1: Examples of events and sub-events


B General Framework
The general framework of our detection mechanism is shown in Fig. 1. It summarizes the most important processing steps for offline and online clustering (for details, see Sec. 5 and Sec. 6).

Based on keywords describing the user's interest (e.g., Hurricane Sandy New York) items are fetched from social media. Then, if there is no location in the fetched items, they are automatically geo-tagged. The geo-tagging approach considers the most important locations of the text covered by metadata fields; the geo-data is assigned using a web service (http://www.geonames.org/, Sep. 2013).

Afterwards, the sub-event detection procedure starts. Items have to be automatically pre-processed by an indexing mechanism. For the offline approaches, tf-idf for textual features is used (see Sec. 5). This approach is extended for an online mechanism to perform online indexing (see Sec. 6). This extension acts as a memory describing the importance of terms. The importance of terms is remembered over periods of time.

During preprocessing (i.e., indexing), the items are transformed into a vector space representation on which the clustering operates. Afterwards, the offline or the online clustering is performed. The results are summarized and shown to the user. Depending on the approach executed, the framework offers different visualization mechanisms (see [5] and [4]), ranging from a simple table representation to a map-based visualization with different browsing functions.
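The indexing step described above can be illustrated with a minimal sketch. It assumes already tokenized items and one common tf-idf variant (raw term frequency multiplied by log(N/df)); the exact weighting and preprocessing used in the framework are not specified here, and the function name and sample items are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, vectors): one tf-idf vector per tokenized item.

    Assumed weighting: tf(t, d) * log(N / df(t)); other variants exist.
    """
    n_docs = len(docs)
    # Document frequency: number of items that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # Counter returns 0 for terms absent from the item
        vectors.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, vectors

# Hypothetical items, for illustration only.
items = [
    "flooding in lower manhattan".split(),
    "power outage in manhattan".split(),
    "flooding subway station".split(),
]
vocab, vectors = tfidf_vectors(items)
```

Each row of `vectors` is the vector space representation of one item; terms that occur in every item would receive weight zero under this variant.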

Fig. 1: Multimedia Exploration Framework (BRIDGE Aggregation/Detection)


C Study with Practitioners
We conducted a study with 16 practitioners from police forces, fire departments, paramedics and non-governmental organizations [6]. The study shows the attitude of the practitioners to social media and the usefulness of social media in crisis management. It shows that there is a positive tendency in using social media (i.e., the practitioners see a benefit in social media).

The study informs us about where to integrate the social media analysis in the BRIDGE project, as we explicitly asked who could perform such an analysis and where it should be done. The study shows that the analysis task is best placed in the control room or stationary command post. Admittedly, additional and more extensive studies are needed to confirm the results [6]. Nevertheless, the study helps us in positioning the analysis approach in the BRIDGE system.


D Integration into the BRIDGE System

The aggregation is triggered and monitored in the command post. All of the information is incorporated into the BRIDGE system (see Fig. 2) so that it can be transmitted to other places and systems as well. This means that detected sub-events can be transmitted to other BRIDGE components, either for further processing or as additional information for those components. Currently, the BRIDGE Master visualizes sub-events selected by the end-user to create a common operational picture. The transmission of data is enabled by the BRIDGE Middleware.

Fig. 2: Sub-Event Detection Embedded into the BRIDGE System


In addition to social media data, it is possible to incorporate live data from an exercise. Live data (i.e., text messages and pictures from an incident) can be directly integrated into the system through an Android app. For example, this app can be used during training to simulate social media or as a source of additional information provided by first responders during a real crisis [12].


4 DATA SETS
Table 2 describes the data sets we used for our offline and online evaluations. The Mississippi Flood (MF), Oslo Bombing (OB), UK riots (UK) and Hurricane Irene (HI) data sets are used to evaluate the offline algorithms proposed in Pohl et al. [5]. They comprise Flickr and YouTube data from different kinds of crises of different extents. For further evaluation, the UK riots data set was labeled based on the sub-events it covers [5].

For the online evaluation, we needed data of a crisis happening in a very compact time span. Therefore, we collected the Hurricane Sandy (HS) data set [4]. The data set also includes tweets, because Twitter can be seen as a highly popular social media platform with real-time behavior [28].

Table 2: Data Sets


5 OFFLINE APPROACHES
We started with the examination of offline (i.e., static) clustering approaches for sub-event detection. Fig. 1 (see left-hand side) shows the sequential steps for performing offline sub-event detection. The geo-tagging step is only used when geo-locations are considered for processing. The sub-event detection itself is based on the different clustering approaches we examined.

In particular, we analyzed four types of clustering approaches. First, we studied self-organizing maps (SOM) to identify sub-events [1]. Second, we focused on agglomerative clustering (AC) as an intuitive form of clustering [3], [6]. Third, we introduced a two-phase clustering approach based on self-organizing maps, called two Phase-Geo (2PG), to efficiently incorporate the sparse geo-location data given in the social media items into the clustering [2]. Fourth, we extended the 2PG approach to consider time information in addition to geo-data, called two Phase-Geo-Time (2PGT). The four sub-event detection approaches were evaluated according to the evaluation framework given in [5] on the data sets (MF, OB, UK and HI) given in Table 2.

The framework allowed the evaluation and comparison of the algorithms in a structured manner by considering the following criteria: scalability, metadata quality, ground truth and clustering quality. Scalability focused on the run-time complexity, the number of parameters used in clustering and the representation of results created by the algorithms. Metadata quality focused on different metadata fields used for performing the clustering. The ground truth compared the results of the clustering with real-world sub-events from the investigated crisis. The real-world sub-events were identified by several documenting sources of the crisis.

The clustering quality is evaluated at two levels. First, we consider the topic level to determine whether a clustering approach identifies the different topics. To this end, we evaluated the approaches using clustering metrics (Dunn, Davies-Bouldin and Silhouette Index) and, additionally, the normalized mutual information to express how similar the results of the different clustering approaches are [5]. For the UK riots, sub-events in the major affected cities can be identified (e.g., London with different districts, Manchester/Salford, Liverpool, etc.). The results depend on the inherent granularity of the data and the features used for clustering. A discussion and illustration of the results for each algorithm can be found in [5].
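As an illustration of one of the metrics named above, the following sketch computes the mean Silhouette value of a clustering. It assumes Euclidean distance and at least two clusters; the distance measure used in the actual evaluation is not stated here, and the function names and sample points are hypothetical.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def silhouette(points, labels):
    """Mean Silhouette value; higher values indicate better separated clusters."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the Silhouette value needs at least two clusters")
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the own cluster
        # (the sum includes p itself, which contributes distance 0).
        a = sum(euclidean(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(euclidean(p, q) for q in other) / len(other)
            for k, other in clusters.items() if k != l
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical 2-D points: two compact, well separated groups.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette(points, [0, 0, 1, 1])  # groups match the clusters
bad = silhouette(points, [0, 1, 0, 1])   # groups are split across clusters
```

Here `good` is close to 1 and `bad` is negative, which reflects how the index rewards compact, well separated clusters.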

Second, we performed an item-level evaluation on a labeled data set (i.e., the UK riots data set) to see whether the items are assigned to the corresponding sub-events. The evaluation is based on various granularity levels. The labeling task was performed in a City-District-Incident-Date (CDID) format. This format assigns items to the same sub-event if they are in the same city and district and describe the same incident on the same date (i.e., the most detailed way of labeling). For evaluation purposes, we also considered coarser granularity levels, namely the City-District level (i.e., grouping items based only on the same city and district) and the City level (i.e., grouping items based on the same city). We measured the assignment of items to sub-events by considering the purity (high values indicate pure clusters) for each granularity level [5]. AC (CDID level: 0.55) and 2PGT (CDID level: 0.93) show the best purity at all three levels, i.e., also at the City-District and City levels.
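The purity computation at the three granularity levels can be sketched as follows. The sketch assumes that each label is a (city, district, incident, date) tuple, so that coarser levels are obtained by truncating the tuple; the labels and cluster assignments below are placeholders for illustration and are not taken from the UK riots data set.

```python
from collections import Counter, defaultdict

def purity(cluster_ids, labels):
    """Fraction of items that carry the majority label of their cluster."""
    groups = defaultdict(list)
    for cluster, label in zip(cluster_ids, labels):
        groups[cluster].append(label)
    majority = sum(Counter(ls).most_common(1)[0][1] for ls in groups.values())
    return majority / len(labels)

def at_level(cdid_labels, depth):
    """Truncate CDID labels: depth 1 = City, 2 = City-District, 4 = CDID."""
    return [label[:depth] for label in cdid_labels]

# Placeholder labels (city, district, incident, date), illustrative only.
labels = [
    ("CityA", "District1", "looting", "day1"),
    ("CityA", "District1", "fire", "day1"),
    ("CityA", "District2", "fire", "day1"),
    ("CityB", "District3", "looting", "day2"),
]
clusters = [0, 0, 0, 1]  # hypothetical clustering result

city = purity(clusters, at_level(labels, 1))           # 1.0
city_district = purity(clusters, at_level(labels, 2))  # 0.75
cdid = purity(clusters, at_level(labels, 4))           # 0.5
```

The same clustering scores lower as the labels become more detailed, which is why the purity is reported separately for each granularity level.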

A summary of the results for the different categories and algorithms can be found in Table 3 (for details, see [5]).

Table 3: Short summary of the evaluation (n = number of media items) [5]


6 ONLINE APPROACH
In our next step, we examine online (i.e., streaming-based) sub-event detection, which is related to the 'topic detection and tracking' (TDT) research field. The work described in [4] processes incoming social media items in batch mode. The approach covers online indexing (feature selection) and incremental clustering, as shown in Fig. 1. With the online feature selection, new terms (features) are added and outdated ones are removed. This is based on an extension of the tf-idf representation that tracks the importance of terms by considering the incoming items containing the term over time intervals, together with an aging function that updates the importance accordingly.
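The idea of remembering and aging term importance across batches can be sketched as follows. This is a simplified illustration, not the formulation of [4]: it assumes exponential decay as the aging function and the per-batch document frequency as the reinforcement signal, and the class name, parameters and sample batches are hypothetical.

```python
from collections import Counter

class OnlineTermSelector:
    """Keeps an aged importance score per term and returns the top terms."""

    def __init__(self, decay=0.5, min_importance=0.2, max_features=2):
        self.decay = decay                    # assumed aging factor per batch
        self.min_importance = min_importance  # terms below this are forgotten
        self.max_features = max_features      # size of the selected term set
        self.importance = {}

    def update(self, batch):
        """Process one batch of tokenized items; return the selected terms."""
        # Aging: every remembered term loses importance over time.
        for term in self.importance:
            self.importance[term] *= self.decay
        # Reinforcement: share of items in the batch that contain the term.
        if batch:
            batch_df = Counter(t for doc in batch for t in set(doc))
            for term, count in batch_df.items():
                self.importance[term] = (
                    self.importance.get(term, 0.0) + count / len(batch)
                )
        # Forgetting: drop outdated terms.
        self.importance = {
            t: w for t, w in self.importance.items() if w >= self.min_importance
        }
        ranked = sorted(self.importance, key=lambda t: (-self.importance[t], t))
        return ranked[: self.max_features]

selector = OnlineTermSelector()
f1 = selector.update([["flood", "street"], ["flood", "power"]])
f2 = selector.update([["power", "outage"], ["power", "outage"]])
f3 = selector.update([["outage"]])
```

In this run the selected terms shift from flooding-related to outage-related vocabulary, and the term `street` is eventually forgotten because it is never reinforced.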

Only the most important terms are used as features for online clustering. The clustering is based on Growing Gaussian Mixture Models [29], where each cluster is described by a multivariate Gaussian. The incremental clustering is also adjusted to the changing term set by adding and removing the corresponding dimensions of the cluster representation. The advantage of this clustering approach is that it merges and splits clusters based on similarity, resulting in a compact representation of sub-events.
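How a cluster representation might follow a changing term set is sketched below. The sketch assumes a diagonal covariance (one variance per term), a zero mean and a fixed initial variance for newly added terms; these choices, as well as the function and parameter names, are illustrative assumptions and do not reproduce the method of [29] or [4].

```python
def realign_cluster(mean, variance, old_terms, new_terms, init_var=1.0):
    """Map a cluster's mean/variance vectors onto a new term set.

    Dimensions of removed terms are dropped; dimensions of added terms
    start with mean 0.0 and variance init_var (assumed defaults).
    """
    old_index = {term: i for i, term in enumerate(old_terms)}
    new_mean, new_variance = [], []
    for term in new_terms:
        if term in old_index:
            i = old_index[term]
            new_mean.append(mean[i])
            new_variance.append(variance[i])
        else:
            new_mean.append(0.0)
            new_variance.append(init_var)
    return new_mean, new_variance

# Hypothetical cluster over the terms flood, street, power.
mean, variance = [0.8, 0.1, 0.4], [0.2, 0.3, 0.1]
new_mean, new_variance = realign_cluster(
    mean, variance,
    old_terms=["flood", "street", "power"],
    new_terms=["power", "outage", "flood"],  # "street" removed, "outage" added
)
```

After the call, `new_mean` is `[0.4, 0.0, 0.8]` and `new_variance` is `[0.1, 1.0, 0.2]`, so existing statistics are preserved for the terms that remain in the feature set.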

An evaluation based on the Hurricane Sandy data (see HS in Table 2) shows that several sub-events present in the data can be identified (i.e., flooding, power outage and damage). Additionally, the online approach performs well compared to an offline (baseline) clustering approach, in which features can be determined based on the whole data set [4]. The comparison between the offline and the online approach (for different settings) was performed by calculating the Normalized Mutual Information (NMI) and the Silhouette values after each processed batch. The NMI values (high NMI values indicate similar clusterings) after each batch are above 0.65 except for a few batches/periods. The Silhouette values (high Silhouette values indicate well separated clusters) for the online and the offline approach are very similar (e.g., an average difference of 0.16). The values show that the offline and the online approach behave in a similar way, although the online algorithm, unlike the offline algorithm, does not operate on the full data space for the feature extraction process. For details on the evaluation and settings, see Pohl et al. [4].
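The NMI used for comparing two clusterings of the same items can be computed as in the following sketch. Several normalizations of the mutual information exist; this sketch assumes normalization by the arithmetic mean of the two entropies, which may differ from the variant used in [4]. The function names and sample labelings are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def nmi(a, b):
    """Normalized mutual information of two labelings of the same items."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    mutual_info = sum(
        (n_xy / n) * math.log(n_xy * n / (count_a[x] * count_b[y]))
        for (x, y), n_xy in joint.items()
    )
    # Assumed normalization: arithmetic mean of the two entropies.
    denom = (entropy(a) + entropy(b)) / 2
    # Both labelings consist of a single cluster: treat them as identical.
    return mutual_info / denom if denom > 0 else 1.0

online = [0, 0, 1, 1]
offline_same = [1, 1, 0, 0]         # same grouping, different cluster names
offline_independent = [0, 1, 0, 1]  # grouping unrelated to the online one

same = nmi(online, offline_same)
independent = nmi(online, offline_independent)
```

The value `same` equals 1 up to floating-point error, since the NMI ignores how clusters are named, while `independent` equals 0 because the two groupings share no information.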


7 FUTURE WORK & CONCLUSION
This contribution summarizes our research in the area of sub-event detection in crisis management. We described how the social media analysis approach is incorporated into the BRIDGE project. We presented the examined approaches (SOM, AC, 2PG, and 2PGT) for offline sub-event detection and summarized our findings. AC and 2PGT show good performance, especially in the item-based assignment to sub-events. We also presented our work on online sub-event detection based on dynamic feature selection and incremental clustering. In the future, we plan to focus more on online sub-event detection combined with a 'trust mechanism' that allows filtering for trustworthy information sources.

Acknowledgment
The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 261817 and was partly performed in the Lakeside Labs research cluster at Alpen-Adria-Universität Klagenfurt.

REFERENCES

[1] D. Pohl, A. Bouchachia, and H. Hellwagner, “Automatic Sub-Event Detection in Emergency Management Using Social Media,” in First Inter. Workshop on Social Web for Disaster Management (SWDM), in conjunction with WWW’12, Lyon, France, 2012.

[2] D. Pohl, A. Bouchachia, and H. Hellwagner, “Automatic Identification of Crisis-Related Sub-events Using Clustering,” in 11th International Conference on Machine Learning and Applications (ICMLA), vol. 2, Dec. 2012, pp. 333–338.

[3] D. Pohl, A. Bouchachia, and H. Hellwagner, “Supporting Crisis Management via Sub-Event Detection in Social Networks,” in Inter. Conf. on Collaboration Technologies and Infrastructures, Toulouse, France, 2012.

[4] D. Pohl, A. Bouchachia, and H. Hellwagner, “Online Processing of Social Media Data for Emergency Management,” in International Conference on Machine Learning and Applications (ICMLA), vol. 2, Dec. 2013, pp. 333–338.

[5] D. Pohl, A. Bouchachia, and H. Hellwagner, “Social Media for Crisis Management: Clustering Approaches for Sub-Event Detection,” Multimedia Tools and Applications, pp. 1–32, 2013.

[6] D. Pohl, A. Bouchachia, and H. Hellwagner, “Supporting Crisis Management via Detection of Sub-Events in Social Networks,” International Journal of Information Systems for Crisis Response and Management (IJISCRAM), vol. 5, no. 3, pp. 20–36, Jul. 2013.

[7] S. Liu, L. Palen, J. Sutton, A. Hughes, and S. Vieweg, “In Search of the Bigger Picture: The Emergent Role of On-Line Photo-Sharing in Times of Disaster,” in Proceedings of the Information Systems for Crisis Response and Management Conference (ISCRAM 2008), 2008.

[8] S. Vieweg, A. L. Hughes, K. Starbird, and L. Palen, “Microblogging During Two Natural Hazards Events: What Twitter May Contribute to Situational Awareness,” in Proceedings of the 28th International Conference on Human Factors in Computing Systems, ser. CHI ’10. New York, NY, USA: ACM, 2010, pp. 1079–1088.

[9] L. Palen, “Online Social Media in Crisis Events,” EDUCAUSE Quarterly (EQ), vol. 31, no. 3, pp. 76–78, 2008. [Online]. Available: http://www.educause.edu/

[10] A. Hughes, L. S. Denis, L. Palen, and K. Anderson, “Online Public Communications by Police & Fire Services during the 2012 Hurricane Sandy,” in Proceedings of the ACM 2014 Conference on Human Factors in Computing Systems (CHI), Toronto, 2014.

[11] S. Denef, P. S. Bayerl, and N. Kaptein, “Cross-European Approaches to Social Media as a Tool for Police Communication,” in European Police Science And Research Bulletin. Issue 6., 2011/12, pp. 1553–1562.

[12] D. Pohl. (2013, Sep.) Information Intelligence (II). [Online]. Available: http://www.bridgeproject.eu/content/bridge_information_intelligence_flyer.pdf

[13] J. H. Skjetne. (2013, Sep.) BRIDGE Master System. [Online]. Available: http://www.bridgeproject.eu/content/bridge_master_system_flyer.pdf

[14] A. Marcus, M. S. Bernstein, O. Badar, D. R. Karger, S. Madden, and R. C. Miller, “Twitinfo: Aggregating and Visualizing Microblogs for Event Exploration,” in Proceedings of the 2011 Annual Conference on Human Factors in Computing Systems, ser. CHI ’11. New York, NY, USA: ACM, 2011, pp. 227–236.

[15] M. Mathioudakis and N. Koudas, “TwitterMonitor: Trend Detection over the Twitter Stream,” in Proceedings of the 2010 International Conference on Management of Data, ser. SIGMOD ’10. New York, NY, USA: ACM, 2010, pp. 1155–1158.

[16] S. Petrović, M. Osborne, and V. Lavrenko, “Streaming First Story Detection with Application to Twitter,” in Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, ser. HLT ’10. Stroudsburg, PA, USA: Association for Computational Linguistics, 2010, pp. 181–189.

[17] T. Terpstra, A. de Vries, R. Stronkman, and G. L. Paradies, “Towards a Realtime Twitter Analysis during Crises for Operational Crisis Management,” in Proceedings of the 9th International ISCRAM Conference, Vancouver, Canada, April 2012.

[18] N. Ireson, “Local Community Situational Awareness during an Emergency,” in Digital Ecosystems and Technologies, 2009. DEST ’09. 3rd IEEE International Conference on, Jun. 2009, pp. 49–54.

[19] R. Fontugne, K. Cho, Y. Won, and K. Fukuda, “Disasters seen through Flickr Cameras,” in Proceedings of the Special Workshop on Internet and Disasters, ser. SWID ’11. New York, NY, USA: ACM, 2011, pp. 5:1–5:10. [Online]. Available: http://doi.acm.org/10.1145/2079360.2079365

[20] C. Zhou, D. Frankowski, P. Ludford, S. Shekhar, and L. Terveen, “Discovering Personally Meaningful Places: An Interactive Clustering Approach,” ACM Transactions on Information Systems, vol. 25, no. 3, July 2007.

[21] J. Yin, A. Lampert, M. Cameron, B. Robinson, and R. Power, “Using Social Media to Enhance Emergency Situation Awareness,” IEEE Intelligent Systems, vol. 27, no. 6, pp. 52–59, 2012.

[22] A. Jaffe, M. Naaman, T. Tassa, and M. Davis, “Generating Summaries and Visualization for Large Collections of Geo-Referenced Photographs,” in Proceedings of the 8th ACM International Workshop on Multimedia Information Retrieval, ser. MIR ’06. New York, NY, USA: ACM, 2006, pp. 89–98.

[23] H. Becker, M. Naaman, and L. Gravano, “Learning Similarity Metrics for Event Identification in Social Media,” in Proceedings of the Third ACM International Conference on Web Search and Data Mining, ser. WSDM ’10. New York, NY, USA: ACM, 2010, pp. 291–300. [Online]. Available: http://doi.acm.org/10.1145/1718487.1718524

[24] G. Petkos, S. Papadopoulos, and Y. Kompatsiaris, “Social Event Detection using Multimodal Clustering and Integrating Supervisory Signals,” in Proceedings of the 2nd ACM International Conference on Multimedia Retrieval, ser. ICMR ’12. New York, NY, USA: ACM, 2012, pp. 23:1–23:8.

[25] T. Reuter and P. Cimiano, “Event-based Classification of Social Media Streams,” in Proceedings of the 2nd ACM International Conference on Multimedia Retrieval, ser. ICMR ’12. New York, NY, USA: ACM, 2012, pp. 22:1–22:8. [Online]. Available: http://doi.acm.org/10.1145/2324796.2324824

[26] T. Rattenbury, N. Good, and M. Naaman, “Towards Automatic Extraction of Event and Place Semantics from Flickr Tags,” in Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ser. SIGIR ’07. New York, NY, USA: ACM, 2007, pp. 103–110.

[27] R. Nallapati, A. Feng, F. Peng, and J. Allan, “Event threading within news topics,” in Proceedings of the Thirteenth ACM International Conference on Information and Knowledge Management, ser. CIKM’04. New York, NY, USA: ACM, 2004, pp. 446–453. [Online]. Available: http://doi.acm.org/10.1145/1031171.1031258

[28] T. Sakaki, M. Okazaki, and Y. Matsuo, “Earthquake Shakes Twitter Users: Real-time Event Detection by Social Sensors,” in Proceedings of the 19th International Conference on World Wide Web, ser. WWW’10. New York, NY, USA: ACM, 2010, pp. 851–860. [Online]. Available: http://doi.acm.org/10.1145/1772690.1772777

[29] A. Bouchachia and C. Vanaret, “Incremental Learning Based on Growing Gaussian Mixture Models,” in Int’l Conf. on Machine Learning and Applications and Workshops, vol. 2, Dec. 2011, pp. 47–52.


Daniela POHL received her Dipl.-Ing. (Master's degree) in Computer Science in 2008 at the Alpen-Adria-Universität Klagenfurt, Austria. She is currently a research assistant and Ph.D. candidate at the Institute of Information Technology, Alpen-Adria-Universität Klagenfurt. She works in the scope of the EU-funded FP7 project BRIDGE (www.bridgeproject.eu) to develop technical solutions to improve crisis management. Her research interests include social media analysis, information retrieval, data mining, and machine learning.

Abdelhamid BOUCHACHIA is currently an Associate Professor at the Bournemouth University, Department of Smart Technology Research Center, UK. His major research interests include Machine Learning and Soft Computing with a particular focus on online/incremental learning, semi-supervised learning, prediction systems, and uncertainty modeling. He is the general chair of the International Conference on Adaptive and Intelligent Systems (ICAIS). He serves as program committee member for many conferences. He also serves as Associate Editor of Evolving Systems and acts as member of Evolving Intelligent Systems (EIS) Technical Committee (TC) of the IEEE Systems, Man and Cybernetics Society, the IEEE Task-Force for Adaptive and Evolving Fuzzy Systems and the IEEE Computational Intelligence Society.

Hermann HELLWAGNER is a full professor at the Institute of Information Technology, Klagenfurt University, Austria, leading the Multimedia Communications group. His current research areas are distributed multimedia systems, multimedia communications, and information-centric networking. He has published about 200 scientific papers on parallel computer architecture, parallel programming, and multimedia communications and adaptation. He is a senior member of the IEEE, member of the ACM, and Vice President of the Austrian Science Fund (FWF).
