Combining Human and Machine Intelligence for Processing of Twitter Data During Mass Emergencies

Sarah Vieweg and Carlos Castillo
Qatar Computing Research Institute

Members of the public play a critical role in disaster and mass emergency situations; after an earthquake strikes, a hurricane makes landfall, or a wildfire begins to spread, area residents are first on the scene to start search and rescue operations, organize food and shelter, and help injured victims (Palen & Liu, 2007). Heightened use of information and communication technologies (ICT) in disasters has extended the range of who can actively participate and respond in such situations—the ability to help is no longer limited to local or nearby populations. Through the use of social media sites and microblogging platforms, people who are thousands of miles from the area of impact can volunteer and help direct information, organize relief efforts, offer advice, and gather and distribute useful data (Starbird & Palen, 2011; Vieweg et al., 2010).

The role of social media in mass emergency and humanitarian crises is constantly evolving and gaining traction; people are increasingly turning to these online outlets to learn about and stay updated on such situations. As more people turn to social media and microblogging to communicate during these time-critical events, the amount of data generated continues to grow. Communications on these platforms in such fraught circumstances include explanations of the event, reactions to it, offers of emotional and financial support, and updates that provide practical, tactical information. It is this last type of information that we are concerned with, and for which we design end-user tools.

Digital volunteering has prompted technical communities to create intuitive, easy-to-use tools that volunteers can use to quickly identify actionable, timely information generated on microblogging platforms, particularly Twitter. Our specific goal is to leverage human intelligence by training machine-learning algorithms on human-annotated data. As the algorithms process the annotated data, they learn to identify and extract relevant information in real time, as crises unfold. The automatically categorized information can then be used by humanitarian organizations, responders, and other parties focused on disaster relief and recovery. However, training machines to intelligently process Twitter data in real time during emergencies poses many challenges: we must manage tweet volume, speed of delivery, redundancy (i.e., avoiding repeated processing of the same tweet text), and information verifiability, and we must train machines to process tweets according to their situated, contextualized content.
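To make the redundancy challenge concrete, one common approach is to normalize tweet text so that retweets and near-copies map to the same key before processing. The sketch below is illustrative only—the normalization rules and function names are our assumptions, not part of any particular deployed system:

```python
import re

def normalize(tweet_text):
    """Normalize tweet text so retweets and near-copies map to the same key."""
    text = tweet_text.lower()
    text = re.sub(r"^rt\s+@\w+:?\s*", "", text)   # drop a leading "RT @user:"
    text = re.sub(r"https?://\S+", "", text)      # drop URLs (often shortened differently)
    text = re.sub(r"[^a-z0-9\s]", "", text)       # drop punctuation and symbols
    return " ".join(text.split())                 # collapse whitespace

seen = set()

def is_redundant(tweet_text):
    """Return True if an equivalent tweet has already been processed."""
    key = normalize(tweet_text)
    if key in seen:
        return True
    seen.add(key)
    return False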

AIDR (Artificial Intelligence for Disaster Response) is a web-based platform that filters and classifies timely, actionable information broadcast via Twitter during crises (Imran et al., 2014). Users begin by creating a collection of tweets that corresponds to their requirements; tweets may be selected for collection because they contain particular keyword(s), or based on their geo-location coordinates. Collected tweets are then shown to volunteers, who read each one and label or “tag” it with an appropriate user-defined category describing its content. Labels can be high-level, such as “contains situational awareness information” versus “does not contain situational awareness information,” or more detailed, such as “warning,” “damage,” or “road closure.” These tagged tweets are then used as training data for classifiers that learn to more accurately identify tweets that contain situational awareness information.
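The tag-then-train loop described above can be sketched with off-the-shelf tools. The example below is a minimal illustration assuming scikit-learn, with invented tweets and labels; it is not AIDR's actual implementation, which supports its own pluggable learning components:

```python
# Hypothetical volunteer-tagged tweets; in AIDR these come from the
# crowdsourced labeling interface (texts and labels here are invented).
labeled = [
    ("Main St bridge has collapsed, avoid downtown", "damage"),
    ("Several buildings damaged near the river", "damage"),
    ("Highway 9 closed in both directions due to flooding", "road closure"),
    ("Police have shut the coastal road past the pier", "road closure"),
    ("Sending thoughts and prayers to everyone affected", "not relevant"),
    ("Can't believe this is happening, so scary", "not relevant"),
]

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts, labels = zip(*labeled)

# TF-IDF unigram/bigram features feeding a linear classifier -- a simple
# stand-in for whatever learner a real deployment plugs in.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)

print(clf.predict(["Overpass on 5th Ave has collapsed"]))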

Fig. 1. Tweet example, ready for a volunteer to tag it with the appropriate category

An illustration of how the AIDR system works is shown below in Figure 2. The “crowdsourced examples” are the tweets that have been labeled by volunteers, using the interface shown in Figure 1.

Fig. 2. Overview of AIDR process

The ability to process tweets that contain valuable information in real time, and to provide members of the public, concerned outsiders, formal response agencies, and other related organizations with information about what is happening “on the ground,” is critical. Such information enables these parties to better understand and support the needs of the affected population. The hope is that through continued use of Twitter to communicate valuable information during crises, machines will become increasingly adept at identifying and extracting useful information, which will in turn empower people to better understand and respond in such situations.
AIDR is available for public use and review at


Imran, M., Castillo, C., Lucas, J., Meier, P. and Vieweg, S. (2014). AIDR: Artificial Intelligence for Disaster Response. In Proc. WWW ’14 Companion. Seoul, Korea.

Palen, L. and Liu, S.B. (2007). Citizen communication in crisis: anticipating a future of ICT-supported public participation. In Proc. CHI 2007. 727-736.

Starbird, K. and Palen, L. (2011). “Voluntweeters”: self-organizing by digital volunteers in times of crises. In Proc. CHI 2011. 1071-1080.

Vieweg, S., Hughes, A.L., Starbird, K. and Palen, L. (2010). Microblogging During Two Natural Hazards Events: What Twitter May Contribute to Situational Awareness. In Proc. CHI 2010. 1079-1088.

Sarah Vieweg is a Scientist in the Social Computing group at the Qatar Computing Research Institute. Her research focuses on the use of social media and microblogging during mass emergency and humanitarian crises in an effort to inform the development of tools that help stakeholders gain relevant information in time-sensitive situations.

Carlos Castillo is a Senior Scientist at the Qatar Computing Research Institute in Doha. He is a web miner with a background in information retrieval, and has been influential in the areas of adversarial web search and web content quality and credibility. His current research focuses on the application of web mining methods to problems in the domains of online news and humanitarian crises.