Monitoring Social Media for Humanitarian Assistance and Disaster Relief

Shamanth Kumar, Fred Morstatter, and Huan Liu
CIDSE
Arizona State University
Tempe, AZ, USA
{shamanth.kumar, fred.morstatter, huan.liu}@asu.edu


ABSTRACT
Social media has emerged as a major platform for information sharing. Twitter, one such platform, is transforming the way people communicate, particularly during crises. One area that particularly benefits from this new information channel is Humanitarian Assistance and Disaster Relief. Twitter has been widely used in several major crises around the world to share and transmit critical information. During the Boston Marathon bombing in 2013, the first reports of the incident were published on Twitter, and the event grew to become one of the most-discussed events of the year. A challenge in leveraging tweets for crisis monitoring is information deluge: identifying relevant information manually during a crisis is a daunting task. In this paper, we present two systems, TweetTracker and TweetXplorer, which are designed to address this challenge and aid first responders in monitoring emerging crises. TweetTracker is a tweet collection and aggregation system that addresses the problem of monitoring big social media data; it facilitates the collection and analysis of tweets from crises. TweetXplorer is an analysis platform that addresses the challenge of understanding the big data generated during a crisis on social media; using visual analytics, it allows first responders to gain insights from crisis data. Together, these systems provide a collaborative environment where users can monitor and analyze tweets to gain insight into crises via Twitter and efficiently identify relevant information using partially automated mechanisms.


1 INTRODUCTION
Social media has become one of the prime methods of communication during natural disasters and other crises. In October of 2012, Hurricane Sandy ravaged the Eastern seaboard of the United States, taking many lives and causing billions of dollars in damage. In early 2011, the Arab Spring protests toppled several governments in the Middle East. These events are unique because of the manner in which social media was used by the people in the affected regions. Social media posts from affected individuals include sharing of information on the ground and, in the case of disasters, seeking help from disaster response agencies. As stated in a recent issue of Scientific American, 'By the time Hurricane Sandy slammed the eastern seaboard last year, social media had become an integral part of disaster response, filling the void in areas where cell phone service was lost while millions of Americans looked to resources including Twitter and Facebook to keep informed, locate loved ones, notify authorities, and express support. Gone are the days of one-way communication where only official sources provide bulletins on disaster news'. Social media is the new way to communicate during a crisis; however, harnessing it for effective relief efforts is a challenging endeavor.

Over the last several years, Twitter has grown to major proportions, with 200 million users publishing 500 million tweets each day. These millions of tweets range from personal updates to discussion of disaster-related events. When crises occur, they are widely discussed on Twitter. During disasters, users post tweets asking for help, speculating about the state of the disaster, and spreading information from people involved in the crisis. The Federal Emergency Management Agency (FEMA) wrote that during Hurricane Sandy 'users sent more than 20 million Sandy-related Twitter posts, or 'tweets', despite the loss of cell phone service during the peak of the storm'. During the Arab Spring protests, protesters published millions of tweets planning and coordinating protest activity.

In this article, we discuss state-of-the-art efforts to use the information in crisis-related tweets to obtain situational awareness and to direct crisis relief efforts to the affected areas. We begin by introducing important challenges in understanding disaster-related tweets, then introduce two systems to address these challenges and help in understanding the deluge of crisis-related information generated on Twitter. Finally, we provide a use case for how these tools were used during Hurricane Sandy to direct relief efforts.


2 RESEARCH CHALLENGES
Although social media is a rich source of information during mass emergencies and crises, there are unique challenges that prevent the direct consumption of this information. The large volume of information generated on social media sites makes manual analysis of the data impractical. Thus, partially automated solutions must be developed, and they must be intuitive and straightforward to use. Some specific challenges in handling social media data are:
  • Information Overload: Users on Twitter generate over 500 million tweets every day [4]. The popularity of Twitter during major crises, such as the Arab Spring protests [3] and the Boston Marathon bombing [2], has attracted the attention of crisis responders. Manually perusing the data at this scale is not practical, so partially automated mechanisms are needed to ingest the information generated on social media platforms. The goal is to facilitate human-in-the-loop computing, where analysts and first responders play an active role in the collection and analysis of the data.
  • Noise: Previous studies have shown that a large fraction of tweets are interpersonal conversations and other discussions [13]. As Twitter is globally visible, it is susceptible to such noise during crises. Therefore, users should be able to filter noise and identify relevant information efficiently.
  • Location Information: Location information is vital during a crisis, as first responders and analysts rely upon it to obtain situational awareness. However, explicit location information is available in only ~1% of all tweets. Therefore, we need strategies to infer the location of tweets that do not explicitly provide this information (see the sketch after this list).
  • Data Incompleteness: While Twitter users produce a vast amount of data each day, only a small subset of this information is available for public consumption. Twitter offers several APIs which allow for analysis to be carried out with its data. One question that arises is the representativeness of this data. Methodologies are needed to validate these data sources and determine areas of bias in these sources [11].
  • Collaborative Environment: The collection of information relevant to a crisis is a collaborative effort in the real-world. Volunteers on the ground collect, aggregate, and assimilate information from multiple sources to create a holistic view of the disaster's impact. However, in the virtual environment, such mechanisms are lacking. Thus, there is a need for a collaborative environment, where individuals can collectively identify and analyze social media information pertinent to a crisis.
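
To make the location challenge concrete, the following minimal sketch shows a common two-tier fallback strategy: use a tweet's geotag when it is present, and otherwise geocode the free-text location in the author's profile. This mirrors the two kinds of dots rendered in Figure 1. The JSON field names follow Twitter's classic v1.1 tweet format; the geocode helper is a hypothetical stand-in for any geocoding service, not a TweetTracker API.

```python
from typing import Optional, Tuple

def geocode(place_name: str) -> Optional[Tuple[float, float]]:
    """Hypothetical stand-in for a geocoding service that maps a
    free-text place name (e.g., 'Brooklyn, NY') to (lat, lon)."""
    raise NotImplementedError

def locate_tweet(tweet: dict) -> Optional[Tuple[float, float]]:
    """Two-tier location strategy for a v1.1-style tweet dict."""
    # Tier 1: explicit geotag (present in roughly 1% of tweets).
    coords = tweet.get("coordinates")
    if coords and coords.get("coordinates"):
        lon, lat = coords["coordinates"]  # GeoJSON order is (lon, lat)
        return (lat, lon)
    # Tier 2: free-text location from the author's profile.
    profile_loc = (tweet.get("user") or {}).get("location")
    if profile_loc:
        return geocode(profile_loc)
    return None
```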

Fig. 1: TweetTracker: The map shows tweets that are located using two strategies: green dots show geotagged tweets and blue dots are tweets located using the user’s profile location. The right pane shows a tag cloud of the top keywords in the dataset.


3 OVERVIEW OF TWEETTRACKER & TWEETXPLORER
TweetTracker is a tweet monitoring and collection system that focuses on collecting Twitter data so that first responders can monitor disaster-related tweets of large volume and high velocity [5]. Its primary function is to make Twitter data conveniently and instantly accessible to agencies, both for information gathering and situational awareness during a crisis and for intelligent decision making in disaster relief efforts.

One of the primary functions of TweetTracker is to monitor tweets related to an event. To support this, TweetTracker provides a job-monitoring interface in which users enter 'parameters' that describe the event. These parameters can be keywords, hashtags, geographic bounding boxes, and usernames, entered by the user based on firsthand knowledge of the event and the region. To facilitate collaborative collection of tweets, users can invite others to join their event and refine the list of parameters. This strategy mimics the collection of information in the real world and enables better coverage of tweets for an event from different perspectives. The system also provides aggregate information on trending hashtags to facilitate updating the parameter list with new hashtags.
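
As a hedged illustration of this parameter model, the sketch below shows how an event's keywords, hashtags, usernames, and bounding boxes could be mapped onto a filter request for Twitter's v1.1 streaming API (the statuses/filter endpoint, which accepted track, follow, and locations parameters). The Event structure and its field names are illustrative, not TweetTracker's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Event:
    """Illustrative container for one event's collection parameters."""
    keywords: List[str] = field(default_factory=list)
    hashtags: List[str] = field(default_factory=list)
    user_ids: List[str] = field(default_factory=list)
    # Each box is (west_lon, south_lat, east_lon, north_lat).
    bounding_boxes: List[Tuple[float, float, float, float]] = field(default_factory=list)

def to_filter_params(event: Event) -> dict:
    """Map an event's parameters onto a v1.1 statuses/filter request."""
    params = {}
    terms = event.keywords + event.hashtags
    if terms:
        params["track"] = ",".join(terms)
    if event.user_ids:
        params["follow"] = ",".join(event.user_ids)
    if event.bounding_boxes:
        params["locations"] = ",".join(
            f"{w},{s},{e},{n}" for (w, s, e, n) in event.bounding_boxes
        )
    return params

# Example: a hurricane event tracked by keywords and a New York City box.
sandy = Event(keywords=["sandy", "hurricane"], hashtags=["#sandy"],
              bounding_boxes=[(-74.3, 40.5, -73.7, 40.9)])
print(to_filter_params(sandy))
```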

Because Twitter data is big data, the number of tweets collected for one event is too large to peruse manually. To help first responders, TweetTracker provides several views for first-pass analysis: temporal views that can be used to identify times of peak activity; geospatial views used to identify the location of people affected by the disaster; and content-related views to identify the hot topics on Twitter. An example of these views is shown in Figure 1.
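
As a rough sketch of what might lie behind two of these views, the functions below bucket tweets by hour for a temporal activity chart and count frequent terms for a tag cloud like the one in Figure 1. They assume the v1.1 created_at timestamp format; the stopword handling and length threshold are arbitrary illustrative choices, not the system's actual pipeline.

```python
from collections import Counter
from datetime import datetime

def hourly_volume(tweets):
    """Temporal view: tweet counts bucketed by hour, for spotting peaks."""
    buckets = Counter()
    for t in tweets:
        # v1.1 created_at format, e.g. 'Mon Oct 29 18:00:00 +0000 2012'
        ts = datetime.strptime(t["created_at"], "%a %b %d %H:%M:%S %z %Y")
        buckets[ts.strftime("%Y-%m-%d %H:00")] += 1
    return buckets

def top_keywords(tweets, stopwords=frozenset(), k=20):
    """Content view: the k most frequent terms, e.g. for a tag cloud."""
    counts = Counter()
    for t in tweets:
        for token in t["text"].lower().split():
            if token not in stopwords and len(token) > 2:
                counts[token] += 1
    return counts.most_common(k)
```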

In addition to the data collection and visualization features offered by TweetTracker, the system also offers mature search functionality. As shown in Figure 2, the system allows a user to filter along different perspectives to obtain the information most relevant to them. Search options include text, geographic regions, the language of the tweet, and specific users.

Fig. 2: Search functionality of TweetTracker. Here we see the different search inputs at the top, the list of returned tweets, and the ability to export the tweets at the bottom.


In addition to data visualization, TweetTracker also supports easy export of information to other systems, allowing it to integrate into existing workflows. With export formats such as TSV and JSON, TweetTracker makes it easy to import its data into other tools (such as Excel).
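
A minimal sketch of such an export step is shown below, writing tweets as line-delimited JSON or as TSV with a few selected fields. The chosen field names are illustrative; an exporter would pick whichever columns its downstream tool needs.

```python
import csv
import json

def export_json(tweets, path):
    """Write tweets as one JSON object per line (easy to re-import)."""
    with open(path, "w", encoding="utf-8") as f:
        for t in tweets:
            f.write(json.dumps(t) + "\n")

def export_tsv(tweets, path, fields=("id_str", "created_at", "text")):
    """Write selected tweet fields as TSV, e.g. for a spreadsheet tool."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(fields)
        for t in tweets:
            writer.writerow(t.get(k, "") for k in fields)
```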

TweetXplorer is a visual analytics tool that facilitates deeper analysis of the data collected by TweetTracker [10]. TweetXplorer helps the user answer the following important questions: who is important, where relevant tweets originate, and when different trends propagate. By providing deeper knowledge about the Twitter data, it helps the user better understand the situation on the ground. A screenshot of TweetXplorer is shown in Figure 3.

Fig. 3: TweetXplorer: The top-left pane shows the user-to-user retweet network. The top-right pane shows the geographic distribution of geotagged tweets. The bottom-right pane shows the grouped keyword query pane.


TweetXplorer enables the user to investigate the data along multiple facets. It allows users to group keywords so that related terms are shown together and multiple lines of inquiry can be pursued at once. This is facilitated in the bottom-right component of Figure 3. Here, the user has grouped the keywords according to different 'themes' of Hurricane Sandy: evacuation from New York City, requests for help, and infrastructure damage as a result of the storm.

The top-left pane of TweetXplorer shows a view of information propagation through retweets. Each node represents a user and each edge represents a retweet relationship between two users. By viewing this network, the user gets a sense of who the most important users are: those who produce the most influential information and those who spread that information to smaller communities. By clicking a user, they can observe which tweets were retweeted the most and what information that user is most interested in. They can also zoom in to see the user's specific retweet network.
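
A minimal sketch of how such a retweet network might be assembled from raw tweets follows, using the v1.1 retweeted_status field and the networkx library for brevity. The paper does not describe TweetXplorer's actual implementation, so this is illustrative only.

```python
import networkx as nx

def build_retweet_network(tweets):
    """Directed network: edge u -> v means user u was retweeted by user v.
    Edge weights count repeated retweets between the same pair."""
    g = nx.DiGraph()
    for t in tweets:
        rt = t.get("retweeted_status")  # present only on retweets (v1.1)
        if rt is None:
            continue
        source = rt["user"]["screen_name"]    # original author
        spreader = t["user"]["screen_name"]   # user who retweeted
        if g.has_edge(source, spreader):
            g[source][spreader]["weight"] += 1
        else:
            g.add_edge(source, spreader, weight=1)
    return g

# Influential sources then surface as high-out-degree nodes, e.g.:
# sorted(g.out_degree(weight="weight"), key=lambda x: -x[1])[:10]
```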

The map component allows a user to see the geographic regions that received the most attention on Twitter. The map shows where users are tweeting from, an important piece of information in understanding the nature of the tweets. If the tweets matching the user's query do not come from the disaster region, then further analysis may be needed along other dimensions.

The components of TweetXplorer also interact, allowing the user to see different facets of the information. For example, a user of the system can select a user in the retweet network to see where that user's retweets originated, giving an idea of the regions interested in the tweet's information. Another interaction occurs when the user selects a region on the map: this action updates the network to show the retweets that originate from within that region.

A more detailed description of the components and strategies to address the challenges can be found in [6].


4 FROM TWEETS TO INSIGHT
To highlight the utility of TweetTracker and TweetXplorer, we present a case study focused on the tweets collected during Hurricane Sandy. A storm of this magnitude is highly unusual in this region of the United States, and as a result the disaster generated a tremendous amount of Twitter activity. We collected some of this discussion using TweetTracker, based on storm-related keywords and Twitter usernames provided by the users of our system. The collection started on October 25, 2012 and continued through November 3, 2012. During this time, we collected 5,639,643 tweets related to the storm. Figure 1 shows an overview of the tweets collected on October 30.

Fig. 4: Traffic trend for most severely affected areas of Hurricane Sandy.


Investigating the data
Consider a scenario where an analyst intends to investigate tweets to understand Hurricane Sandy's impact. Clearly, the first step would be to identify regions of interest. This can be done by analyzing the patterns in tweet traffic from the regions in Sandy's path. In Figure 4, we present a comparison of the traffic from parts of New York and New Jersey, the most severely affected regions. The traffic patterns show that tweets from northern NJ indicate high interest in the topic. It is also clear that the volume of tweets is highest on the day of landfall (October 29). The next step in this investigation would be to understand and contrast the patterns in the content of tweets before, during, and after the disaster by drilling into the data. We partition the dataset into three distinct epochs: pre-landfall (2012-10-29 00:00 - 2012-10-29 17:59), landfall (2012-10-29 18:00 - 2012-10-30 23:59), and recovery (2012-10-31 00:00 - 2012-11-01 12:00). As we are interested in comparing the nature of the discussion, we also identify keywords indicative of relevant topics, as in Figure 3. Below, we discuss our findings from the three epochs.
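
A minimal sketch of this epoch-partitioning step is shown below, using the boundaries quoted above. For simplicity it treats the boundaries as UTC; the case study's exact timezone handling is not specified, so that choice is an assumption.

```python
from datetime import datetime, timezone

def parse_created_at(t):
    """Parse the v1.1 created_at timestamp into an aware datetime."""
    return datetime.strptime(t["created_at"], "%a %b %d %H:%M:%S %z %Y")

# Epoch boundaries from the case study (assumed UTC for illustration).
EPOCHS = [
    ("pre-landfall", datetime(2012, 10, 29, 0, 0, tzinfo=timezone.utc),
                     datetime(2012, 10, 29, 17, 59, tzinfo=timezone.utc)),
    ("landfall",     datetime(2012, 10, 29, 18, 0, tzinfo=timezone.utc),
                     datetime(2012, 10, 30, 23, 59, tzinfo=timezone.utc)),
    ("recovery",     datetime(2012, 10, 31, 0, 0, tzinfo=timezone.utc),
                     datetime(2012, 11, 1, 12, 0, tzinfo=timezone.utc)),
]

def partition_by_epoch(tweets):
    """Bucket tweets into the three epochs; out-of-range tweets are dropped."""
    parts = {name: [] for name, _, _ in EPOCHS}
    for t in tweets:
        ts = parse_created_at(t)
        for name, start, end in EPOCHS:
            if start <= ts <= end:
                parts[name].append(t)
                break
    return parts
```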

Fig. 5: Pre-landfall discussion of Hurricane Sandy.

Fig. 6: Discussion of Hurricane Sandy immediately after landfall.



Pre-Landfall
In the hours leading up to Hurricane Sandy's landfall, we see discussions of several different issues. In Figure 5a, one of the most highly retweeted tweets mentions the availability of pet shelters in evacuation areas, indicating concern among pet owners for the safety of their pets during the storm. The geotagged tweets produced during this epoch, however, show generic discussion with no clear topical focus. At the beginning of the epoch, we find that 'damage' and 'flood' are ranked highly in the New York area. As the storm neared, we observe in Figure 5c that specific issues such as 'rumors', 'damage', and 'subway' become popular.

Landfall
Hurricane Sandy made landfall on Oct 29, 2012 at 20:00 EST [12]. The first reports of flooding started to arrive around this time. As seen in Figure 6a, these reports contain links to images of flooding. As the storm progressed, we observed several reports of power outages. This makes sense, as Con Edison, New York's power supplier, stated that this was the worst power outage in its history. Figure 6b shows the tweets centered on power outages and flooding from New York City and its surrounding areas. Due to the power outages, at least two hospitals were forced to evacuate their patients. In Figure 6c, we can see two clusters of retweets connected by common retweeters. The two central users are @NYScanner and @ABC, and the tweets claimed that Bellevue Hospital and the NYU Langone Medical Center were being evacuated due to power failure.

Recovery
After the storm, people turned their attention to the estimated $71 billion in damage [14] caused by the hurricane. While analyzing the Twitter activity after the events, we notice the following. First, the most prominent tweets on the day after the hurricane direct people to assistance in repairing the damage done to their homes, as shown in Figure 7. Second, Figure 8 shows that the discussion in New York City focuses on the words 'damage', 'power', and 'flood', indicating that people have turned their attention to post-storm topics such as power outages and cleanup.

Fig. 7: Network of retweeters signifying the importance of recovery resources after the hurricane.

Fig. 8: Tag cloud of most commonly-used words in New York after the hurricane.


Scaling to Big Data
The visual analytics implemented in the systems are designed to help users effectively filter information from a large volume of data. The visualizations initially present an overview of the data and then facilitate drill-down operations to refine the search. For example, a heatmap of tweet locations helps a user identify hotspots, as in Figure 3. From this general view, a user can filter specific information through the zoom-in operation to focus on specific regions and learn about the topics from those regions, as seen in Figures 5b and 5c. In addition to devising ways to efficiently filter data, the implementations in the system are also selected for their ability to scale to big data. For example, computing a force-directed network layout is an expensive operation, typically O(n³) overall (on the order of n iterations, each requiring O(n²) pairwise force calculations), where n is the number of nodes in the network. We select the implementation in D3, which reduces the per-iteration cost through the Barnes-Hut approximation, bringing the total complexity to O(n² log n). Even with this approximation the algorithm can still involve significant computation, so we reduce n by trimming the network to 3 hops away from the original tweet. Additionally, all the collected data is stored in a NoSQL database to allow for flexibility in handling changes to the data structure and to scale to the increasing data volume. Indexes are created for these queries to allow fast retrieval of network information.
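
A minimal sketch of the hop-based trimming step is shown below, assuming a networkx graph keyed by screen name as in the earlier retweet-network sketch. The paper does not show TweetXplorer's actual implementation, so this is illustrative only.

```python
import networkx as nx

def trim_to_k_hops(g: nx.DiGraph, seed: str, k: int = 3) -> nx.DiGraph:
    """Keep only nodes within k hops of the seed user, shrinking n before
    the O(n^2 log n) Barnes-Hut layout runs. Hop distance ignores edge
    direction so both sources and spreaders around the seed survive."""
    undirected = g.to_undirected(as_view=True)
    within = nx.single_source_shortest_path_length(undirected, seed, cutoff=k)
    return g.subgraph(within.keys()).copy()
```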


5 DISCUSSION
In this paper, we illustrated some of the challenges involved in handling enormous and dynamic social media information during crises. As Twitter and other social media platforms grow in popularity, it is essential to successfully incorporate social media information into disaster response workflows. Crisis data also present interesting research issues, such as extracting situational awareness, predicting future disasters for early warning, and identifying trustworthy information. MacEachren et al. [8] introduced a tool focused on geospatial analytics, relying upon 'place-time' queries to extract situational awareness. Sakaki et al. [15] showed that tweets can be used to predict the location of an earthquake. Tweets have also been used to propagate rumors and false information [1], and identifying trustworthy tweets is an important task during crises. Mendoza et al. [9] investigated the trustworthiness of tweets generated during crises and established that rumors were questioned more often than confirmed news by other Twitter users. We intend to develop and incorporate methods to address this challenge in our systems in the future. Dealing with noise and identifying relevant information on Twitter remains a challenge. Kumar et al. [7] demonstrated that the communication of a smaller set of users can be followed to understand the progress of a crisis. Starbird and Palen [16] proposed a new tweeting model that facilitates faster identification and consumption of relevant information through informational tags describing the type of information contained in the tweets.

Acknowledgment
These projects are supported in part by the Office of Naval Research grants N000141110527 and N000141410095. The authors would also like to thank the volunteers at Humanity Road Inc. for their suggestions and comments.


REFERENCES

[1] M. Castillo. Searching For Truth in Venezuela. http://www.cnn.com/2014/02/21/world/americas/venezuela-fact-from-fiction/, Feb. 2014. [Online; accessed 21-February-2014].

[2] D. Gilgoff and J. J. Lee. Social Media Shapes Boston Bombings Response. http://bit.ly/1iESExb, Apr. 2013. [Online; accessed 27-January-2014].

[3] C. Huang. Facebook and Twitter key to Arab Spring uprisings: report. http://www.thenational.ae/news/uae-news/facebook-and-twitter-key-to-arab-spring-uprisings-report, June 2011. [Online; accessed 28-August-2013].

[4] S. Kim. Twitter’s IPO Filing Shows 215 Million Monthly Active Users. http://abcnews.go.com/Business/twitter-ipo-filing-reveals-500-million-tweets-day/story?id=20460493, 2013. [Online; accessed 26-February-2014].

[5] S. Kumar, G. Barbier, M. Abbasi, and H. Liu. TweetTracker: An Analysis Tool for Humanitarian and Disaster Relief. In Proceedings of 5th AAAI International Conference on Weblogs and Social Media, 2011.

[6] S. Kumar, F. Morstatter, and H. Liu. Twitter Data Analytics. Springer, 2014.

[7] S. Kumar, F. Morstatter, R. Zafarani, and H. Liu. Whom Should I Follow? Identifying Relevant Users During Crises. In Proceedings of the 24th ACM conference on Hypertext and social media, pages 139–147. ACM, 2013.

[8] A. M. MacEachren, A. C. Robinson, A. Jaiswal, S. Pezanowski, A. Savelyev, J. Blanford, and P. Mitra. Geo-Twitter Analytics: Applications in Crisis Management. In Proceedings, 25th International Cartographic Conference, Paris, France, 2011.

[9] M. Mendoza, B. Poblete, and C. Castillo. Twitter Under Crisis: Can we Trust What We RT? In Proceedings of the First Workshop on Social Media Analytics, 2010.

[10] F. Morstatter, S. Kumar, H. Liu, and R. Maciejewski. Understanding Twitter Data with TweetXplorer. In KDD, pages 1482–1485. ACM, 2013.

[11] F. Morstatter, J. Pfeffer, H. Liu, and K. M. Carley. Is the Sample Good Enough? Comparing Data from Twitter's Streaming API with Twitter's Firehose. In Proceedings of ICWSM, 2013.

[12] NHC. Post-Tropical Cyclone SANDY. http://www.nhc.noaa.gov/archive/2012/al18/al182012.update.10300002.shtml, October 2012.

[13] Pear Analytics. Twitter Study. http://www.pearanalytics.com/wp-content/uploads/2012/12/Twitter-Study-August-2009.pdf, 2009. [Online; accessed 27-January-2014].

[14] H. Russ. New York, New Jersey put $71B price tag on Sandy. http://news.msn.com/us/new-york-new-jersey-put-dollar71b-price-tag-on-sandy, November 2012.

[15] T. Sakaki, M. Okazaki, and Y. Matsuo. Earthquake Shakes Twitter Users: Real-Time Event Detection by Social Sensors. In Proceedings of the 19th international conference on World wide web, pages 851–860. ACM, 2010.

[16] K. Starbird and L. Palen. Voluntweeters: Self-Organizing by Digital Volunteers in Times of Crisis. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 1071–1080. ACM, 2011.

Shamanth Kumar is a Ph.D. Student in Computer Science at Arizona State University, and is interested in social media mining, online user behavior analysis, and information visualization. He works on social media based data analysis tools targeted towards information gathering and information analysis during Humanitarian Assistance/Disaster Relief events. He is the Chief Architect of TweetTracker (http://tweettracker.fulton.asu.edu/), which is a Twitter data aggregation and analysis system. He also works on TweetXplorer, which is an advanced visual analytics system for Twitter data. He has published research papers in several peer-reviewed conferences and workshops. He has also served as an external reviewer for various conferences and journals. He has served as a Program Committee member at SBP 2013, IJCAI 2013, and SBP 2014. A full list of his publications and updated information can be found at http://www.public.asu.edu/~skumar34/.

Fred Morstatter is a PhD student in computer science at Arizona State University in Tempe, Arizona. He is a research assistant in the Data Mining and Machine Learning (DMML) laboratory. Fred won the Dean's Fellowship for outstanding leadership and scholarship during his time at ASU. He is the Principal Architect of TweetXplorer, an advanced visual analytics system for Twitter data. He has also worked on TweetTracker. He has published in ICWSM, WWW, KDD, and IEEE Intelligent Systems, and is a coauthor of the book Twitter Data Analytics. Contact him at fred.morstatter@asu.edu. A full list of publications and updated information can be found at http://www.public.asu.edu/~fmorstat.

Huan Liu is a professor of Computer Science and Engineering at Arizona State University. He obtained his Ph.D. in Computer Science at University of Southern California and B.Eng. in Computer Science and Electrical Engineering at Shanghai JiaoTong University. Before he joined ASU, he worked at Telecom Australia Research Labs and was on the faculty at National University of Singapore. He was recognized for excellence in teaching and research in Computer Science and Engineering at Arizona State University. His research interests are in data mining, machine learning, social computing, and artificial intelligence, investigating problems that arise in many real-world, data-intensive applications with high-dimensional data of disparate forms such as social media. His well-cited publications include books, book chapters, encyclopedia entries as well as conference and journal papers. He serves on journal editorial boards and numerous conference program committees, and is a founding organizer of the International Conference Series on Social Computing, Behavioral-Cultural Modeling, and Prediction (http://sbp.asu.edu/). He is an IEEE Fellow and an ACM Distinguished Scientist. Updated information can be found at http://www.public.asu.edu/~huanliu.
