Shamanth Kumar, Fred Morstatter, and Huan Liu
CIDSE, Arizona State University, Tempe, AZ, USA
{shamanth.kumar, fred.morstatter, huan.liu}@asu.edu

ABSTRACT
Social media has emerged as a major platform for information sharing. Twitter, one such platform, is transforming the way people communicate, particularly during crises. One area that particularly benefits from this new information channel is Humanitarian Assistance and Disaster Relief. Twitter has been widely used in several major crises around the world to share and transmit critical information. During the Boston Marathon Bombings in 2013, the first reports of the incident were published on Twitter, and the event grew to become one of the most-discussed events of the year. A challenge in leveraging tweets for crisis monitoring is information deluge: identifying relevant information manually during a crisis is a daunting task. In this paper, we present two systems, TweetTracker and TweetXplorer, which are designed to address this challenge and aid first responders in monitoring emerging crises. TweetTracker is a tweet collection and aggregation system that addresses the problem of monitoring big social media data; it facilitates the collection and analysis of tweets from crises. TweetXplorer is an analysis platform that addresses the challenge of understanding the big data generated during a crisis on social media; using visual analytics, it allows first responders to gain insights from crisis data. Together, these systems provide a collaborative environment where users can monitor and analyze tweets to gain insight into crises via Twitter and efficiently identify relevant information using partially automated mechanisms.

1 INTRODUCTION
Social media has become one of the prime methods of communication during natural disasters and other crises. In October of 2012, Hurricane Sandy ravaged the Eastern seaboard of the United States, taking many lives and causing billions of dollars in damage.
In early 2011, the Arab Spring protests toppled many governments in the Middle East. These events are unique because of the manner in which social media was used by people in the affected regions. Social media posts from affected individuals include sharing information from the ground and, in the case of disasters, seeking help from disaster response agencies. As stated in a recent issue of Scientific American: 'By the time Hurricane Sandy slammed the eastern seaboard last year, social media had become an integral part of disaster response, filling the void in areas where cell phone service was lost while millions of Americans looked to resources including Twitter and Facebook to keep informed, locate loved ones, notify authorities, and express support. Gone are the days of one-way communication where only official sources provide bulletins on disaster news'. Social media is the new way to communicate during crises; however, harnessing it for effective relief efforts is a challenging endeavor. Over the last several years, Twitter has grown to major proportions, with 200 million users publishing 500 million tweets each day. These millions of tweets range from personal updates to discussions of disaster-related events. When crises occur, they are widely discussed on Twitter. During disasters, users post tweets asking for help, speculating about the state of the disaster, and spreading information from people involved in the crisis. The Federal Emergency Management Agency (FEMA) wrote that during Hurricane Sandy 'users sent more than 20 million Sandy-related Twitter posts, or 'tweets', despite the loss of cell phone service during the peak of the storm'. During the Arab Spring protests, protesters published millions of tweets planning and coordinating protest activity. In this article, we discuss state-of-the-art efforts to use the information in crisis-related tweets to obtain situational awareness and to direct crisis relief efforts to the affected areas.
We begin by introducing important challenges in understanding disaster-related tweets, then introduce two systems to address these challenges and help manage the deluge of crisis-related information generated on Twitter. Finally, we provide a use case showing how these tools were used during Hurricane Sandy to direct relief efforts.

2 RESEARCH CHALLENGES
Although social media is a rich source of information during mass emergencies and crises, unique challenges prevent the direct consumption of this information. The large volume of information generated on social media sites makes manual analysis of the data impractical. Thus, partially automated solutions must be developed, and they must be intuitive and straightforward to use. Specific challenges in handling social media data include the sheer volume of the data, the velocity at which it is generated, and the noise that obscures relevant information.
Fig. 1: TweetTracker: The map shows tweets located using two strategies: green dots show geotagged tweets, and blue dots show tweets located using the user's profile location. The right pane shows a tag cloud of the top keywords in the dataset.

3 OVERVIEW OF TWEETTRACKER & TWEETXPLORER
TweetTracker is a tweet monitoring and collection system that focuses on collecting Twitter data for first responders to monitor disaster-related tweets of large volume and high velocity [5]. Its primary function is to make Twitter data conveniently and instantly accessible to agencies for information gathering and situational awareness during a crisis, and for intelligent decision making in disaster relief efforts. One of TweetTracker's core features is monitoring tweets related to an event. To accommodate this, TweetTracker provides an advanced job-monitoring interface in which users enter 'parameters' that describe the event. These parameters can be keywords, hashtags, geographic bounding boxes, and usernames; they are entered by the user using firsthand knowledge of the event and the region. To facilitate collaborative collection of tweets, users can invite others to join their event and refine the list of parameters. This strategy mimics how information is gathered in the real world and enables better coverage of tweets for an event from different perspectives. The system also provides aggregate information on trending hashtags to facilitate updating the parameter list with new hashtags. Because Twitter data is big data, the number of tweets collected for one event is too large to peruse manually. To accommodate first responders, TweetTracker provides several views for first-pass analysis.
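A collection job of this kind can be thought of as a small parameter set plus a matching predicate. The sketch below is illustrative only: the CollectionJob class, its field names, and the simplified tweet dictionaries are hypothetical stand-ins, not TweetTracker's actual schema or the Twitter API's data format.

```python
from dataclasses import dataclass, field

@dataclass
class CollectionJob:
    """Hypothetical event-tracking job: keywords, hashtags, users, bounding box."""
    name: str
    keywords: set = field(default_factory=set)
    hashtags: set = field(default_factory=set)
    users: set = field(default_factory=set)
    # Geographic bounding box as (min_lon, min_lat, max_lon, max_lat), or None.
    bbox: tuple = None

    def matches(self, tweet: dict) -> bool:
        """Return True if a tweet satisfies any of the job's parameters."""
        text = tweet.get("text", "").lower()
        if any(term in text for term in self.keywords | self.hashtags):
            return True
        if tweet.get("user") in self.users:
            return True
        coords = tweet.get("coordinates")  # (lon, lat) for geotagged tweets
        if self.bbox and coords:
            lon, lat = coords
            min_lon, min_lat, max_lon, max_lat = self.bbox
            return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
        return False

# Example: a job roughly matching the Hurricane Sandy case study.
job = CollectionJob(
    name="Hurricane Sandy",
    keywords={"sandy", "hurricane"},
    hashtags={"#sandy"},
    users={"fema"},
    bbox=(-75.5, 38.9, -71.8, 41.5),  # rough NJ/NY coastal box (illustrative)
)
print(job.matches({"text": "Flooding in Hoboken #Sandy", "user": "someone"}))  # True
```

In the real system, parameters of this kind would be handed to Twitter's streaming API rather than applied as a post-hoc filter; the predicate above only illustrates the matching semantics.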
Namely, TweetTracker supports the following views: temporal information that can be used to identify times of peak activity; geospatial information used to identify the location of people affected by the disaster; and content-related views to identify the hot topics on Twitter. An example of these views is shown in Figure 1. In addition to its data collection and visualization features, TweetTracker also offers mature search functionality. As shown in Figure 2, the system allows users to filter along different facets to obtain the information most relevant to them. Search options include text, geographical regions, the language of the tweet, and specific users.

Fig. 2: Search functionality of TweetTracker. Here we see the different search inputs at the top, the list of returned tweets, and the ability to export the tweets at the bottom.

In addition to data visualization, TweetTracker supports easy export of information to other systems, allowing it to integrate into existing workflows. With export formats such as TSV and JSON, TweetTracker makes it easy to import its data into other tools (such as Excel).

TweetXplorer is a visual analytics tool that facilitates deeper analysis of the data collected by TweetTracker [10]. TweetXplorer helps the user answer the following important questions: who is important, where relevant tweets originate, and when different trends propagate. With this deeper knowledge of the Twitter data, the user can better understand the situation on the ground. A screenshot of TweetXplorer is shown in Figure 3.

Fig. 3: TweetXplorer: The top-left pane shows the user-to-user retweet network. The top-right pane shows the geographic distribution of geotagged tweets. The bottom-right pane shows the grouped keyword query pane.

TweetXplorer enables the user to investigate the data along multiple facets. It allows users to group keywords to show related terms together, so that users can pursue multiple lines of inquiry.
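The faceted search and TSV export that TweetTracker offers can be approximated in a few lines. This is a minimal sketch, assuming tweets are plain dictionaries with hypothetical user, text, and lang fields; the function names are illustrative and are not the system's actual API.

```python
def search_tweets(tweets, text=None, lang=None, user=None):
    """Filter a list of tweet dicts along search facets; None disables a facet."""
    results = tweets
    if text:
        results = [t for t in results if text.lower() in t["text"].lower()]
    if lang:
        results = [t for t in results if t.get("lang") == lang]
    if user:
        results = [t for t in results if t.get("user") == user]
    return results

def export_tsv(tweets):
    """Render tweets as TSV so they can be opened in tools such as Excel."""
    header = "user\ttext"
    rows = ["{}\t{}".format(t["user"], t["text"].replace("\t", " ")) for t in tweets]
    return "\n".join([header] + rows)

tweets = [
    {"user": "a", "text": "subway flooded", "lang": "en"},
    {"user": "b", "text": "inundacion", "lang": "es"},
]
hits = search_tweets(tweets, text="flood", lang="en")
print(export_tsv(hits))
```

A geographic-region facet would work the same way, testing each tweet's coordinates against a bounding box before inclusion.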
This is facilitated in the bottom-right component of Figure 3. Here, the user grouped the keywords according to different 'themes' of Hurricane Sandy: evacuation from New York City, requests for help, and infrastructure damage as a result of the storm. The top-left pane of TweetXplorer shows the propagation of information through retweets. Each node represents a user, and each edge represents a retweet relationship between two users. By viewing this network, the user gets a sense of who the most important users are: those who produce the most influential information and those who spread that information to smaller communities. By clicking a user, they can observe which tweets were retweeted the most and what information that user is most interested in. They can also zoom in to see the user's specific retweet network. The map component allows a user to see the geographic regions that received the most attention on Twitter. The map shows where users are tweeting from, an important piece of information in understanding the nature of the tweets. If the topics in the user's query do not come from the disaster region, then further analysis may be needed along other dimensions. The components of TweetXplorer also interact to let the user see different facets of the information. For example, a user of the system can select a user in the retweet network to see where their retweets originated; this can give them an idea of the regions interested in the tweet's information. Another interaction occurs when the user selects a region on the map: this action updates the network to show the retweets that originate from within that region. A more detailed description of the components and of strategies to address the challenges can be found in [6].

4 FROM TWEETS TO INSIGHT
To highlight the utility of TweetTracker and TweetXplorer, we present a case study focused on the tweets collected during Hurricane Sandy.
A storm of this magnitude is highly unusual in this region of the United States, and as a result the disaster generated a tremendous amount of Twitter activity. We collected some of this discussion using TweetTracker from storm-related keywords and Twitter usernames provided by the users of our system. The collection started on October 25, 2012 and continued through November 3, 2012. During this time, we collected 5,639,643 tweets related to the storm. Figure 1 shows an overview of the tweets collected on October 30.

Fig. 4: Traffic trend for the most severely affected areas of Hurricane Sandy.

Investigating the data
Consider a scenario where an analyst intends to investigate tweets to understand Hurricane Sandy's impact. Clearly, the first step would be to identify regions of interest. These can be determined by analyzing the patterns in tweet traffic from the regions on Sandy's path. In Figure 4, we present a comparison of the traffic from parts of New York and New Jersey, the most severely affected regions. The traffic patterns show that tweets from northern NJ indicate high interest in the topic. It is also clear that the volume of tweets is highest on the day of the landfall (October 29). The next step in this investigation would be to understand and contrast the patterns in the content of tweets before, during, and after the disaster by drilling into the data. We partition the dataset into three distinct epochs: pre-landfall (2012-10-29 00:00 - 2012-10-29 17:59), landfall (2012-10-29 18:00 - 2012-10-30 23:59), and recovery (2012-10-31 00:00 - 2012-11-01 12:00). Because we are interested in comparing the nature of the discussion, we identify keywords indicative of relevant topics, as in Figure 3. Below, we discuss our findings from the three epochs.

Fig. 5: Pre-landfall discussion of Hurricane Sandy.
Fig. 6: Discussion of Hurricane Sandy immediately after landfall.
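The three-epoch partition described above amounts to bucketing tweets by timestamp. A minimal sketch, assuming each tweet record carries a parsed datetime under a hypothetical created_at field:

```python
from datetime import datetime

# Epoch boundaries taken from the case study (inclusive on both ends).
EPOCHS = [
    ("pre-landfall", datetime(2012, 10, 29, 0, 0),  datetime(2012, 10, 29, 17, 59)),
    ("landfall",     datetime(2012, 10, 29, 18, 0), datetime(2012, 10, 30, 23, 59)),
    ("recovery",     datetime(2012, 10, 31, 0, 0),  datetime(2012, 11, 1, 12, 0)),
]

def epoch_of(ts):
    """Return the epoch name covering a timestamp, or None if outside all epochs."""
    for name, start, end in EPOCHS:
        if start <= ts <= end:
            return name
    return None

def partition(tweets):
    """Group tweets into one bucket per epoch, discarding out-of-range tweets."""
    buckets = {name: [] for name, _, _ in EPOCHS}
    for t in tweets:
        name = epoch_of(t["created_at"])
        if name:
            buckets[name].append(t)
    return buckets

tweets = [
    {"text": "boarding up windows", "created_at": datetime(2012, 10, 29, 9, 0)},
    {"text": "power is out", "created_at": datetime(2012, 10, 30, 1, 0)},
]
buckets = partition(tweets)
print(len(buckets["pre-landfall"]), len(buckets["landfall"]))  # 1 1
```

With the tweets bucketed this way, per-epoch keyword counts can then be compared to contrast the discussion before, during, and after landfall.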
Pre-Landfall
In the hours leading up to Hurricane Sandy's landfall, we see discussions representing different issues. In Figure 5a, one of the most highly-retweeted tweets mentions the availability of pet shelters in evacuation areas, which indicates concern from pet owners regarding the safety of their pets during the storm. The geotagged tweets produced during this epoch show generic discussion with no clear topical focus. At the beginning of the epoch, we find that 'damage' and 'flood' rank highly in the New York area. However, as the storm neared, we observe in Figure 5c that specific issues such as 'rumors', 'damage', and 'subway' become popular.

Landfall
Hurricane Sandy made landfall on Oct 29, 2012 at 20:00 EST [12]. The first reports of flooding started to arrive around this time. As seen in Figure 6a, these reports contain links to images of flooding. As the storm progressed, we observed several reports of power outages. This makes sense, as Con Edison, New York's power supplier, stated that this was the worst power outage in its history. Figure 6b shows the tweets centered on power outages and flooding from New York City and its surrounding areas. Due to the power outages, at least two hospitals were forced to evacuate their patients. In Figure 6c, we can see two clusters of retweets connected by common retweeters. The two clusters center on @NYScanner and @ABC, whose tweets claimed that Bellevue Hospital and the NYU Langone Medical Center were being evacuated due to power failure.

Recovery
After the storm, people turned their attention towards the estimated $71 billion in damage [14] caused by the hurricane.
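Retweet clusters like the ones around @NYScanner and @ABC can be recovered from raw (author, retweeter) pairs by counting who is retweeted and by whom. A minimal sketch; the data structures and the example pairs are illustrative, not drawn from the actual dataset:

```python
from collections import defaultdict, Counter

def build_retweet_network(retweets):
    """retweets: iterable of (source_author, retweeter) pairs.
    Returns (adjacency, influence): who retweeted each author,
    and how often each author was retweeted."""
    edges = defaultdict(set)
    influence = Counter()
    for author, retweeter in retweets:
        edges[author].add(retweeter)
        influence[author] += 1
    return edges, influence

# Illustrative pairs: 'u2' retweets both sources, linking the two clusters.
retweets = [
    ("NYScanner", "u1"), ("NYScanner", "u2"),
    ("ABC", "u2"), ("ABC", "u3"), ("ABC", "u4"),
]
edges, influence = build_retweet_network(retweets)
print(influence.most_common(1))  # [('ABC', 3)]
```

Common retweeters such as 'u2' here are exactly what visually connects the two clusters in a network view like Figure 6c.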
While analyzing the Twitter activity after the events, we notice the following. First, the most prominent tweets on the day after the hurricane direct people to assistance in repairing the damage done to their homes, as shown in Figure 7. Second, Figure 8 shows that the discussion in New York City focuses on the words 'damage', 'power', and 'flood', indicating that people have turned their attention to post-storm topics, such as power outages and post-storm cleanup.

Fig. 7: Network of retweeters signifying the importance of recovery resources after the hurricane.
Fig. 8: Tag cloud of the most commonly-used words in New York after the hurricane.

Scaling to Big Data
The visual analytics implemented in the systems are designed to help users effectively filter information from a large volume of data. The visualizations initially present an overview of the data and then facilitate drill-down operations to refine the search. For example, a heatmap of the tweet locations helps a user identify hotspots, as in Figure 3. From this general view, a user can filter specific information through the zoom-in operation to focus on specific regions and learn about the topics from the region, as seen in Figures 5b and 5c. In addition to devising ways to efficiently filter data, the implementations in the system are also selected for their ability to scale to big data. For example, the computation of the network layout is an expensive operation, typically O(n³), where n is the number of nodes in the network. We select the implementation in D3, which reduces the computational complexity to O(n² log n) through the Barnes-Hut approximation. Even with this approximation the algorithm can still involve significant computation, so we reduce n by trimming the network to 3 hops away from the original tweet. Additionally, all the collected data is stored in a NoSQL database to allow for flexibility in handling changes to the data structure and to scale to the increasing data volume.
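The 3-hop trimming described above is a breadth-first traversal that discards nodes farther than a fixed number of hops from the root of the retweet cascade. A minimal sketch, assuming the network is held as an adjacency dict:

```python
from collections import deque

def trim_to_hops(edges, root, max_hops=3):
    """Keep only nodes within max_hops of root, reducing n before layout.
    edges: {node: set(neighbors)}. Returns the trimmed adjacency dict."""
    seen = {root: 0}           # node -> hop distance from root
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue           # do not expand beyond the hop limit
        for nbr in edges.get(node, ()):
            if nbr not in seen:
                seen[nbr] = seen[node] + 1
                queue.append(nbr)
    # Rebuild adjacency restricted to the surviving nodes.
    return {n: {m for m in edges.get(n, ()) if m in seen} for n in seen}

# Chain a->b->c->d->e: with max_hops=3, node 'e' (4 hops away) is dropped.
edges = {"a": {"b"}, "b": {"c"}, "c": {"d"}, "d": {"e"}}
trimmed = trim_to_hops(edges, "a", max_hops=3)
print(sorted(trimmed))  # ['a', 'b', 'c', 'd']
```

Shrinking n this way pays off directly because the layout cost grows super-linearly in the number of nodes.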
Indexes are created to allow fast retrieval of network information for these queries.

5 DISCUSSION
In this paper, we illustrated some of the challenges involved in handling enormous and dynamic social media information during crises. As Twitter and other social media platforms grow in popularity, it is essential to successfully incorporate social media information into disaster response workflows. Crisis data also present interesting research issues, such as extracting situational awareness, predicting future disasters for early warning, and identifying trustworthy information. MacEachren et al. [8] introduced a tool focused on geospatial analytics that relies on 'place-time' queries to extract situational awareness. Sakaki et al. [15] showed that tweets can be used to predict the location of an earthquake. Tweets have also been used to propagate rumors and false information [1], and identifying trustworthy tweets is an important task during crises. Mendoza et al. [9] investigated the trustworthiness of tweets generated during crises and established that rumors were questioned more often by other Twitter users. We intend to develop and incorporate methods to address this challenge in our systems in the future. Dealing with noise and identifying relevant information on Twitter is a challenge. Kumar et al. [7] demonstrated that the communication of a smaller set of users can be followed to understand the progress of a crisis. Starbird and Palen [16] proposed a new tweeting model that facilitates faster identification and consumption of relevant information through informational tags describing the type of information contained in the tweets.
STCSN E-Letter Vol. 2, No. 1