Communication Patterns in the Altmetrics 2014 Workshop

Christof Steinkellner
Graz University of Technology
Graz, Austria
Email: csteinkellner@student.tugraz.at


ABSTRACT
Researchers at scientific conferences increasingly use Twitter as a tool to share information, to discuss and to promote their scientific work.  In this paper, we empirically analyze a Twitter dataset that has been acquired at the altmetrics 2014 workshop that was co-located with the ACM Web Science conference. More specifically, we investigate the activity of all users who tweeted about the workshop in three types of communication networks that we extracted from the dataset: the user mentions network, the retweets network and the followers network. In all three networks, we determined the most active users. Our study shows that most users are rather passive while only a small number of influential users are responsible for most of the observed activities.


1 INTRODUCTION
In the past, scholarly communication mostly involved creating and publishing scientific publications and citing publications of other scientists. Social media has brought the long-standing communication patterns between researchers to the next level: many researchers use social media to present intermediate results, promote publications, introduce new ideas and discuss various topics in real time. For example, Twitter has become an important communication platform for research communities and is already used by about one researcher out of 40 [12].

Often a scientist's reputation is largely measured by the citations she or he receives. However, getting citations for a publication may take quite a long time. The novel research field of altmetrics aims to address this limitation as well as to take into account scientific outputs other than papers, which are generated on social media platforms. Altmetrics consider, for example, the number of downloads, views, pages linking to a publication and citations in social networks. Consequently, altmetrics have been proposed as new and complementary metrics to quantify a researcher's scientific impact [1].

In this work, we study the Twitter dataset of the altmetrics 2014 conference workshop. Our goal is to detect communication patterns in this research community and to identify the most influential workshop attendees. Attendees are considered important if they either created a significant part of the Twitter data or if their names or tweets are frequently mentioned. Our analysis is mostly performed by means of network visualization, i.e., we generate network plots of three types of user interaction: mentions, retweets and replies. Additionally, we investigate the friendship and follower relations of the Twitter users in the dataset.


2 RELATED WORK
Twitter datasets are widely used by researchers for various studies, and there are already many studies on the usage of Twitter in scientific communities. In [7], a scientific Twitter dataset was used to detect and visualize trends over time. Eysenbach discovered in [4] that tweets can predict highly cited papers only three days after publication. The importance of Twitter in scientific communities is also shown by its usage for altmetrics [10]. The analysis of Twitter data of conferences in [8] identified important topics and showed that important people at a conference are also represented as important users in the Twitter dataset. The work presented in [11] shows that Twitter enables researchers to reach a wider audience. According to [3], the popularity of Twitter users is not directly linked to the number of retweets and user mentions, because the quality and content of a tweet are regarded as the main factors for these measurements. In [14], it was discovered that there are typically two types of tweets during a scientific conference: citations of external sources and quotations through retweets.


3 EXPERIMENTS AND RESULTS
In this section, the analysis of the Twitter dataset gathered at the "Altmetrics14: expanding impacts and metrics" workshop is presented. This workshop took place at the ACM Web Science Conference 2014. The dataset consists of 1758 tweets that were created by 204 unique Twitter users. The dataset is freely available on Figshare [9].

Preprocessing the data
First of all, the unique users were extracted from the dataset and linked to the tweets they created. Each tweet consists of the tweet text, an author reference, a unique ID and some additional data, e.g., which users were mentioned or which hashtags were used in the tweet. A user, in turn, is described by a name, a unique ID and a unique screen name (the name usually displayed on Twitter). An examination of the dataset showed that some user IDs as well as the follower and friend relationship information were missing. In order to complete the collection of users, the Twitter API [13] was used to acquire the missing data. We also tried to crawl additional data for the tweets using the Twitter API, but this was unsuccessful due to missing tweet IDs in the dataset and the resulting constraints of the Twitter API.

Users mentioned in tweets
In the first experiment, we visualized the users that were mentioned in tweets by others. Note that in the original dataset, there is no distinction between retweets, replies to tweets and user mentions. Consequently, user mentions could also be replies or retweets, which is a limitation of this study.

In the next step, we created a visual matrix of user mentions. The matrix rows show how often a user mentioned another user, while the columns show how often a user was mentioned by another user. Additionally, each cell background was colored with a color gradient ranging from white (zero values) over yellow (values around 20) to red (values above 100). A linear color gradient ranging from the smallest to the largest value would not work as well, because then only the single very high value would be visibly colored. The resulting matrix is of size 205x205 and hard to interpret visually due to its size. Therefore, we reduced the original dataset to a smaller subset that contains only users with more than the average number of user mentions. The resulting matrix visualization is depicted in Figure 1.
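
The matrix construction described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the record fields `author` and `mentions`, the sample users, and the helper names are hypothetical. Two points that the text leaves open are filled in as assumptions: a user's mention total is taken as mentions made plus mentions received, and the color gradient is anchored at exactly 0 (white), 20 (yellow) and 100 (red).

```python
from collections import Counter

# Hypothetical sample records; the real dataset uses its own field names.
tweets = [
    {"author": "alice", "mentions": ["bob", "carol"]},
    {"author": "alice", "mentions": ["bob"]},
    {"author": "bob", "mentions": ["alice"]},
    {"author": "dave", "mentions": []},
]


def mention_counts(tweets):
    """Count how often each author mentioned each other user."""
    counts = Counter()
    for tweet in tweets:
        for mentioned in tweet["mentions"]:
            counts[(tweet["author"], mentioned)] += 1
    return counts


def above_average_users(counts, users):
    """Keep users whose total (mentions made + received) exceeds the mean.

    Assumption: the paper does not say which total is averaged.
    """
    totals = {user: 0 for user in users}
    for (source, target), n in counts.items():
        totals[source] += n
        totals[target] += n
    mean = sum(totals.values()) / len(totals)
    return sorted(user for user, total in totals.items() if total > mean)


def to_matrix(counts, users):
    """Rows are mentioning users, columns are mentioned users."""
    return [[counts[(row, col)] for col in users] for row in users]


def cell_color(value):
    """Piecewise RGB gradient: white at 0, yellow at 20, red at 100 and above."""
    if value <= 20:
        blue = round(255 * (1 - value / 20))
        return (255, 255, blue)
    green = round(255 * (1 - (min(value, 100) - 20) / 80))
    return (255, green, 0)


users = sorted(
    {t["author"] for t in tweets} | {m for t in tweets for m in t["mentions"]}
)
counts = mention_counts(tweets)
selected = above_average_users(counts, users)
matrix = to_matrix(counts, selected)
```

The two-segment gradient keeps small values distinguishable even when a single cell is far larger than all others, which is the reason given above for not using a single linear scale.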

Fig. 1. Matrix of user mentions. The rows indicate the mentions of other users, the columns show how many mentions a user got by whom.

From this matrix visualization, we can derive that the most active user is RouhiRoo. This user mentioned the most other users and is often mentioned by others. The fact that RouhiRoo is a sales manager at Altmetric could explain her high activity at the workshop. The most mentioned user is altmetric, the official account of the company Altmetric.com. Other notable users are habib and iaravps, since these users created the next-highest numbers of mentions. However, these two users are only rarely mentioned themselves. habib is also an employee of Altmetric and iaravps is a PhD student working on altmetrics, which explains their multiple mentions of RouhiRoo and altmetric. Another relevant user is apparently mfenner, a researcher also working on altmetrics. mfenner is mentioned often but rarely mentions others himself. Since the matrix shows only about a seventh of all users, it also indicates that the majority of users are rather passive in terms of mentions. Specifically, most users neither mention many other users nor are mentioned often themselves.

To provide another perspective on the data displayed in the matrix, a network plot of the user mention matrix of all users was created using the Gephi [5] analysis tool. In this network plot, each matrix value above zero is represented by an edge, weighted by this value, from the mentioning user to the mentioned user. The resulting plot can be seen in Figure 2, where the size of a node is determined by its authority. Node authority is calculated using the HITS algorithm [6]. The edge width reflects the edge weight. We performed community detection on the network using the modularity filter that is included in Gephi. This modularity filter uses the Louvain method for community detection [2]. The nodes are colored according to the community they have been assigned to.
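
To illustrate how authority values arise from a mention network, the following is a minimal sketch of the HITS iteration in plain Python. It is not Gephi's implementation: the function name `hits`, the fixed iteration count, the normalization by the sum of scores, and the use of mention counts as edge weights are assumptions made for this example, and the user names are hypothetical.

```python
def hits(edges, iterations=50):
    """Return (hubs, authorities) for weighted directed edges.

    `edges` maps (mentioning_user, mentioned_user) to a mention count.
    Weighting by mention count is an assumption of this sketch.
    """
    nodes = sorted({node for edge in edges for node in edge})
    hubs = {node: 1.0 for node in nodes}
    auths = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority update: sum the hub scores of the users mentioning a node.
        auths = {node: 0.0 for node in nodes}
        for (src, dst), weight in edges.items():
            auths[dst] += weight * hubs[src]
        # Hub update: sum the authority scores of the users a node mentions.
        new_hubs = {node: 0.0 for node in nodes}
        for (src, dst), weight in edges.items():
            new_hubs[src] += weight * auths[dst]
        # Normalize both score vectors so that each sums to one.
        auth_total = sum(auths.values()) or 1.0
        hub_total = sum(new_hubs.values()) or 1.0
        auths = {node: score / auth_total for node, score in auths.items()}
        hubs = {node: score / hub_total for node, score in new_hubs.items()}
    return hubs, auths


# Hypothetical mention edges: alice and bob mention carol, carol mentions dave.
edges = {("alice", "carol"): 2, ("bob", "carol"): 1, ("carol", "dave"): 1}
hubs, auths = hits(edges)
top_authority = max(auths, key=auths.get)
```

In this toy network, carol receives the highest authority because she is mentioned by the users with the highest hub scores, whereas alice, who mentions others but is never mentioned, has an authority of zero.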

Fig. 2. User mentions as network plot with Gephi. Each node represents a user. An edge from user A to user B indicates that user A mentioned user B. Thicker edges indicate multiple mentions.

The plot shows that the observed community has a strongly connected component, which includes the most important users. The users in this tightly connected area of the graph are mentioned a lot or mention many other users. However, the plot also shows that some users, such as the official Twitter accounts of figshare and ImpactStory, are more prominent in the plot than in the matrix. Since node size is based on authority in the network, these nodes are also the most influential in the graph. The network plot therefore enables us to assess influential users more accurately than the matrix-based visualization, where importance is determined by the highest values of user mentions. The plot also shows that, for instance, figshare is an important user but not at the center of communications. The user RouhiRoo stood out in the matrix plot, but in the network plot this user is just one of the medium-sized nodes. The comparison of mere mention counting with the authority of a user shows that authority also takes the importance of the mentioning user into account. Consequently, ranking users by their authority values is more reasonable in our setting, because we want to identify the most important users.

Replies and retweets in the dataset
Next, we investigated the replies in the dataset. Again, we created a plot for this type of communication pattern. However, this plot conveys little information, because only about ten users in this dataset use the reply functionality. The resulting plot can be seen in Figure 3.

Fig. 3. Replies as network plot with Gephi. Each node represents a user. An edge from user A to user B indicates that user A replied to a tweet of user B. Thicker edges indicate more replies.

Then, we investigated the retweet interaction patterns in the dataset. Unfortunately, the retweet information is not included in the dataset and cannot be easily obtained via the Twitter API, because that would require the tweet IDs. Hence, all tweets starting with RT @some_user_name were assumed to be retweets of a tweet by the user some_user_name. We created a network plot using these retweets (see Figure 4). There, each retweet is represented by an edge from the creator of the retweet to the creator of the initial tweet. Edges with a higher edge weight represent multiple retweet relations.
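
The retweet heuristic described above can be sketched with a regular expression. The pattern, the function name `retweet_edges`, and the sample tweets are hypothetical, and matching screen names with `\w+` is an approximation rather than an exact rule for valid Twitter names.

```python
import re
from collections import Counter

# A tweet counts as a retweet only if it starts with "RT @screen_name".
RETWEET_PATTERN = re.compile(r"^RT @(\w+)")


def retweet_edges(tweets):
    """Map (retweeting_user, original_author) to the number of retweets."""
    counts = Counter()
    for author, text in tweets:
        match = RETWEET_PATTERN.match(text)
        if match:
            counts[(author, match.group(1))] += 1
    return counts


# Hypothetical (author, text) pairs.
sample = [
    ("alice", "RT @bob: Great talk on altmetrics"),
    ("alice", "RT @bob: Slides are online"),
    ("carol", "Thanks @bob for the RT @alice"),
    ("dave", "RT @alice: Looking forward to the workshop"),
]
edges = retweet_edges(sample)
```

Here the third tweet is not counted, because "RT @alice" does not appear at the start of the text. The resulting counts correspond to the edge weights of the retweet network.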

Fig. 4. Retweets as network plot with Gephi. Each node represents a user. An edge from user A to user B indicates that user A retweeted a tweet of user B. Thicker edges indicate more retweets.

This plot of retweets shows that only a few users were retweeted more than once and that only a few users created more than one retweet. Again, as expected, the users in the connected component retweet each other a lot. From the plot, we can see that there are two important users who are not so strongly connected to the other important users. Note that the network of retweets is a subset of the mentions network, because each retweet is also counted as a mention. The mention network plot without retweets and replies can be seen in Figure 5.
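
Because every retweet and reply is also counted as a mention, the remaining mentions can be obtained by subtracting the retweet and reply counts from the mention counts per edge. The following sketch assumes that all three networks are stored as edge counters keyed by (source, target); the function name and the sample values are hypothetical.

```python
from collections import Counter


def plain_mentions(mentions, retweets, replies):
    """Remove retweet and reply counts from the mention counts per edge.

    Counter subtraction drops edges whose count falls to zero or below.
    """
    return mentions - retweets - replies


# Hypothetical edge counts keyed by (source_user, target_user).
mentions = Counter({("alice", "bob"): 3, ("carol", "bob"): 1, ("dave", "alice"): 1})
retweets = Counter({("alice", "bob"): 2, ("dave", "alice"): 1})
replies = Counter({("carol", "bob"): 1})

remaining = plain_mentions(mentions, retweets, replies)
```

In this example only one edge survives: alice mentioned bob three times, two of which were retweets.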

Fig. 5. User mentions without retweets as network plot with Gephi. Each node represents a user. An edge from user A to user B indicates that user A mentioned user B. Thicker edges indicate multiple mentions.

This plot clearly shows that many mentions are just retweets. The two biggest nodes, figshare and altmetric, are not in the retweets plot, so they were only mentioned and never retweeted. The importance of these two nodes is not surprising, given that both are companies related to Science 2.0/Open Science and altmetrics, and both are closely linked to the event. In order to find users who were both retweeted and mentioned independently, we investigated the largest nodes (i.e., the nodes with the highest authority values) in the retweet and mention plots in more detail. We found that the Twitter users stefhaustein, RodrigoCostas1 and juancommander were represented as nodes with high authority in both plots. These three users are researchers whose research interests also indicate that they are closely linked to the event, which may explain their higher importance.

Friends and Followers of Users
Friend and follower relationships are another dimension of Twitter data. If a user A follows a user B, then A is a follower of B and B is considered a friend of A. Hence, follower and friend relationships are just two different perspectives on the same data. We examined these relationships to find matches between them and the interaction patterns. We plotted the friend and follower relations using Gephi, as shown in Figures 6 and 7. As before, the coloring reflects communities, while node size and font size reflect authority.
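
The statement that friends and followers are two views of the same data can be made concrete with a small sketch. It assumes that the relationships are stored as a set of (follower, followed) pairs; the helper names and the sample users are hypothetical.

```python
def reverse_edges(edges):
    """Turn the follower view (A follows B) into the friend view (B -> A)."""
    return {(dst, src) for src, dst in edges}


def followers_of(user, follows):
    """Users who follow `user`."""
    return sorted(src for src, dst in follows if dst == user)


def friends_of(user, follows):
    """Users whom `user` follows."""
    return sorted(dst for src, dst in follows if src == user)


# Hypothetical (follower, followed) pairs.
follows = {("alice", "bob"), ("carol", "bob"), ("bob", "alice")}
friend_view = reverse_edges(follows)
```

In this example bob has two followers (alice and carol) but only one friend (alice), so the follower count and the friend count of the same user can differ even though both are derived from one edge set.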

Fig. 6. User follower relationship as Gephi network plot. Each node represents a user. An edge from user A to user B indicates that user A follows user B.

Fig. 7. User friend relationship as Gephi network plot. Each node represents a user. An edge from user A to user B indicates that user B follows user A.

The friend and follower relationship plots reveal that these networks are more densely connected than the others. Moreover, they contain more large nodes, as well as many more notable nodes, than the other plots. The most notable nodes in the mentions and retweet plots are also important nodes in one or both of the friend/follower plots. Apparently, notable users in the mentions plot are more likely to be well connected through friend/follower relationships than notable users in the retweets plot. However, this observation cannot be sufficiently explained without more detailed research on more than one dataset. Furthermore, the differences between these two plots clearly show that having many followers does not necessarily imply having many friends, and vice versa. The observed Twitter users do not split into observable groups based on their follower/friend relationships, and there are only a few users who are not tightly connected to the main network.


4 CONCLUSION
Twitter has become an important platform for communication between researchers. The analysis of communication patterns based on the Twitter data of a scientific event clearly shows that there is a small group of very active users and a large group of less active users. The classification of the Twitter data into user mentions, replies and retweets gives additional information about the type of communication. Although important Twitter users can be detected based on their authority, the reason for their status cannot be determined without further contextual information and knowledge. For example, given only the pure Twitter data, the reason for a large number of user mentions cannot be determined. In the future, we will apply content analysis to better understand why some users are mentioned more often than others. The same holds true for the users who created the majority of the Twitter data, since understanding their roles at the event requires more contextual, external information.

However, the analyzed data shows that there is no easily interpretable correlation between creating many tweets and getting a lot of attention, either through more followers or through many user mentions. Another interesting finding is that there are users in our dataset whose only significance is that they frequently retweet other users. Then again, there are users who are quite inactive but still seem significant because they are mentioned a lot. The analysis of the follower and friend relationships revealed that all notable users from the other sets, i.e., mentions, retweets and replies, are important here, too.


REFERENCES

[1] Barbaro, Annarita, Donatella Gentili, and Chiara Rebuffi. "Altmetrics as new indicators of scientific impact." Journal of the European Association for Health Information and Libraries 10.1 (2014): 4.

[2] Blondel, V. D., Guillaume, J. L., Lambiotte, R., and Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10), P10008.

[3] Cha, Meeyoung, et al. "Measuring User Influence in Twitter: The Million Follower Fallacy." ICWSM 10.10-17 (2010): 30.

[4] Eysenbach, Gunther. "Can tweets predict citations? Metrics of social impact based on Twitter and correlation with traditional metrics of scientific impact." Journal of Medical Internet Research 13.4 (2011).

[5] Gephi.org (2015): Gephi - The Open Graph Viz Platform. http://gephi.github.io Retrieved March 26 (2015)

[6] Kleinberg, Jon M. "Hubs, authorities, and communities." ACM Computing Surveys (CSUR) 31.4es (1999): 5.

[7] Kraker, P., Wagner, C., Jeanquartier, F., and Lindstaedt, S. (2011). On the Way to a Science Intelligence: Visualizing TEL Tweets for Trend Detection. In Proceedings of the 6th European Conference on Technology Enhanced Learning (pp. 220-232). Springer. doi:10.1007/978-3-642-23985-4_18

[8] Letierce, Julie, et al. "Understanding how Twitter is used to spread scientific messages." (2010).

[9] Priego, Ernesto (2014): An #altmetrics14 Twitter Archive. figshare. http://dx.doi.org/10.6084/m9.figshare.1151577 Retrieved March 26 (2015)

[10] Priem, Jason, et al. "Alt-metrics: A manifesto, (v. 1.0)." http://altmetrics.org/manifesto/ Retrieved January 08 (2015)

[11] Priem, Jason, and Costello, Kaitlin. "How and why scholars cite on Twitter." Proceedings of the American Society for Information Science and Technology 47.1 (2010): 1-4.

[12] Priem, Jason; Costello, Kaitlin; Dzuba, Tyler (2012): Prevalence and use of Twitter among scholars. figshare. http://dx.doi.org/10.6084/m9.figshare.104629 Retrieved 12:46, May 27, 2015 (GMT)

[13] Twitter, Inc (2015): REST APIs. https://dev.twitter.com/rest/public Retrieved March 26 (2015)

[14] Weller, Katrin, Evelyn Dröge, and Cornelius Puschmann. "Citation Analysis in Twitter: Approaches for Defining and Measuring Information Flows within Tweets during Scientific Conferences." #MSM. 2011.



Christof Steinkellner
is currently pursuing his Master’s degree in the program of Software Development and Business Management at Graz University of Technology. He also received his Bachelor’s degree in the same program at Graz University of Technology. His research interests include Science 2.0 and the usage of social media platforms for research.