The students in my artificial intelligence course recently participated in a competition in which they formed 10 teams to design, develop and deploy "social robots" (socialbots) on Twitter [the Twitter profile images for the teams' socialbots are shown on the right]. 500 Twitter accounts were semi-randomly selected as a target user population, and the measurable goals were for the socialbots to operate autonomously for 5 days and gain as many Twitter followers and mentions by members of the target population as possible. The broader goals were to create a fun and competitive undergraduate team project with real-world relevance in a task domain related to AI.
I'm pretty excited about the overall experience, but I recognize that others may not share the same level of enthusiasm, so I'll offer a brief roadmap of what follows in the rest of this post. I'll start off highlighting the high level outcomes from the event, provide some background on this and the first socialbots competition, share more detailed statistics about the target population of users and the behavior of the socialbots, briefly summarize the strategies employed by the teams, and end off with a few observations and reflections.
The outcomes, with respect to the specific measurable goals, were:
- 138 Twitter accounts in the target user population (27%) followed at least one socialbot
- The number of targeted Twitter accounts that followed each socialbot ranged from 4 to 98
- 60 Twitter accounts in the target user population (12%) mentioned at least one socialbot
- 108 mentions of one or more socialbots were made by targeted Twitter accounts
- The number of mentions of each socialbot from the target population ranged from 0 to 34
Outcomes regarding the broader goals are more difficult to assess. The students - computer science seniors at the University of Washington, Tacoma's Institute of Technology - seemed to enjoy the competition; one enthusiastically described the experience of observing the autonomous socialbots over the 5 days like "watching a drunken uncle or family member at a party: you never know what’s going to come out of his/her mouth". And they learned a lot about Python, artificial intelligence - or, at least, social intelligence - Twitter, and the ever-evolving Twitter API ... skills that I believe will serve them well, and differentiate these new CS graduates from many of their peers (hopefully, in a positive way).
Background on the Socialbots Competitions
The project was inspired by the Socialbots competition orchestrated by Tim Hwang and his colleagues at the Web Ecology Project earlier this year. I read an article about Socialbots in New Scientist shortly after the quarter began, and was intrigued by the way the competition involved elements of artificial intelligence, social networks, network security as well as other aspects of technology, psychology, sociology, ethics and politics, all in the context of Twitter. Several articles about the initial Socialbots competition focused on the darker side of social robots, no doubt related to the revelation of the U.S. military's interest in online persona management services for influencing [foreign?] online political discussions around the time the competition ended. However, Tim has consistently championed the potential positive impacts of socialbots designed to build bridges between people and groups - via chains of mutual following and mentions - that can promote greater awareness, understanding and perhaps even cooperation on shared goals.
The initial Socialbots competition lasted 4 weeks; ours lasted 2 weeks ... and in the compressed context of our Socialbots competition, we didn't have time to explore the grander goals articulated by Tim. In fact, given that most of the students had never programmed in Python or used a web services API, and several had never used Twitter, there was a lot to learn just to enable the construction of autonomous software that could use the Twitter API to read and write status updates (including retweets), follow other Twitter users and/or "favorite" other Twitter users' status updates.
We began the course by covering some of the basic concepts in AI (the first several chapters of Artificial Intelligence: A Modern Approach, by Stuart Russell & Peter Norvig, which has an associated set of Python scripts for AI algorithms) and an introduction to Python (using Python 2.6.6, which was the latest version still compatible with the AIMA Python scripts). Once we turned our attention to socialbots, we engaged in a very brief whirlwind tour of the Natural Language Toolkit (also based on Python 2.6), and had a crash course on the Twitter API, making extensive use of Mike Verdone's Python Twitter Tools (probably the simplest Twitter API wrapper for Python) and Vivek Haldar's Shrinkbot (a simple Python bot based on the classic Eliza program modeling a Rogerian psychotherapist). The students also had access to the series of Web Ecology Project blog posts on the initial Socialbots competition, as well as the additional insights and experience shared by DuBose Cole, one of the participants in that competition, on What Losing a Socialbots Competition Taught Me About Humans and Social Networking.
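For readers curious about the plumbing, here is a minimal sketch of the kind of bot skeleton the teams started from, using Mike Verdone's Python Twitter Tools; the time-of-day buckets, canned tweets and credential parameters are placeholders of my own invention, not any team's actual code:

```python
import random

# Pre-defined status updates, bucketed by rough time of day -- the buckets
# and texts here are invented for illustration, not any team's actual data.
SCHEDULED_TWEETS = {
    "morning": ["Good morning, Twitter!", "Coffee first, tweets second."],
    "evening": ["Winding down for the night.", "Goodnight, Washington!"],
}

def pick_status(hour, tweets=SCHEDULED_TWEETS):
    """Choose a canned tweet appropriate to the hour of day (0-23)."""
    bucket = "morning" if hour < 12 else "evening"
    return random.choice(tweets[bucket])

def run_bot(consumer_key, consumer_secret, token, token_secret, hour):
    """Post one scheduled status update (requires real OAuth credentials)."""
    from twitter import Twitter, OAuth  # Python Twitter Tools
    t = Twitter(auth=OAuth(token, token_secret, consumer_key, consumer_secret))
    t.statuses.update(status=pick_status(hour))
```

From a skeleton like this, the teams layered on following, retweeting and @reply behavior via the same wrapper.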
We adopted the same basic rules as the initial competition:
- no human intervention or interaction with the target user population
- no revealing the game
- no reporting other socialbots as spam
We included the additional provisions that the bots must avoid the use of inappropriate language and may not issue any offers of or solicitations for money, sex or jobs. I don't know if these issues arose in the initial competition, but I wanted to be explicit about them in our competition.
The initial competition included 2 weeks of development, and 2 weeks of deployment, in the middle of which there was a 24-hour period during which all socialbots had to cease operation, software updates could be made, and the possibly updated socialbots were relaunched. The teams were informed of the identity of the other socialbots during that first week, and so could either take countermeasures against their competitors or adopt / adapt strategies they observed in other socialbots. In our competition, there was a little over a week of development, and only 5 days of deployment, and the students were offered the opportunity to make software updates at the 24-hour mark - not enough time to make significant strategy changes, but enough to correct some problems involving timing, sequencing and/or filtering. The identities of other bots were not officially revealed until the end of the competition, although several teams had pretty good hunches about who some of the other socialbots were (especially those who immediately followed all the target users).
We provisionally adopted the same scoring mechanism as the initial competition (this was a topic of much discussion during one class):
- +1 for each mutual connection (each target user who follows a socialbot)
- +3 for each mention (@reply, retweet or other reference to a socialbot)
- -15 if the socialbot account is deactivated by Twitter (as a result of being reported)
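In code, the provisional rubric amounts to a one-liner:

```python
def score(mutual_follows, mentions, deactivated=False):
    """Provisional score under the initial competition's rubric:
    +1 per target user following the bot, +3 per mention by a
    target user, -15 if Twitter deactivates the bot's account."""
    return mutual_follows + 3 * mentions - (15 if deactivated else 0)
```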
The Target User Population
As with the initial competition, the 500 target users were based on a single "seed" Twitter account, which was then grown out a few layers based on mutual following links. More specifically (in our competition): 100 mutual friends/followers of the seed user were randomly selected - and filtered by criteria below - and then 4 of each of those users' mutual friends/followers were randomly selected and filtered.
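That two-layer selection can be sketched as follows, where `mutual_friends` is a hypothetical function returning the accounts a user both follows and is followed by, and `passes_filters` stands in for the filtering criteria:

```python
import random

def snowball_sample(seed, mutual_friends, passes_filters,
                    n_first=100, n_second=4):
    """Grow a target population out from a seed account: randomly select
    n_first of the seed's (filtered) mutual friends/followers, then
    n_second of each of those users' (filtered) mutual friends/followers."""
    first = random.sample(
        [u for u in mutual_friends(seed) if passes_filters(u)], n_first)
    targets = list(first)
    for user in first:
        candidates = [v for v in mutual_friends(user)
                      if passes_filters(v) and v not in targets]
        targets.extend(random.sample(candidates,
                                     min(n_second, len(candidates))))
    return targets
```

With n_first=100 and n_second=4, this yields the 100 + 100 × 4 = 500 target accounts.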
All target user accounts were filtered to meet a number of criteria. Many of them were adopted and/or adapted from Tim Hwang's criteria. I'll include a brief description and rationale (in italics) for each:
- Followers: between 100 and 10,000
[Twitter users with fewer than 100 followers might too carefully scrutinize and/or highly value new followers; those with more than 10,000 followers might be less likely to pay any attention to new followers]
- Frequency of status updates: a tweeting frequency of at least 1 per day, based on the 20 most recent tweets
[Twitter users who don't already engage regularly with other users would be less likely to engage with socialbots]
- Recency of status updates: at least one status update in the preceding 72 hours
[Twitter users who were not currently or recently engaging with other Twitter users would be less likely to engage with socialbots over the course of the ensuing 5 days; I used 72 hours because I started filtering on a Monday, and didn't want to exclude anyone who had taken the weekend "off".]
- Experience: the account must have been created at least 5 months ago
[Twitter users who had not been using the service for long might be significantly more likely to interact with socialbots than those who had more experience; it's hard to imagine that anyone who has been tweeting regularly for 5 months has not encountered other bots before. I'd initially intended to specify a cutoff of 6 months, but it was easier just to check that the year of account creation was 2010 or earlier.]
- Individual human: the account appeared to belong to a human individual who uses Twitter at least partially for personal interests
[Twitter accounts owned or operated by businesses exclusively for business purposes might be more likely to automatically "follow back" to acquire more prospective customers. Many, if not most, candidate Twitter accounts appeared to be used for both business and pleasure (or, at least, non-business interests), and these were not excluded.]
- Adults only: there is no way to definitively ascertain age on Twitter, but any account whose profile bio contained references to parents, Facebook or other signals suggesting use by a minor was excluded
- Language: restricted to English language users, and those who do not use inappropriate language in the profile bio or 20 most recent tweets
[To facilitate the use of NLTK and/or other language processing tools, it was helpful to restrict the set of users to those who use English in their bios and tweets ... and do not use the seven words that you cannot say on television.]
- Automatic reciprocal followers: Twitter accounts with references to "follow" in the profile bio were excluded
[Any account with a bio suggesting that the user will automatically follow any Twitter account that follows them would artificially inflate scores.]
I could write an entire blog post just elaborating on the data (and judgments) that I encountered during the filtering of over 6700 accounts that I manually examined - using some supporting Python scripts I created and iteratively refined to support many of the filtering criteria - in order to arrive at the final list of 500. My perspective on human nature, the things we choose to communicate and the ways we choose to communicate about them will never be the same. For now, I'll just offer a few statistics about the 500 Twitter accounts selected as the target user population (including both the mean and the median, given the power law distributions prevalent on Twitter and other social networking platforms):
Since I calculated the "Days on Twitter" for each user, I thought it would be interesting to look at some statistics regarding the frequencies of posting status updates, adding friends, attracting followers and being listed:
Scores and Other Socialbot Statistics
I'll include a few observations and summary statistics below, and provide a brief overview of the strategies that each team employed. In order to protect the identities of all the users involved - the target user population and the socialbots (whose accounts were deactivated at the end of the competition) - I will redact certain elements from the data reported below, and use pseudonyms for the socialbot usernames. The italicized numbers in the Followers, Mentions and Score columns below reflect the official scoring criteria from the initial competition (i.e., restricted to the target users); the numbers in normal fonts in those columns include users that were not part of the target population.
I suspect that if we had the time to more closely follow the schedule of the initial socialbots competition - two full weeks of development, two full weeks of deployment, a day in the middle for updates and full revelation of the identities of other socialbots (allowing more time to consider and possibly copy strategies being used by other teams) - the scores for the socialbots would have been closer ... and higher. As it is, I was very impressed with how much the students accomplished in such a short stretch.
The following graphs depict the growth in statuses, friends, followers and mentions over time for the 10 socialbots; the horizontal axis represents hours (5 days = 120 hours). Due to an error in the socialbot behavior tracking software I wrote, the Followers graph is not restricted to target users (i.e., it includes all followers, whether they are target users or not).
Generally speaking, socialbots that tended to be more aggressive - with respect to numbers of tweets (especially @replies and mentions) and following behavior - were more likely to attract followers and/or mentions than those that were more passive. They were also more likely to get blocked and/or reported for spam (the socialbots who show slightly less than 500 friends were likely blocked by some of those they followed), although as with the initial socialbots competition, none of our socialbot accounts was deactivated by Twitter. Follow Friday (#FF) mentions were very effective, and I suspect that #woofwednesday would have also offered a significant boost to some scores if the competition had extended across a Wednesday, given the canine orientation of some of our socialbots (and target users). The one socialbot that used the Twitter feature for marking tweets as favorites also showed a good return on investment.
The socialbots - especially those who posted lots of status updates - attracted the attention of several other bots. Nearly all of these bots were unsophisticated spambots, typically using a profile photo of an attractive woman, an odd username (often including a few digits) and posting an easily identified pattern of updates consisting only of an @reply and a shortened URL (e.g., "@gumption https://l.pr/a4tzuv/"). One particularly interesting Twitter account appeared to be a "hybrid" - part human and part bot - interweaving what appeared to be rather nuanced human-like posts with what appeared to be automatic responses to any Twitter user who tweets "good morning", "good night" or other phatic references to times of day ... which is probably a pretty effective way to attract new followers.
Socialbots, Teams & Strategies
The following is a brief synopsis of each of the 10 socialbots deployed in the competition, and the strategies employed by the teams that designed and developed them:
Sam was one of the more passive socialbots, performing one action - tweeting, retweeting or following a user - every 30 minutes, with different collections of pre-defined tweets scheduled for different times of the day. Although the lowest-scoring socialbot, Sam was the one who captured the attention of the aforementioned "hybrid" Twitter account - via a "Goodnight Washington!" tweet - and also managed to capture the attention of - and get mentioned by - an account associated with a local news station (the latter was not a part of the target user population).
Laura was our least communicative socialbot - only 33 status updates over 5 days - and also took a rather gradual approach to adding friends. The team's focus was on carefully crafting a believable persona, posting relatively infrequent status updates (1-3 per hour) that reflected Laura's hypothetical life, following 7 new users after each post, but never mentioning any other users in her status updates. The team's hypothesis was that other Twitter users would be more interested in a persona who seemed to be living an interesting life than in a persona who is tweeting about external topics. I suspect that most Twitter users are more interested in - or responsive to - seeing others' interest in their own tweets (via retweeting and/or favoriting).
Tiger was also relatively quiet, but a very aggressive follower of other users. Tiger's team decided to adopt the persona (canina?) of a dog, randomly posting pre-defined messages that a dog might tweet. Tiger also incorporated a version of the aforementioned "Eliza" psychotherapist to facilitate engagement with other users ... but this capability was not engaged. I was surprised at the number of Twitter accounts that appeared to be associated with dogs - some with thousands of followers - as well as dog therapists and even dog coaches that I encountered during the filtering of target user candidates, so this was not a bad strategy.
Oren adopted a rather intellectual human persona, alternately tweeting links to randomly selected Google News stories (preceded by a randomly selected adjective such as "impressive" or "amazing") and randomly selected pre-defined quotes. Oren took a gradual but comprehensive approach to adding followers - achieving the highest number of friends at the end of the competition (apparently, no target user blocked Oren) - and would also occasionally retweet status updates posted by randomly selected target users (whether or not they were friends yet). Like Tiger, Oren also incorporated an Eliza capability ... but it was not used.
Zorro reflected what may have been the most intricately scheduled behavior of any socialbot, with variable weights that affected the interdependent probabilities of one of three actions, each of which might occur in a variable window of opportunity: posting a randomly selected status update from a predefined list, retweeting a status update from one of the target users who were being followed already, following new users. One of the strategies used by Zorro was to include questions among the predefined status updates, though these questions were not directed (via @replies) to any specific users.
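Zorro's weighted action selection can be sketched roughly like this; the action names and weights are invented for illustration, and the team's actual interdependent probabilities and timing windows were considerably more intricate:

```python
import random

# Hypothetical actions and relative weights, for illustration only.
ACTIONS = [("post_predefined", 0.5),
           ("retweet_target", 0.3),
           ("follow_new_users", 0.2)]

def choose_action(actions=ACTIONS, rng=random):
    """Pick one action with probability proportional to its weight
    (simple roulette-wheel selection)."""
    r = rng.random() * sum(w for _, w in actions)
    for name, weight in actions:
        r -= weight
        if r <= 0:
            return name
    return actions[-1][0]  # guard against floating-point leftovers
```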
Katy was the only socialbot to utilize the Natural Language Toolkit (NLTK), using some of the NLTK tools to monitor status updates posted by the target users for the use of common keywords that were then used to find related stories on Reddit (via the Reddit API), including questions posted on AskReddit (the questions appeared to generate the highest number of responses from target users). Some additional processing was done to filter out inappropriate language and the use of personal pronouns (the latter of which might appear odd in tweets by Katy). The resulting status updates were then posted as @replies to targeted users; no retweets or any other kind of status updates were posted by Katy. The rather aggressive strategy of sending 258 @replies to target users may have resulted in Katy being the most blocked socialbot (with 477 friends among the 500 target users, as many as 23 target users may have blocked her).
JackW - one of two Jacks in the competition - also made use of Reddit, looking for intersections between recent tweets by target users and a custom-built dictionary of keywords and stories in selecting stories to tweet about. Unlike Katy, JackW did not initially check for personal pronouns, and may have appeared to suffer from multiple personality disorder during the first 24 hours, before the code update was made. JackW was also less aggressive than Katy, in that the Reddit stories that he tweeted were not explicitly targeted to any users via @replies. JackW was also the only socialbot to take advantage of Follow Friday (#FF), and of the 31 target users mentioned by JackW in a #FF tweet, 11 followed JackW and 7 mentioned JackW in some form of "thanks for the #FF" tweet. JackW attracted the third highest number of target user followers (80) among the socialbots.
Natalia used a combination of pre-defined tweets and dynamic tweet patterns in selecting or composing her status updates, which included undirected tweets, @replies and retweets. Natalia was one of two socialbots who followed all the target users as early as possible (the Twitter API limits calls to 350 per hour, so following 500 users had to stretch into a second hour of operation). She was prolific in issuing greetings, was not shy about asking questions, and was the only socialbot to explicitly ask target users to follow her back. 20% (8 / 39) of target users asked to follow back did follow her, and while it's not clear how many of them were explicitly responding to the request vs. reciprocally returning her initial follow, or responding to other mentions, this was slightly higher than her overall reciprocal following rate of 17.5%. Natalia attracted the second highest number of target user followers (88), and the third highest number of target user mentions (18) among the socialbots.
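Following all 500 target users under the 350-calls-per-hour limit amounts to simple budgeting; here is a sketch, with `follow_fn` standing in for one API call (e.g., a wrapper around the friendships/create endpoint):

```python
import time

def follow_all(follow_fn, user_ids, calls_per_hour=350, sleep=time.sleep):
    """Follow every user in user_ids, pausing for an hour whenever the
    hourly rate-limit budget is spent."""
    used = 0
    for uid in user_ids:
        if used >= calls_per_hour:
            sleep(3600)  # wait out the rest of the rate-limit window
            used = 0
        follow_fn(uid)
        used += 1
```

With 500 users and a 350-call budget, the loop sleeps exactly once, which is why following everyone "had to stretch into a second hour of operation".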
JackD was our most prolific tweeter, posting more than 100 status updates per day. He attracted the largest number of followers - though not among the target user population - and the largest number of mentions - though, again, not among the target population. A few of the mentions included indications that the target user suspected JackD of being a bot; one did acknowledge that JackD was a "clever bot", but concluded "no Turing Test success for you!" JackD employed an elaborate strategy of finding tweeted links popular among the target users, favoriting those tweets, retweeting them and then using the Google Search API to find and tweet a link to a similar page in an @reply. Another strategy was to find popular or trending web pages via Google Search and post status updates linking to them. JackD also included an Eliza component ... which, among other things, gave rise to an exchange in which one user appeared to employ Eliza-like reverse Rogerian psychotherapy on JackD, asking "How do you feel when you retweet that?"
Jennifer attracted the highest number of target user followers (98) and the highest number of target user mentions (34) of all the socialbots. Her profile was the only one to include a URL (a link to a blogspot site created shortly before the competition). Her behavior was carefully crafted from a database of 261 pre-defined status updates containing a mix of interesting facts and questions designed to start conversations. The scheduling of her actions was also carefully choreographed. Jennifer had an Eliza-like capability to respond to @replies using one of the largest collections of prepared responses among the socialbots. One of the most interesting exchanges of the entire competition occurred shortly after the initial deployment when a target user who seemed well versed in the ways of socialbotics called her out, after explaining to her what a bot was (I'll use @[Jennifer] and @[TargetUser] to refer to the conversants):
@[TargetUser]: @[Jennifer] Bot?
@[Jennifer]: @[TargetUser] Can you elaborate on that?
@[TargetUser]: @[Jennifer] "Can you elaborate..." is a typical auto response. As is "Why did you say..." You're sounding bot-ish.
@[Jennifer]: @[TargetUser] Why do you ask that?
@[TargetUser]: @[Jennifer] BOT ALERT!
@[Jennifer]: @[TargetUser] Please tell me more.
The target user blocked Jennifer after that last status update, and during the software update at the 24-hour mark, Jennifer's designers added a probabilistic weighting function to reduce the likelihood (1/3) that she would respond to future @replies.
There were several other interesting exchanges - and several that weren't terribly interesting - but I've probably already written more about this competition than most would care to read. If any of the student teams make their code available, or wants to make more details about their strategies available, I'll update this post with the additional information.
Reflections and Projections
Reflecting on the experience, I think it was a worthwhile experiment. Although a few Twitter users may have experienced a few additional instances of fleeting irritation, I don't believe any of the socialbots inflicted any significant harm. After having sifted through thousands of other Twitter profiles and tens of thousands of status updates during the filtering process, it appears that bot-like behavior - by humans or [other] computational systems - is not uncommon. I certainly found substantial corroboration of my earlier observations regarding the commoditization of Twitter followers.
Attention - fairly recently via followers or mentions on Twitter, but more traditionally via other indications of interest - is a fundamental human need. As a species in which the young are dependent on the care of adults for many years after birth, we have evolved elaborate and exquisite capabilities for attracting the attention of others. Discriminating between appropriate and inappropriate attention-seeking behavior is one of the most significant challenges of the maturation process (I know I haven't mastered it yet). However [much] we may seek attention, receiving attention from others often feels good, and based on the exchanges I was monitoring between the socialbots and the target users, I believe there were more examples of positive reactions than negative reactions to the attention bestowed by the Twitter bots.
Sherry Turkle, author of Alone Together, has argued that non-human attention from robots is dehumanizing, and that humans who share their stories with non-humans who can never understand them are ultimately being disserved. In my own experience, I increasingly recognize that anything I say or write is something I need to hear or read, and every opportunity I have to share any aspect of my story - regardless of whether or how it is perceived or who or what is receiving it - is an opportunity to reflect on and refine the story I make up about myself.
While I felt initial misgivings about the potential risks involved in instigating a socialbots competition, I am glad we participated in this experiment. Although the students suffered some opportunity cost from not learning more about some of the theoretical concepts of AI, they gained valuable first-hand experience in the nitty-gritty practical work that typically makes up the bulk of any applied AI project: dealing with "messy" real-world data, trying to figure out how to fit the right algorithms to the right data, and determining the appropriate balance of human and non-human intelligence to apply to different aspects of a problem.
If I were to organize another competition, I'd make a few changes:
- Penalize blocking. Add a negative scoring factor of -5 to penalize a bot for every user who blocks it, to disincentivize the rather aggressive behaviors of some of the higher scoring bots - especially those that made extensive use of @replies. The only way I can think of to determine whether one Twitter account (A) has been blocked by another Twitter account (B) is to see whether the user_id of A does not appear in the follower_ids list of B, which only works if A had been a follower of B at some point.
- Better monitoring. Refine the monitoring code to track behavior more effectively and frequently. In addition to the change suggested above, more regular snapshots of a broader set of parameters would be very helpful ... probably requiring a small bot army of observers, in view of the Twitter API rate limits.
- Better scaffolding. Provide more scaffolding for the participants to enable them to start with a more fully functional bot skeleton, and/or an additional API wrapper layered on top of a Twitter API wrapper to enable some basic operations such as monitoring for mentions and/or blocking.
- More inspiring goal(s). Perhaps most importantly, I think that participants would be more motivated with bigger, hairier, more audacious goals, above and beyond "get users to follow and/or mention you" (although attracting followers and/or mentions does seem to be a significant motivation for many human users of Twitter). Designing and deploying bots to help promote greater awareness, understanding and/or cooperation - the larger goals Tim Hwang has been championing - would help set the stage for a far more worthwhile experiment.
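The block-detection heuristic mentioned in the first item above could be sketched as follows, with hypothetical names, and `follower_ids_of` standing in for a call to the Twitter followers/ids endpoint:

```python
def likely_blockers(bot_id, followed_ids, follower_ids_of):
    """Target users who have probably blocked the bot: the bot followed
    them earlier (so its user_id once appeared among their followers),
    but it no longer shows up in their follower_ids list."""
    return [uid for uid in followed_ids
            if bot_id not in follower_ids_of(uid)]
```

As noted, this only works for users the bot had actually followed, so it would undercount blocks by users the bot merely @replied.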
Now that the quarter has ended, I'm planning to channel some of my excitement about socialbots - especially the grander goals that we weren't able to effectively address in the AI class - by conspiring with Tim Hwang [and hopefully others] to propose a CSCW 2012 Workshop to host a Socialbots competition at the conference. I think that a hands-on competition like this would help promote the evolution of the conference to more broadly encompass Computer-Supported Cooperative Whatever ... and offer an interesting opportunity for researchers and practitioners to design, deploy, and perhaps debate a relatively new breed of cultural probes.