The US Federal election last week didn’t go as a lot of people, or a lot of pollsters, had expected. Living abroad, it felt as though a Hillary Clinton victory was assured. There was always a bit of uncertainty in the data, which FiveThirtyEight do a great job of discussing. I became a little less sure of an easy Clinton victory when I saw the early voting data published by Election Betting Odds, which showed reduced turnout in North Carolina and Florida amongst registered Democrats. If that trend continued nationally (as it may well have, given that turnout is likely correlated across states), it could take away any chance of an easy victory.
A large part of the uncertainty in polls, one that doesn’t go away as more polls are taken, is the systematic uncertainty that arises from potentially always polling the same group of people. This artificial echo chamber limits the accuracy of your information. One way to escape these echo chambers, caused by education, social milieu or even just situation, is to look for big broad slabs of data. And this led me to Twitter.
Looking (albeit after the fact) at the Twitter timeline, I wanted to see if across election day and the day before it, any hints were present about who would win the election. Do the election results correlate with who is being tweeted about more? And does the sentiment of tweets let us make any conclusions about the election? Finally, it’s also interesting to look at how changes along the Twitter timeline correspond to real events.
For people who like to know their data, I consider ‘Tweet Rate’ here, rather than the absolute number of tweets related to a given candidate. The tweet rate was calculated at half-hour intervals by sampling the Twitter stream for the queries “Trump” and “Hillary OR Clinton” 20 times within each half-hour interval (repeated sampling also gave an estimate of the uncertainty in the tweet rate). Measures were taken to avoid saturating the search in a given interval. I have normalised the tweet rate for each candidate to its rate at 4AM on Monday November 7, when the “Trump” query was recording a rate of 11.3 tweets per second, and the “Hillary OR Clinton” query was recording 9.54 tweets per second.
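To make the tweet rate calculation a little more concrete, here is a minimal Python sketch of one way it could be done. It is not the code behind the graphs in this post. It assumes that each sample of the stream returns a batch of tweet timestamps (in seconds), that the rate for one sample is the number of gaps between tweets divided by the time they span, and that the uncertainty is the standard error over the samples in an interval. The function names and the sample values are made up for illustration; only the reference rate of 11.3 tweets per second comes from the text above.

```python
from math import sqrt
from statistics import mean, stdev


def rate_from_timestamps(timestamps):
    """Tweets per second for one sample, from tweet timestamps in seconds."""
    span = max(timestamps) - min(timestamps)
    if span <= 0:
        raise ValueError("need at least two distinct timestamps")
    # n tweets mark out n - 1 gaps over the span they cover
    return (len(timestamps) - 1) / span


def interval_rate(sample_rates):
    """Mean rate and its standard error over the samples in one interval."""
    n = len(sample_rates)
    centre = mean(sample_rates)
    # The standard error needs at least two samples to be defined
    error = stdev(sample_rates) / sqrt(n) if n > 1 else float("nan")
    return centre, error


def normalise(rate, error, reference_rate):
    """Express a rate and its error relative to a reference rate.

    Assumption: the uncertainty on the reference rate itself is ignored.
    """
    return rate / reference_rate, error / reference_rate


# Hypothetical sample rates (tweets per second) for one half-hour interval
samples = [13.1, 12.4, 14.0, 12.9, 13.6]
rate, err = interval_rate(samples)

# 11.3 tweets per second is the "Trump" reference rate quoted above
relative_rate, relative_err = normalise(rate, err, 11.3)
print(f"relative tweet rate: {relative_rate:.2f} +/- {relative_err:.2f}")
```

A relative rate above 1 then means the query is busier than it was at the 4AM reference point, which is what allows the two candidates’ curves to share one axis.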
First off, let’s consider the tweet rate on Monday, before the election. I think it’s phenomenal that the relative tweet rates for Hillary Clinton and Donald Trump follow each other. A separation on this graph between the curves would indicate that at least on Twitter, one candidate has more momentum for posts regarding them than the other (whether these tweets are positive or negative is not considered just yet).
It isn’t until election day (Tuesday) that Trump-related tweets gain more momentum than Hillary-related tweets. Honestly, I wouldn’t have expected this; rather, I’d expect younger, more Democrat-leaning voters to be more active on social media, and thus tweeting more about Hillary Clinton. Later in this article I’ve performed a very crude sentiment analysis to see if a large proportion of tweets are at least negative towards Donald Trump, which again could be expected if younger, tweetier voters are criticising him, but that seems not to be the case.
When one considers the election itself, the behaviour of the relative tweet rates becomes a lot more interesting. It’s particularly interesting that the tweet rate for Donald Trump is almost always higher than that for Hillary Clinton. This is probably due to the reaction to polling results coming in, and the fact that they lean significantly more towards Trump than people expected. Here, tweeting about Trump is essentially tweeting the news.
The spikes that occur, related to the imminent announcement of the election for Trump, and later, Hillary’s concession speech, coincide quite well with the real events.
Tweet rate is a little superficial if you are looking to draw tight conclusions about the election. It’s better to try and dive into the tweets to see if conclusions can be drawn about a candidate, or policies. Can the Twitter timeline help to indicate why people voted the way they did?
Twitter sentiment is a little hard to gauge, and a good measurement really requires a classifier adapted to the problem domain being studied. Using the sentiment analyser from Sentiment140 though, both candidates have roughly the same sentiment measure before the election, so you could probably conclude that all those Trump tweets aren’t angry millennials tweeting their frustrations.
Just to indicate how noisy the sentiment data is though, here is the relative fraction of classified positive, negative and neutral tweets for each candidate. Neutral tweets dominate here, likely because tweets related to politics are difficult to classify (e.g. the tweet “Trump will tear up every climate change agreement” could be positive or negative, depending on the author or reader). Based on this, I’d take the sentiment analysis with a grain of salt, pointing out that its best use is to indicate that there is at least no clear bias against either candidate in the Twitter pool.
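For the curious, here is a minimal Python sketch of how such fractions could be computed from a list of classified tweets. It is an illustration under stated assumptions, not the analysis code for this post: it assumes each tweet has already been given a polarity code of 0 (negative), 2 (neutral) or 4 (positive), which is my understanding of the Sentiment140 convention, and it attaches a simple binomial error to each fraction. The function name and the example codes are made up.

```python
from collections import Counter
from math import sqrt

# Assumed polarity codes: 0 = negative, 2 = neutral, 4 = positive
LABELS = {0: "negative", 2: "neutral", 4: "positive"}


def sentiment_fractions(polarities):
    """Return {label: (fraction, binomial_error)} for a list of polarity codes."""
    total = len(polarities)
    if total == 0:
        raise ValueError("no classified tweets")
    counts = Counter(polarities)
    result = {}
    for code, name in LABELS.items():
        fraction = counts.get(code, 0) / total
        # Binomial standard error on a proportion measured from `total` tweets
        error = sqrt(fraction * (1 - fraction) / total)
        result[name] = (fraction, error)
    return result


# Hypothetical classifier output for a handful of tweets about one candidate
example_codes = [2, 2, 0, 4, 2, 2, 0, 2, 4, 2]
for name, (fraction, error) in sentiment_fractions(example_codes).items():
    print(f"{name}: {fraction:.2f} +/- {error:.2f}")
```

With a pool dominated by neutral tweets, the positive and negative fractions rest on few tweets each, so their errors are large relative to the fractions themselves, which is one reason to treat any difference between the candidates with caution.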
So that’s what an election looks like on Twitter. What’s cool is that it seems that maybe a far greater and more balanced audience can be reached through Twitter rather than polls. If you look at the Twitter timeline for Monday, Trump and Clinton follow almost identical behaviour. So maybe it’s not surprising that near identical numbers of voters voted for each one. Sentiment didn’t really help us to work out whether one candidate was more or less likely to win, but it did indicate, albeit vaguely, that the Twitter user base isn’t incredibly biased (the error bars on that are pretty large though, so maybe it’s a bit biased).
Hopefully in the coming weeks I’ll find some time to plunge into the tweet data, to try and see if it can indicate why people chose to vote the way they did. And I’ll post some source code soon too, alongside an explanation of how I automatically calculate the tweet rate (which I’d argue is one of the more robust ways to measure Twitter timelines, especially if you don’t want to be rate limited by Twitter for making too many requests).
One Last Graph…
There is one last graph which I find very interesting in US elections. And that’s the graph of voter participation.
It astounds me that more people chose not to vote in the election than voted for either candidate individually. High voter attendance, and especially compulsory voting, have a tendency to reduce corruption, improve voter satisfaction, and reduce wealth inequality. This last point most of all worries me, as wealth inequality is suggested to have played a large part in Trump’s election. The Gini index (a measure of income inequality) of the US places the nation amongst the likes of Qatar and Russia in terms of income inequality, rather than amongst more natural neighbours such as the UK and Canada.
I would like to see both better income equality and better voter engagement in the US, and I think a large part of this is about making sure that all parts of society have something to vote for. Progress towards income equality, towards an increased sense of enfranchisement, and towards increased social equality all naturally reinforce each other and should happen together. For this reason, I think the intense scepticism of Trump’s appeal to working-class, predominantly white, people is warranted. Improving income equality, and the lives of traditionally working-class families, is noble, and underlies a lot of traditional conservative doctrine. But achieving that while making incendiary remarks towards minorities is a nigh-impossible goal (if one can call it that). And the thought that social progress in one domain ought to require sacrifice in another is both myopic and misguided.
Returning to our graph of voter participation, 33% or so of the US population cannot vote. This percentage is about normal for a Western country, but it is worth remembering that it does exist whenever environmental policy is being considered. Don’t forget that when staring down the barrel of an enormous shift in the earth’s climate, the 30% of pretty much every country’s population that will be most affected by it doesn’t have the right to vote on it.
- Sentiment140 for the tweet sentiment analysis
- The TwitterSearch Python package
- Jupyter/Python/Numpy/Scipy (as always)
- 300,000 random strangers who decided to write 140 characters and then click a button