Authors: Daniel Friar, Rafal Kwasny, Mike Swarbick-Jones and Martin Goodson
Read the Business Insider article.
We conducted an analysis to determine if ethnic minority MPs receive more abuse on Twitter than their white counterparts. By using Evolution AI's natural language processing (NLP) platform to analyse Twitter data, we found statistically significant evidence that ethnic minority MPs receive more toxic tweets than white MPs.
We took 3 million tweets mentioning MPs from the past year and used Evolution AI's natural language processing platform to identify toxic tweets and find the proportion of toxic Twitter mentions for each MP. We analysed these results and found statistically significant evidence that ethnic minority MPs receive more toxic Twitter mentions than their white counterparts, on average receiving 15% more toxic tweets.
Evolution AI is a London-based startup that specialises in natural language processing, the computational understanding of human language. We build enterprise-grade AI solutions that can learn to read and understand millions of text documents at a time, without explicit instructions.
Due to the lack of available labelled Twitter data, we trained a toxic/non-toxic classifier on an open-source dataset of Wikipedia comments. The labels were reduced to a binary toxic/non-toxic label, and the data was balanced to a 50/50 split across the two classes, giving a total of 32,450 training examples. The Evolution AI NLP platform was used to build the classifier. Our text pre-processing engine first cleaned the text and truncated each comment to its first 280 characters to make the model better suited to tweets. The resulting model achieved 88% accuracy on a held-out test set.
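The data preparation steps described above (balancing the two classes to a 50/50 split and truncating text to 280 characters) can be sketched in plain Python. This is a minimal illustration, not the Evolution AI pre-processing engine. It assumes each comment already carries a boolean toxic flag, and it uses random downsampling of the majority class as one plausible way to balance. The function names `truncate`, `balance` and `prepare` are hypothetical.

```python
import random
from typing import List, Tuple

MAX_CHARS = 280  # tweet length limit used to truncate training text

Example = Tuple[str, bool]  # (comment text, is_toxic), an assumed input format


def truncate(text: str, max_chars: int = MAX_CHARS) -> str:
    """Collapse runs of whitespace, then keep only the first max_chars characters."""
    cleaned = " ".join(text.split())
    return cleaned[:max_chars]


def balance(examples: List[Example], seed: int = 0) -> List[Example]:
    """Randomly downsample the majority class so both labels appear equally often."""
    toxic = [ex for ex in examples if ex[1]]
    non_toxic = [ex for ex in examples if not ex[1]]
    n = min(len(toxic), len(non_toxic))
    rng = random.Random(seed)
    balanced = rng.sample(toxic, n) + rng.sample(non_toxic, n)
    rng.shuffle(balanced)
    return balanced


def prepare(examples: List[Example], seed: int = 0) -> List[Example]:
    """Balance the dataset, then truncate every comment to tweet length."""
    return [(truncate(text), label) for text, label in balance(examples, seed)]


# Hypothetical toy data: one long toxic comment and three short non-toxic ones.
data = [("you are an idiot " * 30, True), ("thanks for the edit", False),
        ("great article", False), ("nice work", False)]
prepared = prepare(data)
print(len(prepared))  # one toxic and one non-toxic example remain
```

Downsampling discards data from the larger class. Class weighting is a common alternative if every example needs to be kept.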
In order to test whether the model could identify toxic tweets correctly, we used our annotation platform to hand-label 1,500 tweets mentioning MPs as toxic/non-toxic, before reducing this to a balanced test dataset of 450 tweets. The trained model achieved 82% classification accuracy on this labelled data.
Additionally, we verified that the model was not biased toward white or ethnic minority MPs by checking that it achieved similar accuracy, precision and recall across these groups, using 200 examples from the test dataset. These results demonstrate the effectiveness of the Evolution AI transfer learning algorithm on this task: only ten hours of human labelling effort were needed to train and test an effective toxic tweet classifier.
We used the Twitter API to obtain tweets mentioning any of the 581 MPs on Twitter from the beginning of 2017 to the present, resulting in a dataset of 3.16 million tweets mentioning 580 MPs. A list of ethnic minority MPs was taken from Wikipedia and joined to the data to identify each MP's ethnicity. MPs with very few Twitter mentions (fewer than 200) were removed, leaving 3,159,227 tweets mentioning 523 MPs, with the following breakdowns.
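The join and filtering steps above can be sketched as follows. This assumes that each tweet has already been reduced to the handle of the MP it mentions, with one entry per tweet-MP pair, so a tweet mentioning two MPs counts once for each. It also assumes the Wikipedia list has been converted into a set of handles, and that any MP not on that list is labelled as white, matching the two groups used in this analysis. The function name and the handles in the test data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set

MIN_MENTIONS = 200  # MPs with fewer mentions than this are dropped


def filter_and_label(mentions: Iterable[str], ethnic_minority: Set[str],
                     min_mentions: int = MIN_MENTIONS) -> Dict[str, dict]:
    """Count mentions per MP, drop sparsely mentioned MPs and attach a group label.

    `mentions` yields the handle of the MP mentioned by each tweet;
    `ethnic_minority` holds the handles taken from the Wikipedia list.
    """
    counts = Counter(mentions)
    return {
        mp: {
            "n_tweets": n,
            "group": "ethnic_minority" if mp in ethnic_minority else "white",
        }
        for mp, n in counts.items()
        if n >= min_mentions  # keep MPs with at least min_mentions tweets
    }
```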
As with the Wikipedia comments, the tweets were pre-processed and the trained model was used to classify each one as toxic or non-toxic. In total, 5.0% of tweets were identified as toxic.
A histogram of the proportion of toxic tweets for each MP is shown below, along with summary statistics for the two groups. These suggest that ethnic minority MPs receive more toxic tweets.
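The per-MP proportions behind the histogram and the group summary statistics can be computed as shown in the sketch below. It assumes a hypothetical input that maps each MP to (toxic count, total count, group). It also separates the pooled rate over all tweets, which is the kind of figure behind the 5.0% above, from the per-MP proportions that the histogram shows. The two can differ because heavily mentioned MPs dominate the pooled figure.

```python
from statistics import mean, median
from typing import Dict, List, Tuple

PerMP = Dict[str, Tuple[int, int, str]]  # MP -> (toxic_count, total_count, group)


def toxic_proportions(per_mp: PerMP) -> Dict[str, List[float]]:
    """Collect each MP's proportion of toxic tweets, grouped by ethnicity."""
    groups: Dict[str, List[float]] = {}
    for toxic, total, group in per_mp.values():
        groups.setdefault(group, []).append(toxic / total)
    return groups


def summarise(groups: Dict[str, List[float]]) -> Dict[str, dict]:
    """Number of MPs, mean and median proportion for each group."""
    return {
        g: {"n_mps": len(p), "mean": mean(p), "median": median(p)}
        for g, p in groups.items()
    }


def pooled_rate(per_mp: PerMP) -> float:
    """Share of all tweets classified as toxic, pooled across MPs."""
    toxic = sum(t for t, _, _ in per_mp.values())
    total = sum(n for _, n, _ in per_mp.values())
    return toxic / total
```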
Since the proportion of toxic tweets may vary substantially between individual MPs regardless of their ethnicity, we used a hierarchical Bayesian model to check the statistical significance of these results. Using this method, we found a 96% posterior probability that ethnic minority MPs receive more toxic Twitter mentions, with the best point estimate indicating that ethnic minority MPs receive 15% more toxic tweets. The appendix below contains more detail on this analysis.
To check that the classifier was not biased toward either group, we took 100 tweets mentioning ethnic minority MPs and 100 tweets mentioning white MPs from the test set, with a 50/50 toxic/non-toxic split, and compared the confusion matrices.
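The per-group comparison can be sketched as follows. The code builds a confusion matrix from (true label, predicted label) pairs, with True meaning toxic. It then derives accuracy, precision and recall and computes these for each group. The function names and the dictionary layout are illustrative choices.

```python
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[bool, bool]  # (actual is_toxic, predicted is_toxic)


def confusion(pairs: Iterable[Pair]) -> Dict[str, int]:
    """Count true/false positives and negatives, treating 'toxic' as positive."""
    cm = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for actual, predicted in pairs:
        if actual and predicted:
            cm["tp"] += 1
        elif predicted:  # predicted toxic, actually non-toxic
            cm["fp"] += 1
        elif actual:  # predicted non-toxic, actually toxic
            cm["fn"] += 1
        else:
            cm["tn"] += 1
    return cm


def metrics(cm: Dict[str, int]) -> Dict[str, float]:
    """Accuracy, precision and recall from a confusion matrix."""
    total = sum(cm.values())
    predicted_pos = cm["tp"] + cm["fp"]
    actual_pos = cm["tp"] + cm["fn"]
    return {
        "accuracy": (cm["tp"] + cm["tn"]) / total,
        "precision": cm["tp"] / predicted_pos if predicted_pos else 0.0,
        "recall": cm["tp"] / actual_pos if actual_pos else 0.0,
    }


def compare_groups(by_group: Dict[str, List[Pair]]) -> Dict[str, Dict[str, float]]:
    """Compute the metrics separately for each group of MPs."""
    return {g: metrics(confusion(pairs)) for g, pairs in by_group.items()}
```

Similar values across the two groups indicate that the classifier is not systematically better or worse for either group. With 100 examples per group, however, small differences are within sampling noise.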
We modelled the number of toxic tweets $y_i$ received by each MP $i$ as binomially distributed, $y_i \sim \mathrm{Binomial}(n_i, p_i)$, where $n_i$ is the total number of tweets mentioning MP $i$. Each probability $p_i$ is drawn from a normal distribution truncated to $[0, 1]$, with mean $\mu_{em}$ for ethnic minority MPs and $\mu_{caucasian}$ for white MPs, and a shared standard deviation $\sigma$. Additionally, these means have shared, uninformative hyper-priors.
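The generative process described above can be illustrated with a forward simulation. For a group with mean $\mu$ and standard deviation $\sigma$, each MP's $p_i$ is drawn from a normal distribution truncated to $[0, 1]$, and $y_i$ is then drawn from $\mathrm{Binomial}(n_i, p_i)$. This is only a sketch of the data-generating model, not the PyMC3 inference code. The truncation uses simple rejection sampling, and the parameter values in the test are hypothetical, not fitted values.

```python
import random
from typing import List


def truncated_normal(rng: random.Random, mu: float, sigma: float,
                     low: float = 0.0, high: float = 1.0) -> float:
    """Draw from Normal(mu, sigma) truncated to [low, high] by rejection sampling."""
    while True:
        x = rng.gauss(mu, sigma)
        if low <= x <= high:
            return x


def simulate_counts(n_tweets: List[int], mu: float, sigma: float,
                    seed: int = 0) -> List[int]:
    """Simulate toxic-tweet counts y_i for MPs that share one group mean mu."""
    rng = random.Random(seed)
    counts = []
    for n in n_tweets:
        p = truncated_normal(rng, mu, sigma)  # this MP's toxicity probability p_i
        y = sum(rng.random() < p for _ in range(n))  # y_i ~ Binomial(n_i, p_i)
        counts.append(y)
    return counts
```

Fitting the model reverses this process. Given the observed $y_i$ and $n_i$, MCMC recovers posterior distributions over $\mu_{em}$, $\mu_{caucasian}$ and $\sigma$. Because each $p_i$ has its own value, MP-to-MP variation is not mistaken for a group effect.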
The analysis was run in PyMC3, using MCMC with 2 chains of length 10,000 with 500 burn-in iterations to obtain samples from the posterior distributions. The Gelman-Rubin diagnostic was used to judge MCMC convergence, with $\hat{R} < 1.0001$ in all cases. The 96% figure was obtained from the posterior samples of $\mu_{em}$ and $\mu_{caucasian}$, in which $\mu_{em}$ was greater than $\mu_{caucasian}$ in 96% of cases (see the above figure for the distribution of $\mu_{em} - \mu_{caucasian}$). The point estimate was taken as the mean of the posterior distributions of $\mu_{em}$ and $\mu_{caucasian}$, indicating that ethnic minority MPs received 15% more abuse than their white counterparts.
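Both quantities in the paragraph above can be computed directly from raw posterior draws. The sketch below uses the classic (non-split) Gelman-Rubin statistic for $m$ chains of equal length $n$, along with the fraction of paired draws in which $\mu_{em}$ exceeds $\mu_{caucasian}$. PyMC3's own convergence diagnostic may differ in detail, for example by splitting chains, so treat this as an illustration of the formula rather than a reproduction of the library's output. The function names are hypothetical.

```python
from statistics import mean, variance
from typing import Sequence


def gelman_rubin(chains: Sequence[Sequence[float]]) -> float:
    """Classic potential scale reduction factor R-hat for m chains of equal length n."""
    m, n = len(chains), len(chains[0])
    chain_means = [mean(c) for c in chains]
    grand_mean = mean(chain_means)
    w = mean(variance(c) for c in chains)  # mean within-chain variance
    b = n / (m - 1) * sum((cm - grand_mean) ** 2 for cm in chain_means)  # between-chain
    var_hat = (n - 1) / n * w + b / n  # pooled estimate of the posterior variance
    return (var_hat / w) ** 0.5


def prob_greater(mu_em: Sequence[float], mu_white: Sequence[float]) -> float:
    """Fraction of paired posterior draws in which mu_em exceeds mu_white."""
    return mean(1.0 if a > b else 0.0 for a, b in zip(mu_em, mu_white))
```

Values of $\hat{R}$ close to 1 indicate that the chains agree. A value well above 1 means that the between-chain spread is large relative to the within-chain spread, so more sampling is needed.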
Posterior mean for ethnic minority MPs: 5.48%
Posterior mean for white MPs: 4.76%
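The 15% figure is a relative difference between the two posterior means, not an absolute one. As a check on the arithmetic using the values above:

$$
\frac{5.48\%}{4.76\%} - 1 \approx 1.151 - 1 = 0.151 \approx 15\%,
\qquad
5.48\% - 4.76\% = 0.72 \text{ percentage points}.
$$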