The EU Bot — a classifier for the EU Referendum

a project by Jonathan Russell

Work experience for Year 10 happens in June and July, and at 15 I was one of the many students taking part in the programme. I found my placement at Seldon, where I have spent two weeks working on a project called the EU Bot, which you can see at http://eubot.seldon.io

You might be asking, what is the EU Bot, and how does it work? The EU Bot is a machine learning classifier (written in Python, with an HTML frontend) designed to identify whether a string is Remain-leaning or Leave-leaning, or even neutral if more than one sentence is provided.

Development of the EU Bot

The EU Bot’s source code is available at https://github.com/SeldonIO/eubot under an Apache 2.0 license.

Developing the EU Bot involved a lot of learning for me, especially about several Python libraries.

I started by collecting data from various news sources that could be classified as Remain or Leave. For example, a speech by Boris Johnson is classified as Leave. Most of the data came from a small number of news sources, and with more data the classifier would probably become more accurate.

Next, I learned the basics of scikit-learn from this tutorial for spam classification. Since I had a similar binary classification problem, it was very helpful. I then rewrote most of the code to read data as I had structured it, rather than in email format.

The first piece of code I wrote was a generator that yields the file path of each data source and the text within it. These pairs are built into a pandas DataFrame for classification. Once the generator was done, it was just a matter of plugging that data into scikit-learn’s functions for classification. I used a pipeline for this, as it simplifies the process greatly.
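To give a rough idea of how those pieces fit together, here is a minimal sketch rather than the actual code from the repository. The function names (read_files, build_data_frame, train), the one-directory-per-label layout, and the choice of CountVectorizer with MultinomialNB are all assumptions made for illustration. They follow the usual shape of a scikit-learn spam classification example.

```python
import os

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline


def read_files(root):
    """Yield (file_path, text) for every file under root (hypothetical layout)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="ignore") as handle:
                yield path, handle.read()


def build_data_frame(root, label):
    """Collect every file under root into a DataFrame indexed by file path."""
    rows, index = [], []
    for path, text in read_files(root):
        rows.append({"text": text, "label": label})
        index.append(path)
    return pd.DataFrame(rows, index=index, columns=["text", "label"])


def train(sources):
    """Fit a bag-of-words classifier; sources is a list of (directory, label) pairs."""
    data = pd.concat([build_data_frame(root, label) for root, label in sources])
    # Assumed model: word counts fed into naive Bayes, chained in one pipeline.
    pipeline = Pipeline([
        ("vectorizer", CountVectorizer()),
        ("classifier", MultinomialNB()),
    ])
    pipeline.fit(data["text"], data["label"])
    return pipeline
```

With something like this, training would be a single call such as train([("data/leave", "leave"), ("data/remain", "remain")]), where the directory names are again only placeholders. The pipeline keeps the vectorizer and classifier together, so the same object can later be given raw text to predict on.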

The success rate at this point was very low, as I had been training the classifier on full files. I then rewrote the file-reading code to yield sentences instead of files. This greatly increased the success rate, up to about 75% on both Leave and Remain.
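The switch from files to sentences could be sketched like this. How the EU Bot actually splits sentences is not described here, so the regular expression below is an assumption, and the names split_sentences and read_sentences are made up for the example. A naive split like this will trip over abbreviations such as "Mr." and would need refining for real data.

```python
import re

# Assumed rule: a sentence ends at '.', '!' or '?' followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Split text into sentences, dropping empty fragments."""
    return [part.strip() for part in _SENTENCE_END.split(text) if part.strip()]


def read_sentences(documents):
    """Yield (source, sentence) pairs from an iterable of (source, text) pairs."""
    for source, text in documents:
        for sentence in split_sentences(text):
            yield source, sentence
```

Each sentence then becomes its own training example with the label of the file it came from, which gives the classifier many short examples instead of a handful of long ones.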

I then added a call to input() so that the user can type into the program and get a result on whether their text is Leave or Remain.
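A prompt loop along these lines might look like the sketch below. The rule for a neutral result is an assumption: here the text is neutral when its sentences do not all get the same label, which may not be how the EU Bot decides. The names overall_verdict and prompt_loop, and the classify_sentences callable that returns one label per sentence, are hypothetical.

```python
def overall_verdict(labels):
    """Combine per-sentence labels: a unanimous label wins, anything else is neutral."""
    unique = set(labels)
    if len(unique) == 1:
        return unique.pop()
    return "neutral"


def prompt_loop(classify_sentences, read=input, write=print):
    """Ask for text until a blank line is entered, printing a verdict each time.

    classify_sentences is assumed to take a string and return a list of
    labels, one per sentence.
    """
    while True:
        text = read("Enter some text (blank to quit): ").strip()
        if not text:
            break
        write(overall_verdict(classify_sentences(text)))
```

Passing read and write as parameters is only there to make the loop easy to try out without a keyboard. Called with just a classifier function, it uses input() and print() as normal.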

After that, I moved on to the HTML frontend. First, however, I had to rework the structure of the Python code so that jQuery ajax requests could ask it for data. The frontend uses jQuery to send the user’s input to the Python Flask server, then uses the data Flask returns to hide and show text for the result, showing the user what the classifier thought about their input. Then it was time to put this on a web server. We used the Google Cloud Platform for this.
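On the Python side, a Flask endpoint for those ajax requests could be as small as the sketch below. This is not the EU Bot's real server: the /classify route, the text query parameter, and the JSON field names are invented for the example, and classify is a stand-in for a call to the trained pipeline. The cross-origin header assumes the page is served from a different origin than the Flask server.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def classify(text):
    """Stand-in for the trained model; a real server would call its predict method."""
    return "leave" if "leave" in text.lower() else "remain"


@app.route("/classify")
def classify_endpoint():
    """Return the label for the ?text=... query parameter as JSON."""
    text = request.args.get("text", "")
    if not text.strip():
        return jsonify(error="no text supplied"), 400
    return jsonify(text=text, label=classify(text))


@app.after_request
def allow_cross_origin(response):
    # Assumption: the frontend lives on another origin, so the browser needs
    # this header before it lets the ajax call read the reply.
    response.headers["Access-Control-Allow-Origin"] = "*"
    return response
```

The frontend would then request /classify with the user's text and use the label field in the reply to decide which result text to show. During development the server can be started with app.run().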

First of all, I put the HTML, CSS, JS, etc. into a bucket on the Google Cloud Platform. Then, I put the Python code and all the learning data on a Google Cloud Platform Compute Engine instance. I then ran the server on the Compute Engine instance and set up the ajax requests to point to it, and it worked!

Conclusion

It has been an amazing experience to work with Seldon, especially on something that I enjoyed so much. The team has been fantastic to me, and I thank everyone I have been working with. I hope you check out the project.

Thanks for reading,

Jonathan
