Saturday, 10 June 2017

General Election 2017 - and trying out the Python pandas library

I am happy to report that my computer is working again, after a reformat, Windows 7 installation, then a long delay while Ubuntu partition resizer stalled, then fixed that with a GParted iso, and then reinstalled Ubuntu 17.04 and I am part way through restoring data from backups.
QGIS is working with version 2.18.

Although I have used Python csv, matplotlib and numpy libraries to read data from files and plot I hadn't used the pandas library for anything much, so I thought I'd do so. I have in previous code often built up a list manually by setting data = [] and then using append to build up the list, which can be slow for large datasets.

First I need some data, and I will use the general election results for the 6 parliamentary constituencies for the House of Commons in Cornwall:


Constituency,Surname,Forenames,Description,Votes,Turnout
Camborne and Redruth,EUSTICE,Charles George,Conservative Party,23001,70.96
Camborne and Redruth,WINTER,Graham Robert,Labour Party,21424,70.96
Camborne and Redruth,WILLIAMS,Geoffrey,Liberal Democrats,2979,70.96
Camborne and Redruth,GARBETT,Geoffrey George,Green Party,1052,70.96
North Cornwall,MANN,Scott Leslie,Conservative Party,25835,74.2
North Cornwall,ROGERSON,Daniel John,Liberal Democrats,18635,74.2
North Cornwall,BASSETT,Joy,Labour Party,6151,74.2
North Cornwall,ALLMAN,John William,Christian Peoples Alliance,185,74.2
North Cornwall,HAWKINS,Robert James,Socialist Labour Party,138,74.2
South East Cornwall,MURRAY,Sheryll,Conservative Party,29493,74.2
South East Cornwall,DERRICK,Gareth Gwyn James,Labour Party,12050,74.2
South East Cornwall,HUTTY,Philip Andrew,Liberal Democrats,10346,74.2
South East Cornwall,CORNEY,Martin Charles Stewart,Green Party,1335,74.2
St Austell and Newquay,DOUBLE,Stephen Daniel,Conservative Party,26856,69.3
St Austell and Newquay,NEIL,Kevin Michael,Labour Party,15714,69.3
St Austell and Newquay,GILBERT ,Stephen David John ,Liberal Democrats,11642,69.3
St Ives,THOMAS,Derek,Conservative Party,22120,76.1
St Ives,GEORGE,Andrew Henry,Liberal Democrats,21808,76.1
St Ives,DREW,Christopher John,Labour Party,7298,76.1
Truro and Falmouth,NEWTON,Sarah Louise,Conservative Party,25123,75.9
Truro and Falmouth,KIRKHAM,Jayne Susannah,Labour Party,21331,75.9
Truro and Falmouth,NOLAN,Robert Anthony,Liberal Democrat,8465,75.9
Truro and Falmouth,ODGERS,Duncan Charles,UK Independence Party,897,75.9
Truro and Falmouth,PENNINGTON,Amanda Alice,Green Party,831,75.9

Here is the Python code, which expects the above data in a file called electionresults2017.csv which it reads using csv.DictReader which produces an iterator which I convert to a list and create a pandas data frame object.
The code is also available in the dataviz-sandbox repository at my Bitbucket account.


import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import csv

def chooseColour(desc):
    if desc == "Labour Party":
        return "red"
    elif "Liberal" in desc:
        return "yellow"
    elif "Green" in desc:
        return "green"
    elif "Conservative" in desc:
        return "blue"
    else:
        return "magenta"
    
with open('electionresults2017.csv', 'r') as spamreader:
    dframe = pd.DataFrame(list(csv.DictReader(spamreader)))

consts = set(dframe['Constituency'])
print(consts)
fig = plt.figure()
plt.suptitle("Distribution of Votes in Cornwall\nGeneral Election 8th June 2017")
plt.axis('equal')
plt.xticks([])
plt.yticks([])
for p, c in enumerate(consts):
    # print("Constituency of {}".format(c))
    surnames = dframe.loc[dframe.Constituency == c, ['Surname']].values
    forenames = dframe.loc[dframe.Constituency == c, ['Forenames']].values
    descs = [d[0] for d in dframe.loc[dframe.Constituency == c, ['Description']].values]
    plotcolours = [chooseColour(d) for d in descs]
    forenames = [f for f in forenames]
    forename = [f[0].split()[0] for f in forenames]
    names = [f+" "+s for f,s in zip(forename, surnames)]
    names = [n[0] for n in names]
    votes = dframe.loc[dframe.Constituency == c, ['Votes']].values
    votes1 = [v[0] for v in votes]
    namedescsvotes = [n+"\n"+d+"\n"+v for n,d,v in zip(names, descs, votes1)]

    totalvotes = np.sum(votes, dtype=np.int)
    # print(totalvotes)
    ax = fig.add_subplot(2, 3, p+1)
    ax.axis('equal')
    ax.set_title("{C}: {t} votes cast".format(C=c, t=totalvotes))
    ax.set_xticks([])
    ax.set_yticks([])
    ax.pie(votes, radius = np.sqrt(totalvotes/50000.0), labels=namedescsvotes, colors=plotcolours, autopct='%1.1f%%')

                          
fig2 = plt.figure()
plt.suptitle("Representation of Cornwall in the House of Commons")
plt.axis('equal')
plt.xticks([])
plt.yticks([])
for p, c in enumerate(consts):
    descs = dframe.loc[dframe.Constituency == c, ['Description']].values
    descs = descs[0]
    plotcolours = [chooseColour(d) for d in descs]
    print(descs)
    ax = fig2.add_subplot(2, 3, p+1)
    ax.axis('equal')
    ax.set_title(c)
    ax.set_xticks([])
    ax.set_yticks([])
    ax.pie([1], labels=descs, colors=plotcolours)

plt.show()

And the results:

The votes cast for the various candidates and parties in each of the 6 constituencies covering Cornwall and the Isles of Scilly. George Eustice MP uses his middle name rather than his first name Charles.
In comparison to votes cast, here are the parties represented in the House of Commons for constituencies in Cornwall.

I have also tried out matplotlib_venn. It takes as arguments the keyword subsets, which for the function venn2 expects A and (not B), (not A) and B, and (A and B). Below, voteothers is the number who voted for non-elected candidates, the second is by definition an empty set (those who didn't vote and cast a vote for the winner), and votewinner is those who voted for the winner.

Since Cornwall is a one-party state, the colours can be hard-coded.

import matplotlib.pyplot as plt
import matplotlib_venn as venn
...
    # subsets = (Ab, aB, AB)
    v = venn.venn2(subsets=(voteothers, 0, votewinner), set_colors =('lightgray', 'navy'), set_labels=('Other candidates', winnername))
...


The function venn3 expects a 7 element tuple as below. In this case, A is the electorate, B is those who voted for the winner, and C is those who voted for other candidates.

import matplotlib.pyplot as plt
import matplotlib_venn as venn
...
    # subsets=(Abc, aBc, ABc, abC, AbC, aBC, ABC)
    v = venn.venn3(subsets=(novote, 0, votewinner, 0, voteothers, 0, 0), set_colors =('lightgray', 'blue', 'red'), set_labels=('electoral register', winnername,'Other candidates'))
...

It would be nice to make the zero sets disappear, maybe there is a way to do this in the documentation somewhere.

No comments:

Post a Comment