It’s hard to avoid the constant bickering about the results of our last election. Should mail-in voting be legal? Do we need stricter voter identification laws? Was there fraud in the last election? Did it impact the results? These are just a fraction of the questions circulating around elections and voter integrity these days. Sadly, these questions appear to be highly politicized and it’s unclear if anybody is really interested in asking what an optimal election system looks like.

In a true fair and accurate representative democracy, a vote not counted is just as costly as one inaccurately counted. More precisely, a single mother with no childcare who doesn’t vote because of 4-hour lines is just as damaging to the system as a vote for a republican candidate that is intentionally or accidentally recorded for the opposing Democratic candidate.

Therefore, we can conclude an optimal election system really involves optimizing on both axes. How do we make sure everyone who wants to vote gets to vote? And how do we ensure every vote is counted accurately? When viewed this way one can’t help but see the parallels to optimizing a machine learning classifier for precision (when we count votes for a given candidate how often did we get it right) and recall (of all possible votes for that candidate how many did we find).

Back the Truck Up! What is Precision and Recall Anyway

Precision and Recall are two metrics often used to measure the accuracy of a classifier. You might ask “why not just measure accuracy?” and that would be a valid question. Accuracy defined as everything we classified correctly divided by everything we evaluated, suffers from what is commonly known as the imbalanced class problem.

Suppose we have a classifier (a.k.a. laws and regulations) that can take a known set of voters who intend to vote “democrat” and “not democrat” (actual / input) and then outputs the recorded vote (predicted / output).

Let’s assume we evaluate 100 intended voters/votes, 97 of which intend to not vote for the democratic candidate and let’s build the dumbest classifier ever known. We are just going to count every vote as “not democrat”, regardless of whether the ballot was marked for the democratic candidate or not.

N (number of votes) = 100		Output (Predicted) Value
N (number of votes) = 100		Democrat	Not a Democrat
Input (Actual) Value	Democrat	TP = 0	FN = 3	TOTAL DEMOCRATS = 3
Input (Actual) Value	Not a Democrat	FP = 0	TN = 97	TOTAL NOT DEMOCRATS = 97
		POSITIVES = 0	NEGATIVES = 100

To make our calculations a little easier we can take those numbers and drop them into a table that compares inputs to outputs also known as a confusion matrix. To simplify some of our future calculations we can further define some of the cells in the table above

True Positives (TP): Correctly captured an intended vote for the democrats as a vote for the democrats (97)
True Negatives (TN): Correctly captured a vote NOT intended for the democrats as a vote, not for the democrats (97)
False Positives (FP): Incorrectly captured a vote NOT intended for the democrats as a vote for the democrats (0)
False Negatives (FN): Incorrectly captured an intended vote for the democrats as a vote not for the democrats (3)

Now we can slightly relabel our accuracy equation and calculate our accuracy with our naïve classifier and the associated values from the table above.

97% Accuracy! We just created the world’s stupidest classifier and achieved 97% accuracy! And therein lies the rub. The second I expose this classifier to the real world with a more balanced set of inputs across classes we will quickly see our accuracy plummet. Hence, we need a better set of metrics. Ladies and gentlemen, I am delighted to introduce …

Precision: Of the votes recorded (predicted) for the Democrats, how many were correct
Recall: Of all possible votes for the Democrats, how many did we find

What becomes blatantly clear from evaluating these two metrics is that our classifier, which appeared to have great accuracy, is terrible. None of the intended votes for the democrats were correctly captured and of all possible intended votes for the democrats, we found none of them. It’s worth noting that the example I’ve presented here is for a binary classifier (democrat, not democrat) but these metrics can easily be adapted to multi-class systems that more accurately reflect our actual candidate choices in the United States.

There’s No Such Thing as 100% Precision and Recall

Gödel’s incompleteness theorem, which loosely states that every non-trivial formal system is either incomplete or inconsistent, likely applies to machine learning and artificial intelligence systems. In other words, since machine learning algorithms are built around our known formal mathematical systems there will be some truths they can never describe. A consequence of that belief and something I preach to everyone I work with is that there is really no such thing as 100% precision and recall. No matter how great your model is and what your test metrics tell you. There will always be edge cases.

So if 100% precision and recall is all but impossible what do we do? When developing products around machine learning classifiers, we often ask ourselves what is most important to the customer, precision, recall, or both. For example, if I create a facial recognition system that notifies the police if you are a wanted criminal, we probably want to air on the side of precision because arresting innocent individuals would be intolerable. But in other cases, like flagging inappropriate images on a social network for human review, we might want to air on the side of recall, so we capture most if not all images and allow humans to further refine the set.

It turns out that very often precision and recall can be traded off. Most classifiers emit a confidence score of sorts (also known as a SoftMax output) and by just varying the threshold on that output we can trade-off precision for recall and vice-vera. Another way to think about this is, if I require my classifier to be very confident in its output before I accept the result, I can tip the results in favor of precision. If I loosen my confidence threshold, I can tip it back in favor of recall.

And how might this apply in voting? Well, if I structure my laws and regulations such that every voter must vote in person with 6 forms of ID and the vote is tallied in front of the voter by a 10-person bipartisan evaluation team who must all agree … we will likely have very high precision. After all, we’ve greatly increased the confidence in the vote outcome. But at what expense? We will also likely slow down the voting process and create massive lines which will significantly increase the number of people who might have intended to vote but don’t actually do so, hence decreasing recall.

Remind me again what the hell this has to do with Voting

The conservative-leaning Heritage Foundation makes the following statement on their website:

“It is incumbent upon state governments to safeguard the electoral process and ensure that every voter’s right to cast a ballot is protected.”

I strongly subscribe to that statement and I believe it is critical to the success of any representative democracy. But ensuring that every voter’s right to cast a ballot is protected, requires not only that we accurately record the captured votes, but also ensure that every voter who intends to vote is unhindered in doing so.

Maybe we need to move entirely to in-person voting while simultaneously allocating sufficient funds for more polling stations, government-mandated paid time off, and government-provided childcare. Or maybe we need all mail-in ballots but some new process or technology to ensure the accuracy of the votes. Ultimately, I don’t pretend to know the right answer, or if we even have a problem, to begin with. What I do know is that if we wish to improve our election systems we must first start with data on where we stand today and then tweak our laws and regulations to simultaneously optimize for precision and recall.

So, the next time a politician proposes changes to our election system ask … no demand, they provide data on the current system and how their proposed changes will impact precision and recall. Because only when we optimize for both these metrics can we stop worrying about making America great again and start working on making America even greater!