Welcome to Benford Bench: Digital Forensics Research in Dynamic Data

First, we file FOIA requests for the time-stamped machine logs of several types of machine data sets:
1 Central Tabulator (Machine that receives all votes)
2 Ballot Scanner (Machine that scans the ballots)
3 Voting Machine (Touch screen machine for voting)
4 Mail-in Ballot (when mail-in ballots are entered into the central tabulator, their data is included in the logs requested under item 1)

In requesting the data, we quote the 2002 Voting System Standards as follows:

 Machine logs are required to be made available for public inspection as specified in the 2002 Voting System Standards Vol. 1, to be maintained for the purpose of "verifying the correctness of reported election results" [Sec. 2.2.5.1], as we will be using a digital forensics model to test the admissibility of such methods on the in-election time-stamp data. Furthermore, "The timing and sequence of audit record entries is as important as the data contained in the record. All voting systems shall meet the following requirements for the time, sequence and preservation of the audit records..." [Sec. 2.2.5.2.1 of Operational Requirements].

In the data pool, we seek to investigate whether the emergence of the Benford curve in the time-stamps of the ballot scanner and ballot machine logs of the 2019 Mayoral Election is meaningful or irrelevant, comparing any deviation from that curve by ward and precinct. In initial testing of the "ballot stops" recorded by the central tabulator in the 2018 general election, we did see a heavy inclination toward the Benford distribution. However, since that data is sent in chunks from the ballot machines, we would not expect to see a full Benford distribution. An even distribution would suggest randomness.
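For reference, the contrast between a Benford curve and an even distribution can be sketched numerically. The snippet below is a minimal illustration, assuming the classic first-digit form of Benford's law, P(d) = log10(1 + 1/d); the source does not state which form the Benford Bench program uses, and the function names are hypothetical.

```python
import math


def benford_first_digit():
    """Expected first-digit probabilities under Benford's law.

    Assumption: the classic first-digit form P(d) = log10(1 + 1/d), d = 1..9.
    """
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def even_first_digit():
    """An even (uniform) distribution over the digits 1..9, for contrast."""
    return {d: 1 / 9 for d in range(1, 10)}


if __name__ == "__main__":
    benford = benford_first_digit()
    even = even_first_digit()
    for d in range(1, 10):
        print(f"{d}: Benford {benford[d]:.3f}   even {even[d]:.3f}")
```

Under Benford's law the digit 1 leads about 30% of the time and the digit 9 under 5%, whereas an even distribution gives every digit about 11%.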

While we look to obtain a control pool, our objective at this point is to investigate the consistency of the Benford distribution among time-stamps in the In-Election data and to compare it with more mechanical or less dynamic time-stamp data, such as time-stamps from the Central Tabulator or Maintenance logs.
-----------------------------------------------------------------
Method (use cat on Unix-like systems; substitute type on DOS/CMD):
# extract lines matching a specific expression, such as "ballot stops", to a file
cat elections.csv | grep "expression" > file.csv
# extract all times (HH:MM:SS)
cat file.csv | grep -o "[0-2][0-9]:[0-5][0-9]:[0-5][0-9]" > file.dat
# remove all colons
sed "s/://g" file.dat > file_final.dat
# count the lines in the file (DOS/CMD only) to check that the totals match:
find /v /c "" file.dat
# Run the Benford Bench program and compare its results with an independent analysis for verification purposes.
# The Benford program may ask some irrelevant questions. Since the default options are best, you can press [ENTER] without entering a value to proceed to the next question. You will want to choose to include all digits of the time-stamps, as this is the method that provides the Benford distribution for unadulterated time-stamps.
# Simply run the Benford program with
./benford
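For the independent analysis mentioned above, the extraction steps can be approximated in a single script. This is a rough sketch, not the Benford Bench program itself: the function names are hypothetical, the expression is matched as a plain substring rather than a grep pattern, and tallying every digit of every time-stamp is only one reading of "include all digits".

```python
import re
import sys
from collections import Counter

# Same pattern as the grep step: HH:MM:SS
TIME_RE = re.compile(r"[0-2][0-9]:[0-5][0-9]:[0-5][0-9]")


def extract_times(lines, expression):
    """Return colon-free times (HHMMSS) from lines containing `expression`.

    Note: `expression` is a plain substring here, not a regular expression.
    """
    times = []
    for line in lines:
        if expression in line:
            for match in TIME_RE.findall(line):
                times.append(match.replace(":", ""))
    return times


def digit_frequencies(times):
    """Relative frequency of each digit 0-9 across all digits of all times.

    Assumption: "all digits" means every digit position is tallied.
    """
    counts = Counter(ch for t in times for ch in t)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {d: counts.get(d, 0) / total for d in "0123456789"}


# Usage: python script.py elections.csv "ballot stops"
if __name__ == "__main__" and len(sys.argv) == 3:
    path, expression = sys.argv[1], sys.argv[2]
    with open(path, encoding="utf-8", errors="replace") as fh:
        times = extract_times(fh, expression)
    print(len(times), "time-stamps extracted")
    for digit, freq in digit_frequencies(times).items():
        print(digit, f"{freq:.4f}")
```

The printed count of time-stamps can be checked against the line count of file.dat, and the frequencies against the program's output.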

Results:
# 48th Ward Maintenance Logs of Voting Machines (2016 General Election, Chicago Board of Elections) Pre- and Post-Election (not In-Election)

2016 - Maintenance Logs

# Central Tabulator (2018 General Election, Chicago Board of Elections)

2018 - Central Tabulator

# Central Tabulator (2019 Mayoral Election, Chicago Board of Elections)


@ First 1210 Batches of Ballot Sends (1-30 at a time) of CT:


First 1210 Batched Ballot Scans (1-30) CT

@ All CT Batches (1-30 at a time):

CT All Batches (1-30 at a time) 2019

# Ballot Scanner (2019 Mayoral Election, Chicago Board of Elections)

# Ballot Machine (2019 Mayoral Election, Chicago Board of Elections)

# Mail-in Ballots (2018 General Election, Chicago Board of Elections)

All mail-in ballots batched CT 2018

# Mail-in Ballots (2019 Mayoral Election, Chicago Board of Elections)

2019 from CT all Mail-In Ballots

Conclusions (so far):

The digit distribution for batches of 1-30 ballots sent at irregular intervals is relatively consistent across the 2018 and 2019 data sets and can be distinguished from that of the 2016 maintenance logs. The mail-in ballot distribution for 2019 is strikingly similar to that for 2018, which suggests that CT processing may be preprogrammed to count mail-in ballots at similar times. Further analysis is needed to confirm that hypothesis.
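One way to put a number on "relatively consistent" or "strikingly similar" is a distance between two digit distributions. The sketch below uses total variation distance, which is an assumption of this example and not a measure the project is known to use; the function names are hypothetical, and no values from the actual logs are shown.

```python
def normalize(counts):
    """Convert raw digit counts into relative frequencies."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no digits counted")
    return {digit: n / total for digit, n in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions.

    0.0 means the distributions are identical; 1.0 means they share no mass.
    """
    digits = set(p) | set(q)
    return 0.5 * sum(abs(p.get(d, 0.0) - q.get(d, 0.0)) for d in digits)
```

Applied to the digit frequencies of two data sets (for example the 2018 and 2019 mail-in time-stamps), a value near 0 would support the visual impression of similarity, while a clearly larger value against the 2016 maintenance logs would support the claim that they can be distinguished.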

2016-2019 (4-up) contact sheet (click for PDF):

4-up 2016-2019


Log Data for Download (Released from the Chicago Board of Elections):

# 2016 Maintenance Logs
# 2018 General Election CT logs
# 2019 Mayoral Election Voting Machine and Ballot Scanner logs (pending release: Followup Sent) [CT logs released here]

Created: 2019-3-6, Updated: 2019-4-10, 10:30 PM CST. Project Started June of 2014