I downloaded this dataset from Kaggle, where it was uploaded by Ben Hamner. It contains three CSV files: county_facts.csv, county_facts_dictionary.csv, and primary_results.csv. Using pandas, seaborn, sklearn (random forests), and Plot.ly, I explored the data and built predictions around questions like:
What does education level in the United States look like by county?
How do counties in the United States break down by race?
Which voter attributes do candidates like Donald Trump and Ted Cruz appeal to? Are their supporters rich or poor, urban or rural, educated or uneducated?
As income, college education, whiteness, and density change, how does that affect the fraction of votes received by each candidate?
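Questions like these can be approached by joining each county's primary results to its demographic facts and measuring how a candidate's vote share moves with each attribute. The sketch below is hypothetical and makes several assumptions. It assumes primary_results.csv has the columns `fips`, `party`, `candidate` and `fraction_votes`. It also assumes county_facts.csv uses the codes `INC110213` (median household income), `EDU685213` (percent with a bachelor's degree or higher), `RHI825214` (percent white alone, not Hispanic) and `POP060210` (population per square mile). Check these codes against county_facts_dictionary.csv before relying on them. The tiny DataFrames are made-up toy values that stand in for the real files. Pearson correlation is only one simple way to answer the question.

```python
import pandas as pd

# Made-up toy stand-ins for the real files; with the actual data you would use
#   results = pd.read_csv("primary_results.csv")
#   facts = pd.read_csv("county_facts.csv")
results = pd.DataFrame({
    "fips": [1001, 1001, 1003, 1003, 1005, 1005],
    "party": ["Republican"] * 6,
    "candidate": ["Donald Trump", "Ted Cruz"] * 3,
    "fraction_votes": [0.45, 0.30, 0.40, 0.35, 0.50, 0.20],
})
# Assumed county_facts.csv column codes (verify in county_facts_dictionary.csv).
facts = pd.DataFrame({
    "fips": [1001, 1003, 1005],
    "INC110213": [50000, 60000, 35000],  # median household income
    "EDU685213": [20, 28, 12],           # % bachelor's degree or higher
    "RHI825214": [75, 85, 45],           # % white alone, not Hispanic
    "POP060210": [90, 120, 30],          # population per square mile
})

ATTRS = {
    "INC110213": "income",
    "EDU685213": "college",
    "RHI825214": "white",
    "POP060210": "density",
}

# Attach county demographics to every (county, candidate) row via the FIPS code.
merged = results.merge(facts, on="fips", how="inner").rename(columns=ATTRS)


def attribute_correlations(df: pd.DataFrame, candidate: str) -> pd.Series:
    """Pearson correlation between one candidate's vote share and each attribute."""
    sub = df[df["candidate"] == candidate]
    return sub[list(ATTRS.values())].corrwith(sub["fraction_votes"])


trump = attribute_correlations(merged, "Donald Trump")
print(trump.round(2))
```

A negative value means the candidate's share tends to fall as that attribute rises across counties. With only a handful of toy rows, the numbers illustrate the mechanics and not any real finding.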
I built a model with sklearn's random forest classifier to predict the winning candidate in the Democratic and Republican primaries.
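The following minimal sketch shows one way such a model could be set up. It assumes that a county's winner is the candidate with the largest `fraction_votes` there, and it covers a train/test split and a small grid search. Instead of the real files, it uses synthetic, made-up counties in which Trump wins the less-educated ones by construction. The helper `county_winners`, the feature names (`income`, `college`, `white`, `density`) and the parameter grid are hypothetical choices for illustration. They are not the author's actual setup.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split


def county_winners(results: pd.DataFrame, party: str) -> pd.Series:
    """Hypothetical helper: candidate with the largest fraction_votes per county."""
    sub = results[results["party"] == party]
    idx = sub.groupby("fips")["fraction_votes"].idxmax()
    return sub.loc[idx].set_index("fips")["candidate"]


# Synthetic stand-in data (NOT real election results): 200 made-up counties.
rng = np.random.default_rng(0)
n = 200
facts = pd.DataFrame({
    "fips": np.arange(n),
    "income": rng.normal(50_000, 12_000, n),
    "college": rng.uniform(5, 45, n),
    "white": rng.uniform(20, 100, n),
    "density": rng.lognormal(4, 1, n),
})
# By construction, Trump's share falls as the college-educated share rises.
trump_share = 0.5 + 0.01 * (25 - facts["college"]) + rng.normal(0, 0.03, n)
results = pd.DataFrame({
    "fips": np.repeat(facts["fips"].to_numpy(), 2),
    "party": "Republican",
    "candidate": ["Donald Trump", "Ted Cruz"] * n,
    "fraction_votes": np.column_stack([trump_share, 1 - trump_share]).ravel(),
})

# Labels: one winner per county; features: that county's attributes.
y = county_winners(results, "Republican")
X = facts.set_index("fips").loc[y.index]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Cross-validated search over a small, illustrative parameter grid.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=3,
)
search.fit(X_train, y_train)
accuracy = search.score(X_test, y_test)  # accuracy of the refit best model
print(search.best_params_, round(accuracy, 3))
```

Holding out the test split before the grid search keeps the reported accuracy separate from the data used to pick the parameters. With the real data, the same steps would be run once per party.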
I separated the data into training and test splits and used grid search to find the optimal parameters on