I wanted to see how long it would take to repro what I did in 2013 with DeepChem. Turns out when you do something for a job you get much better at it.
Games highlighted in green were predicted to be within 2 points.
About the Model
The final network had the following structure:
76 features per game -> 64 relu (0.35 dropout) -> 32 relu (0.35 dropout) -> 1 linear
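To make the layer sizes concrete, here is a minimal pure-Python sketch of a forward pass through a network of that shape. It is not the code I trained with: the helper names (`build_network`, `forward`), the random weight initialisation, and the inverted-dropout scaling are all assumptions made for illustration.

```python
import random


def relu(values):
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    # weights holds one row of input weights per output unit
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def dropout(values, rate, rng, training):
    # Inverted dropout (an assumption): zero units with probability `rate`
    # during training and rescale the survivors; do nothing at inference.
    if not training:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def init_layer(n_in, n_out, rng):
    # He-style random initialisation, chosen only for illustration
    scale = (2.0 / n_in) ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_network(rng, sizes=(76, 64, 32, 1)):
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def forward(network, features, rng=None, training=False, rate=0.35):
    hidden = features
    for index, (weights, biases) in enumerate(network):
        hidden = dense(hidden, weights, biases)
        if index < len(network) - 1:
            # hidden layers: relu followed by dropout
            hidden = dropout(relu(hidden), rate, rng, training)
    return hidden[0]  # single linear output: the predicted margin
```

At inference time (`training=False`) dropout is a no-op, so the same game always gets the same prediction.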
These are the features used
To play a “game” we append the two teams’ feature vectors, and the network learns the final score margin, with positive values if the first team won. We “play” each game in both orientations and average the results.
Classifying by the sign of the result, we call 74% of games correctly on a random holdout.
For score prediction we get a Pearson r^2 of 0.5 on a random-split holdout set, bootstrapped and averaged over 5 trials. You can see the misclassifications highlighted in red.
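Here is a rough sketch of what playing a game in both orientations could look like. The `play_game` name and the `predict_margin` callable are hypothetical stand-ins for the trained network. I am also assuming that the reversed prediction has its sign flipped before averaging, since it is expressed from the other team's point of view.

```python
def play_game(predict_margin, team_a, team_b):
    """Margin for team_a over team_b, averaged across both input orders.

    predict_margin is any callable that takes the concatenated feature
    vector and returns a margin from the first team's point of view.
    """
    forward_margin = predict_margin(team_a + team_b)
    reverse_margin = predict_margin(team_b + team_a)
    # The reversed game is scored for team_b, so negate it before averaging.
    return (forward_margin - reverse_margin) / 2.0


def toy_model(features):
    # Toy stand-in with a deliberate bias toward whichever team is listed first
    half = len(features) // 2
    return sum(features[:half]) - sum(features[half:]) + 1.5
```

With the toy model, `play_game(toy_model, [1.0, 2.0], [0.5, 0.5])` cancels the first-slot bias, and swapping the teams gives exactly the negated margin.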
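For reference, the two holdout metrics can be computed like this. It is a sketch with made-up function names (`sign_accuracy`, `pearson_r2`), not my evaluation script. It assumes both lists hold score margins from the first team's point of view, and it leaves out the repeated splits over 5 trials.

```python
def sign_accuracy(predicted, actual):
    # A game counts as correct when the predicted margin picks the real winner
    hits = sum(1 for p, a in zip(predicted, actual) if (p > 0) == (a > 0))
    return hits / len(actual)


def pearson_r2(predicted, actual):
    n = len(actual)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    # Squared Pearson correlation between predicted and actual margins
    return (cov * cov) / (var_p * var_a)
```

Averaging over trials would just mean calling these on each holdout split and taking the mean of the results.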
We see very good enrichment and trend, but the vertical gap is still large.
After throwing the model through LIME for model interpretability, the most important features were Adjusted Offensive Efficiency, Strength of Schedule Offense, and Strength of Schedule Defense.
Viewing as Win Probabilities
We can create a mapping between point spreads and win probabilities. Round each prediction to the nearest half point, then take the win probability for that spread to be the fraction of games at that spread where we predicted the winner correctly, pooled over our holdout cross-validation sets.
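A small sketch of that mapping is below. The names (`round_to_half`, `spread_to_win_prob`) are made up for this example. I am also assuming spreads are folded by absolute value, so each bucket answers "how often does the predicted favorite win at this spread".

```python
from collections import defaultdict


def round_to_half(value):
    # Nearest half point; exact ties follow Python's round-half-to-even rule
    return round(value * 2) / 2


def spread_to_win_prob(predicted, actual):
    """Map |predicted spread| (in half points) to the empirical hit rate."""
    counts = defaultdict(lambda: [0, 0])  # spread -> [correct, total]
    for p, a in zip(predicted, actual):
        bucket = abs(round_to_half(p))
        counts[bucket][1] += 1
        if (p > 0) == (a > 0):
            counts[bucket][0] += 1
    return {spread: correct / total
            for spread, (correct, total) in sorted(counts.items())}
```

Buckets with only a handful of games give noisy probabilities, which is why the raw curve comes out bumpy.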
I used this same technique recently for determining probabilities that molecules had satisfactory efflux properties in a drug discovery project.
You can see how we get fewer and fewer games the more of a blowout the game is.
Plotting win percentage vs game diff gives the approximate shape we would expect. However, it is a bit bumpier than I would like.
To fix this I convolved the curve with a length-3 uniform filter.
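The smoothing step is just a 3-wide moving average. Here is a minimal sketch, assuming the win percentages are already sorted by spread. The `smooth_uniform` name and the edge handling (shrinking the window at both ends instead of padding) are my own choices for the example.

```python
def smooth_uniform(values, width=3):
    """Moving average with a uniform window, shrinking the window at the edges."""
    half = width // 2
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed
```

For example, `smooth_uniform([0.0, 3.0, 0.0, 3.0])` returns `[1.5, 1.0, 2.0, 1.5]`.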
| Team | R 32 | Sweet 16 | Elite 8 | Final 4 | Championship | Champion |
Power Rankings or Matchups
Is the model learning power rankings or is it learning matchups?
This is the probability that any team beats any other team in the bracket. It is ordered by my model's own internal power ranking. You can see that it generally follows that ranking, but with slight variation.
The one outlier is my model somehow thinking that North Carolina Central will stomp on Villanova. It could be due to a bug or just that the game is so far out of the realm of games that have ever happened that my model has no basis.
Round of 64
Round of 32
Round of 16
Round of 8
Round of 2
Round of 1
Despite being based on KenPom data, my network is not nearly as high on Gonzaga as KenPom is.
The feature vector we have is lacking in a number of ways.
Now that it is done, I wish I had modelled this as a classification task. Doing so would allow me to use game win weighting as a hyper-parameter. From the results of that I could infer whether 1 point wins were actually valuable and repeatable, or just luck. I could also do all the cool Bayes stuff that 538 does for its infographic.
We can add home field advantage in this scheme fairly easily. I also didn’t encode defensive fingerprint data from KenPom as a one-hot encoded value.
These team fingerprints are also a snapshot in time; they don’t cover things like players getting injured and coming back from injury.