I built a machine learning model to predict this year's tournament games

2,682 Views | 11 Replies | Last: 5 yr ago by 3rdGen2015
boboguitar
The main algorithm predictions

Algorithm 2

Algorithm 3

Algorithm 4

The Code

The algorithms have been about 71%-73% correct on test data (that percentage is per matchup, not for the entire bracket). It will be interesting to see how they do.
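
To make the per-matchup metric concrete, here is a minimal sketch of how such an accuracy figure could be computed. The function name and the sample picks are made up for illustration and are not taken from the linked code.

```python
def matchup_accuracy(predicted, actual):
    """Fraction of games where the predicted winner matches the actual winner."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("need at least one game")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(predicted)


# Made-up example: four games, three of them called correctly.
predicted = ["Team A", "Team C", "Team E", "Team G"]
actual = ["Team A", "Team C", "Team E", "Team H"]
accuracy = matchup_accuracy(predicted, actual)
print(f"{accuracy:.1%}")  # prints 75.0%
```

Each game counts once regardless of round, which is why this number can look very different from a bracket score, where one early miss also costs every later game that depends on it.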
DeangeloVickers
Chalk


czar_iv
What are your inputs to your model?
"Can I Ask What Exactly Is An Aggie? Sure! An Aggie is quite simply the best thing anyone can strive to be!" - Sydney Colson
boboguitar
czar_iv said:

What are your inputs to your model?
RPI, SOS, Regular season win %, last 10 games win %, seed, and conference. Would have loved to add a bunch more but I did this in about 3 days in my free time. I'll start much earlier next year.

I used data from 2000 to 2018 to train and test it.
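
As a rough sketch of how inputs like these might be turned into one row per game, the numeric stats of the two teams can be subtracted from each other. Everything here is an assumption for illustration: the `TeamSeason` fields, the difference encoding, and the split by season are hypothetical and may not match what was actually done. Conference is left out because it is categorical.

```python
from dataclasses import dataclass


@dataclass
class TeamSeason:
    """Hypothetical per-team inputs for one season."""
    rpi: float
    sos: float
    win_pct: float
    last10_win_pct: float
    seed: int


NUMERIC_FIELDS = ("rpi", "sos", "win_pct", "last10_win_pct", "seed")


def matchup_features(team_a, team_b):
    """Encode a game as team A's stats minus team B's stats."""
    return [getattr(team_a, f) - getattr(team_b, f) for f in NUMERIC_FIELDS]


def split_by_season(games, first_test_season):
    """Hold out the most recent seasons for testing (an assumed strategy)."""
    train = [g for g in games if g["season"] < first_test_season]
    test = [g for g in games if g["season"] >= first_test_season]
    return train, test


# Made-up numbers, not real team data.
team_a = TeamSeason(rpi=0.65, sos=0.58, win_pct=0.80, last10_win_pct=0.70, seed=2)
team_b = TeamSeason(rpi=0.55, sos=0.52, win_pct=0.70, last10_win_pct=0.50, seed=7)
features = matchup_features(team_a, team_b)

games = [{"season": 2000}, {"season": 2017}, {"season": 2018}]
train, test = split_by_season(games, first_test_season=2017)
```

Holding out whole seasons rather than random games keeps results from the same tournament from appearing on both sides of the split.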
3rdGen2015
I was in the same boat. I've never done Kaggle competitions before so I thought this would be fun. Put together a rough model over last weekend.

Don't remember what all I wound up using off the top of my head, but it was something like:

  • Win %
  • Seed
  • Offensive 3PT Rate
  • Defensive 3PT Rate
  • Rebounding Differential
  • ORating
  • DRating
  • Pace

I wanted to do a lot more, and maybe next year I'll take the full play-by-play data and see what kind of interesting things I can find through that. The test accuracy on the model that includes the above was also in the low 70% range, so it's interesting that there's not much difference between our two methods.
boboguitar
One thing that surprised me was that I one-hot encoded the conferences and it made almost no difference in the test accuracy.
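
For anyone unfamiliar with the term, one-hot encoding turns a categorical column such as conference into one 0/1 column per category. Below is a minimal pure-Python sketch; the helper name and the sample conference list are illustrative only, and a library encoder would normally do this job.

```python
def one_hot_encode(values):
    """Return the sorted category list and one 0/1 row per input value."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # mark the column for this value's category
        rows.append(row)
    return categories, rows


# Illustrative input: one conference label per team.
conferences = ["SEC", "Big 12", "ACC", "SEC"]
categories, encoded = one_hot_encode(conferences)
print(categories)  # ['ACC', 'Big 12', 'SEC']
print(encoded)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
```

With several dozen conferences this adds several dozen sparse columns, most of which are zero for any given team.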

If you'd like, PM me and I can share my final transformed data and you can use it in your model. It would be interesting to see if any better accuracy came out of it.
DollahBillzYo!
This is pretty cool - love to see some metrics.
Goose
boboguitar said:

One thing that surprised me was that I one-hot encoded the conferences and it made almost no difference in the test accuracy.

If you'd like, PM me and I can share my final transformed data and you can use it in your model. It would be interesting to see if any better accuracy came out of it.


I would assume the impact of conference is mostly accounted for by SOS.
EXCELL
So how is this workin' for ya ???
boboguitar
EXCELL said:

So how is this workin' for ya ???


78.7% correct as of now. A little over test accuracy. It was 72% at the end of yesterday.

There are a lot more inputs I could have fed it given more time. It just takes a while to build/transform the data into the form you want. I'll probably start building one for cfb starting in June or July.
WestAustinAg
boboguitar said:

czar_iv said:

What are your inputs to your model?
RPI, SOS, Regular season win %, last 10 games win %, seed, and conference. Would have loved to add a bunch more but I did this in about 3 days in my free time. I'll start much earlier next year.

The data used is from 2000 - 2018 to train/test it.

I wonder if game site location plays any factor in outcomes.
GrayMatter
Interesting. How much of a factor is experience of a team with upperclassmen and/or teams with tournament experience?
The conversations will be uncomfortable, but we all have to get comfortable with being uncomfortable for progress to be made.
3rdGen2015
WestAustinAg said:

boboguitar said:

czar_iv said:

What are your inputs to your model?
RPI, SOS, Regular season win %, last 10 games win %, seed, and conference. Would have loved to add a bunch more but I did this in about 3 days in my free time. I'll start much earlier next year.

The data used is from 2000 - 2018 to train/test it.

I wonder if game site location plays any factor in outcomes.

I read something on 538 saying there is data that shows teams play worse the farther they have to travel. I don't know how significant the effect is, though.
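
If someone wanted to test that, one way to build a travel feature is the great-circle distance from each campus to the game site. This is only a sketch under assumptions: the haversine formula treats the Earth as a sphere, the coordinates below are approximate or made up, and none of this comes from the models discussed above.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, spherical approximation


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


# Approximate coordinates for College Station, TX, and a hypothetical game site.
campus = (30.6, -96.3)
game_site = (36.1, -86.8)  # made-up location for illustration
travel_miles = haversine_miles(*campus, *game_site)
print(round(travel_miles))
```

The difference between the two teams' travel distances could then be added as one more input column alongside the others.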