I built a machine learning model to predict this year's tournament games

boboguitar

Joined: Jul 6, 2005

Posts: 37,773

1:14p, 3/21/19

The main algorithm predictions

Algorithm 2

Algorithm 3

Algorithm 4

The Code

The algorithms have been about 71%-73% correct on test data (that percentage is for individual matchups, not % of the entire bracket). It will be interesting to see how they do.
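For readers unfamiliar with the distinction, matchup accuracy just counts how many individual games were called correctly. A minimal sketch, assuming predictions and results are stored as parallel lists of winners (the function name and the team labels are hypothetical, not taken from the linked code):

```python
def matchup_accuracy(predicted_winners, actual_winners):
    """Fraction of individual games where the predicted winner matches the actual winner."""
    if len(predicted_winners) != len(actual_winners):
        raise ValueError("need exactly one prediction per game")
    if not actual_winners:
        raise ValueError("need at least one game")
    correct = sum(p == a for p, a in zip(predicted_winners, actual_winners))
    return correct / len(actual_winners)


# Hypothetical team labels, not real predictions or results.
predicted = ["A", "C", "E", "G"]
actual = ["A", "D", "E", "G"]
accuracy = matchup_accuracy(predicted, actual)  # 3 of 4 games correct
```

Scoring a whole bracket is different because one early miss carries forward into later rounds, which is why the two percentages are not comparable.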

DeangeloVickers

Joined: Sep 17, 2002

Posts: 48,038

2:04p, 3/21/19

Chalk

czar_iv

Joined: Sep 1, 2003

Posts: 12,134

3:16p, 3/21/19

What are your inputs to your model?

"Can I Ask What Exactly Is An Aggie? Sure! An Aggie is quite simply the best thing anyone can strive to be!" - Sydney Colson

boboguitar

Joined: Jul 6, 2005

Posts: 37,773

In reply to czar_iv • 3:23p, 3/21/19

czar_iv said:
What are your inputs to your model?

RPI, SOS, Regular season win %, last 10 games win %, seed, and conference. Would have loved to add a bunch more but I did this in about 3 days in my free time. I'll start much earlier next year.

The data used is from 2000 - 2018 to train/test it.
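The post does not say how the two teams' numbers were combined into a single training row. One common approach, shown here purely as an assumption, is to take the difference of each numeric input between the two teams. The field names and sample values below are made up for illustration:

```python
# Numeric inputs named in the post; the dictionary keys are hypothetical.
FEATURES = ["rpi", "sos", "season_win_pct", "last10_win_pct", "seed"]


def matchup_features(team_a, team_b):
    """Return per-feature differences (team_a minus team_b) for one game."""
    return [team_a[name] - team_b[name] for name in FEATURES]


# Made-up numbers, not real team statistics.
team_a = {"rpi": 0.65, "sos": 0.58, "season_win_pct": 0.80,
          "last10_win_pct": 0.70, "seed": 2}
team_b = {"rpi": 0.52, "sos": 0.49, "season_win_pct": 0.72,
          "last10_win_pct": 0.80, "seed": 15}

row = matchup_features(team_a, team_b)
```

Each row would then be paired with a label saying whether team A won. Conference is categorical, so it would need a separate encoding rather than a subtraction.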

3rdGen2015

Joined: Jan 24, 2010

Posts: 4,886

5:01p, 3/21/19

I was in the same boat. I've never done Kaggle competitions before so I thought this would be fun. Put together a rough model over last weekend.

Don't remember what all I wound up using off the top of my head, but it was something like:

Win %
Seed
Offensive 3PT Rate
Defensive 3PT Rate
Rebounding Differential
ORating
DRating
Pace

I wanted to do a lot more and maybe next year I'll take the full play by play data and see what kind of interesting things I can find through that. The test accuracy on the model that includes the above was also in the low 70% range, so it's interesting that there's not much difference between our two methods.

boboguitar

Joined: Jul 6, 2005

Posts: 37,773

In reply to 3rdGen2015 • 5:12p, 3/21/19

One thing that surprised me was that I one-hot encoded the conferences and it made almost no difference in the test accuracy.

If you'd like, PM me and I can share my final transformed data and you can use it in your model. It would be interesting to see if any better accuracy came out of it.
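For anyone unfamiliar with the term, one-hot encoding turns a categorical value such as a conference into a vector of 0s and 1s with a single 1 marking the category. A minimal sketch in plain Python (the helper name and the three-conference list are illustrative only; the linked code may well use a library for this):

```python
def one_hot(value, categories):
    """Return a list with 1 at the position of `value` and 0 everywhere else."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == category else 0 for category in categories]


# Illustrative subset; a real model would list every conference in the data.
conferences = ["ACC", "Big 12", "SEC"]
encoded = one_hot("SEC", conferences)
```

With a full conference list this adds one column per conference, so it widens the input considerably relative to the handful of numeric features.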

DollahBillzYo!

Joined: Mar 15, 2019

Posts: 228

12:34p, 3/23/19

This is pretty cool - love to see some metrics.

Goose

Joined: Nov 21, 2000

Posts: 34,556

In reply to boboguitar • 2:34p, 3/23/19

boboguitar said:
One thing that surprised me was that I one-hot encoded the conferences and it made almost no difference in the test accuracy.

If you'd like, PM me and I can share my final transformed data and you can use it in your model. It would be interesting to see if any better accuracy came out of it.

I would assume the impact of conference is mostly accounted for by SOS.

EXCELL

Joined: Jan 6, 2018

Posts: 1,504

3:26p, 3/23/19

So how is this workin' for ya ???

boboguitar

Joined: Jul 6, 2005

Posts: 37,773

In reply to EXCELL • 3:30p, 3/23/19

EXCELL said:
So how is this workin' for ya ???

78.7% correct as of now. A little over test accuracy. It was 72% at the end of yesterday.

There are a lot more inputs I could have fed it given more time. It just takes a while to build/transform the data into the form you want. I'll probably start building one for cfb starting in June or July.

WestAustinAg

Joined: Aug 24, 2001

Posts: 32,560

In reply to boboguitar • 10:12a, 3/24/19

boboguitar said:
czar_iv said:
What are your inputs to your model?
RPI, SOS, Regular season win %, last 10 games win %, seed, and conference. Would have loved to add a bunch more but I did this in about 3 days in my free time. I'll start much earlier next year.

The data used is from 2000 - 2018 to train/test it.

I wonder if game site location plays any factor in outcomes.

GrayMatter

Joined: Mar 29, 2007

Posts: 5,973

10:31a, 3/24/19

Interesting. How much of a factor is experience of a team with upperclassmen and/or teams with tournament experience?

The conversations will be uncomfortable, but we all have to get comfortable with being uncomfortable for progress to be made.

3rdGen2015

Joined: Jan 24, 2010

Posts: 4,886

In reply to WestAustinAg • 1:41p, 3/24/19

WestAustinAg said:
boboguitar said:
czar_iv said:
What are your inputs to your model?
RPI, SOS, Regular season win %, last 10 games win %, seed, and conference. Would have loved to add a bunch more but I did this in about 3 days in my free time. I'll start much earlier next year.

The data used is from 2000 - 2018 to train/test it.

I wonder if game site location plays any factor in outcomes.

I read something on 538 saying there is data that shows teams play worse the further they have to travel. I don't know how significant of an effect it is though.
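If someone wanted to test the travel idea, distance from campus to the game site could be added as a model input. A rough sketch using the haversine great-circle formula; the function name is hypothetical, and straight-line distance is only a stand-in for actual travel:

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def travel_distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_MILES * math.asin(min(1.0, math.sqrt(a)))


# Sanity check: a quarter of the way around the equator.
quarter_equator = travel_distance_miles(0.0, 0.0, 0.0, 90.0)
```

The difference in travel distance between the two teams in a matchup could then sit alongside the other inputs, and the test accuracy would show whether it helps.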