Analyzing unexpected outcomes 2015-2022

#1

Stephen Gray

So I've been looking at some game data to try to make sense of how the last few coaches have performed in games in which the expected winner did not win. These can be upsets, where we had the higher pregame probability of victory but failed to convert, or surprises, where we won despite not being expected to. Mostly I wanted to see what our talent looked like in those games to get an idea of how we might have been over- or underperforming relative to that. The API I'm working with can only pull talent data from 2015 on, so we miss some of Jones' early years, as well as Dooley for now.

Still working on making sense of things, but one thing to notice is that a negative diff means the other team was more talented, while a positive diff means they were less so. A very large positive diff, such as in the Georgia State and BYU games, indicates a much less talented team beat us, while the negative diff for Alabama shows that the team we beat was significantly more talented on paper.

South Carolina is about as talented as we are in theory, but if you look at the probability, we were expected to win 94% of the time, making it a rather large upset in that regard. In terms of good surprises, Pruitt has the most so far in 2019, but if you look at the talent differences, he achieved them with more talented players. If you look at his upsets, he also lost those games despite fielding significantly more talented teams. In some cases the differential was huge. Actually, our talent composite right now is the lowest of any of the years analyzed.

From 2015 at least, Jones had one surprise win, which was against a somewhat more talented Georgia team in 2015. Other than that, all of the unexpected outcomes during that period for him are upset losses, which came despite having a talent differential of 107 to 209.

Here's the API I've been working with: GitHub - CFBD/cfbd-python

If anyone is interested I can share the code.

unexpected_outcomes.png

Partial Key:
prob = Pregame win probability for Tennessee
tal1 = Tennessee talent composite
tal2 = Opponent talent composite
diff = tal1 - tal2
outcome = UPSET/SURPRISE/EXPECTED/TBD (Here EXPECTED is filtered out, UPSET is an unexpected loss, and SURPRISE is an unexpected win)
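For anyone who wants to see how the key could translate into code, here is a minimal sketch. It is not the script that produced the chart: the function names and the 0.5 cutoff for "expected to win" are assumptions made for illustration, and the numbers in the usage lines are made up.

```python
from typing import Optional


def classify_outcome(prob: float, won: Optional[bool]) -> str:
    """Label a game from Tennessee's pregame win probability and the result.

    prob: pregame win probability for Tennessee, from 0.0 to 1.0
    won:  True or False once the game is played, None if it has not been played
    """
    if won is None:
        return "TBD"
    favored = prob > 0.5  # assumed cutoff for being the expected winner
    if favored and not won:
        return "UPSET"  # unexpected loss
    if not favored and won:
        return "SURPRISE"  # unexpected win
    return "EXPECTED"


def talent_diff(tal1: float, tal2: float) -> float:
    """diff = tal1 - tal2; positive means Tennessee was more talented on paper."""
    return tal1 - tal2


# Made-up inputs, only to show the labels
labels = [
    classify_outcome(0.94, won=False),  # heavy favorite that lost
    classify_outcome(0.20, won=True),   # underdog that won
    classify_outcome(0.75, won=True),   # favorite that won
    classify_outcome(0.60, won=None),   # game not played yet
]
```

With those inputs, `labels` comes out as UPSET, SURPRISE, EXPECTED, TBD, in that order.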
 
#4
The talent rankings are interesting. I assume they are based on recruiting rankings coming out of high school? It would be cool to see rankings revised for players each year in college (I know, impossible to do). But, looking at our talent, 2020 was our highest year and then it has dropped off. Obviously we had a lot of players highly rated as recruits leave, but it made me think of this issue: JG vs HH. JG was composite rated .9628 and HH was .8903. Maybe that was accurate in high school, but obviously HH has been 10x more talented than JG at UT. Yet, my guess is the ‘talent ratings’ don’t reflect that.

Probably it all evens out across the entire team, but just interesting. Great job putting in the time with this work and sharing it!
 
#6
So using these talent metrics, GA State was a much bigger upset. Differences being spread vs. final scores, and of course GA State was early in the season, so we did not have as much to lose, aside from dignity, as this 2022 team did. This is what makes the USCe loss the 2nd worst ever, IMO. Only the 2001 SECG was worse.
 
#7
That’s a thoughtful look at the talent, but I don’t think it explains the loss. Aggregate talent does not match up against aggregate talent. It is a matchup game. We usually play that like a violin, when we can. They have some of the biggest/best receivers we've faced. We had two of our half-competent DBs out, and of course Banks.

Typically our real blitzes depend on Banks making the call on 3, 4, or 5 rushing. He keys on the RB and the fake. With him out, almost all of our blitzes were single “extra man” blitzes. The one time we blitzed an odd set, we nailed Rattler. Playing soft zone on them was agonizing to watch. Playing man resulted in touchdowns.

SC had a pretty brilliant gameplan. They kept us honest with play action, ran their slot into TE patterns, and flooded our slow safeties with two receivers in their zones. Missing those three guys made it nearly impossible to handle their passing game. Rattler also had the night of his life, the refs called a terrible game, and Hyatt, Tillman and Small were all playing hurt. It was a perfect storm, and a terrible matchup with those players gone. I didn’t like the matchup with their receivers even with us at full strength, and wondered how we’d handle them. At full strength, I think we’d have beaten them, but we’ll never know.
 
#8
Nice work, Stephen Gray!

For some of us, watching football and following a team is an emotional experience with some analysis attached. For others, it's an analytical experience with emotions attached. (I wonder if our football personalities are also reflected in how we look at cars, or make love. Please--no chart on that last one!)

So I'm one of those wretches who enjoys the analytical, but is also mathematically challenged to the nth degree. (I have no idea what "the nth degree" means, but it allows me to imply a value without using numerals, and that's my comfort zone).

Picking up on what you've already established with your chart... what caused each of those upsets and surprises?

Sometimes an unexpected trend or storyline takes over: a team has a bad start and the crowd reaction magnifies the impact, whether home or away. An example would be how LSU's opening kickoff fumble this season and our following score negated home field advantage for the rest of the game. LSU played one of their worst games thereafter, but if the fumble had simply bounced out of bounds, the rest of the game follows an entirely different storyline.

The trend or storyline can be changed when an injury removes a dominant player. But, that loss of a key player can also reshuffle the lineup so that a previously successful defensive scheme is negated (like suddenly facing running QB Matt Mauck in the '01 SEC Championship game).

Sometimes it's a few key plays--usually turnovers or explosive plays--which put points on the board for the team that's not actually prevailing on the field, either in time of possession or yardage.

Have any of these factors been digitized? Or are they more accurately related as narrative? I think narrative is the more accurate way to examine a football game, for two reasons.

With 22 players on the field, each capable of missing an assignment (or by skill, speed, power, scheme or anticipation negating someone's completion of an assignment) every football play brings nearly innumerable potential variables.

Furthermore, timing weights the value of every variable. A missed block on a running play can result in a 2-yard gain instead of a 5-yard gain. But whether it happens on 1st and 10, or on 3rd and 4, determines the value of that missed block. A missed block on a passing play can result in a rushed incompletion, a holding call, a sack, a sack and a fumble, an interception... or a quick dump off pass to a running back who breaks a tackle and goes 40 yards for a score. The missed block will count against the player's game assessment in his position meeting Monday... but its impact on the scoreboard might be zero--or even a positive!

So I don't know how deep in data you'd have to go to find revealing statistics relevant to the outcome. Sometimes, team competition is just a movie being written in real time.

It may be frowned upon by movie and stage critics, but in real life, sometimes the ending really is determined by deus ex machina.

 
#9
So using these talent metrics, GA State was a much bigger upset. Differences being spread vs. final scores, and of course GA State was early in the season, so we did not have as much to lose, aside from dignity, as this 2022 team did. This is what makes the USCe loss the 2nd worst ever, IMO. Only the 2001 SECG was worse.
Memphis 1996 was worse.
 
#12
Why do you have the Vandy game of 2022 on there tentatively?
 
#13
It's a nice chart to look at.

But football isn't played on an Excel file.
How exactly is he quantifying the talent of each team?

Recruiting rankings??

That variable alone throws the metrics off.
According to his data, a 3-7 Texas A&M team beating a 9-1 Tennessee team in 2022 would be status quo.

Emotion and mental state cannot be measured. They can only be observed, and that is a large aspect of the game of football.
 
#14
I'll speak for everyone who went to public school.

Wut?
I went to public school, too. If it’s boring to you, it’s probably not going to make sense. Honestly it’s not worth the investment in that case.

I’m interested in how that talent metric is derived. I’ll take a look at the code.
 
#16
Memphis 1996 was worse.
From an analytics point of view, that Memphis State loss would have to be one of the worst in college football history. I know people would point to the USC loss to Stanford, when Vegas had them as 40+ point favorites or so, as the worst, but Stanford was on an upward trajectory with Harbaugh. Memphis was…..well….BAD all the way around.
 
#17
That killed any shot we had of working our way back into the National Championship conversation. That Memphis team finished 4-7 including a 3 TD loss to a 5-6 Mississippi State team.
 
#18
I'm a data analytics guy. I use it for work all the time. My post was in jest.
 
#19
And we had some QB…..can’t think of his name 🤔 I was in HS and our team manager had tickets and invited me to go to that game. I’ve sure never regretted staying home 😅
 
#20
I don’t disagree with any of that, and yes it is composite recruiting rankings. Those who say football is more than that, I agree. Still it’s useful to look at as one factor. What I do think it shows is people doing more or less with athletes on average.

I don’t think it can explain something like the SC loss on its own. Who you field is not just how good they looked when recruited. Also, that’s why I am including pregame win probability as well. Those two variables are helpful to put things into a bit better perspective. I still need to look at other things to draw solid conclusions; it is not enough to function as a real model.

So Tennessee might be similar in composite talent to SC, but they also were heavily expected to win that game (94% in a prediction index). Something else I haven’t factored in at all yet is how badly you win or lose.

On the other hand, if there is a huge gap in perceived talent and you were 95% predicted to win pregame (looking at you, Pruitt, with Georgia State and BYU), or if all of your unexpected outcomes come when you have a big talent advantage, it could suggest you are doing a lot less with more.
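One rough way to put a number on "doing less with more" is to average the talent diff over each coach's upset losses. The sketch below assumes rows shaped like the chart key plus a hypothetical `coach` field; the coaches and numbers are made up and are not the real game data.

```python
from collections import defaultdict
from statistics import mean


def upset_summary(rows):
    """Return {coach: (number of upsets, average talent diff in those upsets)}."""
    diffs = defaultdict(list)
    for row in rows:
        if row["outcome"] == "UPSET":
            # diff = tal1 - tal2, same as in the chart key
            diffs[row["coach"]].append(row["tal1"] - row["tal2"])
    return {coach: (len(d), mean(d)) for coach, d in diffs.items()}


# Made-up rows, NOT real game data
rows = [
    {"coach": "Coach A", "tal1": 800.0, "tal2": 600.0, "outcome": "UPSET"},
    {"coach": "Coach A", "tal1": 800.0, "tal2": 700.0, "outcome": "UPSET"},
    {"coach": "Coach B", "tal1": 750.0, "tal2": 760.0, "outcome": "SURPRISE"},
    {"coach": "Coach B", "tal1": 750.0, "tal2": 740.0, "outcome": "UPSET"},
]

summary = upset_summary(rows)
```

A large positive average next to a high upset count would be the "less with more" pattern. It still ignores margin of victory and everything else mentioned above, so it is one factor and not a model.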

Definitely this doesn’t tell a complete story. It’s not even enough to make a good model, but it is a useful thing to look at. What I’m working on right now is play analysis. The win probabilities change during the game.

I can pull the play by play information for every game and with it comes a win probability at that point in the game using that prediction index data. Also it includes a lot of useful information that something could be done with, like down and distance and so on. Mostly right now I’m working to get the data and including simple metrics but have some thoughts. Check out collegefootballdata.com blog posts to get an idea of what I want to play with a bit.

Suppose a team gets late in the game and blows a big lead to lose. Even if they were big underdogs on paper before the game began, if they were far enough ahead that something like a prediction index had them at 90% to win and then they blew it, then sure, it looks like the pregame expectation was met, but it could mean something, especially if a team is doing things like that frequently.
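That check is simple to sketch once you have a per-play win probability series for a game. The function below is only an illustration: the name, the 0.90 default threshold, and the sample numbers are assumptions, and it knows nothing about how the prediction index itself is computed.

```python
from typing import Sequence


def blew_big_lead(win_probs: Sequence[float], lost: bool, threshold: float = 0.90) -> bool:
    """True if the team lost after its in-game win probability reached the threshold."""
    return lost and max(win_probs, default=0.0) >= threshold


# Made-up in-game win probabilities for one team, in play order (not real data)
example_probs = [0.30, 0.55, 0.80, 0.93, 0.70, 0.40, 0.05]

flag = blew_big_lead(example_probs, lost=True)
```

Here `flag` is True because the series peaked at 0.93 and the game was lost, even though the team started as an underdog at 0.30. Counting how often that flag comes up for a coach would get at the "doing things like that frequently" part.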

The reason the Vanderbilt game is on there is because I only filtered out EXPECTED in the outcome column. Since the game hasn’t been played, it was not assigned a proper outcome. Basically the code I’m playing with generates a big CSV file and I was manually filtering things. Not trying to say anything about Vanderbilt. Just an oversight on my part to not filter out the TBD.
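A filter that keeps only UPSET and SURPRISE, instead of dropping only EXPECTED, would avoid that kind of slip. Here is a minimal sketch using the standard csv module; the column names follow the chart key, while the `opponent` column and every value in the sample are made up for illustration.

```python
import csv
import io

UNEXPECTED = {"UPSET", "SURPRISE"}


def unexpected_only(csv_text: str):
    """Keep rows whose outcome is UPSET or SURPRISE; EXPECTED and TBD both drop out."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["outcome"] in UNEXPECTED]


# Made-up sample in the shape of the chart key (not real data)
sample = """opponent,prob,tal1,tal2,diff,outcome
Team W,0.94,800,790,10,UPSET
Team X,0.30,800,950,-150,SURPRISE
Team Y,0.85,800,600,200,EXPECTED
Team Z,0.70,800,650,150,TBD
"""

kept = [row["opponent"] for row in unexpected_only(sample)]
```

With this sample, `kept` holds only Team W and Team X. Listing the outcomes to keep means any new label that shows up later, like TBD did, is excluded by default.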
 
#21
Not impossible at all, just a lot of work.

Actually there are metrics like PPA/EPA expected point stuff that can be computed on a per team and per player basis and we could compute such things a lot of different ways. Some ways are simple and some complex and a complex method doesn’t necessarily mean better (but it might).

Looking at how someone, or a whole team, or a unit like offense/defense has been doing lately with numerical methods could be a better starting spot for a real model. Right now, though, I'm doing more data wrangling and getting basic metrics than drawing super useful conclusions.
 
#22
Yeah, I’ll share it. I didn’t compute it. It’s pulled from an API, the one I linked. Unfortunately his documentation is lacking, and the API is mostly just making requests to his backend, where it looks like the real computations are made. Gotta go do Thanksgiving, but I’ll try to get it up here later.

Looking at his Python library doesn’t tell you much. I -believe- it’s something like composite 247 or Rivals rankings, but I’m not 100% sure. I intend to ask on GitHub or send an email about that and the pregame predictions. Right now I’m just playing with it and thought it would be fun to share.
 
#23
Thank you for contributing to the Vol community from your interests and knowledge.

FYI:
The average turkey is 70% white meat and 30% dark meat.
41% of Americans prefer white meat, 17% dark meat, 32% both.

Whatever your preferences, Mr. Gray, may your end-of-day tally be a favorable surprise.
 
#24
Thank you! Hope you have a great one also
 
#25
How many prefer the turkey neck and giblets?
 