PPP: Points per Possession; YPP: Yards per Play; I40/Inside40: Points per trip inside the 40-yard line
The final rating is made up of three separate components: Opponent Based Performance (OBP), Offensive Efficiency (Oeff) and Defensive Efficiency (Deff). Oeff and Deff are built from the same stats, each for its respective side of the ball. They include points per possession, yards per play, points per trip inside the 40-yard line, turnovers, yards penalized and a few others. Each stat is given a weight that was determined by its correlation with winning. OBP is based solely on the outcome of the game: straight wins and losses.
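As a rough illustration of how weighted stats could roll up into an efficiency number, here is a minimal Python sketch. The weights, the stat names and the `efficiency` function are all hypothetical: the actual weights and the exact way the stats are combined are not published here, and a weighted average is simply one plausible reading.

```python
# Hypothetical weights for illustration only; the real weights are not published.
OFFENSE_WEIGHTS = {
    "points_per_possession": 0.40,
    "yards_per_play": 0.30,
    "points_inside_40": 0.20,
    "turnovers": 0.10,
}


def efficiency(stat_values, weights):
    """Weighted average of per-stat values, each assumed to be scaled 0..1."""
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[name] * stat_values[name] for name in weights)
    return weighted_sum / total_weight


# Made-up per-stat values for one team's offense.
example_offense = {
    "points_per_possession": 0.8,
    "yards_per_play": 0.7,
    "points_inside_40": 0.6,
    "turnovers": 0.5,
}
oeff = efficiency(example_offense, OFFENSE_WEIGHTS)
print(round(oeff, 3))
```

The same function would serve for Deff with a defensive set of values. How OBP, Oeff and Deff are then merged into the final rating is left out, since that step is not described.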
Strength of Schedule is the average rating of a team's opponents. This statistic plays no part in determining the rating of any team. It is merely there to satisfy any curiosity one might have about the difficulty of a particular schedule.
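Since Strength of Schedule is just an average, it can be sketched in a few lines of Python. The ratings and the schedule below are made-up values, and `strength_of_schedule` is an illustrative name rather than anything used by the site.

```python
from statistics import mean


def strength_of_schedule(schedule, ratings):
    """Average rating of the opponents on a team's schedule."""
    return mean(ratings[opponent] for opponent in schedule)


# Made-up ratings, for illustration only.
example_ratings = {"Iowa": 0.6, "Kansas": 0.4, "Baylor": 0.8}
example_schedule = ["Iowa", "Kansas", "Baylor"]

sos = strength_of_schedule(example_schedule, example_ratings)
print(round(sos, 3))
```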
The approach to determining whether an in-game performance was good or bad, regardless of outcome, is to compare it directly with the performances other teams had against the same opponent. For each opponent a team has, there are 11-13 additional teams whose performance can be directly compared. That is a dozen additional connections we can make for each team. The transitive property doesn't work that well when connecting two teams through only one common opponent, but as we add more data points to the equation, the comparisons become a little more reliable.
In the Interactive Chart below, Iowa State's (ISU) schedule is laid out and hypothetical values are given. Each statistic is a value from 0 to 1, determined by ISU's performance relative to other teams' performances against the same common opponent. So where ISU has a value of 0.7 for Yards per Play against Iowa, that means ISU had an above-average performance relative to Iowa's other opponents in that specific statistic.
(Interactive Chart: Iowa State; permanent feature to be added for all teams in the future)
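To make the 0-to-1 values a little more concrete, here is one way such a number could be produced. This is a sketch under a loud assumption: it uses a simple percentile rank (the share of the opponent's other games in which the team did better, with ties counting half), which may not be the formula actually used. The yards-per-play numbers are made up, and ten comparison teams are used only to keep the arithmetic easy.

```python
def relative_value(team_stat, other_stats, higher_is_better=True):
    """Share of the common opponent's other games that this team outperformed.

    Assumption: a plain percentile rank, ties counted as half.
    Returns a value from 0 to 1; 0.5 means roughly average.
    """
    if not other_stats:
        return 0.5  # nothing to compare against, so call it average
    if not higher_is_better:
        # Flip the sign so that "bigger is better" holds for stats like turnovers.
        team_stat = -team_stat
        other_stats = [-value for value in other_stats]
    better = sum(1 for value in other_stats if team_stat > value)
    ties = sum(1 for value in other_stats if team_stat == value)
    return (better + 0.5 * ties) / len(other_stats)


# Made-up yards per play gained against Iowa by Iowa's other opponents.
others_vs_iowa = [4.2, 4.8, 5.0, 5.3, 5.5, 5.7, 5.9, 6.4, 6.8, 7.0]
isu_ypp_value = relative_value(6.1, others_vs_iowa)
print(isu_ypp_value)
```

With these made-up inputs, ISU's 6.1 yards per play beats seven of the ten other performances, which gives the 0.7 used in the example above.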
Is this better or worse than other rating systems?
Over at MasseyRatings, there is a nice table that compiles the rankings of 117 computer systems, finds their mean and then sorts the teams accordingly. What we end up seeing is that the rating systems tend to agree at the top and bottom of the ranking while showing increased variation towards the middle. This is hardly surprising, as teams at either end have less room to “roam” on one of their sides, while a team in the middle can just as easily be placed in either direction from where it currently sits. It is the same reason why it is easier to lower your grade point average than to raise it.
(Standard deviation of composite rankings for each team)
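The composite and its spread are easy to reproduce in miniature. The Python sketch below averages each team's rank across systems, sorts by that average and reports the standard deviation per team. The three systems and three teams are made up, and using the population standard deviation (`pstdev`) is an assumption; the source table may use a different convention.

```python
from statistics import mean, pstdev


def composite(rankings_by_system):
    """Return (team, mean_rank, std_dev) rows sorted by mean rank.

    rankings_by_system maps system name -> {team: rank}; every system
    is assumed to rank the same set of teams.
    """
    systems = list(rankings_by_system.values())
    rows = []
    for team in systems[0]:
        ranks = [system[team] for system in systems]
        rows.append((team, mean(ranks), pstdev(ranks)))
    return sorted(rows, key=lambda row: row[1])


# Made-up rankings from three hypothetical systems.
example_rankings = {
    "System A": {"Team X": 1, "Team Y": 2, "Team Z": 3},
    "System B": {"Team X": 1, "Team Y": 3, "Team Z": 2},
    "System C": {"Team X": 2, "Team Y": 1, "Team Z": 3},
}
composite_rows = composite(example_rankings)
for team, mean_rank, spread in composite_rows:
    print(team, round(mean_rank, 2), round(spread, 2))
```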
Comparing the composite ranking with the Goothrey System, we see a distribution very similar to the one above. However, the differences are slightly more erratic, as there are direct and near-direct agreements throughout the spread instead of just at the ends.
What does this mean? Well, not a whole lot. But it should demonstrate that the ratings produced here are at least reasonably in line with other established systems. What makes one system more credible than another? Its ability to predict the future? Its ability to replicate the past? Its relation to human polls?
(Difference between composite rank and "Goothrey's" rank for each team)
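The comparison behind that chart is simple subtraction, sketched below in Python. The ranks are made up and the function names are illustrative; the mean absolute difference at the end is an extra summary added here for convenience, not something the chart itself reports.

```python
def rank_differences(composite_rank, system_rank):
    """Per-team difference: this system's rank minus the composite rank."""
    return {
        team: system_rank[team] - composite_rank[team]
        for team in composite_rank
        if team in system_rank
    }


def mean_absolute_difference(differences):
    """Average size of the disagreement, ignoring direction."""
    return sum(abs(value) for value in differences.values()) / len(differences)


# Made-up ranks, for illustration only.
example_composite = {"Team X": 1, "Team Y": 2, "Team Z": 3}
example_system = {"Team X": 1, "Team Y": 3, "Team Z": 2}

diffs = rank_differences(example_composite, example_system)
print(diffs)
print(round(mean_absolute_difference(diffs), 2))
```

A difference of 0 is a direct agreement with the composite; positive values mean the system ranks the team lower than the composite does.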
Which rating system is better? That entirely depends on what any one particular rating system is trying to measure. Oddly enough, they don’t always line up. “Which team is the best” seems like a relatively simple question. But for a sport with such a dynamic landscape, it isn’t. Pitting them all against one another and against the system presented on this site is merely a fun exercise. Perhaps they should all measure the same thing. Perhaps what we will see more of are attempts to emulate the responsibility of the Playoff Committee. In college football’s current format (for FBS), is that what truly matters now?