The fate of an NFL franchise often rests on the decisions made on draft day, as teams seek out prospects who will make an impact as pros. The pre-draft evaluation process, however, is often complicated by the fact that it takes place in an environment prone to bias. In an effort to provide more objective analysis, the NFL's Next Gen Stats team created the Next Gen Stats Draft Model, a refined and more predictive version of its 2019 model. The model aims to answer these key questions about how prospects will fare at the next level:
1) How athletic is a player, based on measurable drills at the NFL Scouting Combine?
2) How productive was a player in college based on on-field performance?
3) How big is a player relative to other players in their position group?
4) How will a player perform in the NFL based on their athleticism, production and size profiles?
Supporting the evaluation process
Before we get to the Next Gen Stats Draft Model, note that it was created to support, not replace, traditional scouting practices. And while it may be difficult to quantify every aspect of a draft-eligible prospect, the annual NFL Scouting Combine creates a unique data-collection opportunity for NFL decision makers, analysts and fans.
NFL front offices spend months parsing hundreds of quantitative and qualitative data points on any given draft-eligible prospect, some with the aid of analytical research. The trick is separating the signal (the inputs that are relevant to a player's likelihood of achieving NFL success) from the noise (those inputs with minimal predictive power). Knowing which metrics are important, how they interact and what the significant thresholds of success are provides tremendous value in the decision-making process.
Our draft model leverages the quantifiable data collected during the pre-draft process by identifying thresholds and interactions of the most important features that best predict NFL success. A prospect's chances of pro success are rarely captured by any single feature; rather, we want to focus on the collection of key traits that matter most for each position on the field. Enter composite scores.
The results of our position-specific models are transformed into composite scores, ranging from 50 to 99, representing the measurable dimensions of an NFL prospect: athleticism, college production, size and overall profile.
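The article does not spell out how model probabilities become 50-to-99 scores. As a minimal sketch, assuming (hypothetically) that a player's probability is ranked against a reference group at the same position and that the resulting percentile is rescaled linearly onto 50 to 99, the conversion could look like this. The function name `to_composite_score` is illustrative only.

```python
from bisect import bisect_right


def to_composite_score(prob, reference_probs, lo=50, hi=99):
    """Map a model probability onto a lo..hi scale by percentile rank.

    Assumption (not from the article): the score is a linear rescaling of
    the player's percentile among reference players at the same position.
    """
    ranked = sorted(reference_probs)
    # Fraction of reference players whose probability is <= this one.
    pct = bisect_right(ranked, prob) / len(ranked)
    return round(lo + pct * (hi - lo))
```

Because a percentile is relative, a 99 under this sketch would mean "at or above every reference player at the position," not a near-certain probability of becoming a starter.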
How the draft model works
It can be helpful to think of the model as several different position-specific models, tailored specifically to distinguish between the traits needed for success at each position group, with players separated into the following positions: quarterback (QB), running back (RB), wide receiver (WR), tight end (TE), offensive tackle (T), guard (G), center (C), edge (ED), defensive tackle (DT), linebacker (LB), cornerback (CB) and safety (S).
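Since each position group gets its own model, the first practical step is partitioning the data by position. A minimal sketch, assuming player records are dicts with a hypothetical `"pos"` field holding one of the position codes listed above; `split_by_position` is an illustrative name, not part of the Next Gen Stats pipeline.

```python
from collections import defaultdict

# Position codes used by the position-specific models, as listed in the article.
POSITIONS = ("QB", "RB", "WR", "TE", "T", "G", "C", "ED", "DT", "LB", "CB", "S")


def split_by_position(players):
    """Group player records so each position group can be modeled separately.

    `players` is assumed to be an iterable of dicts with a "pos" key
    (a hypothetical record layout).
    """
    groups = defaultdict(list)
    for player in players:
        pos = player["pos"]
        if pos not in POSITIONS:
            raise ValueError(f"unknown position code: {pos!r}")
        groups[pos].append(player)
    return dict(groups)
```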
The models use a gradient-boosted decision-tree algorithm called XGBoost to predict the likelihood that a player will become an NFL starter or Pro Bowler within the first three seasons of his career. The models are trained on historical data from the NFL combine (since 2003) and on-field college statistics (since 2005), with rigorous feature selection techniques applied to each position-specific model. The resulting probabilities are converted into composite scores for each player (athleticism, production, size and a final overall score), driven by the key traits that best predict NFL success.
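The prediction target described above is binary: did the player become a starter or a Pro Bowler within his first three seasons? The article doesn't define what counts as a starter, so the sketch below uses a hypothetical `min_starts` cutoff. The `Season` record and `success_label` function are illustrative names, not the model's actual labeling code.

```python
from dataclasses import dataclass


@dataclass
class Season:
    games_started: int
    pro_bowl: bool


def success_label(seasons, min_starts=8):
    """Return 1 if the player was a starter or Pro Bowler in any of his
    first three NFL seasons, else 0.

    Assumption (hypothetical): a "starter" season means at least
    `min_starts` games started; the article does not state the cutoff.
    """
    first_three = seasons[:3]  # later seasons are ignored
    return int(any(s.pro_bowl or s.games_started >= min_starts for s in first_three))
```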
The XGBoost algorithm doesn't just build one decision tree; it builds hundreds to thousands of trees, learning more about the relationship between features and the output with every iteration. In recent years, threshold analysis . A running back prospect doesn't necessarily have to be the fastest or biggest player at his position; he just has to be fast enough and big enough to succeed at running back. This theory aligns with the objective of decision trees, which is to find the series of splits and interactions between features that most improve model accuracy. Monotonic constraints were also critical for the results to be interpretable: they ensure the relationship between our inputs (e.g., 40-yard dash times) and output (starter/Pro Bowler probability) follows a direction aligned with our understanding of the data.
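To make the threshold and monotonicity ideas concrete, here is a toy, pure-Python gradient-boosting loop on a single feature (40-yard dash time). It is not XGBoost itself, but it borrows XGBoost's second-order split gain and leaf weights (with an L2 penalty `lam`) and enforces a decreasing monotone constraint by rejecting any split where the slower side would get the higher weight. Each tree is a one-split stump, so every tree learns a "fast enough" threshold. The data and all function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, grads, hess, lam, decreasing):
    """Find the single threshold split with the best second-order gain.

    grads are y - p (negative log-loss gradients), hess are p * (1 - p).
    If `decreasing` is True, reject splits whose left leaf (smaller x)
    would get a lower weight than the right leaf.
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    G, H = sum(grads), sum(hess)
    best = None  # (gain, threshold, left_weight, right_weight)
    gl = hl = 0.0
    for k in range(len(order) - 1):
        i, j = order[k], order[k + 1]
        gl += grads[i]
        hl += hess[i]
        if xs[i] == xs[j]:
            continue  # cannot split between equal values
        gr, hr = G - gl, H - hl
        wl, wr = gl / (hl + lam), gr / (hr + lam)
        if decreasing and wl < wr:
            continue  # would let a slower time score higher
        gain = gl**2 / (hl + lam) + gr**2 / (hr + lam) - G**2 / (H + lam)
        if best is None or gain > best[0]:
            best = (gain, (xs[i] + xs[j]) / 2, wl, wr)
    return best


def boost(xs, ys, rounds=50, lr=0.3, lam=1.0):
    """Fit an additive model of monotone stumps; returns (bias, stumps)."""
    base = sum(ys) / len(ys)  # must be strictly between 0 and 1
    f0 = math.log(base / (1 - base))
    raw = [f0] * len(xs)
    stumps = []
    for _ in range(rounds):
        probs = [sigmoid(r) for r in raw]
        grads = [y - p for y, p in zip(ys, probs)]
        hess = [p * (1 - p) for p in probs]
        stump = fit_stump(xs, grads, hess, lam, decreasing=True)
        if stump is None:
            break
        _, t, wl, wr = stump
        stumps.append((t, lr * wl, lr * wr))
        raw = [r + (lr * wl if x <= t else lr * wr) for r, x in zip(raw, xs)]
    return f0, stumps


def predict(model, x):
    f0, stumps = model
    return sigmoid(f0 + sum(wl if x <= t else wr for t, wl, wr in stumps))


# Hypothetical 40-yard times (seconds) and starter labels, for illustration only.
times = [4.35, 4.40, 4.42, 4.45, 4.50, 4.55, 4.60, 4.65]
starter = [1, 1, 0, 1, 1, 0, 0, 0]
model = boost(times, starter)
print("first threshold:", model[1][0][0])
print("P(starter) at 4.40 vs 4.60:", predict(model, 4.40), predict(model, 4.60))
```

On this toy data the first stump splits between 4.50 and 4.55 seconds, even though one 4.42 runner in the sample is labeled a non-starter; because every stump is non-increasing in time, their sum is too, so no slower time ever earns a higher probability. In the real XGBoost library, the analogous setting is the `monotone_constraints` parameter.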
The results
So how did the models perform when the new versions were applied retroactively to the 2019 draft class? Thirteen combine participants earned a final draft score of 92 or higher, and all 13 started at least four games as rookies: ED Josh Allen (99; 4 starts, 16 games), S Juan Thornhill (99; 16 starts), TE Noah Fant (99; 11 starts, 16 games), DT Quinnen Williams (98; 9 starts, 13 games), DT Dexter Lawrence (98; 16 starts), LB Devin White (98; 13 starts, 13 games), LB Devin Bush (95; 15 starts, 16 games), ED Montez Sweat (95; 16 starts), DT Ed Oliver (95; 7 starts, 16 games), WR Deebo Samuel (94; 11 starts, 15 games), WR Marquise Brown (93; 11 starts, 14 games), CB Jamel Dean (93; 5 starts, 13 games) and QB Gardner Minshew (92; 12 starts, 14 games).
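The claim in that paragraph is simple arithmetic over the listed players, so it can be checked directly. This sketch transcribes only the 13 players named above (final draft score and rookie starts), so it confirms the "at least four starts" figure but cannot, by itself, confirm that no other 2019 participant scored 92 or higher. `check_cutoff` is an illustrative name.

```python
# (player, position, final draft score, rookie starts), transcribed from the list above.
ROOKIES_2019 = [
    ("Josh Allen", "ED", 99, 4),
    ("Juan Thornhill", "S", 99, 16),
    ("Noah Fant", "TE", 99, 11),
    ("Quinnen Williams", "DT", 98, 9),
    ("Dexter Lawrence", "DT", 98, 16),
    ("Devin White", "LB", 98, 13),
    ("Devin Bush", "LB", 95, 15),
    ("Montez Sweat", "ED", 95, 16),
    ("Ed Oliver", "DT", 95, 7),
    ("Deebo Samuel", "WR", 94, 11),
    ("Marquise Brown", "WR", 93, 11),
    ("Jamel Dean", "CB", 93, 5),
    ("Gardner Minshew", "QB", 92, 12),
]


def check_cutoff(players, min_score=92):
    """Return (count of players at or above min_score, fewest rookie starts among them)."""
    selected = [p for p in players if p[2] >= min_score]
    return len(selected), min(p[3] for p in selected)


count, fewest_starts = check_cutoff(ROOKIES_2019)
print(f"{count} players scored {92}+; fewest rookie starts: {fewest_starts}")
```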
We're still awaiting the results from the 2020 NFL Scouting Combine, but in the meantime, it's worth taking a look at the college production of perhaps the deepest position in this year's class: the wide receivers. At the top of the production score list, we find three players tied with a mark of 94: CeeDee Lamb (Oklahoma), Jerry Jeudy (Alabama) and Tyler Johnson (Minnesota). Since all three prospects also have similar size scores (between 75 and 78), the athleticism drills at this weekend's combine in Indianapolis will be the ultimate sorter of the top receivers by final draft score. As of this writing, Tyler Johnson is not expected to run the 40-yard dash when receivers work out on Thursday, leaving CeeDee Lamb and Jerry Jeudy as the front-runners for the spot atop our rankings.
What's next?
Throughout the combine and the days leading up to April's draft, we will break down the results of the Next Gen Stats Draft Model for the 2020 draft class. We will highlight the top prospects at each position, break down the position-based traits that equate to pro success, compare this year's class to prospects of the past, identify the most likely studs and duds, and so much more.
Follow Next Gen Stats on Twitter for updates in the weeks leading up to the 2020 NFL Draft.