Sunday, February 23, 2014

DataPuck: A Primer to Hockey’s Place in the World of Big Data

At next weekend’s MIT Sloan Sports Analytics Conference, research papers on the “datafication” of baseball and basketball will show how those respective sports are utilizing new data streams and techniques to deliver new analytical insights on evaluating player performance. Hockey fans should not expect the same.

"Tilted Ice", the only hockey paper accepted for Sloan, will present an insightful look at how teams (although not individual players) change on-ice behavior in the third periods of contests. There will additionally be a panel on Friday dubbed “Hockey Analytics: Out of the Ice Age” that will discuss the use of analytical judgement for individual player evaluation. Compared with recent developments in basketball, however, hockey is still a ways away from a warm period.

Take, for example, EPV (short for Expected Possession Value), a methodology to be presented in the Sloan research paper "POINTWISE" that uses optical tracking data from the SportVU cameras the NBA installed earlier this year in collaboration with STATS LLC to evaluate player performance and decision making. Another Sloan 2014 research paper, "The Hot Hand", also uses the NBA's optical tracking data to evaluate a different measure of performance (streakiness). While the use of optical tracking cameras in the NBA is old news, the NBA's D-League announced a few weeks ago that four teams will pilot small, wearable devices for tracking player movement and other bio-related indicators.
   
For a hockey fan, the NBA’s use of "big data" collection and analysis techniques is a woeful reminder of just how far hockey analytics lags behind its peers. How to define big data? For tech companies, the phrase is generally used to describe data whose size exceeds the capacity of traditional databases and analytics tools – a bit of a silly definition, since it just means today’s big data analysis tools are tomorrow’s data analysis tools. Perhaps a better definition comes from the 2013 Financial Times Book of the Year finalist "Big Data", which suggests the revolution is not in the tools themselves. While increasingly sophisticated business intelligence and analytics tools help make big data analysis more economically feasible for businesses and individual statisticians alike, the real revolution is the world’s shift towards capturing far more streams of data – all the data – and empirically analyzing it.

The NBA’s pioneering usage of machine-generated data (as opposed to human-generated data from sources such as emails, photos, tweets etc.) highlights what I believe will be the most common way sports will expand to harness new data sources. While POINTWISE and The Hot Hand will spark discussions at Sloan around additional use cases for basketball, the papers should also spark discussions around the use of optic-tracking cameras in other fluid, fast-paced sports, such as hockey.

The Next Wave of Hockey Datafication


If I were a betting man, I would wager that the hockey analytics panel at Sloan will only briefly touch on the potential of optic-tracking cameras for the NHL, and only if the question is raised. Based on the panel description, the discussion will focus more on expanding adoption of current hockey analytics tools (such as Corsi, Fenwick and PDO, popularized by Behind the Net and Extra Skater) among hockey decision makers. However, using machine-generated data to create a hockey equivalent of the EPV model with 'big data' is an intriguing idea for the NHL to pilot, at least in theory.
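For readers new to these stats, here is a minimal sketch of how the three are usually computed from on-ice event counts. It assumes the common definitions (Corsi counts all shot attempts, Fenwick drops blocked attempts, PDO adds on-ice shooting percentage to on-ice save percentage). The `OnIceTotals` class, the function names and the numbers in the example are invented for illustration and are not taken from Behind the Net or Extra Skater.

```python
from dataclasses import dataclass


@dataclass
class OnIceTotals:
    """Hypothetical event counts while a player (or team) is on the ice."""
    goals_for: int
    goals_against: int
    shots_for: int        # shots on goal, goals included
    shots_against: int
    missed_for: int       # attempts that missed the net
    missed_against: int
    blocked_for: int      # our attempts that the opponent blocked
    blocked_against: int  # opponent attempts that we blocked


def corsi_for_pct(t: OnIceTotals) -> float:
    """Share of all shot attempts (on goal + missed + blocked) taken by our side."""
    cf = t.shots_for + t.missed_for + t.blocked_for
    ca = t.shots_against + t.missed_against + t.blocked_against
    return cf / (cf + ca)


def fenwick_for_pct(t: OnIceTotals) -> float:
    """Same as Corsi, but blocked attempts are left out."""
    ff = t.shots_for + t.missed_for
    fa = t.shots_against + t.missed_against
    return ff / (ff + fa)


def pdo(t: OnIceTotals) -> float:
    """On-ice shooting percentage plus on-ice save percentage."""
    shooting_pct = t.goals_for / t.shots_for
    save_pct = 1 - t.goals_against / t.shots_against
    return shooting_pct + save_pct


# Made-up totals, purely to show the arithmetic.
example = OnIceTotals(
    goals_for=3, goals_against=2,
    shots_for=30, shots_against=25,
    missed_for=12, missed_against=10,
    blocked_for=8, blocked_against=5,
)
print(round(corsi_for_pct(example), 3))
print(round(fenwick_for_pct(example), 3))
print(round(pdo(example), 3))
```

All three are built from shot events alone, which is why they can only serve as proxies for what a player does with and without the puck.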

The idea of an EPV isn't new. At Sloan last year, a research paper presented a methodology, dubbed Total Hockey Rating (THoR), that aimed to evaluate NHL players based on the idea that each player contributes to the probability that a goal is scored or prevented. Of course, a study suggesting Tyler Kennedy was the third most valuable player in the NHL from 2010-2012 should raise some eyebrows. While THoR helps push hockey analytics forward with a sound methodology, its use of hits in that methodology reflects the real weakness facing hockey analytics – the stats that are easily measurable don’t necessarily reflect a player's true value away from when and where a notable hockey play happens, which is something machine-generated data could capture.

As in POINTWISE, an EPV algorithm leveraging machine-generated data could be devised for hockey that tracks the probability of scoring for every moment a player is in the offensive zone (or, if Ondrej Pavelec is in net, every moment a player has the puck in any zone). EPV also presents a framework to calculate the value of “entry passes, dribble drives and double-teams” in basketball; in hockey, the same could be done for evaluating pieces of hockey strategy, be it dump and chases, shot selection or line chemistry (I’m looking your way, Chris Kunitz). One could imagine a world where a trailblazing hockey coach plugs players into a zone model similar to the half court model provided by POINTWISE co-author Kirk Goldsberry in this Grantland piece and, leveraging billions of rows of machine-generated data, calculates success probabilities of player selection and formation on power plays and penalty kills. Models for the antithesis of EPV – perhaps expected defensive value – could be developed to identify the players best at limiting high percentage shot opportunities and creating turnovers. Not only would machine-generated data help overcome the need to use Corsi and Fenwick as proxies for puck possession, but it would also help more accurately identify which players are correlated with both possession and takeaway ability (not to mention turnover liability).
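To make the idea a little more concrete, here is a toy sketch of an expected-value calculation for a single offensive-zone possession. It is not the POINTWISE model, which works from full optical tracking data. It swaps in a coarse Markov chain over a handful of invented puck states, and every state name and transition probability below is an assumption made up for illustration.

```python
# Hypothetical puck states and transition probabilities (each row sums to 1).
# "goal" and "turnover" end the possession.
TRANSITIONS = {
    "entry":     {"perimeter": 0.50, "slot": 0.10, "turnover": 0.40},
    "perimeter": {"perimeter": 0.30, "slot": 0.20, "turnover": 0.50},
    "slot":      {"goal": 0.15, "perimeter": 0.25, "turnover": 0.60},
}

# Value of each terminal state, measured in goals.
TERMINAL_VALUE = {"goal": 1.0, "turnover": 0.0}


def expected_possession_value(transitions, terminal_value, sweeps=200):
    """Estimate the chance a possession ends in a goal from each state.

    Repeatedly replaces each state's value with the probability-weighted
    average of the values of the states it can move to, until the numbers
    settle.
    """
    value = {state: 0.0 for state in transitions}
    value.update(terminal_value)
    for _ in range(sweeps):
        for state, next_states in transitions.items():
            value[state] = sum(p * value[nxt] for nxt, p in next_states.items())
    return value


values = expected_possession_value(TRANSITIONS, TERMINAL_VALUE)
for state in ("entry", "perimeter", "slot"):
    print(f"{state:10s} {values[state]:.3f}")
```

With real tracking data, the states would be far finer (puck location, which skaters are on the ice, defensive positioning) and the probabilities would be estimated from observed play rather than typed in by hand. A player's decisions could then be scored by how much they raise or lower the value of the possession.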

While machine-generated data has seemingly unlimited potential to help devise coaching strategy and support GMs in player transaction decisions, its application by coaches "on the fly" seems impractical given the need for prompt decision making (e.g., the home team has an 8 second limit on making line changes in between whistles; the away team has just 5 seconds). The infiltration of iPads and tablets behind hockey benches, as has become commonplace among baseball managers such as Joe Maddon, is probably around the corner. However, their use will probably be reserved for drawing plays with styluses and streaming instant video replay rather than plugging in variables to calculate probabilities, which sports such as baseball and football have greater use for given the longer lag time in between plays.

Data derived from wearable devices like those the D-League is testing would also help hockey organizations derive new insights on players, given that models to evaluate speed, acceleration, endurance and other bio-related data in basketball are directly transferable to hockey. As noted in Zach Lowe’s article, the collection of bio-related data would raise concerns for the union, and any application of optic-tracking cameras or wearable devices would have to be meticulously negotiated by Donald Fehr and the NHLPA. However, the discussion of wearable devices, and optic-tracking video cameras for that matter, is still in the future without further buy-in for more sophisticated analytics among hockey decision makers in general.

Hockey Analytics Today

For the hockey analytics world, the good news is that acceptance and adoption are gradually coming to the sport, even if it is lagging its peers. Most notably, the Penguins detailed at a predictive analytics conference in Toronto last year how they’re working with the Sports Analytics Institute to create a player evaluation system leveraging player location and shot probabilities to create predictive systems for goals for/against and lifetime value. The model's first application in practice came in 2011, when the Penguins acquired James Neal from the Dallas Stars in what is arguably one of the most lopsided trades in recent memory. In January, the New Jersey Devils threw their hat into the analytics ring by announcing that they are hiring a Director of Analytics who will report directly to Lou Lamoriello (although it remains to be seen how, or if, the old-school Lamoriello will leverage the person ultimately hired for the position).

While possession stats help hockey fans better evaluate their favorite NHL teams and players (with their application at the lower AHL, NCAA and junior levels providing a new opportunity to scout players), it’s still early days in the datafication of hockey and its acceptance in hockey circles. ‘Intangibles’, ever a point of contention in social media fights between analytical and old school hockey types, should highlight an opportunity for hockey analytics to create new models rather than argue with old school thought. As papers such as The Hot Hand suggest, not every old notion should be dismissed when "advanced statistics" fail to exhibit any correlation. Rather, all that could be needed is more data to suggest a correlation does exist.

Take New York City, for example, which has embraced big data analysis techniques to identify illegal, over-occupied buildings that are more prone to deaths in the case of fires. In the aforementioned "Big Data" book, New York City’s first “Director of Analytics”, Mike Flowers, recounts an exchange with a senior fire chief concerning an apartment building with multiple red flags according to his team's algorithm; the chief's gut said the building was likely passable because the brick exterior was new. Instead of brushing off the old guard’s hunch and sticking with the team's existing algorithm, Flowers’ team took note of the senior chief’s insight and quantified brick exterior investments through city building permits.

Ultimately, the datafication of hockey presents an exciting opportunity for blogs like this one to grow with the infusion of hockey-related data both on and off the ice, and maybe get a hockey decision maker or two to listen along the way. While currently available analytics deliver deeper insights for evaluating player and team performance than +/-, it is important to recognize their limitations and develop methodologies to better analyze "the coolest game on Earth".

At the very least, it's more entertaining than watching the Sabres for the foreseeable future.