Estimated Impact

September 16, 2013

I introduced my player rating, Estimated Impact, a while ago, but I recently gave it a pretty extensive makeover and I’m generally pleased with the results. My goal was to create a box-score-only metric that is a) more accurate and better at predicting future team performance than other box score metrics, and b) a reasonable snapshot of present and historical player performance. Box-score-only metrics have well-known limitations, and all-in-one metrics are merely a two-dimensional picture of a three-dimensional world. But I’m confident that Estimated Impact does a good job of accomplishing my goals.

I don’t want to bore you with the details, so I’ll keep it basic. This metric is based on a regression of certain box score elements against long-term RAPM. For each season, I fit each player’s result to the team’s efficiency differential (this is usually a pretty small adjustment). I did the same with offense only and defense only, then fit off + def to total impact. For 1974-1977, I estimated turnovers by regression and used the same formula as for the post-77 seasons. For pre-1974 seasons, I ran new regressions without any use of steals, blocks, etc. Needless to say, pre-74 estimates are probably less accurate. They’re certainly far from useless though, and they’re probably superior to any other measure of pre-74 production (e.g., WS/48 or PER).
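
As a rough illustration of the two steps described above (a regression of box score stats on RAPM, then a fit to the team's efficiency differential), here is a minimal Python sketch. The actual weights and adjustment are not published, so everything in it is an assumption: the function names, the use of ordinary least squares, and the choice to shift every player on a team by the same constant so that five times the minute-weighted average rating equals the team's differential.

```python
import numpy as np

def fit_box_score_weights(box, rapm):
    """Ordinary least squares: intercept plus one weight per box score column.

    box  : (n_players, n_stats) array of per-player box score rates (hypothetical)
    rapm : (n_players,) array of long-term RAPM values used as the target
    """
    X = np.column_stack([np.ones(len(box)), box])
    weights, *_ = np.linalg.lstsq(X, rapm, rcond=None)
    return weights

def raw_impact(box, weights):
    """Apply the fitted weights to get an unadjusted per-100-possession rating."""
    X = np.column_stack([np.ones(len(box)), box])
    return X @ weights

def fit_to_team(raw, minutes, team_diff):
    """Shift all players on one team by a constant (an assumed adjustment).

    After the shift, 5 * (minute-weighted average rating) equals the team's
    efficiency differential, since five players are on the court at a time.
    """
    share = minutes / minutes.sum()
    shift = team_diff / 5.0 - np.dot(share, raw)
    return raw + shift
```

Because the shift is the same for every teammate, the gaps between players on a team are preserved. Only the team's overall level moves, which would be consistent with the adjustment usually being small.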

In my own retrodiction testing, Estimated Impact outperforms every box score metric that I know of, and performs nearly as well as 2000s xRAPM (I’d welcome anyone to reproduce my results). [Edit: when adjusted to give a fixed rating to very low-minute players, Estimated Impact significantly outperforms xRAPM in retrodiction testing as well.] So I’m confident that it is currently the best all-in-one metric for estimating historical production. For what it’s worth, the results also seem very reasonable to me. You can see all results from 1952-2013 here, or at the NBA & NCAA Stats page at the top right of the site. Or you can download the database here. Enjoy!

-James

23 Comments
  1. Nathan
    September 17, 2013 5:12 pm

    This is awesome. A couple things:

    a. In the database, the “total rating” for all the players below line 1426 is -3.5. I assume this is a glitch of some sort.

    b. Do you have a similar database for college basketball? If you do but don’t wish to share it, I totally understand.

    Really great stuff. If I follow through with my plan to make a draft rater for next year this will be a huge help.

  2. September 17, 2013 6:24 pm

    Thanks, Nathan. I’m glad it’s useful to you. To answer your questions, I just made everyone who played basically garbage minutes a -3.5 so they didn’t interfere with the rankings, and I don’t have a similar database for college basketball, largely because it’s hard to find and collect historical college stats.

  3. Nathan
    September 18, 2013 3:51 pm

    Hmmm, I guess that makes sense then, although some players seem to be excluded because they played <100 postseason minutes, in spite of playing 10,000+ regular season minutes. Impact should be fine for my purposes though, and you have that for all players.

    Not looking forward to the data acquisition side of this whole thing. Luckily the apbr forums seem to have some good resources on that…

    Unrelatedly, I noticed that the 2000-01 RAPM page doesn't load. This might just be me…but all the other ones work fine.

  4. Mike Goodman
    September 20, 2013 6:27 am

    Great stuff, James! I’ve downloaded and done some playing with the database, in Excel.
    What’s the meaning of the final column (Rating*) in the Career sheet?
    Is “impact” an estimated +/- relative to 100 possessions?

    • September 20, 2013 9:04 am

      Mike,

      Thanks. Impact is indeed estimated effect on the point margin per 100 possessions. The rating is basically a combination of career impact, career wins (for longevity), and career playoff wins (for postseason success) that I tried to put on the same scale as impact. It’s an attempt to give a snapshot of career achievement.

  5. Mike Goodman
    September 20, 2013 10:28 am

    It looks like no consideration is given to ABA seasons. Julius is down there at #38 (behind Ginobili), and Artis is nowhere to be seen.
    I’ve done year by year conversions of ABA to NBA numbers. Enough players jumped leagues to make it translatable. In 1968, the ABA was pretty minor league; by ’71 or ’72, they’d largely closed the gap, in terms of competition level.

    When I multiply your playoff wins by 8 or 10, and add them to regular season wins, I get a ranking close to what you have.
    Karl Malone ranks 3rd in straight RS + PO wins. At RS + PO*10, he ranks 9th.

    • September 20, 2013 1:44 pm

      Yeah, I haven’t taken ABA numbers into account at all, but that is something I will consider doing in the future. I’d be interested to see what you did to convert the numbers.

  6. September 23, 2013 3:28 pm

    Really great stuff, James.

  7. Mike Goodman
    September 24, 2013 3:25 am

    James, on closer inspection, it seems you have a playoff multiplier that reduces a player’s rating to zero (or -3.5), regardless of his regular seasons.
    David Lee has just 65 postseason minutes, and he is “zeroed out”, ranking among the scrubs who barely played.

    Also, what stats elevate Bowen and Battier into the top 200, Eaton as top 150, Majerle among the top 100, or Ben Wallace in the top 50?

    • September 24, 2013 12:30 pm

      Mike, I’ll try to fix that issue with guys who played lots of regular season minutes but not postseason and repost the database.

      As for the people you named, a lot of them seem like defensive guys so defensive numbers probably helped most of them. Bowen and Majerle, I think, played a lot of playoff minutes, which gave them the opportunity to contribute in the playoffs, no doubt boosting their career ratings. Battier played a lot of career minutes and prob had good enough defensive numbers to make it there. And Wallace and Eaton had fantastic defensive numbers (e.g., blocks and drebs, and steals for BW).

  8. Nathan
    September 25, 2013 10:30 am

    How do you go from “impact” to “wins”? It doesn’t seem to be a linear k*impact*minutes formula like I would expect. What am I missing?

    • Nathan
      September 25, 2013 12:03 pm

      Never mind, this definitely makes sense. A player with zero impact (or zero plus/minus) still contributes wins. A team with nothing but zero-impact players will obviously win more than zero games in a season.
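
The point above can be sketched in a few lines of Python. This is not the formula used in the database, which has not been published. It is one plausible conversion under stated assumptions: an 82-game season with no overtime, points per 100 possessions treated as points per game, and a rule-of-thumb `wins_per_point` of about 2.7 team wins per point of margin. The function name and its parameters are hypothetical.

```python
def estimated_wins(impact, minutes, games=82, wins_per_point=2.7):
    """Convert a per-100-possession impact and minutes played into wins.

    An average team wins half its games, so a zero-impact player is credited
    with a minute-share of those wins. Impact then adds or subtracts wins on
    top of that baseline, scaled by 5 because five players share the court.
    """
    team_minutes = games * 48 * 5          # player-minutes available to a team
    share = minutes / team_minutes         # this player's slice of them
    average_wins = games / 2
    return share * (average_wins + wins_per_point * 5 * impact)
```

Under these assumptions wins are proportional to minutes * (a + k*impact) rather than k*impact*minutes, which is why a zero-impact player who plays all 3,936 minutes is still credited with about 8.2 wins.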

  9. October 13, 2013 4:18 am

    James- Interesting stuff. Is the actual formula/weights posted anywhere? Details not boring.

  10. Nathan
    October 18, 2013 3:43 pm

    I know this was primarily intended as a tool for historical comparisons, but would it be feasible to post stats for the coming season on a regular basis? Even once a month would be very useful if it’s at all possible.

    And of course, I second Andrew…details certainly not boring.

  11. October 22, 2013 5:22 pm

    Andrew and Nathan:

    I will be updating estimated impact for the 2014 season (probably on a weekly basis) barring external circumstances. And re the actual formulas/calculations, that’s just not something I’m going to make public at this point in time, but may well do so sometime in the future.

  12. January 11, 2014 4:52 pm

    Hey James,

    I just found this, and it looks awesome. I love that it facilitates comparison between eras.

    How are you accounting for defense in this? I’m suspicious of any box-score metric that claims to capture defensive impact. It seems like a big deal if you want to compare the win-contributions of Bill Walton ’77 to Garnett ’08 (or really any player that had a significant impact on that end of the court).

    • January 11, 2014 5:15 pm

      Thanks, Jorge.

      Yeah, defense is obviously a huge issue with any box score only metric, and estimated impact is no exception – it is far from perfect on defense. That said, the box score information does do a decent job (especially when you adjust for team defensive impact), and I think my metric does a much better job than most other box score metrics.

      So, as you said, guys with huge on-court impacts on defense (like Garnett & Walton, but more so excellent perimeter defenders) are likely underrated, and others (maybe guys who get lots of steals but aren’t good at staying in front of their man) are probably overrated, because defensive information in the box score is so limited.

  13. Randy Marsh
    May 6, 2014 1:19 am

    Sorry to spoil all the fun but what you did has nothing to do with retrodiction because you’re using in-sample-data

    • May 6, 2014 11:38 am

      Huh? If you’re referencing the comparison to xrapm, yeah that is in-sample, but so is xrapm…I did out of sample testing as well

  14. Randy Marsh
    May 6, 2014 12:20 pm

    Indeed, xRAPM is in-sample as well. That, though, only means that one should use neither metric for retrodiction because you’re just fooling yourself and others.

    I know others (e.g. Neil Paine) have done some ‘retrodiction’ tests with xRAPM but you can’t really use what’s posted on http://stats-for-the-nba.appspot.com/ to do retrodiction.

    Anyway, just because xRAPM uses in-sample data as well doesn’t mean that you actually did retrodiction in a correct way – and results of botched ‘retrodiction’ tell us nothing

    • May 6, 2014 12:41 pm

      There’s no other real way to compare the metrics and I just wanted some frame of reference, but you’re right there is not a whole lot of use in ‘retrodicting’ in sample. Like I said, I looked at out of sample results vs other metrics (ie other seasons).

      Regardless, I’m not claiming estimated impact is the best metric around or that it’s better than rapm variants. I just wasn’t satisfied with other public box score metrics so I wanted to create one with generally satisfying results and reasonable predictive value.

      • Randy Marsh
        May 6, 2014 1:05 pm

        You’re right there’s no real way to compare metrics, which is really a shame. People in the speech recognition field can just withhold data from a dataset they generated themselves and then have contestants give their ‘best guess’ on out-of-sample data. This cannot be done in the NBA because all the data is publicly available instantly.
        Anyway, keep on keepin’ on
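
For readers following the in-sample versus out-of-sample discussion above, here is a minimal Python sketch of what a retrodiction check could look like. It is an illustration under assumptions, not the test that was actually run: the function names are hypothetical, and the team margin is assumed to be five times the minute-weighted average of player ratings. The key requirement is that `prior_ratings` come from seasons that were not used to produce the margins being predicted.

```python
import numpy as np

def predict_team_margin(prior_ratings, minutes):
    """Predict a team's margin per 100 possessions.

    prior_ratings : player ratings taken from an earlier season (out of sample)
    minutes       : minutes those players actually played in the target season
    """
    share = minutes / minutes.sum()
    return 5.0 * float(np.dot(share, prior_ratings))

def retrodiction_rmse(teams):
    """Root mean squared error over (prior_ratings, minutes, actual_margin) tuples."""
    errors = [predict_team_margin(r, m) - actual for r, m, actual in teams]
    return float(np.sqrt(np.mean(np.square(errors))))
```

A lower RMSE for one metric than another on the same held-out seasons would suggest better predictive value. If the ratings were fitted using the target season itself, the comparison says little, which is the objection raised in the thread.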
