Kyle Boddy in a Nutshell

This isn’t the grand return I would’ve hoped for–and I’m a little behind on reporting this–but over at The Book blog, Tango asks readers to leave a comment with their feelings about FanGraphs (mine are already known). Kyle Boddy dips his toes into the water, saying:

This is a great example of an article that is very misleading:

The author has no experience in gambling markets, and the commentators are even worse. 99.9% of what is written here is pure drivel and/or speculation.

The internet in a nutshell responds:

The guy who has spent years putting his name on discredited and ridiculous articles about pitching mechanics making summary judgments without any supporting evidence of someone else’s work. When asked for supporting evidence on Twitter, you say you don’t have time to offer up any kind of reasoning for your claims, so you link to a blog post that offers no reasoning for your claims.

You are exactly the kind of critic that Tango is rightfully railing against.

Anyone who has read Praiseball Bospectus before should know how I feel about that. Care to respond, Kyle?

Talk about irony, anonymous user.

I’ll allow myself this satisfaction: Will I be seeing you at the multiple conventions I’m being paid to speak at this year about training pitchers? Or will you be sitting in on the discussions I’ve had with front office executives?

I don’t think he understands irony, but whatever. Someone show me a front office talking to Kyle Boddy about mechanics and I’ll show you a, uh… front office that is doing a bad job? I’m a little rusty with my zingers; I’m no “internet in a nutshell”.

If all it takes to become a mechanics expert that people pay attention to is to write confusing articles on the Hardball Times, I’ve clearly taken the wrong path in life. I’ll expand on this topic in my next post, “How Dylan Bundy’s Kinetic Load Increases Torque and Humeral Rotation”.

S-A-B-E-R-M-E-T-R-I-C-S, Sabermetrics

Deadspin has video of a wonderful scene from this year’s Scripps National Spelling Bee. Here’s a recap.

Emma Ciereszynski: Hi.

Judge: Hi. “Sabermetrics”.

E.C.: Sabermetrics. Can I please have the definition?

Judge: The statistical analysis of baseball data.

E.C.: May I please have the language of origin?

Judge: It’s from an English acronym, plus a Greek-derived English part.

E.C.: Sabermetrics. S-A-B-E-R-M-E-T-R-I-C-S. Sabermetrics.

[Exeunt, to wild applause]

Can we agree that the matter is now settled once and for all? To the average person, there isn’t a more credible arbiter of spelling than the national spelling bee. I generally don’t care about the “correctness” of speech or spelling, but there is literally not one reason to British-ize the word sabermetrics.

So, to Tango and his Tangettes: please, I beg you, listen to reason.

What Is Stupidity? And Which Writers Flaunt It?

Looking to make a quick buck, Bradley Woodrum finally asks the question that we’ve all skipped over, “What is sabermetrics?” Wait, actually, that question has been asked lots of times before. There are whole manifestos dedicated to answering it. As far as short answers go, it’s hard to top Bill James’s: “the search for objective knowledge about baseball.”

Woodrum–without bothering to mention James, the creator of the term sabermetrics–gives us this, instead:

f(Sabermetrics) = statistics + scouting + business + ε

where ε is “anything yet-known or missed by myself.”

I can’t wrap my head around the idea that sabermetrics is equal parts statistics, scouting, and business with a dash of anything else. It’s just so stupid. You realize Woodrum got paid for writing this, right? By his logic, nearly any thinking related to baseball counts as sabermetrics. Who’s to say this random tweet I just found (“Hopefully the #Dodgers get a new owner soon. Mccourt sux ass”) isn’t sabermetric? It does pertain to the business of baseball, after all.

Also, Woodrum needs to refresh himself on how functions work. Sabermetrics isn’t an input for statistics, scouting, business, and anything else; it’s the other way around, sort of.
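If you insist on writing it as a formula, at least point the function the right way. Here’s a minimal sketch (the function and its toy inputs are mine, purely illustrative, not anything Woodrum published): sabermetrics as a function of statistics, scouting, and business, rather than an input to them.

```python
# Purely illustrative: sabermetrics written as a function OF its inputs,
# which is how function notation actually works.
def sabermetrics(statistics: float, scouting: float, business: float,
                 epsilon: float = 0.0) -> float:
    """Woodrum's sum, with the notation turned right-side out."""
    return statistics + scouting + business + epsilon


# Equal parts of everything, per the formula.
print(sabermetrics(1.0, 1.0, 1.0))  # 3.0
```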

Woodrum then turns his attention to categorizing each Major League front office by the number of “branches” of sabermetrics they… use? Is use the correct word here? The whole idea of branches of sabermetrics, plus separating teams by these branches, is so stupid it’s making my brain melt. Nevertheless, I’ll stick with use.

Every team uses at least one branch, though only the Dodgers use one measly branch. That makes sense, though, since McCourt “sux ass.” The rest of the league is pretty evenly split between two, two-to-three, and three branches of sabermetrics, though Woodrum doesn’t explain how he made these distinctions or even what branches each team uses. He doesn’t explain what exactly using two-to-three branches means, either.

Ultimately, the problem with this article stems from Woodrum’s writerliness. Without writerliness, there cannot be good writing. That’s just a fact of life. And as far as I can tell, writerliness can be broken down into three components: creativity, perseverance, and persuasion, plus any other as-yet-unknown elements of writing.

So, let’s use this three-component breakdown to redefine FanGraphs’ list of authors. If we look at the authors and analyze where their articles seem to come from, whether they’re coming from just creativity and persuasion or all three components, we get this:

3 Component Writers

  • Dave Cameron
  • Carson Cistulli
  • David Appelman
  • Paul Swydan
  • Matt Klaassen
  • Tommy Rancel*
  • Noah Isaacs
  • Josh Weinstock
  • Ryan Martin
  • Jonah Keri
  • Patrick Newman
  • Mitchel Lichtman
  • Pizza Cutter
  • Maury Brown
  • Seth Samuels
  • R.J. Anderson
  • Dayn Perry
  • Brian Cartwright
  • Sean Smith
  • Dave Studeman

2-to-3 Component Writers

  • Mike Newman
  • Wendy Thurm
  • Chris Cwik
  • David Laurila
  • Ryan Campbell
  • Matthew Carruth
  • Jeff Zimmerman*
  • Josh Goldman
  • tangotiger
  • Michael Lee
  • Erik Manning
  • Graham MacAree

2 Component Writers

  • Alex Remington
  • Jack Moore
  • Eno Sarris
  • Mike Axisa
  • Marc Hulet
  • Jim Breen
  • Jesse Wolfersberger
  • Dave Allen
  • Jason Roberts*
  • Reed MacPhail
  • Joe Pawlikowski
  • David Golebiewski
  • Ben Nicholson-Smith
  • Joshua Maciel
  • Bryan Smith*
  • Pat Andriola
  • Steve Sommer
  • Sky Kalkman

1 Component Writers

  • Steve Slowinski
  • Matt Swartz
  • Eric Seidman
  • Albert Lyu
  • Brandon Warne**
  • Lucas Apostoleris
  • Zach Sanders
  • Frankie Piliere
  • Niv Shah

0 Component Writers

  • Bradley Woodrum

* I’m especially unsure about these authors.

** Given Warne’s empathy problems, he is the only writer who appears to be without common sense. This may have changed recently.

I would like to reiterate that these are merely my perceptions (coupled with a poll of anonymous Major League scouts). If you are one of these authors and I have gotten your placement wrong, please feel free to alert me to the oversight. Naturally, I weigh an actual writer’s perspectives more heavily. Can I have my money now?

Sabermetric Ministry of Truth

Tangotiger on January 4, 2012:

Nate Silver had a headline that read: Why I’d Bet on Santorum (and Against My Model)

I thought: NO!!!! Nate, why? Why?

The reason should be clear: why HAVE a model that includes all the parameters you deem relevant, if you then throw away the model if you don’t like the results?

So, I really wish that those people who have forecasting models would NOT hedge their bets here. Either you have a model or you don’t.

Tangotiger on February 4, 2012:

Brian doesn’t blindly follow his off-the-wall forecast. Good for him.

You got that, everyone? Don’t disregard your model if you don’t like the results and don’t put stock in the crazy forecasts from your model.

Yo’ Saberist

The mocking of the saberist is fair game if he chooses to spend an obscene amount of time on this stuff, like we do.

The basement thing is so not funny though. Who actually laughs at that? I’d like to hear a good Yo Saberist joke. I’ll laugh at those.

Here’s the best I got:

  • Yo’ saberist is so sycophantic, he uses “saberist” instead of “sabermetrician”.
  • Yo’ saberist is so stupid, when he redid SIERA, a term flipped signs.
  • Yo’ saberist is so shy, he worked for the Cardinals but never got yelled at by Tony La Russa.
  • Yo’ saberist is so lame, he lives in his father’s basement.
  • Yo’ saberist is so stupid, he thinks WAR the framework can be improved upon.
  • Yo’ saberist is so hypocritical, he bets against his model.
  • Yo’ saberist is so dumb, he thinks pitchers have literally zero control over balls in play.
  • Yo’ saberist is so illogical, he whines about tone.

I’d love to read your best suggestions. I hope others feel that way, too.

You’re Wrong; No, You Are!

As I’ve mentioned before, the comments section of the Hardball Times is a barren wasteland. But Matt Swartz’s latest treatise on being an idiot with a stats software package attracted some controversy, mostly because he’s an idiot with a stats software package. I’ve archived the SABR fight in case the comments disappear as things sometimes do on THT. And I’ve even highlighted the best parts.

Mike Fast:

The community has shown with certainty that there is little difference between pitchers? I would say that my study of HITf/x data indicated exactly the opposite.

And similarly for team defensive efficiency, a large portion of it is due to how hard the team’s pitchers allow the ball to be hit.

Single-year BABIP is a crude measure of pitcher skill, and it’s leading you to conclusions about the game of baseball that are very wrong.

Matt Swartz:

I’m not coming to any wrong conclusions. I don’t know what you think I’m doing with single season BABIP, but it’s not leading me to wrong conclusions.

There IS little difference relative to the difference between pitchers in strikeout rate, which is why it takes more than a season to stabilize.

What your study showed was that how hard balls are hit is persistent, and that it is correlated with BABIP. It didn’t widen the spread of pitcher BABIP skill levels in the MLB, which is and always has been minimal compared to the spread in strikeout rates.

I find your comment about “leading you to conclusions about the game of baseball that are very wrong” to be fantastically indicative that you haven’t really read and understood this or anything else I’ve written on the topic of pitcher BABIP. If you did, you could certainly understand your own findings better, and you’d know they aren’t contradictory.

The reason that single season BABIP is a crude indicator of pitcher skill is sample size. The variance in an individual’s BABIP due to randomness is going to be about [.21/(number of batted balls)]. Knowing this, we can actually pin down that about 75% of single season BABIP variance is due to luck for pitchers with >=150 IP. The rest of it comes down to knowing the other 25%. We know that regressing team BABIP by the same process would yield another 13% of the variance in BABIP, which means that there is 12% left for pitching.

Using single season BABIP to understand that 12% will do a pretty poor job. However, using peripherals and running a regression as I have will eliminate a lot of that noise. In fact, you can explain about 10.4% of that 12% by knowing peripherals. What your study likely did is duplicate some of the effort in understanding the first 10.4% (hard-hit balls are correlated with peripherals; check your data, I’m sure it’s true) and supplement a good portion of the remaining 1.6%.

In other words, nothing you found negates anything I’ve found at all. You’ve come up with a way to use proprietary data effectively. Unless you have that available, using peripherals does a pretty good job. I can’t even imagine what it is that you disagree with here, or what you think I don’t understand.


Mike Fast:

I’m not disputing your statistics. I’m disputing your conclusions about the game of baseball.

“What your study showed was that how hard balls are hit is persistent, and that it is correlated with BABIP. It didn’t widen the spread of pitcher BABIP skill levels in the MLB, which is and always has been minimal compared to the spread in strikeout rates.”

Right. But I did show that BABIP is a poor way to measure pitcher skill. We sorta knew that already, but some people had taken the BABIP findings to mean that pitcher skill was also minimal. I established that that conclusion from the evidence was wrong.

You are correct that strikeout rate picks up some of the hard-hit ball skill that pitchers have. However, it does not pick up nearly all of it.

Moreover, batted ball categories are pretty good at picking up vertical launch angle effects, but they are lousy at picking up how hard the ball is hit.

So your regressions are still missing some pretty important data.

Yes, the ways we have found to measure that data so far are proprietary. That doesn’t mean that we shouldn’t learn about the reality of baseball from that data and let that affect how we frame questions, though. I would certainly wonder why BABIP doesn’t better reflect how hard the ball is hit.

I found that almost half of team BABIP was due to how hard the ball was hit. So when you say it’s 12 percent pitching skill, that’s what I’m disputing. You could say that you can only detect that 12 percent of the team BABIP is due to the pitchers, but it’s a leap of logic to say that you’re looking at pitching SKILL there. And HITf/x data indicates in fact that you are not.

Also, I don’t understand why you insist on looking at single-season pitcher/team BABIP to determine that number. It is simpler to calculate, but it’s deceptive. Being rooted to single-season numbers is one of the big failings of modern sabermetrics.


Matt Swartz:

Which of my conclusions about the game of baseball do you dispute?

You found that how hard a ball is hit is highly correlated from year to year. This is a self-contained statistic that is only useful inasmuch as it can teach you about singles, doubles, triples, home runs, outs, and errors. It doesn’t do me any good to know the statistic otherwise, except for how it relates to outcomes that affect games. So BABIP is a logical skill to try to infer from how hard a ball is hit, and your numbers do a nice job of hitting on that.

I think when you say “half of BABIP was due to how hard the ball was hit,” you’re either using same year data or R instead of R^2 or doing both. I’m guessing you’re doing correlations, while I’m doing R^2.

But if it’s just same year data, you’re including luck in terms of how hard a ball was hit (of course pitchers will deviate around their true talent rate in this category as well). That doesn’t measure skill. That measures outcomes.

My regressions are not intended to be the end-all summary of a pitcher’s true BABIP skill. They pick up about 80% of the possible variance that could exist in BABIP skills.

Since this seems to be a point of contention—how much variance in true BABIP skill there is to find—I’ll prove to you that R=0.5 or even R^2=.25 is insane for one season of data.

Take all pitchers with 150 IP or more in a single season from 2003-2011. They average 592 BIP. Their true BABIP skill is about .30, give or take, so the variance in luck HAS to be .21/592 for the average pitcher in this group. It’s impossible binomially for that not to be true. That’s a random variance of .000354. The actual variance in BABIP for that same group is .000457. That means randomness HAS to explain 77% (last time I got 75%, but same diff)! I don’t know how much you think is team defense, but surely it’s not 0%. If you look seriously at how much variance is explainable by defense, it’s about 13%. That’s just regressing the data.

So my original 12% number is the maximum explainable by differences between pitchers. That’s not what my regression found. That was 10.4%. Obviously give or take here or there, but you get the point. Most of it is explained by peripherals.

And while you say I’m looking at single-season numbers to prove that point, that has nothing to do with the implications of that 12%. The 12% means the standard deviation in pitcher skill is about .007 of BABIP. It can’t be much greater than that, and it has nothing to do with choosing a single season. The same analysis on careers or half seasons or whatever would give you about the same conclusion. I look at single-season because it’s the easiest to run these tests on quickly.

So what exactly do you think are my wrong conclusions? Show me where in that description of variance you determine that BABIP skill level has a higher spread than about .007, given that about .005 or .006 of it can be explained by a regression on peripherals, and tell me what’s wrong here. If you want to say there is value in the last .001 or .002, great, keep at it. It may only be attainable with proprietary data, and good for you if you can use it to your advantage. But nothing that I have found here is wrong.
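For anyone who would rather check Swartz’s arithmetic than shout about it, the numbers in that last comment do hang together. A quick sketch, using only the figures quoted above (the variable names are mine):

```python
# Re-running the arithmetic from Swartz's comment, using only his quoted numbers.
p = 0.300                # roughly league-average true BABIP skill
n_bip = 592              # average balls in play for 150+ IP pitchers, 2003-2011
observed_var = 0.000457  # actual variance in BABIP for that group

# Binomial luck variance for the average pitcher: p(1-p)/n = .21/592
luck_var = p * (1 - p) / n_bip
print(round(luck_var, 6))        # 0.000355 (Swartz's .000354, truncated)

# Share of observed variance that is pure randomness
luck_share = luck_var / observed_var
print(round(luck_share, 2))      # 0.78, i.e. Swartz's "77%"

# His decomposition of the rest: ~13% defense, the remainder pitching
pitching_share = 1 - luck_share - 0.13
print(round(pitching_share, 2))  # 0.09 here; his 12% uses the rounder 75% luck figure

# Standard deviation of pitcher BABIP skill implied by a 12% share
skill_sd = (0.12 * observed_var) ** 0.5
print(round(skill_sd, 3))        # 0.007
```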

And there it ends, without even a snide remark from Fast on Twitter. I feel like Matt Swartz took Nate Silver’s Baseball Prospectus columns a little too much to heart.

Re-Explaining DIPS, Vol. 1

In this new series, I will highlight one of my least favorite SABR tropes: starting off an article by name-dropping Voros McCracken and explaining DIPS. First off, Matt Swartz in his recent, three-tabled article, “Adjusting defense efficiency by the quality of pitching”:

Fausto Carmona throws a hard sinker on the outside corner, but Ichiro Suzuki turns it into a well-struck ground ball by going the other way, splitting the defenders on the left side of the diamond. We know who should get credit for the single on the Mariners’ side of the box score—there was only one guy with a bat. But who on the Indians will take the blame for the single? Is it Carmona who made the pitch, or the defenders who could not get to the ball fast enough?

Bill James invented Defensive Efficiency, measuring the percentage of balls in play that a defense turns into outs. It became apparent just how useful this would be for evaluation of team defense when Voros McCracken famously concluded that, “There is little if any difference among major-league pitchers in their ability to prevent hits on balls hit in the field of play.” A natural corollary to this thesis says that to measure team defense, one should use Defensive Efficiency rate.
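And since we’re in the business of re-explaining things, Defensive Efficiency itself takes one line: the fraction of balls in play a defense converts into outs, or roughly one minus the BABIP the team allows. A minimal sketch (the function name and sample numbers are mine):

```python
def defensive_efficiency(balls_in_play: int, hits_on_bip: int) -> float:
    """Fraction of balls in play the defense converts into outs."""
    return 1 - hits_on_bip / balls_in_play


# e.g. a team allowing 280 hits on 1,000 balls in play
print(defensive_efficiency(1000, 280))  # 0.72
```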