Kyle Boddy in a Nutshell

This isn’t the grand return I would’ve hoped for–and I’m a little behind on reporting this–but over at The Book blog, Tango asks readers to leave a comment with their feelings about FanGraphs (mine are already known). Kyle Boddy dips his toes into the water, saying:

This is a great example of an article that is very misleading:

http://www.fangraphs.com/blogs/index.php/vegas-still-isnt-buying-the-orioles/

The author has no experience in gambling markets, and the commentators are even worse. 99.9% of what is written here is pure drivel and/or speculation.

The internet in a nutshell responds:

The guy who has spent years putting his name on discredited and ridiculous articles about pitching mechanics making summary judgments without any supporting evidence of someone else’s work. When asked for supporting evidence on Twitter, you say you don’t have time to offer up any kind of reasoning for your claims, so you link to a blog post that offers no reasoning for your claims.

You are exactly the kind of critic that Tango is rightfully railing against.

Anyone who has read Praiseball Bospectus before should know how I feel about that. Care to respond, Kyle?

Talk about irony, anonymous user.

I’ll allow myself this satisfaction: Will I be seeing you at the multiple conventions I’m being paid to speak at this year about training pitchers? Or will you be sitting in on the discussions I’ve had with front office executives?

I don’t think he understands irony, but whatever. Someone show me a front office talking to Kyle Boddy about mechanics and I’ll show you a, uh… front office that is doing a bad job? I’m a little rusty with my zingers, I’m no “internet in a nutshell”.

If all it takes to become a mechanics expert that people pay attention to is to write confusing articles on the Hardball Times, I’ve clearly taken the wrong path in life. I’ll expand on this topic in my next post, “How Dylan Bundy’s Kinetic Load Increases Torque and Humeral Rotation”.

S-A-B-E-R-M-E-T-R-I-C-S, Sabermetrics

Deadspin has video of a wonderful scene from this year’s Scripps National Spelling Bee. Here’s a recap.

Emma Ciereszynski: Hi.

Judge: Hi. “Sabermetrics”.

E.C.: Sabermetrics. Can I please have the definition?

Judge: The statistical analysis of baseball data.

E.C.: May I please have the language of origin?

Judge: It’s from an English acronym, plus a Greek-derived English part.

E.C.: Sabermetrics. S-A-B-E-R-M-E-T-R-I-C-S. Sabermetrics.

[Exeunt, to wild applause]

Can we agree that the matter is now settled once and for all? To the average person, there isn’t a more credible arbiter of spelling than the national spelling bee. I generally don’t care about the “correctness” of speech or spelling, but there is literally not one reason to British-ize the word sabermetrics.

So, to Tango and his Tangettes: please, I beg you, listen to reason.

What Is Stupidity? And Which Writers Flaunt It?

Looking to make a quick buck, Bradley Woodrum finally asks the question that we’ve all skipped over, “What is sabermetrics?” Wait, actually, that question has been asked lots of times before. There are whole manifestos dedicated to answering it. As far as short answers go, it’s hard to top Bill James’s: “the search for objective knowledge about baseball.”

Woodrum–without bothering to mention James, the creator of the term sabermetrics–gives us this, instead:

f(\text{Sabermetrics}) = \text{statistics} + \text{scouting} + \text{business} + \varepsilon

where \varepsilon is “anything yet-known or missed by myself.”

I can’t wrap my head around the idea that sabermetrics is equal parts statistics, scouting, and business with a dash of anything else. It’s just so stupid. You realize Woodrum got paid for writing this, right? By his logic, nearly any thinking related to baseball counts as sabermetrics. Who’s to say this random tweet I just found (“Hopefully the #Dodgers get a new owner soon. Mccourt sux ass”) isn’t sabermetric? It does pertain to the business of baseball, after all.

Also, Woodrum needs to refresh himself on how functions work. Sabermetrics isn’t an input for statistics, scouting, business, and anything else; it’s the other way around, sort of.
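Read that way, and purely as an illustration (this is a guess at the intended notation, not anything Woodrum wrote), sabermetrics would be the output and the three "branches" the arguments:

```latex
\text{Sabermetrics} = f(\text{statistics}, \text{scouting}, \text{business}) + \varepsilon
```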

Woodrum then turns his attention to categorizing each Major League front office by the number of “branches” of sabermetrics they… use? Is use the correct word here? The whole idea of branches of sabermetrics, plus separating teams by these branches, is so stupid it’s making my brain melt. Nevertheless, I’ll stick with use.

Every team uses at least one branch, though only the Dodgers use one measly branch. That makes sense, though, since McCourt “sux ass.” The rest of the league is pretty evenly split between two, two-to-three, and three branches of sabermetrics, though Woodrum doesn’t explain how he made these distinctions or even what branches each team uses. He doesn’t explain what exactly using two-to-three branches means, either.

Ultimately, the problem with this article stems from Woodrum’s writerliness. Without writerliness, there cannot be good writing. That’s just a fact of life. And as far as I can tell, writerliness can be broken down into three components: creativity, perseverance, and persuasion and other as-yet unknown elements of writing.

So, let’s use this three-component breakdown to redefine FanGraphs’ list of authors. If we look at the authors and analyze where their articles seem to come from, whether they’re coming from just creativity and persuasion or all three components, we get this:

3 Component Writers

  • Dave Cameron
  • Carson Cistulli
  • David Appelman
  • Paul Swydan
  • Matt Klaassen
  • Tommy Rancel*
  • Noah Isaacs
  • Josh Weinstock
  • Ryan Martin
  • Jonah Keri
  • Patrick Newman
  • Mitchel Lichtman
  • Pizza Cutter
  • Maury Brown
  • Seth Samuels
  • R.J. Anderson
  • Dayn Perry
  • Brian Cartwright
  • Sean Smith
  • Dave Studeman

2-to-3 Component Writers

  • Mike Newman
  • Wendy Thurm
  • Chris Cwik
  • David Laurila
  • Ryan Campbell
  • Matthew Carruth
  • Jeff Zimmerman*
  • Josh Goldman
  • tangotiger
  • Michael Lee
  • Erik Manning
  • Graham MacAree

2 Component Writers

  • Alex Remington
  • Jack Moore
  • Eno Sarris
  • Mike Axisa
  • Marc Hulet
  • Jim Breen
  • Jesse Wolfersberger
  • Dave Allen
  • Jason Roberts*
  • Reed MacPhail
  • Joe Pawlikowski
  • David Golebiewski
  • Ben Nicholson-Smith
  • Joshua Maciel
  • Bryan Smith*
  • Pat Andriola
  • Steve Sommer
  • Sky Kalkman

1 Component Writers

  • Steve Slowinski
  • Matt Swartz
  • Eric Seidman
  • Albert Lyu
  • Brandon Warne**
  • Lucas Apostoleris
  • Zach Sanders
  • Frankie Piliere
  • Niv Shah

0 Component Writers

  • Bradley Woodrum

* I’m especially unsure about these authors.

** Given Warne’s empathy problems, he is the only writer who appears to be without common sense. This may have changed recently.

I would like to reiterate that these are merely my perceptions (coupled with a poll of anonymous Major League scouts). If you are one of these authors and I have gotten your placement wrong, please feel free to alert me to the oversight. Naturally, I weigh an actual writer’s perspectives more heavily. Can I have my money now?

Sabermetric Ministry of Truth

Tangotiger on January 4, 2012:

Nate Silver had a headline that read: Why I’d Bet on Santorum (and Against My Model)

I thought: NO!!!! Nate, why? Why?

The reason should be clear: why HAVE a model that includes all the parameters you deem relevant, if you then throw away the model if you don’t like the results?

So, I really wish that those people who have forecasting models to NOT hedge their bets here. Either you have a model or you don’t.

Tangotiger on February 4, 2012:

Brian doesn’t blindly follow his off-the-wall forecast. Good for him.

You got that, everyone? Don’t disregard your model if you don’t like the results and don’t put stock in the crazy forecasts from your model.

Yo’ Saberist

TangoTiger:

The mocking the saberist is fair game if he chooses to spend an obscene amount of time on this stuff, like we do.

The basement thing is so not funny though. Who actually laughs at that? I’d like to hear a good Yo Saberist joke. I’ll laugh at those.

Here’s the best I got:

  • Yo’ saberist is so sycophantic, he uses “saberist” instead of “sabermetrician”.
  • Yo’ saberist is so stupid, when he redid SIERA, a term flipped signs.
  • Yo’ saberist is so shy, he worked for the Cardinals but never got yelled at by Tony La Russa.
  • Yo’ saberist is so lame, he lives in his father’s basement.
  • Yo’ saberist is so stupid, he thinks the WARtheFramework can be improved upon.
  • Yo’ saberist is so hypocritical, he bets against his model.
  • Yo’ saberist is so dumb, he thinks pitchers have literally zero control over balls in play.
  • Yo’ saberist is so illogical, he whines about tone.

I’d love to read your best suggestions. I hope others feel that way, too.

You’re Wrong; No, You Are!

As I’ve mentioned before, the comments section of the Hardball Times is a barren wasteland. But Matt Swartz’s latest treatise on being an idiot with a stats software package attracted some controversy, mostly because he’s an idiot with a stats software package. I’ve archived the SABR fight in case the comments disappear as things sometimes do on THT. And I’ve even highlighted the best parts.

Mike Fast:

The community has shown with certainty that there is little difference between pitchers? I would say that my study of HITf/x data indicated exactly the opposite.

And similarly for team defensive efficiency, a large portion of it is due to how hard the team’s pitchers allow the ball to be hit.

Single-year BABIP is a crude measure of pitcher skill, and it’s leading you to conclusions about the game of baseball that are very wrong.

Matt Swartz:

I’m not coming to any wrong conclusions. I don’t know what you think I’m doing with single season BABIP, but it’s not leading myself to wrong conclusions.

There IS little difference relative to the difference between pitchers in strikeout rate, which is why it takes more than a season to stabilize.

What your study showed was that how hard balls are hit is persistent, and that it is correlated with BABIP. It didn’t widen the spread of pitcher BABIP skill levels in the MLB, which is and always has been minimal compared to the spread in strikeout rates.

I find your comment about “leading you to conclusions about the game of baseball that are very wrong” to be fantastically indicative that you haven’t really read and understood this or anything else I’ve written on the topic of pitcher BABIP. If you did, you could certainly understand your own findings better, and you’d know they aren’t contradictory.

The reason that single season BABIP is a crude indicator of pitcher skill is sample size. The variance in an individual’s BABIP skill level due to randomness is going to be about [.21/(number of batted balls)]. Knowing this, we can actually pin down that about 75% of single season BABIP variance is due to luck for pitchers with >=150 IP. The rest of it comes down to understanding the other 25%. We know that regressing team BABIP by the same process would yield another 13% of the variance in BABIP, which means that there is 12% for pitching.

Using single season BABIP to understand that 12% will do a pretty poor job. However, using peripherals and running a regression as I have will eliminate a lot of that noise. In fact, you can explain about 10.4% of that 12% by knowing peripherals. What your study likely did is duplicate some of the effort in understanding the first 10.4% (hard hit balls are correlated with peripherals; check your data, I’m sure it’s true) and supplement a good portion of the remaining 1.6%.

In other words, nothing you found negates anything I’ve found at all. You’ve come up with a way to use proprietary data effectively. Unless you have that available, using peripherals does a pretty good job. I can’t even imagine what it is that you disagree with here, or what you think I don’t understand.

MF:

I’m not disputing your statistics. I’m disputing your conclusions about the game of baseball.

“What your study showed was that how hard balls are hit is persistent, and that it is correlated with BABIP. It didn’t widen the spread of pitcher BABIP skill levels in the MLB, which is and always has been minimal compared to the spread in strikeout rates.”

Right. But I did show that BABIP is a poor way to measure pitcher skill. We sorta knew that already, but some people had taken the BABIP findings to mean that pitcher skill was also minimal. I established that that conclusion from the evidence was wrong.

You are correct that strikeout rate picks up some of the hard-hit ball skill that pitchers have. However, it does not pick up nearly all of it.

Moreover, batted ball categories are pretty good at picking up vertical launch angle effects, but they are lousy at picking up how hard the ball is hit.

So your regressions are still missing some pretty important data.

Yes, the ways we have found to measure that data so far are proprietary. That doesn’t mean that we shouldn’t learn about the reality of baseball from that data and let that affect how we frame questions, though. I would certainly wonder why BABIP doesn’t better reflect how hard the ball is hit.

I found that almost half of team BABIP was due to how hard the ball was hit. So when you say it’s 12 percent pitching skill, that’s what I’m disputing. You could say that you can only detect that 12 percent of the team BABIP is due to the pitchers, but it’s a leap of logic to say that you’re looking at pitching SKILL there. And HITf/x data indicates in fact that you are not.

Also, I don’t understand why you insist on looking at single-season pitcher/team BABIP to determine that number. It is simpler to calculate, but it’s deceptive. Being rooted to single-season numbers is one of the big failings of modern sabermetrics.

MS:

Which of my conclusions about the game of baseball do you dispute?

You found that how hard a ball is hit is highly correlated. This is a self-contained statistic that is only useful inasmuch as it can teach you about singles, doubles, triples, home runs, outs, and errors. It doesn’t do me any good to know the statistic otherwise, except for how it relates to outcomes that affect games. So BABIP is a logical skill to try to infer from how hard a ball is hit, and your numbers do a nice job of hitting on that.

I think when you say “half of BABIP was due to how hard the ball was hit,” you’re either using same year data or R instead of R^2 or doing both. I’m guessing you’re doing correlations, while I’m doing R^2.

But if it’s just same year data, you’re including luck in terms of how hard a ball was hit (of course pitchers will deviate around their true talent rate in this category as well). That doesn’t measure skill. That measures outcomes.

My regressions are not intended to be the end-all summary of a pitcher’s true BABIP skill. They pick up about 80% of the possible variance that could exist in BABIP skills.

Since this seems to be a point of contention—how much variance in true BABIP skill there is to find—I’ll prove to you that R=0.5 or even R^2=.25 is insane for one season of data.

Take all pitchers with 150 IP or more in a single season from 2003-2011. They average 592 BIP. Their true BABIP skill is about .30, give or take, so the variance in luck HAS to be .21/592 for the average pitcher in this group. It’s impossible binomially for that not to be true. That’s a random variance of .000354. The actual variance in BABIP for that same group is .000457. That means randomness HAS to explain 77% (last time I got 75% but same diff)! I don’t know how much you think is team defense, but it’s not 0%. If you look at how much variance is explainable by defense seriously, it’s about 13%. That’s just regressing the data.

So my original 12% number is the maximum explainable by differences between pitchers. That’s not what my regression found. That was 10.4%. Obviously give or take here or there, but you get the point. Most of it is explained by peripherals.

And just because you’re saying I’m looking at single-season numbers to prove that point, that has nothing to do with the implications of that 12%. The 12% means the standard deviation in pitcher skill is about .007 of BABIP. It can’t be much greater than that, and it has nothing to do with choosing a single season. The same analysis on careers or half seasons or whatever would give you about the same conclusion. I look at single-season because it’s the easiest to run these tests on quickly.

So what exactly do you think are my wrong conclusions? Where in that description of variance will you determine that BABIP skill level has a higher spread than about .007? And whereas about .005 or .006 can be explained by a regression on peripherals, tell me what’s wrong here. If you want to say there is value in the last .001 or .002, great, keep at it. It may only be attainable with proprietary data, and good for you if you can use it to your advantage. But nothing that I have found here is wrong.

And there it ends, without even a snide remark from Fast on Twitter. I feel like Matt Swartz took Nate Silver’s Baseball Prospectus columns a little too much to heart.
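For anyone who wants to check the arithmetic in Swartz's last comment, here is a minimal sketch of the variance split he describes. The function name and its parameters are hypothetical, the inputs (592 balls in play, an observed variance of .000457, a true BABIP of about .30) are the figures quoted above, and the 13% defense share is simply taken from his comment as given rather than derived here.

```python
import math


def babip_variance_split(avg_bip, observed_var, true_babip=0.30, defense_share=0.13):
    """Split observed single-season BABIP variance into luck, defense, and a remainder.

    Hypothetical helper: defense_share is an assumed input taken from the quoted
    comment, not something this function estimates.
    """
    # Binomial variance of an observed rate: p * (1 - p) / n
    luck_var = true_babip * (1 - true_babip) / avg_bip
    # Fraction of the observed spread that pure randomness accounts for
    luck_share = luck_var / observed_var
    # Whatever luck and defense do not explain is the ceiling for pitcher skill
    pitcher_share = 1 - luck_share - defense_share
    # Convert that share of variance back into a standard deviation in BABIP points
    pitcher_sd = math.sqrt(pitcher_share * observed_var)
    return luck_var, luck_share, pitcher_share, pitcher_sd


luck_var, luck_share, pitcher_share, pitcher_sd = babip_variance_split(592, 0.000457)
print(f"luck variance:  {luck_var:.6f}")
print(f"luck share:     {luck_share:.1%}")
print(f"pitcher share:  {pitcher_share:.1%}")
print(f"pitcher SD:     {pitcher_sd:.4f}")
```

With these inputs the luck share comes out near the 77% in the comment, and the leftover pitcher share implies a standard deviation a little under .007. The sketch only reproduces the bookkeeping; it says nothing about whether single-season BABIP is the right thing to decompose, which is the part Fast was disputing.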

Re-Explaining DIPS, Vol. 1

In this new series, I will highlight one of my least favorite SABR tropes: starting off an article by name-dropping Voros McCracken and explaining DIPS. First off, Matt Swartz in his recent, three-tabled article, “Adjusting defense efficiency by the quality of pitching”:

Fausto Carmona throws a hard sinker on the outside corner, but Ichiro Suzuki turns it into a well-struck ground ball by going the other way, splitting the defenders on the left side of the diamond. We know who should get credit for the single on the Mariners’ side of the box score—there was only one guy with a bat. But who on the Indians will take the blame for the single? Is it Carmona who made the pitch, or the defenders who could not get to the ball fast enough?

Bill James invented Defensive Efficiency, measuring the percentage of balls in play that a defense turns into outs. It became apparent just how useful this would be for evaluation of team defense when Voros McCracken famously concluded that, “There is little if any difference among major-league pitchers in their ability to prevent hits on balls hit in the field of play.” A natural corollary to this thesis says that to measure team defense, one should use Defensive Efficiency rate.

We Are Truly Heartless

Yesterday, Brandon Warne took a trip down memory lane, reminiscing on his favorite deaths in baseball history. He couldn’t quite remember them all. But the commenters at FanGraphs were helpful, as always.

Brandon Warne says "Absolutely excellent!" to Dernell Stenson's murder

Brandon Warne says "Also a good one" to Donnie Moore's suicide

Brandon Warne says "Also a good one!" to Geremi Gonzalez's death

No word yet how Warne feels about Dave Cameron.

Brevity Is Great

To thwart those hoping to call me a hypocrite, I’ll cut straight to the chase. Matt Lentzner, in his most recent Baseball ProGUESTus column, wastes several hundred words and a whole article to express what I can do in 26: Sample size is important; please include it. And wouldn’t it be cool to have the sample size this stat stabilizes at (according to Pizza Cutter), too?

Wait, I know I can do better. sample size rulez. context iz cool 2. That’s seven words and only 38 characters, leaving plenty of room for sweet hashtags, #imisspizzacutter. If this is your topic for a column on likely the biggest stage in sabermetrics, just sit on it and try for something more interesting and original.

As always, the BP commenters were right on point. skyojohnny chimed in first:

This is one of the most important articles I have ever read in BP and I bet that it will also be one of the most overlooked.

#smh

By the way, it’s bad enough BP couldn’t come up with anything more clever than “Proguestus” for their guest writers’ column. That has to be literally the first idea they came up with. To spend time brainstorming names and ending up with something so banal would make me sad. But to capitalize the “guest” is insulting. Because I’m so dumb, I wouldn’t get it otherwise. Thanks, guys.

The Best of MGL, October 2011

Apparently, Major League Baseball had their postseason last month. I was travelling in Treasure Island, Ontario for work and without internet access, so I missed it. A lot of SABR drama flared up while I was gone–too much for me to properly deal with here, unfortunately. The best I can do is publish this compilation of MGL’s best comments from October, 2011, but without any of the snappy backtalk you’ve come to expect from Praiseball Bospectus.

They say that legends are born in October, but only one man can reign over it. MGL easily surpassed 40,000 words (or enough for a short novel) written in comment threads last month. And I’m sure there are even more great MGLian comments not included here that I missed (especially if not posted on The Book Blog). I’d like to add them if you’d be kind enough to post a link in the comments. So, please help me out.

Without further ado, I present the SABR who never sleeps, MGL.

Terrible managing?, October 5, 12:31 AM:

My thoughts are this:

If I could give a manager a piece of paper with the answer to all of these decisions, I would be correct 90%+ of the time and a lot more than the manager would. A lot. I would miss some of the intangibles for sure, but those would pale in comparison to the “numbers” behind my decisions. I would add at least one win to a team’s WE, thus I should be paid 5 mil or more…

Worst managing ever?, October 8, 12:17 AM:

#5 and #6, and because you think it, that makes it right? You want to accept that bet also?

I’ll bet you won’t accept that bet. That is because when people who have little expertise on a matter have an opinion on that matter and those opinions are not supported by evidence, they never take those bets. I wonder why?

You see, anything that I posture on this blog, I will always stand by it, because it is almost always based on evidence or my experience, knowledge, or expertise which has been gleaned by evidence. I learned a long time ago that my opinions without a solid base aren’t worth jack…

Worst managing ever?, October 8, 8:00 PM:

Wow, nice folks on BTF. I wonder why I left that site years ago.

What is incredible is how many people there use the result of the plays, the game, and the series to refute my claims.

I wasn’t aware that someone took me up on my bet and then they got to choose their methodology, players, etc.

Before someone takes me up on one of my bets, please do the following:

  1. Say, “I accept your wager.”
  2. Identify yourself and let us know how you intend to pay up if you lose.
  3. Figure out a way to choose a third party to verify or conduct any research that is needed.

If I lose I will gladly pay up and learn something in the process. Someone (who has no idea who I am) on BTF actually said something to the effect, “…because the dickhead never pays up…”

I am doing some research now on the 9th inning thing…

How do great starting pitchers pitch the 4th time through the order?, October 9, 1:24 AM:

So, in the original thread, did I say anything egregious or is the entire post egregious enough to warrant the vitriol on BBTF and even here?

I made the point about Punto, which I think is correct, at least according to my sim, and a 1.5% WE is pretty big. And I have not heard anyone refute that with evidence other than stupid batter/pitcher matchups which we have already discussed (and hopefully put to rest), or the proverbial, “The manager must know something that you don’t know.”

I made the point about the two bunts, which I think are correct, and, again, I have not heard any refutation with evidence. A few people excoriated me for saying that Carp would have been safe anyway, which is debatable (or not) but completely irrelevant to my argument. I simply said that with him running, the bunt win expectancy is going to be very poor compared to the WE from batting. That is because a bunt is always marginal. Throw in a poor/slow runner on the bases such that he is going to get forced a significant percentage of the time, and the bunt is not likely to be correct, even against a very good pitcher. Any arguments there?

I said that bunting with a 2-0 count, when the bunt at the outset of the PA was probably bad, is an egregious error, and I am pretty confident it is, and I have not heard any refutations on that either. Problems with that?

I said that not pinch hitting for Carp in the 8th was bad, but I did not harp on that. I still think the numbers will show that was bad. I admit that probably no manager would have done that – although that does not make it correct. I am strictly speaking of mathematically correct things, and not what would make the manager look good or bad. That is not my job to determine that.

And finally (the 4th or 5th thing I criticized), I said that bringing in Motte was correct. It looks like that may be a tossup, but with Carp not being a top tier starter (according to my projections and others – see ZIPS, Oliver, Pecota, Steamer, etc.) and with Motte being a very good closer, I think that bringing in Motte IS the correct choice, but perhaps marginally so. Again, whether a manager “should” do that is not my business. I am talking strictly numbers.

So why the universal hate, condemnation, criticism, mockery, etc.?

Someone please explain what I did to deserve that? And I am talking about substance and not tone. If someone wants to criticize me for my tone, so be it. I don’t give a hoot about that. Those are all ad hominem arguments anyway for people who sadly have nothing substantive to contribute…

How do great starting pitchers pitch the 4th time through the order?, October 9, 11:09 AM:

It should be fairly easy to figure out what is happening in the 9th, but I can’t do that now.

McCoy, you can do all the speculation and “thinking” you want, and you might be right about conclusions, but without any numbers, they are meaningless. Sorry. Either one strategy or the other yields a greater win expectancy, or it is close in which case I have no problem yielding to the gut, experience, instinct, etc. But, unfortunately, you can’t figure out the answer without “running the numbers.”

How not to do a study, October 10, 1:24 AM:

Tango, why do you bother? You are infinitely more patient and kind than I am!

Worst managing ever?, October 10, 6:53 PM:

What I find especially outrageous, almost scandalous, is that someone could actually write, as in #72, that I said, “TLR is a terrible manager because he let Carpenter pitch the 9th,” when the title of the post was hyperbolic, as titles or headlines often are, and that the last example of several I gave of Tony’s mistakes was leaving Carp in to pitch the 9th. I suppose a loose characterization of my post is, “TLR is a terrible manager because (insert anything I happened to mention in the post),” but in my opinion that is classic spin, mischaracterization, taking words out of context, etc. in order to launch an ad hominem attack and deflect attention from the issues at hand and is otherwise totally uncalled for.

Worst managing ever?, October 11, 9:37 AM:

Yes, I may have been wrong about the difference between an elite starter in the 9th who has been pitching extremely well and a closer. Can we put that to rest and move on?

Regardless of how assured I make myself out to be on various issues, I am sometimes wrong, thankfully. Let’s all have a party, hand out awards to everyone who doubted me, and move on…

Starting Pitcher on a great day v Closer, October 11, 8:07 PM:

Circle, your “English” explanation is useless. Do you really think that a manager’s “experience, intuition, and expertise” can figure out the right answer? If you do, then you are dreaming. I mean managers make ridiculous, silly mistakes all the time, believing in erroneous things like the hot hand, and batter/pitcher matchups, and other small sample nonsense. Do you think that they magically become genius savants when it comes time to figure out when to take out their starter and when to bring in a reliever?

I certainly agree that if it is a tossup, you probably want your starter in order to save your reliever for a possible extra inning game or for tomorrow.

But, if your starting pitcher comes to bat late in a close game, and it is not an obvious bunt situation (with no outs), and especially with runners on base or leading off an inning, since the pitching aspect alone is a tossup (presumably), then it is a no-brainer as far as pinch hitting is concerned since there is always a large difference in WE and RE between a pitcher hitting and a pinch hitter hitting in a high leverage situation.

All the nonsense about, “The manager believes that the starter can shut them down, and the other team is not making good contact, and the team is ahead in the game anyway,” is just that – nonsense. All those “English” explanations will not help to facilitate the right answers in any way, shape or form.

Anyway, I have not read Max’s article yet, but I agree that there needs to be a lot more investigation and controlling of all the variables before we declare it a tossup…

Starters and Relievers in the 9th Inning and Score Differential, October 17, 3:48 AM:

I hope that someone links to this study in BBTF.

As I said, I love to be proven wrong, since that gives me an opportunity to learn something. However, after all the bashing I incurred at BBTF, I think it only fair for them to see this new research. Not that I wouldn’t get bashed again. After all, it is BBTF.

As well, I think I learned a lot from this research, even though in the end, I think I was vindicated, which is kind of a silly notion anyway. As I have always said, and Guy put it aptly, my opinions are almost always informed. Sometimes they are specifically supported by evidence and sometimes they are not. They are never, however, out of my a**, as most “lay” opinions on sports are. After all, I am an expert in the field of sports analysis. You would think I was a lay journalist opining on sabermetrics, like Jayson Stark, Buster Olney, or Murray Chass, if you read the comments on BBTF.

I also encourage other people to do similar research. For example, why is it that wOBA is so much higher when the game is close? I’m sure we can speculate, but without looking at the components and perhaps even the pitch f/x data, I don’t think we can be too sure of anything in that regard.

I would also like to see how pitching with runners on base comes into play. The starters obviously always started the 9th, but the reliever data I looked at was anytime during the inning. It could be when they started the 9th or when they came in in the middle of the inning, often with runners on base.

As well, although I didn’t adjust for platoon issues, the relievers definitely faced more same-handed batters, especially when the pitching team was losing, suggesting that they were brought in specifically to face same-handed batters at some point in the inning. This needs to be looked at too.

So I don’t think that the story is over, although I think we found a very significant factor that was affecting the data in the prior research…

Starters and Relievers in the 9th Inning and Score Differential, October 17, 11:56 AM:

DavidS, because of the small samples in each category for the starters, it is almost inevitable that there will NOT be a smooth transition and pattern, which is one reason why I broke it down into only 2 categories at the end.

#5, is the Don Malcomb of Big Bad Baseball? My, you’ve come a long way down. Do you enjoy attacking me for little reason? Is it professional jealousy?

As always, if you don’t have anything substantive, intelligent, or otherwise valuable to say in my house, I’d prefer you stay out…

Starters and Relievers in the 9th Inning and Score Differential, October 17, 1:00 PM:

5 out of 9 continuing to mock me while adding nothing to the discussion.

And one with this:

“I don’t need to see any research to show me that good starting pitchers who have pitched strongly and efficiently through the first eight innings should be kept in to try and complete the game. I suspect that almost every serious baseball fan in America knows this instinctively.”

What a pathetic excuse for a serious blog…

Manager mistakes in the 2011 WS: Game 1, October 19, 8:23 PM:

And if anyone makes any reference to the outcome or result of a certain strategy in terms of evaluating or even mentioning it (as a mistake), you are going to be banned for life!

You have no idea how many people on BBTF said something like, “You are an idiot for suggesting that Carpenter should not have batted in the 8th or pitched in the 9th,” because he got a hit and retired the side in the 9th. Of course, calling me an idiot on BBTF is nothing new…

Manager mistakes in the 2011 WS: Game 1, October 19, 9:27 PM:

I was curious about Jay batting second. Andrus batted second all season long. Surely not a good choice, but at least it was not an example of a manager “doing something different” for a bad reason, which happens all the time, I guess because they want to show that they actually have a difficult job (in terms of lineups and in-game managing), which they don’t, in my opinion. I think I can train a 12 year old to manage a baseball game. Oh, wait, I forgot about the “double switch” in the NL. Too tricky for a 12 year old…

Manager mistakes in the 2011 WS: Game 1, October 19, 9:45 PM:

Some of you guys are going to be surprised at actually how many mistakes a manager can make in a game or series, especially in the post-season when managers think they have to do “something.” I have been mentally noting these mistakes for 25 years. Maybe that is why I am so ornery…

Manager mistakes in the 2011 WS: Game 1, October 20, 12:12 AM:

“Carpenter wasn’t pitching that great to fear him for the 7th.”

I guess you have not read all the work I did in the other thread! You still think that you can tell whether a pitcher is “pitching well” enough to continue. Managers can’t do that. You should be a manager!

Right, hitters occasionally do that to fool the umpire, but in this case there was no way that was acting.

“Pujols has been IBB’d 4 times in the playoffs. All 4 times the inning has ended with Matt Holliday and no runs scored.

Regardless of the good/bad decision aspect to it, it will likely continue until Holliday makes them pay.”

That is true. That is one reason that managers make so many bad decisions…

A player after my own heart!, October 20, 8:53 PM:

Circle, you can use things like that as tie-breakers. Other than that, I’m tired of responding, with all due respect.

Repeat after me:

“Opinions without evidence…

Opinions without evidence….

Opinions without evidence….”

Please spare us.

I don’t care how much knowledge and experience you have.

Here is an MGL’ism along the lines of the Zen-master:

“Experience is often the enemy of the truth…”

A player after my own heart!, October 20, 9:00 PM:

Circle, you’re everything that a manager is, which is not necessarily a bad thing. SOME of your insight, knowledge, and experience is valuable on this blog.

However, a big part of sabermetrics, at least a vestige of it, is to show all the things that managers (and most people in general) believe that are simply not true. Rather than keep digging your heels in, learn something. You must be on this site for a reason other than to keep telling us that conventional wisdom is right and we are wrong. Or, more along the lines of your tone, “Yeah, you guys might be right, but…”

Your “buts” HAVE TO HAVE EVIDENCE for anyone to take them seriously. If you don’t have the expertise to do the research, then cite research from someone who does. If not, and I mean this teasingly, “Shuddupp!”

You say, “If it were me, I would pitch him at home. The splits are so large they must mean something.”

Evidence? No!

You’re gonna respond positively, as you always do, and I appreciate that, but save it. Then 10 minutes later, you’re going to say, “Yeah, but…”

How about, “I think the world is flat and that we, as human beings, were spawned by aliens who landed here a long time ago. And we never landed on the moon and 9/11 was a conspiracy by the U.S. government?”

These are opinions same as yours. Evidence? Nah. Don’t pay them any mind and I won’t pay yours. Deal?

Manager moves in the 2011 WS: Game 2, October 20, 10:14 PM:

Tango, #6, I never thought it was that difficult to plot a pitch by eye, despite what some of the pitch f/x guys say. Especially if you watch a lot of games and you mentally adjust for the slightly offset camera angle (some broadcasts are more centered than others)…

Manager moves in the 2011 WS: Game 2, October 20, 10:34 PM:

I thought Moreland was traded to the Cardinals before the game started?

I didn’t really watch Pujols on that fly ball, but of course he should be running it out in the freaking WS! He is one of those guys that almost never runs out balls that are likely to be outs. I don’t care how good he is or how much money he makes. If I am the manager, he either runs them out or doesn’t play. If nothing else, it is a poor model for the younger guys.

So why was Napoli batting so low early in the season? He was an excellent hitter going into the season…

Manager moves in the 2011 WS: Game 2, October 20, 10:48 PM:

It is a freaking hit and a base advance on the throw. If there is an error it is on the RF’er for a bad throw

Manager moves in the 2011 WS: Game 2, October 20, 11:02 PM:

Westbrook is a terrible starter, but as a reliever, maybe he is 1 run better. Only TLR knows that. As you said, Phil, you gotta choose the guy who has the most K, you don’t mind walks. You want a guy who misses bats.

You bring in a high K/BB guy, and if Young walks, you bring in Westbrook, the sinker-baller.

Ah, I should have been a manager or pitching coach!

Remember I said, There would probably be a mistake before the game ends. There were maybe 5 mistakes.

Circle, that throw was terrible. Probably 15 feet off line. I don’t even think that Pujols touched the ball…

Manager moves in the 2011 WS: Game 2, October 20, 11:18 PM:

The media is so stupid and predictable…

Manager mistakes in the 2011 WS: Game 1, October 20, 11:22 PM:

Not thin air. Nothing I say is out of thin air. You can disagree of course. But everything I say is based on 25 years of sabermetric research, mine and others’.

I have Feldman projected as a 4.32 starter (average starter is around 4.08), which is actually not that bad. I take back what I said about him being “terrible”, although I do think he is worse than that as a starter based on what I have seen of him (but I don’t intend anyone to take that seriously), and ZIPS, Oliver, Steamer, and Pecota have him at around 4.62, which is very poor. So the consensus is probably that he is a mediocre starter at best.

As a reliever, we usually just subtract around 1 run per 9, although I subtract .82 (from the research I have done). And my projection is based on his starts and relief appearances, with each one adjusted.

So that is my “thin air…”
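For anyone who wants the arithmetic in that comment spelled out, here is a minimal sketch. It only applies the two adjustments quoted above (the rule-of-thumb 1 run per 9 and the commenter's own .82) to the 4.32 starter projection. The function name and the assumption that the adjustment is a flat subtraction are mine, not anything from the original research.

```python
# Hypothetical helper: converts a starter projection (runs per 9 innings)
# into a reliever projection by subtracting a flat adjustment.
# The flat-subtraction form is an assumption based on the comment's wording.
def reliever_projection(starter_runs_per_9, adjustment=0.82):
    return round(starter_runs_per_9 - adjustment, 2)

feldman_starter = 4.32  # projection quoted in the comment

# Using the commenter's .82 adjustment
feldman_reliever = reliever_projection(feldman_starter)

# Using the rule-of-thumb 1 run per 9
feldman_reliever_rule_of_thumb = reliever_projection(feldman_starter, adjustment=1.0)

print(feldman_reliever, feldman_reliever_rule_of_thumb)
```

Under those assumptions, the 4.32 starter becomes roughly a 3.50 reliever, or 3.32 with the rounder rule of thumb.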

Manager moves in the 2011 WS: Game 2, October 21, 12:27 AM:

Thanks! With that attitude, you would never be allowed on BTTF (Bash The “The Book” Fan).

BTW, swapping Napoli with Young is more advantageous than swapping Andrus with Beltre. The former generates an extra 7 runs per season, while the latter only adds .5 run per season. So maybe having Andrus at #2 is not all that bad.

If we just switch Andrus with Napoli though (G-d forbid we bat a slow power hitter in the 2 hole), and kill 2 birds with one small stone, we get 12 extra runs a season or 1.6 wins a year…

Manager moves in the 2011 WS: Game 2, October 21, 12:06 PM:

I’m confused. We (saberists) get criticized all the time for not taking into consideration (including in our models/formulas) things like, “Hammy is hurt and may not be able to turn or catch up to Motte’s fastball.”

Yet, LaRussa, arguably one of the best at seeing and utilizing things like that, takes Motte out.

Ah, yesterday, he was a genius and hero, today, he is a goat…

Taking Out Jason Motte, October 21, 8:06 PM:

Wow. Without actually applying some NUMBERS to each of those options, it is impossible to know which one is correct (yields the highest WP for the Cards). Each of us can have an “opinion” on which one is correct, but without NUMBERS, I am afraid that opinion isn’t worth much.

I’ve been doing these kinds of analyses for 25 years and I have no idea which one is correct. I suspect that D might be, but you/I would have to figure out how much the bad “D” in the field costs in WP as well as removing another player (decent chance for a tie game and extra innings). It is really complicated to figure all this out, but it can be done (approximated at least, so we have SOME idea as to which option might have been correct).

I am completely agnostic as far as walking the bases loaded. Normally you never do that with 0 outs (other than perhaps in the bottom of the 9th in a tie game), but here, I don’t know. I don’t think (no, I KNOW) that Dave or anyone else knows without “running the numbers.”

I’m also not sure why Dave is obsessed with the Cards trying to make Hamilton hit the ball on the ground. I have no idea whether that would be better or worse than a fly ball. Lots of fly balls don’t score the runner at third (short ones of course) and lots and lots of fly balls don’t move the runner to third. Same thing with ground balls. Some are base hits, some move both runners, etc. I don’t remember if the IF was playing in or not (probably not), but even if it were, I’m STILL not sure whether a fly ball or ground ball would be better. IOW, I’m not sure whether I want a GB or FB pitcher to pitch, everything else being equal.

Also, I am very uncomfortable when an analyst gets to choose which sample he wants to present to support his point or his opinion. This year only? Last 2 years? 3 years? Career? Lately, as in last half season? You should not be allowed to do that, for obvious reasons (cherry picking your evidence makes your arguments intellectually dishonest, or misleading at best).

For example, Dave said this:

“While Hamilton’s strikeout rate against LHPs jumps to 22.1%, Rhodes K% against LHBs this year was just 16.1%. His career numbers are much better, but he’s not the same pitcher he was a few years ago, and Hamilton had hit an outfield fly against him the night before.”

Yes, he is not the same pitcher, but if this year his K% was higher than his career numbers, Dave would obviously be quoting his career numbers (heck, I would too if I had the choice!). The analyst should NOT have the choice. He should always be quoting a projection which is some kind of weighted career average!

And for the last part of that last sentence, about Hammy hitting a fly ball the night before, David should get immediately thrown into the MGL jail. I can’t believe he even said that in that context. Shame on you Dave!

“…but then you’re essentially inviting Ron Washington to execute a squeeze play. ”

I think there is about a zero chance of Hamilton squeezing just because Motte is playing third. That should not even be in the analysis unless you want to use it as a tie breaker in a dead heat.

Finally, Dave didn’t even mention one of the most important parts of the puzzle, and one that made LaRussa’s decision likely awful. Lynn is a terrible pitcher! I don’t care how he has done lately (has it been good?). I and other projection experts have him as near replacement level! If you know that he is going to bring in Lynn, as opposed to say, Dotel, then it is a no-brainer not taking out Motte (or putting him in the field).

So while David definitely brought up most of the relevant facts in order to determine which option was best, I don’t think that any of us is any closer to the answer….
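The “weighted career average” that the comment above demands in place of a hand-picked sample can be sketched in a few lines. This is only an illustration: the 5/4/3 recency weights and the season totals are made-up assumptions (not Rhodes's actual numbers and not the commenter's actual projection system), and a real projection would also regress toward the league mean and adjust for age.

```python
# Hypothetical season lines against left-handed batters, most recent first:
# (strikeouts, plate appearances). These numbers are invented for illustration.
seasons = [(16, 100), (30, 120), (35, 125)]

# Assumed recency weights, most recent season weighted heaviest.
weights = [5, 4, 3]

def weighted_rate(seasons, weights):
    """Weighted rate: recent seasons count more, but every season counts."""
    weighted_events = sum(w * k for w, (k, pa) in zip(weights, seasons))
    weighted_pa = sum(w * pa for w, (k, pa) in zip(weights, seasons))
    return weighted_events / weighted_pa

projected_k_rate = weighted_rate(seasons, weights)
this_year_only = seasons[0][0] / seasons[0][1]
career = sum(k for k, _ in seasons) / sum(pa for _, pa in seasons)

print(f"this year: {this_year_only:.3f}")
print(f"career:    {career:.3f}")
print(f"weighted:  {projected_k_rate:.3f}")
```

The point of the sketch is that the weights are fixed before anyone looks at the numbers, so the analyst never gets to pick whichever window flatters his argument.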

Managing the 2011 World Series: Game 3, October 22, 9:49 PM:

McCarver:

“Trailing by 2 instead of 1, he has to hold him (Napoli) at third.”

Seriously, does this guy have any analytical skills?

In the 4th inning, it makes almost NO difference whether they are down 1 or 2 in terms of sending the runner.

And that was not such a great throw from Holliday, if only because he did not back up on the catch and then come in. With Napoli running, that should not have been a close play!

2011 WS discussion: Game 4, October 23, 10:19 PM:

Have you ever known anyone in “real” life who does a lot of unconventional things, even when some of them are incorrect, just because they think they are smarter than everyone else and in order to prove that, they have to do things differently? I do.

That is LaRussa in a nutshell. I’ve said this for many years. When I worked for the Cards and I met him for the first time, he basically laughed me out of the room (he has no use for sabermetrics or sabermetricians – none at all)…

Small sample sizitis, October 24, 3:03 PM:

People still don’t get it. If you have little or no predictive value in the population, as we found in the book, then, no matter what you THINK you know, and no matter what SEEMS to make sense, the sample size barely matters.

We can talk about this until we are blue in the face, and we can explain it in detail in The Book, but we still get this:

Oh yeah, I understand, but:

“The match-up numbers became large enough to tell something.”

Tango, I don’t like that you undersell the lack of predictive value we found with batter/pitcher matchups. I mean, you even went so far as to try and increase the size of the samples by using “pitcher families” (OK, not quite the same thing as facing one particular pitcher), and STILL we found nothing.

Can we please put this to rest? No number of PA gives us any more than de minimis value (at most – it still might be NOTHING – given that we found nothing). Use it as a tie breaker if you want – I don’t care. But please don’t use it otherwise as a decision maker no matter how many PA it is based upon (obviously you can’t have hundreds of PA and even then, there is ZERO evidence that that would have predictive value).

Guys: If 50 or 100 PA have any practical predictive value, then we would find it in 20 or 30 PA. We didn’t. We found nothing. Nothing. N-O-T-H-I-N-G.

So, please, pretty please, stop telling us about Earl Weaver and your grandmother who are so smart that they wait for 20 or 30 PA. We looked at lots of guys who had 20 or 30 PA and found zilch. Nada.

We tell you (in The Book) how much to regress clutch (a lot) given a certain sample size. We tell you how much to regress pitcher BABIP. We tell you how much to regress windup/stretch splits. We tell you how much to regress RHB platoon splits. All of these are a lot even with a fairly large sample size.

We didn’t tell you how much to regress batter/pitcher matchups. Do you know why? Because we found NO predictive value, i.e., no “skill”, i.e., all the various unusual results you see are likely due to random fluctuation, at ANY sample size.

And again, for the 10 thousandth time, you cannot use the argument, “You found nothing at 30 PA, but at 50 PA, there IS something.” If you find nothing at 30 PA, as long as your sample of players is reasonably large, then there is nothing or almost nothing at 50 PA or 100 PA.

Do we know all of this for a fact – i.e. with 100% certainty? No! We know virtually nothing for a fact when it is based on inferences derived from sample data, which almost all sabermetric tenets are…

Small sample sizitis, October 24, 4:42 PM:

One of these days I am going to write a Primer for MLB managers – seriously.

Chapter X: “Mr. LaRussa (as a metaphor for most managers), throw those index cards away!”

2011 WS discussion: Game 5, October 24, 11:16 PM:

Tony is closing the barn door. Where’s the horse? The horse?????

Managing the 2011 World Series: Game 3, October 26, 3:20 AM:

Bill, I appreciate the input. I also happened to have played high stakes poker for many years – quite successfully as you might imagine. So I kind of know what I am talking about when it comes to poker! ;-)

I don’t usually make reference to things I am not grounded in…

Meaning in hot and cold zones?, October 26, 1:51 PM:

I have not RTFA yet, but I love the idea of estimating true talent from observed data for anything, especially pitch type and location data.

I have thought about this for a long time. It is especially important for advanced scouting and I think that many teams ignore this fact. For example, if two batters are observed to have exactly the same hot and cold patterns, and they are quite far from league average, but one has 400 PA and the other has 4,000, you would pitch each one quite differently (the one with 400, you would pitch him much closer to that of the league average player). I’m not sure teams do that. Same for defending a batter based on spray charts. Those spray charts have to be regressed!

Now, I’ll RTFA!
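The 400 PA versus 4,000 PA example in that comment is ordinary regression toward the mean, and a rough sketch makes the difference concrete. Everything numeric here is an assumption for illustration: the observed and league values, and especially the constant `k` (the number of PA at which you trust the sample and the league average equally), which would have to be estimated from real data for each kind of split.

```python
def regress_to_mean(observed, n_pa, league_mean, k):
    """Shrink an observed rate toward the league mean.

    k is the (assumed) number of PA at which the observed sample and the
    league mean get equal weight; larger samples keep more of what was observed.
    """
    return (n_pa * observed + k * league_mean) / (n_pa + k)

# Hypothetical numbers: two batters show the same extreme result in a "hot" zone.
observed_hot_zone = 0.400   # made-up observed value in that zone
league_hot_zone = 0.320     # made-up league-average value in that zone
k = 1000                    # made-up regression constant

estimate_400 = regress_to_mean(observed_hot_zone, 400, league_hot_zone, k)
estimate_4000 = regress_to_mean(observed_hot_zone, 4000, league_hot_zone, k)

print(f"400 PA batter:   {estimate_400:.3f}")
print(f"4,000 PA batter: {estimate_4000:.3f}")
```

With these made-up inputs the 400 PA batter's estimate lands much closer to league average than the 4,000 PA batter's, which is the reason the comment says you would pitch the two of them differently despite identical observed patterns.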

Batter-pitcher matchups, October 27, 1:28 PM:

I have not read the article yet, but it sounds like truly great work.

So basically what we have already said a million times before: “Use it as a tie-breaker and nothing else.”

Yes, I would like to see how much to regress the batter-pitcher results toward the expected results (for a given number of PA of course) or how much to regress the expected results toward the batter-pitcher sample.

Colin, one thing: I assume (again, I have not RTFA yet) you did not control for GB/FB platoon. If you do that, I suspect that the entire predictive value might disappear. Can you perhaps tell us in the extreme cases on both ends what the average GB/FB ratio is for the pitchers and batters?

World Series 2011 Game Thread: Final Game, October 28, 10:00 PM:

That was one of the worst IBB’s you will ever see. Of course I’ve said that many times.

Do you think that Ron Washington even knows that when you IBB the bases loaded, that if the next guy walks, it’s a run?

I don’t care how good a people person he is, the guy is a stone cold moron!

World Series 2011 Game Thread: Final Game, October 28, 10:01 PM:

I wish that I were part of the mainstream media so that I could tell it like it really is…

That seems like a fitting comment to go out on. I hope you enjoyed this edition of The Best of MGL.