I wonder, after performing this experiment, if Mr. Litz still anneals his brass?
I don't know, but I continue to anneal mine. Here's why.
We all know that major factors, such as charge weight variations, are things we can measure. More often than not, we see definite changes in MV as well as changes in group size when we test different charge weights. But anyone who has done load development in a serious way knows that there are pitfalls. If you use a coin to judge your performance, this may not apply to you. But if you scan and measure your test targets with a computer program and then study the results in a spreadsheet, you will frequently encounter data that just doesn't fit as nicely as you might hope.
One big reason for those sometimes mysterious and frustrating test results is that we normally test by firing our hand-loaded ammo from a rifle held by a human, launching bullets at a target some distance away through an unmeasured and unknown atmosphere. Plus, when we're testing charge weights, we use brass with variations in case volume (even if we sort it), bullets with weight, size, and shape variations (even if we sort them), and we shoot these test rounds through barrels at various temperatures and degrees of fouling, just to name a few variables that are not fully controlled.
Sometimes these variables add up to overwhelm our expected test results and produce the dreaded "flyer". It goes without saying that none of us on this forum would ever flinch and produce a shooter-induced wild shot... we're way too good for that, aren't we?
In other words, as we test one thing, we are unable to control a whole bunch of other stuff and that introduces noise, to one degree or another, into our test data.
So when we test charge weight variations, we are normally able to come up with good answers and make reasonable decisions 'cause charge weight is rather straightforward. But what about more subtle factors? For example, what happens if you change your neck cleaning procedure? Let's say you've been cleaning cases with the wet SS method and counting on your Moly-coated bullets to serve as seating lube. Now you want to try dry tumbling and swabbing the necks with some sort of magic lube in order to seat non-coated bullets. It's impossible to guess whether you should expect to see and measure an improvement or a decline in group size, MV, or anything else. All most of us could say is that the change, if any, is bound to be small; much smaller than the effect of a 0.1 gr change in charge weight, for instance.
In addition, do you think you can measure changes in your case neck lube process? Well, maybe you could, but I wouldn't bet on finding a genuine answer before you wear out your barrel. There MUST be a difference, right? The same can be said for flash hole deburring, primer pocket uniforming, and a whole lot of other things we do in the search for that winning edge. That includes annealing frequency.
Trying to isolate a single factor which we expect to have a tiny effect on performance is a fool's errand. There is just too much "noise" in our normal testing procedures. So most of us use common sense along with a bit of superstition. I believe that careful brass prep helps, so I uniform my primer pockets. I also believe that annealing every time helps, so I do that every time. Can I prove that these two steps, or any of the other countless steps I take in the course of a reloading cycle, actually help? No, I can't.
But I can say for sure that my current reloading procedure produces WAY better results than what I got when I first started, back when I was using the cheapest bullets and measuring powder with a Hornady Lock-N-Load case-activated powder dispenser on a progressive press.
It should be noted that Mr. Litz didn't prove that annealing does no good. All he found is that he couldn't measure a difference, if there is one. I suspect he would get the same result testing primer pocket uniforming too. The expected improvement is just too small to tease out of the rather significant background noise. But that doesn't mean that tiny, difficult-to-measure improvements don't add up.
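To put a rough number on the signal-vs-noise problem, here's a quick simulation sketch. Every figure in it is a made-up assumption for illustration, not anything Mr. Litz measured: suppose some small process change really does buy you 2 fps of MV, your shot-to-shot velocity SD is 8 fps, and you compare 10-shot strings with a simple t-test. Under those assumptions, the "real" improvement only shows up as statistically significant about one range session in ten.

```python
# Rough Monte Carlo sketch of the signal-vs-noise problem described above.
# All numbers are hypothetical: a 2 fps "true" MV improvement from a small
# process change, buried in 8 fps of shot-to-shot SD, tested with 10-shot strings.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)

TRUE_EFFECT_FPS = 2.0    # assumed tiny improvement from the process change
SHOT_SD_FPS = 8.0        # assumed shot-to-shot velocity SD
SHOTS_PER_STRING = 10    # shots fired with each load variant
TRIALS = 10_000          # simulated range sessions

detections = 0
for _ in range(TRIALS):
    baseline = rng.normal(2800.0, SHOT_SD_FPS, SHOTS_PER_STRING)
    modified = rng.normal(2800.0 + TRUE_EFFECT_FPS, SHOT_SD_FPS, SHOTS_PER_STRING)
    _, p_value = ttest_ind(modified, baseline)
    if p_value < 0.05:
        detections += 1

# With these assumptions, the real 2 fps difference registers as
# "statistically significant" in only roughly 1 session in 10.
print(f"Detected the real effect in {detections / TRIALS:.0%} of simulated tests")
```

The point isn't the exact percentage; it's that with realistic shot counts, an effect that small is nearly invisible, even though it genuinely exists.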