
Statistics for Handloaders

My point is this: one or two 3- or 5-shot groups can certainly tell you if a load is bad, but they can't tell you if it is good. You need more data for that, and you can't "ignore the flyer" or you are wasting your time.

I've long maintained the opinion that a small number of rounds can't tell you if a load is *really* good without some further testing... but it can tell you if it's bad.
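A minimal Monte Carlo sketch (Python standard library only, with a made-up per-axis dispersion of 0.5 in) illustrates the point: the very same load throws both tiny and huge 3-shot groups purely by chance, so one small group proves very little.

```python
import math
import random

random.seed(1)

def group_size(n, sigma=0.5):
    """Extreme spread (inches) of an n-shot group; impacts are drawn
    from a circular normal with per-axis standard deviation sigma."""
    pts = [(random.gauss(0, sigma), random.gauss(0, sigma)) for _ in range(n)]
    return max(math.dist(a, b) for a in pts for b in pts)

# Distribution of 3-shot group sizes for one unchanging load
sizes = sorted(group_size(3) for _ in range(10_000))
print(f"best {sizes[0]:.2f} in, median {sizes[len(sizes) // 2]:.2f} in, "
      f"worst {sizes[-1]:.2f} in")
```

The best and worst groups here come from the identical load; only a larger sample separates luck from performance.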
 
The Lyman article looked at group sizes for 5 shots x 5 groups = 25 total shots. They then used a t-test to compare average group sizes against the background variability, which was estimated from the standard deviation of the 5 group sizes and the sample size (5). With a sample size of only 5, it takes a major difference to reach statistical significance; more samples mean a smaller difference can be detected.

On the other hand, the variability of all 25 shots can be used (for example, the distance of each shot from the group center) to calculate an sd of the individual shots (not of the 5 group averages as before). The variances (sd squared) are then compared with an F-test; with a sample size of 25, much smaller differences can be detected with significance.

How many samples are needed? Well, it depends on the objective, such as how small a difference you need to detect. It's all about being able to separate the signal (degree of difference) from the noise (sd). For example, it amazes me to read posts about reducing chrono variability, say for a load where sd = 5 and es = 25, and then a few posts later see it suggested to optimize a load with a ladder of one shot per charge, looking at the chrono for the flat spot; well, if a given charge has an es = 25 you can get a flat spot anywhere depending on chance.

There is no substitute for common sense, and statistics is not interesting to everyone. But just understanding there is noise in everything and judging how to separate the signal from the noise goes a long way.
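As a rough illustration of comparing the shot-level dispersion of two loads, here is a sketch on entirely hypothetical data (the radial distances and sample sizes are made up, not from the Lyman article). It substitutes a permutation test on the variance ratio for the F tables so it needs only the Python standard library:

```python
import random
import statistics

random.seed(2)

# Hypothetical radial distances (inches) of each shot from its group
# center: 25 shots of load A and 25 of load B. Made-up numbers.
load_a = [abs(random.gauss(0, 0.45)) for _ in range(25)]
load_b = [abs(random.gauss(0, 0.60)) for _ in range(25)]

def var_ratio(a, b):
    return statistics.variance(a) / statistics.variance(b)

observed = var_ratio(load_a, load_b)

# Permutation analogue of the F-test: if both loads really disperse
# alike, shuffling shots between groups should often produce a
# variance ratio at least as extreme as the observed one.
pooled = load_a + load_b
trials, extreme = 5_000, 0
for _ in range(trials):
    random.shuffle(pooled)
    r = var_ratio(pooled[:25], pooled[25:])
    if min(r, 1 / r) <= min(observed, 1 / observed):  # two-sided
        extreme += 1
p_value = extreme / trials
print(f"variance ratio {observed:.2f}, permutation p ~ {p_value:.3f}")
```

A small p-value would say the two loads' dispersions differ by more than chance; with only a handful of shots per load, that threshold is rarely reached.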
 
CharlieNC, that is a great explanation that is easy to understand, thanks.

I also see many examples of folks making big decisions based on too little data and no meaningful analysis. I wish we could help folks see when they are making decisions with no scientific basis. There is one thread on another forum asking whether sub-MOA groups are possible. We know they are possible with any rifle; it just depends on how many groups you are willing to shoot to get one, and there is some chance you get it on the first try. The replies to that post are amazing.

I wish we could get a tutorial on statistical analysis in our forum "Bulletin".
 
I believe we are all talking about the same thing, just from different directions.
It all comes down to how many shots/effort you are willing to expend to obtain a verifiable/repeatable load. The whole system must be analyzed, not just the load itself.
This became apparent to me when I was a young sailor working as a computer technician in the navigation center of a ballistic missile firing submarine. The holy grail then (1970s), and still today, was Circular Error Probable (C.E.P.). Yes, accuracy is important even with nuclear warheads. C.E.P. was gradually refined to an ellipse, mainly due to velocity and time-of-release errors.
Now, cannot C.E.P. be related to small arms group size? And cannot the C.E.P. ellipse be related to vertical stringing at long ranges? I think so. As it would be very expensive to pull a nuclear sub off the line and ripple-fire 16 missiles with multiple warheads, a lot of high-powered math went into figuring C.E.P.
Fortunately there is a less expensive and more efficient way available to us: mean radius. Mean radius is approximately 6% larger than C.E.P. but is much easier to calculate (this is becoming more important to me as I click off the years at a faster rate; I don't even buy green bananas anymore). The actual size doesn't really matter since we are looking for the smallest and most repeatable combination.
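Mean radius really is easy to compute: average the distance of each shot from the group's center. A minimal sketch, using a made-up 5-shot group rather than real target data:

```python
import math

def mean_radius(impacts):
    """Average distance of each shot from the group's mean point of
    impact (the group center)."""
    cx = sum(x for x, _ in impacts) / len(impacts)
    cy = sum(y for _, y in impacts) / len(impacts)
    return sum(math.hypot(x - cx, y - cy) for x, y in impacts) / len(impacts)

# Made-up 5-shot group, impact coordinates in inches
shots = [(0.1, 0.3), (-0.4, 0.0), (0.2, -0.2), (0.0, 0.5), (-0.1, -0.4)]
print(f"mean radius: {mean_radius(shots):.2f} in")  # prints 0.38 in
```

Unlike extreme spread, every shot contributes to the number, so flyers are weighted instead of dominating.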
So, how many shots should we fire to see how a load performs? Just as in racing, how fast do you want to go? How much money do you have?
Statistically, there is no difference between a five- and a six-shot group. More than that and the precision is greater but the efficiency falls off. A four-shot group is approximately 3% less efficient than five or six. If you intend to use "group size" (e.g., extreme spread) to estimate precision, then you'll spend 13% more bullets shooting 3-shot groups to get the same statistical confidence. Hmmm... the NRA protocol of five groups of five shots seems to have been chosen for a reason and not just pulled out of err.... the air.
Personally, since one of my rifles is hard on barrels, I shoot five 4-shot groups. Since this is a hunting rifle and I want to be as close as possible to actual conditions, each shot is from a cold barrel, which means at least 10 minutes between shots. Because of the timing, it doesn't really matter whether I shoot 4-, 5- or 6-shot groups, and four-shot groups are easier to score than twenty-shot groups when shots overlap. So I load up 20 rounds and spend a day at the range. Since I number each shot on the target while waiting for the barrel to cool, that means 20 trips plus one to hang the target. I get my exercise, which is my cover story to the wife unit for the credit card hits.
From my days as a High Power Rifle shooter, after a while, it becomes apparent that the human factor, from a practical standpoint, becomes the limiting element of the system. Probably the most important thing is to "Get Your Mind Right". Now for the Bench Rest shooters, bless their hearts, the part about "How much do you want to spend" becomes a bigger part of the equation, but "Get your mind right", I believe, is still the most important.
For a more thorough rendering of the mathematical aspect go to:

http://ballistipedia.com/index.php?title=Home

Please read the whole thing, including all of the menu items and the hot links.
 
I just discovered that Excel has statistical functions including the t-test and the F-test. Has anyone used any of Excel's statistical functions?
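For what it's worth, what Excel's T.TEST and F.TEST compute can be mirrored in a few lines of Python. A sketch with made-up columns of group sizes (the numbers are illustrative only); the t statistic here is the Welch unequal-variance form, corresponding to T.TEST's type 3:

```python
import math
import statistics

# Hypothetical columns of 5 group sizes (inches) for two loads --
# the kind of data you'd feed to Excel's T.TEST / F.TEST.
a = [0.92, 1.10, 0.85, 1.05, 0.98]
b = [1.20, 1.35, 1.10, 1.28, 1.22]

ma, mb = statistics.mean(a), statistics.mean(b)
va, vb = statistics.variance(a), statistics.variance(b)

# Welch's two-sample t statistic (Excel: T.TEST(a, b, 2, 3) returns
# the corresponding two-tailed p-value)
t = (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

# Variance ratio underlying the F-test (Excel: F.TEST(a, b) returns
# its two-tailed p-value)
f = va / vb
print(f"t = {t:.2f}, F = {f:.2f}")
```

Here the means differ strongly (large |t|) while the variances are nearly equal (F near 1), which is exactly the distinction between the two tests.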
 

I did not drop off the planet, but have been doing some study so I could make an informed reply. I am particularly focused on your comment that the Lyman paper used a t-test where an F-test would have been more appropriate.

You gave your basis for the F-test being more appropriate than the t-test for the application addressed, and I wanted to research the subject so I can make an informed judgment on the validity of your evaluation of the Lyman paper. At this point I have the Lyman paper, which by default I give more credibility than "some guy on the internet" (you). Let me be clear that you may be right and the Lyman paper may be wrong, but I would like some basis for judging between the two, so I am trying to educate myself on the subject.

Let me say that I have enough education, experience, and knowledge of statistics to concur with much of what you say, such as your observations above on ladder testing, where conclusions are drawn from insufficient data. And, unfortunately, there are many more examples on the shooting forums that could be called out for such deficiencies.

I have a scientific/technical education and work experience, so I am capable of educating myself on the subject. I am currently working my way through "Statistics for Dummies", which is a great place to start for anyone with a high school education and some algebra. It covers the t-test but not the F-test, so I will have to go to some of the statistics textbooks in my library that include the F-test to complete my study. This is no simple task; I have spent considerable time over the past month or so just to be able to comment adequately, and I still have a way to go.

Thank you for your comments on this subject and please stay engaged. Folks with education, knowledge and experience in statistics have much to offer to our sport.

One last point: can you quantify the difference in the results of the analysis in the Lyman paper between the use of the t-test and the F-test?
 
@chkunz I applaud your efforts to increase your knowledge in order to improve your capabilities. As I have only tried to become a shooter when I retired about seven years ago, I have found this site useful with respect to the technologies and skills associated with that journey. While working I found it useful to improve my knowledge and use of statistics as the problems became more complex, and the costs associated with learning answers in the labs/pilot plants/manufacturing were expensive. Learning to optimise loads, and interpreting when something is actually better or worse, are similar exercises.

On to your direct questions. The t-test is used to compare averages, while the F-test is used to compare standard deviations (variability); the two tests are based on different distributions and methods. Variability of shots can be characterized by several parameters, including sd and range (group size). But the range does not follow the normal distribution on which the t-test is based, which makes that approach inappropriate.

The Lyman article also uses ranges of 5 shots; why 5? We all know more shots yield larger groups, so why not use 10 shots? In their example, though, that would have yielded fewer averages for the t-test and less statistical confidence with this approach. Doesn't seem to make sense, does it? That is another reason to use sd instead of ranges: the approach and logic are less convoluted.
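The point that more shots yield larger groups is easy to verify with a quick simulation (hypothetical circular-normal dispersion, standard library only), which also shows why comparing ranges across different shot counts is apples to oranges:

```python
import math
import random
import statistics

random.seed(3)

def extreme_spread(n, sigma=0.5):
    """Range (largest shot-to-shot distance, inches) of an n-shot
    group drawn from a circular normal with per-axis sd sigma."""
    pts = [(random.gauss(0, sigma), random.gauss(0, sigma)) for _ in range(n)]
    return max(math.dist(p, q) for p in pts for q in pts)

five = [extreme_spread(5) for _ in range(5_000)]
ten = [extreme_spread(10) for _ in range(5_000)]
print(f"mean 5-shot group {statistics.mean(five):.2f} in, "
      f"mean 10-shot group {statistics.mean(ten):.2f} in")
```

The 10-shot groups average noticeably larger than the 5-shot groups even though the underlying dispersion never changed.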
 
Thanks CharlieNC for the information and the encouragement. I am going to spend some time this off season finishing "Statistics for Dummies" and then one of my textbooks, "Basic Business Statistics" (which has a section on the F-test), in support of a shooting project I am working on. By the end of next shooting season I hope to have enough data on my project for analysis. It would be great to have a statistical basis to compare "A" to "B" in my research project. There will be other factors to consider in the comparison, but the statistical analysis will be an important component. Thanks again for your interest and help. Best regards, Clyde the elder.
 
FYI... you might be interested in 'Statistics II for Dummies' (not to be confused with 'Statistics for Dummies, 2nd Ed.'). Same author, but IIRC uses mostly Minitab which was kind of a deal-breaker for me.

In terms of more serious texts... you really, really should check out 'Statistics for Experimenters' by Box, Hunter, and Hunter. Pretty much one of the seminal works in the area. Not cheap, as it's most definitely not a mass-market paperback. And you definitely need some more math skills to follow what they're talking about - I'll admit, it revealed some definite weak areas for me. But it covers pretty much everything you're asking about, from simple comparing of means, to proper randomization and replication, and then a whole *bunch* about Design of Experiments: factorial vs. fractional factorial, screening experiments vs. response surface methods, etc.

A much more 'lite' but still very useful intro to the same sort of topics as the BHH book is the free online course 'Experimentation for Improvement', which you can take for free or just follow along via the e-text and YouTube videos (I believe the Coursera syllabus mostly covers chapter 5, 'Design and Analysis of Experiments'). You'll probably find the initial experiments somewhat simplistic, but they take you from doing fairly simple stuff more or less by hand to doing it with a free and powerful computer software system. Again, not everyone's cup o' tea.

HTH,

Monte
 
Monte, thank you so much for the references and book recommendations; I will definitely add these to my library. I have the math skills and data analysis experience from another life before I retired, and although I have been retired many years I can still remember as much as I need for this subject. I even enjoy dealing with the technical stuff. Thanks again and best regards.
 
The latter text/course is worth a look if just because it gives a fairly gentle, and targeted, intro to using R (via RStudio). For those who don't have a professional/academic license, it's one of the only 'affordable' options.
 
