And 'statistically valid sample size' is an abused and misused term! For example, a ladder test is often run with one shot per load and still gives usable results, i.e. the node shows up as the flat spot of the overall curve. The sample-size conundrum arises when you want a high level of confidence in a single answer taken in isolation. Target group size, like velocity ES, requires more samples to reach confidence than the SD of the same data does, as the quick simulation below suggests. But why rehash the same....
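To illustrate that last point, here is a rough simulation, not from any real chrono data, assuming normally distributed velocities with a made-up true SD of 8 fps: as the shot count grows, ES keeps drifting upward and stays noisy, while the SD estimate settles down.

```python
import numpy as np

rng = np.random.default_rng(1)
true_sd = 8.0  # assumed chrono SD in fps (made up for illustration)
for n in (3, 5, 10, 20, 30):
    # 2000 simulated strings of n shots each
    strings = rng.normal(2800.0, true_sd, size=(2000, n))
    es = strings.max(axis=1) - strings.min(axis=1)   # extreme spread per string
    sd = strings.std(axis=1, ddof=1)                 # sample SD per string
    print(f"n={n:2d}  ES mean={es.mean():5.1f} (scatter {es.std():4.1f})  "
          f"SD mean={sd.mean():4.1f} (scatter {sd.std():4.1f})")
```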
To my mind at least, there is a difference between 'exploratory data analysis' aka 'EDA', which is really what most people's load development amounts to, and actually trying to prove that load A is better than load B with some degree of confidence. Running stupidly high shot counts on each level of the former just wastes time and materials.

ebb, I do not take the criticism personally. Most folks agree with you, but a small number may choose to consider using statistical analysis to help improve their performance. Mathematics is already a factor in much of what we do, and statistical analysis is just one more tool we can apply to some of it. It takes some effort to learn to use the tool, and even more effort to learn how and when to use it. Did you read the paper? If not, I would recommend that you at least read the introduction. Best regards, Clyde

CHKunz, I hope you didn't take my post as negative in any way. The statistics are great for mathematicians, which I am not. If I wanted to do a math project this would be great, but I want to make a small group. I was following the math of all this load data that looks at velocity, and was watching ES. I tinkered with the load on my 7RUM and got the ES into single digits; I was super happy. At distance it was the worst load I had tried so far. I think rifle accuracy has too many variables to allow math to guide you to the goal. Math is exact; rifle accuracy is not, and there are no rules. Which bullet is best? Surely some of the very best custom bullets will shoot the best in all rifles. NOT.
Yes. If you are looking at extremely small target groups, as in short-range BR, then the individual shots are not normally distributed. I can tell you my shots in F-Class are normal, and I would be shocked if most are not. Measure a few of your targets to see. You can also pool ("average") SDs to increase the effective sample size and get a more robust analysis, for example the chrono SD for the same load shot on different days, or the individual shot variability on those days. A sketch of that pooling is below.
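A minimal sketch of that pooling, with hypothetical day-to-day SDs and shot counts (the numbers are made up): each day's variance is weighted by its degrees of freedom and then combined.

```python
import math

def pooled_sd(sds, counts):
    """Combine per-day SDs by weighting each day's variance by its degrees of freedom."""
    num = sum((n - 1) * s ** 2 for s, n in zip(sds, counts))
    den = sum(n - 1 for n in counts)
    return math.sqrt(num / den)

day_sds = [7.2, 9.1, 6.5]     # chrono SD from each day's string, fps (made up)
day_counts = [10, 10, 15]     # shots fired each day
print(f"pooled SD = {pooled_sd(day_sds, day_counts):.1f} fps "
      f"on {sum(day_counts) - len(day_counts)} degrees of freedom")
```

The pooled estimate carries the combined degrees of freedom from all the days, which is what tightens up any F- or t-test you run afterward.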
Just to continue kicking this can down the road... @CharlieNC you mention using the F-test to compare the variability between two samples. The subject came up in a discussion with someone a little while back, about whether it is possible to compare things such as standard deviation straight up across the board, i.e. to say whether an SD of 5 is really that much better than an SD of 7. One of the things I came across when looking into it further was that many (most) variance tests rely heavily on an assumption of normality in the data. If that assumption is wrong, the whole test becomes suspect. A rough illustration of the SD 5 vs. SD 7 question is below.
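Here is a sketch of that SD 5 vs. SD 7 comparison at a few hypothetical shot counts, assuming the velocities really are normal. The variance ratio is only (7/5)² ≈ 1.96, so it takes a lot of shots before the difference looks convincing, and the p-values mean little if the normality assumption fails.

```python
from scipy import stats

sd_a, sd_b = 5.0, 7.0                      # the two hypothetical SDs
f_stat = (sd_b / sd_a) ** 2                # larger variance over smaller: ~1.96
for n in (5, 10, 20, 30):
    # two-sided p-value for an F statistic with (n-1, n-1) degrees of freedom
    p = 2 * stats.f.sf(f_stat, n - 1, n - 1)
    print(f"{n:2d} shots per load: F = {f_stat:.2f}, two-sided p = {p:.2f}")
```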
Please send a PDF file to oneutdon@gmail.com. Thanks, Don

So for the benefit of anyone wanting to see this information, I have a PDF file of the 7 pages. Too big to post here, but PM me with your email address and I will gladly send it to you. I will include the first 4 pages in this reply, so you can see a preview.
@CharlieNC are you doing everything in Excel, or are you using more specialized software like this?
I use mean radius, and shoot 10-shot groups and then multiple ten-shot groups to confirm a "pet" load. It is best to shoot the groups on different days. But the question we were discussing is how to evaluate a change in a component: is there a significant probability that the change has improved the load, or are we just seeing the normal variability in the data? The reference I gave uses a simplified version of the statistical "t" test at a 95 percent confidence level and is easy to use; however, CharlieNC says the F-test is the proper statistical analysis.

Mean radius with a minimum of 20 shots will get you started on how a load will shoot.
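For anyone who has not computed it, mean radius is just the average distance of the shots from the group center. A minimal sketch with made-up shot coordinates:

```python
import numpy as np

# made-up shot coordinates in inches, measured from any convenient reference point
x = np.array([0.10, -0.35, 0.55, 0.05, -0.20, 0.40, -0.15, 0.25, -0.45, 0.30])
y = np.array([0.20, 0.10, -0.30, 0.45, -0.25, 0.15, -0.40, 0.35, 0.05, -0.10])

cx, cy = x.mean(), y.mean()           # center of the group
radii = np.hypot(x - cx, y - cy)      # distance of each shot from that center
print(f"mean radius = {radii.mean():.3f} in over {len(radii)} shots")
```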
Charlie, I have been looking in several references to learn when the F-test is more appropriate in data analysis than the t-test. Can you provide some guidance on this question?

Well, I took a look at the Lyman pages. The bad news is that this is a mis-application: the t-test is not properly used to compare variability. The good news is that there is a simple way to test for significance, using any number of samples, to determine whether there is a significant difference in shot dispersion. Group size, like chrono ES, is not the best measure to use. For the five-shot examples, instead of measuring the group size for the two loads, use the individual shot data. For example, On Target gives you the distance of each shot from the center of the group, as shown below. The proper test to compare whether the variability is different is the F-test. In Excel you can use the F.TEST function to do this; for the data below it returns 0.37, which is the probability of seeing a difference this large if the two loads really had the same variability. Loosely speaking, that leaves roughly a 63% chance that the loads genuinely differ in variability, i.e. that Load A's dispersion really is better than Load B's, which is suggestive but well short of the usual 95% standard.
There is no magic number of samples that give significant results; it depends on how fine you want to cut the hairs and how much variability obscures your vision. Probably more than you wanted to know.
View attachment 1025183
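For anyone not working in Excel, here is a rough equivalent of that F.TEST calculation; the per-shot distances below are placeholders, not the values in the attachment.

```python
import numpy as np
from scipy import stats

# distance of each shot from its group center, inches (placeholder values)
load_a = np.array([0.21, 0.35, 0.18, 0.42, 0.27])
load_b = np.array([0.55, 0.31, 0.62, 0.48, 0.29])

var_a, var_b = load_a.var(ddof=1), load_b.var(ddof=1)
f_stat = max(var_a, var_b) / min(var_a, var_b)    # larger variance on top
df = len(load_a) - 1                              # both loads have 5 shots here
p = 2 * stats.f.sf(f_stat, df, df)                # two-tailed, like Excel's F.TEST
print(f"F = {f_stat:.2f}, two-tailed p = {p:.2f}")
```

Excel's F.TEST returns this same two-tailed probability, so the number it reports can be read the same way as the p-value printed here.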
You can do quite a bit with Excel and customize it to suit your objectives; I'm partway down that path now. For work I have used the Minitab stats package for years, so I use it the most.
Charlie, I have been looking in several references to learn when the F-test is more appropriate in data analysis than the t-test. Can you provide some guidance on this question?
The t-test is for comparing the means of two samples (or one sample against a standard). The F-test is for comparing the *variance* of two (or more) samples.
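A small sketch of the first half of that distinction, a Welch's t-test comparing the mean velocities of two hypothetical ten-shot chrono strings (the variance comparison with the F-test is sketched a few posts up):

```python
from scipy import stats

# two hypothetical ten-shot chrono strings, fps
load_a = [2803, 2811, 2798, 2807, 2815, 2800, 2809, 2805, 2812, 2799]
load_b = [2821, 2809, 2834, 2815, 2802, 2828, 2811, 2838, 2806, 2825]

# Welch's t-test: does the average velocity differ between the two loads?
t_stat, p_value = stats.ttest_ind(load_a, load_b, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}   # small p suggests the means differ")
```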
