A problem with using the SD function on the chrono (or any calculator, for that matter) is that SD only gives useful information if (a) the sample size is reasonable and (b) the readings fit, or at any rate approximate, a 'bell curve' when the number of readings at each value is plotted as a histogram.
Most of us only take a modest number of muzzle velocity (MV) readings for a load combination, and all too often the spread of those readings bears little resemblance to a bell curve. It often makes more sense to keep an eye on the individual values and the pattern. In my experience, with a small sample size most reasonably closely matched strings simply produce an SD that is about half of the extreme spread (ES), e.g. ES = 20; SD = 8-11, but more often than not exactly 10.
The bell-curve problem, or rather the absence of a bell curve, shows up with a string that reads something like:
3,000
2,995
2,998
3,045
3,004
2,997
3,010
3,001
2,998
3,000
Using Amlevin's calculator, our 10 readings give an arithmetic mean of 3,005 fps and an SD of 14 fps, both to the nearest whole number. Simple arithmetic says the ES is 50 fps (3,045 - 2,995).
The problem is one value way outside the bell curve, sitting on its own at 3,045 fps. Redo the calculation for the other 9 values and you get an arithmetic mean that hardly changes at 3,000 fps, an ES of 15 and an SD of 4.2, a result that most people would be very happy with.
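For anyone who wants to check the sums without a chronograph or calculator to hand, here is a minimal Python sketch of the same arithmetic. The function and variable names are my own, and I don't know which SD formula Amlevin's calculator actually uses, so the sketch prints both the divide-by-n ("population") and divide-by-(n-1) ("sample") versions.

```python
from statistics import mean, pstdev, stdev

# Muzzle velocities (fps) from the string quoted above.
velocities = [3000, 2995, 2998, 3045, 3004, 2997, 3010, 3001, 2998, 3000]


def summarise(readings):
    """Return count, mean, extreme spread and both flavours of SD."""
    return {
        "n": len(readings),
        "mean": mean(readings),
        "es": max(readings) - min(readings),
        "sd_population": pstdev(readings),  # divides by n
        "sd_sample": stdev(readings),       # divides by n - 1
    }


all_ten = summarise(velocities)
# Drop the single aberrant reading at 3,045 fps and redo the sums.
without_flyer = summarise([v for v in velocities if v != 3045])

for label, s in (("all 10", all_ten), ("9 without 3,045", without_flyer)):
    print(
        f"{label}: mean={s['mean']:.1f} ES={s['es']} "
        f"SD(n)={s['sd_population']:.1f} SD(n-1)={s['sd_sample']:.1f}"
    )
```

On these readings the divide-by-n SD comes out at about 14.0 for all ten shots and 4.2 for the nine, against roughly 14.7 and 4.4 for the divide-by-(n-1) version. The mean of the nine works out at about 3,000 fps. With strings this short the choice of formula shifts the answer noticeably, which is worth bearing in mind when comparing figures from different chronographs or calculators.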
So, the issue is one shot out of 10, and why was it aberrant? Is it a chronograph problem, or a change in ambient light on the range? Is it a bum case? Is it a scales consistency problem? Is it an operator problem, e.g. being distracted when weighing charges? And so on. (If you're watching individual values as they appear, it can be useful to keep the case from an aberrant shot to one side, see if it does the same thing when reloaded, and scrap it if so.)
More to the point:
(a) did the 10 shots group well?
(b) if the exercise is repeated, do you get similar results?
A problem I've run into repeatedly in this sort of exercise is that results are often not repeatable. A string that gives small ES and SD values one week can give very different and poorer ones the following week using the same components etc. The problems are small sample sizes and the number of variables in small-arms ammunition and handloading practices.
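To get a feel for how much of that week-to-week variation could be down to sample size alone, here is a rough simulation sketch in Python. It is an idealised case resting on my own assumptions, not measured data: the load is taken to have a "true" mean of 3,000 fps and a "true" SD of 10 fps, every shot is drawn from a perfect bell curve, and the SD is the divide-by-n form. All names in it are hypothetical.

```python
import random
from statistics import pstdev

random.seed(1)  # fixed seed so the sketch gives the same numbers each run

TRUE_MEAN = 3000.0  # assumed "true" average MV in fps (hypothetical)
TRUE_SD = 10.0      # assumed "true" SD of the load in fps (hypothetical)


def string_sd(shots):
    """SD (divide-by-n) of one simulated string of normally distributed MVs."""
    readings = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(shots)]
    return pstdev(readings)


def sd_range(shots, strings=1000):
    """Smallest and largest SD seen across many simulated strings."""
    sds = [string_sd(shots) for _ in range(strings)]
    return min(sds), max(sds)


for shots in (5, 10, 30, 100):
    low, high = sd_range(shots)
    print(f"{shots:>3} shots: SD ranged from {low:.1f} to {high:.1f} fps")
```

Even with an identical, perfectly behaved load, the short strings should show a far wider range of calculated SDs than the long ones. Real ammunition adds case, charge, light and operator variables on top of that.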
What to do about it? Don't get too hung up on ES and SD; look at the patterns, not just the calculated summary results. If the load groups well at the desired range or in a ladder test, that's more important than wearing the barrel out trying to screw the SD down to tiny values. A friend who holds the UK BR Assoc 1,000 yard group record always tells me to look at the group and not worry about spreads and SDs - his record-winning load never produced particularly good figures in either.
What might be interesting in all this would be for a forum member who is a statistician or an engineer who uses this sort of data professionally to say what would be regarded as a good sample size in this sort of exercise. I suspect strongly that we'd be wearing barrels out prematurely trying to get reasonably meaningful results here.