I will suggest this in the hope that it doesn't cause controversy or give offense to folks.
TLDR: with skinny short range data, study the shot impacts and groups, not the velocity stats.
There is no statistical strength in your SD/ES estimates, at least for the data you have shown. For the time being, focus on your target data. Collect velocity if you like, but postpone making decisions on the velocity stats.
Calculating SD on three shot (or even 5 shot) groups is very unreliable. You are fine just studying the average and considering the ES, but even that ES isn't meaningful.
You can only use a 3 sample ES to reject a recipe, not to accept one.
If there were a hypothetical perfect shooter with a perfect gun, a giant ES that exceeds your goal standards could mean the recipe is a reject. However, there is still the possibility that the charge with the giant ES was really the best one and the loader screwed up the brass prep on a single sample. So small samples of a load are very risky to judge, good or bad.
There is a leap of faith many of us take with the confidence we place in our brass prep. We also use the concept of smooth transitions and the idea that if we are taking fine charge steps it is unlikely that we would get a wild change in one step without seeing it on the neighboring charges. This is still risky.
Your math is correct, but there is nearly zero probability that those values can be extrapolated to estimate the ES of a future sample.
For example, just because I run the calculation correctly doesn't mean those 3 sample SD values represent what you would see with, say, 30 samples.
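To put rough numbers on that, here is a minimal simulation sketch in Python. It assumes velocities are normally distributed around a made-up 2800 fps average with a made-up true SD of 10 fps; neither number comes from your data, and the function names are just ones I picked. It shoots thousands of imaginary strings and shows how far the calculated SD wanders for each string length.

```python
import math
import random

def sample_sd(values):
    """Sample standard deviation (n - 1 in the denominator)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))

def sd_spread(n_shots, true_sd=10.0, trials=5000, seed=1):
    """Return the 5th and 95th percentile of the SD computed from n_shots-shot strings."""
    rng = random.Random(seed)
    # 2800.0 fps and true_sd are assumed values for illustration only.
    sds = sorted(
        sample_sd([rng.gauss(2800.0, true_sd) for _ in range(n_shots)])
        for _ in range(trials)
    )
    return sds[int(0.05 * trials)], sds[int(0.95 * trials)]

if __name__ == "__main__":
    for n in (3, 5, 15, 30):
        low, high = sd_spread(n)
        print(f"{n:2d} shots: 90% of SD estimates land between {low:.1f} and {high:.1f} fps")
```

Under those assumptions, the 3 shot strings report anything from roughly a quarter of the true SD to well over one and a half times it, while the 30 shot strings stay much closer to the real value.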
If you have a small sample with an ES of 30 versus one with an ES of 6, the one with 30 is not guaranteed to be worse than the one with 6. Believe it or not, with small samples there is still a possibility that the one with the ES of 6 can throw something larger than the 30 on the very next try, and vice versa.
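A quick way to see how often small samples rank two loads backwards is to simulate it. The sketch below assumes two hypothetical loads with true SDs of 8 and 12 fps (numbers I made up for illustration), normally distributed velocities, and a helper name of my own choosing. It counts how often the truly worse load shows the smaller ES.

```python
import random

def wrong_ranking_rate(n_shots, sd_good=8.0, sd_bad=12.0, trials=5000, seed=2):
    """Fraction of trials where the truly worse load shows the smaller ES."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(trials):
        # Both loads share an assumed 2800 fps average; only the true SD differs.
        good = [rng.gauss(2800.0, sd_good) for _ in range(n_shots)]
        bad = [rng.gauss(2800.0, sd_bad) for _ in range(n_shots)]
        if max(bad) - min(bad) < max(good) - min(good):
            wrong += 1
    return wrong / trials

if __name__ == "__main__":
    for n in (3, 5, 15, 30):
        rate = wrong_ranking_rate(n)
        print(f"{n:2d} shots per load: worse load looks better {rate:.0%} of the time")
```

With these assumed numbers, the 3 shot comparison picks the wrong load a large share of the time, and the error rate only drops off as the strings get longer.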
Now that I have made a negative comment on the SD of three shot samples, let me add that if you are forced to take risks on small sample sizes, consider using all of the bullet impacts to compare the group performance and then wait to make a stand on the velocity stats.
If you focus on what you think is a good charge weight and take many more samples of it, then the velocity data can begin to form a better average, ES, and finally an SD once you have at least 15 samples, and preferably closer to 30.
If you must take risks on small sample sizes, it is best to stick with known pet load recipes in standardized guns. However, if you don't have that as a choice, I suggest you analyze your target data the way shown in the link below.
By taking the impact position data of every shot in the group, and also in the adjacent powder steps, you build at least some probability that you can find the best charge (if there really is one).
https://www.ar15.com/forums/Armory/...-Accuracy-Node-Detection-Technique/42-524007/
The method requires you to analyze the target data and ignores velocity for the time being. It exploits every shot you take, but that also puts the burden of taking good shots on the shooter (all methods do).
Once you commit to a charge weight, consider collecting all the velocity data on it as well as nudging the charge up and down to investigate how well centered you are in the node.
BTW, there is still a possibility that the rig shoots the whole charge range roughly the same. It is entirely possible the gun produces temporary small groups in one test, and doesn't repeat the next, and then produces an equal group throughout the whole charge weight range when you aggregate all the data.
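Here is a rough sketch of how that can happen by chance alone. It assumes every charge weight has exactly the same dispersion (shots drawn from the same circular normal distribution, in arbitrary units), which is my simplification and not a claim about your rig. It then checks how different the best and worst groups in a ladder look anyway.

```python
import math
import random

def group_size(shots):
    """Extreme spread: largest center-to-center distance between any two shots."""
    return max(
        math.dist(a, b) for i, a in enumerate(shots) for b in shots[i + 1:]
    )

def average_best_to_worst_ratio(n_charges=10, shots_per_group=3, trials=2000, seed=3):
    """Average ratio of largest to smallest group when every charge shoots identically."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Every group comes from the same assumed distribution: no real node exists.
        sizes = [
            group_size([(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
                        for _ in range(shots_per_group)])
            for _ in range(n_charges)
        ]
        total += max(sizes) / min(sizes)
    return total / trials

if __name__ == "__main__":
    for shots in (3, 5, 10):
        ratio = average_best_to_worst_ratio(shots_per_group=shots)
        print(f"{shots:2d} shot groups: worst group averages {ratio:.1f}x the best")
```

In this sketch the 3 shot ladders show a worst group well over twice the size of the best one even though nothing about the load changed, which is exactly the kind of temporary small group that doesn't repeat.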
I know that upsets some folks, but the reality is that many rigs would shoot roughly the same no matter what they are fed, especially when weather is factored in. That isn't necessarily a bad thing, but I know it upsets those who are being unrealistic about dispersion and statistics.
I hope you get the main point, which is to study the target and hold off on using the velocity stats as a pivot point for making decisions. In the end, when you take the testing to distance, if the velocity stats are good but the groups are not, those velocity stats are cold comfort. On the other hand, if you get good groups at max distance, then by definition those velocity stats are good enough. YMMV
Good Luck and have fun.