I think it's important not to lose track of the specific goal of load development, which is to find a load that shoots both small and consistently. It is fair to say that people have used a variety of methods to achieve that goal successfully over the years. One thing I think is often overlooked during statistical analysis of group size, velocity ES/SD, etc., is that pretty much everything we do is aimed at minimizing those numbers. In other words, there is inherent bias at every step of the process, because we are trying to minimize group size and/or velocity ES/SD with every single action we take in the reloading process. As a result, the outputs of group size and velocity ES/SD are not exactly "normal" distributions to begin with.
So what does that inherent bias mean with regard to the patterns we observe? I'm sure most everyone has encountered arguments on shooting forums regarding the minimum number of shots necessary for a group to have statistical significance, either in terms of size or the associated velocity data. My take on that is a bit different from the classic statistical interpretation, and I'll use velocity as an example to explain why I think about it this way.
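For what it's worth, the sample-size side of that argument is easy to demonstrate with a quick simulation sketch. All the numbers below are assumptions chosen for illustration (a load whose true velocity SD is 8 fps at a 2800 fps mean); the point is just how widely the SD estimate swings when strings are short:

```python
import random
import statistics

# Illustration only: assumed "true" load behavior, not measured data.
random.seed(7)
TRUE_SD = 8.0     # assumed true shot-to-shot velocity SD (fps)
MEAN_V = 2800.0   # assumed mean muzzle velocity (fps)

def sd_of_string(n):
    """Fire a simulated n-shot string and return its sample SD."""
    return statistics.stdev(random.gauss(MEAN_V, TRUE_SD) for _ in range(n))

spread = {}
for n in (3, 5, 10, 30):
    estimates = [sd_of_string(n) for _ in range(2000)]
    qs = statistics.quantiles(estimates, n=20)  # cuts at the 5th..95th percentiles
    spread[n] = (qs[0], qs[-1])
    print(f"{n:2d}-shot strings: 90% of SD estimates fall between "
          f"{qs[0]:.1f} and {qs[-1]:.1f} fps (true SD = {TRUE_SD} fps)")
```

Running this shows the familiar pattern: a 3-shot SD can land almost anywhere, while the 30-shot estimates cluster much more tightly around the true value.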
If one were to fire a large number of shots and plot the distribution of the data, it would commonly be done with velocity on the x-axis (i.e. the score) and the number of shots at a given velocity, or within a given velocity range, on the y-axis (i.e. the frequency). This type of plot is commonly referred to as a "Gaussian" distribution, or even more commonly as a "bell curve". Picture two bell curves with the same axis labels/intervals: one is fairly wide and gently rounded at the mean, as shown in the attached images (below); the other is much narrower and steeper. It may seem as though I'm taking advantage of semantics here because, assuming they are normal distributions, one could make both bell curves appear the same simply by adjusting the interval on one axis or the other. However, it is not semantics, for the following reason: the x-axis of both graphs is velocity measured in fps, which implies certain things in terms of limiting sources of error. The best commercially available chronographs are limited in terms of accuracy/precision to perhaps 2-3 fps under absolutely ideal conditions. Thus, the fact that many of the things a reloader does at the bench are aimed at minimizing velocity variance may effectively cause the distribution to become narrower, perhaps even to the point at which the width at a given SD is almost meaningless because of the limited accuracy/precision of the chronographs we typically use. A very tall, narrow distribution means you don't need as many shots to determine whether your velocity falls within a reasonable SD of the mean, because the units (fps) used for the x-axis are fixed in relation to the accuracy limitation of the typical chronograph.
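To make that noise-floor point concrete, here's a quick simulation sketch. The numbers are assumptions for illustration only: a load with a true 3 fps SD read through a chronograph with a 2.5 fps read error. Since the two error sources are independent, they add in quadrature, so the SD you record on the screen is noticeably larger than the load's true SD:

```python
import random
import statistics

# Illustration only: assumed numbers, not measured data.
random.seed(1)
TRUE_SD = 3.0     # assumed true shot-to-shot velocity SD of the load (fps)
CHRONO_SD = 2.5   # assumed chronograph read-error SD (fps), per the 2-3 fps figure
MEAN_V = 2800.0   # assumed mean muzzle velocity (fps)

def observed_string(n_shots):
    """Simulate n_shots chronograph readings: true velocity plus read error."""
    return [random.gauss(MEAN_V, TRUE_SD) + random.gauss(0.0, CHRONO_SD)
            for _ in range(n_shots)]

obs_sd = statistics.stdev(observed_string(10_000))

# Independent errors add in quadrature:
predicted = (TRUE_SD**2 + CHRONO_SD**2) ** 0.5
print(f"true load SD:   {TRUE_SD:.1f} fps")
print(f"recorded SD:    {obs_sd:.1f} fps (quadrature prediction: {predicted:.2f} fps)")
```

With these assumed numbers the recorded SD comes out near 3.9 fps even though the load itself is only a 3 fps load, which is exactly the sense in which chronograph precision can swamp small differences between loads.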
My point is simply this: stats can be your friend, so record the data and use it where appropriate. But don't start thinking that the stats themselves are the be-all and end-all of load development. They aren't. Remember that most reloaders are trying to do everything they can to make their groups as small as possible and/or minimize velocity ES/SD. That bias can have an impact on how many shots we think are necessary, and on what we can really conclude when we subsequently compare group size/velocity data. Along that line of thinking, some people who have posted here obviously have an issue with ladder tests due to a sample size of n=1 for each shot. That's fine, to each their own. But that doesn't mean shooters have never successfully used a ladder test to develop a winning load...because they have. Just as other methods have been successfully employed. Find the method that works best in your hands, understand what that method is actually doing, and have at it. Getting way down deep in the weeds is not always a necessary or desirable part of load development.