IMO the most important thing in load development is a good foundation. How many shooters try to get the best out of their rifles without making sure that the bedding (for bolt actions) is optimal, or without even knowing how to determine whether it is correct? How many do their testing without anything between rifle and target to show what the wind is doing?
Typically they waste huge amounts of resources because they will not load at the range, and they evidently go on to finish groups whose first two shots are already too far apart to be fixed by additional ones. If I want to try a new powder in my PPC, I can have the load finished in an hour, assuming that I am at the range with all my equipment set up. You need to fix all of the things that stand between you and consistent results.
I respect that this method can yield good results. I worked in manufacturing companies for 50 years, and depending on the company, the location, the management, etc., this was the methodology used in the R&D lab; i.e., go turn the knobs, test, and declare victory when the desired answer was achieved.
Invariably, better and more stable results were achieved when a statistically based experiment was conducted to cover the experimental space at one time and in an efficient manner. This does not mean a huge number of tests per experimental run, which is what the Litz/Hornady proposition questions (i.e., the noise component); it means using wider ranges to improve the signal:noise ratio.
But this type of noise is exactly the nemesis when you turn a knob and attempt to judge whether there is a difference by simply comparing A to B.
In addition, this does not mean simply changing one variable at a time. In fact, statistical experiments are efficient because multiple variables are varied simultaneously, in the proper systematic manner, so that more valid results are obtained more efficiently. You also learn how the factors interact, such as whether the optimum seating depth depends on the powder charge.
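To make the "vary several things at once" idea concrete, here is a minimal sketch of a two-factor, two-level full factorial in Python. It is only an illustration, not anyone's published method: the factor names, the coded -1/+1 levels, and especially the dispersion numbers are made up. The point is that four runs give an estimate of each main effect and of the charge-by-depth interaction, which is the number that tells you whether the best seating depth depends on the charge.

```python
from itertools import product

# Hypothetical factors, coded -1 (low) and +1 (high).
# Real charge weights and seating depths are deliberately left out.
levels = {"charge": (-1, +1), "depth": (-1, +1)}

def full_factorial(levels):
    """Return every combination of the coded factor levels as a list of dicts."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*(levels[n] for n in names))]

def effects(runs, responses):
    """Estimate main effects and the two-factor interaction from a 2x2 design.

    Each effect is the mean response where the sign column is +1
    minus the mean response where it is -1.
    """
    def contrast(signs):
        hi = [y for s, y in zip(signs, responses) if s > 0]
        lo = [y for s, y in zip(signs, responses) if s < 0]
        return sum(hi) / len(hi) - sum(lo) / len(lo)

    charge = [r["charge"] for r in runs]
    depth = [r["depth"] for r in runs]
    interaction = [c * d for c, d in zip(charge, depth)]
    return {
        "charge": contrast(charge),
        "depth": contrast(depth),
        "charge_x_depth": contrast(interaction),
    }

runs = full_factorial(levels)
# Made-up dispersion values, one per run, in run order. Illustration only.
made_up_dispersion = [0.50, 0.30, 0.40, 0.60]
result = effects(runs, made_up_dispersion)

for run, y in zip(runs, made_up_dispersion):
    print(run, y)
print(result)
```

With these invented numbers the depth main effect comes out at zero while the interaction does not, which is the kind of thing a one-variable-at-a-time test would miss: depth looks like it does nothing on average, yet the better depth flips depending on the charge.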
This is a long attempt to say that the Litz/Hornady proposition is correct if you are simply trying to compare Load A to Load B, e.g., making a change at the range and shooting a few shots: a lot of samples may be necessary to overcome the noise. BUT if the ranges for the experimental conditions (i.e., charge, depth, etc.) are wide enough, this will drive up the signal:noise ratio and provide a wider depth of knowledge due to a better characterization of the trends.
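The two halves of that argument can be put into rough numbers with textbook formulas. The sketch below is my own illustration under simple assumptions (normally distributed shot-to-shot noise with a constant standard deviation `sigma`, and a roughly straight-line trend across the tested range); it is not the analysis Litz or Hornady used, and the function names and example charge offsets are hypothetical. The first function approximates how many shots per load an A-vs-B comparison needs. The second gives the standard error of a fitted slope, which shrinks as the tested levels are spread wider.

```python
from math import ceil, sqrt
from statistics import NormalDist, mean

def shots_per_load_ab(delta, sigma, alpha=0.05, power=0.80):
    """Approximate shots per load needed to detect a mean difference `delta`
    between two loads with shot-to-shot standard deviation `sigma`
    (two-sided test, normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

def slope_standard_error(x_levels, sigma):
    """Standard error of a least-squares slope with one shot per level."""
    x_bar = mean(x_levels)
    return sigma / sqrt(sum((x - x_bar) ** 2 for x in x_levels))

# A vs B: a difference of half the noise needs far more shots
# than a difference of twice the noise.
small_difference = shots_per_load_ab(delta=0.5, sigma=1.0)
large_difference = shots_per_load_ab(delta=2.0, sigma=1.0)

# Same five shots, narrow vs wide spread of (hypothetical) charge offsets.
narrow = [0.0, 0.1, 0.2, 0.3, 0.4]
wide = [0.0, 0.3, 0.6, 0.9, 1.2]
se_narrow = slope_standard_error(narrow, sigma=1.0)
se_wide = slope_standard_error(wide, sigma=1.0)

print("shots per load, small difference:", small_difference)
print("shots per load, large difference:", large_difference)
print("slope SE narrow / wide:", se_narrow / se_wide)
```

Tripling the spread of the levels cuts the slope's standard error to a third with the same number of shots, which is the signal:noise gain from wider ranges. A real ladder is not a straight line, so take this as the direction of the effect rather than a prediction.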
A statistical approach does not appeal to many, and a reasonable degree of experience is necessary not to get bogged down in the methodology. But the proposition that the necessary sample sizes are too large to properly develop a load simply ignores other options which are not hindered in this manner. For example, the ladder results shown in the past by Tom Mousel, i.e., an Audette ladder to identify a node, are a powerful means of gaining signal:noise efficiently, with demonstrable, reproducible results.