I don't think there was anything wrong with the readout (blast pressure) Courtney used in his tests. The issue lay in his choice of a balance that couldn't accurately resolve the minute differences in primer weight the test required. In fact, it can certainly be argued that using blast pressure as a readout would minimize the negative effects of other reloading variables (e.g. charge weight variance, case volume variance, neck tension, etc.) that all contribute to muzzle velocity variance, which was the main reason he adopted that approach.
One of the major issues with using total primer weight to sort primers containing different amounts of priming compound is that the priming compound itself represents a relatively small percentage of the total weight (approximately one tenth of the total weight according to Dave's data in the original post in this thread). As I mentioned previously, the total weight range for 100 Fed 205 primers I weighed on a balance capable of ~0.0002 g accuracy was only 0.0069 g. That is not a very large spread considering that any variance in the weight of the cups and the anvils, which constitute approximately 90% of the total primer weight, is included. In other posts on this topic, people have suggested that the weights of the cups and anvils are very uniform, which may be true. However, I can't imagine the variance in their weights is zero. When the overall range of total primer weights is only 6.9 mg (100 primers measured), which is approximately 3% of the average total weight, any contribution of weight variance by the cups and the anvils, however small, becomes an important consideration. It's all about limiting sources of error.
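To put rough numbers on that argument, here is a minimal Python sketch using only the figures quoted above (6.9 mg range, ~3% of average total weight, compound ~1/10 of total weight). The average primer weight is back-calculated from those figures rather than measured, and the function name spread_breakdown is just something made up for illustration.

```python
def spread_breakdown(total_range_g, range_fraction, compound_fraction):
    """Relate the observed weight range to the compound and to the metal parts.

    All inputs are the approximate figures quoted in the post; the average
    primer weight is implied by them, not an independent measurement.
    """
    avg_total_g = total_range_g / range_fraction   # implied average primer weight
    compound_g = avg_total_g * compound_fraction   # implied priming compound weight
    metal_g = avg_total_g - compound_g             # implied cup + anvil weight
    return {
        "avg_total_g": avg_total_g,
        "compound_g": compound_g,
        "metal_g": metal_g,
        # Observed range as a fraction of each component's implied weight
        "range_vs_compound": total_range_g / compound_g,
        "range_vs_metal": total_range_g / metal_g,
    }


result = spread_breakdown(total_range_g=0.0069,
                          range_fraction=0.03,
                          compound_fraction=0.10)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

With these inputs the 6.9 mg range works out to about 30% of the implied compound weight, but only about 3.3% of the implied cup and anvil weight. So a few percent of variation in the metal parts could, in principle, account for the entire spread, which is why total weight alone can't tell us which component is varying.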
Likewise, the notion of using velocity as a readout for such a test may not seem like the best approach. However, regardless of how such a primer compound weight test is carried out, ultimately, loaded rounds will be fired on the target as a means of determining whether there is a significant effect of sorting. With a well-tuned load, we can typically reduce the ES/SD values for a 10-shot group to roughly 10 fps/5 fps or less. Further, with ES/SD values in this range, the accuracy of a good chronograph has probably not yet become the limiting factor. So the use of velocity as a readout should still be within the limits of experimental error.
Put another way, if the limit of detection for average velocity is in the range of perhaps 2 to 4 fps with a good chronograph and a sufficiently large sample set, let's say 10 shots minimum, average velocity would likely not be the limiting source of experimental error. As we know, unless a shooter is willing to generate a much larger data set (e.g. 50 to 100 shots, or more), differences in average muzzle velocity in the range of 2 to 4 fps are not sufficiently large to draw valid conclusions about shot behavior on the target. We really need minimum velocity changes of 10 to 20 fps (or more) in order to conclude that velocity differences were responsible for creating a particular shot pattern (e.g. high/low) on the target, even at sufficiently long range. Again, this is all about limiting sources of error, because unless the velocity differences in loaded rounds are at least 10 to 20 fps, it cannot be stated with certainty whether any shot dispersion observed was caused by velocity changes. In other words, until the estimated shot dispersion solely due to velocity becomes larger than the best actual shot dispersion produced by a given setup, velocity will not be the limiting factor in such a test.
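As a rough sanity check on those detection limits, here is a small Python sketch of how sample size affects the smallest difference in average velocity that two groups of shots could resolve. It is only a textbook approximation under assumptions I'm adding: both groups have the same SD (5 fps, from the figure above), velocities are roughly normally distributed, a ~95% confidence level (z = 1.96) is used, and chronograph error is ignored. The function name is made up for illustration.

```python
import math


def min_detectable_difference(sd_fps, shots_per_group, z=1.96):
    """Approximate smallest resolvable difference between two group means.

    Assumes equal SD and equal shot count in both groups and a normal
    approximation (z instead of Student's t), so it is optimistic for
    small samples.
    """
    se_mean = sd_fps / math.sqrt(shots_per_group)  # standard error of one mean
    se_diff = math.sqrt(2) * se_mean               # standard error of the difference
    return z * se_diff


for n in (10, 50, 100):
    diff = min_detectable_difference(sd_fps=5.0, shots_per_group=n)
    print(f"{n:>3} shots per group: ~{diff:.1f} fps")
```

With an SD of 5 fps this comes out to roughly 4.4 fps for 10 shots per group, about 2.0 fps for 50, and about 1.4 fps for 100. That is in the same ballpark as the 2 to 4 fps figure, and it shows why resolving differences that small takes a much larger data set.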
I think that with careful attention to brass prep (e.g. case volume, neck tension, etc.) and charge weight measurement, it should be possible to use average velocity as a readout for such a primer test. After all, we're going to shoot loaded rounds anyhow. If the heaviest/lightest primers in a sample set are not capable of producing velocity changes greater than 2 to 4 fps, we could never reliably conclude that differences observed on the target were due to velocity changes. If small changes in the amount of priming compound can produce changes in average velocity greater than the limit of a good chronograph's accuracy, then we could reliably conclude that the behavior we saw on the target was, in fact, due to a difference in the amount of priming compound and the resultant effect on velocity. As I stated above, this is certainly not the only way to carry out such a test. But for most of us, it would be the simplest way, requiring only great care in load preparation and not any specialized equipment.