Let me step into the statistical debate here:
On a normally distributed bell curve, the standard deviation breaks the population down approximately as follows:
~68% of the population within plus or minus 1 standard deviation of the mean (average) value.
~95% of the population within plus or minus 2 standard deviations of the mean
~99.7% of the population within plus or minus 3 standard deviations of the mean
Put on the IQ scale (mean 100, standard deviation 15), that works out to roughly:
34% between 100 and 115
13.5% between 116 and 130
2.35% between 131 and 145
0.15% 146+
The other 50% between 0 (actually 10, which is 6 sigma below the mean, but that is a different discussion) and 99
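For anyone who wants to check the percentages, here is a minimal sketch using Python's standard library. It assumes a perfectly normal distribution with mean 100 and standard deviation 15, and it treats scores as continuous rather than whole numbers; the function names are just illustrative. The exact values come out slightly different from the rounded figures above (for example, about 2.1% rather than 2.35% between 2 and 3 standard deviations), because those were derived from the rounded 68/95/99.7 rule.

```python
from statistics import NormalDist

# Assumption: IQ-style scale, mean 100 and standard deviation 15.
MEAN, SD = 100.0, 15.0
dist = NormalDist(mu=MEAN, sigma=SD)


def coverage(k):
    """Fraction of the population within +/- k standard deviations of the mean."""
    return dist.cdf(MEAN + k * SD) - dist.cdf(MEAN - k * SD)


def band(lo_sd, hi_sd):
    """Fraction between lo_sd and hi_sd standard deviations above the mean."""
    return dist.cdf(MEAN + hi_sd * SD) - dist.cdf(MEAN + lo_sd * SD)


def upper_tail(k):
    """Fraction more than k standard deviations above the mean."""
    return 1.0 - dist.cdf(MEAN + k * SD)


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"within +/- {k} SD: {coverage(k):.2%}")
    for lo_sd, hi_sd in ((0, 1), (1, 2), (2, 3)):
        lo_score = MEAN + lo_sd * SD
        hi_score = MEAN + hi_sd * SD
        print(f"{lo_score:.0f} to {hi_score:.0f}: {band(lo_sd, hi_sd):.2%}")
    print(f"above {MEAN + 3 * SD:.0f}: {upper_tail(3):.3%}")
```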
So how does that work out for groups?
Each group is just one sample (score). By the 0.15% figure above, a group at the mean size plus 3 standard deviations turns up only about once in 670 groups, so even 100 groups gives you only a modest chance of seeing one, and no number is a guarantee. Random chance plays a role in this too. It might take 200, 400, or 1000 to find that really big group.
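To put rough numbers on "how many groups", here is a small sketch. It assumes group sizes follow a normal distribution and that each group is an independent sample, which is a simplification (real group sizes are not exactly normal). The helper names are made up for illustration. The exact tail beyond 3 standard deviations is closer to 0.135% than the rounded 0.15%.

```python
import math
from statistics import NormalDist


def tail_probability(k):
    """Chance that one group exceeds the mean by more than k standard deviations
    (assumes a normal model)."""
    return 1.0 - NormalDist().cdf(k)


def chance_of_at_least_one(n, p):
    """Chance of seeing at least one such group among n independent groups."""
    return 1.0 - (1.0 - p) ** n


def groups_needed(p, confidence):
    """Smallest number of groups giving at least `confidence` chance of one hit."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


if __name__ == "__main__":
    p = tail_probability(3)
    for n in (100, 200, 400, 1000):
        print(f"{n:>5} groups: {chance_of_at_least_one(n, p):.1%} chance of at least one")
    print("groups for a 50% chance:", groups_needed(p, 0.50))
    print("groups for a 95% chance:", groups_needed(p, 0.95))
```

Under these assumptions, 100 groups gives well under a one-in-five chance of seeing a mean-plus-3-SD group, and it takes several hundred groups just to reach even odds.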
Hello Keith, let me recast your IQ distribution with a different twist. If the same person sat down and took the IQ test repeatedly, with breaks of course, say over weeks, how many times would that person need to take it to get a reliable indication of their true IQ?
I would posit that one test is all we actually administer, and that two results should be extremely close to each other. If you kept going and a person’s scores encroached on everything from genius and extremely gifted all the way down to clinical retardation, even rarely, then either the IQ test is invalid or the person is mentally unreliable. We can rule out this kind of “unreliability” in an inanimate object like a gun in a vice.
Similarly, it should not be possible for a person with a 10 IQ to score 140. In fact, the only way they could would be if you let them take thousands of tests, filling in bubbles, and then picked the single best result while ignoring the others, in which case testing them only once would actually be more reliable. A bell curve predicts that untenable, illogical results will occur validly, if infrequently, even when there is no possible way they could.
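The point about picking the single best of thousands of attempts can be put in numbers. This is only a sketch under an assumed model: every attempt is an independent draw from the same normal distribution, and the result is expressed in standard deviations above that person's own average. The function name is hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def median_best_of(n):
    """Median of the best result out of n independent normal draws,
    in standard deviations above the average (assumed model)."""
    # P(max of n draws <= x) = cdf(x) ** n, so the median solves cdf(x) = 0.5 ** (1 / n).
    return STANDARD_NORMAL.inv_cdf(0.5 ** (1.0 / n))


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(f"best of {n:>4}: about {median_best_of(n):.2f} SD above the average")
```

Under that model, the best of a thousand attempts typically lands more than 3 standard deviations above the average, which says a lot about the selection and very little about the typical result.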
The IQ test need not be administered a huge number of times, and it need not be unduly long either. (It does not stretch over several days, like the bar exam did.) They have determined that one test of something like one or two hours is reliable. By analogy to shooting, it would be like the shot string we would typically shoot: the first few shots establish what the ammo does, and the remainder vet whether a hot barrel is changing the point of impact (POI).