...Is it the shape of the pressure curve, efficiency of case fill ratio, ignition characteristics, burn rate retardants, specific impulse of the powder etc...
Yes. Reproducing the muzzle velocity of a load that shoots well with one powder by switching to a different powder may not work well at all. Why? For one thing, the acceleration of the bullet with the new powder, and therefore its barrel occupancy time, will almost certainly not be exactly the same. Thus the load with the new powder will likely require a complete tuning process, even if the two powders seem close [on paper] in terms of burn rate.
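To put a rough number on the barrel-time point, here is a toy kinematic sketch in Python. Purely for illustration, it assumes the bullet's acceleration falls off exponentially with distance down the bore, and scales that acceleration so two hypothetical pressure curves give the same muzzle velocity. The decay lengths, the 24 in barrel, and the 2800 ft/s velocity are made-up inputs, not powder data. Friction, engraving force, and barrel vibration are ignored.

```python
import math


def barrel_time(muzzle_velocity, barrel_length, decay_length, steps=20000):
    """Toy estimate of bullet travel time down the bore, in seconds.

    Assumes acceleration along the bore falls off as exp(-x / decay_length)
    and is scaled so the bullet exits at exactly muzzle_velocity. A short
    decay_length mimics a front-loaded pressure curve; a long one, a flatter
    curve. Units: metres and metres per second.
    """
    lam = decay_length
    # Work-energy: v(x)**2 is proportional to the integral of the acceleration
    # from the breech to x, i.e. to 1 - exp(-x / lam). expm1 keeps this
    # accurate when lam is much longer than the barrel.
    at_muzzle = -math.expm1(-barrel_length / lam)

    def velocity(x):
        return muzzle_velocity * math.sqrt(-math.expm1(-x / lam) / at_muzzle)

    # Barrel time is the integral of dx / v(x). Substituting x = u**2
    # (dx = 2u du) removes the 1/sqrt(x) singularity at the breech; the
    # midpoint rule then never evaluates u = 0 itself.
    u_max = math.sqrt(barrel_length)
    du = u_max / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * du
        total += 2.0 * u / velocity(u * u) * du
    return total


barrel = 0.6096    # 24 in barrel, in metres (hypothetical)
v_muzzle = 853.44  # 2800 ft/s, in m/s (hypothetical)
fast = barrel_time(v_muzzle, barrel, decay_length=0.10)  # front-loaded curve
slow = barrel_time(v_muzzle, barrel, decay_length=0.60)  # flatter curve
print(f"front-loaded: {fast * 1e3:.3f} ms, flatter: {slow * 1e3:.3f} ms")
```

Under these assumptions the front-loaded curve always gives the shorter barrel time. Every result also falls between L/v (all acceleration at the breech) and 2L/v (constant acceleration). Real differences between two powders will be smaller and messier. The direction still holds: the same muzzle velocity does not imply the same barrel time.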
Another way to state basically the same thing is that the pressure curves would need to be very close, if not identical, for two powders to behave exactly the same when tuning a load with otherwise identical components (i.e. same bullet, brass, primer, etc.). The pressure curve reflects not only the burn rate of the powder, but also the relative amount of gas expansion per increment of charge weight. The two are not necessarily the same. As you noted, factors such as case fill ratio, ignition characteristics, and burn rate retardants can directly affect burn rate, and thereby also the rate of gas expansion, i.e. the pressure curve.
As David noted, reloading software such as GRT or QuickLoad can be used to analyze various load parameters in an attempt to come up with some kind of "unifying theory". However, such programs often seem to give better outputs after the fact (i.e. characterizing a given load empirically after all the pertinent data has been collected and entered into the program) than they do as predictors. The inherent problem is that each piece of pertinent information we enter into such programs often consists of a large number of variables lumped into a single data entry, such as powder burn rate. Powder burn rate can be affected by case volume, case shape, bullet weight, the primer used, and temperature, among other things. It is simply not possible to store every combination in such a program, even for a single powder, so the manufacturers typically provide a single measured burn rate and then allow the user to adjust that value to best fit their results. Thus, we are adjusting variables to fit the measured data rather than accurately predicting a given outcome. Fortunately, such programs tend to become better at prediction the more one works with a given load and continually records, modifies, and enters updated information.
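The "fitting rather than predicting" point can be shown with a deliberately simple sketch. The model below (toy_velocity) is hypothetical and is not how GRT or QuickLoad work internally. It only assumes that predicted velocity rises monotonically with one lumped burn-rate factor. Bisection then adjusts that single factor until the model reproduces one measured velocity. The 42 gr charge, 2800 ft/s reading, and v_max value are made up for illustration.

```python
import math


def toy_velocity(burn_rate_factor, charge_grains, v_max=4000.0):
    # Hypothetical stand-in for a ballistics program's internal model:
    # velocity (ft/s) rises with charge and with the lumped burn-rate factor.
    return v_max * (1.0 - math.exp(-burn_rate_factor * charge_grains))


def fit_burn_rate(measured_velocity, charge_grains, lo=1e-6, hi=1.0, tol=1e-10):
    """Bisect on the lumped burn-rate factor until the toy model reproduces
    one measured velocity. Assumes the model increases monotonically with
    the factor and that the answer lies between lo and hi."""
    if not (toy_velocity(lo, charge_grains) <= measured_velocity
            <= toy_velocity(hi, charge_grains)):
        raise ValueError("measured velocity is outside the search bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if toy_velocity(mid, charge_grains) < measured_velocity:
            lo = mid  # model too slow: the factor must be larger
        else:
            hi = mid  # model too fast: the factor must be smaller
    return 0.5 * (lo + hi)


factor = fit_burn_rate(measured_velocity=2800.0, charge_grains=42.0)
print(f"fitted factor: {factor:.6f}")
```

By construction, the fitted factor reproduces the one velocity it was tuned to. Nothing in the fit says whether the same factor would still hold at a different charge weight, temperature, or case volume. That is why the lumped value keeps getting re-tuned as new data come in.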
I find that the whole business of load development is rather well represented by seating depth. If anyone really knows exactly what we're doing when we optimize seating depth, I have yet to hear a full and complete explanation. In fact, I've heard many different explanations, but none are very satisfying. One explanation may cover certain aspects of seating depth optimization well, yet not fit at all with others. At the end of the day, however, it doesn't really matter, because it is simple enough to run a seating depth test and determine the answer empirically. So although I'd really love to know exactly what is happening when we tune a load with seating depth, if only for predictive purposes, I have learned to simply settle for the results of a seating depth test. It may well be that, as time goes on, more definitive answers to these questions become readily available. Until then, we can still do most of what we need through typical [experimental] load development processes.