Monday, July 11, 2005

Controlled Explosions

Type inference is really a simple problem, with a simple solution. Just consider each allocation site separately, apply the cartesian product algorithm and you're done. You will have a simple implementation and almost perfect precision. Unfortunately, there are 'some' scaling issues of course.. Let's compare it to chess. Chess is also a simple problem, with a simple solution. Just calculate all possibilities over the next 100 moves and Kasparov will look like a beginner..

However, while there exist nice AI techniques to make simple computers play chess very well, as far as I know, the scalability problem of type inference methods has not yet been convincingly solved in the scientific literature. It might be just a matter of time, with promising techniques such as CPA and IFA. The chess problem, of course, has been researched over a much longer period of time, and by many more people, so we can be hopeful.

However, combining CPA and IFA will not be as straight-forward as I would have thought (ain't it ever.) By initially using a single contour for non-simple types, CPA doesn't blow up for these types, but because of the single contour many simple types will now become polluted when they interact with such 'overloaded' contours, again blowing up CPA. (Consider getting an int from an int list, then using it as a argument.) A solution might be to limit the size of cartesian products during the first iterations, or perhaps even to use 0CFA..

In any case, I guess my priority right now is to get everything working precisely and (hopefully) correctly. It has to _work_ first for larger programs, before I can fine-tune stuff and incorporate butt-ugly heuristics, as well as the neighbour's dog for improving scalability. I'm trying not to think about other scalability problems presented by larger programs, caused by inheritance and the proportionally increased usage of dynamic features (e.g. reflection).

In other words, I don't think I will be able to bootstrap the compiler any time soon (picture a snake with his tail in his mouth.. do snakes have tails??). I actually thought it would be possible to do during my Master's project. Then again, maybe I would have chosen something UML'ish if I had known the difficulties. IMO, it would be great, however, to actually _have_ an open source engine that is precise and works for not-too-large Python programs (say up to around a few thousand lines), that are programmed in a C++-like style.

No comments: