Thought Aquarium

Monday, July 10, 2006

Beta-reduction

After a 7-month hiatus, I'm back to working on the implementation of Deep. I scrapped much of the earlier code, and replaced it with a simpler design. My focus is getting the type system up and running.

I have been struggling to implement beta-reduction "properly", which turned out to be surprisingly tricky. The earlier version of my interpreter used Bruin indices and a closure-based execution model that was reasonably efficient.

To do partial evaluation and dependent types, however, I need to keep expressions in a symbolic form, with explicit sharing of common subexpressions. After toying with various arcane environment manipulations, I finally turned to the literature to see what the standard approach was, only to find that it's still a topic of ongoing research. I would have expected efficient beta-reduction to be a completely solved problem by now. :-)

In the meantime, I have scrapped the Bruin indices, and gone back to simple rewrites, with a bit of lazy copying to avoid the worst performance bottlenecks.

Tuesday, October 11, 2005

Interpreter update -- call by need

The interpreter now implements call-by-need for record projections.
This turns out to be slightly more subtle than the traditional version, because
self-variables use late-binding. A record should behave like a lazy letrec
up until the point that it is composed with another record, at which point all
thunks have to be "rewound" back to their original declarations.

The following code now has linear rather than exponential complexity:

{ add(x: Int, y: Int) = x + y;
fibonacci = [1, 1 : zipWith(add, fibonacci, fibonacci.tail)];
}

Wednesday, October 05, 2005

Intpreter update -- tail-recursion

I changed the reduce() function so that it only makes one pass over the AST. This speeds things up considerably, and fixes the speed issue with non-tail-recursive functions. Now the interpreter crashes with a stack overflow instead. :-)

Unfortunately, it was also crashing with a stack overflow on tail-recursive functions. I discovered that the problem is that Scala itself does not optimize tail-recursive calls. Grrr... I replaced tail-recursive calls to reduce() in the scala code with an imperative loop (Grr...), and now I get optimized tail-recursion in Sym for free. :-)

Interpreter version 0.1

I now have a Hugs-like interpreter that implements the *untyped* operational semantics of Sym. It currently supports:
  • Functions (with currying) and records
  • A set of built-in instructions for scalar data types -- add_i, add_f, add_d, eq_i, etc.
  • A standard prelude (written in Sym) which implements a nice set of standard types:
    Bool, Char, Int, Float, Double, String, and List,
    along with some standard functions over lists -- length, take, drop, map, foldl, ++, etc.
  • A parser with some syntax sugar.
  • The ability to load and reload files, show desugared terms, along with a (very primitive) debugger.
Lists are built into the syntax, but not into the interpreter. The parser translates list syntax into calls to the standard prelude. The same is true for "built-in" data types like Bool and Int. These types are defined as ordinary classes in the prelude, with implementations that call the built-in instructions.

I have made no attempt to optimize performance. Performance is acceptable for tail-recursive functions, but it quickly chokes when trying to rewrite large expressions -- e.g. foldr(add, 0, [1..1000]). Functions can be declared as either call-by-value or call-by-name; call-by-need doesn't work yet.

Friday, September 09, 2005

Call By Need

My recent experience with combinators underscores my belief that a declarative language must use lazy (i.e. call-by-need) evaluation.

The java approach is to create all data structures with long painful sequences of set statements: e.g.
WidgetHolder wh = new WidgetHolder;
wh.addWidget(w1);
wh.addWidget(w2);
w1.setButtScratcher(wh);
.... etc. ...

A domain-specific langauge, such as parser combinators, does away with this tedium, but only if we are allowed to pass the name of a widget as an argument to itself. e.g.
wh = WidgetHolder([w1, w2 &* { buttScratcher = wh; }]);

Some compiler guarantee must be in place to ensure that such calls are indeed circular. Haskell is the model to follow here.

Combinator Madness

Work on the parser implementation in Scala continues.

My first task was to bring my parser combinators into the real world. I ditched the simple but leaky back-tracking implementation given by Odersky, and rewrote them to generate LL(k) grammars, where k is abitrary but limited lookahead. I also added a couple scanner primitives that are optimized to operate over strings instead of characters, thus making things somewhat faster.

I was still unhappy with the monadic nature of the combinators. Given a sequence of two combinators,
P[A] and A -> P[B], the second one must be created dynamically after P[A] has done its parsing. That's a lot of dynamic allocation. Since scala is imperative, and compiles to Java bytecode, I don't have much faith that monads are implemented efficiently by the compiler.

So I added some new arrow-like combinators based on Swierstra's stuff, which combines P[A] and P[A -> B], thus pre-allocating both combinators. This has the cost of requiring the grammar to be context-free, which it already is. :-) The syntax does have a lot of parenthesis, because Scala does not let me declare right-associative operators, which I need to left-factor my LL grammar. :-( Nevertheless, the new combinators work great, and I built a scanner which handles all of the java-style literals out of it. Then I discovered a flaw in my reasoning -- scala does not use lazy evaluation, so it is not possible to declaratively create recursive combinators. Aaargh!

I could try to sort this out, but I have spent two days on combinators so far, and I don't want to spend more time. I've shelved the combinators for now, and ressurected by ANTLR grammar. I dissassembled a couple scala classes, and discovered that if I am very careful about how I write methods, I can write things that should be callable from Java. So the Java parser will call a minimal interface, written in Scala, to construct AST nodes. Of course, this may break horribly if Odersky changes the scala compiler, but I doubt he will do so before I rewrite the whole thing in Sym.