The lost language extensions of MetaWare's High C Compiler

September 24, 2023

Book cover: FM TOWNS High C Compiler v1.7 User's Manual

This book I got in a pile of FM TOWNS books turns out to be a lot more interesting than I was expecting an '80s C compiler manual to be. For as long as C and its relatives have been in mainstream use, it has been necessary to use vendor language extensions to actually get anything done with them, though in today's GCC/Clang/MSVC oligopoly those extensions tend to be focused on the yak-shaving details of dealing with the underlying platform. Things were much more interesting in the '80s, when many more, smaller companies were competing for adoption.

Phar Lap wrote one of the first DOS extenders that allowed programs to take full advantage of the 32-bit 80386 processor from the 16-bit-bound MS-DOS environment, and they hired MetaWare to port the High C Compiler to Phar Lap's DOS extender SDK. Fujitsu in turn chose Phar Lap's DOS extender to integrate into the OS for their 80386-based FM TOWNS platform, and High C became the first-party C compiler for the platform. The FM TOWNS came out in 1989, just barely in time for the first ANSI C standard, C89, to be ratified.

High C has its share of DOS-specific extensions, but it also contains a lot of interesting user-oriented language extensions I haven't seen in other C compilers I've used, ranging from small quality-of-life improvements to fairly advanced features you wouldn't think would be possible in C, let alone a late-'80s dialect of C! Some of these things would take literal decades to make it into an official standard of C or C++, and some of them still don't have equivalents in either language today. Here are some of the extensions I found interesting:

Underscores in numeric literals

manual page explaining that _ can be placed in numeric literals, like 1_000_000 for 1000000

It's a little thing, but it always bothers me when a programming language doesn't let you write long numeric literals with separators to make them readable. Many other languages have had this since C, but C++ didn't get anything like this till C++14, using the single quote as a separator like 1'000'000 instead of underscore, and C only followed suit earlier this year with C23.

Labeled arguments

manual page showing the use of labeled arguments. after declaring void P(int A, float B, Color C, Color D);, you can call it with named arguments as P(C => Red, D => Blue, B => X*10.0, A => y);

When calling functions with lots of parameters, or with parameters of nondescriptive types like bool, it's extremely helpful to be able to label the arguments at the call site. This is one of Python's most popular features, and High C's variant works a lot like Python's. Argument labels are optional, but when they're present, you can specify the arguments in any order, using argumentName => value syntax, and you can combine unlabeled and labeled arguments arbitrarily as long as every parameter to the function has exactly one matching argument. Neither standard C nor C++ has this feature yet.
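For comparison, the closest thing you can do in standard C today is a well-known workaround rather than a real language feature: pack the parameters into a struct and use C99 designated initializers at the call site. The sketch below reuses the manual's P, Color, X, and y names; P_args, P_impl, the P macro, and demo_labeled_call are hypothetical names made up for illustration, and the return value exists only so the result can be checked.

```c
#include <stdio.h>

/* Mirrors the manual's declaration: void P(int A, float B, Color C, Color D); */
typedef enum { Red, Green, Blue } Color;

/* Hypothetical parameter struct: one field per labeled argument. */
typedef struct {
    int   A;
    float B;
    Color C;
    Color D;
} P_args;

/* The real function takes the struct by value. */
static float P_impl(P_args args) {
    printf("A=%d B=%.1f C=%d D=%d\n", args.A, args.B, (int)args.C, (int)args.D);
    return (float)args.A + args.B;   /* returned only so the call can be checked */
}

/* Wrap the labeled arguments in a C99 compound literal. */
#define P(...) P_impl((P_args){ __VA_ARGS__ })

static float demo_labeled_call(void) {
    float X = 2.0f;
    int y = 7;
    /* Fields may be given in any order, like P(C => Red, D => Blue, ...) in High C. */
    return P(.C = Red, .D = Blue, .B = X * 10.0f, .A = y);
}
```

Unlike High C's labeled arguments, this trick doesn't check that every parameter got a value: any field you leave out is silently zero-initialized.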

Case ranges

manual page screenshot showing the use of case ranges case 'A'..'Z': to match all ASCII uppercase letters

Pascal lets you match a range of values with case low..high; wouldn't it be great if C had that feature? High C does, another feature standard C and C++ never adopted.
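Outside the standard, GCC and Clang do accept a similar construct as a GNU extension, spelled with three dots instead of High C's two. A minimal sketch, assuming one of those compilers and an ASCII character set (classify is a hypothetical function name):

```c
/* Classify a character using the GNU case-range extension.
   This is not standard C89/C99/C11; GCC and Clang accept it by default. */
static int classify(char c) {
    switch (c) {
    case 'A' ... 'Z': return 1;  /* uppercase letter */
    case 'a' ... 'z': return 2;  /* lowercase letter */
    case '0' ... '9': return 3;  /* decimal digit */
    default:          return 0;  /* anything else */
    }
}
```

The GCC documentation recommends writing spaces around the `...`, since integer bounds written without them can be misparsed.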

Nested functions

manual page screenshot showing the use of nested functions, including the void Callme()! type syntax for declaring the "full function value" type, and the ability to goto from nested functions into the parent function

The previous features were just very nice to have, but here we get into features that start greatly increasing the expressivity of the language. High C lets you nest functions inside of other functions, another borrow from Pascal. However, High C's implementation is much more interesting and complete than standard Pascal's or GCC's nested function extension. Not only can you declare nested functions, but you can declare "full function value" types. Unlike traditional C function pointers, these work as nonescaping closures, carrying a context pointer in addition to the function pointer to let the nested function find its captured context again. (GCC infamously did horrible things to allow for nested functions to be referenced by normal function pointers, by writing executable code into the call stack to thunk the context pointer, an obvious security nightmare causing many platforms to disable the feature entirely.) This allows local function references to be used as first-class values, though their lifetime doesn't extend past when the surrounding function returns. Nested functions can even goto back into their parent function. That gives you nonlocal exits that break out of nested functions, like Smalltalk blocks, so you can build your own control-flow-like functions out of them.
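To make the "function pointer plus context pointer" idea concrete, here is a rough sketch of what such a value could look like if you spelled it out by hand in standard C. This is not taken from the manual: IntCallback, for_each, add_to_total, and sum are hypothetical names, and the layout High C actually uses may differ.

```c
#include <stddef.h>

/* Hypothetical stand-in for a "full function value": a code pointer
   plus a pointer to the enclosing function's captured variables. */
typedef struct {
    void (*fn)(void *ctx, int value);
    void *ctx;
} IntCallback;

/* A helper that accepts the fat pointer and calls it once per item. */
static void for_each(const int *items, size_t n, IntCallback cb) {
    for (size_t i = 0; i < n; i++)
        cb.fn(cb.ctx, items[i]);
}

/* What would be a nested function in High C: it reaches the parent's
   local variable through the context pointer. */
static void add_to_total(void *ctx, int value) {
    int *total = ctx;
    *total += value;
}

static int sum(const int *items, size_t n) {
    int total = 0;                          /* the "captured" local */
    IntCallback cb = { add_to_total, &total };
    for_each(items, n, cb);                 /* cb must not outlive this call */
    return total;
}
```

Because the context pointer refers to the parent's stack frame, the value is only good while sum is still running, which matches the nonescaping lifetime described above.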

Objective-C got blocks in 2009, which can be used as escaping closures, and C++ got lambdas in 2011, but neither language got the nonlocal exit ability. Standard C still has no official nested function feature.

Generators

manual page demonstrating the generator and yield syntax, along with the for loop syntax to consume it

MetaWare was clearly proud of this since they dedicate a whole chapter to explaining it. All the way back in 1989, they supported Python-style generator coroutines! In plain C! A function declared with the syntax void foo(Arg arguments) -> (Yield yields) can call the magic function yield(values...) multiple times to generate a sequence of values. Callers can then use a new for loop syntax for variable... <- foo(arguments...) do { ... } to run a loop over each of the generator's yielded values in turn.

manual page showing an example of a recursive local function call traversing a tree, and yield-ing to the outer generator function

The implementation even allows for some pretty intricate interactions with the nested function feature. A function nested in a generator can capture the yield operation from the outer generator, and the nested function can call itself recursively to traverse a tree or other recursive data structure, yield-ing at each level to produce values for the generator. I don't think you can do that in Python or in many other mainstream languages with generator coroutines.
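Here is a hand-written approximation of that pattern in standard C, treating yield as an explicit callback with a context pointer. All of the names (Node, Yield, walk, inorder, Collector, collect, tree_to_array) are hypothetical; in High C, walk would be nested inside the generator and would pick up yield implicitly instead of taking it as a parameter.

```c
#include <stddef.h>

typedef struct Node {
    int value;
    struct Node *left, *right;
} Node;

/* Hypothetical stand-in for the generator's yield operation. */
typedef struct {
    void (*fn)(void *ctx, int value);
    void *ctx;
} Yield;

/* Recursive helper: yields every value of the subtree in order. */
static void walk(const Node *node, Yield yield) {
    if (node == NULL)
        return;
    walk(node->left, yield);
    yield.fn(yield.ctx, node->value);   /* "yield" from inside the recursion */
    walk(node->right, yield);
}

/* The generator itself simply starts the recursion at the root. */
static void inorder(Yield yield, const Node *root) {
    walk(root, yield);
}

/* A consumer that appends each yielded value to a caller-supplied array. */
typedef struct {
    int *out;
    size_t count;
} Collector;

static void collect(void *p, int value) {
    Collector *c = p;
    c->out[c->count++] = value;
}

/* Caller must provide an array large enough for every node in the tree. */
static size_t tree_to_array(const Node *root, int *out) {
    Collector c = { out, 0 };
    Yield y = { collect, &c };
    inorder(y, root);
    return c.count;
}
```

Since the consumer runs as a plain call on top of the traversal's stack, the recursion never has to be suspended or unwound between values.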

manual page demonstrating the desugaring of generators and for loops into nested functions

How does all this work in plain C without a fancy runtime? High C's generators act as relatively straightforward syntax sugar over the nested function feature. When you declare a generator function void foo(Arg arguments) -> (Yield yields), that's equivalent to declaring a normal function void foo(void yield(Yield yields)!, Arg arguments), where yield is an implicit parameter of "full function value" type. Using yield(values) inside the generator body is a regular function call into that implicit function parameter. On the caller's side, a for loop's body is transformed into a nested function, which is passed as the yield argument to the generator. Simple, yet effective. Since nested functions allow for nonlocal exits, break, continue, or goto out of the for loop body work too by doing a goto to the appropriate place outside of the loop.
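The transformation described above can be approximated by hand in standard C. In this sketch the generator takes its yield callback as an explicit first parameter, the loop body becomes a separate function, and break is emulated with setjmp/longjmp. That last part is a substitute of mine for the goto out of a nested function, since standard C has no such thing, and isn't a claim about how High C implements it. The names upto, Yield, LoopCtx, loop_body, and sum_upto_with_limit are all hypothetical.

```c
#include <setjmp.h>

/* Hypothetical fat pointer standing in for the implicit yield parameter. */
typedef struct {
    void (*fn)(void *ctx, int value);
    void *ctx;
} Yield;

/* High C:    void upto(int n) -> (int value) { ... yield(i); ... }
   Desugared: the yield callback becomes an ordinary parameter. */
static void upto(Yield yield, int n) {
    for (int i = 1; i <= n; i++)
        yield.fn(yield.ctx, i);
}

/* State the loop body shares with its enclosing function. */
typedef struct {
    volatile int *sum;
    int limit;
    jmp_buf after_loop;   /* stands in for the goto target after the loop */
} LoopCtx;

/* High C:  for i <- upto(n) do { if (i > limit) break; sum += i; } */
static void loop_body(void *p, int i) {
    LoopCtx *ctx = p;
    if (i > ctx->limit)
        longjmp(ctx->after_loop, 1);   /* "break": nonlocal exit through upto() */
    *ctx->sum += i;
}

static int sum_upto_with_limit(int n, int limit) {
    volatile int sum = 0;   /* volatile: modified between setjmp and longjmp */
    LoopCtx ctx;
    ctx.sum = &sum;
    ctx.limit = limit;
    if (setjmp(ctx.after_loop) == 0) {
        Yield y = { loop_body, &ctx };
        upto(y, n);          /* runs loop_body once per yielded value */
    }
    return sum;              /* reached on normal completion or after "break" */
}
```

Calling sum_upto_with_limit(10, 3) adds 1, 2, and 3, then jumps out of the generator when it yields 4.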

It's unlikely that standard C would ever attempt to integrate a feature like this. C++20 now has an extremely flexible and complicated coroutine feature, based on compile-time coroutine transformations, and you can probably implement generators using it, though the resulting feature probably wouldn't compose so straightforwardly with local functions.
