The P4 compiler

Pascal-P4 is available on SourceForge here:

https://sourceforge.net/projects/pascalp4/

The P4 compiler is one of the two portable compilers that originated at ETH Zurich, the other being Pascal-S.

P4 is part of a series of compilers known as Pascal-P. The versions of the compiler were:

Pascal-P1 1973
Pascal-P2 1974
Pascal-P3 1976
Pascal-P4 1976

There were no further versions produced in Zurich beyond P4. You will find a full overview of the Pascal-P systems in PUG newsletter #4, page 81.

Whereas Pascal-S was designed to load, compile and interpret Pascal programs, P4 implemented the same idea as two separate programs, one that compiled and one that interpreted. This was possible because an ideal machine was defined, and the first pass output assembly code for that machine. The second pass then assembled the code into memory and interpreted it.
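The split into an assembly step and an interpretation step can be illustrated with a toy sketch in Python. This is not P4's actual instruction set or file format: the mnemonics (push, add, mul, print, halt) and the function names assemble and interpret are invented here purely for illustration. The first function stands in for the part of the second pass that loads assembly text into memory, and the second runs the loaded code on a small stack machine.

```python
# Toy two-stage pipeline: assembly text -> in-memory code -> interpretation.
# The mnemonics below are hypothetical and are NOT the real P4 instruction set.

OPCODES = {"push": 1, "add": 0, "mul": 0, "print": 0, "halt": 0}  # name -> operand count


def assemble(text):
    """Translate assembly text into a list of (mnemonic, operand) tuples."""
    code = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        name, args = fields[0], fields[1:]
        if name not in OPCODES:
            raise ValueError(f"line {lineno}: unknown instruction {name!r}")
        if len(args) != OPCODES[name]:
            raise ValueError(f"line {lineno}: {name} takes {OPCODES[name]} operand(s)")
        code.append((name, int(args[0]) if args else None))
    return code


def interpret(code):
    """Run assembled code on a stack machine; return the list of printed values."""
    stack, output, pc = [], [], 0
    while pc < len(code):
        name, operand = code[pc]
        pc += 1
        if name == "push":
            stack.append(operand)
        elif name == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif name == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif name == "print":
            output.append(stack.pop())
        elif name == "halt":
            break
    return output


# What a first pass might emit for the expression (2 + 3) * 4 followed by a write.
program = """
push 2
push 3
add
push 4
mul
print
halt
"""

result = interpret(assemble(program))
print(result)  # [20]
```

Because the interpreter only ever sees the assembled instruction list, it could in principle be swapped for a code generator that reads the same text, which is the property the porting scheme relies on.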

P4 is often called the first "bytecode" virtual machine, but this is not correct. P4's instruction stream was not organized into bytes, as is typical of the JVM (Java Virtual Machine).

The two advances of P4 over Pascal-S were that a larger portion of the complete Pascal language was implemented, and that defining an intermediate language and parser allowed the back end (the interpreter) to be replaced by a true code generator, yielding a true compiler. The P4 compiler, then, was designed to get Pascal up and running on machines other than the CDC 6400 with the least effort.

The components of the original "P4 porting kit" were:

The source for the compiler and the interpreter (pcom.pas and pint.pas).
The "assembly language" source for the compiler (as translated by the compiler itself).

Wirth had several methods in mind to get P4 running on a new architecture:

Create a new assembler/interpreter using the assembly language for the target processor, or another language.
Use a macro assembler to implement the intermediate.
Hand translate the intermediate.

If you understand N. Wirth, you would also understand why he did not consider the last option to be amazingly painful. The traditional method to port a new language to an unfamiliar computer is to create a compiler on another computer in that language that targets the new computer, then have the compiler compile itself to the new computer, and move the tape to the new computer to run it.

If Wirth had any takers on his novel porting method, I would like to hear about it. Actually, there were probably a few university projects that used the method.

P4 was not the only means used to port early Pascal compilers based on Niklaus Wirth's work. The other method was to modify the CDC 6000 compiler to generate code for another machine, then bootstrap the compiler to the new system.

Compiling and using P4

The P4 set can be compiled with any ISO standard compiler. P4 itself compiles a subset of standard Pascal, with the following omissions/changes:

Procedure/function parameters.
Interprocedural gotos (goto must terminate in the same procedure/function).
Only files of type "text" can be used, and then only the ones that are predefined by P4, which are "input", "output", and two special files defined so that P4 can compile itself.
"mark" and "release" instead of "dispose".
Curly bracket comments {} are not implemented.
The predeclared identifiers maxint, text, round, page, dispose, and the functions they represent, are not present.
The procedures reset, rewrite, pack and unpack are not implemented (they are recognized as valid predefined procedures, but give an 'unimplemented' error on use).
Undiscriminated variant records.
Output of boolean types.
Output of reals in "fixed" format.
Set constructors using subranges ('0'..'9').

"mark" and "release" are dummy procedures in the compiler, since they have no meaning on an ISO standard compiler. What this means is that dynamic space, once allocated, is not freed. This is not a big problem for the kinds of small programs you would typically run with P4.

P4 also has some interesting quirks. "array [1:10] of char" is a valid declaration in P4, and the '..' and ':' tokens are aliases of each other. The reason is probably lost in history.

The limitations of P4 vs. the full language were deliberate. The idea was to remove any language detail that P4 itself didn't need, so that it could self compile. Remember that P4 was primarily designed to be a bootstrapper for the language. Unfortunately, some of the limits of P4 persisted into actual implementations of the language based on the P-system, which is a good lesson for language designers: don't implement subsets of your language if you don't want to see that as permanent somewhere.
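The effect of dummy "mark" and "release" can be sketched with a toy bump allocator in Python. The class and function names here are hypothetical and do not come from the P4 source. The sketch only contrasts a heap where release actually rolls back the heap top with one where both calls do nothing.

```python
# Sketch of mark/release heap discipline versus dummy (no-op) versions.
# The class and method names are hypothetical, chosen only for this example.

class MarkReleaseHeap:
    """Bump allocator: release() frees everything allocated since mark()."""

    def __init__(self):
        self.top = 0  # number of cells currently in use

    def new(self, size):
        address = self.top
        self.top += size
        return address

    def mark(self):
        return self.top  # remember the current heap top

    def release(self, marker):
        self.top = marker  # discard everything allocated after the mark


class DummyHeap(MarkReleaseHeap):
    """Same allocator, but mark/release do nothing, so space is never freed."""

    def mark(self):
        return None

    def release(self, marker):
        pass


def cells_in_use_after_work(heap):
    marker = heap.mark()
    for _ in range(100):
        heap.new(10)  # allocate 100 temporary records of 10 cells each
    heap.release(marker)
    return heap.top


print(cells_in_use_after_work(MarkReleaseHeap()))  # 0
print(cells_in_use_after_work(DummyHeap()))        # 1000
```

With the dummy versions the temporary records stay allocated, which is harmless for small programs but grows without bound for long-running ones.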

I placed the files used by the compiler into the headers of the programs. In many Pascals, that allows you to associate a name with the file. If you have a compiler that does not, simply use another method to assign names to these files.

P4 would be a very limited compiler to use on a day to day basis. It does, however, have uses as:

A toy compiler, to see how compilers work.
A starting basis for your own compiler.
A historical item.

As an example of a real compiler for Pascal, I would recommend also Per Brinch Hansen's book on compilers.

Note that the PUG newsletter #11, page 70 has a collection of bugs or limitations and their solution for P4.

Note that the error numbers given by the P4 compiler were listed in the "Pascal User Manual and Report" [Jensen and Wirth], second edition, on page 119. This information was removed in later editions of the book. I recommend that serious users of P4 get an old copy of the book. Oddly, it is still available new (a new version of the old second edition). Also note (as Steve Pemberton states in his book) that error '399' changed meaning from 'variable dimension arrays not implemented' to simply 'unimplemented', and is used for several unimplemented features in the compiler.

PASCAL User Manual and Report

Note that the compiler contained in the book "A Model Implementation of Standard Pascal" [Jim Welsh and Atholl Hay] is a P-machine that implements a full ISO 7185 Compiler/Interpreter. This probably qualifies as an implementation of the theoretical "P5" compiler. You can find that book here:

A Model Implementation of Standard Pascal. Hardcover.

The "Model Implementation" isn't just a modified P4 compiler; it is extensively parameterized, commented, and has a high degree of portability.

P4 or P5?

P4 was changed slightly to compile under ISO 7185, but itself only compiles a subset of the full Pascal language. If you want to use P4 as the basis of a serious compiler, I recommend you start with the P5 project:

P5 - A full ISO 7185 compiler based on P4

This is P4 extended to compile the complete Pascal language.

In addition, the goal of the P4 adaptation here was to make only the minimal changes needed to allow it to run on currently available ISO 7185 Pascal compilers. Many bugs in P4 were fixed in P5, but were not corrected in P4.

Validation and checkout of P4

In the form that P4 was obtained from Steve Pemberton's site, P4 didn't run correctly on my 80386/Windows installation. On 2007/11/14, I finished a series of modifications and testing that resulted in a fully working and checked out compiler. The links on this page have been updated accordingly.

The method used to check the P4 compiler was the same as for Pascal-S: a "cut down" version of my ISO 7185 test suite, as detailed here:

http://www.moorecad.com/standardpascal/compiler.html

What was removed from the test were the language features that were not implemented in P4 (remember P4 is designed to implement a subset of Pascal, not full Pascal).

Changes required

The changes needed for P4 to compile and work under Windows are detailed in the source. The biggest change comes from the nature of its CDC 6000 dependencies. P4 assumes that integers, characters and booleans are interchangeable with respect to the space they occupy, which is a 60 bit CDC 6000 word. The 80386, and indeed the vast majority of processors being manufactured today, use "byte addressability", meaning they can address objects as small as a byte. The compiler used to compile P4 represents characters and booleans as bytes, which means that if you start interpreting a character as an integer, you will see the extra bits beyond the 8 bits in a character as garbage. This comes about in P4 because, for example, it treats "ord" as a no-op, and expects an undiscriminated variant record change from character to integer to work, as it would on a CDC 6000. P4 does include the ability to treat each type of data differently; it was just not implemented in the P4 compiler as it was.
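The garbage-bits problem can be demonstrated with a small Python sketch. It assumes a 32 bit little-endian integer cell, which is an illustrative choice and not a statement about how any particular Pascal compiler lays out variant records. A character is written into one byte of a cell that previously held other data, and the whole cell is then read back as an integer.

```python
import struct

# A 4-byte integer-sized cell still holding leftover data from an earlier use.
# The 32-bit little-endian layout is an assumption made for this example.
cell = bytearray(struct.pack("<i", 0x12345600))

# Store the character 'A' into the low-order byte only, as a byte-addressed
# machine would when the cell is treated as a char.
cell[0] = ord("A")

# Reinterpret the whole cell as an integer, the way an undiscriminated variant
# record would: the upper 24 bits are stale.
(as_integer,) = struct.unpack("<i", cell)

print(hex(as_integer))    # 0x12345641
print(as_integer & 0xFF)  # 65, the value a real "ord" conversion should give
```

On a machine where every value occupies one full word, the stale upper bits never exist, which is why treating "ord" as a no-op was safe on the CDC 6000.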

Self compilation

When I finished checking and testing P4, I wanted to have it compile itself. Although I added the changes required to make P4 ISO 7185 compliant, these changes could be commented out so that it could compile itself. However, P4 was not quite able to compile itself, for several reasons:

P4 passes parameters of type "text", for example the routine readi(var f: text); P4 cannot declare any file, it must use the default header files.
The CDC 6000 routine "halt" is used (exit program immediately). P4 does not implement that.
P4 uses a jump to the program end (an alternative to "halt"). P4 cannot compile such interprocedural jumps.

There are also a series of more obscure factors. For example, P4, as listed both here and in Steve Pemberton's book, has a serious bug that prevents it from actually reading from the prd file correctly (the prd is the input file used to pass its "assembly" language). I won't spoil the fun of finding it yourself. There are more such bugs discussed in the PUG newsletters. Let's just say that the P4 machine, as given to me from Steve's site, has clearly not been used to compile itself for some time.

All this means is that you would have to modify P4 to get it to compile itself. This would be non-trivial, especially for the "text" declarations, so I decided that I would indeed modify P4 to compile itself, but the result was best called "P5", instead of P4. In other words, creating a self compiling version of the compiler is a much easier prospect if I remove the idea of trying to maintain it in as close to its "historical" condition as possible.

In the meantime, P4 has reached a high degree of workability in the present version here, and has passed several large tests.

Space efficiency

P4 takes more space than it needs to on a current machine. The reason is that the "store" array, where all data for the running program is kept, has a single record that covers all of the integer, real, character, boolean and set formats. On the CDC 6000 each of these was indeed the same size, a 60 bit machine word. A set of 60 elements would suffice for that machine, because the CDC 6000 used a special character set with only upper case. Thus, "set of char" was still possible.

Even the CDC 6000, however, would be wasteful with characters, since they weren't represented as packed. Each character of a constant string would take 60 bits. To be fair, this is also true of booleans, but booleans are not commonly represented as arrays. The waste of space with string constants is handled in P4 by setting the total string length limit fairly small. It was 16 characters in the original P4 source. Also, P4 avoids the use of string constants whenever possible. Error messages are numeric, and printing of string constants is kept to a minimum.

On a typical microprocessor today, a set capable of representing "set of char" in ASCII is a minimum of 128 bits or 16 bytes, and probably 256 bits or 32 bytes, which means the 8th bit of the character does not have to be dealt with specially. This means that each location of "store" would have to take that much room. A character string would take the number of characters times 32 bytes, and you can see that the space requirements mount up rapidly.

With gigabyte RAM stores common, I was able to accommodate P4 by simply turning up the constant values until it was able to accommodate my large test programs. However, the space requirements of P4 are something that needs to be addressed in a P5 version of the compiler.
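The arithmetic can be made concrete with a short Python sketch. The 32 byte cell and the 16 character string limit come from the text above; the store size of 100000 cells is a made-up figure used only to show the scale, not a constant from the P4 source.

```python
# Space cost when every "store" location must be large enough to hold a set.
CELL_BYTES = 32        # 256-bit set, the larger figure discussed above
CDC_WORD_BITS = 60     # one CDC 6000 machine word
STRING_LIMIT = 16      # string length limit in the original P4 source
STORE_CELLS = 100_000  # hypothetical store size, for illustration only


def string_cost_bytes(length, cell_bytes=CELL_BYTES):
    """Bytes used by an unpacked string that takes one store cell per character."""
    return length * cell_bytes


def store_cost_bytes(cells, cell_bytes=CELL_BYTES):
    """Bytes used by a store array of the given number of cells."""
    return cells * cell_bytes


print(string_cost_bytes(STRING_LIMIT))     # 512 bytes for a 16-character string
print(STRING_LIMIT * CDC_WORD_BITS // 8)   # 120 bytes' worth of bits on the CDC 6000
print(store_cost_bytes(STORE_CELLS))       # 3200000 bytes for the hypothetical store
```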

Source

pcom.pas. The compiler program. This is my version that I have modified to make it more standard.

pint.pas. The interpreter program. This is my version that I have modified to make it more standard.

Windows version

The following programs were compiled under IP Pascal for Windows:

pcom.exe - The compiler.

pint.exe - The interpreter.

p4.bat - A batch program to run both passes on a program.

The usage is as follows:

> p4 program

Will compile and run the program "program", which is specified without an extension. Note that you need to hit return when the program starts in order to produce any output. This is part of the famous problem with older Pascals that they needed to read input before they could print anything (which was solved with "lazy I/O").

If you want to run the individual programs:

pcom output.p4 < input.pas

The input file should be a file such as "hello.pas", with the extension specified. The output should be the intermediate assembly file, like "hello.p4". This is where the assembly code for the virtual machine is placed, and it can be displayed in a standard ASCII editor.

pint input.p4 output.txt

The input file should be the intermediate assembly file from pcom, like "hello.p4". The output file is where you want output from the program to go. It is used by P4 when self compiling, as the place where the output P4 machine assembly code goes.

Links

The Pascal P Compiler implementation notes. This contains the original article describing the P4 system.

Steven Pemberton's page. Steve wrote a book on the P4 compiler. His web site is good, and gives complete details of the changes he made to the code, and gives the source for the P4 project.

Pascal Implementation, by Steven Pemberton and Martin Daniels. This is the complete online text of the book, and covers all aspects of the P4 compiler.

Pascal Implementation, by Steven Pemberton and Martin Daniels. On Amazon.com. This book used to be hard to find, but now there are copies of it available for people who want to have the real book, not just the web copy.

For more information contact: Scott A. Moore samiam@moorecad.com