>> On our next homework assignment, we're going to implement a small programming
language, and so in this segment I want to start explaining, in general, how
one goes about implementing a programming language,
and then, more specifically, zoom in on the particular simplified and useful
setup we'll use on our homework assignment.
So here's a typical workflow for an implementation of a programming language.
Think of it as first taking in some string, which is the text of the program
someone writes down. It would be in a file, of course, but we
could read that file in, and we can think of the contents as being in a string and
we'll call that string the concrete syntax.
It's the syntax the programmer actually wrote down.
The first typical stage is to pass that syntax to the parsing procedure, the
parser. And as we've all seen if we've
programmed, we often get error messages. It's the job of the parser to give us
syntax error messages: things like a parenthesis in the wrong
place, or using a keyword in the wrong position, or something like that.
But if we don't have any syntax errors, the output of the parser is what is called
an Abstract Syntax Tree, or AST. So it's still syntax, but it's in a much
more structured form than what the programmer wrote down.
It is a tree that shows what constructs in the language are being used and how.
So in this example, we would say that we have a function call, function calls would
have say, two children in this case the first expression and the second.
The second one here says, oh, it's a constant, and the particular constant is 4.
The function has an argument x and then its body.
Its body is an addition expression. The two sub-expressions of the addition
are both the variable x, and we build this entire tree that gives us the
structure of the program that we are supposed to implement.
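To make the picture concrete, here is a rough sketch of such AST node types in Python (the course itself uses Racket structs for this; all the names below are illustrative, not from the course):

```python
from dataclasses import dataclass

# One small record type per construct in the little language.
@dataclass
class Const:      # a constant, e.g. 4
    value: int

@dataclass
class Var:        # a variable use, e.g. x
    name: str

@dataclass
class Add:        # an addition with two subexpressions
    e1: object
    e2: object

@dataclass
class Function:   # an anonymous function with one argument and a body
    arg: str
    body: object

@dataclass
class Call:       # a function call with two children: function and argument
    fn: object
    actual: object

# The tree from the example: calling (fun x -> x + x) with the constant 4.
program = Call(Function("x", Add(Var("x"), Var("x"))), Const(4))
```

Each node type is a small record, and the whole program is just one nested value.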
At that point your language might have some sort of type checker that runs over
this tree, giving additional error messages. For example, if instead of passing
this function a number we tried to pass it another function, it would
say, well, that parses, I can create an abstract syntax tree from
that, but I get a type error message. And if I don't have any of those errors, then I pass
this AST on to the rest of the implementation, which is in charge of
running the program and giving an answer.
So now, let's talk about the rest of that implementation. I would say that there are basically two
fundamental approaches to implementing some programming language that I'll call B
on this slide. If I want to implement B, one approach is
I can write an interpreter in another language A.
Right? My interpreter itself has to be a computer
program, so I'll use the language A to write an interpreter for B.
Now interpreter is not a great name. A better name would probably be something
like evaluator or executor or something but okay, everyone calls it an interpreter
so we will as well. And the idea is to just take that syntax
tree in B and come up with what the answer would be if you ran it.
Maybe on some input that's provided when you run the program.
The second approach is to use a compiler. So the idea with a compiler is, it is
itself of course written in another language A.
And it produces a program in a third language C and then we just take that
program and we run it. So, compiler is also a terrible name;
a better name would be translator, but it's been called a compiler for many decades
and so we will call it that too. And the key thing you have to do is take a
program in B and produce a program in C that's equivalent, that has the same
behavior, and then you have to rely on there already being an implementation for
C. So, if C is binary code, the kind of code
that your computer hardware can run directly, then you can think of the
computer hardware as an implementation of C, and so your implementation of B just
translates to a program in C and then, you run it on the hardware.
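As a toy illustration of this idea (a hedged sketch, not any real compiler): suppose language B has only constants and additions, and we pick Python source text as the target language C. Python's own implementation then plays the role of the already-existing implementation of C:

```python
from dataclasses import dataclass

# A tiny language "B": constants and additions, written as trees.
@dataclass
class Const:
    value: int

@dataclass
class Add:
    e1: object
    e2: object

def compile_to_python(e) -> str:
    """A toy compiler: translate a B-program into language C (Python source).

    We then rely on there already being an implementation of C --
    Python itself -- to actually run the result.
    """
    if isinstance(e, Const):
        return str(e.value)
    if isinstance(e, Add):
        return f"({compile_to_python(e.e1)} + {compile_to_python(e.e2)})"
    raise ValueError("not a B expression")

code = compile_to_python(Add(Const(1), Add(Const(2), Const(3))))
print(code)        # the equivalent C (Python) program: (1 + (2 + 3))
print(eval(code))  # run it on the existing implementation of C: 6
```

The compiler never computes the answer itself; it only produces an equivalent program and hands it off.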
Okay? So, this language A that we use to
implement our language B is sometimes called the metalanguage.
And, the key thing we have to do in this section of the course and on your homework
is in our heads keep straight what's A and what's B.
What is something in the language we are using to implement a language and what is
in the language that's being implemented. I should mention, by the way, that the
reality of implementing languages is certainly more complicated.
It's not just you have an interpreter or you have a compiler.
Modern language implementations tend to do combinations of these two fundamental
ideas. For example, most implementations of Java
start with a compiler that translates your program, but not all the way down to
binary, just to some intermediate language, often called bytecode.
Then there's an interpreter for that bytecode language, but that can be a
little slow and so, that interpreter includes a compiler for compiling down to
hardware. And then, most people would think, well, once you're in binary,
it's just interpreted by the chip. But in
fact modern chips, at least for the x86, say, well, actually we don't want to interpret
quite those instructions, so when the program is running we'll actually translate
things internally, in the chip, into even smaller instructions and then
interpret those. So the point is, there are two fundamental
ideas that can be combined in interesting ways.
And the implementation of Racket has a similar mix; this is nothing
specific to Java. The ideas are that you have an interpreter or you have a
compiler.
One other thing I have to mention is that
a language is defined by written rules for the semantics of the different
constructs in your language. Whether it is implemented with a compiler
or an interpreter or some combination thereof is an implementation detail.
And so once you understand that, it should be obvious that there is no such thing as
a compiled language or an interpreted language.
There are implementations of a language that use an interpreter or implementations
that use a compiler. Unfortunately for decades people have
confused this notion and you will often hear such phrases like C is faster than
LISP because C is a compiled language and LISP is an interpreted language.
And I wish to emphasize that this really does not make very much sense, and I for
one like to politely correct people when they repeat these old sayings,
which are by definition not just wrong but don't really make sense.
Now I will add there is a way that they make sense.
At the end of the section I will talk a little bit about how languages like racket
have an eval construct. When you have an eval construct in your
language, you need to have a language implementation that's still around at
runtime, when the program is running, because you may need to run other
programs in the language. But there's no reason why eval can't be
implemented with a compiler just as well as an interpreter.
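Python happens to work this way too, which gives a quick illustration of the point: because the language implementation is still around at run time, a program can build new code as text and hand it to eval.

```python
# Build program text that did not exist until the program ran:
source = "+".join(str(n) for n in range(1, 5))  # the string "1+2+3+4"

# Hand it to the implementation, which is still available at run time.
result = eval(source)
print(result)  # 10
```

Whether eval under the hood interprets that string or compiles it first is, again, an implementation detail.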
Okay, so that's the end of that sermon. Let me now go back to this work flow and
tell you that on your homework assignment, we're not going to do the parser and we're
not going to do the type checker. We're just going to do an interpreter, and
I want to emphasize to you how you don't really need a parser if you set things up
correctly in how you implement one language inside of another one.
So let me show you how we can actually skip the parsing step.
Suppose we wanted to implement programming language B in programming language A.
What we could do, if we didn't want to write a parser, is have programmers
writing programs in B write the ASTs, the trees, directly in programming
language A. So they don't write strings that we then
convert to abstract syntax trees; they just write the abstract syntax trees.
And if our programming language A is Racket,
and we give B-language programmers structs, this is really not so bad.
What they're doing is writing down Racket data structures that are exactly
the AST that represents a program in the language we're trying to implement.
So for example maybe you would have some struct definitions like call and function
and var, and if a programmer came along and just used these Racket constructors in
this way, what they're essentially writing down is this picture on the left.
They are writing down, exactly, this tree. You can actually read it.
You just have to turn it sideways. You have call with two subexpressions, two
subtrees: a function that has an argument list and
then an addition. On the other side, sorry, we shouldn't have
a negation here; it should just be a const.
But if they did want to negate and come up with negative 4, they could have that and
so on.
So, this is how we can get programmers to not give us strings but to give us trees
directly: we just have them write down their programs inside of Racket using
constructors.
So, this is already what we were doing with our arithmetic expression example in the previous segments.
Think of the expressions built out of const, negate, add, and multiply as forming a
little programming language. Let's call it the arithmetic language,
alright? So, we are using Racket as our
metalanguage. That's our language A.
Our language B that we want to implement, we'll call the arithmetic language.
People writing programs in the arithmetic language just use the Racket constructors
to write down the abstract syntax tree. And then our language implementation is
the eval-exp program. This is an interpreter.
This is exactly what interpreters do: they take a program, which is written down
in Racket using these trees, and they produce an answer.
And so we have already implemented a programming language.
We have, in this language, constants, negations, additions, and multiplies.
The program itself is a Racket data structure, but it is a program in a
language, and the implementation of that language is the eval-exp function.
So now all we have to do to get ready for our homework is learn how to implement
more interesting language constructs than just negations, additions, and multiplies,
but we have the framework that we need.
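Here is a rough Python sketch of such an interpreter for the arithmetic language (the course's actual eval-exp is written in Racket over structs; the names below mirror the lecture, but the code is only an illustration):

```python
from dataclasses import dataclass

# The arithmetic language "B": four constructs, written directly as trees.
@dataclass
class Const:
    value: int

@dataclass
class Negate:
    e: object

@dataclass
class Add:
    e1: object
    e2: object

@dataclass
class Multiply:
    e1: object
    e2: object

def eval_exp(e):
    """The interpreter: take a B-program (a tree) and compute its answer."""
    if isinstance(e, Const):
        return e.value
    if isinstance(e, Negate):
        return -eval_exp(e.e)
    if isinstance(e, Add):
        return eval_exp(e.e1) + eval_exp(e.e2)
    if isinstance(e, Multiply):
        return eval_exp(e.e1) * eval_exp(e.e2)
    raise ValueError("not an arithmetic-language program")

# A B-program written with constructors, no parser needed: -((2 + 3) * 4)
print(eval_exp(Negate(Multiply(Add(Const(2), Const(3)), Const(4)))))  # -20
```

Note that there is no parsing step anywhere: the program arrives as a tree, and the interpreter just walks it.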
Now, I think that you will agree with me that most high-level programmers
tend to take compilers for granted.
They write the programs, they apply the compiler to the source code, and they
happily sit back and observe how the code is being translated into machine language.
They don't see the actual process, they just see the result.
Actually, they don't even see the result, they just execute it and
then they go on to debug their source code.
Now, this bliss of ignorance is just fine, however,
I think that you will also agree with me that translating a program from
high-level to machine code is nothing short of magic.
And if you don't want to understand this magic,
then I think that you're going to deny yourself two important benefits.
First of all, you will not appreciate the beauty and
the wisdom that go into the algorithms and
the techniques that come into play in writing a compiler.
And second of all,
you will lose the opportunity to become a much more competent high level programmer.
Because in the process of writing a compiler,
you're going to gain lots of very important capabilities that go far
beyond the narrow context of writing a compiler for high level language.
So, with that, let us go back to the overall roadmap.
That's what we want to do.
We want to write a compiler that translates
programs from the Jack language into machine language.
And, in this course, we decided to write a two-tier
compiler just like the compilers of Java and C#.
So, instead of going all the way to machine code,
the compiler will translate from Jack into VM code.
And then there will be a VM translator that will translate further into machine
language. Now the good news is that the VM translator has already been taken care of;
we wrote it in the previous two modules.
So now all we have to do, quote, unquote, all we have to do,
is write a compiler that translates from the high level to the VM code only.
Now, we decided to split the construction of this compiler into two
well-defined modules: a syntax analyzer and a code generator.
Now, the syntax analyzer is something that
we already developed in the previous modules.
And in order to test it, we decided that the syntax analyzer will emit,
not VM code, but rather, XML code.
And the purpose of this exercise was just to verify that the parser
understands the source code.
That it understands the syntactic structure of the commands.
And is capable of demonstrating this understanding by
producing legible XML code.
In this project, the XML code is no longer relevant, right?
Because in this module, we're going to focus on a code
generator which is going to translate into VM code.
So, what we're going to do is,
we're going to focus on the parser and the code generator modules.
We will rewrite certain elements of the parser that are responsible for
emitting XML, and we'll write some more code generation capabilities.
And the result will be a compiler that goes all the way from
Jack programs to VM code.
So the objective, once again,
is to develop a full-scale compiler, or more accurately,
to extend the compiler that we've written before into a full-scale compiler.
So instead of generating passive XML code, we will morph the compilation
engine that we had before into something that generates executable VM code.
So this is the objective and that's how we're going to do it.
Now, when we say program compilation, in this course,
we talk about a language which is high-level and object oriented.
The Jack language.
And here's an example of some code segment that you've seen before,
that is designed to manipulate some points on a two dimensional space.
If you compile and execute this code,
you'll end up seeing some nice results on the screen.
And as we discussed before, in order for this code to work properly,
it has to interact with yet another class file that delivers
the point functionality, so to speak, the point class file.
Now, when you set out to develop something complicated like a compiler,
it always helps to try to simplify matters as much as possible.
So I'm going to make some simplifying observations.
And the first one is the following, each class file is compiled separately.
Each Jack class file, just like each Java class file,
is a separate and standalone compilation unit.
So, the overall task of compiling a multi-class program has been reduced
to the more sort of limited task of compiling one class at a time.
So that's my first observation.
And the second one is based on going into the class itself.
So here's an example of the class Point.
And what I've listed here is some class-level declaration statements,
and then all the methods, or all the subroutines, of the point class.
And for lack of space I didn't list the code of all the subroutines.
I just listed their signatures, except for one subroutine,
distance, in which I took the trouble to show the entire code.
So here is my second simplifying assumption.
First of all, let us observe that the class consists of two main modules.
First of all,
we have some preamble code that defines the class and some class-level variables.
And then we have zero or more subroutine declarations, right?
And in the case of this class, here's the code which is at the class level.
And here's an example of the code of one particular subroutine.
Now, the simplifying assumption is such that compilation can once
again be viewed as two separate processes when you go down to the class level.
First, you compile the class level code that we see here on the top.
And then you compile each subroutine one at a time.
So, to a great extent,
these two compilation tasks are relatively separate and standalone.
And, therefore, once again, the overall and
rather formidable task of writing a compiler for
a multi-class program has been reduced to compiling one subroutine at a time.
So, once again, things are localized and
we're going to use a very modular strategy.
And as usual, we're going to do things one step at a time.
All right, now, what do you see when you compile a subroutine?
You see variables, expressions, statements like if and while,
you can see objects, and obviously you can see arrays.
So, the compilation challenges that we're going to handle
are what you see here on the screen.
And these challenges are also sort of the table
of contents of the next units in this module.
In every unit, we'll deal with one of these challenges one at a time.
Now, the challenge, when I say to deal with them, is
to generate VM code that captures the semantics of
variables, expressions, and so on, using VM commands.
Now this is not a trivial task because high-level languages today,
including Jack, are very sophisticated.
They give you all these fancy control structures and objects and arrays.
And yet, the target language, the VM language is very simple and limited.
It has only push and pop commands and a few more, and that's it.
So, we are facing the challenge of bridging the gap between
the high-level and the VM code.
And we have to somehow express this very rich semantics that you see here
in front of you using the very limited VM language.
But as you will see, we will do it in a surprisingly
elegant and relatively simple fashion.
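As a hedged sketch of that bridge (illustrative Python, not the actual Jack toolchain): a high-level expression tree can be flattened into push/add/mul-style stack commands, and a trivial stack machine can then execute them.

```python
from dataclasses import dataclass

@dataclass
class Num:
    value: int

@dataclass
class BinOp:
    op: str        # "+" or "*"
    left: object
    right: object

def codegen(e):
    """Translate an expression tree into stack-machine commands (postfix order)."""
    if isinstance(e, Num):
        return [("push", e.value)]
    ops = {"+": "add", "*": "mul"}
    return codegen(e.left) + codegen(e.right) + [(ops[e.op], None)]

def run(commands):
    """A minimal VM: just a stack and a handful of commands."""
    stack = []
    for op, arg in commands:
        if op == "push":
            stack.append(arg)
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack.pop()

# (2 + 3) * 7 compiles to: push 2, push 3, add, push 7, mul
prog = codegen(BinOp("*", BinOp("+", Num(2), Num(3)), Num(7)))
print(run(prog))  # 35
```

The rich tree structure of the source disappears; what remains is a flat sequence of very simple commands, which is exactly the gap the code generator has to bridge.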
All right, before we wrap up this module,
I'd like to list some of the lessons that you're going to take from it.
So, first of all you will learn how programming languages work.
You will learn how to implement a simple procedural language like C,
because you will know how to handle variables, expressions, flow of control.
You will also learn to handle arrays and objects at a low level.
And by doing this, once again,
you will gain a very deep understanding of the tools of the trade.
And secondly, you will learn some general purpose techniques which
can be applicable to numerous other problems and applications.
You will strengthen your understanding of parsing,
something that we've dealt with extensively in the previous module.
You will understand how recursive compilation works.
You will learn how to generate code, and how to create and use symbol tables.
And you will begin to learn how to manage memory cleverly.
Something that we will take up also when we discuss the operating system
in the last module of this course.
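One of those capabilities, the symbol table, can be sketched very simply (an illustrative design, not the course's actual API): each declared variable maps to its type, its kind, and a running index within that kind.

```python
class SymbolTable:
    """Maps a variable name to (type, kind, index); one table per scope."""

    def __init__(self):
        self.entries = {}
        self.counts = {}   # next index to hand out, per kind

    def define(self, name, typ, kind):
        index = self.counts.get(kind, 0)
        self.counts[kind] = index + 1
        self.entries[name] = (typ, kind, index)

    def lookup(self, name):
        return self.entries.get(name)   # None if undeclared

# A subroutine-level scope for a method like Point.distance (hypothetical names):
sub = SymbolTable()
sub.define("this", "Point", "argument")   # index 0 among arguments
sub.define("other", "Point", "argument")  # index 1 among arguments
sub.define("dx", "int", "local")          # index 0 among locals
print(sub.lookup("dx"))  # ('int', 'local', 0)
```

The kind-and-index pair is what lets the code generator turn a variable name into a concrete VM memory-segment reference.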
So all these goodies are coming up in the current module.
So, in closing,
these are the challenges that we have to deal with starting with variables.
So that's what we'll do in the next unit.