Coursera Compilers Assignments Definition

>> On our next homework assignment, we're going to implement a small programming language, and so in this segment I want to start explaining in general how one goes about implementing a programming language.

And then, more specifically, zoom in on the particular simplified and useful setup we'll use on our homework assignment.

So here's a typical workflow for an implementation of a programming language.

Think of it as first taking in some string, which is the text of the program

someone writes down. It would be in a file, of course, but we

could read that file in, and we can think of the contents as being in a string and

we'll call that string the concrete syntax.

It's the syntax the programmer actually wrote down.

The first typical stage is to pass that syntax to the parsing procedure, the

parser. And as we've all seen if we've programmed, we often get error messages. It's the job of the parser to give us syntax error messages: things like a parenthesis in the wrong place, or using a keyword in the wrong position, or something like that.

But if we don't have any syntax errors, the output of the parser is what is called

an Abstract Syntax Tree, or AST. So it's still syntax, but it's in a much

more structured form than what the programmer wrote down.

It is a tree that shows what constructs in the language are being used and how.

So in this example, we would say that we have a function call; function calls have, say, two children, in this case the first expression and the second.

The second one here says oh, it's a constant and the particular constant is 4.

The function has an argument x and then its body. Its body is an addition expression. The two sub-expressions of the addition

are both variables with x and we build this entire tree that gives us the

structure of the program that we are supposed to implement.
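To make this concrete, here's a sketch of such a tree in code. The course builds these in Racket; this Python version, with made-up node names, is only an illustration of the same idea:

```python
from dataclasses import dataclass

# Hypothetical AST node types for the lecture's example: a call whose
# first child is a one-argument function and whose second child is
# the constant 4.

@dataclass
class Const:
    value: int

@dataclass
class Var:
    name: str

@dataclass
class Add:
    left: object
    right: object

@dataclass
class Function:
    arg: str
    body: object

@dataclass
class Call:
    fn: object
    actual: object

# The whole program, written directly as a tree: call a function of x,
# whose body is x + x, with the constant 4.
ast = Call(Function("x", Add(Var("x"), Var("x"))), Const(4))
```

Each node records which construct is used and holds its sub-expressions as children, which is exactly the structure the parser would produce.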

At that point your language might have some sort of type checker that runs over

this tree, giving additional error messages. For example, if instead of passing this function a number we tried to pass it another function, it would say: well, that parses, I can create an abstract syntax tree from that, but I get a type error message. And if I don't have any of those, then I pass

this AST onto the rest of the implementation, which is in charge of

running the program and giving an answer. So now, let's talk about the rest of that

implementation. I would say that there are basically two

fundamental approaches to implementing some programming language that I'll call B

on this slide. If I want to implement B, one approach is

I can write an interpreter in another language A.

Right? My interpreter itself has to be a computer program, so I'll use the language A to write an interpreter for B.

Now, interpreter is not a great name. A better name would probably be something like evaluator or executor, but okay, everyone calls it an interpreter, so we will as well. And the idea is to just take that syntax

tree in B and come up with what the answer would be if you ran it.

Maybe on some input that's provided when you run the program.

The second approach is to use a compiler. So the idea with a compiler is, it is

itself of course written in another language A.

And it produces a program in a third language C and then we just take that

program and we run it. So compiler is also a terrible name; a better name would be translator, but it's been called a compiler for many decades, and so we will call it that too. And the key thing you have to do is take a

program in B and produce a program in C that's equivalent, that has the same

behavior, and then you have to rely on there already being an implementation for

C. So, if C is binary code, the kind of code

that your computer hardware can run directly, then you can think of the

computer hardware as an implementation of C, and so your implementation of B just

translates to a program in C and then, you run it on the hardware.

Okay? So, this language A that we use to implement our language B is sometimes called the metalanguage.

And, the key thing we have to do in this section of the course and on your homework

is to keep straight in our heads what's A and what's B: what is in the language we are using to implement a language, and what is

in the language that's being implemented. I should mention, by the way, that the reality of implementing languages is certainly more complicated.

It's not just you have an interpreter or you have a compiler.

Modern language implementations tend to do combinations of these two fundamental

ideas. For example most implementations of Java

start with a compiler that translates your program, but not all the way down to binary, just to some intermediate language, often called bytecode.

Then there's an interpreter for that bytecode language, but that can be a

little slow and so, that interpreter includes a compiler for compiling down to

hardware. And then most people would think, well, once you're in binary, then it's just interpreted by the chip. But in fact modern chips, at least for the x86, say: actually, we don't want to interpret quite those instructions, so when the program's running we'll translate things internally in the chip into even smaller instructions and then interpret those. So the point is there are two fundamental

ideas that can be combined in interesting ways.

And Racket, the implementation of Racket, has a similar mix; this is nothing specific to Java. The ideas are: you have an interpreter or you have a compiler. One other thing I have to mention is that

a language is defined by written rules for the semantics of the different constructs in your language. Whether it is implemented with a compiler or an interpreter or some combination thereof is an implementation detail.

And so once you understand that, it should be obvious that there is no such thing as

a compiled language or an interpreted language.

There are implementations of a language that use an interpreter or implementations

that use a compiler. Unfortunately for decades people have

confused this notion, and you will often hear phrases like "C is faster than LISP because C is a compiled language and LISP is an interpreted language."

And I wish to emphasize that this really does not make very much sense, and I for one like to politely correct people when they repeat these old sayings that are, by definition, not just wrong but don't really make sense.

Now I will add there is a way that they make sense.

At the end of the section I will talk a little bit about how languages like Racket have an eval construct. When you have an eval construct in your language, you need to have a language implementation that's still around at runtime, when the program is running, because you may need to run other programs in the language. But there's no reason why eval can't be implemented with a compiler just as well as an interpreter.
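Python is a handy concrete example of this point: its built-in eval first compiles the source string to bytecode at runtime and only then runs it, i.e., eval implemented with a compiler:

```python
# eval via an explicit compile step: the string is translated to a
# bytecode object first, and that translated program is then executed.
code = compile("2 * 21", "<string>", "eval")
print(eval(code))  # 42
```

The language implementation has to be around at runtime, but what it does with the string is translation, not (only) interpretation.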

Okay, so that's the end of that sermon. Let me now go back to this workflow and

tell you that on your homework assignment, we're not going to do the parser and we're

not going to do the type checker. We're just going to do an interpreter, and

I want to emphasize to you how you don't really need a parser if you set things up

correctly in how you implement one language inside of another one.

So let me show you how we can actually skip the parsing step.

Suppose we wanted to implement programming language B in programming language A.

What we could do, if we didn't want to write a parser, is have programmers writing programs in B write the ASTs, the trees, directly in programming

language A. So they don't write strings that we then

convert to the abstract syntax trees. They just write the abstract syntax trees.

And if our programming language A is Racket, and we give B-language programmers structs, this is really not so bad. What they're doing is writing down Racket data structures that are exactly the AST that represents a program in the language we're trying to implement.

So for example maybe you would have some struct definitions like call and function

and var, and if a programmer came along and just used these Racket constructors in

this way, what they're essentially writing down is this picture on the left.

They are writing down, exactly, this tree. You can actually read it.

You just have to turn it sideways. You have call with two subexpressions, two

subtrees. A function that has an argument list and

an then an addition. On the other side sorry, we shouldn't have

a negation here. It should just be a const.

But if they did want to negate and come up with negative 4, they could have that and

so on. So, this is how we can get programmers to

not give us strings but to give us trees directly.

We just have them write down their programs inside Racket using

constructors. So, this is already what we were doing

with our arithmetic expression example in the previous segments.

Think of the expressions built out of const, negate, add, and multiply as forming a

little programming language. Let's call it the arithmetic language,

alright? So, we are using Racket as our meta

language. That's our language A.

Our language B that we want to implement, we'll call the arithmetic language.

People writing programs in the arithmetic language just use the Racket constructors

to write down the abstract syntax tree. And then our language implementation is

the eval-exp program. This is an interpreter.

This is exactly what interpreters do. They take a program, which is written down

in Racket using these trees, and they produce an answer in the language.

And so we have already implemented a programming language.

We have, in this language constants, negations, additions, and multiplies.

The program itself is a Racket data structure, but it is a program in a

language and the implementation of that language is the eval-exp function.

So now all we have to do to get ready for our homework is learn how to implement

more interesting language constructs than just negations, additions, and multiplies,

but we have the framework that we need.
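To restate that framework in code: the course's version is Racket structs plus an eval-exp function; here is a hedged Python analogue of the same arithmetic language, intended only as an illustrative sketch:

```python
from dataclasses import dataclass

# The arithmetic language (our language B): programs are trees built
# from these four constructors.
@dataclass
class Const:
    value: int

@dataclass
class Negate:
    e: object

@dataclass
class Add:
    e1: object
    e2: object

@dataclass
class Multiply:
    e1: object
    e2: object

def eval_exp(e):
    """The language implementation: an interpreter that takes a program
    (a tree) and produces an answer, which is itself a Const."""
    if isinstance(e, Const):
        return e
    if isinstance(e, Negate):
        return Const(-eval_exp(e.e).value)
    if isinstance(e, Add):
        return Const(eval_exp(e.e1).value + eval_exp(e.e2).value)
    if isinstance(e, Multiply):
        return Const(eval_exp(e.e1).value * eval_exp(e.e2).value)
    raise TypeError(f"not an arithmetic-language expression: {e!r}")

# A program in the arithmetic language, written directly as a tree:
print(eval_exp(Multiply(Negate(Const(2)), Add(Const(3), Const(4)))))  # Const(value=-14)
```

The metalanguage A is Python here (Racket in the course); the program is just a data structure in A, and eval_exp is the entire implementation of B.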

Now, I think that you will agree with me that most high-level programmers

tend to take compilers for granted.

They write the programs, they apply the compiler to the source code, and they happily sit back and observe how the code is being translated into machine language.

They don't see the actual process, they just see the result.

Actually, they don't even see the result, they just execute it and

then they go on to debug their source code.

Now, this bliss of ignorance is just fine. However, I think that you will also agree with me that translating a program from high-level to machine code is nothing short of magic.

And if you don't want to understand this magic, then I think that you're going to deny yourself two important benefits.

First of all, you will not appreciate the beauty and the wisdom that go into the algorithms and the techniques that come into play in writing a compiler.

And second of all,

you will lose the opportunity to become a much more competent high-level programmer.

Because in the process of writing a compiler,

you're going to gain lots of very important capabilities that go far

beyond the narrow context of writing a compiler for a high-level language.

So, with that, let us go back to the overall roadmap.

That's what we want to do.

We want to write a compiler that translates

programs from the Jack language into machine language.

And, in this course, we decided to write a two-tier

compiler just like the compilers of Java and C#.

So, instead of going all the way to machine code,

the compiler will translate from Jack into VM code.

And then there will be a VM translator that will translate further into machine

language.

Now the good news is that the VM translator has already been taken care of; we wrote it in the previous two modules.

So now all we have to do, quote unquote "all we have to do," is write a compiler that translates from the high level to VM code only.

Now, we decided to split the construction of this compiler into two well-defined modules: a syntax analyzer and a code generator.

Now, the syntax analyzer is something that

we already developed in the previous modules.

And in order to test it, we decided that the syntax analyzer will emit,

not VM code, but rather, XML code.

And the purpose of this exercise was just to verify that the parser

understands the source code.

That it understands the syntactic structure of the commands.

And is capable of demonstrating this understanding by

producing legible XML code.

In this project, the XML code is no longer relevant, right?

Because in this module, we're going to focus on a code

generator which is going to translate into VM code.

So, what we're going to do is,

we're going to focus on the parser and the code generator modules.

We will rewrite certain elements from the parser that are responsible for

generating code.

We'll write some more code generation capabilities.

And the result will be a compiler that goes all the way from

Jack programs to VM code.

So the objective, once again,

is to develop a full-scale compiler, or more accurately,

to extend the compiler that we've written before into a full-scale compiler.

So instead of generating passive XML code, we will morph the compilation

engine that we had before into something that generates executable VM code.

So this is the objective and that's how we're going to do it.

Now, when we say program compilation, in this course,

we talk about a language which is high-level and object oriented.

The Jack language.

And here's an example of some code segment that you've seen before,

that is designed to manipulate some points on a two dimensional space.

If you compile and execute this code,

you'll end up seeing some nice results on the screen.

And as we discussed before, in order for this code to work properly,

it has to interact with yet another class file that delivers

the point functionality, so to speak, the point class file.

Now, when you set out to develop something complicated like a compiler,

it always helps to try to simplify matters as much as possible.

So I'm going to make some simplifying observations. And the first one is the following: each class file is compiled separately.

Each Jack class file, just like each Java class file,

is a separate and standalone compilation unit.

So, the overall task of compiling a multi-class program has been reduced

to the more sort of limited task of compiling one class at a time.

So that's my first observation.

And the second one is based on going into the class itself.

So here's an example of the class point.

And what I've listed here is some class-level declaration statements,

and then all the methods, or all the subroutines, of the point class.

And for lack of space I didn't list the code of all the subroutines.

I just listed their signatures, except for one subroutine,

distance, in which I took the trouble to show the entire code.

So here is my second simplifying assumption.

First of all, let us observe that the class consists of two main modules.

First of all,

we have some preamble code that defines the class and some class-level variables.

And then we have zero or more subroutine declarations, right?

And in the case of this class, here's the code which is at the class level.

And here's an example of the code of one particular subroutine.

Now, the simplifying assumption is that compilation can once again be viewed as two separate processes when you go down to the class level.

First, you compile the class level code that we see here on the top.

And then you compile each subroutine one at a time.

So, to a great extent,

these two compilation tasks are relatively separate and standalone.

And, therefore, once again, the overall and rather formidable task of writing a compiler for a multi-class program has been reduced to compiling one subroutine at a time.

So, once again, things are localized and

we're going to use a very modular strategy.

And as usual, we're going to do things one step at a time.

All right, now, what do you see when you compile a subroutine? You see variables, expressions, statements like if and while; you see objects, and obviously you see arrays.

So, the compilation challenges that we're going to handle

are what you see here on the screen.

And these challenges are also sort of the table

of contents of the next units in this module.

In every unit, we'll deal with one of these challenges one at a time.

Now, the challenge, when I say to deal with them, is to generate VM code that captures the semantics of variables, expressions, and so on, using VM commands.

Now this is not a trivial task because high-level languages today,

including Jack, are very sophisticated.

They give you all these fancy control structures and objects and arrays.

And yet, the target language, the VM language is very simple and limited.

It has only push and pop commands and a few more commands, and that's it.

So, we are facing the challenge of bridging the gap between

the high-level and the VM code.

And we have to somehow express this very rich semantics that you see here

in front of you using the very limited VM language.

But as you will see, we will do it in a surprisingly

elegant and relatively simple fashion.
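As a preview of how that bridging works for expressions, here is a hedged sketch (not the course's actual code generator): a post-order walk over a tiny expression tree that emits stack-style VM commands, using the course's push constant and add commands and the OS call Math.multiply for the * operator:

```python
from dataclasses import dataclass

# A tiny fragment of a high-level expression tree.
@dataclass
class Num:
    value: int

@dataclass
class BinOp:
    op: str          # '+' or '*'
    left: object
    right: object

def codegen(e, out):
    """Post-order walk: emit code that pushes both operands onto the
    stack, then emit the operator. '*' has no direct VM command, so it
    becomes a call to the OS routine Math.multiply."""
    if isinstance(e, Num):
        out.append(f"push constant {e.value}")
    elif isinstance(e, BinOp):
        codegen(e.left, out)
        codegen(e.right, out)
        out.append("add" if e.op == "+" else "call Math.multiply 2")
    return out

# (2 + 3) * 4 compiled to stack code:
for command in codegen(BinOp("*", BinOp("+", Num(2), Num(3)), Num(4)), []):
    print(command)
```

The rich infix expression collapses into a flat sequence of push and operate commands, which is precisely the gap-bridging move the module develops.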

All right, before we wrap up this module,

I'd like to list some of the lessons that you're going to take from it.

So, first of all you will learn how programming languages work.

You will learn how to implement a simple procedural language like C,

because you will know how to handle variables, expressions, flow of control.

You will also learn to handle arrays and objects at the low level.

And by doing this, once again,

you will gain a very deep understanding of the tools of the trade.

And secondly, you will learn some general purpose techniques which

can be applicable to numerous other problems and applications.

You will strengthen your understanding of parsing,

something that we've dealt with extensively in the previous module.

You will understand how recursive compilation works.

You will learn how to generate code, and how to create and use symbol tables.

And you will begin to learn how to manage memory cleverly.

Something that we will take up also when we discuss the operating system

in the last module of this course.
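The symbol-table lesson can be sketched briefly: each declared name maps to its type, its kind, and a running index within that kind. This is an illustrative Python sketch, not the course's prescribed API, and the kind names used here are assumptions:

```python
class SymbolTable:
    """Minimal compiler symbol table: name -> (type, kind, index),
    where the index counts up separately within each kind."""

    def __init__(self):
        self.entries = {}
        self.counts = {}

    def define(self, name, typ, kind):
        index = self.counts.get(kind, 0)   # next free index for this kind
        self.counts[kind] = index + 1
        self.entries[name] = (typ, kind, index)

    def lookup(self, name):
        return self.entries.get(name)      # None if the name is unknown

st = SymbolTable()
st.define("x", "int", "field")   # first field  -> index 0
st.define("y", "int", "field")   # second field -> index 1
st.define("dx", "int", "local")  # first local  -> index 0
print(st.lookup("y"))            # ('int', 'field', 1)
```

The per-kind index is what lets the code generator translate a name like y into a VM memory-segment reference later on.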

So all these goodies are coming up in the current module.

So, in closing,

these are the challenges that we have to deal with starting with variables.

So that's what we'll do in the next unit.