dcsimg
ANTLR programmer needed - coursework - implement a tiny programming language - $300
1 posts in topic
Flat View  Flat View
TOPIC ACTIONS:
 

Posted By:   xander_comptash
Posted On:   Tuesday, August 4, 2009 03:34 PM

I have to submit a piece of coursework. It's worth an annoyingly small part of the course, and I'm never going to need to use ANTLR or anything similar in the future... So instead of spending 2 weeks on it myself, I'm looking to hire someone already skilled with ANTLR to do it instead! The full page of information is included at the bottom of this (sorry, it's massively long and a bit garbled), and if you think the price ($300) is off, let me know. The money can be sent via paypal or bank transfer, or put into an escrow account or something. Reply here if interested and capable, we can talk via email. Thanks, Alex    More>>

I have to submit a piece of coursework.

It's worth an annoyingly small part of the course, and I'm never going to need to use ANTLR or anything similar in the future...



So instead of spending 2 weeks on it myself, I'm looking to hire someone already skilled with ANTLR to do it instead!



The full page of information is included at the bottom of this (sorry, it's massively long and a bit garbled), and if you think the price ($300) is off, let me know.



The money can be sent via paypal or bank transfer, or put into an escrow account or something. Reply here if interested and capable, we can talk via email.



Thanks,

Alex













##########################

Introduction

The aim of this practical coursework is to implement a tiny programming language by compiling it to assembly code for an abstract machine. The programming language, assembly language, and abstract machine are all specified below.
You are provided with an assembler and emulator for the specified abstract machine, enabling you to run and debug programs that you have compiled.

We suggest you implement the compiler in ANTLR and Java.

To make the job easier, you are also given a skeleton compiler (using ANTLR and Java) which compiles a small subset of the language; you can extend this incrementally to implement the complete language.

Several example programs in the specified programming language are supplied. These are designed to test language features separately and in combination, so that you can implement and test the language features one by one.

Programming language definition

General
No distinction is made between upper-case and lower-case letters, except within strings.

All the normal characters that are available on an English keyboard are allowed within strings or comments in the programs with the following provisos:

A string of text is enclosed in 'single quotes'; a single quote within the string is represented by two consecutive single quotes.
White space (space, tab, and newline characters) or comments can occur anywhere within the program except inside identifiers, reserved words, constants, or strings (space characters in strings are of course allowed but have no special significance). Comments are enclosed in { }. The character } is not allowed within a comment, and comments may not be nested.
Reserved words
The following are reserved words in the language and cannot be used as identifiers:

begin else end if integer program
read real repeat until var write writeln
Identifiers
These can be any length from 1 to 9 characters, the first of which must be alphabetic, and the remainder either alphabetic or numeric.

Constants
Constants may be either integer or real.

An integer constant is any sequence of decimal digits that is not a prefix of a real number.

A real constant takes the form:

real ::= integer '.' integer ( 'E' '-'? integer )?
The grammar and semantics of the language
program ::=
PROGRAM programname ';' block '.'
programname ::=
identifier
programname is an identifier, even one which is used elsewhere in the program.
block ::=
( VAR declarations )?
compoundstatement
declarations ::=
( idlist ':' type ';' )+
All identifiers in an idlist have the same type.
idlist ::=
identifier ( ',' identifier )*
type ::=
REAL
| INTEGER
These two types correspond to 32 bit real values and 32 bit signed integers.
compoundstatement ::=
BEGIN ( statement ';' )* END
statement ::=
variable ':=' expression
If the variable and the expression have the same type then the value of the expression is assigned to the variable. This also applies to the case of integer expressions and real variables. If the expression is real and the variable integer, then the value of the expression is truncated and assigned to the integer variable.
| READ '(' variable ')'
This inputs a value of the required type and assigns it to the variable specified.
| WRITE '(' expression ')'
This outputs the value of the expression in some appropriate format.
| WRITE '(' string ')'
This outputs the string, character by character, onto the output.
| WRITELN
Outputs a newline.
| IF expression relation expression
compoundstatement
( ELSE compoundstatement )?
This executes the first compoundstatement if the relation is true, or the second compoundstatement (if the ELSE part is present) if the relation is false.
| REPEAT compoundstatement
UNTIL expression relation expression
This construct executes the compoundstatement repetitively until the boolean expression becomes true. (This construct enables the language to handle 'total functions'.)
relation ::=
'>'
| '>='
| '='
| '!='
| ' <='
| ' <'
Any of these six relations (!= meaning not equals) can be applied between two expressions giving the result true if the relation holds and false otherwise. Both of the expressions involved must be of the same type.
expression ::=
unaryop term ( ( '+' | '-' ) term )*
In expressions, all of the components are of type integer or they are of type real. You cannot mix the two types in any one expression.
unaryop ::=
( '+' | '-' ) ?
Unary plus or unary minus or no string at all, which defaults to unary plus (i.e., the normal convention).
term ::=
factor ( ( '*' | '/' ) factor )*
Integer division is defined such that you always get an integer result, ignoring any fractional part.
factor ::=
variable
| constant
| '(' expression ')'
All constants are treated as unsigned (i.e., positive). Negative constants are allowed as a unary minus followed by an unsigned constant.
variable ::=
identifier
Assembly language definition

In the assembly language there is no distinction between upper-case and lower-case letters.
A program consists of a sequence of lines, as follows.

line ::=
( label ':' )? instruction? ( ';' comment )?
Every line has three optional parts: an instruction, a label, followed by ':', and a comment, preceded by ';'.
instruction ::=
opcode ( operand ( ','? operand )? ( ','? operand )? )?
An instruction is an opcode with between 0 and 3 operands, separated by spaces or commas.
operand ::=
constant
| register
| label
Each operand may be at most 10 characters long.
A constant is any (possibly signed) integer or real constant, as accepted by the "%d" or "%f" format of C. E.g., 1234 or -9.876.
A register is the letter 'R' or 'r' followed by one or more digits. E.g., R1 or r987654321.

A label is any alphanumeric string that is not a register or a constant. Every label used as an operand must be defined, by appearing as a label on some line, and no label may be defined more than once. A label refers to the instruction on the same line, or to the next instruction if there is no instruction on the same line as the label. This makes it possible for more than one label to refer to the same instruction.

The possible opcodes, and their operands, are listed in the instruction table below.


Abstract machine and instruction set

A more concrete version of our abstract machine is defined as part of a unified instruction set for several units. Differences are:
An unlimited number of registers (R0 to R999999999) are available, not only 64 (R0 to R63).
The size of numbers occurring as immediate operands is not restricted but can be any 32-bit number.
Because of the above two points, instructions cannot necessarily be encoded into 32 bits.
Instructions are stored in a separate memory from data, so programs cannot access or modify instructions.
Successive instructions are stored at addresses 0, 1, 2, etc. in instruction memory.
Instruction addresses used in the JUMP instructions are (absolute) addresses, not (relative) offsets.
An additional instruction, IADDR, provides the only way to find out an instruction's address. The sequence:
IADDR R1,L
JUMP R1
is the same as:
JMP L
The abstract machine is based on the Jouette architecture described in Chapter 9 of the Appel book, with the following differences and refinements:
The machine has an infinite number of registers (actually 1000000000 of them, because of the limited operand length). It is quite possible to use registers R101, R123456789, etc., but probably more sensible to use R0, R1, etc.
R0 is not initialized to 0 as described in the book. You need to explicitly set R0 to 0 if you want to use it this way, e.g., by XOR R0,R0,R0.
Real (floating point) versions of most instructions are included.
All integer and real numbers are 32-bit.
Memory addresses used by LOAD and STORE are byte addresses, which must be a multiple of 4. These instructions move 4 bytes at a time, regardless of type.
The new instruction WRS (to print a string) refers to the memory address of a string terminated by a 0 byte.
Memory addresses (used by LOAD, STORE, and WRS) start at 0. Memory to be accessed by these instructions must be allocated by the pseudo-instruction DATA.
The following table shows the opcodes and operands of all instructions. I means an integer constant. F means a real constant. Ri, Rj, and Rk represent registers, and L represents a label.

Instruction Effect Comments Reference
ADD Ri,Rj,Rk Ri <- Rj + Rk Integer addition Appel, p177
SUB Ri,Rj,Rk Ri <- Rj - Rk Integer subtraction Appel, p177
MUL Ri,Rj,Rk Ri <- Rj * Rk Integer multiplication Appel, p177
DIV Ri,Rj,Rk Ri <- Rj / Rk Integer division Appel, p177
XOR Ri,Rj,Rk Ri <- Rj ^ Rk Bitwise XOR New
ADDR Ri,Rj,Rk Ri <- Rj + Rk Real addition New
SUBR Ri,Rj,Rk Ri <- Rj - Rk Real subtraction New
MULR Ri,Rj,Rk Ri <- Rj * Rk Real multiplication New
DIVR Ri,Rj,Rk Ri <- Rj / Rk Real division New
ADDI Ri,Rj,I Ri <- Rj + I Integer addition: register and constant Appel, p177
SUBI Ri,Rj,I Ri <- Rj - I Integer subtraction: register and constant Appel, p177
MULI Ri,Rj,I Ri <- Rj * I Integer multiplication: register and constant New
DIVI Ri,Rj,I Ri <- Rj / I Integer division: register and constant New
XORI Ri,Rj,I Ri <- Rj ^ I Bitwise XOR: register and constant New
MOVIR Ri,F Ri <- F Real constant moved to register New
ITOR Ri,Rj Ri <- Rj Integer to real conversion (Rj is integer; Ri is real) New
RTOI Ri,Rj Ri <- Rj Real to integer conversion (Rj is real; Ri is integer) New
RD Ri Read Ri Reads integer from stdin New
RDR Ri Read Ri Reads real from stdin New
WR Ri Write Ri Writes integer to stdout New
WRR Ri Write Ri Writes real to stdout New
WRS I Write M[I]... Writes string (from address I to next 0 byte) to stdout New
LOAD Ri,Rj,I Ri <- M[Rj + I] Loads memory contents to register Appel, p177
STORE Ri,Rj,I M[Rj + I] <- Ri Stores register contents in memory Appel, p177
JMP L goto L Jumps to label L New
JUMP Ri goto Ri Jumps to the instruction whose address is stored in the register Appel, p201
IADDR Ri,L Ri <- L Store address L in the register New
BGEZ Ri,L if Ri ≥ 0 goto L If register's contents (integer) non-negative jump to L Appel, p201
BGEZR Ri,L if Ri ≥ 0 goto L If register's contents (real) non-negative jump to L New
BLTZ Ri,L if Ri < 0 goto L If register's contents (integer) negative jump to L Appel, p201
BLTZR Ri,L if Ri < 0 goto L If register's contents (real) negative jump to L New
BEQZ Ri,L if Ri = 0 goto L If register's contents (integer) zero jump to L Appel, p201
BEQZR Ri,L if Ri = 0 goto L If register's contents (real) zero jump to L New
BNEZ Ri,L if Ri ≠ 0 goto L If register's contents (integer) non-zero jump to L Appel, p201
BNEZR Ri,L if Ri ≠ 0 goto L If register's contents (real) non-zero jump to L New
NOP No operation New
HALT Stop execution New
DATA I A pseudo-instruction. Used by the assembler to allocate one byte in data memory initialized to the value I (in range 0..255). New
The assembler and emulator

ASS/MULE (ASSembler and eMUlator for Language Engineering) comprises an assembler for the assembly language described above and an emulator that executes the assembled program. The assembler and emulator are combined in a single program, which is available in executable form for Linux or for Solaris. Remember to make the file executable, by the command:
chmod u+x assmule
To assemble and execute a file test.ass, run the command
assmule test.ass
There are various options available; to see a list of them, run the command with no filename.
Both the assembler and emulator detect as many errors as possible. The assembler checks badly formed programs, illegal opcodes, bad operands, undefined and duplicated labels, etc. The emulator checks for accesses to unallocated memory addresses and control reaching nonexistent instructions.

One of the options ('-y') turns on type-checking mode in the emulator. In this mode, the emulator will keep track of the type of the contents of each register and memory location and print error messages if the program uses a value that has not yet been defined or is of an incorrect type. This can help to detect more errors. Also, in type-checking mode, the tracing facility can print all operands in the correct type.

Compiler structure

The suggested structure of your compiler (CAMLE: Compiler to Abstract Machine for Language Engineering) is a pipeline of five phases, all of which could be implemented in ANTLR; for example:
lex.g Lexical analysis An ANTLR Lexer class that produces a stream of tokens.
syn.g Syntax analysis An ANTLR Parser class that produces an abstract syntax tree (an ANTLR AST).
sem.g Semantic analysis An ANTLR TreeParser class that transforms an abstract syntax tree (an ANTLR AST) to another abstract syntax tree (ANTLR AST).
irt.g IR tree construction An ANTLR TreeParser class that transforms an abstract syntax tree (an ANTLR AST) to an IR tree (also an ANTLR AST).
cg.g Code generation An ANTLR TreeParser class that consumes an IR tree (an ANTLR AST) and prints assembly code to a file.
If you prefer, you can replace the ANTLR components sem.g, irt.g, and cg.g by Java code that does the same job.

You can try compiling and running this skeleton compiler. It consists of each of the five files listed above, some Java source files, and a main program, camle.java that links them all together. This compiler is for a very simple subset of the specified programming language that includes only writeln statements and write statements with string parameters.

To compile the compiler:

antlr lex.g
antlr syn.g
antlr sem.g
antlr irt.g
antlr cg.g
antlr *.java
To run the compiler on the supplied example program, hello.le:
antlr camle hello.le
The normal behaviour of the compiler is to generate assembly code in a file ending in ".ass", e.g., hello.ass. Alternatively, if one of the options -lex, -syn, -sem, -irt is included before the filename, the compiler will instead display (to stdout) the output of the specified phase and then stop. E.g., to display the IR tree generated:
antlr camle -irt hello.le
To help understand the structure of the trees displayed by ANTLR, you can pipe them through the disptree program. E.g.:
antlr camle -syn test0.le | disptree
antlr camle -irt hello.le | disptree
The assignment

This assignment has four parts, described below. The structure of the whole assignment is shown here:
Part 1 Lexical lex.g Whole language
Part 2 Syntax syn.g Whole language
Part 3 Rest sem.g IC RC OP IV RV W A R C L
irt.g
cg.g
Part 4 Extras Whole language
A set of test programs is provided, to enable you to test your compiler and demonstrate that it works.

Part 1: Lexical analysis

Implement the lexical analysis phase, lex.g, for the complete programming language. You can assume that programs contain only valid tokens (i.e., no error handling is needed at this stage).
Test your compiler on the test program test0.le, which illustrates all types of token of the language.

You can proceed to Part 2 even if Part 1 is not fully working. However, Part 1 will need to work before Part 2 can be completed.

Part 2: Syntax analysis

Implement the syntax analysis phase, syn.g, for the complete programming language. You can assume that programs contain only valid syntax (i.e., no error handling is needed at this stage).
Test your compiler on the test program test0.le, which illustrates all syntactic constructs of the language.

You can proceed to Part 3 even if Part 2 is not fully working. However, Part 2 will need to work before Part 3 can be completed.

Part 3: Semantic analysis and code generation

Part 3 comprises all remaining phases of the compiler after syntax analysis. This part is broken down by language feature: you should implement each language feature in turn, by extending sem.g, irt.g, and cg.g to handle that language feature. This requires repeatedly extending these three modules to implement the whole language. When you add a new feature, you might need to modify your existing implementation, even of the lex.g and syn.g modules. You can assume that programs are correct (i.e., no error handling is needed at this stage).
There are ten language features: five kinds of expression and five kinds of statement, as follows:

Description Prerequisites Test program(s)
IC Integer constants W test1.le
RC Real constants IC test2.le
OP Arithmetic operators (+, -, *, /) RC test3.le
IV Integer variables RC, A test4.le
RV Real variables IV test5.le
W write statements IC test1.le
A Assignment statements IV test4.le
R read statements IV test6.le
C Conditional statements RC test7.le
L repeat loops RC test8.le
You can implement the language features in any order, subject to certain precedence constraints, which are intended to make the compiler easier to develop and test. For each language feature, the Prerequisites column shows the features that must be implemented before it (or at the same time). Another way to view the prerequisites is as the following tree:


Your program must correctly compile each test program in order to get any marks for the corresponding language feature. (This does not necessarily mean that you will get full marks for correctly compiling the given test program.)
If you successfully implement all language features and correctly compile test programs test1.le through test8.le, you can then get extra marks by correctly compiling testa.le.

You must complete Part 3, including testa.le, before proceeding to Part 4.

Part 4: Extras

If you have spare time after completing Parts 1-3, you can get extra marks for the following:
Error handling:
Detect and report errors: invalid tokens, syntax errors, type errors, etc. All error detection must be done in the lex.g, syn.g, and sem.g modules.
Efficiency:
Perform any optimizations that reduce the program size, number of registers used, or execution time (number of instructions executed). These statistics are reported by assmule after program execution. Compile the supplied test programs testa.le, testb.le, testc.le, testd.le, teste.le, and testf.le, and run them with the recommended input data. Statistics from an unoptimized implementation are shown below:
Example # instructions generated # registers used # instructions executed
testa.le 150 85 2158308
testb.le 45 23 105549
testc.le 79 41 755
testd.le 62 31 589
teste.le 38 16 180
testf.le 64 26 255
General excellence:
Anything else.
Assessment

This assignment counts for 50% of the marks for the Language Engineering unit. It will be marked as follows:
Part Subpart Example Mark
Part 1 Lexical 15%
Part 2 Syntax 15%
Part 3 Rest IC test1.le 4% 45%
RC test2.le 4%
OP test3.le 4%
IV test4.le 4%
RV test5.le 4%
W test1.le 4%
A test4.le 4%
R test6.le 4%
C test7.le 4%
L test8.le 4%
testa.le testa.le 5%
Part 4 Extras Error handling 10% 25%
Efficiency 10%
Excellence 5%
TOTAL 100%
Submission

Part 1

Submit your lexical analysis module (component lex) by February 13, 2009 (week 15). You need to both show the output that your program produces and allow us to reproduce this output by compiling and running your program. The files you need to submit are:
All source code files (e.g., lex.g).
report.txt: a listing of the output that your program produces using the command:
antlr camle -lex test0.le
Part 2

Submit your syntax analysis module (component syn) by February 23, 2009 (week 17). You need to both show the output that your program produces and allow us to reproduce this output by compiling and running your program. The files you need to submit are:
All source code files (e.g., lex.g and syn.g).
report.txt: a listing of the output that your program produces using the commands:
antlr camle -syn test0.le
Part 3 (interim)

You should have implemented the IC and W features of the language by March 16, 2009 (week 20). Submit only the file test1.ass: the assembly code produced by your compiler from the test program test1.le. Use the interim component to submit.
Parts 3-4 (final)

Submit your complete compiler (component final) by April 28, 2009 (week 22). All source code files must be submitted, including the lex.g and syn.g modules, which may or may not be the same as in your lex and syn submissions. Again, you need to show the output that your compiler produces and allow us to reproduce the output by compiling and running your program. The files that you must submit are:
All source code files (in ANTLR, Java, or other languages) used by your compiler.
Files test1.ass through test8.ass and testa.ass: the assembly code produced by your compiler using the command:
antlr camle testi.le
for each of the test programs test1.le through test8.le and testa.le that you have successfully compiled.
report.txt: a report containing:
A statement of:
which of the 10 language features of Part 3 you have successfully implemented;
whether you have attempted Part 4.
An explanation of how to compile and run your compiler.
A listing of the output produced by the emulator using the command:
assmule testi.ass
for each of the test programs test1.le through test8.le and testa.le that you have successfully compiled.
If claiming marks for Part 4, describe:
What error checking you have implemented.
What optimizations your compiler performs, and how they improve its efficiency, including a listing of the output produced by the emulator using the command
assmule testi.ass
for each of the test programs testa.le through testg.le that you have successfully compiled.
Any other unusually good features of your compiler.

   <<Less

Re: ANTLR programmer needed - coursework - implement a tiny programming language - $300

Posted By:   Anonymous  
Posted On:   Wednesday, September 16, 2009 12:11 PM

Email me at CosineofTheta@gmail.com
About | Sitemap | Contact