|
|
|
@ -0,0 +1,585 @@
|
|
|
|
|
# Medium Intermediate Representation (file mir.h)
|
|
|
|
|
* This document describes MIR itself, API for its creation, and MIR textual representation
|
|
|
|
|
* MIR textual representation is assembler like. Each directive or insn should be put on a separate line
|
|
|
|
|
* In MIR textual syntax we use
|
|
|
|
|
* `[]` for optional construction
|
|
|
|
|
* `{}` for repeating zero or more times
|
|
|
|
|
* `<>` for some informal construction description or construction already described or will be described
|
|
|
|
|
|
|
|
|
|
## MIR context
|
|
|
|
|
* MIR API code has an implicit state called by MIR context
|
|
|
|
|
* MIR context is represented by data of `MIR_context_t`
|
|
|
|
|
* MIR context is created by function `MIR_context_t MIR_init (void)`
|
|
|
|
|
* Every MIR API function (except for `MIR_init`) requires MIR context passed through the first argument of type `MIR_context_t`
|
|
|
|
|
* You can use MIR functions in different threads without any synchronization
|
|
|
|
|
if they work with different contexts in each thread
|
|
|
|
|
|
|
|
|
|
## MIR program
|
|
|
|
|
* MIR program consists of MIR **modules**
|
|
|
|
|
* To start work with MIR program, you should first call API function `MIR_init`
|
|
|
|
|
* API function `MIR_finish (MIR_context_t ctx)` should be called last. It frees all internal data used to work with MIR program and all IR (insns, functions, items, and modules) created in this context
|
|
|
|
|
* API function `MIR_output (MIR_context_t ctx, FILE *f)` outputs MIR textual representation of the program into given file
|
|
|
|
|
* API function `MIR_scan_string (MIR_context_t ctx, const char *str)` reads textual MIR representation given by a string
|
|
|
|
|
* API functions `MIR_write (MIR_context_t ctx, FILE *f)` and
|
|
|
|
|
`MIR_read (MIR_context_t ctx, FILE *f)` outputs and reads
|
|
|
|
|
**binary MIR representation** to/from given file. There are also
|
|
|
|
|
functions `MIR_write_with_func (MIR_context_t ctx, const int
|
|
|
|
|
(*writer_func) (MIR_context_t, uint8_t))` and `MIR_read_with_func
|
|
|
|
|
(MIR_context_t ctx, const int (*reader_func) (MIR_context_t))` to
|
|
|
|
|
output and read **binary MIR representation** through a function
|
|
|
|
|
given as an argument. The reader function should return EOF as
|
|
|
|
|
the end of the binary MIR representation, the writer function
|
|
|
|
|
should be return the number of successfully output bytes
|
|
|
|
|
* Binary MIR representation much more compact and faster to read than textual one
|
|
|
|
|
|
|
|
|
|
## MIR data type
|
|
|
|
|
* MIR program works with the following **data types**:
|
|
|
|
|
* `MIR_T_I8` and `MIR_T_U8` -- signed and unsigned 8-bit integer values
|
|
|
|
|
* `MIR_T_I16` and `MIR_T_U16` -- signed and unsigned 16-bit integer values
|
|
|
|
|
* `MIR_T_I32` and `MIR_T_U32` -- signed and unsigned 32-bit integer values
|
|
|
|
|
* `MIR_T_I64` and `MIR_T_U64` -- signed and unsigned 64-bit integer values
|
|
|
|
|
* ??? signed and unsigned 64-bit integer types in most cases
|
|
|
|
|
are interchangeable as insns themselves decide how to treat
|
|
|
|
|
their value
|
|
|
|
|
* `MIR_T_F` and `MIR_T_D` -- IEEE single and double precision floating point values
|
|
|
|
|
* `MIR_T_LD` - long double values. It is machine-dependent and can be IEEE double, x86 80-bit FP,
|
|
|
|
|
or IEEE quad precision FP values
|
|
|
|
|
* `MIR_T_P` -- pointer values. Depending on the target pointer value is actually 32-bit or 64-bit integer value
|
|
|
|
|
* MIR textual representation of the types are correspondingly `i8`,
|
|
|
|
|
`u8`, `i16`, `u16`, `i32`, `u32`, `i64`, `u64`, `f`, `d`, `p`,
|
|
|
|
|
and `v`
|
|
|
|
|
* Function `int MIR_int_type_p (MIR_type_t t)` returns TRUE if given type is an integer one (it includes pointer type too)
|
|
|
|
|
* Function `int MIR_fp_type_p (MIR_type_t t)` returns TRUE if given type is a floating point type
|
|
|
|
|
|
|
|
|
|
## MIR module
|
|
|
|
|
* Module is a high level entity of MIR program
|
|
|
|
|
* Module is created through API function `MIR_module_t MIR_new_module (const char *name)`
|
|
|
|
|
* Module creation is finished by calling API function `MIR_finish_module`
|
|
|
|
|
* You can create only one module at any given time
|
|
|
|
|
* List of all created modules can be gotten by function `DLIST (MIR_module_t) *MIR_get_module_list (MIR_context_t ctx)`
|
|
|
|
|
* MIR module consists of **items**. There are following **item types** (and function for their creation):
|
|
|
|
|
* **Function**: `MIR_func_item`
|
|
|
|
|
* **Import**: `MIR_import_item` (`MIR_item_t MIR_new_import (MIR_context_t ctx, const char *name)`)
|
|
|
|
|
* **Export**: `MIR_export_item` (`MIR_item_t MIR_new_export (MIR_context_t ctx, const char *name)`)
|
|
|
|
|
* **Forward declaration**: `MIR_forward_item` (`MIR_item_t MIR_new_forward (MIR_context_t ctx, const char *name)`)
|
|
|
|
|
* **Prototype**: `MIR_proto_item` (`MIR_new_proto_arr`, `MIR_new_proto`, `MIR_new_vararg_proto_arr`,
|
|
|
|
|
`MIR_new_vararg_proto` analogous to `MIR_new_func_arr`, `MIR_new_func`, `MIR_new_vararg_func_arr` and
|
|
|
|
|
`MIR_new_vararg_func` -- see below). The only difference is that
|
|
|
|
|
two or more prototype argument names can be the same
|
|
|
|
|
* **Data**: `MIR_data_item` with optional name
|
|
|
|
|
(`MIR_item_t MIR_new_data (MIR_context_t ctx, const char *name, MIR_type_t el_type, size_t nel, const void *els)`
|
|
|
|
|
or `MIR_item_t MIR_new_string_data (MIR_context_t ctx, const char *name, MIR_str_t str)`)
|
|
|
|
|
* **Reference data**: `MIR_ref_data_item` with optional name
|
|
|
|
|
(`MIR_item_t MIR_new_ref_data (MIR_context_t ctx, const char *name, MIR_item_t item, int64_t disp)`
|
|
|
|
|
* The address of the item after linking plus `disp` is used to initialize the data
|
|
|
|
|
* **Expression Data**: `MIR_expr_data_item` with optional name
|
|
|
|
|
(`MIR_item_t MIR_new_expr_data (MIR_context_t ctx, const char *name, MIR_item_func_item)`)
|
|
|
|
|
* Not all MIR functions can be used for expression data. The expression function should have
|
|
|
|
|
only one result, have no arguments, not use any call or any instruction with memory
|
|
|
|
|
* The expression function is called during linking and its result is used to initialize the data
|
|
|
|
|
* **Memory segment**: `MIR_bss_item` with optional name (`MIR_item_t MIR_new_bss (MIR_context_t ctx, const char *name, size_t len)`)
|
|
|
|
|
* Names of MIR functions, imports, and prototypes should be unique in a module
|
|
|
|
|
* API functions `MIR_output_item (MIR_context_t ctx, FILE *f, MIR_item_t item)`
|
|
|
|
|
and `MIR_output_module (MIR_context_t ctx, FILE *f, MIR_module_t module)` output item or module
|
|
|
|
|
textual representation into given file
|
|
|
|
|
* MIR text module syntax looks the following:
|
|
|
|
|
```
|
|
|
|
|
<module name>: module
|
|
|
|
|
{<module item>}
|
|
|
|
|
endmodule
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
## MIR function
|
|
|
|
|
* Function is an module item
|
|
|
|
|
* Function has a **frame**, a stack memory reserved for each function invocation
|
|
|
|
|
* Function has **local variables** (sometimes called **registers**), a part of which are **arguments**
|
|
|
|
|
* A variable should have an unique name in the function
|
|
|
|
|
* A variable is represented by a structure of type `MIR_var_t`
|
|
|
|
|
* The structure contains variable name and its type
|
|
|
|
|
* MIR function with its arguments is created through API function `MIR_item_t MIR_new_func (MIR_context_t ctx, const
|
|
|
|
|
char *name, size_t nres, MIR_type_t *res_types, size_t nargs, ...)`
|
|
|
|
|
or function `MIR_item_t MIR_new_func_arr (MIR_context_t ctx, const char *name, size_t nres, MIR_type_t *res_types, size_t nargs, MIR_var_t *arg_vars)`
|
|
|
|
|
* Argument variables can be any type
|
|
|
|
|
* This type only denotes how the argument value is passed
|
|
|
|
|
* Any integer type argument variable has actually type `MIR_T_I64`
|
|
|
|
|
* MIR functions with variable number of arguments are created through API functions
|
|
|
|
|
`MIR_item_t MIR_new_vararg_func (MIR_context_t ctx, const char *name, size_t nres, MIR_type_t *res_types, size_t nargs, ...)`
|
|
|
|
|
or function `MIR_item_t MIR_new_vararg_func_arr (MIR_context_t ctx, const char *name, size_t nres, MIR_type_t *res_types, size_t nargs, MIR_var_t *arg_vars)`
|
|
|
|
|
* `nargs` and `arg_vars` define only fixed arguments
|
|
|
|
|
* MIR functions can have more one result but possible number of results
|
|
|
|
|
and combination of their types are machine-defined. For example, for x86-64
|
|
|
|
|
the function can have upto six results and return two integer
|
|
|
|
|
values, two float or double values, and two long double values
|
|
|
|
|
in any combination
|
|
|
|
|
* MIR function creation is finished by calling API function `MIR_finish_func (MIR_context_t ctx)`
|
|
|
|
|
* You can create only one MIR function at any given time
|
|
|
|
|
* MIR text function syntax looks the following (arg-var always has a name besides type):
|
|
|
|
|
```
|
|
|
|
|
<function name>: func {<result type>, } [ arg-var {, <arg-var> } [, ...]]
|
|
|
|
|
{<insn>}
|
|
|
|
|
endfun
|
|
|
|
|
```
|
|
|
|
|
* Non-argument function variables are created through API function
|
|
|
|
|
`MIR_reg_t MIR_new_func_reg (MIR_context_t ctx, MIR_func_t func, MIR_type_t type, const char *name)`
|
|
|
|
|
* The only permitted integer type for the variable is `MIR_T_I64` (or MIR_T_U64???)
|
|
|
|
|
* Names in form `t<number>` can not be used as they are fixed for internal purposes
|
|
|
|
|
* You can create function variables even after finishing the
|
|
|
|
|
function creation. This can be used to modify function insns,
|
|
|
|
|
e.g. for optimizations
|
|
|
|
|
* Non-argument variable declaration syntax in MIR textual representation looks the following:
|
|
|
|
|
```
|
|
|
|
|
local [ <var type>:<var name> {, <var type>:<var name>} ]
|
|
|
|
|
```
|
|
|
|
|
* In MIR textual representation variable should be defined through `local` before its use
|
|
|
|
|
|
|
|
|
|
## MIR insn operands
|
|
|
|
|
* MIR insns work with operands
|
|
|
|
|
* There are following operands:
|
|
|
|
|
* Signed or unsigned **64-bit integer value operands** created through API functions
|
|
|
|
|
`MIR_op_t MIR_new_int_op (MIR_context_t ctx, int64_t v)` and `MIR_op_t MIR_new_uint_op (MIR_context_t ctx, uint64_t v)`
|
|
|
|
|
* In MIR text they are represented the same way as C integer numbers (e.g. octal, decimal, hexadecimal ones)
|
|
|
|
|
* **Float, double or long double value operands** created through API functions `MIR_op_t MIR_new_float_op (MIR_context_t ctx, float v)`,
|
|
|
|
|
`MIR_op_t MIR_new_double_op (MIR_context_t ctx, double v)`, and `MIR_op_t MIR_new_ldouble_op (MIR_context_t ctx, long double v)`
|
|
|
|
|
* In MIR text they are represented the same way as C floating point numbers
|
|
|
|
|
* **String operands** created through API functions `MIR_op_t MIR_new_str_op (MIR_context_t ctx, MIR_str_t str)`
|
|
|
|
|
* In MIR text they are represented by `typedef struct MIR_str {size_t len; const char *s;} MIR_str_t`
|
|
|
|
|
* Strings for each operand are put into memory (which can be modified) and the memory address actually presents the string
|
|
|
|
|
* **Label operand** created through API function `MIR_op_t MIR_new_label_op (MIR_context_t ctx, MIR_label_t label)`
|
|
|
|
|
* Here `label` is a special insn created by API function `MIR_insn_t MIR_new_label (MIR_context_t ctx)`
|
|
|
|
|
* In MIR text, they are represented by unique label name
|
|
|
|
|
* **Reference operands** created through API function `MIR_op_t MIR_new_ref_op (MIR_context_t ctx, MIR_item_t item)`
|
|
|
|
|
* In MIR text, they are represented by the corresponding item name
|
|
|
|
|
* **Register (variable) operands** created through API function `MIR_op_t MIR_new_reg_op (MIR_context_t ctx, MIR_reg_t reg)`
|
|
|
|
|
* In MIR text they are represented by the corresponding variable name
|
|
|
|
|
* Value of type `MIR_reg_t` is returned by function `MIR_new_func_reg`
|
|
|
|
|
or can be gotten by function `MIR_reg_t MIR_reg (MIR_context_t ctx, const char *reg_name, MIR_func_t func)`, e.g. for argument-variables
|
|
|
|
|
* **Memory operands** consists of type, displacement, base
|
|
|
|
|
register, index register and index scale. Memory operand is
|
|
|
|
|
created through API function `MIR_op_t MIR_new_mem_op (MIR_context_t ctx, MIR_type_t type,
|
|
|
|
|
MIR_disp_t disp, MIR_reg_t base, MIR_reg_t index, MIR_scale_t
|
|
|
|
|
scale)`
|
|
|
|
|
* The arguments define address of memory as `disp + base + index * scale`
|
|
|
|
|
* Integer type input memory is transformed to 64-bit integer value with sign or zero extension
|
|
|
|
|
depending on signedness of the type
|
|
|
|
|
* result 64-bit integer value is truncated to integer memory type
|
|
|
|
|
* Memory operand has the following syntax in MIR text (absent displacement means zero one,
|
|
|
|
|
absent scale means one, scale should be 1, 2, 4, or 8):
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
<type>: <disp>
|
|
|
|
|
<type>: [<disp>] (<base reg> [, <index reg> [, <scale> ]])
|
|
|
|
|
```
|
|
|
|
|
* API function `MIR_output_op (MIR_context_t ctx, FILE *f, MIR_op_t op, MIR_func_t func)` outputs the operand
|
|
|
|
|
textual representation into given file
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## MIR insns
|
|
|
|
|
* All MIR insns (but call or ret one) expects fixed number of operands
|
|
|
|
|
* Most MIR insns are 3-operand insns: two inputs and one output
|
|
|
|
|
* In majority cases **the first insn operand** describes where the insn result (if any) will be placed
|
|
|
|
|
* Only register or memory operand can be insn output (result) operand
|
|
|
|
|
* MIR insn can be created through API functions `MIR_insn_t MIR_new_insn (MIR_context_t ctx, MIR_insn_code_t code, ...)`
|
|
|
|
|
and `MIR_insn_t MIR_new_insn_arr (MIR_context_t ctx, MIR_insn_code_t code, size_t nops, MIR_op_t *ops)`
|
|
|
|
|
* Number of operands and their types should be what is expected by the insn being created
|
|
|
|
|
* You can not use `MIR_new_insn` for the creation of call and ret insns as these insns have a variable number of operands.
|
|
|
|
|
To create such insns you should use `MIR_new_insn_arr` or special functions
|
|
|
|
|
`MIR_insn_t MIR_new_call_insn (MIR_context_t ctx, size_t nops, ...)` and `MIR_insn_t MIR_new_ret_insn (MIR_context_t ctx, size_t nops, ...)`
|
|
|
|
|
* You can get insn name and number of insn operands through API functions
|
|
|
|
|
`const char *MIR_insn_name (MIR_context_t ctx, MIR_insn_code_t code)` and `size_t MIR_insn_nops (MIR_context_t ctx, MIR_insn_t insn)`
|
|
|
|
|
* You can add a created insn at the beginning or end of function insn list through API functions
|
|
|
|
|
`MIR_prepend_insn (MIR_context_t ctx, MIR_item_t func, MIR_insn_t insn)` and `MIR_append_insn (MIR_context_t ctx, MIR_item_t func, MIR_insn_t insn)`
|
|
|
|
|
* You can insert a created insn in the middle of function insn list through API functions
|
|
|
|
|
`MIR_insert_insn_after (MIR_context_t ctx, MIR_item_t func, MIR_insn_t after, MIR_insn_t insn)` and
|
|
|
|
|
`MIR_insert_insn_before (MIR_context_t ctx, MIR_item_t func, MIR_insn_t before, MIR_insn_t insn)`
|
|
|
|
|
* The insn `after` and `before` should be already in the list
|
|
|
|
|
* You can remove insn from the function list through API function `MIR_remove_insn (MIR_context_t ctx, MIR_item_t func, MIR_insn_t insn)`
|
|
|
|
|
* The insn should be not inserted in the list if it is already there
|
|
|
|
|
* The insn should be not removed form the list if it is not there
|
|
|
|
|
* API function `MIR_output_insn (MIR_context_t ctx, FILE *f, MIR_insn_t insn, MIR_func_t func, int newline_p)` outputs the insn
|
|
|
|
|
textual representation into given file with a newline at the end depending on value of `newline_p`
|
|
|
|
|
* Insn has the following syntax in MIR text:
|
|
|
|
|
```
|
|
|
|
|
{<label name>:} [<insn name> <operand> {, <operand>}]
|
|
|
|
|
```
|
|
|
|
|
* More one insn can be put on the same line by separating the insns by `;`
|
|
|
|
|
|
|
|
|
|
### MIR move insns
|
|
|
|
|
* There are following MIR move insns:
|
|
|
|
|
|
|
|
|
|
| Insn Code | Nops | Description |
|
|
|
|
|
|-------------------------|-----:|--------------------------------------------------------|
|
|
|
|
|
| `MIR_MOV` | 2 | move 64-bit integer values |
|
|
|
|
|
| `MIR_FMOV` | 2 | move **single precision** floating point values |
|
|
|
|
|
| `MIR_DMOV` | 2 | move **double precision** floating point values |
|
|
|
|
|
| `MIR_LDMOV` | 2 | move **long double** floating point values |
|
|
|
|
|
|
|
|
|
|
### MIR integer insns
|
|
|
|
|
* If insn has suffix `S` in insn name, the insn works with lower 32-bit part of 64-bit integer value
|
|
|
|
|
* The higher part of 32-bit insn result is undefined
|
|
|
|
|
* If insn has prefix `U` in insn name, the insn treats integer as unsigned integers
|
|
|
|
|
* Some insns has no unsigned variant as MIR is oriented to CPUs with two complement integer arithmetic
|
|
|
|
|
(the huge majority of all CPUs)
|
|
|
|
|
|
|
|
|
|
| Insn Code | Nops | Description |
|
|
|
|
|
|-------------------------|-----:|--------------------------------------------------------|
|
|
|
|
|
| `MIR_EXT8` | 2 | **sign** extension of lower **8 bit** input part |
|
|
|
|
|
| `MIR_UEXT8` | 2 | **zero** extension of lower **8 bit** input part |
|
|
|
|
|
| `MIR_EXT16` | 2 | **sign** extension of lower **16 bit** input part |
|
|
|
|
|
| `MIR_UEXT16` | 2 | **zero** extension of lower **16 bit** input part |
|
|
|
|
|
| `MIR_EXT32` | 2 | **sign** extension of lower **32 bit** input part |
|
|
|
|
|
| `MIR_UEXT32` | 2 | **zero** extension of lower **32 bit** input part |
|
|
|
|
|
| | | |
|
|
|
|
|
| `MIR_NEG` | 2 | changing sign of **64-bit* integer value |
|
|
|
|
|
| `MIR_NEGS` | 2 | changing sign of **32-bit* integer value |
|
|
|
|
|
| | | |
|
|
|
|
|
| `MIR_ADD`, `MIR_SUB` | 3 | **64-bit** integer addition and subtraction |
|
|
|
|
|
| `MIR_ADDS`, `MIR_SUBS` | 3 | **32-bit** integer addition and subtraction |
|
|
|
|
|
| `MIR_MUL`, `MIR_DIV` | 3 | **64-bit signed** multiplication and divison |
|
|
|
|
|
| `MIR_UMUL`, `MIR_UDIV` | 3 | **64-bit unsigned** integer multiplication and divison |
|
|
|
|
|
| `MIR_MULS`, `MIR_DIVS` | 3 | **32-bit signed** multiplication and divison |
|
|
|
|
|
| `MIR_UMULS`, `MIR_UDIVS`| 3 | **32-bit unsigned** integer multiplication and divison |
|
|
|
|
|
| `MIR_MOD` | 3 | **64-bit signed** modulo operation |
|
|
|
|
|
| `MIR_UMOD` | 3 | **64-bit unsigned** integer modulo operation |
|
|
|
|
|
| `MIR_MODS` | 3 | **32-bit signed** modulo operation |
|
|
|
|
|
| `MIR_UMODS` | 3 | **32-bit unsigned** integer modulo operation |
|
|
|
|
|
| | | |
|
|
|
|
|
| `MIR_AND`, `MIR_OR` | 3 | **64-bit** integer bitwise AND and OR |
|
|
|
|
|
| `MIR_ANDS`, `MIR_ORS` | 3 | **32-bit** integer bitwise AND and OR |
|
|
|
|
|
| `MIR_XOR` | 3 | **64-bit** integer bitwise XOR |
|
|
|
|
|
| `MIR_XORS` | 3 | **32-bit** integer bitwise XOR |
|
|
|
|
|
| | | |
|
|
|
|
|
| `MIR_LSH` | 3 | **64-bit** integer left shift |
|
|
|
|
|
| `MIR_LSHS` | 3 | **32-bit** integer left shift |
|
|
|
|
|
| `MIR_RSH` | 3 | **64-bit** integer right shift with **sign** extension |
|
|
|
|
|
| `MIR_RSHS` | 3 | **32-bit** integer right shift with **sign** extension |
|
|
|
|
|
| `MIR_URSH` | 3 | **64-bit** integer right shift with **zero** extension |
|
|
|
|
|
| `MIR_URSHS` | 3 | **32-bit** integer right shift with **zero** extension |
|
|
|
|
|
| | | |
|
|
|
|
|
| `MIR_EQ`, `MIR_NE` | 3 | equality/inequality of **64-bit** integers |
|
|
|
|
|
| `MIR_EQS`, `MIR_NES` | 3 | equality/inequality of **32-bit** integers |
|
|
|
|
|
| `MIR_LT`, `MIR_LE` | 3 | **64-bit signed** less than/less than or equal |
|
|
|
|
|
| `MIR_ULT`, `MIR_ULE` | 3 | **64-bit unsigned** less than/less than or equal |
|
|
|
|
|
| `MIR_LTS`, `MIR_LES` | 3 | **32-bit signed** less than/less than or equal |
|
|
|
|
|
| `MIR_ULTS`, `MIR_ULES` | 3 | **32-bit unsigned** less than/less than or equal |
|
|
|
|
|
| `MIR_GT`, `MIR_GE` | 3 | **64-bit signed** greater than/greater than or equal |
|
|
|
|
|
| `MIR_UGT`, `MIR_UGE` | 3 | **64-bit unsigned** greater than/greater than or equal |
|
|
|
|
|
| `MIR_GTS`, `MIR_GES` | 3 | **32-bit signed** greater than/greater than or equal |
|
|
|
|
|
| `MIR_UGTS`, `MIR_UGES` | 3 | **32-bit unsigned** greater than/greater than or equal |
|
|
|
|
|
|
|
|
|
|
### MIR floating point insns
|
|
|
|
|
* If insn has prefix `F` in insn name, the insn is single precision float point insn. Its operands should have `MIR_T_F` type
|
|
|
|
|
* If insn has prefix `D` in insn name, the insn is double precision float point insn. Its operands should have `MIR_T_D` type
|
|
|
|
|
* Otherwise, insn has prefix `LD` in insn name and the insn is a long double insn.
|
|
|
|
|
Its operands should have `MIR_T_LD` type.
|
|
|
|
|
* The result of comparison insn is a 64-bit integer value, so the result operand should be of integer type
|
|
|
|
|
|
|
|
|
|
| Insn Code | Nops | Description |
|
|
|
|
|
|--------------------------------------|-----:|-----------------------------------------------------------------|
|
|
|
|
|
| `MIR_F2I`, `MIR_D2I`, `MIR_LD2I` | 2 | transforming floating point value into 64-bit integer |
|
|
|
|
|
| `MIR_F2D` | 2 | transforming single to double precision FP value |
|
|
|
|
|
| `MIR_F2LD` | 2 | transforming single precision to long double FP value |
|
|
|
|
|
| `MIR_D2F` | 2 | transforming double to single precision FP value |
|
|
|
|
|
| `MIR_D2LD` | 2 | transforming double precision to long double FP value |
|
|
|
|
|
| `MIR_LD2F` | 2 | transforming long double to single precision FP value |
|
|
|
|
|
| `MIR_LD2D` | 2 | transforming long double to double precision FP value |
|
|
|
|
|
| `MIR_I2F`, `MIR_I2D`, `MIR_I2LD` | 2 | transforming 64-bit integer into a floating point value |
|
|
|
|
|
| `MIR_UI2F`, `MIR_UI2D`, `MIR_UI2LD` | 2 | transforming unsigned 64-bit integer into a floating point value|
|
|
|
|
|
| `MIR_FNEG`, `MIR_DNEG`, `MIR_LDNEG` | 2 | changing sign of floating point value |
|
|
|
|
|
| `MIR_FADD`, `MIR_FSUB` | 3 | **single** precision addition and subtraction |
|
|
|
|
|
| `MIR_DADD`, `MIR_DSUB` | 3 | **double** precision addition and subtraction |
|
|
|
|
|
| `MIR_LDADD`, `MIR_LDSUB` | 3 | **long double** addition and subtraction |
|
|
|
|
|
| `MIR_FMUL`, `MIR_FDIV` | 3 | **single** precision multiplication and divison |
|
|
|
|
|
| `MIR_DMUL`, `MIR_DDIV` | 3 | **double** precision multiplication and divison |
|
|
|
|
|
| `MIR_LDMUL`, `MIR_LDDIV` | 3 | **long double** multiplication and divison |
|
|
|
|
|
| `MIR_FEQ`, `MIR_FNE` | 3 | equality/inequality of **single** precision values |
|
|
|
|
|
| `MIR_DEQ`, `MIR_DNE` | 3 | equality/inequality of **double** precision values |
|
|
|
|
|
| `MIR_LDEQ`, `MIR_LDNE` | 3 | equality/inequality of **long double** values |
|
|
|
|
|
| `MIR_FLT`, `MIR_FLE` | 3 | **single** precision less than/less than or equal |
|
|
|
|
|
| `MIR_DLT`, `MIR_DLE` | 3 | **double** precision less than/less than or equal |
|
|
|
|
|
| `MIR_LDLT`, `MIR_LDLE` | 3 | **long double** less than/less than or equal |
|
|
|
|
|
| `MIR_FGT`, `MIR_FGE` | 3 | **single** precision greater than/greater than or equal |
|
|
|
|
|
| `MIR_DGT`, `MIR_DGE` | 3 | **double** precision greater than/greater than or equal |
|
|
|
|
|
| `MIR_LDGT`, `MIR_LDGE` | 3 | **long double** greater than/greater than or equal |
|
|
|
|
|
|
|
|
|
|
### MIR branch insns
|
|
|
|
|
* The first operand of the insn should be label
|
|
|
|
|
|
|
|
|
|
| Insn Code | Nops | Description |
|
|
|
|
|
|-------------------------|-----:|---------------------------------------------------------------|
|
|
|
|
|
| `MIR_JMP` | 1 | unconditional jump to the label |
|
|
|
|
|
| `MIR_BT` | 2 | jump to the label when 2nd **64-bit** operand is **nonzero** |
|
|
|
|
|
| `MIR_BTS` | 2 | jump to the label when 2nd **32-bit** operand is **nonzero** |
|
|
|
|
|
| `MIR_BF` | 2 | jump to the label when 2nd **64-bit** operand is **zero** |
|
|
|
|
|
| `MIR_BFS` | 2 | jump to the label when 2nd **32-bit** operand is **zero** |
|
|
|
|
|
|
|
|
|
|
### MIR switch insn
|
|
|
|
|
* The first operand of `MIR_SWITCH` insn should have an integer value from 0 to `N - 1` inclusive
|
|
|
|
|
* The rest operands should be `N` labels, where `N > 0`
|
|
|
|
|
* Execution of the insn will be an jump on the label corresponding to the first operand value
|
|
|
|
|
* If the first operand value is out of the range of permitted values, the execution result is undefined
|
|
|
|
|
|
|
|
|
|
### MIR integer comparison and branch insn
|
|
|
|
|
* The first operand of the insn should be label. Label will be the next executed insn if the result of comparison is non-zero
|
|
|
|
|
|
|
|
|
|
| Insn Code | Nops | Description |
|
|
|
|
|
|-------------------------|-----:|---------------------------------------------------------------|
|
|
|
|
|
| `MIR_BEQ`, `MIR_BNE` | 3 | jump on **64-bit** equality/inequality |
|
|
|
|
|
| `MIR_BEQS`, `MIR_BNES` | 3 | jump on **32-bit** equality/inequality |
|
|
|
|
|
| `MIR_BLT`, `MIR_BLE` | 3 | jump on **signed 64-bit** less than/less than or equal |
|
|
|
|
|
| `MIR_UBLT`, `MIR_UBLE` | 3 | jump on **unsigned 64-bit** less than/less than or equal |
|
|
|
|
|
| `MIR_BLTS`, `MIR_BLES` | 3 | jump on **signed 32-bit** less than/less than or equal |
|
|
|
|
|
| `MIR_UBLTS`, `MIR_UBLES`| 3 | jump on **unsigned 32-bit** less than/less than or equal |
|
|
|
|
|
| `MIR_BGT`, `MIR_BGE` | 3 | jump on **signed 64-bit** greater than/greater than or equal |
|
|
|
|
|
| `MIR_UBGT`, `MIR_UBGE` | 3 | jump on **unsigned 64-bit** greater than/greater than or equal|
|
|
|
|
|
| `MIR_BGTS`, `MIR_BGES` | 3 | jump on **signed 32-bit** greater than/greater than or equal |
|
|
|
|
|
| `MIR_UBGTS`, `MIR_UBLES`| 3 | jump on **unsigned 32-bit** greater than/greater than or equal|
|
|
|
|
|
|
|
|
|
|
### MIR floating point comparison and branch insn
|
|
|
|
|
* The first operand of the insn should be label. Label will be the next executed insn if the result of comparison is non-zero
|
|
|
|
|
* See comparison semantics in the corresponding comparison insns
|
|
|
|
|
|
|
|
|
|
| Insn Code | Nops | Description |
|
|
|
|
|
|---------------------------|-----:|----------------------------------------------------------------|
|
|
|
|
|
| `MIR_FBEQ`, `MIR_FBNE` | 3 | jump on **single** precision equality/inequality |
|
|
|
|
|
| `MIR_DBEQ`, `MIR_DBNE` | 3 | jump on **double** precision equality/inequality |
|
|
|
|
|
| `MIR_LDBEQ`, `MIR_LDBNE` | 3 | jump on **long double** equality/inequality |
|
|
|
|
|
| `MIR_FBLT`, `MIR_FBLE` | 3 | jump on **single** precision less than/less than or equal |
|
|
|
|
|
| `MIR_DBLT`, `MIR_DBLE` | 3 | jump on **double** precision less than/less than or equal |
|
|
|
|
|
| `MIR_LDBLT`, `MIR_LDBLE` | 3 | jump on **long double** less than/less than or equal |
|
|
|
|
|
| `MIR_FBGT`, `MIR_FBGE` | 3 | jump on **single** precision greater than/greater than or equal|
|
|
|
|
|
| `MIR_DBGT`, `MIR_DBGE` | 3 | jump on **double** precision greater than/less/ than or equal |
|
|
|
|
|
| `MIR_LDBGT`, `MIR_LDBGE` | 3 | jump on **long double** greater than/less/ than or equal |
|
|
|
|
|
|
|
|
|
|
### MIR return insn
|
|
|
|
|
* Return insn has zero or more operands
|
|
|
|
|
* Return insn operands should correspond to return types of the function
|
|
|
|
|
* 64-bit integer value is truncated to the corresponding function return type first
|
|
|
|
|
* The return values will be the function call values
|
|
|
|
|
|
|
|
|
|
### MIR_CALL insn
|
|
|
|
|
* The insn has variable number of operands
|
|
|
|
|
* The first operand is a prototype reference operand
|
|
|
|
|
* The second operand is a called function address
|
|
|
|
|
* The prototype should correspond MIR function definition if function address represents a MIR function
|
|
|
|
|
* The prototype should correspond C function definition if the address is C function address
|
|
|
|
|
* If the prototype has *N* return types, the next *N* operands are
|
|
|
|
|
output operands which will contain the result values of the function
|
|
|
|
|
call
|
|
|
|
|
* The subsequent operands are arguments. Their types and number and should be the same as in the prototype
|
|
|
|
|
* Integer arguments are truncated according to integer prototype argument type
|
|
|
|
|
|
|
|
|
|
### MIR_INLINE insn
|
|
|
|
|
* This insn is analogous to `MIR_CALL` but after linking this insn
|
|
|
|
|
will be changed by inlined function body if it is possible
|
|
|
|
|
* Calls of vararg functions are never inlined
|
|
|
|
|
|
|
|
|
|
### MIR_ALLOCA insn
|
|
|
|
|
* Reserve memory on the stack whose size is given as the 2nd operand and assign the memory address to the 1st operand
|
|
|
|
|
* The reserved memory will be aligned according target ABI
|
|
|
|
|
|
|
|
|
|
### MIR_BSTART and MIR_BEND insns
|
|
|
|
|
* MIR users can use them implement blocks with automatic
|
|
|
|
|
deallocation of memory allocated by `MIR_ALLOCA` inside the
|
|
|
|
|
blocks. But mostly these insns are used to implement call
|
|
|
|
|
inlining of functions using alloca
|
|
|
|
|
* The both insns use one operand
|
|
|
|
|
* The first insn saves the stack pointer in the operand
|
|
|
|
|
* The second insn restores stack pointer from the operand
|
|
|
|
|
|
|
|
|
|
### MIR_VA_START, MIR_VA_ARG, and MIR_VA_END insns
|
|
|
|
|
* These insns are only for variable number arguments functions
|
|
|
|
|
* `MIR_VA_START` and `MIR_VA_END` have one input operand, an address
|
|
|
|
|
of va_list structure (see C stdarg.h for more details). Unlike C
|
|
|
|
|
va_start, MIR_VA_START just takes one parameter
|
|
|
|
|
* `MIR_VA_ARG` takes va_list and any memory operand and returns
|
|
|
|
|
address of the next argument in the 1st insn operand. The memory
|
|
|
|
|
operand type defines the type of the argument
|
|
|
|
|
* va_list operand can be memory with undefined type. In this case
|
|
|
|
|
address of the va_list is not in the memory but is the
|
|
|
|
|
memory address
|
|
|
|
|
|
|
|
|
|
## MIR API example
|
|
|
|
|
* The following code on C creates MIR analog of C code
|
|
|
|
|
`int64_t loop (int64_t arg1) {int64_t count = 0; while (count < arg1) count++; return count;}`
|
|
|
|
|
```
|
|
|
|
|
MIR_module_t m = MIR_new_module (ctx, "m");
|
|
|
|
|
MIR_item_t func = MIR_new_func (ctx, "loop", MIR_T_I64, 1, MIR_T_I64, "arg1");
|
|
|
|
|
MIR_reg_t COUNT = MIR_new_func_reg (ctx, func->u.func, MIR_T_I64, "count");
|
|
|
|
|
MIR_reg_t ARG1 = MIR_reg (ctx, "arg1", func->u.func);
|
|
|
|
|
MIR_label_t fin = MIR_new_label (ctx), cont = MIR_new_label (ctx);
|
|
|
|
|
|
|
|
|
|
MIR_append_insn (ctx, func, MIR_new_insn (ctx, MIR_MOV, MIR_new_reg_op (ctx, COUNT),
|
|
|
|
|
MIR_new_int_op (ctx, 0)));
|
|
|
|
|
MIR_append_insn (ctx, func, MIR_new_insn (ctx, MIR_BGE, MIR_new_label_op (ctx, fin),
|
|
|
|
|
MIR_new_reg_op (ctx, COUNT), MIR_new_reg_op (ctx, ARG1)));
|
|
|
|
|
MIR_append_insn (ctx, func, cont);
|
|
|
|
|
MIR_append_insn (ctx, func, MIR_new_insn (ctx, MIR_ADD, MIR_new_reg_op (ctx, COUNT),
|
|
|
|
|
MIR_new_reg_op (ctx, COUNT), MIR_new_int_op (ctx, 1)));
|
|
|
|
|
MIR_append_insn (ctx, func, MIR_new_insn (ctx, MIR_BLT, MIR_new_label_op (ctx, cont),
|
|
|
|
|
MIR_new_reg_op (ctx, COUNT), MIR_new_reg_op (ctx, ARG1)));
|
|
|
|
|
MIR_append_insn (ctx, func, fin);
|
|
|
|
|
MIR_append_insn (ctx, func, MIR_new_ret_insn (ctx, 1, MIR_new_reg_op (ctx, COUNT)));
|
|
|
|
|
MIR_finish_func (ctx);
|
|
|
|
|
MIR_finish_module (ctx);
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
## MIR text example
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
m_sieve: module
|
|
|
|
|
export sieve
|
|
|
|
|
sieve: func i32, i32:N
|
|
|
|
|
local i64:iter, i64:count, i64:i, i64:k, i64:prime, i64:temp, i64:flags
|
|
|
|
|
alloca flags, 819000
|
|
|
|
|
mov iter, 0
|
|
|
|
|
loop: bge fin, iter, N
|
|
|
|
|
mov count, 0; mov i, 0
|
|
|
|
|
loop2: bge fin2, i, 819000
|
|
|
|
|
mov u8:(flags, i), 1; add i, i, 1
|
|
|
|
|
jmp loop2
|
|
|
|
|
fin2: mov i, 0
|
|
|
|
|
loop3: bge fin3, i, 819000
|
|
|
|
|
beq cont3, u8:(flags,i), 0
|
|
|
|
|
add temp, i, i; add prime, temp, 3; add k, i, prime
|
|
|
|
|
loop4: bge fin4, k, 819000
|
|
|
|
|
mov u8:(flags, k), 0; add k, k, prime
|
|
|
|
|
jmp loop4
|
|
|
|
|
fin4: add count, count, 1
|
|
|
|
|
cont3: add i, i, 1
|
|
|
|
|
jmp loop3
|
|
|
|
|
fin3: add iter, iter, 1
|
|
|
|
|
jmp loop
|
|
|
|
|
fin: rets count
|
|
|
|
|
endfunc
|
|
|
|
|
endmodule
|
|
|
|
|
m_ex100: module
|
|
|
|
|
format: string "sieve (10) = %d\n"
|
|
|
|
|
p_printf: proto p:fmt, i32:v
|
|
|
|
|
p_seive: proto i32, i32:iter
|
|
|
|
|
export ex100
|
|
|
|
|
import sieve, printf
|
|
|
|
|
ex100: func v
|
|
|
|
|
local i64:r
|
|
|
|
|
call p_sieve, sieve, r, 100
|
|
|
|
|
call p_printf, printf, format, r
|
|
|
|
|
endfunc
|
|
|
|
|
endmodule
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
## Other MIR API functions
|
|
|
|
|
* MIR API can find a lot of errors. They are reported through a
|
|
|
|
|
error function of type `void (*MIR_error_func_t) (MIR_context ctx, MIR_error_type_t
|
|
|
|
|
error_type, const char *message)`. The function is considered to
|
|
|
|
|
never return. To see all error types, please look at the
|
|
|
|
|
definition of error type `MIR_error_type_t` in file mir.h
|
|
|
|
|
* You can get and set up the current error function through API
|
|
|
|
|
functions `MIR_error_func_t MIR_get_error_func (MIR_context ctx)` and `MIR_set_error_func
|
|
|
|
|
(MIR_context ctx, MIR_error_func_t func)`.
|
|
|
|
|
* The default error function prints the message into stderr and call `exit (1)`
|
|
|
|
|
* MIR is pretty flexible and can describe complex insns, e.g. insns
|
|
|
|
|
whose all operands are memory. Sometimes you need a very simple
|
|
|
|
|
form of MIR representation. During load of module all its functions are simplified as much
|
|
|
|
|
as possible by adding new insns and registers resulting in a form in which:
|
|
|
|
|
* immediate, memory, reference operands can be used only in move insns
|
|
|
|
|
* memory have only base register (no displacement and index register)
|
|
|
|
|
* string and float immediate operands (if `mem_float_p`) are changed onto
|
|
|
|
|
references for new string and data items
|
|
|
|
|
* Before execution of MIR code (through interpreter or machine code generated by JIT),
|
|
|
|
|
you need to load and link it
|
|
|
|
|
* You can load MIR module through API function `MIR_load_module
|
|
|
|
|
(MIR_context ctx, MIR_module_t m)`. The function simplifies module code.
|
|
|
|
|
It also allocates the module data/bss
|
|
|
|
|
and makes visible the exported module items to other module
|
|
|
|
|
during subsequent linking. There is a guarantee that the
|
|
|
|
|
different data/bss items will be in adjacent memory if the
|
|
|
|
|
data/bss items go one after another and all the data/bss items
|
|
|
|
|
except the first one are anonymous (it means they have no name).
|
|
|
|
|
Such adjacent data/bss items are called a **section**.
|
|
|
|
|
Alignment of the section is malloc alignment. There are no any
|
|
|
|
|
memory space between data/bss in the section. If you need to
|
|
|
|
|
provide necessary alignment of a data/bss in the section you
|
|
|
|
|
should do it yourself by putting additional anonymous data/bss
|
|
|
|
|
before given data/bss if it is necessary. BSS memory is
|
|
|
|
|
initialized by zero and data memory is initialized by the
|
|
|
|
|
corresponding data. If there is already an exported item with
|
|
|
|
|
the same name, it will be not visible for linking anymore. Such
|
|
|
|
|
visibility mechanism permits usage of different versions of the
|
|
|
|
|
same function
|
|
|
|
|
* Reference data are initialized not during loading but during linking after
|
|
|
|
|
the referenced item address is known. The address is used for the data
|
|
|
|
|
initialization
|
|
|
|
|
* Expression data are also initialized not during loading but during linking after
|
|
|
|
|
all addresses are known. The expression function is evaluated by the interpreter
|
|
|
|
|
and its evaluation result is used for the data initialization. For example, if
|
|
|
|
|
you need to initialize data by item address plus offset you should use
|
|
|
|
|
an expression data
|
|
|
|
|
* MIR permits to use imported items not implemented in MIR, for
|
|
|
|
|
example to use C standard function `strcmp`. You need to inform
|
|
|
|
|
MIR about it. API function `MIR_load_external (MIR_context ctx, const char
|
|
|
|
|
*name, void *addr)` informs that imported items with given name
|
|
|
|
|
have given address (e.g. C function address or data)
|
|
|
|
|
* Imports/exports of modules loaded since the last link can be
|
|
|
|
|
linked through API function `MIR_link (MIR_context ctx, void (*set_interface) (MIR_item_t item),
|
|
|
|
|
void * (*import_resolver) (const char *))`
|
|
|
|
|
* `MIR_link` function inlines most `MIR_INLINE` calls
|
|
|
|
|
* `MIR_link` function also sets up call interface
|
|
|
|
|
* If you pass `MIR_set_interp_interface` to `MIR_link`, then
|
|
|
|
|
called functions from MIR code will be interpreted
|
|
|
|
|
* If you pass `MIR_set_gen_interface` to `MIR_link`, then
|
|
|
|
|
MIR-generator will generate machine code for all loaded MIR
|
|
|
|
|
functions and called functions from MIR code will execute the
|
|
|
|
|
machine code
|
|
|
|
|
* If you pass `MIR_set_lazy_gen_interface` to `MIR_link`, then
|
|
|
|
|
MIR-generator will generate machine code only on the first
|
|
|
|
|
function call and called functions from MIR code will execute
|
|
|
|
|
the machine code
|
|
|
|
|
* If you pass non-null `import_resolver` function, it will be
|
|
|
|
|
called for defining address for import without definition.
|
|
|
|
|
The function get the import name and return the address which
|
|
|
|
|
will be used for the import item. This function can be useful
|
|
|
|
|
for searching `dlopen` library symbols when use of
|
|
|
|
|
MIR_load_external is not convenient
|
|
|
|
|
|
|
|
|
|
# MIR code execution
|
|
|
|
|
* Linked MIR code can be executed by an **interpreter** or machine code generated by **MIR generator**
|
|
|
|
|
|
|
|
|
|
# MIR code interpretation
|
|
|
|
|
* The interpreter is an obligatory part of MIR API because it can be used during linking
|
|
|
|
|
* The interpreter is automatically initialized and finished with MIR API initialization and finishing
|
|
|
|
|
* The interpreter works with values represented by type `MIR_val_t` which is union
|
|
|
|
|
`union {..., int64_t i; uint64_t u; float f; double d; long double d;}`
|
|
|
|
|
* You can execute a MIR function code by API functions `void
|
|
|
|
|
MIR_interp (MIR_context ctx, MIR_item_t func_item, MIR_val_t *results, size_t nargs, ...)` and
|
|
|
|
|
`void MIR_interp_arr (MIR_context ctx, MIR_item_t func_item, MIR_val_t *results, size_t nargs,
|
|
|
|
|
MIR_val_t *vals)`
|
|
|
|
|
* The function results are returned through parameter `results`. You should pass
|
|
|
|
|
a container of enough size to return all function results.
|
|
|
|
|
* You can execute a MIR function code also through C function call
|
|
|
|
|
mechanism. First you need to setup the C function interface
|
|
|
|
|
through API function `MIR_set_interp_interface (MIR_context ctx, MIR_item_t
|
|
|
|
|
func_item)`. After that you can `func_item->addr` to call the
|
|
|
|
|
MIR function as usual C function
|
|
|
|
|
* C function interface is implemented by generation of machine
|
|
|
|
|
code specialized for MIR function. Therefore the interface
|
|
|
|
|
works only on the same targets as MIR generator
|
|
|
|
|
|
|
|
|
|
# MIR generator (file mir-gen.h)
|
|
|
|
|
* Before use of MIR generator you should initialize it by API function `MIR_gen_init (MIR_context ctx)`
|
|
|
|
|
* API function `MIR_gen_finish (MIR_context ctx)` should be called last after any generator usage.
|
|
|
|
|
It frees all internal generator data
|
|
|
|
|
* API function `void *MIR_gen (MIR_context ctx, MIR_item_t func_item)` generates machine code of given MIR function
|
|
|
|
|
and returns an address to call it. You can call the code as usual C function by using this address
|
|
|
|
|
as the called function address
|
|
|
|
|
* API function `void MIR_gen_set_debug_file (MIR_context_t ctx, FILE *f)` sets up MIR generator debug file to `f`.
|
|
|
|
|
If it is not NULL a lot of debugging and optimization information will be output to the file. It is useful mostly
|
|
|
|
|
for MIR developers
|
|
|
|
|
* API function `void MIR_gen_set_optimize_level (MIR_context_t ctx, unsigned int level)` sets up optimization
|
|
|
|
|
level for MIR generator:
|
|
|
|
|
* `0` means only register allocator and machine code generator work
|
|
|
|
|
* `1` means additional code selection task. On this level MIR generator creates more compact and faster
|
|
|
|
|
code than on zero level with practically on the same speed
|
|
|
|
|
* `2` means additionally common sub-expression elimination and sparse conditional constant propagation.
|
|
|
|
|
This is a default level. This level is valuable if you generate bad input MIR code with a lot redundancy
|
|
|
|
|
and constants. The generation speed on level `1` is about 50% faster than on level `2`
|
|
|
|
|
* `3` means additionally register renaming and loop invariant code motion. The generation speed
|
|
|
|
|
on level `2` is about 50% faster than on level `3`
|