Since the reason for introducing optional static typing is primarily to enhance performance, not all types benefit from this capability. In fact, it is quite hard to extend this to generic recursive structures such as tables without incurring significant overhead. For instance, even representing a recursive type in the parser would require dynamic memory allocation and add considerable overhead to the parser.
From a performance point of view the only types that seem worth specializing are:
* integer (64-bit int)
* number (double)
* array of integers
* array of numbers
* table
Implementation Strategy
=======================
I want to build on existing Lua types rather than introducing completely new types to the Lua system. I quite like the minimalist nature of Lua. However, to make the execution efficient I am adding new type specific opcodes and enhancing the Lua parser/code generator to encode these opcodes only when types are known. The new opcodes will execute more efficiently as they will not need to perform type checks. Moreover, type specific instructions will lend themselves to more efficient JIT compilation.
I am adding new opcodes that cover arithmetic operations, array operations, variable assignments, etc.
Modifications to Lua Bytecode structure
=======================================
An immediate issue is that the Lua bytecode structure has a 6-bit opcode which is insufficient to hold the various opcodes that I will need. Simply extending the size of this is problematic as then it reduces the space available to the operands A B and C. Furthermore the way Lua bytecodes work means that B and C operands must be 1-bit larger than A - as the extra bit is used to flag whether the operand refers to a constant or a register. (Thanks to Dirk Laurie for pointing this out).
I am amending the bit mapping in the 32-bit instruction to allow 9 bits for the opcode, 7 bits for operand A, and 8 bits each for operands B and C. This means that some of the Lua limits (maximum number of variables in a function, etc.) have to be revised to be lower than the default.
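The revised layout can be illustrated with a small encode/decode sketch. Only the field widths (9, 7, 8, 8) come from the text above; the ordering of the fields inside the 32-bit word and the names ``encode_abc``, ``get_field`` and ``is_constant_operand`` are assumptions made for illustration, not the actual Ravi macros.

```c
#include <assert.h>
#include <stdint.h>

/* Field widths taken from the text: 9-bit opcode, 7-bit A, 8-bit B and C. */
enum { SIZE_OP = 9, SIZE_A = 7, SIZE_B = 8, SIZE_C = 8 };

/* ASSUMPTION: fields are packed from the low bit upward as OP, A, C, B. */
enum {
  POS_OP = 0,
  POS_A = POS_OP + SIZE_OP,
  POS_C = POS_A + SIZE_A,
  POS_B = POS_C + SIZE_C
};

/* The top bit of B or C flags a constant index rather than a register. */
#define BITRK (1u << (SIZE_B - 1))
#define FIELD_MASK(size) ((1u << (size)) - 1u)

/* Pack the four fields into one 32-bit instruction word. */
static uint32_t encode_abc(unsigned op, unsigned a, unsigned b, unsigned c) {
  assert(op <= FIELD_MASK(SIZE_OP));
  assert(a <= FIELD_MASK(SIZE_A));
  assert(b <= FIELD_MASK(SIZE_B));
  assert(c <= FIELD_MASK(SIZE_C));
  return ((uint32_t)op << POS_OP) | ((uint32_t)a << POS_A) |
         ((uint32_t)c << POS_C) | ((uint32_t)b << POS_B);
}

/* Extract one field given its position and width. */
static unsigned get_field(uint32_t insn, int pos, int size) {
  return (unsigned)((insn >> pos) & FIELD_MASK(size));
}

/* True when a B or C operand refers to a constant instead of a register. */
static int is_constant_operand(unsigned rk) {
  return (rk & BITRK) != 0;
}
```

With these widths A can address 128 registers, and B and C can each address 128 registers or 128 constants, which is why the function limits have to be lowered.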
New OpCodes
===========
The new instructions are specialised for types, and also for register versus constant operands. So for example ``OP_RAVI_ADDFI`` means add ``number`` and ``integer``, and ``OP_RAVI_ADDFF`` means add ``number`` and ``number``. The existing Lua opcodes that these are based on define which operands are used.
Example::
local i=0; i=i+1
Above standard Lua code compiles to::
[0] LOADK A=0 Bx=-1
[1] ADD A=0 B=0 C=-2
[2] RETURN A=0 B=1
We add type info using Ravi extensions::
local i:integer=0; i=i+1
Now the code compiles to::
[0] LOADK A=0 Bx=-1
[1] ADDII A=0 B=0 C=-2
[2] RETURN A=0 B=1
Above uses type specialised opcode ``OP_RAVI_ADDII``.
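The selection of a specialised opcode from operand types can be sketched as follows. The enum values stand in for ``OP_ADD``, ``OP_RAVI_ADDII``, ``OP_RAVI_ADDFF`` and ``OP_RAVI_ADDFI``; the type tags, the ``select_add`` helper and the idea of swapping operands for the integer plus number case (relying on addition being commutative) are assumptions for illustration only.

```c
#include <assert.h>

/* Hypothetical stand-ins for the parser's type tags. */
typedef enum { TY_ANY, TY_INT, TY_FLT } value_type;

/* Hypothetical stand-ins for OP_ADD and the OP_RAVI_ADDxx opcodes. */
typedef enum { ADD_GENERIC, ADD_II, ADD_FF, ADD_FI } add_opcode;

typedef struct {
  add_opcode op;      /* opcode to emit */
  int swap_operands;  /* 1 if the two operands should be exchanged */
} add_choice;

/* Pick the most specific add opcode for the statically known types. */
static add_choice select_add(value_type lhs, value_type rhs) {
  add_choice c = { ADD_GENERIC, 0 };
  assert(lhs <= TY_FLT && rhs <= TY_FLT);
  if (lhs == TY_INT && rhs == TY_INT) {
    c.op = ADD_II;
  } else if (lhs == TY_FLT && rhs == TY_FLT) {
    c.op = ADD_FF;
  } else if (lhs == TY_FLT && rhs == TY_INT) {
    c.op = ADD_FI;
  } else if (lhs == TY_INT && rhs == TY_FLT) {
    /* ASSUMPTION: reuse the number+integer form by swapping operands. */
    c.op = ADD_FI;
    c.swap_operands = 1;
  }
  return c; /* any unknown type falls back to the generic opcode */
}
```

In the example above both ``i`` and the constant ``1`` are known integers, so the integer/integer form is chosen; if either type were unknown the generic ``ADD`` would remain.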
The basic first step is to add type information to Lua.
As the parser progresses it creates a vector of ``LocVar`` for each function containing a list of local variables. I have enhanced the ``LocVar`` structure in ``lobject.h`` to hold type information.
::
/* Following are the types we will use
** in parsing. The rationale for types is
** performance - as of now these are the only types that
** we care about from a performance point of view - if any
** other types appear then they are all treated as ANY
** Description of a local variable for function prototypes
** (used for debug information)
*/
typedef struct LocVar {
TString *varname;
int startpc; /* first point where variable is active */
int endpc; /* first point where variable is dead */
ravitype_t ravi_type; /* RAVI type of the variable - RAVI_TANY if unknown */
} LocVar;
The ``expdesc`` structure is used by the parser to hold nodes in the expression tree. I have enhanced the ``expdesc`` structure to hold the type of an expression.
::
typedef struct expdesc {
expkind k;
union {
struct { /* for indexed variables (VINDEXED) */
short idx; /* index (R/K) */
lu_byte t; /* table (register or upvalue) */
lu_byte vt; /* whether 't' is register (VLOCAL) or upvalue (VUPVAL) */
ravitype_t key_type; /* key type */
} ind;
int info; /* for generic use */
lua_Number nval; /* for VKFLT */
lua_Integer ival; /* for VKINT */
} u;
int t; /* patch list of 'exit when true' */
int f; /* patch list of 'exit when false' */
ravitype_t ravi_type; /* RAVI change: type of the expression if known, else RAVI_TANY */
} expdesc;
Note the addition of type information in two places. Firstly at the ``expdesc`` level which identifies the type of the ``expdesc``. Secondly in the ``ind`` structure - the ``key_type`` is used to track the type of the key that will be used to index into a table.
The table structure has been enhanced to hold additional information for array usage.
The parser needs to be enhanced to generate type specific instructions at various points.
Local Variable Declarations
---------------------------
The first enhancement needed is when local variable declarations are parsed. We need to allow the type to be defined for each variable and ensure that any assignments are type-checked. This is a somewhat complex process, because assignments can be expressions involving function calls. The last function call is treated as a variable assignment, i.e. all trailing variables are assumed to be assigned values from the function call; if not, the variables are set to nil by default.
The entry point for parsing a local statement is ``localstat()`` in ``lparser.c``. This function has been enhanced to parse the type annotations supported by Ravi. The modified function is shown below.
The do-while loop is responsible for parsing the variable names and the type annotations. As each variable name is parsed we detect if there is a type annotation, and if present the type is recorded in the array ``vars``.
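The recording step can be sketched in isolation. This is a simplified model, not the parser code: the annotation is taken as a ready-made string (or ``NULL`` when absent), and the type tags, ``annotation_to_type`` and ``record_types`` are hypothetical names. Unrecognised annotations are mapped to the "any" tag here, following the comment that all other types are treated as ANY; the real parser may well report an error instead.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical type tags; TY_ANY plays the role of RAVI_TANY. */
typedef enum {
  TY_ANY, TY_INT, TY_FLT, TY_INT_ARRAY, TY_FLT_ARRAY, TY_TABLE
} var_type;

/* Map the text of an annotation to a type tag. */
static var_type annotation_to_type(const char *text) {
  static const struct { const char *text; var_type type; } known[] = {
    { "integer", TY_INT },
    { "number", TY_FLT },
    { "integer[]", TY_INT_ARRAY },
    { "number[]", TY_FLT_ARRAY },
    { "table", TY_TABLE }
  };
  assert(text != NULL);
  for (size_t i = 0; i < sizeof known / sizeof known[0]; i++) {
    if (strcmp(text, known[i].text) == 0) return known[i].type;
  }
  return TY_ANY; /* ASSUMPTION: unknown annotations are treated as ANY */
}

/* Fill vars[i] with the declared type of variable i, or TY_ANY if the
** variable has no annotation (annotations[i] == NULL). */
static void record_types(const char *const *annotations, int nvars, int *vars) {
  for (int i = 0; i < nvars; i++) {
    vars[i] = annotations[i] ? (int)annotation_to_type(annotations[i])
                             : (int)TY_ANY;
  }
}
```

The resulting ``vars`` array is indexed by variable position, which is what lets later stages look up the expected type of the n-th expression.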
Additionally for parameters that are decorated with static types we need to introduce new instructions to coerce the types at run time. That is what is happening in the for loop at the end.
The ``declare_localvar()`` function passes the type of the variable to ``new_localvar()`` which records this in the ``LocVar`` structure associated with the variable.
DEBUG_VARS(raviY_printf(fs, "new_localvar -> registering %v fs->f->locvars[%d] at ls->dyd->actvar.arr[%d]\n", &fs->f->locvars[i], i, dyd->actvar.n));
dyd->actvar.n++;
DEBUG_VARS(raviY_printf(fs, "new_localvar -> ls->dyd->actvar.n set to %d\n", dyd->actvar.n));
}
The next change is in how the expressions following the ``=`` symbol are handled. The previously built ``vars`` array is passed to a modified version of ``explist()`` called ``localvar_explist()``. This parses the expressions and ensures that each expression matches the type of the variable where known. The ``localvar_explist()`` function is shown next.
::
static int localvar_explist(LexState *ls, expdesc *v, int *vars, int nvars) {
/* explist -> expr { ',' expr } */
int n = 1; /* at least one expression */
expr(ls, v);
#if RAVI_ENABLED
ravi_typecheck(ls, v, vars, nvars, 0);
#endif
while (testnext(ls, ',')) {
luaK_exp2nextreg(ls->fs, v);
expr(ls, v);
#if RAVI_ENABLED
ravi_typecheck(ls, v, vars, nvars, n);
#endif
n++;
}
return n;
}
The main changes compared to ``explist()`` are the calls to ``ravi_typecheck()``. Note that the array ``vars`` is passed to the ``ravi_typecheck()`` function along with the current variable index in ``n``. The ``ravi_typecheck()`` function is reproduced below.
::
static void ravi_typecheck(LexState *ls, expdesc *v, int *vars, int nvars, int n)
Secondly if the expression is a table initializer then we need to generate specialized opcodes if the target variable is supposed to be ``integer[]`` or ``number[]``. The specialized opcode sets up some information in the ``Table`` structure. The problem is that this requires us to modify ``OP_NEWTABLE`` instruction which has already been emitted. So we scan the generated instructions to find the last ``OP_NEWTABLE`` instruction that assigns to the register associated with the target variable.
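The backward scan for the most recent ``OP_NEWTABLE`` can be sketched on a toy instruction list. The ``mini_insn`` structure, the opcode names and both helper functions are hypothetical simplifications; real instructions are packed 32-bit words, and the specialised opcode names used by Ravi are not given in this section.

```c
#include <assert.h>
#include <stddef.h>

/* Toy opcodes: a generic table constructor plus two assumed array forms. */
typedef enum { I_NEWTABLE, I_NEWARRAY_INT, I_NEWARRAY_FLT, I_OTHER } mini_op;

/* Toy instruction: opcode and destination register only. */
typedef struct { mini_op op; int a; } mini_insn;

/* Scan backwards for the last NEWTABLE that writes to register reg.
** Returns its index, or -1 if there is none. */
static int find_last_newtable(const mini_insn *code, int ncode, int reg) {
  assert(code != NULL || ncode == 0);
  for (int pc = ncode - 1; pc >= 0; pc--) {
    if (code[pc].op == I_NEWTABLE && code[pc].a == reg) return pc;
  }
  return -1;
}

/* Patch the instruction found above so that it creates a typed array. */
static int specialize_newtable(mini_insn *code, int ncode, int reg,
                               mini_op array_op) {
  int pc = find_last_newtable(code, ncode, reg);
  if (pc >= 0) code[pc].op = array_op;
  return pc;
}
```

Patching in place avoids re-emitting code: only the opcode of the already generated instruction changes, and earlier table constructors for the same register are left alone.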
Next bit of special handling is for function calls. If the assignment makes a function call then we perform type coercion on return values where these values are being assigned to variables with defined types. This means that if the target variable is ``integer`` or ``number`` we issue opcodes ``TOINT`` and ``TOFLT`` respectively. If the target variable is ``integer[]`` or ``number[]`` then we issue ``TOIARRAY`` and ``TOFARRAY`` respectively. These opcodes ensure that the values are of required type or can be cast to the required type.
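The mapping from target type to coercion opcode described above can be sketched as a small table lookup. The enum values stand in for ``TOINT``, ``TOFLT``, ``TOIARRAY`` and ``TOFARRAY``; the type tags and the helpers ``coercion_for`` and ``plan_coercions`` are hypothetical, and the sketch only plans the coercions rather than emitting bytecode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type tags for declared variables. */
typedef enum { TY_ANY, TY_INT, TY_FLT, TY_INT_ARRAY, TY_FLT_ARRAY } var_type;

/* Stand-ins for the TOINT, TOFLT, TOIARRAY and TOFARRAY opcodes. */
typedef enum { CO_NONE, CO_TOINT, CO_TOFLT, CO_TOIARRAY, CO_TOFARRAY } coerce_op;

/* Coercion needed when a call result is stored into a variable of type t. */
static coerce_op coercion_for(var_type t) {
  switch (t) {
    case TY_INT: return CO_TOINT;
    case TY_FLT: return CO_TOFLT;
    case TY_INT_ARRAY: return CO_TOIARRAY;
    case TY_FLT_ARRAY: return CO_TOFARRAY;
    default: return CO_NONE; /* untyped variables accept any value */
  }
}

/* For variables first..nvars-1 (those fed by the call), record the
** coercion each one needs; returns how many coercions are required. */
static int plan_coercions(const var_type *vars, int nvars, int first,
                          coerce_op *out) {
  int needed = 0;
  assert(vars != NULL && out != NULL);
  assert(first >= 0 && first <= nvars);
  for (int i = first; i < nvars; i++) {
    out[i - first] = coercion_for(vars[i]);
    if (out[i - first] != CO_NONE) needed++;
  }
  return needed;
}
```

Untyped variables need no instruction at all, so a declaration without annotations generates the same code as standard Lua.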
Note that any leftover variables that are not assigned values are set to 0 if they are of integer or number type; otherwise they are set to nil as per Lua's default behavior. This is handled in ``localvar_adjust_assign()`` which is described later on.
Finally the last case is when the target variable is ``integer`` or ``number`` and the expression is a table / array access. In this case we check that the table is of required type.
The ``localvar_adjust_assign()`` function referred to above is shown below.
::
static void localvar_adjust_assign(LexState *ls, int nvars, int nexps, expdesc *e) {
FuncState *fs = ls->fs;
int extra = nvars - nexps;
if (hasmultret(e->k)) {
extra++; /* includes call itself */
if (extra < 0) extra = 0;
/* following adjusts the C operand in the OP_CALL instruction */
luaK_setreturns(fs, e, extra); /* last exp. provides the difference */
#if RAVI_ENABLED
/* Since we did not know how many return values to process in localvar_explist() we
* need to add instructions for type coercions at this stage for any remaining
* variables
*/
ravi_coercetype(ls, e, extra);
#endif
if (extra > 1) luaK_reserveregs(fs, extra - 1);
}
else {
if (e->k != VVOID) luaK_exp2nextreg(fs, e); /* close last expression */
if (extra > 0) {
int reg = fs->freereg;
luaK_reserveregs(fs, extra);
/* RAVI TODO for typed variables we should not set to nil? */
luaK_nil(fs, reg, extra);
#if RAVI_ENABLED
/* typed variables that are primitives cannot be set to nil so
* we need to emit instructions to initialise them to default values
*/
ravi_setzero(fs, reg, extra);
#endif
}
}
}
As mentioned before any variables left over in a local declaration that have not been assigned values must be set to default values appropriate for the type. In the case of trailing values returned by a function call we need to coerce the values to the required types. All this is done in the ``localvar_adjust_assign()`` function above.
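The effect of the default initialisation can be modelled with a toy register file. This sketch shows the run time outcome rather than the code generation: the ``value`` structure, the tags and ``init_defaults`` are hypothetical, and the real implementation emits instructions (a nil load followed by zero loads for typed registers) instead of writing values directly.

```c
#include <assert.h>

/* Hypothetical declared types and run time value tags. */
typedef enum { TY_ANY, TY_INT, TY_FLT } var_type;
typedef enum { V_NIL, V_INT, V_FLT } value_tag;

/* Toy tagged value standing in for a Lua stack slot. */
typedef struct {
  value_tag tag;
  union { long long i; double n; } u;
} value;

/* Give registers first..first+count-1 their default values:
** 0 for integer, 0.0 for number, nil for everything else. */
static void init_defaults(value *regs, const var_type *types,
                          int first, int count) {
  assert(regs != 0 && types != 0 && first >= 0 && count >= 0);
  for (int r = first; r < first + count; r++) {
    regs[r].tag = V_NIL;          /* Lua's default for unassigned locals */
    if (types[r] == TY_INT) {     /* typed integers cannot hold nil */
      regs[r].tag = V_INT;
      regs[r].u.i = 0;
    } else if (types[r] == TY_FLT) {
      regs[r].tag = V_FLT;
      regs[r].u.n = 0.0;
    }
  }
}
```

Setting everything to nil first and then overwriting the typed slots mirrors the order in ``localvar_adjust_assign()``, where ``luaK_nil()`` is followed by ``ravi_setzero()``.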
Note that local declarations have a complication: until the declaration is complete the variable does not come into scope. So we have to be careful when we wish to map from a register to the local variable declaration, as this mapping is only available after the variable is activated. A couple of helper routines are shown below.
::
/* translate from local register to local variable index
*/
static int register_to_locvar_index(FuncState *fs, int reg) {
if (ravi_type == RAVI_TNUMFLT || ravi_type == RAVI_TNUMINT)
/* code an instruction to convert in place */
luaK_codeABC(fs, ravi_type == RAVI_TNUMFLT ?
OP_RAVI_LOADFZ : OP_RAVI_LOADIZ, i, 0, 0);
}
}
Assignments
-----------
Assignment statements have to be enhanced to perform type checks similar to those for local declarations. Fortunately the assignment goes through the function ``luaK_storevar()`` in ``lcode.c``. A modified version of this is shown below.
Firstly note the call to ``check_valid_store()`` for a local variable assignment. The ``check_valid_store()`` function validates that the assignment is compatible.
Secondly if the assignment is to an indexed variable, i.e., table, then we need to generate special opcodes for arrays.
MOVE opcodes
------------
Any ``MOVE`` instructions must be modified so that if the target is a register that hosts a variable of known type then we generate special instructions that do a type conversion during the move. This is handled in the ``discharge2reg()`` function which is reproduced below.
::
static void discharge2reg (FuncState *fs, expdesc *e, int reg) {
The expression evaluation process must be modified so that type information is retained and flows through as the parser evaluates the expression. This involves ensuring that the type information is passed through as the parser modifies, reuses, creates new ``expdesc`` objects. Essentially this means keeping the ``ravi_type`` correct.
Additionally when arithmetic operations take place two things need to happen: a) specialized opcodes need to be emitted and b) the type of the resulting expression needs to be set.
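Step (b), setting the type of the result, can be sketched as a pure function of the operator and the operand types. The rules used here follow Lua 5.3 arithmetic: integer operands give an integer result for ``ADD``, ``SUB`` and ``MUL``, any float operand gives a float, and ``DIV`` always gives a float. The enums and the name ``result_type`` are hypothetical, and treating any unknown operand as producing an unknown result is a conservative assumption of this sketch.

```c
#include <assert.h>

/* Hypothetical type tags and the four specialised operators. */
typedef enum { TY_ANY, TY_INT, TY_FLT } var_type;
typedef enum { AR_ADD, AR_SUB, AR_MUL, AR_DIV } arith_op;

/* Static type of the expression (a op b). */
static var_type result_type(arith_op op, var_type a, var_type b) {
  assert(op <= AR_DIV && a <= TY_FLT && b <= TY_FLT);
  if (a == TY_ANY || b == TY_ANY) return TY_ANY; /* nothing known */
  if (op == AR_DIV) return TY_FLT;               /* '/' always yields float */
  return (a == TY_INT && b == TY_INT) ? TY_INT : TY_FLT;
}
```

Keeping the result type accurate matters because it feeds the next operation: in ``a + b * c`` the type of ``b * c`` decides which ``ADD`` variant can be emitted.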
::
static void codeexpval (FuncState *fs, OpCode op,
expdesc *e1, expdesc *e2, int line) {
lua_assert(op >= OP_ADD);
if (op <= OP_BNOT && constfolding(fs, getarithop(op), e1, e2))
return; /* result has been folded */
else {
int o1, o2;
int isbinary = 1;
/* move operands to registers (if needed) */
if (op == OP_UNM || op == OP_BNOT || op == OP_LEN) { /* unary op? */
o2 = 0; /* no second expression */
o1 = luaK_exp2anyreg(fs, e1); /* cannot operate on constants */
isbinary = 0;
}
else { /* regular case (binary operators) */
o2 = luaK_exp2RK(fs, e2); /* both operands are "RK" */
o1 = luaK_exp2RK(fs, e1);
}
if (o1 > o2) { /* free registers in proper order */
When expressions reference indexed variables, i.e., tables, we need to emit specialized opcodes if the table is an array. This is done in ``luaK_dischargevars()``.
The Lua fornum statements create special variables. In order to allow the loop variable to be used in expressions within the loop body we need to set the types of these variables. This is handled in ``fornum()`` as shown below. Additional complexity is due to the fact that Ravi tries to detect when fornum loops use a positive integer step, and whether this step is ``1``; specialized bytecodes are generated for these scenarios.
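The step detection can be sketched as a classification over what the parser knows about the loop header. The structure ``fornum_info``, the ``loop_kind`` values and ``classify_fornum`` are hypothetical names; requiring the initial value and the limit to be integers as well is an assumption of this sketch, since the text above only mentions the step.

```c
#include <assert.h>

/* What the parser is assumed to know about 'for i = init, limit, step'. */
typedef struct {
  int init_is_int;       /* initial value has static type integer */
  int limit_is_int;      /* limit has static type integer */
  int step_is_int_const; /* step is an integer constant (or omitted) */
  long long step;        /* value of the step when it is a constant */
} fornum_info;

/* Which family of loop instructions could be used. */
typedef enum {
  LOOP_GENERIC,           /* standard Lua loop opcodes */
  LOOP_INT_POSITIVE_STEP, /* integer loop, constant step > 1 */
  LOOP_INT_STEP_ONE       /* integer loop, step == 1 */
} loop_kind;

static loop_kind classify_fornum(const fornum_info *info) {
  assert(info != 0);
  if (!info->init_is_int || !info->limit_is_int) return LOOP_GENERIC;
  if (!info->step_is_int_const || info->step <= 0) return LOOP_GENERIC;
  return info->step == 1 ? LOOP_INT_STEP_ONE : LOOP_INT_POSITIVE_STEP;
}
```

A known positive step means the loop test is always "index <= limit", and a step of one removes the need to load the step at all, which is what makes separate bytecodes worthwhile.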
Upvalues can be used to update local variables that have static typing specified. So this means that upvalues need to be annotated with types as well and any operation that updates an upvalue must be type checked. To support this the Lua parser has been enhanced to record the type of an upvalue in ``Upvaldesc``::
/*
** Description of an upvalue for function prototypes
*/
typedef struct Upvaldesc {
TString *name; /* upvalue name (for debug information) */
ravitype_t type; /* RAVI type of upvalue */
lu_byte instack; /* whether it is in stack */
lu_byte idx; /* index of upvalue (in stack or in outer function's list) */
} Upvaldesc;
Whenever a new upvalue is referenced, we assign the type of the upvalue to the expression in function ``singlevaraux()`` - relevant code is shown below::
static int singlevaraux (FuncState *fs, TString *n, expdesc *var, int base) {
/* ... omitted code ... */
int idx = searchupvalue(fs, n); /* try existing upvalues */
if (idx < 0) { /* not found? */
if (singlevaraux(fs->prev, n, var, 0) == VVOID) /* try upper levels */
return VVOID; /* not found; is a global */
/* else was LOCAL or UPVAL */
idx = newupvalue(fs, n, var); /* will be a new upvalue */
}
init_exp(var, VUPVAL, idx, fs->f->upvalues[idx].type); /* RAVI : set upvalue type */
while (oldsize < f->sizeupvalues) f->upvalues[oldsize++].name = NULL;
f->upvalues[fs->nups].instack = (v->k == VLOCAL);
f->upvalues[fs->nups].idx = cast_byte(v->u.info);
f->upvalues[fs->nups].name = name;
f->upvalues[fs->nups].type = v->ravi_type;
luaC_objbarrier(fs->ls->L, f, name);
return fs->nups++;
}
When we need to generate assignments to an upvalue (OP_SETUPVAL) we need to use more specialized opcodes that do the necessary conversion at runtime. This is handled in ``luaK_storevar()`` in ``lcode.c``::
A number of new opcodes are introduced to allow type specific operations.
Currently there are specialized versions of ``ADD``, ``SUB``, ``MUL`` and ``DIV`` operations. This will be extended to cover additional operators such as ``IDIV``.
The ``ADD`` and ``MUL`` operations are implemented in a similar way. Both allow a second operand to be encoded directly in the ``C`` operand - when the value is a constant in the range [0,127].
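The range check for the directly encoded operand can be sketched as below. The limit of 127 comes from the text; ``MAX_IMMEDIATE`` and ``try_encode_immediate`` are hypothetical names, and what the code generator does when the constant is out of range (for example, referring to the constant table instead) is not modelled here.

```c
#include <assert.h>

/* Largest constant that can be placed directly in the C operand. */
#define MAX_IMMEDIATE 127

/* If k fits in [0, MAX_IMMEDIATE], store it in *c_out and return 1.
** Otherwise leave *c_out untouched and return 0. */
static int try_encode_immediate(long long k, unsigned *c_out) {
  assert(c_out != 0);
  if (k < 0 || k > MAX_IMMEDIATE) return 0;
  *c_out = (unsigned)k;
  return 1;
}
```

A statement such as ``i = i + 1`` falls in this range, so the increment needs neither a register nor a constant table lookup for its second operand.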
One thing to note is that, apart from division, an operation that involves constants is folded by Lua. Divisions are treated specially: an expression involving the ``0`` constant is not folded, even when the ``0`` is the numerator. Also worth noting is that the ``DIV`` operator results in a float even when two integers are divided; you have to use ``IDIV`` to get an integer result. This opcode is triggered in Lua 5.3 when the ``//`` operator is used.
A divide by zero when using integers causes a run time error, whereas a floating point division by zero yields ``inf`` or ``-inf`` (or NaN in the case of ``0/0``).
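The difference between the two kinds of division can be sketched in plain C. The names ``int_floor_div``, ``float_div`` and ``div_status`` are hypothetical; the integer version models the ``//`` operator on integers by reporting an error status where the VM would raise a run time error, and the float version relies on IEEE 754 arithmetic, where dividing a non-zero value by zero gives an infinity and ``0.0/0.0`` gives NaN.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

typedef enum { DIV_OK, DIV_BY_ZERO } div_status;

/* Integer floor division: fails on a zero divisor instead of trapping. */
static div_status int_floor_div(int64_t a, int64_t b, int64_t *out) {
  assert(out != 0);
  if (b == 0) return DIV_BY_ZERO;   /* the VM would raise an error here */
  if (b == -1) {                    /* avoid overflow for INT64_MIN / -1 */
    *out = (int64_t)(0u - (uint64_t)a);
    return DIV_OK;
  }
  int64_t q = a / b;                /* C truncates toward zero */
  if ((a % b != 0) && ((a < 0) != (b < 0))) q--; /* round toward -inf */
  *out = q;
  return DIV_OK;
}

/* Float division: never fails, follows IEEE 754. */
static double float_div(double a, double b) {
  return a / b;
}
```

For example, ``int_floor_div(-7, 2, &q)`` stores ``-4`` in ``q``, while ``float_div(7.0, 2.0)`` returns ``3.5``.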