Our main focus over the past two weeks has been updating all clients to PoC5 compatibility, and it’s certainly been a long journey. Changes to the VM include:
- New initialization/code mechanism: Basically, when you create a contract, the provided code is executed immediately and the return value of that code becomes the code of the contract. This allows you to have contract initialization code, but still maintain the same format (nonce, price, gas, to, value, data) for both transaction and contract creation, and makes it easier to create new contracts via forwarding contracts.
- Reorder transaction and contract data: Now the order is (nonce, price, gas, to, value, data) in the transaction and (gas, to, value, datain, datainsz, dataout, dataoutsz) in the message. Serpent maintains the parameters send(to, value, gas), o = msg(to, value, gas, datain, datainsz) and o = msg(to, value, gas, datain, datainsz, dataoutsz).
- Fee Adjustment: A fee of 500 gas is now charged when creating a transaction, and several other fees have been updated.
- CODECOPY and CALLDATACOPY opcodes: CODECOPY takes code_index, mem_index, len as arguments and copies the code from code_index ... code_index+len-1 to memory mem_index ... mem_index+len-1. These are very useful when combined with init/code. There is now also CODESIZE.
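The CODECOPY description above can be sketched in a few lines of Python. This is only an illustration of the index arithmetic, not client code: the function name codecopy, the byte-granular memory growth and the zero-padding when reading past the end of the code are all assumptions made for the example.

```python
def codecopy(code: bytes, memory: bytearray,
             code_index: int, mem_index: int, length: int) -> None:
    """Copy code[code_index .. code_index+length-1] into
    memory[mem_index .. mem_index+length-1] (hypothetical helper)."""
    end = mem_index + length
    # Assumption: memory simply grows to fit the write.
    if len(memory) < end:
        memory.extend(bytes(end - len(memory)))
    chunk = code[code_index:code_index + length]
    # Assumption: reads past the end of the code are zero-padded.
    memory[mem_index:end] = chunk.ljust(length, b"\x00")


code = bytes(range(10))          # stand-in for contract code
memory = bytearray()
codecopy(code, memory, code_index=2, mem_index=4, length=3)
print(memory.hex())
```

CALLDATACOPY follows the same pattern with the call data as the source instead of the code.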
But the biggest changes have been to the architecture surrounding the protocol. On the GUI side, the C++ and Go clients are evolving rapidly, and there will be more updates from that side soon. If you have been following Ethereum closely, you have probably seen Denny’s Lotto, a complete implementation of a lottery, plus GUI, written and run inside the C++ client. From now on, the C++ client will shift toward being a more developer-oriented tool, while the Go client will start focusing on being a user-facing application (or rather, meta-application). On the compiler side, Serpent has undergone several substantial improvements.
First, the code. You can peek under the hood of the Serpent compiler and see all of the available functions, together with their exact translations into EVM code. For example:
['access', 2, 1,
    ['', '', 32, 'MUL', 'ADD', 'MLOAD']],
This means that what access(x,y) actually does under the hood is recursively compile whatever x and y are, and then load the memory at index x + y * 32. So x is a pointer to the beginning of the array and y is the index. This code structure has been around since PoC4, but we have now upgraded the meta-language used to describe translations even further, so that it includes even if, while and init/code (previously these were special cases). Now only set and seq remain as special cases, and seq could be removed if desired by reimplementing it as a rewrite rule.
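As a rough illustration of that rule, here is a Python sketch of both halves: how the rule could expand into tokens, and what the resulting code computes. The names compile_access and access are hypothetical, and treating the two empty strings in the rule as slots for the compiled arguments is an assumption read off the explanation above, not taken from the compiler source.

```python
WORD = 32  # bytes per array element, per the rule above


def compile_access(x_tokens: list, y_tokens: list) -> list:
    """Assumed expansion: compiled x, compiled y, then the fixed tail."""
    return x_tokens + y_tokens + [WORD, "MUL", "ADD", "MLOAD"]


def access(memory: bytes, x: int, y: int) -> int:
    """What the emitted code computes: the word stored at x + y * 32."""
    address = x + y * WORD
    return int.from_bytes(memory[address:address + WORD], "big")


# An array [7, 8, 9] stored at pointer 64.
memory = bytes(64) + b"".join(v.to_bytes(WORD, "big") for v in (7, 8, 9))
print(compile_access([64], [2]))
print(access(memory, 64, 2))
```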
The biggest changes so far have been for PoC5 compatibility. For example, if you run serpent compile_to_assembly 'return(msg.data[0]*2)' you will see:
["$begincode_0.endcode_0", "DUP", "MSIZE", "SWAP", "MSIZE", "$begincode_0", "CALLDATACOPY", "RETURN", "~begincode_0", "#CODE_BEGIN", 2, 0, "CALLDATALOAD", "MUL", "MSIZE", "SWAP", "MSIZE", "MSTORE", 32, "SWAP", "RETURN", "#CODE_END", "~endcode_0"]
Here’s the actual code:
[2, 0, "CALLDATALOAD", "MUL", "MSIZE", "SWAP", "MSIZE", "MSTORE", 32, "SWAP", "RETURN"]
If you want to see what is happening here, suppose a message comes in whose first data item is 5. We then get:
2 -> Stack: [2]
0 -> Stack: [2, 0]
CALLDATALOAD -> Stack: [2, 5]
MUL -> Stack: [10]
MSIZE -> Stack: [10, 0]
SWAP -> Stack: [0, 10]
MSIZE -> Stack: [0, 10, 0]
MSTORE -> Stack: [0], Memory: [0, 0, 0 ... 10]
32 -> Stack: [0, 32], Memory: [0, 0, 0 ... 10]
SWAP -> Stack: [32, 0], Memory: [0, 0, 0 ... 10]
RETURN
The last RETURN returns the 32 memory bytes starting at index 0, that is [0, 0, 0 ... 10], or the number 10.
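The trace above can be replayed with a tiny stack machine. The following Python sketch is a toy interpreter written for this walkthrough, not any client's VM: the function name run is hypothetical, and the operand order of each instruction is inferred from the stack states shown in the trace.

```python
WORD = 32


def run(code: list, calldata: bytes) -> bytes:
    """Execute the token list; integers are pushes, strings are opcodes."""
    stack, memory = [], bytearray()
    for op in code:
        if isinstance(op, int):
            stack.append(op)
        elif op == "CALLDATALOAD":
            i = stack.pop()
            word = calldata[i:i + WORD].ljust(WORD, b"\x00")
            stack.append(int.from_bytes(word, "big"))
        elif op == "MUL":
            stack.append(stack.pop() * stack.pop() % 2**256)
        elif op == "MSIZE":
            stack.append(len(memory))
        elif op == "SWAP":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op == "MSTORE":
            index, value = stack.pop(), stack.pop()  # top of stack is the index
            if len(memory) < index + WORD:
                memory.extend(bytes(index + WORD - len(memory)))
            memory[index:index + WORD] = value.to_bytes(WORD, "big")
        elif op == "RETURN":
            offset, size = stack.pop(), stack.pop()  # top of stack is the offset
            return bytes(memory[offset:offset + size])
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return b""


INNER = [2, 0, "CALLDATALOAD", "MUL", "MSIZE", "SWAP",
         "MSIZE", "MSTORE", 32, "SWAP", "RETURN"]

result = run(INNER, (5).to_bytes(WORD, "big"))
print(int.from_bytes(result, "big"))  # prints 10
```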
Now let’s analyze the wrapper code.
["$begincode_0.endcode_0", "DUP", "MSIZE", "SWAP", "MSIZE", "$begincode_0", "CALLDATACOPY", "RETURN", "~begincode_0", "#CODE_BEGIN", ..... , "#CODE_END", "~endcode_0"]
For clarity, I have omitted the inner code described above. The first things you see are the two labels, ~begincode_0 and ~endcode_0, and the #CODE_BEGIN and #CODE_END guards. The labels mark the beginning and end of the inner code, and the guards exist for the later stages of the compiler, which understands that everything between the guards must be compiled as if it were a separate program. Now let’s look at the first part of the code. In this case, the final code has ~begincode_0 at position 10 and ~endcode_0 at position 24. $begincode_0 and $endcode_0 are used to refer to these positions, and $begincode_0.endcode_0 refers to the length of the interval between them, which is 14. Now remember that during contract initialization the call data is the code that you are feeding in. Thus, we have:
14 -> Stack: [14]
DUP -> Stack: [14, 14]
MSIZE -> Stack: [14, 14, 0]
SWAP -> Stack: [14, 0, 14]
MSIZE -> Stack: [14, 0, 14, 0]
10 -> Stack: [14, 0, 14, 0, 10]
CALLDATACOPY -> Stack: [14, 0], Memory: [ ... ]
RETURN
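The wrapper trace can be replayed in the same spirit. This Python sketch is a toy, with the hypothetical name run_wrapper; it assumes CALLDATACOPY pops the data index, then the memory index, then the length, which is the order implied by the stack states in the trace, and it feeds in the compiled hex string quoted in this post as the call data.

```python
def run_wrapper(code: list, calldata: bytes) -> bytes:
    """Replay the initializer; integers are pushes, strings are opcodes."""
    stack, memory = [], bytearray()
    for op in code:
        if isinstance(op, int):
            stack.append(op)
        elif op == "DUP":
            stack.append(stack[-1])
        elif op == "MSIZE":
            stack.append(len(memory))
        elif op == "SWAP":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op == "CALLDATACOPY":
            data_index, mem_index, length = stack.pop(), stack.pop(), stack.pop()
            end = mem_index + length
            if len(memory) < end:
                memory.extend(bytes(end - len(memory)))
            chunk = calldata[data_index:data_index + length]
            memory[mem_index:end] = chunk.ljust(length, b"\x00")
        elif op == "RETURN":
            offset, size = stack.pop(), stack.pop()
            return bytes(memory[offset:offset + size])
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return b""


# During initialization the call data is the full compiled code.
FULL_CODE = bytes.fromhex("600e515b525b600a37f26002600035025b525b54602052f2")
WRAPPER = [14, "DUP", "MSIZE", "SWAP", "MSIZE", 10, "CALLDATACOPY", "RETURN"]

inner = run_wrapper(WRAPPER, FULL_CODE)
print(inner.hex())
```

The returned bytes are exactly bytes 10 through 23 of the full code, which is the part that becomes the contract's code.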
Notice how the first half of the code cleverly sets up the stack so that the inner code is copied into memory indices 0...13 and that chunk of memory is then immediately returned. In the final compiled code, 600e515b525b600a37f26002600035025b525b54602052f2, the inner code sits neatly to the right of the initialization code that simply returns it. In more complex contracts, initializers may also do things such as setting specific storage slots to values, or calling or creating other contracts.
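To connect the assembly listing with the final hex string, here is a minimal two-pass assembler sketch in Python. It is not the Serpent implementation: the opcode byte values are inferred by lining up the hex string above with the token list, the $-prefixed reference tokens follow the $begincode_0.endcode_0 notation used in the text, and every push is assumed to fit in a single byte.

```python
PUSH1 = 0x60
# Byte values inferred from the compiled hex in this post (assumption).
OPCODES = {"MUL": 0x02, "CALLDATALOAD": 0x35, "CALLDATACOPY": 0x37,
           "DUP": 0x51, "SWAP": 0x52, "MSTORE": 0x54,
           "MSIZE": 0x5B, "RETURN": 0xF2}


def token_size(token) -> int:
    """Bytes emitted per token: pushes take 2, labels and guards take 0."""
    if isinstance(token, int) or token.startswith("$"):
        return 2
    if token.startswith(("~", "#")):
        return 0
    return 1


def assemble(tokens: list):
    # Pass 1: record the byte position of every ~label.
    labels, pos = {}, 0
    for t in tokens:
        if isinstance(t, str) and t.startswith("~"):
            labels[t[1:]] = pos
        pos += token_size(t)
    # Pass 2: emit bytes, resolving $label and $start.end references.
    out = bytearray()
    for t in tokens:
        if isinstance(t, int):
            out += bytes([PUSH1, t])
        elif t.startswith(("~", "#")):
            continue
        elif t.startswith("$"):
            ref = t[1:]
            if "." in ref:
                start, end = ref.split(".")
                value = labels[end] - labels[start]
            else:
                value = labels[ref]
            out += bytes([PUSH1, value])
        else:
            out.append(OPCODES[t])
    return labels, out.hex()


TOKENS = ["$begincode_0.endcode_0", "DUP", "MSIZE", "SWAP", "MSIZE",
          "$begincode_0", "CALLDATACOPY", "RETURN", "~begincode_0",
          "#CODE_BEGIN", 2, 0, "CALLDATALOAD", "MUL", "MSIZE", "SWAP",
          "MSIZE", "MSTORE", 32, "SWAP", "RETURN", "#CODE_END", "~endcode_0"]

labels, hex_code = assemble(TOKENS)
print(labels)
print(hex_code)
```

The sketch ignores the guards entirely; in the real compiler they tell later stages to compile the enclosed code as a separate program.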
Now let’s introduce Serpent’s newest and most fun feature: imports. One common use case in contract land is giving a contract the ability to spawn new contracts. The question is, how do you put the spawned contract’s code into the spawning contract? Previously, the only solution was the awkward approach of compiling the new contract first and then putting the compiled code into an array. Now there is a better solution: import.
Enter the following in returnten.se:
x = create(tx.gas - 100, 0, import(mul2.se))
return(msg(x,0,tx.gas-100,[5],1))
Now enter the following in mul2.se:
return(msg.data[0]*2)
Now, if you run serpent compile returnten.se and execute the resulting contract, voila, you can see that it returns 10. The reason is clear. The returnten.se contract creates an instance of the mul2.se contract and then calls it with the value 5. As the name suggests, mul2.se is a doubler, so it returns 5*2 = 10. Be careful with import, though: it is not a function in the standard sense. x = import('123.se') fails, and import only works in the very specific context of create.
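The flow of returnten.se can be mimicked with a toy model in Python. Everything here is hypothetical scaffolding (the ToyChain class, addresses as small integers, contracts as Python functions), and gas and value are ignored; the only point is the create-then-call sequence, with the initializer's return value becoming the stored contract code.

```python
class ToyChain:
    """Toy stand-in for contract storage; not a real client API."""

    def __init__(self):
        self.contracts = {}
        self.next_address = 1

    def create(self, init):
        # The init code runs once; whatever it returns becomes the code.
        address = self.next_address
        self.next_address += 1
        self.contracts[address] = init()
        return address

    def msg(self, to, data):
        # Call the stored code of the contract at `to` with `data`.
        return self.contracts[to](data)


def mul2(data):
    # Plays the role of mul2.se: return(msg.data[0]*2)
    return data[0] * 2


def mul2_init():
    # Plays the role of the compiled wrapper that just returns the inner code.
    return mul2


def returnten(chain):
    # Plays the role of returnten.se: create the doubler, then call it with 5.
    x = chain.create(mul2_init)
    return chain.msg(x, [5])


print(returnten(ToyChain()))  # prints 10
```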
Now let’s say you are creating a 1000-line monster contract and want to split it up into files. For this we use inset. Put the following into outer.se:
if msg.data[0] == 1:
    inset(inner.se)
And enter the following in inner.se:
return(3)
Running serpent compile outer.se gives you a nice piece of compiled code that returns 3 if the msg.data[0] argument is equal to 1. And that’s all there is to it.
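One way to picture inset is as textual inclusion. The Python sketch below works on that assumption only: the post does not say how the compiler implements inset, the function name expand_insets is hypothetical, and an in-memory dictionary stands in for the files on disk.

```python
import re

INSET = re.compile(r"^(\s*)inset\((.+)\)\s*$")


def expand_insets(source: str, files: dict) -> str:
    """Replace each inset(name) line with the named file's contents,
    indented to the level of the inset line (assumed behaviour)."""
    out = []
    for line in source.splitlines():
        match = INSET.match(line)
        if match:
            indent = match.group(1)
            name = match.group(2).strip().strip("'\"")
            # Recurse so that inset files may themselves use inset.
            for inner in expand_insets(files[name], files).splitlines():
                out.append(indent + inner)
        else:
            out.append(line)
    return "\n".join(out)


files = {
    "outer.se": "if msg.data[0] == 1:\n    inset(inner.se)",
    "inner.se": "return(3)",
}
print(expand_insets(files["outer.se"], files))
```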
Future updates for Serpent include:
- An improvement to this mechanism so that it does not load the inner code twice if you use import twice with the same file name
- String literals
- Improved space and code efficiency of array literals
- Debugging decorators (i.e. compilation functions that tell which line of Serpent corresponds to which byte of compiled code)
However, in the short term we will be focusing on bug fixes, a cross-client test suite, and continued work on ethereumjs-lib.