JVMLS 2015 – Nashorn for Java 9

Before I sit down, a blatant plug for the VM Tech Summit in the old world, which is a little more runtime-centric and a little less language- and Java-centric. We started doing it at JFokus last year; it was very invokedynamic-laden, but we hope to spread it out a bit more subject-wise this year. Last year we also had Slava from the V8 team there, which was quite interesting, so we're trying to look at these kinds of runtime things as well. I just wanted to advertise it here, in the other place in the world where this kind of geek hangs out in one room at the same time. Good party. Stockholm in February: it's awesome. It can be windy and snowy, but it usually turns into slush really quickly, and it's dark.

So, this is the last session of JVMLS, and I will be talking a little bit about Nashorn for Java 9: our execution model and how it changed. We did a lot of performance work on Nashorn for Java 8u60, making programs execute quickly, in the optimistic-types work I presented last year, and we paid for stable runtime performance with startup time and bytecode generation, whose slowness we have heard various grievances about at this conference. So I'm going to talk about 9, or the stretch goal for 9: how we pay back, or amortize, the cost of the extra type computations. OK, resetting the timer here, starting at zero.

You all probably know me. I used to work for Oracle; I'm currently unemployed, but next week I'm starting as performance and server stack architect at Klarna, a payment provider based in Stockholm, Sweden, which rolled out in the United States a couple of months ago. Sounds like fun, but I'm definitely going to miss this place. Since I'm unemployed right now, I don't really need the safe harbor statement; it just came in with the template. I always wanted to do this: you don't need a safe harbor statement for this presentation. The usual rules apply; I might be lying anyway.

The agenda: dynamic languages on the JVM, which everyone knows about, and why, so I don't need to spend a lot of time on it. Nashorn in general. Performance: a quick recap of the optimistic-types work and measurements, and how we got near-native performance for a lot of interesting scenarios with invokedynamic on the JVM. Then startup and warmup performance, which is the main topic of this presentation: how we tweaked it, what our new execution architecture looks like, where we're going, and the future work this enables.

Running alien languages on the JVM shouldn't be a foreign concept to anyone in here. I say alien languages, not just dynamic languages, because anything non-Java is alien, whether it's traditionally dynamic like JavaScript or something in between like Scala. People have been deploying stuff that isn't Java as bytecode on the JVM since '95, and they keep doing it; that's not going to slow down now that we have invokedynamic and related projects running at full speed. "Alien languages" may seem like a cute concept, but when you work with JavaScript, it's more the scary kind of alien. Me, Attila, Michael, Hannes, and several other people here have been wrestling these aliens for a couple of years.
We've actually managed to get them to perform and do our bidding to a certain extent, which is fun. It's always the same struggle: people want the JVM because of the automatic memory management, the JIT optimizations, the native threading capabilities, all the aspects of a runtime you don't need to reimplement if you just want to develop a language, emit bytecode, and run it. Then there's hybridization: you have the entire JDK class library accessible from your JavaScript classes. Hybrid programming was something I found weird and slightly tacky when I first saw it, but with projects like Avatar.js I realized it's extremely powerful if you use it for certain things. And you have all these man-decades of high tech in the JVM to fall back on; programmer-centuries, even.

Compare the code bases of Rhino and Nashorn against Mozilla SpiderMonkey and Google V8, Java versus native implementations of JavaScript runtimes, and it obviously shows that it's a good idea to write less code and still get the language.

The Nashorn project, as a very quick recap for whoever's on the internet (you're all intimately familiar with Nashorn in here), was the invokedynamic proof of concept that the langtools team started writing in 2010. It was written to be a compliant JavaScript runtime, to show that invokedynamic was sharp enough for real use. Nashorn is open and it's fast, with the disclaimer asterisk saying "comparable performance to native implementations in the domains where it matters," which means we're getting good results on things like the V8 benchmarks, and Avatar.js node workloads are similar to V8 workloads in a node environment, and so on. You get the javax.script package, so people can write their hybrid applications, and extensibility through that.

Long term, and I've addressed this at various conferences, we want to see if Nashorn can be a toolbox for other dynamic languages, not just JavaScript, on top of the JVM. For instance, this year I had a thesis student who built a TypeScript front end running on top of Nashorn with relatively little effort. We're trying to expose things in Nashorn to be more platform-agnostic for dynamic languages: we're already using the Dynalink library for call sites, and we're looking a little at JRuby 9000, whose IR format excites us; it looks reusable and conceptually complete, or near complete, for several dynamic languages on the JVM. So in the long run we sort of want to provide an invokedynamic toolkit for language implementation.

Performance, then, is what I've mostly been spending my time on. What does performance mean? Time in any runtime is the sum of execution time, the time it takes to execute your program, and the rest, which I'd call runtime overhead: garbage collection, waiting for locks, generating the code that is to be run, anything not executing the program or performing the workload. For 8u60 we spent a lot of time getting execution time down: invokedynamic optimizations, working with the JVM so invokedynamic runs faster; incremental inlining after Java 7 for call sites with indys; field access times, so that when we have variables in scope we get at them quickly; native code implementation, by which I mean the JavaScript libraries in Nashorn, things like Array and String functionality and the JavaScript functions, implemented fast; and type specializations, because bytecode is strongly typed, so we use as narrow a type, as primitive a type, as we can to get performance on HotSpot, looking at the JRockit-style type optimization work. All the stuff we've concentrated on so far brings execution time down: it generates better bytecode, and it has the JVM generate better native code from that bytecode.

So as of 8u60 (8u40, rather), we checked in optimistic types, which are there but disabled: a major overhaul of how we generate bytecode for HotSpot from JavaScript programs, using primitive types wherever we can and types as narrow as we can, because boxing reigns supreme if you just use the most conservative bytecode representation for a dynamic language, which is objects everywhere.
(Yes, 8u40. It's still disabled by default, but it's there.) This brings us to runtime overhead, which is sort of the stretch goal for 9, I guess. There is a JEP coming, actually two JEPs coming; they haven't been published yet, and I'm cut off from Oracle, so I can't see their status. For runtime overhead we're looking at increasing JIT speed, minimizing relinking of call sites and bytecode generation (can we do tiered recompilation in our JIT?), and class installation speed in the JVM: a class containing a method is the minimum compile unit in Java, and if you shuffle too many of those into the system, you get overhead.

Startup and warmup especially were very affected by optimistic types, since the optimistic-types metaphor works by guessing that an unknown type we can't statically derive is a narrow type, like an int, and then, if we're wrong, regenerating a class from a continuation point when we notice that we're actually overflowing, or getting back something from memory that wasn't the int we wished for.

So there are a lot of retakes in the optimistic-types branch: even though steady-state performance is really nice, we added a lot of overhead getting there, warming up and starting up. And there are a lot of use cases where it's really important to be rather fast. When you run something from the command line you have frequent restarts; you have people working in REPL environments; you have evals of never-before-seen code in production environments. You can't generate 40 bytecode classes every time you execute one new line. These pesky scenarios actually turn out to be quite common: it has to feel fast when you execute from the command line, and optimistic types did nothing to help that.

We had a few things in 8u40 and 8u60 that did help startup. We have lazy compilation, a code generation strategy (I think it's the default now) where only the first time we actually execute a function do we do the complete code generation for it. And we have a code caching feature, so you can store both type information, the optimistic type information, and serialized generated code on disk, and consecutive runs can start up faster by deserializing that information. But there's still a long way to go to get to your typical native application: you start V8 and boom, the prompt is back. Even if we completely eliminated all startup overhead in Nashorn, starting the JVM is a significant amount of work compared to your average native application. But we should do what we can.

Optimistic types: I've spoken about this in both 2013 and 2014, and those videos are online, so if you want to dive deeper than these few slides, feel free to look them up on the Oracle media network. There is also JEP 196, which goes into some detail about how the implementation is done. What we did there was attack execution time, and execution time has a lot to do with the invokedynamic implementation, java.lang.invoke behind the scenes, call sites, because something like every fifth instruction in our generated JavaScript bytecode is an indy, so it's really important that indys go fast. And implicitly and explicitly we would get boxing everywhere, in the libraries and in the representation of generated code, and the VM can do relatively little to get rid of that boxing.

Originally, in the first Nashorn versions, a function like this one that returns a + b was conservative all the way. We can't know statically what a and b are here, because the plus operator in JavaScript can handle any types: numbers, objects, combinations thereof. So we generated bytecode that looks something like this. It's not really important to understand everything the bytecode does, but we're dealing with prefix operators; we get a scope object; we have two dynamic call sites that look up a and look up b; and then we do an invokestatic to something called ScriptRuntime.ADD, which is maybe 50 lines of Java code correctly implementing the semantics of the JavaScript add operator, and it takes anything and anything; it takes Objects. So even if we just do 1 + 2, there is an awful amount of boxing going on, there is a call to a function way above the inlining threshold, and lots of cycles are spent just adding two numbers.
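To give a feel for what that generic path costs, here is a hedged sketch of what a fully pessimistic JavaScript '+' has to look like. The names are hypothetical, not Nashorn's actual ScriptRuntime code, and the spec-mandated ToPrimitive conversion is elided, but the shape, branches and boxing everywhere, is the point:

    // Hypothetical sketch of a fully pessimistic JavaScript '+':
    // everything is Object, so even 1 + 2 boxes both operands.
    public final class PessimisticOps {
        public static Object add(Object a, Object b) {
            // Even the "fast" path pays for unboxing and reboxing.
            if (a instanceof Number && b instanceof Number) {
                return ((Number) a).doubleValue() + ((Number) b).doubleValue(); // boxes a Double
            }
            // JavaScript '+' is also string concatenation.
            if (a instanceof CharSequence || b instanceof CharSequence) {
                return String.valueOf(a) + b;
            }
            // Real code must run ToPrimitive on arbitrary objects per the
            // spec; elided here. The point is the size and the boxing.
            throw new UnsupportedOperationException("ToPrimitive elided in this sketch");
        }
    }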
In the optimistic-type world, the code for those two numbers looks a little different. We still read them from the scope, but the red lines here are optimistic getters: give me an int back, because I assume this is an int even though I've never seen it before, and if it isn't, throw an exception; the same for b. Then we do an invokedynamic link to an integer add, which, if we're lucky, will just be an x86 add instruction plus a jump-on-overflow, because if we grow bigger than 32 bits we can't represent the value as a Java int anymore, even though it's still a JavaScript number. All the red optimistic operations, the ones that assume int, throw an exception if we have the wrong type or if we overflow, which we christened the unwarranted optimism exception. Any optimistic operation in Nashorn's optimistic-type bytecode is surrounded by a try/catch handler, a method handle combinator, that catches this UnwarrantedOptimismException, and that in turn captures the local state of the frame of the bytecode method we're executing. We massage the stack so that nothing is left on it whenever an optimistic exception triggers, because when you throw an exception in Java, you lose the bytecode stack.
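As a minimal sketch of that optimistic flavor of the same add, with a simplified stand-in for Nashorn's real UnwarrantedOptimismException that carries the wide result and the program point to the continuation machinery:

    // Simplified stand-in for Nashorn's exception class.
    public final class UnwarrantedOptimismException extends RuntimeException {
        public final Object wideValue;     // the result that didn't fit the guess
        public final int programPoint;     // identifies the failed operation
        public UnwarrantedOptimismException(Object wideValue, int programPoint) {
            this.wideValue = wideValue;
            this.programPoint = programPoint;
        }
    }

    // Hypothetical optimistic int add: succeed cheaply, or throw so the
    // runtime can regenerate wider code from this program point.
    public final class OptimisticOps {
        public static int addOptimistic(int a, int b, int programPoint) {
            try {
                return Math.addExact(a, b);        // plain add plus jump-on-overflow
            } catch (ArithmeticException overflow) {
                // Still a JavaScript number, just not a Java int anymore.
                throw new UnwarrantedOptimismException((double) a + b, programPoint);
            }
        }
    }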

So everything is nicely written to local variables in the bytecode, and every optimistic operation in the bytecode is associated with a program point, so we can uniquely identify it when we regenerate the method. (I'm just quickly going through what I talked about last year, for those who haven't seen it.) The rewrite exception is thrown to the linker with the local variable state, the frame state, and the linker regenerates the rest of the method from that continuation point: it basically takes a continuation with the top stack frame, makes its way out of there, and then regenerates the method more pessimistically. For example, if this int overflowed, we'll regenerate the method with a double in its place and call that version the next time. So we have a technique for taking a continuation when something goes wrong, unwinding the rest of the method, and regenerating a less optimistic version to call next time. By doing this we can serve up bytecode that is rich in primitives, which HotSpot likes. If you serve up bytecode that is boxed everywhere, no VM in the world can help you. Sadly; I would like one to exist, and there are experiments with escape analysis, but HotSpot is good at executing bytecode that was Java; alien languages, not so much. Optimistic types really help there. So: we use whatever static types there are; we don't stupidly throw them away. If we see a static type in JavaScript, or rather can derive one from an operation, we use it. We guess that the rest of the types are ints, and then a chain of erroneous guesses slowly widens the types and causes regenerations of the method. And this continuation mechanism, once written and implemented, is quite elegant: it can be used to exit code at any program point, and we'll reuse it for some things you'll see shortly.

The other thing we have to do is retain primitive storage wherever possible. If we add two integers and save the result to a scope variable, how do we represent the scope variable? The trivial thing, the conservative thing, would be to have an Object field and put a java.lang.Integer there with the value; and again, that causes boxing the JVM can't eliminate. The JVM cannot explode an object field that we use for storage, even if we only ever store number objects in it. So right now we have a slightly kludgy solution that we call dual fields (we're waiting for VarHandles to appear, and we don't want to use Unsafe). Every scope variable in Nashorn is represented as both a 64-bit long and an Object. The 64-bit long holds all the primitive types: doubleToRawLongBits for numbers, otherwise the int truncated, or the long as-is if the value is 64-bit. That's really quite fast, and it doesn't cost any boxing for primitive types. So that's the other half of the optimistic-types problem: how we store scope variables in memory. We've played with tagged arrays and some miscellaneous things, and we have some proofs of concept, but we never checked in any modifications to save memory here or get a better memory representation of scope variables in JavaScript and dynamic languages. I think VarHandles are going to help us get rid of this, and also eliminate various checks that we don't need for internal arrays, for typed arrays, and for other JavaScript structures.
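A minimal sketch of the dual-field idea. Field names are illustrative, and the knowledge of which representation is currently live is kept elsewhere (Nashorn tracks the property's type in the property map):

    // Sketch: each scope slot has both a 64-bit primitive field and an
    // Object field, so primitive values never have to box.
    public final class DualSlot {
        private long primitiveValue;   // int/long directly; double via raw bits
        private Object objectValue;    // used only when the value isn't primitive

        public void setInt(int v)       { primitiveValue = v; }
        public void setLong(long v)     { primitiveValue = v; }
        public void setDouble(double v) { primitiveValue = Double.doubleToRawLongBits(v); }
        public void setObject(Object v) { objectValue = v; }

        public int getInt()       { return (int) primitiveValue; }
        public long getLong()     { return primitiveValue; }
        public double getDouble() { return Double.longBitsToDouble(primitiveValue); }
        public Object getObject() { return objectValue; }
    }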
I actually wrote some code during JCrete to prove to all the Luddites who say Unsafe needs to be there forever that VarHandles have some performance going for them, and I was quite impressed with the sandbox branch: for some simple cases, a lot of stuff got eliminated. So I'm a believer in VarHandles.

The last thing we did for optimistic types concerns our large number of native methods, which for us means the class library: the JavaScript Array, the JavaScript String; "native" to us. Since the beginning of Nashorn they've been annotated with JavaScript meta-information: we implement them in Java and say, here is Math.max, for instance; here are its arity and its attributes; it lives in the constructor and not in the prototype. Math.max in JavaScript takes an arbitrary number of arguments, zero or more, and returns the biggest one. To represent this generically you need Object self, Object-ellipsis args: boxing, array allocation and deallocation behind the scenes, which is really horrible when you just want the max of two numbers, which is the typical use case. Representing the typical case, Math.max of exactly two numbers, that way is very painful.
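Roughly, here is the generic form next to the specialized overloads described just below; a sketch with illustrative names. Nashorn's real code marks these up with annotations along the lines of @Function and @SpecializedFunction, elided here:

    // Sketch: the generic signature the spec forces, next to the int and
    // double specializations a linker can pick when call site types allow.
    public final class NativeMathSketch {
        // Generic form: varargs Object, so allocation and boxing on every call.
        public static double max(Object self, Object... args) {
            double result = Double.NEGATIVE_INFINITY;
            for (Object arg : args) {
                result = Math.max(result, ((Number) arg).doubleValue()); // real code runs ToNumber
            }
            return result;
        }
        // Specialized forms: exactly two primitive arguments, no boxing at all.
        public static int max(Object self, int a, int b)          { return Math.max(a, b); }
        public static double max(Object self, double a, double b) { return Math.max(a, b); }
    }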

But JavaScript typically looks like this: the only thing we can say is that it returns a number, a double, as conservative as we can be; that's all the JavaScript spec gives us. So we added specialized functions. The linker looks at the primitive argument types that we've so carefully preserved everywhere else, both in scope and in bytecode, and we have a version that goes to java.lang.Math.max and returns an int, which the linker will use if we have ints, and versions for longs, and for doubles, and for the other types we have in the Java language underneath us. You'll notice these take exactly two arguments, because that's the way people use max. Of course we could also add a max that takes zero arguments and returns negative infinity, so that if someone wants to do that it's really fast too; the point is the concept of adding specialized functions that the linker can recognize.

We ended up with some pretty good runtime performance after warmup, running the Octane benchmarks. The blue bars are Rhino, and everything is normalized against the red bars, which are Nashorn as we released it in JDK 8, the first public version. With optimistic types we get the green bars: steady-state performance much better than before, because we're careful with our types, because we specialize, and because we don't widen anything to objects behind the scenes by mistake.

But there is a cost, which is the topic of this presentation. (I'm doing all my own clip art this year, by the way, because Dr. Buckley put me through a lot of grief last year with his lawyers. This is Chilly; she will be featured in much of the clip art in this presentation.) The cost is startup and warmup until steady state, because being wrong, guessing a type to be narrower than it was, causes a lot of continuation-taking, regeneration, and class installation. Look at the startup time for the Octane benchmarks: how long it takes from launch, after generating and executing enough code, to get to the first cycle of benchmarking. If we normalize against 8u60 without optimistic types and then enable optimistic types the way they are in 9 right now, you see startup times several times longer, binary orders of magnitude. And this is just from entering the command line for the benchmark until it starts running; it's even worse than it looks, because the benchmark payload has to warm up as well. But just getting to the state where the benchmark starts executing is significantly slower. It was slow the way it was, and it's slower now.

Two things stick out here, two nightmare scenarios. PDF.js, which is a really large script, gets significantly slower: startup time seems to degrade nonlinearly with the size of the script, with the number of functions we have to index, not even execute, just index. It's not just PDF.js; I verified that this is true for script size in general. The other one is raytrace, where a lot of guesses turn out to be too optimistic: it basically works with double arithmetic, and we generate an all-integer version of the benchmark first, because we're stupid; we try to salvage as much as we can from looking at scopes, but we still do a lot of regeneration, and that also costs.
So those are the two prohibitive startup cases: we're wrong too often, and the script is large, which also has the trickle-down effect of making us regenerate things in the optimistic-types world. Looking at bytecode mass and classes installed, it's even worse. Normalizing against the blue bars, before optimistic types, the number of bytes of bytecode generated is something like 18 times more on raytrace than it used to be. That looks scary. Yes, it's generated and installed and eventually unloaded, but in the number of classes we generate we're also several times up.

So there are a couple of nightmare use cases. Lots of dynamically evaluated throwaway code: people doing evals that vary every time they run, different evals, different regexes, in loops and things like that, where relinking hurts and type invalidation hurts. Just because we have these nine x's here, nine scope loads, we make nine optimistic type guesses, or pessimistic ones in the worst case.

We can look at the scope and see that this x is a well-behaved citizen and it's the same x each time, but it's equally possible in JavaScript that x is a getter with global side effects, so we really would have to do nine recomputations, nine regenerations, nine continuations. That feels bad and nonlinear. JavaScript is horrible in that x is potentially never just an x, and a lot of things you only know at runtime. And raytrace is full of compound arithmetic expressions where we're actually wrong, like this one: the generated classes grow nonlinearly very quickly if you use the optimistic typing strategy, which costs you footprint and startup time, even though, yes, the classes are unloaded and thrown away.

I've also got some customer cases from web bug reports; a lot of people use JavaScript from Java. This is a benchmark Michael pointed me to: about three megabytes of Java code, one class worth of generated code, that does something with a script engine on every line; basically every line is an eval. It throws a lot of exceptions in this new code, and optimistic types goes: OK, is this eval an int? It wasn't? Then I guess it's a double, or just objects here. And the same code comes again, and we generate it again. There are some outliers that are really bad if you use optimistic types, and every eval here will give you 20 classes.

So we have some issues. Peak performance is really good once it gets stable, but there is a cost: the type invalidations make the number of classes and bytes generated grow nonlinearly, and script size, through various optimizations and assumptions, also grows costs nonlinearly. What you thought was a cute kitten on the left is actually a really grumpy seventeen-year-old big cat on the right, if you don't do some creative surgery.

This is the trickle-down pyramid: you start with JavaScript, which leads to bytecode, which contains invokedynamics, which contain lambda forms, which, no matter how good the caching Vladimir implemented is, lead to more bytecode, which leads to class installation, which leads to the system dictionary lock, which is very global. The runtime team says they will never ever change this and that it's not a bug, despite Aleksey giving them some really nice benchmarks proving it's actually very bottleneck-y down in the system dictionary if you have a lot of lambda forms, even cached ones. They basically said "not a bug," so someone else gets to talk to the runtime team. At this level, the more bytecode you have, the worse things get, incrementally and very quickly. So optimistic types definitely give us runtime performance in steady state, but they make HotSpot unfeasibly slow to warm up. And the bigger the method, the worse HotSpot's optimizations do: JavaScript methods tend to be bigger than Java methods, and HotSpot doesn't do very well with 60K of bytecode in a method; it does very well with 200 bytes of it. That's dynamic languages for you. Also, given linearly increasing bytecode size, C2 has nonlinearly increasing compile time, so there are various VM internals in play as well. But I wouldn't blame C2 for most of this; I'd blame the installation of huge amounts of bytecode in a non-Java manner.
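To make the eval-per-line pathology mentioned above concrete, embedding code in that style looks roughly like the following; a hypothetical reconstruction using the standard javax.script API, not the actual customer code. With optimistic types, every iteration can cost a pile of throwaway classes:

    // Sketch of the pathological embedding pattern: a fresh eval per line,
    // each one never-before-seen source as far as the runtime is concerned.
    import javax.script.ScriptEngine;
    import javax.script.ScriptEngineManager;
    import javax.script.ScriptException;

    public final class EvalPerLine {
        public static void main(String[] args) throws ScriptException {
            ScriptEngine nashorn = new ScriptEngineManager().getEngineByName("nashorn");
            for (int i = 0; i < 1_000_000; i++) {
                // Each line is unique, so nothing previously generated is reusable.
                nashorn.eval("var x" + i + " = " + i + " + 1;");
            }
        }
    }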
Trying to get out of this, a couple of observations. Most type guesses are invalidated at most once: we take one continuation (OK, it was a double, it was an object, it wasn't an int) and then they are correct forever; they won't keep ping-ponging to ever greater types. If we just knew the right types from the beginning, if we had static types like my thesis student Andreas had in his TypeScript front end, everything would be so much faster; but the entire point of JavaScript is that static types are not used. So if we somehow knew the types the first time around, we wouldn't regenerate so much new code. The other observation, true for all historic versions of Nashorn as well, is that even for call sites that aren't monomorphic, we build sort of an if-else tree of method handle combinators, using the object shape (which is called the property map in Nashorn) to check which linkage is the relevant one for a given receiver. This actually gets rid of most relinks even for bimorphic or trimorphic call sites, and it's fairly quick. It should be clear by now that it's immensely beneficial for startup time not to regenerate code; that's the observation we can make.
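Conceptually, that if-else tree of combinators can be built with the public java.lang.invoke API. A minimal sketch, assuming the shape-check and target handles have already been produced by the linker; the property-map plumbing is elided and all names are illustrative:

    // Sketch of a polymorphic inline cache as a chain of guardWithTest
    // combinators: check the object shape, else fall through, ending in a
    // slow-path relink handle.
    import java.lang.invoke.MethodHandle;
    import java.lang.invoke.MethodHandles;
    import java.util.List;

    final class ShapeGuardedChain {
        // checks.get(i): (receiver, ...) -> boolean, true if the receiver has
        // the property map this linkage was made for; targets.get(i): the
        // linked getter/setter/call for that shape; relink: the slow path.
        static MethodHandle build(List<MethodHandle> checks,
                                  List<MethodHandle> targets,
                                  MethodHandle relink) {
            MethodHandle result = relink;
            for (int i = checks.size() - 1; i >= 0; i--) {
                // if (check(args)) target(args) else fall through to result
                result = MethodHandles.guardWithTest(checks.get(i), targets.get(i), result);
            }
            return result;
        }
    }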

So let's assume that steady-state performance is indeed good enough for now and that optimistic types solved that problem. We now need to attack startup and warmup time, to sort of pay back the huge loan we took out in class generation overhead. As I mentioned, we've done previous work with lazy compilation and code caching, and that works well, but it doesn't take us all the way: code caching with optimistic types certainly gets rid of a lot of type invalidations, and you can really feel it in programs that behave like raytrace, where you end up with steady type hierarchies after a long while, but it's still slower.

So what could we do without having to write a lot of new code? We work with a tiered JIT, for instance: could we add a profiling pass to our JIT, so we don't generate serious bytecode unless we know the types, or at least have some type profile information? A pessimistic, non-folding, non-continuation-generating, non-bytecode-spewing first pass in the JIT. That would actually be really simple, because very little code would need to be written, near zero: we can already generate an arbitrary level of type pessimism very simply, on demand, for rewrite exceptions. If we say that everything is objects, we never throw any rewrite exceptions, and we can track what a type really is by checking whether a value is a boxed number or really an object. This would also mean no significant test matrix growth, as programs would execute the same way with an initial profiling pass.

The con is type pollution: if we run pessimistic code, like the initial versions of Nashorn, we write wide values to scopes; we write java.lang.Integer to Object fields in scopes instead of preserving the primitive. And as soon as you drop the primitive chain anywhere in a variable's life cycle in JavaScript, there will be boxing, and you're screwed. It's really important to keep a type primitive for as long as it can be primitive to get performance from the JVM. You could probably solve that by adding return value filters, deboxing things, and then you're looking at lambda forms and invokedynamics for them, and so on.

And there is still a lot of code generation overhead, because we haven't gotten away from the root cause of the overhead: we still have to generate bytecode for a small compile unit and regenerate it again; we still see the system dictionary locks. Bytecode is actually the bottleneck here, which is sad, because generating bytecode is what we do. There are also various other class registration horrors: lambda forms, defineAnonymousClass, and every invokedynamic leads to a lot of work behind the scenes. So this code generation overhead, having to generate more bytecode, is sort of a deal breaker. We decided after a while that there were several ways to attack this, but it looks like our JIT overhead, generating bytecode, is expensive, and we don't get close to, say, Rhino's interpreter startup with the tiered approach, no matter what we do. And there's also the type problem I discussed: we'd mess up primitive types in scopes and have to mend them afterwards with method handles and filters of some kind, which is slow and complex. So our nemesis seems to be bytecode-land in this pyramid: more bytecode equals more pain, because of the trickle-down effects and the multipliers beneath the bytecode in the JVM.

If we could only profile the AST before we generate bytecode, collect types and use them for the first JIT when a method is called again, or just execute the AST until something is hot and only then send it to the JIT... and that's an interpreter. An interpreter of my own on my own virtual machine: layers within layers, wheels within wheels.
When we came up with this, it seemed like a bad thing to do, but we had implemented everything else on top of HotSpot, so let's try it. I did a thought experiment and a mock-up and tried to implement an AST emulator: a very lightweight interpreter, thread-local and reentrant, that can emulate the nodes in the JavaScript AST, given a frame and a scope. Something that saves us here is that a lot of the functionality JavaScript uses, in the ScriptObject class and the ScriptRuntime class in Nashorn, is common to the whole runtime, so hopefully we wouldn't need to write much new code. What I did was take the Node class, which is the superclass of all the AST nodes, and add an abstract interpret method, which takes a frame: the scope we're in, plus local variable state. Then I did a very simple interpreter mock-up.

For instance, an IfNode is simple to implement: the result starts out as undefined; you interpret the test node and convert it to boolean using the runtime logic, the slow, pessimistic versions that automatically box up primitives; if the result is true, you interpret the pass node to get the result, otherwise you interpret the fail node; then you return the result (my interpreter returns results everywhere, as wrappers, for logging). So it's conceptually very simple to write an interpreter. Control flow is done by throwing various exceptions: jump exceptions for our inlined finallys and gotos, break and continue exceptions that may or may not contain a label, a return exception. It really is pretty trivial to write this interpreter. A WhileNode (the code on the slide is probably too small, but I'll put the slides online later) checks that if we have a test and it's not a do-while, we interpret the test, and while it's true we interpret the body, catching any break or continue exceptions and directing the flow to the correct place by rethrowing or swallowing them. We also have an OSR check in there, which I'm getting to later, because if we get stuck interpreting a loop (and interpreting is obviously orders of magnitude slower than executing bytecode in steady state), we need to get out of there.

It was very simple to plug this into the Nashorn architecture. There's something called a CompiledFunction, which represents one version of a JavaScript or native function with a certain number of parameters of certain types; the Math.max (int, int) version would be a CompiledFunction, even though it's native. It contains method handles for invocation and for the constructor, if you're created with new, and it has a certain method type, specialized on parameters or generic. And ScriptFunctionData, which is the meta-info for a JavaScript or native function, has zero or more CompiledFunctions. So I added a very small subclass of CompiledFunction called InterpretedFunction, which basically represents a trampoline: something that says "interpret myself when called." The CallNode, which represents a JavaScript call, has an interpret method too: when you call it, it returns a ScriptFunction that is really a trampoline, and when you call that, it bootstraps the thread with the interpreter for that actual function. That's how you get into the interpretation loop: invoke it, and you interpret yourself.

So we have the interpret method for all nodes, and it's maybe 10 to 50 lines of code per AST node; it's not that much work to implement the actual interpreter logic, and the InterpretedFunction subclass of CompiledFunction is a very small, slim class. The frame I'm passing around is opaque interpreter state, basically local variables plus scope, and the scope is the same as in the rest of the runtime, using the same access methods, type functionality, and computations as the jitted code would. When it comes to calls and such, it seemed likely that we'd have to duplicate a lot of functionality: implement link logic for all indy calls (linking a call, a getter, or a setter is really complex in Nashorn because of JavaScript semantics), and reimplement all the arithmetic and all the operations in the interpreter again. That seemed really infeasible if we had to recreate it all, but luckily we can fall back to code that already exists in the runtime part of Nashorn and call the same code from the interpreter.
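A minimal sketch of the IfNode and WhileNode interpretation just described; all class names and helpers here are illustrative stand-ins for Nashorn's internals, not its actual code:

    // Every AST node gets an interpret method; control flow uses exceptions.
    abstract class Node {
        abstract Object interpret(Frame frame);   // frame = local variables + scope

        // Slow, boxing JS ToBoolean, standing in for the runtime's conversion.
        static boolean toBoolean(Object o) {
            if (o == null || o == Undefined.INSTANCE) return false;
            if (o instanceof Boolean) return (Boolean) o;
            if (o instanceof Number) {
                double d = ((Number) o).doubleValue();
                return d != 0 && !Double.isNaN(d);
            }
            return !"".equals(o);
        }
    }

    final class Undefined { static final Undefined INSTANCE = new Undefined(); }

    final class Frame { void onBackEdge(Node loop) { /* OSR check, see later */ } }

    final class BreakException extends RuntimeException { /* may carry a label */ }

    final class IfNode extends Node {
        Node test, pass, fail;   // fail is null when there's no else branch
        @Override Object interpret(Frame frame) {
            Object result = Undefined.INSTANCE;
            if (toBoolean(test.interpret(frame))) {
                result = pass.interpret(frame);
            } else if (fail != null) {
                result = fail.interpret(frame);
            }
            return result;
        }
    }

    final class WhileNode extends Node {
        Node test, body;
        @Override Object interpret(Frame frame) {
            while (toBoolean(test.interpret(frame))) {
                try {
                    body.interpret(frame);
                } catch (BreakException e) {
                    break;   // a labelled break for an outer loop would be rethrown
                }
                frame.onBackEdge(this);   // bail to the JIT if this loop gets hot
            }
            return Undefined.INSTANCE;
        }
    }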
If we do an addition, we can use ScriptRuntime add and unbox; it doesn't matter that the interpreter takes time executing, as long as we keep its logic small, because no matter how much time it takes, it's orders of magnitude faster than generating even pessimistic bytecode, installing it, and throwing it away. This enormous glob of code here is what it takes to link one simple setter, a JavaScript setter, and it in turn has several sub-functions. We can't reimplement logic like that in the interpreter, but we don't have to: we can just call the same link logic and do the same thing we would do if we saw an indy. Of course it takes time to look up a call site, and it creates call sites and method handles behind the scenes, but there are various tricks for that as well. So it blends pretty well with the runtime logic, without generating the bytecode. If I'd had to reimplement all the control logic for everything, this wouldn't have been feasible, but the interpreter is actually pretty slim: there are a few interfaces that basically add get and set methods to the things that have getters and setters in JavaScript; otherwise, I implemented the interpret methods, and it didn't take that long. What did take a while was getting all the special cases right, so it could be compliant. Here's an accessor; again, I apologize for my terribly small fonts.

If you do a get of a.x, which is an AccessNode, this lookupGetter thing creates a call site, and the call site is stored in the AccessNode and reused as long as it isn't invalidated. The lookup uses all the same link logic as Nashorn uses in the static code paths with bytecode. So it ends up being relatively little new code, and for most logic we just use what already exists. And even though add is implemented with the full sledgehammer, ScriptRuntime add for objects, we narrow the types back, so we stay in control of types and can store a type profile while interpreting.

Very early in the project we noticed that even though linking call sites is probably the slowest interpreter operation, startup is still significantly faster than going through a bytecode layer. Executing interpreter code is something like five to a hundred times slower (it's a very jagged graph) than executing optimized, warmed-up jitted code, so we can't stay in the interpreter for long. But that was never the purpose. The observation was: if we get the types right from the beginning, without generating any bytecode or incurring the class installation overhead, we win. We need to transition fairly quickly to jitted code, and time to steady state must not suffer either: even if we start executing faster, we still have to stabilize to optimized jitted code very quickly. So it seemed intuitive to collect the initial type profile and then get out of there. Right now we use the number of invocations as the only JIT heuristic, with no explicit bytecode profiling: we have a return value filter on the interpreter trampolines that counts how many times a function has been invoked, and after something like five times we JIT it. By then we already have a type profile mapped to the program points, which are the same for the AST whether we generate bytecode or interpret it, so it's deterministic to test. And we're aggressive about transitioning to jitted code.

Loops are the special case, because you can't keep executing a hot loop in the interpreter; you'd never get anywhere. There we use a rewrite exception, the same concept already implemented in Nashorn for a failed type guess. Back edges are program points now too; they fit really well into the program point framework, and we throw a rewrite exception if we execute a back edge too many times. It counts as an optimistic operation that fails, and we transition to jitted code by taking a continuation, with already-existing code. All of this machinery already existed in the optimistic JIT, so it was quite simple to plug in. We reuse the program points, as I said, and loop nodes are now optimistic: they implement the Optimistic interface, which means they have a program point attached.
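A sketch of those two hotness triggers; the thresholds and all names are invented for illustration, not Nashorn's actual values:

    // Simplified stand-in: carries the continuation state out of the loop.
    final class RewriteException extends RuntimeException {
        final int programPoint;
        final Object[] frameSnapshot;   // local variable state for the continuation
        RewriteException(int programPoint, Object[] frameSnapshot) {
            this.programPoint = programPoint;
            this.frameSnapshot = frameSnapshot;
        }
    }

    final class Hotness {
        private static final int JIT_AFTER_INVOCATIONS = 5;     // illustrative threshold
        private static final int OSR_AFTER_BACK_EDGES = 1000;   // illustrative threshold
        private int invocations;
        private int backEdges;

        // Checked by the trampoline's return value filter on each call.
        boolean shouldJitOnInvoke() { return ++invocations >= JIT_AFTER_INVOCATIONS; }

        // Called on every loop back edge; a hot loop "fails" like a bad type
        // guess and transitions to jitted code via a continuation.
        void onBackEdge(int programPoint, Object[] locals) {
            if (++backEdges >= OSR_AFTER_BACK_EDGES) {
                throw new RewriteException(programPoint, locals);
            }
        }
    }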
There are some technicalities around stack traces and security. Stack traces aren't a special case in jitted code, because we generate Java bytecode with line numbers and variables and everything, and it looks really pretty: just throwing an exception shows the correct JavaScript methods in the stack trace, mixed with the runtime's Java methods. But if we throw an exception from code executing in the interpreter, of course we wouldn't see where the script code is; we'd just see a lot of Node.interpret frames. So we need to rewrite stack traces thrown from the interpreter so they automatically look like the JavaScript code, and that's perfectly doable; it's just bookkeeping. If you have g, which calls f, which prints a, and a doesn't exist, the jitted code throws a ReferenceError when it gets to a, and the red parts (again, I apologize for the really small font) are three JavaScript methods in the stack trace, with the script file and line number, like r.js line 75. Doing this naively in the interpreter won't show you any JavaScript code: you'll see several Node.interpret and node-execute frames, and it would be completely impossible to identify where the exception came from. But we just trap any exception on its way out of the interpreter and rewrite it, because we have the frame, the context of what the interpreter and what jitted frames were executing (both can be on the stack at the same time, of course), and replacing the interpreter sections of the trace with JavaScript line numbers only happens on exceptions, so it's pretty simple to do. Some native classes also need access to the interpreter frame for stack trace reasons, because you can manipulate stack traces from error objects in JavaScript, at least in Nashorn; so the Nasgen tool, which processes all those annotations on the native classes, has another annotation for passing the frame object in the special cases where we need it.
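A rough sketch of that rewriting, heavily simplified: real bookkeeping has to collapse runs of interpreter frames into the script frames the interpreter recorded, while this sketch naively substitutes one for one. The class and the frame-matching heuristic are illustrative only:

    // Replace interpreter noise in a stack trace with JavaScript file/line
    // frames, e.g. new StackTraceElement("<js>", "f", "r.js", 75).
    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;

    final class StackTraces {
        static void rewrite(Throwable t, List<StackTraceElement> scriptFrames) {
            List<StackTraceElement> out = new ArrayList<>();
            Iterator<StackTraceElement> scripts = scriptFrames.iterator();
            for (StackTraceElement e : t.getStackTrace()) {
                if (e.getClassName().endsWith("Node") && e.getMethodName().equals("interpret")) {
                    // Simplification: substitute the next recorded script frame.
                    if (scripts.hasNext()) out.add(scripts.next());
                } else {
                    out.add(e);   // keep genuine runtime and jitted frames as-is
                }
            }
            t.setStackTrace(out.toArray(new StackTraceElement[0]));
        }
    }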

When it comes to security, we need to use method handle lookups of the right privilege for all call sites that are reachable from the scripts package, where the jitted code resides. If we try to use the lookup from the IR package, where the interpreter is implemented, it is too privileged. So there's a special case: for every source file we enter, we need to provide a method handle lookup with the right security level, so we never use a truly privileged lookup, which would be completely insecure. What we do is, when we enter a new source file that we haven't seen before, the trampoline generates a small bytecode layer, a small bytecode "wormhole" method as we call it, that writes the correctly privileged, scripts-package-level lookup for that source file into the source, and we use that in the interpreter. As far as I can tell, we can't escape that privilege domain. The bytecode is very simple: the wormhole class generated on entering a new source file just marshals the arguments to the real interpreter method and makes sure there is a lookup installed with the correct privileges.

There are various optimizations you can do; I'm almost wrapping up here. For instance, we can cache call sites: when we have a call site for a get or a set or a call, we store it in the node where we looked it up. But we can also reuse the same call site across the runtime in ways we can't in jitted code. For example, if several nodes all call g, we can have a scope cache noting that it's the same scope and that no invalidations and no side effects have taken place, so we don't need to do another indy lookup, the slow linking of g, at the second call site. We can do nifty little tricks like that and save some interpreter time, which is mostly link time. I messed this caching up just before I left Oracle, so I think Attila will clean it up and make sure that linking, which is the main bottleneck but also the main reason we hardly had to write any new code for this, works well.

The other thing is the program points; they're a problem. When we JIT, in our vanilla bytecode case, we have to split large methods, inline finallys similarly to what javac does, and so on, so we can generate correct bytecode. If you're interpreting something, you don't need to generate bytecode, so you don't need to split any code or inline any finallys. The problem is that to naively reuse the program points for OSR-ing, jumping into the JIT from a loop, you need the same program points as you would have if you'd prepared the AST for jitting; otherwise you can't map a type, or any other information, to a program point. So we'd need to run a lot of passes that are unnecessary for something that is only ever interpreted and never moves to the JIT stage; program points are assigned very late, and given the current representation of a program point, which is a position in the transformed AST, we run a lot of transforms the interpreter doesn't care about just so we can identify a program point in jitted code later, when the code might be large or split or whatever. I've experimented a little during the summer with a fuzzier program point representation (it's not complete) using the index of the atom within its expression, plus the position in the source, as a tuple identifying a program point. In that case the AST doesn't have to look exactly the same in interpreter mode and JIT mode, but we can still identify a program point, because it has the same index and the same numbered location in the source.
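A sketch of that fuzzier identity as a small value class; purely illustrative, since the talk describes the representation as experimental and incomplete:

    // A program point identified by (atom index within its expression,
    // source position of the statement), rather than by a position in a
    // heavily transformed AST, so interpreter-shaped and jit-shaped ASTs
    // can agree on identities without running the same transform passes.
    final class ProgramPoint {
        final int atomIndex;       // which atom within its expression
        final int sourcePosition;  // character offset of the statement in the source
        ProgramPoint(int atomIndex, int sourcePosition) {
            this.atomIndex = atomIndex;
            this.sourcePosition = sourcePosition;
        }
        @Override public boolean equals(Object o) {
            return o instanceof ProgramPoint
                && ((ProgramPoint) o).atomIndex == atomIndex
                && ((ProgramPoint) o).sourcePosition == sourcePosition;
        }
        @Override public int hashCode() { return 31 * atomIndex + sourcePosition; }
    }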
There are some issues, but I think it's doable, and then I can avoid running a lot of code transformation passes, like splitting code and computing finallys, just to get the AST into a form where program points can be identified; the interpreter doesn't care about them, and a try/finally can simply be interpreted as a try/finally in that node. There's another strength in having two execution environments: we can do background processing, bootstrapping one with the other. We can do speculative jitting in the background, even multi-threaded speculative jitting using futures. I played a little with simultaneously generating several bytecode versions for different parameter types, using futures in the background. Again, this adds bytecode installation overhead and system dictionary lock pressure, but it's interesting that now that we have a bootstrap layer, a lot of things that used to be synchronous and dependent on each other are freed up.

The current results: the tests are clean, and we are JavaScript-compliant both interpreter-only and in mixed mode going to the JIT. Startup performance is significantly better than before, and the initial footprint and code generation time are much lower. It's important to get to the JIT quickly, or it takes longer to reach steady state with good performance, and the type info is usually correct for any program point at the transition; I haven't measured that exhaustively, but it passes all the test suites. A JEP is coming and is moving through the process; I guess it will be made public shortly.

If we look at bytes of bytecode generated during startup, there's a little green bar that's almost not there anymore: during startup, getting to the same position, we're almost exclusively interpreting, for the same workload, the same Octane benchmarks I showed you at the beginning. The little bytecode we do generate is object shapes, classes with a certain number of fields and things like that, but no code, not this early in the game. (Crypto has actually started handing over to the JIT by the time we reach steady state, but that's the only one.) The number of classes generated also shrank quite significantly, far below what we had even before optimistic types; well, that's not so hard, since we're interpreting. And startup time: it looked like this, which is what I showed you at the beginning, and now it looks like this. We're actually better than we were without optimistic types at getting to steady state, so we've paid back, amortized, more than the cost of implementing optimistic types. Even better, with my experimental fuzzy program point implementation, not running the splitting transforms and the other transforms and identifying program points with a slightly more complex algorithm, we pay it back with interest: the purple bars are low, and in several cases we're actually more than twice as fast at starting up and getting to the same steady state than we were even before optimistic types. That's pretty cool, given the relatively small amount of code this interpreter project contains.

Ongoing and future work: I'm doing some things for execution overhead that the interpreter enabled. I'm adding profiling to call sites and allocation sites. If you have JavaScript code that returns a new array a million times and then immediately pushes a string to the array, current Nashorn will represent that array internally, optimistically, as an int array, and then it will throw that away, create an object array, and do an arraycopy. So there's the overhead of a million thrown-away int arrays in this example. With the interpreter, I know where the allocation site for the array is when its type changes, and I can tag that allocation site for the JIT: no use guessing, make an object array here immediately. That saves a factor of six or so on this benchmark, just from not making the wrong type guess a million times. So I can use the interpreter to create allocation hints for the JIT, very simply.

I can also do partial compilation: I can compile only hot nodes; I don't need to compile an entire method. I can take an add node and turn it into a method handle that does just the add, and I can specialize it, just as I did with CompiledFunctions: I can have an add for ints as a method handle with that signature, or an add for an int and the constant 17. So I can save partial method trees as method handles in the nodes; think of each node as having a small cache of already-generated versions of itself. You can build a jitted function by combining lots of method handles instead of generating one massive blob of bytecode, and you can fold in mathematical constants, so you basically get partial evaluation with this approach as well. We don't need to compile an entire large method.
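Going back to the allocation-site hints for a moment, here is a rough sketch of the bookkeeping, reusing the illustrative ProgramPoint key from the earlier sketch; the names and the exact widening lattice are made up for illustration:

    // The interpreter records how wide an array allocated at a given site
    // eventually became; the JIT consults this to allocate the right
    // backing storage immediately instead of guessing int and re-copying.
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    final class AllocationSites {
        enum Storage { INT, DOUBLE, OBJECT }   // ordered from narrowest to widest

        private final Map<ProgramPoint, Storage> widest = new ConcurrentHashMap<>();

        // Called by the interpreter whenever an array's storage widens.
        void recordWidening(ProgramPoint allocationSite, Storage to) {
            widest.merge(allocationSite, to,
                (a, b) -> a.ordinal() >= b.ordinal() ? a : b);
        }

        // Consulted by the JIT when emitting the allocation.
        Storage initialStorage(ProgramPoint allocationSite) {
            return widest.getOrDefault(allocationSite, Storage.INT);
        }
    }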
Building functions out of individual, reusable method handles, with the usual disclaimers about profile pollution, gives us a sort of Truffle-style behavior without a modified VM: we interpret, save various profiles, and keep what has been jitted as partial method handles. That's a pretty cool thing that we should look at. And now that Valhalla is coming, we can do the peeks and pokes safely. VarHandles for getters and setters mean we can finally get rid of the dual-field storage and keep a primitive a primitive for its whole lifetime; we can get rid of bounds checks, for example for our internal spill pool, which is an Object array or a long array in the script object; and we can implement JavaScript typed arrays with VarHandles much faster than before. This looks very promising; the stuff I did at JCrete with the sandbox branch looked very good initially, not to hype it, but I'm sure Aleksey will point out what the problems are. We're also planning some JFR integration, probably in the future, not going on right now, but we could provide events and JFR integration for any dynamic language toolbox that Mission Control can consume. And I definitely want to experiment more with parallelism: with more cores we can do more speculative jitting in the background, and bootstrapping like that. And we can improve and speed up java.lang.invoke; there's still boxing going on there that shouldn't be, in

special cases. Compile units, still: when we generate bytecode, the smallest compile unit in Java is a class with one method in it. And class dynamic, sure, nothing is happening there right now, but I think the possibility of smaller compile units may spring from Valhalla, which would be very useful. I know Michael Haupt has been experimenting a little with this, with some kind of constant pool trick, and maybe we can get rid of weaving an entire class whenever we want to inject a very small amount of code into the JVM. Research continues on the multi-language framework when we have time; we're a bit understaffed, but hopefully we'll have time. We did get time to do the thesis, which should be public (you can Google it), where we added a TypeScript front end to Nashorn, which was fairly pain-free; other pluggable front ends should be investigated too.

So I guess this puts an end to JVMLS this year. It's been an interesting spring, with the partial method combination and the interpreter, and the insight that it was relatively little work to get there and that we could pay off the optimistic types so well that we're actually starting faster than we did even before we had them. So: questions, demos, beer time. If you have any questions, including ones you think of later, you can tweet them to me or email them to me; my current email is marcus at lagergren dot net. And, as an homage to Rémi, who isn't here, I guess we should end with a unicorn chill-out session. That's it. Thank you.