Video: Scala + GWT
GRZEGORZ KOSSAKOWSKI: Thanks for the introduction. So I'm from the University of Warsaw. I've been working here at Google Atlanta for about three months, kind of changing projects to make Scala in GWT work better. So first I chose to start with a question: why Scala in GWT, why would anybody care? So Scala brings a lot of development techniques to Java developers. And it means functional programming, improved object orientation, traits and mixin composition. And at the same time it works very well with existing Java libraries. And on the bottom you can see a teaser of what you can achieve in Scala.
This is a real code snippet I've taken from [INAUDIBLE] of GWT. So I didn't make it up. It's real code. And I will explain later on what's the real difference, what's happening on the right-hand side and why you get shorter code too. You can read this. So that's one reason to work on integration of Scala and GWT. So the basic task is to connect two compilers. And this sounds challenging, but if you try to approach this problem you immediately see that both compilers are working with abstract syntax trees. So you know the structures these compilers are processing. And so essentially what you need is to translate from one AST to another AST, from the Scala internal AST to the GWT internal AST. But there is a problem actually because with [UNINTELLIGIBLE] right? It's like both projects don't want to make the AST a part of their API. They want the compilers to evolve. And if we took the direct approach of just writing code that translates from one AST to another, we would introduce a direct tight connection between these two projects, and it introduces a hassle: you have to coordinate the releases, you have to coordinate changes in APIs, and none of those projects, none of the Scala team, are working with [UNINTELLIGIBLE] on that.
And there is actually an interesting observation that you have this situation right now with JDT. So Java support in GWT is implemented in exactly this way: you have the JDT compiler in GWT and you have a translation from [UNINTELLIGIBLE] JDT data structures to GWT data structures. But the difference is that Java is evolving much slower. So if there is a change, like a new release of a compiler, it's like every few years, and with Scala it's every year or probably even more often. So we need a loose connection, right? We still want to translate from one data structure to another but with some loose connection. So we need a stable API that both projects can target. And this is independent in some manner from both projects. And since we are working with compilers it makes sense to introduce another language.
Compilers are working with languages. So we can introduce another language. And the language we're introducing during this summer is called Jribble. And this is our stable API. Before I go into describing what Jribble looks like and the details of Jribble, we should first answer a question: what about Java source code? This is a language which GWT is supporting, so why shouldn't we just translate Scala to Java and not even bother with changing GWT at all? But there are serious problems with this approach. So the first bullet is important. I mean it would be so beneficial to have a direct translation from Scala to Java. But if it was possible or if it was easy then someone would already have implemented it. Right? And there were actually attempts to do that in the past. There are two examples– very good examples– where translation is really problematic. One is that in Scala you have unchecked exceptions, all of them.
So you can throw any exception and you don't need to declare this in your method signature. And as everybody knows you have to declare exceptions you are throwing in Java. And you can imagine some kind of program analysis that tries to recover this information– which exceptions you are throwing– but this would be hard. It's a rather hard problem. And the second problem, which is even more serious, is that constructor calls don't follow Java rules. And it means that you can't always have a super constructor call as the first statement in the definition of your constructor. OK? And there is no way to work around this issue. It's kind of an artificial Java restriction, and I believe the motivation behind this rule is to protect programmers from making mistakes– big mistakes. But if it's a compiler which generates these calls it should be fine, right? And the reason why you need these calls is mixin composition, I believe.
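A minimal Scala sketch of the two points above, not taken from the talk or its slides. The names `UncheckedDemo`, `readConfig`, `Greeter` and `Widget` are hypothetical. It shows a method that throws an `IOException` without declaring it, and a trait whose initializer runs as part of constructing the class that mixes it in.

```scala
import java.io.IOException

object UncheckedDemo {
  // No `throws` clause is needed: Scala does not check exceptions at
  // compile time, so callers are not forced to catch or declare anything.
  def readConfig(path: String): String =
    if (path.isEmpty) throw new IOException("empty path")
    else s"config from $path"

  // The body of a trait is initialization code. When Greeter is mixed
  // into Widget, this initializer runs while a Widget is being built.
  trait Greeter {
    val greeting: String = "hello"
  }

  class Widget(val name: String) extends Greeter
}
```

A Java translation of `readConfig` would need `throws IOException` on the method and on every caller that does not catch it, which is the information the speaker says would have to be recovered by analysis.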
So you can try it and it keeps mixin composition in Scala. And this is a really serious problem. So that's why this approach didn't work. But there is another common language, which is called Java bytecode. And again everyone on the Java platform is targeting Java bytecode. So what about this as a stable API? And it doesn't work either, because basically Java bytecode is very low level and it doesn't carry the structure of your program. So you can have an example of an expression [INAUDIBLE] and nested expression, and it's being translated to a list of very simple operations. And we need this structure in GWT. GWT relies on it when doing optimization and translating programs into complex JavaScript. It just relies on having this structure. And another problem is arbitrary control flow, which is found in bytecode.
And Jribble adds method and field signatures when you reference them. For example when you want to call a method, you give the whole signature including all types. And this is needed because we want to know the exact coordinates of every method we are calling without parsing and reading all the files. Again, say I have a method defined in one class and I'm calling it from another class. At the call site, I want to know exactly which method I'm calling. If you have a situation with [UNINTELLIGIBLE] methods for example, you don't know which method you are calling, right? So you need the full signature to know which methods you are calling. And yes, Jribble files are verbose enough that they can be parsed separately. So you parse them in an independent way. So this is the architecture of my project, and the bottom thing– Java to gwtc– was there before I started hacking on my stuff.
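To illustrate why a full signature at the call site removes ambiguity, here is a small sketch. It is not Jribble syntax: `MethodRef`, its fields and the rendered form are hypothetical, and the short type codes are only borrowed from JVM descriptors for brevity.

```scala
object SignatureDemo {
  // Hypothetical model of a fully qualified method reference. The field
  // names and the rendered form are assumptions, not actual Jribble syntax.
  final case class MethodRef(owner: String,
                             name: String,
                             paramTypes: List[String],
                             returnType: String) {
    def render: String =
      owner + "." + name + "(" + paramTypes.mkString(";") + ")" + returnType
  }

  // Two methods share an owner and a name; only the parameter types
  // carried in the reference tell them apart.
  val printInt: MethodRef =
    MethodRef("java/io/PrintStream", "println", List("I"), "V")
  val printString: MethodRef =
    MethodRef("java/io/PrintStream", "println", List("java/lang/String"), "V")
}
```

With the parameter types recorded at the point of use, a reader of one file can tell the two `println` references apart without loading the class that declares them.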
And I introduced the fragment on top. And it consists of two pieces. So you have a Jribble backend on the scalac side, which means I implemented a new backend for the Scala compiler which, instead of generating Java bytecode, generates Jribble. OK? And then on the GWT side I implemented a way to parse these files in GWT and build AST nodes out of these files. And the thing in the center is the Jribble library. So it basically defines the Jribble language and it defines a parser for the Jribble language and things like that. So I thought it would be beneficial for you to have an idea of what Jribble looks like, what Jribble code looks like. So as you can see it's very verbose. Everywhere you reference or define a class you always have a fully qualified reference. We replaced dots with slashes for references. And you can see the method signature in this line before the parameter list. So that's what Jribble looks like. And now I will go into more details on each side: on the Scala side, the Jribble side and the GWT side, what I was really doing there. So on the scalac side, it was just another backend.
So Scala has a JVM backend which compiles Scala code into bytecode. And it has a backend to target [UNINTELLIGIBLE] bytecode, which I don't know the status of. And the nice thing about the fact that this is another backend is that it's isolated from the rest of the compiler. So I'm not really messing with internal parts of the compiler. At the last phase of compiling I just plug in my new backend and then I create output in the form of Jribble. So that is an important property, because if the Scala compiler guys are going to accept my code they want to be sure that I don't really mess with their own functionality, that I don't introduce new bugs and things like that.
And this backend includes both transformations and printing of trees, like abstract syntax trees. Which means I first need to normalize trees in a way that they are more similar to Java constructs. So for example in Scala everything is an expression. Like a block is an expression. An if statement– which is a statement in Java– in Scala is an expression. It's actually a nice replacement for the conditional operator in Java. So then you have to normalize all these things so that they become statements again. So this is the transformation part. And the printing is really simply formatting the trees in the form of Jribble. And another thing I had to implement was extending the scalac internal testing framework, which is called partest. They have their own testing framework and I had to add support for Jribble. And the whole thing is implemented in Scala, which is nice to work with. And the challenges I faced on the scalac side are, as I mentioned, that many expressions in Scala are not expressions in Jribble. So you have to really think carefully how to translate these constructs to valid Jribble constructs. And the second problem is really, really difficult and I don't have any good answer.
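The kind of normalization described here can be pictured with a small hand-written pair of methods. This is only a sketch of the idea, not output of the Jribble backend; `ExpressionDemo`, `labelExpr` and `labelStmt` are hypothetical names.

```scala
object ExpressionDemo {
  // Scala style: `if` and blocks are expressions and yield a value directly.
  def labelExpr(n: Int): String = {
    val kind = if (n % 2 == 0) "even" else "odd"
    val text = { val prefix = "n is "; prefix + kind }
    text
  }

  // The same logic in statement form, roughly what a Java-like target needs:
  // the variable is declared first and assigned inside each branch, and the
  // block is flattened into the enclosing method body.
  def labelStmt(n: Int): String = {
    var kind: String = null
    if (n % 2 == 0) { kind = "even" } else { kind = "odd" }
    val prefix = "n is "
    val text = prefix + kind
    text
  }
}
```

Both methods return the same strings; the second one only uses `if` and blocks in positions where Java would accept a statement.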
So that is pattern matching logic. Pattern matching is very powerful in Scala. But the internal implementation can emit arbitrary jumps. So it means you have some form of goto instruction that is being used to implement pattern matching– and again, I have the same problem. I cannot emit goto expressions in Jribble. And I tried some tricks to rewrite gotos into method calls and things like that, but I'm still not sure how reliable this is, and this can lead to stack overflow exceptions and things like that. So this is a really difficult part. And Scala has unification of types, which means basically– first of all, everything is an object in Scala. And there are two types which are called Nothing and Unit. Unit is similar to void in Java, but in Java it's not really a type because you cannot have a variable of type void.
And in Scala you can have a variable of type Unit. And Nothing is probably– OK, if you know what a bottom type is, then this is Nothing. If you don't know, I won't be going into details because you need some time to explain it. But a manifestation of this unification is the expression at the bottom there. You see there true ||– and you have an expression which is throwing an exception. And it actually has type Nothing. And another problem is that the or operator, both in Scala and Java, is lazily evaluated. OK? So it doesn't throw an exception because on the left-hand side you have [UNINTELLIGIBLE] so it doesn't really [UNINTELLIGIBLE] right-hand side and it just returns a result. Translating it to Java or Jribble needs some careful thinking and attention. It's a really tricky part. Actually I still have a lot of bugs around this kind of construct in the Jribble backend. So it's still not resolved completely. So that's all about the Scala side.
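A small sketch of the construct discussed above, written for illustration rather than copied from the slide; `NothingDemo` and `orThrow` are hypothetical names. A `throw` is an expression of type Nothing, so it type-checks on the right-hand side of `||`, and it is only evaluated when the left-hand side is false.

```scala
object NothingDemo {
  // `throw` has type Nothing, the subtype of every type, so it is accepted
  // where a Boolean is expected. `||` evaluates its right-hand side only
  // when the left-hand side is false.
  def orThrow(left: Boolean): Boolean =
    left || (throw new IllegalStateException("right-hand side evaluated"))

  // Unit is a real type with a single value, written (), so a variable
  // of type Unit can be declared.
  val unitValue: Unit = ()
}
```

Calling `NothingDemo.orThrow(true)` returns true without throwing, while `NothingDemo.orThrow(false)` throws. A translation to a Java-like language has to preserve both behaviours.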
And there's the Jribble side. So Jribble consists of a language specification. Right now the language specification is only the implementation. I don't have a separate specification. I plan to have one but I didn't have time to define it. And Jribble is a library. So it defines AST nodes, parsers, printers– things you usually want for a language. And it's written in Scala, which was, again, fun to work with. And the last point is for people interested in or familiar with Scala. I'm using the Scala library for parsing. And you have an example of what parse rules look like. And this is just Scala code and it's a DSL, so it's a nice library too. And you can still read what's happening. It's a parser for parsing a while statement. And I was actually very, very happy when I was using this library. I think it's a really, really good one.
Just to comment on that. And there was actually an interesting problem of how to test parsers. A language like Jribble is actually complex. So if you look at the grammar, it's complicated. So then there is a challenge of how to really test the parser– whether it's parsing everything correctly. Writing the usual unit tests is really, really tiring. I would have to write hundreds of unit tests, and then I'd parse a fragment of source code and I would need to create all the AST nodes by hand in the code to check if things are equal. It's really, really tiring and I probably would give up after 10 tests. So I decided to go down another road. And if you think about what a parser is, it's basically a function from a string to an AST node. And you can define the printer, which goes in the opposite direction. It takes an AST node and it prints it to a string.
And you can compose these two functions. So you take the parser– sorry, the printer– and then the parser, and what you should get is just the identity function on AST nodes, OK? So once you have this observation, what you need is lots of AST nodes, just to test the equality of functions. And I'm using the ScalaCheck framework to generate lots and lots of AST nodes. And using this approach I've got more than 3,000 tests, random tests, for the parser. And this approach turned out to be very, very beneficial. I found many times during the summer that I had to refactor and change the grammar of Jribble, first of all because I didn't think of some special case or special need. So I had to refactor like half of the grammar of Jribble, and as a consequence I had to refactor the parsing code. I wasn't really afraid of doing that because I had so many tests and they were catching every mistake you can imagine. Random testing is very good at boundary conditions that you usually don't think of, or where you are too lazy to really test every boundary condition. And really often there are bugs waiting for you at these boundary conditions.
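The round-trip idea can be sketched on a toy language. This is not the Jribble AST or parser, and it uses `scala.util.Random` as a stand-in for ScalaCheck generators so that the block needs no extra library. All names (`RoundTripDemo`, `Expr`, `Num`, `Add`, `printExpr`, `parseExpr`, `gen`, `roundTripHolds`) are hypothetical.

```scala
import scala.util.Random

object RoundTripDemo {
  sealed trait Expr
  final case class Num(value: Int) extends Expr
  final case class Add(left: Expr, right: Expr) extends Expr

  // Printer: AST node => String. Every Add is fully parenthesised.
  def printExpr(e: Expr): String = e match {
    case Num(v)    => v.toString
    case Add(l, r) => "(" + printExpr(l) + "+" + printExpr(r) + ")"
  }

  // Parser: String => AST node, by recursive descent over the printed form.
  def parseExpr(s: String): Expr = {
    var pos = 0
    def expr(): Expr =
      if (s.charAt(pos) == '(') {
        pos += 1 // consume '('
        val l = expr()
        pos += 1 // consume '+'
        val r = expr()
        pos += 1 // consume ')'
        Add(l, r)
      } else {
        val start = pos
        while (pos < s.length && s.charAt(pos).isDigit) pos += 1
        Num(s.substring(start, pos).toInt)
      }
    val result = expr()
    require(pos == s.length, "trailing input")
    result
  }

  // Random AST generator standing in for ScalaCheck generators.
  def gen(rnd: Random, depth: Int): Expr =
    if (depth == 0 || rnd.nextBoolean()) Num(rnd.nextInt(1000))
    else Add(gen(rnd, depth - 1), gen(rnd, depth - 1))

  // The property: printing and then parsing must give back the same node.
  def roundTripHolds(samples: Int, seed: Long): Boolean = {
    val rnd = new Random(seed)
    (1 to samples).forall { _ =>
      val ast = gen(rnd, 5)
      parseExpr(printExpr(ast)) == ast
    }
  }
}
```

A call such as `RoundTripDemo.roundTripHolds(1000, 42L)` checks a thousand random trees. Because case classes compare structurally, no expected tree has to be written by hand, which is the saving the speaker describes.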
So I really loved it. And I was using Simple Build Tool, which is like a replacement for [UNINTELLIGIBLE], which is much, much better for building and running tests and everything. So Jribble challenges: the language must encode a lot of information. So normally what you have is source code, and the compiler is reading your source code, and once it has the whole source code in memory, it tries to recover the information that is not explicit in your source code. So for example, if you are calling a method, based on all that information it will calculate the coordinates of the method you are really calling. OK? Because it's not really explicit in source code which methods you are trying to call. And there are many, many other things like that.
Like you reference a field, and in the place where you reference the field you don't have information about the type of that field. OK? You have to go to the definition of the field to see the type and things like that. So it's very, very verbose. If you compile Scala's standard library to Jribble it's almost 60 megabytes of Jribble output. It's so huge. But it compresses very well. So you can go down to– as far as I can remember, with standard disk compression– nine megabytes. So it's so verbose and so repetitive. And actually the most significant reason for such a huge size is that we have fully qualified references everywhere, right, so you repeat this information all the time. And the grammar turned out to be complicated enough to make parsing challenging. Which means it's almost like Java with some things being removed. But I had similar problems to the ones people faced when they tried to parse Java. And the last point is that parser combinators were invented by the functional programming community. And they are really, really functional. Basically the parsers are constructed of lots of [UNINTELLIGIBLE].
The problem with that is that for a lot of Scala code, if you're compiling to bytecode, it maps to– like method calls are being mapped to method calls. And if you try to use Java tools for profiling, you're fine. You might get strange method names because the Scala compiler is doing method name mangling sometimes. But still you are fine. But for highly functional code, I tried profiling with [UNINTELLIGIBLE] and the result was really horrible. I couldn't get any useful information out of the results. Because what you really need is to profile how much time you spend on evaluating an expression and not really on calling a method. So you have a complicated expression and you would like to know how much time you spent on evaluating that expression. And Java profilers are not going to give you that information, and my approach will probably be to implement a special combinator for this parser and use speechwriter. So I guess it might be interesting for you.
I looked into that problem and it would be a good approach. So now the gwtc side. On the GWT side I just plugged in my Jribble library in parallel to JDT. So it's like a parallel path. And what I need to do is parse Jribble files, which is being handled by the Jribble library. All the parsing is being done by that library. So I get AST nodes in Jribble form and I need to translate them to GWT AST nodes. And Jribble is designed and defined to be as close to Java as possible, so translation is usually very, very straightforward. You just take one AST node and the translation is basically almost always one to one. And the nice thing about this approach is that, again, it's isolated from the rest of the gwtc internal data structures and functionality. And yes, that was the approach.
You define everything in Jribble and you have as small an impact on GWT as possible. So again, I really tried to keep it focused so it can eventually be merged into an official release, I hope. So GWT challenges: obviously I had to program in Java again. And that was quite painful. I don't like to go into details but it is. That's one point, but another point is that you have really a lot of these AST nodes. Really, really a lot of them. And every single case is straightforward, but due to the large number of cases you'd spend the entire day doing the same thing, which is tiring. And especially if you realize that you could do pattern matching and you'd get 10 times smaller code, once you've worked in Scala. And the third point is kind of a complaint, but I don't say I have a better answer to that. So the data structures in GWT work in such a way that you create like empty AST nodes and then in a second phase you add more information. And it's so easy to miss something.
Now you can see? Yep. I think the font size could be bigger. Ah, very good. Yeah, that's better. OK. So the problem is I don't see this on my screen. OK. Now I see. So the only difference between this demo and the demo I presented two months ago was that I had [UNINTELLIGIBLE] class to finish here for defining [UNINTELLIGIBLE PHRASE] OK? And the only progress I got is implementing that functionality on the scalac side and the Jribble and gwtc side to replace this with a lambda definition and with an implicit definition. So what's happening, for people who are not familiar with Scala, is that here I have the lambda definition. It says I don't care about the parameter because I didn't use it on the right-hand side. And so it's a function from ClickEvent to [UNINTELLIGIBLE] or Unit. So I'll pass a lambda function here, OK? To the bottom right obviously a button expects– oh, I cannot see– it expects a ClickHandler.
So what I do– and this is usually defined in some library wrapper or something you always reuse– is you define an implicit conversion from a function which goes from ClickEvent to Unit– and remember Unit is like void in Java– which returns a ClickHandler. And here you have this anonymous class definition with a method, and onClick will call the function you pass. So basically the Scala compiler works in such a way that if it sees something that doesn't type check– that type doesn't match– then it tries to find an implicit definition that will make the code compile, OK? And this is what's happening in this sample. And this works. So I can show you it compiles, hopefully. [TYPING] And while this is being done I can tell you that this is surprisingly hard to get right. Because you need to implement static fields. And actually I'm cheating here.
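The implicit conversion described above can be sketched as follows. This is not the demo code from the talk: `ClickEvent`, `ClickHandler` and `Button` here are simplified stand-ins declared inside the sketch so that it compiles on its own, and `fn2ClickHandler` is a hypothetical name. Recent Scala versions can also convert a lambda literal to a single-method interface directly, so the sketch passes a function value to make sure the implicit conversion is what bridges the types.

```scala
import scala.language.implicitConversions

object ImplicitDemo {
  // Stand-ins for the GWT types, so the sketch compiles without GWT.
  class ClickEvent
  trait ClickHandler { def onClick(event: ClickEvent): Unit }
  class Button {
    private var handlers = List.empty[ClickHandler]
    def addClickHandler(h: ClickHandler): Unit = handlers ::= h
    def click(): Unit = handlers.foreach(_.onClick(new ClickEvent))
  }

  // Implicit conversion: wraps a function in an anonymous ClickHandler
  // whose onClick calls the function that was passed in.
  implicit def fn2ClickHandler(f: ClickEvent => Unit): ClickHandler =
    new ClickHandler { def onClick(event: ClickEvent): Unit = f(event) }

  var clicks = 0
  // The parameter is ignored with `_`, as in the talk's lambda.
  val countClick: ClickEvent => Unit = _ => clicks += 1

  val button = new Button
  // A function value is passed where a ClickHandler is expected; the types
  // do not match, so the compiler applies fn2ClickHandler.
  button.addClickHandler(countClick)
}
```

After `ImplicitDemo.button.click()`, the counter `ImplicitDemo.clicks` has been incremented once, without any anonymous class appearing at the call site.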
It still looks like you take the Scala library, because it needs the Scala library, which defines functions and things like that. So it looks like the Scala library is being compiled and I'm just using the Scala library and everything worked. I had to manually remove some things because, surprisingly enough, Scala function definitions depend on Scala collections, which I didn't even realize. And Scala collections use all the language features you can imagine. They are so advanced that they simply use everything. And yes, here we've got the result of compilation and it's called example. [TYPING] OK. And hopefully nothing will break. I click a button, I get [UNINTELLIGIBLE] from Scala, right? [APPLAUSE] This is the only thing I've done. Anything else more sophisticated really needs Scala collections, OK? You need a list or a map or something to get anything done. And then you pull in all the collections from Scala and this is– so basically you need 100% of all Scala features to get anything running.
And I will switch back to the slides and just comment more on that. So is there anything more? There is no gradual progress in this project, which is really, really disappointing, OK? It's been like two weeks and there is nothing running completely. Oh, just a bunch of unit tests. And that's all. And as you can see, two months later I have almost nothing better running in terms of GWT applications. And I already said that the Scala collections library exploits every language feature. So where are we at? What is the picture of this project? And if we recall our diagram, I've got most things implemented on the left-hand side and in the center of this diagram. Which means I can compile the entire Scala library into Jribble with some little bugs that I am aware of. But most of them I know how to fix. And the result is that I can parse 96% of the Jribble files for the standard Scala library, which corresponds to more than 4,400 files. So this is kind of a big achievement. I told you that it exploits almost all the language features.
I've got 4% remaining. And again, this is fun. I got 90% running quite quickly, and then every next percent is like a few days of work. OK, so for the 6% it was like two weeks to get this. And I would expect the 4% would be like a month to get done and correct. Because it involves the things which are most complicated: Unit, Nothing and pattern matching logic. And the missing part is the right-hand side of this diagram, which is getting everything, every possible node, translated. And I thought this was easy, but I discovered last night that I forgot that GWT doesn't support the entire Java standard library, right? And the Scala standard library in a few places depends on the Java standard library. And now the only thing you can do is write replacements of that functionality by hand.
It's the same way how GWT [UNINTELLIGIBLE PHRASE] in the standard Java library. And this is actually a lot of work. You have to go through every file and check everything, or create a framework that would be checking for you for things that are not allowed in GWT. Yeah, it's kind of a lot of work. To comment on the status of the project, I believe that two months of really good engineering work would bring you to Showcase working, but still inefficient, maybe with bugs, and with support definitely only from the command line. OK? So this is like two months of engineering still missing. So, I mentioned Showcase many times, so I think at the end of my talk I would like to go through it quickly. It's like a big application showing lots of GWT features. And it's complicated enough that you could imagine that if there was any point in this project then Scala should really shine.
OK? And there was an external contribution to my project. There was a person that started translating Showcase into Scala. And we will go through that. This is the fragment that I presented at the beginning of my talk. And what is happening here is you set up many options and you repeat the same method call all the time– do I have a laser thing? Yes. So what's happening is you call the same method for many options all the time, and the first parameter and the last parameter are the same. The only thing that is changing is the second parameter, OK? So it's kind of obvious that this should be refactored into a form where this repetition is removed. And with Java you probably could. You'd define another method. But nobody does it because it's too heavyweight. In Scala it's very natural. What's happening is you define a list of things you want to pass to the method. And then you say foreach and– you have this method call.
But as the second argument you put an underscore, which means you take every element of the list and pass it to this method, OK? And that's how you get much more readable code, and it really explains things. And another example– yes, I've already shown you getting rid of anonymous classes, which are a really annoying Java problem, and everyone agrees. If you have only a single method you have to implement then it's really annoying to have this definition. So with implicit conversions, you can easily get rid of that. And you can see– again, a fragment of Showcase– what it looks like. And the third example is that Scala has built-in XML support. So you can have XML literals mixed in freely with Java and Scala code. So instead of concatenating strings, which is bad for many reasons, you have a nice XML literal. And this is actually more secure, right? Because we all know the problems with concatenating strings. So not only is it more concise and better to read this code, but it's more secure. So this is like another really obvious benefit.
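The foreach-with-underscore refactoring mentioned at the start of this passage can be sketched like this. It is not the Showcase code: `OptionsBox` and `setOption` are hypothetical stand-ins for a widget whose setter takes three arguments, of which only the second one changes between calls.

```scala
import scala.collection.mutable.ListBuffer

object ForeachDemo {
  // Stand-in for a widget with a three-argument setter; not a GWT class.
  class OptionsBox {
    val calls: ListBuffer[(String, String, Boolean)] = ListBuffer.empty
    def setOption(group: String, value: String, enabled: Boolean): Unit =
      calls += ((group, value, enabled))
  }

  val box = new OptionsBox

  // Repetitive form, where only the middle argument changes:
  //   box.setOption("style", "bold", true)
  //   box.setOption("style", "italic", true)
  //   box.setOption("style", "underline", true)
  //
  // Refactored form: the underscore marks the argument that is filled in
  // with each element of the list in turn.
  List("bold", "italic", "underline").foreach(box.setOption("style", _, true))
}
```

The expression `box.setOption("style", _, true)` is a function of one argument, so `foreach` can apply it to each list element without a helper method being defined.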
And I believe that there might be much more. So this is something you can get without defining GWT APIs and with very little [UNINTELLIGIBLE] module, and everything will be working. But I believe if you create more sophisticated wrappers around the GWT API you would get even more benefits. So what's the future of this project? First of all, dev mode support. I didn't look into that at all. If you cannot have your code running and compiled from the command line there is no point in going into that. So I didn't have time to even think about it too much. So the obvious goal is to get Showcase running. It's big enough to really prove that you've got nice, decent coverage of features. And the third point is profiling and optimizing the Jribble parser.
Right now the parser is really, really slow, and it's not because it's functional and that adds some overhead, but because there is backtracking that I couldn't identify. It's horrible. I mean six minutes to parse the Jribble of the Scala standard library. And if these three things are implemented and done, I hope that merging the extensions into gwtc and scalac would be possible. And translating more samples to Scala would be beneficial too. And at the end of my talk I would like to thank a few people. So, Rob Heittman was the person who was taking care of infrastructure for me. So we had our own center for code review, which was [UNINTELLIGIBLE], and he was taking care of all this stuff, which was really beneficial to me. The second person, Aaron Novstrup, translated Showcase into Scala and again, I'm really, really thankful for that because I didn't have enough time to spend on that. And obviously Lex Spoon for hosting me, for answering 20 questions a day and being patient enough.
And I'd like to thank this company, because it's not like in every company you can work on that kind of project. And thanks for releasing so much good software under open source licenses. I really appreciate it. And at the end there are some pointers to the homepage, groups, code review. And do you have any questions?
AUDIENCE: What's the status of getting the Jribble backend into Scala?
GRZEGORZ KOSSAKOWSKI: I didn't talk to the Scala people. So I attended a conference by the Scala guys. It was in April this year. And I mentioned that I'd been accepted as an intern at Google. And they seemed to be very interested in the project. But since then I didn't really talk about merging that, because first you want something running. But I believe that shouldn't be a problem given the fact that I was really careful to not touch anything. It's really just adding a new package– like a Scala package with new code.
And there's almost no modification to the rest of the compiler. So I hope this will be accepted. But this is just my opinion. I didn't talk to those guys yet.
AUDIENCE: Also, your build step: you first run scalac with output to Jribble and then you run gwtc with a different front-end reader instead of reading the Java code?
GRZEGORZ KOSSAKOWSKI: Yes. So actually on the GWT side it can read Java and Jribble files at the same time. And this is very important, so you can have half of your application implemented in Scala and this will be compiled to Jribble. And half of your application will be implemented in Java and this will work well. So it's more like– I'm basically going through the classpath and I'm trying to find files with the Jribble extension, and then I fire up my own parser and all my logic for these files. And Java files are being parsed as they are right now in GWT.
AUDIENCE: So it's just the file name extension?
GRZEGORZ KOSSAKOWSKI: Yes. It's just the file name extension.
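The dispatch by file name extension described in this answer might look roughly like the sketch below. It is an illustration only, not gwtc code: `FrontEndDemo`, `FrontEnd` and `frontEndFor` are hypothetical names, and the literal `.jribble` suffix is an assumption, since the talk does not spell out the extension.

```scala
object FrontEndDemo {
  sealed trait FrontEnd
  case object JavaFrontEnd extends FrontEnd
  case object JribbleFrontEnd extends FrontEnd

  // Picks a front end per file. The ".jribble" suffix is an assumed
  // extension; files with any other suffix are not compilation inputs.
  def frontEndFor(fileName: String): Option[FrontEnd] =
    if (fileName.endsWith(".jribble")) Some(JribbleFrontEnd)
    else if (fileName.endsWith(".java")) Some(JavaFrontEnd)
    else None

  // Groups the files found on a classpath by the front end that handles them.
  def partition(files: List[String]): Map[FrontEnd, List[String]] =
    files.flatMap(f => frontEndFor(f).map(fe => fe -> f))
      .groupBy(_._1)
      .map { case (fe, pairs) => fe -> pairs.map(_._2) }
}
```

A mixed application would then send its Scala-derived files through the Jribble path and its Java files through the existing path, with both ending up as the same kind of AST.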
Any other questions?
AUDIENCE: In your slide about the difficulties of translating from Scala to GWT AST, I was curious at what level– this may be not a useful question, but I'm curious what level of the Scala compiler you're hooked in to. Because I would assume there's a more AST-like level and then there's probably something slightly lower level that's more appropriate for generating bytecode or something like that?
GRZEGORZ KOSSAKOWSKI: Yes. Actually the answer is complicated because Scala code– so before it reaches bytecode there is something called ICODE in Scala. So it has like normal AST nodes, then it translates them to ICODE and only then to bytecode. First of all, the Jribble backend is not only printing stuff to Jribble. It's really sophisticated logic which translates constructs which you cannot represent in Java in the right way to [UNINTELLIGIBLE PHRASE] constructs that can be represented in Java.
So if you would like to hook the Scala compiler directly to GWT then the question is where are you going to put this logic, right? And the only sensible answer is you really need to do all this work in Scala, because this logic is really sophisticated and implementing it in Java would be really, really tiresome. And then you need to add some code to the Scala compiler anyway. And still the Scala compiler is evolving very quickly. They really, really modify their AST nodes. So there would be a really big problem with synchronization of releases and things like that. And another benefit of having Jribble is that other languages can target it. Like you can imagine a Ruby compiler– a JRuby compiler– compiling to Jribble. So I think from the GWT point of view it's much more beneficial to have that kind of standard language that everyone can target. So Jribble right now expresses mostly Java constructs, sort of winnowing it down to something useful for this purpose.
OK? So it would be very, very straightforward for a good compiler to inline the whole thing directly into an anonymous class implementation and then skip the lambda thing completely. And you'd still have the lambda in your code but the compiler could easily optimize it. This is just inlining. But this is obviously only in special cases, when you don't rely on the fact that the lambda is really a class and you don't do something more with it. If you just take it and execute it then you can optimize it away. And probably most of the time that would be the case. Any questions? Maybe questions for [? Visit? ?]
AUDIENCE: So you were saying, and if I misheard you just tell me, but you said that 96% of the Scala standard library you could translate. Were you saying that from basically translating that into Jribble? Is that what you refer to?
GRZEGORZ KOSSAKOWSKI: Yes.
So it's still– it's not like GWT can handle that. The reason why it cannot handle that is, as I mentioned, I have the representation in Jribble but it still refers to the Java standard library, OK? So the missing part is to emulate all this code somehow. And this is not very difficult. It doesn't require any compiler knowledge actually. It's more like you have to go through all the cases and think of, I don't know– you have a three-line method in Scala so you'd have to replace it with something that you have in GWT. So things like that.
AUDIENCE: It's actually converting all of the Scala standard library into Jribble. He's only parsing 92% of it.
AUDIENCE: OK, I see.
GRZEGORZ KOSSAKOWSKI: Anybody else? So thanks for your attention, for coming, and thanks for hosting me here in Atlanta, and hopefully see you in the future. Thank you. [APPLAUSE]