Łukasz Langa – Thinking In Coroutines – PyCon 2016

(presenter) Good morning, everyone. (audience members) Good morning! (presenter) We are running a little bit late, so a short introduction. This is Łukasz Langa. He is talking about coroutines and the asyncio framework in Python. Please give a big hand and applause to Łukasz. [applause] (Łukasz Langa) Good morning, everyone. This is Thinking in Coroutines. My name is Łukasz Langa, I'm from the Internet; you can find me in lots of places. [laughter] In my free time, I'm helping Facebook run Python. Specifically, our mission for 2016 is to get as many projects as possible on Python 3. We are doing pretty well; you can check how far along we are at our booth, and I invite you to do that. Python is very popular at Facebook because Facebook loves async. There's a lot about asynchronous programming that just clicks with our minds for some reason. As you can see, Hack, the PHP variant that we're using, is both typed and faster, and it has async. The C++ libraries that we use to get as much out of the CPU power we have on our servers are also asynchronous in nature, and so is the RocksDB database layer that we also open-sourced. It just clicks with people at Facebook. They understand what it means, and they understand why it's cool. But why? What does async mean for a programmer?
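To make the payoff concrete up front, here is a minimal sketch, with made-up data-source names and sleeps standing in for real I/O, of fanning out two requests so the total latency is roughly the slowest source rather than the sum (it uses pieces of asyncio the talk introduces properly later):

```python
import asyncio

async def fetch(source, delay):
    # Stand-in for a real data-source call; sleeping simulates I/O latency.
    await asyncio.sleep(delay)
    return '%s data' % source

async def handle_request():
    # Fan out both requests at once; total time is max(delays), not the sum.
    return await asyncio.gather(fetch('users', 0.2), fetch('posts', 0.3))

loop = asyncio.new_event_loop()
start = loop.time()
responses = loop.run_until_complete(handle_request())
elapsed = loop.time() - start
loop.close()
```

Run sequentially, the two fetches would take about 0.5 seconds; fanned out, the whole request finishes in roughly 0.3.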
The easiest example is two variants of the same thing. We want to serve a response to a user requesting a webpage. If we are able to fan out the requests to data sources, the response is going to arrive sooner, so the latency is going to be better for the user and everybody is happy. One of the things that I learned at Facebook is that it's the latency that drives the experience, not necessarily the time it takes to complete an operation. So responsiveness is an important thing, in UI or mobile especially.

In Python, we didn't have a great story about this, and very often people would just say, "Come on, you can just spawn threads. It's going to be awesome." In fact, it's far from awesome. [laughter] The problem with programming with threads is that it's very, very hard to see the global state of your program. So not only is it hard to debug problems, it's also hard and complex to think about shared state. And obviously, we need locks. Speaking of locks, this also means that your threaded application might not behave as well as you would hope. The reason for it is, obviously, the global interpreter lock, which we still have until Larry actually does the gilectomy that he just started and announced at the language summit. So I have my hopes up, but currently the state is this. This is an actual slide from an internal talk I have given about the global interpreter lock problem. We were observing that the blue thread's operations were being executed increasingly rarely, and we didn't know why. It wasn't that they were actually doing more work, or that there was anything we were blocked on; for some reason we basically had a CPU leak, and that was, like, the strangest thing ever. What it actually turned out to be is that our red thread, which was absolutely uninvolved with anything interesting, just set up by somebody to do some logging of what is happening in the application, had a bug in it and grew a collection over which it was repeatedly iterating, which means it spent increasing amounts of time doing the same thing. And because of the global interpreter lock, everybody else had less time to do their actual work.

So my talk here is basically to say that you don't need all this. You can use asyncio, and that solves some of the problems. Actually, this is not a new problem, and this is not a new solution. Back in the early 2000s, before Guido was even in the United States, and before we had PEP 8, there was a project started by Glyph called Twisted, and Twisted was, at the heart of it, the same thing. But it was only after Greg Ewing added "yield from" to the language that it suddenly became apparent that we can have a really nice syntax to express what Twisted has been doing for all those years. Twisted was, at the time, sort of like PyPy is now: everybody was talking about it, but people were really afraid to take the leap and start using it. So the hope was that with better syntax, with a simpler way of thinking about it, we might actually be able to at least start writing simple programs in it. So, since "yield from," Guido has been really excited about adding this functionality to Python, and with Glyph's help, we arrived at asyncio.

So, what is it? This is it; that's asyncio at its very core. It is an event loop that calls callbacks. That's the most important slide in this talk, so really internalize it: there's no magic, nothing complex about asyncio. The real thing that you need to understand is that there is one event loop calling callbacks. What this means is that this is a framework that can maximize the use of a single thread. We can't achieve, inherently, any parallelism with it; it is more about concurrency, once we get to coroutines. But this is basically what it does. This sounds magical, so let me actually dive inside the code that does it to show you that it's true. When we use asyncio, we would do this: we would just say asyncio.get_event_loop(), which gets the event loop for the current thread, and we say run_forever(). And if you actually do this, what happens internally is that we are really running a loop. That's the code of asyncio right there. You can check it, it's open source, so I invite you to do that. But that example was a little annoying because it was just running nothing forever. So, what does the loop call? What it calls is whatever you want, right?
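For example, here is a minimal sketch of handing plain callbacks to the loop; the callback, its arguments, and the delay are made up for illustration:

```python
import asyncio

greeted = []

def hello(name):
    # A plain function; the loop will call it as soon as it is free.
    greeted.append('Hello, %s!' % name)

loop = asyncio.new_event_loop()
# Schedule the same callback three times with different arguments.
loop.call_soon(hello, 'PyCon')
loop.call_soon(hello, 'asyncio')
loop.call_soon(hello, 'world')
# After at least 0.1 seconds, stop the loop so run_forever() returns.
loop.call_later(0.1, loop.stop)
loop.run_forever()
loop.close()
```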
If we have any simple function that we would like to invoke, we can just tell the loop, "Please call it soon." The terminology of "calling soon" comes from the fact that the loop might already be running something; so, "as soon as you're free, please call the thing." On that very slide we are calling the same function three times with different arguments, and we are calling "later," at least two seconds from now, a call to loop.stop. So, this is what actually happens. Everybody is happy, and it looks like we achieved pretty much with very little code. This is awesome, and as I was saying, this is exactly what is executed on the event loop: a loop calling callbacks. No magic here. Obviously, the call to stop the loop after two seconds isn't busy-looping, waiting for the two seconds. The loop actually uses a selector provided by your operating system to be smart about which callback should be called next; it's not a first-in, first-out loop. But it doesn't really matter, right? That's a technical detail. The thing is, it just executes callbacks one after the other.

So, that all sounds great, but what if any of those callbacks were really slow? As I told you, it's just one thread. It's very simple: if any of those callbacks is slow, everything else coming after it is going to be slow. We can simulate that very easily by calling some operation that just waits for a long time. That might have been a URL call, that might have been a database call, anything of the sort. If we do this, then executing the same code we're going to see that everything else is waiting longer and longer, basically breaking the assumption that asyncio helps us with concurrency in any way. Fortunately, asyncio is helpful in this regard: if you set PYTHONASYNCIODEBUG=1 when you run your program, it will yell at you for having callbacks that are very slow to run. So at least we can be warned about a bad situation like this. How can we solve it? How can we split the work so that we are still executing this great amount of work, but without stopping anything else from being executed in the meantime? Well, obviously, the answer is coroutines. So, instead of just creating a plain old function, like in that previous example, we're creating a coroutine function. Basically we're saying "async def," and instead of just assuming some operation is going to be slow, we are explicitly awaiting on it.
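A sketch of that shape, with a made-up coroutine function; asyncio.sleep stands in for the slow operation, and run_until_complete (which the talk gets to shortly) drives it:

```python
import asyncio

async def slow_operation():
    # Explicitly await the slow part instead of blocking the whole thread;
    # this is a switch point where the loop may run other work.
    await asyncio.sleep(0.1)
    return 'done'

loop = asyncio.new_event_loop()
result = loop.run_until_complete(slow_operation())
loop.close()
```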

One distinction that I like to make is this: when we say "coroutine function," we mean the function that we have defined, which creates coroutines. A coroutine is an already instantiated call to a coroutine function. The reason why this is important is that a coroutine has state, and it's going to take a while until it's complete. So, whenever I say "coroutine function," I mean the thing that we put in the code; whenever I say "coroutine," I mean the thing that's already instantiated. By the way, asyncio.sleep is a coroutine function, and its instantiation is a coroutine, right? So this is it: how can we invoke coroutines with asyncio? It's also easy. We just create tasks on the loop, basically saying, "Please just execute this asynchronously." Create a task, and it's going to execute pretty well. If we do this, if we execute this code, we're back in concurrent land, everybody is happy, and we can go home, right? Specifically, what happens when we create a task is that asyncio creates an instance of a class called Task. So let's look inside: how does this work? It's actually pretty easy. Task's initializer does a few boring things, hence the ellipsis, but what it does at the end of its __init__ method is tell the event loop, "By the way, please call soon one step of me, just one." So what does it mean that tasks have steps? A step might be advancing my generator by one send, because coroutines are based on generators. But it might also mean that we already have our result, in which case we're just going to set the result on ourselves, or maybe there was an exception, so we're going to set an exception on ourselves. If not, we're just going to keep on advancing, but how do we do this?
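The user-facing side of this, sketched with made-up worker coroutines; note how the two tasks interleave at their await points:

```python
import asyncio

log = []

async def worker(name, delay):
    log.append('%s started' % name)
    await asyncio.sleep(delay)  # switch point: the other task runs here
    log.append('%s finished' % name)

loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
# create_task schedules the coroutines without waiting for them.
t1 = loop.create_task(worker('a', 0.2))
t2 = loop.create_task(worker('b', 0.1))
loop.run_until_complete(asyncio.wait([t1, t2]))
loop.close()
```

Both workers start almost immediately, and the shorter one finishes first, even though everything runs on a single thread.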
We’re going to do this by saying, “And by the way, I’m done now “Call me soon.” So, you can actually imagine that the execution in practice looks like this We’re just calling steps on all the coroutines that we have, one after the other, assuming that each of them is going to take a little time Hence, we’re actually maximizing the usage of a single thread We’re not going to wait on external I/O, but instead, we’re going to keep our CPUs busy So, that’s the model on which an asyncio is built Having that trampoline of the tasks, just casual, simple steps, one after the other But everybody lies, and I did lie to you, too There is one single thing that we forgot to do in our primitive example Specifically, when I said, “Please call later “After two seconds, just stop the loop,” or whatnot, there’s a little conflict between the coroutines that are already scheduled to be executed that are going to take more than two seconds, and our will to just stop the loop after two seconds Fortunately, as I said, I think I was friendly to the programmer that just, you know, comes to asyncio for the first time and doesn’t really expect those situations to happen If you use “pythonasynciodebug,” what is going to happen, it’s going to tell you that, “I’m closing the loop, “but there is still work to do “What are we doing?” So instead, maybe there is a way to say to the loop, “Please execute this for as long as it takes “until it completes.” And actually, there is and it’s called exactly as you would expect: “run until complete.” That’s true for a single coroutine, but what if we have many of them? 
And this is where the composability of coroutines comes in. Asyncio provides many combinators, and one of them is "wait." You can create multiple tasks and tell the loop, "Please wait on all of them." This is how you would express it. With run_until_complete, we're now actually going to execute everything as you would expect. No lies here anymore; this is how it runs. So, that's all great, but very often we would like those coroutines to actually return something for us, and the example didn't have this before. But returning works exactly as you would expect from blocking, top-to-bottom programming: you just return a thing. If you have many tasks, each result is stored on the corresponding task object, so you can print them out. If we do this, you can see that we execute the functions first, and then we gather the results and print them out, in which case we see that, yes, some of the coroutines took pretty long to execute. But concurrently, we were able to start them all, sort of, at the same time. You understand, it's not parallel, but it proceeds by means of steps, which means it's pretty fast anyhow. If we only had one coroutine, the result can be taken right from run_until_complete.

The same thing is true for exception handling; it's done exactly as you would expect. If there are any exceptions, you can get to them by looking at the task's exception() method. In this case, if we do this, we're going to see there is a TypeError: asyncio.sleep can't really tell how many seconds we want it to sleep here. If we didn't want this exception to bubble up to our code somewhere else, but wanted to handle it internally in the coroutine, you would do it exactly as you would expect, already knowing Python: you just wrap it in try-except, and it does the right thing, in which case there are no longer any TypeErrors. It does the correct thing.

So, just to summarize, and really, you now understand asyncio and you can go forth and spread the gospel. From the blocking world, you invoke coroutines either by creating tasks on the loop, saying, "Hey, when you're running, these are the things that you have to handle," or by simply saying to the loop, "I'm going to wait as long as it takes; run until complete." If you're inside a coroutine, again, you can create tasks, saying, "I don't care when this is going to be executed, just handle it," or you can await, which means, "I'm going to wait as long as it takes and only continue my execution afterwards." This is really important, because it makes all the switch points in your application explicit. What that means is that you can really reason about shared state, knowing that in between await calls nobody else is going to modify it, but any await call might actually execute some different coroutine that might touch your shared state. This is much easier to reason about than a threading situation where everything is implicit.

So, that's the summary. The good thing about asyncio is that there's already a lot included for you. You can find all sorts of goodies in the docs. Obviously, you can create connections; it supports SSL, it supports IPv6. We use it, so I know. There's TCP, UDP, Unix sockets, you name it; you can do this. You can also write clients with it just as easily. Again, it supports SSL, and it supports UDP via the same datagram endpoint. You can watch file descriptors, which is sort of the same thing but for dealing with local changes. You can also invoke subprocesses. These days, I find myself using asyncio even if I don't do any networking, because just fanning out a hundred subprocesses is much easier with asyncio than doing the same thing with the blocking subprocess module. So, yes, this is it. And I was obviously making fun of locks and whatnot, saying, "Hey, we can basically mostly get away without them," but if you really want actual locks, you can still have them, right? [laughter] You can still use those sorts of primitives. Obviously, I'm kidding; sometimes they are needed, but before you go and try to use them, there are queues. Queues solve a lot of problems really nicely, so maybe that is actually the thing you would like to use instead.

This is what asyncio provides. Obviously, you would ask, "Hey, where's my Django?" There's no Django, but there's aiohttp, which is pretty high-level and actually achieves a lot. So, I recommend looking at its documentation; it is actually pretty powerful. If you want a database, Postgres is there, MySQL is there. Actually, it's pretty awesome, because it also supports the core API of SQLAlchemy. Obviously not the ORM, because SQLAlchemy does things implicitly for you, and that doesn't work with asyncio being really explicit about switch points, right?
But the core session handling and whatnot is the same API. If you want speed, then you can also have uvloop. Uvloop is a new implementation of the event loop; you can just drop it in for the reference implementation and have it run at least twice as fast in prod. So, asyncio is pretty popular these days. There's a lot happening, so I recommend you look at it. But you will inevitably find yourself dealing with already existing blocking APIs, and that makes you really sad, because you can't do anything about it... or can you? Yes, you can. There is a concept of executors in asyncio, which basically means that if you really have a terrible function coming from a third-party library and you can't do anything about it, it was just implemented to wait forever for something, you can just say, "Hey, I'm going to use a pool of threads and just go run those slow things in those threads." For stuff like sleeping on I/O, with networking or whatnot, threads actually deal pretty well in Python, so this might be slower than using the asyncio equivalent, but it's pretty fast, too, so executors are nice. But as I said, the GIL is a thing, so if there is a CPU-intensive computation that you're doing, I would recommend you use the ProcessPoolExecutor instead. Asyncio provides both, right? Thread pools are easier; process pools are more robust, but you pay the price of pickling and unpickling the arguments to your process pool. So, these are basically the good and bad sides of using either.

Now, let me spend a few minutes on how this is already used in production. At Facebook, we are already 100,000 lines of code in, across services that are on asyncio, and growing daily. I'm pretty sure that by the end of the year that number is going to be meaningless, but this is actually from this morning, so you're the first to hear about it. Instagram is already pretty heavily invested in asyncio as well. Again, we hope to be even further along at the end of the year. Facebook uses Thrift very heavily internally. If you suddenly want to create clients and servers that are asyncio-enabled, you can do it like this: you just create a new namespace, and the Thrift compiler is going to do the correct thing. When it does, you can just import your service and start implementing the methods that are going to create responses for you, which is pretty easy. If you want to create a connection, you use exactly what you would expect: you create a server factory, or you create connections if you are on the client side. If you already have a connection, then you simply await on calls to the service that you're using. There's no magic here; you simply do the same thing as you would do in the blocking world, just prepend it with asyncio.

Same thing with calling subprocesses; we do it quite often, and the API is pretty obvious here. What I very often like to point out to people is this: you should let your subprocesses know that they should die with the parent. You do it like this; it's just a few magic lines in ctypes, but it actually tells the kernel, if you're on Linux, "If the parent dies, the children die with it." That's useful for production. Speaking of signal handling, asyncio obviously supports setting signal handlers for all of the signals; use it.

So, having actually used asyncio for quite a while now (I think it's going to be a year in two weeks since we deployed our first service on it), let me tell you this: Python 3.5 is much nicer to use with asyncio, and much nicer in general, so don't bother with 3.3, don't bother with 3.4. They are more complicated for you, the syntax is horrific, and there are fewer opportunities for nice debugging. Garbage collection, specifically, is better. So, that's one of the recommendations. The other recommendation is obvious, right?
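Before that recommendation, a quick sketch of the executor escape hatch from a moment ago, with a made-up blocking function; run_in_executor hands the call to a thread pool and gives back a future the loop can await (swap in ProcessPoolExecutor for CPU-bound work, at the cost of pickling the arguments):

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_call(n):
    # A third-party blocking function we cannot rewrite.
    time.sleep(0.05)
    return n * 2

loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
executor = ThreadPoolExecutor(max_workers=4)
# Each call runs on a worker thread; the event loop stays free meanwhile.
futures = [loop.run_in_executor(executor, blocking_call, n) for n in range(4)]
results = loop.run_until_complete(asyncio.gather(*futures))
executor.shutdown()
loop.close()
```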
Everybody knows it, and nobody does it... at least not enough. [laughter] Write tests. Really, with asyncio, suddenly you realize: "My threaded application had this massive state and I didn't know what it was doing," but now you just have a few coroutines, and you can just tell the loop, "Just execute this guy." And at the end of that execution, you can check: "Hey, are there any other coroutines that ended up being there for some reason?" So, it's much easier to write a unit test for asyncio code than it would be for a threaded application. You can mock out coroutines easily. You can just tell the loop, "Run this one thing until complete," so there's just no excuse. For some reason, I find that very often people are terrified by the prospect of opening a new empty file and writing "class" and whatever test case. For some reason, that's considered a massive psychological block. But it only takes a minute. There's no excuse; do it.

Set up debugging. As I said, asyncio is very, very friendly about explaining what you're doing wrong, if you let it. So, a few things you can do: set up logging if you need it (it's very verbose, but you can do that); set up GC debugging, in which case you're going to see, "Hey, I did something horribly wrong and suddenly I have a lot of uncollectable objects"; and finally, you can just set the loop to debug mode, which is the equivalent of PYTHONASYNCIODEBUG=1, in which case it's going to start telling you all those things about coroutines being left over when you're shutting down, and so on. One particular example that I found a little frustrating at first was instantiating a coroutine function into a coroutine but never actually awaiting on it. Again, asyncio is really friendly: it's going to tell you, "Hey, you instantiated some coroutines and never awaited on them. Is that OK?" It's not, but at least you're going to know about it.

Again, coroutines are internally based on generators. You don't have to know about this now, but they are. So, stop raising StopIteration; this is wrong, and it's going to cause trouble for you. In Python 3, as I said, there are many nice things, and one of them is that you can just return from a generator. It's awesome; do it, it does the right thing. I prefer ProcessPoolExecutor: even though it has inherent startup costs, at least those costs are known, and there are no surprises with the GIL. Those GIL surprises are the worst, because it takes a while to debug them. I hope I convinced you to just read the docs and read the source, because it's pretty clear. There is a reason why Guido named asyncio a reference implementation. So do it. One last thought: in Python 3.4, when asyncio was introduced, you expressed coroutines with a decorator, and you had to use "yield from." No longer; now you have this nice syntax. But you still might come across code that uses the decorator and "yield from."

OK. My name is Łukasz Langa. That was Thinking in Coroutines. Thank you very much. [applause]

(presenter) Thank you, Łukasz. We have no time for Q and A in this session, but Łukasz said you can just go and talk to him if you have any questions. The next session is going to start at 11:30 sharp. Thank you, and thank you again, Łukasz. (Łukasz Langa) Sure.