Automated Testing at Scale in Sea of Thieves | Unreal Fest Europe 2019 | Unreal Engine

>> Hello. I am Jessica Baker. I am a Software Engineer at Rare, and I am here to talk to you today about Automated Testing at Scale in Sea of Thieves. First, a little bit about me. I am a Software Engineer on the Gameplay Team at Rare: Software Engineer if I am trying to apply for a mortgage, Gameplay Programmer if I am trying to sound cool. I have been there for two years, and during that time I have worked on all kinds of different stuff, ranging from AI to a little bit of backend services. Before that, I actually did a mechanical engineering degree, so obviously I am interested in physics, maths, and simulations. But I am also really interested in engineering processes and how we can apply them across different engineering disciplines. That is part of the reason why I am so interested in automated testing.

Now, if you have not heard of Sea of Thieves, it is an online multiplayer pirate adventure game which we shipped last year on Xbox One and PC. It is a free-form, socially focused game where you sail around with your friends as a crew on your pirate ship doing piratey things: looking for treasure, fighting enemy skeletons, getting into ship battles. We have been running it under the games-as-a-service model, releasing regular updates. The latest one is going to be our anniversary launch at the end of April, I believe. You would have seen the trailer for that if you went to my colleague John's talk earlier.

In this presentation, I am going to talk a little bit about the journey we went through in developing Sea of Thieves and why we found that automated testing was a really good way of facilitating this games-as-a-service model. I am going to give you an overview of Unreal Engine's automation system and what it gives you straight out of the box. I will not be covering everything you could possibly use, mostly what we have been using; we are actually developing with 4.10 rather than the most up-to-date version, so there may be more that I will not be able to cover here. I am going to talk a bit about how we extended Unreal Engine 4 for our needs, how we use our tests, and how we get a change from a developer all the way out to general release as bug-free as possible. Lastly, a little bit of the pointy end of automated testing: actually writing effective automated tests that do what you need them to do.

Why use automated testing? Why for Sea of Thieves? Mostly, again, it is to do with the games-as-a-service model. We need regular content updates, with lots of new content and new interactions, to keep players coming back. We need a really quick response to player feedback to keep our community happy, and that means flexible releases; we actually have the capability to ship multiple times in a week. Lastly, it is to do with how the design of the game works, from minute-to-minute gameplay. It is designed under the tenet of "tools, not rules". Instead of giving the player rules, like "you can bail out your ship if it is filling up with water", we just give you a bucket and say, do what you want. That might mean scooping up water; it might mean scooping up vomit and chucking it at your mates. That means each new tool we add introduces a whole new host of interactions, so we are constantly adding interactions that need to be tested.

Just to take it back to the very basics (I like to explain things from first principles, because we all have gaps in our knowledge at any level of complexity): an automated test is a program or a script that can execute a route through your software and check that it behaves as expected. We might break that down into three parts: setup, input, and output. Setup is where we set up the environment in which the behavior we are observing is going to happen. Input is actually triggering that behavior, and output is observing the outcome and checking that it is what we want it to be. As a really simple example, say you have written a Function that just adds two integers together and returns the result.
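As a minimal sketch of that setup/input/output structure, here is the adding Function with a hand-rolled test, in the shape the talk goes on to describe. It is plain, standalone C++ rather than Unreal's automation framework; the by-reference parameters follow the speaker's "for the sake of argument" setup, the values 5 and 6 are assumptions chosen so that the expected result is 11, and the test Function's name is hypothetical.

```cpp
#include <cassert>

// The Function under test: adds two integers and returns the result.
// Taking the arguments by reference mirrors the talk's "for the sake of
// argument" setup, so there is something to do in the Given step.
int Add(const int& A, const int& B)
{
    return A + B;
}

// Hypothetical test name in Given/When/Then form.
bool GivenTwoIntegers_WhenPassedToAdd_ThenReturnsTheirSum()
{
    // Given (setup): assign the inputs to local variables.
    // The values 5 and 6 are illustrative assumptions.
    const int First = 5;
    const int Second = 6;

    // When (input): trigger the behavior under test.
    const int Result = Add(First, Second);

    // Then (output): check that the outcome is what we want.
    return Result == 11;
}
```

Returning a bool from the test body loosely mirrors how a RunTest override reports success, without depending on any engine headers.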

If you were testing that, let us say, just for the sake of argument so that we have something to set up, that it takes those integers by reference for some reason. Our setup is assigning those integers to local variables. Then we pass them into our adding Function; that is our input, the trigger for the behavior. Our output is the actual result, which we expect to be 11. We name our tests according to this structure, which we call Given, When, Then. If we were naming this test, we might call it: given two integers, when passed to the adding Function, returns the sum of the two integers.

Let us talk about what Unreal gives you right out of the box. Unreal provides the Automation Framework, which is able to execute automated processes on any kind of Unreal build. For testing, it provides the FAutomationTestBase Class, which has a Function called RunTest; I am sure I do not have to explain what that does. When it is instantiated, an FAutomationTestBase registers itself with the Automation Framework, which can then be triggered to run it. If you want to write your own test, you inherit from FAutomationTestBase and override the RunTest Function. If it returns true and we do not hit any exceptions or error logs during the course of the test, then the test has passed.

Unreal gives you a couple of helpers to deal with some of the boilerplate involved in that. The first one is IMPLEMENT_SIMPLE_AUTOMATION_TEST. (They have given me a laser.) Here is the name of your test Class. You can give it a pretty name for it to show up in the Editor, and the automation test flags say what kinds of builds you are running it against. There, we have just overridden the RunTest Function to test that 1 is still less than 2, because if that is not true anymore, something has gone horribly wrong. I noticed there was an error on this slide when it was too late to fix it: this will not compile, obviously, because it is not returning any value, so let us just imagine it says return true at the end.

Now, the Unreal documentation gives a test as an example of how you might want to write a simple automation test. I will not bother trying to read the whole thing, but you will notice that, by the Unreal standard, they recommend that you write one test for a particular Class and test everything for that Class in that single test. That is not our standard, which I will go into a bit more later, but that is how they recommend you do it.

You will have noticed that the RunTest Function takes a string parameter. The simple tests do not actually use this, but Unreal also provides IMPLEMENT_COMPLEX_AUTOMATION_TEST. In this one, you override a GetTests Function as well, which provides an Array of strings to run the same test body on. This is another sort of toy example: I have made an Array of strings which are just the names of all the days of the week, and our test body just checks that they all contain the word "day". One thing to note is that each test case will be considered a separate test in the session frontend, which I will explain in a moment.

You are also provided with the means to do latent automation commands. This is quite similar to how the test helpers work: you create commands which override an Update Function, and that Update runs every frame until it returns true. For example, you might have an object that takes a little while to initialize. You can kick off the initialization and then run a latent automation command which checks every frame whether the object is initialized yet; when it is, Update returns true and the latent automation command is finished.

For running automation tests, you have a couple of options. First is the session frontend. I think you can attach this to various kinds of Unreal build, but it is definitely available in the Editor. It lists all the tests that are available in your build.
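To make the complex-test and latent-command helpers concrete without engine headers, here is a standalone C++ sketch of the same two ideas. It is not Unreal's API: FParameterizedTest, RunAllCases, RunLatentCommand and FSlowToInitialize are hypothetical names, and the frame loop is simulated. The first half expands one test body into one result per case, as GetTests and RunTest do for a complex test; the second half calls an Update-style Function once per simulated frame until it returns true. The frame budget is an assumption added so that a command which never finishes fails instead of hanging; the talk does not describe one.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a complex (parameterized) test: one body, many cases.
struct FParameterizedTest
{
    std::function<std::vector<std::string>()> GetTests;  // the test cases
    std::function<bool(const std::string&)> RunTest;     // the shared test body
};

// Runs every case and returns one pass/fail result per case, so each case
// can be reported as a separate test.
std::vector<bool> RunAllCases(const FParameterizedTest& Test)
{
    std::vector<bool> Results;
    for (const std::string& Parameter : Test.GetTests())
    {
        Results.push_back(Test.RunTest(Parameter));
    }
    return Results;
}

// The talk's toy example: every day name should contain "day".
FParameterizedTest MakeDaysOfTheWeekTest()
{
    return FParameterizedTest{
        [] {
            return std::vector<std::string>{"Monday", "Tuesday", "Wednesday", "Thursday",
                                            "Friday", "Saturday", "Sunday"};
        },
        [](const std::string& Day) { return Day.find("day") != std::string::npos; }};
}

// Calls Update once per simulated frame until it returns true.
// Returns the number of frames taken, or -1 if MaxFrames ran out.
int RunLatentCommand(const std::function<bool()>& Update, int MaxFrames)
{
    for (int Frame = 1; Frame <= MaxFrames; ++Frame)
    {
        if (Update())
        {
            return Frame;
        }
    }
    return -1;
}

// A fake object that takes a few frames to initialize.
struct FSlowToInitialize
{
    int FramesRemaining;

    void Tick()
    {
        if (FramesRemaining > 0)
        {
            --FramesRemaining;
        }
    }

    bool IsInitialized() const { return FramesRemaining == 0; }
};
```

In use, the latent command would be a small lambda such as `[&Object] { Object.Tick(); return Object.IsInitialized(); }`, checked once per frame until the object reports that it is ready.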

In the session frontend, you can filter the tests, run them, check through the logs, and debug, and you can find it in the Editor under the Developer Tools section of the Window menu.

You can also run tests through the command line. We use this to integrate with our continuous integration software, TeamCity, which is able to execute various jobs on our build farm automatically. We use it to build our builds and to distribute and deploy them, but we can also use it to run test suites on our builds. We do this regularly, at intervals ranging from every 20 minutes up to overnight. Obviously, the 20-minute one is a much quicker test suite: we choose tests that are quick, but also the ones that are most critical for keeping everyone working, things like checking that the game will boot up and run.

Lastly, we have also rolled our own tool for unit testing. It works very similarly to the session frontend: filter, run your tests, check the output, debug. But it reduces the overhead of having to spin up a whole Editor just to use the session frontend.

A better use of the complex automation test helper than checking that day names are spelled right is to use your GetTests Function to get the Asset reference strings of all of the maps within a map test directory. Then you can use your RunTest Function to load up each map, run it, and wait for a test success or test failure Event to come from the level Blueprint. This works around the fact that Unreal unit tests are so atomic and do not support things like Actors: to add a new gameplay test, you just add a new map, and then use the level Blueprint to actually execute the gameplay feature and check the output. This screenshot is an example from my colleague Rob Masella's GDC talk on this same topic, where he went into a bit more detail. Effectively, this test forces player input so that the player approaches the wheel, grabs it, and turns it, and then we check that the wheel has turned. That is one of the capabilities we have access to: the ability to fake player input.

We have also added our own utilities for networked gameplay testing. These nodes pass execution of the Blueprint between the client and the server, which is obviously really useful for checking things that happen across the network. We might want to set stuff up on the server and then observe on the client, or we can keep passing back and forth. You can check gameplay scenarios very thoroughly with this, and you could do a map test for every single interaction.

But there are some challenges with map tests, one being speed. A Blueprint map test takes on average about 20 seconds to run, whereas a coded test typically takes 0.1 seconds. Secondly, because you are testing in an environment where everything is real, using real gameplay objects in real time, they can be a little bit unreliable. For example, take the skeletons firing cannons feature, which does pretty much what it says on the tin: if a player ship comes into range of a cannon on an island, a skeleton spawns and starts firing cannonballs at it. My bit of the work was the physics prediction algorithm, so that the skeleton knew how to aim the cannon to hit the ship in a satisfyingly realistic way. To test this, I set up a Blueprint map test with a skeleton, a ship, and a cannon, and waited for the skeleton to start firing, because we want to make sure that a cannonball lands near or on the ship. We can check that by sampling the cannonball's position every frame and seeing whether any of those samples are within range of the ship. Of course, the problem here is that there is no guarantee that the frame rate will be high enough for us to actually sample the cannonball while it is in that area. There are other ways to solve this in this particular example, but it illustrates how time, or latency, can be a problem for testing: what you might call one of the Four Horsemen of the Testing Apocalypse.

I was a bit tight on time for this talk, and I thought about removing this slide, but then I did not want to, so now you have to deal with it. Our four horsemen are latency, randomization, globals, and dependencies. These are elements that you ideally want to be able to remove or isolate from your test environment. In this case, latency is the one causing us problems, because it is not necessarily deterministic.

Map tests are still useful, though, for system and boot-flow tests, golden-path gameplay tests, or integration tests. In those you want to check that all of your elements are working together, so the fact that map tests pull in all these dependencies can be really useful. But if you do want to check every permutation of your gameplay, what we did was take it all the way back to FAutomationTestBase. We added our own helpers that add intermediate levels of inheritance between FAutomationTestBase and your test Class, to use as test fixtures. These provide utilities that can be applied to every test so that you do not have to repeat yourself. One of these can just be setting up a map test in code: a utility that creates a World, gets it ticking, and puts the right Game Mode on it, and then to create your test you inherit from that. We have macros that wrap up all that boilerplate. As well as this, if we do not need a whole map and just want to test individual Actors, we have an FActorTestFixture, which creates an empty World with the minimum of stuff in it that you can spawn Actors into. Now, remember I was talking about the Unreal example of a unit test being one test per Class, with all the checks inside it?
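The fixture layering just described can be sketched in plain C++ with simplified stand-ins. FWorld, FActor, FTestBase and FSpyglassTestFixture are hypothetical and much simpler than the engine's classes, and the FActorTestFixture below only borrows the talk's name; the real fixtures and their macros are not shown in the talk. The point is the inheritance chain: a base test type, a generic fixture that owns a fresh empty World, an Actor-specific fixture holding shared setup, and a concrete test that only states its own scenario.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Minimal stand-ins for engine types (hypothetical, not Unreal's classes).
struct FActor
{
    std::string Name;
    bool bWielded = false;
};

struct FWorld
{
    std::vector<std::unique_ptr<FActor>> Actors;

    FActor& SpawnActor(const std::string& Name)
    {
        auto Actor = std::make_unique<FActor>();
        Actor->Name = Name;
        Actors.push_back(std::move(Actor));
        return *Actors.back();
    }
};

// Base of every test, playing the role of FAutomationTestBase.
class FTestBase
{
public:
    virtual ~FTestBase() = default;
    virtual bool RunTest() = 0;
};

// Intermediate fixture layer: each test object gets its own fresh, empty
// World, so no State leaks from one test into the next.
class FActorTestFixture : public FTestBase
{
protected:
    FWorld World;
};

// Actor-specific fixture: shared setup for every spyglass test.
class FSpyglassTestFixture : public FActorTestFixture
{
protected:
    FActor& SpawnWieldedSpyglass()
    {
        FActor& Spyglass = World.SpawnActor("Spyglass");
        Spyglass.bWielded = true;  // stands in for "have an Actor wield it"
        return Spyglass;
    }
};

// One scenario per test: the body only contains what is specific to it.
class FSpyglassIsWieldedAfterSetupTest : public FSpyglassTestFixture
{
public:
    bool RunTest() override
    {
        const FActor& Spyglass = SpawnWieldedSpyglass();
        return Spyglass.bWielded && World.Actors.size() == 1;
    }
};
```

Because the World lives in the fixture object, running the same test twice starts from an empty World both times, which is the isolation that one-scenario-per-test relies on.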
We find that this one-test-per-Class approach can be a bit problematic for testing Actors, because you might end up with State persisting from one check to the next, and it is also good for one test to be one scenario. That way you just have a list of tests passed and tests failed, and you instantly know which scenarios work. So if we want to write multiple tests for one Actor, we can add Actor-specific test fixtures as well, which handle the utilities and make sure we are not repeating ourselves. For example, if we want to test the spyglass, we can create a utility that spawns a spyglass and has an Actor wield it, or whatever else we need for it to work properly.

We have a few other test types as well. The Asset audit is very similar to the map test in that it loads up Asset references, but it does so for every Asset in our build, and then we can set up an Asset audit test for each Class of Asset. Say you add a voyage type Asset with a minimum and a maximum amount of gold as the reward for the voyage: we want to make sure that a designer cannot accidentally put in 400 minimum and 300 maximum, so you can put that in as a check. Then there is screenshot comparison, which the rendering team uses: it automates a scenario, takes a screenshot, and compares that against a stored screenshot of the scenario working as expected. Finally, there are performance tests, which we touched on in my colleague John's talk earlier, where we set up a nightmare gameplay scenario and output metrics so that we can make sure the game is going to run smoothly on all of the platforms we ship to and all the hardware we support.

This is how the number of tests we have breaks down. As you can see, Actor tests are by far the most common; we are checking a lot of gameplay through those. Unit tests do a lot of similar jobs to the Actor tests and might handle more engine-level stuff, because they are basically unit tests, though you cannot spawn an Actor in them. Map tests in this data include, I think, both the Blueprint and the code map tests. I actually nicked this slide from Rob's talk that I mentioned earlier; he referred to them as integration tests, but they are the same thing, in case that is confusing if you are watching that later.
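The voyage gold check from the Asset audit example reduces to a small rule applied to every Asset of one Class. This is a hedged sketch: FVoyageAsset and its field names are hypothetical, the requirement that the minimum is not negative is an added assumption, and loading the real Assets is out of scope here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical voyage Asset data; the field names are assumptions.
struct FVoyageAsset
{
    std::string Name;
    int MinGoldReward;
    int MaxGoldReward;
};

// The audit rule for one Asset: the reward range must make sense.
// Rejecting a negative minimum is an assumption, not from the talk.
bool IsGoldRewardRangeValid(const FVoyageAsset& Voyage)
{
    return Voyage.MinGoldReward >= 0 && Voyage.MinGoldReward <= Voyage.MaxGoldReward;
}

// Audits every Asset of the Class and returns the names of the ones that fail,
// so each bad Asset can be reported individually.
std::vector<std::string> FindInvalidVoyages(const std::vector<FVoyageAsset>& Voyages)
{
    std::vector<std::string> Failures;
    for (const FVoyageAsset& Voyage : Voyages)
    {
        if (!IsGoldRewardRangeValid(Voyage))
        {
            Failures.push_back(Voyage.Name);
        }
    }
    return Failures;
}
```

A voyage entered as 400 minimum and 300 maximum is reported by name, while one with equal minimum and maximum passes.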

In total, we have over 23,000 tests to run. That does not include the Asset audit tests because, if you remember my complex test example from earlier, each test case counted as a separate test even though it is the same test body. The same goes for Assets: we have the same number of Asset audit tests, 81,000, as we do Assets. Overall, that is over 100,000 tests.

I am going to go through how we actually use these tests, all 100,000 of them. What is important to note here is that, due to our need for flexible, fast releases, we use a continuous delivery process. This means that, in theory, we can ship at any time, so we try to keep our build constantly bug-free, or as bug-free as we can. That plays out in a bug count graph that looks like this yellow line. The gray line represents the bug count on Banjo-Kazooie: Nuts & Bolts, one of our previous titles, which followed a more traditional development process where you reach feature complete and then go through and fix all the bugs. That is a peak of over 3,000 bugs. Of course, how long that bug-fixing phase will take is very unpredictable, which makes it hard to schedule, and that is when you are likely to get crunch. By keeping our bug count low, we have managed to reduce crunch significantly on the Sea of Thieves project.

A change goes from a developer to a player through several stages, with verification between each one. The stages are: local changes on a developer's machine; submission to source control once those are verified; a daily preview build for internal testing; a limited release, which goes to players in our insider program who are under NDA and have access to early builds of the game; and then all the way out to general release. The last thing we want, of course, is for a bug to reach general release, where it is seen by a lot of players and goes out on Twitch to hundreds of thousands of people. We can prevent that with different kinds of verification at each stage of delivery. In an ideal world, we would get rid of 100 percent of our bugs through these preventative measures before they even get checked into source control. We do not live in an ideal world, unfortunately, but we can get rid of a lot of them and maintain this continuous delivery process.

The first one is the session frontend, which I mentioned earlier. Each developer checks in their changes with a full set of tests for any new interactions they have added, and they are expected to make all the tests pass, to the best of their knowledge, before they check in. In this case, I might have changed something on the scale, and I can see what is working and what is not.

Secondly, there is TeamCity again. You can submit your changes to TeamCity to be combined with the very latest version of the build, and then we run a test suite on it. This has to pass 100 percent of the tests for us to be allowed to check in. This is one of my failed runs, where I tried to clean up the voyage generator tests and caused 180 build problems, which I am really glad I caught before I checked it in. I think it was something like a typo that did not compile, so all of the tests failed. You have to get a green remote run before you can check in.

Lastly, it is still a good idea to give your change a quick manual test, to catch anything your test coverage has missed or anything that is not pragmatic to test with automated testing, particularly visual or audio issues. By this point, we have probably eliminated most of our common logic errors.

Then, once you have submitted to source control, this is where the regular automated build verification I mentioned before comes in. If any one of these critical jobs fails, the light goes red and nobody can check in until it is fixed, which is how we enforce continuous delivery. It also means that as soon as something is broken, because we are stopping anyone from checking in anything else, we can usually pinpoint exactly which change introduced the issue and either back it out or fix it.

We still have manual testers. We have a lot fewer of them, and the great thing about using them alongside automated testing is that they are doing a lot less of the routine manual testing, checking things every time there is a new change. Instead, they are doing what manual testers are best at: finding weird and funky new ways to break the game. Automated tests are great for the issues you can predict, and manual testers are great for finding things you could not possibly have predicted, and for picking up, again, those audio and visual issues.

When it goes out to players through the limited release, we obviously do not want any bugs to reach players at all. But if they do, this lets us pick up some of the low-repro bugs: if something only happens 1 in 1,000 times, we might not pick it up with our manual testers, but the reporting here is really handy for catching it.

One of the things that makes people nervous about this kind of process is the front-loading of quality into this first section, doing all these checks before you are allowed to check things in. They ask, doesn't it take a really long time to get a feature done?
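The two check-in gates described here, the 100 percent remote run and the red build light, can be written as simple rules. This is a hedged C++ sketch of those rules only: FTestResult, CanCheckIn, EBuildLight, GetBuildLight and IsCheckInAllowed are hypothetical names, and nothing here models TeamCity or its API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical result of one test in a pre-submit ("remote run") suite.
struct FTestResult
{
    std::string TestName;
    bool bPassed;
};

// Pre-submit rule: the change may only go in if 100 percent of the suite passed.
// Treating an empty suite as a failure is an assumption, so a run that produced
// no results can never look green.
bool CanCheckIn(const std::vector<FTestResult>& RemoteRunResults)
{
    if (RemoteRunResults.empty())
    {
        return false;
    }
    for (const FTestResult& Result : RemoteRunResults)
    {
        if (!Result.bPassed)
        {
            return false;
        }
    }
    return true;
}

// Post-submit rule: any failing critical verification job turns the light red.
enum class EBuildLight { Green, Red };

EBuildLight GetBuildLight(const std::vector<bool>& CriticalJobsPassed)
{
    for (bool bPassed : CriticalJobsPassed)
    {
        if (!bPassed)
        {
            return EBuildLight::Red;
        }
    }
    return EBuildLight::Green;
}

// Nobody can check in while the light is red, even with a green remote run.
bool IsCheckInAllowed(EBuildLight Light, const std::vector<FTestResult>& RemoteRunResults)
{
    return Light == EBuildLight::Green && CanCheckIn(RemoteRunResults);
}
```

Blocking everyone while the light is red is what keeps the set of candidate changes small, which is why the breaking change can usually be pinpointed.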
The thing is, the time you spend checking quality now saves you time later, and that pays back in a lot fewer bugs. It is so much easier to prevent a bug than to dig through your three-month-old code, which has been changed six times since, and try to figure out what is going on. Another nice thing about the manual testing is that if testers discover an issue that could be caught by an automated test, we can add a regression test to stop it from being broken again. So even though we are constantly changing the game and adding new things, generally, if things break and we fix them, they stay fixed, which is much more sustainable.

Lastly, I am going to talk about some best practices for actually writing automated tests, and how you can make them effective and descriptive of where the issues are in your game. That is another hang-up about automated testing: that it is more trouble than it is worth, and you have to rip up your production code to make it testable. I definitely felt the same way very early in my automated testing career, so much so that I tweeted this back in 2017: "It is all fun and games until you have to write the tests." Now, thanks to some good practices I have learned, I can write good, testable code straight away, without having to rip it apart later to figure out how I am going to test it. With that at the forefront of my mind, and using some of the techniques I am going to go through, I feel much more positive about it now. It makes me think about my use cases and my interface, and because obviously I do it so well and perfectly every time, it is a joy to write the tests. If I could edit the tweet, I would probably make it say that.

As an example, I am going to use our Alliances feature. In Sea of Thieves you sail around with your crew of friends on your ship, and the Alliances feature allows you to form an alliance with another crew. You can do voyages together and share the rewards. To keep track of all the alliances that might exist on a server, we use the Alliance Service. In our terminology, a service is a globally accessible object that exists for the whole lifetime of the server, which is good for storing data needed by different systems. Here are some example alliances: we have one between Crew C and Crew D, and another between Crews E, F, and G. Let us add some public Functions, an interface to this service. We definitely want players to be able to form alliances.

If we call FormAlliance with Crew A and Crew B, that alliance is now added to the storage. If we want to query this data, let us say as an example that we want to get the number of alliances: that would output three. When we come to test this, we might be thinking about our rule of one code path, one scenario, one test, and we might interpret that as meaning we want to check each of these Functions individually. Starting with FormAlliance, you might write a test where your setup is to instantiate the Alliance Service, your input (the trigger for the behavior) is calling FormAlliance with Crew A and Crew B, and your output is checking the alliance storage to see whether there is now an alliance between Crew A and Crew B. We have already hit a problem: how do we check this private data to make sure the expected behavior is happening? We might add a Get Function; we might even be tempted to make that data public. Instead, we are going to interpret our one code path rule to mean one public input-output flow. Again, we instantiate the Alliance Service and call FormAlliance with Crew A and Crew B, and then we check that GetNumberOfAlliances now returns 1. This method is called "test behavior, not implementation". Instead of testing that certain triggers create certain internal States, we examine actual use-case flows to check that they behave as expected. One of the benefits is that the internals do not affect whether the test passes or fails. That is really helpful for refactoring, where you are changing all the implementation but want the behavior to remain the same: do your tests still all pass when you change your code?
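Here is a standalone C++ sketch of the service and of the behavior-focused test. The talk does not show the real code, so FCrewId, the vector parameter to FormAlliance (chosen so an alliance can have two or three crews, as in the examples) and the set-based storage are all assumptions. The point is that the test only touches FormAlliance and GetNumberOfAlliances, so replacing the private storage with something else would not change its result.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

using FCrewId = int;  // hypothetical: crews identified by a plain integer

// Sketch of the Alliance Service. The storage layout is private and can
// change freely without breaking behavior-focused tests.
class FAllianceService
{
public:
    // Forms an alliance between the given crews.
    void FormAlliance(const std::vector<FCrewId>& Crews)
    {
        Alliances.emplace_back(Crews.begin(), Crews.end());
    }

    std::size_t GetNumberOfAlliances() const { return Alliances.size(); }

private:
    std::vector<std::set<FCrewId>> Alliances;  // implementation detail
};

// Test behavior, not implementation: only the public input-output flow is used.
bool GivenEmptyAllianceService_WhenFormingAnAlliance_ThenNumberOfAlliancesIsOne()
{
    FAllianceService AllianceService;                    // Given
    const FCrewId CrewA = 1;
    const FCrewId CrewB = 2;
    AllianceService.FormAlliance({CrewA, CrewB});        // When
    return AllianceService.GetNumberOfAlliances() == 1;  // Then
}
```

With the talk's example data (C with D, then E, F and G, then A with B), GetNumberOfAlliances reports three, whichever container the service uses internally.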
Let us talk about another one of our horsemen of the testing apocalypse: dependencies. If another Class or Function depends on the Alliance Service, we want to stop the implementation of the Alliance Service from affecting its tests. Imagine we are not running tests for the Alliance Service this time. Instead, just for fun, say we have a ServerFriendlinessService. It queries the number of alliances on the server to determine how friendly the server is: for example, if there are more than two alliances, we label it a friendly server; otherwise, it is a curmudgeonly server. Let us try to test this. Note that we are not testing the Alliance Service; we are writing tests for the ServerFriendlinessService. We do not care what the Alliance Service is doing; we just want to make sure the ServerFriendlinessService works. For that, we have to make sure there is an Alliance Service for it to query. If we are very good about testing behavior, not implementation, we are going to call FormAlliance three times to make sure there are three alliances to get. Then we can call the Function on the ServerFriendlinessService and check that we have got the expected output: friendliness is set to friendly.

What we can do instead is add an interface to the Alliance Service. This does not necessarily have to be a one-to-one interface specifically for the Alliance Service; say you have a container which holds items, it might refer to those items via a storable interface. We add our public Functions to this interface and query the Alliance Service through it. This means that in a test environment, we do not need to use the real Alliance Service at all; we can use a mock Alliance Service. To make the GetNumberOfAlliances call return three, which is the input we want for the test, we can give the mock a number of alliances to return, set it to three, and override the GetNumberOfAlliances Function from the interface to just return that, or just make it return three if we are only using it in this test. This means that in our test, all we need to do now is set the number of alliances to three. It is a little bit more boilerplate, but one of the good things about this practice, mocking out dependencies, is that we do not have to know about the internals of other Classes; we just need to know about their interface. Before, the ServerFriendliness test relied on the FormAlliance and GetNumberOfAlliances Functions working, so any change or breakage in them could break it. We are no longer reliant on them, because we are not using the real Alliance Service.

The next point is also about isolating functionality, but this time through the design of the interface. Let us forget about the other Functions we had before. We are writing tests for the Crew Class, which represents a crew in game. Say it has some functionality, which probably already looks dodgy to you: it wants to check whether it should share rewards with another crew, and it does that by querying the alliance interface. We say, okay, you want to know who you are allied with? Here are all the alliances that we are storing, served up very nicely through the interface, of course. The crew then cycles through these alliances to find whether there is one that contains both its own CrewID and the other crew's CrewID. Then, even if we mock out the Alliance Service in a test environment, we have to know all about how it actually stores alliances, and if that changes, we have to update all the crew tests, which is a nuisance. Instead, we are going to care less about what the Alliance Service actually does and just ask it for the information we need. You might start by saying, actually, I just want to get the crews that I am allied with. On the crew side, that involves much less alliance-related gubbins, because we move all of that over to the Alliance Service, and the code looks much more concise. Then, in our test, all we need to know about is CrewIDs, which the crew already knows about. Or we could be even more concise than that and just ask: am I allied with this other crew?
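Both ideas from this part of the talk fit into one standalone C++ sketch: dependents talk to an interface rather than the concrete Alliance Service, a test-only mock stands in for it, and the crew asks a narrow question instead of iterating over raw alliance data. IAllianceService, AreAllied, FCrew, FMockAllianceService and EServerFriendliness are hypothetical names, and putting both queries on one interface is an assumption; only GetNumberOfAlliances, the friendliness service idea, and the more-than-two-alliances rule come from the talk.

```cpp
#include <cassert>
#include <cstddef>

using FCrewId = int;  // hypothetical crew identifier

// Interface that dependents use instead of the concrete Alliance Service.
class IAllianceService
{
public:
    virtual ~IAllianceService() = default;
    virtual std::size_t GetNumberOfAlliances() const = 0;
    // Narrow query: callers ask the question they care about, not for raw data.
    virtual bool AreAllied(FCrewId First, FCrewId Second) const = 0;
};

enum class EServerFriendliness { Friendly, Curmudgeonly };

// Depends only on the interface; "more than two alliances" is the talk's rule.
class FServerFriendlinessService
{
public:
    explicit FServerFriendlinessService(const IAllianceService& InAlliances) : Alliances(InAlliances) {}

    EServerFriendliness GetFriendliness() const
    {
        return Alliances.GetNumberOfAlliances() > 2 ? EServerFriendliness::Friendly
                                                     : EServerFriendliness::Curmudgeonly;
    }

private:
    const IAllianceService& Alliances;
};

// The crew does not care what being allied means internally; it just asks.
class FCrew
{
public:
    FCrew(FCrewId InId, const IAllianceService& InAlliances) : Id(InId), Alliances(InAlliances) {}

    bool ShouldShareRewardsWith(FCrewId OtherCrew) const { return Alliances.AreAllied(Id, OtherCrew); }

private:
    FCrewId Id;
    const IAllianceService& Alliances;
};

// Test-only mock: returns whatever the test configures, with no real storage.
class FMockAllianceService : public IAllianceService
{
public:
    std::size_t NumberOfAlliancesToReturn = 0;
    bool bAreAlliedToReturn = false;

    std::size_t GetNumberOfAlliances() const override { return NumberOfAlliancesToReturn; }
    bool AreAllied(FCrewId, FCrewId) const override { return bAreAlliedToReturn; }
};
```

A friendliness test now just sets NumberOfAlliancesToReturn to three and checks for Friendly; neither FormAlliance nor the real storage is involved, so changes to how alliances are stored cannot break the friendliness or crew tests.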
What it means for crews to be allied, the crew does not care about; all it needs to know is that they are. That makes the testing all that much easier. This principle is orthogonality: treating each Class as a black box that only provides the information you need from it, and keeping its logic as internal as possible. It is generally good object-oriented programming practice, and it is really maintainable: if anything about alliances changes, all you need to change are the alliance-related Classes. Using all these methods and some more (this is just a primer), you not only get instant feedback on how your code is working, but you are also pushed towards thoughtful interface design, because you actually use your interfaces straight away.

In summary of what we have talked about today: Unreal provides basic automated testing support out of the box. It is quite straightforward to extend that, and you can get really good returns, really thorough testing, out of it. It makes a great companion for sustainable games as a service, through things like making sure that bugs stay fixed. It enforces good object-oriented programming practices by making you think about your interfaces. And it takes care of all of that routine testing so that the manual testers do not have to.

Now, of course, I am here to tell you that automated testing is great, and I do think that. But I am not going to make you think that it is going to solve all your problems; it is not a silver bullet. That is very well illustrated by my favorite Sea of Thieves clip of all time, mostly because of this guy's reaction. [Laughter] Yeah, mistakes do happen, and we are always finding new ways to improve our testing processes to make sure that ships stay where they are supposed to be: on the water.

Obligatory hiring slide: if this all sounds good to you, if this sounds like a good way to work, then we are hiring, and you can go to that URL and check out all our available roles. Thank you for listening. I have left up some resources from some of our recent talks. I think Rob's talk has just gone up on the GDC Vault; I think it became available about eight minutes before this talk started. There are also some Rare tech blog posts.

I have also got a blog, which I forgot to put up on this slide, but if you go to my Twitter, it is linked on there. Thank you for listening. [Applause]