PLUG: iVEC SC13 Student Cluster Competition Team

Hi, Perth Linux Users Group, talking Linux and open source. As was introduced, everyone here will be talking to you about the competition we're entering thirteen weeks from now, in November, which we'll have to travel to the US to compete in. We're one of eight teams who will compete on our ability to build a Linux cluster and administer it over a 48-hour period to get as much good science out of it as possible.

Before I go into detail about the talk, I'd like to give you an anecdote, a case study that I think really exemplifies what this competition is about. The case study is about this machine here, which was in the Top500 ranking of fastest supercomputers: it was number one twice in 2008 and once in 2009. You're looking at 296 racks there and 120-odd thousand cores, so it's quite a big machine. It's called the IBM Roadrunner, and it was the very first machine in the world to hit 1.0 petaflops sustained. It ran Linux, of course, in this case both Red Hat Enterprise Linux and Fedora, and I believe that's because it had two different sets of cores: one was AMD Opteron cores, and the other was Cell cores, the kind you also see in PS3s, but an enterprise version of those. So any software that ran on it basically had to be designed specifically for this hybrid architecture. It became fully operational five years ago, in 2008, cost about $454 million I think, and was the fourth most energy-efficient machine at the time.

The reason I'm talking to you about this, and perhaps some of you know the story, it made the news recently, so I apologise if you've already heard it and I'll get to the punch line shortly, is that I was curious what that sort of money actually buys, so I looked at the WA budget to see what sort of things you can get with it. I found that in last year's budget we assigned $32 million Australian for a high-tech facility built from scratch on top of a waste dump, basically: the West Australian Institute of Sport's high performance centre, for athletes to prepare for the 2016 Olympic Games in Rio. You've got a room in there 10 metres tall so people can do pole vaulting indoors, you've got track and field, you've got hydrotherapy, you've got a whole floor full of staff. Now imagine if I told you that five years from now, after this has been running for five years, they decided: oh, it's too expensive to run, we'll have to shut the facility down. I think there'd be a bit of outrage among West Australian athletes. And that's exactly what happened with the IBM Roadrunner machine: it had been running for five years, and the US nuclear administration, which runs it, said it was too costly to run, so they're shutting it down.

As I was reading the internet articles about this, and of course on the internet everyone is often quite outraged about things, and there was a bit of outrage about this, one person said: they should rate supercomputers by petaflops per kilowatt; that is the real efficiency, and the ranking that means the most; it doesn't make much sense to have such an incredibly powerful computer if you can't afford to keep the power bill paid. And this is what this competition is all about. In fact, people are saying they should do this, but they have been doing it for the last seven years: since 2007, in addition to the Top500 ranking, which ranks computers by pure performance, there has been a Green500 ranking, which ranks them by flops per watt.

The other reason people were outraged about this is that it was running nuclear codes, so you can't just hand it down to some university so they can use it for their science; they have to destroy the hard disks, I read, and so it feels like such a big waste. I was curious about this, because the official line is that they're shutting it down because it's too costly to run, and yet I was reading stats that it was the fourth most energy-efficient machine when it launched.

Even when it was shut down it still ranked 141st on the list of most energy-efficient supercomputers. It is true that new supercomputers today are about four times as energy efficient as it was. So the internet was outraged that it was being shut down for drawing too much power, but looking deeper it seems like it's also being shut down because it's too costly to find programmers who can program for the Cell, PS3-style architecture, this hybrid thing where code has to work for the Cell and the AMD Opteron at the same time.

These two things are what this competition is about: it's about making sure we bring a cluster that meets the needs of the science applications we've been given in advance, and that is also an efficient use of power. The competition gives you 26 amps of power at US voltage, which works out to be about 3,100 watts, and within that framework you can bring any machine you can find hardware vendors to donate to you. We're told in advance that there are three applications we need to optimise this hardware, and the software stack we put on it, for: the applications involve weather forecasting, nanotechnology simulation, and GraphLab, which is a graph traversal and data analysis package. I might come back to this slide later.

So that's the framework of what this talk today is about; here's an outline of who will be speaking and the different subject matters. I'll give you a tiny bit more about the student cluster competition, then I'll pass it over to Bec, who'll talk about how our team formed, the training we've engaged in that brought us to where we are today, and how our coach's vision brought this to reality. Our coach - I showed you the IBM Roadrunner; it used to be number one, and when it got knocked down to number two it was replaced by a machine called Jaguar, which has since been upgraded to Titan, and Titan has now been knocked down to the number two place. That machine came out of Oak Ridge National Laboratory in the US, and our coach, Rebecca, used to work at Oak Ridge, she used to have tasks related to that machine, and now she's come here to Perth to work with our clusters at iVEC. One of her dreams was to send a team to this competition in the US; there has never been an Australian team that has gone across to the US to compete, so Bec will talk a bit about how that dream became a reality.

Next, Evatt will talk about how we've gone about analysing the requirements of the competition, how we've analysed and profiled the software, and about selecting the hardware we'll take to the competition. Then we'll have a couple of students talking about the benchmarks and applications we'll be tested against during the 48-hour competition. One is HPL, the High Performance LINPACK benchmark, which is pure flops - and pure flops isn't that meaningful for real data analysis and scientific open source applications, so it's possible to bring a machine that has a really great flops score on HPL but doesn't do so well on the other applications. GraphLab is the data analysis and graph traversal software I mentioned before, NEMO5 is the nanotech one, and WRF is the weather forecasting software. We may have a demonstration, if we have time, of some of the profiling we do on the software. And finally our coach will talk a bit more about iVEC itself.
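As a rough sketch of the power arithmetic above: the 26 amp limit is from the competition rules as described in the talk, while the US outlet voltage and the example LINPACK score below are assumptions for illustration only.

```python
# Back-of-the-envelope version of the competition power budget described above.
AMPS_LIMIT = 26.0      # per-team limit quoted in the talk
US_VOLTS = 120.0       # assumed US outlet voltage (not stated precisely in the talk)

watts_budget = AMPS_LIMIT * US_VOLTS
print(f"Power budget: about {watts_budget:.0f} W")        # roughly 3,100 W

# Green500-style efficiency metric: flops per watt.
example_hpl_gflops = 8_000.0   # hypothetical 8 TFLOPS HPL score, purely illustrative
print(f"Efficiency: {example_hpl_gflops / watts_budget:.2f} GFLOPS per watt")
```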

She'll cover what sort of people they employ and what sort of jobs there are - I don't know, maybe the idea of doing system administration for tens of thousands of cores on a cluster might be interesting to some people listening today - so Rebecca will talk about what sort of opportunities are available at iVEC, and we'll have some questions at the end.

Just a little bit about the origins of the cluster competition. It started in 2007, interestingly enough at the Supercomputing conference in 2007, which is run by the ACM and IEEE, and that was also the place where the Green500 ranking I mentioned before was launched, the one that ranks computers by flops per watt. I found it quite interesting that both these things started at the same time in the same place. Listening to one of the founders talk about it, it was just chance that he thought setting a power limit of two sockets in a wall would be a good limitation: it gives people a boundary, but also freedom to optimise whatever hardware they can. But maybe it was something about that time: I think 2007 was when the hertz on machines were getting ridiculous and power consumption was getting ridiculous, and people really started to be concerned about what sort of cost-effectiveness we're getting out of machines.

The competition runs over about nine months, and we've been at it for about six months now: getting the hardware together, training on the software stack - version control, compiling, profiling, familiarising yourself with how to schedule jobs on multiple nodes; it's a lot different from just running a single Linux box at home. The eight teams competing in November spend those six months going through that process, and they liaise with hardware vendors to choose the hardware they'd like to bring. We're being sponsored by SGI, and NVIDIA are providing some GPUs for us. There are several competitions running: the one out of America, which is the one we're going to, and there's also one in Europe.

I'd just like to mention a few comments this founder made about the competition, which I thought were quite interesting. The big difference between the US competition, which started it all, and the European competition that followed is that the US competition is 48 hours continuous, no breaks. You've got six people on the team and you've got to allot your time: while some people are sleeping, some people are monitoring the jobs and scheduling things, and if something goes wrong - if there's a hardware failure - they're pulling out the hard disk drive and replacing it, and if the cluster is using too much power an alarm will go off and you have to put in the correct commands to drop that power: stop a job so you can resume it later with fewer resources. He described it by comparing it to the European competition, where people have to go home at five pm and it's just a daytime thing, and I thought it was quite interesting: "The Europeans have a much more humane way of running this competition, where the students are kicked out after 6 pm or something." So he's kind of admitting to treating us inhumanely; I'm a bit concerned about that as well. He goes on to say: "We have lots of pictures of students sleeping under desks and chairs, and dead pizza boxes and soda cans and all that. We provided a bunch of food for the students, and every morning I would come in at six o'clock and it would be completely decimated." So it sounds like an interesting competition. He was giving an interview at the time, and the interviewer added that he knew one of the kids had a friend in the military, and he had military-spec caffeine - it seemed to do the job. I don't know what military-spec caffeine is, but it sounds like a very interesting event.

So this is our team - there are 11 of us, and we've got maybe half of us here today. I'm now going to pass it across to Bec, who will talk about how our coach's vision of bringing a team across from Australia became a reality.

Hi everyone, I'm Bec. As you can see, that's the young team we're going over with. It came about mainly because of Rebecca: she was the one who came over, and she had been a judge in the supercomputing competition before, so she aspired to send a team from Australia, which also happens to be the first one from the southern hemisphere. It was initially quite difficult getting the team together; Rebecca actually had to go around universities like Murdoch, ECU, Curtin and UWA to assemble the team, and initially at least we didn't get many replies from students, so it kind of grew into this whole movement. The team here today consists of Evatt, Rebecca, Andrew, Adam and Jake, along with some people who aren't here today.

We've been really lucky with sponsors: SGI, NVIDIA, Allinea and Rio Tinto, among others. We've been liaising a lot with them on things like the design of our system, and they've also been providing funds for student travel. NVIDIA has actually given us eight GPUs to work on, Allinea is sponsoring our debugger, DDT, and our profiler, MAP, and Rio Tinto is also sponsoring our student travel; the conference will cover the conference registration and our accommodation and food as well. What we've been doing since March is holding weekly meetings, about three hours a week, just to make sure everyone's on the same page. When we first came together we had a few lectures and some guest speakers from iVEC who came and shared their knowledge with us, and we've also been designing our system, liaising really closely with SGI over the past couple of months and discussing the hardware we'll need. So Evatt will now talk about the hardware we've chosen.

Hardware is obviously pretty critical to a computer - a bit of an understatement. I'm going to do a little thing here: I'm going to ask what you would think of if you were designing a cluster for a competition where you had a power limit. What are some of the things you'd take into consideration? People suggested things like ARM processors, laptop CPUs, something designed to run on a battery, efficiency of power supplies, file systems, and efficient network switches, ones that don't waste power when you're not actually transmitting on particular ports. So all of those things, yeah - and I promise I'm not mining you all for ideas right now; that's exactly not what we're doing. So those are some of the things. Cost would be really important for most people: for most businesses building a cluster, cost would be pretty important. Space - or rather, I meant to say architecture, which is what you were talking about: how you'd hook everything up. General utility, which basically means: OK, cool, it can crunch numbers at six petaflops, but can it actually do anything else - is it good at solving complex programs, is it good for utilitarian sort of stuff? Power consumption, which is what you were talking about, is really important for us, obviously, as the cluster competition is basically about how many flops you can get per watt; that's all it really is, how much compute you can get out of the machine for what you're drawing. And again, cost. Now that SGI has sponsored us and basically said "build whatever the hell you want for this competition", that eliminates cost, so those things don't matter any more - and that's a personal dream of mine, to be told to build the most powerful computer you can where cost is no object.
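Here is a toy version of the flops-per-watt comparison Evatt alludes to (and which the team later describes doing in a spreadsheet with vendor figures). Every number below is invented for illustration; none of them are the team's actual hardware figures.

```python
# Given a fixed power cap, which mix of nodes gives the most theoretical compute?
POWER_CAP_W = 3100

configs = {
    # name: (watts per node, peak GFLOPS per node) -- made-up numbers
    "all-CPU node":     (350, 330),
    "CPU + 2 GPU node": (800, 2900),
}

for name, (watts, gflops) in configs.items():
    nodes = POWER_CAP_W // watts          # how many whole nodes fit under the cap
    print(f"{name:18s} {nodes} nodes, ~{nodes * gflops / 1000:.1f} TFLOPS total, "
          f"{gflops / watts:.2f} GFLOPS/W per node")
```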

Cool, so that leaves us with some options. What do we do if we're building the hardware? You can go for an all-CPU-node approach, which is really flexible, but it might not do as well as a GPU in terms of crunching pure numbers. The competition is sort of split into two weird parts: they have LINPACK - I don't know if you're all aware of what it is, and Jake will go into more detail - but LINPACK is basically how many floating-point operations you can do per second, flops, which is a pretty scummy way of determining how good your computer is, but that's the benchmark at the moment. You've got all-accelerator nodes, which would be like the new Xeon Phi that has come out - that's Intel's line - and there's the NVIDIA Tesla. They're basically massively parallel devices: the host nodes feed work off to them, and you optimise your speed by running the code through them. And then you've got hybrid CPU and accelerator, which is sort of mix and match GPUs and CPUs and see what comes out best. There's a pretty simple, obvious answer: why not both? Have CPUs and GPUs, run them together, get them to talk and pass information between them, and basically use the GPUs for the heavy number-crunching stuff and the CPUs for everything else. That's pretty much what we chose to do.

This is the architecture we're looking at at the moment - that's our stack. How we came to decide on this is we ran all the power consumption figures of each individual CPU and GPU we'd thought of using into a very complicated piece of software called Microsoft Excel, made a spreadsheet, and it sort of spat out the best computing power we can get for minimal power consumption. So this is pretty much what's going to happen with the stack we have - GPUs plus CPUs - because if we didn't have any GPUs we'd embarrass ourselves running LINPACK: you'd be looking at orders of magnitude less than the next team that had even one GPU, so you'd be doing pretty badly. There's also potentially a mystery application, which we have no idea about: it could be GPU friendly, it could not be. The problem with GPUs is that not a lot of things are supported to run on them, so in a competition scenario, where you've got 48 hours and you might have to port an entire piece of code onto a GPU without even knowing what that code is, you don't really want to be chewing through time spending hours doing that when you could be optimising other things. And GPUs, because they give more grunt for the power they're using, allow us to maximise our computing power within our power budget. I think Jake's now going to talk about LINPACK.

Hi, I'm Jake, and I'm going to talk about LINPACK. LINPACK was a library developed for supercomputers back in the 1970s for linear algebra; basically one of the main operations is doing operations on matrices. When supercomputers got lots of memory, the problem sizes got really large and the calculation times got quite large as well, so to try and estimate how long a problem might take they decided to release a benchmarking tool. Initially that was just a 100-by-100 matrix of randomly generated floats, and it did a series of matrix operations, timed, to get your flops. Then as time went on and computers got more powerful, you could specify the size, and you could work out from that how long your problem might take if you were to run it on your supercomputer. Then in 2000 they released HPL, which is High Performance LINPACK, written in C and optimised for parallel computing. It takes advantage of OpenMPI, which is the standard framework for distributed supercomputers - it handles all the networking and communication for you - and BLAS, which is just a linear algebra library. It's become the standard benchmark when you're talking about supercomputers: when you talk about your LINPACK score, it's how many floating-point operations per second you can do. But it's also pretty useful for stress testing: when you first stand a machine up you run LINPACK on it to get your score, but you also do it to burn in your CPUs.
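HPL itself is a full distributed solver, but the basic idea of "time a big double-precision matrix operation and count the floating-point operations" can be sketched on a single machine. This is a minimal illustration only, not HPL, and not how the team actually benchmarks:

```python
import time
import numpy as np

n = 2000
a = np.random.rand(n, n)
b = np.random.rand(n, n)

t0 = time.perf_counter()
c = a @ b                 # double-precision matrix multiply via the BLAS NumPy links against
elapsed = time.perf_counter() - t0

flop = 2.0 * n ** 3       # a dense matrix multiply costs roughly 2*n^3 floating-point operations
print(f"~{flop / elapsed / 1e9:.1f} GFLOPS on this machine")
```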

It's also used for maintenance: if you've just fixed a node, you want to check that it's running optimally, so you run LINPACK on it. And that's one of the big issues with running LINPACK now as well: because there are so many nodes on some supercomputers, you'll often get a hardware failure, so you have to run it multiple times before you actually get your floating-point-operations-per-second number. Sometimes it takes days, or even weeks, on some of the largest supercomputers.

OK, so now we're in the new era - we've been talking about GPUs already. In about 2010 NVIDIA released better support for GPUs in supercomputing: they released the CUDA-accelerated LINPACK, which basically allows you to run LINPACK on your GPU. There are advantages and disadvantages to this. The disadvantage is having separate memory between the GPU and the CPU, so you've got to communicate between them, and that causes a bit of overhead. And OpenMPI, which I was talking about earlier, is a CPU-only library, so when the GPU wants to call it, it has to talk to the CPU first and then issue the command, so you get some latency there as well. GPUs have been gaining popularity because of their efficiency. Originally that wasn't the case, because they were really only designed for single precision; now they're optimised for double precision as well, which is what LINPACK uses, so they've become more efficient, which is why they're good on flops per watt. It's become the standard in the student cluster competition LINPACK benchmark: everyone's running CUDA HPL now, and if you don't have a GPU you pretty much get destroyed on the LINPACK score.

Here's a graph of the student cluster competition LINPACK scores, in teraflops. 2007 was when the competition first started - people probably weren't used to the competition yet - and everyone was using CPUs back then. Around 2010 is when teams first started using GPUs, and it wasn't a big jump up, but like I said that was partly to do with double precision, and also the tools probably weren't there for debugging and optimising that there are now. You'll notice the score for 2013, which is eight teraflops, was actually taken from the European competition in June, so when we do our competition in November the new Intel chipsets will be out and we could possibly get up to maybe ten teraflops, which is pretty awesome.

Some things to note: when LINPACK was first released it was really designed for vector processors, which are really good at mathematical operations; then modern CPUs came around, scalar systems, x86; and now it's sort of going back to vector when it comes to GPUs, so it's quite interesting. And even though LINPACK is used to rank the top 500 supercomputers, it doesn't push memory, it doesn't push the network, it doesn't push hard-disk read/write times, so you could have the best LINPACK score and then have really terrible performance on an application. It's even worse now with hybrid systems: if you really wanted to, you could buy really low-voltage CPUs and a ton of top-end GPUs, get the best LINPACK performance per watt and everything, and then when you run a real-world application you'll get really slow performance, because it all runs on the CPU, not the GPU. Now I'm going to hand over to Andrew, who's going to talk about GraphLab.

Hello, I'm Andrew, and I'll be talking to you about GraphLab. GraphLab is a large collection of graph-based applications, probably more accurately referred to as a toolkit, and it's divided into six main sections: topic modelling, graph analytics, clustering, collaborative filtering, graphical models and computer vision. I'll just briefly describe each of these sections before handing over to coach Rebecca. Starting with topic modelling: it basically finds groups of documents that contain similar topics. It's used for applications such as online trend analysis, to find popular topics on the web, and other such applications not listed.

Alright, moving on to graph analytics. This is actually probably one of GraphLab's largest areas; it has a whole range of applications and algorithms. Unfortunately I can't go through them all - we don't have that much time - but a couple are listed up there, including PageRank and graph partitioning. PageRank is mostly used to rank web pages for use in search engines such as Google; they actually use it, though they also use a whole bunch of other algorithms as well. And graph partitioning, well, it partitions graphs; it's quite useful for graph-based applications, including GraphLab itself, which uses it in other algorithms, including its spectral clustering algorithm, which is actually one of GraphLab's two clustering algorithms along with k-means.

I'll just really quickly discuss clustering, because it's really quite basic: it's a very common data-mining technique that groups together data points that share similar attributes. It's good for sorting through large, unorganised data sets.

Collaborative filtering is another section of GraphLab and is quite commonly used in recommender systems to predict people's interests, such as movies, books and TV shows. It's used a lot by companies such as Facebook to target their advertising to particular audiences.

Graphical models is a rather short section of GraphLab but quite interesting: it contains several prediction models for a whole range of applications. One of those is image reconstruction, which we've got an example of up there. The top-left image is the original image, just a bunch of different shades of grey and black, and just below it is the same image with a heavy amount of noise applied to it. Running one of GraphLab's structured prediction models on the noisy image actually gets rid of the noise completely; unfortunately it does lose precision around the colour boundaries, as you can see in the right image.

That brings me to the last section of GraphLab, which really just contains a single function, and that is image stitching, which is used to stitch together panoramic images - I'm sure you all know what they are, a group of individual images stitched together to make a big one. And that, very briefly, very quickly actually, wraps up GraphLab, so I'll just pass you on to - is it you next? Alright - Rebecca.
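To make the PageRank mention above a little more concrete, here is a minimal power-iteration PageRank in plain Python. It only illustrates what the algorithm computes; GraphLab runs this kind of computation distributed across a cluster, this is not GraphLab's actual API, and the tiny three-page graph is made up.

```python
def pagerank(links, damping=0.85, iters=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iters):
        new = {p: (1.0 - damping) / len(pages) for p in pages}
        for p, outs in links.items():
            if not outs:                          # dangling page: spread its rank evenly
                for q in pages:
                    new[q] += damping * rank[p] / len(pages)
            else:
                for q in outs:
                    new[q] += damping * rank[p] / len(outs)
        rank = new
    return rank

print(pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]}))
```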
Alright, so I'm pinch-hitting today for our NEMO5 person, who can't be here, but he did make these really nifty slides for us, so I'll tell you about NEMO5. NEMO5 stands for NanoElectronics MOdeling tools, part 5 - or number 5, I guess. It's free for academic use; it's kind of open source in that sense, except that it's not exactly open source: I had to register with the NEMO5 folks in order to get the source code. They've been working on it for over 15 years, and it's developed by a group at Purdue University in the United States. The purpose of NEMO5 is to model at the atomic scale, so it simulates nanostructure properties such as strain relaxation, phonon modes, electronic structure, self-consistent Schrödinger-Poisson calculations and quantum transport. An example of particular interest to our folks is modelling quantum dots; I think they were studying some of that in their UWA course, where they found out about this opportunity to participate in the cluster competition.

NEMO5 is really a collection of modelling tools, and in a sense what happens is that you design your own inputs, with either the aid of an in-built scripting language or Python, and then you select the geometry, the physical conditions and the solver routines that you want to use, and all of that comes out of NEMO5. It uses a lot of other libraries, mainly for the linear algebra routines: some of those include SLEPc, which is an eigenvalue solver, PETSc, which is a nonlinear equation solver, and ARPACK, plus another library for small multidimensional tensors. It's very parallelisable through the use of MPI.
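The eigenvalue solvers just mentioned (SLEPc, ARPACK) can be shown in miniature with SciPy, whose sparse eigsh routine is itself a wrapper around ARPACK. This is only a sketch of the kind of problem those libraries solve at scale; the matrix here is a generic discretised 1-D Laplacian, not anything taken from NEMO5.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import eigsh

n = 1000
# Tridiagonal (-1, 2, -1) matrix: a discretised 1-D Laplacian.
laplacian = diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")

# Ask ARPACK (via SciPy) for the six largest eigenvalues without forming a dense matrix.
values, vectors = eigsh(laplacian, k=6)
print(values)
```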

It's actually a code that I know has run on the entire system of Jaguar, which at the time was the number one most powerful supercomputer and had 18,688 nodes, so it can parallelise up to that degree. It's not currently really designed to make use of GPUs; this is something they are investigating, of course, and there are certain components of libraries that can use GPUs, but overall it's really not a GPU-accelerated code.

Now, I saw somebody earlier who was wearing an xkcd shirt - ah, here we go - so you'd appreciate this. The student who has been doing NEMO5 has been very, very frustrated trying to compile it. I have to confess I have not had the time to sit down myself and figure out exactly how to compile it. He's been looking on internet websites and using Google to try to solve the problems he's encountered, so he felt this one was particularly apropos: "Never have I felt so close to another soul, and yet so helplessly alone, as when I google an error and there's one result: a thread by someone with the same problem and no answer, last posted in 2003. Who were you, DenverCoder9? What did you see?" I'm sure we've all had that experience. So he has yet to successfully compile NEMO5, because it is hard. I didn't quite realise how challenging this one was; I had thought GraphLab was going to be the hardest one, so I'd really focused on that with the team, but NEMO5 is pretty tough: it has a very strange and eccentric make system, and there's not a lot of useful documentation on compiling it, unfortunately - unlike WRF, which is the third application we're working on. If you're really interested in NEMO5 you can find out more about it at this URL.

Alright, so I'm also pinch-hitting for the WRF person, and I have even less material on that. WRF is a weather prediction model code; it comes out of NCAR, which is the National Center for Atmospheric Research in the United States, in Colorado. WRF is a really great code. It has a pretty simple build system, so people were pretty much able to get it compiled, at least so far; running it has been a bit challenging for them, but it's a work in progress. We were going to do a little demo with WRF, but unfortunately our demo guy isn't here, so we'll just have to skip that.

So I'll switch gears and talk about iVEC now. My name is Rebecca Hartman-Baker. I am a supercomputing development and application specialist at iVEC - I just call myself a computational scientist, I think that's a lot easier. As Adam indicated before, I used to work at Oak Ridge National Lab in the United States - I'm sure you can tell I'm not from these parts after hearing me say a single sentence - but I came here to Australia, to iVEC, because it was a once-in-a-lifetime opportunity to join a supercomputing centre and help build it from the ground up. One of my dreams has already been fulfilled, which is to create a team of students to participate in the student cluster competition at the annual supercomputing conference, so I'm really excited about that and really glad to be here.

I'm going to talk to you about four main things: a little introduction to iVEC, our supercomputing resources, applications - like, why in the world would anybody ever use a supercomputer - and finally working with iVEC. So who here has actually heard of iVEC? Hey, alright, good. Our goal, or purpose in life I guess, is to energise research and uptake of supercomputing, data and visualisation in Western Australia and beyond - doesn't that sound good? iVEC is an unincorporated joint venture between CSIRO and the four public universities here in Western Australia, so technically nobody actually works for iVEC. Technically I work for Curtin University, but I try not to be overly partial to Curtin, because I feel like I work for iVEC and I'm trying to engage in supercomputing uptake at all of the universities, including CSIRO. Most of our funding comes from the government of Western Australia - they pay my salary.

So thank you, government. There's also the Commonwealth government, and contributions from our partner institutions. Here are our partners: CSIRO is our main agent - what do they call it, some kind of agent or something - plus the four public universities in Western Australia: Curtin University, Murdoch University, Edith Cowan University and the University of Western Australia. We have facilities across the Perth area: at the CSIRO ARRC, which is kind of our HQ, is where we have our Pawsey Centre, which is where our big supercomputer is coming; at Curtin University we've got uptake and visualisation facilities; at Edith Cowan we've also got uptake and visualisation facilities; and we have supercomputers currently at Murdoch University and the University of Western Australia.

Now, in 2009, after the global financial crisis, Australia wasn't suffering as badly as perhaps some other countries, but Australia figured, you know, we really need a stimulus program, and as part of that they had the Super Science initiative. It was a $1.1 billion program, and its purpose was to refresh, or bring in, new research infrastructure across Australia. The primary investments went to three places: one of them was here in Perth, for the Pawsey Centre, a supercomputing centre for radio astronomy and geosciences; life sciences got a place in Victoria; and climate science and water management went to Canberra. The Pawsey Centre is of course the one we're all most interested in today. The federal government gave us $80 million to build a new supercomputing centre that would first of all provide a boost to the supercomputing facilities and possibilities here in Western Australia, but would also help support Australia's bid for the SKA.

The first thing we did was spend $9 million on supercomputers - we bought these two machines that I'll talk about next. After that we started building a big building to house supercomputers in - I'll tell you all about that too - and then finally designing, procuring and installing a petascale supercomputing system, and that is what attracted me here. I worked at Oak Ridge National Laboratory, where a petascale supercomputer was kind of old news - we'd had one of those since 2008, been there, done that - but the exciting thing was getting to come here and help bring that in and help grow the supercomputing capacity and uptake in the community. It was a once-in-a-lifetime experience: I hopped on a plane, I'd never been here before, I came sight unseen; one year later I haven't run away yet, and I also haven't gotten deported, so it's good.

OK, so stage 1a: the first thing we did was buy this machine called Epic. It's located at Murdoch University, and it is a data centre in a pod - it's like a trailer with a supercomputer in it, which I think is pretty awesome. This was the thing I saw on my second day here, and I thought it was the coolest thing on wheels. It's not actually on wheels, but anyway, it's a supercomputer in a pod, it's at Murdoch, and you can see it - that's a reflection of a car, not an actual car, so it's not actually that big, it's regular building size. Epic is a really great machine: it debuted at number 88 on the Top500 when it first came out, so the 88th fastest supercomputer in the world was here in Western Australia. It has 9,600 cores on 800 nodes, 12 cores per node, 24 gigabytes of memory per node, and QDR InfiniBand connectivity between nodes - a great machine. The next year we brought in Fornax. Fornax is named after a constellation; it's a GPU cluster, with 96 nodes, each of which has one CPU and one GPU, and we brought it in as kind of a pathfinder machine, because GPUs are all the rage.

What we wanted to see was whether people are actually going to use a machine like this, and how hard - or easy, hopefully - it is to get codes up and running and working on this type of machine.

So, as I said, we built this building, the Pawsey Centre. We just got occupancy of it in April, so pretty recently. It's a really great building: it has a big supercomputing room that I think is about half a hectare - trying to be metric here. One really innovative thing we did in this building is that we're using groundwater cooling to cool our supercomputers. Supercomputers get really hot - if you've ever put a laptop on your lap you know you can get burned from that, and a supercomputer consists of, you know, a thousand nodes' worth of laptops all in one tiny space, so it's going to get really hot and you have to have a really good cooling system. Here we have the Mullaloo Aquifer, which has a temperature of 21 degrees, so we suck up water from that and cool things off with it; it heats up to 30 degrees and goes right back down into the aquifer, but downstream from where we picked it up. So that's a very green initiative: our cooling costs are basically negligible - the only cost would be operating the pumps, but we have that covered because we put a bunch of solar panels on the roof. We estimate that we could draw up to 1.9 megawatts in this facility, so that's pretty good.

The next thing is a petascale machine. It's called Magnus; it is a Cray XC30 supercomputer, and right now we have the first stage of it - we brought in a smaller machine to get ourselves prepared for the bigger one. It just arrived: it's a 69-teraflop Sandy Bridge system with 208 compute nodes, each of which has 16 cores and 64 gig of memory per node, and we've got two petabytes of storage. We just finished our acceptance, and we're going to begin early user access at the end of August, so we have about ten projects lined up who will be able to use the machine for the next couple of months.

Another thing - because, remember, we're also supporting radio astronomy - is the real-time computer. The real-time computer is for radio astronomy, for the data that comes off the dishes up in the Murchison area. It's a 200-teraflop Cray XC30 supercomputer; it's going to have Ivy Bridge processors - our current one has Sandy Bridge, this one will have Ivy Bridge - and it's going to arrive in September, so that's going to be a really exciting machine. It's not actually going to look like that, but that is a picture of an XC30 with a fake iVEC livery on it. The nice thing about a Cray supercomputer is you get to pick a mural to put on it - it's pretty awesome; that's the real reason we bought it.

In addition to all those supercomputers we also have other kinds of supporting machines, if you will: data analysis engines and a visualisation machine. We have a six-terabyte shared-memory SGI UV 2000 for visualisation, and we have 34 SGI Rackable servers, and that's how we really got into contact with SGI about sponsoring our team, so it's pretty exciting - they have some pretty awesome kit out there. Finally, we also have some high-performance storage: we're going to have three petabytes of Lustre - Sonexion storage; Sonexion is just kind of Cray's branding of Lustre storage - and we're going to have six petabytes of disk by the end, so it's going to be a pretty awesome thing. We also have archives, which are not-so-high-performance storage: a huge tape library. I don't know if you've ever seen tape libraries - they're pretty cool, they've got a robot that goes in, grabs the tapes and reads them. I really like those things; I took my son in to see it because I knew he would love it - he's six. Anyway, that's what we have right now.

Magnus phase two: the requirement for this machine is that it has to be a petaflop or more.

It's going to be a Cray XC30, because that's the architecture we've gone with, and it's going to have about a thousand nodes - hopefully more than a thousand. The exact composition of the system is as yet undecided. Actually, we have sort of decided, but we haven't gotten official approval, so I can't tell you what it is, unfortunately. Ideally it will be delivered beginning in April 2014 and be operational by the end of June; otherwise all that stimulus money turns into a pumpkin, we don't get to have a supercomputer, and Cray sues us, and it's a bad, bad thing. So that's the hardware explanation.

Now I'm going to switch gears and talk about software. What do we do at iVEC? Our purpose is to build a science engine. I mean, we have all these nifty supercomputers and that's great, and I'm sure system administrators would be happier if we didn't have users, but the truth of the matter is that we actually need users so that we can justify being funded, for example. So what we're trying to do is build useful computing environments from the beginning to the end, from cradle to grave of what you're doing - that sounds kind of morbid, I guess, but you get my point.

A lot of people ask me: OK, this is really exciting, I'm thrilled you have these huge supercomputers, but why would you ever use a supercomputer? The reason really is that there are some things you have to simulate on a supercomputer because you can't do them in the lab or you can't observe them. There are a lot of phenomena that are just plain too complex to be studied by theory or experiment, and numerical simulation can fill that gap - it's kind of a third leg of science. There are also some phenomena that are too expensive, time-consuming or dangerous to do, and some that are just plain impossible to create in the lab. You don't want to create a core-collapse supernova - an exploding star - in a lab; if you created that in a lab, no one would live to tell about it, so that could be a problem. Also the Earth's climate: you don't want to sit around and find out what's going to happen if we add all of this carbon dioxide to the air - we want to simulate it first so that we'll know. And then there are other things that are just dangerous: you want to combine some chemicals but they might explode, or they might create some kind of environmental impact, so instead you can simulate these things. So there are a variety of problems, not just in pure science research but in engineering, in product development and even in logistics, where you can use supercomputing and really benefit from it.

I'm going to show you a couple of really interesting examples. The first one here is a 3D hydrothermal simulation of the Perth Basin - a sedimentary basin. You apply a finite element simulation to the Perth Basin, and the reason is to determine the heat transfer. This is useful because we could probably power the entire city of Perth, and more, using geothermal energy if it weren't for the cost constraints - so if you can do that, you can figure out where the heat is, how to get to it, and a lot of other things, and be able to use it. Interestingly, some of these simulations were used as input to the strategy for our groundwater cooling system, so that's a real-life application. Our own Bec over here was a student intern and she did this project. I'll show you some pretty pictures - I have no idea what they mean, but they're very pretty; I guess that's Perth there, and I don't even know what else, but it's very nice, isn't it? Fantastic.

Alright, another thing I was telling you about was core-collapse supernovae. This is something we did at Oak Ridge a lot - we worked on a lot of astrophysics applications, because they're very interesting and they have a lot to do with energy. This is a really great story: we had people simulating these core-collapse supernovae - exploding stars, basically; they collapse upon themselves and explode and they turn into a pulsar, and that's about the extent of what I know about this topic. For the first time we had enough compute power that, instead of assuming radial symmetry and assuming it was a sphere like all good physicists do, they were able to actually simulate the whole star. That was pretty exciting, but when they were doing it they discovered there was this wobble in the orbit of the star, and they had no idea why.

They thought: oh no, this must be an artifact of our numerical methods. They checked, and it was not a numerical-method problem. So then they contacted their friends and colleagues who are the observers - the observational astronomers - and said: hey, have you ever noticed a wobble in the orbit of these core-collapse supernovae? And they said: you know, we haven't really noticed that yet, but we'll go back and look. Well, sure enough, there is this wobble in the orbit, and in fact they discovered through this interaction that the frequency of the wobble is proportional to the frequency at which the pulsar would emit radio waves if the star exploded and became a pulsar. These results were actually published in - I can't remember if it was Science or Nature, but one of those one-word, very prestigious journals. So they discovered this using computation: this is a physical, scientific phenomenon that, if it weren't for computation, we would still not know about. I think that's very interesting, and it goes to show that this is really a legitimate way of doing science - it's not just people playing with computers and making pretty pictures or something like that.

Another thing we've done is work with Rio Tinto on Indigenous archaeology. Rio Tinto, as we all know, is a mining company - they really like to mine things and then ship them out - and one of the things they like to do is build railroads in order to ship all of their stuff to the coast. Unfortunately they were going to build a railroad that went right through an Indigenous petroglyph site. Instead of doing what I would have expected, which is change the route of the railroad, they instead hired our people to go out there, take lots of pictures and create a virtual Indigenous site, and in this way they were able to preserve the site - or at least the memory of it - without having to move their railroad.

Alright, switching gears here: working with iVEC. Some really cool things we have going on at iVEC for people at other institutions: we've got summer research internships for students, we have industrial internships also for students, we have our partner program, where we partner with companies and government agencies outside of our five partners, and we also do some consultancy.

We have summer research internships for undergraduates, and in doing these internships the students, who already have science knowledge in whatever field, are able to gain hands-on experience with advanced computing technologies. Like I said, one of our team here was one of our summer interns last year - that's how I found her and pounced upon her to participate in this project. It's a 10-week summer project; the students are supervised by an active researcher and they get to use our iVEC resources for their project. They present their results at our iVEC symposium, which is an annual symposium in February, and a lot of them also end up publishing academic papers. iVEC funds these internships for our partner institutions, but it's also possible for others to pay to have an intern, and currently we have a call for internship proposals open.

Another thing - this is a fairly new program - is industrial internships: a workplace-integrated learning program, a collaboration between us and the Australian Computer Society, co-funded by industry. Basically it's almost the same as our summer internship program, except that the students are placed in a company; they have an academic supervisor and they have access to our iVEC facilities, and the projects last anywhere from eight weeks to 18 months. So this is a really good way to get in on the action - a lot of companies use this to see whether supercomputing is good for their company or not, and it's a pretty good, low-cost way to figure that out. We also have a partner program: agencies and companies can partner with us, and for a modest fee they get access to packages of training and skills development, supercomputing expertise and access to our facilities. So that's another option if your company might be interested.

Finally, we do consultancy - we do that a lot, and we have companies we're working with in a variety of sectors, including the mining sector. Sometimes they just want to know: OK, we want to have a cluster, what's the best option, how should we do this? And we can help figure that out for you and help set it up.

So that's really the end of our presentation. I wanted to thank all of you for the opportunity for us to come here and present. I'd like to thank Adam for chasing up this opportunity for us, the team for coming along and for preparing the slides, iVEC of course, and our commercial sponsors as well, especially SGI, who have just gone out of their way to help our team get a good machine. So with that, if you have any questions, I'm more than happy to answer, and maybe the students will answer as appropriate. Yes?

So the question is what file system we're using. It looks like we'll be using the distributed file system that's standard on the iVEC machines.

The next question is whether we considered field-programmable gate arrays. Not really. These competitions have been running for the last seven years, and there's a community website which details exactly what machines were submitted at each previous competition and what performance they got out of them, and I'm not aware of one that used FPGAs. What we haven't mentioned, though, is that there is a second competition running at the same time. The one we're entering has no limitation on what budget you can apply; the second competition is: OK, you've still got the 26-amp limit, but in addition you've only got $1,500 US. Someone mentioned earlier considering ARM chips, and I imagine many of the people entering that one will be looking at low-cost solutions such as ARM. I don't think that second competition has been running very long, so there's not a lot of data on what machines people have submitted, but we might see something like that in that category. Any other questions?

So the question is what kinds of limitations we're going to be facing. For example, with LINPACK we're going to be facing a limitation based on our flops - how well our CPUs or GPUs can produce flops. For something like GraphLab, you're right, it's definitely not a flops-intensive thing. Typically, in my experience as a computational scientist, for the vast majority of applications you don't run into problems with flops limiting your performance; the place where you really run into problems is memory. It could be just how much memory you have, or it could be memory bandwidth - most applications that run on a really big supercomputer are actually there for the memory, not for the flops. That's part of what we've been doing, actually: profiling our codes using Allinea MAP to find out what's going on - how much memory they're using, what kind of memory bandwidth, what kind of communication is going on. Communication can be another issue: if you have a relatively small problem that you're running on a relatively large number of processors, you can run into a situation where you spend all your time communicating. So those are the three main bottlenecks.
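A very crude way to see the flops-versus-memory point made above on any ordinary machine is to time a large array copy (bandwidth-bound, no arithmetic) next to a matrix multiply (compute-bound). The sizes are arbitrary, and this is nothing like the team's real profiling with Allinea MAP:

```python
import time
import numpy as np

x = np.random.rand(50_000_000)           # ~400 MB of doubles
t0 = time.perf_counter()
y = x.copy()                             # streams memory: one read + one write per element
copy_s = time.perf_counter() - t0
print(f"copy bandwidth: ~{2 * x.nbytes / copy_s / 1e9:.1f} GB/s")

a = np.random.rand(2000, 2000)
t0 = time.perf_counter()
a @ a                                    # mostly arithmetic, comparatively little memory traffic
print(f"2000x2000 matmul: {time.perf_counter() - t0:.2f} s")
```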

The next question is how reliable our cluster will be. OK, well hopefully, with a cluster this size, it's relatively small. Back here - we didn't present it, but we have a kind of idea of the size of our machine, way, way back here - our machine is going to have eight nodes. We were comparing it to Raijin, which is currently the largest machine in Australia; it just came out about two weeks ago, they had an unveiling of it. So our cluster is going to have eight nodes. On your average computer your mean time to failure is a couple of years, and we effectively have eight computers here, so I'd say we don't really have to worry that much about failure. We will have some spare hardware just in case, of course - better safe than sorry - but you'd be surprised at how well these machines actually hold up, even the really big ones. For example, on Jaguar, which had 18,688 nodes, we would have a node go down once every couple of days, whereas the predictions would tell you you'd have one node go down every couple of hours. So it's really pretty amazing how resilient these machines can be. Have you got any other questions, or are we all questioned out?

Alright, so the question is about compiling NEMO5 - is it open source, or documented? NEMO5 is an interesting case: it's open source in a sense, in that it's free, but it's not freely distributed, so you can't just go and download it - you have to register with them and then they decide whether you can download it. I didn't get rejected or anything, it was a pretty quick process, but it's just not as open source as some of these other things. GraphLab you can just freely download, and WRF is the same - you just freely download it. Because of that, there's just not as much information out there on NEMO5. They do have some private forums where you can look for information, but I suspect that most of the problems our student was running into here are not NEMO5-specific - they're probably just compilation-specific. So my goal this week, actually, is to get NEMO5 compiled and see what I can do.

Good question: how does iVEC engage others outside of academia, like in industry? We actually have an industry and government uptake program, and we have a person who is devoted to that full time. The best way to get in touch with him is probably to take a look at the iVEC website, which is just ivec.org, and look for the industry and government uptake program; or, alternatively - this is an even easier way - you can email help@ivec.org and we'll get your message to the right person.

Yes, there is a surprising amount of interest - I did not expect we would have that much interest here, but we do. We have a number of companies who are already partners, and interestingly a couple of government departments as well - I think the Department of Commerce is one, I could be wrong, but we definitely have a government department who is a member - and then we have others, like other mining companies, who work with us regularly. So yeah, that's a good question.
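Going back to the reliability question a moment ago, the arithmetic behind "eight nodes is not much to worry about, 18,688 is" looks roughly like this. The per-node failure rate is an assumption taken from the "couple of years" figure in the answer, not a measurement:

```python
node_mtbf_years = 2.0        # assumed mean time between failures for a single node
for nodes in (8, 18_688):    # our eight-node cluster versus a Jaguar-sized machine
    system_mtbf_days = node_mtbf_years * 365 / nodes
    print(f"{nodes:6d} nodes -> expect a node failure roughly every {system_mtbf_days:.2f} days")
```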

Somebody had a question - yeah, OK, so the question is about the operating system on our cluster and whether we've optimised it in any way. That's a really good question. The answer is we haven't done anything to it, because we haven't actually received our cluster yet from SGI - it's on its way, "the cheque is in the mail" or something like that is what they told me, I don't know. Hopefully it'll get to us soon and we'll be able to do some stuff with it. We're definitely going to run Linux on it - it's obviously the best choice - but beyond that I don't really know. Like I said, I'm a computational scientist, not really a systems person, but I'm definitely learning a lot from this experience myself, so that's pretty great.

OK, yeah, that's an excellent question. So yes, we already have supercomputers, and the question is about the usage of those supercomputers - how much are they being used currently? The answer is they are flat-out, as you say. Our Epic machine in particular is three times oversubscribed: we have annual calls for projects and we usually get at least three times the number of requests we can actually accommodate, so there is a lot of demand, that's right. Magnus will be about ten times Epic, or more, and we anticipate that at first maybe it'll be a little empty, but people will solve that problem for us - their needs will grow. It's kind of like when you go from being a poor student to actually having a real job: suddenly your income grows, and so do your expenditures. It's the same sort of deal. The vast majority of the demand on the system comes from our academic partners, but we also participate in the national allocation scheme, so we do get some people from universities across Australia.

Any other questions? We've got it all covered, we answered everything - that's great. Well, thank you guys so much for inviting us.