How to Integrate and Visualise Big Data – Andy Cotgreave

Good afternoon, everyone. Did everyone have a nice lunch? Has everyone learned something good today so far? It's all a bit confusing, what to do with big data. What do you think? Are you already confident you know what you're doing with big data? No? I think it's pretty challenging. Even defining the term is getting harder: as it becomes a more and more famous, or overused, term, it becomes harder and harder to define.

Anyway, hello everyone. My name is Andy Cotgreave; I'm a senior product consultant at Tableau. I'm very grateful to you all for choosing to come and see this presentation. Hopefully we'll all learn something out of this session, which we've entitled "How to Integrate and Visualise Big Data". I'm going to show Tableau in the second half of this session, and talk about the ways in which we think you can all get benefits from visualizing and exploring big data in a visual way. Seeing the data to help yourselves understand it is kind of the key and the secret. I'm going to give you six tips that, for us, really emphasize how to get the most out of big data, and then we'll go into a demo.

Feel free to ask me questions throughout the presentation, and I can obviously do Q&A at the end. This session tends to take about 35 minutes to get through, so if you want to ask questions, just fire them at me, and I'll either answer them straight away or park them until the end, depending on the complexity of the question. I'll try and get through the slides pretty quickly, because you've probably seen quite a lot of slides today.

So there are six tips. First of all, simplify. This slide is from a consultancy called the 451 Group. I don't know if anyone's heard of the 451 Group? No? Anyway, this slide is deliberately meant to be complicated and hard to read, right? It's not because I can't design slides. Each one of these represents a current, up-and-coming, or established database technology, and the point
of this slide is that the database technology is overwhelming right now. The guys this morning were talking about this, and I was reading about how Oracle's under threat at the moment, because Oracle has kind of been the incumbent, but there are all these new kids on the block just taking over, and everyone's got different databases to do different roles. So right now there's a huge amount of flux and change in the data landscape.

In terms of visualizing this stuff, how do you simplify access to the data? Well, one thing you can do is find a tool, and we are not the only tool that does this. Whatever database technology you end up going with, you need a tool that will connect to many, many of those technologies. Tableau, for example, connects to over 35 different databases. On the left we've got various flavors of Hadoop, and DataStax is in there as well, and also massively scalable commercial databases: Vectorwise, Teradata, Greenplum, etc. We are always striving to make that collection of data connections grow and grow. So as this complex landscape evolves, you need to have a nice simple tool on top of it which can do the visualization layer, because you don't want flux in every single part of the process.

Now, an example of that: eBay is one of our big customers. I mean, eBay's probably a customer of every single BI tool going. They've got 52 petabytes of data, which is massive. In their Hadoop cluster, that's 42 petabytes; that's kind of where the advanced analytics team is working. That's the kind of team of data scientists that Andrew Clegg was talking about this morning. The data scientists have questions, right? But everybody else, whether they're a sales guy trying to work out if he's on quota, or an HR person looking to find out where they're going to get their next hire from, or someone in marketing [inaudible],
everybody else has questions to ask of the data, right? If you just protect that data and give access only to the team of data scientists, then you're shutting off the potential discoveries that everybody in your company can make. So what eBay do is they've got a 10-petabyte Teradata warehouse, which essentially is one enormous table, and they point Tableau at it, and anybody in the organization can query stuff about what's going on on the website, what auction metrics are happening.

So that was tip one, simplify. But we can't just simplify; we have to
coexist in organizations. Some of these data providers are very helpful about this. So Cloudera and Teradata: you don't need to read this, but this is a document they created saying, look, certain things you need to do we're great at, but certain aspects of what you want to do with your data we're not so great at. What they put together was: if you've got Cloudera and Teradata, what would you use each of those technologies to do? And coexisting is really important, because as this landscape changes you're going to be picking and choosing technologies, possibly quickly, until you narrow down and fix on just a few. What that means is that you will have many databases, with data all over the place. You'll have ad hoc data connections like Excel or CSV files, and you need to be able to blend them all together into one view in order to gain insight, because ultimately that's what we're trying to do. This is an example where I blended Hadoop data with Teradata data to create one scatter plot to see some relationships in the data.

You'll have to excuse me: today I'm feeling dreadful, so if my voice completely packs up, I'll just demo silently. I'll do it all by mime.

Visualize: this is the key thing for me. I was a customer of Tableau for four years prior to this, at the University of Oxford, doing data analysis, and Tableau unlocked things in the data sets we had that we just couldn't have dreamt of doing before. Now I'm going to do two slides which are going to teach you cognitive science. You can go and do full university degrees in cognitive science and visual analysis, but I'm going to show two slides to show you the theory. So if I ask you a question on this: how many nines are there in that table? Is that a hard question to answer? Yes? Okay. So I'm going to make one visual change which will make that question really easy. How many nines are in that table? Is that question now easier to answer? That's clearly a kind of silly example, but
accountants do this: they put negative and non-profitable numbers in red, right? And they do that for a reason, because when you encode data visually, it plays to our strengths. We evolved our visual system over millions and millions of years, and it turns out all that evolution played into the hands of data analysis, because we can visually see things using a system that originally evolved for survival, to find the good berries and the bad berries. Using color, in this case, or length, or size, or area, is a way of encoding data and seeing things in your data that previously would have been hidden.

The guy from Pentaho talked about this slightly this morning as well. What we at Tableau call the cycle of visual analysis is vital to get into your business, because if you are only visualizing your data at the end of the project, then you've missed a whole world of opportunities. If you can visually explore your data as soon as you've got it, or as soon as you're trying to find structure or take decisions, then you will find problems in the data, find new stories in the data, and new questions will be generated. And when you integrate visualization into every aspect of the project, or of your data, then hopefully you'll get some really good new insights very quickly.

Number four is empower, and here I actually disagree with something Andrew Clegg from Pearson said this morning. We sponsored a report by the Economist Intelligence Unit asking, well, how do you foster a data-driven culture? And what we found in that kind of echoed our sentiment, which is that the more you enable everybody to have access to data and to answer questions of that data, the more financial reward you will have. Now, for this question, what strategies proved successful, the biggest success is top-down guidance and mandates from executives. That's people like you, trying to drive forward a big data, or any kind of data, project or
culture. If it comes from you, then that will permeate through your company, and lots of people will be able to answer questions.

Facebook is another one of our big customers, and I'll tell you the anecdote we know from them and explain why I think it's different to the way Andrew Clegg sees the world. Facebook bought Tableau for various projects, and they were very successful with that. But when they came and presented at our customer conference, they started talking about things that they had no idea were going to happen. So, for example, they had really
inefficient meeting room usage, right? And this was causing lost time and lost productivity. So somebody managed to get a whole bunch of data from the Exchange servers, mashed it up with some of the other data they had, and they changed the way they booked meeting rooms. As a result of that, they saved a bunch of money: not billions of dollars, but a significant amount of money.

Imagine if you have a tool, and you free up the data, and then anybody in your organization can be free and empowered to answer small questions. If you have a hundred people who can ask any question, or across your whole organization a thousand people, and they each ask questions and save ten thousand dollars, or ten thousand pounds, a year, you're making massive changes to your bottom line, and you're empowering those end users. If you keep all your data just in the hands of data scientists, then those data scientists are only going to be able to answer questions at the rate they can explore the data themselves, and they're not as close to the questions themselves. The guy trying to work out how to use meeting rooms has a real problem; the data scientist is divorced from that. So for us it's really important to get the data into the hands of everybody. You need to trust them to come up with valid metrics from the data: they care about the company, and they care about getting the right numbers out of the database. But until you free these things up, you don't see the kind of incremental results that people at Facebook could see.

Number five is integrate. Again, we'll go back to that slide: data comes from everywhere. We've got to try and simplify things, but blending data all into one place is a really powerful capability. Nobody here would really have predicted they'd be able to get all their data into one data warehouse, even five or ten years ago. We've been trying to do that for 20 years in this
industry, right? And we've still got data all over the place, because there will always be text files and Excel files that exist somewhere in your organization. So you need to bring them together, and we'll have a look at that in the demo. A good example of that was Obama. That election campaign last year was extremely data-driven. They had an amazing team that created a bunch of different systems. They had loads of data, but they managed to mash it all together, and they were way more efficient than Romney's IT setup. Romney spent a lot more money on IT, but these guys kind of took this empower, integrate, simplify approach to win the election. Well, maybe it's a bit far to claim that the data won the election, but you know.

And the final one is evangelize. You're here today representing organizations of many people, each of whom, like the guy at Facebook, has got data-related problems. It's not just the big data problem that Andrew Clegg talked about this morning. It's up to everybody, really, to try and push this culture change through the organization. You should never be protective about the tools you have or the databases you have. Get results, and push those out through the business. All this technology means nothing if the benefit is not pushed through to everybody in the organization.

I think I did do six; maybe I said more, but those are six tips. What I want to do now is move to a demo of Tableau, and hopefully you'll see some of these things happen, and I'll explain as we go through the demo. Just, who has seen Tableau before today? Maybe someone who's been to our stand? So you're all brand-new to Tableau? You've seen Tableau before, but everyone else is brand-new to Tableau. Yeah? That's great. I'm going to have to sit down for this, so I'll disappear for a bit. Right, so what I
want to do is talk through a hypothetical case study where I'm going to use Google Analytics data. This may be the kind of data you're using; it may not be. This is Tableau Desktop. These are all the workbooks I've been working on recently. I've been working on some crazy artwork-based stuff which is not very business-like, but I've got a bunch of saved data connections here on the left. The first thing we do is connect to some data. Now, if my internet connection is going to work, I'm going to pull down some Google
Analytics data. What we can see on the left-hand side here is all the data connections that we can make in Tableau. I'm going to show Google Analytics, but the procedure is essentially the same whatever we're using, whether it's Hadoop, DataStax, OData, Teradata, Vertica; each one of those is kind of a fully supported data type. So, fingers crossed the internet's going to work. That's a good sign. And can I type my password right first time? Let's see. Okay.

Is anybody here using Google Analytics data? A couple of people. All right. So I'm going to download live tableausoftware.com data, and what I can select from Google Analytics is any of their GA fields and measures. I've got date; I'm going to add in country. And I'm going to use some of the predefined measure groups: the page usage. I don't think I need bounces or goal completions or revenue. Okay, so I'm now going to click OK and let Tableau and Google talk to each other. What's happening here is Tableau's going to the cloud, in this case it happens to be Google Analytics, and it's pulling down data. I've now got data from Tableau's Google Analytics site. I'll just quickly show you what's going on. So I've got seven hundred thousand visits. Let's look at this by day over the last five weeks.

So what's happened here? Tableau created this data extract: it went to the cloud and it pulled down data. It's pulled down these dimensions and measures, and I just dragged and dropped, and I can see the hits per day on tableausoftware.com. So I've got data from Google Analytics and looked at some data in, I mean, it took 120 seconds. This is Google Analytics, but whatever that data is, wherever it is, you would have something equivalent when you connect to it. Now, with things like Hadoop, if you're trying to run a query on Hadoop when it's a massive cluster, then obviously the processing can take a long time. So we have options: you can pull extracts down and
create snapshots which will live locally, and then work on those to get the instant response that we just saw.

So far so good. I mean, visits over time: I can see this on the Google Analytics site, right? That's nothing particularly new. But there are certain things I can't do on the Google Analytics site, for example forecasting. So I'm just going to do two clicks and show a forecast, and now Tableau has generated new data. The blue highlighted there is the actual visits to tableausoftware.com, and the lighter blue highlighted on the right is what Tableau is forecasting will happen, based on the seasonality in the data. It happens that our visit and hit counts in Google Analytics are pretty predictable, so that prediction isn't very mind-blowing, right? But obviously if you have slightly more volatile data, you will be able to see more interesting trends. This is predictive analytics, available without me having used... I mean, the only time I've so far used the keyboard is to type my password in. And we can tweak these: we can put in trend lines and moving averages and all that kind of stuff, which we can look at if you want. I will come back to the time series in a moment.

Another way we can explore the data: the key thing about the cycle of visual analysis is to be able to see the data in different ways very, very quickly, because every time you draw a view it might answer a different question. So I've selected dimensions, and we've got this patented bit of technology called Show Me, which is essentially a recommendation engine. I can choose a filled map because I've got country selected. Now I've got a map drawn, and if I now click visits, because I want to see my visits for every country, I can click the map again, and now I've got a filled map for all my visits. Now, Tableau's an American company growing globally, but we are still dominated by visits from the US. If I turn
this into a dot map, I can tweak things like the size and start seeing slightly different trends. Let's just get rid of that, and that. I can begin to see the dominant... you know, Europe's kind of
second behind the US. So this is cool. Now, again, this is something I could do in Google Analytics. So how can exploration of the data help me add new things to this cloud-based data source? Well, Google Analytics doesn't contain market segments or continental groups. Maybe this year I'm going to be targeting the BRIC countries. So I can select Russia, India, China, and Brazil, if I can hit it. Let's get rid of that one. Sorry, it's slightly harder without the mouse. We'll just put that one in anyway. And I can group these things together. Keep your eye on the left-hand side of the pane, and on the color legend as well. I'm going to create a new group, and I'm going to rename this BRICs. If you can see down at the bottom, let me just zoom in, I've now got this color legend based on what I selected visually. Just by visually looking at things, I've got a new dimension up here on the left. So I've now created data that didn't exist in Google Analytics, or didn't exist in my Hadoop cluster, or didn't exist wherever your data is.

I'll rename that group Region. I could select some more, so I could say, okay, let's take North America; I'd like that as well. So I just group, and now orange represents North America. So this is where things are going. I've created this group. I could be creating these based on the biggest countries or the smallest countries; in this case it's by geography. But with this new Region field I can now do further analysis. So I'm going to bring this down onto the label shelf. Actually, I'll put it on the color shelf, in fact. Then let's make it a stacked area. What I've done here is gone back to my time series, where we added the forecast, and now I've added the new dimension that I've just created. So I'm visually creating new dimensions and groups, and now I'm enhancing the kind of stuff Google could give me anyway with new data, so
you can see North America dominates. Let's put that at the bottom. And then the BRIC countries: well, there's not a great deal of visits from the BRIC countries. Maybe we could look at this row by row or column by column. And I don't like that view, so I'll just undo and undo.

All right, so what I hope you're seeing at the moment is that I've looked at data over time; it's taken me no time to get this data from the cloud; I'm looking at a geographical distribution of where things are happening; I'm creating new dimensions; and then I'm using those new dimensions to enhance analysis elsewhere.

So let's have a look at another way of looking at country and visits. Let's go back to Show Me. We can do a tree map, but we could do a bar chart or any kind of thing we want. This tree map is showing me visits and country, and as I can see, the United States dominates the total number of visits to tableausoftware.com. I assume those at the back can't see the labels because they're tiny, so we can change the label; I'll make that a bit bigger for everybody. Okay, maybe that's a bit better. So we can dynamically change these labels as part of the analytical process.

This is good: I can see the United States has lots of visits. But the United States also has lots of people. India, I can see, is pretty big in terms of the number of visits, but I know India has over a billion people. So as a marketing person, maybe I'm trying to think about, well, what's my penetration in each territory? What's my visits per population? Well, that's data that doesn't exist in Google Analytics. And this is the real-life situation, where you're like, okay, this database has this great data in it, but it's incomplete. So what I'm going to do is answer that penetration question by connecting to a different data set. This is Excel, and this is how we connect to an Excel file: I find my Excel file; I choose the worksheet. In this case
there's only one worksheet; we could link multiple ones together with custom SQL. I'm going to connect live to it. Now I've got two data connections. Population is available to me, and Tableau's picked up a link: if you just look here, you can see this orange link. So Tableau, it
happens that in these two data connections, Country/Region exists in both data sets, so it automatically said I can link, or blend, on those two fields. If the name is different then you can customize that, obviously, but so long as the data has that field in common, we can do the blending I'm about to show you. So I'm going to drop population on the color shelf. What's happening here is that this tree map is being sized by the Google Analytics visits count, and it's being colored by the Excel population figure. So this is data from multiple data sources on one view, without me having to write any code or any script, or even fill out any wizards in this case. And now we begin to see, well, India looks like it has a big population, so maybe my penetration's not so good. And what's this one? China: similar kind of thing. Lots of visits, but 1.3 billion people; that's quite a lot, right?

That's still not quite a great end result in terms of analytics. I've got visits and I've got population, so I'm going to create a calculation which gives me visits per million from these two data sources. I'm going to enhance the data model, and we'll call this Visits per Million. Sorry, for this bit I do need to use my keyboard. So I've got visits, and I've got all my fields available in each data connection. I'm going to go to the Excel file and choose population. I'm just going to divide visits by population and times it by one million. Is that right? That's 10 million, isn't it? One, two, three, four, five, six. So what I've done here is write a simple calculation, but notice this calculation is coming from multiple data sources. You don't have to do pre-processing to get multiple data sources into one third data source just so you can do some essentially basic analysis. I'll click OK, and I've now further extended my Google Analytics data model by adding this Visits per Million field. So now what I'll do is put that onto the color
shelf. I'm going to change the way I encode these colors to give me a diverging palette, so that things are a bit more obvious. I'm going to go orange to blue. Right, now I'm beginning to see some really interesting stuff. What's going on here is that big squares represent lots of visits; blue squares represent high visits per million, so high penetration within the territory; and red represents places with pretty large populations, where the visits per million is pretty low. And we can begin to see some of these blues popping out now: Ireland has a blue square, so it has lots of penetration.

I can quick filter and get rid of some of the ones that are a bit smaller. So what I'm going to do here, let's just bring it over this side, is further explore this data by filtering out countries that have low visits per million, because I want to see where I've got lots of penetration. As I drag this filter, what Tableau's doing is visually removing the ones with low penetration. When I let go, Tableau redraws this view to show just those with a penetration over 324 per million. At this stage it's like, okay, I know the US is really big; let's just get rid of the US. And now I've got something pretty interesting, because I've explored this data and I'm looking at just the countries with pretty high penetration rates. I've done that kind of arbitrarily. I've got rid of the US
because I know that's a success, and I can see Singapore: clearly our sales team in Asia are doing great work in Singapore, because we've got loads of hits to our website. So this is a kind of visual exploration of the data that I've been able to do very, very quickly, on data I'm getting from the cloud. Let's just reset that so it's all of them there. We'll just call that Visits per Million.

We've created three views. Currently they're all separate, which is fine, but we can bring all of these together onto a dashboard. So I'll just add a new dashboard, and you kind of get how it works: I drag and drop things onto my dashboard, and let's
do visits per million down here. I don't like that. Oh, whoops. I'll put that on top of the map. I don't particularly like that either. We can just keep playing with this till I get something I'm happy with. And I'm going to do that as a line chart because it'll be a bit prettier. I'm changing the view very, very quickly just to find something that is pleasing to me and gives me the results I want to see.

So we've now got these views all in one place, and to continue that exploration, that visual analysis, I want to be able to click on the map, because I understand geography, and use that as a filter. So I'm going to say: click on the map, use this as a filter. And now when I select a bunch of countries in Asia, I get my visits per million in Asia in the tree map, and the visits from those countries in the timeline. I could choose my EMEA countries and see the same information: within the countries I've selected, at the bottom, the UK, Israel, Ireland, and Switzerland have high penetration. The size legend here, well, the size legend only refers to the map, so I'm going to bring the size legend onto the map, so the legend sits where it's appropriate. And I'll just call this Google Analytics Analysis, show that title, and now talk about a little bit of a summary.

So what have we got here? We've pulled data down from Google. We could have done this from BigQuery, from wherever; I don't know where your data is, but wherever it is, you've seen how we connect to these places. We've done analysis very, very quickly. We've added new dimensions and measures to the connection, so we've enhanced the data connection in a way that works with the way we think about data: I can see what I want to select, grab it, and create new dimensions from it. We've taken that extra dimensionality and put it onto the other worksheets. And then we brought data in from multiple connections; you know,
you could have data from 10 different connections all on the same view here. And then finally we brought it all together into a dashboard. Now I could publish this to Tableau Server, so let's do that. I'm going to publish that to Tableau Server, and I'll explain what Tableau Server is while this is going. Let's just publish the dashboard. Click yes to that.

Everything we've done so far is in Tableau Desktop. This is a desktop application that runs on your PC. I've been doing the exploration cycle, the analysis, and at some stage you will hopefully want to share your insight with the rest of the organization. One way you can do that is with Tableau Server, and I'm going to go directly to the browser window here. So now I've switched to Chrome. That single action I just did to publish this would mean that if I accessed this Tableau Server on my iPad, I would get the same experience. So now I'm in a browser, and I get the same interactive experience that my data analyst gave me. I can select the countries in North America. Maybe it wasn't too visible, but now this is representing the countries I've selected. I'm in the browser with no add-ins here, and that would work just the same on an iPad.

And this is a really key piece of the empowerment and integration aspect of big data: you won't empower your end users until you make data and data visualization tools available wherever people are. By making this stuff available in the browser and on mobile, you're really going to help people play with data wherever they are.

The final piece of that: say my executive manager is using his iPad, and he's showing this stuff to some chief executives or some customers, and he doesn't like this tree map. So there he is, he's in a meeting, and all he's got is an iPad. Well, you can select the tree map and click Edit, and now we're still in the browser, or I could still be on
the iPad. He can say, well, Andy, I don't like your tree maps; I'd rather see this as a bar chart. Fine, so we can show it as a bar chart. I don't like it as a side-by-side bar chart. Fine, I'll do it as a bar chart colored by visits per million. And I haven't updated to my
release version, so I do get these bugs occasionally, but you get the idea: with in-browser editing you can ask new questions of the data. And if we were looking at this on an iPad, he would also see that.

So I'm going to stop there. To just summarize what I've shown you, let's just go back to PowerPoint. That's incorrect: I've got evangelize twice. I think number one should be simplify, shouldn't it? Did anyone notice that? So, simplify is about having a tool that gets you to the data quickly, whatever your data is. Coexist is about, well, look, we have multiple data types; let's bring them all together. Visualize: I hope you've really seen that, because that is ultimately the essence of Tableau. Empowerment is about giving everybody access to these tools wherever they are. Integration is the piece about bringing multiple data sets together as well. And I did a demo, and that's where I'm going to stop.

So we've got plenty of time for questions, but I hope that's been useful. If you want to find out more, go to tableausoftware.com. You can download the fully functional product for Tableau Desktop and Tableau Server. It's a 14-day free trial, so you can try it yourself, because I'm employed to make it look easy, but it's not until you point it at your own data that you'll know whether that's true. So please go to tableausoftware.com and try it out for yourself. There are lots of online training resources and examples as well. So with that I will hand over to questions. Does anybody have any questions?
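
[Editor's sketch] The blending steps walked through in the demo (grouping countries into a region, joining visits to population on the shared country field, deriving visits per million, then filtering on it) can be approximated outside Tableau. What follows is a minimal Python sketch under stated assumptions, not how Tableau implements blending: every figure is made up for illustration, and the names `visits_by_country`, `population_by_country`, `REGION_GROUPS`, `region_of`, `blend_visits_per_million`, and `high_penetration` are hypothetical.

```python
# Illustrative sketch only: all figures are invented, not taken from the talk.

# Primary source: visits per country (stands in for the Google Analytics extract).
visits_by_country = {
    "United States": 400_000,
    "India": 40_000,
    "China": 20_000,
    "Ireland": 5_000,
    "Singapore": 6_000,
}

# Secondary source: population per country (stands in for the Excel worksheet).
population_by_country = {
    "United States": 316_000_000,
    "India": 1_250_000_000,
    "China": 1_350_000_000,
    "Ireland": 4_600_000,
    "Singapore": 5_400_000,
}

# Ad hoc groups, like the "Region" dimension created by selecting marks on the map.
REGION_GROUPS = {
    "BRIC": {"Brazil", "Russia", "India", "China"},
    "North America": {"United States", "Canada", "Mexico"},
}


def region_of(country):
    """Return the ad hoc group a country belongs to, or 'Other' if ungrouped."""
    for region, members in REGION_GROUPS.items():
        if country in members:
            return region
    return "Other"


def blend_visits_per_million(visits, population):
    """Left-join visits to population on country name and derive visits per million."""
    rows = []
    for country, visit_count in visits.items():
        people = population.get(country)  # None when the secondary source has no match
        per_million = visit_count * 1_000_000 / people if people else None
        rows.append({
            "country": country,
            "region": region_of(country),
            "visits": visit_count,
            "visits_per_million": per_million,
        })
    return rows


def high_penetration(rows, threshold, exclude=()):
    """Keep rows at or above the threshold, dropping unmatched and excluded countries."""
    return [
        row for row in rows
        if row["visits_per_million"] is not None
        and row["visits_per_million"] >= threshold
        and row["country"] not in exclude
    ]


if __name__ == "__main__":
    blended = blend_visits_per_million(visits_by_country, population_by_country)
    # Mirrors the demo's filter (over 324 per million) with the US removed by hand.
    for row in high_penetration(blended, threshold=324, exclude={"United States"}):
        print(f'{row["country"]} ({row["region"]}): '
              f'{row["visits_per_million"]:.0f} visits per million')
```

The join is deliberately one-sided: every country in the visits data is kept, and a country with no population match gets `None` rather than being dropped, so gaps in the secondary source stay visible instead of silently shrinking the view.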