Data-Driven Storytelling: From Evidence to Impact

I’d like to introduce you to today’s presenter, Cheryl Phillips. Cheryl has been teaching journalism at Stanford since 2014. Most recently she founded Big Local News, a collaborative data-sharing effort and platform for journalists. She is also co-founder of the Stanford Open Policing Project, a cross-departmental effort to collect police interaction data and evaluate racial disparities, and she is a founding member of the California Civic Data Coalition, an effort to make California campaign finance data accessible. Before coming to Stanford, Phillips worked at the Seattle Times for 12 years in reporting and editing roles with the investigation team and across the newsroom, most recently as a data innovation editor. In 2014 she was involved in coverage of a landslide that killed 43 people, which received a Pulitzer Prize for breaking news. In 2009 she was a lone editor in the newsroom when four police officers were shot at a coffee shop, and she was integrally involved in the subsequent coverage, which received a Pulitzer Prize for breaking news. She has twice been on investigative teams that were Pulitzer Prize finalists. Phillips has also worked at USA Today and at newspapers in Michigan, Montana and Texas. She served for ten years on the board of Investigative Reporters and Editors and is a former board president. She regularly trains others on how to collect, assess, analyze and present data in ways that will effect change. Welcome, Cheryl; welcome, audience; and please go ahead and begin.

Thank you, thank you very much. I’m really glad to be here and to be able to talk a little bit about how to use data in effective ways. I use data as a journalist, but there are a lot of ways to use data and express the import of what you find in a way that will make change happen in your own organization. So that’s what we’re going to be talking about a little bit today: how to go from data to storytelling and from evidence to impact. So the agenda, just to lay it out a little bit, is, again, we go
from data to understanding, we go from that understanding of the data to the narrative, to the storytelling, and then the last piece of it is to integrate visuals so that we can, again, create impact right away, just from the way that our user or reader is interacting with the information and being able to understand it.

The very beginning of understanding data, for me, starts in my role as a reporter, somebody who is looking to understand by interviewing. Just like I interview a person, I want to interview data, and that’s the same thing that you would want to try to do. In journalistic lingo, you could go to almost any news story in any publication and you will find, in the first paragraph or two, the answer to the who, what, when, where, why and how. If you can answer those questions by analyzing your data, then you can help focus yourself and figure out where you need to go with your storytelling. So I want you to keep in mind those five W’s and the H: who, what, when, where, why and how. They’re critical as you question the data that you collect and then move forward with the storytelling from that data.

One of the first things that is important to do is to just look for the overall patterns that might tell a story. This particular visual shows how much money local governments were spending to lobby the state legislature in Washington state. You’ll see I have a very simple bar chart here, and what it’s telling me is that city governments are the ones that are lobbying more and spending more money than any of the other local governments to lobby the legislature. So if I were looking at this lobbying data, I would know that that’s where I want to focus my reporting, to see which cities, how much, and for what. So the pattern is what leads me to my next step in the data analysis: I’m going to look at the biggest pattern, figure out what’s going on with that biggest trend, which in this case is cities, and delve more deeply
into the data there.

Once I’m there, I can look for the outliers that might help tell the story. In this particular case we’re looking at a bar chart of how much money is spent per resident, per person, in a particular city, and we see this town of Algona, which spent $18,000, which is not a lot of money. But if we look at how much money per resident, we see it’s five dollars and 97 cents. That’s actually quite a bit, far higher than some of the other, larger cities. So this makes an interesting story; this makes an interesting anecdote from the data to be able to dive into the storytelling. Those are the kinds of things where you look for the overall pattern, and then you look for the nuggets of information that stand out and delve into those more deeply, to be able to do a little storytelling around that overall pattern.

The story that we ended up telling in this case is about Algona. We have what in journalism is called a lead, that first sentence: Algona is a small town with an equally small dream, to build a community center. Then we add some context: we add how big the city is, where it is, all those who, what, when, where, why and how questions. We add a little color; let’s talk about the human side of this town. They have an annual holiday social, an Easter egg hunt, a Halloween party, and then the local officials decided they wanted to do something more for their community. They wanted to build a community center, so they decided to shell out what was, for them, a lot of money to hire a lobbyist. And we have this great quote. We actually talked to the mayor, and he has this important quote where he says, you know what, for six dollars a person, if he can get us that community center, that’s a real good return. So that helps add the person and the narrative to this piece. But I want to point something else out here, and that is that we used the figure six dollars. We didn’t use the number that we pulled from the data, which was five dollars and, I believe, 97 cents;
we rounded up. That’s because sometimes when you’re doing your data analysis you want to be precise at the beginning; you want to do that per capita rate. But then when you’re trying to tell the story, you want to think about: is precision necessary here, or do I just need to get across the main point? In this case, just getting across the main point was enough. If I had put the five dollars and 97 cents into the presentation, it would have cluttered the overall story, and it wasn’t necessary. So that’s one thing to think about as you’re moving from understanding the data to storytelling with the data.

Speaking of the data, how did we get that data in the first place? In my mind, pretty much everything is data; any information out there is data. The lobbying data that we got actually came like this, in a form, in a page of information. This happened to be a PDF. We extracted that information and then turned it into rows and columns so that we could analyze it.

So how did we find this information? Well, one of the things that I assume is that you all use Google. Google is not the only search engine out there. It’s a great search engine, but one of the things you might consider doing when you’re looking for data that might inform or add context to information you already have is using another search engine. DuckDuckGo is an example of that. That’s because different search engines use different algorithms, and so the results that you get might be different. But you can also just do a lot more with Google when you search. You can target your search by using the advanced search function. You can use specific phrases and put quotes around them, and that will help winnow the results that you get. You can also search by site: for example, you could type site: followed by the site’s address and then whatever your search terms are, and it will only search within that site. You can look for specific types of information, so if I know that government reports are often in PDF, I might do file type
colon PDF, that is, filetype:pdf, and whatever my search terms are. I might do the same thing if I’m looking for data that’s already in rows and columns: I might look for a spreadsheet, file type colon XLSX, or filetype:xlsx. All of this helps, again, to filter all of that information you might get from a search engine so that your results are a little more targeted and focused, and that helps you find the information you need a little more easily.

But another thing to know is that Google is not going to give you the answers to everything. This particular data would not show up on a Google search because it was searchable within the site as a database. The commission in Washington state that collects this information has a searchable database, a search engine on its own site, that allows you to pull this information. The census is the same way: you’re not going to find a census table just by searching Google, at least not in the most effective way, but if you go to the census site, then you can search and extract all of the tables that you need. A lot of information is like that; it is kept in a database on a website. The SEC is another example: if you want financial reports for a public company, you need to go to the SEC site and search within it. That’s a really important thing to know when you’re looking for data. But just keep in mind that any piece of information can be data. You just need to know how to bring it in, extract it, and put it into your rows and columns so that you can then do some more effective analysis.

So let’s talk a little bit about bringing data into rows and columns and how the structure matters. In this particular case we have an entity, which happens to be a city on all of these rows, but it could have been a tribe, a county and so on. We have a field for compensation and we have a field for expenses. You’ll notice that the entity gives me something that says “City of Algona.” The problem with this is that I can’t join this field of information to other information that would add context, like census information. So what I need to do is create a new column, or field of information, that will give me the entity type: is it a city, is it a tribe, is it a county and so on. So I extract the information from entity: I take “City of” out of entity and I put it into a new column that will have the type of government. I also need to think about what is the most important data that I need. Do I need compensation and not expenses, or do I need both? This is where, when you first get data, you need to really understand every
bit of information and what it means. In this case the reporter made a few phone calls and had some questions, and we decided we needed the total of the compensation and expenses. So don’t be afraid to pick up the phone and say, well, what do you mean by this column? Or read the dictionary that goes with a data set and find out what each column means. So in this case we added the compensation and expenses, and then we ended up with a data set that looks something like this, where we had an entity clean column. I kept the original, because I never want to throw out the original; I want to make sure I’m fact-checking as I go and that I don’t make any mistakes. So now I can compare and say, oh yeah, it is still the City of Seattle, it’s just that I took out “City of.” I do the same thing with my math: I have a total column, but I have kept the compensation and expenses columns so I can always double-check. And now I have an entity type: cities, counties, tribes and so on. This allows me to do a lot more analysis.

Context matters. Before, I had, for example, that the City of Seattle had spent more money to lobby the legislature than any other city. But if I think about how big Seattle is, that may not be so surprising. When I add in the context of the population, I can see, again, that pattern with Algona that would otherwise have been completely hidden.

And again, then we ended up with our final story. We didn’t just turn out a narrative that was text; we also did a visualization. One of the things that I think about when I’m trying to do a visualization is how to accurately reflect the main theme of the story and then also allow my user or reader to explore for themselves if they would like. You can see this on any number of sites where you can hover over something and you get a little bit of extra information, which is called a tooltip, or you can filter the information and look for your own community or your own area of interest. In this particular visualization,
by selecting a square on that tree map, which is the top part of the visualization, it will filter the bar columns below. So if I select City, then all I will see are the cities, sorted, so I can see which city is spending the most. Same thing with tribes. We did our story about cities, but we might want to explore tribes, and this allows the user to do some of that. Everybody always has other questions. We’re trying to make the data accessible to our users but not clutter it up in a way that takes away from the main story.

And here we have the final results of the lobbying effort by Algona. They ended up doing a ribbon-cutting just a few years later with their new community center, so I guess it was a good six dollars per person spent.

When you’re first going from the data to understanding, one of the things that you do is use the shape of the data to guide your focus, and then you use that focus, which you got by looking at what the data and the patterns actually say, to guide where you’re going with your storytelling. The whole power of using data in storytelling is that you’re surfacing patterns that you might otherwise not see; without it, you would go off and do an anecdotal story that isn’t representative, isn’t truly a trend. So this allows you to be much more insightful with how you do your storytelling and the change that you might effect.

One of the big pitfalls of using data to tell a story, and really of any kind of effort to tell a story, is collecting too much information and then not winnowing it, not focusing. I call this notebook dump, because as a young reporter I had a real problem with this: I wanted to prove to my editors that I had done a lot of work. It’s the same thing when you’re analyzing data. You spend a lot of time analyzing data, you end up with lots of facts, and you want to prove that you’ve done all this great analysis, so you put it all in your report. You really don’t want to do that, because it clutters everything up and it muddies your focus, so you end up without a clear sense of where you’re going with the data analysis and what your recommendations might be. I actually used to go out with a notebook just like you see here in the slide. For example, I would be assigned to cover a parade; this is often an assignment young reporters get. I would go out and interview the grandparents, and I would interview a little bit of an older kid, and maybe I would interview a mom, and maybe I would talk to somebody on top
of a float, and they would all say really great things, like how fun the parade was and how good it was for the community and weren’t the unicyclists great, and all of the things that you might expect to hear. Then I would go back and write my story, and I would put in all of these quotes, and I would give my story, this big prize, to my editor, and they would cut out half of it. Why? Because it was repetitive, it was duplicative, and it was not necessary. They just needed to find a nice little nugget of story, maybe one or two vignettes, and then move on; leave the reader happy, thinking, oh, this was a good story, I followed an arc, there was a character. If I put everything into my story, everything into my report, then it’s really hard to follow a storyline. It’s really hard to know what I’m supposed to be doing and where I’m supposed to be going for my understanding. Your job, as somebody who’s analyzing the data, who’s analyzing the patterns, is to be able to create a frame for the reader, for your user.

So I’m going to give an example of where somebody did use data in a way that was kind of a notebook dump. This is a story about city workers in Fullerton from a few years ago, and what the reporter was trying to do was take a look at how many city employees earned more than $100,000. I might argue a little bit about the premise of the story in the first place, because I’m not quite sure why that’s important. There’s not a lot of context given. Is it a good thing that the city workers are earning more than $100,000, like after a hard-earned labor battle, and they were underpaid before? Or is it a bad thing because the city budget is in arrears? We don’t know. There’s no sense of context here. So one of the things you want to think about when you’re analyzing data is making sure you’re infusing it with that context. But beyond that, the writer breaks a few other rules. For one thing, they take those numbers and they just plop them in here in these paragraphs,
right? You don’t need that many numbers, even though you’re doing a lot of data analysis. Your job is to excise numbers from any report you do so that the numbers you do use pop; then the numbers that you use are the important ones, the ones that people should pay attention to.

So let’s take a closer look at these numbers. First, the writer put that 26% of the workers earned more than $100,000 and seven earned more than $200,000. In that first paragraph there are one, two, three, four, five numbers. You really shouldn’t have more than three numbers in a sentence; it just gets really hard for a reader to process all of that. The other thing is that a lot of times you might have a number and you can translate it into something that’s a little more understandable. If I were writing this story, I would probably take away that precision of 26% and say one in four Fullerton city workers, because it’s just an easier way to understand the information.

The other thing that this writer does is say “average total compensation” for all of the workers, all of the reported positions. One of the things to know about reporting with data is that you want to understand why and when you would use the average and why and when you would use the median, because in salaries and in home prices, anything where there is no upper bound or the upper bound can be very high, the average can be skewed. So if the mayor makes $500,000 and everybody else makes far less, then the average would skew high; that big $500,000 would skew the average. You want to look at both the average and the median. If they’re close together, then you’re probably safe using the average. If there’s a big difference between the median and the average, that may be where you want to dig a little deeper, or you might just want to use the median. Think about home prices: if there’s a mansion on the end of your block, the average sales price for your block is going to be skewed because of the price of that mansion.

The other thing this reporter did was include part-timers in this average compensation. Now, I would say that if there were part-timers who were making more than $100,000, that would be a story in and of itself, but including them in this data analysis actually dilutes the impact of the overall story, because it depresses what that average might be. Then the last thing that I see this reporter did was repeat things. In the first sentence they said, oh, 262 made more than $100,000; oh, that’s 26 percent. I’m not sure why we’re repeating it, and in just another number-heavy way. So those are examples of where a reporter did some analysis and then put all of that analysis into a story, but because they didn’t take the time to focus themselves and
look at what they were really trying to report, it became kind of a muddy mess.

The reporter also made the data available. This is clearly not a visualization. There is some sorting to it; the total is sorted in order from largest to smallest. But there’s also a lot of extraneous information. Again, when you’re looking at and working with data, excise any information you don’t need, especially if it’s information you’re providing to your readership. So we don’t need the total wages subject to Medicare. We don’t need the benefit pension formula, which seems to be the same for everybody. We don’t need the employee share of pension contributions unless we’re doing something about pensions, and we don’t need deferred compensation, which apparently is zero, unless we’re trying to write about the fact that deferred compensation is zero for everybody, and so on. You get the idea. All of that information, we don’t need it, so let’s get rid of it.

So when you’re first getting the data, how do you try to filter things out? How do you figure out that focus? Spreadsheets are a great way to start. You want to use sorts, you want to filter information, and you can use pivot tables. When we go back to that lobbying story, we used pivot tables to be able to take a count of the number of cities, or the number of entities, lobbying the legislature. So there were X number of cities and X number of counties, and maybe the median for each one of those. That’s a pivot table, where we’re summarizing information by a category. We can also then do some basic charts. Spreadsheets are a great way to get going in data analysis. If you want to join data, let’s say the census data to the lobbying data, then you might want to level up to using a database program. That’s one of the things that we go into a little bit in my course. You can join data in a spreadsheet, but it’s a little more complicated. It’s much easier to use a database program and learn a tiny bit of querying code so that you can
join things and build up to a more robust level. What we did with this was we had a city code for the census data, which is called the FIPS code, and we made a city code for the lobbying data, and where they matched we added in that census population figure. So it’s a pretty easy thing to do, but using a database program will help you do it much more easily.

So this is an example of a different kind of $100,000 salary story, and this is one that the reporters at the Seattle Times did, where we looked at the data and we found a pattern by department, so we created a pivot table to look at that.
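
As a rough illustration of the join and pivot-table steps described above, here is a minimal Python sketch that uses only the standard library. The city codes, entity types, dollar amounts and populations are made-up placeholder values, and the function names `join_per_capita` and `summarize_by_type` are hypothetical. The talk itself describes doing this work in a spreadsheet and a database program, not in Python.

```python
from collections import defaultdict
from statistics import median

# Hypothetical lobbying records: (city_code, entity_type, compensation, expenses).
# All values are invented for illustration only.
lobbying = [
    ("00001", "City", 17000.0, 1000.0),
    ("00002", "City", 90000.0, 10000.0),
    ("00003", "County", 40000.0, 2000.0),
    ("00004", "Tribe", 25000.0, 0.0),
]

# Hypothetical census lookup keyed by the same code: city_code -> population.
population = {"00001": 3000, "00002": 500000, "00003": 80000}


def join_per_capita(records, pop_lookup):
    """Keep rows whose code appears in both tables; add total and per-resident spending."""
    joined = []
    for code, entity_type, compensation, expenses in records:
        if code not in pop_lookup:
            continue  # no matching census row, so no per-capita figure
        total = compensation + expenses
        joined.append({
            "code": code,
            "type": entity_type,
            "total": total,
            "per_capita": total / pop_lookup[code],
        })
    return joined


def summarize_by_type(records):
    """Pivot-table-style summary: count and median total spending per entity type."""
    groups = defaultdict(list)
    for _code, entity_type, compensation, expenses in records:
        groups[entity_type].append(compensation + expenses)
    return {
        entity_type: {"count": len(totals), "median_total": median(totals)}
        for entity_type, totals in groups.items()
    }


rows = join_per_capita(lobbying, population)
summary = summarize_by_type(lobbying)

# Sort by per-resident spending so small places with high rates surface first.
for row in sorted(rows, key=lambda r: r["per_capita"], reverse=True):
    print(row["code"], row["type"], f"{row['per_capita']:.2f}")
print(summary)
```

A spreadsheet lookup or a database join would do the same matching; the point of the sketch is only that a shared code in both tables is what makes the join, and therefore the per-resident comparison, possible.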

We also pulled the data over time. At the same time that you’re winnowing and filtering the data, you also want to make sure you’re getting that context. One of the things that you can often do is look at data over time, as opposed to a snapshot; that helps add information and insight. So we looked at the salary data over time, the number of employees making $100,000 or more, and we saw this very marked trend: there were three departments, the utility department, the fire department and the police department, where the number of employees making $100,000 or more had shot up. Now, of course, there are storms, and in those cases those departments are out covering all kinds of things, so they have more overtime, and you would expect them to be higher. But what caused the spike? The increase wasn’t because of storms. It turns out, after we found this pattern and did a little reporting, that all three of the unions for these departments renegotiated their contracts right before the Great Recession, and for all of the other departments in the city, their contracts expired in the middle of the Great Recession, so their percentage increases for employees did not rise at the same level.

So what did we do when we went from data to narrative? We told that story, which was very focused, but we also set our visualization so that that’s what people saw the instant they went to the visualization. You want to get this kind of pre-attentive look, where you don’t even realize what the message is but you just get it right away, just in a second. That’s the idea of pre-attentiveness. So by setting the theme of the visual to match the theme of the narrative, we help send a nice, focused message. Now, we also allow our reader to explore the data and look at the other departments at the same time.

So how, again, do you go from data to narrative? We’re going to talk a little bit more about that. This is a picture of the Skagit River Bridge, which crosses
the Skagit River on the way to Canada from Seattle on Interstate 5. It’s a pretty important bridge, and it fell into the water after a truck hit one beam. The truck was crossing the bridge, and it had a pilot car. The pilot car driver was not very attentive, was on a cell phone, and veered a little bit too far to the right. The truck followed, hit the beam and made it across the bridge; everything else immediately collapsed into the water. Three cars went into the water. This happened on a Thursday night. I was at home feeding my children, got called into the office, and immediately downloaded this bridge inspection database so that I could figure out what was going on with this bridge, that it would collapse from just one hit on one beam and cause this kind of calamity.

And this is what I saw: I saw more than a hundred columns of information. It’s overwhelming. If I had printed this out, it would have been about three or four inches thick. So how am I going to make sense of all of this data and get to what was going on with this one bridge? Well, I can filter it. I can look at just this one bridge. So the first thing I do is take a look at the bridge inspection database and filter for just the records that apply to the one bridge. Then I read the data dictionary, and I was interested in one particular thing, this critical feature inspection. It turns out that bridges can be something called fracture critical, which basically means that if one thing gets hit, everything can go. Well, gee, I wonder how many other bridges have that same feature, and can I identify that? So then I broaden my scope. I first looked at the Skagit River Bridge; I find that it is fracture critical, and I say, how many others are like this bridge? I take a look and I find that there are seven other highway bridges that have similar risks to this Skagit bridge. Now, my first query didn’t give me just seven bridges. It gave me, I think, a couple of dozen, but a lot of those bridges were maybe small
county bridges with very little traffic, not of as high an interest to my readership. So again, I want to find the pieces of information that are most important, and that means I winnow: I filter out everything that has a smaller traffic count, and I’m looking just for highway bridges, because those get more traffic and they’re more important in terms of truck traffic and commuting traffic and all those other things. So then, within about an hour, I was able, with my colleague Mike Lindblom, to tell a story about how there were seven other highway bridges with low clearances, just like the Skagit River Bridge, where an over-height load can destroy a span.

So when we’re thinking about going from data to visuals, with the mapping for the bridge story, the next thing that we did was take a look at a visualization where the reader can filter and we can, again, create that frame. We start with the number of bridges, and then we winnow down and winnow down and winnow down to be able to find bridges that are structurally deficient, that have problems. It may be interesting for some of you, if you’ve ever driven in Seattle, to note the viaduct. The sufficiency rating, the structurally deficient rating, goes from, I believe, zero to 50, 50 being good and zero being bad, and if I recall correctly, the rating for the viaduct, which is no longer there and has been replaced by a tunnel, was four. So we found a real problem with bridges overall, just because of this one calamity that happened with the Skagit River Bridge.

When you’re trying to go from data to visuals, you also need to know the right grammar. You need to know which charts are the most appropriate charts as you go along. So here we are taking a look at a story that we did at the Seattle Times on the impact of the Great Recession on the Washington economy. What we used were unemployment charts, and this is information over time: what was the unemployment rate over time, from World War II on? Now, we could have done one big line chart all the way across, which we did, and not done all these smaller charts on the bottom, but we wanted to be able to show the discrete periods of time and what happened right around World War II and around a variety of different recessions through time. Then we wanted to be able to highlight the one that we were most interested in, which is the Great Recession. So you’ll notice that all the small snapshots of
the fever lines are gray, and then the one to the right is red, and that is also carried through in the overall visual at the top. So the grammar to keep in mind here is that line charts show trends over time, and that’s the best way to display numerical data if you have it longitudinally. But when you’re building your visualization, you can also use color to convey a mood or a tone, and you can use your narrative to do the same thing. We even use the word “red” in the first sentence with this particular chart, where we’re talking about the sea of red, essentially, that we had found ourselves in with unemployment figures. And then we used the color red in an innovative, artistic way, because we wanted to bring across this sober feel. How did we do that big line chart? We actually did a watercolor. One of our artists went out into her studio and painted a watercolor of this unemployment chart. The whole point here is that that helps set a tone and a mood, and you can do that with data if you are thoughtful in terms of how you create that tone and you’re using the right chart in the first place.

A bar chart is one that you would use when you want to show categorical data. This happens to be searches per 1,000 stops by race from the State Patrol, and we broke it up into all races, Asian, Black, Hispanic, other and white, and we were just looking at the difference. Now, you’ll notice that these are not sorted; well, they’re sorted alphabetically. There are very few categories, so it’s easy enough to look at it this way. Bar charts that are vertical like this are great for small slices of categorical data. What if you have a lot of bars? It could become really tiring to try to look all the way across. But you can then use a horizontal bar chart, which can show the categorical data and is also best used when order matters. So this allows us to sort by
the number of stops by county, for example, and it’s just a really quick, visual way to understand what’s going on across these counties. So that’s another way to think about how you can use bar charts, and again, these horizontal bar charts are especially effective when you’re looking at sorting, when order does matter. You’ll notice that with both of these bar charts, one thing I did not do was create a different color for every bar, because that’s really not necessary here and it would just clutter the visual. It’s the size of the bar, or the width of the bar, that matters here, not the names of the counties, and coloring them something different would actually make it harder to understand.

This is a more recent chart, from the pandemic, and you’ll notice it’s of the whole US. This is the confirmed COVID cases, but what we’re looking at is a normalized version. If I had just mapped COVID cases for the whole US, what you would see is not this. You would see instead clusters based on centers of population, and only clusters based on centers of population. Instead, you can see some counties in the Mountain West and in the Midwest that are actually rural but have high rates of COVID cases. There’s a small county in Montana called Toole County, and it’s got a very high rate, because even though it’s a rural county, it had a lot of cases relative to its population. What we did was another per capita rate: we did cases per 100,000 population, and that’s what we mapped, because then it makes it much easier to really understand what’s going on and to be able to compare these different regions. Anytime you make a map, you want to make sure that you’re not just mapping population clusters, because then it’s not an effective map. Think about what you’re trying to communicate and whether a map is necessary.

The other thing is that we wanted to be able to target our story to our audience. So here, while you’re seeing a national map, one of the things that we did with this map was make it embeddable at the state or county level. Let’s say you are trying to get out something that’s mappable at a local level, for your county or your community; then build the map at that level, or create a way to do that. So for example, here’s Louisiana. If I want a local newsroom to be able to use this map, they can take the code and just embed the state of
Louisiana, and then you have a tooltip that allows you to look at the various counties. So that’s a really important thing to think about: who is the focus, who is your audience? Make sure that you’re targeting your visualization and your narrative toward that audience.

So here’s one more in terms of knowing the right grammar, and that is: what do pie charts show? Pie charts show parts of a whole, but if you use more than two or three slices, it’s going to be really hard to understand. This is a really sad example, because it was indeed published by a major newspaper, which I won’t identify here, some time ago, and this is their city budget. I could do two different things with this pie chart, and again it depends on where my focus is: what is my data analysis showing me, and where is the focus of my story? If I want to show that the police and fire departments make up more than half of the city budget, then all of those other colorful slices are going to be turned gray and the category is going to say “other,” and then I have a story with a focus and a visual with a focus, right? If I wanted to show, in some kind of order, all of the various departments and how much money they were getting, I would probably turn this into a horizontal bar chart sorted in descending order, so it would be police, fire, sanitation, and so on all the way down. So that’s an important thing to think about. The focus of my story might be why Cultural Affairs got no money, or very little money, compared to all of these others, and I might think about, again, another type of bar chart to show it, but it probably wouldn’t be a pie. When you’re trying to build a pie chart, think about Pac-Man. You really want to have the simple visual of two slices, maybe three at the most.

And then finally, one of the really important things to think about is that behind every data point, again, is a human story, just like
Algona, and they’re trying to get the six dollar bang for a buck so they could build a community center. You’re looking for that human story. This story is an investigation that we did into the use of prescription methadone and a preferred drug list in the state of Washington, which a lot of states have, where they basically said if you were poor, if you were on Medicaid, and you were in chronic pain, the doctor was required to use this preferred drug list and prescribe methadone, while methadone had more side effects and there were more fatal
overdoses. So what did we do? We used a map, and we used census data to show the median income, and then we mapped every death. And so we were able to show that areas with lower median income had far more deaths from prescription methadone than areas that were affluent. We ran this story with all of these various visualizations, the main visualization being the one in the center, and we found, through the data itself, an impacted family who had lost a family member because she had a fatal overdose. We used their personal story to help explain what these patterns were in the data. About one or two days later, I think, the state updated its preferred drug list, they put an emergency warning on the risks of methadone, and then they stopped the whole use of preferred drug lists for the state. So that’s, again, going from evidence to impact, but using storytelling to get there.

So this is a quick highlight of what I cover in my course. The things that you learn in this course will be that evidence-to-impact focus; the best practices in normalizing and analyzing data so you can get there; how to translate data points, representative samples from the data, into stories; how to tell the stories in a visual way; and then effective tools to both analyze and visualize data, as well as the search techniques you need to find the data in the first place. So that’s it for my presentation. I’m happy to take some questions. Thank you very much.

Excellent. Thank you, Cheryl. The presentation was very helpful in many different ways. We have gathered a range of questions from the audience, so let’s go ahead and get started. Great. The first question: we’ve had a number of people asking about different types of technology and search engines, but I thought one that was really applicable, in many ways, with all the information we just gathered, was: what would you recommend using for data analysis if you don’t know how
to code? Huh, that’s a great question. There are multiple tools out there that you can use for data analysis without coding. Of course, a spreadsheet is always a good thing. You don’t have to know how to code, and you can level up with the use of formulas to do more complicated things. So a spreadsheet, whether it’s Excel or Numbers or a Google Sheet, is a great way to start. But if you want to then do some of that more complex work, where you’re having to join different data sets, there are still some other tools that are very useful. Tableau Public is free and allows you to both visualize and analyze data, and do joins and data cleaning at the same time. Also, there’s a program out there called Workbench. It’s all in the browser, it’s all in the cloud, and you import your data into it. It might be listed under CJ Workbench, or Computational Journalism Workbench, but it allows you to go step by step through an analysis, and you can always remove a step or add a step. You point, you click, you add your steps, and it walks you through that process. There are also a lot of tutorials and examples for how to do that. So those are a couple. In terms of data visualization, another one that’s really useful and is free is a site called Flourish, F-L-O-U-R-I-S-H. It is a really great way to visualize data, and it’s without having to code.

Excellent, thank you. So another question that we received was: how do you know if an example you pull out from the data is representative of the data as a whole?

Yeah, that’s a really important question, and that’s why you really want to look at that overall shape of the data. A lot of times, if you’re looking at numbers, for example, you’ll get that kind of bell curve. You want to find your examples in the middle of that bell curve, unless you’re trying to focus on an outlier, right? So that’s really important to think about. Algona was an outlier, but first
we looked at, well, where was most of the money being spent? It was from cities; that was our pattern. And then, now let’s take a look at what’s going on in the cities, and we found an interesting outlier. So you just want to figure that out. With the methadone story, the pattern that we saw was that there were a lot of cases
in these lower-income areas, so we wanted to not find an example from a higher-income area. We wanted to find an example that represented the data, so those are the data points we looked at. So you can look for that bell curve, find what’s going on in that main bulk of it, and ignore those outliers, to make sure you’re getting a representative example.

Great, thank you. Another question we received was related to what we’re seeing now with COVID-19. There are many articles being published about COVID-19, and this is a complex topic and hard to explain to readers. How can an author choose or design, create, and publish data, whether it’s regarding COVID-19 or any scientific news or research, that can help the reader understand this complex topic?

Yeah, that is such a challenge. I mean, the main thing that you want to think about is how to break something complex down into small, discrete steps or pieces, and tell the story in those discrete spots. So a lot of times it’s making that chronology, making a narrative that goes from this thing to this thing to this thing. The Washington Post has a great visualization called simulitis that basically does a step-by-step: this is how a disease transfers, and now let’s make another step. Let’s pretend we have this kind of population and see what’s happening, but now let’s take this kind of population and see what’s happening. It really breaks it down into very discrete pieces, and that’s one way to try to make things a little easier to understand. But the other thing that happened with that particular visualization is that the reporters also talked to the experts, right? They didn’t just say, oh, here’s data, and here’s what I think is happening, and I’m going to visualize it. They had a reporting step, a process in between there, when they said, here’s the
pattern that we see. What does it mean? I don’t fully understand it. I’m going to go find an expert and talk to them about what I’ve seen. Is what I’m doing accurate? Does what I’m doing make sense? And how would this expert try to synthesize it for the reader? And then I can use that to help me with my storytelling.

Excellent, thank you. The next question is somewhat related. If your data indicates a trend, one, how deep do you go to explain that data, to uncover the truth and make sure that you’re not biased in your answer? Plus, how far do you go to differentiate between painting a rosy picture or including the outliers to show a different story?

Yeah, so I think I’m going to start with the second part of the question first, because of course, just like with anything, you can try to paint a rosy picture or a dour picture. Really, as somebody who’s analyzing data for insight, your goal is for the data to tell the story at its most accurate, right? So you want to check the veracity of what you’re doing all the way through. If your example is rosy because it’s representing what’s happening in the data, then that can be very useful, and that’s fine. But if it’s a complete anomaly, then how are you going to cast that? There is a field called solutions journalism, and the whole idea of solutions journalism is that sometimes you don’t want to just say, oh, here are all the bad things, and we found them in this data. You might want to find outlier examples and use those as a way to say, look, this is one place where it’s being done right, and this is what they’re doing and how it’s different from what the majority of the data points show, right? So there are ways into those kinds of stories that are effective and don’t mislead the audience. The other thing I would say, in terms of how to make sure you’re accurate, is, again, if you’re pulling out
just one data point, and it’s the only data point like it in the whole analysis, you don’t want to represent that as something that is typical, right? So you just, again, have to make sure that you’re checking yourself all the way along: is this based on the pattern?

Great, thank you.
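
The "is this example typical?" check from the last two answers can be made concrete with a small sketch. It is not from the talk: the function name, the made-up spending figures, and the choice of the middle 50% (the interquartile range) as the "main bulk" of the data are all assumptions for illustration.

```python
from statistics import quantiles


def is_representative(value, values):
    """Return True when value falls inside the middle 50% of values.

    Assumption: the "main bulk" of the data is taken to be the range
    between the first and third quartiles.
    """
    q1, _median, q3 = quantiles(values, n=4)
    return q1 <= value <= q3


# Hypothetical per-resident spending figures for ten cities (made-up numbers).
spending = [1.2, 1.5, 1.7, 1.8, 2.0, 2.1, 2.3, 2.4, 2.6, 6.0]

print(is_representative(2.0, spending))  # a mid-range city: True
print(is_representative(6.0, spending))  # the extreme city: False
```

An example that fails this check is not necessarily wrong to use; as with Algona, it just needs to be presented as an outlier rather than as the typical case.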

This was an interesting question. Yeah, so a lot of times data is not numerical, but that doesn’t mean you can’t count it. You can turn it into numerical things. So for example, I’m trying to think of some good examples, I have maybe reports of certain things, right, and I get a report from a government. Let’s say it’s disciplinary reports of doctors, and it’s all text, just paper reports: this doctor did this thing that was not appropriate, and this doctor did that thing that was not appropriate. There are no counts of doctors, but I can take all of those reports and try to categorize them myself. So I basically take that text data and I turn it into categorical data, and then I count my categories, and then I can do something that’s a little more effective with the analysis. You see this even with, they’re not quite as popular now, but it used to be that you would create word clouds, right? Which word shows up most frequently? That can be useful sometimes. In fact, there is an open-source platform out there called Overview Docs, where you can import PDF files and then tag them based on the types of things that you find, and export those tags with counts. You can also do word clouds just to sift through the information: I want to look at all of the times a particular phrase shows up in this trove of documents, and I can then create a word cloud of that so that I can better figure out what’s going on. There are a lot of ways to look at narrative or textual information.

Super, thank you. Let’s see here, time for at least one more question. How do you write a story or a report that relies on data, but without using too many numbers? You touched on that a little bit in your presentation. How would you do that, and how does that go deeper in the course that you teach?

Yeah, that’s a really great
question. So I always try to think of the data as points into humans, right? So I try to figure out first that representative pattern, and then find the points that match up with that, and then I do actual reporting: making phone calls, talking to people, finding more detail about those specific points of information, so I can get the backstory and the context. And then I write that out in a narrative way. There are a lot of tips and techniques that you can use when you’re writing narratively about numbers. One of the first ones is that you don’t use too many numbers. I try never to put more than three numbers in a paragraph, and I certainly wouldn’t want to stack a lot of numbers, where you have one very number-dense paragraph, and another one, and another one. I want to tell the story from those numbers, and then I use the numbers sparingly but with greater impact. I go into this quite a bit in the course, in terms of how to tell that narrative story and how to look at the data in a way that helps me tell a narrative story.

Great, thank you. Let’s try to squeeze one more question in. How do you know the data you are using doesn’t have accuracy problems? We had a number of questions about accuracy, so if you can address that.

Yeah, that’s such an important thing. From the very beginning, when I get my data, I do a lot of veracity checks. You can do some things that help you get a sense of whether you want to trust the data that you have or not. So let’s say you get a data set and there’s a column that has a lot of missing information, like it’s just not filled out all the way. Then you might wonder, well, why is it blank in some cases and not in others? And so that gives you a way to then go back to the entities that might have the missing information and find out what’s happening. I’ll give you an example of this. Uniform Crime Report data, which is
published by the FBI, is really interesting data to use, but you never want to compare cities to cities. Why? Some cities don’t participate. It’s a voluntary program, and for example, Chicago does not send their Uniform Crime Report data to the FBI, so Chicago is always missing. So you want to know those caveats; you want to make sure you understand the flaws in the data. You can sort the data in a variety of ways to look for outlier numbers that don’t make sense. So let’s say you have salaries, and there is a salary
that’s like a million. Probably a typo, probably not real, right? So you make a filter for those outliers, and then you do some reporting to make sure you can update them to accurate numbers, or figure out what’s going on, or, in the worst-case scenario, exclude them entirely. Sometimes the lack of good information might be the story, might be the thing you want to focus on. If you’re doing a report and you’re supposed to be tracking a certain thing, but half of the branches in your company are not reporting it in a way that’s useful, then that might be your actionable item, where your storytelling, your report telling, focuses. And sometimes just being aware of the gaps, and how the gaps can tell a story, is really important. I actually go into that quite a bit in the course.

Excellent, thank you. That is all the time we have today for questions. I want to thank you, Cheryl, and all of you in the audience for joining us today. Also, if you found this presentation to be helpful in any way, we encourage you to share the recording with your colleagues, friends, and family, or whoever you think will find this webinar useful. Again, thanks for taking the time to join us today, and I hope you all have a great rest of your day. Thanks again. Thank you.
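
The veracity checks described in the final answer (looking for columns with blanks, then filtering implausible values such as a million-dollar salary) could be sketched roughly as follows. This is an illustration rather than anything shown in the webinar: the function names, the sample rows, and the rule of flagging anything above five times the median are all assumptions.

```python
from statistics import median


def missing_counts(rows):
    """Count blank or missing values per column across a list of dict rows."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            if value is None or str(value).strip() == "":
                counts[column] = counts.get(column, 0) + 1
    return counts


def flag_outliers(rows, column, factor=5):
    """Return rows whose value in `column` exceeds `factor` times the median.

    Assumption: "factor times the median" is an arbitrary screening rule.
    Flagged rows are candidates for follow-up reporting, not automatic errors.
    """
    values = [row[column] for row in rows if row[column] is not None]
    cutoff = factor * median(values)
    return [row for row in rows
            if row[column] is not None and row[column] > cutoff]


# Hypothetical rows (made-up data) with two blank branches and one odd salary.
rows = [
    {"name": "A", "branch": "North", "salary": 52000},
    {"name": "B", "branch": "", "salary": 48000},
    {"name": "C", "branch": "South", "salary": 1000000},
    {"name": "D", "branch": None, "salary": 61000},
    {"name": "E", "branch": "East", "salary": None},
]

print(missing_counts(rows))           # which columns have gaps, and how many
print(flag_outliers(rows, "salary"))  # rows to check by reporting
```

Both functions only surface questions. Whether a blank or an extreme value is a typo, a real figure, or the story itself is still settled by going back to the source.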