Telling stories with data
* I am gonna talk about data visualization.
* Go through examples.
* Mention some specific technologies along the way.
* Feel free to stop me with questions.
Alan Palazzolo
Interactive News Developer
* I am not a data visualization expert.
* I don't really have good design skills.
* I am a programmer, a builder
* I work on the web and it is my medium. So, most of my examples will be based around the web, but data visualization can be in many places.
* Interactive News Developer at MinnPost
* I try to make the news more exciting.
* Our goal is to make news, not work on the systems that provide it.
* We do great work because we have a great (though small) team.
* Kaeti and Tom
Twin Cities (Data) Visualization Group
meets the 3rd Tuesday of each month
* I also run the TC Data Viz meetup.
* All are welcome (though it fills up usually).
* Our monthly meetups are presentations by community members.
* We recently did a hack day (more on that later).
Data visualization is __________.
* To me, data visualization is just as generic as it sounds.
* It's really any visual representation of data or information.
* The other day someone was saying "Yeah, we have some charts and graphs, but it's not data visualization" and I was a bit speechless.
* The key is that the goal of a visualization is to communicate specific parts of the data and in an effective way.
* It's charts, graphs, maps, and many other design techniques.
* Noun Project
* It's functional art (as opposed to a fine art).
* Functional Art means that it serves a practical purpose.
* Furniture and architecture are functional arts.
* A chair's main purpose is sitting.
* If it doesn't serve that purpose, it is no longer an effective piece of furniture.
* A folding chair does it, often not very comfortably.
* But it needs to be more than just practical.
* It needs to be comfortable, inviting, and easy to use.
* It can often serve a specific function.
* While still maintaining ease of use.
* Perhaps most importantly, it needs to look good.
* The best kind of chair brings together form and function perfectly without sacrificing either.
* This is the same with data visualization.
The new hotness
* Data visualization has been going on for a long, long time.
* So why am I talking about this today to you?
* Why is "data visualization" such a hot topic right now?
* In my opinion it comes down to a few things:
* Well, there are computers. Lots of them.
* Specifically, our computers can visualize things much more easily as they become more powerful.
* Designers and artists use computers more easily.
* The internet, with its prevalence and increased bandwidth, has allowed us to see (good) data visualization often.
* And I am a firm believer that the success of open source software and mentality has allowed many people to build amazing things inexpensively and collaboratively, especially on the web.
A simple example
* Let's go through an overly simple example.
42
* Here we have data.
* The number 42.
* And here it is visualized.
42%
* Let's add a bit of context.
* It's very important to think about who your audience is and what supplemental information they may need.
42%
* Let's try a different background.
* Colors are extremely powerful.
* They can provide tone and mood.
* They also convey information both implicitly and explicitly.
* I just like this blue.
42%
* But the blue was a bit dark, so let's try something lighter.
* This is good enough.
42%
* Let's make it bigger, as it is the focus of the visualization.
* Again, size is a really important thing to consider.
* Like color, it can convey many things.
42%
* Let's try a different font.
* This serif font really doesn't work.
* Serif fonts are usually better for longer pieces of text.
42%
* A sans-serif font usually works best for numbers.
42%
* Let's decrease the opacity of the number just a tad to let it blend into the background.
* Opacity can also be a good tool to provide information such as intensity.
42%
of people like the number 42
* Oh, and right, no one actually knows what we are talking about here, so let's add a label.
* Labels and annotations are extremely important, but they should be used sparingly.
42%
of people like the number 42
* But if we think about our audience a little more (i.e. you all), then this background is not really great for a projected display.
* So here we have our final piece.
* Pretty fancy.
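The choices walked through above (background, color, size, font, opacity, label) can be sketched as a handful of parameters. This is only an illustration, not the code behind the slides: the function name `render_stat_card`, the hex colors, the sizes, and the 0.85 opacity are all assumptions picked for the example.

```python
from xml.sax.saxutils import escape


def render_stat_card(value, label, background="#ffffff", color="#2a6ebb",
                     font_family="Helvetica, Arial, sans-serif",
                     font_size=160, opacity=0.85, width=600, height=400):
    """Return an SVG string showing one big number with a short label.

    Every default here is a hypothetical choice, not a value from the talk.
    """
    cx = width // 2
    return "\n".join([
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">',
        # Background: light, so it holds up on a projector.
        f'  <rect width="100%" height="100%" fill="{background}"/>',
        # The number: big, sans-serif, slightly transparent.
        f'  <text x="{cx}" y="{height * 55 // 100}" text-anchor="middle" '
        f'font-family="{font_family}" font-size="{font_size}" '
        f'fill="{color}" fill-opacity="{opacity}">{escape(value)}</text>',
        # The label: small, so it explains without competing.
        f'  <text x="{cx}" y="{height * 75 // 100}" text-anchor="middle" '
        f'font-family="{font_family}" font-size="{font_size // 5}" '
        f'fill="{color}">{escape(label)}</text>',
        '</svg>',
    ])


svg = render_stat_card("42%", "of people like the number 42")
```

Saving `svg` to a file ending in `.svg` and opening it in a browser should show the result. Changing one argument at a time is a quick way to replay the steps above.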
Some more data
* Let's talk about a set of examples that uses more than just a single number.
* I am a huge fan of maps.
* My partner and I actually collect maps and we call our basement the map room because that is where we hang them.
* A lot of people don't think of maps when thinking about data visualizations. Or they think about putting their visualization on top of a map.
* But that is what makes maps so successful.
* Good maps get out of your way and allow you to find the information you want quickly.
* Maps, especially street maps, are driven by data. Lots and lots of geographical vector data.
* Cartographers take that data, make decisions about symbols, colors, sizes, opacity, etc., and turn it into what we see.
* Let's go through a few.
* Most of the maps I will show you are really basic, or what most people would call basemaps, but there is plenty to look at.
* Most of the maps are also driven by the same data source, OpenStreetMap (the Wikipedia of maps).
* This one here is very muted
* Even as you zoom in, only the labels get much attention
* This makes it a good candidate to overlay with other data
* The roads really come out here
* As we zoom in, the park areas get very bright
* Zoom in more, more symbols show
* The Map of the Dead, the zombie apocalypse
* Obviously the colors are trying to set the mood
* Zoom in and all the secondary roads show up very distinctly
* This is an example from MapBox, a great service that allows you to make custom maps like these.
* This one starts to bring in elevation data and does some hill shading. Not that dramatic in Minnesota.
* It also shows land use more. You can see the urban areas.
* Roads are a bit muted
* Water really jumps out
* Zooming in, still muted, but the building footprints become pretty obvious
* This one is what GitHub uses for its new GeoJSON rendering feature
* It's overall pretty muted, which is good because its sole purpose is to overlay people's data.
* The roads and the water are kind of the same color
* The next few are from Stamen, an amazing studio out in SF that makes some really powerful data visualization
* This one is Toner and has the express purpose of wanting to be printed
* High contrast
* Hides a lot of things
* Puts road labels along the road
* Terrain from Stamen
* This is more like a traditional road atlas.
* Very focused on roads but still trying to give us a sense of place
* Watercolor from Stamen
* Not that focused on the functional side
* Still kind of road focused
* Park Tiles from the National Park Service
* Very focused on showing where national park areas are.
* No roads
* Only really labels of park areas
* Greenpeace map focusing attention on ocean issues like fishing and whaling.
* The land is totally dark
* Elevation in the oceans
* Highlighted areas of concern
* Then there is satellite imagery
* You might think it's just a photograph, but there is often a lot of processing done on top of the raw images, which will alter colors.
Robberies
* Our first TC Data Viz hack day was a day where about 50 people from the group met to just work and learn together on projects and technologies.
* I was given a set of data from the MPD on robbery incidents so far this year.
* I got together about 3 or 4 people to spend a few hours to see what we could come up with.
On July 18, 2013 a robbery happened involving a business near the intersection of Lake and Portland at or around 11:35 PM and has been assigned a case number of ABCXYZ123.
* This is an example row of data.
* Date
* Time
* Place
* Category
* So, we first just bring it into Excel and do some basic grouping and charting. Excel charts are not a great finished product but can be really awesome for finding trends and doing quick analysis.
* This shows the number of incidents for each day during the year.
* We can see that overall the numbers have increased a bit
* This shows the number of incidents per weekday, starting with Sunday
* More on weekends
* This is showing the number of incidents by time of day
* The high being between 10PM and 3AM
* The same chart but with Excel's awesome formatting options
* I also spent some time making a heat map really quickly to maybe see what places get more robberies.
* Then added some filters
* We can confirm some of our previous observations
* This is a piece done in Tableau
* Shows all incidents for the year
* Underlays income levels from the Census
* Then looks for any statistically significant outliers in each category of robbery
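The basic grouping done in Excel above (incidents per weekday starting with Sunday, and incidents by hour of day) can also be sketched in a few lines of code. This is a minimal sketch, not what the hack day team ran: the first row mirrors the example row on the slide, the other rows are made up for illustration, and the tuple layout and function names are assumptions.

```python
from collections import Counter
from datetime import datetime

# Hypothetical rows shaped like the example row: (date, time, place, category).
# Only the first mirrors the slide; the rest are invented, not MPD data.
rows = [
    ("2013-07-18", "23:35", "Lake and Portland", "business"),
    ("2013-07-20", "01:10", "Example St and 1st Ave", "person"),
    ("2013-07-20", "22:45", "Example St and 2nd Ave", "person"),
    ("2013-07-23", "14:05", "Example St and 3rd Ave", "business"),
]

WEEKDAYS = ["Sunday", "Monday", "Tuesday", "Wednesday",
            "Thursday", "Friday", "Saturday"]


def parse(row):
    """Combine the date and time columns into one datetime."""
    date_text, time_text, _place, _category = row
    return datetime.strptime(f"{date_text} {time_text}", "%Y-%m-%d %H:%M")


def count_by_weekday(rows):
    """Return (weekday, count) pairs, starting with Sunday."""
    counts = Counter()
    for row in rows:
        # isoweekday() is Monday=1 ... Sunday=7, so "% 7" puts Sunday at 0.
        counts[WEEKDAYS[parse(row).isoweekday() % 7]] += 1
    return [(day, counts[day]) for day in WEEKDAYS]


def count_by_hour(rows):
    """Return (hour, count) pairs for hours 0 through 23."""
    counts = Counter(parse(row).hour for row in rows)
    return [(hour, counts[hour]) for hour in range(24)]


for day, count in count_by_weekday(rows):
    print(f"{day:<10}{'#' * count}")
```

The printed bars are about as rough as an unformatted Excel chart, which is the point: quick counts like these are for spotting trends, not for a finished piece.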
An overview of crime
* The last example I will go through is a crime dashboard we made for Minneapolis at MinnPost.
There were 10 robberies in the Wellford neighborhood in the month of October, 2013.
* This is what is on their website
* Number of incidents by category by month by neighborhood. Going back about 10 years.
* We tried to get more data through a Data Practices Act request, but it was slow and we were already almost done with our project by the time we worked out all the details.
* Very data-driven.
* Hopefully you can come up with the idea you want to communicate and get the most appropriate data.
* But often the data isn't that flexible, so the data drives the visualization a lot.
* Lots of data processing. Formats. Aggregation.
* Data visualization often means a lot of data processing, maybe more so than the visualizing part
* (go through interface)
* When it comes to crime, it's really hard to choose one metric that can describe how crime is behaving
* Ultimately we want to show the idea of safety.
* This is tough, let alone with the little data we had.
* We try to show a current snapshot of crime
* The idea of safety includes comparison, both historically, but also to neighboring areas.
* That is why all these are the same size (except population)
* We combine some categories to help pare down the categories while still knowing that not all crime is equal.
* We try to show the long-term trend, but we also want to take into account the seasonal differences in crime (more crime in the summer), which is why we chose this metric.
* Colors are automatically assigned through a version of k-means clustering
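To give a feel for what assigning colors through k-means clustering can look like, here is a minimal sketch of plain one-dimensional k-means (Lloyd's algorithm). It is not the dashboard's implementation: the notes only say "a version of k-means", so the function names, the starting centers, the palette, and the sample values below are all assumptions.

```python
def kmeans_1d(values, k, iterations=50):
    """Group numbers into k clusters; returns (centers, labels)."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted values.
    step = max(k - 1, 1)
    centers = [ordered[i * (len(ordered) - 1) // step] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        new_labels = [min(range(k), key=lambda c: abs(v - centers[c]))
                      for v in values]
        if new_labels == labels:
            break  # nothing moved, so the clusters are stable
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return centers, labels


# Hypothetical light-to-dark palette; not the dashboard's colors.
PALETTE = ["#fee8c8", "#fdbb84", "#e34a33"]


def assign_colors(values, palette=PALETTE):
    """Give each value the color of its cluster, lightest for lowest."""
    centers, labels = kmeans_1d(values, len(palette))
    by_center = sorted(range(len(centers)), key=centers.__getitem__)
    rank = {cluster: position for position, cluster in enumerate(by_center)}
    return [palette[rank[label]] for label in labels]


# Made-up incident counts for eight neighborhoods.
colors = assign_colors([2, 3, 4, 20, 22, 25, 60, 65])
```

Compared with equal-width bins, clustering lets the breaks fall where the data actually has gaps, so a couple of very high values do not push everything else into the lightest color.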
Thanks.
Questions?
Code for the slides is on GitHub.