Hello, everyone. My name is David Pieczkiewicz.
I'm a Clinical Assistant Professor and Director of Graduate Studies for the
Institute for Health Informatics, and we run the Health Informatics program at the
University of Minnesota. I'm a colleague of Karen Munson and she
asked me if I could talk a little bit about data visualization.
So what I have is a little presentation called A Few Words on
Data Visualization. Hope you don't mind that title.
I'd like to start out with a little bit of a story.
This is a story of four sets of numbers and I have them right here.
They're pairs of x's and y's, so you see them as four columns here.
You have x1 and y1, that's one set, x2, y2, et cetera.
And you can look at the numbers and, you know, even just by looking at them you
can tell, oh, okay, set three is different from set four and so on.
So you can see small changes there.
But if you do things like calculate statistics, and by that, I mean
calculating means and standard deviations and so forth, you'll get, basically, for
all four sets identical numbers. And I have those here on the right.
For any of the x's, if you take the mean of all x1's or all x2's, etc.,
it's always going to be 9. The mean of all y's is going to be 7.5.
The variance of x is going to be 11. The variance of y is going to be 4.12.
If you try to correlate x and y, all four sets are going to have exactly the same
correlation, 0.816. And if you try to fit a straight-line
regression to all four of those sets, you'll get exactly the same equation.
You'll get y equals 3 plus 0.5x, whatever x happens to be.
That will be what y is. So from a statistical standpoint, if you
just crunched the numbers and looked at these four sets of data, you would say
they're statistically identical. Okay.
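To make the number crunching concrete, here is a minimal Python sketch. It assumes the four sets on the slide are the classic Anscombe's quartet, which matches the means, variances, and regression line quoted above; the talk itself does not name the data or list the values, so the numbers below are the published quartet values rather than something read off the slide. The names `QUARTET` and `summarize` are just illustrative.

```python
from math import sqrt

# Assumed data: Anscombe's quartet (not listed in the talk itself).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    1: (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    2: (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    3: (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    4: ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return mean, sample variance, correlation, and least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "var_x": sxx / (n - 1),  # sample variance
        "var_y": syy / (n - 1),
        "r": sxy / sqrt(sxx * syy),  # Pearson correlation
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


SUMMARIES = {k: summarize(xs, ys) for k, (xs, ys) in QUARTET.items()}

for k, s in SUMMARIES.items():
    print(k, {name: round(value, 2) for name, value in s.items()})
```

Rounded to two decimals, the four printed rows come out essentially the same, which is the "statistically identical" part of the story.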
But they're really not. Well, they are statistically identical,
but they are not identical, because what we're lacking is actually
showing what's going on with the data. If you actually took the x's and the y's
and you plotted them out, you would see a very different story.
So, aside from all the means and variances and so forth being the same,
the story that the data tell is very different.
So, here I have those four sets of data, one, two, three, and four; you should be
able to see them there. And even though they have the same
straight line regression, you can see that the patterns of the points are all
very different. And the point that I'm trying to make
here is that you can do number crunching, whether it's Excel or whether it's with
SAS or R, or some other tool that you have.
You might get one answer that will tell you one thing.
But a lot of people who work with data visualization, such as myself, would say
that's not always going to give you the whole story and you might be missing out
on something important if you don't plot out these numbers.
So that's what I'm trying to show right here.
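As a rough illustration of plotting the numbers out, here is a small text-only scatter plot in Python. It again assumes the data are Anscombe's quartet (the talk does not list the values), and `ascii_scatter` is a hypothetical helper written for this sketch, not a library function. A real plotting tool would do this better; the point is only that the shapes differ.

```python
# Assumed data: Anscombe's quartet (not listed in the talk itself).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    1: (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    2: (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    3: (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    4: ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def ascii_scatter(xs, ys, width=40, height=12, x_range=(0, 20), y_range=(0, 14)):
    """Return a text scatter plot with a '*' wherever a data point falls."""
    grid = [[" "] * width for _ in range(height)]
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    for x, y in zip(xs, ys):
        col = round((x - x_lo) / (x_hi - x_lo) * (width - 1))
        row = round((y - y_lo) / (y_hi - y_lo) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    return "\n".join("|" + "".join(line) for line in grid)


for k, (xs, ys) in QUARTET.items():
    print(f"Set {k}")
    print(ascii_scatter(xs, ys))
    print()
```

All four sets share the same axis ranges here, so the differences you see are differences in the data and not in the scaling.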
And collectively, trying to graph and show what data do in a visual sense is often
called data visualization. And we've been doing it for a very long
time. We have various bits of medieval evidence
about things that have been charted out or graphed out as early as the ninth or
tenth century. In the field of medicine, which is what
we're concentrating on here, we've had John Snow's cholera map,
for example. Karen tells me that you've already talked
about this. There's also Florence Nightingale, who did
coxcomb graphs and so on. So we've been doing this for a while.
Sorry, backing up on the slides a little bit.
What we generally see as modern data visualization, a lot of the things we
take for granted, like line graphs or bar graphs or pie charts, et cetera,
a lot of them have been around since the mid to later 1700s, actually.
A lot of it is due to a man named William Playfair, who developed a lot of these
techniques, or at least first codified a lot of these techniques, for economic
analysis. But as I've shown, we've used this kind of
thing a lot for medicine. Certainly now we use it for, well,
really, all kinds of things: for business, for medicine, and so
on. Well, why do we do all of these things?
Well, we usually do it for one of two reasons.
We're usually out to either persuade people, and that's why I have Ross Perot
here, if any of you are old enough to remember
Ross Perot. He bought ungodly amounts of
television time so that he could show what was going on with the debt and what
he thought should be done about it. So he was educating people about it and
using graphics very persuasively. But what I'm going to focus on today is
really about visualizing data to do things like explore
and discover things. So, to show things that you didn't
automatically or ordinarily see, especially if you just did the straight
numerical statistical analysis. And I have a quote here at the bottom:
"The greatest value of a picture is when it forces us to notice what we
never expected to see." And that was a quote by John W. Tukey.
He was a statistician, and he was very
eminently quotable, so people would always have these little quotes
of his. This is my favorite of the stuff that
he did for his quotes. And this, to me, is a lot of the
essence of data visualization: helping us to see what we didn't really expect
in the first place, either us as researchers, or us as clinicians, or us as
patients, etc. And then maybe once we've explored it,
maybe then we'll use it for persuasive uses.
Okay, skipping over these things a little bit since I already talked about them.
When we're doing things like putting together and reading
visualizations, we have to really think about how the human brain works and about
how we analyze data in a visual way. And so part of data visualization
research has to do with, okay, what are the processes that go on with
people? And to distill a lot of it, we can
clump a lot of what we do in our brains in terms of what people call
perception and cognition. And perception is about the
low-level activity of really sensing the things in a display.
It's really, how does the light hit your eye?
How does the sense of color reach your brain, et cetera?
These very, very low-level things that we're not really thinking about but are
done automatically by our brain and our eye, and our other bodily systems.
And that contrasts with cognition, which is the higher-level process of actually
interpreting the display, translating it into meaning, and actually thinking
about it. So really the challenge for data
visualization researchers like myself is trying to use what we know about
perception and about cognition to make visualizations better, to make them more
appealing, to make them more effective, to make them faster, etc.
So, we're looking out for all of those things.
And this slide here distills a lot of work that had been done in the 80s by a
group of people about, now, what are some of the basic perceptual things that our
eye and brain system do? And they would do things like, they would
show various displays to people and ask them to make choices, you know,
which dot is further to the left than the other dot, or which dot is bigger,
which dot is brighter, which angle is greater, etc.
And they would see how long these things took.
And so, they got an idea of how quickly we do certain things as opposed to doing
other things. And what I have here is sort of a table
or list of things that our eye and brain system do very quickly.
And then as you go down the list, they become things that take us a little bit
longer to do, and a little bit longer to do, and so on.
For example, if we have aligned scales, like two lines that are lined up and
parallel with one another, then figuring out the position of those gray diamonds
that you see there, is going to be very easy for the eye and brain system to do.
We can easily say, the one at the bottom is more to the right.
We do that very quickly, pretty automatically.
If you start doing things like unaligning the scales, then it takes us a little bit
longer, we have to decode things a little bit more.
If we're trying to decode lengths, that can take a little bit longer
as well. Figuring out things like angle and slope
takes longer still. Then there's figuring out things like area, the
differences of area. You can see those two circles that are
there. The one on the right is just very,
very slightly smaller. But it's kind of hard for us to tell
without sort of looking at it closely; we need a little bit more detail than when looking at
those lines that are at the top, for example.
It takes us a little bit more effort. And finally, there are also things like
volume, that is, the 3D space that a thing might take up.
That's harder for us to do still. And then there's color and saturation,
where you might think, oh, you know, we can sense color very easily.
But doing things like figuring out what color it is,
and what relationship this color has with that color,
and so on, that actually takes us longer than you
might expect. And so, from these perceptual tasks, from the
experimentation that was done on this, what came out was the idea that if we're
going to do things with visualization, we should try to favor the things that are more at
the top of the list, like position along aligned scales, position along unaligned
scales, length, and so on. And that's actually one of the reasons
why we see line graphs so often, for example.
Or we see bar charts, because those are, if you think about them, ways that we use
position along aligned and unaligned scales to try to tell things about the
data. So we're really trying to adhere to
this kind of list, okay? Now, when you're thinking about cognitive
tasks, the things that actually take us a little bit longer to do, where we're
actually, you know, using some brain power, that we're thinking about, we
also have some idea, based on theory and experiment, about some of the basic things
that we do. Things like extracting values, like
telling, hm, this bar represents 2.5 units of whatever.
It can include proportions, too. We can also talk about value comparisons,
like we're looking at two quantities or two things that are visualized, and
trying to make comparisons between the two.
Which one is greater than the other? Which one is less?
Does it look like there's anything going on as far as trend? And
that's really the third thing. If you look at a bunch of comparisons, can
you see something that is going on over time, or at least over the space of this
visualization that you created? And there's this idea that between the
perceptual tasks that I talked about on the previous slide and the common cognitive
tasks that I talk about here, different displays, the different ways of
visualizing things, have different degrees of cognitive burden for different tasks.
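One way to keep the perceptual list from the previous slide in front of you is to write it down as an ordered list. This is only a sketch: the ordering follows the order described in the talk, and the names `PERCEPTUAL_RANKING` and `quicker_to_decode` are made up for illustration.

```python
# Ordered from quickest to slowest to decode, following the order in the talk.
PERCEPTUAL_RANKING = [
    "position on aligned scales",
    "position on unaligned scales",
    "length",
    "angle or slope",
    "area",
    "volume",
    "color and saturation",
]


def quicker_to_decode(a, b):
    """Return whichever of two encodings sits nearer the top of the list."""
    return min(a, b, key=PERCEPTUAL_RANKING.index)


print(quicker_to_decode("area", "length"))
```

This says nothing about which display suits which task; it only captures the perceptual ordering, which is one input into that decision.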
And part of what we do in visualization is try to figure out, okay, how do we best
match the task that we have with the particular kind of visualization that
will work the best? And you might say, well, you were just
talking about this: if we just had things along aligned
scales, wouldn't that work? Well, that works in a lot of cases, but it
really depends on what the visualization is for and what you're doing with it.
So, let me show you with an example here.
This slide has two views of exactly the same data.
On the left is a tabular display of the data.
And on the right is a graphical display of the data.
This happens to be FEV1, which is forced expiratory volume in one second.
So this is from lung transplant patients who breathe into a spirometer.
It's a device to measure your breathing rates of flow and volume and so forth.
And this measures how many liters of air a person can blow out if they breathe in
a really deep breath, and then blow it out, really hard, really fast.
How much air do they let out in one second?
And these numbers are in liters, okay? So what we're seeing here, over about
a two-week period, are the FEV1 levels of a lung transplant
patient. And I've shown it as a table in one, and
I've shown it as a graph in the other. And what's going on here is that in the
table, it's very easy for us to see exact values.
So if we're doing things like, I need to get the exact value of what this person's
FEV1 was on May 21st, 2000, I could look at it and I could say, oh, it's 1.038.
That's very, very easy for us to do because it's really just looking things
up. Okay?
But that same task is a little bit harder to do when you're looking at a line
graph. Because if I'm trying to figure out
what exactly the FEV1 of a person was on May 21st,
I have to go along the horizontal scale, look until I get to May 21st,
and then go up the graph a little bit, and see, okay,
where is that point? It looks like it's at about 1.050,
maybe a little bit below. It's a little bit hard to tell.
But the idea is, if I'm trying to extract a particular value from a line graph,
it takes me a little bit longer than it does from a table.
But we can turn this upside down and say, there are certain things that I can do
with a line graph that are much easier to do than with a table.
For example, if I looked at that table and I said,
what is the overall trend of what's happening with the FEV1 of this
patient? Well, I basically have to look at each of
those values and put them together in my mind,
and get this mini narrative of what's happening.
But what's happening with the graph, with the line chart that I have on the right,
is that it is a single shape, and I can apprehend that as one object.
So, the table is many objects that I have to integrate in my mind, but the graph is
one object, and if my goal is to figure out what the trend is, there's a lot
less cognitive burden involved in actually
looking at the chart that's on the right.
So, figuring out what the overall trend is, it looks like the patient is doing
overall the same, but there might be a little bit of a problem.
It looks like the person was doing pretty well up until the 26th, and it looks
like they're declining. Is there something going on with the
patient? This is the kind of thing where a
physician or a nurse might look at this data and say, hm,
you know, this is important information for my treatment decisions, or it's
important information to show the patient so they can make up their own
mind about their treatment, et cetera. Okay, so there are all these things that
go on when you're doing visualizations, and the best visualization really depends
on what I am doing with it. So, if it's value extraction, the table
might work the best. If it's trend detection, it might be the
line graph that is the best, so it's very situational.
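The same contrast shows up if you think of the two tasks as code. In this Python sketch, value extraction is a direct lookup, while trend detection has to combine every reading. The readings are hypothetical, made up for illustration; only the 21 May 2000 value of 1.038 comes from the talk. The helper names `value_on` and `trend_slope` are also just illustrative, and a least-squares slope is only one of several ways to summarize a trend.

```python
from datetime import date

# Hypothetical FEV1 readings in liters; only the 21 May value is from the talk.
READINGS = [
    (date(2000, 5, 17), 1.030),
    (date(2000, 5, 19), 1.042),
    (date(2000, 5, 21), 1.038),
    (date(2000, 5, 23), 1.046),
    (date(2000, 5, 25), 1.050),
    (date(2000, 5, 26), 1.052),
    (date(2000, 5, 28), 1.020),
    (date(2000, 5, 30), 0.990),
]


def value_on(day, readings=READINGS):
    """Value extraction: a direct lookup, like reading one cell of the table."""
    return dict(readings)[day]


def trend_slope(readings):
    """Trend detection: least-squares slope in liters per day."""
    first_day = readings[0][0]
    days = [(d - first_day).days for d, _ in readings]
    values = [v for _, v in readings]
    mean_d = sum(days) / len(days)
    mean_v = sum(values) / len(values)
    sxy = sum((d - mean_d) * (v - mean_v) for d, v in zip(days, values))
    sxx = sum((d - mean_d) ** 2 for d in days)
    return sxy / sxx


print(value_on(date(2000, 5, 21)))
print(trend_slope(READINGS[:6]))  # up to and including the 26th
print(trend_slope(READINGS[5:]))  # from the 26th onward
```

The lookup touches one row; the slope has to integrate all of them, which is roughly the mental work a reader does with the table and what the line graph hands over as a single shape.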
So, to sum it up, there's really no one best way.
The best format is really very highly task dependent.
And sometimes depending on what you're doing, you may require combinations.
I may be doing trend detection and value comparison.
So I have to think about what are the things that might actually work in
tandem with one another. So maybe the best solution is more than
one way to visualize the data, so that we're doing some kind of split scheme.
And there are lots of published guidelines that are out there that give
best practice tips on what are the best ways of laying out a graph.
Or, you know, if I know that I want to make a line graph, for example, what is
the best way of doing it? And sometimes these have different
aims. And so the advice that you get is really
tailored for, are you doing this for persuasive graphics, or are you doing
this for statistical graphics? So, it again comes back to that thing
that I was mentioning previously about, are we using visualization to persuade
someone else of something that we found, or are we using graphics to discover
something that we haven't seen yet? Sometimes those things work at cross
purposes or have different kinds of recommendations.
And the thing that you should take away is that even though there are many
published guidelines, unfortunately there really is no general theory of data
visualization, at least not yet. I wish there were a
grand unified theory of visualization.
it just doesn't really exist, okay? This slide sums up just very, very
briefly some suggested practices for putting together data visualizations and
I talked about a couple of these things. If you're doing something like value
extraction, most people who have done work in visualization will say, well, a
table might work the best. Even though it's not fancy, even though
it's not, you know, a huge visualization kind of thing, it is still a way of
outlining data. And it might be the most effective for
value extraction. If you're trying to show proportions, pie
charts or stacked bar charts might work really well. For value comparison, bar
charts or line graphs, or sometimes scatter plots.
And I have examples of these, just very, very small vignettes, on the slide
here. And then finally for trend detection,
there might be line graphs. Okay, there are many kinds of exceptions
to these. I wouldn't really offer this
as a hard and fast rule, but it may give you some idea of what might actually work the
best in a given situation, and you can use that as a starting point for the kind of
visualization that may actually work best for your situation.
But the underlying idea, again, is we're trying to use the design or designs that
help minimize the cognitive burden for whatever it is that we're doing.
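The suggested practices from the slide can be sketched as a small lookup table in Python. The task names and display names come from the talk; the structure and the function name `suggest_displays` are assumptions for illustration, and, as said above, these are starting points with many exceptions, not hard and fast rules.

```python
# Starting-point suggestions as summarized in the talk, not firm rules.
SUGGESTED_DISPLAYS = {
    "value extraction": ["table"],
    "proportion": ["pie chart", "stacked bar chart"],
    "value comparison": ["bar chart", "line graph", "scatter plot"],
    "trend detection": ["line graph"],
}


def suggest_displays(tasks):
    """Collect candidate displays for one or more tasks, without duplicates."""
    suggestions = []
    for task in tasks:
        for display in SUGGESTED_DISPLAYS[task]:
            if display not in suggestions:
                suggestions.append(display)
    return suggestions


# A combined task may call for more than one display side by side.
print(suggest_displays(["trend detection", "value extraction"]))
```

Passing several tasks at once mirrors the point that some situations need a combination of displays and not a single format.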