So you usually don't want to read that kind of stuff in, and looking at the bottom of the data set can help you see whether there's any of that junk down there.

You also want to check whether the data are formatted correctly. Are the right numbers in the right columns and in the right rows? Sometimes those things get shifted by one or two, so you want to make sure everything lines up correctly.

If you've got data with dates, for example, looking at the top and the bottom is often useful, because if the rows are sorted by date, you can see whether the range is correct, that is, whether the earliest and latest dates are what they should be. That's another thing you might want to check for.

So just looking at the very edges of the data set can flag a number of basic problems that occur very often and are usually easy to fix at this stage. If you discover them later, once you're well into a data analysis, they can be a real pain in the neck to deal with.
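
Checking the edges of a data set can be sketched in Python with only the standard library. This is a minimal sketch, assuming a small CSV whose rows are sorted by date; the file contents, the footnote line, and the `edges` helper are all hypothetical. It shows the first and last rows, flags any row whose width doesn't match the header (trailing junk or shifted columns), and reports the earliest and latest dates.

```python
import csv
import io
from datetime import date

# Hypothetical file contents: a header, rows sorted by date, and a
# footnote line at the bottom that should not be read in as data.
RAW = """date,site,value
2020-01-01,A,3.1
2020-01-02,A,2.9
2020-01-03,B,3.4
2020-01-04,B,3.0
Source: hypothetical agency report
"""

def edges(rows, n=3):
    """Return the first n and last n rows so both ends can be inspected."""
    return rows[:n], rows[-n:]

rows = list(csv.reader(io.StringIO(RAW)))
header, data = rows[0], rows[1:]

top, bottom = edges(data, n=2)
print("top:   ", top)
print("bottom:", bottom)  # the footnote shows up here

# Rows whose width differs from the header: footer junk or shifted columns.
bad = [r for r in data if len(r) != len(header)]
print("malformed:", bad)

# With the bad rows set aside, confirm the earliest and latest dates.
clean = [r for r in data if len(r) == len(header)]
dates = [date.fromisoformat(r[0]) for r in clean]
print("date range:", min(dates), "to", max(dates))
```

If you use pandas instead, `DataFrame.head()` and `DataFrame.tail()` cover the first step.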

The next item I always try to think about when I'm looking at a new data set and just getting involved in a data analysis is what I call ABC: always be checking your Ns.

Every aspect of your data set is going to have some kind of count or number associated with it. For example, there's the total number of observations, or your sample size. Is that what you expect it to be? There's also the number of columns. Are you expecting a certain number of columns? You should always check that N.

But within the data set, too, there are going to be certain numbers that you expect. For example, if you have a number of subjects, you want to count the subjects or units in your analysis. If every subject was measured three times, you want to make sure that every subject actually has three measurements associated with it.

So there are all kinds of Ns you can check within and around your data set to make sure that everything is structured and in place.
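
Here is a minimal sketch of ABC in Python, assuming long-format records of (subject, visit, measurement) and a design of three subjects measured three times each. The records, the expected counts, and the `check_ns` function are hypothetical. Each check compares an observed count against what the study design says it should be.

```python
from collections import Counter

# Hypothetical long-format data: (subject_id, visit, measurement).
# The assumed design: 3 subjects, each measured 3 times.
records = [
    ("s1", 1, 5.2), ("s1", 2, 5.0), ("s1", 3, 4.9),
    ("s2", 1, 6.1), ("s2", 2, 6.3),  # one visit missing
    ("s3", 1, 4.4), ("s3", 2, 4.6), ("s3", 3, 4.5),
]

EXPECTED_SUBJECTS = 3
EXPECTED_PER_SUBJECT = 3
EXPECTED_COLUMNS = 3

def check_ns(rows):
    """Return a message for every count that differs from expectation."""
    problems = []
    # Total observations should equal subjects x measurements per subject.
    expected_total = EXPECTED_SUBJECTS * EXPECTED_PER_SUBJECT
    if len(rows) != expected_total:
        problems.append(f"observations: got {len(rows)}, expected {expected_total}")
    # Every row should have the same, expected number of columns.
    widths = {len(r) for r in rows}
    if widths != {EXPECTED_COLUMNS}:
        problems.append(f"column counts seen: {sorted(widths)}, expected {EXPECTED_COLUMNS}")
    # Number of distinct subjects, then measurements per subject.
    per_subject = Counter(r[0] for r in rows)
    if len(per_subject) != EXPECTED_SUBJECTS:
        problems.append(f"subjects: got {len(per_subject)}, expected {EXPECTED_SUBJECTS}")
    for subject, n in sorted(per_subject.items()):
        if n != EXPECTED_PER_SUBJECT:
            problems.append(f"{subject}: {n} measurements, expected {EXPECTED_PER_SUBJECT}")
    return problems

for msg in check_ns(records):
    print(msg)
```

Both the total count and the per-subject count catch the missing visit here; checking at several levels tells you not just that something is off, but where.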

Next, you want to actually look at your data. To me, the easiest way to look at the data to determine whether there are any problems is to make a plot.

Making a plot is useful in two ways. The first is that it sets expectations about your data. For example, when you make a scatter plot, you get a sense of how the variables are related to each other.

If you make a box plot, you can look at the distributions of the variables to see whether they're skewed. Are there positive and negative values? Are you expecting positive and negative values? Plots can reveal this kind of information very quickly, in a way that tables often cannot.
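
A plot is the real tool here, but the numbers a box plot is drawn from can be sketched with the Python standard library. This assumes the common Tukey convention of whiskers at 1.5 × IQR beyond the quartiles (also matplotlib's default for `boxplot`), and uses hypothetical body weights that should all be positive; `box_summary` is an illustrative helper, not a library function.

```python
import statistics

def box_summary(values):
    """Numbers behind a Tukey-style box plot: quartiles plus points beyond the fences."""
    # statistics.quantiles uses the 'exclusive' method by default, so quartiles
    # may differ slightly from other software.
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "outliers": [v for v in values if v < lo_fence or v > hi_fence],
        "negatives": [v for v in values if v < 0],  # unexpected for weights
    }

# Hypothetical weights in kg, with two suspicious entries.
weights_kg = [61, 64, 66, 68, 70, 71, 73, 75, 78, 82, -99, 250]
s = box_summary(weights_kg)
print("median:", s["median"], "outliers:", s["outliers"], "negatives:", s["negatives"])
```

The median alone looks perfectly reasonable; it's the points beyond the fences, which a box plot draws individually, that expose the `-99` (perhaps a missing-value code) and the `250`.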

That's because a plot gives you a summary plus the deviations, whereas a table very often gives you only the summary, such as the mean or the median. A plot lets you visualize both the mean and the deviations from the mean.

So you'll be able to see whether there are very large deviations that are perhaps unexpected, or values that don't look right, such as negative or positive values you weren't expecting.
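
To illustrate summary versus deviations, here is a crude text histogram in Python. The two samples and the `text_histogram` helper are hypothetical; the samples were chosen so that both have a mean of 10, which is all a table of means would report, while the histograms show very different spread. In practice you'd use a real plotting library such as matplotlib rather than printing `#` characters.

```python
import statistics

def text_histogram(values, lo, hi, bins=10):
    """Return one text row per bin, with one '#' per value: a crude histogram.

    Assumes every value lies in [lo, hi]; out-of-range values raise ValueError.
    """
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if not lo <= v <= hi:
            raise ValueError(f"{v} outside [{lo}, {hi}]")
        # min() puts a value equal to hi into the last bin instead of past it.
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return [f"{lo + i * width:6.1f} | {'#' * c}" for i, c in enumerate(counts)]

# Two hypothetical samples with the same mean.
steady = [9, 10, 10, 10, 10, 10, 10, 11]
spiky = [0, 0, 0, 0, 20, 20, 20, 20]

for name, xs in [("steady", steady), ("spiky", spiky)]:
    print(name, "mean =", statistics.mean(xs))
    # Same axis range for both, so the shapes are directly comparable.
    print("\n".join(text_histogram(xs, lo=0, hi=20, bins=5)))
```

Using the same `lo` and `hi` for both samples matters: with a shared axis, the difference in deviations is visible at a glance.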

So I think making a plot is very important. That's not to say there's no role for tables in data analysis, but in my opinion a plot has a unique ability to show you both what to expect and what not to expect, in the sense of how the data deviate from that expectation. So look at the data, and make a plot.