Hi, I'm Liliana Florea.
I am an assistant professor in the McKusick-Nathans Institute of
Genetic Medicine in the Johns Hopkins University School of Medicine.
I'm a computational biologist, which means that I
design algorithms and methods to solve biological problems.
I also use other people's software and tools in order to
analyze genomic data which is what I'm going to show you how to do today.
So I will start the course "Command Line Tools for Genomic Data Science".
It is structured in
four different lectures and today we will start with the basic Unix commands.
In the following lectures we'll look at sequence and
genome feature representation, after which we will start looking at
tools for performing alignments and identifying sequence variation,
and lastly we're going to look at tools for doing transcript [inaudible] analysis.
So let's get started.
Think about the content that you have represented in your computer.
You have files and you have them organized in directories.
Directories can be themselves part of bigger directories.
So if you look at that, and what I'm showing is what you can see on my Mac machine,
you have a tree structure.
At the very top is what we call the root.
And then there are a number of directories. For instance,
here on the Macintosh hard drive I have the directories
Applications, Library, System, and so on.
And these directories can have further subdirectories,
such as Library under System, or they can contain files such as mach_kernel.
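To make the tree picture concrete, here is a minimal sketch in shell. It does not touch the real Mac hard drive. Instead it builds a small throwaway tree in a temporary directory and lists it. The variable names (demo_root, tree_listing) are made up for illustration, and mach_kernel here is just an empty placeholder file.

```shell
#!/bin/sh
# Build a tiny throwaway directory tree; all names are illustrative placeholders.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/Applications" "$demo_root/System/Library"
touch "$demo_root/mach_kernel"   # an empty stand-in file, not a real kernel

# List every entry relative to the top of the tree.
# LC_ALL=C keeps the sort order independent of the local language settings.
tree_listing=$(cd "$demo_root" && find . | LC_ALL=C sort)
echo "$tree_listing"
```

Each line of the listing is a path: it starts at the top of the tree (shown as ".") and names every directory you pass through to reach the file or subdirectory.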
When you communicate with your computer you communicate via the operating system.
So this can be OS X, for instance, on the Mac,
or it can be Windows on your typical Windows machine or
it can be Unix which is the system that we're going to be learning about.
And you're doing so via an interpreter.
In a system such as the Mac or
Windows, your files have certain fixed types, and when you
click on a file your operating system will know automatically what kind
of program to use to open the file and to perform operations on it.
However, if you think about it, the range of operations
that you can perform on a particular file is quite limited.
And under some circumstances,
for instance for genomic analyses,
you might want to do your own set of operations
which is what Unix is for and that's what we're going to be doing.
I'm going to illustrate some of the basic Unix commands in
this lecture, and in doing so I'm also
going to show them in the context of a particular genomic application.
For our application, let's say that we have one directory called "Plants",
which contains genomic information on three current-day plant systems.
However, they're on planet Toy.
So we have toy apple,
toy pear, and toy peach.
We also have two ancient plant systems,
let's say, toy agathis and toy araucaria.
What kind of information do we have?
We have genomes; we have annotations of genes, and I'm assuming at this point that you have
taken a course on genes; and we have the list of samples from each species.
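One way to picture this data is the sketch below. It is only a guess at a possible layout, not the actual course files: the per-species subdirectories and the file names ending in .genome, .genes and .samples are assumptions made for illustration, and the files are created empty in a temporary directory.

```shell
#!/bin/sh
# Hypothetical layout for the toy "Plants" data.
# Directory and file names below are assumptions, not the real course files.
work_dir=$(mktemp -d)
for species in apple pear peach agathis araucaria; do
    mkdir -p "$work_dir/Plants/$species"
    # Empty placeholders for the three kinds of information:
    # the genome, the gene annotations, and the list of samples.
    touch "$work_dir/Plants/$species/$species.genome" \
          "$work_dir/Plants/$species/$species.genes" \
          "$work_dir/Plants/$species/$species.samples"
done

# Count the species directories and the placeholder files.
species_count=$(ls "$work_dir/Plants" | wc -l | tr -d ' ')
file_count=$(find "$work_dir/Plants" -type f | wc -l | tr -d ' ')
echo "species: $species_count, files: $file_count"
```

With five species and three kinds of information each, the sketch ends up with five directories and fifteen files under Plants.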
So, with this information, let's embark on looking
at the command line in Unix, which will be the following section.