I loaded a BAM (RNA-seq) file into IGV but can't see anything!
9.9 years ago
jyalda2 ▴ 30

Hello everyone,

I have an RNA-seq data set for which I downloaded the BAM file from GEO. I loaded the BAM file into IGV but could not see anything! I don't know what's wrong. Is my BAM file sorted? Is the .bai file I created correct and fully indexed? Could someone help me, please?

The BAM file that I downloaded from GEO is about 5 GB; after copying it from my local machine to the remote server over SSH, the size is about 4.5 GB.

Steps that I followed:

  1. I sorted the BAM file using samtools sort file_name.bam file_name.sorted in Bitvise SSH.

    Sorted file size: about 4.5 GB

  2. I made the index using samtools index file_name.sorted.bam

    I got the .bai file quickly, with a size of about 13.5 KB!

Then I transferred the sorted BAM and .bai files to the same folder where I saved the original BAM file on my PC and tried to visualize them with IGV (hg19).

Thank you very much for your help in advance.

Best regards

Are you sure the BAM contains hg19 alignments? That index seems way too small.

The methods section of their paper says that they mapped to the hg19 build.

Make sure you are zoomed in enough (do you see text that says "zoom in to see alignments"?)

Yes, I'm zoomed in. I can see the reference genes but nothing for my files! I wanted to upload a screenshot of my IGV session here but don't know how.

I wanted to use UCSC, but that also needs the BAM + .bai files. In Galaxy I could not upload the file; it reported that it is >2 GB and so can't be uploaded!

To post an image, upload it to imgur and then use that link here.

Can you please post the output of samtools view -h [bamfile] | head -30?

Thanks.

Sorry, I don't have a background in bioinformatics or Linux, so I have to search for each command! Can I post a screenshot?

[IGV screenshot]

That one says "zoom in to see alignments". When in doubt, have a look at the header for the BAM file with samtools view and reindex.

Image of IGV: [screenshot]

Right, so zoom in.

Zoomed in but still nothing!

Keep zooming, you're not going to see alignments at that scale. Try 1kb or so.

I did, but it made no difference!

Please post the output of samtools view -h [bamfile] | head -30

That will show you the precise coordinates of the first few alignments contained in the BAM. It will also give us a look at the header of the BAM.

Once we have some example alignment coordinates from the BAM, we can then jump to that exact location in IGV and go from there. Posting that information will be a huge help for us.

Screenshot after running samtools view -h [bamfile] | head -30: [screenshot]

Great, thank you.

Now, in IGV, in that text field at the top just to the left of the "home" button, type the following:

chr1:14410

and click "Go"

What do you see?

Result: [screenshot]

At a different zoom level: [screenshot]

Cool! So problem resolved then?

Thank you for your help :)

What does chr1:14410 stand for? I searched but got a bit confused! :P

I want to look at positive and negative controls to check whether the data set is good. When I search for housekeeping genes, known dysregulated genes, or genes mentioned in the related paper (such as SPP1 or FN1), I hit the same problem again: nothing shows up for my BAM file, although the gene can be found in the reference gene track. How can I search for different genes?

I can only see reads when I search for chr1:14410, chr1:14401 or chr1:13507, which were in this output:

[screenshot of the samtools output]

So, according to this visualization, were the sorting of my BAM file and the generation of the index correct?

Is it possible to save this output as a .txt file for other chromosomes?

Sorry for asking so many questions :)

You can type the name of the gene directly into IGV.

And if you want an idea of the expression of different genes across several samples, you should use htseq-count to count the number of reads per gene, and then use a tool such as DESeq2 or edgeR to assess differential expression. Don't forget that each sample is sequenced at a different depth, so it's dangerous to compare raw coverage without normalization.

9.9 years ago
Dan D 7.4k

I'm going to summarize how we got to the resolution, and hopefully provide some context to help you dig deeper into your follow-up questions.

For the purposes of quick understanding, it's reasonable to consider a BAM file as a list of sequence alignments, where each alignment is composed of a single line which has exhaustive detail about the nature of the alignment to the reference. For more information, read the SAM specification. It's not long and it will provide a lot of information as to what's going on.

A BAM file is simply the compressed, non-human-readable (binary) form of a SAM file. One of the most common ways to interrogate BAM files is a very powerful command-line program called samtools.

You weren't seeing any alignments at first because you weren't zoomed in far enough. The genome is HUGE compared to the length of a typical Illumina read (seven orders of magnitude larger), so unless you zoom in really far the relative length of an alignment is going to be smaller than a single pixel.
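
To put rough numbers on that, here is a small back-of-the-envelope sketch. The 1,500-pixel panel width and the 100-base read length are assumptions for illustration, and read_width_px is just a hypothetical helper name; 249,250,621 is the length of chr1 in hg19.

```shell
# Width, in screen pixels, of one read when a region of the given size
# fills the alignment panel. All inputs are illustrative assumptions.
read_width_px() {
    region_bp=$1   # size of the region on display, in bases
    window_px=$2   # width of the alignment panel, in pixels
    read_bp=$3     # read length, in bases
    awk -v r="$region_bp" -v w="$window_px" -v l="$read_bp" \
        'BEGIN { printf "%.4f\n", l * w / r }'
}

read_width_px 249250621 1500 100   # all of hg19 chr1 on screen -> 0.0006
read_width_px 1000 1500 100        # a 1 kb window              -> 150.0000
```

With a whole chromosome on screen a read is a tiny fraction of one pixel; at 1 kb the same read spans roughly a tenth of the panel.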

BUT! It's definitely possible that there are large regions of the genome which have poor coverage, especially if the dataset is from exome or RNA-Seq, or is WGS with lousy coverage. So just zooming in randomly, you wouldn't necessarily expect to see alignments appear.

What I asked you to do was use samtools to pull out the first few reads from your sorted BAM file. Since the BAM file was sorted, the alignments start at the beginning of chromosome 1. In the screenshot, I saw that you had some reads around base 14,000 of chromosome 1, so I asked you to jump to that area in IGV.
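
If you'd rather not read coordinates off the screen, something like the sketch below could turn the first few alignments into strings you can paste straight into the IGV locus box. first_loci is a hypothetical helper name; it only assumes standard tab-separated SAM text, where column 3 is the reference name and column 4 is the leftmost mapped position.

```shell
# Print "chrom:pos" for the first N mapped alignments in SAM text on stdin.
first_loci() {
    n=${1:-5}      # how many loci to print (default 5)
    awk -F '\t' -v n="$n" '
        /^@/      { next }    # skip header lines
        $3 == "*" { next }    # skip unmapped reads (no reference name)
        { print $3 ":" $4; if (++seen >= n) exit }
    '
}

# Possible usage, assuming samtools is installed and the BAM is sorted:
#   samtools view -h file_name.sorted.bam | first_loci 3
```

Each line it prints has the same form as the chr1:14410 I gave you earlier.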

IGV can also search based on gene names, but remember that gene names differ based on the reference used and the entity which is cataloguing the genes. Another approach would be to use a gene database such as GeneCards to search for your gene, find a consensus start site, and then search based on that coordinate in IGV. For your SPP1 gene, it looks like the start site is at chromosome 4, base 88,896,802, so if you type chr4:88896802 in IGV it will take you to that region of the genome.

Similarly, you can use samtools to query for alignments in a BAM file. If you wanted to find all of the alignments within the consensus start/end region of SPP1, you could type something like this:

samtools view [bamfile] chr4:88896802-88904563

That will give you all of the alignments in human-readable form. You can redirect these to a text file if you want, or use perl/python/sed/awk to pull out specific details.
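
As a minimal sketch of both options: sam_summary is a hypothetical helper and the output file names are made up. It assumes standard SAM columns (1 = read name, 3 = reference name, 4 = position, 5 = mapping quality).

```shell
# Reduce SAM alignment lines on stdin to four tab-separated columns:
# read name, chromosome, position, mapping quality.
sam_summary() {
    awk -F '\t' 'BEGIN { OFS = "\t" } !/^@/ { print $1, $3, $4, $5 }'
}

# Possible usage, assuming samtools is installed and the BAM is indexed:
#   samtools view file_name.sorted.bam chr4:88896802-88904563 > SPP1_alignments.txt
#   samtools view file_name.sorted.bam chr4:88896802-88904563 | sam_summary > SPP1_summary.txt
```

The same pattern works for any other chromosome or region; just change the coordinates.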

Your explanation was really helpful :) Actually, I did this: I tried to find the locations of the genes and search by chromosome location, but none of them worked. I'll try again. Thanks