I'm going to summarize how we got to the resolution, and hopefully provide some context to help you dig deeper into your follow-up questions.
For the purposes of quick understanding, it's reasonable to consider a BAM file as a list of sequence alignments, where each alignment is composed of a single line which has exhaustive detail about the nature of the alignment to the reference. For more information, read the SAM specification. It's not long and it will provide a lot of information as to what's going on.
A BAM file is simply the compressed, non-human-readable (binary) form of a SAM file. One of the most common ways to interrogate BAM files is a very powerful command-line program called samtools.
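To make the "one alignment per line" idea concrete, here is a minimal Python sketch that splits a single SAM line into the 11 mandatory columns described in the SAM specification. The example read is made up for illustration (it is not from your file), and parse_sam_line is just a helper name I chose.

```python
# The 11 mandatory SAM columns, in the order given by the SAM specification.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
INT_FIELDS = ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN")


def parse_sam_line(line):
    """Split one tab-separated SAM alignment line into a dict of named fields."""
    cols = line.rstrip("\n").split("\t")
    record = dict(zip(SAM_FIELDS, cols[:11]))
    for key in INT_FIELDS:
        record[key] = int(record[key])
    # Anything after the 11th column is an optional TAG:TYPE:VALUE field.
    record["TAGS"] = cols[11:]
    return record


# Hypothetical alignment line, made up for this example.
example = "read1\t16\tchr1\t14410\t3\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNH:i:5"
rec = parse_sam_line(example)
print(rec["RNAME"], rec["POS"], rec["CIGAR"])
```

RNAME and POS together are the chromosome and the 1-based leftmost position of the alignment, which is exactly what you type into IGV as chr:position.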
You weren't seeing any alignments at first because you weren't zoomed in far enough. The genome is HUGE compared to the length of a typical Illumina read (seven orders of magnitude larger), so unless you zoom in really far the relative length of an alignment is going to be smaller than a single pixel.
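As a rough back-of-the-envelope check of that scale argument, here is a small Python sketch. The numbers are assumptions for illustration only: a human genome of about 3.1 billion bases, a 150 bp read, and a viewer window 1500 pixels wide.

```python
import math

# All three values are assumptions chosen for illustration.
GENOME_BP = 3_100_000_000   # approximate size of the human genome
READ_BP = 150               # a typical short-read length
WINDOW_PX = 1500            # hypothetical width of the viewer window in pixels


def pixels_per_read(visible_bp, read_bp=READ_BP, window_px=WINDOW_PX):
    """How many pixels one read spans when visible_bp bases fill the window."""
    return read_bp * window_px / visible_bp


orders_of_magnitude = math.log10(GENOME_BP / READ_BP)
print(f"genome/read ratio: about 10^{orders_of_magnitude:.1f}")
print(f"whole genome in view: {pixels_per_read(GENOME_BP):.6f} px per read")
print(f"1 kb in view: {pixels_per_read(1000):.1f} px per read")
```

With the whole genome in view a read is a tiny fraction of a pixel, while at a 1 kb window the same read spans a couple of hundred pixels, which is why the alignments only appear once you zoom in a long way.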
BUT! It's definitely possible that there are large regions of the genome which have poor coverage, especially if the dataset is from exome or RNA-Seq, or is WGS with lousy coverage. So just zooming in randomly, you wouldn't necessarily expect to see alignments appear.
What I asked you to do was use samtools to pull out the first few reads from your sorted BAM file. Since the BAM file was sorted, the alignments start at the beginning of chromosome 1. In the screenshot, I saw that you had some reads around base 14,000 of chromosome 1, so I asked you to jump to that area in IGV.
IGV can also search by gene name, but remember that gene names differ depending on the reference used and on the organization cataloguing the genes. Another approach would be to use a gene database such as GeneCards to look up your gene, find a consensus start site, and then search for that coordinate in IGV. For your SPP1 gene, it looks like the start site is at chromosome 4, base 88,896,802, so if you type
chr4:88896802
in IGV it will take you to that region of the genome.
Similarly, you can use samtools to query for alignments in a BAM file (region queries like this need the BAM to be sorted and indexed). If you wanted to find all of the alignments within the consensus start/end region of SPP1, you could type something like this:
samtools view [bamfile] chr4:88896802-88904563
That will give you all of the alignments in human-readable form. You can redirect these to a text file if you want, or use perl/python/sed/awk to pull out specific details.
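If you do save the human-readable output to a text file, a short script can then pick out alignments for you. Below is a minimal Python sketch, not what samtools does internally: it parses a region string of the same form used above and checks whether one SAM line overlaps it, using the CIGAR string to work out where the alignment ends on the reference. The function names and the example reads are made up for illustration.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# CIGAR operations that consume bases on the reference (per the SAM specification).
REF_CONSUMING = set("MDN=X")


def parse_region(region):
    """Turn 'chr4:88896802-88904563' into ('chr4', 88896802, 88904563)."""
    chrom, span = region.split(":")
    start, end = span.replace(",", "").split("-")
    return chrom, int(start), int(end)


def reference_end(pos, cigar):
    """1-based position of the last reference base covered by the alignment."""
    length = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return pos + length - 1


def overlaps(sam_line, region):
    """True if the alignment on this SAM line overlaps the given region."""
    chrom, start, end = parse_region(region)
    cols = sam_line.split("\t")
    rname, pos, cigar = cols[2], int(cols[3]), cols[5]
    if rname != chrom or cigar == "*":
        return False
    return pos <= end and reference_end(pos, cigar) >= start


# Hypothetical SAM lines, made up for this example.
region = "chr4:88896802-88904563"
inside = "r1\t0\tchr4\t88896790\t60\t20M\t*\t0\t0\t*\t*"
outside = "r2\t0\tchr4\t88896700\t60\t50M\t*\t0\t0\t*\t*"
print(overlaps(inside, region), overlaps(outside, region))
```

To use it on a saved file, you would loop over the lines, skip header lines starting with "@", and keep those for which overlaps(...) is true.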
Are you sure the BAM contains hg19 alignments? That index seems way too small.
The methods section of their paper says that they mapped to the hg19 build.
Make sure you are zoomed in enough (do you see text that says: zoom in to see alignments?)
Yes, I'm zoomed in. I can see the reference genes but nothing for my files! I wanted to upload a screenshot of my work in IGV here but don't know how.
I wanted to use UCSC, but that also needs a BAM + .bai file. I could not upload it to Galaxy; it said the file is >2G so it can't be uploaded!
To post an image, upload it to imgur and then use that link here.
Can you please post the output of the following command?
samtools view -h [bamfile] | head -30
Thanks.
Sorry, I don't have a background in bioinformatics or Linux, so I have to search for each command! Can I post the screenshot?
IGV IMAGE
That one says "zoom in to see alignments". When in doubt, have a look at the header of the BAM file with samtools view -H and re-index the file with samtools index.
Image of IGV :
Right, so zoom in.
Zoomed in but still nothing!
Keep zooming, you're not going to see alignments at that scale. Try 1kb or so.
I did but no difference!
Please post the output of
samtools view -h [bamfile] | head -30
That will show you the precise coordinates of the first few alignments contained in the BAM. It will also give us a look at the header of the BAM.
Once we have some example alignment coordinates from the BAM, we can then jump to that exact location in IGV and go from there. Posting that information will be a huge help for us.
Screenshot after running the following command:
samtools view -h [bamfile] | head -30
Great, thank you.
Now, in IGV, in that text field at the top just to the left of the "home" button, type the following:
and click "Go"
What do you see?
Result:
different zoom in:
Cool! So problem resolved then?
Thank you for your help :)
What does chr1:14410 stand for? I searched but got a bit confused! :P
I want to look at positive and negative controls to check whether the dataset is good. When I search for housekeeping genes, known dysregulated genes, or genes mentioned in the related paper (such as SPP1 or FN1), I get the same problem again: there is nothing for my BAM file, although the gene can be found in the reference gene track. How can I search for different genes?
I can only see the reads when I search for chr1:14410, chr1:14401, or chr1:13507, which were in this output.
So according to this visualization, were the sorting of my BAM file and the generation of the index correct?
Is it possible to save this output as a .txt file for other chromosomes?
Sorry for asking so many questions :)
Your explanation was really helpful :) Actually, I did this: I tried to find the locations of the genes and search by chromosome location, but none of them worked. I'll try again. Thanks
You can type the name of the gene directly into IGV.
And if you want to get an idea of the expression of different genes across several samples, you should use htseq-count to count the number of reads per gene, and then use a tool such as DESeq2 or edgeR to assess differential expression. Don't forget that each sample is sequenced at a different depth, so it's dangerous to compare raw coverage without normalization.
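To show why raw counts mislead, here is a tiny Python sketch using counts per million (CPM), which is only the simplest kind of depth normalization. DESeq2 and edgeR apply their own, more robust normalization methods, so treat this as an illustration of the idea and not as what those tools compute. The gene counts are made up.

```python
def counts_per_million(counts):
    """Scale raw per-gene read counts by the sample's total count, per million reads."""
    total = sum(counts.values())
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}


# Made-up counts: sample_b was sequenced twice as deeply as sample_a.
sample_a = {"SPP1": 200, "FN1": 300, "ACTB": 500}
sample_b = {"SPP1": 400, "FN1": 600, "ACTB": 1000}

cpm_a = counts_per_million(sample_a)
cpm_b = counts_per_million(sample_b)
print(sample_a["SPP1"], sample_b["SPP1"])  # raw counts differ
print(cpm_a["SPP1"], cpm_b["SPP1"])        # depth-normalized values
```

The raw SPP1 count doubles between the two samples, but after scaling by sequencing depth the values match, so the apparent difference came from depth alone.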