Hello all,
I am visualizing the alignment (bam) file in IGV. I can see the distribution of reads across all the exons in the area of interest. I want to see if particular reads aligned across all exons or just a few of them. Looking at a huge number of reads it looks difficult to manually look for it. I am not sure how the strategy should be to proceed with this.
Could someone help me here?
Thank you in advance.
Not clear why you are interested in particular reads. Can you clarify?
For example:
The gene I am interested in looking at has 6 exons. I want to see if read A aligned to exon 1,2,3,4,5,6 or just exon 1,2?
Are they tiny exons or are you looking for multi-mapping to more than one exon?
The exons are around 200,000.
I am interested in look at for one read if part of read mapped to exon 1 and part of read mapped to exon 2 and so on till exon 6.
Basically, I am looking at pattern of alignment across all the exons.
Right click on the alignments and look in the
Sort
andGroup alignments by
options to see if one of those help highlight these kind of reads.Thanks! But every exon has almost more than 50 reads matching. It looks every tedious to manually check for 6 exons. Is there any other way I can do it?
You can filter out spliced reads using answers here: Filtering out the spliced reads from bam file.
Thanks! I tried to filter out the reads that mapped to the area of interest using samtools and the coordinates of the region. However, in IGV i could see 19 reads aligned in that region but when I extracted those using samtools only 10 wherein the output.
Why would that happen?
Only 10 are spliced?
For region of interest say, 57,000,000-57,500,000
In IGV I can see 19 reads are aligned. However, when I try to extract those reads using
samtools view -h -q 10 out.bam "chrX:57,000,000-57,500,00" > gene.bam
I get only 10 reads.
I am little confused over the question but would like to help in someway.
You need to specify if the reads that were aligned were single end vs. paired end and the size.
If you are using illumina paired ends (which are the longest possible in illumina) how come even a paired end read cover 200,000 bp - you mentioned earlier comment tath exons are 200,000 BP. It is possible for paired end reads (max of 300 bp for 2 paired end x 150 bp read) to cover two distant exons but not all the part of exons. In you case its like 6 exons. How is it possible for one read pair to cover such a large number of exons?
You should have understood one thing when you pulled only 10 reads overlapping in
chrX:57,000,000-57,500,00
; that means the alignment in this region in very sparse.If you are using nanopore or pacbio reads than it is a different story. So, please add more detail to your question. Also add some plots from IGV to clarify the issue.
Reads are supposed to contain spliced segments. Exactly how, only @KNC_bioinfo knows, since we can't see the data.
I am sorry about that. I am using nanopore reads.
Important information. Should have been in the original post. What aligner did you use? Perhaps it does not use right tags for you to be able to pull out spliced reads.