looking for alignment in IGV
1
1
Entering edit mode
7.0 years ago
KVC_bioinfo ▴ 600

Hello all,

I am visualizing the alignment (bam) file in IGV. I can see the distribution of reads across all the exons in the area of interest. I want to see if particular reads aligned across all exons or just a few of them. Looking at a huge number of reads it looks difficult to manually look for it. I am not sure how the strategy should be to proceed with this.

Could someone help me here?

Thank you in advance.

igv alignment • 2.7k views
ADD COMMENT
0
Entering edit mode

if particular reads aligned across all exons or just a few of them

Not clear why you are interested in particular reads. Can you clarify?

ADD REPLY
0
Entering edit mode

For example:

The gene I am interested in looking at has 6 exons. I want to see if read A aligned to exon 1,2,3,4,5,6 or just exon 1,2?

ADD REPLY
0
Entering edit mode

Are they tiny exons or are you looking for multi-mapping to more than one exon?

ADD REPLY
0
Entering edit mode

The exons are around 200,000.

I am interested in look at for one read if part of read mapped to exon 1 and part of read mapped to exon 2 and so on till exon 6.

Basically, I am looking at pattern of alignment across all the exons.

ADD REPLY
1
Entering edit mode

Right click on the alignments and look in the Sort and Group alignments by options to see if one of those help highlight these kind of reads.

ADD REPLY
0
Entering edit mode

Thanks! But every exon has almost more than 50 reads matching. It looks every tedious to manually check for 6 exons. Is there any other way I can do it?

ADD REPLY
1
Entering edit mode

You can filter out spliced reads using answers here: Filtering out the spliced reads from bam file.

ADD REPLY
0
Entering edit mode

Thanks! I tried to filter out the reads that mapped to the area of interest using samtools and the coordinates of the region. However, in IGV i could see 19 reads aligned in that region but when I extracted those using samtools only 10 wherein the output.

Why would that happen?

ADD REPLY
0
Entering edit mode

Only 10 are spliced?

ADD REPLY
0
Entering edit mode

For region of interest say, 57,000,000-57,500,000

In IGV I can see 19 reads are aligned. However, when I try to extract those reads using

samtools view -h -q 10 out.bam "chrX:57,000,000-57,500,00" > gene.bam

I get only 10 reads.

ADD REPLY
0
Entering edit mode

I am little confused over the question but would like to help in someway.

  • You need to specify if the reads that were aligned were single end vs. paired end and the size.

    • If you are using illumina paired ends (which are the longest possible in illumina) how come even a paired end read cover 200,000 bp - you mentioned earlier comment tath exons are 200,000 BP. It is possible for paired end reads (max of 300 bp for 2 paired end x 150 bp read) to cover two distant exons but not all the part of exons. In you case its like 6 exons. How is it possible for one read pair to cover such a large number of exons?

    • You should have understood one thing when you pulled only 10 reads overlapping in chrX:57,000,000-57,500,00 ; that means the alignment in this region in very sparse.

If you are using nanopore or pacbio reads than it is a different story. So, please add more detail to your question. Also add some plots from IGV to clarify the issue.

ADD REPLY
0
Entering edit mode

How is it possible for one read pair to cover such a large number of exons?

Reads are supposed to contain spliced segments. Exactly how, only @KNC_bioinfo knows, since we can't see the data.

ADD REPLY
0
Entering edit mode

I am sorry about that. I am using nanopore reads.

ADD REPLY
0
Entering edit mode

Important information. Should have been in the original post. What aligner did you use? Perhaps it does not use right tags for you to be able to pull out spliced reads.

ADD REPLY
0
Entering edit mode
7.0 years ago
Malcolm.Cook ★ 1.5k

Is sounds like you require IGV to visually indicate which RNASeq reads in a bam file are compatible with which transcript models that they overlap, and to what extent.

This is not a fully specified requirement, as it does not define "compatible" or "extent", and does not address issues such as:

  • alternate splicing yielding overlapping transcript models
  • over-lapping genes

Regardless, I believe IGV can not do this for you.

If you fully specify the requirement, you might be able to pre-compute "compatibility" for each read and add it as a user-defined custom FLAG in the sam/bam file, and then you can use IGV's popup-menu capability to color by the value of this attribute.

If you code in R, you may find the following abstraction and function useful: findCompatibleOverlaps

Depending on your application you will want to read:

ADD COMMENT

Login before adding your answer.

Traffic: 2296 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6