Splice-aware insert size for RNA-seq
10.0 years ago
shuelga ▴ 20

Usually for DNA we map PE data to the genome and then use Picard's CollectInsertSizeMetrics to get an insert size distribution. For RNA PE data we map to the genome with STAR and then want to look at insert size. I don't think CollectInsertSizeMetrics will work here, since it will treat the entire intron of a spliced read as part of the insert, artificially inflating the insert size. Is there a splice-aware tool that will calculate the actual insert size after mapping to the genome? I realize I could also map to the transcriptome, where CollectInsertSizeMetrics should work, but I'm wondering if there is an alternative to doing both mappings.

alignment genome RNA-Seq • 4.0k views

You can filter out spliced reads from your bam file and use the non-spliced reads to calculate the insert size. See the post: Samtools Filter Reads Cigar Field
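The filtering idea above can be sketched in a few lines of Python working directly on SAM text. This is only an illustration, not the method from the linked post: the function name `unspliced_insert_sizes` is made up here, and it assumes the aligner fills in the TLEN column and sets the "properly paired" flag (0x2). A pair is dropped if either mate has an `N` operation in its CIGAR string. Note that a pair whose intron falls entirely in the unsequenced gap between the mates has no `N` in either CIGAR, so it still slips through with an inflated TLEN.

```python
def unspliced_insert_sizes(sam_lines):
    """Collect TLEN for properly paired reads where neither mate is spliced.

    sam_lines: iterable of SAM text lines (header lines are skipped).
    Returns one positive TLEN per pair (taken from the leftmost mate).
    """
    records = []      # (qname, flag, tlen) for every primary alignment
    spliced = set()   # read names where at least one mate has an N in its CIGAR
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        qname, flag, cigar, tlen = fields[0], int(fields[1]), fields[5], int(fields[8])
        if flag & 0x900:          # skip secondary and supplementary alignments
            continue
        if "N" in cigar:          # N = skipped region on the reference (intron)
            spliced.add(qname)
        records.append((qname, flag, tlen))
    return [
        tlen
        for qname, flag, tlen in records
        if qname not in spliced and flag & 0x2 and tlen > 0
    ]
```

One way to feed it would be to pipe `samtools view -h` output into a script that passes `sys.stdin` to this function, then summarize the returned list (median, histogram) however you like.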

10.0 years ago

BBMap will correctly calculate the insert size of spliced reads (and output the sizes as a histogram with the "ihist=file" flag). However, it will only be correct for pairs in which the splice site is seen within a read, not when the intron lies in the unsequenced middle of the fragment.

You can also generate an alignment-free insert-size histogram with BBMerge, if the inserts are short enough so that the reads overlap. Again, this uses the "ihist" flag.

Both are part of BBTools.
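To make the idea of a splice-corrected insert size concrete, here is a rough Python sketch that subtracts the introns visible in the reads from TLEN. This is not BBMap's implementation, just one way the correction could be done on SAM text; the names `introns` and `spliced_insert_sizes` are invented for the example, and it assumes TLEN and the 0x2 flag are set by the aligner. It has the same limitation described above: an intron in the unsequenced middle is never seen, so it is not subtracted. It also assumes the two mates do not report different, overlapping introns, which would be double-counted.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def introns(pos, cigar):
    """Return (start, end) reference intervals, 1-based inclusive, for each N op."""
    ref = pos
    found = []
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            found.append((ref, ref + length - 1))
        if op in "MDN=X":         # operations that consume reference bases
            ref += length
    return found

def spliced_insert_sizes(sam_lines):
    """Map read name -> TLEN minus the total length of introns seen in either mate."""
    introns_by_pair = {}
    tlen_by_pair = {}
    for line in sam_lines:
        if line.startswith("@"):
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9:
            continue
        qname, flag, pos, cigar, tlen = f[0], int(f[1]), int(f[3]), f[5], int(f[8])
        if flag & 0x900 or not flag & 0x2:   # primary, properly paired only
            continue
        # A set removes the duplicate when both mates span the same intron.
        introns_by_pair.setdefault(qname, set()).update(introns(pos, cigar))
        if tlen > 0:                         # leftmost mate carries the positive TLEN
            tlen_by_pair[qname] = tlen
    return {
        q: t - sum(end - start + 1 for start, end in introns_by_pair[q])
        for q, t in tlen_by_pair.items()
    }
```

Storing introns as a set of intervals per pair is a deliberate choice: overlapping mates that both cross the same junction would otherwise have that intron subtracted twice.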

10.0 years ago

Aligning to the transcriptome would be the best way to calculate the fragment length.

However, RNA-seq protocols generally do not include a gel-cutting or size-selection step. Hence, for DNA-seq (PE or MP) the insert sizes are typically normally distributed, whereas in RNA-seq the distribution is skewed.
