Hi all,
I have read through many posts about insert size here, and have seen a very good answer about it.
It is still not entirely clear to me what insert size means, and I hope some experts can make it clearer.
As illustrated in a good blog post and a good answer, the "insert size" is the sequence between the adapters (it encompasses R1 and R2 as well as the unknown gap between them), and the ninth column of the SAM file (TLEN) is also known to represent the insert size.
However, here are some things I still don't understand.
First, in RNA-seq data, if the alignments are spliced and TLEN reports the distance from the 5'-most to the 3'-most mapped position (if my understanding is right), then TLEN will include any introns. Does that mean TLEN would usually be longer than the "actual insert size"?
Second, if we are mapping DNA sequences, are the fragment length and the "insert size"/"template length" the same?
Third, how does Picard's CollectInsertSizeMetrics actually calculate the insert size distribution of a paired-end library? Does it only use TLEN, or does it exclude possible introns?
Any answer that helps me better understand this concept will be greatly appreciated.
This is the best illustration for this: A: What is the different between Read and Fragment in RNA-seq?
Yes, this is included in the background of my question...
My question is:
First, in RNA-seq data, if the alignments are spliced and TLEN reports the distance from the 5'-most to the 3'-most mapped position (if my understanding is right), then TLEN will include any introns. Does that mean TLEN would usually be longer than the "actual insert size"?
Second, if we are mapping DNA sequences, are the fragment length and the "insert size"/"template length" the same?
Third, how does Picard's CollectInsertSizeMetrics actually calculate the insert size distribution of a paired-end library? Does it only use TLEN, or does it exclude possible introns?
Technically, fragment length will never be equal to insert size (if you only consider size in bp), since the fragment includes the insert plus the Illumina adapters. If the DNA fragment does not contain a breakpoint/translocation, then it represents a contiguous stretch of DNA in the genome.
I will let someone else tackle #1 and 3.
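In the meantime, here is a minimal Python sketch for thinking about #1. It assumes TLEN is taken as the genomic span from the leftmost to the rightmost mapped base of the pair, which is how the SAM specification describes it (individual aligners may differ in detail). The function names `ref_end_and_introns` and `pair_lengths` are made up for illustration and are not part of any tool. The sketch compares the genomic span with the same span minus the introns (CIGAR `N` operations) found inside the two reads. Introns that fall in the unsequenced gap between the mates cannot be seen from the CIGAR strings, so the second number is only an estimate.

```python
import re

# One CIGAR operation: a length followed by an op code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")


def ref_end_and_introns(pos, cigar):
    """Walk a CIGAR string for an alignment starting at 1-based `pos`.

    Returns (end, introns): `end` is the 1-based inclusive rightmost
    reference position, `introns` is a list of (start, end) intervals
    (1-based, inclusive) covered by N operations.
    """
    cur = pos
    introns = []
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op == "N":
            introns.append((cur, cur + length - 1))
        if op in REF_CONSUMING:
            cur += length
    return cur - 1, introns


def pair_lengths(pos1, cigar1, pos2, cigar2):
    """Return (genomic_span, span_minus_observed_introns) for a read pair.

    Assumes both mates map to the same reference sequence. An intron seen
    identically in both mates (overlapping mates) is only counted once.
    """
    end1, introns1 = ref_end_and_introns(pos1, cigar1)
    end2, introns2 = ref_end_and_introns(pos2, cigar2)
    span = max(end1, end2) - min(pos1, pos2) + 1
    unique_introns = set(introns1) | set(introns2)
    intron_bases = sum(e - s + 1 for s, e in unique_introns)
    return span, span - intron_bases


# Mate 1 is spliced across a 1000 bp intron, mate 2 is unspliced.
print(pair_lengths(100, "50M1000N50M", 1250, "100M"))  # (1250, 250)
```

In this made-up example the genomic span is 1250 bp, while the length after removing the observed intron is 250 bp, so a span-based TLEN would overstate the insert size of a spliced pair by the intron length.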
If you are interested in insert size calculation then use these directions (for BBMap tools).
Thank you for your advice, I will give it a try.