I'm trying to calculate the average DNA fragment length, and I'm having some trouble with the concepts of insert size and fragment size.
(A picture for visualization is linked here.)
I'm using the following command to extract the ninth column (the TLEN column) of a SAM file:
samtools view -f3 {file} | cut -f 9
The -f3 option keeps only reads that are paired and mapped in a proper pair (flags 0x1 and 0x2; per here). My reads are paired, so some of the values are negative, and I take their absolute values. The maximum of the resulting values is 500 and the minimum is 89, yet my read length is 101.
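The absolute-value and min/max step can be done in the same pipeline. The following is a minimal sketch using awk; the function name `tlen_stats` is hypothetical. It assumes that each properly paired template appears as two records, one with +TLEN (leftmost mate) and one with -TLEN, so counting only records with TLEN > 0 counts each pair once. Mates that start at the same position may get either sign, so treat the counts as approximate.

```shell
# Hypothetical helper: summarize |TLEN| (SAM column 9) from SAM records on stdin.
# Assumption: only records with TLEN > 0 are counted, so each pair contributes
# once instead of twice (its other mate carries the negative value).
tlen_stats() {
  awk -F '\t' '
    $9 > 0 {                              # one record per pair
      n++; sum += $9
      if (n == 1 || $9 < min) min = $9    # track smallest TLEN
      if (n == 1 || $9 > max) max = $9    # track largest TLEN
    }
    END {
      if (n == 0) { print "no records with TLEN > 0"; exit 1 }
      printf "pairs=%d min=%d max=%d mean=%.2f\n", n, min, max, sum / n
    }'
}
```

Usage: `samtools view -f3 {file} | tlen_stats`. Since the positive and negative TLEN values of a pair have the same magnitude, the mean is the same whether or not both mates are counted; filtering on the sign mainly keeps the pair count honest.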
How can the minimum TLEN value be smaller than my read length?
As a follow-up: does the TLEN column then represent the DNA fragment size, or do I have to remove all reads whose TLEN is greater than my read length to get a better estimate of the fragment size?
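To see how much a read-length cutoff would change the estimate, one option is to compute the mean |TLEN| separately on each side of the cutoff. This is a minimal sketch; the function name `tlen_split` is hypothetical, and, as an assumption, only records with TLEN > 0 are counted so each pair contributes once. The cutoff (for example 101, the read length) is passed as an argument.

```shell
# Hypothetical helper: compare TLEN statistics below and at/above a cutoff.
# Reads SAM records on stdin; counts only TLEN > 0 so each pair is seen once.
tlen_split() {
  local cutoff="$1"
  awk -F '\t' -v cut="$cutoff" '
    BEGIN { cut += 0 }                      # force numeric comparison
    $9 > 0 && $9 <  cut { nb++; sb += $9 }  # TLEN shorter than the cutoff
    $9 > 0 && $9 >= cut { na++; sa += $9 }  # TLEN at or above the cutoff
    END {
      printf "below=%d mean_below=%.2f\n", nb, (nb ? sb / nb : 0)
      printf "atleast=%d mean_atleast=%.2f\n", na, (na ? sa / na : 0)
    }'
}
```

Usage: `samtools view -f3 {file} | tlen_split 101`. Reporting both groups, rather than dropping one, shows how many pairs fall on each side before deciding whether any filtering is justified.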