Hello,
Three sources suggest different read lengths -- how can I definitively find the read length?
- A colleague suggested that this bam file has reads 40 base pairs long,
- samtools view seems to suggest the reads are 44 base pairs long, and
- Tablet seems to suggest most reads are 40 base pairs long but shows a few shorter than 40 (e.g. 36, 38, and 39 base pairs).
For samtools, I'm running the following command, based on "How Can I Know The Length Of Mapped Reads From Bam File?":
$ samtools view foo.bam | head -n 1000000 | cut -f 10 | perl -ne 'chomp;print length($_) . "\n"' | sort | uniq -c
1000000 44
$ samtools view foo.bam | head -n 1
HWI-ST673_0087:1:1103:11279:92479#NNNNNN/1 0 chrM 1 255 4M4I36M * 0 0 GATGGATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCT bbbeeaceggggfeghiiiiiiiiiiihiiihiiiiihfhhhhi AS:i:-23 XN:i:0 XM:i:1 XO:i:1 XG:i:4 NM:i:5 MD:Z:3C36 YT:Z:UU
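One possible way to reconcile 44 with 40 for this particular record is the CIGAR string in column 6. Under the SAM specification, the operations M, I, S, = and X consume bases of the read, while M, D, N, = and X consume bases of the reference. For 4M4I36M that gives a read of 44 bases covering 40 reference positions. Below is a minimal Python sketch of that arithmetic; the function name cigar_lengths is made up for illustration, and whether this is where the colleague's 40 comes from is only a guess.

```python
import re

# CIGAR operations that consume read (query) bases vs. reference bases,
# as listed in the SAM format specification.
QUERY_OPS = set("MIS=X")
REF_OPS = set("MDN=X")

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_lengths(cigar):
    """Return (read_length, reference_span) implied by a CIGAR string.

    Hypothetical helper for illustration; not part of samtools.
    """
    read_len = 0
    ref_span = 0
    for count, op in CIGAR_RE.findall(cigar):
        n = int(count)
        if op in QUERY_OPS:
            read_len += n
        if op in REF_OPS:
            ref_span += n
    return read_len, ref_span


# SEQ (column 10) and CIGAR (column 6) of the first record shown above.
seq = "GATGGATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCT"
read_len, ref_span = cigar_lengths("4M4I36M")
print(len(seq), read_len, ref_span)  # prints: 44 44 40
```

So length(SEQ) and the CIGAR agree on a 44-base read; the 4I is an insertion relative to the reference, which is why the alignment only spans 40 reference positions.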
With Tablet, I see the following: [screenshot not included]
I assume that I'm just misinterpreting the output. I don't have any formal background in bioinformatics (I am a software developer). If anyone could help set me straight here or suggest some good relevant reading, I'd appreciate it. Do BAM files always have uniform read lengths?
Thanks for the help!
Thanks for the explanation. It turns out I was stupidly looking at the wrong file in Tablet!
But your answer and the others helped me better understand samtools view, CIGAR strings, and what to expect with regard to uniform read length -- thank you!