24 months ago
shinyjj
Hi biostars, I have a question about calculating the median read length from a bam file.
samtools view GTEX-1192X-0011-R10a-SM-DO941.bam | awk '{print length($10)}' | head -1000 | sort -u
Instead of the above command line, is it possible to get a median read length from a bam file?
Awesome. Can you explain what 2304 and datamash are?
-F 2304 excludes unmapped reads, and datamash is a command-line program that makes it easier to perform operations (like column medians) on data in tabular format.
Great. Is it possible to calculate the median length of all reads, so that I can account for unmapped + mapped?
Just remove that argument and run the same command.
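For anyone following along, here is a minimal sketch of a median over all records, assuming SAM-formatted text on stdin (for example from samtools view). The function name median_read_length is hypothetical, and the median is computed with sort and awk so that it runs without datamash.

```shell
# Hypothetical helper: median SEQ length (column 10) of SAM records on stdin.
# Assumption: input is tab-separated SAM text, e.g. from `samtools view in.bam`.
median_read_length() {
  # Skip header lines (@...) and records whose SEQ is "*" (sequence not stored).
  awk -F '\t' '!/^@/ && $10 != "*" { print length($10) }' |
    sort -n |
    awk '{ v[NR] = $1 }
         END {
           if (NR == 0) exit 1                       # no usable records
           if (NR % 2) print v[(NR + 1) / 2]         # odd count: middle value
           else print (v[NR / 2] + v[NR / 2 + 1]) / 2  # even count: mean of the two middle values
         }'
}

# Usage (requires samtools; in.bam is a placeholder file name):
#   samtools view in.bam | median_read_length
```

If datamash is installed, the sort and the second awk can be replaced with `datamash median 1`. Note that without any -F filter, secondary and supplementary records are included, so a single read can contribute more than one line.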
No. -F 2304 excludes supplementary and secondary reads (flag bits 2048 and 256), not unmapped reads.
This is what I get for being lazy and thinking I remember what it means instead of looking it up. Thanks for the correction.
Can someone explain the difference between unmapped reads and supplementary reads?
Looking at this discussion, it looks like the command below gives the median read length of all reads. Can you confirm this, please?
The referenced command will output ALL reads in the BAM file, because there is no selection being applied. Supplementary reads are described here, and you might be interested in getting familiar with BAM flags by reading the specification or playing with a decoding utility.
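To make the flag arithmetic concrete, here is a small sketch of a decoder in plain shell. The function name decode_sam_flag and the lowercase labels are my own shorthand, not official names; the bit values are the ones defined in the SAM specification.

```shell
# Hypothetical helper: print the SAM flag bits that are set in a decimal FLAG value.
# The labels are informal shorthand; the bit values follow the SAM specification.
decode_sam_flag() {
  flag=$1
  for entry in \
    1:paired 2:proper_pair 4:unmapped 8:mate_unmapped \
    16:reverse 32:mate_reverse 64:read1 128:read2 \
    256:secondary 512:qc_fail 1024:duplicate 2048:supplementary
  do
    bit=${entry%%:*}    # numeric part before the colon
    name=${entry#*:}    # label after the colon
    if [ $(( flag & bit )) -ne 0 ]; then
      printf '%s\t%s\n' "$bit" "$name"
    fi
  done
}

# Usage: decode_sam_flag 2304
```

Running it on 2304 lists only the secondary (256) and supplementary (2048) bits, so -F 2304 leaves unmapped reads (bit 4) in the output.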