Entering edit mode
4.5 years ago
from the mountains
▴
250
i want to extract all paired reads from a bam file but for some reason i'm getting a much smaller file than expected.
$ samtools flagstat in.bam
56631731 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
417735 + 0 supplementary
0 + 0 duplicates
56631731 + 0 mapped (100.00% : N/A)
56213996 + 0 paired in sequencing
28112924 + 0 read1
28101072 + 0 read2
55820773 + 0 properly paired (99.30% : N/A)
56213996 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
380483 + 0 with mate mapped to a different chr
380483 + 0 with mate mapped to a different chr (mapQ>=5)
flagstat
indicates to me that i should get 55820773 paired reads, or about 28M pairs of reads. Here is my command to extract:
$ samtools fastq -1 paired1.fastq.gz -2 paired2.fastq.gz -0 /dev/null -s /dev/null -n in.bam
$ zcat paired1.fastq.gz | wc -l
2885736
There are only about 700,000 reads in the first fastq file. I am trying to get rid of singletons. by the way the bam was generated by STAR, and subsequently sorted, indexed and filtered by samtools before using samtools fastq
.
that ended up being the solution. thanks for your expertise--this kind of thing always takes like 45 minutes of my time to figure out and somehow i never retain the information i've learned.
That is why biostars is here. If you don't do this regularly there is no way to retain this sort of info :-)
Accept the answer (green check mark) to provide closure to the thread.
thanks, i did not know what that button was for!