I want to remove certain reads from my BAM file, and the output is quite funny...
I tried to trim my BAM files with samtools view and awk, but the output BAM file is three times the size of the input. Also, if I re-use this output BAM file, the tool complains that the header is missing. For example, when I do the following:
samtools view 10iPS-1.sorted.bam |
awk '
BEGIN {
dict[65]
dict[177]
}
$2 in dict' > 1IPS-BAM1-RQ.bam
I don't think awk is the best way to select only certain reads in a BAM file. If you have a better method, please let me know!
Yeah, this is the answer. I bet you will find that your 1IPS-BAM1-RQ.bam is human readable, which means it's actually uncompressed SAM text rather than BAM. That would also explain why it is about three times larger than the input.
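Note that samtools view might refuse to compress it back without headers. samtools view does not print the header unless you pass -h, and even with -h your awk filter would strip the header lines off, because their second field is not in dict. So you have to keep the header (or add it back) before using samtools view to compress back to .bam.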
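Something along these lines might do it. This is only a sketch, assuming samtools is on your PATH and that you really want reads whose FLAG is exactly 65 or 177, as in your awk script. The function names filter_flags and filter_bam_by_flag are made up for illustration. It keeps the header lines (they start with "@") alongside the matching reads, then pipes the result back through samtools view -b so the output is real BAM again.

```shell
# Keep SAM header lines (those starting with "@") plus any read whose
# FLAG (column 2) is exactly 65 or 177. Reads SAM text on stdin.
# filter_flags is a hypothetical helper name.
filter_flags() {
    awk 'BEGIN { keep[65]; keep[177] } /^@/ || ($2 in keep)'
}

# Decode BAM to SAM with the header (-h), filter, and re-encode as BAM (-b).
# filter_bam_by_flag is a hypothetical wrapper; $1 = input BAM, $2 = output BAM.
filter_bam_by_flag() {
    samtools view -h "$1" | filter_flags | samtools view -b -o "$2" -
}
```

With your files that would be called as: filter_bam_by_flag 10iPS-1.sorted.bam 1IPS-BAM1-RQ.bam. The output should then be compressed and carry the original header, so downstream tools should accept it.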