Hello,
I have used SHRiMP for aligned SOLiD RNA-Seq data using the following commands.
$SHRIMP_FOLDER/bin/gmapper-cs -N 8 --single-best-mapping -E -L $SHRIMP_INDEX/genome-8gb-12_12_12seeds-1of6-cs 1of4.csfasta > 1of4.map.db1of6.sam
and finally
$SHRIMP_FOLDER/bin/mergesam --single-best-mapping --all-contigs < `cat *of4.csfasta` `cat *of4.map.db*of6.sam` > map.sam
I expect the resulting sam/bam file to contain only uniquely mapped reads. But when I perform
samtools view -bq 1 map.sam > map_bq1.bam
samtools view -bS map.sam > map.bam
and compare the "samtools flagstat" output for both the bam files, the map_bq1.bam gives me lesser number of reads than map.bam which is confusing because I would expect both to be the same.
Can you guys tell me why this difference occurs?
Thanks.
In my case this is a single-end not a paired-end data.
Can you check if there are reads in your map.bam file with mapping quality of zero. Because the only things differs between map.bam and map_bq1.bam is the view -bq1 command. Why dont you write a code that identifies reads present in map.bam but nor present in map_bq1.bam (You can do it for the first 2000 reads). It will make things clear.