Dear All,
I performed paired end mapping of transcriptome reads (genrated on illumina platform) by bowtie2. after mapping I used Samtools for calculation of mapping statistics by using samtools flagstat. and output is
68714536 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
54522137 + 0 mapped (79.35%:-nan%)
68714536 + 0 paired in sequencing
34357268 + 0 read1
34357268 + 0 read2
44630360 + 0 properly paired (64.95%:-nan%)
46641928 + 0 with itself and mate mapped
7880209 + 0 singletons (11.47%:-nan%)
168844 + 0 with mate mapped to a different chr
145423 + 0 with mate mapped to a different chr (mapQ>=5)
How to interpret this mapping output.
Apart from this, I want to filter only those mapped reads which are paired properly. For that purpose I used samtools again
samtools view -f 1 -F 12 <bamfile> > pairedendmapps.bam
But I am not sure whether it working correctly or not.
Waiting for your valuable comments
Thank you Sir for your kind reply
confusion:
46641928 + 0 with itself and mate mapped
168844 + 0 with mate mapped to a different chr
145423 + 0 with mate mapped to a different chr (mapQ>=5)
how to interpret these lines .
The first number is the number of concordant and discordant mappings. The other two are looking at a subtype of discordant mappings. Normally we expect fragments that we sequence to contain sequence from a single chromosome. In some cases (e.g., in cancer), there may be chromosomal rearrangements, causing many reads to span multiple chromosomes. This can also happen due to artifacts introduced in library prep. and/or mapping. These last two numbers are giving you information on that. So you have 16884 reads where the mates mapped to different chromosomes. One possibility is that this is just due to one mate of the pair being poorly mapped, meaning that it would have a low MAPQ score. So the second one is giving you information about the number with a somewhat decent MAPQ score. In short, the 168844 number isn't due to poor mapping.
A great answer definitely. I have a problem about the procedure after filter. I used the command
samtools view -hf 0x2 some_file
for the sam file generated by bwa-mem to get only properly paired alignments for further analysis. After using the samtools sort to sort the bam file, am error occurred when I used samtools view to check the sorted bam file.And this bam file cannot be used for index either. I guess it is probably because the header of filtered sam file is still same as the header of the original sam file? Looking forward to your reply. Much appreciate.