I'm mapping a set of paired-end Illumina reads against a reference with Bowtie2:
bowtie2 -x reference -1 reads_1p.fastq -2 reads_2p.fastq -S aligned.sam -t --met-file metrics --no-unal -p 10
The fastq files have the seq_id line in the format:
@HWI-ST731_27:5:1110:2607:21757#4@0/1
with the /1 or /2 at the end indicating the paired read number.
The Bowtie2 output (both the original SAM and the sorted BAM) lists the read IDs without the trailing /1 or /2 for most of the alignments, but not all.
Why is that? Can I change that behavior without changing the alignment and reporting parameters?
It seems to interfere with pysam.AlignmentFile.mate(): I get an error saying that a given read has no mate when that is not the case. What actually happens is that one mate has its mate number removed while the other keeps it.
>samtools view aligned_s2.bam | grep 'HWI-ST539_132:4:2108:14057:10095'
HWI-ST539_132:4:2108:14057:10095#2@0/2 163 chr_2 1937 0 98M = 2052 146 TTGAGCATGAATGGGCATATGGCTGGATAAATAAGACTGGTAATCATCCTATGAACATAATCGTGATTAAGAGATAGAAATATGATTAGAAAGTAGGA ?@BDB??B>B<CDDHIJJJEACFHGIGIBHIHGBGGGEHG*?F?DFGF?DD99BDEG>FHBBBFGGIJJJGECHA>=A?;CFEC;CC;;ACEDCCCA@ AS:i:-40 XS:i:-40 XN:i:3 XM:i:11 XO:i:0 XG:i:0 NM:i:11 MD:Z:0N0N0N4G0T8A4A17C0C4A2A48 YS:i:-14 YT:Z:CP
HWI-ST539_132:4:2108:14057:10095#2@0 83 chr_2 2052 0 31M = 1937 -146 GGGTAGAAAGGTAGTTGGTGAGACAAACCAG :*>GFFED?HFBDAHE;A@DFHFDDDBDD@@ AS:i:-14 XS:i:-4 XN:i:0 XM:i:3 XO:i:0 XG:i:0 NM:i:3 MD:Z:0A12T9T7 YS:i:-40 YT:Z:CP
What does it mean when the ID of only one of the mates is modified?
Thanks!
This is normal behavior. The original /1 or /2 is stored in the SAM flag.
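To make that concrete, here is a minimal sketch of reading the mate number back out of the FLAG field. In the SAM format, bit 0x40 marks the first read of a pair and bit 0x80 the second. The function name `mate_number` is just an illustration, not part of any library.

```python
def mate_number(flag: int) -> int:
    """Return 1 or 2 from a SAM FLAG value, or 0 if the bits are ambiguous.

    0x40 = first segment in the template, 0x80 = last segment.
    """
    first = bool(flag & 0x40)
    second = bool(flag & 0x80)
    if first and not second:
        return 1
    if second and not first:
        return 2
    return 0  # neither bit or both bits set


# The two flags from the samtools output in the question.
for flag in (163, 83):
    print(flag, mate_number(flag))
```

For the two records in the question, flag 163 decodes to mate 2 and flag 83 to mate 1, so the pairing information is still there even where the name suffix is gone.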
Yes, but why do some reads keep it when most have it removed? What does that say about those reads or alignments?
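This does not explain why only some names keep the suffix, but as a possible workaround for the mate lookup, the names can be made consistent by stripping any trailing /1 or /2 from the first column of each alignment line. This is only a sketch: the helper names are made up for illustration, and it assumes that the mismatched names are what makes the lookup fail and that the text comes from something like `samtools view -h`.

```python
import re

# Matches a trailing "/1" or "/2" at the very end of a read name.
_MATE_SUFFIX = re.compile(r"/[12]$")


def strip_mate_suffix(name: str) -> str:
    """Remove a trailing /1 or /2 from a read name, if present."""
    return _MATE_SUFFIX.sub("", name)


def normalize_sam_line(line: str) -> str:
    """Strip the mate suffix from the QNAME column of one SAM text line.

    Header lines (starting with '@') are returned unchanged.
    """
    if line.startswith("@"):
        return line
    fields = line.split("\t")
    fields[0] = strip_mate_suffix(fields[0])
    return "\t".join(fields)


# The two read names from the samtools output in the question.
for name in ("HWI-ST539_132:4:2108:14057:10095#2@0/2",
             "HWI-ST539_132:4:2108:14057:10095#2@0"):
    print(strip_mate_suffix(name))
```

Both names come out as HWI-ST539_132:4:2108:14057:10095#2@0. Running `normalize_sam_line` over every line and converting the result back to BAM should give both mates the same name, and the mate number remains available from the flag.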