Entering edit mode
6.8 years ago
mark.rose
▴
50
Hi All
I'm having a problem with aligning Illumina PE reads with BBMap version 35.92.
Execution of
bbmap.sh overwrite=t ref=Z.fasta in=R1_001_val_1.100.fq in2=R2_001_val_2.100.fq minid=0.25 mappedonly=true out=out.sam outu=unaligned_reads.fa maxindel=100000
yields out.sam which contains only alignments of read1 even though the log info seems to suggest read2 reads were also aligned
Pairing data: pct reads num reads pct bases num bases
mated pairs: 100.0000% 100 107.4763% 59558
bad pairs: 0.0000% 0 0.0000% 0
insert size avg: 412.82
Read 1 data: pct reads num reads pct bases num bases
mapped: 100.0000% 100 100.0000% 29779
unambiguous: 100.0000% 100 100.0000% 29779
ambiguous: 0.0000% 0 0.0000% 0
low-Q discards: 0.0000% 0 0.0000% 0
perfect best site: 0.0000% 0 0.0000% 0
semiperfect site: 35.0000% 35 35.1758% 10475
rescued: 0.0000% 0
Match Rate: NA NA 88.8176% 26449
Error Rate: 65.0000% 65 4.4763% 1333
Sub Rate: 65.0000% 65 0.3627% 108
Del Rate: 0.0000% 0 0.0000% 0
Ins Rate: 49.0000% 49 4.1136% 1225
N Rate: 100.0000% 100 6.7061% 1997
Read 2 data: pct reads num reads pct bases num bases
mapped: 100.0000% 100 100.0000% 25636
unambiguous: 100.0000% 100 100.0000% 25636
ambiguous: 0.0000% 0 0.0000% 0
low-Q discards: 0.0000% 0 0.0000% 0
perfect best site: 0.0000% 0 0.0000% 0
semiperfect site: 26.0000% 26 25.3745% 6505
rescued: 2.0000% 2
Match Rate: NA NA 86.8232% 22258
Error Rate: 74.0000% 74 4.9852% 1278
Sub Rate: 74.0000% 74 0.8894% 228
Del Rate: 0.0000% 0 0.0000% 0
Ins Rate: 42.0000% 42 4.0958% 1050
N Rate: 100.0000% 100 8.1916% 2100
As a test I then tried
bbmap.sh overwrite=t ref=Z.fasta in=R2_001_val_2.100.fq minid=0.25 mappedonly=true out=out.sam outu=unaligned_reads.fa maxindel=100000
Here read2 reads were aligned.
Any ideas what is wrong with the first command line for the paired-end reads?
Thanks
Mark
What is the output of:
and
I also tried
I just noticed that the sam file has as many lines as it should given paired-end reads and that there are 2 lines for every read ID but they are both labeled, for instance, "M00645:16:000000000-B666D:1:1101:18619:5889 1:N:0:48", though they are different sequences. Going back to the fastq files I find the sequence for read1 in the read1 fastq file ( with, of course the ID above). When I look for read2 in the read2 file, I find the base ID above and the sequence is as it is in the sam file except it is reverse complemented.
It is as I suspected, bbmap removes the trailing /1 and /2, but keeps track of first read and second read using sam flags. However, I don't know why the samtools command is failing, which version of samtools are you using?
I checked the FLAG and you are indeed correct, one is read1 and the other read2 despite their QNAMEs. So I guess all is fine, if unexpected, in regard to the representation of paired-ends. Thanks for your help.
As for samtools, I'm using version 0.1.19-44428cd.
You really need to upgrade to the latest samtools (now v.1.7). That version is ancient and can only lead to other troubles.
mark.rose : If you have
samtools
available in your$PATH
you can directly create bam files by specifyingout=myfile.bam
when runningbbmap
.Thanks for the tip
Not that it matters for the original question, but with .sam files, it should be
samtools view -S -f 0x0040 out.sam
, or has that changed in a recent version?