BBMAP paired end alignment problem
1
0
Entering edit mode
6.1 years ago
mark.rose ▴ 50

I'm aligning 150bp paired end Illumina data to a reference with bbmap

 ~/BioInfo/bin/BBMap_38.06/bbmap/bbmap.sh \
                    pairedonly=t \
                    ambiguous=toss \
                    killbadpairs=t \
                    pairlen=1000 \
                    trimq=13 \
                    mintrimlength=100 \
                    minaveragequality=15 \
                    outm=SMX008C05-00067_S67_L001.aligned.sam \
                    outu=SMX008C05-00067_S67_L001.unaligned.sam \
                    showprogress=10000 \
                    scafstats=SMX008C05-00067_S67_L001.scafstats \ 
                    in=SMX008C05-00067_S67_L001_R1_001.500.fastq.gz \
                    in2=SMX008C05-00067_S67_L001_R2_001.500.fastq.gz \
                    ref=~/BioInfo/PROJECTS/10-18/MAIZE_B73_REF_5_GENOME \
                     path=~/BioInfo/PROJECTS/10-18/

Looking at the log data I see

Pairing data:           pct pairs       num pairs       pct bases          num bases

mated pairs:             49.8000%             249        49.8000%              74700
bad pairs:                0.0000%               0         0.0000%                  0
insert size avg:          311.07


Read 1 data:            pct reads       num reads       pct bases          num bases

mapped:                  70.8000%             354        70.8000%              53100
unambiguous:             49.8000%             249        49.8000%              37350
ambiguous:               21.0000%             105        21.0000%              15750
low-Q discards:           0.4000%               2         0.4000%                300

perfect best site:       37.6000%             188        37.6000%              28200
semiperfect site:        37.8000%             189        37.8000%              28350
rescued:                  1.4000%               7

Match Rate:                   NA               NA        98.5503%              36913
Error Rate:              13.6364%              57         1.4203%                532
Sub Rate:                13.3971%              56         0.8997%                337
Del Rate:                 1.4354%               6         0.2830%                106
Ins Rate:                 2.3923%              10         0.2376%                 89
N Rate:                   0.9569%               4         0.0294%                 11


Read 2 data:            pct reads       num reads       pct bases          num bases

mapped:                  67.8000%             339        67.8000%              50850
unambiguous:             49.8000%             249        49.8000%              37350
ambiguous:               18.0000%              90        18.0000%              13500
low-Q discards:           3.8000%              19         3.8000%               2850

perfect best site:       31.8000%             159        31.8000%              23850
semiperfect site:        32.2000%             161        32.2000%              24150
rescued:                  2.6000%              13

Match Rate:                   NA               NA        61.7362%              36817
Error Rate:              21.8905%              88        38.2420%              22806
Sub Rate:                21.6418%              87         0.6925%                413
Del Rate:                 1.2438%               5        37.3700%              22286
Ins Rate:                 1.9900%               8         0.1794%                107
N Rate:                   0.7463%               3         0.0218%                 13

This seems to indicate both read1 and read2 were aligned. But when I look at the output of aligned reads (SMX008C05-00067_S67_L001.aligned.sam) it contains only read1 alignments.

Any thoughts?

Thanks

BBMAP bbmap align paired-end • 2.4k views
ADD COMMENT
0
Entering edit mode

How are you looking at the alignments? Could you post the output of:

samtools view SMX008C05-00067_S67_L001.aligned.sam | head
ADD REPLY
0
Entering edit mode
  $ samtools view SMX008C05-00067_S67_L001.aligned.sam | head -5
K00333:82:HTWHKBBXX:1:1101:27844:3565 1:N:0:TGCAACCGGTCC        99      MAIZE_B73_REF_5_GENOME|10       41935338        10      3=1X3=3X3=2X2=3X1=6X1=1X1=1X2=2I1X3=2X2=2X7=1X1=1X1=53D2=2X4=1X85=    =       41935473        285     CATCCTAGGGATGAATTGGCTACACAGAAATCAGGCCACTGTCAGTTGTGACAAGAGGACAGTTAAGTTGGTGGAAGAACCCAAACAGATAGATTTTGCCCTAGTGCAAGAGTTCAAATAAGGGGGATAGAGTTCCCCACTGATTTAATA        AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ      NM:i:82 AM:i:10
        K00333:82:HTWHKBBXX:1:1101:27844:3565 1:N:0:TGCAACCGGTCC        147     MAIZE_B73_REF_5_GENOME|10       41935473        45      150= =41935338        -285    ACAGATAGATTTTGCCCTAGTGCAAGAGTTCAAATAAGGGGGATAGAGTTCCCCACTGATTTAATAGTCATGGGTAACCAGGATACGACTATAGATGTCATCCTAGGGATGAATTGGCTAACCAAGTATCAAGCCAGTCTCAGCTGTGAC        JJJJJJFFAJFJJJJJJJJJJJJJJJJJJJJFFJFFJJJJJJJFJJJJJAJJJJJJJJJJJJJJJJJJJJJJJJJJFJJFJ<JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFFFAA        NM:i:0  AM:i:45
        K00333:82:HTWHKBBXX:1:1101:5903:3600 1:N:0:TGCAACCGGTCC 99      MAIZE_B73_REF_5_GENOME|3        215866873       44      33=1X116=    =215867074       351     GGACGCAAGCGCACATCATCGTAGAAGCCTCAGGTCTAGTTTGGGTACCTAGTATTGACCATAATCCACTTGCATTGAGGTGGATTAGGGCGCAATTTAAACTAATTTACACTATAATTCACCTCAACACATGTGGATTGAGGTCAATAC        AAFFFJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJFJFJJJJJJJJJJJFJJJJJJJJJJJJJJJ        NM:i:1  AM:i:44
        K00333:82:HTWHKBBXX:1:1101:5903:3600 1:N:0:TGCAACCGGTCC 147     MAIZE_B73_REF_5_GENOME|3        215867074       42      77=1X6=1X65= =215866873       -351    GGTGTTTGGATCTCCTATCACATCGAAGAATCTTTGGATGCTAGTTAGAAATATTACATATGGTCATGAAACAAACTGTAAAAAAGAAAAATTTCTGGTTAGAAATATTAAATACAATCTAATTATAAAACTAATTATATGTATGAGAGC        JAFF<JJJ<A-<777AA-AJJJFJFAFJJJJFAFFFJF-F<JJJJJJJJJJJJJJJJJ7JJFJJFJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJFJJJJFJFFJJJAJJJJJJJJ<JJJJJJJJJJJJJJJFFFAA        NM:i:2  AM:i:42
        K00333:82:HTWHKBBXX:1:1101:31152:3600 1:N:0:TGCAACCGGTCC        83      MAIZE_B73_REF_5_GENOME|4        139757306       45      150= =139757077       -379    CACAGTCCGGTCCTCCGTCAGACCATCTGGCACCCTACTATAACACCCCGTCCACACCCGGACCGACAGTACTTACTCTTGGCAGCTCTCTAGGGCCATATATTGTCCCCACAGACTAACATGGGTCTTTTGTGCACACTTTGTCCTCAC        JFA<JAJJJFFJJFJF7FFJFFFJJJJJJFJJJJJFFJAFFFJJ<JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFAFAA        NM:i:0  AM:i:45

Also

$ samtools view SMX008C05-00067_S67_L001.aligned.sam | cut -f 1|cut -f 2 -d ' '|sort|uniq
    1:N:0:TGCAACCGGTCA
    1:N:0:TGCAACCGGTCC
    1:N:0:TNCAACCGGTCC
    1:N:0:TTCAACCGGTCC

Note I also tried stripping out all parameters other than ones directly related to input or file output and likewise only see read1 alignments

ADD REPLY
0
Entering edit mode
6.1 years ago
h.mon 35k

Indeed the name of your reads in the sam file is strange, as they (mistakenly) indicate all reads are read1. However, your file does contains both reads. Check the numbers 99 and 147 at [Decoding SAM flags][1]. For example, the two first reads have the same name, but the flags show they are in fact both reads from a pair:

K00333:82:HTWHKBBXX:1:1101:27844:3565 1:N:0:TGCAACCGGTCC        99
K00333:82:HTWHKBBXX:1:1101:27844:3565 1:N:0:TGCAACCGGTCC        147

BBMap seems to be doing two odd things: most mappers will output as read name the fastq header up until the first space, bbmap is including information after the first space. More worryingly, bbmap is replacing 2:N:0:TGCAACCGGTCC from the read2 with 1:N:0:TGCAACCGGTCC. I tested locally on my computer (BBTools 38.26) and it does the same as in your case. BWA, for example, outputs as read name only K00333:82:HTWHKBBXX:1:1101:27844:3565.

ADD COMMENT
0
Entering edit mode

Yes, it occurred to me ( in the middle of the night ;-) ) that I should check the bitflags. Thanks for taking a closer look too. I will keep this in mind.

ADD REPLY
1
Entering edit mode

mark.rose : You can add option trd=t to trim these descriptions at the first white space. I believe bbmap replaces 2: with 1: since the alignment reported is for rev comp of the read.

ADD REPLY
0
Entering edit mode

Hey, thanks for the tip

ADD REPLY

Login before adding your answer.

Traffic: 2010 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6