Low read consensus with GroupReadsByUmi
0
0
Entering edit mode
3.0 years ago
a.palmer ▴ 20

Hi everyone,

I’m trying to troubleshoot a low read consensus rate in duplex sequencing data - by using GroupReadsByUmi and CallDuplexConsensusReads in the fgbio package.

For CallDuplexConsensusReads, I use:

java -jar fgbio-1.4.0.jar CallDuplexConsensusReads \
    --input=grouped.bam --output=consensus.unmapped.bam \
    --error-rate-pre-umi=15 --error-rate-post-umi=10 \
    --min-input-base-quality=10

But get:

[2021/12/06 17:33:10 | CallDuplexConsensusReads | Info] Raw Reads Filtered Due to Not Enough Reads (Either Total, AB, or BA): 20,787,008 (0.9964).

When looking at ‘grouped.bam’, I noticed the MI tag only referred to two consecutive sequences, presumably these are the paired reads. Also, each pair has the same read identity (MI= N/A& N/A, not N/A & N/B).

Example from grouped.bam:

A00133:442:HM5MFDSX2:3:2615:27823:2895  83      1       3050000 60      29S71M  =       3050000 -71     AATGTTTTGCCAGTTCTTTTAAATTCTCAACAGTAAAAGAAACAAATTTTAGAGGAACTAAAATATTGCAATCTCCATGCCATGGGAGAATAGAAATACA   FFFFF::FFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF          ZA:Z:CGG ZB:Z:AAT        MC:Z:24S71M50S  MD:Z:0N70       PG:Z:bwa        RG:Z:A  MI:Z:0/B        NM:i:1  MQ:i:60 UQ:i:37                                       AS:i:71  RX:Z:CGG-AAT

A00133:442:HM5MFDSX2:3:2615:27823:2895  163     1       3050000 60      24S71M50S       =       3050000 71      TTTGCCAGTTCTTTTAAATTCTCAACAGTAAAAGAAACAAATTTTAGAGGAACTAAAATATTGCAATCTCCATGCCATGGGAGAATAGAAATACATACCGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGG      FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFF:FFFF:,FFFFF:FFFFFFFF:FFFFFFFFF      ZA:Z:CGG        ZB:Z:AAT                                      MC:Z:29S71M      MD:Z:0N70       PG:Z:bwa        RG:Z:A  MI:Z:0/B        NM:i:1  MQ:i:60 UQ:i:37 AS:i:72 RX:Z:CGG-AAT

A00133:442:HM5MFDSX2:3:1468:3821:33786  99      1       3050000 60      21S124M =       3050014 159     GCCAGTTCTTTTAAATTCTCAACAGTAAAAGAAACAAATTTTAGAGGAACTAAAATATTGCAATCTCCATGCCATGGGAGAATAGAAATACATCTATCAGAGATATTGTCAGGACCCGAAAGAAATGAAACCCAAAGTGGAGGTC      FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF:FFFFFFFFFFFFFFF:,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF      ZA:Z:CGA        ZB:Z:CTA       MC:Z:145M                              MD:Z:0N123       PG:Z:bwa        RG:Z:A  MI:Z:1/A        NM:i:1  MQ:i:60 UQ:i:37 AS:i:124        RX:Z:CGA-CTA

A00133:442:HM5MFDSX2:3:1468:3821:33786  147     1       3050014 60      145M    =       3050000 -159    AAATTTTAGAGGAACTAAAATATTGCAATCTCCATGCCATGGGAGAATAGAAATACATCTATCAGAGATATTGTCAGGACCCGAAAGAAATGAAACCCAAAGTGGAGGTCCACTAGTGAGTATAATGATGGCAGGTATTTTTGCT      FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF      ZA:Z:CGA        ZB:Z:CTA       MC:Z:21S124M                           MD:Z:145 PG:Z:bwa        RG:Z:A  MI:Z:1/A        NM:i:0  MQ:i:60 UQ:i:0  AS:i:145        RX:Z:CGA-CTA

I think this is why the consensus rate is so low (from a lack of reads with the same MI), though I’m not sure.

For reference, this is my GroupReadsByUmi code:

java -jar fgbio-1.4.0.jar GroupReadsByUmi \
    --input=mapped.bam --output=grouped.bam \
    --strategy=paired --edits=1 --min-map-q=1

There was suitable amplification in the original sequencing experiment, so there must be an ample number of reads for grouping by UMI.

Thanks for your help!

Alex

fgbio sequencing Duplex • 1.0k views
ADD COMMENT

Login before adding your answer.

Traffic: 2131 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6