UMI workflow resulting in bams with empty reads
1
0
Entering edit mode
9 months ago
ww22runner ▴ 60

Hello all,

In my NGS workflow for UMI based reads, I first tried identifying and removing sequence adapters using bbmerge and cutcadapt:

BBMERGE -Xmx1g -ignorejunk in1=SAMPLE_R1  in2=SAMPLE_R2 outa= adapters.fa itn
CUTADAPT -a forward_adapter -A reverse_adapter -o s_2_1_sequence_trimmed_UN.fastq.gz -p s_2_2_sequence_trimmed_UN.fastq.gz SAMPLE_R1 SAMPLE_R2

Then, I converted the trimmed fastq files to an unaligned bam:

java -Xmx4g -jar PICARD FastqToSam \
    O=SAMPLE_PFX_UN_unal.bam \
    F1=s_2_1_sequence_trimmed_UN.fastq.gz \
    F2=s_2_2_sequence_trimmed_UN.fastq.gz \
    SM=SAMPLE_PFX  \
    LB=null \
    PU=NA \
    PL=Illumina

and extracted the umi barcodes from these bams:

java -Xmx4g -jar FGBIO ExtractUmisFromBam \
    --input=SAMPLE_PFX_UN_unal.bam \
    --output=SAMPLE_PFX_UN_unal_umi_extracted.bam \
    --read-structure=5M2S+T 5M2S+T \
    --molecular-index-tags=ZA ZB \
    --single-tag=RX

When I use ValidateSamFile on the output produced above, I see that there are 8 empty reads in the bam:

ERROR:EMPTY_READ        8

which errors out when I try to perform the next step of converting the unaligned bam back to fastq:

  java -Xmx4g -jar PICARD SamToFastq \
        I=SAMPLE_PFX_UN_unal_umi_extracted.bam   \
        F=SAMPLE_PFX_UN_unal_umi_extracted.fastq \
        INTERLEAVE=true

The error seen:

Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation error: ERROR: Record XXXXXXX, Read name XXXXXXX, Zero-length read without FZ, CS or CQ tag

It appears that removal of sequence adapters and UMI barcodes has resulted in empty reads that cause the error. Has anyone experienced this and are there any workarounds?

Thank you

umi • 667 views
ADD COMMENT
2
Entering edit mode
9 months ago

I don't know what the structure of your reads are - lots of UMI workflows lead to empty fastqs for one read or the other by design, as one read contains the barcdoes and nothing else.

However, whenever you trim adaptors, you must always be aware that you might create empty reads as they wer ebasically all adaptor. You can use the -m/--minimum-length switch to cutadapt to filter pairs with a read like this out. I would set it to filter out anything the same length as you UMI or shorter.

ADD COMMENT
0
Entering edit mode

that makes sense, and worked, thank you!

ADD REPLY

Login before adding your answer.

Traffic: 1229 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6