Hello all,
In my NGS workflow for UMI-based reads, I first tried identifying and removing sequencing adapters using bbmerge and cutadapt:
BBMERGE -Xmx1g -ignorejunk in1=SAMPLE_R1 in2=SAMPLE_R2 outa=adapters.fa itn
CUTADAPT -a forward_adapter -A reverse_adapter -o s_2_1_sequence_trimmed_UN.fastq.gz -p s_2_2_sequence_trimmed_UN.fastq.gz SAMPLE_R1 SAMPLE_R2
Then, I converted the trimmed fastq files to an unaligned bam:
java -Xmx4g -jar PICARD FastqToSam \
O=SAMPLE_PFX_UN_unal.bam \
F1=s_2_1_sequence_trimmed_UN.fastq.gz \
F2=s_2_2_sequence_trimmed_UN.fastq.gz \
SM=SAMPLE_PFX \
LB=null \
PU=NA \
PL=Illumina
and extracted the UMI barcodes from these BAMs:
java -Xmx4g -jar FGBIO ExtractUmisFromBam \
--input=SAMPLE_PFX_UN_unal.bam \
--output=SAMPLE_PFX_UN_unal_umi_extracted.bam \
--read-structure=5M2S+T 5M2S+T \
--molecular-index-tags=ZA ZB \
--single-tag=RX
When I use ValidateSamFile on the output produced above, I see that there are 8 empty reads in the bam:
ERROR:EMPTY_READ 8
This causes the next step, converting the unaligned BAM back to FASTQ, to fail:
java -Xmx4g -jar PICARD SamToFastq \
I=SAMPLE_PFX_UN_unal_umi_extracted.bam \
F=SAMPLE_PFX_UN_unal_umi_extracted.fastq \
INTERLEAVE=true
The error seen:
Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation error: ERROR: Record XXXXXXX, Read name XXXXXXX, Zero-length read without FZ, CS or CQ tag
It appears that removal of sequence adapters and UMI barcodes has resulted in empty reads that cause the error. Has anyone experienced this and are there any workarounds?
Thank you
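One possible workaround, sketched here under a few assumptions: if adapter trimming leaves some reads shorter than what the read structure 5M2S+T consumes, nothing remains as template after UMI extraction. The snippet below counts such reads in a trimmed FASTQ and shows how cutadapt's -m (--minimum-length) option could drop them before FastqToSam. The threshold of 8 (5 UMI bases + 2 skipped bases + at least 1 template base) and the helper names count_short_reads and trim_with_min_length are assumptions for illustration, not part of the original workflow.

```shell
# Assumed minimum usable read length for read structure 5M2S+T:
# 5 UMI bases + 2 skipped bases + at least 1 template base.
MIN_LEN=$((5 + 2 + 1))

# Hypothetical helper: count FASTQ records (read from stdin) whose
# sequence line is shorter than the given minimum length.
# Assumes standard 4-line FASTQ records.
count_short_reads() {
    awk -v min="$1" 'NR % 4 == 2 && length($0) < min { n++ } END { print n + 0 }'
}

# Hypothetical wrapper around the original cutadapt call with -m added.
# With paired-end input, cutadapt by default discards the whole pair
# when either read is shorter than the given length.
trim_with_min_length() {
    cutadapt -m "$MIN_LEN" -a forward_adapter -A reverse_adapter \
        -o s_2_1_sequence_trimmed_UN.fastq.gz \
        -p s_2_2_sequence_trimmed_UN.fastq.gz \
        SAMPLE_R1 SAMPLE_R2
}
```

To check whether short reads are the likely cause, something like `gzip -dc s_2_1_sequence_trimmed_UN.fastq.gz | count_short_reads "$MIN_LEN"` can be run on each trimmed file. A nonzero count would be consistent with the EMPTY_READ errors, although it does not prove they come from the same records.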
That makes sense, and it worked, thank you!