Question

Clarification on UMI Length in FASTQ File after Trimming with fastp

11 weeks ago

daffodil ▴ 10

I have a question about how fastp handles UMIs in my sequencing data.

I ran fastp to trim adapters and append a 6-base UMI to the read names (the command is posted in my reply below). However, when I inspect the output FASTQ file, the UMI field in the read name contains 12 bases. Here is an example read from the FASTQ file:

@VH00349:206:AACMHWCHV:1:1101:18610:1000:CGTGGT_AGAGCG 2:N:0:CAGATC
CTACCACGGCTCGTCGGACTGTAGAACTCTGAACGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
+
9I9IIIIII-II99IIIIIIIIIIIII-I9IIIII9I9I9II9IIIIIII9IIII-9IIII9I-III--9I9III-III-I9II9I9I9II999I9IIIIIIIIII9II

As you can see, the UMI here appears as CGTGGT_AGAGCG: 12 nucleotides in total, as two 6-base parts joined by an underscore. I had expected only 6 bases, given the UMI length I specified (--umi_len=6).

Could you please help clarify why the UMI appears as 12 bases in this case and whether this is due to the fastp settings or the sequencing process?
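To make the structure of that UMI field explicit, here is a minimal shell sketch that splits the example read name into its two halves and reports their lengths. It only uses the header quoted above; the variable names are illustrative. For context, fastp's documentation describes `per_read` as taking the first `--umi_len` bases of both read1 and read2, which would be consistent with two 6-base parts.

```shell
# Example header copied from the FASTQ record above
header='@VH00349:206:AACMHWCHV:1:1101:18610:1000:CGTGGT_AGAGCG 2:N:0:CAGATC'

name=${header%% *}    # drop the comment after the first space
umi=${name##*:}       # keep the last colon-separated field (the UMI field)
umi_r1=${umi%%_*}     # part before the underscore
umi_r2=${umi#*_}      # part after the underscore

echo "R1 UMI: $umi_r1 (${#umi_r1} bases)"
echo "R2 UMI: $umi_r2 (${#umi_r2} bases)"
```

Each half is 6 bases long, matching the `--umi_len=6` setting applied once per read of the pair.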

umi adapter • 292 views

updated 11 weeks ago by GenoMax 148k • written 11 weeks ago by daffodil ▴ 10

Include the command you used for this operation.

11 weeks ago by GenoMax 148k

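# Assumes the bash array `samples` was defined earlier with the sample name prefixes.
# Note: with --umi_loc=per_read, fastp takes --umi_len bases from the start of
# both R1 and R2, so the read name carries two UMIs joined by an underscore.
for sample in "${samples[@]}"; do
  echo "Running fastp for $sample"

  fastp \
    -i "${sample}_R1.fastq.gz" \
    -I "${sample}_R2.fastq.gz" \
    --umi \
    --umi_loc=per_read \
    --umi_len=6 \
    -o "${sample}_R1_trimmed.fastq.gz" \
    -O "${sample}_R2_trimmed.fastq.gz" \
    --html "${sample}_fastp.html" \
    --json "${sample}_fastp.json" \
    -w 40
done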
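If the library carries a UMI only at the start of read 1, one possible variant is `--umi_loc=read1`, which fastp documents as taking the first `--umi_len` bases of read1 only. The sketch below is a dry run under that assumption: the function name `run_fastp_read1_umi`, the `DRY_RUN` switch, and the sample name `sampleA` are hypothetical, and whether `read1` is the right location depends on the library design.

```shell
# Hypothetical helper: builds the fastp call with the UMI taken from read 1 only.
# With DRY_RUN=1 (the default here) it prints the command instead of running it.
run_fastp_read1_umi() {
  sample=$1
  set -- fastp \
    -i "${sample}_R1.fastq.gz" \
    -I "${sample}_R2.fastq.gz" \
    --umi \
    --umi_loc=read1 \
    --umi_len=6 \
    -o "${sample}_R1_trimmed.fastq.gz" \
    -O "${sample}_R2_trimmed.fastq.gz" \
    --html "${sample}_fastp.html" \
    --json "${sample}_fastp.json"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"      # show the command that would be executed
  else
    "$@"           # actually run fastp
  fi
}

run_fastp_read1_umi sampleA   # hypothetical sample name
```

Setting `DRY_RUN=0` would execute the command; checking a few read names in the output is a quick way to confirm that the UMI field is the expected length.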

updated 11 weeks ago by GenoMax 148k • written 11 weeks ago by daffodil ▴ 10