I have a question regarding the UMI handling in my sequencing data after using fastp.
I ran fastp to trim adapters and append a 6-base UMI to the read names, using the following configuration. However, when I inspect the output FASTQ file, the UMI field contains a 12-base sequence. Here is an example read from the FASTQ file:
@VH00349:206:AACMHWCHV:1:1101:18610:1000:CGTGGT_AGAGCG 2:N:0:CAGATC
CTACCACGGCTCGTCGGACTGTAGAACTCTGAACGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAAGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
+
9I9IIIIII-II99IIIIIIIIIIIII-I9IIIII9I9I9II9IIIIIII9IIII-9IIII9I-III--9I9III-III-I9II9I9I9II999I9IIIIIIIIII9II
As you can see, the UMI here appears as CGTGGT_AGAGCG: 12 nucleotides in total, as two 6-base parts joined by an underscore. However, I expected only 6 bases, based on the UMI length I specified during trimming.
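To make the structure of that field concrete, here is a minimal Python sketch that pulls the UMI out of a read header like the one above and reports the length of each part. It assumes the UMI is the last colon-separated field of the read name and that multiple parts are joined by an underscore, as in my example; `split_umi` is just a hypothetical helper name, not part of fastp.

```python
def split_umi(header: str) -> list[str]:
    """Return the underscore-separated UMI parts from a FASTQ header line."""
    # Drop the leading '@' and the comment after the first whitespace.
    read_name = header.lstrip("@").split()[0]
    # Assumption: the UMI is the last colon-separated field of the read name.
    umi_field = read_name.rsplit(":", 1)[-1]
    # Assumption: multiple UMI parts are joined with '_'.
    return umi_field.split("_")


header = "@VH00349:206:AACMHWCHV:1:1101:18610:1000:CGTGGT_AGAGCG 2:N:0:CAGATC"
parts = split_umi(header)
print(parts, [len(p) for p in parts])
# ['CGTGGT', 'AGAGCG'] [6, 6]
```

So each part on its own matches the 6-base length I specified; it is the presence of two parts that I did not expect.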
Could you please help clarify why the UMI appears as 12 bases in this case and whether this is due to the fastp settings or the sequencing process?
Include the command you used for this operation.