Can anyone help?
I am trying to use fastp to deduplicate reads using the UMI. There is an explanation in "Tutorial:Use fastp to preprocess FASTQ data with unique molecular identifer (UMI) integrated", but it only gives an example where the UMI is at the head of read1.
What does the manual mean by 'the first/second index is used as UMI'? Could someone give an example of how we would use the index1/2 option with --umi?
Additionally, does anyone know how to extract the UMI if it is in a separate 'index' file?
That refers to using the Illumina indexes as the source of the UMI. Note that extracting the UMIs and deduplicating on them are going to be two separate operations.
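To illustrate what "the first/second index is used as UMI" could mean, here is a minimal Python sketch. It assumes the index sequences sit in the comment part of the FASTQ header in the usual Illumina form (`1:N:0:INDEX1+INDEX2`) and that the chosen index gets appended to the read name after a colon. The function name `umi_from_header` and the header below are made up for illustration; this is not fastp's code, and I have not checked it against fastp's actual output.

```python
def umi_from_header(header: str, loc: str = "index1") -> str:
    """Append the chosen index sequence to the read name as a UMI tag.

    Assumes a header of the form '<name> <read>:<filtered>:<control>:<idx1>+<idx2>'.
    """
    name, _, comment = header.partition(" ")
    # The index field is assumed to be the last colon-separated item of the comment.
    index_field = comment.rsplit(":", 1)[-1]
    index1, _, index2 = index_field.partition("+")
    umi = index1 if loc == "index1" else index2
    if not umi:
        raise ValueError(f"no {loc} sequence found in header: {header!r}")
    return f"{name}:{umi} {comment}"


# Hypothetical header, invented for this example.
header = "@INSTR:1:FLOWCELL:1:1101:1000:2000 1:N:0:ACGTACGT+TTGGCCAA"
print(umi_from_header(header, "index1"))
print(umi_from_header(header, "index2"))
```

Once the UMI is part of the read name, a deduplication step run after alignment can group reads on it, which is why extraction and deduplication end up as separate operations.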
Do you mean a separate file for the index read?
No, I think this simply means that the UMI will be taken from the index sequence. I am not sure whether fastp will take the index sequence from the FASTQ header or whether it will require a separate file with index reads. You should be able to test that easily with a small dataset.
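If the UMI does turn out to live in a separate index FASTQ file, one way to handle it outside fastp is to walk the read file and the index file in lockstep and copy each index sequence into the matching read name. The sketch below is only that, a sketch: `read_fastq` and `tag_reads_with_umi` are hypothetical helpers, it assumes both files list the same reads in the same order, and the tiny in-memory records are invented test data.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from a FASTQ handle."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(record)


def tag_reads_with_umi(reads, index_reads):
    """Append each index read's sequence to the matching read name."""
    # zip stops at the shorter input, so unequal file lengths are not detected here.
    for read, index in zip(read_fastq(reads), read_fastq(index_reads)):
        r_head, r_seq, r_plus, r_qual = read
        i_head, i_seq, _, _ = index
        r_name, _, r_comment = r_head.partition(" ")
        i_name = i_head.split(" ", 1)[0]
        if r_name != i_name:
            raise ValueError(f"read names out of sync: {r_name} vs {i_name}")
        new_head = f"{r_name}:{i_seq}" + (f" {r_comment}" if r_comment else "")
        yield "\n".join([new_head, r_seq, r_plus, r_qual])


# Invented two-record dataset: a read file and its index file.
reads = io.StringIO(
    "@r1 1:N:0:1\nACGTACGTAC\n+\nIIIIIIIIII\n"
    "@r2 1:N:0:1\nTTGGCCAATT\n+\nIIIIIIIIII\n"
)
index_reads = io.StringIO(
    "@r1 2:N:0:1\nAACCGGTT\n+\nIIIIIIII\n"
    "@r2 2:N:0:1\nGGTTAACC\n+\nIIIIIIII\n"
)
tagged = list(tag_reads_with_umi(reads, index_reads))
print("\n".join(tagged))
```

For real data you would open the (possibly gzipped) files instead of the `io.StringIO` stand-ins and write the tagged records to a new FASTQ file.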