Hi everyone,
I have FASTQ files for some miRNA libraries prepared with the QIAseq miRNA Library Kit. I need to extract the UMIs, but the UMI comes after a sequence that is common to all reads, like this:
NNNNNNNNNNNNNNNNNNNAACTGTAGGCACCATCAATXXXXXXXXXXXXNNNNNNNNN
where the N's are the miRNA sequences, AACTGTAGGCACCATCAAT is the sequence common to all reads, and the X's are the UMI.
How can I remove the common sequence and append the UMI to the read header in the FASTQ file? Another problem is that around 3-5% of the reads don't contain the common sequence. I suppose this is caused by sequencing errors that change part of this sequence in some reads, but I don't know how to allow a one-letter mismatch in the common part.
Thank you very much!
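A minimal sketch of the trimming step, in case a command-line tool is not an option. It assumes a 12-nt UMI (from the 12 X's above), treats everything 5' of the common sequence as the miRNA insert and discards everything from the common sequence onward (the reply below asks whether that is actually the right end), and tolerates substitutions only (Hamming distance), not insertions or deletions. The `_UMI` suffix on the read name is also an assumption, chosen because it is the separator UMI-tools uses by default. All function names are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

# Assumptions: common sequence as posted, 12-nt UMI (the 12 X's in the example).
ADAPTER = "AACTGTAGGCACCATCAAT"
UMI_LEN = 12

Record = Tuple[str, str, str]  # (header, sequence, quality)


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_adapter(seq: str, adapter: str = ADAPTER, max_mismatches: int = 1) -> int:
    """Return the start of the common sequence in seq, or -1 if it is absent.

    An exact match is preferred; otherwise the leftmost window with at most
    max_mismatches substitutions is returned. Indels are NOT tolerated.
    """
    exact = seq.find(adapter)
    if exact >= 0:
        return exact
    n = len(adapter)
    for i in range(len(seq) - n + 1):
        if hamming(seq[i:i + n], adapter) <= max_mismatches:
            return i
    return -1


def extract_umi(header: str, seq: str, qual: str, adapter: str = ADAPTER,
                umi_len: int = UMI_LEN, max_mismatches: int = 1) -> Optional[Record]:
    """Move the UMI into the read name and keep only the sequence 5' of the adapter.

    Returns None if the common sequence is not found or the read ends before a full UMI.
    """
    start = find_adapter(seq, adapter, max_mismatches)
    if start < 0:
        return None
    umi_start = start + len(adapter)
    umi = seq[umi_start:umi_start + umi_len]
    if len(umi) < umi_len:
        return None
    # Append the UMI to the first space-delimited field (the read name).
    name, sep, comment = header.partition(" ")
    new_header = f"{name}_{umi}{sep}{comment}"
    return new_header, seq[:start], qual[:start]


def process_fastq(lines: Iterable[str], **kwargs) -> Tuple[List[Record], int]:
    """Process well-formed 4-line FASTQ records.

    Returns the kept records and the number of reads without the common sequence.
    """
    kept: List[Record] = []
    missing = 0
    it = iter(lines)
    for header in it:
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line
        qual = next(it).rstrip("\n")
        result = extract_umi(header.rstrip("\n"), seq, qual, **kwargs)
        if result is None:
            missing += 1
        else:
            kept.append(result)
    return kept, missing


def format_record(record: Record) -> str:
    """Render a (header, seq, qual) tuple back into FASTQ text."""
    header, seq, qual = record
    return f"{header}\n{seq}\n+\n{qual}\n"
```

To run it on real data, pass an open text file to `process_fastq` (for gzipped input, `gzip.open(path, "rt")`) and write each kept record with `format_record`. Comparing the `missing` count at `max_mismatches=0` and `max_mismatches=1` shows how much of the 3-5% is recovered by allowing one substitution; reads that are still missing may carry indels in the common sequence, which this sketch does not handle.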
For future visitors: although this question has been solved, QIAGEN provides a set of web-based tools called GeneGlobe (LINK), which appear to be free as of this writing. If you are not able to use umi-tools on the command line, you can try GeneGlobe for analysis of QIAseq miRNA data. The handbook for the QIAseq library kit explains how to use it.
You've got two sets of Ns here: one at the start and one at the end. Are they both miRNA sequences? If not, is it the 3' or the 5' Ns that are the miRNA sequence?