Hi All,
Background: I have completed adapter trimming and QC on Illumina NextSeq miRNA single-end reads of length 75 bp. I want to run umi_tools to extract the UMI information before I align the reads to the reference, but umi_tools extract fails with an error.
Command used:
umi_tools extract --stdin=XYZ_R1-trim.fastq.gz --bc-pattern=NNNNNNNNNNNN -L XYZ-extract.log --stdout=XYZ-UMIextracted.fastq.gz
The UMI barcode is 12 bp long, and I suspect the pattern could be the culprit. I am new to UMI analysis; could anyone please share some insight into what the mistake is? Error message seen on screen:
Traceback (most recent call last):
  File "/home/xyz/.local/bin/umi_tools", line 11, in <module>
    sys.exit(main())
  File "/home/xyz/.local/lib/python2.7/site-packages/umi_tools/umi_tools.py", line 57, in main
    module.main(sys.argv)
  File "/home/xyz/.local/lib/python2.7/site-packages/umi_tools/extract.py", line 330, in main
    new_read = ReadExtractor(read)
  File "/home/xyz/.local/lib/python2.7/site-packages/umi_tools/umi_methods.py", line 971, in __call__
    umi_values = self.getBarcodes(read1, read2)
  File "/home/xyz/.local/lib/python2.7/site-packages/umi_tools/umi_methods.py", line 726, in _getBarcodesString
    umi_quals = [bc_qual1[x] for x in self.umi_bases]
IndexError: string index out of range
Also, am I supposed to run the whitelist command before extract? This is not single-cell RNA data, so I omitted that step.
Thank you.
Thank you so much, Michael! You were right: it went through once I did the adapter trimming after UMI extraction.
Sometimes a break from the usual routine is what works for a dataset!
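For anyone who lands here with the same IndexError: my understanding (an assumption based on the traceback and the fix, not something checked against the umi_tools source) is that trimming left some reads shorter than the 12-base pattern, so indexing the quality string at the UMI positions ran past the end of the read. A minimal Python sketch to check a FASTQ file for such reads before running extract; the function name count_short_reads is made up for illustration and is not part of umi_tools:

```python
import gzip

UMI_LEN = 12  # length of the --bc-pattern NNNNNNNNNNNN used in the command


def count_short_reads(fastq_path, umi_len=UMI_LEN):
    """Return (total_reads, reads_shorter_than_umi_len) for a FASTQ file.

    Hypothetical helper, not part of umi_tools. Assumes standard
    4-line FASTQ records; handles plain or gzip-compressed input.
    """
    opener = gzip.open if fastq_path.endswith(".gz") else open
    total = 0
    short = 0
    with opener(fastq_path, "rt") as handle:
        for line_no, line in enumerate(handle):
            # The sequence is the second line of every 4-line record.
            if line_no % 4 == 1:
                total += 1
                if len(line.rstrip("\r\n")) < umi_len:
                    short += 1
    return total, short


# Example usage (file name taken from the command above):
# total, short = count_short_reads("XYZ_R1-trim.fastq.gz")
# print(total, "reads,", short, "shorter than", UMI_LEN, "bases")
```

If the second number is greater than zero on the trimmed file, that would be consistent with the error above; running extract on the untrimmed reads and trimming afterwards avoids it.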
Thanks again, Rituriya.