Question

umi_tools extract error: IndexError: string index out of range

0

6.0 years ago

Rituriya ▴ 50

Hi All,

Background: I have completed adapter trimming and QC on Illumina NextSeq miRNA single-end reads of length 75 bp. I want to run umi_tools to extract the UMI information before I align the reads to the reference, but umi_tools extract fails with an error.

Command used:

umi_tools extract --stdin=XYZ_R1-trim.fastq.gz --bc-pattern=NNNNNNNNNNNN -L XYZ-extract.log --stdout=XYZ-UMIextracted.fastq.gz

I have a 12 bp UMI barcode, and I suspect the pattern could be the culprit. I am new to UMI analysis. Could anyone please share their insight into what the mistake is? Error message seen on screen:

    Traceback (most recent call last):
      File "/home/xyz/.local/bin/umi_tools", line 11, in <module>
        sys.exit(main())
      File "/home/xyz/.local/lib/python2.7/site-packages/umi_tools/umi_tools.py", line 57, in main
        module.main(sys.argv)
      File "/home/xyz/.local/lib/python2.7/site-packages/umi_tools/extract.py", line 330, in main
        new_read = ReadExtractor(read)
      File "/home/xyz/.local/lib/python2.7/site-packages/umi_tools/umi_methods.py", line 971, in __call__
        umi_values = self.getBarcodes(read1, read2)
      File "/home/xyz/.local/lib/python2.7/site-packages/umi_tools/umi_methods.py", line 726, in _getBarcodesString
        umi_quals = [bc_qual1[x] for x in self.umi_bases]
    IndexError: string index out of range

Also, am I supposed to use the whitelist command before extract? This is not single-cell RNA data, so I omitted that step.

Thank you.

mirnaseq umi_tools qiaseq • 3.4k views

ADD COMMENT • link updated 5.0 years ago by markaldo • 0 • written 6.0 years ago by Rituriya ▴ 50

score 1 · Answer 1 · 2018-12-11

1

6.0 years ago

michael.ante ★ 3.9k

Hi Rituriya,

You named your input "trim"; are all reads still long enough to extract the 12 bp sequence? If trimming left some reads shorter than 12 bases, the extraction will fail on them. Try running the UMI extraction before trimming.

Cheers,

Michael
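One way to test this suggestion is to count how many reads in the trimmed file are shorter than the UMI. The snippet below is a minimal sketch, not part of umi_tools: the helper names `count_short_reads` and `open_fastq` are hypothetical, and it assumes a standard four-line FASTQ layout with a 12-base UMI.

```python
import gzip
import io


def count_short_reads(fastq_handle, umi_length=12):
    """Return (total_reads, reads_shorter_than_umi) for a FASTQ text stream.

    Assumes the standard 4-line FASTQ record layout (hypothetical helper).
    """
    total = 0
    short = 0
    for line_number, line in enumerate(fastq_handle):
        # In a 4-line FASTQ record, the sequence is the second line.
        if line_number % 4 == 1:
            total += 1
            if len(line.strip()) < umi_length:
                short += 1
    return total, short


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


# Small in-memory demonstration with one long read and one short read.
long_seq = "ACGT" * 5   # 20 bases, long enough for a 12-base UMI
short_seq = "ACGTA"     # 5 bases, too short for a 12-base UMI
demo = io.StringIO(
    "@read1\n" + long_seq + "\n+\n" + "I" * len(long_seq) + "\n"
    "@read2\n" + short_seq + "\n+\n" + "I" * len(short_seq) + "\n"
)
total, short = count_short_reads(demo)
print(total, short)
```

For the real data, something like `with open_fastq("XYZ_R1-trim.fastq.gz") as handle: print(count_short_reads(handle))` would report whether any trimmed reads fall below 12 bases.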

ADD COMMENT • link 6.0 years ago by michael.ante ★ 3.9k

0

Thank you so much, Michael! You were right: it went through once I did the adapter trimming after UMI extraction.

Sometimes a break from the usual routine is what works for a dataset!

Thanks again, Rituriya.

ADD REPLY • link 6.0 years ago by Rituriya ▴ 50

score 0 · Answer 2 · 2019-11-12

In Python, a string is a sequence of characters. "String index out of range" means that the index you are trying to access does not exist: you are asking for a character at a position beyond the end of the string. Indexes in Python start at 0, so the maximum valid index for a non-empty string is always its length minus 1. Checking the length of the string with the len() function before indexing helps you avoid going past the end.
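As an illustration, the sketch below reproduces the same kind of failure with a quality string shorter than 12 characters, then guards against it with a length check. It only mirrors the shape of the failing line in the traceback; `extract_quals` is a hypothetical helper, not umi_tools code.

```python
def extract_quals(qualities, umi_length=12):
    """Return the first umi_length quality characters, or None if too short.

    Hypothetical helper: the len() check prevents the IndexError.
    """
    if len(qualities) < umi_length:
        return None
    return [qualities[i] for i in range(umi_length)]


short_quals = "IIIII"  # only 5 characters, so valid indexes are 0..4

# Indexing positions 0..11 without a length check fails at index 5.
try:
    [short_quals[i] for i in range(12)]
    message = None
except IndexError as error:
    message = str(error)

print(message)
print(extract_quals(short_quals))   # too short, so the guard returns None
print(extract_quals("I" * 12))      # long enough, so 12 characters come back
```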