umi_tools extract error: IndexError: string index out of range
6.0 years ago
Rituriya ▴ 50

Hi All,

Background: I have completed adapter trimming and QC on Illumina NextSeq miRNA single-end reads of length 75 bp. I want to run umi_tools to extract the UMI information before aligning the reads to the reference, but umi_tools extract fails.

Command used:

    umi_tools extract --stdin=XYZ_R1-trim.fastq.gz --bc-pattern=NNNNNNNNNNNN -L XYZ-extract.log --stdout=XYZ-UMIextracted.fastq.gz

I have a 12 bp UMI here, and I suspect the pattern is the culprit. I am new to UMI analysis; could anyone please point out what the mistake is? Error message shown on screen:

    Traceback (most recent call last):
      File "/home/xyz/.local/bin/umi_tools", line 11, in <module>
        sys.exit(main())
      File "/home/xyz/.local/lib/python2.7/site-packages/umi_tools/umi_tools.py", line 57, in main
        module.main(sys.argv)
      File "/home/xyz/.local/lib/python2.7/site-packages/umi_tools/extract.py", line 330, in main
        new_read = ReadExtractor(read)
      File "/home/xyz/.local/lib/python2.7/site-packages/umi_tools/umi_methods.py", line 971, in __call__
        umi_values = self.getBarcodes(read1, read2)
      File "/home/xyz/.local/lib/python2.7/site-packages/umi_tools/umi_methods.py", line 726, in _getBarcodesString
        umi_quals = [bc_qual1[x] for x in self.umi_bases]
    IndexError: string index out of range

Also, am I supposed to use the whitelist command before extract? This is not single-cell RNA data, so I omitted that step.

Thank you.

mirnaseq umi_tools qiaseq • 3.4k views
6.0 years ago
michael.ante ★ 3.9k

Hi Rituriya,

Your input file name contains "trim"; are all reads still long enough after trimming to extract the 12 bp sequence? Try running the UMI extract before trimming.
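
One way to check this is to count how many reads in the trimmed file are shorter than the UMI. Below is a minimal Python sketch, assuming a standard FASTQ with four lines per record; the function names and the example reads are illustrative only and are not part of umi_tools.

```python
import gzip
from itertools import islice

UMI_LENGTH = 12  # length of the --bc-pattern used in the question


def count_short_reads(fastq_lines, min_len=UMI_LENGTH):
    """Return (total_reads, reads_shorter_than_min_len) for FASTQ lines.

    Assumes four lines per record (no wrapped sequences).
    """
    total = short = 0
    # The sequence is the 2nd line of every 4-line FASTQ record.
    for seq in islice(fastq_lines, 1, None, 4):
        total += 1
        if len(seq.strip()) < min_len:
            short += 1
    return total, short


def count_short_reads_in_file(path, min_len=UMI_LENGTH):
    """Same check for a gzip-compressed FASTQ file on disk."""
    with gzip.open(path, "rt") as handle:
        return count_short_reads(handle, min_len)


# Tiny in-memory example: the second read has only 8 bases left.
example = [
    "@read1", "ACGTACGTACGTACGTAC", "+", "IIIIIIIIIIIIIIIIII",
    "@read2", "ACGTACGT", "+", "IIIIIIII",
]
print(count_short_reads(example))
```

On the real data this would be called as `count_short_reads_in_file("XYZ_R1-trim.fastq.gz")`. If the second number is greater than zero, some reads are too short for a 12-base pattern, which would be consistent with the IndexError above.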

Cheers,

Michael


Thank you so much, Michael! You were right: it went through once I did adapter trimming after UMI extraction.

Sometimes a break from the usual routine is what works for a dataset!

Thanks again, Rituriya.

5.0 years ago
markaldo • 0

In Python, a string is a sequence of characters. "String index out of range" means that the index you are trying to access does not exist: you are asking for the character at a position that lies outside the string. Indexes in Python start at 0, so the maximum valid index for any non-empty string is always length - 1. Checking the length of the string first (with the len() function) helps you avoid going past the last index.
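
As a small illustration, the sketch below reproduces the same error message with a list comprehension shaped like the one in the traceback. The function and variable names are made up for this example; this is not the actual umi_tools code.

```python
# Positions 0..11, as a 12-base UMI pattern would require.
umi_bases = range(12)


def umi_qualities(qual, positions=umi_bases):
    """Pick the quality characters at the UMI positions."""
    return [qual[x] for x in positions]


long_qual = "I" * 20   # read long enough for a 12-base UMI
short_qual = "I" * 8   # read trimmed to 8 bases

print(len(umi_qualities(long_qual)))

try:
    umi_qualities(short_qual)
except IndexError as err:
    message = str(err)
    print("IndexError:", message)

# Guarding with len() avoids the exception.
if len(short_qual) < len(umi_bases):
    print("read too short for UMI extraction")
```

The read with 8 quality characters has valid indexes 0 to 7, so asking for index 8 raises the IndexError.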
