Getting an error in k-mer counting in Python 3
4.0 years ago
anasjamshed ▴ 140

I want to write a program that

  1. calculates the number of all kmers of a given length across all DNA sequences
  2. displays just the ones that occur more than a given number of times.

I have tried this script:

import os
import sys
import shutil

# convert command line arguments to variables
kmer_size = int(sys.argv[1])
print(kmer_size)
count_cutoff = int(sys.argv[2])

# define the function to split dna
def split_dna(dna, kmer_size):
    kmers = []
    # a sequence of length n has n - kmer_size + 1 overlapping k-mers
    for start in range(len(dna) - kmer_size + 1):
        kmer = dna[start:start+kmer_size]
        kmers.append(kmer)
    return kmers

# create an empty dictionary to hold the counts
kmer_counts = {}

# process each file with the right name
for file_name in os.listdir("."):
    if file_name.endswith(".fastq"):
        # the with-block closes the file when we are done with it
        with open(file_name) as dna_file:

            # process each DNA sequence in a file; this assumes standard
            # 4-line FASTQ records, where the sequence is the 2nd line
            for line_number, line in enumerate(dna_file):
                if line_number % 4 != 1:
                    continue
                dna = line.rstrip("\n")

                # increase the count for each k-mer that we find
                for kmer in split_dna(dna, kmer_size):
                    kmer_counts[kmer] = kmer_counts.get(kmer, 0) + 1

# print k-mers whose counts are above the cutoff
for kmer, count in kmer_counts.items():
    if count > count_cutoff:
        print(kmer + " : " + str(count))

But it gives an error:

ValueError                                Traceback (most recent call last)
<ipython-input-42-02b791e42fca> in <module>()
      4 
      5 # convert command line arguments to variables
----> 6 kmer_size = int(sys.argv[1])
      7 print(kmer_size)
      8 count_cutoff = int(sys.argv[2])

ValueError: invalid literal for int() with base 10: '-f'

I have been trying for the last 3 months and I don't know how I can execute it. I can't change the type of any variable.

Kindly help me

kmer python DNA FASTA • 1.9k views

It looks like you're using an IPython notebook - correct me if that's not the case.

How are you executing the script?


Yes, I am using a Python notebook.


Save the script as a .py file and try executing it from the command line. It looks like you're not passing in command line arguments properly through the ipython interface.
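
To expand on why this happens: a Jupyter/IPython kernel is itself started with its own command-line arguments (typically -f followed by the path to a connection file), so inside a notebook sys.argv[1] is '-f' rather than your k-mer size. If you want the same code to work in both places, here is a minimal sketch. Note that parse_args and its default values are made up for illustration and are not part of your script:

```python
import sys

def parse_args(argv, default_kmer_size=4, default_cutoff=10):
    """Return (kmer_size, count_cutoff) taken from argv.

    Falls back to the (hypothetical) defaults when the two integers are
    missing or not numeric, which is what happens inside a notebook.
    """
    try:
        return int(argv[1]), int(argv[2])
    except (IndexError, ValueError):
        return default_kmer_size, default_cutoff

kmer_size, count_cutoff = parse_args(sys.argv)
print(kmer_size, count_cutoff)
```

From a terminal you would then run something like: python kmer_counter.py 4 10 (where kmer_counter.py is whatever name you saved the script under). In the notebook the same cell should simply fall back to the defaults.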


How can I run it from the command line after creating the .py file?


Please google "run python file on command line"


What arguments are you passing to this script? As the error notes, -f is not an integer, and the script expects an integer k-mer size as its first argument.


Yes, I know it's not an integer, but I also tried treating it as a string and that gave me an error too.


There are a number of threads on Biostars about enumerating kmers. You should be able to adapt one of those fairly trivially:

How to count and compare k-mer count vectors (and print the top ten highest contributions)?

Finding 16 mer not present in GRCh38

A: Randomly sample motifs from reference sequence

See also the linked gist
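
For what it's worth, the counting itself can be done compactly with collections.Counter. This is a rough sketch and is not taken from any of the threads above; count_kmers and the toy reads are made up for illustration, and reading real FASTQ files is left out:

```python
from collections import Counter

def count_kmers(sequences, kmer_size):
    """Count every overlapping k-mer of length kmer_size across all sequences."""
    counts = Counter()
    for dna in sequences:
        # a sequence of length n contains n - kmer_size + 1 overlapping k-mers
        for start in range(len(dna) - kmer_size + 1):
            counts[dna[start:start + kmer_size]] += 1
    return counts

# toy input standing in for the sequence lines of a FASTQ file
reads = ["ACGTAC", "CGTACG"]
counts = count_kmers(reads, 3)

# keep only k-mers seen more than count_cutoff times
count_cutoff = 1
for kmer, count in sorted(counts.items()):
    if count > count_cutoff:
        print(kmer + " : " + str(count))
```

Sequences shorter than kmer_size simply contribute nothing, because the range is empty in that case.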
