unique k-mers
2.0 years ago
Юлия ▴ 10

I have clusters of transposons. I need to find unique k-mers for each cluster. How can I do it?

For example:

 >RLX_02.01.01.99_LTR-scaffold_11709_001-v2_chrUn_830094_830205|RLX_02.01.01.99_LTR-scaffold_11709_001-v2_chrUn_830094_830205|3741
TGCAAATGGGGCTAAGAGCCCTGAAGAATAACCAATGGCGTCCATAACCCTGCCCAGGCCAAGCCGGAAGGGTAACCCTGCTAACGACGTCGATCTCAAAACCTGCTTAAAC

I use a k-mer length of 23.

k-mers • 1.2k views

What did you try so far?

2.0 years ago
bas1993 ▴ 60

You could write a Python script that slides a window of the k-mer length along the sequence and adds each k-mer to a list if it is not in that list yet.

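sequence = "TGCAAATGGGGCTAAGAGCCCTGAAGAATAACCAATGGCGTCCATAACCCTGCCCAGGCCAAGCCGGAAGGGTAACCCTGCTAACGACGTCGATCTCAAAACCTGCTTAAAC"
k = 23  # k-mer length

unique = []   # distinct k-mers, in order of first appearance
seen = set()  # the same k-mers, kept in a set for fast membership tests

# A sequence of length n has n - k + 1 k-mers, so the last start index is n - k.
for start in range(len(sequence) - k + 1):
    kmer = sequence[start:start + k]
    if kmer not in seen:
        seen.add(kmer)
        unique.append(kmer)

print(unique)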

If I am not mistaken, your code just builds a list of the distinct k-mers of one sequence. But we need to find unique k-mers for each cluster. By unique k-mers I mean k-mers that occur in every (or almost every) sequence of a given cluster but are absent (or almost absent) from the sequences of the other clusters.


Then you could rewrite it so that the detected k-mers are stored as keys in a dictionary and the value is incremented each time a k-mer occurs. After that you could take the k-mers that occurred only once and compare them between your clusters.

Maybe there is a tool for this, but I'm not aware of any.
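
A minimal sketch of the dictionary idea, assuming each cluster is already loaded as a list of sequence strings. The function names count_kmers and cluster_only_kmers and the toy sequences are made up for illustration, and k is set to 4 instead of 23 so the result is easy to read. For simplicity it keeps every k-mer counted in cluster A and never in cluster B; filtering on a count of exactly one would be an extra condition on the dictionary values.

```python
def count_kmers(sequences, k):
    """Return a dict mapping each k-mer to its total number of occurrences."""
    counts = {}
    for seq in sequences:
        for start in range(len(seq) - k + 1):
            kmer = seq[start:start + k]
            counts[kmer] = counts.get(kmer, 0) + 1
    return counts


def cluster_only_kmers(counts_a, counts_b):
    """k-mers that were counted in cluster A but never in cluster B."""
    return {kmer for kmer in counts_a if kmer not in counts_b}


# Toy data (hypothetical): two tiny "clusters", k = 4 for readability.
cluster_a = ["ACGTAC", "ACGTTT"]
cluster_b = ["GGGTAC", "TTTTAC"]

counts_a = count_kmers(cluster_a, 4)
counts_b = count_kmers(cluster_b, 4)
only_a = cluster_only_kmers(counts_a, counts_b)
print(sorted(only_a))
```

Here GTAC is dropped because it also appears in cluster B. Note that this only checks presence across the whole cluster, not how many sequences of the cluster carry the k-mer.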


You'd just need Counter (from the collections module) for this.
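
One way Counter could be applied to the cluster-specific definition given earlier in the thread: for every k-mer, count how many sequences of a cluster contain it, then keep the k-mers that are frequent in the target cluster and rare in the others. This is only a sketch under assumptions: the function names, the thresholds min_in and max_out, and the toy sequences are not from the thread, and reverse complements are ignored.

```python
from collections import Counter


def kmers_of(seq, k):
    """Set of distinct k-mers in one sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def presence_counts(sequences, k):
    """Counter: k-mer -> number of sequences that contain it at least once."""
    counts = Counter()
    for seq in sequences:
        counts.update(kmers_of(seq, k))
    return counts


def specific_kmers(target, others, k, min_in=0.9, max_out=0.1):
    """k-mers in >= min_in of the target sequences and <= max_out of the others.

    min_in and max_out are illustrative thresholds, not values from the thread.
    """
    inside = presence_counts(target, k)
    outside = presence_counts(others, k)
    n_out = max(len(others), 1)  # avoid dividing by zero when others is empty
    return sorted(
        kmer for kmer, n in inside.items()
        if n / len(target) >= min_in and outside[kmer] / n_out <= max_out
    )


# Toy data (hypothetical), k = 4 for readability.
cluster_a = ["ACGTAC", "ACGTTT", "GACGTG"]
cluster_b = ["GGGTAC", "TTTTAC"]
result = specific_kmers(cluster_a, cluster_b, k=4)
print(result)
```

With real data, each cluster's sequences would be read from its FASTA file and k set to 23; the two thresholds control how strictly "almost every" and "almost absent" are interpreted.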


When I tried this:

k=0
unique=[]
genome_length=len(sequence) - 23
while k < genome_length:
    kmer = sequence[k:k+23]
    if kmer in unique:
        k+=1
    else:
        unique.append(kmer)
        k+=1
print(unique)

I got the list of k-mers. Then I applied this:

def get_unique(in_list):
    # declare an empty list
    unq_list = []
    # iterate over the list
    for x in in_list:
        # if the value x is not in unq_list, add it
        if x not in unq_list:
            unq_list.append(x)
    # print the list
    for x in unq_list:
        print(x)

my_list = unique
print("The unique values in the list {0} are".format(my_list))
get_unique(my_list)

and I just got a similar list:

The unique values in the list ['TAGCAACCCTAGCCTCCGGCTAA', 'AGCAACCCTAGCCTCCGGCTAAG', 'GCAACCCTAGCCTCCGGCTAAGC', 'CAACCCTAGCCTCCGGCTAAGCT', 'AACCCTAGCCTCCGGCTAAGCTT', 'ACCCTAGCCTCCGGCTAAGCTTC', 'CCCTAGCCTCCGGCTAAGCTTCC', 'CCTAGCCTCCGGCTAAGCTTCCT', 'CTAGCCTCCGGCTAAGCTTCCTC', 'TAGCCTCCGGCTAAGCTTCCTCC', 'AGCCTCCGGCTAAGCTTCCTCCT', 'GCCTCCGGCTAAGCTTCCTCCTC', 'CCTCCGGCTAAGCTTCCTCCTCG', 'CTCCGGCTAAGCTTCCTCCTCGG', 'TCCGGCTAAGCTTCCTCCTCGGC', 'CCGGCTAAGCTTCCTCCTCGGCG', 'CGGCTAAGCTTCCTCCTCGGCGT', 'GGCTAAGCTTCCTCCTCGGCGTG', 'GCTAAGCTTCCTCCTCGGCGTGT', 'CTAAGCTTCCTCCTCGGCGTGTC', 'TAAGCTTCCTCCTCGGCGTGTCT', 'AAGCTTCCTCCTCGGCGTGTCTA', 'AGCTTCCTCCTCGGCGTGTCTAA', 'GCTTCCTCCTCGGCGTGTCTAAA', 'CTTCCTCCTCGGCGTGTCTAAAC', 'TTCCTCCTCGGCGTGTCTAAACC', 'TCCTCCTCGGCGTGTCTAAACCC', 'CCTCCTCGGCGTGTCTAAACCCT', 'CTCCTCGGCGTGTCTAAACCCTA', 'TCCTCGGCGTGTCTAAACCCTAG', 'CCTCGGCGTGTCTAAACCCTAGA', 'CTCGGCGTGTCTAAACCCTAGAT', 'TCGGCGTGTCTAAACCCTAGATC', 'CGGCGTGTCTAAACCCTAGATCG', 'GGCGTGTCTAAACCCTAGATCGT', 'GCGTGTCTAAACCCTAGATCGTC', 'CGTGTCTAAACCCTAGATCGTCG', 'GTGTCTAAACCCTAGATCGTCGA', 'TGTCTAAACCCTAGATCGTCGAG', 'GTCTAAACCCTAGATCGTCGAGG', 'TCTAAACCCTAGATCGTCGAGGA', 'CTAAACCCTAGATCGTCGAGGAA', 'TAAACCCTAGATCGTCGAGGAAC', 'AAACCCTAGATCGTCGAGGAACT', 'AACCCTAGATCGTCGAGGAACTC', 'ACCCTAGATCGTCGAGGAACTCT', 'CCCTAGATCGTCGAGGAACTCTC', 'CCTAGATCGTCGAGGAACTCTCT', 'CTAGATCGTCGAGGAACTCTCTC', .....
