Find 12-mers not mapping to the human genome
6.2 years ago
Ömer An ▴ 260

Hi,

I would like to find short 12-mer DNA sequences that do not map to the human genome and do not form self-mers either. I want them to be as unique as possible. How can I get them? Which tool or software should I use?

mapping alignment sequencing sequence • 1.4k views
6.2 years ago
5heikki 11k
  1. Fragment the human genome into 12-mers with e.g. jellyfish
  2. Generate all possible 12-mers with e.g. this
  3. Use comm on the two sorted files to get the lines unique to your all-12-mers file
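
The three steps above can be sketched on a toy scale in Python. This is only an illustration, assuming a tiny made-up sequence and k = 3 so that everything fits in memory. The names `all_kmers`, `kmers_in` and `toy_genome` are hypothetical, and for a real genome a k-mer counter such as jellyfish is the practical choice.

```python
from itertools import product


def all_kmers(k, alphabet="ACGT"):
    """Return the set of every possible k-mer over the alphabet."""
    return {"".join(p) for p in product(alphabet, repeat=k)}


def kmers_in(sequence, k):
    """Return the set of k-mers occurring in sequence (forward strand only)."""
    sequence = sequence.upper()
    return {
        sequence[i:i + k]
        for i in range(len(sequence) - k + 1)
        if "N" not in sequence[i:i + k]  # skip windows that overlap unknown bases
    }


# Hypothetical stand-in for a real genome; k is kept small for illustration.
toy_genome = "ACGTACGTTTGACCA"
k = 3

# Set difference plays the role of `comm -23` on the two sorted files.
absent = sorted(all_kmers(k) - kmers_in(toy_genome, k))
print(len(absent), "k-mers never occur in the toy sequence")
```

With k = 12 the full set has 4^12 entries, which is why the file-based approach with sorted text files scales better than Python sets.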

I don't know what you mean by "not forming self-mer".


"Self-mer" in this context would mean homodimerisation.

The only two ways to really assess this would be to use primer design software (since I suspect this is a primer design question anyway) or to do some thermodynamic calculations yourself. A brute-force way would be to simply check for 'palindromicity'.
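
A brute-force palindromicity check could look like the following minimal sketch. It assumes "palindromic" means the sequence equals its own reverse complement, and the function names are hypothetical. It only flags perfect full-length self-complementarity, so partial dimers would still need a thermodynamic tool.

```python
# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase or lowercase DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_self_complementary(seq):
    """True if the sequence reads the same as its reverse complement."""
    return seq.upper() == reverse_complement(seq)


candidates = ["GAATTCGAATTC", "ACGTACGTACGT", "AAAAAAAAAAAA", "ACCTGATTGCAA"]
kept = [s for s in candidates if not is_self_complementary(s)]
print(kept)
```

Note that a sequence of odd length can never pass this test, because the middle base would have to be its own complement. For 12-mers the check is meaningful.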

A simple way would be to use https://eu.idtdna.com/calc/analyzer/home/batch, but you're limited to 200 sequences at a time. Maybe check out Primer3 for a command-line tool.


Both comments are very useful. I generated all possible 12-mers in R:

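# Build every 12-mer as a single string, one per line, sorted (C locale) for comm
x = expand.grid(rep(list(c("A", "C", "G", "T")), 12), stringsAsFactors = FALSE)
writeLines(sort(do.call(paste0, x), method = "radix"), "12-mers_all_combinations.txt")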

I was thinking of "mapping" them to the reference genome with BLAT and picking the unmapped ones, but fragmenting the genome also sounds like a good idea.

Edit: I finished the analysis. I used jellyfish to list all 12-mers present in the human genome (hg19), then used comm in bash to get the difference between the two files:

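# Count all 12-mers in hg19 (-o names the output database)
jellyfish count -m 12 -s 3G -t 10 -o 12-mer_counts.jf hg19.fa
# Dump as FASTA, drop the ">count" header lines, and sort for comm
jellyfish dump 12-mer_counts.jf | grep -v "^>" | LC_ALL=C sort > 12-mers_present_in_hg19.txt
# Keep the lines found only in the all-combinations file
comm -23 12-mers_all_combinations.txt 12-mers_present_in_hg19.txt > 12-mers_absent_in_hg19.txt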

$ wc -l 12-mers_*
 16777216 12-mers_all_combinations.txt
 16609017 12-mers_present_in_hg19.txt
   168199 12-mers_absent_in_hg19.txt
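
As a quick sanity check on these counts: there are 4^12 possible 12-mers, and the absent count should equal the total minus the present count. A short Python check, using only the numbers reported above:

```python
total = 4 ** 12          # every possible 12-mer over A, C, G, T
present = 16_609_017     # lines in 12-mers_present_in_hg19.txt
absent = 168_199         # lines in 12-mers_absent_in_hg19.txt

# The three files should be consistent with each other.
assert total == 16_777_216
assert total - present == absent

fraction_absent = absent / total
print(f"{fraction_absent:.2%} of all 12-mers are absent")
```

So roughly 1% of all possible 12-mers do not occur in the counted sequence.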

Now let the biologist dig into 168199 candidates to pick the best primers 😄
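
One filter that might be worth applying before hand-picking: the jellyfish command above does not use `-C` (canonical counting), so, as far as I understand it, only the strand given in the FASTA file is counted. A 12-mer that is absent from that list could still match the opposite strand if its reverse complement is present. The sketch below assumes both lists fit in memory, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def absent_on_both_strands(absent, present):
    """Keep absent k-mers whose reverse complement is also not present."""
    present = set(present)
    return [kmer for kmer in absent if reverse_complement(kmer) not in present]


# Toy example with 3-mers instead of the real 12-mer files.
toy_present = ["ACG", "TTT"]
toy_absent = ["AAA", "CCC", "CGT"]
print(absent_on_both_strands(toy_absent, toy_present))
```

In the toy example, AAA and CGT are dropped because their reverse complements (TTT and ACG) are present, while CCC is kept. Re-running `jellyfish count` with `-C` and comparing canonical k-mers would be another way to get the same effect.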
