Finding motifs across a genome
6.6 years ago
ioannis ▴ 50

Hi community,

I would like to find the genomic coordinates of all CCGG motifs in my reference genome. The only approach I can think of is to grep for CCGG in the reference, export the matching sequences in FASTA format, and then align them back to the same genome to get the chromosome and position of each hit. However, my genome is from a teleost and there have been 2 or 3 duplication events, so I do not expect all of them to align uniquely. Also, a CCGG in a FASTA file is sometimes split across two consecutive lines, in which case grep will miss it.

Do you know of another way, or of specific software or a browser service (UCSC, NCBI, Ensembl), that can do this without aligning?

Regards, Ioannis


grep is definitely not a good way to go here.

Some answers here: Finding specific k-mer in human genome
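
If you prefer to script it, here is a minimal Python sketch of the idea, not taken from the linked thread. It joins the wrapped lines of each FASTA record before searching, so a CCGG split across a line break is still found, and it reports coordinates directly without any alignment. The function names `read_fasta` and `find_motif` are made up for illustration, and it assumes a plain-text FASTA where each record fits in memory.

```python
import io
import re
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) per record, with wrapped lines joined."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the sequence name.
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_motif(handle: TextIO, motif: str = "CCGG") -> Iterator[Tuple[str, int]]:
    """Yield (sequence_name, 1-based start) for every occurrence of motif."""
    # A lookahead reports overlapping hits too; IGNORECASE covers
    # soft-masked (lowercase) regions of the reference.
    pattern = re.compile("(?=" + re.escape(motif) + ")", re.IGNORECASE)
    for name, seq in read_fasta(handle):
        for match in pattern.finditer(seq):
            yield name, match.start() + 1


if __name__ == "__main__":
    # Tiny in-memory example; the first CCGG is split across two lines.
    demo = io.StringIO(">chr1 demo\nACC\nGGTTccgg\n>chr2\nCCG\nG\n")
    for name, start in find_motif(demo):
        # Tab-delimited: name, 1-based start, 1-based end.
        print(name, start, start + len("CCGG") - 1, sep="\t")
```

For a real genome you would pass `open("genome.fa")` (a placeholder file name) instead of the in-memory demo. Since CCGG is its own reverse complement, scanning the forward strand alone covers both strands.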

6.6 years ago

Try fuzznuc from EMBOSS Explorer.

Load your reference genome, set your pattern, output the result in tab-delimited format, and parse it with Unix commands or any language you want.


It runs like a dream! Thanks a lot Bastien!

Cheers,

Ioannis
