Hi community,
I would like to find the genomic coordinates of all the CCGG motifs across my reference genome.
The only approach I could think of would be to grep for CCGG across my reference genome, export these sequences in FASTA format, then align them back to the same genome to get the coordinates ("chromosome" and "position").
However, my genome is from a teleost with 2 or 3 duplication events, so I don't expect all of them to align uniquely. Also, a CCGG in a FASTA file is sometimes split across two lines, so grep will not find it.
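To illustrate the line-break problem: a line-by-line search (which is what plain grep does on a wrapped FASTA) misses a motif that straddles two lines, while searching the concatenated sequence finds it. The record below is a made-up toy example, wrapped at 10 bases per line.

```python
import re

# Toy FASTA record (hypothetical), wrapped at 10 bases per line;
# the CCGG straddles the break between the two sequence lines.
fasta_lines = [">chr1", "ATTACGTACC", "GGTTAGCATG"]
seq_lines = fasta_lines[1:]

# Per-line search, as grep would do it: no line contains CCGG on its own.
per_line_hits = [line for line in seq_lines if "CCGG" in line]

# Join the sequence lines first, then search: the motif is recovered
# at 0-based offset 8 of the full sequence.
joined = "".join(seq_lines)
joined_hits = [m.start() for m in re.finditer("CCGG", joined)]

print(per_line_hits)  # []
print(joined_hits)    # [8]
```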
Do you know of any other way, or some specific software or browser service (UCSC, NCBI, Ensembl), that can do this without aligning?
Regards, Ioannis
For sure, grep will not be a good way to go. Some answers here: Finding specific k-mer in human genome
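No alignment is needed if you scan the genome sequence directly and record where each match occurs. Below is a minimal sketch, assuming a plain or gzipped FASTA reference; the function names and the script name are hypothetical, and it is not necessarily what the linked thread recommends. Each record's wrapped lines are joined before searching, so motifs split across lines are found. Output is BED-style (chromosome, 0-based start, end). Because CCGG is its own reverse complement, scanning the forward strand already covers both strands. Duplicated regions are no problem here, since every occurrence is reported at its own position rather than being realigned.

```python
import gzip
import re
import sys

MOTIF = "CCGG"  # palindromic: its reverse complement is also CCGG


def read_fasta(handle):
    """Yield (name, sequence) pairs, joining wrapped sequence lines."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # keep only the first word of the header as the chromosome name
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_motif(records, motif=MOTIF):
    """Yield (chrom, start, end) for every match, 0-based half-open (BED)."""
    # Zero-width lookahead so overlapping matches would also be reported.
    pattern = re.compile("(?=%s)" % re.escape(motif.upper()))
    for name, seq in records:
        # upper() so soft-masked (lowercase) sequence is searched too
        for m in pattern.finditer(seq.upper()):
            yield name, m.start(), m.start() + len(motif)


def main(path):
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for chrom, start, end in find_motif(read_fasta(handle)):
            print(f"{chrom}\t{start}\t{end}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Usage would be something like `python find_ccgg.py genome.fa > ccgg.bed`. Each chromosome is held in memory as one string, which should be fine for typical teleost chromosome sizes. Add 1 to the start column if you need 1-based positions.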