Find Motifs in Genome
I am wondering if a tool already exists to accomplish the following task...
First, the motifs of interest:
Second, the pattern to find in the genome:
any motif, then a 6-7 nt gap, then any motif
Finally, the expected output:
chr, start (of motif), gap (
For example:
chromosome: chr1
sequence: ATTCATGTGxxxxxxCATTTGCCG
output: chr1, 4, 6
I can write it myself but I would prefer not to reinvent the wheel. Thanks!
My wheel, seqkit (please update to v0.4.4 or later).
seqkit locate (usage) is used to locate subsequences/motifs.
Motifs can be EITHER a plain sequence containing ACTGN OR a regular expression (the default) like A[TU]G(?:.{3})+?[TU](?:AG|AA|GA) for ORFs.
Degenerate bases like RYMM.. are also supported via the flag -d.
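To make the degenerate-base idea concrete, here is a rough Python sketch of how an IUPAC motif can be expanded into a regular expression. This only illustrates the concept and is not how seqkit implements -d; the function name degenerate_to_regex and the example motif CANNTG are made up for this illustration.

```python
import re

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degenerate_to_regex(motif: str) -> str:
    """Expand each degenerate base into a regex character class.

    Hypothetical helper for illustration; not part of seqkit.
    """
    parts = []
    for base in motif.upper():
        allowed = IUPAC[base]  # raises KeyError on a non-IUPAC character
        parts.append(allowed if len(allowed) == 1 else f"[{allowed}]")
    return "".join(parts)


pattern = degenerate_to_regex("CANNTG")
print(pattern)  # CA[ACGT][ACGT]TG
print(bool(re.search(pattern, "cccCATTTGttt", re.IGNORECASE)))  # True
```

A single degenerate motif such as CANNTG would cover all three motifs used in the example below, at the cost of also matching other sequences of that form.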
$ cat motifs.fa
>CATGTG
CATGTG
>CATTTG
CATTTG
>CACGTG
CACGTG
>motif4
CATTTG.{6,7}CACGTG
$ cat seqs.fa
>seq1
tactgCATGTGactangcgang
>seq2
cccCATTTGttttttCACGTGttt
>seq3
cccCATTTGttttCACGTGttt
$ seqkit locate -i -f motifs.fa seqs.fa | column -t
seqID  patternName  pattern             strand  start  end  matched
seq1   CATGTG       CATGTG              +       6      11   CATGTG
seq2   CATTTG       CATTTG              +       4      9    CATTTG
seq2   CACGTG       CACGTG              +       16     21   CACGTG
seq2   CACGTG       CACGTG              -       16     21   CACGTG
seq2   motif4       CATTTG.{6,7}CACGTG  +       4      21   CATTTGttttttCACGTG
seq3   CATTTG       CATTTG              +       4      9    CATTTG
seq3   CACGTG       CACGTG              +       14     19   CACGTG
seq3   CACGTG       CACGTG              -       14     19   CACGTG
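Sorry, you must count the gaps yourself. For the motif4 hit above, the matched sequence is 18 nt and the two motifs are 6 nt each, so the gap is 18 - 12 = 6.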
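If you would rather get the gap directly, a small script can do the whole search. The following is a minimal Python sketch, not a seqkit feature: it assumes the three motifs from motifs.fa, scans the forward strand only, and takes the sequence as a plain string (reading the genome FASTA is left out). The function name find_gapped_pairs and its parameters are made up for this example.

```python
MOTIFS = ["CATGTG", "CATTTG", "CACGTG"]  # assumed: the motifs from motifs.fa


def find_gapped_pairs(seq_id, seq, motifs=MOTIFS, min_gap=6, max_gap=7):
    """Return (seq_id, start, gap) for each motif, gap, motif hit.

    start is the 1-based position of the first motif; gap is the number of
    bases between the two motifs. Forward strand only.
    """
    seq = seq.upper()
    motifs = [m.upper() for m in motifs]
    hits = []
    for i in range(len(seq)):
        for first in motifs:
            if not seq.startswith(first, i):
                continue
            for gap in range(min_gap, max_gap + 1):
                j = i + len(first) + gap  # where the second motif must begin
                if any(seq.startswith(second, j) for second in motifs):
                    hits.append((seq_id, i + 1, gap))
    return hits


# The example from the question: CATGTG at position 4, then a 6 nt gap.
print(find_gapped_pairs("chr1", "ATTCATGTGxxxxxxCATTTGCCG"))
# [('chr1', 4, 6)]
```

To cover the reverse strand as well, you would also need to search for the reverse complement of each motif; CACGTG is its own reverse complement, which is why it shows up on both strands in the seqkit output.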
Can you check whether one of these tools does what you need?
fuzznuc
FIMO
HOMER