Question

How to get the target sequence matching a consensus motif using SeqKit?

3.9 years ago

2021yearsold • 0

My aim was to retrieve the genes that contain a consensus motif from a file of gene sequences in FASTA format. I solved that with SeqKit. For example, for a motif like the one below, I can write:

seqkit grep -srip 'G[TA][ATC]AGCA[TAC]' input.fasta > output1.fasta

Some of my motifs contain several ambiguous bases, so I cannot tell which region of each sequence actually matched. So my question is:

How can I get the actual target sequence that matched the consensus motif?

Edit:

What do the -srip flags do in the above command? I ask because when I use

grep -o "G[TA][ATC]AGCA[TAC]" input.fasta > output2.fasta

not every FASTA record in output1.fasta has a corresponding motif match in output2.fasta.

Motif Seqkit promoter • 1.3k views


seqkit grep -srip xxx is equivalent to seqkit grep -s -r -i -p xxx

$ seqkit grep -h
  -s, --by-seq                 search subseq on seq, both positive and negative strand are searched, and mismatch allowed using flag -m/--max-mismatch
  -r, --use-regexp             patterns are regular expression
  -i, --ignore-case            ignore case
  -p, --pattern strings        search pattern (multiple values supported. Attention: use double quotation marks for patterns containing comma, e.g., -p '"A{2,}"'))
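To make the difference from plain grep concrete, here is a minimal sketch that uses only standard shell tools on a made-up two-record file. The file names (demo.fasta, demo.tsv, hits.tsv) and the sequences are hypothetical, and the loop only imitates a both-strand, case-insensitive regex search; it is not how SeqKit is implemented. It shows two reasons a line-based grep -o can miss hits: a motif split across wrapped sequence lines, and a motif present only on the reverse strand.

```shell
# Toy FASTA (hypothetical data): gene1 has the motif split across a line
# break; gene2 carries it only on the reverse strand.
printf '>gene1\nACGTGTAAGC\nATTT\n>gene2\nTTATGCTTACGG\n' > demo.fasta

motif='G[TA][ATC]AGCA[TAC]'

# Join wrapped sequence lines so each record becomes "id<TAB>sequence".
awk '/^>/ { if (id != "") print id "\t" seq; id = substr($0, 2); seq = ""; next }
     { seq = seq $0 }
     END { if (id != "") print id "\t" seq }' demo.fasta > demo.tsv

# Report record id, strand and the matched substring for both strands.
while IFS="$(printf '\t')" read -r id seq; do
  # Forward strand, case-insensitive.
  for hit in $(printf '%s\n' "$seq" | grep -oiE "$motif"); do
    printf '%s\t+\t%s\n' "$id" "$hit"
  done
  # Reverse strand: reverse the sequence, complement it, search again.
  rc=$(printf '%s\n' "$seq" \
    | awk '{ out = ""; for (i = length($0); i > 0; i--) out = out substr($0, i, 1); print out }' \
    | tr 'ACGTacgt' 'TGCAtgca')
  for hit in $(printf '%s\n' "$rc" | grep -oiE "$motif"); do
    printf '%s\t-\t%s\n' "$id" "$hit"
  done
done < demo.tsv > hits.tsv

cat hits.tsv

# For comparison, line-based grep on the raw file finds nothing here:
grep -o "$motif" demo.fasta || echo "plain grep: no hits"

# With SeqKit installed, something like
#   seqkit locate -i -r -p "$motif" demo.fasta
# should report the matched subsequence and its position per record.
```

On real data, seqkit locate is probably the more convenient route to the matched region, since it reports strand, coordinates and the matched subsequence directly; the loop above is only meant to show why the grep -o and seqkit grep -s outputs disagree.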

3.9 years ago by shenwei356 • 8.7k

Thanks @shenwei356. The explanation for my problem was the -s option, which searches both strands. I now add -P to search only the positive strand.

3.9 years ago by 2021yearsold • 0