How to get the target sequence matching a consensus motif using SeqKit?
3.9 years ago

My aim was to retrieve the genes that contain a consensus motif from a FASTA file of gene sequences. I solved that with SeqKit.

For example, if my motif looks like this, I can write:

seqkit grep -srip 'G[TA][ATC]AGCA[TAC]' input.fasta > output1.fasta

Some of my motifs contain several ambiguous bases, so I cannot tell which region of each sequence actually matched. So my question is:

  1. How can I get the target sequence that matched the consensus motif?

Edit:

What does -srip do in the above command? I ask because when I use

grep -o "G[TA][ATC]AGCA[TAC]" input.fasta > output2.fasta

not every sequence in output1.fasta has a corresponding motif in output2.fasta.


seqkit grep -srip xxx is equivalent to seqkit grep -s -r -i -p xxx:

$ seqkit grep -h
  -s, --by-seq                 search subseq on seq, both positive and negative strand are searched, and mismatch allowed using flag -m/--max-mismatch
  -r, --use-regexp             patterns are regular expression
  -i, --ignore-case            ignore case
  -p, --pattern strings        search pattern (multiple values supported. Attention: use double quotation marks for patterns containing comma, e.g., -p '"A{2,}"'))
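
The -s description above is the likely reason the two outputs differ: plain grep only sees the forward strand as written in the file, while a both-strand search also matches the reverse complement. A minimal sketch with standard tools only (the sequence is a made-up toy example, and `rev` is assumed to be installed):

```shell
motif='G[TA][ATC]AGCA[TAC]'
# Toy sequence (hypothetical): the motif occurs only on the reverse strand.
seq='CCCATGCTTACGGG'

# Plain grep searches the text as written, i.e. the forward strand only.
fwd=$(printf '%s\n' "$seq" | grep -o "$motif" || true)

# Reverse-complement the sequence: reverse it, then swap complementary bases.
rc=$(printf '%s\n' "$seq" | rev | tr 'ACGTacgt' 'TGCAtgca')
rev_hit=$(printf '%s\n' "$rc" | grep -o "$motif" || true)

echo "forward-strand hit: '${fwd}'"
echo "reverse-strand hit: '${rev_hit}'"
```

Here the forward search finds nothing, while the reverse-complement search reports GTAAGCAT. Plain grep is also case-sensitive unless -i is given, and it cannot see a motif that is split across two lines of a wrapped FASTA record.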

Thanks @shenwei356. The answer to my problem was the -s option, which searches both strands. I now add -P so that only the positive strand is searched.
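
For the original question of seeing which stretch of each sequence matched, one possible approach on the positive strand is to join each wrapped record into a single line and print every match with its position. This is only a sketch using awk, not a SeqKit feature; the FASTA content and the temporary file are made up for illustration:

```shell
# Toy FASTA (hypothetical): geneA's motif spans a line break, geneB's is lower-case.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
>geneA
CCCGTAA
GCATGGG
>geneB
ttgatagcacaa
>geneC
ACGTACGT
EOF

# For each record: join the lines, upper-case, and report every
# (possibly overlapping) match as id, 1-based start, matched sequence.
hits=$(awk -v re='G[TA][ATC]AGCA[TAC]' '
  function report(   s, off) {
    s = toupper(seq); off = 0
    while (match(s, re)) {
      printf "%s\t%d\t%s\n", id, off + RSTART, substr(s, RSTART, RLENGTH)
      off += RSTART
      s = substr(s, RSTART + 1)   # continue just past the match start
    }
  }
  /^>/ { if (id != "") report(); id = substr($1, 2); seq = ""; next }
  { seq = seq $0 }
  END { if (id != "") report() }
' "$tmp")

printf '%s\n' "$hits"
rm -f "$tmp"
```

This searches the forward strand only, which corresponds to the -P behaviour. SeqKit also has a locate subcommand that reports matched subsequences and coordinates; see seqkit locate -h for the flags your version supports.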
