Extracting Sequences After "Motif" & Between Motifs In Multifasta File
11.6 years ago
Raghul ▴ 200

Hi, I want to extract the sequence that follows a motif, say "TTTTTAAAAA", from a multi-FASTA file; I do not want the nucleotides before the motif. Is it also possible to extract the nucleotides between two motifs with grep, e.g. between TTTTTAAAA and AAAATTTT? I tried grep, but I also need the FASTA headers in the output. Can anybody suggest a solution in grep (if possible), Perl, or Python?

Thanks, Raghul
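For the first part of the question (keep each header plus everything after one motif), a minimal Python sketch could look like this. It assumes a plain-text multi-FASTA input, joins wrapped sequence lines per record, uppercases sequence lines so soft-masked bases still match, and uses only the first occurrence of the motif. The names `read_fasta` and `after_motif` and the example records are hypothetical.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a multi-FASTA text handle.

    Wrapped sequence lines are joined, so a motif split across lines is found.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line.upper())   # uppercase so soft-masked bases match
    if header is not None:
        yield header, "".join(chunks)


def after_motif(seq, motif):
    """Return the part of seq after the first occurrence of motif, or None."""
    pos = seq.find(motif)
    if pos == -1:
        return None
    return seq[pos + len(motif):]


# Hypothetical example input; pass open("input.fasta") for a real file.
example = io.StringIO(">seq1\nGGGTTTTTAAAAA\nCCCGGG\n>seq2\nACGTACGT\n")

for header, seq in read_fasta(example):
    tail = after_motif(seq, "TTTTTAAAAA")
    if tail is not None:                  # skip records without the motif
        print(header)
        print(tail)
```

Records without the motif are skipped here; printing their headers anyway is a one-line change if every record must appear in the output.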

The motif may occur several times within the same sequence. How do you want to deal with that?
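One way to handle repeated motifs, sketched in Python under the assumption that each start motif should pair with the nearest following end motif: `re.finditer` returns every non-overlapping match, and the lazy `+?` quantifier stops each match at the closest end motif. The motifs are the ones from the question; the helper names `between_all` and `between_first` are hypothetical.

```python
import re

START, END = "TTTTTAAAA", "AAAATTTT"  # motifs from the question


def between_all(seq, start=START, end=END):
    """Return every non-overlapping segment found between start and end.

    The lazy quantifier (+?) stops at the nearest end motif, so each
    start...end pair is reported separately.
    """
    pattern = re.compile(re.escape(start) + r"([ACGTN]+?)" + re.escape(end))
    return [m.group(1) for m in pattern.finditer(seq)]


def between_first(seq, start=START, end=END):
    """Return only the first segment, or None if the motifs are absent."""
    hits = between_all(seq, start, end)
    return hits[0] if hits else None


seq = "TTTTTAAAACCCAAAATTTTGGTTTTTAAAAGGGAAAATTTT"
print(between_all(seq))    # ['CCC', 'GGG']
print(between_first(seq))  # CCC
```

Using a greedy `+` instead would return a single segment running from the first start motif to the last end motif; which behaviour is right depends on what the motifs mean biologically.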

Hello! I would like to do something similar. Did you find a way to complete your task?

11.6 years ago
Ying W ★ 4.3k

I don't think this is possible with grep, but it can be done with a regex in Perl. Something along the lines of:

use strict;
use warnings;

my $header = "";   # header of the current record
my $seq    = "";   # concatenated sequence of the current record

# print the header and the bases between the two motifs, if both are found
sub report {
  if ($header ne "" && $seq =~ /TTTTTAAAAA([ACTGN]+)AAAATTTT/) {
    print "$header\n$1\n";
  }
}

while (<>) {         # every line of the files given on the command line
  chomp;
  if (/^>/) {        # line starts with >, so it is a header: process the previous sequence
    report();
    $header = $_;
    $seq = "";
  }
  else {
    $seq .= $_;      # append sequence line
  }
}
report();            # process the last sequence

Or something like that (warning: the code above is untested and should be treated as pseudocode).
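For readers who prefer Python (one of the languages asked about), here is a rough translation of the same approach: accumulate sequence lines, search the finished record whenever a new header appears, and search the last record at the end. It keeps the Perl regex, including its greedy group, and is likewise only a sketch; `extract_between`, `PATTERN`, and the example records are hypothetical.

```python
import io
import re

# Same regex as the Perl answer, including its greedy ([ACGTN]+) group.
PATTERN = re.compile(r"TTTTTAAAAA([ACGTN]+)AAAATTTT")


def extract_between(lines, pattern=PATTERN):
    """Return (header, segment) pairs for FASTA records whose sequence matches."""
    results = []
    header, seq = None, []

    def process():
        # Search the joined sequence of the record collected so far.
        match = pattern.search("".join(seq))
        if header is not None and match:
            results.append((header, match.group(1)))

    for line in lines:
        line = line.strip()
        if line.startswith(">"):   # header: finish the previous record first
            process()
            header, seq = line, []
        else:
            seq.append(line)       # append a sequence line
    process()                      # finish the last record
    return results


# Hypothetical two-record input; pass open("input.fasta") for a real file.
example = io.StringIO(">r1\nGGTTTTTAAAAACC\nCAAAATTTTGG\n>r2\nACGT\n")
for header, segment in extract_between(example):
    print(header)
    print(segment)
```

Passing a different compiled pattern (for example one with a lazy `+?` group) changes which segment is reported without touching the parsing loop.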

Some (many?) versions of grep, such as GNU grep included in most Linux distributions, accept the option "-P", meaning "interpret the pattern as a Perl-compatible regular expression". So if Perl can do it, so can grep.
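One caveat: grep, with or without -P, matches its pattern against one line at a time by default, whereas FASTA sequences are usually wrapped across many lines, so a motif or the region between two motifs can straddle a line break and be missed. The following Python sketch, using a made-up record wrapped at 10 bases per line, contrasts a line-by-line search with a search over the joined sequence.

```python
import re

# Perl-style regex with a lazy group between the two motifs from the question.
pattern = re.compile(r"TTTTTAAAA([ACGTN]+?)AAAATTTT")

# Hypothetical record wrapped at 10 bases per line, as FASTA files often are.
record_lines = ["NNTTTTTAAA", "ACCCAAAATT", "TTNN"]

# Line-by-line search, which is how grep applies a pattern by default.
per_line = [m.group(1) for line in record_lines for m in pattern.finditer(line)]

# Search after joining the sequence lines of the record.
joined = [m.group(1) for m in pattern.finditer("".join(record_lines))]

print(per_line)  # []
print(joined)    # ['CCC']
```

This is why the Perl answer concatenates each record's sequence lines before applying the regex.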

11.5 years ago
PoGibas 5.1k

The grep way:

  $ echo NNNTTTTTAAAACCCAAAATTTTNNN > sequence
  $ grep -o 'TTTTTAAAA[A-Z]*AAAATTTT' sequence
  TTTTTAAAACCCAAAATTTT
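Note that grep -o prints the motifs together with the bases between them, while the question asks for the bases only. A regex with a lookbehind and a lookahead requires both motifs to be present but leaves them out of the match; a minimal Python sketch, reusing the test sequence from the grep example:

```python
import re

# Lookbehind (?<=...) and lookahead (?=...) require the motifs to be present
# but exclude them from the match, so only the bases between them are returned.
between = re.compile(r"(?<=TTTTTAAAA)[A-Z]*(?=AAAATTTT)")

seq = "NNNTTTTTAAAACCCAAAATTTTNNN"  # same test sequence as the grep example
print(between.findall(seq))  # ['CCC']
```

With GNU grep built with PCRE support, `grep -oP '(?<=TTTTTAAAA)[A-Z]*(?=AAAATTTT)' sequence` should give the same result. As in the grep command, `[A-Z]*` is greedy, so with several motif pairs on one line it spans from the first start motif to the last end motif; `[A-Z]*?` stops at the nearest end motif instead.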