Use Query List To Extract Sequences
11.5 years ago

Question:

I have a database containing sequences as follows:

>leaf_1
AAGACCATTCGAGCTTATCTCTTC
>leaf_2
ATGGAGAAGGAAATGAAGAGCAGT
>leaf_3
TGGCTGTAAGTCATACCTGTCA
>leaf_4
CGCGGAGTAGATCAGTTTGGTA
>leaf_5
AGTAACGGCTTTACAAGAATCAAA
......

Now I have a query file (inquiry.txt), which looks like:

>leaf_2
>leaf_4
>leaf_5

I need an output file (result.txt) that looks like:

>leaf_2
ATGGAGAAGGAAATGAAGAGCAGT
>leaf_4
CGCGGAGTAGATCAGTTTGGTA
>leaf_5
AGTAACGGCTTTACAAGAATCAAA

Could anyone help with this question? Many thanks.

data list • 1.9k views

This is a pretty common question: Extracting Sequence From A 3Gb Fasta File?

11.5 years ago
csiu ▴ 60

Try running the script below like this:

$ perl below-script.pl all-sequences.txt inquiry.txt

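#!/usr/bin/perl
use strict;
use warnings;

open(my $input, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
open(my $query, '<', $ARGV[1]) or die "Cannot open $ARGV[1]: $!";
open(my $output, '>', 'result.txt') or die "Cannot open result.txt: $!";

# Read the query headers (e.g. ">leaf_2") into a hash so that matching is
# exact; a regex match would also let ">leaf_1" hit ">leaf_10".
chomp(my @array = <$query>);
my %wanted = map { $_ => 1 } grep { length } @array;

# Assumes each record is one header line followed by one sequence line.
while (my $header = <$input>) {
    chomp(my $id = $header);
    if ($wanted{$id}) {
        my $nextline = <$input>;
        print $output $header, $nextline // '';
    }
}

close($output);
close($query);
close($input);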
11.5 years ago
k.nirmalraman ★ 1.1k

You may also try the following Perl script, which works for FASTA-format input files:

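  use strict;
  use warnings;

  # Collect the wanted IDs from the query file: the text after ">" up to
  # the first "|" or whitespace (e.g. "leaf_2" from ">leaf_2").
  my @genes;
  open my $list, '<', 'file2.list' or die "Cannot open file2.list: $!";
  while (my $line = <$list>) {
      push (@genes, $1) if $line =~ /^>([^|\s]+)/;
  }
  close $list;

  # Slurp the whole FASTA file into memory.
  my $input;
  {
       local $/ = undef;
       open my $fasta, '<', 'file1.fasta' or die "Cannot open file1.fasta: $!";
       $input = <$fasta>;
       close $fasta;
  }

  # Split into records on ">" at line starts and print every record whose
  # ID is in the query list (output goes to STDOUT).
  my @lines = split(/^>/m, $input);
  foreach my $l (@lines) {
      foreach my $reg (@genes) {
          print ">$l" if $l =~ /^\Q$reg\E(?:[|\s]|\z)/;
      }
  }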

Here file2.list is your query file and file1.fasta is the FASTA sequence file.
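
If the FASTA file is too large to slurp into memory, or the sequences wrap over several lines, a streaming variant may be easier to live with. The following is only a minimal sketch, not a tested tool: the sub name extract_records is made up for illustration, and it assumes the query file holds one ">ID" header per line and that the ID is the first whitespace-delimited token of each FASTA header.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (name chosen for this sketch): copy every record whose
# header ID is listed in $query_path from $fasta_path to $out_path.
# Returns the number of records written.
sub extract_records {
    my ($fasta_path, $query_path, $out_path) = @_;

    # Load the wanted IDs into a hash for constant-time exact lookup.
    open my $query, '<', $query_path or die "Cannot open $query_path: $!";
    my %wanted;
    while (my $line = <$query>) {
        $wanted{$1} = 1 if $line =~ /^>(\S+)/;
    }
    close $query;

    open my $fasta, '<', $fasta_path or die "Cannot open $fasta_path: $!";
    open my $out,   '>', $out_path   or die "Cannot open $out_path: $!";

    # Stream line by line; $keep records whether the current record is wanted,
    # so multi-line sequences are copied along with their header.
    my ($keep, $count) = (0, 0);
    while (my $line = <$fasta>) {
        if ($line =~ /^>/) {
            my ($id) = $line =~ /^>(\S+)/;
            $keep = defined $id && exists $wanted{$id};
            $count++ if $keep;
        }
        print {$out} $line if $keep;
    }
    close $fasta;
    close $out or die "Cannot close $out_path: $!";
    return $count;
}

# Run only when both file names are given on the command line.
if (@ARGV == 2) {
    my $n = extract_records($ARGV[0], $ARGV[1], 'result.txt');
    print STDERR "Wrote $n record(s) to result.txt\n";
}
```

Saved under any name (say extract.pl, also just a placeholder), it would be run as: perl extract.pl all-sequences.txt inquiry.txt. Only the query IDs are held in memory, and the hash lookup avoids looping over the whole query list for every record.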
