Extract Sequence from Blast tab limited out put
1
0
Entering edit mode
9.0 years ago
Eva_Maria ▴ 190

I got a blastp (-m 8) output like this

GCF_000707685.1_726     GCF_000958575.1_152     98.62   217     3   0   1   217     1   217     4e-158  444
GCF_000707684.1_726     GCF_000878815.1_1985    98.62   217     3   0   1   217     1   217     4e-158  444
GCF_000707683.1_726     GCF_000878805.1_154     98.62   217     3   0   1   217     1   217     4e-158  444

I want to extract all sequences like this

>GCF_000707685.1_726
MKQFLKITVLLLVISFKADAAKRPEVPICQPWPECWDLRVDPESVIEESLDPSRPKSEDT
IDVMDIVSSSMSFSCIRWRVAGMCVWYKWPYKISTSVKVNHYIPDYVVSAYERSGENT
>GCF_000707684.1_726
MKTFKLTILAVCCISSTVSANSTRDSLEPILENGPWYYEVGGARYVPLTRLDSSRISAGA
ISWNGNLMCSNLDPSVSMDAFMNGAKEGFINLQRNAVSTFKGVIASLPGLALQHADPGL

Is there any perl programme or awk is available to tackle this? Please help me

awk blast perl • 2.4k views
ADD COMMENT
3
Entering edit mode
9.0 years ago
5heikki 11k

Parse the sequence IDs from column 2 (e.g. `awk -F '\t' '{print $2}' file > IDs) and extract them with blastdbcmd.

blastdbcmd -h
USAGE
  blastdbcmd [-h] [-help] [-db dbname] [-dbtype molecule_type]
    [-entry sequence_identifier] [-entry_batch input_file] [-pig PIG] [-info]
    [-range numbers] [-strand strand] [-mask_sequence_with mask_algo_id]
    [-out output_file] [-outfmt format] [-target_only] [-get_dups]
    [-line_length number] [-ctrl_a] [-show_blastdb_search_path]
    [-list directory] [-remove_redundant_dbs] [-recursive]
    [-list_outfmt format] [-logfile File_Name] [-version]
ADD COMMENT
0
Entering edit mode

sir I want all aligned sequences like fields 6 to fields 7 ( eg:1 to 217).

ADD REPLY
1
Entering edit mode
blastdbcmd -help
..
-entry_batch <File_In>
   Input file for batch processing (Format: one entry per line, seq id
   followed by optional space-delimited specifier(s)
___^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   [range|strand|mask_algo_id]
___^^^^^^^^^^^^^
..

So also parse that.

ADD REPLY
0
Entering edit mode

I have all ready one code but its not working..

use warnings;

print "Enter Your BLAST result file name:\t";
chomp($blast = <STDIN>);     # BLAST result file name
print "\n";

print "Enter Your Gene list file name:\t";
chomp($database = <STDIN>);  #  file name
print "\n";

open IN,"$blast" or die "Can not open file $blast: $!";

@ids = ();
@seq_start = ();
@seq_end = ();

while(<IN>){

@feilds = split("\t",$_);
push(@ids,$feilds[0]);
push(@seq_start,$feilds[6]);
push(@seq_end,$feilds[7]);
}
close IN;

open OUT,">Result.fasta" or die "Can not open file $database: $!";


for($i=0;$i<=$#ids;$i++){

($sequence)    = &block($ids[$i]);

($idline,$sequence) = split("\n",$sequence);

    if($seq_start[$i] <= 70){
    $pos_Start = 0;
    }
    else{
    $pos_Start = $seq_start[$i]-70;
    }

    $pos_end = $seq_end[$i]+70;
    if($pos_end >= length($sequence)){
    $pos_end = length($sequence);
    }    

$seqlen = $pos_end - $pos_Start;

$Nucleotides = substr($sequence,$pos_Start,$seqlen);

$Nucleotides =~ s/(.{1,60})/$1\n/gs;

print OUT "$idline\n";
print OUT "$Nucleotides\n";    
}
print "\nExtraction Completed...";

sub block{

$id1 =shift;
print "$id1\n";
$start = ();

open IN3,"$database" or die "Can not open file $database: $!";

$blockseq = "";
while(<IN3>){

    if (($_ =~ /^>/)&&($start)){

    last;
    }

    if (($_ !~ /^>/)&&($start)){

    chomp;
    $blockseq .= $_;
    }

    if (/^>$id1/){

    $start = $.;print "$.\n";#-1;
    $blockseq .= $_;
    }
}
close IN3;

return($blockseq);
}<STDIN>;
ADD REPLY

Login before adding your answer.

Traffic: 3030 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6