Hello,
I have multiple FASTA headers in a file, and I want to extract only the Gene_Symbol part of every header into a separate file. A typical header looks like this:
>IPI:IPI00000875.1|SWISSPROT:P01141|TREMBL:Q5U081|ENSEMBL:ENSP00000357648|REFSEQ:NP_002255|VEGA:OTTHUMP00000012879 Tax_Id=9806 Gene_Symbol=NOTCH Kinase NotchRas
Expected Result:
Gene_Symbol=NOTCH Kinase NotchRas
I have tried the following Perl script:
chomp($fname=<STDIN>);
open(IN,$fname) or die "Not correct file!!";
@cont=<IN>; close IN;
open(OUT,">IPIGenes.txt") or die "Can't open it !!";
$size=@cont;
for($i=0;$i<=$size;$i++)
{
    chomp($cont[$i]);
    @data=split('\|',$cont[$i]);
    {
        if($data[$i]=~/^Gene_Symbol/)
        {print OUT"$data[$i]\n";}
        else{skip;}
    }
}
But I am not getting any output.
Thanks in advance
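For reference, here is a minimal corrected sketch of the script above, not a definitive version. The original has a few problems: the loop runs one element past the end (`<=` instead of `<`), it tests `$data[$i]`, i.e. field number `$i` of line number `$i`, instead of looping over the fields, and `skip` is not a Perl keyword (`next` is the loop-control statement). Most importantly, Gene_Symbol is not its own `|`-separated field: it sits after a space inside the last field (`VEGA:... Tax_Id=9806 Gene_Symbol=...`), so `/^Gene_Symbol/` can never match. The sketch below matches a regex against each whole header line instead. Assumptions: headers start with `>`, Gene_Symbol runs to the end of the line as in the example header, and the input file name is taken from the command line rather than STDIN; the sub name `extract_gene_symbol` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the "Gene_Symbol=..." part of a FASTA header line, or undef if the
# line has none. Assumes Gene_Symbol runs to the end of the line, as in the
# IPI header shown in the question; trailing whitespace (including a Windows
# "\r") is dropped.
sub extract_gene_symbol {
    my ($line) = @_;
    return $line =~ /(Gene_Symbol=.*?)\s*$/ ? $1 : undef;
}

# Only runs when a file name is given on the command line (an assumption;
# the original script read the name from STDIN).
if (@ARGV) {
    my $fname = $ARGV[0];
    open(my $in,  '<', $fname)         or die "Cannot open $fname: $!";
    open(my $out, '>', 'IPIGenes.txt') or die "Cannot write IPIGenes.txt: $!";
    while (my $line = <$in>) {
        chomp $line;
        next unless $line =~ /^>/;    # skip sequence lines, keep headers
        my $symbol = extract_gene_symbol($line);
        print {$out} "$symbol\n" if defined $symbol;
    }
    close $in;
    close $out;
}
```

Run it as, for example, `perl extract_genes.pl proteins.fasta` (both file names are hypothetical); for the header in the question it writes `Gene_Symbol=NOTCH Kinase NotchRas` to IPIGenes.txt. Reading the file line by line with `while` also avoids loading the whole FASTA file into memory, which the original `@cont=<IN>` does.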
@Abdul Rawoof,
To be honest, I am not very good at writing these scripts. I have a similar problem with sorting gene names from FASTA headers. Could you explain how you managed to sort the gene codes so easily using Excel?
Thanks in advance,
Shewit
Dear skalayout, first I extracted all the FASTA headers into a separate text file using a small Perl script.
You will get protein FASTA headers like the following:
Next, I replaced "Gene_Symbol" with "#Gene_Symbol" using the Find and Replace option in TextPad and saved the changes. After that, I opened the file in Excel with the Text Import Wizard: select Delimited > Next > tick Tab and, in the Other box, enter the # symbol > Finish. You will get Gene_Symbol=gene name in a separate column.
Best,
Abdul Rawoof
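If you would rather skip TextPad and Excel, the same steps (pull out the header lines, split each one just before "Gene_Symbol", and sort by gene name) can be scripted. Below is a minimal Perl sketch, not Abdul's actual script, assuming headers start with `>` and that Gene_Symbol, when present, runs to the end of the line as in the IPI header from the question. The sub names `split_header` and `sort_by_gene` and the output file `headers_by_gene.tsv` are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a FASTA header into (text before Gene_Symbol, "Gene_Symbol=..." part).
# This mimics inserting "#" before Gene_Symbol and importing into Excel with
# "#" as the delimiter. Headers without Gene_Symbol get an empty 2nd column.
sub split_header {
    my ($header) = @_;
    if ($header =~ /^(.*?)\s*(Gene_Symbol=.*?)\s*$/) {
        return ($1, $2);
    }
    return ($header, '');
}

# Sort [prefix, symbol] rows case-insensitively by the gene symbol column.
sub sort_by_gene {
    my @rows = @_;
    return sort { lc($a->[1]) cmp lc($b->[1]) } @rows;
}

# Only runs when a FASTA file name is given on the command line.
if (@ARGV) {
    my ($fasta) = @ARGV;
    open(my $in, '<', $fasta) or die "Cannot open $fasta: $!";
    my @rows;
    while (my $line = <$in>) {
        next unless $line =~ /^>/;    # keep header lines only
        chomp $line;
        $line =~ s/\r$//;             # tolerate Windows line endings
        push @rows, [ split_header($line) ];
    }
    close $in;

    # Tab-separated output: opens in Excel with the gene symbol in column 2.
    open(my $out, '>', 'headers_by_gene.tsv') or die "Cannot write: $!";
    print {$out} join("\t", @$_), "\n" for sort_by_gene(@rows);
    close $out;
}
```

Writing tab-separated output means Excel's default Tab delimiter already puts Gene_Symbol=gene name in its own column, so no "#" step is needed. The sort ignores case so that, for example, `Notch` and `NOTCH` end up next to each other; drop the `lc` calls if you want a plain case-sensitive order.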
Thanks, Abdul. Amazingly, your suggestion is still helpful, even after two years :)
Thanks again.