Hi all,
I have a text file like the one below, and I want to extract the gene names.
For example:
NECTIN3
TAGLN3
SMG6
ERICH1
DLGAP2
PPP2R2B
from the following input:
ID=id18056;Parent=rna1456;Dbxref=GeneID:102398777,Genbank:XM_006073960.2;gbkey=mRNA;gene=NECTIN3;product=nectin cell adhesion molecule
ID=id18065;Parent=rna1457;Dbxref=GeneID:102398777,Genbank:XR_003108818.1;gbkey=misc_RNA;gene=TAGLN3;product=nectin cell adhesion molecule
ID=cds1149;Parent=rna1456;Dbxref=GeneID:102398777,Genbank:XP_006074022.1;Name=XP_006074022.1;gbkey=CDS;gene=SMG6;product=nectin-3;protein
ID=id18057;Parent=rna1456;Dbxref=GeneID:102398777,Genbank:XM_006073960.2;gbkey=mRNA;gene=ERICH1;product=nectin cell adhesion molecule
ID=id18066;Parent=rna1457;Dbxref=GeneID:102398777,Genbank:XR_003108818.1;gbkey=misc_RNA;gene=DLGAP2;product=nectin cell adhesion molecule
ID=cds1149;Parent=rna1456;Dbxref=GeneID:102398777,Genbank:XP_006074022.1;Name=XP_006074022.1;gbkey=CDS;gene=PPP2R2B;product=nectin-3;protein
What is the best approach?
Many thanks for all answers.
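One way to do this, sketched in Python: the lines appear to be ';'-separated key=value attributes (much like column 9 of a GFF3 file), so a regular expression can pull out the value of the gene= key and skip lines that have none. The function name extract_gene_names is hypothetical, and the sketch assumes every gene name ends at the next ';' or at the end of the line.

```python
import re

# Matches "gene=<value>" only when "gene" is a whole key (at line start or right after ';'),
# capturing everything up to the next ';' or the end of the line.
# Assumption: attributes are ';'-separated key=value pairs, as in the sample lines.
GENE_RE = re.compile(r"(?:^|;)gene=([^;]+)")


def extract_gene_names(lines):
    """Return the gene= value from each line that has one, in input order."""
    names = []
    for line in lines:
        match = GENE_RE.search(line.strip())  # strip() drops the trailing newline
        if match:
            names.append(match.group(1))
    return names


if __name__ == "__main__":
    sample = [
        "ID=id18056;Parent=rna1456;Dbxref=GeneID:102398777,Genbank:XM_006073960.2;"
        "gbkey=mRNA;gene=NECTIN3;product=nectin cell adhesion molecule",
        "ID=cds1149;Parent=rna1456;Dbxref=GeneID:102398777,Genbank:XP_006074022.1;"
        "Name=XP_006074022.1;gbkey=CDS;gene=SMG6;product=nectin-3;protein",
    ]
    print(extract_gene_names(sample))
```

Because the function just iterates over lines, an open file handle works too, e.g. with open(path) as fh: extract_gene_names(fh), where path is a placeholder for your own file.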
All the answers were great.
Now I've extracted the gene names, but there is a problem: a gene may appear at several positions, so its name is repeated several times.
Any suggestions?
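If the repeated names just need to be collapsed, a small Python sketch (the name unique_in_order is hypothetical) keeps the first occurrence of each gene and drops later repeats. Unlike a sort-based approach, it preserves the order in which genes first appear in the file.

```python
def unique_in_order(names):
    """Drop repeated gene names, keeping the first occurrence of each in its original order."""
    seen = set()
    unique = []
    for name in names:
        if name not in seen:
            seen.add(name)
            unique.append(name)
    return unique


if __name__ == "__main__":
    print(unique_in_order(["NECTIN3", "SMG6", "NECTIN3", "TAGLN3", "SMG6"]))
```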
Sorry, how do I use sort | uniq?
Do you mean I should add it to the end of the previous script?
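In the shell, the usual pattern is to pipe the extracted names through sort and then uniq (or use sort -u). The sort step matters because uniq only collapses adjacent identical lines. The Python sketch below mimics that behaviour with itertools.groupby to show why; the helper names uniq and uniq_count are hypothetical stand-ins for the uniq and uniq -c commands, not their actual implementations.

```python
from itertools import groupby


def uniq(lines):
    """Mimic `uniq`: collapse runs of adjacent identical lines into one line."""
    return [key for key, _ in groupby(lines)]


def uniq_count(lines):
    """Mimic `uniq -c`: a (count, line) pair for each run of adjacent identical lines."""
    return [(sum(1 for _ in group), key) for key, group in groupby(lines)]


if __name__ == "__main__":
    names = ["SMG6", "NECTIN3", "SMG6"]
    print(uniq(names))                # repeats are not adjacent, so nothing is collapsed
    print(uniq(sorted(names)))        # sorting first groups repeats: like `sort | uniq`
    print(uniq_count(sorted(names)))  # like `sort | uniq -c`
```

Here uniq(names) keeps all three entries, while uniq(sorted(names)) returns ['NECTIN3', 'SMG6'].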
mostafarafiepour, with all due respect: invest some time and search for these absolutely basic answers yourself. This is a bioinformatics Q&A community, intended to help with bioinformatics-related problems, not a platform for learning basic Unix. You are lucky people actually answer these kinds of questions. Again, with respect, but if you are already stuck on these simplest things, I am worried that you will run into severe trouble once the analysis goes beyond executing basic Unix scripts. Learn the basics first; there is plenty of open-source material on this online.