8.7 years ago
Vinay Singh
Hello,
I have a genome annotation file and I want to extract some fields from it, such as gene, Note, etc. Please suggest a Unix command I can use to write the desired information to a file. Sample file:
NC_012870.1 RefSeq CDS 10199186 10199404 . - 0 ID=cds1181;Parent=rna1181;Dbxref=InterPro:IPR006043,JGIDB:Sorbi1_5048357,Genbank:XP_002466633.1,GeneID:8062961;Name=XP_002466633.1;**Note**=similar to Nucleobase-ascorbate transporter LPE1;gbkey=CDS;**gene**=Sb01g011360;product=hypothetical protein;protein_id=XP_002466633.1
NC_012870.1 RefSeq CDS 10199487 10199647 . - 2 ID=cds1181;Parent=rna1181;Dbxref=InterPro:IPR006043,JGIDB:Sorbi1_5048357,Genbank:XP_002466633.1,GeneID:8062961;Name=XP_002466633.1;Note=similar to Nucleobase-ascorbate transporter LPE1;gbkey=CDS;gene=Sb01g011360;product=hypothetical protein;protein_id=XP_002466633.1
I have a long file like this, and I want to pull out some of the fields, such as gene and Note.
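One possible way to do this with awk is sketched below. It is not the answer originally posted in this thread. It assumes the file is a standard tab-separated GFF3, with the attributes in column 9 as `key=value` pairs separated by `;`, and that the `**` around Note and gene in the sample is only emphasis, not part of the file. The function name `extract_attrs` and the two-column output layout are illustrative choices.

```shell
# extract_attrs: print "gene<TAB>Note" for every feature line of a GFF3 file.
# Assumes tab-separated columns with key=value attributes in column 9.
# The function name and the output layout are hypothetical, for illustration.
extract_attrs() {
    awk -F'\t' '
        !/^#/ && NF >= 9 {                      # skip comment lines and short lines
            gene = ""; note = ""
            n = split($9, attrs, ";")           # one key=value pair per array element
            for (i = 1; i <= n; i++) {
                if (attrs[i] ~ /^gene=/)      gene = substr(attrs[i], 6)
                else if (attrs[i] ~ /^Note=/) note = substr(attrs[i], 6)
            }
            print gene "\t" note                # empty field if an attribute is absent
        }
    ' "$@"
}
```

It could then be called as `extract_attrs annotation.gff > gene_note.tsv`, where both file names are placeholders. Because every CDS segment of a gene repeats the same attributes, piping the result through `sort -u` would collapse the duplicates.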
Thanks a lot, Mr. Pierre. Can you please point me to some resources for learning these Unix commands? It would be a great help.
Just google "bash learning exercises". I would suggest starting with the basic Unix commands for processing records in files. grep, awk, sed and bioawk would be the first ones to learn for working with all the information you have in your files. Life gets easy with these. Take a look at this link and this for first-hand use.

P.S.: I am not saying you will be able to do everything, but you will have a start, and then people in the community can help you more. It is important for you to learn as well; it will help you in the future. I hope you already understood the command line that Pierre posted as an answer. Also, this is really a Stack Overflow query whose answer can be found with a simple search. I would not really call it a bioinformatics question. People can beg to differ.
I'd recommend against bioawk as a starting point. Focus first on the core utils rather than sed and awk: cat, cut, tr, wc, sort, uniq and the like. Then work on pipes and redirects. After that, move on to grep, then awk and sed. Practice better regular expressions. Starting with grep will only confuse people unless they have a strong regex background.
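As a small illustration of how far the core utils and pipes alone can go, here is a sketch that lists which attribute keys occur in a GFF3 file and how often, using only cut, tr, sort and uniq. It assumes tab-separated columns with `;`-separated `key=value` attributes in column 9. The function name `count_attr_keys` is made up for this example.

```shell
# count_attr_keys: count how often each attribute key (ID, Note, gene, ...)
# appears in column 9 of a tab-separated GFF3 file.
# The function name is hypothetical; only cut, tr, sort and uniq are used.
count_attr_keys() {
    cut -s -f9 "$@" |    # keep column 9; -s drops lines without tabs (e.g. comments)
        tr ';' '\n' |    # put each key=value pair on its own line
        cut -d= -f1 |    # keep only the key, the part before "="
        sort |           # group identical keys together
        uniq -c          # count each group
}
```

Something like `count_attr_keys annotation.gff > keys.txt` (placeholder file names) also exercises a redirect, and shows which keys are available before any grep or awk is written.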
In fact, I did not go into every detail. Yes, before jumping to sed and awk it is important to learn the core utils. Starting with grep will definitely confuse. I assumed that some regex learning is done during a Master's, so that was my point. However, what Ram has said is a more precise and directed way.