Hi
I have around 85 gene sequences in individual FASTA files. I'd like to rename each file with its header name containing the gene name in [gene=]. For each header, I only want what is between the brackets. I'm trying to do this with Linux commands.
Input FASTA file:
>lcl|NC_018552.1_cds_YP_006666009.1_1 [gene=rps12] [locus_tag=C329_pgp044] [db_xref=GeneID:13540299] [protein=ribosomal protein S12] [exception=trans-splicing] [protein_id=YP_006666009.1] [location=complement(join(100912..100937,101474..101705,72928..73041))] [gbkey=CDS]
ATGCCAACTATTAAACAACTTATTAGAAATACAAGACAGCCAATCAGAAATGTCACGAAATCCCCCGCTC
>lcl|NC_018552.1_cds_YP_006666010.1_2 [gene=psbA] [locus_tag=C329_pgp089] [db_xref=GeneID:13540179] [protein=photosystem II protein D1] [protein_id=YP_006666010.1] [location=complement(565..1626)] [gbkey=CDS]
ATGACTGCAATTTTAGAGAGACGCGAAAGCGAAAGCCTATGGGGTCGCTTCTGTAACTGGATAACTAGCA
Desired FASTA file output:
>rps12
ATGCCAACTATTAAACAACTTATTAGAAATACAAGACAGCCAATCAGAAATGTCACGAAATCCCCCGCTC
>psbA
ATGACTGCAATTTTAGAGAGACGCGAAAGCGAAAGCCTATGGGGTCGCTTCTGTAACTGGATAACTAGCA
Can anyone help with this?
TIA
This type of question is among the most frequently asked on this forum. Searching through previous posts should give you several different options for doing this task.
Yes, there are many answers on this forum, but they are specific to a particular header line. They do not work with my header line, and it is difficult for me to adapt the commands myself. TIA
I tried this command:
awk -F 'gene=|]|[.]{1}' '/^>/ {print $2}' 7seqNC_018552.1cds.fasta > 7NC_018552.fasta
I got only the gene names in the output, without the contig sequences. Any help appreciated.
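The command above prints only for lines matching /^>/, so the sequence lines are never written out. Below is a minimal sketch of one possible fix, assuming every header carries a single [gene=...] field and the gene name contains no "]". The file names sample.fasta and renamed.fasta are hypothetical, and the sample headers are shortened versions of the ones posted above. Headers without a [gene=...] field are left unchanged.

```shell
# Hypothetical test input, trimmed from the headers in the question.
cat > sample.fasta <<'EOF'
>lcl|NC_018552.1_cds_YP_006666009.1_1 [gene=rps12] [locus_tag=C329_pgp044] [gbkey=CDS]
ATGCCAACTATTAAACAACTTATTAGAAATACAAGACAGCCAATCAGAAATGTCACGAAATCCCCCGCTC
>lcl|NC_018552.1_cds_YP_006666010.1_2 [gene=psbA] [locus_tag=C329_pgp089] [gbkey=CDS]
ATGACTGCAATTTTAGAGAGACGCGAAAGCGAAAGCCTATGGGGTCGCTTCTGTAACTGGATAACTAGCA
EOF

# On header lines, find "[gene=...]" and replace the whole line with
# ">" plus the text between "[gene=" and "]".
# RSTART + 6 skips "[gene="; RLENGTH - 7 drops "[gene=" and the closing "]".
# The trailing "1" prints every line, so sequence lines pass through unchanged.
awk '/^>/ && match($0, /\[gene=[^]]+\]/) {
         $0 = ">" substr($0, RSTART + 6, RLENGTH - 7)
     }
     1' sample.fasta > renamed.fasta

cat renamed.fasta
```

To process all of the files, the same awk command could be run inside a for loop over them, writing each result to a new file rather than overwriting the input.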