I have an amended FASTA file like so:
>ENST00000517147.1 ncrna chromosome:GRCh38:1:9437669:9437778:-1 gene:ENSG00000252956.1 gene_biotype:rRNA transcript_biotype:rRNA gene_symbol:RNA5SP40 description:RNA, 5S ribosomal pseudogene 40 [Source:HGNC Symbol;Acc:HGNC:42816]
GTCTATGGCCATTGCACCCTGAACGTGCCAGATCTTGTCTCATCTTGGAAGCTAAGCAGGGTTGGGCTTGGAGGGGAGGAGGGTGAACCTCAGTTCAGGTTACTTAGCCT
>ENST00000576449.1 ncrna chromosome:GRCh38:CHR_HSCHR18_1_CTG1_1:50319002:50319120:1 gene:ENSG00000262132.1 gene_biotype:rRNA transcript_biotype:rRNA gene_symbol:RNA5SP458 description:RNA, 5S ribosomal pseudogene 458 [Source:HGNC Symbol;Acc:HGNC:43358]
TTTCTATGGCATACCAACCTGAGTGTGCCCAGTCTCATCCAATCTCAGAACGTAAGCAGGATTGGGCCTGGTTAGAACTTGGATGGGAAAATGCCAGTTAAAATCTGTACTAAAAAATT
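For reference, a FASTA file like this can be read into a lookup table keyed on the ENSG ID in each header's gene: field. Below is a minimal Python sketch. It assumes every header carries a gene: field, as the two records above do. The function name read_fasta_by_gene is hypothetical.

```python
import re
from typing import Dict, Iterable, Optional

# Matches the value of the header's gene: field, e.g. ENSG00000252956.1
# (gene_biotype: and gene_symbol: do not match because of the required colon)
GENE_FIELD = re.compile(r"\bgene:(\S+)")


def read_fasta_by_gene(lines: Iterable[str]) -> Dict[str, str]:
    """Map the ENSG ID in each FASTA header to that record's sequence."""
    seqs: Dict[str, str] = {}
    current: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            match = GENE_FIELD.search(line)
            current = match.group(1) if match else None
            if current is not None:
                # A later record with the same gene ID replaces an earlier one
                seqs[current] = ""
        elif current is not None:
            # Sequences may be wrapped over several lines, so concatenate them
            seqs[current] += line
    return seqs
```

Wrapped sequence lines are joined into a single string. Records whose header has no gene: field are ignored.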
and an amended GTF file like so:
1 ENSEMBL gene 9437669 9437778 . - . gene_id "ENSG00000252956.1"; gene_type "rRNA"; gene_status "KNOWN"; gene_name "RNA5SP40"; level 3;
1 ENSEMBL transcript 9437669 9437778 . - . gene_id "ENSG00000252956.1"; transcript_id "ENST00000517147.1"; gene_type "rRNA"; gene_status "KNOWN"; gene_name "RNA5SP40"; transcript_type "rRNA"; transcript_status "KNOWN"; transcript_name "RNA5SP40-201"; level 3; transcript_support_level "NA"; tag "basic";
1 ENSEMBL exon 9437669 9437778 . - . gene_id "ENSG00000252956.1"; transcript_id "ENST00000517147.1"; gene_type "rRNA"; gene_status "KNOWN"; gene_name "RNA5SP40"; transcript_type "rRNA"; transcript_status "KNOWN"; transcript_name "RNA5SP40-201"; exon_number 1; exon_id "ENSE00002089424.1"; level 3; transcript_support_level "NA"; tag "basic";
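The gene_id/gene_name pairs live in the ninth (attribute) column of the GTF, written as key "value"; pairs. Below is a minimal Python sketch that collects them. It assumes the columns are separated by tabs or spaces, as shown above. It also assumes each gene has a gene row. The function name gene_names_from_gtf is hypothetical.

```python
import re
from typing import Dict, Iterable

# Attribute patterns in the usual GTF form: key "value";
GENE_ID = re.compile(r'gene_id "([^"]+)"')
GENE_NAME = re.compile(r'gene_name "([^"]+)"')


def gene_names_from_gtf(lines: Iterable[str]) -> Dict[str, str]:
    """Map gene_id -> gene_name using only the 'gene' rows of a GTF."""
    names: Dict[str, str] = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.split()  # splits on tabs or runs of spaces
        # Column 3 is the feature type; transcript/exon rows repeat the same pair
        if len(fields) < 9 or fields[2] != "gene":
            continue
        gid = GENE_ID.search(line)
        gname = GENE_NAME.search(line)
        if gid and gname:
            names[gid.group(1)] = gname.group(1)
    return names
```

A GTF without gene rows would need the feature-type check relaxed to transcript or exon rows.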
I want to extract the gene_name from the GTF file (e.g. RNA5SP40) and the corresponding ENSG ID from either the GTF or the FASTA file, and then print the matching FASTA sequence on the following line, i.e.:
RNA5SP40|ENSG00000252956.1
GTCTATGGCCATTGCACCCTGAACGTGCCAGATCTTGTCTCATCTTGGAAGCTAAGCAGGGTTGGGCTTGGAGGGGAGGAGGGTGAACCTCAGTTCAGGTTACTTAGCCT
I am a complete beginner at programming and don't really know where to start. I could probably use awk to extract the gene name and ENSG ID from the same file, but I wouldn't know how to match them up to print the corresponding FASTA sequence from the other file. Please help!
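The matching step is usually a two-pass lookup. First, read the GTF into a gene_id -> gene_name table. Then read the FASTA and look up each header's gene: ID in that table. This is the same idea as awk's common two-file NR==FNR pattern. Below is a minimal Python sketch under the assumptions above. The script name gtf_fasta_join.py and the file names annotation.gtf and transcripts.fa are hypothetical placeholders.

```python
import re
import sys
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

# Patterns for GTF attributes (key "value";) and the FASTA header gene: field
GENE_ID = re.compile(r'gene_id "([^"]+)"')
GENE_NAME = re.compile(r'gene_name "([^"]+)"')
FASTA_GENE = re.compile(r"\bgene:(\S+)")


def gene_names(gtf_lines: Iterable[str]) -> Dict[str, str]:
    """Pass 1: build a gene_id -> gene_name table from the GTF 'gene' rows."""
    names: Dict[str, str] = {}
    for line in gtf_lines:
        fields = line.split()
        if line.startswith("#") or len(fields) < 9 or fields[2] != "gene":
            continue
        gid, gname = GENE_ID.search(line), GENE_NAME.search(line)
        if gid and gname:
            names[gid.group(1)] = gname.group(1)
    return names


def fasta_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header without '>', sequence) pairs, joining wrapped lines."""
    header: Optional[str] = None
    parts: List[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif line and header is not None:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def join_records(gtf_lines: Iterable[str], fasta_lines: Iterable[str]) -> Iterator[str]:
    """Pass 2: emit 'gene_name|gene_id' then the sequence for each FASTA record."""
    names = gene_names(gtf_lines)
    for header, seq in fasta_records(fasta_lines):
        match = FASTA_GENE.search(header)
        if match and match.group(1) in names:  # skip genes absent from the GTF
            gene_id = match.group(1)
            yield f"{names[gene_id]}|{gene_id}"
            yield seq


if __name__ == "__main__":
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as gtf, open(sys.argv[2]) as fasta:
            for out_line in join_records(gtf, fasta):
                print(out_line)
    else:
        print("usage: python gtf_fasta_join.py annotation.gtf transcripts.fa", file=sys.stderr)
```

Run it as python gtf_fasta_join.py annotation.gtf transcripts.fa > output.fa. FASTA records whose gene ID is not in the GTF are skipped. A gene with several transcripts would be printed once per transcript under the same name|ID line.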