How to reduce RNA_from_genomic file to just gene name and sequence
2.1 years ago
Chris ▴ 10

I am working with an organism whose RNA_from_genomic FASTA file is formatted like this:

>lcl|NC_064920.1_mrna_XM_049993407.1_2 [gene=LOC126318536] [db_xref=GeneID:126318536] [product=protein flightless-1] [transcript_id=XM_049993407.1] [location=join(1505699..1505924,1512713..1512823,1522051..1522289,1525833..1525994,1526489..1526709,1527880..1528096,1535362..1535603,1536733..1536827,1578252..1578322,1590520..1590766,1602725..1602910,1614216..1614394,1617884..1618022,1625719..1625891,1625979..1626105,1626193..1626315,1644687..1644840,1651848..1652034,1652257..1652508,1654310..1654436,1660544..1660726,1660809..1660922,1661044..1661510)] [gbkey=mRNA]
GATAGTGTATCTTTGCCAAGGTGAAGTAGTGTCATGGGTTGTTTTGATTTTGGTGTATGTATGGGTGGACGTGTGTGAAT
TTTGAGAGTCTTGCGCGAAATAGTATTTGTTTGTGACATATAGATAGTTTATGAGCAGGGATTGAAAGTTGTCAATTAAC
ATCATGGCGAACACTGGCGTGTTACCGTTTGTTAGAGGTGTTGACTTCACAAGAAATGACTTCAGTAATGAAAGATTTCC

Is anyone aware of any posts that suggest how to reduce it down to:

>LOC126318536
GATAGTGTATCTTTGCCAAGGTGAAGTAGTGTCATGGGTTGTTTTGATTTTGGTGTATGTATGGGTGGACGTGTGTGAATTTTGAGAGTCTTGCGCGAAATAGTATTTGTTTGTGACATATAGATAGTTTATGAGCAGGGATTGAAAGTTGTCAATTAACATCATGGCGAACACTGGCGTGTTACCGTTTGTTAGAGGTGTTGACTTCACAAGAAATGACTTCAGTAATGAAAGATTTCC

I have tried looking, but can't find any.

Many thanks,

Christian

2.1 years ago

Try seqkit seq:

seqkit seq --only-id --id-regexp 'gene=(\w+)' seqs.fasta -o result.fasta

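Here --id-regexp is a global seqkit option that sets the regular expression used to capture the sequence ID (the part in parentheses), and --only-id prints just that ID instead of the full header. seqkit wraps sequence lines at 60 characters by default; add -w 0 if you also want each sequence on a single line, as in your example.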
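If installing seqkit is not an option, the same reduction can be sketched with the Python standard library alone. This is a minimal sketch, not the method from the answer above: the names `simplify_fasta` and `simplify_fasta_file` are hypothetical, and it assumes every header carries a `[gene=...]` tag as in the question, falling back to the first word of the header when the tag is missing. It captures everything up to the closing bracket rather than `\w+`, so gene names containing hyphens or dots are kept whole.

```python
import re
from typing import Iterable, Iterator

# Matches the "[gene=NAME]" tag in an NCBI-style FASTA header.
GENE_TAG = re.compile(r"\[gene=([^\]]+)\]")


def _short_header(header_line: str) -> str:
    """Reduce a full FASTA header line to '>GENE_NAME'."""
    match = GENE_TAG.search(header_line)
    if match:
        return ">" + match.group(1)
    # Assumption: without a gene tag, keep the first word of the header.
    words = header_line[1:].split()
    return ">" + (words[0] if words else "")


def simplify_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield '>gene' headers, each followed by its sequence on one line."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                # Emit the previous record before starting a new one.
                yield header
                yield "".join(chunks)
            header = _short_header(line)
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header
        yield "".join(chunks)


def simplify_fasta_file(src: str, dst: str) -> None:
    """Read FASTA from src and write the simplified version to dst."""
    with open(src) as fin, open(dst, "w") as fout:
        for out_line in simplify_fasta(fin):
            fout.write(out_line + "\n")
```

Calling `simplify_fasta_file("seqs.fasta", "result.fasta")` would mirror the file names used in the seqkit command. Note that with either approach, several transcripts of the same gene end up with identical headers, which some downstream tools may reject.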


That's amazing, thank you so much! It worked so quickly! Installing seqkit was more complicated than using the tool! Thank you again!
