Error in FASTA sequence extraction based on gene ID
7.6 years ago
Bioinfonext ▴ 470

My FASTA file headers look like this:

>MSTRG.8.1 gene=MSTRG.8
AATCGACCAGAACGTTGACGACTTCTTGAAGCTTATAGCCGATCTCAACAATCTCAACATTGAGATTCCA
>MSTRG.10.1 gene=MSTRG.10
TCATCAGACTCTTCCGCAACCAATACTTCTACCCTTCAGAAGCTCCCTATCAAAGTAGGAATCTTTTATA
>MSTRG.10.2 gene=MSTRG.10
TTCTACCCTTCAGAAGCTCCCTATCAAAGACAAATCTACAGGTCATGTGACTAAAGA

And my gene ID list looks like this:

MSTRG.8.1       gene=MSTRG.8
MSTRG.10.1      gene=MSTRG.10
MSTRG.10.2   gene=MSTRG.10

Could you please help me extract the sequences for these gene IDs from the FASTA file?

If I trim the headers in both files, I am able to extract the sequences, but I need to keep the original headers in my FASTA file.

Thanks

RNA-Seq • 1.7k views

This is a frequently asked question; please search this site first.

7.6 years ago

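Use seqkit grep. Here -f reads the patterns from a file and -n matches them against the full header rather than just the sequence ID: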

seqkit grep -n -f ids.txt seqs.fa
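
If seqkit is not available, the same extraction can be sketched in a few lines of Python. This is only a minimal illustration, not what seqkit does internally: it assumes the record ID is the first whitespace-delimited token of both the FASTA header and each line of the ID list, so the differing runs of spaces in the ID list shown in the question do not matter, and the original headers are written out unchanged. The function names and the output file name in the usage note are hypothetical; ids.txt and seqs.fa are the file names from the command above.

```python
def read_ids(lines):
    """Return the set of IDs: the first whitespace-delimited token of each non-empty line."""
    return {line.split()[0] for line in lines if line.strip()}


def filter_fasta(fasta_lines, wanted_ids):
    """Yield header and sequence lines of records whose ID is in wanted_ids.

    The ID is taken to be the first token after '>' (an assumption).
    Headers are yielded unchanged; blank lines are skipped.
    """
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            keep = bool(tokens) and tokens[0] in wanted_ids
        if keep and line.strip():
            yield line.rstrip("\n")


def extract(ids_path, fasta_path, out_path):
    """Write the records of fasta_path whose IDs are listed in ids_path to out_path."""
    with open(ids_path) as ids_file:
        wanted = read_ids(ids_file)
    with open(fasta_path) as fasta, open(out_path, "w") as out:
        for line in filter_fasta(fasta, wanted):
            out.write(line + "\n")
```

Calling extract("ids.txt", "seqs.fa", "selected.fa") would then write the matching records, full headers included, to selected.fa.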

Thanks, it is resolved

7.6 years ago
GenoMax 147k

I answered a very similar question on this site yesterday: "How to remove sequences from a fasta file based on ID list?" That question was about removing sequences, but the solution there can also be used in this case with a minor modification of the command line.

nabiyogesh : It is tempting to post a question the moment you run into a problem in order to get an immediate answer, but you should first make some effort of your own, as suggested by @shenwei356. Looking through the search results for a solution you can use will also help you discover other interesting things.


Yes, I will certainly try to do better.
