I have a FASTA file of all the exons in a genome, and I only want to retrieve a subset of them, based on another file of sequence IDs. The file contents look like this:
>OFAS000001-RA-EXON01
ATGGGAAACATGAAGAGAGATCTGACCAGCATGGCTTTATCTCCAGACTCGACATACAAGATCCTTGAAGAAGTCAAAGAGGAATCCATTGAATCTAATCTTTCCTCTCTTGATGGAGTCTTAACTCTTAAAGCACCTAAAAAGGCTCTAGGGGATGAAATA
>OFAS000001-RA-EXON02
AGCATTGAATATAAGGTAGAACCAAGGAGCCTGGACAATGTACCTGGTAGTGGACGTAAGTATTTTACAAATTTCAGTCCATTCTTGGGGAGATTCTAA
>OFAS000002-RA-EXON01
ATGATCAAGCCATCTTGCCGT
>OFAS000002-RA-EXON02
TTGCAGAGGGCCTTACTGCTCCTCGTGGCAGTACTATTTTTCCAGGACCCAAATGGCATCGCATCGGCTCCACTCAGCGACGAGTCTAGTACTAAACTCAATGAAATAGACGTGACCCGCGAGGATTCACCTATAGAAAATGGATTAATTGCCAGTTCTGGGACACCCATCATCTCCTCTATAATCAGTAATAATTCGGGACTCCCAGAAGAAGCTGAGCCATCACTTTCAGAACCTTTAAATGAAGAGAGATTAGAAACGAATCAAGAAATTCAAGAACCAATCGGGAAAAACCCTCCACAAGAAAACAACATTGAAGAGCA
>OFAS000003-RA-EXON01
ATGCAGGACACAGATGTGCTCACCTGCGGTGCCTGCCAGAAGGCGTTTGCTCTTGGCGACATCGTCAAGTTCATCCAGCACAAGGTGCTCGCGTGCAACAAGGAGAACTACGGCTGCAACGAGGGCGGAGGCCCTGCAGCGAACCACCACGAGTCCGGGGGTGACTCCGACGGCGAGGTGGGAGTCCTCGCGAGGCGCCCGTCCATCTCGGCGCCGATGAAGAGGACCGGCGATGAGCTGCAGCAGTCCCAACAGCAGCAGCCCAGGTGCTCGACGCCCAAGCGTCGCCCGAGCCCCGGCTCTGCCTCGCCCACCCCTATCAAGGCCGAGCTCGACTCCACACCCCCGTCCTCCTCTCCGGAGGAAGGCCCTTGCAAGAAGCTGAGGACCGATGTAGCCGATGCCACCTCCAACACCACCAACTCAGGTGAGGATCACTTTCTTTCTCTTGGACCCTCTTACGGTCTCTCAATAGTAGTCTAA
>OFAS000004-RA-EXON01
ATGGCACTTGAAATTGCCTTATCAGATATCCTACGTGGGGCGGCCTTGGCATTGATTGCAGTCAGTCAGGGAAATGGGTCGGAAGAAATTATAAATCACATTTGGGG
>OFAS000004-RA-EXON02
AGCCCAGCAATTATGTGTGTTCGACGTGCAAGGCGAGACTGCACTCCGCTTGGAGGCTGGTCCAGCACGTTCAGAACAGCCACGGTATCAAGATTTACGTAGAGAGTAG
>OFAS000005-RA-EXON01
ATGCCTCTCGAACCGGCCCGCCATCCGACCTTCGCCCCGACCTCTCTCGCCAGCGGCGGATCTCTTTTCGCGAGGCCCTCTAGTCATAGCGACCACCACTTCAGGATGGAACACCTCGTCTCAGAACAGTTCAGGCACCACCCTTTAGGGTTGGCGGCGGCGGTGGCCGGGGCGGTGCCACCCCCGCCTTTCGGGCCTCCGGCCGCCGAACGGGCACCTCCGACTACGAGGTCCCTACCTCCACCGCCTTCGCTACCGCTCTCTCTCGAGCCACAGATAGATTTCTACTCGCAGAGGCTGCGGCAGCTAGCTGGTACCACCAGTCCAGGCGCCGCCAACGGCAACTCTTCTCCTTCTCCGAGGAAACTGACGCCTCCTTTCACTAGTCCTAATAACAGCATTCCGACGCCCGTCAGTATGGCACCTTCCTTAATGTTTAACAACAATAACAATAGCAGTACCAATAACAATAACATCCCACCCGGTACTGTCAGGTCACCGACTCCCAGGCCTCAGCAGTCTACTCCTCCTGCTGAACAGAAGCCCGCCCCTGCTAAGGATGCCCCCCAAGAGGAAACTGTTGATCTCACGTCCACTCCTAGATCTGCTTCGACGCCTCCCGCCAAACCGG
The other file, which lists the subset of CDS sequence IDs I'd like to use to retrieve the corresponding exon sequences, looks like this:
OFAS000003-RA
OFAS000004-RA
OFAS000005-RA
OFAS000006-RA
OFAS000007-RA
OFAS000008-RA
OFAS000009-RA
OFAS000010-RA
OFAS000013-RA
As you can see, the CDS sequence IDs match the exon sequence IDs except that they lack the "-EXON##" suffix. Is there a way to use the CDS ID file to retrieve the corresponding exon sequences?
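One way to do this without extra dependencies is to strip the "-EXON##" suffix from each FASTA header and check the result against the set of CDS IDs. The sketch below assumes that the exon ID is the first word of each header and that the suffix always has the form "-EXON" followed by digits. The script name `select_exons.py` and the file names in the usage comment are placeholders.

```python
import re
import sys
from typing import Iterable, Iterator, Set, Tuple

# Matches the trailing "-EXON##" part of an exon ID, e.g. "-EXON01".
EXON_SUFFIX = re.compile(r"-EXON\d+$")


def read_ids(lines: Iterable[str]) -> Set[str]:
    """Collect CDS IDs (one per line), ignoring blank lines."""
    return {line.strip() for line in lines if line.strip()}


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header without '>', sequence) pairs; a sequence may span several lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_exons(fasta_lines: Iterable[str], cds_ids: Set[str]) -> Iterator[Tuple[str, str]]:
    """Yield exon records whose ID, minus the -EXON## suffix, is in cds_ids."""
    for header, seq in read_fasta(fasta_lines):
        exon_id = header.split()[0]            # first word of the header line
        cds_id = EXON_SUFFIX.sub("", exon_id)  # "OFAS000003-RA-EXON01" -> "OFAS000003-RA"
        if cds_id in cds_ids:
            yield header, seq


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python select_exons.py exons.fa cds_ids.txt > exon_subset.fa
    with open(sys.argv[2]) as f:
        wanted = read_ids(f)
    with open(sys.argv[1]) as f:
        for header, seq in select_exons(f, wanted):
            print(f">{header}\n{seq}")
```

The script compares whole IDs after stripping the suffix, not substrings, so each CDS ID picks up all of its exons (EXON01, EXON02, ...) and nothing else. Set lookup keeps this fast even for a genome-wide exon file. Each selected sequence is written on a single line.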
Thank you! I ended up using the first set of commands you provided to output the exon IDs I needed, and then used makeblastdb and blastdbcmd to pull out the corresponding exon sequences.
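The exact commands used here aren't shown, so the following is only a sketch of the ID-list step under the same assumptions as above: the exon ID is the first word of each header and ends in "-EXON" plus digits. It writes one exon ID per line. A list like that can be fed to BLAST+ roughly as `makeblastdb -in exons.fa -dbtype nucl -parse_seqids -out exons_db` followed by `blastdbcmd -db exons_db -entry_batch exon_ids.txt > exon_subset.fa`. The file and database names are placeholders, and `-parse_seqids` is needed so the IDs can be looked up.

```python
import re
import sys
from typing import Iterable, List, Set

# Matches the trailing "-EXON##" part of an exon ID, e.g. "-EXON02".
EXON_SUFFIX = re.compile(r"-EXON\d+$")


def exon_ids_for(fasta_lines: Iterable[str], cds_ids: Set[str]) -> List[str]:
    """Return exon IDs from FASTA header lines whose CDS part is in cds_ids."""
    hits = []
    for line in fasta_lines:
        if line.startswith(">"):                 # only header lines carry IDs
            exon_id = line[1:].split()[0]        # first word after '>'
            if EXON_SUFFIX.sub("", exon_id) in cds_ids:
                hits.append(exon_id)
    return hits


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python exon_ids.py exons.fa cds_ids.txt > exon_ids.txt
    with open(sys.argv[2]) as f:
        wanted = {line.strip() for line in f if line.strip()}
    with open(sys.argv[1]) as f:
        for exon_id in exon_ids_for(f, wanted):
            print(exon_id)
```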