16 months ago
PolDE
Can sequences for a specific taxonomy ID (fungi) be extracted from nr.gz into a FASTA file? What is the easiest way?
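Using blastdbcmd from the BLAST+ package, run against the preformatted nr BLAST database rather than the nr.gz FASTA file (TaxID 4932 is Saccharomyces cerevisiae):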
$ blastdbcmd -db nr -taxids "4932" -outfmt %f | grep ">"
>CAI4239086.1 AMM_1a_G0000650.mRNA.1.CDS.1 [Saccharomyces cerevisiae] >CAI6471640.1 AMM_1a_G0000650.mRNA.1.CDS.1 [Saccharomyces cerevisiae]
>CAI6490012.1 ANM_collapsed_G0002730.mRNA.1.CDS.1 [Saccharomyces cerevisiae]
>CAI4833384.1 CEI_1a_G0054260.mRNA.1.CDS.1 [Saccharomyces cerevisiae] >CAI7479578.1 CEI_1a_G0054260.mRNA.1.CDS.1 [Saccharomyces cerevisiae]
>CAI4413312.1 AEG_G0014060.mRNA.1.CDS.1 [Saccharomyces cerevisiae] >CAI6616728.1 AEG_G0014060.mRNA.1.CDS.1 [Saccharomyces cerevisiae]
>PTN39234.1 flocculin FLO1 [Saccharomyces cerevisiae] >CAI4432116.1 AIC_G0016520.mRNA.1.CDS.1 [Saccharomyces cerevisiae] >CAI6639197.1 AIC_G0016520.mRNA.1.CDS.1 [Saccharomyces cerevisiae]
This may not work for your fungus if its TaxID is not present in the database.
To save the sequences, redirect the output to a file:
$ blastdbcmd -db nr -taxids "4932" -outfmt %f > sequence.fa
nr.gz is simply a gzip-compressed FASTA file, even though its name lacks a ".fasta" extension.
Hence, decompress it and then use any of the methods described in the thread "Extract fasta sequences from a file using a list in another file".
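A minimal sketch of the list-based approach, assuming a plain-text file with one accession per line: the hypothetical function extract_by_ids (not from any package) streams nr.gz through gzip -dc, so the decompressed file never has to be written to disk, and keeps a record when the first word of its header (the primary accession, e.g. PTN39234.1) is in the list. Secondary accessions merged into the same header are not checked in this sketch.

```shell
# Hypothetical helper (not part of any package): extract_by_ids IDS_FILE FASTA_GZ
# IDS_FILE: one accession per line, e.g. PTN39234.1
# Prints every record whose primary accession (first header word) is listed.
extract_by_ids() {
  ids_file=$1
  fasta_gz=$2
  gzip -dc "$fasta_gz" | awk -v idfile="$ids_file" '
    BEGIN {
      # Load the wanted accessions into an associative array
      while ((getline line < idfile) > 0) {
        sub(/\r$/, "", line)             # tolerate Windows line endings
        n = split(line, f, " ")
        if (n > 0) want[f[1]] = 1
      }
      close(idfile)
    }
    /^>/ {
      id = substr($1, 2)                 # first header word without the leading ">"
      keep = (id in want)
    }
    keep                                 # print the lines of kept records
  '
}

# Example usage:
# extract_by_ids ids.txt nr.gz > subset.fa
```

Loading the list in a BEGIN block (instead of the common NR == FNR idiom) keeps the function working even when the ID file is empty.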
See also: "Extracting data from BLAST databases with blastdbcmd", https://www.ncbi.nlm.nih.gov/books/NBK569853/