Hi all,
I have just finished analysing an RNA-seq dataset for Verticillium dahliae with DESeq2 for the first time. I have obtained a .csv file containing more than 6000 Gene_IDs, and I would now like to use these Gene_IDs to retrieve the corresponding amino acid sequences in FASTA format with NCBI EDirect. I have tried the following command:
esearch -db nuccore -query VDAG_00XXX | elink -target protein | efetch -format fasta
However, when I run the command above, I do not get only the amino acid sequence for that protein but sequences for 1660 proteins (including the one I searched for). Because I have a long list of genes that I would like to submit to NCBI EDirect, do you know how I can retrieve only the exact gene/protein?
Also, I was looking at the NCBI website and noticed that what is called Gene_ID in my .csv file is called Locus_Tag there. Can I still use it in my research?
Thanks!
I found this old post and was able to retrieve a single amino acid sequence for the Gene_ID (Locus_Tag) that I put in the -query argument. However, when I use this code:
I get an error, and as a result I get amino acid sequences from Homo sapiens. Any suggestions on how to obtain the amino acid sequences for the IDs I have included in my .txt file?
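For anyone hitting the same wall with a list of IDs, here is a minimal sketch of looping over a text file with EDirect. It assumes that esearch and efetch are on the PATH and that each ID, used as a plain query against the protein database, matches the intended record (that worked for a single ID in this thread, but it is not guaranteed for every locus tag). The function name and the file names are placeholders, not something from the original posts.

```shell
# Sketch: fetch one protein FASTA record per ID listed in a text file.
# Assumption: each ID works as a plain query against the protein database.
fetch_proteins() {
    ids_file=$1
    # Strip Windows carriage returns, which can corrupt the query strings.
    tr -d '\r' < "$ids_file" | while IFS= read -r id || [ -n "$id" ]; do
        # Skip blank lines.
        [ -z "$id" ] && continue
        # EDirect tools read stdin; redirect from /dev/null so esearch does
        # not swallow the remaining IDs of the loop.
        esearch -db protein -query "$id" < /dev/null | efetch -format fasta
    done
}

# Hypothetical usage (file names are placeholders):
# fetch_proteins gene_ids.txt > proteins.fasta
```

The `< /dev/null` redirect and the carriage-return cleanup are the two details most likely to matter when a command that works for one ID misbehaves on a file exported from a spreadsheet.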
Go to: https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000150675.1/
Click on the Download button --> RefSeq --> Protein FASTA to get the protein sequences. You can extract the sequences you need, but it sounds like you may have the entire genome.
Hi GenoMax,
Thanks, in the end I managed to get NCBI EDirect running on the Linux command line. I used the same code as above but chose efetch -format fasta as the output. I got the AA sequences from that and ran them in EffectorP.
This should have worked though (one example)
Definitely that works. When I use a single VDAG ID I have no problem, but with the .txt file I had to convert the IDs and then run it. Anyway, I did get the AA list I was looking for.
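For the download route suggested earlier in the thread, a subset can also be pulled out of the protein FASTA file offline. The sketch below keeps only the records whose header line contains one of the listed IDs as a whole word. It assumes the IDs in the list actually appear in the FASTA headers, which should be checked first (for example with grep), since the IDs in this thread had to be converted before they worked. The function name and file names are hypothetical.

```shell
# Sketch: keep only FASTA records whose header contains a listed ID as a
# whole word. Assumption: the IDs really occur in the header lines.
subset_fasta() {
    ids_file=$1
    fasta_file=$2
    awk '
        # First file: remember every non-empty ID (carriage returns removed).
        NR == FNR { sub(/\r$/, ""); if ($0 != "") wanted[$0] = 1; next }
        # Header line: decide whether this record should be printed.
        /^>/ {
            keep = 0
            n = split(substr($0, 2), words, /[^A-Za-z0-9_.]+/)
            for (i = 1; i <= n; i++) if (words[i] in wanted) keep = 1
        }
        # Print the header and sequence lines of records that matched.
        keep { print }
    ' "$ids_file" "$fasta_file"
}

# Hypothetical usage (file names are placeholders):
# subset_fasta gene_ids.txt protein.faa > subset.faa
```

Matching whole words rather than substrings avoids picking up an ID that is only a prefix of a longer one. The ID list must not be empty, otherwise awk treats the FASTA file as the ID list.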