Hi. I want to retrieve several bacterial exomes, with the condition that the bacteria are (ideally) evenly spread across evolution (I need high biological diversity among the species I choose).
It would also be good to pick the species based on some interesting trait (they are associated with human or cattle disease, they can survive in extreme conditions...). So the files have to be annotated, or the browser has to have an option to filter by this kind of trait.
Is there a way of doing this without writing a script? Which database do you recommend (for bacteria only)?
Thanks for your time.
Retrieving bacterial proteomes from the NCBI is easy (you can use the search in this forum to find out more about that). However, coming up with a list that meets your criteria sounds like a lot of manual work. I don't know how you think a script would help you there.
I was thinking about writing a script that uses some kind of evolutionary distance parameter. I've been reading about that, and it exists. The problem is that I don't have enough time or programming skills to write that script, and I thought maybe there is some library that can help me.
You can download bacterial proteomes from NCBI using various methods. The following is one for RefSeq genomes.
wget ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/assembly_summary_refseq.txt
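# Column 20 is ftp_path; skip header lines and rows without a path, then append the protein FASTA file name
awk -F'\t' '!/^#/ && $20 != "na" {n = split($20, p, "/"); print $20 "/" p[n] "_protein.faa.gz"}' assembly_summary_refseq.txt > urls.txt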
Use a simple loop to download data:
for i in $(cat urls.txt); do wget "$i"; done
;)
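If downloading every assembly is too much, the same summary file can be thinned out first. The following is only a rough sketch, not a proper phylogenetic selection: it keeps the first complete genome it sees for each genus, using the first word of the organism name as the genus. That is a crude stand-in for evolutionary distance (names starting with "Candidatus", for example, would all collapse into one group), and it says nothing about traits such as pathogenicity. The function name one_per_genus is made up for this example; the column positions (8 = organism_name, 12 = assembly_level, 20 = ftp_path) are those of assembly_summary_refseq.txt.

```shell
# Hypothetical helper: print one protein-FASTA URL per genus from an
# assembly summary file (tab-separated, same columns as assembly_summary_refseq.txt).
one_per_genus() {
    awk -F'\t' '
        /^#/ { next }                          # skip header/comment lines
        $12 != "Complete Genome" { next }      # column 12: assembly_level
        $20 == "na" || $20 == "" { next }      # column 20: ftp_path
        {
            split($8, name, " ")               # column 8: organism_name
            genus = name[1]                    # assumption: first word is the genus
            if (genus in seen) next            # keep only the first assembly per genus
            seen[genus] = 1
            n = split($20, part, "/")          # last path component is the assembly name
            print $20 "/" part[n] "_protein.faa.gz"
        }
    ' "$1"
}
```

Usage would be something like: one_per_genus assembly_summary_refseq.txt > urls_one_per_genus.txt, followed by the same download loop as above on the new file.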