Getting information from NCBI GenBank files
4.8 years ago

Hi, I would like to know whether a specific protein called "anti-restriction" is present in a list of ~600 genomes from different species of bacteria. This can be found in the NCBI GenBank files, but manually checking those 600 genomes is time-consuming. Any suggestions on how I can find out which of these genomes contain this specific protein? Thanks in advance.

Assembly NCBI genebank annotation • 1.1k views

Two possible options:

  1. NCBI lets you restrict online web BLAST searches to specific organisms, though it may not be possible to enter 600 at a time.
  2. Download the 600 genomes locally and run BLAST against that set. This is more work but feasible. You can use the method in this answer to get the genomes: A: How to download COMPLETE bacterial genomes from NCBI based on list of names?
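
Once the genomes are on disk as GenBank flat files, a quick annotation scan can complement the BLAST route. The sketch below is only a rough first pass, not a method proposed in this thread: it assumes one GenBank file per genome named `*.gbff` in a single directory (a hypothetical layout), and the function name `list_genomes_with_product` is made up for illustration. It also only sees products that are already annotated, and it will miss a `/product=` value that wraps onto a second line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list GenBank flat files whose /product= qualifier
# mentions anti-restriction (matches "anti-restriction", "anti restriction",
# and "antirestriction", case-insensitively).
# Assumption: one *.gbff file per genome inside the given directory.
list_genomes_with_product() {
    local dir="$1"
    local pattern="${2:-anti[- ]?restriction}"
    # -l: print only the names of matching files
    # -i: ignore case; -E: extended regular expressions
    grep -l -i -E "/product=\".*(${pattern})" "$dir"/*.gbff
}

# Usage (directory name is an assumption):
#   list_genomes_with_product genbank_files > genomes_with_annotation.txt
```

Genomes missing from the output are not necessarily negative, since annotation quality varies between assemblies; a sequence-based search is the safer check.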

Do you already have the protein sequences?


It could be any protein associated with anti-restriction. I am looking for a correlation between 'protein A' and anti-restriction proteins. I already have 600 genomes that contain protein A. Now I am trying to find out which of these genomes also have anti-restriction proteins.


I'm not sure if my suggestion is the best approach and I'm not even sure if it's feasible, but here you go:

ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Archaea_Bacteria/
wget ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/Archaea_Bacteria/All_Archaea_Bacteria.gene_info.gz
zcat All_Archaea_Bacteria.gene_info.gz | grep -i -E "anti-restriction|anti restriction" > anti-restriction-list.txt
ftp://ftp.ncbi.nih.gov/blast/db/FASTA/
wget ftp://ftp.ncbi.nih.gov/blast/db/FASTA/nr.gz
gunzip nr.gz
Then look up the proteins listed in anti-restriction-list.txt, retrieve their protein sequences from nr,
and run tblastn with them against those 600 genomes (e.g. concatenate the genomes and build a database with makeblastdb; the tblastn options -num_alignments 1 and -evalue 1e-300 might help, but -num_alignments 1 reports only one database sequence per query and 1e-300 is a very strict cutoff, so both may need loosening).
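
The last two steps can be sketched roughly as below. This is a sketch under assumptions, not a tested pipeline: it assumes BLAST+ (`makeblastdb`, `tblastn`) is installed, that each genome is a nucleotide FASTA file named `<genome>.fna` with no `__` in its name, and that the BLAST+ version reports the original FASTA identifier as the subject ID. The function names, file names, and the `1e-10` cutoff are all hypothetical. Pulling sequences out of nr by header text is also a swapped-in shortcut, since gene_info entries do not map directly onto nr records.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; all names and file layouts here are assumptions.

# Print FASTA records whose header matches a lowercase regex (case-insensitive).
extract_by_header() {
    local pattern="$1" fasta="$2"
    awk -v pat="$pattern" '
        /^>/ { keep = (tolower($0) ~ pat) }   # decide per record at its header
        keep                                  # print lines of kept records
    ' "$fasta"
}

# Concatenate genomes, prefixing every FASTA header with "<genome>__" so each
# hit can be traced back to the genome it came from.
tag_and_concat() {
    local dir="$1" f name
    for f in "$dir"/*.fna; do
        name=$(basename "$f" .fna)
        sed "s/^>/>${name}__/" "$f"
    done
}

# Reduce tabular BLAST output (-outfmt 6) to the unique genome names with a hit.
# Column 2 is the subject ID, which carries the "<genome>__" prefix.
genomes_with_hits() {
    awk -F'\t' '{ split($2, a, "__"); print a[1] }' "$1" | sort -u
}

# Usage (not run here; the e-value is illustrative, looser than 1e-300):
#   extract_by_header 'anti[- ]?restriction' nr > anti_restriction.faa
#   tag_and_concat genomes > all_genomes.fna
#   makeblastdb -in all_genomes.fna -dbtype nucl -out all_genomes
#   tblastn -query anti_restriction.faa -db all_genomes -evalue 1e-10 \
#           -outfmt 6 -max_target_seqs 5000 -out hits.tsv
#   genomes_with_hits hits.tsv > genomes_with_anti_restriction.txt
```

The usage lines raise `-max_target_seqs` rather than limiting output to a single hit, because the goal is to list every genome with a match, not just the best one per query.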
