Local Blast Question About The Genome Download
13.2 years ago
Ss ▴ 50

Hello All,

I am trying to run a local BLAST search for my genes (55 in total). I need to find these genes in all species of the class Mammalia. I tried to download the genomes for all the species of interest, but I could not.

I need nucleotide sequences for the blastn program. I looked at every dataset I could find, but nothing worked.

It would be a great relief if someone could suggest a way. I also tried the preformatted BLAST datasets. When I tried to download the genomes for the species, the FASTA sequences were available only as full chromosome sequences, not as individual genes.

Thanks.

blast • 3.5k views
13.2 years ago
John Van Dam ▴ 110

Hi! I think you need to provide a little more specific information. You say you want to blast your 55 genes against mammalian genomes. You have tried to download whole genomes to run BLAST locally, but could not. Could you not find the genomes, or could you not download the files? www.ensembl.org would be a good place to start searching for mammalian genomes. Do you want to search against genome assemblies or transcripts?

In general, make a list of the genomes you want to search against. Find the repository that provides the data (NCBI, EMBL, Ensembl) and download each genome. Combine your datasets (if they are not already combined) and use formatdb -i to build a local BLAST database (for nucleotide FASTA input, formatdb also needs -p F, since it assumes protein by default). Then search using the blast program, for instance: blastall -p blastn -i 55genes.fasta -d localblastdatabase (the exact command depends on your BLAST version).
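On a newer BLAST+ installation, the same combine / build / search steps could be sketched roughly as below. This is only an illustration under assumptions: the BLAST+ programs makeblastdb and blastn stand in for the legacy formatdb and blastall, and the file and database names (`mammals_combined.fa`, `mammals`, `hits.tsv`) and helper function names are made up for the example. The command builders only assemble argument lists; nothing is executed unless you call `run_if_available`.

```python
import shutil
import subprocess


def combine_fasta(fasta_paths, combined_path):
    """Concatenate FASTA files into one file, streaming in 1 MiB chunks."""
    with open(combined_path, "wb") as out:
        for path in fasta_paths:
            last_byte = b"\n"
            with open(path, "rb") as src:
                for chunk in iter(lambda: src.read(1 << 20), b""):
                    out.write(chunk)
                    last_byte = chunk[-1:]
            # Make sure the next file's header starts on its own line.
            if last_byte != b"\n":
                out.write(b"\n")
    return combined_path


def makeblastdb_cmd(combined_path, db_name):
    """Argument list for building a nucleotide database with BLAST+."""
    return ["makeblastdb", "-in", str(combined_path),
            "-dbtype", "nucl", "-out", db_name]


def blastn_cmd(query_path, db_name, out_path):
    """Argument list for a blastn search with tabular (format 6) output."""
    return ["blastn", "-query", str(query_path), "-db", db_name,
            "-outfmt", "6", "-out", str(out_path)]


def run_if_available(cmd):
    """Run cmd only if its executable is on PATH; return True if it ran."""
    if shutil.which(cmd[0]) is None:
        return False
    subprocess.run(cmd, check=True)
    return True


# Example usage (hypothetical file names):
# combine_fasta(["human.fa", "mouse.fa"], "mammals_combined.fa")
# run_if_available(makeblastdb_cmd("mammals_combined.fa", "mammals"))
# run_if_available(blastn_cmd("55genes.fasta", "mammals", "hits.tsv"))
```

Streaming the files in chunks matters here because whole mammalian genomes are several gigabytes each. If the downloads are gzip-compressed, they would need to be decompressed first.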

Hope this helps; otherwise, give us more info. What specifically do you want to do, what is going wrong, and when?


@SS COMMENTS THE FOLLOWING: Thanks for the input.

I have already listed the genomes I want to include. I don't know which sort of dataset I should use: there are contigs, transcripts, and masked and unmasked sequences.

I simply need datasets of genes (nucleotide sequences). When I try the NCBI RefSeq FTP site, the genome datasets are organized by chromosome. These files do not show where the genes are; they are just FASTA sequences of whole chromosomes.

So I am stuck at the very first step. I also looked at the species-specific databases, but there is no option to download a gene dataset.

13.2 years ago
John Van Dam ▴ 110

Ah, well. If you want to find predicted genes, you should go for sets of transcripts. However, you could then miss genes that have not been predicted yet. To check for this, you need to blast against the genome assembly (whole chromosomes) and see whether you find hits outside predicted gene loci.
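One rough way to do that comparison, assuming both searches were written in BLAST tabular format (12 tab-separated columns, with the query ID in column 1 and the e-value in column 11), is to list the queries that hit the genome but not the transcript set. The function names and the e-value cutoff below are illustrative choices, not part of BLAST.

```python
def queries_with_hits(tabular_lines, max_evalue=1e-10):
    """Return query IDs with at least one hit at or below max_evalue."""
    hits = set()
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query_id = fields[0]
        evalue = float(fields[10])  # column 11 of the tabular format
        if evalue <= max_evalue:
            hits.add(query_id)
    return hits


def genome_only_queries(transcript_lines, genome_lines, max_evalue=1e-10):
    """Queries that hit the genome assembly but no predicted transcript."""
    genome_hits = queries_with_hits(genome_lines, max_evalue)
    transcript_hits = queries_with_hits(transcript_lines, max_evalue)
    return genome_hits - transcript_hits


# Example usage (hypothetical result files):
# with open("vs_cdna.tsv") as t, open("vs_genome.tsv") as g:
#     print(sorted(genome_only_queries(t, g)))
```

This works per query gene, not per locus, so a gene that hits a predicted transcript and also an unannotated copy elsewhere in the genome would not be flagged. Comparing hit coordinates against the annotation would be needed for that.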

To solve your problem (though probably not to its fullest extent), you can download genomes, transcriptomes (collections of the sequences of all transcripts in a genome) and proteomes (translated protein sequences) from Ensembl: http://www.ensembl.org/info/data/ftp/index.html For transcripts, use cDNA. JGI and the Broad Institute also let you download genome, transcriptome and proteome sequences from each of the sequencing project pages. I haven't looked at the NCBI FTP site lately, but I suspect you may find the files you need there as well.
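As a small sketch of how the Ensembl downloads could be organized for a list of species, the helper below builds the directory URL for each species and data type. It assumes the Ensembl FTP layout `pub/current_fasta/<species>/<cdna|dna|pep>/` served from `ftp.ensembl.org`. The function name is hypothetical, and the actual file names inside each directory include the assembly name, so they still have to be looked up per species.

```python
ENSEMBL_FASTA_BASE = "https://ftp.ensembl.org/pub/current_fasta"  # assumed layout


def ensembl_fasta_dir(species, kind="cdna"):
    """Build the assumed Ensembl directory URL for one species.

    kind is "cdna" (transcripts), "dna" (genome assembly) or "pep" (proteins).
    """
    if kind not in ("cdna", "dna", "pep"):
        raise ValueError(f"unknown data type: {kind!r}")
    # Ensembl directory names are lowercase with underscores, e.g. homo_sapiens.
    name = species.strip().lower().replace(" ", "_")
    return f"{ENSEMBL_FASTA_BASE}/{name}/{kind}/"


# Example usage with a few mammals:
for species in ["Homo sapiens", "Mus musculus", "Bos taurus"]:
    print(ensembl_fasta_dir(species, "cdna"))
```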

Good luck!
