I want to determine the sequence of a particular gene in several fish species. The sequence of this gene is not available for these species in the NCBI nucleotide database, but RNA-seq data for them are available. I do not have powerful computing resources, so running a whole RNA-seq analysis for several species would be difficult. Moreover, analysing the whole data set might be wasted effort because I am interested in only one gene. Could anyone please suggest an approach to look specifically for the sequence of the gene of interest in the RNA-seq data from these species? Thanks!
While not fool-proof, try BLAST against SRA at NCBI. When you set up your BLAST search, select "Sequence Read Archive (SRA)" in "Choose Search Set" under "other" databases. You may be able to limit the search further by using the appropriate taxonomy ID.
Thanks for suggesting this approach. I have just tried it: for now I have BLASTed the known zebrafish sequence against RNA-seq data from fathead fish, using the accession number in the database. This produces several highly similar regions (90-100% identical) but with very short query cover (~2%). I have attached a snapshot of the results [2]. I would be very grateful if you could suggest how to proceed from here to construct the full sequence. Thanks
[2]: http://postimg.org/image/kprimkmvb/106b249d/
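A low query cover per hit is expected here, since each hit is a single short read aligned to a much longer query. What matters for reconstruction is how much of the query the hits cover together. Below is a minimal Python sketch of that check, assuming the hits were exported in the standard 12-column BLAST tabular format, where columns 7 and 8 are the query start and end. The function name, the read IDs, and the query length of 1000 are hypothetical illustration values, not taken from the screenshot.

```python
def merged_query_coverage(tabular_lines, query_length):
    """Return the fraction of the query covered by at least one hit.

    Assumes standard 12-column BLAST tabular lines (tab-separated),
    with qstart and qend in columns 7 and 8 (1-based, inclusive).
    """
    intervals = []
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        qstart, qend = int(fields[6]), int(fields[7])
        intervals.append((min(qstart, qend), max(qstart, qend)))

    covered = 0
    current_end = 0  # last query position already counted
    for start, end in sorted(intervals):
        start = max(start, current_end + 1)  # do not count overlaps twice
        if end >= start:
            covered += end - start + 1
            current_end = end
    return covered / query_length


# Hypothetical hits: three reads against a 1000 bp query.
example_hits = [
    "query1\tSRR0000000.1\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-40\t170",
    "query1\tSRR0000000.2\t95.0\t100\t5\t0\t51\t150\t100\t1\t1e-35\t150",
    "query1\tSRR0000000.3\t92.0\t100\t8\t0\t901\t1000\t1\t100\t1e-30\t130",
]
coverage = merged_query_coverage(example_hits, 1000)
print(f"{coverage:.0%} of the query is covered by at least one read")
```

If the merged coverage is high, the reads probably span most of the gene and a local assembly has a reasonable chance of recovering it. Large uncovered stretches may reflect low expression or regions too divergent from the zebrafish query to be hit.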
You are going to have to recover those reads (either from blast or from original SRA fastq) and then try to do local assemblies.
filterbyname.sh and tadpole.sh may be a couple of programs from the BBMap suite that would be helpful in this regard.
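To make the "recover those reads" step concrete, here is a rough Python sketch of the idea: collect the read IDs from the BLAST hits, then keep only the matching records from the FASTQ file. It is a small stand-in for what a name-based filter does, not a replacement for the BBMap tools, and it does not perform the assembly step. It assumes 12-column tabular BLAST output, unwrapped four-line FASTQ records, and subject IDs that match the FASTQ read names exactly (in practice the IDs may need reformatting first). The function names, read IDs, and the 90% identity cutoff are hypothetical choices for illustration.

```python
import io


def hit_read_ids(tabular_lines, min_identity=90.0):
    """Collect subject (read) IDs from BLAST tabular lines.

    Column 2 is the subject ID and column 3 the percent identity.
    """
    ids = set()
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if float(fields[2]) >= min_identity:
            ids.add(fields[1])
    return ids


def filter_fastq(handle, wanted_ids):
    """Yield four-line FASTQ records whose read name is in wanted_ids."""
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # tolerate stray blank lines between records
        seq = handle.readline()
        plus = handle.readline()
        qual = handle.readline()
        name = header[1:].split()[0]  # text after '@' up to first space
        if name in wanted_ids:
            yield header + seq + plus + qual


# Hypothetical data standing in for real BLAST output and SRA reads.
blast_hits = [
    "query1\tSRR0000000.1\t98.0\t8\t0\t0\t1\t8\t1\t8\t1e-5\t16",
    "query1\tSRR0000000.3\t85.0\t8\t1\t0\t9\t16\t1\t8\t1e-2\t10",
]
fastq_text = (
    "@SRR0000000.1 length=8\nACGTACGT\n+\nIIIIIIII\n"
    "@SRR0000000.2 length=8\nTTTTCCCC\n+\nIIIIIIII\n"
    "@SRR0000000.3 length=8\nGGGGAAAA\n+\nIIIIIIII\n"
)

wanted = hit_read_ids(blast_hits)
kept = list(filter_fastq(io.StringIO(fastq_text), wanted))
print("".join(kept), end="")
```

With real data, the `io.StringIO` object would be replaced by an opened FASTQ file, and the kept records written out as input for a local assembler.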