7.9 years ago
Sanchez95
I am new to bioinformatics and need to extract a genome sequence from an .sra file. I have tried converting the .sra file to FASTA and FASTQ format and extracting the sequences, but concatenating all the reads does not give an assembled, complete genome. All I want is to extract a complete genome sequence from the .sra file.
Those sra files contain just raw sequence data, so no assembled genome will be in there. You'll have to do the assembly yourself, and whether that's possible will depend on the data/experiment/organism/technology.
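To see why concatenating the reads cannot give you a genome, here is a toy sketch in Python. Everything in it is made up for illustration (a random 1000 bp "genome", 50 bp reads, 400 reads, the helper name `simulate_reads`); it is not based on your data. Reads are short fragments sampled from random positions, so they overlap and come in no particular order, and gluing them together just gives a string many times longer than the genome.

```python
import random


def simulate_reads(genome, read_len, n_reads, seed=0):
    """Sample fixed-length reads from random start positions (toy shotgun run)."""
    rng = random.Random(seed)
    last_start = len(genome) - read_len
    starts = [rng.randint(0, last_start) for _ in range(n_reads)]
    return [genome[s:s + read_len] for s in starts]


# Hypothetical toy genome: 1000 random bases.
base_rng = random.Random(42)
toy_genome = "".join(base_rng.choice("ACGT") for _ in range(1000))

reads = simulate_reads(toy_genome, read_len=50, n_reads=400)
concatenated = "".join(reads)

# Total sequenced bases divided by genome length is the sequencing depth.
depth = len(concatenated) / len(toy_genome)
print("genome length:      ", len(toy_genome))
print("concatenated length:", len(concatenated))
print("approximate depth:  ", depth)
```

An assembler does the opposite of concatenation: it finds overlaps between reads and merges them into longer contigs, which is why a dedicated tool is needed.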
Please give us more detail: which sequencing technology, which organism, and for what purpose?
Regarding the technology, organism and purpose: I have used the NCBI SRA Toolkit ( https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc , using fastq-dump) to convert the .sra files to FASTA or FASTQ format, and I am still exploring how to assemble the genome from the raw reads. For now I am looking specifically at Staphylococcus aureus and its subspecies. I am aware that these files are available on NCBI, but to cross-check my species classification method I am relying on research already done in the field. The data set I am referring to is http://www.nature.com/articles/ncomms10063#supplementary-information (Supplementary Data Set 1). I am comfortable with Python and have used Biopython. Please suggest a way I could assemble the genome using Python.
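For the conversion step, a minimal sketch of driving fastq-dump from Python could look like the following. The accession `SRR0000000` and the output directory `reads` are placeholders, not values from the data set, and the helper names are hypothetical. `--split-files` writes the two mates of paired-end runs to separate files, and `--outdir` sets where the FASTQ files go.

```python
import shutil
import subprocess


def build_fastq_dump_command(accession, outdir):
    """Return the fastq-dump argument list for one SRA run accession."""
    return ["fastq-dump", "--split-files", "--outdir", outdir, accession]


def run_fastq_dump(accession, outdir):
    """Run fastq-dump, failing early if the SRA Toolkit is not on PATH."""
    if shutil.which("fastq-dump") is None:
        raise RuntimeError("fastq-dump not found; install the SRA Toolkit first")
    subprocess.run(build_fastq_dump_command(accession, outdir), check=True)


# Placeholder accession: replace with a real run ID from your data set.
command = build_fastq_dump_command("SRR0000000", "reads")
print(" ".join(command))
```

This only prints the command; call `run_fastq_dump(...)` once the toolkit is installed and you have a real accession.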
You would need to use a de novo assembly program, for example Velvet, ABySS or SOAPdenovo.
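Since you asked about Python: the assembly itself is done by the external program, but you can drive it from a script. Below is a rough sketch for Velvet with single-end short reads. The file name `reads.fastq`, the output directory `velvet_out`, the k-mer size of 31 and the helper names are all assumptions for illustration; a sensible k-mer size depends on your read length and coverage.

```python
import os
import shutil
import subprocess


def build_velvet_commands(fastq_path, out_dir, kmer=31):
    """Return the velveth and velvetg argument lists for single-end reads."""
    if kmer % 2 == 0:
        raise ValueError("Velvet expects an odd k-mer length")
    # velveth hashes the reads into k-mers; velvetg builds the graph and contigs.
    velveth = ["velveth", out_dir, str(kmer), "-fastq", "-short", fastq_path]
    velvetg = ["velvetg", out_dir, "-cov_cutoff", "auto", "-exp_cov", "auto"]
    return [velveth, velvetg]


def run_velvet(fastq_path, out_dir, kmer=31):
    """Run both Velvet steps and return the path of the contigs file."""
    for tool in ("velveth", "velvetg"):
        if shutil.which(tool) is None:
            raise RuntimeError(tool + " not found; install Velvet first")
    for cmd in build_velvet_commands(fastq_path, out_dir, kmer):
        subprocess.run(cmd, check=True)
    return os.path.join(out_dir, "contigs.fa")


# Hypothetical input and output names, shown without running anything.
commands = build_velvet_commands("reads.fastq", "velvet_out", kmer=31)
for cmd in commands:
    print(" ".join(cmd))
```

For paired-end data, Velvet uses `-shortPaired` instead of `-short`. The resulting `contigs.fa` is a set of contigs, not necessarily one closed chromosome, and you can read it with Biopython's `SeqIO.parse(path, "fasta")` for downstream work.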