Question

Extract the raw sequence from an .sra file and check, using Python, whether the extracted sequence is a complete genome

0


7.9 years ago

Sanchez95 • 0

I am a newbie to bioinformatics and I need to extract the genome sequence from an .sra file. I have tried converting the .sra file to FASTA and FASTQ format and extracting the sequence, but concatenating all the reads does not result in an assembled complete genome. So all I want is to extract a complete genome sequence from the .sra file.

genome sequencing sra fasta ncbi • 3.3k views

7.9 years ago by Sanchez95 • 0

0


Those .sra files contain just raw sequence data (unassembled reads), so no assembled genome will be in there. You'll have to do the assembly yourself, and whether that's possible will depend on the data/experiment/organism/technology.

Try to be more informative with regard to which technology, which organism and which purpose.
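To illustrate why concatenating reads cannot give a genome, here is a minimal Python sketch that counts reads and bases in a FASTQ stream and divides by an assumed genome size. The function names and the toy reads are made up for illustration; for a real Staphylococcus aureus run you would pass a genome size of roughly 2.8 Mb, which is an approximate figure and not something taken from the data set in this thread.

```python
import io


def fastq_read_lengths(handle):
    """Yield the length of each read in a 4-line-per-record FASTQ stream."""
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of every record holds the bases
            yield len(line.strip())


def estimate_coverage(handle, genome_size):
    """Return (number of reads, total bases, total bases / genome_size)."""
    n_reads = 0
    total_bases = 0
    for length in fastq_read_lengths(handle):
        n_reads += 1
        total_bases += length
    return n_reads, total_bases, total_bases / genome_size


# Toy example: three overlapping 10-base reads from a hypothetical 15-base genome.
toy_fastq = io.StringIO(
    "@read1\nACGTACGTAC\n+\nIIIIIIIIII\n"
    "@read2\nGTACGTACGG\n+\nIIIIIIIIII\n"
    "@read3\nACGTACGGTT\n+\nIIIIIIIIII\n"
)
n_reads, total_bases, coverage = estimate_coverage(toy_fastq, genome_size=15)
print(n_reads, total_bases, coverage)
```

In the toy case the reads add up to twice the genome length, because each position is sequenced more than once. Real runs typically have far higher redundancy, and the reads come in no particular order, which is why an assembler is needed rather than concatenation.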

7.9 years ago by WouterDeCoster 47k

0


Regarding the technology, organism and purpose: I have used the NCBI SRA Toolkit ( https://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc , using fastq-dump) to convert the .sra files to FASTA or FASTQ format, and I am still exploring how to assemble the genome from the raw sequence obtained. For now I am particularly interested in Staphylococcus aureus and its subspecies. I am aware that these files are available on NCBI, but to cross-check my method for species classification I am relying on research already done in the field. The data set I'm referring to is http://www.nature.com/articles/ncomms10063#supplementary-information (Supplementary data set 1). I am comfortable with Python and have used Biopython. Please suggest a way I could assemble the genome using Python.

7.9 years ago by Sanchez95 • 0

1


You would need to use a program for de novo assembly, for example Velvet, ABySS or SOAPdenovo.
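Since the question asks for Python, one possible approach is to drive the assembler from a script and then inspect the resulting contigs. The sketch below is only an illustration under several assumptions: paired-end reads already split into two FASTQ files (for example with `fastq-dump --split-files`), Velvet installed and on the PATH, and a k-mer length of 31 picked arbitrarily. The file names, the output directory and the helper functions are hypothetical, and the Velvet options should be checked against the manual for your version.

```python
import shutil
import subprocess


def velvet_commands(reads_1, reads_2, out_dir="velvet_out", kmer=31):
    """Build the velveth/velvetg command lines (nothing is executed here)."""
    velveth = ["velveth", out_dir, str(kmer),
               "-fastq", "-shortPaired", "-separate", reads_1, reads_2]
    velvetg = ["velvetg", out_dir, "-exp_cov", "auto", "-cov_cutoff", "auto"]
    return [velveth, velvetg]


def run_if_available(commands):
    """Run each command in order, but only if its program is installed."""
    for cmd in commands:
        if shutil.which(cmd[0]) is None:
            print("not found on PATH, skipping:", cmd[0])
            return False
        subprocess.run(cmd, check=True)
    return True


def read_fasta_lengths(handle):
    """Return the length of every sequence in a FASTA stream."""
    lengths = []
    current = None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length of the contig at which half of the total assembly is reached."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


# Toy contigs standing in for the assembler's FASTA output.
toy_contigs = ">c1\nACGTAC\n>c2\nACGTA\n>c3\nACGT\n>c4\nACG\n"
lengths = read_fasta_lengths(toy_contigs.splitlines())
print(len(lengths), sum(lengths), n50(lengths))

# With real data (placeholder file names):
# run_if_available(velvet_commands("reads_1.fastq", "reads_2.fastq"))
```

As a rough check of completeness, a finished bacterial genome would normally be a single contig per chromosome or plasmid, with a total length close to the expected genome size. A short-read assembly will usually give many contigs instead, so the contig count, total length and N50 tell you how far from complete it is. Biopython's `SeqIO.parse` could replace the small FASTA reader used here.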

7.9 years ago by mastal511 ★ 2.1k