Question

Extracting Subset Of Sequences From Finished Assembly

11.0 years ago

erinspice1 • 0

Hi,

I am trying to submit a subset of sequences (n=10), produced through transcriptome assembly, to GenBank. I've been using these sequences for qPCR, and I need to get them into a database before submitting my manuscript. Apparently I have to do this through the Transcriptome Shotgun Assembly Database and the Sequence Read Archive. I'm having trouble getting these sequences into a format that would be acceptable to the SRA (see here: http://www.ncbi.nlm.nih.gov/books/NBK47537/). Right now all I have are the .fasta files.

I have access to our server where the assembly is stored, but our lab's bioinformatics person is unavailable, and I have no computer science background. Everything I've Googled is way over my head. I can run a script if you tell me "this part does this" and "put your filename here", but that's all.

I think that the assembly is a SAM file? There's no extension, so I can't be sure. At any rate, how do I get my subset of sequences out of that assembly and into an acceptable format? I know we have Ruby and samtools, and my (limited) previous work with this assembly has been done through PuTTY.

Can anyone provide me with some really, really basic and dumbed down instructions? Thanks in advance!

samtools assembly • 2.2k views

updated 10.9 years ago by Biostar • written 11.0 years ago by erinspice1

The SRA is for raw data, i.e., the huge .fastq files you got from the (likely Illumina) sequencing instrument. The 10 assembled transcripts you mention would not be sent to the SRA; they would be submitted to GenBank or ENA like any other derived sequences.
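
If the assembly turns out to be a plain FASTA file, pulling out the 10 transcripts can be done with a short script. The following is only a minimal sketch under assumptions that are not confirmed in this thread: the assembly is FASTA (not SAM), each sequence ID is the first word after the `>` on its header line, and you have a text file with one wanted ID per line. The file names `assembly.fasta`, `ids.txt` and `subset.fasta` are hypothetical placeholders, so put your own file names there.

```python
import sys


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new record starts: emit the previous one first.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def extract_subset(lines, wanted_ids):
    """Keep only records whose ID (first word of the header) is wanted."""
    wanted = set(wanted_ids)
    for header, seq in read_fasta(lines):
        words = header.split()
        seq_id = words[0] if words else ""
        if seq_id in wanted:
            yield header, seq


def write_fasta(records, out, width=60):
    """Write records as FASTA, wrapping sequence lines at `width` bases."""
    for header, seq in records:
        out.write(">" + header + "\n")
        for i in range(0, len(seq), width):
            out.write(seq[i:i + width] + "\n")


def main(argv):
    # Expected call: python extract_subset.py assembly.fasta ids.txt subset.fasta
    if len(argv) != 4:
        print("usage: extract_subset.py ASSEMBLY IDS OUTPUT", file=sys.stderr)
        return
    assembly_path, ids_path, out_path = argv[1:]
    with open(ids_path) as handle:
        wanted = [line.strip() for line in handle if line.strip()]
    with open(assembly_path) as fasta, open(out_path, "w") as out:
        write_fasta(extract_subset(fasta, wanted), out)


if __name__ == "__main__":
    main(sys.argv)
```

Saved as `extract_subset.py` (also a made-up name), it would be run as `python extract_subset.py assembly.fasta ids.txt subset.fasta`. Since samtools is already installed, `samtools faidx assembly.fasta ID1 ID2 > subset.fasta` should do the same job for a FASTA file, with `ID1` and `ID2` standing in for your real sequence names.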

11.0 years ago by Torst