Question

Can RNA-seq Reads Be Used for Genome Assembly?

5

11.4 years ago

Lhl ▴ 760

Hi There,

I have been doing genome assembly with one paired-end library (insert length = 500) for a non-model plant species using Velvet. The assembly is very fragmented.

Since I also have paired-end RNA-seq data, I am wondering whether I can use it to improve the genome assembly.

My questions are: (1) Does it make sense to map a transcriptome assembly to the genome assembly and join genome scaffolds/contigs spanned by the same transcripts?

(2) Can I use the RNA-seq reads themselves for the same purpose? I mean using RNA-seq reads as genomic sequence reads in the de novo assembly. Since RNA-seq PE reads come from alternatively spliced transcripts, I would use them as single-end reads when doing genome assembly.

I will appreciate it very much if someone can point out whether these ideas are reasonable or not or give me additional suggestions.

Kind Regards,

Lhl

assembly • 8.8k views

ADD COMMENT • link updated 7.4 years ago by balaji ▴ 40 • written 11.4 years ago by Lhl ▴ 760

1

Highly fragmented assemblies with Velvet are not uncommon. Before I can advise further: 1) Is this Illumina sequencing? Is it MiSeq by any chance? What is the coverage? 2) Have you tried VelvetOptimiser?

And to answer your question directly: I wouldn't use RNA-seq reads for genome assembly. If it's a eukaryote, it could have introns, so spliced reads would join exons that are not adjacent in the genome. Even if it's not, we are talking about gene duplications and repetitive regions...

ADD REPLY • link 11.4 years ago by Adrian Pelin ★ 2.7k

0

Thanks akoik063.

It is Illumina HiSeq sequencing. About the coverage, Velvet produced the following message (k=57): 'Median coverage depth = 4.293333 Final graph has 5301633 nodes and n50 of 1063, max 70820, total 613220706, using 78039534/136079432 reads'. I mapped the reads to the assembly, did some calculations, and got an average coverage of 45. I have NOT tried VelvetOptimiser; I simply tried multiple k-mer sizes and found that k=57 gave the largest N50.
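For anyone comparing k-mer settings by N50 as described here, this is a minimal sketch of how N50 can be computed from a list of contig lengths. The function name `n50` and the example lengths are made up for illustration; this is not how Velvet itself reports the value.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the length of the contig at which the running total, taken
    from the longest contig downwards, first reaches half of the
    total assembly length.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


# Hypothetical contig lengths, for illustration only.
contig_lengths = [100, 200, 300, 400, 500]
print(n50(contig_lengths))
```

Running the same function on the contig lengths from each k-mer setting gives a like-for-like comparison, although N50 alone says nothing about misassemblies.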

ADD REPLY • link 11.4 years ago by Lhl ▴ 760

score 2 · Answer 1 · 2017-11-29

I came across some more tools (some already mentioned above):

AGOUTI: improving genome assembly and annotation using transcriptome data https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4952227/

Rascaf: Improving Genome Assembly with RNA Sequencing Data https://www.ncbi.nlm.nih.gov/pubmed/27902792

PEP_scaffolder: using (homologous) proteins to scaffold genomes https://academic.oup.com/bioinformatics/article/32/20/3193/2196523

SCUBAT (Scaffolding Contigs Using BLAT And Transcripts) https://github.com/elswob/SCUBAT

L_RNA_scaffolder: scaffolding genomes with transcripts https://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-14-604

score 1 · Answer 2 · 2013-11-22

11.4 years ago

Sean Davis 27k

This is NOT an evidence-based answer and represents only intuition, so I hope someone else has more insight. You propose an interesting idea, but I suspect that you are better off doing further genomic sequencing (with potentially different library prep or even technology). The RNA-seq single-end reads may, themselves, be spliced, making including them problematic. Including the RNA-seq as paired-end is also difficult since the insert size distribution is not well-understood given splicing.

0

Thanks Sean. You are right. That's why I said I would use RNA-seq PE reads as SE reads: the relative positions of mates in RNA-seq differ from those in DNA-seq. But we still need to consider the intron/exon structure issues unique to RNA-seq, as mentioned by akoik063. Cheers

ADD REPLY • link 11.4 years ago by Lhl ▴ 760

score 1 · Answer 3 · 2013-11-22

11.4 years ago

Damian Kao 16k

These two tools are supposed to perform this:

L_RNA_Scaffolder http://www.biomedcentral.com/1471-2164/14/604

SCUBAT https://github.com/elswob/SCUBAT

Edit: I misread; you want to use RNA-seq reads. These tools use assembled transcripts to attempt scaffolding.
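The general idea behind transcript-based scaffolding can be sketched as follows: if the same assembled transcript aligns partly to one contig and partly to another, that is evidence the two contigs are adjacent. This is only a rough illustration under simplifying assumptions, not the algorithm of L_RNA_scaffolder or SCUBAT. The function `contig_links`, the `min_support` threshold, and the input tuples are all hypothetical, and strand/orientation handling is ignored.

```python
from collections import defaultdict


def contig_links(alignments, min_support=2):
    """Count transcripts that support joining pairs of contigs.

    alignments: iterable of (transcript_id, contig_id, transcript_start)
        tuples, one per aligned block of a transcript (hypothetical
        format; real input would be parsed from aligner output).
    Returns a dict mapping (left_contig, right_contig) to the number of
    transcripts supporting that join, keeping pairs with at least
    min_support transcripts.
    """
    by_transcript = defaultdict(list)
    for transcript, contig, t_start in alignments:
        by_transcript[transcript].append((t_start, contig))

    support = defaultdict(int)
    for hits in by_transcript.values():
        hits.sort()  # order blocks along the transcript
        ordered = [contig for _, contig in hits]
        for left, right in zip(ordered, ordered[1:]):
            if left != right:  # consecutive blocks on different contigs
                support[(left, right)] += 1

    # Simplification: orientation is ignored, so (A, B) and (B, A)
    # are counted as different links.
    return {pair: n for pair, n in support.items() if n >= min_support}


# Made-up alignments, for illustration only.
alignments = [
    ("tx1", "contigA", 0), ("tx1", "contigB", 600),
    ("tx2", "contigA", 0), ("tx2", "contigB", 450),
    ("tx3", "contigB", 0), ("tx3", "contigC", 300),
]
links = contig_links(alignments)
print(links)
```

Requiring more than one supporting transcript is a crude guard against chimeric transcripts and paralogous genes producing false joins; a real tool would also check alignment identity, coverage, and intron-size limits.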

0

Hi,

I have the same question. Actually, I am looking for a de novo assembly tool that can assemble meta-transcriptome data: paired-end reads (insert size = 300) consisting of mixed sequence reads from multiple species in a microbial community.

While searching, I came across MetaVelvet, a de novo metagenome assembler, but I am not sure whether it will work well with my data.

Any suggestions please.

ADD REPLY • link 11.4 years ago by bambus0725 ▴ 50

0

Thanks Damian, I will give L-RNA_Scaffolder a go.

ADD REPLY • link 11.4 years ago by Lhl ▴ 760

0

How did L-RNA_Scaffolder perform for you?

ADD REPLY • link 8.6 years ago by Ric ▴ 440

score 1 · Answer 4 · 2013-11-22

I would be hesitant to use RNA-Seq reads for genome assembly, because you cannot be certain that the reads are contiguous with respect to the complete genome. I would be much less hesitant to include RNA-Seq reads from genes that are expressed as a single exon.

You noted that you're working with a non-model plant genome, but can you align reads to a completed plant genome? Not all available plant genomes are for model species. This may allow you to order reads more efficiently and with greater confidence than with the RNA-Seq reads. Or, it may give you one assembly of the genome that you can use. The RNA-Seq assembly can be another. My point is I found synteny very powerful when scaling from Arabidopsis to soybean.

score 0 · Answer 5 · 2016-09-07

8.6 years ago

Ric ▴ 440

Hi, I found the following tools:

* http://bmcresnotes.biomedcentral.com/articles/10.1186/s13104-016-1903-z#Sec2
* http://www.fishbrowser.org/software/PEP_scaffolder/

Or does anyone know of any better tools?

Mic
