Can Rnaseq Reads Be Used For Genome Assembly
11.0 years ago
Lhl ▴ 760

Hi there,

I have been doing genome assembly with one paired-end library (insert length = 500 bp) for a non-model plant species using Velvet. The assembly is very fragmented.

Since I also have paired-end RNA-seq data, I am wondering whether I can use the RNA-seq data to improve the genome assembly.

My questions are: (1) Does it make sense to map a transcriptome assembly to the genome assembly and join genome scaffolds/contigs that are spanned by the same transcript?
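The basic idea behind question (1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any tool mentioned in this thread: it assumes the transcripts have already been aligned to the contigs and the output reduced to simple, hypothetical `Hit` records (transcript, transcript start/end, contig, strand). Consecutive alignment blocks of one transcript that land on different contigs vote for joining those contigs in that order and orientation.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple

FLIP = {"+": "-", "-": "+"}
End = Tuple[str, str]  # (contig id, strand)


class Hit(NamedTuple):
    """One transcript-to-contig alignment block (hypothetical, simplified record)."""
    transcript: str
    t_start: int  # 0-based start on the transcript
    t_end: int    # end on the transcript (exclusive)
    contig: str
    strand: str   # '+' or '-': contig orientation relative to the transcript


def canonical(join: Tuple[End, End]) -> Tuple[End, End]:
    """A+ -> B+ and B- -> A- describe the same join; pick one spelling."""
    (a, sa), (b, sb) = join
    return min(join, ((b, FLIP[sb]), (a, FLIP[sa])))


def propose_joins(hits: Iterable[Hit], min_support: int = 1) -> Dict[Tuple[End, End], int]:
    """Count contig adjacencies implied by transcripts that span more than one contig."""
    by_transcript = defaultdict(list)
    for hit in hits:
        by_transcript[hit.transcript].append(hit)

    support = Counter()
    for tx_hits in by_transcript.values():
        tx_hits.sort(key=lambda h: h.t_start)  # walk along the transcript
        for left, right in zip(tx_hits, tx_hits[1:]):
            if left.contig != right.contig:
                join = ((left.contig, left.strand), (right.contig, right.strand))
                support[canonical(join)] += 1
    return {join: n for join, n in support.items() if n >= min_support}


if __name__ == "__main__":
    hits = [
        Hit("tx1", 0, 400, "ctgA", "+"), Hit("tx1", 380, 900, "ctgB", "+"),
        Hit("tx2", 0, 300, "ctgB", "-"), Hit("tx2", 310, 700, "ctgA", "-"),
        Hit("tx3", 0, 500, "ctgC", "+"),
    ]
    print(propose_joins(hits, min_support=2))
```

A real scaffolder would also check that each hit sits near a contig end, filter on identity and alignment coverage, and resolve conflicting joins; `min_support` here is only a simple guard against single-transcript artifacts.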

(2) Can I use the RNA-seq reads for the same purpose? I mean using RNA-seq reads as genomic reads in a de novo assembly. Since RNA-seq PE reads come from alternatively spliced transcripts, I would treat them as single-end reads in the genome assembly.
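For question (2), the simplest way to "use PE reads as SE reads" is to pool both mate files into one unpaired file (for Velvet this would presumably be given to velveth as `-short` rather than `-shortPaired`). A minimal sketch, assuming uncompressed 4-line FASTQ without wrapped sequence lines; the function names and the `_m1`/`_m2` name suffixes are a hypothetical convention used only to keep read names unique.

```python
import io
from typing import Iterator, List, TextIO


def fastq_records(handle: TextIO) -> Iterator[List[str]]:
    """Yield [header, seq, plus, qual] records from a 4-line-per-record FASTQ."""
    lines = [ln.rstrip("\n") for ln in handle if ln.strip()]
    if len(lines) % 4:
        raise ValueError("truncated FASTQ: line count is not a multiple of 4")
    for i in range(0, len(lines), 4):
        yield lines[i:i + 4]


def pairs_as_single_end(r1: TextIO, r2: TextIO, out: TextIO) -> int:
    """Write every mate from both files as an independent single-end read."""
    written = 0
    for mate, handle in (("1", r1), ("2", r2)):
        for header, seq, _plus, qual in fastq_records(handle):
            name = header[1:].split()[0]     # drop '@' and any comment text
            if name.endswith(("/1", "/2")):  # old-style Illumina mate suffix
                name = name[:-2]
            out.write(f"@{name}_m{mate}\n{seq}\n+\n{qual}\n")
            written += 1
    return written


if __name__ == "__main__":
    r1 = io.StringIO("@read1/1\nACGTACGT\n+\nIIIIIIII\n")
    r2 = io.StringIO("@read1/2\nTTGCAAGC\n+\nIIIIIIII\n")
    out = io.StringIO()
    print(pairs_as_single_end(r1, r2, out))
    print(out.getvalue(), end="")
```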

I would appreciate it very much if someone could tell me whether these ideas are reasonable, or give me additional suggestions.

Kind Regards,

Lhl

assembly • 8.2k views

Highly fragmented assemblies with Velvet are not uncommon. Before I can advise further: 1) Is this Illumina sequencing? Is it MiSeq by any chance? What is the coverage? 2) Have you tried VelvetOptimiser?

And to answer your question directly, I wouldn't use RNA-seq reads for genome assembly. Think about it: if it's a eukaryote, the genes could have introns. Even if they don't, we are talking about gene duplications and repetitive regions...
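To make the intron point concrete: an RNA-seq read that crosses an exon-exon junction contains k-mers that never occur in the genome, so in a de Bruijn graph assembler such as Velvet they add spurious branches. The toy example below uses invented sequences, not real data. A few junction k-mers can still match the genome by chance, for example when the intron happens to start with the same base as the downstream exon, as it does here.

```python
def kmers(seq, k):
    """All distinct k-mers of seq (forward strand only, for simplicity)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


# Toy gene: exon1 - intron (GT...AG) - exon2. All sequences are invented.
exon1 = "ATGGCGTACGTTAGC"
intron = "GTAAGTTTTTTTTTTTTCAG"
exon2 = "GATCCGGAATTCAAG"
genome = "TTTT" + exon1 + intron + exon2 + "TTTT"
mrna = exon1 + exon2  # the spliced transcript

k = 7
genome_kmers = kmers(genome, k)
read = mrna[5:25]  # a 20 bp RNA-seq read crossing the exon1/exon2 junction

# k-mers from the read that do not exist anywhere in the genome
foreign = sorted(km for km in kmers(read, k) if km not in genome_kmers)
print(f"{len(foreign)} of {len(kmers(read, k))} read k-mers are absent from the genome:")
for km in foreign:
    print(" ", km)
```

At most k - 1 k-mers per junction are affected, but with deep RNA-seq coverage of highly expressed genes those junction k-mers are well supported and hard for the assembler to discard as errors.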


Thanks akoik063.

It is Illumina HiSeq sequencing. About the coverage, Velvet produced the following message (k=57): 'Median coverage depth = 4.293333 Final graph has 5301633 nodes and n50 of 1063, max 70820, total 613220706, using 78039534/136079432 reads'. I mapped the reads back to the assembly, did some calculations, and got an average coverage of 45. I have NOT tried VelvetOptimiser; I simply tried multiple k-mer sizes and found that k=57 gave the largest N50.
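The two coverage figures above are not in the same units. Velvet reports coverage in k-mers, and the Velvet manual relates k-mer coverage Ck to nucleotide coverage C by Ck = C * (L - k + 1) / L, where L is the read length. Below is a small sketch of that conversion, plus an N50 helper for comparing assemblies across k values. The read length of 100 bp in the example is an assumption, since the post does not state it.

```python
def kmer_coverage(base_coverage: float, read_length: int, k: int) -> float:
    """Convert nucleotide coverage C to k-mer coverage: C * (L - k + 1) / L."""
    if not 0 < k <= read_length:
        raise ValueError("k must be between 1 and the read length")
    return base_coverage * (read_length - k + 1) / read_length


def n50(lengths) -> int:
    """Length L such that contigs of length >= L hold at least half the total bases."""
    lengths = list(lengths)
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


if __name__ == "__main__":
    # Read length of 100 bp is an assumption; the post does not give it.
    print(round(kmer_coverage(45, 100, 57), 2))
    print(n50([100, 80, 50, 30, 20, 10]))
```

Under that assumption, 45x nucleotide coverage corresponds to about 19.8x k-mer coverage at k=57. Velvet's median is computed from the graph rather than from mapped reads, so the two numbers need not agree even after conversion.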

7.0 years ago
balaji ▴ 40

Came across some more tools (some included from above)

AGOUTI: improving genome assembly and annotation using transcriptome data https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4952227/

Rascaf: Improving Genome Assembly with RNA Sequencing Data https://www.ncbi.nlm.nih.gov/pubmed/27902792

PEP_scaffolder: using (homologous) proteins to scaffold genomes https://academic.oup.com/bioinformatics/article/32/20/3193/2196523

SCUBAT (Scaffolding Contigs Using BLAT And Transcripts) https://github.com/elswob/SCUBAT

L_RNA_scaffolder: scaffolding genomes with transcripts https://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-14-604

11.0 years ago

This is NOT an evidence-based answer and represents only intuition, so I hope someone else has more insight. You propose an interesting idea, but I suspect that you are better off doing further genomic sequencing (with potentially different library prep or even technology). The RNA-seq single-end reads may, themselves, be spliced, making including them problematic. Including the RNA-seq as paired-end is also difficult since the insert size distribution is not well-understood given splicing.
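Why splicing blurs the insert size: a cDNA fragment of a few hundred bases can span one or more introns, so its two ends may lie thousands of bases apart on the genome. A minimal sketch, assuming a plus-strand gene with sorted, half-open exon coordinates; the function names and the exon coordinates in the example are hypothetical.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # half-open genomic interval [start, end)


def tx_to_genome(exons: List[Exon], tx_pos: int) -> int:
    """Map a 0-based transcript position to a genomic coordinate (plus strand)."""
    offset = 0
    for start, end in exons:
        length = end - start
        if tx_pos < offset + length:
            return start + (tx_pos - offset)
        offset += length
    raise ValueError("position lies beyond the end of the transcript")


def genomic_span(exons: List[Exon], frag_start: int, frag_len: int) -> int:
    """Genomic distance covered by a cDNA fragment, outer end to outer end."""
    first = tx_to_genome(exons, frag_start)
    last = tx_to_genome(exons, frag_start + frag_len - 1)
    return last - first + 1


if __name__ == "__main__":
    exons = [(1000, 1200), (3200, 3400), (9400, 9600)]  # hypothetical gene model
    print(genomic_span(exons, 100, 300))  # fragment crosses the first intron
    print(genomic_span(exons, 10, 150))   # fragment stays inside exon 1
```

With those exons, a 300 bp fragment starting at transcript position 100 spans 2,300 bp of genome, while a 150 bp fragment inside the first exon spans exactly 150 bp, so a single "insert size" for RNA-seq pairs is not meaningful on the genome.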


Thanks Sean. You are right. That's why I said I would use the RNA-seq PE reads as SE reads: the relative positions of mates in RNA-seq will differ from those in DNA-seq. But we still need to consider the intron/exon structure issues unique to RNA-seq, as mentioned by akoik063. Cheers

11.0 years ago

These two tools are supposed to perform this:

L_RNA_Scaffolder http://www.biomedcentral.com/1471-2164/14/604

SCUBAT https://github.com/elswob/SCUBAT

Edit: I misread; you want to use RNA-seq reads. These tools use assembled transcripts to attempt scaffolding.


Hi,

I have the same question. Actually, I am looking for a de novo assembly tool that can assemble metatranscriptome data (paired-end reads, insert size = 300) consisting of mixed reads from multiple species in a microbial community.

In searching for one I came across MetaVelvet, a de novo metagenome assembler, but I am not sure whether it works well with my data.

Any suggestions, please?


Thanks Damian, I will give L_RNA_scaffolder a go.


How did L_RNA_scaffolder perform for you?

11.0 years ago

I would be hesitant to use RNA-Seq reads for this purpose - genome assembly - because you cannot be certain that the reads are contiguous with respect to a complete genome. I would be much less hesitant to throw into the assembly process RNA-Seq reads from genes that are expressed as a single exon.
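One way to approximate this suggestion is to keep only RNA-seq reads that align without a splice gap, i.e. whose SAM CIGAR string contains no `N` (skipped region) operation, after mapping them with a splice-aware aligner. This is a rough sketch over plain SAM text lines; the function names are hypothetical. Note that a read that aligns unspliced to a fragmented draft could still span an intron that simply is not present in the draft.

```python
import re
from typing import Iterable, Set

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def is_spliced(cigar: str) -> bool:
    """True if the CIGAR string contains an N (skipped region) operation."""
    if cigar == "*":  # no alignment information
        return False
    return any(op == "N" for _length, op in CIGAR_OP.findall(cigar))


def unspliced_read_names(sam_lines: Iterable[str]) -> Set[str]:
    """Names of mapped reads none of whose alignments contain a splice gap."""
    spliced, unspliced = set(), set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # blank or header line
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag, cigar = fields[0], int(fields[1]), fields[5]
        if flag & 0x4:  # segment unmapped
            continue
        (spliced if is_spliced(cigar) else unspliced).add(qname)
    return unspliced - spliced


if __name__ == "__main__":
    sam = [
        "@HD\tVN:1.6",
        "r1\t0\tctg1\t100\t60\t100M\t*\t0\t0\t*\t*",
        "r2\t0\tctg1\t200\t60\t50M500N50M\t*\t0\t0\t*\t*",
        "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    ]
    print(unspliced_read_names(sam))
```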

You noted that you're working with a non-model plant genome, but can you align reads to a completed plant genome? Not all available plant genomes are for model species. This may allow you to order reads more efficiently and with greater confidence than with the RNA-Seq reads. Or, it may give you one assembly of the genome that you can use. The RNA-Seq assembly can be another. My point is I found synteny very powerful when scaling from Arabidopsis to soybean.
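The reference-guided idea can be sketched as ordering contigs by where their best hit lands on a related, finished genome. The sketch assumes you already have one best hit per contig as (contig, reference chromosome, start, strand) tuples from some whole-genome aligner; the tuple layout and names are hypothetical. The result is only a tentative layout, since rearrangements between species will break synteny in places.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def order_by_reference(
    best_hits: Iterable[Tuple[str, str, int, str]],
) -> Dict[str, List[Tuple[str, str]]]:
    """Group contigs by reference chromosome and order them by hit position."""
    layout = defaultdict(list)
    for contig, chrom, start, strand in best_hits:
        layout[chrom].append((start, contig, strand))
    return {
        chrom: [(contig, strand) for _start, contig, strand in sorted(hits)]
        for chrom, hits in layout.items()
    }


if __name__ == "__main__":
    hits = [  # hypothetical best hits: (contig, ref chromosome, ref start, strand)
        ("ctg7", "Chr1", 50_000, "+"),
        ("ctg2", "Chr1", 1_200, "-"),
        ("ctg9", "Chr2", 300, "+"),
    ]
    print(order_by_reference(hits))
```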


Hi Larry, thanks for your suggestion. However, I am not sure I understand you clearly. By 'My point is I found synteny very powerful when scaling from Arabidopsis to soybean', do you mean the two show consistent/conserved synteny?

8.2 years ago
Ric ▴ 440

Hi, I found the following tools:

* http://bmcresnotes.biomedcentral.com/articles/10.1186/s13104-016-1903-z#Sec2
* http://www.fishbrowser.org/software/PEP_scaffolder/

Or does anyone know of better tools?

Mic
