Hi there,
I have been doing a genome assembly of a non-model plant species with Velvet, using one paired-end library (insert size = 500 bp). The assembly is very fragmented.
Since I also have paired-end RNA-seq data, I am wondering whether I can use it to improve the genome assembly.
My questions are: (1) Does it make sense to map a transcriptome assembly to the genome assembly and join genome scaffolds/contigs that are spanned by the same transcript?
(2) Could I use the RNA-seq reads for the same purpose? That is, could I treat RNA-seq reads as genomic reads in the de novo assembly? Because RNA-seq PE reads come from (possibly alternatively) spliced transcripts, their insert sizes do not reflect genomic distances, so I would use them as single-end reads in the genome assembly.
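The idea in (1) can be sketched roughly as follows. It assumes you have already aligned the assembled transcripts to the genome contigs with a spliced aligner and parsed each aligned block into a record. The `Hit` record, the `transcript_links` function and the `min_aligned` threshold are hypothetical illustrations, not the output format of any particular tool. Blocks are ordered along each transcript, and consecutive blocks that land on different contigs are proposed as a join, keyed so that both orientations of the same join are counted together.

```python
from collections import defaultdict
from dataclasses import dataclass

FLIP = {"+": "-", "-": "+"}


@dataclass
class Hit:
    """One aligned block of a transcript on a genome contig (hypothetical record format)."""
    transcript: str
    contig: str
    t_start: int  # start of the aligned block on the transcript
    t_end: int    # end of the aligned block on the transcript
    strand: str   # orientation of the contig relative to the transcript: "+" or "-"


def canonical(a, sa, b, sb):
    """A then B is the same join as B then A with both orientations flipped."""
    return min((a, sa, b, sb), (b, FLIP[sb], a, FLIP[sa]))


def transcript_links(hits, min_aligned=50):
    """Map each proposed contig join to the set of transcripts that support it."""
    by_tx = defaultdict(list)
    for h in hits:
        if h.t_end - h.t_start >= min_aligned:  # ignore very short, unreliable blocks
            by_tx[h.transcript].append(h)
    links = defaultdict(set)
    for tx, blocks in by_tx.items():
        blocks.sort(key=lambda h: h.t_start)  # walk along the transcript
        for a, b in zip(blocks, blocks[1:]):
            if a.contig != b.contig:  # consecutive blocks on different contigs: candidate join
                links[canonical(a.contig, a.strand, b.contig, b.strand)].add(tx)
    return links


# Made-up example: tx2 supports the same join as tx1, seen from the other strand.
hits = [
    Hit("tx1", "ctg7", 0, 300, "+"),
    Hit("tx1", "ctg2", 310, 900, "-"),
    Hit("tx2", "ctg2", 0, 400, "+"),
    Hit("tx2", "ctg7", 420, 800, "-"),
]
links = transcript_links(hits)
for (a, sa, b, sb), txs in links.items():
    print(f"{a}{sa} -> {b}{sb}: supported by {sorted(txs)}")
```

Joins supported by several independent transcripts are more trustworthy than single-transcript joins; gene families and repeats can produce false joins, so a real pipeline would also need to check for conflicting links.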
I would appreciate it very much if someone could tell me whether these ideas are reasonable, or give me additional suggestions.
Kind Regards,
Lhl
Highly fragmented assemblies with Velvet are not uncommon. Before I can advise further: 1) Is this Illumina sequencing? Is it MiSeq by any chance? What is the coverage? 2) Have you tried VelvetOptimiser?
And to answer your question directly, I wouldn't use RNA-seq reads for genome assembly. Think about it: in a eukaryote such as a plant, genes can have introns, so reads from spliced mRNA join exons that are not adjacent in the genome. And even without introns, there are gene duplications and repetitive regions to deal with...
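A toy illustration of the intron problem: the sequences below are made up, and the check simply compares the k-mers an RNA-seq read set could contain with the k-mers present in the genome. Only k-mers that span the exon-exon junction are missing from the genome.

```python
def kmers(seq, k):
    """All distinct k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


# Toy gene: two exons separated by an intron (made-up sequences, not real data).
exon1 = "ATGGCTAGCTTACGGA"
intron = "GTAAGTTTTTTTCTTTTAG"  # starts with GT and ends with AG, like a canonical intron
exon2 = "CCGTTAGGCATCGATT"

genome = exon1 + intron + exon2  # what genomic reads sample
mrna = exon1 + exon2             # what RNA-seq reads sample, intron spliced out

k = 11
not_in_genome = kmers(mrna, k) - kmers(genome, k)
print(len(not_in_genome), "mRNA k-mers never occur in the genome")
for km in sorted(not_in_genome):
    print(km)
```

Each exon-exon junction contributes up to k - 1 such k-mers (exactly k - 1 in this toy case). In a de Bruijn graph they form a path that skips the intron, so RNA-seq reads would add connections that do not exist in the genome, while contributing no coverage at all to introns and intergenic regions.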
Thanks akoik063.
It is Illumina HiSeq sequencing. About the coverage, Velvet produced the following message (k=57): 'Median coverage depth = 4.293333 Final graph has 5301633 nodes and n50 of 1063, max 70820, total 613220706, using 78039534/136079432 reads'. I mapped the reads back to the assembly and calculated an average coverage of 45x. I have NOT tried VelvetOptimiser; I simply tried multiple k-mer sizes and found that k=57 gave the largest N50.
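One thing to keep in mind when comparing these numbers: Velvet reports coverage in k-mers, not bases. The Velvet manual relates the two as Ck = C * (L - k + 1) / L, where L is the read length. The sketch below converts between them and adds a plain N50 helper for comparing assemblies built with different k. The read length is not given in this thread, so `READ_LEN = 100` is only a placeholder to replace with the real value.

```python
def kmer_to_base_coverage(kmer_cov, k, read_len):
    """Convert k-mer coverage Ck to per-base coverage C using Ck = C * (L - k + 1) / L."""
    if not 0 < k <= read_len:
        raise ValueError("k must be between 1 and the read length")
    return kmer_cov * read_len / (read_len - k + 1)


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of the assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


READ_LEN = 100  # hypothetical placeholder: the actual read length is not stated
print(round(kmer_to_base_coverage(4.293333, k=57, read_len=READ_LEN), 2))
print(n50([10, 20, 30, 40]))  # toy contig lengths
```

With the placeholder 100 bp reads, the reported median k-mer coverage of 4.29 would correspond to roughly 9.8x per-base coverage. The exact figure depends on the true read length, and a median taken from the assembly graph is not directly comparable to a mean computed from read mapping.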