Question

Denovo Assembly Of Paired And Mate Paired Reads

1

11.8 years ago

sebabiokr ▴ 10

I have metagenomic Illumina data (HiSeq, 101 bp reads): one paired-end library, one 180 bp overlapping paired-end library, and two mate-pair libraries (2-5 kb). Can someone suggest or describe the best approach or pipeline for de novo assembly?

Thanks for all the suggestions. Yes, it is whole-genome "shotgun" metagenomic data from Illumina with 101 bp paired reads. I have three libraries: 1. a 180 bp overlapping paired-end library, 2. a 2 kb mate-pair library, 3. a 5 kb mate-pair library.

I appreciate your suggestions.

Thank you

denovo metagenomics assembly • 7.6k views

ADD COMMENT • link updated 11.8 years ago by Philipp Bayer 8.7k • written 11.8 years ago by sebabiokr ▴ 10

0

Is it transcriptome data or whole-genome sequencing data?

ADD REPLY • link 11.8 years ago by biorepine ★ 1.5k

0

Sounds like this is "shotgun" metagenomic data, unless there is some confusion here; metatranscriptomic usually means transcriptome data. @sebabiokr can you clarify if you have one or two libraries?

ADD REPLY • link 11.8 years ago by Josh Herr 5.8k

0

The software NxTrim was recently released here to remove Nextera Mate Pair adapters and categorise reads according to the orientation implied by the adapter location:

https://github.com/sequencing/NxTrim

ADD REPLY • link 10.1 years ago by 14134125465346445 ★ 3.6k

score 4 · Answer 1 · 2013-01-31

4

11.8 years ago

Rahul Sharma ▴ 660

Hi,

I would first assemble the reads using Velvet or SOAPdenovo and then use MEGAN to see the percentage of contigs mapping to different genomes after BLAST alignments. There are also many good assemblers designed for metagenomic studies: MetaVelvet, MetAMOS, MAP (http://bioinfo.ctb.pku.edu.cn/MAP/). Please go through the literature on them; you will find many articles benchmarking the performance of these tools.

Best, Rahul
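To make the "percentage of contigs mapping to different genomes" step concrete, here is a minimal Python sketch. It is not the poster's code and not how MEGAN works internally (MEGAN does a proper taxonomic assignment); it only tallies the best hit per contig. It assumes BLAST was run with the standard 12-column tabular output (`-outfmt 6`), and the function names `best_hits` and `percent_by_subject` are made up for illustration.

```python
from collections import Counter


def best_hits(blast_lines):
    """Return {contig: (subject, bitscore)} keeping the highest-bitscore hit.

    Assumes standard BLAST tabular columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def percent_by_subject(blast_lines, total_contigs):
    """Percentage of all contigs whose best hit is each subject sequence.

    Contigs with no hit at all are reported under the key "unassigned".
    """
    best = best_hits(blast_lines)
    counts = Counter(subject for subject, _ in best.values())
    percents = {s: 100.0 * n / total_contigs for s, n in counts.items()}
    percents["unassigned"] = 100.0 * (total_contigs - len(best)) / total_contigs
    return percents
```

You would call it with the open BLAST output file and the number of contigs in your assembly, e.g. `percent_by_subject(open("contigs_vs_db.tsv"), total_contigs)`, where the file name is just a placeholder.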

ADD COMMENT • link 11.8 years ago by Rahul Sharma ▴ 660

0

I'm not calling you out by any means, but metagenomic assembly is difficult to do and to interpret, so I think the first step after mating the paired ends is to identify each read before assembling. There are lots of ways to identify reads. The next step is to do an assembly, but depending on where the samples come from, it can be hard to know how many contigs to expect, and getting an idea of the total microbial diversity through read identification can be a good start.
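For the 180 bp library, two 101 bp reads should overlap by roughly 2 × 101 − 180 = 22 bp, so "mating the paired ends" can mean merging each pair into one longer sequence. A rough Python sketch of that idea follows. It assumes exact matches in the overlap and ignores base qualities, which dedicated read mergers such as FLASH or PEAR handle properly; the function names are hypothetical. It does not apply to the 2 kb and 5 kb mate-pair libraries, whose reads do not overlap.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse-complement a DNA sequence (A/C/G/T/N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10):
    """Merge an overlapping read pair into one sequence.

    read1 is the forward read; read2 is the reverse read as it comes off
    the sequencer. Returns the merged sequence, or None when no exact
    overlap of at least min_overlap bases is found.
    """
    mate = reverse_complement(read2)
    longest = min(len(read1), len(mate))
    # Try the longest possible overlap first, then shrink.
    for k in range(longest, min_overlap - 1, -1):
        if read1[-k:] == mate[:k]:
            return read1 + mate[k:]
    return None
```

Pairs that come back as `None` can simply be kept as ordinary paired reads.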

ADD REPLY • link 11.8 years ago by Josh Herr 5.8k

score 1 · Answer 2 · 2013-01-31

See this: Assembly Illumina Paired End Reads. I assume you're just "pairing" your mate pairs and not assembling the metagenomic data into larger contigs. I think you should initially use individual reads; I would avoid assembling your metagenomic data into contigs until you get a better idea of what organisms your data represent. You'll next want to identify them, using BLAST or any number of other platforms for metagenomic analysis. "Mapping" in a transcriptome sense doesn't work well for metagenomic data.

score 1 · Answer 3 · 2013-02-02

1

11.8 years ago

Philipp Bayer 8.7k

This paper might be of benefit to you: Assembling large, complex environmental metagenomes

They don't really incorporate mate-paired data, so I think it might be best to use some of their grouping/pre-assembly step and then switch over to ALLPATHS-LG, which uses short Illumina reads to generate contigs and then uses mate-paired data to group the assembled contigs and put them together.

ADD COMMENT • link 11.8 years ago by Philipp Bayer 8.7k

0

Thank you, this is really great information for my data set. I will look at my data this way for assembly.

ADD REPLY • link 11.8 years ago by sebabiokr ▴ 10

Istvan Albert · Answer 4 · 2013-01-31

0

11.8 years ago

biorepine ★ 1.5k

If I were you, I would start with this pipeline: Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. In particular, this Bowtie-TopHat-Cufflinks pipeline is well suited to Illumina sequencing data in general.

ADD COMMENT • link updated 11.8 years ago by Istvan Albert 101k • written 11.8 years ago by biorepine ★ 1.5k

1

The OP doesn't specify that they have RNA-Seq data...

ADD REPLY • link 11.8 years ago by DG 7.3k