De Novo Assembly Of Paired And Mate Paired Reads
11.8 years ago
sebabiokr ▴ 10

I have metagenomic Illumina data (HiSeq, 101 bp reads): one paired-end library, one 180 bp overlapping paired-end library, and two mate-pair libraries (2-5 kb). Can someone suggest or describe the best approach or pipeline for de novo assembly?

Thanks for all the suggestions. Yes, it is whole-genome "shotgun" metagenomic data from Illumina with 101 bp paired reads. I have three libraries: 1. a 180 bp overlapping paired-end library, 2. a 2 kb mate-pair library, 3. a 5 kb mate-pair library.

I appreciate your suggestions.

Thank you

denovo metagenomics assembly • 7.6k views

Is it transcriptome data or whole-genome sequencing data?


Sounds like this is "shotgun" metagenomic data, unless there is some confusion here; metatranscriptomic usually means transcriptome data. @sebabiokr, can you clarify whether you have one or two libraries?


NxTrim was recently released to remove Nextera Mate Pair adapters and categorise reads according to the orientation implied by the adapter location:

https://github.com/sequencing/NxTrim

11.8 years ago
Rahul Sharma ▴ 660

Hi,

I would first assemble the reads using Velvet or SOAPdenovo and then use MEGAN to see the percentage of contigs mapping to different genomes after BLAST alignments. There are also many good assemblers designed for metagenomic studies: MetaVelvet, MetAMOS, MAP (http://bioinfo.ctb.pku.edu.cn/MAP/). Please go through the literature on them; you will find many articles benchmarking the performance of these tools.
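To make the "percentage of contigs mapping to different genomes" step concrete, here is a minimal Python sketch. It assumes the contigs were aligned with BLAST and written in the standard 12-column tabular format (`-outfmt 6`), and it simply tallies the best hit (highest bit score) per contig. This is a simplification for illustration, not MEGAN's method, which assigns taxa with a lowest-common-ancestor approach. The function name `best_hit_percentages` is hypothetical.

```python
from collections import Counter


def best_hit_percentages(blast_lines):
    """Return {subject_id: percent of contigs whose best hit is that subject}.

    Expects lines in BLAST tabular format (-outfmt 6): column 1 is the query
    (contig) id, column 2 the subject id, column 12 the bit score.
    Contigs with no hit do not appear in the input and are not counted.
    """
    best = {}  # contig id -> (bit score, subject id)
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        # Keep only the highest-scoring hit for each contig.
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)

    counts = Counter(subject for _, subject in best.values())
    total = sum(counts.values())
    return {subject: 100.0 * n / total for subject, n in counts.items()}
```

Usage would be along the lines of `best_hit_percentages(open("contigs_vs_db.tsv"))`, where the file name is a placeholder. Subject ids would still need to be mapped to genomes or taxa separately.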

Best, Rahul


I'm not calling you out by any means, but metagenomic assembly is difficult to do and to interpret, so I think the first step after mating the paired ends is to identify each read before assembling. There are lots of ways to identify reads. The next step is an assembly, but depending on where the samples come from, it can be hard to know how many contigs to expect; estimating the total microbial diversity through read identification is a good start.

11.8 years ago
Josh Herr 5.8k

See this: "Assembly Illumina Paired End Reads". I assume you're just "pairing" your mate pairs and not assembling the metagenomic data into larger contigs. I think you should initially use individual reads; I would avoid assembling your metagenomic data into contigs until you have a better idea of which organisms your data represent. You'll next want to identify them using BLAST or any number of other platforms for metagenomic analysis. "Mapping" in a transcriptome sense doesn't work well for metagenomic data.
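For the overlapped library mentioned in the question, 101 bp reads from a 180 bp fragment should overlap by roughly 2 × 101 − 180 = 22 bp. The Python sketch below illustrates what merging such a pair involves: reverse-complement the second read and look for the longest suffix/prefix overlap. It is an illustration only, with hypothetical function names and thresholds; dedicated mergers such as FLASH or PEAR also use base qualities and handle cases this sketch ignores (for example fragments shorter than the read length).

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10, max_mismatch_frac=0.1):
    """Merge a forward read with its reverse-strand mate.

    Returns the merged fragment, or None if no overlap of at least
    min_overlap bases with at most max_mismatch_frac mismatches is found.
    Both thresholds are arbitrary values chosen for this sketch.
    """
    mate = reverse_complement(read2)
    # Try the longest candidate overlap first: suffix of read1 vs prefix of mate.
    for olen in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        suffix, prefix = read1[-olen:], mate[:olen]
        mismatches = sum(a != b for a, b in zip(suffix, prefix))
        if mismatches <= max_mismatch_frac * olen:
            # Keep read1 in full and append the non-overlapping part of the mate.
            return read1 + mate[olen:]
    return None
```

Pairs for which `merge_pair` returns `None` would be kept as ordinary unmerged paired-end reads.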

11.8 years ago

This paper might be of benefit to you: "Assembling large, complex environmental metagenomes".

They don't really incorporate mate-paired data, so I think it might be best to use some of their grouping/pre-assembly step and then switch over to ALLPATHS-LG, which uses short Illumina reads to generate contigs and then uses mate-paired data to group the assembled contigs and put them together.
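The general idea of using mate pairs to group contigs can be sketched in a few lines of Python. This is not ALLPATHS-LG's algorithm, just a toy illustration: assuming each mate pair has already been mapped to the contigs, count how many pairs link two different contigs and keep the links supported by several pairs. The function name `contig_links` and the `min_links` threshold are hypothetical.

```python
from collections import Counter


def contig_links(mate_pair_hits, min_links=2):
    """Count mate pairs that connect two different contigs.

    mate_pair_hits: iterable of (contig_of_read1, contig_of_read2) tuples,
    one tuple per mate pair where both reads mapped.
    Returns {(contig_a, contig_b): n_pairs} for links with at least
    min_links supporting pairs; each key is sorted so A-B equals B-A.
    """
    links = Counter()
    for contig_a, contig_b in mate_pair_hits:
        if contig_a != contig_b:  # pairs within one contig carry no linking information
            links[tuple(sorted((contig_a, contig_b)))] += 1
    return {pair: n for pair, n in links.items() if n >= min_links}
```

A real scaffolder would also use read orientation and the library insert size (2 kb or 5 kb here) to order the contigs and estimate gap lengths.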


Thank you, this is really great information for my data set. I will approach the assembly this way.

11.8 years ago
biorepine ★ 1.5k

If I were you, I would start with this pipeline: "Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks". This Bowtie-TopHat-Cufflinks pipeline in particular is very apt for Illumina sequencing data in general.


The OP doesn't specify that they have RNA-Seq data...
