Question

Assembly Illumina Paired End Reads

1

12.0 years ago

vijay ★ 1.6k

Which would be the best tool to assemble paired end reads generated by Illumina?

-Vj

next-gen • 8.9k views

updated 8.7 years ago by jigarnt ▴ 30 • written 12.0 years ago by vijay ★ 1.6k

2

People can help you better if you give some more information. DNA or RNA? Which species? How much RAM do you have? What is more important, contig accuracy or contiguity? What are the read length, insert size, and total number of reads? Do you suspect DNA contamination from other species?

12.0 years ago by Irsan ★ 7.8k

0

This is a metagenome sample, so I can't be sure of the number of species, and I have yet to receive my sequence data. I just wanted to know this in preparation. I would need contiguity since this is a metagenome. Read length would be approximately 150 bp.

12.0 years ago by vijay ★ 1.6k

score 3 · Answer 1 · 2012-11-19

This is for metagenomics and we're not assembling the reads yet, correct? Is it amplicon data? You didn't tell us. If there's no need to assemble the reads yet, are you just looking to mate the paired-end sequences from your library? I find the terminology confusing: I prefer "mate" for combining the two reads of a pair, and reserve "assembly" for what you do afterwards, once the paired-end data is matched up and you want to build contig sequences. If you have amplicon data (16S, 18S, ITS, etc.), you can make consensus sequences, but this is not assembly in my opinion.

You didn't give us any information on the technology, but I am assuming from the 150 bp size that this is Illumina data and in FASTQ format?

On this topic, see a previous SEQanswers thread and the post "Best Way To Preprocess Barcoded Illumina Paired-End Data". There are a couple of options for mating Illumina paired-end data: I have used FastqJoin, PANDAseq, and CLC bio, but I am sure there are many other options out there.
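For illustration, mating one pair of FASTQ files with two of the tools named above might look roughly like the sketch below. The file names and the run_if_available helper are hypothetical, and the options shown (-f, -r, -w for PANDAseq; the % output template for fastq-join) are as I understand them from each tool's usage text, so check them against your installed version. Each command only runs if the tool and both input files are present.

```shell
#!/usr/bin/env bash
# Hypothetical input files; replace with your own forward/reverse reads.
r1=sample_R1.fastq
r2=sample_R2.fastq

# PANDAseq: -f forward reads, -r reverse reads, -w file for the merged sequences.
pandaseq_cmd=(pandaseq -f "$r1" -r "$r2" -w merged.fasta)

# fastq-join (ea-utils): the "%" in the -o template becomes un1, un2 and join.
fastqjoin_cmd=(fastq-join "$r1" "$r2" -o 'merged.%.fastq')

# Hypothetical helper: run a command only when the tool is installed and both
# input files exist; otherwise just report what was skipped.
run_if_available() {
  if command -v "$1" >/dev/null 2>&1 && [ -f "$r1" ] && [ -f "$r2" ]; then
    "$@"
  else
    printf 'skipped: %s\n' "$*"
  fi
}

run_if_available "${pandaseq_cmd[@]}"
run_if_available "${fastqjoin_cmd[@]}"
```

Both tools only merge overlapping read pairs; building contigs from the result is a separate step.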

score 1 · Answer 2 · 2012-11-19

1

12.0 years ago

Philipp Bayer 8.8k

There are some papers comparing different assemblers. I'd look at their results tables and choose what fits your data best (hard to tell from here):

Assemblathon

GAGE

and for fun, here's another review: Assembly of large genomes using second-generation sequencing

score 0 · Answer 3 · 2016-03-14

Getting this error:

/Users/lindakohn/Desktop/tools/SPAdes-3.7.1-Darwin/bin/spades.py -k 21,33,55,77 --careful --only-assembler --pe<#>-12 <euro_plasmid_r1_paired.fastq euro_plasmid_r2_paired.fastq> --pe<#>-s1 <euro_plasmid_r1_unpaired.fastq> --pe<#>-s2 <euro_plasmid_r2_unpaired.fastq> -o Euro_plasmid_spades_output

-bash: syntax error near unexpected token `newline'

What is wrong with the command?
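A likely cause, sketched below: the `<#>` and `<file>` placeholders from the manual were typed literally. Bash reads `<` as an input redirection and the `#` right after it as the start of a comment, so the redirection has no target and parsing stops at the end of the line. The corrected command is an assumption on my part: it uses library number 1, `--pe1-1`/`--pe1-2` for the separate forward and reverse files (`-12` is meant for a single interleaved file), and `--pe1-s` for the unpaired reads, with `spades.py` assumed to be on the PATH. Confirm the option names with `spades.py --help` for your version.

```shell
#!/usr/bin/env bash
# Shortened copy of the failing command, kept as a string so it is only parsed.
broken='spades.py --pe<#>-12 <r1.fastq r2.fastq> -o out'

# bash -n checks syntax without executing; the message is captured for display.
parse_error=$(LC_ALL=C bash -n -c "$broken" 2>&1) || true
printf '%s\n' "$parse_error"

# Corrected form (assumed option names, see the note above): a real library
# number instead of <#>, and plain file names without angle brackets.
cmd=(spades.py -k 21,33,55,77 --careful --only-assembler
     --pe1-1 euro_plasmid_r1_paired.fastq
     --pe1-2 euro_plasmid_r2_paired.fastq
     --pe1-s euro_plasmid_r1_unpaired.fastq
     --pe1-s euro_plasmid_r2_unpaired.fastq
     -o Euro_plasmid_spades_output)

# Print the command for review; run it with: "${cmd[@]}"
printf '%q ' "${cmd[@]}"; echo
```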