Which would be the best tool to assemble paired end reads generated by Illumina?
-Vj
This is for metagenomics, and you're not assembling the reads yet, correct? Is this amplicon data? You didn't tell us. If you are just looking to mate the paired-end sequences from your library, there is no need to assemble anything yet. I find the terminology confusing: I prefer "mate" for combining paired-end reads, and I reserve "assembly" for the later step, once the pairs are matched up, of building contig sequences from your data. If you have amplicon data (16S, 18S, ITS, etc.), then you can make consensus sequences, but this is not assembly in my opinion.
You didn't give us any information on the technology, but I am assuming from the 150 bp size that this is Illumina data and in FASTQ format?
Here's a previous SEQanswers thread and the post "Best Way To Preprocess Barcoded Illumina Paired-End Data" on this topic. There are a couple of options for mating Illumina paired-end data: I have used FastqJoin, PANDAseq, and CLC bio, but I am sure there are many other options out there.
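To make the "mating" step concrete, here is a minimal sketch using PANDAseq, one of the tools mentioned above. The file names and the `R1`, `R2`, and `OUT` variables are hypothetical placeholders, and it assumes `pandaseq` is installed and on your PATH; if it is not, the script only prints the command it would have run.

```shell
#!/bin/sh
# Sketch: mate (merge) overlapping paired-end reads with PANDAseq.
# All file names below are hypothetical placeholders.
set -eu

R1="${R1:-sample_R1.fastq}"        # forward reads
R2="${R2:-sample_R2.fastq}"        # reverse reads
OUT="${OUT:-sample_merged.fasta}"  # merged sequences (FASTA by default)

# -f forward reads, -r reverse reads, -w output file
merge_cmd="pandaseq -f $R1 -r $R2 -w $OUT"
echo "$merge_cmd"

# Only run the tool when it and both input files are actually present.
if command -v pandaseq >/dev/null 2>&1 && [ -f "$R1" ] && [ -f "$R2" ]; then
  pandaseq -f "$R1" -r "$R2" -w "$OUT"
else
  echo "pandaseq or input files not found; command printed only" >&2
fi
```

The merged sequences can then go into whatever downstream step you choose; for amplicon data that is usually clustering or consensus building rather than assembly.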
There are some papers comparing different assemblers. I'd look at their results tables and choose what fits your data best (hard to tell from here).
And for fun, here's another review: Assembly of large genomes using second-generation sequencing
I am getting an error with this command:
/Users/lindakohn/Desktop/tools/SPAdes-3.7.1-Darwin/bin/spades.py -k 21,33,55,77 --careful --only-assembler --pe<#>-12 <euro_plasmid_r1_paired.fastq euro_plasmid_r2_paired.fastq> --pe<#>-s1 <euro_plasmid_r1_unpaired.fastq> --pe<#>-s2 <euro_plasmid_r2_unpaired.fastq> -o Euro_plasmid_spades_output
-bash: syntax error near unexpected token `newline'
What is wrong with the command?
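The shell error most likely comes from the angle brackets. In the SPAdes help text, `<#>` and `<file>` are placeholders to be replaced with a library number and a real file name. Typed literally, bash reads `<` as input redirection and the `#` after it as the start of a comment, so it sees a redirection with nothing after it. Note also that `--pe<#>-12` takes a single interlaced file, whereas two separate paired files go to `-1` and `-2`. Below is a minimal sketch of what the command might look like. It assumes a single paired-end library (so `<#>` becomes 1), that `spades.py` is on your PATH, and that `--pe1-s` can be given once per unpaired file (check `spades.py --help` for your version). The `SPADES` and `RUN_SPADES` variables are names made up for this example.

```shell
#!/bin/sh
# Sketch: the command above with the manual's placeholders replaced.
# Assumption: one paired-end library, so <#> becomes 1.
set -eu

# Location of spades.py; override with SPADES=/path/to/spades.py
SPADES="${SPADES:-spades.py}"

# Build the argument list without any literal < or > characters.
# --pe1-1 / --pe1-2: forward and reverse files of paired reads
# --pe1-s: unpaired reads that belong to the same library
set -- "$SPADES" \
  -k 21,33,55,77 \
  --careful \
  --only-assembler \
  --pe1-1 euro_plasmid_r1_paired.fastq \
  --pe1-2 euro_plasmid_r2_paired.fastq \
  --pe1-s euro_plasmid_r1_unpaired.fastq \
  --pe1-s euro_plasmid_r2_unpaired.fastq \
  -o Euro_plasmid_spades_output

# Print the command for review before running anything.
echo "$*"

# Run the assembly only when explicitly requested with RUN_SPADES=1.
if [ "${RUN_SPADES:-0}" = "1" ]; then
  "$@"
fi
```

Running it as is only prints the command; set `RUN_SPADES=1` once the printed line looks right.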
People can help you better when you give some more information. DNA or RNA? Which species? How much RAM do you have? What is more important, contig accuracy or contiguity? Read length, insert size, total number of reads? Do you suspect DNA contamination from other species?
This is a metagenome sample, so I can't be sure of the number of species, and I have yet to receive my sequence data. I just wanted to know this in preparation. I would need contiguity since this is a metagenome. Read length would be approximately 150 bp.