Dear all,
I am trying to generate an assembly of the expected genome size (~800 Kb) and have tried a number of things so far. The details of my study are as follows:
I have 24 GB + 24 GB of MITOCHONDRIAL R1 and R2 data, respectively, with a read length of 101. I generated contigs with Velvet for multiple k-mer sizes, but the N50 value was too low (~1000) and the number of contigs was too high.
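To be clear about the metric: by N50 I mean the contig length at which contigs of that length or longer cover at least half of the total assembled bases. Below is a minimal Python sketch of that definition. The function name and the example lengths are made up for illustration and are not my data.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig where at least half the assembly is covered.
        if running * 2 >= total:
            return length
    return 0


# Hypothetical contig lengths, for illustration only.
example_lengths = [100, 200, 300, 400, 500]
print(n50(example_lengths))
```

Note that a higher N50 alone does not mean the assembly is more correct, which is part of why I am asking about validation.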
Next, following a suggestion that I needed to lower the coverage and file size, I subsampled the data (from 200% to 25% removal of reads from the original file). I then ran the subsampled data through VelvetOptimiser for contig generation. This substantially increased the N50 value and decreased the number of contigs.
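For anyone wanting to reproduce this kind of subsampling, here is a rough Python sketch of one way to do it while keeping R1 and R2 in sync. It is not necessarily the exact tool I used. It assumes plain 4-line FASTQ records with R1 and R2 in the same order, and all function names, the seed, and the numbers in the usage comment are hypothetical.

```python
import random
from itertools import islice


def keep_fraction_for_coverage(n_pairs, read_length, genome_size, target_coverage):
    """Fraction of read pairs to keep to reach roughly target_coverage."""
    current = 2 * n_pairs * read_length / genome_size  # two reads per pair
    return min(1.0, target_coverage / current)


def read_records(handle):
    """Yield FASTQ records as lists of 4 lines (assumes unwrapped records)."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield record


def subsample_pairs(r1_in, r2_in, r1_out, r2_out, keep_fraction, seed=42):
    """Keep each read pair with probability keep_fraction; return pairs kept."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    kept = 0
    for rec1, rec2 in zip(read_records(r1_in), read_records(r2_in)):
        # One random draw per pair, so both mates are kept or dropped together.
        if rng.random() < keep_fraction:
            r1_out.writelines(rec1)
            r2_out.writelines(rec2)
            kept += 1
    return kept


# Hypothetical usage: 1000 pairs of 100 bp reads on a 10 kb genome is 20x,
# so a 5x target means keeping a quarter of the pairs.
print(keep_fraction_for_coverage(1000, 100, 10_000, 5))
```

The important point is that the keep/drop decision is made once per pair, so the two output files stay paired for the assembler.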
After that, I further compressed the contig file with the Amos tool and then ran an alignment with Bowtie2, with the -x 0 and -I 500 options, using the Amos output FASTA file (contig + singleton file). Bowtie2 reported a total alignment rate of ~60% in both cases: for the best subsampled output (judged by N50 and number of contigs) and for the contig file without subsampling.
I also tried an assembly with SOAPdenovo2, since it has a gap-closing provision for creating scaffolds from contigs.
At this point, however, I am at an impasse, since I do not know how to validate these assemblies. Please suggest a sound way to arrive at my genome from the contig files.
Regards,
Mandar