Hi All,
I performed de novo assembly for a cosmid sequenced on NextSeq PE 300 using SPAdes. The pipeline I used is as follows:
1. Trim the reads to remove low-quality bases.
2. Extract a subset of reads.
3. Perform SPAdes de novo assembly.
The expected length of the cosmid was 50 kb, but the assembled sequence is only about 47.5 kb. This cosmid contains a region that overlaps with another cosmid, and the overlapping sequence was PCR-amplified and sequenced, confirming its presence.
The overlapping sequence is 990 bp long and is not present in the assembled sequence.
I have looked through the contigs.fasta file from the SPAdes output, and this sequence is not present in any of the other contigs either.
What approach should I use to search for this missing sequence in the raw data or the assembled data? How can I justify the absence of this sequence from the assembled genome?
Thanks!!
Thank you for replying.
I did some further troubleshooting by searching for substrings of the missing sequence in the contigs FASTA file, but found no match for substrings of length 50 bp, 80 bp, or 100 bp.
Please advise on how to proceed.
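One thing worth ruling out with a substring search like this is strand: a contig may carry the region as its reverse complement, in which case a plain text search would miss it. Below is a minimal Python sketch of a both-strand window search. The function names, the window size, and the step are illustrative assumptions, not part of any tool mentioned in this thread, and only exact matches are found.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T/N)."""
    table = str.maketrans("ACGTNacgtn", "TGCANtgcan")
    return seq.translate(table)[::-1]


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                name, chunks = line[1:].strip(), []
            elif line:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_windows(query, contigs, window=50, step=25):
    """List (query offset, contig name, strand) for every exact window hit.

    `contigs` must be a list of (name, sequence) pairs, because it is
    scanned once per window.
    """
    hits = []
    query = query.upper()
    for start in range(0, len(query) - window + 1, step):
        sub = query[start:start + window]
        rc = reverse_complement(sub)
        for name, seq in contigs:
            seq = seq.upper()
            if sub in seq:
                hits.append((start, name, "+"))
            elif rc in seq:
                hits.append((start, name, "-"))
    return hits
```

For example, `find_windows(missing_seq, list(read_fasta("contigs.fasta")))` would list every 50 bp window of the missing sequence (here held in a hypothetical variable `missing_seq`) that occurs in a contig on either strand. An empty list means no exact window match on either strand, which would be consistent with the region being absent from the assembly.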
Why would you search for the missing sequence in the assembled contigs, when you've already said that it's missing? I recommended aligning your data (i.e., your reads) to the missing sequence. Or, you can parse that data for substrings.
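The read-level substring check suggested above could be sketched roughly as follows in Python. It counts the reads that share at least one exact k-mer with the missing sequence on either strand. The function names and the choice of k = 31 are assumptions made for illustration, and the FASTQ parser assumes unwrapped 4-line records. Because matching is exact, reads with sequencing errors in every k-mer would be missed, so a proper aligner remains the more sensitive option.

```python
import gzip


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    table = str.maketrans("ACGTN", "TGCAN")
    return seq.translate(table)[::-1]


def target_kmers(target, k=31):
    """Collect every k-mer of the target and of its reverse complement."""
    target = target.upper()
    kmers = set()
    for strand in (target, reverse_complement(target)):
        for i in range(len(strand) - k + 1):
            kmers.add(strand[i:i + k])
    return kmers


def read_sequences(fastq_path):
    """Yield the sequence line of each 4-line FASTQ record (plain or .gz)."""
    opener = gzip.open if fastq_path.endswith(".gz") else open
    with opener(fastq_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:
                yield line.strip().upper()


def count_matching_reads(sequences, kmers, k=31):
    """Return (total reads, reads sharing at least one k-mer with the target)."""
    total = matched = 0
    for seq in sequences:
        total += 1
        if any(seq[i:i + k] in kmers for i in range(len(seq) - k + 1)):
            matched += 1
    return total, matched
```

Running `count_matching_reads(read_sequences(path), target_kmers(missing_seq))` on the forward and reverse read files separately gives a quick yes/no answer. If no read carries a k-mer of the 990 bp region, the region is not in the sequencing data and the assembler cannot be blamed. If many reads match, the problem lies in the assembly step instead.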
It is present in the sample cosmid DNA, as confirmed by PCR sequencing. But I guess it was either not sequenced or SPAdes failed to assemble it. 1. Illumina sequencing failure can be checked by mapping the forward and reverse reads to the missing DNA sequence, which resulted in a 0% mapping rate.