I have reads generated from a Salmonella genome that has 80% rRNA similarity to the standard E. coli genome. I want to check how much of the genome I can recover when using different genomes as references. The reads were mapped to the E. coli genome using SMALT. When I checked the final result with QUAST, it reported only 1 contig, and its length was the same as that of the E. coli genome used as the reference. However, the Salmonella genome from which the reads were generated is only 4.9 Mb, while the resulting assembly is 5.9 Mb. Getting an assembly larger than the genome the reads came from is what made me realize I am doing something wrong. The steps I performed are as follows:
Generating reads with PIRS:
Reference mapping with SMALT:
Converting BAM file to FASTA:
samtools sort assembly_lib_type.bam -o assembly_lib_type.sorted.bam
samtools index assembly_lib_type.sorted.bam
samtools mpileup -ABuf reference.nt mapped.bam | bcftools call -cOz --pval-threshold 0.99 > mapped.vcf.gz
tabix mapped.vcf.gz
cat reference.nt | bcftools consensus mapped.vcf.gz > mapped.fasta
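A quick way to see what the consensus step produced is to compare the total length of mapped.fasta with that of reference.nt. bcftools consensus applies the variants in the VCF to the reference sequence, so positions without a variant call keep the reference base and the output length tracks the reference. The helper below is only a sketch: the function name fasta_stats is made up for illustration, and it assumes a plain, uncompressed FASTA file.

```shell
# Hypothetical helper (not part of samtools/bcftools): print the total
# sequence length and the number of N bases in a FASTA file, tab-separated.
# Reads the files given as arguments, or stdin when none are given.
fasta_stats() {
    awk '
        /^>/ { next }                       # skip header lines
        {
            seq = toupper($0)               # count n and N alike
            total += length(seq)
            n += gsub(/N/, "", seq)         # gsub returns the number of replacements
        }
        END { printf "%d\t%d\n", total, n }' "$@"
}
```

Running fasta_stats reference.nt and fasta_stats mapped.fasta side by side should show whether the two lengths are nearly identical, which would be consistent with the single reference-sized contig reported above.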
I have realized that these steps are not giving me the final assembly, but only the consensus FASTA sequence. So the problem lies in the steps I have performed after reference mapping.
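Since the consensus has the reference length whether or not reads cover it, one way to estimate how much of the reference the Salmonella reads actually support is to summarise per-base depth from the sorted BAM. The sketch below assumes three-column input (contig, position, depth) as printed by samtools depth -a, where -a includes zero-depth positions. The function name covered_fraction and the default minimum depth of 1 are illustrative choices, not part of any tool.

```shell
# Hypothetical helper: summarise per-base depth read from stdin.
# Input: tab-separated contig, position, depth (e.g. from `samtools depth -a`).
# Output: total positions, positions with depth >= min_depth, percent covered.
covered_fraction() {
    min_depth="${1:-1}"                     # assumed default threshold: depth >= 1
    awk -v min="$min_depth" '
        { total++; if ($3 >= min) covered++ }
        END {
            if (total == 0) { print "0\t0\t0.00"; exit }
            printf "%d\t%d\t%.2f\n", total, covered, 100 * covered / total
        }'
}
```

With the files from the steps above, this could be run as: samtools depth -a assembly_lib_type.sorted.bam | covered_fraction 1. The percentage is the share of reference positions with at least one mapped read. The rest of the consensus is simply copied from the reference.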
Please advise on the steps to take after obtaining the BAM file, so that I can assemble the genome from the Salmonella reads using the E. coli genome as the reference.
Thank you.
Thank you very much for your response. I will surely try the software you suggested.