I want to use a paired-end library to scaffold my contigs. I used the following command to estimate the insert size with BBMap:
bbmap.sh ref=reference.fasta in1=read1.fq in2=read2.fq ihist=ihist_mapping.txt out=mapped.sam
#Mean 352.766
#Median 301
#Mode 271
#STDev 933.708
#PercentOfPairs 99.982
The standard deviation is larger than the mean. How can such an estimate be used for scaffolding with a program like SSPACE_Standard, which relies on the insert size to scaffold contigs?
I have also tried estimating the insert size with the script estimate_insert_size.pl that comes with SSPACE. It does not use a reference genome. Instead, it works out the insert size by mapping paired reads to the contigs. It mapped 10000 reads and reported only a median insert size of 328. It gives no mean or standard deviation.
Can someone help with this?
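A standard deviation larger than the mean, with the median and mode both near 300, usually points to a small number of pairs with very large apparent insert sizes. One way to check this is to recompute the statistics from ihist_mapping.txt after dropping the far tail. Below is a minimal Python sketch. It assumes the histogram file has '#' header lines followed by two whitespace-separated columns (insert size, count), and the cutoff of 3x the median is an arbitrary choice for illustration, not something BBMap or SSPACE prescribes. It uses the population formula for the standard deviation, so the untrimmed value may differ slightly from the one BBMap reports.

```python
import math

def read_ihist(lines):
    """Parse histogram lines into (insert_size, count) pairs.
    Assumed layout: '#' header lines, then 'size<TAB>count' rows."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        size, count = line.split()[:2]
        pairs.append((int(size), int(count)))
    return pairs

def weighted_stats(pairs):
    """Mean and population standard deviation of a histogram."""
    n = sum(c for _, c in pairs)
    mean = sum(s * c for s, c in pairs) / n
    var = sum(c * (s - mean) ** 2 for s, c in pairs) / n
    return mean, math.sqrt(var)

def weighted_median(pairs):
    """Smallest insert size at which the cumulative count reaches half the total."""
    n = sum(c for _, c in pairs)
    running = 0
    for s, c in sorted(pairs):
        running += c
        if running * 2 >= n:
            return s

def trimmed_stats(pairs, max_fold=3):
    """Mean and SD after dropping sizes above max_fold * median (hypothetical cutoff)."""
    med = weighted_median(pairs)
    kept = [(s, c) for s, c in pairs if s <= max_fold * med]
    return weighted_stats(kept)

if __name__ == "__main__":
    with open("ihist_mapping.txt") as handle:
        hist = read_ihist(handle)
    print("all pairs     mean=%.1f sd=%.1f" % weighted_stats(hist))
    print("trimmed pairs mean=%.1f sd=%.1f" % trimmed_stats(hist))
```

If the trimmed standard deviation drops to a small fraction of the mean, the large value is coming from outlier pairs rather than from the bulk of the library.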
Which software did you use for the de novo assembly? Many assemblers calculate the insert size during assembly, so you may be able to get it from the log file.
For a standard PE library, the standard deviation should not be this high. I suggest estimating the insert size and standard deviation again with Picard, giving it as input the .bam file that estimate_insert_size.pl generates when it maps the reads to the contigs.
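As a quick cross-check that needs no extra tools, the insert sizes can also be read directly from the TLEN column of a SAM file such as mapped.sam. The sketch below is plain Python, not Picard, and only illustrates the idea: it counts each properly paired, primary alignment once (via the first read in the pair) and reports the median plus a robust spread estimate based on the median absolute deviation (MAD). The function names are made up for this example, and whether a pair is flagged as "properly paired" depends on the aligner's own settings.

```python
import statistics

def insert_sizes_from_sam(lines):
    """Collect |TLEN| once per properly paired, primary alignment."""
    sizes = []
    for line in lines:
        if line.startswith("@"):          # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a full alignment record
            continue
        flag = int(fields[1])
        tlen = int(fields[8])
        if not flag & 0x2:                # not properly paired
            continue
        if not flag & 0x40:               # not first in pair (avoid double counting)
            continue
        if flag & 0x900:                  # secondary or supplementary alignment
            continue
        if tlen == 0:                     # mates on different contigs, or unknown
            continue
        sizes.append(abs(tlen))
    return sizes

def robust_summary(sizes):
    """Median, MAD, and a MAD-based SD estimate (1.4826 * MAD, assuming
    a roughly normal core distribution)."""
    med = statistics.median(sizes)
    mad = statistics.median(abs(s - med) for s in sizes)
    return med, mad, 1.4826 * mad

if __name__ == "__main__":
    with open("mapped.sam") as handle:
        sizes = insert_sizes_from_sam(handle)
    med, mad, sd_est = robust_summary(sizes)
    print("pairs=%d median=%.1f MAD=%.1f robust_sd=%.1f" % (len(sizes), med, mad, sd_est))
```

Because the median and MAD are barely affected by a handful of extreme pairs, they tend to describe the bulk of the library better than a plain mean and standard deviation when the distribution has a long tail.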
Please do not add an answer unless you're actually answering the top level question. Thank you!
Are the contigs 'real' contigs, i.e. are there no Ns in the sequences?
Yes, these are "real" contigs, generated using Velvet with scaffolding turned off (-scaffolding no).