5.6 years ago
hopedwall
I'm working on a project that uses paired-end samples where the inner distance between mate pairs is not provided, so I tried using BBMap (as suggested in other posts) to get an estimate. However, when I ran samtools like this:
samtools stats output.sam | grep "insert size average"
I got a different result. Am I doing something wrong? Thanks!
Could you please show us the result of both tools?
Sure, here it is.
BBMap output:
Samtools output:
Hello again,
please do not post screenshots to show us the output of a program or any file content. Instead, copy and paste the text and use the formatting bar (especially the code option) to present your post better. Thanks!
fin swimmer
Sorry about that, I'll edit asap! [edit: done]
What command line did you use for BBMap? It looks like you are using a small subset of reads for this estimate. Since you are using actual alignment information, the statistics reported by BBMap should be accurate.
That said, the following statistic is not very good.
Something seems to be off with your data. Were these data trimmed as pairs?
You can also create a histogram of the insert sizes using
I used BBMap in this way:
I'm using simulated reads, generated from a single gene. Could this be the issue? What surprises me is that the value returned by samtools is completely different from the one returned by BBMap. I need the average distance between pairs in order to do some further analysis, but I'm not so sure on which one to choose.
Possibly. How did you generate the simulated reads? Were they generated from that single gene sequence? Is genome.fa a real, full genome? It is possible that some reads generated from a single gene are mapping to places they did not originate from, which may be leading to that strange number from samtools. The value BBMap is giving you should be from actual alignments. I am not sure how samtools calculates the value it is reporting.
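One way to cross-check both numbers independently is to tally insert sizes straight from the TLEN column (column 9) of the SAM file. The following is only a minimal sketch, not how BBMap or samtools necessarily compute their figures. The function name `insert_size_stats` and the optional `max_insert` cutoff are made up for illustration. It assumes a text SAM file, keeps only properly paired primary alignments, and counts each pair once by using the mate with a positive TLEN. Note that for spliced alignments against a genome, TLEN spans any introns between the mates, so a few pairs can pull the average up considerably; the cutoff is there to test that.

```python
from collections import Counter


def insert_size_stats(sam_lines, max_insert=None):
    """Return (mean, histogram) of insert sizes taken from SAM text lines.

    Hypothetical helper for illustration; `max_insert` is an optional
    cutoff (assumption) for discarding implausibly long inserts.
    """
    hist = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        tlen = int(fields[8])
        # Require paired (0x1) and properly paired (0x2);
        # drop secondary (0x100) and supplementary (0x800) alignments.
        if not (flag & 0x1 and flag & 0x2) or flag & 0x900:
            continue
        if tlen <= 0:
            continue  # the mate with positive TLEN represents the pair
        if max_insert is not None and tlen > max_insert:
            continue
        hist[tlen] += 1
    n = sum(hist.values())
    mean = sum(size * count for size, count in hist.items()) / n if n else float("nan")
    return mean, hist
```

To use it, pass an open file handle, for example `insert_size_stats(open("output.sam"))`. The returned histogram maps each insert size to the number of pairs with that size, so a long tail or a second peak is easy to spot. If the inner distance is what is needed rather than the insert size, it is usually taken as the insert size minus the combined length of the two reads.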
I'm using this repository to generate RNA-seq paired-end samples from gene ENSG00000280145 on human chromosome 21. Since I'm running BBMap locally on my machine, I needed shorter reads.
BBTools has a tool called randomreads.sh that can simulate data. You can give that a try. I would suggest simulating reads from the entire chromosome 21, to be a bit more realistic about what the real data will look like, and aligning the simulated reads to that chromosome. randomreads.sh will allow you to generate reads that fit a profile (and even an insert size range). What are you ultimately planning to do with the average insert sizes?
Thanks, I'll give it a try! I'm working on Alternative Splicing event detection from paired-end samples and figured the average insert size might be useful.