How to find out the insert size of RNA-seq data
7.2 years ago
rj.rezwan ▴ 10

Hi,

Please tell me: can the value circled in red in the picture below be taken as the insert size of our RNA-seq data? [1]: https://ibb.co/jf4keF


As Chris suggested, you need to align your reads to get a SAM/BAM file and then run 'samtools stats' to get information about insert sizes. It also reports the insert size distribution, which you can turn into a graph with plot-bamstats.
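The 'samtools stats' summary gives an average and a standard deviation, and the distribution itself is written as tab-separated IS lines. If you want a median comparable to the one bamtools reports, a minimal sketch is to walk that histogram. It assumes each IS line starts with the insert size followed by the total pair count (further columns, such as orientation breakdowns, are ignored); the function name is hypothetical.

```python
def median_from_is_lines(lines):
    """Median insert size from the IS histogram in `samtools stats` output.

    Assumption: each IS line is tab-separated as
    IS <insert size> <pair count> [more columns...].
    For an even number of pairs this returns the lower median.
    """
    hist = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "IS":
            size, count = int(fields[1]), int(fields[2])
            if count > 0:
                hist.append((size, count))
    total = sum(count for _, count in hist)
    if total == 0:
        raise ValueError("no IS lines with non-zero pair counts")
    # Walk the histogram in order of insert size until the middle pair is reached.
    hist.sort()
    midpoint = (total + 1) // 2
    running = 0
    for size, count in hist:
        running += count
        if running >= midpoint:
            return size
```

You could save the stats to a file and pass the open file handle, since iterating over it yields lines.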

7.2 years ago
ivivek_ngs ★ 5.2k

If you cannot get it from the study's documentation, the easiest way to find it programmatically is to convert the SRA data to FASTQ on the fly, align the reads to produce a BAM file, and then run bamtools with the command below, which reports the average and median insert size. Good luck!

bamtools stats -i foo.bam -insert
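To see what those two numbers are, here is a minimal sketch that computes the mean and median of the absolute template length (TLEN, column 9 of a SAM record) for properly paired reads, for example from the text output of 'samtools view foo.bam'. It assumes plain SAM lines and counts each pair once by keeping only read 1; that is an illustrative choice, not necessarily the exact rule bamtools uses. The function name is hypothetical.

```python
import statistics

# SAM FLAG bits (from the SAM format specification)
PAIRED = 0x1
PROPER_PAIR = 0x2
FIRST_IN_PAIR = 0x40


def insert_size_stats(sam_lines):
    """Return (mean, median) of |TLEN| over properly paired reads.

    Assumption: input is SAM text; header lines starting with '@' are skipped.
    Only read 1 of each proper pair is used, so each pair is counted once.
    """
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag, tlen = int(fields[1]), int(fields[8])
        is_proper_read1 = (flag & PAIRED) and (flag & PROPER_PAIR) and (flag & FIRST_IN_PAIR)
        if is_proper_read1 and tlen != 0:
            sizes.append(abs(tlen))  # TLEN is negative for the rightmost mate
    if not sizes:
        raise ValueError("no properly paired reads with a non-zero TLEN")
    return statistics.mean(sizes), statistics.median(sizes)
```

Single-end or unpaired alignments have TLEN 0 and are skipped, which is why the reads must be aligned as pairs before any insert size can be reported.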

The argument has changed: use -in instead of -i.

Here is an example output:

$ bamtools stats -in aaaaaa.bam -insert

**********************************************
Stats for BAM file(s):
**********************************************

Total reads:       1367054
Mapped reads:      1367054      (100%)
Forward strand:    683527       (50%)
Reverse strand:    683527       (50%)
Failed QC:         0    (0%)
Duplicates:        0    (0%)
Paired-end reads:  1367054      (100%)
'Proper-pairs':    1367054      (100%)
Both pairs mapped: 1367054      (100%)
Read 1:            683527
Read 2:            683527
Singletons:        0    (0%)
Average insert size (absolute value): 104.995
Median insert size (absolute value): 80

$ bamtools --version

bamtools 2.2.2
Part of BamTools API and toolkit
Primary authors: Derek Barnett, Erik Garrison, Michael Stromberg
(c) 2009-2012 Marth Lab, Biology Dept., Boston College

Yes, I should have updated it; thanks for adding it to the thread.

7.2 years ago
Chris Fields ★ 2.2k

No, that's the run ID; see the Wikipedia article. If you have paired-end reads, you need to align them as pairs for an insert size to be reported.

7.2 years ago

If the data come from the SRA database, you can look up that information there. If the data are newly sequenced, the easiest option may be to ask your sequencing provider.

6.7 years ago
Michi ▴ 990

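samtools also has a command very similar to bamtools:

samtools stats foo.bam | grep ^SN | cut -f 2-

Note that --insert-size (-i) is not a switch for turning on insert size output: it takes an integer, the maximum insert size to count. The summary (SN) lines already include the average insert size and its standard deviation, and the IS lines hold the full distribution.

You can additionally speed up the process by using --threads 10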
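If you want those summary numbers inside a script rather than on screen, a minimal sketch is to collect the SN lines into a dictionary. It assumes each SN line is tab-separated as SN, a key ending in a colon, the value, and optionally a trailing comment; the function name is hypothetical.

```python
def parse_sn(lines):
    """Collect the SN (summary numbers) lines of `samtools stats` output into a dict.

    Assumption: lines look like 'SN<TAB>key:<TAB>value[<TAB># comment]'.
    Numeric values are converted to float; anything else is kept as a string.
    """
    summary = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[0] == "SN":
            key = fields[1].rstrip(":")
            try:
                summary[key] = float(fields[2])
            except ValueError:
                summary[key] = fields[2]
    return summary
```

The keys of interest here would be "insert size average" and "insert size standard deviation".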
