Does anyone know how to get the average insert size of paired-end RNA-seq data (excluding linkers and adapters)? These are Illumina HiSeq 2000 data from a 2x100 bp run. Thank you!
You can't get this from the sequencing run itself, but if you (or whoever prepared the libraries) measured the DNA fragment lengths before sequencing with a Bioanalyzer or a similar tool, you can use those measurements to compute the insert length by subtracting the adapter length from the measured fragment size.
After you map the reads to the genome, you can get the length of each fragment from column 9 (TLEN) of the SAM file; just compute the average of the positive values (>0) in that column.
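A minimal sketch of that averaging step in Python, assuming an uncompressed, tab-delimited SAM file. The function name `mean_insert_size` is just illustrative. Only positive TLEN values are counted, so each pair contributes once (the mate carries the same value with a negative sign). Keep in mind that for RNA-seq reads mapped to a genome, TLEN spans any introns between the mates, so the mean can come out larger than the true fragment size.

```python
def mean_insert_size(sam_lines):
    """Return the mean of the positive TLEN values (SAM field 9).

    `sam_lines` is any iterable of SAM text lines, e.g. an open file.
    """
    total = 0
    count = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        tlen = int(fields[8])  # field 9 in 1-based numbering
        if tlen > 0:
            total += tlen
            count += 1
    return total / count if count else float("nan")
```

To use it on a file, something like `with open("aligned.sam") as fh: print(mean_insert_size(fh))` should work, where `aligned.sam` is a placeholder for your own alignment file.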
A slightly longer option: after trimming adapters/barcodes, you can align the reads to a reference, or to your own de novo assembly, using bwa or bowtie. You can then run Picard tools (CollectInsertSizeMetrics), and you will get a nice histogram of the insert sizes in your library.
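If you want the numbers from the Picard run in a script rather than reading them off the histogram, here is a rough Python sketch for pulling the summary row out of the text metrics file. It assumes the usual Picard metrics layout: a line starting with `## METRICS CLASS`, followed by a tab-separated header row and one row of values. The function name `read_picard_metrics` is made up for this example, and the `MEAN_INSERT_SIZE` column name is what I would expect in the output, so check the header of your own file.

```python
def read_picard_metrics(lines):
    """Return the first metrics row of a Picard metrics file as a dict.

    `lines` is any iterable of text lines, e.g. an open file.
    Values are returned as strings; convert them as needed.
    """
    it = iter(lines)
    for line in it:
        if line.startswith("## METRICS CLASS"):
            # The next two lines are the column names and the first data row.
            header = next(it, "").rstrip("\n").split("\t")
            values = next(it, "").rstrip("\n").split("\t")
            return dict(zip(header, values))
    raise ValueError("no '## METRICS CLASS' section found")
```

For example, `float(read_picard_metrics(open("insert_metrics.txt"))["MEAN_INSERT_SIZE"])` would give the mean, with `insert_metrics.txt` standing in for whatever output file you passed to Picard.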
For future readers: TLEN is field number 9 of a SAM record, not 10 (field 10 is SEQ).