Question

Insert size in RNA-Seq library

0

Entering edit mode

9.0 years ago

Antonio Camargo ▴ 160

I'm aligning some mouse RNA-Seq libraries against the genome using STAR. After that, I started to analyse the insert size with Picard. I noticed two types of histogram.

The one showed in Histogram A have a single peak, around 180 bp. The type represented in Histogram B, however, have multiple peaks.

As the aligment was done against the genome, I expected that all libraries would have displayed a pattern like the one in Histogram B, as the spliced inserts would produce "artificial" inserts sizes. So, why did I got these two kinds of histograms?

Thanks a lot!

Histogram A:

Histogram A

Histogram B:

Histogram B

rna-seq • 4.2k views

ADD COMMENT • link updated 7.5 years ago by SMILE ▴ 200 • written 9.0 years ago by Antonio Camargo ▴ 160

0

Entering edit mode

Are all of the samples that produce histrogram like that in (A) also total RNAseq rather than mRNAseq? The GEO entry doesn't mention any sort of ribo-depletion.

ADD REPLY • link 9.0 years ago by Devon Ryan 105k

0

Entering edit mode

Two libraries produce histograms like B, they were both produced by the same team. The remaining libraries (3) have histograms like the A.

I suppose that they're all poly-A positive, but this information is not explicit in all libraries.

ADD REPLY • link 9.0 years ago by Antonio Camargo ▴ 160

0

Entering edit mode

Do you get the reason of Histogram B? How do you solve the problem? I have a similar problem...

ADD REPLY • link 7.5 years ago by SMILE ▴ 200

0

Entering edit mode

Mapping against the transcriptome should give a better representation of the actual insert-size distribution. Also, if your reads are paired and mostly overlapping, you can get an alignment-free insert-size distribution with BBMerge:

bbmerge.sh in1=read1.fq in2=read2.fq reads=1m ihist=ihist.txt

Or, in fact, you can get that even if they are NOT mostly overlapping, using kmer frequencies to bridge the gap in the middle (this is slower and uses more memory, and requires sufficient depth):

bbmerge-auto.sh in1=read1.fq in2=read2.fq ihist=ihist.txt k=62 extend2=200 rem ecct

ADD REPLY • link 7.5 years ago by Brian Bushnell 20k