Can anyone explain the weird Insert Size Histogram?
7.2 years ago
SMILE ▴ 190

I'm aligning some mouse RNA-Seq data (paired-end, 2x101) against the genome using STAR. After that, I analysed the insert size with Picard. Here are the two kinds of insert size histograms I got. The first one looks strange; can anyone explain what could lead to the first histogram?

(The data with the first histogram had a low alignment rate. After some parameter changes the alignment rate improved, but a high percentage of reads now map to multiple loci. I am still looking for the reason...)

Thanks a lot!

[Figure: the two Picard insert size histograms]

RNA-Seq alignment sequencing • 4.8k views

Were these samples ribo-depleted? If so, have you checked the efficiency of that process? Align your data to the mouse rDNA repeat to find out. Another thing to check is whether you have optical/PCR duplicates; clumpify from the BBMap suite would be the tool of choice there.

I have to ask the person who did the experiment whether the library was ribo-depleted or polyA-captured. I removed the rRNA using sortmerna, then mapped the data to mouse rRNA and got 0% alignment. Do you mean the peak of small insert sizes in the first histogram could be caused by optical/PCR duplicates?

The last time something like your observation came up was in: Insert size in RNA-Seq library

Good to know that you have eliminated the possibility of rDNA being present. Have you looked at some of those small inserts (left over after trimming)? What do they blast to? Is there a possibility that the trimming missed some adapters?

Do you know how to extract those small inserts?

Assuming your reads are just the inserts left after trimming, you can use reformat.sh from the BBMap suite to write them to a new file: reformat.sh in=trimmed.fq.gz out=short.fq.gz maxlen=30 (adjust the length as needed; use in1= in2= out1= out2= if you have PE reads).
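To make the length filter concrete, here is a minimal Python sketch of the same idea: keep only reads whose sequence is at most a given length. It is not how reformat.sh is implemented. The function name `short_reads` and the in-memory example are made up for illustration, and the sketch assumes plain four-line FASTQ records.

```python
import io

def short_reads(fastq_handle, max_len=30):
    """Yield (header, sequence, quality) for reads no longer than max_len.

    Assumes strict 4-line FASTQ records (hypothetical helper, not part of BBMap).
    """
    while True:
        header = fastq_handle.readline().rstrip("\n")
        if not header:  # end of file
            break
        seq = fastq_handle.readline().rstrip("\n")
        fastq_handle.readline()  # skip the '+' separator line
        qual = fastq_handle.readline().rstrip("\n")
        if len(seq) <= max_len:
            yield header, seq, qual

# Tiny in-memory example: r1 is 4 bases long, r2 is 40 bases long.
example = io.StringIO(
    "@r1\nACGT\n+\nIIII\n"
    "@r2\n" + "A" * 40 + "\n+\n" + "I" * 40 + "\n"
)
kept = list(short_reads(example, max_len=30))
```

For a real gzipped file, the handle could come from `gzip.open("trimmed.fq.gz", "rt")` instead of the `io.StringIO` example.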

Maybe I should mention that, for comparison, I also ran the alignment without any trimming, and the figure I got looked similar. Without trimming the reads are all 101 bp long, yet I still see the small inserts. So how are the insert sizes calculated? I don't think I can use maxlen=30 to extract the small inserts.

The inserts are calculated based on alignment in BBMap. I assume it is the same for STAR.

You should be able to get the small inserts by adapter-trimming with bbduk.sh and keeping only the short reads that remain: bbduk.sh in=your_reads.fq.gz out=small_inserts.fq.gz ktrim=r k=23 ref=/path_to/adapters.fa maxlength=30 (adapters.fa is included in BBMap).
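Since the insert size comes from the alignment rather than from the read length, untrimmed 101 bp reads can still report small inserts: the value is the span of the fragment on the reference, stored in the SAM TLEN field (column 9). As far as I know, that is the value Picard plots. Below is a rough Python sketch of picking out small-insert pairs directly from SAM text. The function name `small_insert_reads` and the example records are made up for illustration, and the threshold is arbitrary.

```python
def small_insert_reads(sam_lines, max_insert=30):
    """Return names of properly paired reads with 0 < |TLEN| <= max_insert.

    Hypothetical helper: reads plain SAM text lines, looks only at
    FLAG (column 2) and TLEN (column 9).
    """
    names = set()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        tlen = int(fields[8])
        is_proper_pair = bool(flag & 0x2)
        is_first_mate = bool(flag & 0x40)  # count each pair once
        if is_proper_pair and is_first_mate and 0 < abs(tlen) <= max_insert:
            names.add(fields[0])
    return names

# Made-up records: pairA spans 25 bp on the reference, pairB spans 251 bp.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "pairA\t99\tchr1\t100\t255\t25M\t=\t100\t25\t*\t*",
    "pairA\t147\tchr1\t100\t255\t25M\t=\t100\t-25\t*\t*",
    "pairB\t99\tchr1\t500\t255\t101M\t=\t650\t251\t*\t*",
    "pairB\t147\tchr1\t650\t255\t101M\t=\t500\t-251\t*\t*",
]
small = small_insert_reads(example, max_insert=30)
```

The resulting read names can then be used to pull the matching records out of the original FASTQ files for a closer look, for example to blast them.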

What does FR stand for?

I mean on the graph: in the upper right-hand corner, the legend says FR. Is that frequency? It seems like there's more than enough room to print out the whole word, haha.

FR for forward-reverse is a common abbreviation.
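For anyone curious how a pair ends up labelled FR, here is a simplified Python sketch that classifies a record from its SAM FLAG strand bits and the sign of TLEN. It is an illustration only, not Picard's actual logic, and it ignores edge cases such as mates that overlap completely. The function name `pair_orientation` is made up.

```python
def pair_orientation(flag, tlen):
    """Classify a paired SAM record as FR, RF, TANDEM or UNKNOWN.

    Simplified, hypothetical helper: uses only the strand bits of FLAG
    and the sign of TLEN.
    """
    if tlen == 0:  # TLEN of 0 means the span is not available
        return "UNKNOWN"
    read_reverse = bool(flag & 0x10)
    mate_reverse = bool(flag & 0x20)
    if read_reverse == mate_reverse:
        return "TANDEM"  # both mates on the same strand
    # TLEN is positive for the leftmost mate and negative for the rightmost.
    if read_reverse:
        forward_mate_is_leftmost = tlen < 0
    else:
        forward_mate_is_leftmost = tlen > 0
    # FR: the forward-strand mate lies to the left, so the mates face inward.
    return "FR" if forward_mate_is_leftmost else "RF"

orientation = pair_orientation(99, 250)  # forward read, reverse mate, leftmost
```

In a standard paired-end library most pairs are expected to come out as FR, which is why that is the series shown in the histogram legend.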

Great resource!
