Entering edit mode
7.2 years ago
SMILE
▴
190
I'm aligning some mouse RNA-Seq data (paired, 2x101) against the genome using STAR. After that, I analysed the insert size with Picard. Here are the two kinds of Insert Size Histograms. Which the first one seems so weird, can anyone explain to me what reasons can lead to the first Histogram?
(The data with the first Histogram has low alignment rate, after some changes of parameters , the alignment rate is better but it has high percentage of reads mapped to multiple loci. I am looking for the reasons...)
Thanks a lot!
Were these samples ribo-depleted? If so have you checked to see the efficiency of that process? Use the rDNA repeat of mouse and align your data to it. Other thing to check is to see if you have optical/PCR duplicates. clumpify from BBMap suite would be the tool of choice there.
I have to ask the person who did the experiment to make sure whether is ribo-depleted or polyA captured. I have depleted the rRNA using sortmerna and mapped the data to mouse rRNA, and got 0% alignment. You mean the peak of small insert sizes in the first histogram can be caused by optical/PCR duplicates?
Last time something like your observation was seen was in: Insert size in RNA-Seq library
Good to know that you have eliminated the possibility of presence of rDNA. Have you looked at some of those small inserts (left over after trimming)? What do they blast to? Is there is a possibility that the trimming missed some adapters?
Do you know how to extract those small inserts?
Assuming your reads are just the inserts left after trimming you can use
reformat.sh
from BBMap suite to grab them in a new file.reformat.sh in=trimmed.fq.gz out=short.fq.gz maxlen=30
(adjust the length number as needed, usein1= in2= out1= out2=
if you have PE reads).Maybe I have to mention that to do comparison I also do alignment without any trimming, the firgure I got looked simmilar. Since without trimming the read length are all 101, I still have the small inserts. So how do the inserts calculate? I don't think I can use the maxlen=30 to extract the small inserts.
The inserts are calculated based on alignment in BBMap. I assume it is the same for STAR.
You should be able to get the small inserts by trimming using bbduk.sh.
bbduk.sh in=your_reads.fq.gz out=small_inserts.fq.gz ktrim=r k=23 ref=/path_to/adapters.fa (included in BBMap) maxlength=30
.What does FR stand for?
I assume you are looking for this: Orientation of PE reads a review of --fr --ff and --rf meanings
I mean on the graph, in the upper right hand corner, the legend says FR. Is that frequency? It seems like theres more than enough room to print out the whole word haha
FR for forward-reverse is a common abbreviation.
Great resource!