Strand specific library
Asked 3.4 years ago by esimonova.me

I first analysed the data without taking the strand-specificity of my library into account. Afterwards I found out that the library was stranded, so I reanalysed the data. The counts differed greatly between the two analyses: without specifying strandedness I got around 1000-2000 counts for each gene I am interested in, but after specifying strandedness in hisat2 and htseq I got only 20-30 counts for some of those genes. The sequencing targeted a depth of 30 M reads. I just want to make sure that such a difference in counts is expected.

Answer, 3.4 years ago:

Consider protein-coding genes.

If the library is unstranded and you count reads in both the sense and the antisense orientation, the split should be roughly 50% sense and 50% antisense.

If the library is stranded (assuming a typical stranded RNA-seq protocol) and you count both orientations, you should see >90% antisense and <10% sense. When I investigate a stranded library, I usually observe an order-of-magnitude difference in read counts between the two orientations.
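
A minimal Python sketch of this rule of thumb, assuming you already have the number of reads (read 1 for paired-end data) falling on the sense and on the antisense strand of a set of genes. The function name and the 0.9 and 40-60% cut-offs are illustrative assumptions, not values taken from any tool. It also allows for forward-stranded libraries, where the majority would be sense instead.

```python
def classify_strandedness(sense: int, antisense: int, threshold: float = 0.9) -> str:
    """Classify a library from read counts on the sense vs antisense strand of genes.

    `threshold` is a hypothetical cut-off: at least this share in one orientation
    is treated as stranded, while a roughly 50/50 split is treated as unstranded.
    """
    total = sense + antisense
    if total == 0:
        raise ValueError("no reads counted")
    antisense_frac = antisense / total
    if antisense_frac >= threshold:
        return "reverse"      # most reads antisense to the gene (e.g. dUTP-style protocols)
    if 1 - antisense_frac >= threshold:
        return "forward"      # most reads on the same strand as the gene
    if 0.4 <= antisense_frac <= 0.6:
        return "unstranded"   # roughly 50/50
    return "ambiguous"        # neither clearly stranded nor close to 50/50


print(classify_strandedness(sense=1_000, antisense=1_000))  # unstranded
print(classify_strandedness(sense=25, antisense=1_500))     # reverse
```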

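You can also view the BAM files in IGV to verify strandedness (for paired-end data, colouring alignments by first-of-pair strand makes the orientation easy to see).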

Thanks for the answer! I am fairly new to bioinformatics; could you please point me to a paper that confirms this? I can see why the split should be 50/50 for an unstranded library, but I cannot yet explain the 90/10 proportion for a stranded RNA-seq library.

Answer, 3.4 years ago:

"With no knowledge of strandedness (I got around 1000-2000 counts per gene I am interested in), however specifying strandedness in hisat2 and htseq I got only 20-30 counts per some genes that I am interested in."

That suggests to me that you specified the wrong strandedness. Flip it to the other orientation and you should get your 1000-2000 counts back.
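
One way to test this without guessing is to run htseq-count with each `-s` setting on the same BAM and compare how many reads end up assigned to genes. The sketch below assumes the default htseq-count output (gene ID, a tab, the count, plus the `__no_feature`-style summary rows at the end); the function names and the example counts are made up for illustration.

```python
def total_assigned(htseq_output: str) -> int:
    """Sum the per-gene counts in htseq-count output, skipping the '__' summary rows."""
    total = 0
    for line in htseq_output.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        gene_id, count = fields[0], fields[-1]
        if gene_id.startswith("__"):
            continue  # __no_feature, __ambiguous, ... are not gene counts
        total += int(count)
    return total


def pick_setting(outputs: dict) -> str:
    """Return the -s value (a key of `outputs`) whose run assigned the most reads to genes."""
    return max(outputs, key=lambda setting: total_assigned(outputs[setting]))


# Made-up example: output of two htseq-count runs on the same BAM.
runs = {
    "yes": "geneA\t25\ngeneB\t30\n__no_feature\t900000\n",
    "reverse": "geneA\t1500\ngeneB\t1800\n__no_feature\t40000\n",
}
print(pick_setting(runs))  # reverse
```

With the wrong setting, most reads land in `__no_feature`, which is the pattern described in the question.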

After checking strandedness with RSeQC, I got the following stats:

This is PairEnd Data
Fraction of reads failed to determine: 0.0569
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0170
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9261

Can I conclude from these stats that it is a reverse-stranded library?

According to this summary, I think it is reverse stranded: https://chipster.csc.fi/manual/library-type-summary.html
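
A small sketch of how to read the two fractions for paired-end data: reads explained by "1++,1--,2+-,2-+" have read 1 on the gene's strand (forward-stranded), while "1+-,1-+,2++,2--" means read 1 is on the opposite strand (reverse-stranded). Reads that "failed to determine" are simply left out. The 0.8 cut-off and the function name are assumptions for illustration; the example uses the numbers posted above.

```python
def interpret_pe_fractions(frac_same: float, frac_opposite: float, cutoff: float = 0.8) -> str:
    """Interpret RSeQC paired-end strandedness fractions.

    frac_same:     fraction explained by "1++,1--,2+-,2-+" (read 1 on the gene's strand)
    frac_opposite: fraction explained by "1+-,1-+,2++,2--" (read 1 opposite the gene)
    cutoff:        hypothetical threshold on the share of explained reads
    """
    explained = frac_same + frac_opposite  # "failed to determine" reads are excluded
    if explained <= 0:
        raise ValueError("no reads explained by either orientation")
    share_opposite = frac_opposite / explained
    if share_opposite >= cutoff:
        return "reverse"
    if share_opposite <= 1 - cutoff:
        return "forward"
    return "unstranded or unclear"


# Values reported in this thread.
print(interpret_pe_fractions(0.0170, 0.9261))  # reverse
```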

Yes. If it is RNA-seq, a stranded library is usually reverse stranded. To check, change the strand setting in your hisat2 command, then re-run hisat2 and htseq; your 20-30 counts should jump to 900-1800+.
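
Assuming paired-end reads, a reverse-stranded library corresponds to `--rna-strandness RF` in hisat2 and `-s reverse` in htseq-count (`-s yes` is forward-stranded, `-s no` unstranded). Note that the counts themselves are controlled by htseq-count's `-s`; hisat2's `--rna-strandness` mainly adds the XS strand tag to alignments. The sketch below only assembles the command lines; the `SETTINGS` table, the function name and all file names are placeholders, and the htseq-count step assumes a coordinate-sorted BAM (for example, produced from the hisat2 SAM with samtools sort).

```python
import shlex

# Assumed mapping from library type to paired-end hisat2 / htseq-count options.
SETTINGS = {
    "reverse": {"hisat2": ["--rna-strandness", "RF"], "htseq": ["-s", "reverse"]},
    "forward": {"hisat2": ["--rna-strandness", "FR"], "htseq": ["-s", "yes"]},
    "unstranded": {"hisat2": [], "htseq": ["-s", "no"]},
}


def build_commands(library, index, r1, r2, sam, sorted_bam, gtf):
    """Return (hisat2_argv, htseq_argv) for the given library type (hypothetical helper)."""
    opts = SETTINGS[library]
    hisat2 = ["hisat2", "-x", index, "-1", r1, "-2", r2, *opts["hisat2"], "-S", sam]
    # -f bam -r pos: input is a BAM sorted by coordinate
    htseq = ["htseq-count", "-f", "bam", "-r", "pos", *opts["htseq"], sorted_bam, gtf]
    return hisat2, htseq


hisat2_cmd, htseq_cmd = build_commands(
    "reverse", "genome_index", "sample_R1.fq.gz", "sample_R2.fq.gz",
    "sample.sam", "sample.sorted.bam", "genes.gtf",
)
print(shlex.join(hisat2_cmd))
# hisat2 -x genome_index -1 sample_R1.fq.gz -2 sample_R2.fq.gz --rna-strandness RF -S sample.sam
print(shlex.join(htseq_cmd))
# htseq-count -f bam -r pos -s reverse sample.sorted.bam genes.gtf
```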
