Is there any way to detect whether your RNA-seq data is unstranded or stranded?
A few RNA-Seq QC tools will detect whether a run is strand-specific. For example, the infer_experiment.py
script in RSeQC claims to do this (I have never used it myself, so I can't vouch for it).
Yep. The only other way I can think of is to check whether a strand-specific adaptor was used, but this normally gets stripped off the sequence before the user gets their hands on it (at least at our center).
Actually, I don't recall whether the TruSeq strand-specific adaptor is the same sequence as their other non-strand-specific counterparts, but then again I've never had to worry about checking for this. Seq centers we've worked with are normally pretty explicit in telling us what protocols and adaptors they use.
Hi
This image could help.
In the stranded example, reads are clearly stratified between the two strands.
Of course, you need to perform the alignments, get the BAM file, and visualize it in one of the available tools (SeqMonk, RNAseqViewer, IGB, etc.).
It might be easier to map to the transcriptome than the genome. Then you know you are mapping to the sense side.
Remember that certain protocols map the first read to the sense strand and the second read to the antisense. Others do it the reverse (the first read is antisense).
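To make the transcriptome idea concrete, here is a minimal sketch (not a tested pipeline) that tallies read orientation from SAM text. It assumes the reads were aligned to a transcriptome whose sequences are all in sense orientation, so a forward-mapped read 1 (or a reverse-mapped read 2) means read 1 is sense. The function names, the `tx1` transcript name, the demo records, and the 0.8 cutoff are all made up for illustration; only the SAM FLAG bits come from the SAM specification.

```python
from collections import Counter

# SAM FLAG bits (from the SAM specification)
UNMAPPED, REVERSE, READ2 = 0x4, 0x10, 0x80
SECONDARY, SUPPLEMENTARY = 0x100, 0x800


def tally_orientation(sam_lines):
    """Count fragments whose read 1 looks sense vs antisense.

    Assumes alignments against a sense-oriented transcriptome.
    Single-end reads (no READ2 bit) are treated like read 1.
    """
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # keep primary, mapped alignments only
        is_read2 = bool(flag & READ2)
        is_reverse = bool(flag & REVERSE)
        # read 1 forward or read 2 reverse -> read 1 is on the sense strand
        key = "read1_sense" if is_read2 == is_reverse else "read1_antisense"
        counts[key] += 1
    return counts


def call_strandedness(counts, threshold=0.8):
    """Turn the tally into a rough call; the threshold is an arbitrary assumption."""
    total = counts["read1_sense"] + counts["read1_antisense"]
    if total == 0:
        return "undetermined"
    frac_sense = counts["read1_sense"] / total
    if frac_sense >= threshold:
        return "stranded (read 1 sense)"
    if frac_sense <= 1 - threshold:
        return "stranded (read 1 antisense)"
    return "unstranded or ambiguous"


# Hypothetical demo records: two pairs with read 1 reverse (83) and
# read 2 forward (163), plus one unmapped read (77) that is ignored.
demo_sam = [
    "@SQ\tSN:tx1\tLN:1000",
    "frag1\t83\ttx1\t100\t60\t50M\t=\t10\t-140\t*\t*",
    "frag1\t163\ttx1\t10\t60\t50M\t=\t100\t140\t*\t*",
    "frag2\t83\ttx1\t300\t60\t50M\t=\t210\t-140\t*\t*",
    "frag2\t163\ttx1\t210\t60\t50M\t=\t300\t140\t*\t*",
    "frag3\t77\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]

if __name__ == "__main__":
    demo_counts = tally_orientation(demo_sam)
    print(dict(demo_counts), call_strandedness(demo_counts))
```

In practice you would feed it the text output of something like `samtools view` on a subsample of reads; a few hundred thousand alignments should be plenty to see which way the counts lean.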
Do you know the protocol that was used? You should be able to tell from that whether it is stranded. Joshua Levin has a paper from a couple years ago that compared a bunch of stranded protocols.
If you have a reference you could map to it to find out. There might be another way, but nothing else comes to mind.
I found that the salmon result can depend on whether the reference was assembled as strand-specific or not. I recommend one of the many great Trinity helper scripts for that; the patterns are very distinct. You can also check whether your reference (i.e. transcriptome) was assembled as non-stranded although you have stranded libraries :)
https://github.com/trinityrnaseq/trinityrnaseq/wiki/Examine-Strand-Specificity
Use this: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-022-04572-7
Disclaimer = I am not an expert and I would appreciate feedback. This got me information that was better than nothing.
Human page (for latest release) = https://www.gencodegenes.org/human/
salmon index -t gencode.v43.transcripts.fa.gz -i salmon_index --gencode
salmon quant --index=salmon_index --libType A --output delete_me \
-1 end1.fq.gz -2 end2.fq.gz
...
[2023-07-06 07:37:12.643] [jointLog] [info] Automatically detected most likely library type as IU
Ctrl+C (to stop salmon once the library type has been reported; Ctrl+Z would only suspend it)
rm -rf delete_me
Description of types = https://salmon.readthedocs.io/en/latest/library_type.html
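If you want to pull the detected type out of the log programmatically, something like the sketch below might do. It assumes the log wording matches the line shown above, and it decodes the code letters as described in the salmon library type documentation (I/O/M for relative read orientation, S/U for stranded/unstranded, F/R for which strand read 1 comes from). The function names are my own, not part of salmon.

```python
import re

# Matches the log wording shown above; other salmon versions may word it differently.
LOG_PATTERN = re.compile(r"most likely library type as ([A-Z]+)")

ORIENTATION = {"I": "inward", "O": "outward", "M": "matching"}
STRAND = {
    "F": "read 1 from the forward strand",
    "R": "read 1 from the reverse strand",
}


def detected_libtype(log_text):
    """Return the auto-detected library type code, or None if not found."""
    match = LOG_PATTERN.search(log_text)
    return match.group(1) if match else None


def describe_libtype(code):
    """Spell out a salmon library type code such as IU, ISR, SF or U."""
    parts = []
    rest = code
    # Paired-end codes start with the relative orientation of the mates.
    if rest and rest[0] in ORIENTATION:
        parts.append(ORIENTATION[rest[0]])
        rest = rest[1:]
    if rest == "U":
        parts.append("unstranded")
    elif len(rest) == 2 and rest[0] == "S" and rest[1] in STRAND:
        parts.append("stranded, " + STRAND[rest[1]])
    else:
        raise ValueError(f"unrecognised library type code: {code!r}")
    return ", ".join(parts)


if __name__ == "__main__":
    log_line = (
        "[2023-07-06 07:37:12.643] [jointLog] [info] "
        "Automatically detected most likely library type as IU"
    )
    code = detected_libtype(log_line)
    print(code, "=", describe_libtype(code))
```

So the IU in the log above reads as paired-end reads facing inward from an unstranded library, whereas ISR or ISF would indicate a stranded one.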
+1 for taking the time to post a more modern solution. I wonder if there is a way in Biostars to highlight answers like this.
Upvoting it and/or selecting it as the accepted answer is the way to go. Commenting as you did is also helpful. Bioinformatics changes rapidly, hence we have to be more proactive in marking the most recent correct answer.
There are some great answers posted already, but in case you want to learn more about strandedness, you can also check this previous post: Read pair orientation : Illumina TruSeq Stranded mRNA library
Salmon is not easy-to-use. Actually, it is impossible to run now as it requires an outdated version of boost (libboost_iostreams.so.1.60.0). This is not a feasible option anymore, so anybody looking at the above answer can ignore it.
This is untrue. Those curious, please see Rob Patro's reply to this assertion here and Dave Carlson's experience noted here.
Furthermore, I just installed salmon in an environment served via MyBinder where others can use it here (repo is here), and I specifically see
/srv/conda/envs/notebook/lib/libboost_iostreams.so.1.74.0
installed in the Ubuntu system. This is consistent with the current Bioconda recipe here specifying boost-cpp >=1.74.0.