I have a human total RNA library prepared with the Illumina TruSeq Stranded kit. My goal is to find novel and known non-coding transcripts in my data (experimental vs. control).
After a LOT of Google-fu and asking questions on this website, this is the methodology I am currently using:
- Align the FASTQ files to hg38 with STAR
- Assemble transcripts for each sample, merge the per-sample assemblies into a unified transcriptome that represents all samples, and estimate transcript abundances, all with StringTie (protocol paper: https://www.nature.com/articles/nprot.2016.095#procedure)
- Use tximport to infer integer counts from the StringTie transcript abundances and import them into DESeq2.
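
To make the STAR and StringTie steps above concrete, here is a rough sketch of the command lines, built in Python so they can be inspected before anything is run. It is not the exact set of commands from my runs. It assumes paired-end, gzipped FASTQ files, an existing STAR index, and that `--rf` is the right StringTie strand flag for this library (TruSeq Stranded is a dUTP-based, reverse-stranded protocol). All file names, directory names, and function names are hypothetical.

```python
import shlex

# Hypothetical helper functions: each returns one command as an argument list.
# Nothing is executed here; pass a list to subprocess.run(cmd, check=True) to run it.

def star_align_cmd(sample, fastq_r1, fastq_r2, genome_dir, threads=8):
    """STAR alignment to a prebuilt index, writing a coordinate-sorted BAM."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", fastq_r1, fastq_r2,   # assumes paired-end reads
        "--readFilesCommand", "zcat",          # assumes gzipped FASTQ
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", f"{sample}.",
    ]

def star_bam_path(sample):
    """Name STAR gives the sorted BAM for the prefix used above."""
    return f"{sample}.Aligned.sortedByCoord.out.bam"

def stringtie_assemble_cmd(sample, annotation_gtf, threads=8):
    """Per-sample assembly guided by the reference annotation."""
    return [
        "stringtie", star_bam_path(sample),
        "--rf",                                # assumption: reverse-stranded library
        "-p", str(threads),
        "-G", annotation_gtf,
        "-l", sample,
        "-o", f"{sample}.gtf",
    ]

def stringtie_merge_cmd(samples, annotation_gtf, merged_gtf="merged.gtf", threads=8):
    """Merge the per-sample assemblies into one unified transcriptome."""
    return ["stringtie", "--merge", "-p", str(threads),
            "-G", annotation_gtf, "-o", merged_gtf] + [f"{s}.gtf" for s in samples]

def stringtie_quant_cmd(sample, merged_gtf="merged.gtf", threads=8):
    """Re-estimate abundances against the merged transcripts only (-e),
    writing Ballgown-style tables (-B) into a per-sample directory."""
    return [
        "stringtie", star_bam_path(sample),
        "--rf", "-e", "-B",
        "-p", str(threads),
        "-G", merged_gtf,
        "-o", f"ballgown/{sample}/{sample}.gtf",
    ]

if __name__ == "__main__":
    samples = ["ctrl1", "exp1"]                # hypothetical sample names
    annotation = "gencode.hg38.gtf"            # hypothetical annotation file
    plan = []
    for s in samples:
        plan.append(star_align_cmd(s, f"{s}_R1.fastq.gz", f"{s}_R2.fastq.gz", "star_hg38"))
        plan.append(stringtie_assemble_cmd(s, annotation))
    plan.append(stringtie_merge_cmd(samples, annotation))
    plan.extend(stringtie_quant_cmd(s) for s in samples)
    for cmd in plan:
        print(shlex.join(cmd))
```

The per-sample `t_data.ctab` files written by the `-B` step are what tximport would then read (with `type = "stringtie"`) on the R side.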
I wanted to know whether this methodology makes sense. Is there any step for which a better method exists? I hope my question is not too broad, given that I do specify the exact pipeline I am employing :)
In a previous question (https://www.biostars.org/p/407788/), I learned that strand information does not matter when mapping with STAR.