I am an intern and I was assigned a small project to conduct independently. It involves looking at retrotransposon levels in cancer samples from TCGA.
These samples are paired-end (PE) tumor/matched normal. I understand the RNA-seq workflow, but I am unsure whether I need to look at the PE files or the singleton files after alignment.
TopHat2 was able to align the PE reads with their singletons included, but STAR cannot do this. Alex Dobin (creator of STAR) suggested that I align the singletons separately.
I am under the impression that the aligned PE files are what I should be interested in, but are the singletons equally important, especially for looking at retrotransposons?
Are there any tutorials or just any recommended literature that would help me understand the concepts and workflow a little better?
Thank you very much for your time and help.
In practice, aligning singletons doesn't normally gain you much. It should only take a couple of minutes (you can leave the index loaded in memory with STAR), but it's unlikely to change your results unless one mate in many of the pairs was of such poor quality that you ended up with an unusually high number of singletons (a normal singleton rate is at most a few percent, typically <1%).
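If you want to check where your own samples fall relative to that <1% figure, a rough sketch like the following would do. It assumes plain 4-line FASTQ records (optionally gzipped) and defines the singleton rate as singletons divided by all reads (both mates of every pair plus the singletons); the function names and file names are hypothetical, not part of any tool.

```python
import gzip


def count_fastq_records(path):
    """Count reads in a FASTQ file, assuming exactly 4 lines per record."""
    # Assumption: gzipped files end in ".gz"; anything else is plain text.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def singleton_fraction(n_pairs, n_singletons):
    """Fraction of all reads that are singletons (orphaned mates)."""
    # Each pair contributes two reads; each singleton contributes one.
    total_reads = 2 * n_pairs + n_singletons
    return n_singletons / total_reads if total_reads else 0.0


# Example usage with hypothetical file names:
# n_pairs = count_fastq_records("sample_R1.paired.fastq.gz")
# n_single = count_fastq_records("sample.singletons.fastq.gz")
# print(f"singleton rate: {singleton_fraction(n_pairs, n_single):.2%}")
```

If the rate comes out well above a few percent, that is the situation where aligning the singletons separately is more likely to be worth the extra step.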
Right, and I have been getting only small numbers of singletons. For looking at retrotransposons, does it matter if I use a reference genome (GRCh38) that includes the random/unplaced contigs (sequences other than chr1-22, X, Y, M)?
I would prefer keeping the unplaced contigs for this purpose, though I wouldn't include the alternate haplotype contigs unless you happen to be using bwa.
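If your GRCh38 FASTA already contains everything, one way to apply that advice is to filter it before building the index. The sketch below assumes UCSC-style contig names, where alternate haplotype contigs end in `_alt` while unplaced/unlocalized ones look like `chrUn_...` or `..._random`; Ensembl or NCBI naming would need a different rule. The function names are hypothetical.

```python
def keep_contig(name):
    """Keep primary chromosomes and unplaced/random contigs; drop ALT contigs."""
    # Assumption: UCSC-style hg38 names, where alternate haplotypes end in "_alt".
    return not name.endswith("_alt")


def filter_fasta(lines, keep=keep_contig):
    """Yield only the FASTA lines belonging to contigs for which keep(name) is True."""
    writing = False
    for line in lines:
        if line.startswith(">"):
            # The contig name is the first whitespace-delimited token after ">".
            fields = line[1:].split()
            name = fields[0] if fields else ""
            writing = keep(name)
        if writing:
            yield line


# Example usage with hypothetical file names:
# with open("GRCh38.full.fa") as src, open("GRCh38.no_alt.fa", "w") as dst:
#     dst.writelines(filter_fasta(src))
```

The filter streams line by line, so it does not need to hold the whole genome in memory.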