Hello,
I have downloaded TCGA raw counts (gene quantification), which were generated with the STAR 2-pass method in their mRNA sequencing pipeline, and I am comparing these data to a second dataset. For the second dataset I had BAM files, from which I generated my own R1 and R2 FASTQ files using Picard. I am using the same 2-pass method for alignment. Is it acceptable to compare datasets with slightly different pre-alignment QC? My second dataset had some PCR duplication, so I will be using Picard to mark duplicates, along with Trimmomatic. However, the TCGA pre-alignment did not do this (I assume QC did not reveal any issues, although there is no paper associated with the Illumina work). Everything post-alignment will be the same. Is this okay?
I did look into Biobambam (I was unfamiliar with it) and have read that its output is slightly different but similar. The paper seems to focus on run time more than on differences in results.
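In case it helps, here is a rough sketch of the steps I described for my second dataset, written as a small Python helper that only assembles the command lines and does not run anything. The function name, the file names, and the `picard.jar` path are placeholders made up for illustration. The STAR options are a minimal set and are not the parameters TCGA actually used. Trimming is left out, and in this sketch duplicate marking is applied to the aligned BAM.

```python
import shlex


def build_commands(bam, sample, genome_dir, picard_jar="picard.jar"):
    """Assemble (but do not run) the command lines for one sample.

    All file names and paths here are hypothetical placeholders.
    """
    r1 = f"{sample}_R1.fastq"
    r2 = f"{sample}_R2.fastq"

    # Step 1: convert the paired-end BAM back to R1/R2 FASTQ with Picard.
    bam_to_fastq = [
        "java", "-jar", picard_jar, "SamToFastq",
        f"I={bam}", f"FASTQ={r1}", f"SECOND_END_FASTQ={r2}",
    ]

    # Step 2: align with STAR in 2-pass mode and emit per-gene counts.
    # Minimal options only; assumed, not the TCGA settings.
    star = [
        "STAR", "--runMode", "alignReads",
        "--genomeDir", genome_dir,
        "--readFilesIn", r1, r2,
        "--twopassMode", "Basic",
        "--quantMode", "GeneCounts",
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", f"{sample}.",
    ]

    # Step 3: mark (not remove) duplicates in the coordinate-sorted BAM.
    mark_duplicates = [
        "java", "-jar", picard_jar, "MarkDuplicates",
        f"I={sample}.Aligned.sortedByCoord.out.bam",
        f"O={sample}.markdup.bam",
        f"M={sample}.markdup_metrics.txt",
    ]

    return [bam_to_fastq, star, mark_duplicates]


if __name__ == "__main__":
    for cmd in build_commands("sample1.bam", "sample1", "star_index"):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to compare them against the settings documented for the TCGA pipeline before anything is run.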
Kaylin
Hmmm...okay. I am a graduate student and still learning. I will be using DESeq2 for normalization, and it will account for some of the things mentioned. I am including batch information. This has been racking my brain...and I want to make sure the scientific rigor is there.
Regarding your comment on PCR duplicates: yes, I have come across that in my reading. Is there an alternative specific to RNA sequencing that addresses this at the QC stage, or could it be addressed in the normalization? DESeq2 doesn't directly account for this, based on what I have read. :(