When dealing with FASTQ files from different published datasets, which may have varying read lengths (e.g., 50 bp or 75 bp), is there a good way to normalize across them and your own data for comparison, either before or after alignment? Thanks.
I have not approached the problem via PCA yet, but that is a good idea as well. I had just assumed that when datasets from different sources are repurposed for one's own study (straight from the FASTQ files in GEO), the differing read lengths off the Illumina machine would present a coverage confound when compared to our 75 bp reads. Maybe I am misled in looking at it that way?
I would guess that, more than read length, other factors play the major role: different library prep methods, different platforms, batch/time effects, and so on. So it's better to look at a PCA plot annotated with all the available metadata to identify the major confounding factors, and then correct for them.
Is there a particular PCA tool you would use in this case?
Anything should work; it's just PCA. The only requirement is that you quantify all the samples and build a single counts matrix. You could then feed it into DESeq2, which has a built-in PCA function and many tutorials available.
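For a concrete starting point, here is a minimal sketch of that DESeq2 route. It assumes you have already quantified every sample (your own plus the repurposed GEO ones) into one counts matrix; the object names (`counts`, `coldata`) and metadata columns (`study`, `read_length`) are placeholders for your own data.

```r
# Minimal sketch, assuming:
#   counts  - genes x samples matrix of raw counts for ALL samples
#   coldata - data.frame of per-sample metadata (study, read_length,
#             library prep, platform, ...), rows matching colnames(counts)
library(DESeq2)

dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData   = coldata,
                              design    = ~ study)

# Variance-stabilize before PCA so a few highly expressed genes
# do not dominate the components
vsd <- vst(dds, blind = TRUE)

# Color the samples by each metadata column in turn to see which
# factor drives the main axes of variation (e.g., study/batch vs.
# read length)
plotPCA(vsd, intgroup = "study")
plotPCA(vsd, intgroup = "read_length")
```

If the samples separate by `study` (or library prep) rather than `read_length`, that would support the point above: the batch-level factors are the confounders to model, not the read length itself.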