Normalizing Illumina fastq read-lengths from different GEO datasets
7.3 years ago
rbronste ▴ 420

When working with FASTQ files from different published datasets, which may have different read lengths (e.g. 50 bp or 75 bp), is there a good way to normalize across them for comparison with your own data, either before or after alignment? Thanks.

GEO fastq RNA-Seq • 2.1k views
7.3 years ago

Do you see any systematic bias associated with read length, for example in a PCA? If so, you can apply batch correction methods.


I have not approached the problem via PCA yet, but that is a good idea. I just assumed that when datasets from different sources are repurposed for one's own study (starting from the FASTQ files in GEO), the differing read lengths off the Illumina machine would introduce some coverage confound when compared to our 75 bp reads. Maybe I am misled in looking at it that way?


I would guess that, more than read length, other factors play a major role: different library prep methods, different platforms, time, etc. So it is better to look at the PCA plot with all the available metadata to identify the major confounding factors and correct for them.
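
One way to read "identify the major confounding factors" is to ask how much of a principal component's variance each recorded covariate explains. The sketch below, in Python with NumPy, computes that fraction (eta squared, from a one-way ANOVA decomposition) for a vector of PC scores and one categorical label per sample. The function name `variance_explained_by`, the scores, and the metadata are hypothetical illustrations, not values from any real dataset.

```python
import numpy as np

def variance_explained_by(scores, labels):
    """Fraction of the variance in `scores` explained by a categorical label
    (eta squared: between-group sum of squares / total sum of squares)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    grand_mean = scores.mean()
    total_ss = np.sum((scores - grand_mean) ** 2)
    if total_ss == 0:
        return 0.0
    between_ss = 0.0
    for level in np.unique(labels):
        group = scores[labels == level]
        between_ss += group.size * (group.mean() - grand_mean) ** 2
    return between_ss / total_ss

# Hypothetical PC1 scores for six samples and two recorded covariates.
pc1 = np.array([-2.1, -1.9, -2.0, 2.0, 1.8, 2.2])
metadata = {
    "read_length": ["75bp", "75bp", "75bp", "50bp", "50bp", "50bp"],
    "library_prep": ["polyA", "ribo", "polyA", "ribo", "polyA", "ribo"],
}
for name, labels in metadata.items():
    print(f"{name}: {variance_explained_by(pc1, labels):.2f}")
```

Note that if read length is perfectly confounded with another factor (for example, every 50 bp sample comes from the same lab), this kind of summary cannot tell the two apart.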


Is there a particular PCA tool you would use in this case?


Anything should work; it's just PCA. The only requirement is that you quantify all the samples and build the count matrix. You could feed that into DESeq2, which has a PCA function and many tutorials available.
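
For anyone who wants to see what "just PCA" on a count matrix looks like, here is a minimal sketch in Python with NumPy. It assumes a samples-by-genes matrix of raw counts has already been produced by quantification. The function name `pca_log_cpm`, the sample names, and the toy data are all hypothetical. It does not reproduce DESeq2's PCA, which runs in R on transformed counts; here the counts are simply scaled to counts per million, log-transformed, centered per gene, and decomposed by SVD.

```python
import numpy as np

def pca_log_cpm(counts, n_components=2):
    """PCA on log2(CPM + 1) values.

    counts: array of shape (n_samples, n_genes) with raw read counts.
    Returns (scores, explained_variance_ratio).
    """
    counts = np.asarray(counts, dtype=float)
    # Scale each sample to counts per million to remove library-size differences.
    lib_sizes = counts.sum(axis=1, keepdims=True)
    cpm = counts / lib_sizes * 1e6
    logged = np.log2(cpm + 1.0)
    # Center each gene so the PCA captures variation between samples.
    centered = logged - logged.mean(axis=0, keepdims=True)
    # SVD of the centered matrix: columns of u scaled by s are the sample scores.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

# Toy data (made up): 6 samples x 200 genes. The last three samples stand in
# for a second dataset with a systematic shift in half of the genes.
rng = np.random.default_rng(0)
toy_counts = rng.poisson(lam=50, size=(6, 200)).astype(float)
toy_counts[3:, :100] *= 4
sample_names = ["own_1", "own_2", "own_3", "geo_1", "geo_2", "geo_3"]

scores, explained = pca_log_cpm(toy_counts)
for name, (pc1, pc2) in zip(sample_names, scores):
    print(f"{name}\tPC1={pc1:.2f}\tPC2={pc2:.2f}")
print("variance explained:", np.round(explained, 3))
```

If samples separate along PC1 by dataset of origin rather than by biological condition, that points to a batch effect worth correcting before comparison.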
