Question

Different results with the same input [RNAseq analysis]

3.3 years ago

de.mecquenem.ninon ▴ 10

Hello!

I am trying to optimize the processing of some RNAseq files by splitting the input reads into several files. I am comparing the results obtained with:

the reads input as a single file
the reads split into several files processed in parallel (I merge the SAM files after alignment)
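
To make the second setup concrete, here is a minimal sketch of how the splitting step could look. It is an illustration only, not the exact procedure used: the function name `split_fastq`, the chunk naming scheme, and the assumption of uncompressed FASTQ with strict 4-line records are all hypothetical.

```python
import itertools
from pathlib import Path


def split_fastq(path, records_per_chunk, out_prefix):
    """Split a FASTQ file into chunks of `records_per_chunk` records.

    Assumes plain-text FASTQ with exactly 4 lines per record.
    Returns the chunk paths in the order they were written.
    """
    chunk_paths = []
    with open(path) as handle:
        for index in itertools.count():
            # Read the next block of whole records (4 lines each).
            lines = list(itertools.islice(handle, 4 * records_per_chunk))
            if not lines:
                break
            chunk_path = Path(f"{out_prefix}.part{index:03d}.fastq")
            chunk_path.write_text("".join(lines))
            chunk_paths.append(chunk_path)
    return chunk_paths
```

For paired-end data, calling this with the same `records_per_chunk` on the R1 and R2 files keeps mates in matching chunks, provided both files list the reads in the same order.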

I align with STAR, then assemble the transcriptome with Cufflinks.

On one sample (paired-end, around 2Gb per file), I get the following FPKM differences for the Inpp4a gene (left value: FPKM from the whole file; right value: FPKM from the split files):

Inpp4a|XM_006496019.3: 11.08, 9.37
Inpp4a|NM_030266.4: 5.11, 3.67
Inpp4a|NM_001374630.1: 1.06, 4.18
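
To put these numbers side by side, here is a small sketch that computes the per-isoform difference and each isoform's share of the summed FPKM. The values are copied from the list above; treating the sum of these three isoforms as a gene-level total is an assumption that only holds if no other Inpp4a isoform is reported.

```python
# FPKM values copied from the list above: (whole file, split files).
fpkm = {
    "XM_006496019.3": (11.08, 9.37),
    "NM_030266.4": (5.11, 3.67),
    "NM_001374630.1": (1.06, 4.18),
}


def summarize(fpkm):
    """Return the summed FPKM per run and per-isoform differences and shares."""
    total_whole = sum(whole for whole, _ in fpkm.values())
    total_split = sum(split for _, split in fpkm.values())
    rows = {}
    for isoform, (whole, split) in fpkm.items():
        rows[isoform] = {
            "abs_diff": split - whole,
            "share_whole": whole / total_whole,
            "share_split": split / total_split,
        }
    return total_whole, total_split, rows


total_whole, total_split, rows = summarize(fpkm)
print(f"sum over isoforms: whole={total_whole:.2f} split={total_split:.2f}")
for isoform, row in rows.items():
    print(f"{isoform}: diff={row['abs_diff']:+.2f}, "
          f"share {row['share_whole']:.1%} -> {row['share_split']:.1%}")
```

The sums come out at 17.25 and 17.22, so the total over these three isoforms barely changes; the difference lies mostly in how the expression is distributed among them.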

I used bamCompare from deepTools to understand the difference between the two alignments on this gene (NC_000067.7, 37338000->37450000), and the difference (--operation subtract) is less than 0.05 over this region.

From your experience, would you consider these FPKM values different? I do, because Cufflinks provides a confidence interval for each FPKM and the second value lies outside that interval.
I would like help understanding which factors could cause this difference and what could be done to fix it.
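
For reference, the confidence-interval check described above can be sketched as follows. It assumes the standard Cufflinks `isoforms.fpkm_tracking` columns (`tracking_id`, `FPKM`, `FPKM_conf_lo`, `FPKM_conf_hi`), that both runs report the same transcript IDs, and that the interval from the whole-file run is the reference. The helper names and file paths are hypothetical.

```python
import csv


def load_fpkm(handle):
    """Map tracking_id -> (FPKM, conf_lo, conf_hi) from a tab-separated
    Cufflinks *.fpkm_tracking file opened as `handle`."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row["tracking_id"]: (
            float(row["FPKM"]),
            float(row["FPKM_conf_lo"]),
            float(row["FPKM_conf_hi"]),
        )
        for row in reader
    }


def outside_interval(reference, other):
    """Return the ids whose FPKM in `other` lies outside the
    confidence interval reported for the same id in `reference`."""
    flagged = []
    for tracking_id, (_, lo, hi) in reference.items():
        if tracking_id in other:
            value = other[tracking_id][0]
            if not lo <= value <= hi:
                flagged.append(tracking_id)
    return flagged


# Hypothetical paths; adjust to the actual Cufflinks output directories.
# with open("whole/isoforms.fpkm_tracking") as a, \
#         open("split/isoforms.fpkm_tracking") as b:
#     print(outside_interval(load_fpkm(a), load_fpkm(b)))
```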

Any leads or references are highly appreciated!

Thank you very much!

RNAseq split cufflinks STAR
