How to convert strand-specific reads to a non-strand-specific format?
10.3 years ago
trakhtenberg

How could I prepare RNA-seq data generated with a strand-specific kit and with a non-strand-specific kit for a joint differential expression analysis (with TopHat/Cufflinks)? For example, is there a tool that could convert paired-end strand-specific FASTQ reads to a non-strand-specific paired-end format?

If this is possible, are there procedural or statistical considerations which could render differential expression analysis of this sort unreliable?

Thank you

RNA-Seq
10.3 years ago

There's no format difference between FASTQ files containing stranded and unstranded libraries. Strandedness comes from how the molecular biology of the library prep works, not from anything recorded in the file. If you need to combine the two, just record library type in the experimental design, since it's effectively a batch effect.
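
To illustrate the point about the format: a FASTQ record is four lines (a name line starting with `@`, the sequence, a `+` separator line, and the base qualities), with no field that encodes the library's strandedness. A minimal parsing sketch, assuming uncompressed four-line records; the reads are invented for illustration.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with four lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        plus = handle.readline()
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield FastqRecord(header[1:].rstrip("\n"), sequence, quality)


# The same record layout is used whether the library was stranded or not.
example = io.StringIO("@read1/1\nACGTAC\n+\nIIIIII\n@read2/1\nTTGCAA\n+\nIIIHHH\n")
for record in read_fastq(example):
    print(record)
```

Nothing in a record says which strand the original transcript came from; that information only emerges after alignment, from where read 1 and read 2 land relative to annotated genes.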

There are a few subtle and not-so-subtle differences: in strand-specific data, one of the paired-end files (for the TruSeq protocol, file 2) contains all the 5' ends of the transcripts and the other contains the 3' ends. In an unstranded protocol, you would expect an even distribution of these across the two files.
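
One way to see this difference is to check, for read 1 of each aligned pair, whether it lies on the same strand as the gene it overlaps. A minimal sketch, assuming the alignments have already been reduced to hypothetical `(mate, read_strand, gene_strand)` tuples; for real BAM files, RSeQC's `infer_experiment.py` performs a similar check. In a dUTP-style stranded library (such as TruSeq Stranded), read 1 is expected to be mostly antisense to the gene, while an unstranded library should give roughly 50/50.

```python
from collections import Counter


def read1_sense_fraction(alignments):
    """Fraction of read-1 alignments on the same strand as the overlapping gene.

    `alignments` is an iterable of (mate, read_strand, gene_strand) tuples,
    where mate is 1 or 2 and strands are '+' or '-'.
    """
    counts = Counter()
    for mate, read_strand, gene_strand in alignments:
        if mate != 1:
            continue  # only read 1 is inspected; its mate maps to the opposite strand
        counts["sense" if read_strand == gene_strand else "antisense"] += 1
    total = counts["sense"] + counts["antisense"]
    return counts["sense"] / total if total else float("nan")


# Toy data (invented): a stranded library where read 1 is always antisense,
# and an unstranded library where read 1 falls on either strand.
stranded = [(1, "-", "+"), (2, "+", "+"), (1, "+", "-"), (2, "-", "-")]
unstranded = [(1, "+", "+"), (2, "-", "+"), (1, "+", "-"), (2, "-", "-")]

print(read1_sense_fraction(stranded))    # 0.0
print(read1_sense_fraction(unstranded))  # 0.5
```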

To comment on the original poster's question: you really shouldn't want to transform stranded data into unstranded data. That sounds like a solution you expect to solve the problem, but it won't.

I assume the strand-specific data would have more unique reads than the non-strand-specific data. Conversely, the non-strand-specific data may get more hits for some annotated transcripts, because reads from the opposite strand may be counted toward them. So I thought it would be important to convert the strand-specific data to non-strand-specific before comparing these datasets (one is publicly available, so I just want to download it and compare it to my data). So, what should I do?
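
The concern about reads from the opposite strand can be illustrated with two hypothetical genes that overlap on opposite strands. A minimal sketch, assuming reads reduced to `(position, strand)` pairs already oriented to the original transcript strand, and genes to `(name, start, end, strand)` intervals. With unstranded counting, a read in the overlap cannot be attributed to either gene (depending on the tool, such reads are discarded or shared between genes); stranded counting resolves it. This shows only the counting logic, not how htseq-count or Cufflinks implement it.

```python
# Hypothetical genes overlapping on opposite strands: (name, start, end, strand).
GENES = [("geneA", 100, 500, "+"), ("geneB", 400, 900, "-")]


def assign(read_pos, read_strand, stranded):
    """Return the genes a read could be counted toward.

    read_strand is assumed to already be the strand of the original transcript
    (e.g. after flipping read 1 for a dUTP library).
    """
    hits = []
    for name, start, end, strand in GENES:
        if start <= read_pos < end and (not stranded or strand == read_strand):
            hits.append(name)
    return hits


def count(reads, stranded):
    counts = {name: 0 for name, *_ in GENES}
    counts["ambiguous"] = 0
    for pos, strand in reads:
        hits = assign(pos, strand, stranded)
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif len(hits) > 1:
            counts["ambiguous"] += 1  # read overlaps more than one candidate gene
    return counts


# Invented reads: two in geneA only, two in the overlap (one per strand), one in geneB only.
reads = [(150, "+"), (300, "+"), (450, "+"), (460, "-"), (800, "-")]

print(count(reads, stranded=False))  # {'geneA': 2, 'geneB': 1, 'ambiguous': 2}
print(count(reads, stranded=True))   # {'geneA': 3, 'geneB': 2, 'ambiguous': 0}
```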

Bad solution: randomly swap half of the reads between the two FASTQ files of the stranded dataset.
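
For completeness, that "bad solution" could look like the sketch below: for a random half of the fragments, read 1 and read 2 trade places, so each output file ends up with a mix of 5' and 3' ends. It assumes the pairs are already loaded in memory as hypothetical `(read1_record, read2_record)` tuples in matching file order. As the answer says, this is a bad solution: it throws information away rather than making the datasets comparable.

```python
import random


def swap_half_mates(pairs, seed=0):
    """Swap read 1 and read 2 for a randomly chosen half of the fragments.

    `pairs` is a list of (read1_record, read2_record) tuples, one per fragment,
    in the same order as the two FASTQ files. Returns two lists of records.
    """
    rng = random.Random(seed)
    to_swap = set(rng.sample(range(len(pairs)), len(pairs) // 2))
    out1, out2 = [], []
    for i, (r1, r2) in enumerate(pairs):
        if i in to_swap:
            r1, r2 = r2, r1  # swap the mates of this fragment together, keeping pairing intact
        out1.append(r1)
        out2.append(r2)
    return out1, out2


# Usage with toy records (invented): each record is just a read name here.
pairs = [(f"frag{i}/1", f"frag{i}/2") for i in range(10)]
file1, file2 = swap_half_mates(pairs)
print(sum(name.endswith("/2") for name in file1))  # 5
```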

Better solution: redo the sequencing of both treatment groups at the same time since, aside from the stranded vs. unstranded issue, you'll still have a batch effect between the datasets that you're not accounting for.
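
A batch effect cannot be modeled away when the batch coincides with the biological groups, for example if every sample of one condition comes from the public stranded dataset and every sample of the other from unstranded in-house data. A minimal sketch with a hypothetical four-sample sheet: an additive design (intercept, condition, library type, roughly what a formula like `~ batch + condition` expresses) loses rank in the confounded case, so no linear model can separate the two effects.

```python
import numpy as np


def design(conditions, batches):
    """Intercept + condition indicator + batch indicator, one row per sample."""
    return np.column_stack([
        np.ones(len(conditions)),
        [1.0 if c == "treated" else 0.0 for c in conditions],
        [1.0 if b == "stranded" else 0.0 for b in batches],
    ])


conditions = ["control", "control", "treated", "treated"]

# Confounded: every treated sample is stranded, every control is unstranded.
confounded = design(conditions, ["unstranded", "unstranded", "stranded", "stranded"])
# Balanced: both library types appear in both conditions.
balanced = design(conditions, ["unstranded", "stranded", "unstranded", "stranded"])

print(np.linalg.matrix_rank(confounded))  # 2: condition and batch columns are identical
print(np.linalg.matrix_rank(balanced))    # 3: the effects can be estimated separately
```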

You can't make data more "comparable" by destroying some of its attributes. Follow through with the analysis for each dataset independently, then compare the results.
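
Comparing the results of two independent analyses could, for example, mean comparing the sets of genes each one calls significant and checking whether their fold changes agree in direction. A minimal sketch, assuming each analysis has been reduced to a hypothetical dict mapping gene to `(log2 fold change, adjusted p-value)`; the gene names, values, and the 0.05 cutoff are invented for illustration.

```python
def significant(results, alpha=0.05):
    """Genes whose adjusted p-value is below alpha."""
    return {gene for gene, (_, padj) in results.items() if padj < alpha}


def compare(results_a, results_b, alpha=0.05):
    sig_a, sig_b = significant(results_a, alpha), significant(results_b, alpha)
    shared = sig_a & sig_b
    # Shared genes that change in the same direction (both up or both down).
    concordant = {g for g in shared if (results_a[g][0] > 0) == (results_b[g][0] > 0)}
    union = sig_a | sig_b
    return {
        "only_a": sorted(sig_a - sig_b),
        "only_b": sorted(sig_b - sig_a),
        "shared": sorted(shared),
        "concordant": sorted(concordant),
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Invented results from a stranded and an unstranded analysis.
stranded = {"g1": (2.1, 0.001), "g2": (-1.5, 0.01), "g3": (0.8, 0.2), "g4": (1.2, 0.03)}
unstranded = {"g1": (1.9, 0.002), "g2": (0.7, 0.04), "g3": (1.1, 0.01), "g5": (-2.0, 0.001)}

print(compare(stranded, unstranded))
# {'only_a': ['g4'], 'only_b': ['g3', 'g5'], 'shared': ['g1', 'g2'], 'concordant': ['g1'], 'jaccard': 0.4}
```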
