Using Dexseq To Compare Differential Exon Usage From Different Technical Replicates
1
1
Entering edit mode
11.4 years ago
alittleboy ▴ 220

Hi All:

I have a general question about testing differential exon usage using DEXSeq. Suppose I have a sample called SLR (from a cell line), and it is sequenced on 5 lanes, so I have BAM files like L1_SLR, L2_SLR, L3_SLR, L5_SLR and L7_SLR.bam. Here, the letter "L" denotes "lanes". In a typical RNA-Seq experiment, we should expect that reads from those 5 lanes have little variations.

Now, similarly, I have three technical replicates of the cell line SLR, let's call them TEC1, TEC2 and TEC3. For each technical replicate (say TEC1), it is also sequenced using 5 lanes. For example, I have L1_TEC1, L3_TEC1, L4_TEC1, L5_TEC1 and L7_TEC1. Similarly for TEC2 and TEC3.

Since TEC1, TEC2 and TEC3 are technical replicates of cell line SLR, ideally there should not be any differential exon usage detected. Our goal is to focus on these technical replicates and see if we'll obtain larger than expected amounts of differential exon usage. The tool we consider is DEXSeq. I looked at this manual here: http://bioconductor.org/packages/devel/data/experiment/vignettes/pasilla/inst/doc/create_objects.pdf. In Table 1, I think the author "concatenated" all the lanes (BAM files) into a single one for each replicate (treat1fb, etc), so we see "total" number of reads and exon counts. In my situation, can I know how to obtain

(1) the combined BAM file for SLR? Shall I cat the 5 lanes of BAM files? (after combination, I can get SAM file using samtools, and then use dexseq_count.py GFF_file \ SLR_sorted.sam SLR.txt)

(2) the GFF file is converted from GTF file using dexseq_prepare_annotation.py; can I know if the GTF file is just downloaded from Ensembl website (choose Homo Sapiens if my cell line is from human)?

(3) since my ultimate objective is to compare (maybe pairwise?) those technical replicates: SLR, TEC1, TEC2, TEC3, instead of the traditional situation of comparing biological sample 1 (with several tech reps) with biological sample 2 (with several tech reps). Would there be any problem? I can see that maybe comparing SLR vs. TEC1 is impossible (also for other pairwise comparisons) as there is no "replicates" of the replicate.

Thank you so much!

exon • 5.2k views
ADD COMMENT
1
Entering edit mode
11.4 years ago
venks ▴ 740

Hi,

Since one sample is run on five lanes I don't see any problem in concatenating all the BAM files. Once you have a single BAM file you can prepare your reference GFF file and run dexseq_Count.py. I used refseq from UCSC browser since Ensembl genome had many genes compared to UCSC (which would inturn get your DEXSEQ run fairly faster). Look for the counts in count file and try visualizing in IGV browser. From my understanding you wanted to do pairwise comparison but still I couldn't understand what you wanted to do? or what your looking for?

Good luck

ADD COMMENT
0
Entering edit mode

Thank you for the reply! I think you're right -- the lanes need to be combined so that a single sample has a BAM file. One more question: for preparation of GFF file, if I want to use Ensembl annotation, shall I download the GTF file and then pass the uncompressed gtf file to dexseq_prepare_annotation.py to get the GFF file? The GFF file should apply to that species (regardless of the BAM files for particular samples), am I right? Sorry this may be a naive question... Just wanna clarify ;-)

ADD REPLY
0
Entering edit mode

You might not even want to change the format to .GFF. You can still use prepare_annotation.py and retain it as .gtf and get the ECS. Yes, the ensembl .gtf file should be for respective species. Hope it helps. It might looks little tricky in the beginning but once you finish every step you will get the big picture of it.

ADD REPLY

Login before adding your answer.

Traffic: 2169 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6