Question

Set control vs experiment for gage/pathview!

0

Entering edit mode

11.2 years ago

Parham ★ 1.6k

I am following GAGE workflow (1). The sample data is from ArrayExpress and it contains a set of control and a set of experiment (KD1 and KD2). I quote from the workflow(1)

As an example, below is the code I used to map, index and process the read data for the first sample (ERR127302). You can write a shell script to do so for all samples (ERR127302-9).

The code is:

tophat2 -p 8 -o tophat_out_1 ref/hg19 ERR127302_1.fastq.gz ERR127302_2.fastq.gz

If I include all files in the mapping script above, won't they just be shuffled all together?!!! I couldn't figure out how should I set in the pathway the control vs the experiment? Any help is appreciated.

I am a new user to Biostars, so I only have 5 posts per day. I may not be able to reply to all answers but I am reading your comments.

Thanks!

Reference

RNA-Seq Data Pathway and Gene-set Analysis Workflows, Weijun Luo, February 6, 2014.

gage control-vs-experiment • 3.8k views

ADD COMMENT • link updated 3.8 years ago by Ram 45k • written 11.2 years ago by Parham ★ 1.6k

Ram · Answer 1 · 2014-06-12

1

Entering edit mode

11.2 years ago

Devon Ryan 105k

When Weijun Luo wrote that, he meant that the script would iterate over the samples, writing the output to different files, not that tophat2 should just be run once with all of the samples together (that would, indeed, make life difficult).

ADD COMMENT • link updated 5.6 years ago by Ram 45k • written 11.2 years ago by Devon Ryan 105k

0

Entering edit mode

I see! Then how do I set control and experiment in the command lines? I cannot understand by just reading commands, since I am very new to this field. Should I rename the accepted hits files to control or experiment?! So many stupid questions I have!

ADD REPLY • link 11.2 years ago by Parham ★ 1.6k

1

Entering edit mode

You needn't rename the BAM files that tophat produces, just use the -o option from tophat to put them in different directories. You end up specifying which samples belong to which group within R, when you run the gage() command (see section 3.4 of the workflow you're looking at). The idea is that you have normalized counts in a matrix (cnts.norm) with samples as columns. The column numbers for samples in the two groups are then provided as the ref= and samp= parameters.

ADD REPLY • link 11.2 years ago by Devon Ryan 105k

0

Entering edit mode

This line you mean?

cnts.kegg.p <- gage(cnts.norm, gsets = kegg.gs, ref = ref.idx, samp = samp.idx, compare ="unpaired")

So should I leave the ref.idx and samp.idx as they are or should I change them. The problem I am having is how does this matrix column numbers are set to ref and samp?

ADD REPLY • link updated 5.6 years ago by Ram 45k • written 11.2 years ago by Parham ★ 1.6k

0

Entering edit mode

Yes, that's the line. The actual indices for the samples will depend on how you make the matrices. In the workflow that you're looking at, the order of the samples is the order of the file names in fls in section 3.2. Personally, I'd use htseq-count or featureCounts and then just load the counts, but that's me. I'd also just export rlog2 counts from DESeq2 rather than bothering to do things manually, but that's rather outside the purview of that workflow.

ADD REPLY • link 11.2 years ago by Devon Ryan 105k