Dear community,
I am currently working on differential expression in transcripts of vitis vinifera. I aligned several RNAseq experiments on the transcripts sequences of the vitis reference genome using bwa and computed fpkm and effective reads counts with the software express. I ran the deseq pipeline but there is some part that is not clear to me. I am working on two experiments with 2 replicates each. My coldata file (named coldata.txt) is the following:
condition type
CAN3FI2_exp.txt untreated paired-end
CAN3FI3_exp.txt untreated paired-end
CAN3NI2_exp.txt treated paired-end
CAN3NI3_exp.txt treated paired-end
The first rows of my expression (reads count) input table (named expressionTable.txt) are the following:
CAN3FI2_exp.txt CAN3FI3_exp.txt CAN3NI2_exp.txt CAN3NI3_exp.txt
678 1003 598 488
40 87 25 15
9 12 7 7
914 1028 1063 856
0 0 0 0
48 74 31 25
221 281 181 140
776 1225 664 539
0 0 0 0
finally the first rows of the rownames file (named rownames.txt) are the following:
chr5.gff3_MRNA_VIT_05s0020g02570.t01
chrUn.gff3_MRNA_VIT_00s0144g00290.t01
chr15.gff3_MRNA_VIT_15s0048g00090.t01
chr11.gff3_MRNA_VIT_11s0118g00830.t01
chr5.gff3_MRNA_VIT_05s0165g00290.t01
chr9.gff3_MRNA_VIT_09s0070g00570.t01
chr18.gff3_MRNA_VIT_18s0001g07490.t01
chr12.gff3_MRNA_VIT_12s0035g01940.t01
chr16.gff3_MRNA_VIT_16s0022g02170.t01
chr3.gff3_MRNA_VIT_03s0038g02130.t01
I run the following pipeline to compute the genes that are differentially expressed:
library("DESeq2")
expSet <- read.table("expressionTable.txt",header=T)
rownames <- unlist(read.table("rownames.txt"))
row.names(expSet) <- rownames
coldata <- read.table("coldata.txt",header=T)
cts <- as.matrix(expSet)
dds <- DESeqDataSetFromMatrix(countData=cts,colData=coldata, design=~ condition)
dds <- DESeq(dds)
res <- results(dds)
write.table(res,"expDiff.txt")
ntd <- normTransform(dds)
write.table(assay(ntd),"normalizedExpressions.txt")
If I consider the transcript named chr16.gff3_MRNA_VIT_16s0100g00990.t01 and I look for it in the produced expDiff.txt file I find:
"chr16.gff3_MRNA_VIT_16s0100g00990.t01" 32.3186500941916 1.09546846712526 0.29492459931321 3.71440181550224 0.000203684932436758 0.00480523551483428
As you can see the log2Fold change value is 1.095; this means that the treated sample is more highly expressed than the untreated. If I look for that gene in the rowname.txt file in order to get the line number I obtain the following:
grep -nr "VIT_16s0100g00990" rownames.txt
28170:chr16.gff3_MRNA_VIT_16s0100g00990.t01
Since the expressionTable.txt file has also the field names line, I used the following command to retrieve the reads counts from it corresponding to the transcript chr16.gff3_MRNA_VIT_16s0100g00990.t01
sed -n 28171p expressionTable.txt
19 162 0 0
My question is the following: if the treated sample have zero count in both the replicates, how can it be more highly expressed than the untreated as reported in the expDiff file? I thought it was a matter of normalization and for this reason I used the function normTransform to have the normalized data output in the file normalizedData.txt. However if I look for this transcript in this file I get the following:
grep "chr16.gff3_MRNA_VIT_16s0100g00990.t01" normalizedExpressions.txt
"chr16.gff3_MRNA_VIT_16s0100g00990.t01" 4.29263896969935 6.80318233956259 0 0
So even here the treated have zero count. Am I doing something wrong? Sorry to bother you with a so long e-mail but I preferred to be as detailed as possible in order to give you the possibility to have a precise picture of what I am doing
Thanks a lot for your support
Cross-posting, especially without noting this, is discouraged (duplicates effort of the volunteers answering questions).
https://support.bioconductor.org/p/102962/