Question

nmt1 gene present in Tuxedo output both not DESeq!

0

Entering edit mode

11.0 years ago

Parham ★ 1.6k

Hi,

I analysed my RNA-seq data for S. pombe once with Tuxedo pipeline and once with DESeq! Surprisingly I see one of the genes which is important for my experiment present in Tuxedo gene_exp.diff output but not in deseq.res!

In gene_exp.diff I get nmt1 III:1837498-1844115 but when I look for it on pombase it has other coordination starting 1838335 and ending at 1839525 and I don't see this gene in deseq.res output at all. Am I missing something with annotations or there some other thing that I am not considering?

Your help is much appreciated.

/Parham

DESeq Tuxedo • 2.7k views

ADD COMMENT • link updated 3.6 years ago by Ram 45k • written 11.0 years ago by Parham ★ 1.6k

0

Entering edit mode

I am doing this from the RNA-seq workflow. I have all my accepted_hits.bam and accepted_hits.bam.bai in tophat_all. Does it help? Please let me know if you need more info!

> library(GenomicAlignments)
> fls <- list.files("tophat_all/", pattern="bam$", full.names =T)
> bamfls <- BamFileList(fls)
> flag <- scanBamFlag(isNotPrimaryRead=FALSE)
> param <- ScanBamParam(flag=flag)
> gnCnt <- summarizeOverlaps(exByGn, bamfls, mode="Union",
+ ignore.strand=TRUE, param=param)
> cnts=assay(gnCnt)

> dim(cnts)
[1] 7017    6
> sel.rn=rowSums(cnts) !=0
> cnts=cnts[sel.rn,]
> dim(cnts)
[1] 6097    6

ADD REPLY • link updated 3.6 years ago by Ram 45k • written 11.0 years ago by Parham ★ 1.6k

0

Entering edit mode

Yup, that helps. I would guess that either exByGn doesn't contain nmt1 or for some reason the counts for it are 0 and it's getting filtered when you subset cnts. Both of those are things you should be able to check.

ADD REPLY • link 11.0 years ago by Devon Ryan 105k

0

Entering edit mode

I could check cnts and it had zero value for all bam files! So it must be exByGn not containing nmt1 as you said, but I couldn't figure out how to check it. Would you give me a hint on that?

ADD REPLY • link updated 3.6 years ago by Ram 45k • written 11.0 years ago by Parham ★ 1.6k

0

Entering edit mode

If it's all 0 in cnts then it'd be easier to just redo the counting with featureCounts.

ADD REPLY • link 11.0 years ago by Devon Ryan 105k

0

Entering edit mode

Alright I did it again with featureCounts. Thanks for suggesting it! But here it still gives all zero for all BAM mapped reads! Here is counts.txt output and counts.txt.summary. Since I never used it before I am writing the code I used for this purpose. If you could kindly have a look on that if I am doing something wrong.

$ featureCounts -a genes.gtf -t exon -g gene_id -o counts.txt 1.bam 2.bam 3.bam 4.bam 5.bam 6.bam

ADD REPLY • link updated 3.6 years ago by Ram 45k • written 11.0 years ago by Parham ★ 1.6k

0

Entering edit mode

This is due to the ncRNA on the opposite strand. Since you have a stranded protocol, it's impossible to discriminate between reads mapping to the ncRNA and nmt1 (well, you can and do get reads assigned to that ncRNA, but that's because part of it doesn't overlap another feature). I'm guessing that cufflinks is either ignoring the ncRNA or its EM algorithm prioritizes against it (or perhaps it uses coverage in gauging which gene to assign counts to, though I could see that back firing).

ADD REPLY • link 11.0 years ago by Devon Ryan 105k

0

Entering edit mode

When I was running Tuxedo pipeline I was giving --library-type -frsecondstrand for all steps! Does that explain it?

ADD REPLY • link 11.0 years ago by Parham ★ 1.6k

0

Entering edit mode

It does. If these really are stranded libraries then tell summarizeOverlaps() or featureCounts that.

ADD REPLY • link 11.0 years ago by Devon Ryan 105k

0

Entering edit mode

Great now it has reads and makes sense! Thanks a lot!

ADD REPLY • link 11.0 years ago by Parham ★ 1.6k

Ram · Accepted Answer · 2014-08-04

1

Entering edit mode

11.0 years ago

Devon Ryan 105k

That cufflinks changes a gene's coordinates is rather expected, since that's partly its purpose. The output of DESeq is dependent upon what you use as input. So check if nmt1 is in there. BTW, you should use DESeq2 rather than DESeq if you aren't already (though in this case it'd be possible for nmt1 to get filtered out).

ADD COMMENT • link 11.0 years ago by Devon Ryan 105k

0

Entering edit mode

Thanks Devon, however I don't understand "what you use as input"! I have the same mapped reads as I used for cufflinks and yes I see the nmt1 in there (IGV). What am I doing wrong?!

And regarding DESeq2, so you are saying that DESeq can filter out nmt1 but not DESeq2? A bit of explanation on that is much appreciated.

Regards,

ADD REPLY • link updated 3.6 years ago by Ram 45k • written 11.0 years ago by Parham ★ 1.6k

0

Entering edit mode

Regarding DESeq2, no I mean that it could filter nmt1 out while DESeq wouldn't, though DESeq2 is the better choice for you anyway.

For "what you use as input", I mean the output from either htseq-count or featureCounts (or whatever else you used) that you used to get counts for DESeq. The first thing to check is if nmt1 exists in those files (if not, well that's your problem).

ADD REPLY • link 11.0 years ago by Devon Ryan 105k

0

Entering edit mode

I see! I used accepted_hits.bam out of TopHat for both DESeq and DESeq2, but in none of them I see nmt1! However I see the gene perfectly mapped when I look it up in IGV as above! Do you know what can the problem be?

ADD REPLY • link 11.0 years ago by Parham ★ 1.6k

0

Entering edit mode

Well you're not using a BAM file as input, you wouldn't get any meaningful results. Show the first few lines of code that you're using in R in order to get counts ready for DESeq/DESeq2 and then I can offer more insight.

ADD REPLY • link 11.0 years ago by Devon Ryan 105k

0

Entering edit mode

Sorry I replied to the original post by mistake. Please look below!

ADD REPLY • link updated 3.6 years ago by Ram 45k • written 11.0 years ago by Parham ★ 1.6k