7.9 years ago
Kritika
Hello all. I recently received human RNA-seq samples for differential expression analysis. I used htseq-count and Cufflinks for abundance estimation, and DESeq and Cuffdiff for differential expression analysis. In the output of both DESeq and Cuffdiff I am getting 63,000 genes, which does not seem possible when humans have 30,000 genes in their genome. I used the GRCh38 reference and the GRCh38 GTF downloaded from Ensembl. I cannot figure out where I am going wrong. Please help!
63,000 genes or transcripts? If you provide details on how you ran each step of your pipeline, it will be easier to determine what went wrong. For example, if you received the input as BAM files and they were aligned to references with a different chromosome order, I would not be surprised if every gene appeared to be differentially expressed.
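One way to check the chromosome-order point is to compare the `@SQ` lines of the BAM headers. The sketch below assumes you have already dumped each header to text (for example with `samtools view -H`); the function names `sq_names` and `same_reference_order` are made up for illustration and are not part of any tool.

```python
def sq_names(header_text):
    """Return the reference sequence names (SN: fields) in header order."""
    names = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue  # skip @HD, @PG, @RG and other header records
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def same_reference_order(header_a, header_b):
    """True if both headers list the same sequences in the same order."""
    return sq_names(header_a) == sq_names(header_b)
```

If the two lists differ only in order, or only by a `chr` prefix, that points to the files having been aligned against differently prepared references.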
Hi, I ran it on the accepted.bam files that I got from TopHat, which I ran with --b2-very-sensitive. This BAM file is not sorted.
Is your data paired end? The htseq FAQ states
Yes, it is paired-end data.
But after running Cufflinks on the sorted BAM file, I am getting the same result.
Can you post commands you used for htseq-count and DESeq2?
deseqcommand
But this time I used the hg19 reference and GTF. With both versions I was initially getting 63,000 genes. This time I kept only the protein-coding genes in hg19.gtf, and with that I got 27,000 genes for all samples. I am not sure whether this is correct.
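To see where the gene total comes from, it can help to count the gene records in the GTF by biotype before filtering. This is only a rough sketch: it assumes an Ensembl-style GTF where `gene` lines carry a `gene_biotype "..."` attribute (other annotation sources may name it differently or omit it), and the function names are invented for this example.

```python
import re
from collections import Counter

# Assumption: Ensembl-style attribute, e.g. gene_biotype "protein_coding";
BIOTYPE_RE = re.compile(r'gene_biotype "([^"]+)"')


def count_gene_biotypes(gtf_lines):
    """Count 'gene' feature lines per biotype."""
    counts = Counter()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only count gene records, not transcripts or exons
        match = BIOTYPE_RE.search(fields[8])
        counts[match.group(1) if match else "unknown"] += 1
    return counts


def keep_protein_coding(gtf_lines):
    """Yield comment lines and all records annotated as protein_coding."""
    for line in gtf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        match = BIOTYPE_RE.search(fields[8])
        if match and match.group(1) == "protein_coding":
            yield line
```

Something like `count_gene_biotypes(open("annotation.gtf"))` (the file name is a placeholder) would show how many of the gene records are protein-coding versus other biotypes, which is a quick sanity check on both the unfiltered and the filtered totals.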
Your R code looks incomplete. Did you use the DESeqDataSetFromHTSeqCount function?