Question

differential expression of human sample showing 63000 genes in output

0

Entering edit mode

7.9 years ago

Kritika ▴ 270

Hello all Recently i got human rna seq sample for differential expression analysis I used htseq-count and cufflink for abundance estimation and Deseq and cuffdiff for differential expression analysis . In a output of both deseq and cuffdiff i am getting 63000 genes which is not possible when human have 30000 genes in their genome. I used reference of GrCh38 and gtf (GRCh38) downloaded from ensemble. I am not able to get where i am getting wrong. Please help!!!!!!!!!!!

RNA-Seq human trancriptome cuffdiff • 2.0k views

ADD COMMENT • link 7.9 years ago by Kritika ▴ 270

0

Entering edit mode

63000 genes or transcripts? If you provide details on how you ran each step of your pipeline it will be easier to determine what went wrong. For example, if yoou received input files as BAM files, and they were aligned to references with a different order of chromosomes, I would not be surprised if every gene appears to be differentially expressed.

ADD REPLY • link 7.9 years ago by d-cameron ★ 2.9k

0

Entering edit mode

Hi I ran the accepted.bam files which i got from tophat i kept --b2 very sensitive. This bam file is not sorted.

ADD REPLY • link 7.9 years ago by Kritika ▴ 270

0

Entering edit mode

Is your data paired end? The htseq FAQ states

For paired-end data, the alignment have to be sorted either by read name or by alignment position.

ADD REPLY • link 7.9 years ago by d-cameron ★ 2.9k

0

Entering edit mode

yes its pairend data

ADD REPLY • link 7.9 years ago by Kritika ▴ 270

0

Entering edit mode

but then after running on sorted bam file on cufflink i am getting same result

ADD REPLY • link 7.9 years ago by Kritika ▴ 270

0

Entering edit mode

Can you post commands you used for htseq-count and DESeq2?

ADD REPLY • link 7.9 years ago by WouterDeCoster 47k

0

Entering edit mode

htseq-count -f bam -s no -t exon -i gene_id -o htseq_sample1_samout accepted_hits.bam hg19.gtf > count_list1
htseq-count  -f bam -s no -t exon -i gene_id -o htseq_sample2_samout accepted_hits.bam hg19.gtf

deseqcommand

count_1_2 <- subset(count_list,select=c("Control","Test"))
conditions <- factor(c("Control","Test"))
sample1_2 <- newCountDataSet(count_1_2 ,conditions)
sample1_2 <- estimateSizeFactors(sample1_2)
sample1_2 <- estimateDispersions(sample1_2 , method="blind",sharingMode = "fit-only" , fitType= "local") 
result_1_2  <- nbinomTest(sample1_2,"Control","Test")
head(result_1_2 )
write.csv(result_1_2 ,"~/Desktop/RNASeq/deseq_results/sample1_2.csv")

ADD REPLY • link updated 7.9 years ago by WouterDeCoster 47k • written 7.9 years ago by Kritika ▴ 270

0

Entering edit mode

But this time i used ref and gtf of hg19 For both the version initially i was getting 63000 genes. This time i only filtered protien coding genes in hg19.gtf and used that i got 27000 genes for all samples. Not sure whether i am correct or not