Dear All, I would really appreciate some help. I have a aligned BAM file which is sorted. This file was obtained from paired end sequencing. I am trying to find the raw counts which fall in the regions mentioned in the GTF.
The GTF is below:
1dh10b ena exon 12163 14079 . + . gene_id "ECDH10B_0014"; transcript_id "ACB01219"; exon_number "1"; gene_name "dnaK"; gene_source "ena"; gene_biotype "protein_coding"; transcript_name "dnaK-1"; transcript_source "ena"; transcript_biotype "protein_coding"; exon_id "ACB01219-1";
1dh10b ena exon 4040647 4043550 . + . gene_id "ECDH10B_3947"; transcript_id "ECDH10B_3947-1"; exon_number "1"; gene_name "rrlC"; gene_source "ena"; gene_biotype "rRNA"; transcript_name "rrlC"; transcript_source "ena"; transcript_biotype "rRNA"; exon_id "ECDH10B_3947-1";
When I ran Rsubreads in the paired end command and results mentioned below:
The code for feature count:
fc1<-featureCounts(files=c("R1.bam"),
annot.ext="/data/tellis/analysis/genome/combined/dh10b_gfp_psb1c3_h3/test_part.gtf",
isGTFAnnotationFile=TRUE,GTF.featureType="exon",
isPairedEnd=TRUE)
dge1 <- DGEList(counts=fc1$counts, genes=fc1$annotation)
write.table(dge1$counts,file="/data/tellis/analysis/genome/combined/dh10b_gfp_psb1c3_h3/test_part_gtf_pe.txt",sep="\t")
Result obtained: "R1.bam" "ECDH10B_0014" 6528 "ECDH10B_3947" 255754
When I run the above with single end mode i.e. isPairedEnd=FALSE, I get the following results: "R1.bam" "ECDH10B_0014" 12049 "ECDH10B_3947" 440677
I wanted to double check this with bedtools:
bedtools coverage -a test_part.gtf -b R1.bam > cov_prt_gtf_bam.txt
The counts I got were: ECDH10B_0014 12049 ECDH10B_3947 4040647
However, because bedtools results dont account for paired end the results were similar to feature counts in single end mode.
But when I ran HTseq command as below
htseq-count -f bam -s no R1.bam test_part.gtf > htseq_test_part_gtf.txt
Results ECDH10B_3947 0 ECDH10B_0014 6509
When I checked on the IGV browser for the location on where the gene ECDH10B_3947 is I could hardly find any reads but bedtools in single mode picked up counts similar to feature counts single mode. There were also almost half the counts in feature counts in paired end mode. But the HT seq counts were zero and on IGV also there were no reads. ITs strange that for ECDH10B_0014 gene the counts of HT seq was similar to feature count in paired end mode.
Can someone please help? I wold be grateful.
Thanks, Yaseen
It's likely that IGV and htseq-count are filtering a number of reads (possibly appropriately). You can check this by changing the filtering options in IGV. As a rule, if htseq-count and featureCounts disagree greatly the only reason is a setting different between them. Otherwise they're identical most of the time.
Thank you. Thats very helpful :)
Hello Yaseen Ladak!
It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?t=72405
This is typically not recommended as it runs the risk of annoying people in both communities.
Thanks, I would be more careful next time.