5.5 years ago
akh22
I am trying to get counts from BAM files generated by STAR. I used Rsubread's featureCounts with the Ensembl annotation Mus_musculus.GRCm38.96.gff3:
fc <- featureCounts(files = "10-IT-1-21-19.bam", nthreads = 24, isPairedEnd = TRUE, isGTFAnnotationFile = TRUE, annot.ext = "Mus_musculus.GRCm38.96.gff3", GTF.attrType = "Name")
and here are the summary stats:
Status Skin.25.IT.1.21.19.bam
1 Assigned 7686415
2 Unassigned_Unmapped 0
3 Unassigned_MappingQuality 0
4 Unassigned_Chimera 0
5 Unassigned_FragmentLength 0
6 Unassigned_Duplicate 0
7 Unassigned_MultiMapping 0
8 Unassigned_Secondary 0
9 Unassigned_NonSplit 0
10 Unassigned_NoFeatures 2985131
If I run this with the built-in mm10 annotation as follows:
fc <- featureCounts(files = "10-IT-1-21-19.bam", nthreads = 24, isPairedEnd = TRUE, annot.inbuilt = "mm10")
the summary stats are:
Status Skin.25.IT.1.21.19.bam
1 Assigned 18096135
2 Unassigned_Unmapped 0
3 Unassigned_MappingQuality 0
4 Unassigned_Chimera 0
5 Unassigned_FragmentLength 0
6 Unassigned_Duplicate 0
7 Unassigned_MultiMapping 0
8 Unassigned_Secondary 0
9 Unassigned_NonSplit 0
10 Unassigned_NoFeatures 4960586
11 Unassigned_Overlapping_Length 0
12 Unassigned_Ambiguity 911357
As you can see, the number of reads assigned with the Ensembl Mus_musculus.GRCm38.96.gff3 annotation is much lower than with the built-in mm10 annotation. I thought the Ensembl GFF3 annotation had broader coverage than mm10. I'd appreciate any comments on this.
Thanks.
What reference was used to make the BAM?
I believe it was Ensembl GRCm38, though I am not 100% certain since the read mapping was done by another lab.
Try the GTF file from Ensembl instead (never use GFF files unless you have no other choice).
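A minimal sketch of that suggestion, not a tested recipe. It assumes the release-96 GTF has been downloaded and unzipped as Mus_musculus.GRCm38.96.gtf (a hypothetical local path) and reuses the BAM name from the question. `GTF.featureType = "exon"` and `GTF.attrType = "gene_id"` are the featureCounts defaults, written out here only to make the gene-level summarisation explicit. The call is skipped if Rsubread or either file is missing.

```r
# Hypothetical local paths: adjust to your own files.
bam_file <- "10-IT-1-21-19.bam"
gtf_file <- "Mus_musculus.GRCm38.96.gtf"  # assumed name of the unzipped Ensembl GTF

# Arguments for Rsubread::featureCounts, collected in a list so they are easy to inspect.
fc_args <- list(
  files               = bam_file,
  annot.ext           = gtf_file,
  isGTFAnnotationFile = TRUE,
  GTF.featureType     = "exon",     # count reads overlapping exon lines
  GTF.attrType        = "gene_id",  # group exons into genes by their gene_id attribute
  isPairedEnd         = TRUE,
  nthreads            = 24
)

# Only run the counting step when the package and both input files are available.
if (requireNamespace("Rsubread", quietly = TRUE) &&
    file.exists(bam_file) && file.exists(gtf_file)) {
  fc <- do.call(Rsubread::featureCounts, fc_args)
  print(fc$stat)
}
```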
What is the reason for using GTF over GFF3? I thought GFF3 was the preferred annotation format.
Thanks.
So I tried the Ensembl GTF and got, on average, 80% assigned reads. GTF definitely looks more promising than GFF3.
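For anyone comparing runs the same way, the assigned percentage can be worked out from the summary table that featureCounts returns in `fc$stat`. The sketch below is base R only and uses the non-zero rows of the two tables posted in the question as input; the helper name `assigned_fraction` and the column name `Count` are made up for illustration (in a real `fc$stat` the second column is named after the BAM file).

```r
# Non-zero rows copied from the GFF3 run posted in the question.
stat_gff3 <- data.frame(
  Status = c("Assigned", "Unassigned_NoFeatures"),
  Count  = c(7686415, 2985131)
)

# Non-zero rows copied from the built-in mm10 run posted in the question.
stat_mm10 <- data.frame(
  Status = c("Assigned", "Unassigned_NoFeatures", "Unassigned_Ambiguity"),
  Count  = c(18096135, 4960586, 911357)
)

# Hypothetical helper: fraction of all counted reads/fragments that were assigned.
# Expects a data frame shaped like fc$stat: a Status column, then one count column.
assigned_fraction <- function(stat) {
  counts <- stat[[2]]
  counts[stat$Status == "Assigned"] / sum(counts)
}

# Print both rates as percentages with one decimal place.
cat(sprintf("GFF3: %.1f%% assigned\n", 100 * assigned_fraction(stat_gff3)))
cat(sprintf("mm10: %.1f%% assigned\n", 100 * assigned_fraction(stat_mm10)))
```

With a real result, `assigned_fraction(fc$stat[, 1:2])` would give the rate for the first BAM file.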
It's kind of a dirty little secret that GFF files are tough to support.
Yeah, I am beginning to realize this, despite what I was told by "experts".
Thanks.