Question

"featureCounts" of "subRead" does not work with ensembl.gft

0

Entering edit mode

8.9 years ago

alakatos • 0

Hello,

I am analyzing an SE (50) RNAseq dataset (Illumina) for the first time. I used STAR for read alignment. I generated genome indices by STAR using GRCh38_r85.all.fa and Homo_sapiens.GRCh38.85.gtf downloaded from Ensembl (e.g. wget ftp://ftp.ensembl.org/pub/release-85/gtf/homo_sapiens/Homo_sapiens.GRCh38.85.gtf.gz).

$ head -6 Homo_sapiens.GRCh38.85.gtf 
#!genome-build GRCh38.p7
#!genome-version GRCh38
#!genome-date 2013-12
#!genome-build-accession NCBI:GCA_000001405.22
#!genebuild-last-updated 2016-06
1   havana  gene    11869   14409   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; havana_gene "OTTHUMG00000000961"; havana_gene_version "2"; etc

The alignment went smoothly and the result looked OK.

 Started job on |   Sep 20 09:49:47
 Started mapping on |   Sep 20 09:50:27
 Finished on |  Sep 20 10:09:30
 Mapping speed, Million of reads per hour | 88.83
 Number of input reads |    28202438
 Average input read length |    51
 UNIQUE READS:
 Uniquely mapped reads number | 23282219
 Uniquely mapped reads % |  82.55%
 Average mapped length |    50.75
 Number of splices: Total | 1764086
 Number of splices: Annotated (sjdb) |  1747703
 Number of splices: GT/AG | 1745945
 Number of splices: GC/AG | 13045  etc

Then, I wanted to quantify the reads per gene by "featureCounts"

subread-1.5.0-p3-Linux-x86_64/bin/featureCounts -a Homo_sapiens.GRCh38.85.gtf -g gene_id -o counts.txt Aligned.sortedByCoord.out.bam

I did not get any error message but the output file contains only 0.

$ head -10 counts.txt
Geneid  Chr Start   End Strand  Length  Aligned.sortedByCoord.out.bam
ENSG00000223972 1;1;1;1 11869;12613;12975;13221 12227;12721;13052;14409 +;+;+;+ 1735    0
ENSG00000227232 1;1;1;1;1;1;1;1;1;1;1   14404;15005;15796;16607;16858;17233;17606;17915;18268;24738;29534 14501;15038;15947;16765;17055;17368;17742;18061;18366;24891;29570 -;-;-;-;-;-;-;-;-;-;-   1351    0
ENSG00000278267 1   17369   17436   -   68  0
ENSG00000243485 1;1;1   29554;30267;30976   30039;30667;31109   +;+;+   1021    0
ENSG00000237613 1;1;1   34554;35245;35721   35174;35481;36081   -;-;-   1219    0
ENSG00000268020 1   52473   53312   +   840 0
ENSG00000240361 1   62948   63887   +   940 0
ENSG00000186092 1   69091   70008   +   918 0

 $ head -10 counts.txt.summary 
Status  Aligned.sortedByCoord.out.bam
Assigned    0
Unassigned_Ambiguity    0
Unassigned_MultiMapping 0
Unassigned_NoFeatures   0
Unassigned_Unmapped 0
Unassigned_MappingQuality   0
Unassigned_FragmentLength   0
Unassigned_Chimera  0
Unassigned_Secondary

I am not sure what caused this "hiccup". Could the GTF file be an issue? It worked for STAR.

Your advice and help are highly appreciated.

Thanks, Anita

RNA-Seq subRead ensembl gtf • 3.8k views

ADD COMMENT • link updated 8.9 years ago by GZ1995 ▴ 410 • written 8.9 years ago by alakatos • 0

score 0 · Answer 1 · 2016-09-21

0

Entering edit mode

8.9 years ago

GZ1995 ▴ 410

https://support.bioconductor.org/p/64636/

ADD COMMENT • link 8.9 years ago by GZ1995 ▴ 410

0

Entering edit mode

Thank you for your help. Anita

ADD REPLY • link 8.9 years ago by alakatos • 0