Hi,
I am new to bioinformatics and I am not very familiar with the human genome.
I mapped my RNA-seq reads to the human genome using HISAT2.
After mapping, I would like to count, for each gene, the number of reads mapping to exons and the number mapping to introns. What I am missing is a GTF or BED file with the intron and exon coordinates for all protein-coding genes.
I am trying to use the UCSC Table Browser at http://genome.ucsc.edu/cgi-bin/hgTables (as described here: https://www.biostars.org/p/13290/), and I get a BED file in the following format:
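To make concrete what I mean by intron coordinates, here is a minimal sketch. It assumes the exons of one transcript are available as 1-based inclusive (start, end) pairs, as in a GTF, and it returns the gaps between consecutive exons as 0-based half-open intervals, as in BED. The function name and the example coordinates are made up for illustration.

```python
def introns_from_exons(exons):
    """Return BED-style (0-based, half-open) introns for one transcript.

    `exons` is a list of (start, end) pairs, 1-based inclusive as in a GTF.
    """
    ordered = sorted(exons)
    introns = []
    # Walk over consecutive exon pairs; the gap between them is an intron.
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start - 1 > prev_end:  # skip adjacent or overlapping exons
            introns.append((prev_end, next_start - 1))
    return introns


# Illustrative coordinates only, not taken from a real annotation.
example_exons = [(301, 400), (100, 200), (501, 600)]
print(introns_from_exons(example_exons))  # [(200, 300), (400, 500)]
```

This only handles a single transcript; for per-gene counting, the introns of different transcripts of the same gene may overlap each other's exons, which the sketch does not resolve.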
chr12 6534569 6534809 ENST00000496049.1_intron_0_0_chr12_6534570_f 0 +
chr12 6534569 6534809 ENST00000229239.10_intron_0_0_chr12_6534570_f 0 +
chr12 6534861 6536493 ENST00000229239.10_intron_1_0_chr12_6534862_f 0 +
chr12 6536593 6536683 ENST00000229239.10_intron_2_0_chr12_6536594_f 0 +
chr12 6536790 6536919 ENST00000229239.10_intron_3_0_chr12_6536791_f 0 +
chr12 6537010 6537100 ENST00000229239.10_intron_4_0_chr12_6537011_f 0 +
chr12 6537216 6537308 ENST00000229239.10_intron_5_0_chr12_6537217_f 0 +
chr12 6537390 6537583 ENST00000229239.10_intron_6_0_chr12_6537391_f 0 +
As you can see, the name column contains only the transcript identifier (starting with ENST). I need the gene name or the gene identifier (starting with ENSG) so that I can assign all intronic reads to a specific gene. I would then use featureCounts to count the reads for each feature.
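One idea I had, as a rough sketch rather than a tested solution: build a transcript-to-gene lookup from a GTF annotation and prepend the gene identifier to the BED name column. The sketch assumes a GENCODE/Ensembl-style GTF whose attribute column contains `gene_id "..."` and `transcript_id "..."`, and that the transcript versions match those in the UCSC output. The helper names, the GTF coordinates, and the gene identifier `ENSG_EXAMPLE.1` are placeholders.

```python
import re

# Attribute patterns for a GENCODE/Ensembl-style GTF (assumed layout).
TRANSCRIPT_RE = re.compile(r'transcript_id "([^"]+)"')
GENE_RE = re.compile(r'gene_id "([^"]+)"')


def transcript_to_gene(gtf_lines):
    """Build a transcript_id -> gene_id lookup from GTF lines."""
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        transcript = TRANSCRIPT_RE.search(fields[8])
        gene = GENE_RE.search(fields[8])
        if transcript and gene:
            mapping[transcript.group(1)] = gene.group(1)
    return mapping


def annotate_bed_line(bed_line, mapping):
    """Prefix the BED name column with the gene ID ("NA" if unknown)."""
    fields = bed_line.split()
    # UCSC names look like <transcript>_intron_<n>_..., so cut at "_intron_".
    transcript_id = fields[3].split("_intron_")[0]
    fields[3] = mapping.get(transcript_id, "NA") + "|" + fields[3]
    return "\t".join(fields)


# Placeholder GTF record: the gene ID and coordinates are not real.
gtf = ['chr12\tHAVANA\ttranscript\t1\t1000\t.\t+\t.\t'
       'gene_id "ENSG_EXAMPLE.1"; transcript_id "ENST00000229239.10";']
lookup = transcript_to_gene(gtf)
bed = "chr12 6534861 6536493 ENST00000229239.10_intron_1_0_chr12_6534862_f 0 +"
print(annotate_bed_line(bed, lookup))
```

The annotated intervals would still need converting to a format featureCounts reads (GTF or SAF), which this sketch does not do.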
Do you have any idea how to do that? Probably my approach is completely wrong.
Thank you in advance