I would appreciate it if anyone could help me understand the following issue with gene quantification using featureCounts.
As you can see in the example featureCounts output below, some genes span more than one chromosome. I think featureCounts estimates the gene length by counting the total number of bases in the exons of the gene copies on all of these chromosomes:
GeneID: 64109
Chr:    chrX;chrX;chrX;chrX;chrX;chrX;chrX;chrX;chrX;chrY;chrY;chrY;chrY;chrY;chrY;chrY;chrY;chrY
Start:  1190449;1193218;1196780;1198562;1198801;1202402;1206433;1208806;1212556;1190449;1193218;1196780;1198562;1198801;1202402;1206433;1208806;1212556
End:    1191160;1193302;1196900;1198724;1199145;1202535;1206599;1208908;1212815;1191160;1193302;1196900;1198724;1199145;1202535;1206599;1208908;1212815
Strand: -;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-;-
Length: 4180
So, the gene '64109' belongs to chromosome X as well as chromosome Y. The total length is 4180, which matches the summed lengths of all the exons on both the X and Y chromosomes. My concern is: is a gene count based on a gene length taken across multiple chromosomes sensible? For example, what if the copies of gene '64109' on the X and Y chromosomes have different biological functions? I think I am missing something important here. An explanation would be great!
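To make the length arithmetic concrete, here is a minimal sketch that recomputes it from the Chr, Start and End fields shown above. It assumes the coordinates are 1-based and inclusive (so each exon contributes end - start + 1 bases) and that the exons do not overlap within a chromosome, which holds for this gene. The variable names are only for illustration and are not part of featureCounts.

```python
from collections import defaultdict

# Fields copied from the featureCounts row for gene 64109.
chrs = ";".join(["chrX"] * 9 + ["chrY"] * 9).split(";")
starts = [int(x) for x in (
    "1190449;1193218;1196780;1198562;1198801;1202402;1206433;1208806;1212556;"
    "1190449;1193218;1196780;1198562;1198801;1202402;1206433;1208806;1212556"
).split(";")]
ends = [int(x) for x in (
    "1191160;1193302;1196900;1198724;1199145;1202535;1206599;1208908;1212815;"
    "1191160;1193302;1196900;1198724;1199145;1202535;1206599;1208908;1212815"
).split(";")]

# Sum exon lengths per chromosome (assumes 1-based inclusive coordinates
# and no overlapping exons within a chromosome).
per_chr = defaultdict(int)
for chrom, start, end in zip(chrs, starts, ends):
    per_chr[chrom] += end - start + 1

total = sum(per_chr.values())
print(dict(per_chr))  # {'chrX': 2090, 'chrY': 2090}
print(total)          # 4180
```

Each chromosome contributes 2090 bases, so the reported Length of 4180 is the X and Y exon lengths added together.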
If this is indeed not the right way to quantify a gene that maps to multiple chromosomes, how do you address this issue? Should such genes be removed from the analysis?
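If flagging such genes turns out to be a reasonable first step, a rough sketch of how it could be done is below. It only looks at the semicolon-separated Chr field of each gene. The helper names and the second gene ("geneA") are hypothetical and made up for illustration, and whether flagged genes should actually be dropped is exactly the open question.

```python
def distinct_chromosomes(chr_field: str) -> list[str]:
    """Return the sorted unique chromosome names in a featureCounts Chr field."""
    return sorted(set(chr_field.split(";")))


def spans_multiple_chromosomes(chr_field: str) -> bool:
    """True if the gene's exons are annotated on more than one chromosome."""
    return len(distinct_chromosomes(chr_field)) > 1


# Chr fields keyed by GeneID; "geneA" is a hypothetical single-chromosome gene.
gene_chr = {
    "64109": ";".join(["chrX"] * 9 + ["chrY"] * 9),
    "geneA": "chr1;chr1;chr1",
}

flagged = [gene for gene, chr_field in gene_chr.items()
           if spans_multiple_chromosomes(chr_field)]
print(flagged)  # ['64109']
```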