I'm running a differential expression analysis for my species of interest using Cuffdiff (3 samples, 5 biological replicates each). While checking the output files I found a pair of duplicated genes (they are neighbours, and the only difference between them is their length) where only one has estimated expression values (FPKM); the second has zeroes in all samples. In the read_group_tracking file, raw_frags is also 0 for the second gene. I've inspected the BAM files produced by TopHat, and RNA-seq reads are mapped at the locations of both genes.

I've tested Cuffdiff with several sets of parameters (default, with -b/--frag-bias-correct, with -u/--multi-read-correct, etc.) and all give the same output.

Is this the normal way Cuffdiff behaves for duplicated, highly similar genes? Should I care or not? I've seen that in some studies people first cluster the genes by sequence similarity and then estimate expression only for the representative gene of each cluster.

Thanks for any advice on this!
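
In case it helps to reproduce the check, here is a minimal Python sketch of how genes with zero raw_frags in every replicate could be listed from the read_group_tracking file. It assumes the usual tab-separated layout with a header row containing tracking_id and raw_frags columns; the function name zero_frag_genes and the file name in the usage comment are only illustrative.

```python
import csv
from collections import defaultdict


def zero_frag_genes(lines):
    """Return sorted tracking_ids whose raw_frags sum to 0 over all rows.

    `lines` is any iterable of text lines (an open file handle works),
    assumed to be tab-separated with a header that includes the
    `tracking_id` and `raw_frags` columns.
    """
    totals = defaultdict(float)
    for row in csv.DictReader(lines, delimiter="\t"):
        # One row per gene / condition / replicate; accumulate per gene.
        totals[row["tracking_id"]] += float(row["raw_frags"])
    return sorted(gene for gene, total in totals.items() if total == 0)


# Hypothetical usage (file name is an assumption):
# with open("genes.read_group_tracking", newline="") as fh:
#     print(zero_frag_genes(fh))
```

A gene shows up in this list only if it received no fragments in any condition or replicate, which is the situation I'm seeing for the second copy.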