Hello,
I have the following problem: in a eukaryotic organism with (almost) no splicing and genes/annotated ORFs located very close to each other, I need to distinguish between genes that are separate and those that are likely to be somehow joined. Let's say I have putative genes A and B, one protein-coding and one maybe not. The way to do it is to look at stranded RNA-Seq data. If both mates of each read pair map to A only (or to B only) and there are no "crossings" (pairs linking A and B), then A and B are likely to be separate genes. We are talking about a 30+ Mbp genome with <10k genes.
Hence my question: what would be a sane way of doing it?
I was thinking about dumping the names of the reads mapping to individual genes, and then doing comparisons, but maybe there is a better way.
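To make the read-name idea concrete, here is a rough sketch in Python. It assumes the alignments overlapping each gene have already been dumped as SAM text (for example with `samtools view` on an indexed BAM and a region), and it relies on both mates of a pair sharing the same read name. The function names `read_names_from_sam_lines` and `crossing_stats` are made up for illustration, and strand filtering is left out.

```python
def read_names_from_sam_lines(lines):
    """Collect fragment names (SAM column 1) from SAM text lines.

    Header lines (starting with '@') and empty lines are skipped.
    Both mates of a pair carry the same name, so each fragment is
    counted once per gene.
    """
    names = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def crossing_stats(names_a, names_b):
    """Compare the fragment names seen in gene A and in gene B.

    A name present in both sets is a "crossing": either one mate maps
    to A and the other to B, or a single read overlaps both regions.
    """
    return {
        "a_only": len(names_a - names_b),
        "b_only": len(names_b - names_a),
        "crossing": len(names_a & names_b),
    }


if __name__ == "__main__":
    # Toy input standing in for two per-gene SAM dumps (hypothetical reads).
    gene_a = ["r1\t99\tchr1\t100", "r1\t147\tchr1\t300", "r2\t99\tchr1\t350"]
    gene_b = ["r2\t147\tchr1\t900", "r3\t99\tchr1\t950"]
    stats = crossing_stats(
        read_names_from_sam_lines(gene_a),
        read_names_from_sam_lines(gene_b),
    )
    print(stats)
```

A high crossing count relative to the A-only and B-only counts would suggest that A and B sit on one transcript; what counts as "high" is a judgment call that depends on coverage and on how close the two annotations are.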
Many thanks for your help.
At least with Cufflinks this is not possible. The genes are transcribed in operon-like units that are then cut into separate transcripts, so to Cufflinks the whole genome looks like a few hundred giant genes.