15 months ago
cyril-cros
Hi, I have a GTF/GFF transcriptome annotation that includes ribosomal sequences annotated with barrnap. I end up with ribosomal sequences that carry the same gene IDs / transcript IDs at different sites and on multiple scaffolds. Should I instead force them to have unique names at each site? Thanks!
How are you intending to use the rDNA information? There is so much variation in rDNA loci, even among individuals in a population, in terms of SNPs and copy number, that trying to annotate each repeat might not be worthwhile if this isn't an rDNA-specific project. I would probably change the names to include the scaffold so you can more easily differentiate them, but again, this depends on the use case.
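If it helps, here is a minimal Python sketch of that renaming, not a tested pipeline step. It assumes a tab-separated GTF with `gene_id "..."` and `transcript_id "..."` attributes, that the barrnap records can be recognised by a source column (column 2) starting with `barrnap`, and that each rRNA is a single-interval feature so all of its lines share the same coordinates. The function name `add_locus_to_ids` and the `source_prefix` parameter are made up for illustration.

```python
import re

# Matches the two GTF attributes we want to make unique per locus.
ID_PATTERN = re.compile(r'(gene_id|transcript_id) "([^"]+)"')


def add_locus_to_ids(gtf_line, source_prefix="barrnap"):
    """Append scaffold and coordinates to gene_id/transcript_id of one GTF line.

    Lines that are comments, malformed, or whose source column does not
    start with `source_prefix` (an assumption about the file) are returned
    unchanged.
    """
    if gtf_line.startswith("#"):
        return gtf_line
    fields = gtf_line.rstrip("\n").split("\t")
    if len(fields) != 9 or not fields[1].startswith(source_prefix):
        return gtf_line
    seqid, start, end = fields[0], fields[3], fields[4]
    suffix = f"_{seqid}_{start}_{end}"
    # Rewrite only the ID values; every other attribute is left as-is.
    fields[8] = ID_PATTERN.sub(
        lambda m: f'{m.group(1)} "{m.group(2)}{suffix}"', fields[8]
    )
    newline = "\n" if gtf_line.endswith("\n") else ""
    return "\t".join(fields) + newline


if __name__ == "__main__":
    example = (
        'scaffold_1\tbarrnap:0.9\texon\t100\t1900\t.\t+\t.\t'
        'gene_id "18S_rRNA"; transcript_id "18S_rRNA";'
    )
    print(add_locus_to_ids(example))
```

Using scaffold plus start and end keeps copies on the same scaffold distinct as well. If your rRNA records span several exons, a per-locus counter would be safer than raw coordinates.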
I am trying to run through the bioinformatics part of the PacBio MAS-Seq protocol (high-quality long reads of reverse-transcribed mRNA with single-cell barcodes, for cell-specific isoform identification). I work with a non-model species with a highly heterogeneous genome, and annotating rDNA is not a priority.
That being said, I was presuming rDNA gene loci would be much more conserved across individuals, so your answer already helps quite a lot. I will just read more on the topic.
Have a look at the literature on similar species if studies are available; if not, there is plenty of literature on this in model systems. Maybe rDNA is more stable in your system. If you haven't already done the sequencing, I would recommend looking at ribo-depletion, as rRNA tends to take up a lot of reads that could be better used elsewhere in a non-rDNA study.