12 months ago
buhbs
Hi all, I was wondering if anyone knows of an R package to annotate genomes based on the sequences of features. For example, I have made a database of features with their corresponding sequences, and I would like to query genome FASTA files for those features. I use SnapGene now, but I was hoping to automate my annotations using R. I appreciate any ideas/input. Cheers
I guess you meant:
map X sequences corresponding to features (gene, transcript, repeat, etc.) to the genome, record the positions as e.g. BED/GTF, then query that? But this would be run-of-the-mill genome annotation.
But if you already extracted these feature sequences from the same genome (== you have the positions) then I am not sure what you intend to do.
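The mapping step described above (feature sequences to genome positions, recorded as BED-style intervals) can be sketched in a few lines of Python. This is only a minimal illustration that assumes exact matches on either strand; it will miss any feature carrying a SNP or indel, which is why a real aligner is normally used. The function names (`parse_fasta`, `features_to_bed`) and the toy sequences are hypothetical, not part of any package.

```python
# Minimal sketch: locate feature sequences in a genome by exact string
# matching and report 0-based, half-open BED-style intervals.
# Assumption: exact matches only; mismatches and gaps are not handled.

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def parse_fasta(text):
    """Parse FASTA-formatted text into {record_name: uppercase sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # keep only the ID before any space
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def find_all(haystack, needle):
    """Yield every start index of needle in haystack, including overlaps."""
    start = haystack.find(needle)
    while start != -1:
        yield start
        start = haystack.find(needle, start + 1)


def features_to_bed(genome, features):
    """Return sorted (chrom, start, end, name, score, strand) tuples."""
    rows = []
    for chrom, seq in genome.items():
        for feat_name, feat_seq in features.items():
            feat_seq = feat_seq.upper()
            for pos in find_all(seq, feat_seq):
                rows.append((chrom, pos, pos + len(feat_seq), feat_name, 0, "+"))
            rc = reverse_complement(feat_seq)
            if rc != feat_seq:  # skip palindromes to avoid duplicate hits
                for pos in find_all(seq, rc):
                    rows.append((chrom, pos, pos + len(feat_seq), feat_name, 0, "-"))
    return sorted(rows)


# Toy input, made up for illustration.
genome = parse_fasta(">chr1 toy example\nAAACCCGGGTTT\nACGTAC\n")
features = {"featA": "CCCGGG", "featB": "GTACGT"}
for row in features_to_bed(genome, features):
    print("\t".join(str(field) for field in row))
```

The output is tab-separated in BED column order, so it could be written to a file and loaded by any tool that reads BED.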
I have extracted the features from a parent genome, but I am looking for insertions/deletions/SNPs in the genomes of daughter strains. So I want to use a feature database to annotate new genomes.
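To illustrate the kind of comparison being asked for, here is a rough Python sketch that lists differences between a parent feature sequence and the corresponding daughter sequence. It assumes the daughter locus has already been found and extracted, and it uses the standard library's `difflib.SequenceMatcher`, which is a general-purpose text differ, not a biological aligner; on repetitive sequence it may report differences in a way no variant caller would. The function name `call_variants` and the example sequences are hypothetical.

```python
from difflib import SequenceMatcher


def call_variants(parent, daughter):
    """List (kind, parent_position, ref, alt) differences between two sequences.

    Positions are 0-based offsets into the parent sequence.
    Assumption: both sequences cover the same locus on the same strand.
    """
    matcher = SequenceMatcher(None, parent, daughter, autojunk=False)
    variants = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        ref, alt = parent[i1:i2], daughter[j1:j2]
        if tag == "replace" and len(ref) == 1 and len(alt) == 1:
            kind = "SNP"
        elif tag == "insert":
            kind = "insertion"
        elif tag == "delete":
            kind = "deletion"
        else:
            kind = "substitution"  # multi-base or unequal-length replacement
        variants.append((kind, i1, ref, alt))
    return variants


# Toy sequences, made up for illustration.
parent_feature = "ATGGCCTTAGCA"
print(call_variants(parent_feature, "ATGGCGTTAGCA"))  # one base changed
print(call_variants(parent_feature, "ATGGCCAGCA"))    # two bases removed
```

For real genomes, the same report would normally come from aligning with a dedicated tool and calling variants from the alignment; this sketch only shows what the end result looks like.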
So you align these feature sequences to your new genome, not the other way around.
I understand that I need to align them, but I was asking whether anyone knows of any R packages to align and annotate features in a genome.
As far as I know, there are no genome aligners written in R, so in any case you will need a standalone program or a Galaxy server to do it.
Aligner installations are probably easiest using conda.
You could always use something like Liftoff (see the Liftoff GitHub page) to get the corresponding annotations for your daughter genomes, then just map using the genome-specific annotations.