I used CCTyper to predict CRISPR sequences from contigs. It reported CRISPR repeats, spacers, identity percentages, and cas genes. I want to use this data to make predictions about bacterial taxonomy. Can I use a repeat sequence for this? I also want to relate the resulting taxonomy to the taxonomic data I have already annotated with MetaPhlAn. Specifically, I want to make the correlation at the contig level. How can I do that?
This question is impossible to answer without more information about the repeat in question. Simply put, does the repeat vary enough between taxa, while staying conserved within each taxon, to tell them apart?
One of the main genes used in microbial taxonomic classification, the 16S rRNA gene, varies in copy number both within and among populations, from something like 1 to 13 copies per genome. Or at least that was the consensus the last time I looked into it (which was a long time ago).
My suggestion would be to rigorously test whether you can, using existing metagenomic tools. If you see similar classifications with your repeat as you do with other well-characterised loci, then yes, it is suitable. If you don't, maybe it's because you only get family-level specificity, or maybe it's not fit for taxonomic classification. I suspect high-copy-number repeats are not suitable, but I've never tested this or seen any literature about it.
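As a rough illustration of that kind of test, here is a minimal Python sketch. It assumes you can already build a list of (repeat sequence, taxon) pairs, one per contig, where the taxon comes from some independent contig-level classification; that input table is an assumption on my part, not something CCTyper or MetaPhlAn produces directly. The function name `repeat_taxon_purity` and the toy data are made up for the example. For each repeat it reports the most common taxon and the fraction of contigs that agree with it.

```python
from collections import Counter, defaultdict


def repeat_taxon_purity(records):
    """Summarise how consistently each repeat maps to a single taxon.

    records: iterable of (repeat_sequence, taxon) pairs, one per contig.
    Returns {repeat: (majority_taxon, purity, n_contigs)}, where purity is
    the fraction of contigs carrying the majority taxon.
    """
    by_repeat = defaultdict(Counter)
    for repeat, taxon in records:
        by_repeat[repeat][taxon] += 1

    summary = {}
    for repeat, counts in by_repeat.items():
        majority_taxon, hits = counts.most_common(1)[0]
        total = sum(counts.values())
        summary[repeat] = (majority_taxon, hits / total, total)
    return summary


# Made-up toy data: placeholder repeat names and illustrative genus labels.
records = [
    ("repeat_A", "Streptococcus"),
    ("repeat_A", "Streptococcus"),
    ("repeat_A", "Streptococcus"),
    ("repeat_B", "Pseudomonas"),
    ("repeat_B", "Pseudomonas"),
    ("repeat_B", "Escherichia"),
]

summary = repeat_taxon_purity(records)
for repeat, (taxon, purity, n) in summary.items():
    print(f"{repeat}\t{taxon}\t{purity:.2f}\t{n}")
```

A repeat whose purity stays close to 1 across many contigs at a given rank is at least a candidate marker at that rank; low purity suggests the repeat is shared across taxa. Repeating this at genus, family, and so on would show at which rank, if any, the repeat becomes informative.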
I don't understand what you mean about correlations at the contig level.