Hi there,
Apologies if this is a silly question, but my bioinformatics experience is quite limited and entirely self-taught (I am a PhD student). I was hoping for some advice, please.
I have fully closed 193 bacterial plasmids from 39 isolates of the same bacterial species using hybrid assembly (Illumina and MinION). Some isolates contain multiple plasmids, ranging from 3 to 8 in total. The plasmids vary in size from 2,947 to 289,861 bp and belong to different plasmid families (rep types). I have exported each plasmid as its own FASTA file. Is there a way to assess the genetic relatedness between these plasmids and display it visually? My main aim is to show that specific plasmids are unique to specific strain types (ST) and therefore cluster on this basis. For example, for one ST I have 52 plasmids from 10 different isolates, and I want to show that these plasmids are similar to each other. I have already rep-typed them, but I feel this is not discriminatory enough. I have seen papers run a core-gene analysis of plasmids with Roary and build a phylogenetic tree from it (the plasmids were all from the same family and similar in length). However, I'm not sure this would be appropriate in my case, as my plasmid sequences are much more diverse and vary widely in size.
Hopefully that makes sense. I really appreciate any help or input.
Thanks, Nicole
Can you roughly subdivide them into groups based on their size? I assume these bacterial strains are a single organism (or closely related)? Then you could run Roary on each group of plasmids to generate the trees.
Thank you so much for your quick response. I have been searching scientific papers for about a week but have failed to come up with a reasonable workflow, so I thought I had better ask for advice. Yes, all of these bacterial strains are a single organism (Enterococcus faecium). I have been thinking of trying this but have several questions:
1) If plasmids are similar in size but belong to different plasmid families, will this affect the core-gene output, i.e. if they are quite diverse despite being similar in size?
2) What would you consider a reasonable division based on size? Would plasmids ranging from 2,000-20,000 bp, 30,000-90,000 bp and 100,000-200,000 bp be too broad a division?
Thank you again, I really appreciate the advice.
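A size-based split like the one asked about above can be checked before running anything heavier. The following is only a minimal sketch: the bin names and the example plasmid names are made up for illustration, and the bin edges are simply the ranges quoted in the question. Note that those ranges leave gaps (20,001-29,999 bp, 90,001-99,999 bp, and everything above 200,000 bp), so the largest plasmid mentioned in the thread (289,861 bp) would not land in any group.

```python
# Size ranges (bp) quoted in the question, inclusive on both ends.
# The bin names are hypothetical labels chosen for this sketch.
SIZE_BINS = [
    ("small", 2_000, 20_000),
    ("medium", 30_000, 90_000),
    ("large", 100_000, 200_000),
]


def fasta_length(lines):
    """Total sequence length of a single-record FASTA, given its lines."""
    return sum(len(line.strip()) for line in lines if not line.startswith(">"))


def assign_bin(length, bins=SIZE_BINS):
    """Return the name of the bin containing `length`, or None if it falls in a gap."""
    for name, low, high in bins:
        if low <= length <= high:
            return name
    return None


def bin_plasmids(lengths, bins=SIZE_BINS):
    """Map each bin name (or None for unbinned) to a sorted list of plasmid names."""
    groups = {}
    for plasmid, length in lengths.items():
        groups.setdefault(assign_bin(length, bins), []).append(plasmid)
    return {name: sorted(members) for name, members in groups.items()}


if __name__ == "__main__":
    # Hypothetical plasmid names; 2,947 and 289,861 bp are the extremes from the thread.
    example = {"pA": 2_947, "pB": 25_000, "pC": 60_000, "pD": 289_861}
    for name, members in bin_plasmids(example).items():
        print(name, members)
```

Anything reported under `None` falls between or beyond the proposed ranges, which is a quick way to see whether the cut-offs need adjusting.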
Can you classify the plasmids based on function (the resistance genes they carry, or some other criterion)? The sizes above span a wide range, so the classification criteria may need to be chosen in a way that makes biological sense.
I think I will use their predominant PlasmidFinder type to group the plasmids and create a separate core phylogeny for each group, as the size range within each type tends not to be as large (e.g., one core phylogeny for the rep11a types, and so on). Hopefully this makes sense. Thank you for your help! My final-year PhD brain was ready to burst :)
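The grouping step described above could be sketched roughly as follows. This assumes a hand-made tab-separated table with the columns `plasmid` and `rep_type`; that layout is hypothetical and is not PlasmidFinder's native output. The plasmid names, the `repX` label, and the `min_size` cut-off of three plasmids per group are likewise assumptions for illustration.

```python
import csv
import io
from collections import defaultdict


def group_by_rep_type(table, min_size=3):
    """Group plasmid names by rep type.

    `table` is a file-like object with tab-separated text, a header row, and
    the columns `plasmid` and `rep_type` (a hypothetical layout).
    Groups with fewer than `min_size` plasmids are dropped, since a tree
    built from one or two sequences is not informative (assumed cut-off).
    """
    groups = defaultdict(list)
    for row in csv.DictReader(table, delimiter="\t"):
        groups[row["rep_type"]].append(row["plasmid"])
    return {
        rep: sorted(names)
        for rep, names in groups.items()
        if len(names) >= min_size
    }


if __name__ == "__main__":
    # Made-up plasmid names; rep11a is the type mentioned in the thread,
    # repX is a placeholder for any other type.
    demo = io.StringIO(
        "plasmid\trep_type\n"
        "iso01_p1\trep11a\n"
        "iso02_p3\trep11a\n"
        "iso05_p2\trep11a\n"
        "iso02_p1\trepX\n"
    )
    for rep, names in group_by_rep_type(demo).items():
        print(rep, names)
```

Each resulting list would then correspond to one separate core-gene run, with the annotated files for those plasmids as input.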