Question

genomic and regional diversity of bacteria

0

2.7 years ago

Sidra • 0

I have 1510 bacterial isolates and I want to analyse their regional and genomic diversity. Could you please suggest some R packages or other bioinformatics tools?

R • 1.6k views

ADD COMMENT • link updated 2.6 years ago by Ram 45k • written 2.7 years ago by Sidra • 0

2

It takes a lot more than a single sentence to properly describe what you have, and what exactly you want to do with it.

ADD REPLY • link 2.7 years ago by Mensur Dlakic ★ 30k

0

Actually, I have 1510 Salmonella isolates with different serotypes. I have made a metadata table recording the location from which each isolate was obtained. Now I am looking for a suitable R package to identify the diversity of bacterial serotypes in each location or region. Could you suggest something for this? Also, is there an R package that meets these requirements and accepts the bacterial FASTA files as input?

ADD REPLY • link updated 2.6 years ago by Ram 45k • written 2.7 years ago by Sidra • 0

0

I think it also takes more than a single R package to do the analysis. Also, what do you mean by "have 1510 isolates"? Have they been sequenced, and how? If you have short-read sequences, you could either assemble all or some samples (e.g. using SPAdes) and align them to create a phylogeny, or call variants and create a tree from the variant matrix. Both approaches will be computationally challenging because of the large number of samples. If you don't have sequencing data, please specify what you have and how that data provides information about genomic diversity. In my understanding, only genomic sequencing data can provide that information, or did I miss something?

ADD REPLY • link 2.7 years ago by Michael 56k

Ram · Answer 1 · 2022-12-31

Hi,

I think you can probably do this by grouping the metadata table by two columns (serotype and region) and counting the isolates in each group.

Ref: https://www.geeksforgeeks.org/group-by-function-in-r-using-dplyr/

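library(dplyr)

# Read the isolate metadata table
df <- read.csv("salmonella_isolates_metadata.csv")

# Count isolates for each serotype/region combination
# (replace serotype_column and region_column with the actual column names)
df_grp_reg_sero <- df %>%
  group_by(serotype_column, region_column) %>%
  summarise(region.serotype.frequency = n(),
            .groups = "drop")

# Open the result in the data viewer (interactive sessions only)
View(df_grp_reg_sero)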
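
If a single diversity value per region is wanted rather than a table of counts, the same grouping idea can be taken one step further. The following is only a minimal sketch, not a definitive method: it uses a small made-up data frame in place of the real metadata file, keeps the placeholder column names `region_column` and `serotype_column` from above, and computes serotype richness and a Shannon index by hand with dplyr. The toy data, the output names `n_isolates`, `richness` and `shannon`, and the choice of the Shannon index are all assumptions for illustration.

```r
library(dplyr)

# Toy stand-in for the real metadata table (hypothetical values);
# column names are the same placeholders used in the answer above.
df <- data.frame(
  region_column   = c("North", "North", "North", "North", "South", "South"),
  serotype_column = c("Typhimurium", "Typhimurium", "Enteritidis", "Dublin",
                      "Enteritidis", "Enteritidis")
)

region_diversity <- df %>%
  # Number of isolates per region/serotype combination
  count(region_column, serotype_column, name = "n_isolates") %>%
  group_by(region_column) %>%
  summarise(
    # Richness: number of distinct serotypes observed in the region
    richness = n(),
    # Shannon index: -sum(p * log(p)), p = serotype proportion within the region
    shannon  = -sum((n_isolates / sum(n_isolates)) *
                      log(n_isolates / sum(n_isolates))),
    .groups  = "drop"
  )

print(region_diversity)
```

A region with many evenly represented serotypes gets a higher Shannon value than one dominated by a single serotype. Note that this only summarises the serotype labels in the metadata; it says nothing about genomic diversity, which would need the sequence data discussed in the comments.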