Hi all,
I am quite new to metagenomics and to the statistical analysis of this kind of data.
I have run kraken2 to taxonomically profile my assembled metagenomic samples. The samples fall into three disease-state groups, and I would like to test whether there are any statistically significant differences between them. I decided to try metagenomeSeq and/or phyloseq, but I am unsure how to get from my kraken reports to something I can load into R.
I thought of creating BIOM tables with kraken-biom, but I am unsure whether I should create one table per group or one table per sample.
I'd be grateful for any advice on metagenome statistics and on using metagenomeSeq/phyloseq!
Thanks!
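On the one-table-per-group question: phyloseq and metagenomeSeq both work from a single count matrix with one column per sample, with the disease-state labels kept in a separate sample-metadata table, and kraken-biom accepts several report files and writes them into one table. The Python sketch below illustrates the same merge without kraken-biom. It assumes the standard six-column, tab-separated kraken2 report (not the eight-column `--report-minimizer-data` layout) and keeps the clade-level read counts of species-rank (`S`) lines; `species_counts` and `merge_reports` are hypothetical helper names.

```python
def species_counts(report_text: str, rank: str = "S") -> dict[str, int]:
    """Return {taxid: clade read count} for every report line at the given rank code.

    Assumes the 6-column kraken2 report:
    percent, clade reads, direct reads, rank code, taxid, indented name.
    """
    counts: dict[str, int] = {}
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        clade_reads, rank_code, taxid = fields[1], fields[3], fields[4]
        if rank_code.strip() == rank:  # exact match, so sub-ranks like "S1" are excluded
            counts[taxid.strip()] = int(clade_reads)
    return counts


def merge_reports(reports: dict[str, str], rank: str = "S") -> dict[str, dict[str, int]]:
    """Combine {sample: report text} into one {taxid: {sample: count}} table.

    Taxa missing from a sample get a count of 0, so every row covers every sample.
    """
    per_sample = {name: species_counts(text, rank) for name, text in reports.items()}
    all_taxa = sorted(set().union(*per_sample.values()))
    return {
        taxid: {sample: counts.get(taxid, 0) for sample, counts in per_sample.items()}
        for taxid in all_taxa
    }
```

The disease-state group of each sample would then be supplied separately as metadata, rather than by splitting the counts into one table per group.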
When binning my samples with anvi'o, I previously mapped my reads to databases built from each sample's contigs. Would these BAM files contain sufficient information for the comparisons? And to use kraken to find out what the differential contigs are, should I input the CONCOCT bins I have created into kraken?
Thank you very much for your help!
From each BAM file you can generate a table of the number of reads mapped to each contig, sum these counts for each bin, and use the per-bin counts for the comparison (with DESeq, for instance). Since you have bins, I would use a different approach for assigning taxonomy, e.g. GTDB-Tk, which is also fast but more accurate (it uses a larger database and a more refined method).
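A minimal sketch of the per-bin summing step, assuming the per-contig counts are already loaded as `{sample: {contig: reads}}` and that a contig-to-bin mapping (for example exported from the CONCOCT binning) is available as a dict; all names here are hypothetical. It also shows the median-of-ratios size factors that DESeq uses to correct for sequencing depth, only to illustrate why DESeq expects raw, unnormalized counts. This is not DESeq's API; DESeq computes these factors itself in R.

```python
import math
import statistics
from collections import defaultdict


def counts_per_bin(contig_counts: dict[str, dict[str, int]],
                   contig_to_bin: dict[str, str]) -> dict[str, dict[str, int]]:
    """Turn {sample: {contig: reads}} into {bin: {sample: reads}}.

    Contigs without a bin assignment are skipped.
    """
    bins: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for sample, per_contig in contig_counts.items():
        for contig, reads in per_contig.items():
            bin_id = contig_to_bin.get(contig)
            if bin_id is not None:
                bins[bin_id][sample] += reads
    return {b: dict(per_sample) for b, per_sample in bins.items()}


def size_factors(table: dict[str, dict[str, int]], samples: list[str]) -> dict[str, float]:
    """Median-of-ratios size factor per sample.

    For each feature with a non-zero count in every sample, take the log of its
    geometric mean across samples; a sample's factor is exp(median over those
    features of log(count) - log(geometric mean)).
    """
    log_geo_means = {}
    for feature, row in table.items():
        values = [row.get(s, 0) for s in samples]
        if all(v > 0 for v in values):  # log(0) is undefined, so such features are dropped
            log_geo_means[feature] = sum(math.log(v) for v in values) / len(values)
    if not log_geo_means:
        raise ValueError("no feature has a non-zero count in every sample")
    return {
        s: math.exp(statistics.median(
            math.log(table[f][s]) - lg for f, lg in log_geo_means.items()))
        for s in samples
    }
```

Dividing each sample's counts by its size factor puts samples on a comparable depth scale; a sample sequenced twice as deeply gets a factor twice as large.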
Thank you for your reply. Is there a preferred method for generating a table of the number of reads mapped to each contig from the BAM file? Is this the same as calculating coverage?
Also, is DESeq recommended for finding taxonomic differences between samples or groups of samples, or is it better suited to comparing genes within samples?
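To get the table for each BAM file (it needs to be coordinate-sorted and indexed), I'm using:
samtools idxstats <file.bam> | awk '{print $1"\t"int(($3+$4)/2)}'
This prints each contig name followed by half the sum of its mapped and unmapped read counts (columns 3 and 4 of the idxstats output), i.e. roughly the number of read pairs for paired-end data. Using DESeq to compare samples at the gene level would leave you with a long list of highly dependent features, since genes from the same contig or genome do not vary independently. I would compare contigs (or bins) instead, and then work out what is in the differentially abundant ones.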
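The awk one-liner can be mirrored in Python, which also makes the count-versus-coverage question concrete: coverage (mean depth) is roughly mapped reads × read length / contig length, so it is normalized by contig length, whereas DESeq-style tools expect the raw integer counts. This sketch assumes the four tab-separated idxstats columns (contig name, length, mapped read segments, unmapped read segments) and, unlike the one-liner, skips the final `*` line of unplaced reads. `ContigStats`, `parse_idxstats`, `approx_mean_depth`, the contig names and the 150 bp read length in the test are all hypothetical.

```python
from typing import NamedTuple


class ContigStats(NamedTuple):
    length: int  # idxstats column 2: contig length
    mapped: int  # idxstats column 3: mapped read segments
    count: int   # the value the awk one-liner prints


def parse_idxstats(text: str, paired: bool = True) -> dict[str, ContigStats]:
    """Parse `samtools idxstats` output into {contig: ContigStats}.

    count = (mapped + unmapped) // 2 for paired-end data, as in the awk
    one-liner (int() truncates the same way for non-negative values).
    """
    stats: dict[str, ContigStats] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, length, mapped, unmapped = line.split("\t")[:4]
        if name == "*":
            continue  # reads with no placement, not a contig
        total = int(mapped) + int(unmapped)
        stats[name] = ContigStats(int(length), int(mapped), total // 2 if paired else total)
    return stats


def approx_mean_depth(mapped_reads: int, read_length: int, contig_length: int) -> float:
    """Rough mean depth: bases from mapped reads divided by contig length."""
    return mapped_reads * read_length / contig_length
```

Two contigs with the same count can have very different depths if their lengths differ, which is why the count column, not depth, is the input for DESeq.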