Question

Nucleotide diversity and Tajima's D from SNPs

4 months ago

BioD • 0

Hi all,

I have Illumina paired-end reads for four chromosomes in FASTQ format. Using the cat command, I merged all of the FASTQ files into one. After mapping with bwa mem, I used the GATK hard-filtering method to extract SNPs in VCF format. Next, I would like to calculate nucleotide diversity, Tajima's D, the fixation index, etc. from the SNPs, for example the diversity between chromosomes 1 and 2, chromosomes 1 and 3, and chromosomes 1 and 4. Is this approach correct? How can I calculate these statistics from VCF files, or are there other methods I could use?

GATK TajimaD vcf • 517 views

updated 4 months ago by Ram 44k • written 4 months ago by BioD • 0

This question was also asked on bioinformatics SE: https://bioinformatics.stackexchange.com/questions/22663/nucleotide-diversity-and-tajimad-from-snps

Please keep in mind that posting the same question to multiple sites can be perceived as bad etiquette, because efforts may be made to address a problem that has already been solved elsewhere in the meantime.

The helpful thing to do if you do decide to post on multiple forums is to add a link to the other forum posts on each post so people will look at the other posts before investing their effort.

4 months ago by Ram 44k

score 0 · Answer 1 · 2024-07-01

It's impossible to tell you whether that approach is correct without some background on what your research question is. Details of the data structure, including whether there are multiple populations (and how many), matter because you say you want to calculate a fixation index.

Functionally, these metrics are all pretty straightforward to calculate, and there are numerous tools out there that do it. Several come up with a simple Google search. I've used PopGenome in R for all of these metrics, but that was a few years ago, so I'm sure other tools have been developed since.
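To give a sense of how simple the underlying arithmetic is, here is a minimal Python sketch of nucleotide diversity, Watterson's theta, and Tajima's D. It is not PopGenome's interface or any other tool's API, and the function names are made up for illustration. It assumes biallelic SNPs with no missing data, that `n` is the number of sampled chromosomes (twice the number of diploid individuals), and that you have already pulled the alternate-allele count for each site out of the VCF (for example from the `AC` INFO field, if your file has it).

```python
import math


def nucleotide_diversity(alt_counts, n):
    """Mean number of pairwise differences, summed over sites.

    alt_counts: alternate-allele count at each site (hypothetical input).
    n: number of sampled chromosomes.
    Divide the result by the number of callable sites to get per-site pi.
    """
    return sum(2.0 * k * (n - k) / (n * (n - 1)) for k in alt_counts)


def wattersons_theta(num_segregating, n):
    """Watterson's estimator: S divided by the harmonic number a1."""
    a1 = sum(1.0 / i for i in range(1, n))
    return num_segregating / a1


def tajimas_d(alt_counts, n):
    """Tajima's D (Tajima 1989) from per-site alternate-allele counts."""
    # Keep only sites that are actually polymorphic in the sample.
    seg = [k for k in alt_counts if 0 < k < n]
    s = len(seg)
    # The variance term is zero for n < 4, so D is undefined there.
    if s == 0 or n < 4:
        return float("nan")

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)

    pi = nucleotide_diversity(seg, n)
    theta_w = s / a1
    return (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    # Toy data: 3 SNPs scored in 4 chromosomes (2 diploid samples).
    counts = [1, 2, 1]
    print("pi:", nucleotide_diversity(counts, 4))
    print("theta_W:", wattersons_theta(len(counts), 4))
    print("Tajima's D:", tajimas_d(counts, 4))
```

In practice you would compute these in windows along each chromosome and normalise by the number of callable sites, which is exactly the bookkeeping that dedicated tools handle for you. Note that the sample size here refers to individuals or chromosome copies sampled from a population, not to the number of different chromosomes in the genome.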