How to calculate the Genetic Distance (P-Distance) by using mtDNA pacBio data?
0
0
Entering edit mode
6 months ago

Genetic_Distance

Dear researchers and experts,

I am using mtDNA PacBio data to calculate genetic distance (P distance). For my analysis, I have been utilizing bioinformatics tools on a Linux platform. Initially, I tried using BCFtools, followed by running a script in Python (PyCharm) to measure genetic distance (P distance). Could you please suggest some tools for these calculations? I humbly request your guidance from the beginning. Could someone please help me? Below, I have attached a screenshot of my calculations. Currently, I am using data from three species, with each species having data from three different locations. Please guide me about this.

These are the commands I have used:

Step 2: Call Variants

Identify the differences between your PacBio reads and the reference genome using a variant caller such as BCFtools.

Commands

Call variants using BCFtools:

`bcftools mpileup -f reference.fasta sorted_aligned_reads_with_rg.bam | bcftools call -mv -Oz -o variants.vcf.gz`
`bcftools filter -i 'QUAL > 30 && DP > 10' variants.vcf.gz -Oz -o filtered_variants.vcf.gz`
`bcftools index filtered_variants.vcf.gz`

Step 3: Calculate Genetic Distance

Use the filtered variants to calculate the genetic distance (P distance) between the sequences.

Example Python Script to Calculate P Distance

1. Extract allele frequencies:

`bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%AF\n' filtered_variants.vcf.gz > allele_frequencies.txt`

2. Calculate P distance using Python:

import pandas as pd
import matplotlib.pyplot as plt

# Load allele frequency data
allele_freqs = pd.read_csv('allele_frequencies.txt', sep='\t', names=['CHROM', 'POS', 'REF', 'ALT', 'AF'])

# Calculate P distance
total_sites = len(allele_freqs)
diff_sites = sum(allele_freqs['AF'] > 0)  # Count sites with non-reference alleles

p_distance = diff_sites / total_sites
print(f'P distance: {p_distance}')

# Visualization
# Histogram of allele frequencies
plt.figure(figsize=(10, 6))
plt.hist(allele_freqs['AF'], bins=20, edgecolor='black')
plt.title('Histogram of Allele Frequencies')
plt.xlabel('Allele Frequency')
plt.ylabel('Count')
plt.grid(True)
plt.show()

# Scatter plot of positions vs allele frequencies
plt.figure(figsize=(10, 6))
plt.scatter(allele_freqs['POS'], allele_freqs['AF'], alpha=0.5)
plt.title('Allele Frequencies by Position')
plt.xlabel('Position')
plt.ylabel('Allele Frequency')
plt.grid(True)
plt.show()

Thank you for your assistance.

pacBio Genetic-Distance mtDNA • 402 views
ADD COMMENT
1
Entering edit mode

Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or use one of (a) the option highlighted in the image below/ (b) fenced code blocks for multi-line code. Fenced code blocks are useful in syntax highlighting. If your code has long lines with a single command, break those lines into multiple lines with proper escape sequences so they're easier to read and still run when copy-pasted. I've done it for you this time.
code_formatting

ADD REPLY

Login before adding your answer.

Traffic: 2269 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6