Working with mappy and comparing it to Blast
0
0
Entering edit mode
17 months ago
Chris K ▴ 10

Hello!

Following a previous post of mine (Alignment error using Biopython) and thanks to the kind help of @zorbax i came up with this piece of code to do the alignment i wanted:

import mappy as mp
import re

# Load the reference genome of your choice against which you will align the desired genes
genome_file = "CP000033.3[1..1993560].fasta"
genome = mp.Aligner(genome_file)

# Load the genes of interest sequences from a fasta file
gene_file = "/home/user/VScode/genes.fasta"

# List to store the accession IDs with alignments
aligned_accession_ids = []

with open(gene_file, "r") as fasta:
    gene_sequences = []
    curr_sequence = ""
    curr_name = ""
    for line in fasta:
        line = line.strip()
        if line.startswith(">"):
            if curr_name != "":
                gene_sequences.append((curr_name, curr_sequence))
                curr_sequence = ""
            curr_name = re.search(r"\b\w+\b", line).group()
        else:
            curr_sequence += line
    # Add the last gene sequence
    gene_sequences.append((curr_name, curr_sequence))

# Iterate over each gene sequence
for gene_name, gene_seq in gene_sequences:
    # Iterate over each genome sequence
    for genome_seq in genome.map(str(gene_seq)):
        # Calculate the alignment length
        alignment_length = genome_seq.mlen

        # Calculate the percentage identity
        percentage_identity = (alignment_length / len(gene_seq)) * 100



        # Store the accession ID if there is an alignment (this is for future use not relevant with the program itself)
        if alignment_length > 0:
            aligned_accession_ids.append(gene_name)

        # Print the alignment information
        print(f"Gene: {gene_name} (Length: {len(gene_seq)})")
        print(f"Genome: {genome_seq.ctg} (Length: {genome_seq.blen})")
        print(f"Percentage Identity: {percentage_identity:.2f}%")
        print()

# Print the list of accession IDs with alignments (this is for future use not relevant with the program itself)
print("Accession IDs with alignments:")
for accession_id in aligned_accession_ids:
    print(accession_id)

The question i would like to ask is that after a quick check using the same files with BLAST it produced somewhat different results. Especially for the second hit. So is this an acceptable result?, should i be worried and change something in the code? or this is normal and leave it as be?

enter image description here enter image description here

Thank you all in advance!

mappy BLAST coding • 632 views
ADD COMMENT
1
Entering edit mode

Since you seem to be doing the alignment with some other part of the code we don't know if you are using the same exact algorithm and or settings as the web blast. If you click on Search Summary of the web BLAST result page above you will get the exact settings used for that search. Be sure to use the same settings for your local search.

search

ADD REPLY
0
Entering edit mode

Ok! I will check it out thank you!

ADD REPLY

Login before adding your answer.

Traffic: 1616 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6