Question

Blastn - matches with another species

0

11 months ago

davidmaimoun ▴ 50

Hello,

I am writing a workflow for Neisseria meningitidis. After assembly (with SPAdes), I want to confirm that I really have this species before continuing.

So I added a blastn step: I picked references from NCBI and ran BLAST against them. When I run BLAST with the complete genome, I get very good hits to Neisseria meningitidis (maximum bit score 50000), but also a few hits to another bacterium (Salmonella enterica, maximum bit score 800).

On the other hand, when I cut my genome down and use only 500,000 nucleotides, I don't get hits to the other species.

Is this normal?

Is it better to run BLAST on only part of the genome?

How can I tell whether a bit score is good or not?

Thank you

blast blastn • 642 views

ADD COMMENT • link 11 months ago by davidmaimoun ▴ 50

0

Using a local aligner like BLAST for sequence similarity searches on whole genomes does not seem like a good idea. You are going to see hits to similar organisms (as you do above).

Perhaps you should consider an alternative such as BBSketch: BBSketch - A Tool for Rapid Sequence Comparison
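To give a feel for what sketch-based comparison does, here is a minimal Python sketch of a bottom-k MinHash over canonical k-mers. It only illustrates the general idea and is not BBSketch's actual implementation; the function names, `k=21`, and the sketch size of 1000 are assumptions chosen for the example.

```python
import hashlib

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmers(seq, k=21):
    """Yield canonical k-mers: the smaller of each k-mer and its reverse complement."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip k-mers containing ambiguous bases such as N.
        if set(kmer) <= set("ACGT"):
            yield min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def sketch(seq, k=21, size=1000):
    """Bottom-k MinHash sketch: the `size` smallest distinct k-mer hash values."""
    hashes = {
        int.from_bytes(hashlib.blake2b(km.encode(), digest_size=8).digest(), "big")
        for km in kmers(seq, k)
    }
    return sorted(hashes)[:size]


def jaccard_estimate(sketch_a, sketch_b, size=1000):
    """Estimate the Jaccard similarity of two k-mer sets from their sketches."""
    merged = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)
```

An assembly that really is the expected species should give a much higher estimate against its own reference than against references from other genera. Where to put the cutoff is something you would have to calibrate on known genomes.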

"I am writing a workflow for Neisseria meningitidis. After assembly (with SPAdes), I want to confirm that I really have this species before continuing."

So you are not sure whether the starting sample is pure Neisseria or contains other organisms?

ADD REPLY • link 11 months ago by GenoMax 148k

0

Hi, thank you for your help and sorry if I wasn't clear.

I'm almost 100% sure that it is Neisseria, but my boss still wants to add a checking step in case there was contamination or something similar.

So I picked reference assemblies of different species (including N. meningitidis) from NCBI RefSeq, created a database from them, and ran BLAST against it with my samples as the query.
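For a check like this, a single bit score is hard to judge on its own, because the bit score grows roughly with the length of the aligned region. It may be more informative to total the hits per species. Below is a minimal Python sketch that assumes the BLAST results were written in tabular format (`-outfmt 6`, default columns) and that you have a mapping from subject sequence ID to species; the function name, the mapping, and the example rows are hypothetical.

```python
import csv
from collections import defaultdict

# Default column order of BLAST tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def summarize_hits(lines, subject_to_species):
    """Aggregate hits per species: HSP count, total aligned bases, best bit score.

    Overlapping HSPs are counted more than once, so `aligned_bases` is an
    upper bound on the query length that is actually covered.
    """
    summary = defaultdict(
        lambda: {"hsps": 0, "aligned_bases": 0, "max_bitscore": 0.0}
    )
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(COLUMNS, row))
        species = subject_to_species.get(hit["sseqid"], "unknown")
        entry = summary[species]
        entry["hsps"] += 1
        entry["aligned_bases"] += int(hit["length"])
        entry["max_bitscore"] = max(entry["max_bitscore"], float(hit["bitscore"]))
    return dict(summary)


if __name__ == "__main__":
    # Made-up rows for illustration only; real input would come from a file.
    example = [
        "contig1\trefA\t99.5\t27000\t135\t0\t1\t27000\t500\t27499\t0.0\t50000",
        "contig7\trefB\t85.0\t600\t90\t0\t10\t609\t1200\t1799\t1e-150\t800",
    ]
    species_of = {"refA": "Neisseria meningitidis", "refB": "Salmonella enterica"}
    for name, stats in summarize_hits(example, species_of).items():
        print(name, stats)
```

If nearly all aligned bases fall on the expected species and the other species only collects a few short hits, that pattern looks more like conserved regions shared between bacteria than contamination, although this is a heuristic and not a formal test.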

ADD REPLY • link 11 months ago by davidmaimoun ▴ 50

1

How about flipping this test around? You could create a simulated read dataset (Illumina, paired-end, 100 bp) from the Neisseria genome in RefSeq, then align it against your assembly. Look at the alignment percentage and the coverage (depth) across the genome. The former should be very high (depending on how similar your strain is to the reference).
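As a rough illustration of the simulation step, here is a minimal Python sketch that samples error-free 100 bp paired-end reads from a reference sequence and writes them as FASTQ. It assumes a linear sequence, a fixed insert size of 300 bp, and fragments taken from the forward strand only; the function names and defaults are made up for the example, and a dedicated read simulator would model errors and insert-size variation more realistically. The alignment itself would still be done with whatever read aligner you already use.

```python
import random

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def pairs_for_depth(genome_len, depth, read_len=100):
    """Number of read pairs needed for an average depth of `depth`."""
    return round(depth * genome_len / (2 * read_len))


def simulate_pairs(genome, n_pairs, read_len=100, insert_size=300, seed=0):
    """Sample error-free (read1, read2) pairs uniformly along `genome`."""
    if not read_len <= insert_size <= len(genome):
        raise ValueError("need read_len <= insert_size <= len(genome)")
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        start = rng.randrange(len(genome) - insert_size + 1)
        fragment = genome[start:start + insert_size]
        read1 = fragment[:read_len]  # forward read from the fragment start
        read2 = reverse_complement(fragment[-read_len:])  # reverse read from the end
        pairs.append((read1, read2))
    return pairs


def write_fastq(pairs, path_r1, path_r2, quality_char="I"):
    """Write the pairs to two FASTQ files with a constant quality string."""
    with open(path_r1, "w") as r1, open(path_r2, "w") as r2:
        for i, (read1, read2) in enumerate(pairs):
            r1.write(f"@sim_{i}/1\n{read1}\n+\n{quality_char * len(read1)}\n")
            r2.write(f"@sim_{i}/2\n{read2}\n+\n{quality_char * len(read2)}\n")
```

For example, `pairs_for_depth(len(genome), 30)` gives the number of pairs for roughly 30x average depth, which can then be passed to `simulate_pairs` as `n_pairs`.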