I have several sets of relatively short DNA sequences (200 bp to about 2000 bp), stored as FASTA.
They are all supposed to be of bacterial origin.
However, I want to make sure that no human sequences are sneaking in. Some of them could also be only partially human (meaning a part of the entire sequence could be of human origin).
I would simply blastN against the whole `human_genomic.*tar.gz` and `est_human.*.tar.gz` databases. Speed is not much of an issue, so I do not need a solution like Centrifuge or mapping; I would like to go with BLAST for high sensitivity.
Would you add any other databases to the list to search against?
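For context, the kind of search I have in mind would look roughly like the sketch below. File names and thresholds are illustrative, not recommendations: `human_grch38.fna` stands for whatever FASTA the downloaded archives unpack to, and `bacterial_sequences.fasta` is my query set.

```shell
# Build a local BLAST database from the extracted human genome FASTA
# (file name is illustrative -- use whatever the archives unpack to).
makeblastdb -in human_grch38.fna -dbtype nucl -out human_db

# Sensitive nucleotide search; tabular output includes alignment length,
# so partially human (chimeric) sequences show up as short local hits.
blastn -query bacterial_sequences.fasta \
       -db human_db \
       -task blastn \
       -evalue 1e-5 \
       -outfmt "6 qseqid sseqid pident length evalue bitscore" \
       -out hits_vs_human.tsv
```

The tabular `-outfmt 6` output is what I would then filter to decide which sequences to discard.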
Use `bbsplit.sh` to bin your reads (A: Tool to separate human and mouse RNA-seq reads). Use the human genome alone if you don't know which specific bacteria you want to include. Reads aligning to the human genome will go into one file and the rest will be collected in a second.

Thank you! However, I am NOT asking for a tool to split my data; I know `bbsplit.sh`. I am asking whether you would add another database, in addition to the two mentioned above, to make sure very short stretches of human sequence get caught.
The human genome sequence should be a catch-all; there should be no need to add any other database. ESTs etc. are all a subset of the entire genome.
That is a tough criterion. If you want to enforce it, what minimum length are you thinking of using for hits? You may get small stretches of sequence identity between your data and the human genome purely by chance.
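To make that threshold concrete: with tabular (`-outfmt 6`) BLAST output, filtering hits by a minimum alignment length and identity is a one-liner. The sample data and cutoffs below are purely illustrative; the fields assumed are `qseqid sseqid pident length evalue bitscore`.

```shell
# Hypothetical sample of BLAST tabular output (outfmt 6):
# qseqid  sseqid  pident  length  evalue  bitscore
cat > hits_vs_human.tsv <<'EOF'
seq1	chr1	98.5	450	1e-50	800
seq2	chr7	95.0	60	1e-8	90
seq3	chr2	88.0	25	0.01	30
EOF

# Flag queries with a hit of at least 50 bp at >=90% identity.
# These thresholds are illustrative -- raising the minimum length
# reduces chance matches at the cost of missing short human stretches.
awk -F'\t' '$4 >= 50 && $3 >= 90 {print $1}' hits_vs_human.tsv \
    | sort -u > flagged_ids.txt

cat flagged_ids.txt
```

Here only `seq1` and `seq2` would be flagged; `seq3`'s 25 bp hit falls below the length cutoff and is treated as a chance match.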