Question

Manipulate refseq database

4.9 years ago

anasofiamoreira94 ▴ 80

Good morning, I apologize for this question, and I will try to explain myself as clearly as possible. We analyzed some FASTQ files from Ion Torrent and ran BLAST against the whole nt database. Now we are seeing certain results that don't make sense (environmental samples, vibrios, etc.), in the sense that more things appear than we were expecting. So now, we wanted to use the refseq mitochondrial, but I have doubts about the taxid, because when I do it against the mitochondrial, I don't have the results of the species. Could someone indicate the best path for my goal? Thanks

ncbi database nt refseq • 1.4k views

ADD COMMENT • link updated 4.8 years ago by Biostar 20 • written 4.9 years ago by anasofiamoreira94 ▴ 80

"I apologize for this question"

No need for that. Biostars exists specifically for questions.

ADD REPLY • link 4.9 years ago by WouterDeCoster 47k

It would help to describe what this project is about. We have heard what you have tried to do but not why.

Next-generation sequencing can be very sensitive (you can very easily sequence contaminant DNA in any of your samples), and if one is not careful with samples/preps, unexpected results can happen.

ADD REPLY • link 4.9 years ago by GenoMax 147k

I can't really describe the goal of the project... I'm so sorry. Can you lead me to the best way on how to use refseq database?

ADD REPLY • link 4.9 years ago by anasofiamoreira94 ▴ 80

Maybe you can describe the data. Are they amplicons or full genomes (all DNA)? If amplicons, which marker/gene are you looking at?

ADD REPLY • link 4.9 years ago by gb ★ 2.2k

We perform targeted sequencing, so we focus on amplicons.

ADD REPLY • link 4.9 years ago by anasofiamoreira94 ▴ 80

I already expected this when you said you had hits on "environmental sample" entries. The nt database is full of those kinds of sequences. In practice, researchers take a sample from water or soil and know that a certain family of species is in it, but don't know exactly which species. The sequences are then simply uploaded to GenBank; if you check the taxonomy of those hits, it often does not go deeper than family level.

The easy thing to do is to use a database that is specifically created for your target. (We don't know your target, so we cannot give suggestions.) The more difficult thing, depending on your own skills, is to filter the nt database.

"in the sense that more things appear than we were expecting"

I think your expectation is wrong, depending on your type of sample of course.

"So now, we wanted to use the refseq mitochondrial, but I have doubts about the taxid, because when I do it against the mitochondrial"

As far as I know, they all have taxids in the same way the nt database has. But using this database will probably not help (I don't know for sure, since I don't know your goal). Most of the time a species is sequenced specifically with a certain primer (a targeted marker) rather than as a whole mitochondrion, so you will miss a lot.

ADD REPLY • link 4.9 years ago by gb ★ 2.2k

Let's say I want to analyse my data against hits of meat and fish, would you suggest to continue using nt?

ADD REPLY • link 4.9 years ago by anasofiamoreira94 ▴ 80

I personally would suggest filtering nt (you can also filter the hits afterwards... maybe) or using BOLD (http://boldsystems.org/). That database is not that easy to use; the API is a bit weird.

Filtering nt also drastically speeds up the BLAST search.
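Filtering the hits afterwards could look roughly like the sketch below. It assumes the search was run with a tabular output format whose last column is `staxids` (for example `-outfmt "6 qseqid sseqid pident evalue staxids"`). The function name, the made-up hit lines, and the taxids in the allow-list (9913 for Bos taurus, 8030 for Salmo salar) are only illustrative, not something from this thread.

```python
def filter_hits_by_taxid(lines, allowed_taxids, taxid_column=-1):
    """Yield tabular BLAST hit lines whose taxid column matches the allow-list."""
    allowed = {str(t) for t in allowed_taxids}
    for line in lines:
        line = line.rstrip("\n")
        # Skip empty lines and comment lines (e.g. from -outfmt 7)
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # The staxids field can hold several taxids separated by ';'
        hit_taxids = set(fields[taxid_column].split(";"))
        if hit_taxids & allowed:
            yield line


# Assumed allow-list: replace with the taxids of the species you expect.
ALLOWED = {9913, 8030}

# Made-up example lines in the assumed column layout
example_hits = [
    "read1\tsubjectA\t99.1\t1e-50\t9913",
    "read2\tsubjectB\t97.0\t1e-40\t672",
    "read3\tsubjectC\t98.5\t1e-45\t8030;8022",
]
kept = list(filter_hits_by_taxid(example_hits, ALLOWED))
for hit in kept:
    print(hit)
```

For a real results file, the list of lines would be replaced by an open file handle. Note that this only hides unwanted hits; it does not speed up the search the way a filtered database would.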

ADD REPLY • link 4.9 years ago by gb ★ 2.2k

But how can I filter the nt? Is it possible to do it remotely?

ADD REPLY • link 4.9 years ago by anasofiamoreira94 ▴ 80

Remotely, I don't think so. You could download it and use Biopython or maybe blastdbcmd. It is a lot to explain in detail, so it may be better to make a start and ask a new question if you get stuck. Or maybe someone else has another suggestion.

Basically, you can read the nt database with Biopython, and if a FASTA header contains a certain text, you can write the record to a new file.
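A minimal, dependency-free sketch of that header filter is shown here. It assumes nt has already been downloaded as a plain FASTA file; the function name, keywords, and demo records are hypothetical. With Biopython, `SeqIO.parse(handle, "fasta")` and `record.description` would take the place of the hand-written line loop.

```python
import io


def filter_fasta(in_handle, out_handle, keywords):
    """Copy FASTA records whose header contains any keyword (case-insensitive).

    Returns the number of records written.
    """
    wanted = [k.lower() for k in keywords]
    keep = False
    kept = 0
    for line in in_handle:
        if line.startswith(">"):
            # A new record starts: decide once whether to keep it
            header = line.lower()
            keep = any(k in header for k in wanted)
            if keep:
                kept += 1
        if keep:
            out_handle.write(line)
    return kept


# Made-up records standing in for a downloaded nt FASTA file
demo = io.StringIO(
    ">X1 Bos taurus mitochondrion, complete genome\nACGT\nACGT\n"
    ">X2 uncultured bacterium clone 12\nGGCC\n"
    ">X3 Salmo salar cytochrome oxidase subunit I\nTTAA\n"
)
out = io.StringIO()
n_kept = filter_fasta(demo, out, ["Bos taurus", "Salmo salar"])
print(n_kept, "records kept")
```

Because the file is streamed line by line, the whole database never has to fit in memory. Matching on header text is crude (names can be misspelled or missing), so filtering on taxids is probably more reliable where they are available.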

ADD REPLY • link 4.9 years ago by gb ★ 2.2k

I'll probably make a new question, thanks.

ADD REPLY • link 4.9 years ago by anasofiamoreira94 ▴ 80

By the way, maybe this helps: "Downloading all COI sequences from BOLD database"

ADD REPLY • link 4.9 years ago by gb ★ 2.2k

I'm so sorry, but I can't use the BOLD database, only NCBI, but thanks for the suggestion.

ADD REPLY • link 4.9 years ago by anasofiamoreira94 ▴ 80

You can't filter nt remotely. You could do a BLAST search against it using a specific Entrez query with the species of interest. As it stands, "meat and fish" is too broad a term and is not one that could be used with Entrez.
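As a rough sketch of such a restricted search, the snippet below only assembles a remote `blastn` command line with an `-entrez_query` limited to named organisms; it does not run anything. The file names and the two species are placeholders, and the reads are assumed to have been converted from FASTQ to FASTA first.

```python
import shlex


def build_remote_blastn(query_fasta, organisms, out_path):
    """Build a blastn argument list restricted to the given organisms."""
    # Entrez query of the form: Name one[Organism] OR Name two[Organism]
    entrez_query = " OR ".join(f"{name}[Organism]" for name in organisms)
    return [
        "blastn",
        "-db", "nt",
        "-remote",  # -entrez_query is only honoured for remote searches
        "-query", query_fasta,
        "-entrez_query", entrez_query,
        "-outfmt", "6",
        "-out", out_path,
    ]


# Placeholder inputs: replace with your own file names and species.
cmd = build_remote_blastn("reads.fasta", ["Bos taurus", "Salmo salar"], "hits.tsv")
command_line = shlex.join(cmd)
print(command_line)
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` on a machine with BLAST+ installed. Remote searches are slow for many reads, so this is more suited to spot checks than to a full run.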

ADD REPLY • link 4.9 years ago by GenoMax 147k

I can specify the genes of interest with Entrez, but I will continue to have all the species that I don't want...

ADD REPLY • link 4.9 years ago by anasofiamoreira94 ▴ 80