Finding FASTA files to use with bowtie2-build to create a custom database for KneadData
2.8 years ago
jobie1 ▴ 30

I would like to run the kneaddata tool on some metagenomics data (Illumina) I have from seabird gut microbiome samples in order to remove any host-associated reads. Simply put, I have to use bowtie2-build to build a custom database from the genomes of my selected study species, but I am very new to metagenomic analysis and I have a few beginner questions:

What sort of FASTA files am I looking for to build the custom database? Can I just download nucleotide sequences from RefSeq, for example? Do I download all those sequences individually?

Also, where should I look to download the FASTA files I can use to build a bowtie2 index for the following seabird species: Morus bassanus, Uria aalge, Rissa tridactyla, Fratercula arctica?

Thanks

kneaddata metagenomics bowtie2 • 1.1k views
2.8 years ago
GenoMax 147k

Use the NCBI genomes page to see if there is genomic information available for the species you list. Here is one example. You can then visit the genome page and find the FASTA file for the genome.

Normally you can do this via https://www.ncbi.nlm.nih.gov/genome/ but the browse genomes tool on that page is not working today.
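On the "do I download all those sequences individually" part: one genome FASTA per host species is usually enough, and bowtie2-build accepts a comma-separated list of FASTA files, so several host genomes can go into a single index. Below is a minimal sketch in Python that only assembles the command. The file names and the index basename are hypothetical placeholders, not real download paths, and the thread count is arbitrary.

```python
import shlex


def bowtie2_build_command(fasta_paths, index_basename, threads=1):
    """Return the argument list for a bowtie2-build call.

    bowtie2-build takes a comma-separated list of reference FASTA files
    followed by the basename of the index files it should write.
    """
    if not fasta_paths:
        raise ValueError("at least one FASTA file is required")
    # Join all host genomes into the single comma-separated argument.
    reference_arg = ",".join(str(p) for p in fasta_paths)
    return [
        "bowtie2-build",
        "--threads", str(threads),
        reference_arg,
        str(index_basename),
    ]


# Hypothetical file names: one genome FASTA per host species.
host_fastas = [
    "genomes/uria_aalge.fna",
    "genomes/rissa_tridactyla.fna",
    "genomes/fratercula_arctica.fna",
]

cmd = bowtie2_build_command(host_fastas, "host_db/seabird_hosts", threads=4)

# Print the command so it can be inspected or pasted into a shell.
print(shlex.join(cmd))

# To actually build the index (requires bowtie2 to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

The index files end up under the basename you give (here `host_db/seabird_hosts`). Check the kneaddata documentation for how to point its reference database option at that location.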


I see this fairly often, and oddly enough many times it is by moderators: what seem to be direct answers to a question (like the one above) are entered as comments. Not only does that leave many questions "unanswered", but it doesn't allow for any answer to be accepted.

I wonder if my criteria for what constitutes a direct answer are off, or if moderators work with more stringent rules when answering questions.

If anyone cares, I have had this comment saved in a file for at least a month, which was when I first had this thought. I decided at that point not to enter it, as I didn't want the hard-working moderators to feel attacked for something that may be perceived as a cosmetic point. Since then I have seen numerous examples of the same approach, because now I am paying attention to it. Hopefully GenoMax doesn't feel singled out, even though I am using his/her post as an example.


Mensur Dlakic, your point is fair and I am guilty of this. I do at times post my answers as comments (especially if they feel partial to me or if the original question is not exactly clear as to what OP wants). Once OP confirms their utility I generally move them to an answer. I try to do that when I notice it in other cases as well.

I originally thought that the seabird gut microbiome samples were the focus, but it does seem like OP is simply looking for bird genomes. Will move my comment.


Thanks! I found most of the species I am looking for on NCBI. I cannot find the northern gannet (Morus bassanus) genome, but there is this NCBI Nucleotide page. Could I use a combination of the files found here to represent the northern gannet genome?

Or download data from this SRA experiment in FASTA format using this page?
