I would like to run the kneaddata tool on some metagenomics data (Illumina) I have from seabird gut microbiome samples in order to remove any host-associated reads. Simply put, I have to use bowtie2-build to build a custom database from the genomes of my selected study species, but I am very new to metagenomic analysis and I have a few beginner questions:
What sort of FASTA files am I looking for to build the custom database? Can I just download nucleotide sequences from RefSeq, for example? Do I download all those sequences individually?
Also, where should I look to download the FASTA files I can use to build a bowtie2 index for the following seabird species: Morus bassanus, Uria aalge, Rissa tridactyla, Fratercula arctica?
Thanks
I see this fairly often, and oddly enough many times it is by moderators: what seem to be direct answers to a question (like the one above) are entered as comments. Not only does that leave many questions "unanswered", but it doesn't allow for any answer to be accepted.
I wonder if my criteria for what constitutes a direct answer are off, or if moderators work with more stringent rules when answering questions.
If anyone cares, I have had this comment saved in a file for at least a month, which is when I first had this thought. I decided at that point not to post it, as I didn't want the hard-working moderators to feel attacked over something that may be perceived as a cosmetic point. Since then I have seen numerous examples of the same approach, because now I am paying attention to it. Hopefully GenoMax doesn't feel singled out, even though I am using his/her post as an example.
Mensur Dlakic, your point is fair and I am guilty of this. I do at times post my answers as comments (especially if they feel partial to me or if the original question is not exactly clear as to what OP wants). Once OP confirms their utility, I generally move them to an answer. I try to do that when I notice it in other cases as well.
I originally thought that the "seabird gut microbiome samples" part was the focus, but this does seem like OP is simply looking for bird genomes. Will move my comment.
Thanks! I found most of the species I am looking for on NCBI. I cannot find the Northern gannet (Morus bassanus) genome, but there is this NCBI Nucleotide page. Could I use a combination of the files found here to represent the northern gannet genome?
Or could I download data from this SRA experiment in FASTA format using this page?
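On the question of combining several FASTA files into one host database: bowtie2-build accepts a comma-separated list of FASTA files as its reference input, so the per-species files do not have to be merged by hand first. Below is a minimal Python sketch of assembling that command. The file names and the index prefix `seabird_hosts` are hypothetical placeholders, not real downloads, and the helper function is only an illustration, not part of bowtie2 or kneaddata.

```python
import shutil
import subprocess
from pathlib import Path


def bowtie2_build_command(fasta_files, index_prefix, threads=1):
    """Return the bowtie2-build argument list for one index built from several FASTA files."""
    if not fasta_files:
        raise ValueError("need at least one FASTA file")
    # bowtie2-build takes its reference input as a single comma-separated list of files
    reference_arg = ",".join(str(Path(f)) for f in fasta_files)
    return ["bowtie2-build", "--threads", str(threads), reference_arg, str(index_prefix)]


# Hypothetical file names: one downloaded genome FASTA per host species.
host_fastas = ["uria_aalge.fna", "rissa_tridactyla.fna", "fratercula_arctica.fna"]
cmd = bowtie2_build_command(host_fastas, "seabird_hosts", threads=4)
print(" ".join(cmd))

# Only run the build if bowtie2-build is installed and every file is present.
if shutil.which("bowtie2-build") and all(Path(f).is_file() for f in host_fastas):
    subprocess.run(cmd, check=True)
```

The resulting index would then be what kneaddata is pointed at as its reference database; how exactly that path should be given is best checked against the kneaddata documentation for the installed version. File names containing commas would break the comma-separated list, so this sketch assumes plain names.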