Hi, how can I accomplish the task in the title? I have some Illumina reads and I'd like to know whether the species is a wolf (Canis lupus). I need to distinguish it from the dog (Canis lupus familiaris).
You can do this by taking a sample of reads from your data and then BLASTing them at NCBI (use Canis (taxid:9611) to limit your search to the genus Canis). You can use reformat.sh
from the BBMap suite like this (it skips the first 300,000 reads, samples 30 reads and converts them to FASTA format in the process):
reformat.sh in=your_file.fq.gz out=sampled.fa skipreads=300000 samplereadstarget=30
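If BBMap is not (yet) installed on the lab server, the same idea (draw a fixed number of reads at random and write them out as FASTA) can be sketched in plain awk using reservoir sampling. This is a swapped-in technique, not a description of what reformat.sh does internally; the function name sample_fastq_to_fasta is hypothetical, and the sketch assumes uncompressed FASTQ with exactly four lines per record.

```shell
# Hypothetical helper (not part of BBMap): sample_fastq_to_fasta FILE N SEED
# Draws N reads uniformly at random from a 4-line-per-record FASTQ file using
# reservoir sampling (Algorithm R) and prints them as FASTA.
# With the same SEED and the same awk implementation, the subset is identical.
sample_fastq_to_fasta() {
    awk -v n="$2" -v seed="$3" '
        BEGIN { n = n + 0; srand(seed + 0) }
        NR % 4 == 1 { name = substr($0, 2) }     # header without the leading @
        NR % 4 == 2 {                            # sequence line
            k++                                  # reads seen so far
            if (k <= n) {                        # fill the reservoir first
                keep_name[k] = name; keep_seq[k] = $0
            } else {
                j = int(rand() * k) + 1          # uniform integer in 1..k
                if (j <= n) { keep_name[j] = name; keep_seq[j] = $0 }
            }
        }
        END {
            m = (k < n) ? k : n                  # fewer reads than N: keep all
            for (i = 1; i <= m; i++) printf ">%s\n%s\n", keep_name[i], keep_seq[i]
        }' "$1"
}
```

Usage, decompressing on the fly (awk reads standard input when the file name is -): gzip -dc your_file.fq.gz | sample_fastq_to_fasta - 30 42 > sampled.fa. Reusing the seed (42 here) gives the same 30 reads each time; changing it gives a different random subset.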
Note: You could do two searches. First, search without limiting to Canis, so the search is unbiased. If that shows only Canis as the primary hit, you are done; otherwise, do the genus-limited search.
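The "is Canis the only primary hit?" check can be automated once the hits are saved. A minimal sketch, assuming you have exported the hits as a tab-separated table with three columns (read ID, bit score, subject scientific name), for example BLAST+ tabular output with a custom column list; that column layout and the function name top_hit_species are assumptions, not a fixed NCBI format.

```shell
# Hypothetical helper: top_hit_species HITS.tsv
# Input: tab-separated rows of (1) read ID, (2) bit score, (3) species name.
# Keeps the highest-scoring row per read (the first one wins on ties), then
# prints "count<TAB>species" for those top hits, most frequent species first.
top_hit_species() {
    awk -F '\t' '
        !($1 in best) || $2 + 0 > best[$1] { best[$1] = $2 + 0; name[$1] = $3 }
        END {
            for (q in name) count[name[q]]++
            for (s in count) printf "%d\t%s\n", count[s], s
        }' "$1" | sort -t "$(printf '\t')" -k1,1nr -k2,2
}
```

If every line of the output names a Canis species, the unrestricted search already answers the first question, and the genus-limited search is only needed to look at the wolf/dog split.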
Edit: As @Lieven said above, it may be difficult to tell the species apart if the genomes are very closely related.
I would assume those genomes are quite similar, so making a distinction with relatively short reads will not be straightforward (if it is possible at all).
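One way to put a number on that concern is to count reads whose best hit is shared by more than one species: if most reads score identically against wolf and dog, the short reads carry little discriminating signal. A minimal sketch, assuming a hypothetical three-column tab-separated hit table (read ID, bit score, species name) saved as a regular file; classify_reads is a hypothetical name.

```shell
# Hypothetical helper: classify_reads HITS.tsv
# Input: tab-separated rows of (1) read ID, (2) bit score, (3) species name.
# Pass 1 finds each read's best bit score; pass 2 counts how many distinct
# species reach that score. Reads with one such species are "unique"; reads
# where several species tie (e.g. wolf and dog) are "ambiguous".
classify_reads() {
    awk -F '\t' '
        NR == FNR {                                   # first pass over the file
            s = $2 + 0
            if (!($1 in best) || s > best[$1]) best[$1] = s
            next
        }
        $2 + 0 == best[$1] && !(($1, $3) in seen) {   # second pass: top-scoring species
            seen[$1, $3] = 1
            ntop[$1]++
        }
        END {
            for (q in ntop) { if (ntop[q] > 1) amb++; else uniq++ }
            printf "unique\t%d\nambiguous\t%d\n", uniq, amb
        }' "$1" "$1"
}
```

The file is read twice, so it cannot come from a pipe. A high "ambiguous" count would be direct evidence that the sampled reads cannot separate the two taxa.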
Great, I had just thought of that. Shell (.sh) scripts are a great solution, also because I'm planning to set up a server in my lab. One thing: I suppose the sampling is random, right?
For closely related species, I was thinking of searching for a gene or marker (16S rRNA or another) of the kind generally used in metagenomics applications, since such markers can distinguish very closely related species.
Thanks very much
Yes, sampling with reformat.sh is random; it is only reproducible if you provide the same seed (sampleseed=) each time. For future reference: please use ADD COMMENT/ADD REPLY when responding to existing posts, to keep threads logically organized.
Are we actually talking about genomic or RNA Illumina reads? If the latter, 16S rRNA will not help you much, as RNA samples are normally depleted of rRNA.
If you are otherwise looking for specific marker genes, random sampling will of course not help you; you will have to search all the reads to have a chance of finding them. Come to think of it, this also holds true when you're working with genomic reads.
Perhaps you could tell us a bit more about the specifics of your data.
We are talking about genomic Illumina MiSeq whole-genome reads. The sample was taken from the fur of an embalmed animal in my city's museum. Our attempt to amplify the D-loop region and 16S rRNA was inconclusive: no band appeared in gel electrophoresis after amplification.