Finding species matching fastq reads
6.7 years ago
netsam ▴ 10

Hi, how can I accomplish the task in the title? I have some Illumina reads and I'd like to know whether the species is a wolf (Canis lupus). I have to distinguish it from the dog (Canis lupus familiaris).

next-gen • 2.5k views

I would assume those genomes are quite similar, so making the distinction with relatively short reads will not be straightforward (if possible at all).

Great, I was just thinking the same. Shell scripts are a great solution, also because I'm planning to set up a server in my lab. One thing: I suppose the sampling is random, right?

For closely related species I was thinking of searching for a gene or marker (16S rRNA or another) of the kind generally used in metagenomics applications. They can distinguish very closely related species.

Thanks very much

Yes, sampling with reformat.sh is random unless you provide the same seed each time.

For future reference: Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

Are we actually talking about genomic or RNA Illumina reads? If the latter, 16S rRNA will not help you much, as RNA samples are normally depleted of rRNA.

If you are otherwise looking for specific marker genes, random sampling will of course not help you: you will have to search all the reads to potentially find them. Come to think of it, this also holds true when you're working with genomic reads.

Perhaps you need to enlighten us a bit more on the specifics of your data.

We are talking about genomic Illumina MiSeq whole-genome reads. The sample was taken from the fur of an embalmed animal in my city museum. The d-loop region and 16S rRNA search was inconclusive: there was no band in gel electrophoresis after amplification.

6.7 years ago
GenoMax 147k

You can do this by taking a sample of reads from your data and then blasting them at NCBI (use Canis (taxid:9611) to limit your search to the genus Canis). You can use reformat.sh from the BBMap suite like this (sampling 30 reads and converting to FASTA format in the process):

reformat.sh in=your_file.fq.gz out=sampled.fa skipreads=300000 samplereadstarget=30
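
For readers who want to see what this sampling step amounts to, here is a minimal Python sketch: skip a number of leading reads, draw a fixed-size random sample from the rest, and write it out as FASTA. It is not how reformat.sh is implemented; reservoir sampling is swapped in here because it works in a single pass. The function names and the `seed` argument are assumptions made for illustration. Passing the same seed gives the same sample each time.

```python
import gzip
import random
from itertools import islice


def read_fastq(lines):
    """Yield (name, sequence) pairs from an iterable of FASTQ lines.

    Assumes the common 4-lines-per-record layout (no wrapped sequences).
    """
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            return
        name = record[0].rstrip("\n")[1:]  # drop the leading '@'
        seq = record[1].rstrip("\n")
        yield name, seq


def sample_reads(records, target, skip=0, seed=None):
    """Reservoir-sample `target` records after skipping the first `skip`.

    A fixed `seed` makes the sample reproducible; seed=None does not.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, rec in enumerate(islice(records, skip, None)):
        if i < target:
            reservoir.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < target:
                reservoir[j] = rec
    return reservoir


def to_fasta(records):
    """Format (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


def sample_fastq_gz(path, target, skip=0, seed=None):
    """Hypothetical convenience wrapper for a gzipped FASTQ file."""
    with gzip.open(path, "rt") as fh:
        return to_fasta(sample_reads(read_fastq(fh), target, skip, seed))
```

Something like `sample_fastq_gz("your_file.fq.gz", target=30, skip=300000, seed=1)` would then return FASTA text that can be pasted into the NCBI BLAST form.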

Note: You could do two searches. First, search without limiting to Canis so the result is unbiased. If that shows only Canis as the primary hit, you are done; otherwise, do the genus-limited search.
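
To judge whether the unrestricted search "only shows Canis as primary hit", one option is to keep the best-scoring hit per read and count hits per taxon. The sketch below assumes the hits have already been reduced to three tab-separated columns (read ID, subject scientific name, bit score). That layout and the function names are assumptions for illustration, not something the BLAST web page produces by default.

```python
import csv
from collections import Counter


def load_hits(path):
    """Read (read_id, taxon, bitscore) rows from an assumed 3-column TSV."""
    with open(path, newline="") as fh:
        return [tuple(row[:3]) for row in csv.reader(fh, delimiter="\t")
                if len(row) >= 3]


def best_hit_per_read(rows):
    """Keep the highest bit-score hit for each read.

    Ties keep the first hit seen, which matters for near-identical genomes.
    """
    best = {}
    for read_id, taxon, bitscore in rows:
        score = float(bitscore)
        if read_id not in best or score > best[read_id][1]:
            best[read_id] = (taxon, score)
    return best


def tally_taxa(best):
    """Count how many reads have each taxon as their best hit."""
    return Counter(taxon for taxon, _ in best.values())
```

Because ties are resolved by input order, a read that matches wolf and dog equally well is counted for whichever was listed first, so the tally is better at confirming the genus than at separating the two.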

Edit: As @Lieven said above, it may be difficult to tell the species apart if the genomes are very closely related.
