Question

How To Separate Reads From Two Different Species In An Exome Dataset?

8

13.0 years ago

2184687-1231-83- ★ 5.1k

I would like to know if there is any clever protocol to separate the reads in an Illumina exome dataset sequenced from a human tumour heterotransplanted into an immunodeficient rodent (BALB/c strain):

http://www.nature.com/nprot/journal/v2/n2/full/nprot.2007.25.html

The sequenced exome sample would contain reads from the human cancer cells, but also reads from the surrounding mouse cells, which would have been enriched as well because of cross-species annealing during the exome enrichment protocol.

Is there any clever way of separating the read sets from human and mouse in such a case?

exome • 4.0k views

updated 7.9 years ago by Biostar 20 • written 13.0 years ago by 2184687-1231-83- ★ 5.1k

score 8 · Answer 1 · 2012-01-06

13.0 years ago

brentp 24k

If you use an aligner that supports references larger than 4 GB and lets you pull out uniquely mapped reads (e.g. BWA 0.6+), then you could combine the mouse and human genomes into a single reference FASTA file. You'd have to prefix the chromosome names, so that human chromosome 1 is hg19chr1 and mouse chromosome 1 is mm9chr1.
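A minimal sketch of building that combined reference, assuming plain-text (uncompressed) FASTA inputs. The function names and the input/output file paths are hypothetical; only the hg19/mm9 prefixes come from the answer. It rewrites every header line (">chr1 ..." becomes ">hg19chr1 ...") and concatenates the two genomes into one file.

```python
def prefix_fasta_headers(lines, prefix):
    """Yield FASTA lines, prepending `prefix` to every sequence name.

    '>chr1 description' becomes '>hg19chr1 description' for prefix 'hg19'.
    Sequence lines are passed through unchanged.
    """
    for line in lines:
        if line.startswith(">"):
            yield ">" + prefix + line[1:]
        else:
            yield line


def build_combined_reference(human_fasta, mouse_fasta, out_path,
                             human_prefix="hg19", mouse_prefix="mm9"):
    """Write one FASTA containing both genomes with species-prefixed names."""
    with open(out_path, "w") as out:
        for path, prefix in ((human_fasta, human_prefix),
                             (mouse_fasta, mouse_prefix)):
            with open(path) as fh:
                for line in prefix_fasta_headers(fh, prefix):
                    # Guard against an input file whose last line lacks a
                    # newline, which would glue it to the next file's header.
                    if not line.endswith("\n"):
                        line += "\n"
                    out.write(line)
```

For example, build_combined_reference("hg19.fa", "mm9.fa", "hg19_mm9.fa") (hypothetical file names) would produce a single reference that could then be indexed as usual, e.g. with `bwa index -a bwtsw hg19_mm9.fa`, the BWA indexing algorithm intended for large genomes.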

Then, when you pull out uniquely mapped reads, you'll know which organism they came from.

You will not be able to use this to separate reads that map equally well to either reference.
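A rough sketch of the "pull out uniquely mapped reads" step, working on SAM text lines (for instance the output of `samtools view aln.bam`). It assumes that a mapping-quality threshold is a reasonable stand-in for "uniquely mapped": BWA reports MAPQ 0 for reads with several equally good hits, such as reads that fit both genomes equally well. The cut-off of 20 and the function names are assumptions for illustration, not part of BWA. Reads falling below the threshold are kept in an 'ambiguous' bucket rather than discarded.

```python
def classify_alignment(fields, min_mapq=20, prefixes=("hg19", "mm9")):
    """Label one SAM record (already split on tabs) by source genome.

    Returns a species prefix, 'ambiguous' (mapped, but MAPQ below
    min_mapq, e.g. a read with equally good hits in both genomes) or
    'unmapped'.
    """
    flag, rname, mapq = int(fields[1]), fields[2], int(fields[4])
    if flag & 0x4 or rname == "*":
        return "unmapped"
    if mapq < min_mapq:
        return "ambiguous"
    for prefix in prefixes:
        if rname.startswith(prefix):
            return prefix
    return "ambiguous"  # reference name carries no known species prefix


def split_by_species(sam_lines, **kwargs):
    """Group read names from SAM text lines into {label: [read names]}."""
    groups = {}
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & 0x900:
            continue  # skip secondary (0x100) and supplementary (0x800) records
        label = classify_alignment(fields, **kwargs)
        groups.setdefault(label, []).append(fields[0])
    return groups
```

The 'hg19' and 'mm9' lists would be the separated read sets; the 'ambiguous' list holds the reads this approach cannot resolve.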

2

+David Quigley, that would only happen if a base miscall made a human read look more like mouse, right? BWA should still be able to find the correct mapping in human and infer that it's correct from the pairing, right?

13.0 years ago by brentp 24k

1

If you have paired data (which you probably do), you're going to get hosed when one read in the pair maps to human Chr1 and the other maps to mouse Chr2. BWA will penalize the alignment because the apparent insert size (the distance between the mates) is huge. Just something to keep in mind.

13.0 years ago by David Quigley 11k
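A hedged sketch of how such split pairs could be spotted and handled after alignment. In a SAM record, RNAME (column 3) is this read's reference and RNEXT (column 7) is the mate's reference ('=' when they are the same), so a pair whose mates sit on differently prefixed chromosomes can be flagged directly. The resolve_pair rule, where a single unambiguously placed mate decides the pair and conflicting mates are labelled 'discordant', is an assumption in the spirit of "infer it by pairing", not something BWA does itself; all function names are hypothetical.

```python
def species_of(ref_name, prefixes=("hg19", "mm9")):
    """Return the species prefix of a reference name, or None."""
    for prefix in prefixes:
        if ref_name.startswith(prefix):
            return prefix
    return None


def is_cross_species_pair(sam_line, prefixes=("hg19", "mm9")):
    """True if this mate and its partner align to different genomes.

    Uses RNAME (column 3) and RNEXT (column 7); RNEXT is '=' when the
    mate is on the same reference sequence.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag, rname, rnext = int(fields[1]), fields[2], fields[6]
    if not (flag & 0x1) or (flag & 0xC):
        return False  # unpaired, or this read / its mate is unmapped
    if rnext == "=":
        return False
    a, b = species_of(rname, prefixes), species_of(rnext, prefixes)
    return a is not None and b is not None and a != b


def resolve_pair(label1, label2):
    """Combine per-mate labels ('hg19', 'mm9', 'ambiguous', 'unmapped')."""
    species = {lab for lab in (label1, label2)
               if lab not in ("ambiguous", "unmapped")}
    if len(species) == 1:
        return species.pop()  # one or both mates place the pair in one genome
    if len(species) == 2:
        return "discordant"   # mates point at different genomes
    return "unresolved"       # neither mate is informative
```

Counting how many pairs end up 'discordant' would give a quick feel for how much of a problem this is in a given dataset.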

0

+1 for this solution, which is how we have dealt with reference-based mapping from hybrid genome sequences.

13.0 years ago by Casey Bergman 18k

0

So if reads map equally well to both species, their coverage will be split between the two genomes? This would probably result in sudden drops in coverage for exons that are highly conserved between mouse and human, is that right?

13.0 years ago by 2184687-1231-83- ★ 5.1k

0

Somewhat, but those reads will still be mapped; you'll just have to pull them out another way.

13.0 years ago by brentp 24k

score 2 · Answer 2 · 2012-01-06

13.0 years ago

Larry_Parnell 16k

Good and interesting question, but do you need to separate the reads by species? Could you not achieve the goals of exome sequencing with mouse and human reads mixed, and then sort out the species based on alignments to something like RefSeq mRNAs? I would think that would work fine.

I would filter beforehand for common repeats like the human Alu element, which is known to be expressed as mRNA. Mouse B1 elements can be filtered as well.
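One possible way to do that pre-filtering, sketched as a simple exact k-mer screen: this is a swapped-in technique, not a specific tool. It assumes you supply the Alu and B1 consensus sequences yourself (e.g. from a repeat database; none are included here) and that sharing at least one 25-mer on either strand is enough to call a read repeat-derived. Both k=25 and min_hits=1 are arbitrary starting points, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def kmers(seq, k):
    """Set of all length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_repeat_index(repeat_seqs, k=25):
    """Index k-mers of repeat consensus sequences on both strands."""
    index = set()
    for seq in repeat_seqs:
        index |= kmers(seq, k)
        index |= kmers(revcomp(seq), k)
    return index


def is_repeat_read(read_seq, index, k=25, min_hits=1):
    """True if the read shares at least min_hits distinct k-mers with the index."""
    hits = sum(1 for km in kmers(read_seq, k) if km in index)
    return hits >= min_hits


def filter_repeat_reads(reads, index, k=25, min_hits=1):
    """Yield (name, seq) pairs whose sequence does not look repeat-derived."""
    for name, seq in reads:
        if not is_repeat_read(seq, index, k, min_hits):
            yield name, seq
```

Exact k-mer matching will miss diverged repeat copies, so a lower k or an alignment-based repeat screen may be needed in practice.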

0

Good idea about using information from species-specific TE lineages.

13.0 years ago by Casey Bergman 18k

score 0 · Answer 3 · 2012-01-06

13.0 years ago

Damian Kao 16k

This paper might be of interest to you.

Also check out Barcode of Life.

0

He wants to assign every read to either mouse or human; what you suggest is identifying a few unique sequence snippets to see which species are present (which is already known in this case).

13.0 years ago by Michael Kuhn 5.0k

0

I think this method would probably not be good at separating reads; it would only tell me that there are human and mouse reads in my dataset.

13.0 years ago by 2184687-1231-83- ★ 5.1k