DNA-seq Multiple Mapped Tags

12.4 years ago

KCC ★ 4.1k

During a typical DNA-seq analysis, tags that map to multiple positions are filtered out. What are the pluses and minuses of taking each multiple-mapped tag, counting the number of positions it maps to, and adding 1/(# of mappings) to every position where the tag matches? So, if a tag maps to two positions, one adds 0.5 to each instead of the usual +1 to the single mapped position.

Thus, I am showing the probability that the tag mapped to that position. Has anybody ever tried this?
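The weighting described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the input is a hypothetical `read_id -> list of (chrom, pos)` mapping, e.g. parsed from an aligner run that reports all hits per read. Each read contributes a total weight of 1, split evenly across its hits, so a unique read adds 1 and a read with two hits adds 0.5 to each.

```python
from collections import defaultdict


def fractional_counts(alignments):
    """Split each read's unit count evenly across all of its reported hits.

    alignments: hypothetical dict read_id -> list of (chrom, pos) hits.
    Returns a dict: (chrom, pos) -> accumulated fractional count.
    """
    counts = defaultdict(float)
    for hits in alignments.values():
        if not hits:
            continue  # unmapped read: contributes nothing
        weight = 1.0 / len(hits)  # 1/(# of mappings)
        for hit in hits:
            counts[hit] += weight
    return dict(counts)


if __name__ == "__main__":
    reads = {
        "r1": [("chr1", 100)],                  # unique read: +1
        "r2": [("chr1", 100), ("chr2", 5000)],  # two hits: +0.5 each
    }
    print(fractional_counts(reads))
```

Because every mapped read still sums to exactly 1, the total count equals the number of mapped reads, just as with unique-only counting.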

The reason I care about this is that I want to examine the behavior of my tags in hard-to-map regions such as repeats. If I follow the normal procedure, I get almost no information about these areas. I only need a resolution of 500-1000 bp, so the approach does not have to be very precise.
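Since only 500-1000 bp resolution is needed, the fractional weights can be summed into fixed-size windows. The sketch below assumes the same hypothetical `read_id -> [(chrom, pos), ...]` input and a 1000 bp window (both assumptions). It contrasts the usual unique-only filter with fractional counting: a repeat whose reads are all multi-mapped is empty under the filter but gets a nonzero count with fractional weights.

```python
from collections import defaultdict


def binned_counts(alignments, bin_size=1000, unique_only=False):
    """Sum read weights into fixed-size genomic windows.

    alignments: hypothetical dict read_id -> list of (chrom, pos) hits.
    unique_only=True mimics the usual filter (drop reads with >1 hit);
    otherwise each read adds 1/(number of hits) to every hit's window.
    Returns a dict: (chrom, bin_start) -> count.
    """
    bins = defaultdict(float)
    for hits in alignments.values():
        if not hits or (unique_only and len(hits) > 1):
            continue
        weight = 1.0 / len(hits)
        for chrom, pos in hits:
            bin_start = (pos // bin_size) * bin_size
            bins[(chrom, bin_start)] += weight
    return dict(bins)


if __name__ == "__main__":
    reads = {
        "u1": [("chr1", 250)],
        "m1": [("chr1", 1200), ("chr3", 8800)],  # reads from a two-copy repeat
        "m2": [("chr1", 1900), ("chr3", 8100)],
    }
    print(binned_counts(reads, unique_only=True))  # repeat windows absent
    print(binned_counts(reads))                    # each repeat window gets 0.5 + 0.5
```

Note that the two repeat copies always receive identical counts here, so this shows that a repeat family has signal, not which copy it came from.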

updated 12.4 years ago by Fidel ★ 2.0k • written 12.4 years ago by KCC ★ 4.1k

score 2 · Answer 1 · 2012-12-03

What are the pluses and minuses of taking each multiple-mapped tag?

You need to be confident about where your sequence comes from before doing statistical analysis. Of course, you can try that method and check for variation in your datasets; just be careful with your statistics.

But as technologies push toward longer, high-quality reads, the proportion of multi-mapped reads will soon be small enough to be irrelevant.

Thus, I am showing the probability that the tag mapped to that position. Has anybody ever tried this?

Yes; check the literature. This strategy has been used in DNA-seq and RNA-seq (sorry, I don't have the references in my head right now). Some tools even use it as the initial step before redistributing the values iteratively; see the Cufflinks-related papers.
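The iterative redistribution mentioned here can be illustrated with a generic expectation-maximization (EM) loop over genomic windows. This is a simplified sketch of the general idea, not the actual Cufflinks algorithm (which works on transcripts and models fragment lengths). It starts from the uniform 1/n split, then repeatedly reassigns each multi-read in proportion to the current abundance of the windows it hits. The input format (a list of window IDs per read) and the iteration count are assumptions.

```python
from collections import defaultdict


def em_redistribute(read_bins, n_iter=50):
    """Iteratively reassign multi-reads in proportion to window abundance.

    read_bins: hypothetical list where each element is the list of window
    IDs that one read maps to (positions already converted to windows).
    Returns a dict: window -> expected read count after n_iter iterations.
    """
    # Initialisation: uniform 1/(# of mappings) split, as in the question.
    abundance = defaultdict(float)
    for bins in read_bins:
        for b in bins:
            abundance[b] += 1.0 / len(bins)

    for _ in range(n_iter):
        new_abundance = defaultdict(float)
        for bins in read_bins:
            # E-step: this read's share in each of its windows is proportional
            # to the window's current abundance.
            total = sum(abundance[b] for b in bins)
            for b in bins:
                new_abundance[b] += abundance[b] / total
        # M-step: the expected counts become the new abundance estimates.
        abundance = new_abundance
    return dict(abundance)


if __name__ == "__main__":
    # 3 unique reads in window "A", 1 in "B", 2 reads that hit both.
    reads = [["A"]] * 3 + [["B"]] + [["A", "B"]] * 2
    print(em_redistribute(reads))  # multi-reads drift toward A (about 4.5 vs 1.5)
```

In this toy case the two shared reads end up split 3:1, matching the ratio of unique reads, whereas the initial 1/n split gives 4 vs 2.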

Istvan Albert · Answer 2 · 2012-12-05

12.4 years ago

Fidel ★ 2.0k

What Salzberg endorses in this paper (http://www.nature.com/nrg/journal/v13/n1/full/nrg3117.html) is to assign a multi-read randomly to one of its mapping positions. This has the following advantages:

It is faster than reporting all the positions a read may map to.
The output mapping file does not need any extra processing beyond the usual pipelines.
It is built into some alignment software.
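The random-assignment strategy described above can be sketched as follows, again assuming a hypothetical `read_id -> [(chrom, pos), ...]` input; the fixed seed is an assumption added for reproducibility. Each read lands on exactly one of its hits, chosen uniformly at random, so on average each hit receives the same 1/n weight as in the fractional scheme, but any single run produces ordinary integer counts.

```python
import random
from collections import Counter


def random_assign(alignments, seed=0):
    """Place each read at one of its hits, chosen uniformly at random.

    alignments: hypothetical dict read_id -> list of (chrom, pos) hits.
    Returns a Counter: (chrom, pos) -> integer read count.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    counts = Counter()
    for hits in alignments.values():
        if hits:
            counts[rng.choice(hits)] += 1
    return counts


if __name__ == "__main__":
    # 10,000 reads that each have the same two equally good hits:
    # each hit should receive roughly half of them.
    reads = {f"r{i}": [("chr1", 100), ("chr2", 5000)] for i in range(10000)}
    print(random_assign(reads))
```

Compared with fractional counting, this keeps each read as a single ordinary placement (which is why standard pipelines need no changes), at the cost of sampling noise in any one run.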

Regarding your proposed method, I have mostly seen it applied in the context of RNA-seq. Here is a paper on the topic: http://bioinformatics.oxfordjournals.org/content/26/4/493.long

updated 12.4 years ago by Istvan Albert ★ 102k • written 12.4 years ago by Fidel ★ 2.0k

Great. This was really helpful.

12.4 years ago by KCC ★ 4.1k