Question

Chr_Random Positions

3

13.7 years ago

Biomed 5.0k

The sequencing center we work with has recently started reporting variants on the random chromosomes, so we now end up with (whole-exome) variants that have been mapped to one of the chr_random contigs. First, I would like a good explanation of this chr_random issue; second, can anyone explain how to approach the analysis of these chr_random variants? Should I just disregard them? I fear it will be hard to find much downstream annotation for these positions in public databases such as dbSNP. I am sorry the question is not specific, but I think I basically need a "chr_random and exome sequencing 101" type of answer, or a pointer to a good read on this...

Thanks

chromosome random exome sequencing variant • 5.0k views

Written 13.7 years ago by Biomed 5.0k; updated 13.4 years ago by Ning-Yi Shao ▴ 390

1

Do you mean this?

Written 13.7 years ago by Michael 55k; updated 5.3 years ago by Ram 44k

1

To be honest, I've never gone that far, since my current work ends with reporting variants, but I can tell you that we haven't (yet) found variants in genes described on those contigs, surely because the probes were not designed to cover them; since we're doing exome sequencing, such variants were quickly filtered out. Nothing meaningful to date, I'm afraid. If we do find any, our intention is to process them the same way we process the rest, although we expect annotation on those contigs to be very limited.

Posted 13.7 years ago by Jorge Amigo 14k
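
Jorge's remark that exome variants on these contigs are "quickly filtered out" follows from on-target filtering: if no capture probe lies on a contig, nothing there passes. A minimal sketch of such a filter, assuming the targets come as BED lines (0-based, half-open) and that the intervals on each contig do not overlap; the function names and toy intervals are hypothetical.

```python
from bisect import bisect_right


def build_index(bed_lines):
    """Group BED intervals (0-based, half-open) by contig, sorted by start."""
    by_contig = {}
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank and header lines
        chrom, start, end = line.split("\t")[:3]
        by_contig.setdefault(chrom, []).append((int(start), int(end)))
    index = {}
    for chrom, intervals in by_contig.items():
        intervals.sort()
        # Parallel lists of starts and ends so lookups can bisect on starts.
        index[chrom] = ([s for s, _ in intervals], [e for _, e in intervals])
    return index


def on_target(index, chrom, pos):
    """True if a 1-based VCF position lies inside a target interval.

    Assumes non-overlapping intervals; merge overlapping targets first.
    """
    if chrom not in index:
        return False  # no probes on this contig at all
    starts, ends = index[chrom]
    p = pos - 1  # convert the 1-based VCF position to 0-based
    i = bisect_right(starts, p) - 1  # last interval starting at or before p
    return i >= 0 and p < ends[i]


targets = build_index(["chr1\t100\t200", "chr1\t500\t600"])
print(on_target(targets, "chr1", 150))         # True
print(on_target(targets, "chr1_random", 150))  # False: no targets on the contig
```

With a real target file, most chrN_random contigs would simply be absent from the index, so every call on them is dropped.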

0

Michael's link is where you can read a little about chr_random, chrUn and chr_hap, although it doesn't really help you decide what to do with variants found on those special contigs. Without further explanation here, all I can tell you is that my group has decided to discard only the chr_hap contigs, right from the mapping step, because of their exposure to natural selection (variation found on them could be spurious, lots of pseudogenes are present, ...), and to keep the rest of the contigs (chr_random and chrUn), since they are left unplaced in the genome only for algorithmic reasons.

Posted 13.7 years ago by Jorge Amigo 14k
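
Jorge's group removes the chr_hap contigs before mapping, which a post-hoc filter cannot reproduce (reads that would have gone to a hap contig still end up elsewhere). Applied to the calls afterwards, the same keep/drop policy could be sketched as a VCF line filter. The naming rules below are assumptions based on UCSC hg18/hg19-style names (e.g. `chr6_cox_hap1`, `chr1_random`, `chrUn_gl000211`); other assemblies name these contigs differently.

```python
def contig_class(chrom):
    """Classify a UCSC-style contig name (naming rules are an assumption)."""
    if chrom.startswith("chrUn"):
        return "unplaced"
    if "_hap" in chrom:
        return "hap"
    if chrom.endswith("_random"):
        return "random"
    return "primary"


def filter_vcf(lines, drop=frozenset({"hap"})):
    """Yield header lines unchanged and only records on contigs not in `drop`."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        chrom = line.split("\t", 1)[0]
        if contig_class(chrom) not in drop:
            yield line


vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t1000\t.\tA\tG",
    "chr6_cox_hap1\t500\t.\tC\tT",
    "chr1_random\t700\t.\tG\tA",
    "chrUn_gl000211\t900\t.\tT\tC",
]
kept = list(filter_vcf(vcf))
print(len(kept))  # 5: only the chr6_cox_hap1 record is dropped
```

Passing `drop=frozenset({"hap", "random", "unplaced"})` would instead give the "discard everything off the primary chromosomes" approach mentioned elsewhere in this thread.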

0

Jorge, can you elaborate on how you treat a variant you identify on a chrN_random contig? Specifically, how do you check whether the variant has been reported before or what its allele frequency is (i.e. dbSNP and/or 1000 Genomes), and what about deleteriousness prediction with SIFT or PolyPhen? Did you ever get a remotely meaningful result from these, or do you know a paper that argues so?

Posted 13.7 years ago by Biomed 5.0k

0

That's what I suspected too. Thanks for sharing your experience, Jorge.

Posted 13.7 years ago by Biomed 5.0k

score 1 · Answer 1 · 2011-08-16

As far as I know, these are contigs of the genome whose exact position is not known. Because many factors affect genome assembly, the consortium did not integrate some contigs into the chromosomes and simply labeled them chrUn_ (chromosome of origin unknown) or chr1_random (known to come from chr1). For hg19, patches are released when the consortium integrates a contig into the genome; the existing hg19 coordinates are preserved, so these patches do not affect the coordinates of the sequence already in place. For more information you may check this: http://www.ncbi.nlm.nih.gov/projects...initions.shtml

From what I checked, some chrUn contigs also contain variants of rRNAs and similar sequences. So I think you'd better exclude chrN_random first: its annotation just duplicates the already annotated chromosome, so the results may be false positives. Perhaps we should actually map those reads and variants to the annotation records of the reference chromosomes.
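
The naming convention described in this answer (chrUn_ for contigs whose chromosome is unknown, chrN_random for contigs known to belong to chrN but not yet positioned) can be decoded mechanically, e.g. to tally variants per parent chromosome. A minimal sketch, assuming UCSC-style names (hg19 inserts an accession before the suffix, as in `chr9_gl000199_random`); the category labels are illustrative, not an official vocabulary.

```python
from collections import Counter


def parent_chromosome(name):
    """Return (parent chromosome or None, category) for a UCSC-style contig name."""
    head, _, rest = name.partition("_")
    if head == "chrUn":
        return None, "unplaced"     # chromosome of origin unknown
    if rest.endswith("random"):
        return head, "unlocalized"  # chromosome known, position unknown
    if rest:
        return head, "other"        # e.g. alternate haplotypes such as chr6_cox_hap1
    return head, "placed"           # ordinary chromosome


contigs = ["chr1", "chr1_random", "chr9_gl000199_random", "chrUn_gl000220", "chr1_random"]
print(Counter(parent_chromosome(c) for c in contigs))
```

Feeding the CHROM column of a VCF through this gives a quick view of how many calls sit on contigs that merely duplicate a known chromosome.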

score 0 · Answer 2 · 2011-06-08

0

13.6 years ago

Dataminer ★ 2.8k

I also work on NGS data analysis, although my work focuses more on TF binding while yours focuses more on SNP analysis.

We generally discard the chr_random contigs to avoid any ambiguity in our analysis.

To check how chr_random is skewing your data, run two analyses on the same dataset, one with the chr_random regions and one without, load both into a genome browser, and you will see the difference.

Posted 13.6 years ago by Dataminer ★ 2.8k
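
The with/without comparison suggested above can also be done on the call sets before opening a browser. A minimal sketch, assuming both runs produced plain-text VCF and treating (CHROM, POS, REF, ALT) as a call's identity; multi-allelic records and differences in variant normalization are ignored, and the toy records are made up.

```python
def load_calls(vcf_lines):
    """Collect (CHROM, POS, REF, ALT) tuples from VCF data lines."""
    calls = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        calls.add((chrom, int(pos), ref, alt))
    return calls


def compare_runs(with_random, without_random):
    """Split calls into shared and run-specific sets."""
    a, b = load_calls(with_random), load_calls(without_random)
    return {"shared": a & b, "only_with_random": a - b, "only_without_random": b - a}


run_with = ["#CHROM\tPOS\tID\tREF\tALT", "chr1\t100\t.\tA\tG", "chr1_random\t50\t.\tC\tT"]
run_without = ["chr1\t100\t.\tA\tG", "chr2\t300\t.\tG\tA"]
for label, calls in compare_runs(run_with, run_without).items():
    print(label, sorted(calls))
```

Calls on the primary chromosomes that appear only in the run without the random contigs are the ones most worth inspecting, since they may come from reads that had nowhere better to go.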

5

Seeing the difference is exactly the reason why chr_random should be included. Most people do not care about the SNPs/signals in unlocalized/unplaced contigs, but we do care about false SNPs/signals caused by reads that come from these contigs but are wrongly mapped to chromosomal regions.

Posted 13.5 years ago by lh3 33k
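
A toy illustration of lh3's point, assuming an exhaustive ungapped search over two made-up sequences (real aligners such as BWA work very differently): a read originating from a paralogous copy on chr1_random lands on chr1 with one mismatch when the random contig is missing from the reference, which a caller would report as a SNP; with the contig present, the read finds its perfect origin.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_hit(read, reference):
    """Best ungapped placement of read as (mismatches, contig, 0-based offset).

    Ties keep the first hit found; real aligners would give such reads low mapping quality.
    """
    best = None
    for contig, seq in reference.items():
        for off in range(len(seq) - len(read) + 1):
            hit = (hamming(read, seq[off:off + len(read)]), contig, off)
            if best is None or hit[0] < best[0]:
                best = hit
    return best


# Made-up sequences: chr1_random carries a paralogous copy differing at one base.
chr1 = "GGGG" + "ACGTTAGCAT" + "GGGG"
chr1_random = "CCCC" + "ACGTTCGCAT" + "CCCC"
read = "ACGTTCGCAT"  # sequenced from the chr1_random copy

print(best_hit(read, {"chr1": chr1}))
# (1, 'chr1', 4): the read's C vs chr1's A at offset 9 looks like a SNP
print(best_hit(read, {"chr1": chr1, "chr1_random": chr1_random}))
# (0, 'chr1_random', 4): the read returns to its true origin
```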

1

Why the down vote?

Posted 13.6 years ago by Dataminer ★ 2.8k

1

lh3 is right. Including these random contigs in the pipeline certainly increases mapping time, but it definitely improves the mapping (and downstream) results by drawing away reads that would otherwise be mismapped to the chromosomes and lower variant-calling power and quality.

Posted 13.5 years ago by Jorge Amigo 14k