Hi, I am doing CNV detection based on read depth and want to compare my results with the gold-standard set published by the 1000 Genomes Project (1000GP) group. So I downloaded their results from this link: ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/pilot_data/paper_data_sets/companion_papers/mapping_structural_variation/
I am very puzzled and hope somebody can help me. According to the reference genome,
awk 'NF>2' humang1kv37.fasta
...
19 dna:chromosome chromosome:GRCh37:19:1:59128983:1
...
chr19 runs from 1 to 59128983. However, CNVs on chr19 in their results go beyond 59128983:
awk '{if ($1==19) print $1"\t"$2}' union.2010_06.deletions.sites.vcf
...
19 63742587
19 63788277
where $2 is the start position of each CNV.
Can somebody please tell me what I did wrong?
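The two awk commands above can be folded into a single check that lists every record lying past the end of its chromosome. This is only a sketch. It assumes the FASTA headers follow the Ensembl-style layout shown above (third field `chromosome:GRCh37:19:1:59128983:1`, with the end coordinate as the fifth colon-separated element) and that the VCF uses the same chromosome names as the FASTA. The function name `beyond_reference` is made up for illustration.

```shell
# Report VCF records whose POS exceeds the chromosome length in the reference.
# Assumption: headers look like
#   >19 dna:chromosome chromosome:GRCh37:19:1:59128983:1
# so the 5th colon-separated element of field 3 is the chromosome end.
beyond_reference() {
    # $1 = reference FASTA, $2 = VCF
    awk '
        # First file (FASTA): collect chromosome lengths from header lines.
        NR == FNR {
            if (NF > 2 && $3 ~ /^chromosome:/) {
                name = $1
                sub(/^>/, "", name)      # drop the leading ">" if present
                split($3, f, ":")
                len[name] = f[5]
            }
            next
        }
        # Second file (VCF): skip header lines, flag positions past the end.
        /^#/ { next }
        ($1 in len) && ($2 + 0 > len[$1] + 0) {
            print $1 "\t" $2 "\t" len[$1]
        }
    ' "$1" "$2"
}
```

Called as `beyond_reference humang1kv37.fasta union.2010_06.deletions.sites.vcf`, it prints the chromosome, the offending position, and the reference length for each out-of-range record.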
Many thanks! Yh
Please be careful when lifting over any variation: the underlying sequence and structure can change between two assemblies, so a straight coordinate mapping may be inaccurate.
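If you do go the liftover route, here is a rough sketch of preparing the sites for UCSC liftOver, which reads BED rather than VCF. The assumptions are mine. The positions look like NCBI36 (hg18) coordinates, where chr19 is 63,811,651 bp long. UCSC chain files name chromosomes with a `chr` prefix. Each site is lifted as a 1-bp interval at its start only. Lifting whole deletions would also need the end coordinate from the VCF, whose layout is not shown in this thread. The function name `vcf_to_bed` and the output file names are placeholders.

```shell
# Turn VCF site records into BED intervals that UCSC liftOver can read.
# Assumptions: chromosome names lack the "chr" prefix, and each site is
# treated as a 1-bp interval at its start position.
vcf_to_bed() {
    # $1 = VCF; BED is 0-based and half-open, so POS p becomes [p-1, p).
    awk -F '\t' '
        /^#/ { next }                      # skip VCF header lines
        { printf "chr%s\t%d\t%d\t%s:%s\n", $1, $2 - 1, $2, $1, $2 }
    ' "$1"
}

# Hypothetical usage; the chain file must be downloaded from UCSC separately:
#   vcf_to_bed union.2010_06.deletions.sites.vcf > sites.hg18.bed
#   liftOver sites.hg18.bed hg18ToHg19.over.chain.gz sites.hg19.bed unmapped.bed
```

The fourth BED column keeps the original `chrom:pos`, so lifted records can be matched back to the source sites. Anything that ends up in `unmapped.bed` is worth inspecting by hand, for the reason given above.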
That seems to be the problem. Thank you very much for mentioning the liftover tool! The UCSC website is a treasure. ... YH