Hello Biostars,
I was wondering whether there is a PLINK file available that has the copy number status of every structural variant (SV) for the samples in the 1000 Genomes Project. The SVs often have names such as MERGED_DEL_2_106009, which is the same variant as esv2666691 in the UCSC Genome Browser. What I want to know is either A) whether such a file (or one like it) already exists, or B) if not, what the easiest way to generate one would be. Ideally the file would have one sample per line and one column per SV, with each entry being 0, 1, or 2 for the copy number state of that SV.
I can pull them one at a time, and I imagine there is a way to use tabix to get all SVs on a chromosome for all 1000 Genomes samples and then combine the resulting VCF files, but I wonder whether this could be done for all SVs in the genome at once.
This seems like something I should know, but I do not, so I appreciate any help. There are apparently calls available in the current release.
Thanks,
Rx
Not an answer to the question, but apparently the most recent phase 3 release does have copy number declarations in it. That information comes from Laura Clarke of the 1000 Genomes Project.
Ryan
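For option B in the question, here is a minimal sketch of how a samples-by-SV table could be built from VCF text. It is not an existing 1000 Genomes tool and it does not write PLINK format, only a tab-delimited table. It assumes that deletions are biallelic records whose ALT is the symbolic allele `<CN0>` or `<DEL>`, and that the copy number is simply the number of reference alleles in the GT field (so 0, 1, or 2 for a diploid call). Other SV types and multi-allelic CNVs are skipped. The function names `deletion_copy_numbers` and `format_table` are made up for illustration.

```python
def deletion_copy_numbers(vcf_lines):
    """Return (sv_ids, {sample: [copy number per SV]}) from VCF text lines.

    Assumption: only biallelic deletions with ALT <CN0> or <DEL> are kept,
    and copy number equals the count of reference ("0") alleles in GT.
    """
    samples, sv_ids, matrix = [], [], {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            matrix = {s: [] for s in samples}
            continue
        if fields[4] not in ("<CN0>", "<DEL>"):
            continue  # not a simple deletion under the stated assumption
        gt_index = fields[8].split(":").index("GT")
        sv_ids.append(fields[2])
        for sample, call in zip(samples, fields[9:]):
            gt = call.split(":")[gt_index]
            alleles = gt.replace("|", "/").split("/")
            if "." in alleles:
                matrix[sample].append("NA")  # missing genotype
            else:
                matrix[sample].append(sum(a == "0" for a in alleles))
    return sv_ids, matrix


def format_table(sv_ids, matrix):
    """Render one sample per row and one SV per column, tab-delimited."""
    rows = ["\t".join(["sample"] + sv_ids)]
    for sample, values in matrix.items():
        rows.append("\t".join([sample] + [str(v) for v in values]))
    return "\n".join(rows)
```

To run it on a compressed genome-wide VCF, the lines could be read with `gzip.open(path, "rt")` from the standard library, where `path` is wherever the downloaded file lives. Because the function streams line by line, it does not need a per-chromosome tabix query first.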