7.9 years ago
burcakotlu
Hi to all,
I would like to calculate pairwise LD between two given genomic loci (specified by position, not by rsID). Is this possible?
Thanks, Burçak
Thank you for your reply.
I read on the vcftools documentation page (https://vcftools.github.io/documentation.html#ld) that vcftools calculates pairwise LD with the following arguments:
./vcftools --vcf input_data.vcf --hap-r2 --ld-window-bp 50000 --out ld_window_50000
If I understand correctly, input_data.vcf contains the genomic coordinates of interest. But for which population does it calculate LD? And do I need to download any data for vcftools to use during the LD calculation?
I could not understand this part.
Any idea?
Burçak
Example
vcftools --gzvcf ALL.chr5.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz --chr 5 --from-bp 1000000 --to-bp 1100000 --out chr5_analysis --keep Samples.txt --hap-r2
input_data.vcf
You can use the VCF files from the 1000 Genomes Project (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/). However, these files contain all the samples from the project, so you need to create a list of the samples you want to use.
Samples
Choose your samples from the file integrated_call_samples_v3.20130502.ALL.panel available with the data.
Samples.txt
NA06984
NA06985
NA06986
NA06989
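If you would rather build Samples.txt programmatically than by hand, here is a minimal Python sketch. It assumes the panel file is tab-separated with a header row naming `sample`, `pop` and `super_pop` columns (check your copy), and it writes one sample ID per line, which is the format vcftools' `--keep` expects. `select_samples` and `write_samples_file` are hypothetical helper names, and `CEU` in the usage comment is only an example population code.

```python
import csv


def select_samples(panel_lines, pops=None, super_pops=None):
    """Return sample IDs whose pop / super_pop match the requested codes.

    Assumes a tab-separated panel with a header row naming the columns
    'sample', 'pop' and 'super_pop'. Passing None for a filter means
    "do not filter on this column".
    """
    reader = csv.DictReader(panel_lines, delimiter="\t")
    chosen = []
    for row in reader:
        pop = (row.get("pop") or "").strip()
        super_pop = (row.get("super_pop") or "").strip()
        if pops is not None and pop not in pops:
            continue
        if super_pops is not None and super_pop not in super_pops:
            continue
        chosen.append(row["sample"].strip())
    return chosen


def write_samples_file(sample_ids, path="Samples.txt"):
    # vcftools --keep reads one individual ID (as in the VCF header) per line.
    with open(path, "w") as out:
        out.writelines(f"{s}\n" for s in sample_ids)


# Example usage (assumes the panel file is in the working directory):
# with open("integrated_call_samples_v3.20130502.ALL.panel") as fh:
#     write_samples_file(select_samples(fh, pops={"CEU"}))
```

Filtering on `super_pops` instead (for example `{"EUR"}`) gives a larger continental group, which is a trade-off between sample size and population homogeneity.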
Do you mean that vcftools will use the "ALL.chr5.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz" file to calculate LD, and that it will calculate LD for all pairs of genomic positions within --chr 5 --from-bp 1000000 --to-bp 1100000?
If so, can I provide the genomic positions in a file instead of "--chr 5 --from-bp 1000000 --to-bp 1100000"?
Also, is there a way to calculate LD from WES data for a set of parents and their child?
Or does the input need to contain many samples?
Thanks, Burçak
1- Input file: If you have your own data (which you should always mention when you ask a question), replace "ALL.chr5.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz" with your VCF file.
2- LD: Yes, vcftools will calculate LD for all pairs of positions in the region. If you want the LD between two variants, create a VCF file containing only those variants and it will work. The --chr, --from-bp and --to-bp options can then be removed.
3- Parents and Child: No idea.
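To make the pairwise statistic concrete, here is a minimal Python sketch (not vcftools' own code) of haplotype r² between two positions. It assumes biallelic sites with phased diploid genotypes written as `a|b`, with GT as the first FORMAT field, as in the 1000 Genomes phase 3 files. It uses the standard definitions D = p_AB - p_A * p_B and r² = D² / (p_A(1 - p_A) p_B(1 - p_B)), treating any non-REF allele as the ALT allele. `open_vcf`, `parse_haplotypes` and `hap_r2` are hypothetical helpers, and vcftools' `--hap-r2` may handle missing data or multiallelic sites differently.

```python
import gzip


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def parse_haplotypes(lines, sites, samples=None):
    """Collect haplotype alleles at the requested (chrom, pos) sites.

    Returns {(chrom, pos): [allele, ...]} with two entries per kept sample,
    in VCF column order, so lists from different sites line up.
    Assumes phased diploid GT ('0|1') as the first FORMAT field;
    a missing allele ('.') is stored as None.
    """
    wanted = {(str(chrom), int(pos)) for chrom, pos in sites}
    keep_idx = None
    haps = {}
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\r\n").split("\t")
        if line.startswith("#CHROM"):
            names = fields[9:]
            keep_idx = [i for i, name in enumerate(names)
                        if samples is None or name in samples]
            continue
        if keep_idx is None:
            raise ValueError("data line found before the #CHROM header")
        key = (fields[0], int(fields[1]))
        if key not in wanted:
            continue
        calls = fields[9:]
        alleles = []
        for i in keep_idx:
            gt = calls[i].split(":", 1)[0]
            if "|" not in gt:
                raise ValueError(f"unphased or haploid genotype {gt!r} at {key}")
            alleles.extend(None if a == "." else int(a) for a in gt.split("|"))
        haps[key] = alleles
    return haps


def hap_r2(hap_a, hap_b):
    """Haplotype r^2 between two sites; non-zero alleles count as ALT."""
    pairs = [(a, b) for a, b in zip(hap_a, hap_b)
             if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        raise ValueError("no haplotypes with data at both sites")
    p_a = sum(a != 0 for a, _ in pairs) / n              # ALT frequency, site A
    p_b = sum(b != 0 for _, b in pairs) / n              # ALT frequency, site B
    p_ab = sum(a != 0 and b != 0 for a, b in pairs) / n  # ALT-ALT haplotypes
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # r^2 is undefined when either site is monomorphic in these haplotypes.
    return float("nan") if denom == 0 else d * d / denom
```

Usage would look like `with open_vcf(path) as fh: haps = parse_haplotypes(fh, [(chrom, pos1), (chrom, pos2)], samples=ids)` followed by `hap_r2(haps[(chrom, pos1)], haps[(chrom, pos2)])`, where `path`, the positions and `ids` are placeholders for your own values. Scanning a whole-chromosome file in pure Python is slow. Within vcftools itself, the `--positions` option accepts a file of tab-separated chromosome and position lines, which may be a simpler way to keep only the two sites than building a new VCF by hand.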
Yes, my main question right now is: can we give the parents' and their child's data in VCF format as the input file to vcftools and calculate LD for various genomic loci from it?
Thank you, Burçak