Merging two datasets with different samples and variants
6.4 years ago

Hello all,

I am Vinay Kumar Reddy Nannuru. I have a VCF file of 92 samples with 14,000 variants each, and I want to merge it with a publicly available dataset of 30,000 samples with 950,000 variants. How can I merge them so that the output file contains all samples from both datasets at the same (shared) variant positions? Could someone please give me a clear explanation? My second question: how can I select a subset of samples from the second, 30,000-sample dataset? Thanks for your time.

Vinay Kumar Reddy Nannuru

Tag: SNP

What have you tried?


I haven't tried anything yet; I am new to this, so I joined the group and read many questions on this topic. From what I understood, it can be done with PLINK, but it is not clear to me how. Thanks for your reply.


Search this site for 'bcftools merge' and/or 'gatk CombineVariants'.


Thank you very much; I am working on it.


If you wish to work with ROD (Reference-Ordered Data) files such as VCF or BED, you should check out the following tools:

  • bcftools
  • vcftools
  • bedtools
  • bedops
  • GATK (sub-tools such as CombineVariants, SelectVariants, VariantFiltration, etc.)
  • samtools (as needed)

One or more of the above will have utilities to do exactly what you want, although you might have to break down your task into smaller steps. Most of the tools above also support piping, so you can chain these multiple steps together to form a reusable pipeline.


Thank you very much; I am working on it. I chose to use vcftools, and when I ran the following command it showed this error:

vcf-isec -f -n ../vcffiles/gbs.africe.impute ../../ZeaGBSv27_publicSamples_imputedV5_AGPv4-161010.vcf 

Could not parse: [../vcffiles/gbs.africe.impute]
 at /usr/local/bin/vcf-isec line 21
    main::error('Could not parse: [../vcffiles/gbs.africe.impute]\x{a}') called at /usr/local/bin/vcf-isec line 71
    main::parse_params() called at /usr/local/bin/vcf-isec line 11

-n needs an integer argument; I don't think you're using the command correctly.


Hello Mr. Ram, yes, n is two files. I have added the parameter, and my output file contains only my project samples, at the positions shared by both files. But what I want is an output that includes all the samples from my project data and the public data at the shared positions. How can I do this? Thanks, Vinay.


n is two files

-n needs an integer argument. 2 is an integer argument. The names of two files are not an integer argument.
