6.4 years ago
vinayreddynannuru
Hello All,
I am Vinay Kumar Reddy Nannuru. I have a VCF file of 92 samples with 14,000 variants each, and I want to merge it with a publicly available dataset consisting of 30,000 samples with 950,000 variants. How can I merge them so that all samples end up in the same output file, with the same variant positions in all samples of the output? Could someone please give me a clear explanation? My second question: how can I select a subset of samples from the second dataset of 30,000 samples? Thanks for your time.
Vinay Kumar Reddy Nannuru
What have you tried?
I have not tried anything yet; I am new to this, so I joined the group and read many questions about it. What I understood is that it can be done using plink, but it is not clear to me how. Thanks for your reply.
Search this site for 'bcftools merge' and/or 'gatk combinevariants'.
Thank you very much, I am working on it.
If you wish to work with ROD (Reference-Ordered Data) files such as VCF or BED, you should check out the following tools:
One or more of the above will have utilities to do exactly what you want, although you might have to break down your task into smaller steps. Most of the tools above also support piping, so you can chain these multiple steps together to form a reusable pipeline.
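As an illustration of one such smaller step, here is a minimal Python sketch of selecting a subset of samples from a VCF. It only shows the logic on a tiny in-memory example; the function name subset_samples and the sample names P1, P2, P3 are made up for illustration, and it assumes a plain tab-separated VCF with the usual 9 fixed columns before the samples. For a real 30,000-sample file, a dedicated tool (for example bcftools view with a sample list, or vcftools --keep) would be the more practical choice.

```python
def subset_samples(vcf_lines, keep):
    """Yield VCF lines restricted to the sample columns named in `keep`."""
    keep = set(keep)
    cols = None  # column indices to retain, set when the #CHROM line is seen
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            # Meta-information lines are passed through unchanged.
            yield line
            continue
        fields = line.split("\t")
        if fields[0] == "#CHROM":
            # The first 9 columns are fixed; sample names start at index 9.
            cols = list(range(9)) + [
                i for i, name in enumerate(fields) if i >= 9 and name in keep
            ]
        if cols is None:
            raise ValueError("data line found before the #CHROM header line")
        yield "\t".join(fields[i] for i in cols)


# Hypothetical toy data standing in for the public dataset.
public = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tP1\tP2\tP3",
    "1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/0\t0/1\t1/1",
]
subset = list(subset_samples(public, ["P1", "P3"]))
for row in subset:
    print(row)
```

The output of this step can then be fed into the merge step, which is what chaining the steps into a pipeline amounts to.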
Thank you very much, I am working on it. I chose to use vcftools, and when I ran a command it showed the following:
-n needs an integer argument - I don't think you're using the command right.
Hello Mr. Ram, yes, -n is two files. I have added the parameter, and my output file contains only my project samples, at the positions shared by both files. What I want is an output that includes all the samples from both my project data and the public data, at the shared positions. How can I do this? Thanks, Vinay.
-n needs an integer argument. 2 is an integer argument. The names of two files are not an integer argument.
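To make the desired result concrete (all samples from both files, only at positions present in both), here is a small Python sketch of the logic. It is not how vcftools or bcftools work internally; the function names read_vcf and merge_shared and the toy sample names are invented for illustration. It assumes both files are plain tab-separated VCFs with the same FORMAT field (GT only here), and it matches sites on chromosome, position, REF and ALT. For data of the size discussed in this thread, bcftools merge on bgzipped and indexed files is likely the practical route.

```python
def read_vcf(lines):
    """Return (sample_names, {site_key: (fixed_columns, genotypes)})."""
    samples = []
    records = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank and meta-information lines
        fields = line.split("\t")
        if fields[0] == "#CHROM":
            samples = fields[9:]
        else:
            # A site is identified by chromosome, position, REF and ALT.
            key = (fields[0], int(fields[1]), fields[3], fields[4])
            records[key] = (fields[:9], fields[9:])
    return samples, records


def merge_shared(lines_a, lines_b):
    """Combine the samples of two VCFs, keeping only sites present in both."""
    samples_a, rec_a = read_vcf(lines_a)
    samples_b, rec_b = read_vcf(lines_b)
    shared = sorted(rec_a.keys() & rec_b.keys())
    header = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER",
              "INFO", "FORMAT"] + samples_a + samples_b
    # Fixed columns are taken from the first file; genotypes from both.
    rows = [rec_a[k][0] + rec_a[k][1] + rec_b[k][1] for k in shared]
    return header, rows


# Hypothetical toy data: two project samples and one public sample.
project = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/0\t0/1",
    "1\t200\t.\tC\tT\t.\tPASS\t.\tGT\t0/1\t1/1",
]
public_data = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tP1",
    "1\t200\t.\tC\tT\t.\tPASS\t.\tGT\t0/0",
    "1\t300\t.\tG\tA\t.\tPASS\t.\tGT\t1/1",
]
header, rows = merge_shared(project, public_data)
print("\t".join(header))
for row in rows:
    print("\t".join(row))
```

In this toy case only position 200 is present in both inputs, so the merged output has a single record carrying genotypes for S1, S2 and P1.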