Sorting a multi-sample (genotype) VCF file
5.1 years ago
nagarsaggi ▴ 40

I have a multi-sample VCF file genotyped with freebayes. I want to sort the sample names into alphabetical order to make post-variant-calling analysis easier. I tried Picard SortVcf, which works fine on a small file but failed on a large file (~4 GB). If you could suggest a way to sort a large multi-sample file without distorting the variant information, it would be a great help.

SNP • 6.8k views
5.1 years ago

Hey, try this:

$ bcftools query -l input.vcf | sort > samples.txt
$ bcftools view -S samples.txt input.vcf > output.vcf

If you haven't already, I would also suggest using BCF instead of VCF or vcf.gz. This really improves speed when working with bcftools on large datasets.

fin swimmer
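The two commands above, combined with the BCF suggestion, could be wrapped up roughly as follows. This is only a sketch: the function name `sort_vcf_samples` and the file names are hypothetical, and it assumes a current bcftools (1.x) is on the `PATH`.

```shell
# Hypothetical helper: write the sample columns of a VCF/BCF in
# alphabetical order and save the result as compressed BCF.
sort_vcf_samples() {
    in=$1     # input file (VCF, vcf.gz or BCF)
    out=$2    # output BCF path
    list=$(mktemp) || return 1

    # List the sample names in file order, then sort them.
    # LC_ALL=C gives a byte-wise order that does not depend on the locale.
    bcftools query -l "$in" | LC_ALL=C sort > "$list" || return 1

    # -S reads sample names from a file, one per line;
    # -O b selects compressed BCF output, -o names the output file.
    bcftools view -S "$list" -O b -o "$out" "$in"
    status=$?

    rm -f "$list"
    return $status
}
```

It would be called as `sort_vcf_samples input.vcf output.bcf`. Only the sample columns are reordered; the variant records themselves stay in their original order.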


It worked perfectly! Thanks


I spent a little bit too much time trying to figure out how to do this just to come here and find this simple solution. Thanks!


I have the same problem. Could you post an example of "samples.txt", please?


It should just be a plain text file with no index or header and one sample name per line:

sample1
sample2
sample3

etc
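If `bcftools query` is not available, a file in this format can also be built straight from the VCF header, since the sample names are columns 10 onward of the `#CHROM` line. A minimal sketch on a made-up toy file (the file name, sample names and the single variant are all hypothetical); for a compressed file, the header would need to be read through `zcat` or similar first:

```shell
# Build a tiny toy VCF so the sketch runs on its own (made-up data).
workdir=$(mktemp -d)
printf '##fileformat=VCFv4.2\n' > "$workdir/toy.vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsampleC\tsampleA\tsampleB\n' >> "$workdir/toy.vcf"
printf 'chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t0/0\t1/1\n' >> "$workdir/toy.vcf"

# Take the #CHROM header line, keep columns 10 onward (the sample names),
# put one name per line and sort them.
grep -m1 '^#CHROM' "$workdir/toy.vcf" | cut -f10- | tr '\t' '\n' | LC_ALL=C sort > "$workdir/samples.txt"

cat "$workdir/samples.txt"
```

For the toy file this prints sampleA, sampleB and sampleC, one per line, which is the layout that `bcftools view -S` expects.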


$ bcftools query -l populations.snps_whitelistRetainedLociSamplesSingletonsHWETags.vcf | sort > sample.txt
[main] Unrecognized command.

I can't run it.

Can somebody help me?


$ bcftools query -l populations.snps_whitelistRetainedLociSamplesSingletonsHWETags.vcf | sort > sample.txt
[main] Unrecognized command.

I can't run it. I can send my dataset. Can somebody help me?
