Fastest Way To Rename Sample Names In A 37 GB Gzipped VCF File Or Binary PED File
12.4 years ago
Kevin ▴ 640

Hi, I have a project file for 2000 individuals, with SNPs from WGS. We have decided to rename the samples so that individuals from different sources follow a uniform naming scheme.

I want to check quickly whether PLINK, PLINK/SEQ, or VCFtools has a built-in function for this, i.e. one that takes a sample-name remapping file and writes out a new set of family IDs and individual IDs.

Otherwise, I might do this with SQLite, since I may have to slice the data another way later.

vcf ped bed • 12k views

PLINK can update individual names in FAM and PED files and the like.

http://pngu.mgh.harvard.edu/~purcell/plink/dataman.shtml#updatefam

However, I know from PLINK/SEQ that just changing the files might not actually work.


Hi, running into an identical problem here. Do you recall how you ended up doing this?

Thanks in advance,
Steve

12.4 years ago
tiagoantao ▴ 690

As long as you do NOT (i) change the order of individuals or (ii) remove or add individuals, you can edit the IDs in the FAM file directly. My recommendation would be a script that makes the correction in the FAM file.

Remember: the order and the number of individuals must stay the same. Other than that, you can change the FAM IDs.
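
A script along these lines could do the FAM correction. This is a minimal sketch, not a tested pipeline: the file names study.fam and id_map.txt, the four-column map layout (old FID, old IID, new FID, new IID), and the toy data are all assumptions made up for illustration.

```shell
# Hypothetical inputs, created here so the sketch is self-contained.
workdir=$(mktemp -d)
printf 'F1 S1 0 0 1 -9\nF2 S2 0 0 2 -9\nF3 S3 0 0 1 -9\n' > "$workdir/study.fam"
printf 'F1 S1 COH_A A0001\nF3 S3 COH_B B0001\n' > "$workdir/id_map.txt"

# First file (the map) is loaded into two lookup tables keyed on old FID + IID.
# Second file (the FAM) is rewritten line by line: only columns 1-2 change,
# and the row order and row count are left exactly as they were.
awk 'NR == FNR { fid[$1 SUBSEP $2] = $3; iid[$1 SUBSEP $2] = $4; next }
     { key = $1 SUBSEP $2
       if (key in fid) { $1 = fid[key]; $2 = iid[key] }
       print }' "$workdir/id_map.txt" "$workdir/study.fam" > "$workdir/study.renamed.fam"

cat "$workdir/study.renamed.fam"
```

Individuals missing from the map (F2 S2 in the toy data) keep their old IDs, so a partial map does not drop or reorder anyone. Note that awk rewrites the changed lines with single spaces between columns.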

12.3 years ago
Nick Crawford ▴ 210

You could create a new VCF header with the final #CHROM line updated to reflect the new sample names, then use tabix to replace the old header with the new one.

# Replace the VCF header. The file must be compressed with bgzip.
tabix -r header.txt big.vcf.bgz > big.vcf.new_header.bgz
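
One possible way to build header.txt without decompressing the whole file is to stream the header through awk and rewrite only the #CHROM line. The sketch below is an illustration under assumptions: sample_map.txt is a hypothetical two-column, tab-separated file (old name, new name), and a tiny made-up VCF stands in for the real one.

```shell
# Hypothetical toy inputs so the sketch runs on its own.
workdir=$(mktemp -d)
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t1/1\n' \
  | gzip > "$workdir/big.vcf.gz"
printf 'S1\tA0001\nS2\tB0001\n' > "$workdir/sample_map.txt"

# The map is read first, then the VCF arrives on stdin ("-").
# awk exits at the #CHROM line, so the genotype records are never processed.
gunzip -c "$workdir/big.vcf.gz" |
  awk -F'\t' -v OFS='\t' '
    NR == FNR { new_name[$1] = $2; next }   # load old -> new sample names
    /^##/     { print; next }               # meta lines pass through unchanged
    /^#CHROM/ { for (i = 10; i <= NF; i++)  # sample columns start at field 10
                  if ($i in new_name) $i = new_name[$i]
                print; exit }
  ' "$workdir/sample_map.txt" - > "$workdir/header.txt"

cat "$workdir/header.txt"
```

The sample columns keep their original order; only the labels change, which is what a header swap requires since the genotype columns underneath are not touched.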

More here: http://vcftools.sourceforge.net/docs.html#one-liners

Edit: To get your VCF into the correct format and create the .tbi index, do something like:

gunzip -c big.vcf.gz | bgzip > big.vcf.bgz
tabix -p vcf big.vcf.bgz