I am using bcftools to save a bcf2 file (chr22.bcf2) for a later analysis using plink.
bcftools view --samples-file sample.txt --force-samples --types snps --output-type b chr22_genotypes.vcf.gz | bcftools norm -Ob -m+any | bcftools norm -Ob --fasta-ref human_g1k_v37.fasta | bcftools annotate --output-type b --output chr22.bcf2 -I '%CHROM:%POS:%ID'
But when I load this file into plink, invalid chromosome was found.
plink --bcf chr22.bcf2 --make-bed --out chr22
Error: Invalid chromosome code 'hs37d5' in .bcf file.
This error was not returned when I save the file as vcf and load it with plink. And when I checked which line has a chromosome code of "hs37d5", the below message was returned.
bcftools view chr22.bcf2 | grep -e "hs37d5" | cut -f1-7
##reference=ftp://ftp.1000genomes.ebi.ac.uk//vol1/ftp/technical/reference/phase2 _reference_assembly_sequence/hs37d5.fa.gz
##contig=<ID=hs37d5,assembly=b37,length=35477943>
Are these the header lines? Can exclude them when loading with plink? or exclude them when saving as bcf2?
I am not sure about the steps you mentioned.
My original data is chr22_genotypes.vcf.gz. Do you mean I can remove the contig parameter in chr22_genotypes.vcf.gz and save as chr22.vcf And then apply
bgzip chr22.vcf | tabix -p chr22.vcf.gz
to chr22.vcf?And then apply this command to convert to bcf?
Also, do you mean remove the contig parameter using bcftools? Could you suggest the command I should be using?
Hey, no, I only convert the BCF (binary compressed) back to VCF (plain text) so that I can manually remove the line using the vi editor (or any other text editor). This is a perfectly reasonable thing to do, provided that you understand.
I see. Thank you very much.