8.3 years ago
MAPK
I have a large VCF file that I need to split by chromosome, for which I used the commands below. I was able to extract chromosomes chr1 to chr22, but could not extract chrX, chrY and chrM. How can I split the file for those three chromosomes?
Commands I used:
bgzip -c myvcf.vcf > myvcf.vcf.gz
tabix -p vcf myvcf.vcf.gz
tabix myvcf.vcf.gz chr1 > chr1.vcf
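Before splitting, it may help to check which contig names the file really contains; on an indexed file, `tabix -l myvcf.vcf.gz` prints them. As a rough sketch that needs no index, the helper below splits a plain-text VCF into one file per contig using only awk and repeats the header in each output. The function name `split_vcf_by_contig` and the output layout are assumptions made for illustration, not part of tabix or bcftools.

```shell
# Hypothetical helper: split a plain-text (uncompressed) VCF into one file
# per contig, writing the header lines at the top of every output file.
# Usage: split_vcf_by_contig input.vcf [output_dir]
split_vcf_by_contig() {
  vcf="$1"
  outdir="${2:-.}"
  mkdir -p "$outdir"
  awk -F'\t' -v dir="$outdir" '
    /^#/ { header = header $0 "\n"; next }   # collect all header lines
    {
      out = dir "/" $1 ".vcf"                # one output file per CHROM value
      if (!($1 in seen)) {                   # first record of this contig:
        printf "%s", header > out            # start the file with the header
        seen[$1] = 1
      }
      print > out
    }
  ' "$vcf"
}
```

For a file with chr1 to chr22, chrX, chrY and chrM this keeps about 25 output files open at once, which is normally fine; a reference with thousands of contigs could hit the open-file limit of some awk implementations.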
Any luck with this problem? I see the same thing with tabix version 1.9: when I try to extract variants from chromosome X, I get no output.
I found the same problem with bedtools2 intersectBed: no chrX output.
Either no X is present, or the name you use is different from what is actually used in the file. Is it chrX or rather X? Try to find that out.
Thanks.
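One way to find that out is to count records per contig name exactly as they appear in the CHROM column. The sketch below is only an illustration; `count_contigs` is a made-up helper name, and it expects uncompressed VCF text either as a file argument or on standard input.

```shell
# Hypothetical helper: print each contig name found in the CHROM column
# together with its number of records (tab-separated: name, count).
# Usage: count_contigs file.vcf      or      ... | count_contigs
count_contigs() {
  grep -v '^#' "$@" |          # drop header lines
    cut -f1 |                  # keep only the CHROM column
    sort | uniq -c |           # count identical names
    awk '{ print $2 "\t" $1 }' # reorder to: name <TAB> count
}
```

For a bgzip-compressed file, something like `bgzip -dc myvcf.vcf.gz | count_contigs` should work. If the output lists `X` but not `chrX`, the query name is the problem; if neither appears, the chromosome is simply absent from the file.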
It is not a chrX or X problem. The vcf files are from "http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/release/20190312_biallelic_SNV_and_INDEL/". Genomic regions other than X work fine.
Here are all the commands I used (sorry, I don't know how to use code mode):
1, ls ALL.chr*.gz |sort -V > file
2, for a in ALL.chr*.gz; do bcftools index "$a"; done
3, bcftools concat -f file -o 1kg38.vcf --threads 10
4, sort bed -k1V -k2n > extract.bed The content of extract.bed looks like:
5, bgzip 1kg38.vcf -@ 10
6, tabix -p vcf 1kg38.vcf.gz
7, tabix -h -R extract.bed 1kg38.vcf.gz > extract.vcf or bcftools view -R extract.bed 1kg38.vcf.gz > extract.vcf
Please stop using the answer field for comments; I already moved your previous one to a comment for that reason. Also, please highlight code with the 10101 button. The code itself is fine; the VCF simply does not contain the variants you are looking for. If you want to visualize it, simply transform the chrX file to BED using [...] and then load it into the IGV browser. There is nothing in the regions you are querying.
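The conversion command did not survive in the comment above, so what follows is not the original suggestion, only one possible way to get a BED file out of a VCF with awk. The helper name `vcf_to_bed` is hypothetical, and the sketch assumes plain-text input with ordinary (non-symbolic) REF alleles.

```shell
# Hypothetical helper: convert VCF records to 4-column BED
# (chrom, start, end, ID). VCF POS is 1-based while BED starts are 0-based
# and half-open, so start = POS - 1 and end = start + length(REF).
# Symbolic alleles with an END= tag in INFO are NOT handled here.
# Usage: vcf_to_bed file.vcf > file.bed      or      ... | vcf_to_bed
vcf_to_bed() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
    !/^#/ { print $1, $2 - 1, $2 - 1 + length($4), $3 }
  ' "$@"
}
```

The resulting BED can be loaded as a track in IGV to see where the file does and does not have records.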
Hi, I think I have figured it out. It is not a problem with tabix, but with the 1000G VCF records. There is a huge gap in the chrX VCF: the coordinates jump from 2781457 to 155703812, with no records in between.
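A gap like this can be found without a genome browser by scanning for large jumps between consecutive positions on the same contig. The sketch below is one way to do that; `find_gaps` is a made-up helper name, the minimum gap size is a parameter you choose, and the input is assumed to be position-sorted, uncompressed VCF text.

```shell
# Hypothetical helper: report jumps of at least MIN_GAP bp between
# consecutive records on the same contig.
# Output columns: contig, previous POS, next POS, distance.
# Usage: find_gaps MIN_GAP file.vcf      or      ... | find_gaps MIN_GAP
find_gaps() {
  min_gap="$1"
  shift
  awk -F'\t' -v min="$min_gap" 'BEGIN { OFS = "\t" }
    /^#/ { next }                                 # skip header lines
    $1 == chrom && $2 - pos >= min {              # same contig, large jump
      print chrom, pos, $2, $2 - pos
    }
    { chrom = $1; pos = $2 }                      # remember the last record
  ' "$@"
}
```

For example, `bgzip -dc 1kg38.vcf.gz | find_gaps 1000000` would list every jump of 1 Mb or more; the jump described above spans 152922355 bp.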