6.2 years ago
bsmith030465
Hi,
I want to filter my bcf file such that:
- Autosomal chromosomes are filtered with DP > 20
- X chromosome, for females, is filtered with DP > 20
- X chromosome, for males, is filtered with DP > 10 (i.e., relax this criterion for males)
My current plan is to split the data into the autosomes (plus chrY) and the X chromosome, using something like the following (is there any way to specify chr1-chr22 as a range?):
bcftools view -i 'DP>20' input.bcf --regions chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22,chrY -Ob -o autosomalY.bcf
## sample ids as they appear in input.bcf file
bcftools query -l input.bcf > allsampleids.txt
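On the chr1-chr22 question: as far as I know, `--regions` has no range syntax across chromosome names, but the list can be built in the shell instead of typed out. A minimal sketch, assuming `seq`, `sed` and `paste` are available and that `input.bcf` is indexed (which `--regions` requires). The variable names `AUTOSOMES` and `REGIONS` are just illustrative.

```shell
# Build "chr1,chr2,...,chr22" without typing every contig name.
AUTOSOMES=$(seq 1 22 | sed 's/^/chr/' | paste -sd, -)
# Append chrY, as in the command above.
REGIONS="${AUTOSOMES},chrY"
echo "$REGIONS"

# Run the filter only if the input file is actually present.
if [ -f input.bcf ]; then
    bcftools view -i 'DP>20' --regions "$REGIONS" -Ob -o autosomalY.bcf input.bcf
fi
```

The same string could also be written one region per line to a file and passed with `--regions-file`, which is easier to keep under version control.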
I can process the X chromosome using:
## assuming femaleIDs.txt contains female sample ids
bcftools view -i 'DP>20' input.bcf --regions chrX --samples-file femaleIDs.txt -Ob -o Xfemales.bcf
bcftools view -i 'DP>10' input.bcf --regions chrX --samples-file ^femaleIDs.txt -Ob -o Xmales.bcf
My questions:
- How should I recombine the outputs so that I again have a single bcf dataset, i.e. how do I combine autosomalY.bcf, Xfemales.bcf and Xmales.bcf? I'm not sure whether this is possible, or even a good idea.
- Is there a cleaner way to do this?
Thanks!
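For the recombination question, one possible route (a sketch, not a tested pipeline) is to merge the two X files, which hold different samples, and then concatenate the result with the autosomal file, which holds different sites. It assumes the three files and `allsampleids.txt` from the commands above are in the working directory. The function name `recombine` and the intermediate file names are made up for illustration. Note that an X site that passed the filter in only one of the two groups would be expected to come out of the merge with missing genotypes for the other group.

```shell
# Sketch only: file names follow the commands in the question;
# X.merged.bcf, X.bcf and unsorted.bcf are hypothetical intermediates.
recombine() {
    # bcftools merge needs indexed inputs.
    bcftools index -f Xfemales.bcf &&
    bcftools index -f Xmales.bcf &&
    # Combine the female and male samples on chrX into one file.
    bcftools merge Xfemales.bcf Xmales.bcf -Ob -o X.merged.bcf &&
    # Restore the original sample order so it matches autosomalY.bcf
    # (concat expects identical sample columns in the same order).
    bcftools view --samples-file allsampleids.txt -Ob -o X.bcf X.merged.bcf &&
    # Stack the two site sets, then sort so chrX ends up in its proper place.
    bcftools concat autosomalY.bcf X.bcf -Ob -o unsorted.bcf &&
    bcftools sort -Ob -o recombined.bcf unsorted.bcf
}

# Only run when all inputs are present.
if [ -f autosomalY.bcf ] && [ -f Xfemales.bcf ] && [ -f Xmales.bcf ] && [ -f allsampleids.txt ]; then
    recombine
fi
```

It is worth checking the result with `bcftools query -l recombined.bcf` and comparing it against `allsampleids.txt` before using the file downstream.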
Hi Pierre,
Thanks!!!
It would be extremely helpful if there were some documentation for this (where does it read the male/female ids from?). Also, what if I wanted to do 'DP>10 & GQ>10'? How do I incorporate multiple conditions?
My bad! I forgot to add the link: http://lindenb.github.io/jvarkit/VcfFilterJdk.html
it's hard coded:
Basically, it requires some knowledge of Java and of the library for HTS, but there are many examples and links to previous Biostars posts in the manual: http://lindenb.github.io/jvarkit/VcfFilterJdk.html