Hi,
I'm a complete beginner with bash, so my question may seem stupid to some of you. I have an annotated VCF file with a large number of samples. I want to extract from it a subset containing all the variants of one gene, located on chromosome 9 ($1 = chr9) between positions ($2 = POS) 81583683 and 81689305.
After some modifications, I used this awk command: awk '{$1== "chr9" && 81583683 <$2< 81689305}' VCF1 > VCF2
but I always get an error message.
Can anyone please tell me whether awk is the right tool for selecting on two conditions like this, or whether I should use another command?
Thank you
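For what it's worth, the awk attempt above has two likely problems: the test sits inside { }, where awk treats it as an action rather than a filter, and awk cannot chain comparisons as in 81583683 <$2< 81689305. Below is a minimal sketch of one way to write it. It assumes a tab-separated, uncompressed VCF, that header lines should be kept, and that both end positions are included. The small input file and its records are made up so the example runs on its own.

```shell
# Hypothetical paths: a tiny VCF is generated in a temp dir purely for illustration.
tmpdir=$(mktemp -d)
in_vcf="$tmpdir/VCF1"
out_vcf="$tmpdir/VCF2"

# Two header lines plus five made-up records (two of them inside the region).
printf '##fileformat=VCFv4.2\n' > "$in_vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> "$in_vcf"
printf 'chr9\t81583000\t.\tA\tG\t.\t.\t.\n' >> "$in_vcf"
printf 'chr9\t81583683\t.\tC\tT\t.\t.\t.\n' >> "$in_vcf"
printf 'chr9\t81600000\t.\tG\tA\t.\t.\t.\n' >> "$in_vcf"
printf 'chr9\t81689306\t.\tT\tC\t.\t.\t.\n' >> "$in_vcf"
printf 'chr1\t81600000\t.\tA\tC\t.\t.\t.\n' >> "$in_vcf"

# The condition is written as a pattern (outside braces), so matching lines are printed.
# Header lines (starting with #) are kept; each bound is tested separately and joined with &&.
awk -F'\t' '/^#/ || ($1 == "chr9" && $2 >= 81583683 && $2 <= 81689305)' "$in_vcf" > "$out_vcf"
```

For a compressed, indexed file, a region query with bcftools is usually the more robust route, since it does not depend on the file being plain text.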
Thank you for the help! I used bcftools after indexing the VCF file. My command line looks like this: bcftools view file1.vcf.gz "chr9:81583683-81689305" -O v file2.vcf. It works, but it doesn't return all the variants I want, only some of them, whereas I want every variant, including the duplicated ones.
Show us the variants that the command above ignored.
A huge number of variants are ignored (my file has 800 samples, and I want to find the variants in this region for all of them). The command returns only some of the variants, and each only once. For example, if a variant appears in 5 samples, I want 5 lines for that variant in the output file; with this command, however, I either don't find it in the output at all or find it only once (on one line).
That's still not clear to me.
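One possible reading of the request above: a multi-sample VCF stores each variant on a single line, with one genotype column per sample, so a region query returning each variant once is expected behaviour. If the goal is one line per variant per carrier sample, the file can be reshaped into a long table. Here is a rough sketch with awk. It assumes GT is the first FORMAT field (as the VCF specification requires when GT is present) and that any non-zero allele index counts as carrying the variant. The sample names S1 to S3, the records, and the file names are invented for illustration.

```shell
# Hypothetical paths: a tiny three-sample VCF is generated so the sketch runs on its own.
tmpdir=$(mktemp -d)
region_vcf="$tmpdir/file2.vcf"
long_tsv="$tmpdir/per_sample.tsv"

printf '##fileformat=VCFv4.2\n' > "$region_vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n' >> "$region_vcf"
printf 'chr9\t81583700\t.\tA\tG\t.\t.\t.\tGT\t0/1\t0/0\t1/1\n' >> "$region_vcf"
printf 'chr9\t81600000\t.\tC\tT\t.\t.\t.\tGT\t0/0\t./.\t0|1\n' >> "$region_vcf"

awk -F'\t' -v OFS='\t' '
  /^##/     { next }                                    # skip meta-information lines
  /^#CHROM/ { for (i = 10; i <= NF; i++) name[i] = $i   # remember sample names by column
              next }
  {
    for (i = 10; i <= NF; i++) {
      split($i, fmt, ":")                               # GT is the first colon-separated field
      gt = fmt[1]
      # Print one row per sample whose genotype contains a non-reference allele.
      if (gt ~ /[1-9]/) print $1, $2, $4, $5, name[i], gt
    }
  }
' "$region_vcf" > "$long_tsv"
```

With this made-up input, the first variant yields one row each for S1 and S3, and the second yields one row for S3, so a variant carried by 5 samples would appear on 5 rows.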