Filtering out rsIDs from VCF files
2.8 years ago
rheab1230 ▴ 140

Hello, I have a VCF file in which many SNPs map to the same gene, so one gene has many rsIDs associated with it. I want to filter the VCF file so that it contains only 100 rsIDs, each uniquely mapped to one of 100 genes, meaning each gene should match only one SNP, not many. I tried the bcftools filter option but did not get the desired result. This is what one record in my VCF file looks like:

22      16050075        rs587697622     A       G       100.0   PASS    AC=1;AF=0.000199681;AN=5008;NS=2504;DP=8012;EAS_AF=0;AMR_AF=0;AFR_AF=0;EUR_AF=0;SAS_AF=0.001;AA=.|||;VT=SNP;ANN=G|intergenic_region|MODIFIER|CHR_START-LA16c-4G1.3|CHR_START-ENSG00000233866|intergenic_region|CHR_START-ENSG00000233866|||n.16050075A>G||||||  GT

So rs587697622 corresponds to ENSG00000233866, but several rsIDs correspond to this gene. I just want to keep one unique rsID per gene.

vcf genotype rsid gene • 875 views

I am trying to create simulated data using only one SNP mapped to each gene ID. I could use multiple variants per gene, but that would take a lot of time. So I thought of taking only unique rsIDs that map to unique genes and using those for my simulation studies.


This is one of those problems that very few people have needed to solve before, hence a ready-made solution may not be available,

but at the same time it is one that is almost trivial to solve with minimal programming skills:

read the file, collect the gene names, then output a line only if the gene has not been seen before. Just a few lines of code, really.
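A minimal Python sketch of that idea, not a tested pipeline. It assumes the gene ID is the fifth `|`-separated sub-field of the first `ANN` entry in the INFO column, as in the record shown in the question, and that the file is plain (uncompressed) tab-separated VCF. The function names, the `limit` parameter, and the file names in the usage note are made up for illustration.

```python
def gene_id_from_info(info):
    """Return the gene ID of the first ANN entry in an INFO string, or None."""
    for field in info.split(";"):
        if field.startswith("ANN="):
            # Several annotations may be comma-separated; use the first one.
            first_annotation = field[len("ANN="):].split(",")[0]
            parts = first_annotation.split("|")
            # Assumption: sub-field 5 (index 4) holds the gene ID.
            if len(parts) > 4 and parts[4]:
                return parts[4]
    return None


def first_snp_per_gene(lines, limit=None):
    """Yield header lines, plus the first record seen for each gene."""
    seen = set()
    for line in lines:
        if line.startswith("#"):
            yield line  # keep the VCF header untouched
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # not a complete VCF record
        gene = gene_id_from_info(cols[7])
        if gene is None or gene in seen:
            continue  # no annotation, or this gene already has a SNP
        seen.add(gene)
        yield line
        if limit is not None and len(seen) >= limit:
            return  # stop once enough genes have been collected


def filter_vcf(in_path, out_path, limit=None):
    """Write the one-SNP-per-gene subset of in_path to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(first_snp_per_gene(src, limit))
```

Calling `filter_vcf("input.vcf", "filtered.vcf", limit=100)` would keep the first SNP encountered for each of the first 100 genes. Note that for intergenic records like the one above, the key would be the whole string `CHR_START-ENSG00000233866`, so you may want to skip or normalise those depending on what counts as a gene for your simulation.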
