14 months ago
ethan.kreuzer
I have an issue working with VCF files: when I rename entries with no ID to CHR:POS:REF:ALT, the resulting IDs are sometimes too long once the file is converted to plink .bed format. I do not want to change the naming convention to something other than CHR:POS:REF:ALT, and I do not want to remove the IDs. Is there a way to filter a VCF file so that entries whose IDs are longer than, say, 15 characters get removed?
For example, entries with the following IDs would be kept:
rs145699
rs343930204
chr6:10550:A:T
chr6:54032:G:C
The following would be removed:
chr6:38458939:A:TTTTCCT
chr6:35908:CCCCCCG:G
Ideally, I am looking for a command that does this with bcftools. Thank you!
- Note that 15 is always too low a length limit. Even without the "chr" prefix, there are lots of SNPs on e.g. chr10 with POS > 99999999, and a SNP ID like 10:100000000:A:T is already 16 characters, so these would all be filtered out by this rule. The lowest limit I'd ever recommend imposing is 39; this corresponds to EIGENSOFT's capacity.
- Alternatively, you can just filter out indels, if there are any other reasons your analysis would have problems with them. "plink2 --snps-only" is one easy way to do that; there are many others.
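If a plain script is acceptable in place of a bcftools command, the length filter itself can be sketched in a few lines of Python. This is only an illustration, not a bcftools or plink feature: the function name filter_vcf_by_id_length is made up for this example, it assumes an uncompressed, tab-delimited VCF read as text lines, and the default limit of 39 is simply the value suggested above.

```python
def filter_vcf_by_id_length(lines, max_len=39):
    """Yield header lines plus records whose ID (3rd column) has at most max_len characters.

    Hypothetical helper for illustration; assumes tab-delimited VCF text lines.
    """
    for line in lines:
        # Header and meta lines start with '#' and are always passed through.
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 3 (index 2) is the ID field in VCF.
        if len(fields) > 2 and len(fields[2]) <= max_len:
            yield line


# Small in-memory demo using IDs from the question (the other columns are made up).
demo = [
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "chr6\t10550\tchr6:10550:A:T\tA\tT\n",
    "chr6\t38458939\tchr6:38458939:A:TTTTCCT\tA\tTTTTCCT\n",
]
for kept in filter_vcf_by_id_length(demo, max_len=15):
    print(kept, end="")
```

To run it on a real file, one option is to open the input VCF for reading, pass the file object as lines, and write each yielded line to an output file. Compressed (.vcf.gz) input would need gzip.open in text mode instead.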
14 months ago by
chrchang523
11k