8.6 years ago
bioroma.spb
Hello everyone,
I have a whole folder of VCFs generated by GATK CombineVariants. I want to remove variants (entire rows) that contain the string ":R" or ":F" (but not ":FR") in the INFO column. What is the best way to do this?
Thank you! Problem solved.
UPD: I've encountered another problem: the command you wrote leaves rows with :F at the end of the column. Do you have any idea why?
I've updated the awk command to take that into account. awk interprets '[^R]' as "any single character that isn't R", so it has to match an actual character. If ':F' is at the end of the field, there is no character after it for '[^R]' to match, so the row is not excluded. I have fixed this issue by writing '[^R]*$' instead. The asterisk stands for "zero or more" and the dollar sign stands for the end of the field. It will therefore remove lines where ':F' is at the end of the field, and otherwise lines where ':F' is followed by anything other than 'R'.
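For reference, here is a minimal sketch of a filter along these lines. It is not necessarily the exact command discussed above. It assumes tab-delimited VCF files with INFO in column 8, and the function name `filter_vcf` and the output file naming are hypothetical. Rather than '[^R]*$', it spells out the two cases as separate alternatives: ':F' followed by a character other than R, and ':F' at the very end of the field.

```shell
#!/bin/sh
# Hypothetical helper: drop VCF records whose INFO column (field 8)
# contains ":R" or ":F", while keeping records that only contain ":FR".
filter_vcf() {
    awk -F'\t' '
        /^#/ { print; next }                  # pass header lines through unchanged
        $8 !~ /:R|:F[^R]|:F$/ { print }       # keep rows with no ":R", no ":F<non-R>", no trailing ":F"
    ' "$1"
}

# Example usage over a folder (output names are an assumption):
# for f in *.vcf; do filter_vcf "$f" > "${f%.vcf}.filtered.vcf"; done
```

A row that contains both ":FR" and a separate ":F" or ":R" is still removed, since it matches one of the excluded patterns.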
Thank you again! Now everything works well.