3.8 years ago
shubhamkumbhar420
I have two .tsv files. The first contains variant calling data, in which the column "Gene.knownGene" holds the gene name for each variant. The second contains genes that are related to diabetes. I want to filter file 1 so that it contains only the genes that also appear in file 2, but I don't know how to do it. Thanks in advance.
Please provide some file examples. As far as I understand, you can use grep or a perl/python script.

Suppose this is my file 1, where the column "Gene.knownGene" contains the gene for each of my variants. File 2 contains the genes related to a particular disease. I want to compare file 1 with file 2 and delete every variant that has no match in file 2. Note: the real files are much bigger than this example, and file 1 looks like a normal VCF file.
Thanks in advance
File 1
Chr Start End Ref Alt Func.knownGene Gene.knownGene
chr1 183401 183401 C G intronic FO538757.3
chr1 183629 183629 G A intronic ABCC8 (6833)
chr1 601515 601515 T C ncRNA_exonic RP5-857K21.4
chr1 601544 601544 G A ncRNA_exonic RP5-857K21.4
chr1 601606 601606 G T ncRNA_intronic RP5-857K21.4
chr1 610767 610767 G A ncRNA_intronic AKT2 (208)
chr1 610795 610795 A G ncRNA_intronic RP5-857K21.4
chr1 611072 611072 A C ncRNA_intronic RP5-857K21.4
chr1 611073 611073 G A ncRNA_intronic RP5-857K21.4
chr1 611317 611317 A G ncRNA_intronic RP5-857K21.4
Can you share an example of the second file?
File 2
ABCC8 (6833)
HYMAI (57061)
HYMAI (57061)
KCNJ11 (3767)
PLAGL1 (5325)
ZFP57 (346171)
HNF1B (6928)
AKT2 (208)
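Given the two examples above, one way to do the filtering is a short Python script. This is only a minimal sketch under some assumptions: both files are tab-separated, file 1 has a header row with a column named "Gene.knownGene", and file 2 lists one gene per line in exactly the same "SYMBOL (ID)" form used in file 1. The function names and file names are made up for illustration.

```python
def load_genes(lines):
    """Collect the gene labels from file 2 into a set (duplicates collapse)."""
    return {line.strip() for line in lines if line.strip()}


def filter_variants(lines, genes, column="Gene.knownGene"):
    """Yield the header line, then each data line whose gene is in `genes`.

    Assumes tab-separated input whose first line is a header containing `column`.
    """
    lines = iter(lines)
    header = next(lines)
    # Locate the gene column by name instead of hard-coding its position.
    idx = header.rstrip("\r\n").split("\t").index(column)
    yield header
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        # Exact match on the whole cell, e.g. "ABCC8 (6833)".
        if len(fields) > idx and fields[idx].strip() in genes:
            yield line


def filter_file(variants_path, genes_path, out_path, column="Gene.knownGene"):
    """Stream file 1 line by line so large files do not need to fit in memory."""
    with open(genes_path) as fh:
        genes = load_genes(fh)
    with open(variants_path) as src, open(out_path, "w") as dst:
        dst.writelines(filter_variants(src, genes, column))


# Hypothetical usage (file names are placeholders):
# filter_file("file1.tsv", "file2.tsv", "file1.filtered.tsv")
```

With the example data, only the ABCC8 (6833) and AKT2 (208) rows would be kept. Because the comparison is an exact match on the whole cell, a row that lists several genes in one cell, or a file 2 that uses a different label format, would need some normalisation first.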