I have two files. file1 contains gene start and end positions with their chromosome locations. file2 contains variant information at particular chromosome positions; a variant may fall in a CDS, a 5' UTR, a 3' UTR, or somewhere else. I want to compare file2 (variant information) against file1 (complete gene information) and keep only those genes from file1 that contain a variant listed in file2. For example:
file2
Don-0 1 288 A G 3 1 1 1
Don-0 1 291 T A 5 2 1 1
Don-0 1 303 T C 18 3 1 1
Don-0 1 310 C G 14 2 1 1
Don-0 1 317 C T 23 4 1 1
Don-0 1 331 A T 32 6 1 1
Don-0 1 344 A C 32 9 1 1
Don-0 1 352 G C 32 7 1 1
Don-0 1 365 C G 32 6 1 1
Don-0 1 **4000** A G 25 6 1 1
file1
chromosome 1 30427671 . . . ID=Chr1;Name=Chr1
**gene 3631 5899 .** + . ID=AT1G01010;Note=protein_coding_gene;Name=AT1G01010
mRNA 3631 5899 . + . ID=AT1G01010.1;Parent=AT1G01010;Name=AT1G01010.1;Index=1
protein 3760 5630 . + . ID=AT1G01010.1-Protein;Name=AT1G01010.1;Derives_from=AT1G01010.1
exon 3631 3913 . + . Parent=AT1G01010.1
five_prime_UTR 3631 3759 . + . Parent=AT1G01010.1
CDS 3760 3913 . + 0 Parent=AT1G01010.1,AT1G01010.1-Protein;
exon 3996 4276 . + . Parent=AT1G01010.1
CDS 3996 4276 . + 2 Parent=AT1G01010.1,AT1G01010.1-Protein;
I want to keep only the genes that contain a variant position from file2, like this:
**gene 3631 5899 .** **4000** **A G**
Any comments and suggestions would be appreciated.
I'm sure you can find many answers on this site for your question. Search for "bedtools" please.
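If it helps to see the logic spelled out, here is a minimal Python sketch of the overlap test described in the question. It is only an illustration under some assumptions: file1 lines look exactly as posted (feature type, start, end, ... separated by whitespace, with no sequence-ID or source columns), every gene in the file1 excerpt lies on chromosome 1, and file2 columns are sample, chromosome, position, reference allele, alternate allele. The function names and the `chrom` parameter are made up for this example. A real GFF3 file has two extra leading columns, so the indices would need shifting, and for large files an interval tool such as bedtools will be much faster than this nested loop.

```python
def parse_variants(lines, chrom):
    """Return (position, ref, alt) tuples for one chromosome from file2-style lines."""
    variants = []
    for line in lines:
        fields = line.split()
        # Assumed file2 layout: sample, chromosome, position, ref, alt, ...
        if len(fields) < 5 or fields[1] != chrom:
            continue
        variants.append((int(fields[2]), fields[3], fields[4]))
    return variants


def genes_with_variants(gene_lines, variant_lines, chrom="1"):
    """Return one row per (gene, variant) pair where the variant lies inside the gene."""
    variants = parse_variants(variant_lines, chrom)
    hits = []
    for line in gene_lines:
        fields = line.split()
        # Assumed file1 layout as posted: type, start, end, score, strand, phase, attributes
        if len(fields) < 3 or fields[0] != "gene":
            continue
        start, end = int(fields[1]), int(fields[2])
        for pos, ref, alt in variants:
            if start <= pos <= end:  # coordinates treated as 1-based, inclusive
                hits.append((fields[0], start, end, pos, ref, alt))
    return hits


# Sample data copied from the question (bold markers removed, file2 shortened).
file1 = [
    "chromosome 1 30427671 . . . ID=Chr1;Name=Chr1",
    "gene 3631 5899 . + . ID=AT1G01010;Note=protein_coding_gene;Name=AT1G01010",
    "mRNA 3631 5899 . + . ID=AT1G01010.1;Parent=AT1G01010;Name=AT1G01010.1;Index=1",
    "exon 3631 3913 . + . Parent=AT1G01010.1",
]
file2 = [
    "Don-0 1 288 A G 3 1 1 1",
    "Don-0 1 365 C G 32 6 1 1",
    "Don-0 1 4000 A G 25 6 1 1",
]

for hit in genes_with_variants(file1, file2):
    print(*hit)
```

With the sample rows above, only the variant at position 4000 falls inside the gene spanning 3631 to 5899, so that is the single pair reported. Changing the `"gene"` check to `"CDS"` or `"five_prime_UTR"` would report overlaps with those features instead.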