I have an Excel spreadsheet (created on Windows) that contains the start, end and indel sequence (-GGC means a GGC deletion, +AAC means an AAC insertion) in columns 1, 2 and 3 respectively. I have a reference genome in both FASTA format and EMBL format (annotated). How can I locate my indels in the genome based on the reference? I want to find exactly where my indels are (e.g. in which gene or intergenic region).
For example, my file looks like this:
Start End Indel_Description
12 20 +GCCGCAC
45 46 -C
What genome? Do you have the positions of the genes in the genome (base pair locations)?
It is a bacterial genome. I have the positions of all the genes. Now I need to compare my indel positions against them to assign each indel to a gene.
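Since the gene positions are available, one possible approach is a plain interval-overlap check: an indel falls in a gene if its start is not past the gene's end and its end is not before the gene's start, and it is intergenic otherwise. Below is a minimal Python sketch of that idea. It assumes the spreadsheet has been saved as whitespace-separated text with the header shown above, and that all coordinates are 1-based and inclusive. The gene names and coordinates in `GENES` are made up for illustration, and `parse_indels` and `locate` are hypothetical helper names, not part of any library.

```python
# Hypothetical gene table (name, start, end), 1-based inclusive.
# In practice these would come from the EMBL annotation; the values
# below are invented purely for illustration.
GENES = [
    ("geneA", 1, 30),
    ("geneB", 40, 44),
    ("geneC", 60, 120),
]

# The example rows from the question, as exported text (assumed format).
INDEL_TABLE = """Start End Indel_Description
12 20 +GCCGCAC
45 46 -C
"""


def parse_indels(text):
    """Parse the table into (start, end, kind, sequence) tuples."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        start, end, desc = line.split()
        kind = "insertion" if desc.startswith("+") else "deletion"
        rows.append((int(start), int(end), kind, desc[1:]))
    return rows


def locate(start, end, genes):
    """Return names of genes overlapping [start, end], or ['intergenic']."""
    hits = [name for name, g_start, g_end in genes
            if start <= g_end and end >= g_start]
    return hits if hits else ["intergenic"]


if __name__ == "__main__":
    for start, end, kind, seq in parse_indels(INDEL_TABLE):
        where = ",".join(locate(start, end, GENES))
        print(start, end, kind, seq, where, sep="\t")
```

With the invented coordinates above, the first indel overlaps `geneA` and the second falls between `geneB` and `geneC`, so it is reported as intergenic. The linear scan over all genes per indel should be fast enough for a bacterial genome with a few thousand genes. If Biopython is available, the gene coordinates could probably be read from the annotated file with `SeqIO.read(path, "embl")` and its `features` list, keeping in mind that Biopython stores start positions 0-based, so they would need converting to match the spreadsheet.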