Question

How to find gene annotation given a position in a bacterial genome

0

3.0 years ago

Madde ▴ 20

I have a .bed and a .gff file for a bacterial genome of Gardnerella vaginalis. I also have a CSV that lists positions where there are mutations, e.g. 10401, 224444.

I want to feed in a position and find out which gene or intergenic region the mutation is in. For example, put in 10401, which is the nucleotide position in the genome, and get back the gene annotation or region at that position.

How do I do this? Are there tools available?

position annotation snp intergenic • 966 views

score 1 · Answer 1 · 2021-12-03

1

3.0 years ago

Alex Reynolds 35k

To do an ad-hoc search via the BEDOPS kit:

$ gff2bed < genes.gff > genes.bed
$ echo -e 'chrZ\t10400\t10401' | bedops -e 1 genes.bed -

Replace chrZ with the name of your contig.

Once you have a feel for this, put your positions of interest into a tab-delimited BED file as zero-indexed intervals (start = position - 1, end = position, as with 10401 in the example above), sort it with sort-bed, and run a full search:

$ bedops -e 1 genes.bed positions.bed
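A minimal sketch of building positions.bed from a text file with one position per line, assuming the positions are 1-based (as 10401 was treated in the ad-hoc example) and all lie on a single contig. The file name positions.txt and the contig name chrZ are placeholders, and plain sort stands in for sort-bed here:

```shell
# Example input: one 1-based SNP position per line (illustrative values).
printf '%s\n' 40136 47092 136648 165946 219134 > positions.txt

# chrZ is a placeholder; it must match the contig name used in genes.bed.
contig='chrZ'

# BED intervals are zero-based and half-open, so a 1-based position p
# becomes the interval [p-1, p). Blank lines are skipped.
awk -v contig="$contig" 'BEGIN { OFS = "\t" } NF { print contig, $1 - 1, $1 }' positions.txt \
  | LC_ALL=C sort -k1,1 -k2,2n > positions.bed
```

If BEDOPS is installed, piping through `sort-bed -` instead of `sort` should give the ordering the BEDOPS tools expect.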

Or to get associations, you can use bedmap:

$ bedmap --echo --echo-map positions.bed genes.bed

This reports each position, followed by any genes that overlap it.
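To label positions with no overlapping gene as intergenic, one option is to post-process the bedmap output. This is a sketch, assuming bedmap's default delimiters (`|` between the position and the mapped part, `;` between multiple hits), the `--echo-map-id` option to report only the ID column, and a genes.bed that holds only gene-level features. The function name label_hits is made up for illustration:

```shell
# Label each position with overlapping gene IDs, or "intergenic" if none.
# Expects bedmap-style lines on stdin: "chrom<TAB>start<TAB>end|id1;id2".
label_hits() {
  awk -F'|' 'BEGIN { OFS = "\t" } { print $1, ($2 == "" ? "intergenic" : $2) }'
}

# Run the real query only if BEDOPS is installed and both inputs exist.
if command -v bedmap >/dev/null 2>&1 && [ -f positions.bed ] && [ -f genes.bed ]; then
  bedmap --echo --echo-map-id positions.bed genes.bed | label_hits
fi
```

If the GFF contains whole-contig rows (for example `region` features), every position will overlap something, so filtering genes.bed down to the feature types of interest first is probably needed.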

Make sure contig names are consistent between gene annotations and positions, and that BED files are sorted properly, per sort-bed.
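A quick way to check the contig names is to list any name that appears in the positions file but not in the annotations. This is only a sketch; the function name missing_contigs is hypothetical, and both inputs are assumed to be tab-delimited BED files:

```shell
# Print contig names found in the positions BED ($1) that never occur
# in the annotation BED ($2). No output means the names are consistent.
missing_contigs() {
  awk -F'\t' '
    FILENAME == ARGV[1] { seen[$1] = 1; next }      # first file: annotations
    !($1 in seen) && !reported[$1]++ { print $1 }   # second file: positions
  ' "$2" "$1"
}
```

For example, `missing_contigs positions.bed genes.bed` printing chrZ would suggest the placeholder was never replaced with the real contig name.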

0

Thank you so much! I had success pulling out the gene associated with the manual position entry. However, I am stuck on how to create a .bed file from a .txt file of positions. Each position is just one number because they are SNPs, so my .txt file looks like this:

40136
47092
136648
165946
219134

Thank you, I really appreciate the help!

Reply • 3.0 years ago by Madde ▴ 20