Hi, for context, I am a third-year undergraduate majoring in Math with minors in Applied Stats and CS. I am about to start a research internship using R and bioinformatics tools. In a brief preliminary meeting I learned that my first task is to write a program that builds a table of 500 * 10^6 snippets of genetic code from a dataset and shows which of the 50k gene codes each snippet originated from.
I have no experience with bioinformatics toolkits and libraries. Would I need to develop an algorithm myself to do all of these comparisons (brute-forcing it seems like it would have an awful running time), or are there existing tools for tasks like this?
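To make concrete what I mean by brute forcing, here is a minimal Python sketch of the naive approach I have in mind. The gene names, sequences, and snippets are made-up toy data, the helper `brute_force_origin` is hypothetical, and I am assuming exact substring matches and a dataset size of 500 * 10^6 snippets, so treat it as an illustration of the scale problem rather than a real solution.

```python
# Toy illustration only: gene names, sequences, and snippets are made up.
genes = {
    "geneA": "ATGGCGTACGTTAGC",
    "geneB": "ATGCCCTTTGGGAAA",
    "geneC": "ATGTTTACGCGCGAT",
}
snippets = ["GCGTACG", "TTTGGGA", "ACGCGCG", "GGGGGGG"]


def brute_force_origin(snippet, genes):
    """Return the names of all genes that contain the snippet exactly."""
    return [name for name, seq in genes.items() if snippet in seq]


# One row per snippet: which gene(s) it could have come from.
table = {s: brute_force_origin(s, genes) for s in snippets}

# Rough scale of the real task, ASSUMING 500 * 10^6 snippets and 50k genes:
# every snippet is checked against every gene.
n_comparisons = 500 * 10**6 * 50_000

for snippet, origins in table.items():
    print(snippet, origins)
print(f"snippet-vs-gene checks at full scale: {n_comparisons:.1e}")
```

Under those assumptions that is on the order of 10^13 snippet-versus-gene checks, which is why I doubt the naive loop is the intended approach.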
Thank you!