Hi everyone,
I need help with my project: I don't know how to proceed with my results. Briefly, I had the entire transcriptomes of different strains of Lb. plantarum sequenced on an Illumina HiSeq. I got my results in FASTQ format (each file is about 12 GB) and processed them with the DNA-Star software. I mapped them against another strain and got a list of about 3000 genes with their relative expression levels. Unfortunately, the gene names are in a format (JDM1_RS05500, for example) that doesn't match anything in the databases I've tried. If I search for it on KEGG or UniProt I get no results. The only one that returns anything is NCBI, but it takes me to the page for the entire genome of the strain (try searching for JDM1_RS05500 yourselves to get an idea).
This name is the locus_tag. I am trying to convert each of the 3000 genes into a gene-name format (e.g. agrB) so I can match it against the KEGG database and find the pathway it's involved in. There are two main problems, though: 1) looking up each of my 3000 genes one by one will take ages; 2) once I've done it, I still wouldn't know how to proceed.
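A possible way around the one-by-one lookup, sketched under assumptions: KEGG's REST service has a `link` operation (`https://rest.kegg.jp/link/pathway/<org>`) that returns every gene-to-pathway pair for one organism as tab-separated text in a single request. The organism code `lpj` for strain JDM1 is an assumption you should check against KEGG's organism list. KEGG may also index the genes by the older locus tags (JDM1_xxxx) rather than the RefSeq `RS` tags; if so, the `old_locus_tag` qualifiers in the RefSeq GenBank record could serve as a bridge, passed in here as the hypothetical `tag_to_kegg_id` dict.

```python
import urllib.request
from collections import defaultdict

# Assumed KEGG REST endpoint for gene -> pathway links of one organism.
KEGG_LINK_URL = "https://rest.kegg.jp/link/pathway/{org}"


def parse_kegg_link(text):
    """Parse KEGG 'link' output (one 'gene<TAB>pathway' pair per line)
    into a dict: gene ID without organism prefix -> set of pathway IDs."""
    gene_to_pathways = defaultdict(set)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        gene, pathway = line.split("\t")
        # e.g. 'lpj:JDM1_0001' -> 'JDM1_0001'; 'path:lpj00010' -> 'lpj00010'
        gene_to_pathways[gene.split(":", 1)[1]].add(pathway.split(":", 1)[1])
    return dict(gene_to_pathways)


def fetch_kegg_links(org):
    """Download all gene-pathway links for one KEGG organism code (network call)."""
    with urllib.request.urlopen(KEGG_LINK_URL.format(org=org)) as resp:
        return resp.read().decode("utf-8")


def map_locus_tags(locus_tags, gene_to_pathways, tag_to_kegg_id=None):
    """Return locus tag -> sorted list of pathway IDs ([] if none found).

    tag_to_kegg_id is an optional, hypothetical dict translating RefSeq tags
    (JDM1_RS...) to whatever IDs KEGG uses for this organism."""
    result = {}
    for tag in locus_tags:
        kegg_id = tag_to_kegg_id.get(tag, tag) if tag_to_kegg_id else tag
        result[tag] = sorted(gene_to_pathways.get(kegg_id, ()))
    return result


# Example (requires network access; 'lpj' is an assumed organism code):
# links = parse_kegg_link(fetch_kegg_links("lpj"))
# pathways = map_locus_tags(my_locus_tags, links, my_tag_translation)
```

Downloading the whole organism's link table once and joining locally avoids sending thousands of individual queries.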
I realize my knowledge in this field is very limited (I'm sure even the slightly experienced among you have noticed), but I'm just asking for some suggestions on where to start.
Thanks a million to everyone
Thanks for your answer. I actually already have a similar table with all the protein products. Even in the genome you suggested there's no direct link to KEGG, so I can only search for each of them individually. I was wondering how to proceed after that. Is there a way to cross-reference the genes with the pathways, or should I do it manually?
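Once each gene has a list of pathways, crossing them with the expression results doesn't have to be manual. One common approach, shown here only as a sketch, is over-representation analysis: pick a set of "interesting" genes (e.g. strongly up- or down-regulated ones, with a cutoff you choose) and use a hypergeometric test to ask whether each pathway contains more of them than expected by chance. The input dict and gene names are hypothetical; the universe is assumed to be all genes with at least one pathway annotation.

```python
from collections import defaultdict
from math import comb


def genes_by_pathway(gene_to_pathways):
    """Invert gene -> iterable of pathways into pathway -> set of genes."""
    pathway_to_genes = defaultdict(set)
    for gene, pathways in gene_to_pathways.items():
        for p in pathways:
            pathway_to_genes[p].add(gene)
    return dict(pathway_to_genes)


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K members in pathway, n drawn)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def pathway_enrichment(gene_to_pathways, selected):
    """Return (pathway, selected_in_pathway, pathway_size, p_value) rows,
    sorted by p-value. Selected genes without annotation are ignored."""
    universe = set(gene_to_pathways)
    selected = set(selected) & universe
    N, n = len(universe), len(selected)
    rows = []
    for pathway, members in genes_by_pathway(gene_to_pathways).items():
        k = len(members & selected)
        rows.append((pathway, k, len(members), hypergeom_sf(k, N, len(members), n)))
    return sorted(rows, key=lambda r: r[3])
```

With about 3000 genes and many pathways, the raw p-values should be corrected for multiple testing (for example with Benjamini-Hochberg) before reading too much into them.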