3.8 years ago
mxm189
Hi everyone,
Basically, the question is in the title: I have a file in Galaxy that contains all mouse genes, but I need to remove genes so that only the genes related to glucose metabolism are left. Does anyone have a good script or any idea how I should go about this? I thought about building a different file from scratch using GO annotation (an AmiGO 2 search), but I don't even know which extension or program I should save the downloaded data in, or how to then convert it into GAF or something similar. Clearly I'm new at this, so if someone could help me out that would be awesome.
-Koen
The hardest part is not the filtering; it's getting the genes related to "metabolism". Different people would probably give you different definitions of "genes related to metabolism".
That is definitely a valid point. I decided to narrow it down to genes related to glucose metabolism only. I suppose this narrows things down quite a bit?
What do you need the extracted data for? Unless you need something specific, wouldn't using something like `grep -f` with a gene list that you manually curate from existing GO annotations (e.g. here) be enough?
`grep -f`? I'm a complete newbie; I normally only do wet-lab work, sorry about that. I also found that list, but how do I use `grep -f` to filter on GO annotation if the file I'm working with (GTF/GFF) doesn't contain any GO annotations? Is manually searching for the genes the only way?
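The GTF itself does not need to contain GO annotations for this: the GO side only supplies a list of gene symbols, and `grep -f` then matches those symbols against the GTF lines. A minimal sketch, assuming a plain-text list with one symbol per line and an uncompressed GTF (the file names below are placeholders). `-F` treats each list entry as a fixed string rather than a regex, and `-w` only accepts whole-word hits, so `Gck` does not also pull in `Gckr`. It is still a match against the whole line, so a symbol can also hit other attributes or a hyphenated name (an entry `Xyz1` would match a hypothetical `Xyz1-ps`), which is worth checking by eye.

```bash
#!/usr/bin/env bash
# Sketch: keep GTF lines that contain any gene symbol from a list.
#   $1  text file with one gene symbol per line (placeholder name, use your own)
#   $2  uncompressed GTF file
grep_genes() {
    local list="$1" gtf="$2"
    # -w: whole words only, -F: fixed strings (no regex), -f: read patterns from file
    grep -w -F -f "$list" "$gtf"
}
```

Usage: `grep_genes gene_names.txt annotation.gtf > genes_of_interest.gtf`. If the list was saved on Windows, its lines may end in `\r` and nothing will match; `sed -i 's/\r$//' gene_names.txt` strips those.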
Download a file export from the page linked by @newbio17 above (use the `text file` button at the top left) and save the file locally, OR this link may work for that purpose. We are going to get the gene names from this file. There should be 326 genes.
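A minimal sketch of pulling the gene names out of that export into a `gene_names` file. It assumes the export is tab-separated with a single header line and that the gene symbol sits in column 2; those are assumptions, so check the header line of your own file and pass a different column number if needed. Duplicate symbols (one gene can have several annotations) are collapsed with `sort -u`.

```bash
#!/usr/bin/env bash
# Sketch: pull unique gene symbols out of a tab-separated GO term export.
# ASSUMPTIONS (check against your file's header line):
#   - tab-separated, exactly one header line
#   - gene symbol in column 2 (pass another number as $3 if not)
#   $1  the downloaded export, e.g. GO_term_summary_....txt
#   $2  output file, one symbol per line
extract_gene_names() {
    local export_file="$1" out="$2" col="${3:-2}"
    awk -F'\t' -v col="$col" '
        NR == 1 { next }            # skip the header line
        { sub(/\r$/, "") }          # drop Windows line endings, if any
        $col != "" { print $col }   # print the symbol column
    ' "$export_file" | sort -u > "$out"
}
```

Usage: `extract_gene_names GO_term_summary_210131_211109.txt gene_names`, then `wc -l gene_names` to compare the count with the number given above.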
Download the GTF file from GENCODE (if you don't have one already).
`gunzip` the GTF file to uncompress it.
We will extract only the lines for the "glucose metabolic" genes present in the `gene_names` file using the following command. Please note that there are multiple transcripts for each gene.
Please be patient; it may take some time to process the gene list.
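A minimal sketch of such a filter (not necessarily the exact command used in this answer), assuming the gene list has one symbol per line and the GTF carries a `gene_name "..."` attribute, as GENCODE GTFs do. Instead of grepping the whole line, it compares the `gene_name` value exactly, so `Gck` never matches `Gckr`. Every line of a matched gene is kept, including all of its transcript, exon and CDS lines.

```bash
#!/usr/bin/env bash
# Sketch: keep GTF lines whose gene_name attribute exactly matches a listed symbol.
#   $1  gene list, one symbol per line (must not be empty)
#   $2  uncompressed GTF with gene_name "..." attributes (GENCODE style)
filter_gtf_by_gene_name() {
    local list="$1" gtf="$2"
    awk -F'\t' '
        NR == FNR {                  # first file: load the gene list
            sub(/\r$/, "")           # tolerate Windows line endings
            if ($0 != "") keep[$0] = 1
            next
        }
        /^#/ { next }                # skip GTF header/comment lines
        match($9, /gene_name "[^"]+"/) {
            # drop the leading gene_name plus quote (11 chars) and the closing quote
            name = substr($9, RSTART + 11, RLENGTH - 12)
            if (name in keep) print
        }
    ' "$list" "$gtf"
}
```

Usage: `filter_gtf_by_gene_name gene_names annotation.gtf > genes_of_interest.gtf`. To see how many genes were found, `awk -F'\t' '$3 == "gene"' genes_of_interest.gtf | wc -l` counts the gene-level rows; a lower number than the list size means some symbols are missing from, or named differently in, the GTF.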
`genes_of_interest.gtf` will contain the genes of interest.
Thank you, but I ran into some trouble almost immediately. I tried running the command in Ubuntu twice, but it says: `awk: Fatal: Cannot open file 'GO_term_summary_210131_211109.txt' for reading (No such file or directory)` (mine is called that). I have the file on my Desktop, and I believe Ubuntu is also running on my desktop: username@desktop-3UMHGAT.
Make sure you are in the correct directory. If you have the file on your Desktop, you will need to `cd ~/Desktop` first.
Same: no such file or directory :(. I have Ubuntu installed on Windows. I don't know why, but maybe it's because root = desktop?
It may be best to spend some time learning the basic Unix command line. I recommend this guide for new users.
Once you figure out where the file is you should be able to do the steps I outline above.
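On Ubuntu for Windows (WSL), `~` is the Linux home directory rather than the Windows user folder, so `~/Desktop` usually does not exist there. Windows drives are mounted under `/mnt` by default, so the Windows Desktop is typically at `/mnt/c/Users/<WindowsUser>/Desktop`, or under `OneDrive/Desktop` when OneDrive manages it. A minimal sketch that searches for the export in those places, assuming the default `/mnt/c` mount point:

```bash
#!/usr/bin/env bash
# Sketch: list copies of the GO export under the Windows user folders (WSL).
# ASSUMPTION: Windows C: is mounted at /mnt/c (the WSL default).
#   $1  optional search root, defaults to /mnt/c/Users
#   $2  optional file-name pattern, defaults to GO_term_summary_*.txt
find_go_export() {
    local root="${1:-/mnt/c/Users}" pattern="${2:-GO_term_summary_*.txt}"
    # Depth 4 covers <root>/<user>/Desktop/<file> and <root>/<user>/OneDrive/Desktop/<file>;
    # permission errors from system folders are discarded.
    find "$root" -maxdepth 4 -type f -name "$pattern" 2>/dev/null
}
```

Run `find_go_export`, then `cd` into the directory of the path it prints, quoting the path because Windows user names can contain spaces.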
Forgot to give an update on the situation. As soon as the guide mentioned directories, I looked up some YouTube tutorials for Ubuntu on Windows specifically and used the commands you mentioned, Genomax, which worked ;) Thanks. Now all I need to do is make some heatmaps... time to start watching some tutorials again.
Here are the processes related to "glucose metabolic process" at AmiGO. You could filter them by the organism you are interested in (e.g. mouse) to get the gene names.
Thank you. How do you get such an extensive list? If I search for glucose metabolism, I end up with only around 30 genes.