I currently have two Excel files: one is a list of 1000+ genes of interest, and the other contains analyzed RNA-seq data with read counts. I am trying to match the genes of interest to their corresponding read counts; however, it is not feasible to copy and paste these values individually, as the spreadsheet is over 25,000 genes long.
Does anyone know of an effective method to extract the expression data for my genes of interest from the large RNA-seq spreadsheet?
Thank you in advance.
Two ways:
You can use any simple text editor to get your genes of interest into this format:
goi_1|goi_2|goi_3
etc. Next, try grep with that pattern on your count file.
You can also save the output to a file.
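The exact command from this answer did not survive, so here is a minimal sketch of what it might look like. The file names (genes.txt, counts.txt, goi_counts.txt) and the demo rows are made up for illustration, and the -w flag is my addition so that goi_1 does not also match goi_10.

```shell
# Hypothetical demo files; replace them with your own exported text files.
workdir=$(mktemp -d)
printf 'goi_1\ngoi_2\ngoi_3\n' > "$workdir/genes.txt"
printf 'gene\tcount\ngoi_1\t120\ngoi_10\t7\ngoi_2\t0\nother\t55\ngoi_3\t981\n' > "$workdir/counts.txt"

# Join the gene names into a single goi_1|goi_2|goi_3 pattern
# (this replaces the manual text-editor step).
pattern=$(paste -sd'|' "$workdir/genes.txt")

# -E: extended regex, so | means "or".
# -w: match whole words only, so goi_1 does not pull in goi_10.
# Prints the rows for goi_1, goi_2 and goi_3.
grep -Ew "$pattern" "$workdir/counts.txt"

# Same search, with the output saved to a file instead of printed.
grep -Ew "$pattern" "$workdir/counts.txt" > "$workdir/goi_counts.txt"
```

Note that the gene names are treated as regular expressions here, so a name containing a character such as . could match more than intended.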
I would recommend moving to the Linux command line ;-)
Save your Excel lists as tab-delimited text files. Then run grep on the count list, using the gene list as the pattern file.
This will grep, for each line of your <gene_list> file, the corresponding line from your <count_list> and write it to a new output file.
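The original command is missing from the post, so this is only a sketch of how it could be done with grep -f. The file names and demo rows are hypothetical, and the -w, -F and carriage-return cleanup steps are my assumptions about what a file exported from Excel on Windows would need.

```shell
# Hypothetical demo files; the gene list uses Windows (CRLF) line endings,
# as a text export from Excel often does.
workdir=$(mktemp -d)
printf 'goi_1\r\ngoi_2\r\ngoi_3\r\n' > "$workdir/gene_list.txt"
printf 'gene\tcount\ngoi_1\t120\ngoi_10\t7\ngoi_2\t0\nother\t55\ngoi_3\t981\n' > "$workdir/count_list.txt"

# Strip carriage returns; otherwise every pattern ends in \r and nothing matches.
tr -d '\r' < "$workdir/gene_list.txt" > "$workdir/gene_list.clean.txt"

# -f FILE: read one pattern per line from the gene list.
# -F: treat each gene name as a fixed string, not a regular expression.
# -w: match whole words only, so goi_1 does not also match goi_10.
grep -wFf "$workdir/gene_list.clean.txt" "$workdir/count_list.txt" > "$workdir/output.txt"

# Show the extracted rows.
cat "$workdir/output.txt"
```

The rows come out in the order of the count list, not the gene list, and the header row is dropped unless it happens to contain one of the gene names.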
How did you actually end up with these data files in Excel? (I mean, you might already have the files in a text-based format?)