Question

RNA-seq Data Excel Formatting

1

6.3 years ago

michaell ▴ 10

I currently have 2 Excel files: one is a list of 1000+ genes of interest, and the other contains analyzed RNA-seq data with read counts. I am trying to match the genes of interest to their corresponding read counts; however, it is not feasible to copy and paste these values individually, as the spreadsheet is over 25,000 genes long.

Does anyone know of an effective method to extract the expression data for my genes of interest from the large RNA-seq spreadsheet?

Thank you in advance.

RNA-Seq sequencing • 4.9k views

ADD COMMENT • link updated 6.3 years ago by Kevin Blighe 89k • written 6.3 years ago by michaell ▴ 10

2

Two ways:

Use the VLOOKUP function in Excel
Load the Excel sheets into MS Access and write an SQL query

ADD REPLY • link 6.3 years ago by cpad0112 21k

0

You can use any simple text editor to put your genes of interest in this format: goi_1|goi_2|goi_3 etc. Next, try grep like this:

grep -E "goi_1|goi_2|goi_3" expression_matrix.txt

You can also save the output to a file:

grep -E "goi_1|goi_2|goi_3" expression_matrix.txt > output.txt
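With 1000+ genes, typing the pattern by hand is impractical, so it can be built from the gene list instead. The following is a minimal sketch, assuming the gene list is a plain-text file with one gene name per line and the expression matrix is tab-delimited; the file names `genes_of_interest.txt` and `expression_matrix.txt` and the toy rows are placeholders. It also adds `-w` so that goi_1 does not also match goi_10. Note that `-w` treats a hyphen as a word boundary, and gene names containing regex metacharacters would need escaping, so check the output on real data.

```shell
#!/bin/sh
# Toy data standing in for the real files (file names and rows are assumptions).
workdir=$(mktemp -d)
printf 'goi_1\ngoi_2\n' > "$workdir/genes_of_interest.txt"
printf 'gene\tcount\ngoi_1\t10\ngoi_10\t99\ngoi_2\t20\ngoi_3\t30\n' \
  > "$workdir/expression_matrix.txt"

# Join the gene names with "|" to form the alternation pattern, e.g. goi_1|goi_2
pattern=$(paste -sd'|' "$workdir/genes_of_interest.txt")

# -E enables the "|" alternation; -w only accepts whole-word matches,
# so the pattern goi_1 does not pull in the goi_10 row.
grep -wE "$pattern" "$workdir/expression_matrix.txt" > "$workdir/output.txt"

cat "$workdir/output.txt"
```

On the toy data, only the goi_1 and goi_2 rows end up in `output.txt`.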

ADD REPLY • link 6.3 years ago by Satyajeet Khare ★ 1.6k

0

I would recommend moving to the Linux command line ;-)

Save your Excel lists as tab-delimited text files. Then do

grep -f <gene_list> <count_list> > new_output.txt

For each line in your <gene_list> file, this finds the matching lines in your <count_list> file and writes them to a new output file.
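A plain `grep -f` treats each line of the gene list as a regular expression and matches it anywhere in a row, so a short gene name can also hit longer names that contain it. Below is a hedged sketch of a stricter variant, assuming tab-delimited files exported from Excel; the file names and toy gene rows are placeholders. It strips Windows carriage returns and blank lines from the gene list first (a blank pattern line can match every row), then uses `-F` for literal strings and `-w` for whole-word matches.

```shell
#!/bin/sh
# Toy data standing in for the exported files (names and rows are assumptions).
workdir=$(mktemp -d)
# Gene list with Windows line endings and a trailing blank line,
# as text files exported from Excel often have.
printf 'BRCA1\r\nTP53\r\n\r\n' > "$workdir/gene_list.txt"
printf 'gene\tcount\nBRCA1\t120\nBRCA1P1\t3\nTP53\t85\nMYC\t40\n' \
  > "$workdir/count_list.txt"

# Remove carriage returns and drop empty lines from the gene list.
tr -d '\r' < "$workdir/gene_list.txt" | grep -v '^$' > "$workdir/gene_list.clean.txt"

# Keep the header row, then append the matching rows.
# -F: patterns are literal strings; -w: whole-word matches only;
# -f: read one pattern per line from the cleaned gene list.
head -n 1 "$workdir/count_list.txt" > "$workdir/new_output.txt"
grep -wFf "$workdir/gene_list.clean.txt" "$workdir/count_list.txt" \
  >> "$workdir/new_output.txt"

cat "$workdir/new_output.txt"
```

On the toy data, the output keeps the header plus the BRCA1 and TP53 rows, and leaves out BRCA1P1 and MYC.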

How did you actually end up with these data files in Excel? (I mean, you might already have the files in a text-based format?)

ADD REPLY • link 6.3 years ago by lieven.sterck 15k

score 3 · Answer 1 · 2019-01-05

michaell, if you are in any way interested in data analysis / bioinformatics, you should take the opportunity to learn some coding given the situation that you face.

Export your Excel® sheets to CSV or TSV format. Be wary that gene names beginning with 'SEPT', 'DEC', etc. will likely have been automatically converted to Excel® date format.
Import the exported data into the R programming language (available for Windows, Mac OS, and Linux)
Perform the filter operations in R
Export the filtered data back to TSV or CSV from R (optional)
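Before importing, the exported file can be screened for gene names that Excel has turned into dates. This is only a rough command-line check, not the R workflow recommended here. It assumes a tab-delimited export with gene names in the first column, and that converted names appear in a day-month form such as `1-Sep`; the actual rendering depends on locale and Excel settings. The file name and rows are placeholders.

```shell
#!/bin/sh
# Toy export standing in for the real TSV (file name and rows are assumptions).
workdir=$(mktemp -d)
printf 'gene\tcount\n1-Sep\t15\nTP53\t85\n2-Mar\t7\nDEC1\t4\n' \
  > "$workdir/exported_counts.tsv"

# Take the gene-name column and keep entries that look like D-Mon or DD-Mon,
# the assumed form of a gene symbol that was converted to a date.
cut -f1 "$workdir/exported_counts.tsv" \
  | grep -E '^[0-9]{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$' \
  > "$workdir/suspect_names.txt"

# Any names listed here should be checked against the original data.
cat "$workdir/suspect_names.txt"
```

On the toy data this flags `1-Sep` and `2-Mar`, while `DEC1` and `TP53` pass through because they were not converted.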

In our 'profession', usage of Excel® for data analysis is not recommended. Excel® is a very powerful program for performing other tasks, but its strong point is neither manipulating large datasets nor performing statistics.

If you have absolutely no experience with R, then take a look at my (and my former colleagues') own tutorial notes, which cover everything that you need to complete your task:

https://github.com/kevinblighe/Rtutorials (see Lecture 1 - RTrainingLect1_KBlighe.ppt and Lecture 2 - RTrainingLect2_KBlighe.ppt)

If you do this, then you can impress your friends and say that you did some coding.

Good luck,

Kevin