RNA-seq Data Excel Formatting
1
1
Entering edit mode
5.9 years ago
michaell ▴ 10

I currently have 2 Excel files, one a list of 1000+ genes of interest, and one of analyzed RNA-seq data with read counts. I am trying to match the genes of interest to their corresponding read count; however, it is not feasible to individually copy and paste these values as the spreadsheet is over 25,000+ genes long.

Does anyone know of an effective method to isolate the expression data of my genes of interest off the large RNA-seq spreadsheet?

Thank you in advance.

RNA-Seq sequencing • 4.6k views
ADD COMMENT
2
Entering edit mode

two ways:

  1. Vlookup function in excel
  2. Load excel sheets in msaccess, write SQL
ADD REPLY
0
Entering edit mode

You can use any simple text editor to get your genes of interest in this format goi_1|goi_2|goi_3 etc. Next, try grep like this

grep -E "goi_1|goi_2|goi_3" expression_matrix.txt

You can also save the output as

grep -E "goi_1|goi_2|goi_3" expression_matrix.txt > output.txt
ADD REPLY
0
Entering edit mode

I would recommend moving to linux cdmline ;-)

save your excel lists as tab-delineated text files. Then do

grep -f <gene_list> <count_list> > new_output.txt

this will grep for each line from your <gene_list> file the corresponding line from your <count_list> and write it to a new output file.

How did you actually end up with these data files in excel? (I mean, you might already have the files in a text-based format? )

ADD REPLY
3
Entering edit mode
5.9 years ago

michaell, if you are in any way interested in data analysis / bioinformatics, you should take the opportunity to learn some coding given the situation that you face.

  1. Export your Excel® sheets to CSV or TSV format. Be wary that gene names beginning with 'SEPT', 'DEC', etc will likely have been automatically converted to Excel® date format.
  2. Input the exported data to R Programming Language (available for Windows, Mac OS, and Linux)
  3. Perform the filter operations in R
  4. Export the filtered data back to TSV or CSV from R (optional)

In our 'profession', usage of Excel® for data analysis is not recommended. Excel® is a very powerful program for performing other tasks, but it's strong point is not in manipulating large datasets, nor for performing statistics.

If you have absolutely no experience with R, then take a look at my (and my former colleagues') own tutorial notes, which covers everything that you need to complete your task:

If you do this, then you can impress your friends and say that you did some coding.

Good luck,

Kevin

ADD COMMENT

Login before adding your answer.

Traffic: 1739 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6