How to read a tar file, filled with txt.gz files and CEL.gz
3.1 years ago
AlexStar ▴ 170

I have multiple raw data files that are tar archives. I downloaded them from GEO and opened them in 7-Zip. Each of these archives contains many txt.gz and/or CEL.gz files.

Example: The tar file - GSE49355_RAW.tar

The files it contains - GSM1197996_092302_HG-U133A_AF07130_1.CEL.gz

GSM1197997_092302_HG-U133A_AF07149_1.CEL.gz

GSM1198020_CGCLG_073003_HG-U133A_AF09317_1.CEL.gz

etc.

My goal is to get raw RNA expression data, with the samples in columns and the genes in rows. I've been trying to read the tar files into R using the untar() function and the getGEO() function, but all I get is a vector of file names; I can't access the data that I want.

How can I read the tar files AND the files inside them in R?

tar R 7-Zip • 3.2k views
3.1 years ago
ATpoint 86k

The tarball is just an archive that bundles the files together, like a packed folder; the individual files inside it are gzip-compressed. Once you have unpacked it, you can read the individual files. The CEL files are the raw data from an Affymetrix microarray experiment. Please read existing threads such as "how I can read CEL files with affy" and go through the limma manual (Bioconductor package) to learn about arrays. There are dozens of posts on this; after all, it's an old and well-established type of experiment.
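For illustration, the unpack-then-read steps might look roughly like the sketch below. The helper names `extract_geo_tar` and `read_gz_table` are made up for this example, and the Bioconductor part assumes the `affy` and `Biobase` packages are installed and that `affy` can read the gzipped CEL files directly (if it cannot on your setup, decompress them first). Note that `rma()` returns background-corrected, normalized log2 values per probeset, not untouched raw intensities.

```r
# Hypothetical helper: unpack a GEO *_RAW.tar archive into a directory
# and return the full paths of the files it contained.
extract_geo_tar <- function(tar_path, exdir = tempfile("geo_raw_")) {
  dir.create(exdir, showWarnings = FALSE, recursive = TRUE)
  untar(tar_path, exdir = exdir)
  list.files(exdir, full.names = TRUE)
}

# Hypothetical helper: read one gzipped tab-delimited text file.
# gzfile() decompresses on the fly, so no manual gunzip step is needed.
read_gz_table <- function(path, ...) {
  read.delim(gzfile(path), ...)
}

# Usage sketch; assumes GSE49355_RAW.tar is in the working directory.
if (file.exists("GSE49355_RAW.tar")) {
  files <- extract_geo_tar("GSE49355_RAW.tar")
  cel_files <- grep("\\.CEL\\.gz$", files, value = TRUE, ignore.case = TRUE)

  # Assumption: affy and Biobase (Bioconductor) are installed.
  if (length(cel_files) > 0 &&
      requireNamespace("affy", quietly = TRUE) &&
      requireNamespace("Biobase", quietly = TRUE)) {
    raw  <- affy::ReadAffy(filenames = cel_files)  # probe-level data (AffyBatch)
    eset <- affy::rma(raw)                         # normalized probeset-level data
    expr <- Biobase::exprs(eset)                   # matrix: probesets x samples
  }
}
```

The rows of `expr` are probeset IDs rather than gene symbols; mapping them to genes needs the annotation package for the array (HG-U133A here), which is covered in the threads mentioned above.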
