TCGA level 3 gene expression dataset
9.9 years ago
nikkitta.sa ▴ 10

Hello,

I am looking at the level 3 breast cancer gene expression data from TCGA.

I downloaded a dataset of 139 text files and sifted through some files manually to look for my gene of interest and its expression value. However, I haven't had any luck finding it so far.

I am thinking I ought to write a script that would report the gene of interest, its expression value, and the name of the file holding these values.

Is there a better way to go about this? Are there any online tools I could use? Any suggestions are welcome; I'd appreciate a discussion to bounce some ideas around.

Please and Thank you.

-N

TCGA bulk-data Level3 breast-cancer • 4.4k views

Have you tried just using grep?


Thanks Devon, I tried grep and it worked like a charm. How do I write my results to a new file?

I used grep -r "GeneName", which gave me a list of lines in the form FileName:GeneName ExpressionValue.

Thanks again. I'm working on writing these results to a file for future use.


Just use redirection, so grep -r "GeneName" some_file > the_output.txt


I'm searching the whole dataset rather than a single file. Here is part of my console output:

./US82800149_251976013065_S01_GE2_105_Dec08.txt_lmean.out.logratio.gene.tcga_level3.data.txt:FBXL10    -0.985666666666667
./US82800149_251976013053_S01_GE2_105_Dec08.txt_lmean.out.logratio.gene.tcga_level3.data.txt:FBXL10    -1.13183333333333
./US82800149_251976012883_S01_GE2_105_Dec08.txt_lmean.out.logratio.gene.tcga_level3.data.txt:FBXL10    -0.796166666666667
./US82800149_251976013058_S01_GE2_105_Dec08.txt_lmean.out.logratio.gene.tcga_level3.data.txt:FBXL10    -0.155
./US82800149_251976012925_S01_GE2_105_Dec08.txt_lmean.out.logratio.gene.tcga_level3.data.txt:FBXL10    -1.25033333333333
./US82800149_251976012919_S01_GE2_105_Dec08.txt_lmean.out.logratio.gene.tcga_level3.data.txt:FBXL10    -1.1105
./US82800149_251976012940_S01_GE2_105_Dec08.txt_lmean.out.logratio.gene.tcga_level3.data.txt:FBXL10    -0.405333333333333
./US82800149_251976012950_S01_GE2_105_Dec08.txt_lmean.out.logratio.gene.tcga_level3.data.txt:FBXL10    -0.557
./US82800149_251976013090_S01_GE2_105_Dec08.txt_lmean.out.logratio.gene.tcga_level3.data.txt:FBXL10    -1.74166666666667
./US82800149_251976013068_S01_GE2_105_Dec08.txt_lmean.out.logratio.gene.tcga_level3.data.txt:FBXL10    -1.72983333333333
./US82800149_251976013047_S01_GE2_105_Dec08.txt_lmean.out.logratio.gene.tcga_level3.data.txt:FBXL10    -0.02
./US82800149_251976012874_S01_GE2_105_Dec08.txt_lmean.out.logratio.gene.tcga_level3.data.txt:FBXL10    -0.997
./US82800149_251976013082_S01_GE2_105_Dec08.txt_lmean.out.logratio.gene.tcga_level3.data.txt:FBXL10    -0.573666666666667
./US82800149_251976012867_S01_GE2_105_Dec08.txt_lmean.out.logratio.gene.tcga_level3.data.txt:FBXL10    -1.1965
./US82800149_251976013052_S01_GE2_105_Dec08.txt_lmean.out.logratio.gene.tcga_level3.data.txt:FBXL10    -1.16416666666667
./US82800149_251976012945_S01_GE2_105_Dec08.txt_lmean.out.logratio.gene.tcga_level3.data.txt:FBXL10    -0.549166666666667
./US82800149_251976013076_S01_GE2_105_Dec08.txt_lmean.out.logratio.gene.tcga_level3.data.txt:FBXL10    -2.30916666666667
./US82800149_251976012908_S01_GE2_105_Dec08.txt_lmean.out.logratio.gene.tcga_level3.data.txt:FBXL10    -1.34766666666667
./US82800149_251976012956_S01_GE2_105_Dec08.txt_lmean.out.logratio.gene.tcga_level3.data.txt:FBXL10    -1.07033333333333
./US82800149_251976012870_S01_GE2_105_Dec08.txt_lmean.out.logratio.gene.tcga_level3.data.txt:FBXL10    -1.51533333333333
./US82800149_251976012913_S01_GE2_105_Dec08.txt_lmean.out.logratio.gene.tcga_level3.data.txt:FBXL10    -1.0305
./US82800149_251976012879_S01_GE2_105_Dec08.txt_lmean.out.logratio.gene.tcga_level3.data.txt:FBXL10    -0.000333333333333338
./US82800149_251976012873_S01_GE2_105_Dec08.txt_lmean.out.logratio.gene.tcga_level3.data.txt:FBXL10    -0.976

Need to write this to a file.

I do not want to move my files to a different directory; I just want to write this output to a file.

I was looking at this link here

But I've only gotten more confused.


So a full example would be something like

grep FBXL10 *lmean.out.logratio.gene.tcga_level3.data.txt > FBXL10.txt
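
One caveat with plain grep is that it matches substrings, so a gene whose name merely contains FBXL10 would be reported too. If you want the file name, gene, and value as a clean tab-separated table, an awk one-liner is one option. This is only a sketch: it assumes each data file has the gene name in the first whitespace-separated column and the value in the second, as in the console output shown above. The sample files, the made-up gene name FBXL10X, and the output file name are hypothetical stand-ins for illustration.

```shell
# Hypothetical demo data standing in for the TCGA level 3 files (gene, then value).
demo_dir=$(mktemp -d)
cd "$demo_dir"
printf 'FBXL10\t-0.985\nFBXL10X\t0.5\n' > sample_A.data.txt   # FBXL10X is a made-up decoy name
printf 'TP53\t0.12\nFBXL10\t-1.131\n' > sample_B.data.txt

gene="FBXL10"

# Print file name, gene, and value only where column 1 equals the gene exactly.
# FILENAME is awk's built-in variable holding the file currently being read.
awk -v gene="$gene" 'BEGIN { OFS = "\t" } $1 == gene { print FILENAME, $1, $2 }' \
    *.data.txt > "${gene}_table.tsv"

cat "${gene}_table.tsv"
```

On the real data you would replace *.data.txt with the same wildcard as in the grep command above. Because the output name ends in .tsv, the wildcard never picks it up as an input.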

Right. Thanks Devon.

I was trying to use

grep -r "FBXL10" . > outfile.txt

And I got the message -

grep: input file './outfile.txt' is also the output

Why is that the case? The "." asks grep to go through all the subdirectories and files to look for my gene of interest. Why would it consider outfile.txt as input?


FYI, this is getting rather off-topic, since this is just basic computer usage.

On all Unix derived systems (Mac OS X, Linux, etc.), "." means the current working directory. So if you tell grep to recursively search through everything in the directory where the output is, then it'll go through the output too. It's actually quite clever that that error is even caught. That's why my example used wild-cards. The alternative is to just put the output elsewhere: "> /home/whatever-your-username-is/filename.txt".
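
To make the two options concrete, here is a minimal sketch. The sample files and directory names are hypothetical stand-ins created only so the commands can be run as-is; on the real data you would run just the two grep lines from inside the directory holding the downloaded files.

```shell
# Hypothetical demo data standing in for the TCGA level 3 files.
workdir=$(mktemp -d)
cd "$workdir"
printf 'FBXL10\t-0.985\nTP53\t0.12\n' > sample_A.data.txt
printf 'FBXL10\t-1.131\nTP53\t0.40\n' > sample_B.data.txt

# Option 1: keep the recursive search of "." but write the output to a
# directory outside the one being searched, so grep never reads its own output.
outdir=$(mktemp -d)
grep -r FBXL10 . > "$outdir/FBXL10_recursive.txt"

# Option 2: use a wildcard that names only the data files. The output file
# can then live in the same directory because it does not match the pattern.
grep FBXL10 *.data.txt > FBXL10.txt
```

Both output files end up with one line per input file, each prefixed with the file name it came from.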


Thanks a lot. Appreciate it.


Right, so just put > some_file.txt at the end of your command as I did in my example.

9.9 years ago
dario.garvan ▴ 520

TCGA-Assembler imports all of the data you want into R and makes a convenient table which you can use in other analyses.


Thanks Dario,

I'll look into it.
