Hello,
I have genome-wide RNA-seq data. I want to extract only the whole blood columns, and only for samples from the GBR population. The file contains count data for many samples from different populations and different tissues.
I don't understand how to do this.
It would help if you stated the source of the data. By GBR, I presume that you mean 'Great Britain', i.e., the island of Great Britain, which comprises the countries of England, Scotland, and Wales. Your dataset should come with associated metadata, which likely has this information and also indicates which samples are whole blood tissue.
I downloaded this genome-wide RNA-seq file from this link: https://storage.googleapis.com/gtex_analysis_v8/rna_seq_data/GTEx_Analysis_2017-06-05_v8_RNASeQCv1.1.9_gene_tpm.gct.gz I cannot make sense of the header or the metadata for this file because it is in GCT format, which I have never worked with. Yes, GBR represents the island of Great Britain.
This is GTEx data. I am not sure that 'ethnicity' is recorded, and neither is it important considering that this is RNA-seq / gene expression. What are you trying to do with this data? Who has told you to do what?
This file contains gene counts. I want to extract samples from the GBR population only, and consider only whole blood rather than all tissues. Then I want to correlate this count file with a predicted count file.
You need a sample metadata file that describes the sample IDs contained in that gene count (GCT) file. The person who asked you to process this data should know where such a file exists.
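To see which sample IDs the metadata has to describe, it can help to look at the GCT header first. The sketch below assumes the usual GCT layout: a version line such as `#1.2`, a line with the number of rows and columns, then a tab-separated header whose first two fields are `Name` and `Description` and whose remaining fields are sample IDs. The function name `read_gct_header` and the tiny in-memory file are only illustrative.

```python
import io

def read_gct_header(handle):
    """Return (n_genes, n_samples, sample_ids) from an open text handle on a GCT file."""
    handle.readline()                              # version line, e.g. "#1.2"
    n_rows, n_cols = handle.readline().split()[:2]  # matrix dimensions
    columns = handle.readline().rstrip("\n").split("\t")
    # The first two columns identify the gene; everything after them is a sample ID.
    return int(n_rows), int(n_cols), columns[2:]

# Tiny made-up stand-in for a real GCT file.
demo = io.StringIO(
    "#1.2\n"
    "2\t3\n"
    "Name\tDescription\tS1\tS2\tS3\n"
    "G1\tA\t1\t2\t3\n"
    "G2\tB\t4\t5\t6\n"
)
n_genes, n_samples, sample_ids = read_gct_header(demo)
print(n_genes, n_samples, sample_ids)
```

For a gzipped download, the same function should work on a handle opened with `gzip.open(path, "rt")`, where `path` is wherever you saved the file.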
Okay, I will contact my seniors with this. Thank you.
I got a metadata file containing tissue information as well as sample IDs. How should I merge it with the file above to build a matrix? https://storage.googleapis.com/gtex_analysis_v8/annotations/GTEx_Analysis_v8_Annotations_SampleAttributesDS.txt
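One possible way to combine the two files, as a minimal sketch rather than a definitive recipe: collect the sample IDs whose tissue is whole blood from the annotation file, then keep only those columns of the GCT file. It assumes the annotation file is tab-separated with a `SAMPID` column for the sample ID and an `SMTSD` column for the detailed tissue name, with the value `Whole Blood`; check these against your copy. The helper names and the small in-memory inputs are made up for illustration. It does not filter on population, since nothing in this thread establishes that the sample attributes file carries a GBR label.

```python
import csv
import io

def tissue_sample_ids(meta_handle, tissue="Whole Blood"):
    """Collect sample IDs whose tissue column (assumed: SMTSD) equals `tissue`."""
    reader = csv.DictReader(meta_handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {row["SAMPID"] for row in reader if row["SMTSD"] == tissue}

def subset_gct(gct_handle, keep_ids):
    """Yield rows of a GCT file restricted to Name, Description and the kept samples."""
    gct_handle.readline()  # skip version line
    gct_handle.readline()  # skip dimensions line (no longer valid after subsetting)
    header = gct_handle.readline().rstrip("\n").split("\t")
    keep = [0, 1] + [i for i in range(2, len(header)) if header[i] in keep_ids]
    yield [header[i] for i in keep]
    for line in gct_handle:
        fields = line.rstrip("\n").split("\t")
        yield [fields[i] for i in keep]

# Made-up stand-ins for the annotation and expression files.
meta = io.StringIO(
    "SAMPID\tSMTSD\n"
    "S1\tWhole Blood\n"
    "S2\tLung\n"
    "S3\tWhole Blood\n"
)
gct = io.StringIO(
    "#1.2\n2\t3\n"
    "Name\tDescription\tS1\tS2\tS3\n"
    "G1\tA\t1\t2\t3\n"
    "G2\tB\t4\t5\t6\n"
)
ids = tissue_sample_ids(meta)
rows = list(subset_gct(gct, ids))
for row in rows:
    print("\t".join(row))
```

Reading the expression file line by line keeps memory use low, which matters because the full matrix covers many samples across all tissues. The resulting rows can be written out as a plain tab-separated table and then compared with the predicted values gene by gene.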