Multiple RNA-seq fastq files for one sample in GEO database
16 months ago
maximal_life ▴ 20

Hello, I'm trying to analyze public RNA-seq data from the GEO database, but I'm not sure how to handle some of it.

A representative dataset is GSE88945 (PRJNA349164). It has only three samples but 47 data files: many files share the same GSM ID but have different SRR IDs.

Here are the first header lines of several fastq files with the same GSM ID:

Filename
> Header line

SRR4432915_GSM2355695_H_G3_Homo_sapiens_RNA-Seq_1.fastq.gz
> @SRR4432915.1 HWI-ST959:164:C2KV4ACXX:6:1101:1172:2037/1

SRR4432916_GSM2355695_H_G3_Homo_sapiens_RNA-Seq_1.fastq.gz
> @SRR4432916.1 DF9F08P1:223:D2F1AACXX:7:2115:13996:65512/1

SRR4432917_GSM2355695_H_G3_Homo_sapiens_RNA-Seq_1.fastq.gz
> @SRR4432917.1 DF9F08P1:223:D2F1AACXX:7:2304:18001:55398/1

SRR4432918_GSM2355695_H_G3_Homo_sapiens_RNA-Seq_1.fastq.gz
> @SRR4432918.1 DF9F08P1:223:D2F1AACXX:8:1101:1402:2235/1

In this case, the headers differ in three places: 1) HWI-ST959 vs. DF9F08P1, 2) 7 vs. 8, and 3) 2115 vs. 2304.
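To make the differing fields easier to name, here is a minimal sketch that splits these header lines into parts. It assumes the read names follow the older Illumina convention `instrument:run:flowcell:lane:tile:x:y/read`, which is what the four headers above appear to use; `parse_header` and `HEADER_RE` are hypothetical names introduced for illustration only.

```python
import re

# Assumed layout: "@<SRA spot id> <instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y>/<read>"
HEADER_RE = re.compile(
    r"^@(?P<spot>\S+)\s+"
    r"(?P<instrument>[^:]+):(?P<run>\d+):(?P<flowcell>[^:]+):"
    r"(?P<lane>\d+):(?P<tile>\d+):(?P<x>\d+):(?P<y>\d+)/(?P<read>[12])$"
)


def parse_header(line):
    """Split one FASTQ header line into named fields (hypothetical helper)."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized header: {line!r}")
    return match.groupdict()


# The four header lines quoted in the question.
headers = [
    "@SRR4432915.1 HWI-ST959:164:C2KV4ACXX:6:1101:1172:2037/1",
    "@SRR4432916.1 DF9F08P1:223:D2F1AACXX:7:2115:13996:65512/1",
    "@SRR4432917.1 DF9F08P1:223:D2F1AACXX:7:2304:18001:55398/1",
    "@SRR4432918.1 DF9F08P1:223:D2F1AACXX:8:1101:1402:2235/1",
]

for header in headers:
    fields = parse_header(header)
    # Print only the fields that vary between the files.
    print(fields["spot"], fields["instrument"], fields["flowcell"],
          fields["lane"], fields["tile"])
```

Under that assumption, point 1 is the instrument name, point 2 is the flowcell lane, and point 3 is the tile; the flowcell ID (C2KV4ACXX vs. D2F1AACXX) differs as well.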

Many questions in my mind can be summarized like below. I wonder what makes those differences between fastq files and whether I can merge the data or not. If somebody knows, please advise me.

Thanks in advance.

GEO RNA-seq multiplefiles • 960 views
16 months ago
bk11 ★ 3.0k

They have three samples: H_G3 (16 replicates), H_G5 (13 replicates) and H_G14 (18 replicates). See below:

esearch -db bioproject -query "PRJNA349164" | elink -target sra | efetch -format runinfo | cut -d "," -f1,30
#Note: SamplesGroup column is added after running the above command
Run SampleName  SamplesGroup
SRR4432915  GSM2355695  H_G3
SRR4432916  GSM2355695  H_G3
SRR4432917  GSM2355695  H_G3
SRR4432918  GSM2355695  H_G3
SRR4432919  GSM2355695  H_G3
SRR4432920  GSM2355695  H_G3
SRR4432921  GSM2355695  H_G3
SRR4432922  GSM2355695  H_G3
SRR4432923  GSM2355695  H_G3
SRR4432924  GSM2355695  H_G3
SRR4432925  GSM2355695  H_G3
SRR4432926  GSM2355695  H_G3
SRR4432927  GSM2355695  H_G3
SRR4432928  GSM2355695  H_G3
SRR4432929  GSM2355695  H_G3
SRR4432930  GSM2355695  H_G3
SRR4432931  GSM2355696  H_G5
SRR4432932  GSM2355696  H_G5
SRR4432933  GSM2355696  H_G5
SRR4432934  GSM2355696  H_G5
SRR4432935  GSM2355696  H_G5
SRR4432936  GSM2355696  H_G5
SRR4432937  GSM2355696  H_G5
SRR4432938  GSM2355696  H_G5
SRR4432939  GSM2355696  H_G5
SRR4432940  GSM2355696  H_G5
SRR4432941  GSM2355696  H_G5
SRR4432942  GSM2355696  H_G5
SRR4432943  GSM2355696  H_G5
SRR4432944  GSM2355697  H_G14
SRR4432945  GSM2355697  H_G14
SRR4432946  GSM2355697  H_G14
SRR4432947  GSM2355697  H_G14
SRR4432948  GSM2355697  H_G14
SRR4432949  GSM2355697  H_G14
SRR4432950  GSM2355697  H_G14
SRR4432951  GSM2355697  H_G14
SRR4432952  GSM2355697  H_G14
SRR4432953  GSM2355697  H_G14
SRR4432954  GSM2355697  H_G14
SRR4432955  GSM2355697  H_G14
SRR4432956  GSM2355697  H_G14
SRR4432957  GSM2355697  H_G14
SRR4432958  GSM2355697  H_G14
SRR4432959  GSM2355697  H_G14
SRR4432960  GSM2355697  H_G14
SRR4432961  GSM2355697  H_G14
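The same grouping can be done programmatically. The sketch below assumes the comma-separated output of the command above, with the `Run` and `SampleName` column names shown in the table header; `group_runs_by_sample` is a hypothetical helper, and the excerpt holds only a few rows copied from the table.

```python
import csv
import io
from collections import defaultdict


def group_runs_by_sample(runinfo_csv):
    """Map each SampleName (GSM ID) to its list of Run (SRR) accessions."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(runinfo_csv)):
        groups[row["SampleName"]].append(row["Run"])
    return dict(groups)


# Short excerpt of the runinfo output (rows taken from the table above).
excerpt = """Run,SampleName
SRR4432915,GSM2355695
SRR4432916,GSM2355695
SRR4432931,GSM2355696
SRR4432944,GSM2355697
"""

for sample, runs in group_runs_by_sample(excerpt).items():
    print(sample, len(runs), runs)
```

Run on the full output, this should give one list of SRR accessions per GSM ID, which is the unit you would merge or analyze.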

You may merge the replicates of each sample or analyze them separately; it depends on what you want to achieve. Also, it looks like they have provided both raw and normalized data for this study.
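If you do decide to merge, one simple approach is to concatenate the gzip-compressed fastq files of a sample, since concatenated gzip files still form a valid gzip file. This is only a sketch: the function names are hypothetical, the file-name pattern is assumed from the names quoted in the question, and for paired-end data the `_1` and `_2` files would need to be merged in the same run order.

```python
import gzip
import shutil
from pathlib import Path


def merge_fastq_gz(inputs, output):
    """Concatenate gzip-compressed FASTQ files byte-for-byte into `output`."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def count_reads(path):
    """Count FASTQ records (4 lines each) in a gzip-compressed file."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for _ in handle) // 4


def merge_sample(directory, gsm, mate, output):
    """Merge all runs of one GSM sample for one mate (1 or 2), in sorted run order.

    Assumes file names like SRR4432915_GSM2355695_..._1.fastq.gz.
    """
    inputs = sorted(Path(directory).glob(f"SRR*_{gsm}_*_{mate}.fastq.gz"))
    if not inputs:
        raise FileNotFoundError(f"no fastq files found for {gsm} mate {mate}")
    merge_fastq_gz(inputs, output)
    return inputs
```

For example, `merge_sample(".", "GSM2355695", 1, "GSM2355695_merged_1.fastq.gz")` would combine the H_G3 runs for the first mate, and `count_reads` can confirm that the merged file holds the sum of the per-run read counts.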


Thank you for your answer! You said "merge each replicates", and I'd like to understand that point in more detail. Within ONE sample (here, H_G3), the fastq files differ from one another, and I'm confused about which files it is reasonable to merge and which it is not. Is it okay to merge all the fastq files for a sample, as you say?


May I ask what your plan is after merging the files? These are mRNA expression data.


Oh, I'm sorry for my late reply. I'm planning to process these RNA-seq data and then perform co-expression network analysis (after integrating other datasets), so I'm at the preprocessing stage now. But I'm unsure whether I can simply merge the data from one sample.
