Question

ebi.ac.uk RNA-seq data

6.8 years ago

clizama • 0

Hey,

I am downloading some paired-end data files from ebi.ac.uk and have run into a problem. In general there are just two files per sample (for paired-end data), but this RNA-seq project has four files per sample. I think they split the data into four files, two for each original file.

https://www.ebi.ac.uk/ena/data/view/PRJNA378234

But I am not sure which files I have to merge.

Example: these two accessions belong to the same sample, but each accession has two (paired) files:

SRR5314617: File 1, File 2

SRR5314618: File 1, File 2

So my question is: do I have to merge file 1 with file 1 and file 2 with file 2 across the two accessions, or file 1 with file 2 within each accession? I am not sure exactly how EBI organizes it.

Thanks

RNA-Seq • 1.3k views

updated 6.8 years ago by Devon Ryan 104k • written 6.8 years ago by clizama • 0

score 1 · Answer 1 · 2018-02-13

6.8 years ago

Devon Ryan 104k

You'll see that one column is labeled "FTP" and the other "Galaxy", so you have a single set of paired-end files. You want the FTP column unless you're using Galaxy.

Hi,

Looking only at the FTP column: the accessions SRR5314617 and SRR5314618 both correspond to the same sample (RNA-seq liver macrophage 1m old liver1mWT3), and each has a single set of paired-end files. So I'm a little confused about whether they split the data into two sets of paired-end files, and whether I have to merge file 1 with file 2, and so on.

Thanks

6.8 years ago by clizama • 0

If you are certain the two datasets represent an identical sample then you could merge the resulting alignments. There is no need to do the merge at the read stage. In case you decide the merging was not appropriate, you would need to back up just one step instead of starting over. You can also estimate if there is any kind of batch effect (different runs/libraries etc) by keeping them separate.

6.8 years ago by GenoMax 147k
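
To illustrate the suggestion above of aligning each run separately and merging only at the alignment stage, here is a minimal Python sketch. It only builds the `samtools merge` command lines and does not run them. The `<run_accession>.bam` naming scheme, the `.merged.bam` output name, and the function name `merge_commands` are assumptions made for illustration, not anything stated in the thread.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def merge_commands(runs: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each multi-run sample to a `samtools merge` command.

    `runs` holds (run_accession, sample_name) pairs. Per-run alignments are
    assumed to live in <run_accession>.bam (a hypothetical naming scheme).
    """
    by_sample = defaultdict(list)
    for run, sample in runs:
        by_sample[sample].append(f"{run}.bam")

    commands = {}
    for sample, bams in by_sample.items():
        if len(bams) > 1:  # samples with a single run need no merge
            # Classic form: samtools merge <out.bam> <in1.bam> <in2.bam> ...
            # samtools merge expects inputs sorted the same way
            # (coordinate order by default).
            commands[sample] = ["samtools", "merge", f"{sample}.merged.bam", *sorted(bams)]
    return commands


runs = [
    ("SRR5314617", "liver1mWT3"),
    ("SRR5314618", "liver1mWT3"),
]
for cmd in merge_commands(runs).values():
    print(" ".join(cmd))
```

Because the per-run BAM files are left in place, the runs can still be compared with each other (for example to look for a batch effect), and undoing the merge only means discarding the merged file.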

It's possible these are replicates of the same sample. I checked the BioProject at NIH: the experiment has 15 SRA files, but the EBI website has 28.

6.8 years ago by clizama • 0

That is odd indeed. Download one or two of the duplicated sample files and see whether the same data was uploaded twice by mistake.

6.8 years ago by GenoMax 147k
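
One possible way to run the check suggested above is to hash the sequence and quality lines of the first reads in each file and compare the digests. This is a rough sketch, not a definitive test. It assumes standard four-line FASTQ records, skips the header lines (which normally embed the run accession and would differ even for identical data), and would miss the same reads stored in a different order. The function names and the 100,000-read default are illustrative choices.

```python
import gzip
import hashlib
from itertools import islice


def fastq_digest(path: str, max_reads: int = 100_000) -> str:
    """Hash the sequence and quality lines of the first `max_reads` records."""
    opener = gzip.open if path.endswith(".gz") else open
    digest = hashlib.sha256()
    with opener(path, "rt") as handle:
        # A FASTQ record is 4 lines: header, sequence, '+', qualities.
        for i, line in enumerate(islice(handle, 4 * max_reads)):
            if i % 4 in (1, 3):  # keep sequence and quality lines only
                digest.update(line.rstrip("\n").encode())
                digest.update(b"\n")
    return digest.hexdigest()


def looks_duplicated(path_a: str, path_b: str, max_reads: int = 100_000) -> bool:
    """True if the first `max_reads` records of both files carry identical reads."""
    return fastq_digest(path_a, max_reads) == fastq_digest(path_b, max_reads)
```

A call might look like `looks_duplicated("SRR5314617_1.fastq.gz", "SRR5314618_1.fastq.gz")`, where the file names are assumed local copies of the first-mate files of the two runs. A `False` result suggests two distinct sequencing runs rather than a duplicated upload.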

Ah, on GEO it's clearer that those are two runs of the same samples. That happens on occasion, though one hopes they're runs from the same library prep.

6.8 years ago by Devon Ryan 104k
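
Since the confusion in this thread comes from several run accessions mapping to one sample, a small sketch of grouping runs by sample may help. It parses a tab-separated run report into a sample-to-runs mapping. The column names (`run_accession`, `sample`, `fastq_ftp`), the semicolon-separated file list, and the `placeholder/` paths are assumptions for illustration. The two rows are a made-up excerpt built from the accessions and sample name mentioned above, not real report output.

```python
import csv
import io
from collections import defaultdict

# Made-up excerpt in the shape of a tab-separated run report.
# Column names and the "placeholder/" paths are hypothetical.
REPORT = (
    "run_accession\tsample\tfastq_ftp\n"
    "SRR5314617\tliver1mWT3\tplaceholder/SRR5314617_1.fastq.gz;placeholder/SRR5314617_2.fastq.gz\n"
    "SRR5314618\tliver1mWT3\tplaceholder/SRR5314618_1.fastq.gz;placeholder/SRR5314618_2.fastq.gz\n"
)


def runs_by_sample(report_text):
    """Return {sample: [(run_accession, [mate_1_file, mate_2_file]), ...]}."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(report_text), delimiter="\t"):
        mates = row["fastq_ftp"].split(";")  # mate 1 and mate 2 of one run
        groups[row["sample"]].append((row["run_accession"], mates))
    return dict(groups)


for sample, runs in runs_by_sample(REPORT).items():
    print(sample, [run for run, _ in runs])
```

Within one run, file 1 and file 2 hold the two mates of each read pair and are never concatenated with each other. If reads from two runs of the same sample were ever combined before alignment, it would be mate 1 with mate 1 and mate 2 with mate 2.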