Hey,
I was downloading some paired-end data files from ebi.ac.uk and ran into a problem. In general there are just two files per sample (one per mate of the pair). This RNA-seq project has 4 files per sample; I think they split each sample into two runs, with two files for each run.
https://www.ebi.ac.uk/ena/data/view/PRJNA378234
But I'm not sure which files I have to merge.
Example: these two accessions belong to the same sample, but each accession has two files.
Paired
- SRR5314617: File 1, File 2
- SRR5314618: File 1, File 2
So my question is: do I have to merge file 1 with file 1 and file 2 with file 2 across the two accessions? Or file 1 with file 2 within each accession? I'm not sure exactly how EBI organizes it.
Thanks
Hi,
Just checking the FTP column: the accessions SRR5314617 and SRR5314618 both correspond to the same sample (RNA-seq liver macrophage 1m old liver1mWT3), and each has a single set of paired-end files. So I'm a little confused about whether they split the data into two sets of paired-end files, and which files I would have to merge with which.
Thanks
If you are certain the two datasets represent an identical sample, then you could merge the resulting alignments. There is no need to do the merge at the read stage. If you later decide the merging was not appropriate, you would need to back up just one step instead of starting over. Keeping them separate also lets you estimate whether there is any kind of batch effect (different runs, libraries, etc.).
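As a rough illustration of merging at the alignment stage, here is a minimal Python sketch that builds and runs a `samtools merge` call. It assumes samtools is installed and that each run was already aligned and coordinate-sorted separately. The BAM file names are hypothetical placeholders, not files from this project.

```python
import shutil
import subprocess


def build_merge_command(output_bam, input_bams):
    """Return the argument list for `samtools merge OUT IN1 IN2 ...`."""
    if len(input_bams) < 2:
        raise ValueError("need at least two BAM files to merge")
    return ["samtools", "merge", output_bam, *input_bams]


def merge_alignments(output_bam, input_bams):
    """Merge per-run sorted BAM files into one BAM for the sample."""
    if shutil.which("samtools") is None:
        raise RuntimeError("samtools was not found on PATH")
    subprocess.run(build_merge_command(output_bam, input_bams), check=True)


# Hypothetical usage: one sorted BAM per run, merged into one per sample.
# merge_alignments(
#     "liver1mWT3.merged.bam",
#     ["SRR5314617.sorted.bam", "SRR5314618.sorted.bam"],
# )
```

The per-run BAM files stay on disk, so if the merge turns out to be inappropriate you can go back to them without realigning.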
It's possible these are replicates of the same sample. I checked the BioProject at NIH: the experiment has 15 SRA files, but the EBI website has 28.
That is odd indeed. Download one or two of the duplicate sample files and see whether the same data was uploaded twice by mistake.
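One way to run that check is to compare checksums of the decompressed reads, sketched below in Python. This assumes the downloads are gzip-compressed FASTQ files. Hashing the decompressed content matters because two gzip files can hold the same reads and still differ byte for byte (for example in header timestamps). The file names in the usage comment follow the usual `<run>_1.fastq.gz` pattern but are assumptions here.

```python
import gzip
import hashlib


def decompressed_md5(path, chunk_size=1 << 20):
    """MD5 of the decompressed content of a gzip file, read in chunks."""
    digest = hashlib.md5()
    with gzip.open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_reads(path_a, path_b):
    """True if both gzip files decompress to exactly the same bytes."""
    return decompressed_md5(path_a) == decompressed_md5(path_b)


# Hypothetical usage:
# print(same_reads("SRR5314617_1.fastq.gz", "SRR5314618_1.fastq.gz"))
```

A `False` result only says the files are not byte-identical after decompression. The same reads stored in a different order or with different read names would also give `False`.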
Ah, on GEO it's clearer that those are two runs of the same samples. That happens on occasion, though one hopes they're runs from the same library prep.
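To answer the original "which file with which" question: if you do decide to combine two runs of the same sample at the read stage, file 1 goes with file 1 and file 2 with file 2, with the runs in the same order for both mates so the pairs stay in sync. Below is a minimal Python sketch. It relies on the fact that gzip files concatenated back to back form a valid gzip stream. The file names are assumed, not taken from the ENA listing.

```python
import shutil


def concatenate_files(input_paths, output_path):
    """Append the raw bytes of each input file to output_path, in order."""
    with open(output_path, "wb") as out_handle:
        for path in input_paths:
            with open(path, "rb") as in_handle:
                shutil.copyfileobj(in_handle, out_handle)


def merge_paired_runs(runs, sample_name):
    """Concatenate mate 1 files and mate 2 files using the same run order.

    `runs` is a list of (mate1_path, mate2_path) tuples, one per run.
    Returns the two output paths.
    """
    out_1 = f"{sample_name}_1.fastq.gz"
    out_2 = f"{sample_name}_2.fastq.gz"
    concatenate_files([mate1 for mate1, _ in runs], out_1)
    concatenate_files([mate2 for _, mate2 in runs], out_2)
    return out_1, out_2


# Hypothetical usage (file names are assumptions):
# merge_paired_runs(
#     [("SRR5314617_1.fastq.gz", "SRR5314617_2.fastq.gz"),
#      ("SRR5314618_1.fastq.gz", "SRR5314618_2.fastq.gz")],
#     "liver1mWT3",
# )
```

Merging the alignments, as suggested earlier in the thread, keeps the runs separable. This read-level version is only for the case where you are sure you will never need them apart.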