Hello, I am downloading some data for this project from ENA:
https://www.ebi.ac.uk/ena/data/search?query=PRJNA494719
There are cases where I do not understand the file layout. For instance, this experiment is supposed to be single-end (it says "SINGLE"): https://www.ebi.ac.uk/ena/data/view/SRX4809217
but it has three files instead, namely SRR7976417.fastq.gz, SRR7976417_1.fastq.gz and SRR7976417_2.fastq.gz
On the other hand, this experiment: https://www.ebi.ac.uk/ena/data/view/SRX4809200
is shown as paired-end (it says "PAIRED") but has only a single file.
For most of the other data in this project, experiments marked SINGLE have a single file and those marked PAIRED have three files (I guess read 1, read 2 and orphan reads). For instance:
https://www.ebi.ac.uk/ena/data/view/SRX4809208
and
https://www.ebi.ac.uk/ena/data/view/SRX4809204
How can I tell whether the problematic files are really single-end or paired-end?
Thanks very much
Thank you @Sean Davis! But what about the single file that says "PAIRED"? How can I tell whether it is actually single-end or a merge of read 1 and read 2 from a paired-end run?
Check the read names. If the file is a merge of read 1 and read 2, each read name should appear twice (probably on adjacent records); otherwise every read name should be unique.
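The read-name check can be scripted. Below is a minimal sketch, not a definitive tool: the function names `read_names` and `summarize_names` are made up for illustration, and it assumes plain four-line FASTQ records (optionally gzipped) where the read name is the first whitespace-delimited token of the header, possibly ending in /1 or /2. Headers in your files may be laid out differently, so look at a few of them first.

```python
import gzip
from collections import Counter
from itertools import islice


def read_names(path, max_reads=100_000):
    """Yield read names from the first `max_reads` records (None = whole file).

    Assumes four-line FASTQ records, so every fourth line is a header.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    stop = max_reads * 4 if max_reads is not None else None
    with opener(path, "rt") as handle:
        for header in islice(handle, 0, stop, 4):
            fields = header.strip().split()
            if not fields:
                continue
            name = fields[0][1:]  # drop the leading "@"
            # Assumption: mates may be marked with a trailing /1 or /2.
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def summarize_names(path, max_reads=100_000):
    """Count how often read names repeat, overall and on adjacent records."""
    names = list(read_names(path, max_reads))
    counts = Counter(names)
    return {
        "records": len(names),
        "unique_names": len(counts),
        "names_seen_more_than_once": sum(1 for c in counts.values() if c > 1),
        "adjacent_identical_names": sum(
            1 for a, b in zip(names, names[1:]) if a == b
        ),
    }
```

Calling `summarize_names("your_file.fastq.gz")` on the downloaded file returns the counts. If roughly half of the records share their name with the next record, the file is probably interleaved; if all names are unique, it is probably single-end. If read 1 and read 2 were concatenated rather than interleaved, the duplicates will not be adjacent and may fall outside the sampled records, so pass `max_reads=None` to scan the whole file in that case.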
Thank you very much @ATpoint!