Hi,
I am trying to do a benchmarking study using ENA datasets. I downloaded data from project PRJEB3381 (CEPH pedigree 1463) using the SRA Toolkit (prefetch ERR194146 && fasterq-dump ERR194146). This is supposed to be paired-end sequencing, so I was expecting two files, but there are three FASTQ files:
ERR194146_1.fastq
ERR194146_2.fastq
ERR194146.fastq
Someone asked a similar question earlier, but in my case the NCBI website does not describe ERR194146.fastq as a barcode file (the library names are given as ERR194146_2, ERR194146, unspecified). In ENA, checking the sample accession (SAMEA1573614), it is defined as unspecified.
I checked the number of reads in ERR194146_1.fastq and ERR194146_2.fastq and they are the same. Is it safe to ignore ERR194146.fastq and proceed with the other two?
I also compared the files downloaded with the SRA Toolkit against a direct wget download. The number of reads in ERR194146_1.fastq and ERR194146_2.fastq is the same in both, but ERR194146.fastq has far fewer reads in the SRA Toolkit download. What could be the cause of this?
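
For anyone who wants to reproduce the read-count comparison, here is a minimal sketch of that kind of check. It assumes uncompressed FASTQ files with exactly four lines per record; count_fastq_reads is a hypothetical helper name used for illustration, not part of the SRA Toolkit, and this is not necessarily the exact method I used.

```python
import sys


def count_fastq_reads(path):
    """Count records in an uncompressed FASTQ file.

    Assumes the standard layout of 4 lines per record
    (header, sequence, '+', qualities).
    """
    with open(path) as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        # A truncated download or multi-line records would end up here.
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


if __name__ == "__main__":
    # Example: python count_reads.py ERR194146_1.fastq ERR194146_2.fastq ERR194146.fastq
    for fastq_path in sys.argv[1:]:
        print(fastq_path, count_fastq_reads(fastq_path))
```

Equal counts for the _1 and _2 files only show that the two files have the same number of records; they do not by themselves confirm that the reads are in matching order.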