3.6 years ago
Rogerio Ribeiro
Greetings community
I'm currently processing public datasets for my species of interest. One of my samples, which I expected to be paired-end (PE), came back as single-end (SE) when I downloaded it using SRA Explorer. This seemed odd, so I double-checked at NCBI, and the sample is indeed listed as paired-end. When I downloaded it with fasterq-dump instead, I obtained two files.
(here is the sample in question)
I'm not sure whether this is a bug in SRA Explorer or just a badly annotated submission. Can anyone explain the different results? Thanks in advance.
It may be incorrectly annotated. If the data are paired-end, the two reads may have been submitted interleaved in one file or, worse, as two files concatenated end-to-end.
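One rough way to test the interleaved hypothesis is to check whether consecutive records in the single file share a read name. The sketch below is only an illustration: it assumes a standard 4-line FASTQ layout and that mates carry the same name, optionally with a `/1` or `/2` suffix. The function names and the file name in the usage note are hypothetical.

```python
import gzip
from itertools import islice


def read_names(path, limit=1000):
    """Yield up to `limit` read names from a 4-line FASTQ file (plain or .gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(islice(handle, limit * 4)):
            if i % 4 == 0:
                # Drop the leading '@' and anything after the first whitespace.
                yield line[1:].split()[0]


def strip_mate_suffix(name):
    """Remove a trailing /1 or /2 mate marker, if present."""
    if name.endswith(("/1", "/2")):
        return name[:-2]
    return name


def looks_interleaved(names):
    """Return True if every consecutive pair of names refers to the same read."""
    names = [strip_mate_suffix(n) for n in names]
    pairs = list(zip(names[0::2], names[1::2]))
    return bool(pairs) and all(a == b for a, b in pairs)
```

For example, `looks_interleaved(read_names("sample.fastq.gz"))` would return `True` for an interleaved file under these assumptions. It will not detect two files concatenated end-to-end, because in that case the mates are not adjacent.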
When you go to the traces listing at NCBI it shows one read (LINK). EBI-ENA predictably has a single file for download as well.
My first hypothesis was that the PE reads were deposited in the same file. However, the single FASTQ file from SRA Explorer was around 1.6G, whereas fasterq-dump gave me two files of about 1.5G each. Also, the NCBI listing shows 2 reads, not 1.
Take a look at the NCBI traces link I posted above, which shows one read. If you inspect the actual reads under the "Read" tab, they appear to be ~100 bp each. It may be best to check the publication associated with the data (and/or contact the submitters) to be sure.
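The read lengths can also be checked directly on the downloaded file. The sketch below tallies sequence lengths from the first records of a FASTQ file, assuming a 4-line record layout. If both mates had been joined into a single record, one would expect lengths of roughly twice the ~100 bp seen in the "Read" tab. The function name and the file name in the usage note are hypothetical.

```python
from collections import Counter


def read_length_counts(fastq_lines, limit=10000):
    """Count sequence lengths over the first `limit` records of a 4-line FASTQ."""
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i // 4 >= limit:
            break
        if i % 4 == 1:
            # The second line of each record holds the sequence.
            counts[len(line.rstrip("\n"))] += 1
    return counts
```

Usage would look something like `with open("sample.fastq") as fh: print(read_length_counts(fh).most_common(3))`, which prints the three most frequent read lengths.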