Hi all!
I was downloading and testing a metagenome sample stored at EBI Metagenomics.
Here is the Introduction:
https://www.ebi.ac.uk/ena/data/view/ERR1298503
And here is the taxonomy:
The sample name is ERS1069635, the run ID is ERR1298503, and the title of the experiment is: 16s rRNA gene amplicon sequencing of 50 week-old mouse gut microbiota as performed on Illumina MiSeq and Oxford Nanopore MinION sequencer. (ERP014408).
During the analysis I saw that the total number of raw reads in the FASTQ files (PE, paired-end) is 249583 in the R1 file and 249583 in the R2 file. When viewing the taxonomy results stored in the database for this sample, I saw that the total number of raw reads is 402734, and that number is then divided among the taxonomy levels in further steps.
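For anyone who wants to reproduce the per-file count, here is a minimal sketch. It assumes standard 4-line FASTQ records (no wrapped sequence lines); the function name and the file names in the usage note are hypothetical.

```python
import gzip


def count_fastq_reads(path):
    """Count records in a FASTQ file, assuming exactly 4 lines per record."""
    # Use gzip for .gz files, plain open() otherwise.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4
```

Calling it on each file of the pair, e.g. `count_fastq_reads("ERR1298503_1.fastq.gz")` and `count_fastq_reads("ERR1298503_2.fastq.gz")`, should give the same number for R1 and R2 if the files are properly paired.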
I have no idea how 249583 became 402734. Is this an error? Could anyone have a look at this experiment and give me a tip? Maybe it is something that needs to be reported ...
I would appreciate any help.
Best regards,
Agata
A complete guess, but the pipeline description (https://www.ebi.ac.uk/metagenomics/pipelines/3.0) says that overlapping reads are first merged and then fed into the QC analysis. Therefore the number of initial reads is less than 2*249583.
But since the reads are merged, there should NOT be more than 249583 reads in total to process ... that's my opinion. A read from R1 is merged with its R2 mate, and that gives not 2 reads but 1 merged read...
Again my assumption, but not all pairs get merged. Only the pairs that overlap get merged. So the output could be pair1+pair2+merged. But as Istvan says, it could be a reporting issue as well.
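To put a number on that guess: suppose each merged pair is counted once and each unmerged pair is counted twice (once for R1, once for R2). Then the reported total would be 2*(N - m) + m = 2N - m, where N is the number of pairs and m the number of merged pairs. The sketch below solves that for m with the figures from the question. This counting scheme is purely an assumption and is not confirmed by the pipeline documentation; the function name is made up for illustration.

```python
def merged_pairs_under_hypothesis(n_pairs, reported_total):
    """Solve reported_total = 2 * (n_pairs - merged) + merged for merged.

    Assumption (unverified): merged pairs count once, unmerged pairs count twice.
    """
    merged = 2 * n_pairs - reported_total
    if not 0 <= merged <= n_pairs:
        raise ValueError("reported total is inconsistent with this hypothesis")
    unmerged = n_pairs - merged
    return merged, unmerged


merged, unmerged = merged_pairs_under_hypothesis(249583, 402734)
print(merged, unmerged)  # 96432 153151
```

So under this hypothesis roughly 96432 pairs would have been merged and 153151 pairs left unmerged, which is at least a plausible split. It does not rule out a reporting issue, but it shows that 402734 is not impossible given 249583 pairs.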