I have a few microbiome datasets sequenced on a Nanopore MinION. For each run, I have pass, fail and skip directories. Within the pass directory, I also have subdirectories numbered 0 to 10. Would someone please explain to me the difference between the pass, fail and skip data, and which data I should be analyzing? I would also like to understand what the 0 to 10 subdirectories within the pass directory mean. Thank you for your help.
Thank you so much for your answer. Yes, the basecalling was done using the live basecalling method. Now I have some more questions:
Thank you again for your help.
Ok. Thank you. No, this is new data. I also have two directories, each with multiple fast5 files and one fastq file, for each microbiome sample. Should I use the fastq file generated by the live basecalling method, or should I convert all fast5 files to fastq?
I expect the fastq to contain all reads from that folder. You can easily verify that by counting the fast5 files and comparing that with the number of reads in the fastq, although occasionally a fast5 may fail basecalling and not produce a read.
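A minimal sketch of that check, assuming single-read fast5 files (one read per file), an uncompressed fastq with the usual four lines per record, and hypothetical helper names `count_fastq_reads` and `count_fast5_files`. With multi-read fast5 files the file count will not match the read count, so treat this as a rough sanity check only.

```python
from pathlib import Path


def count_fastq_reads(fastq_path):
    """Count records in an uncompressed FASTQ file.

    Assumes the standard four lines per record (no wrapped sequences).
    """
    with open(fastq_path) as handle:
        n_lines = sum(1 for _ in handle)
    return n_lines // 4


def count_fast5_files(directory):
    """Count .fast5 files under a directory, including subdirectories."""
    return sum(1 for _ in Path(directory).rglob("*.fast5"))
```

If the two numbers differ by only a handful, that is consistent with a few fast5 files failing basecalling.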
This isn't accurate. Having an average Phred score of >7 is a necessary but not sufficient condition for a read to go into the "pass" category. For 1D-squared runs, the read also needs to cover both strands; if only a single strand goes through the pore, the read will still be basecalled and may get a quality score above 7, but even then it will go into the fail category.
For 1D runs, I can tell that there's some other necessary condition for a read to "pass", because I see reads in the "fail" category with Phred scores above 7 in my data. But I haven't figured out what that condition is, yet.
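One way to look for such reads using the basecaller's own numbers is to scan the sequencing summary file. This is only a sketch: it assumes a tab-separated summary with columns named `read_id`, `passes_filtering` and `mean_qscore_template`, which may differ between basecaller versions, and the function name `failed_reads_above_threshold` is made up for illustration.

```python
import csv


def failed_reads_above_threshold(summary_path, threshold=7.0):
    """List (read_id, mean_qscore) for reads flagged as failed
    even though their reported mean Q-score exceeds the threshold.

    Column names are assumptions; check the header of your own file.
    """
    hits = []
    with open(summary_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            # The flag may be written as True/False or TRUE/FALSE.
            passed = row["passes_filtering"].strip().lower() == "true"
            qscore = float(row["mean_qscore_template"])
            if not passed and qscore > threshold:
                hits.append((row["read_id"], qscore))
    return hits
```

An empty result would suggest that the apparent Q>7 failures come from how the average was computed rather than from an extra pass condition.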
Ah yes, 1D^2 might be different, but I don't see that being used a lot. Thanks for the heads up!
For the normal 1D reads, are those reads above Q7 using your calculations of average quality, or did you use the score from the sequencing-summary.txt?
Using my calculation, which was flawed, as was pointed out at https://bioinformatics.stackexchange.com/q/8735/3144 by... oh, by you. :)
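For anyone else running into this: a common way for a hand-rolled average to go wrong (I am assuming, not asserting, that this is the issue in the linked thread) is to take the arithmetic mean of the per-base Phred scores. Phred scores are logarithmic, so the mean is usually taken over the error probabilities and converted back. A small sketch comparing the two, assuming Phred+33 encoded quality strings; both function names are hypothetical.

```python
import math


def naive_mean_phred(quality_string, offset=33):
    """Arithmetic mean of per-base Phred scores (tends to overestimate)."""
    scores = [ord(char) - offset for char in quality_string]
    return sum(scores) / len(scores)


def mean_phred_via_error_prob(quality_string, offset=33):
    """Average the per-base error probabilities, then convert back to Phred."""
    probs = [10 ** (-(ord(char) - offset) / 10) for char in quality_string]
    mean_prob = sum(probs) / len(probs)
    return -10 * math.log10(mean_prob)
```

The naive mean is never lower than the probability-based one, so a read can look like it is above Q7 with the first calculation while falling below it with the second.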