Hello fellow bioinformaticians,
I was wondering whether anyone could give me a general overview of the default output formats of most current-generation sequencing machines.
I've found it difficult to find information that covers the general spectrum rather than specific machines, and I have limited knowledge of which machines are actually in use today. I suspect that split-file paired-end fastq data makes up the majority of it, but what about interleaved paired-end and single-end data?
And what about next gen machines like Nanopore? What format is produced there?
A quick run-down from those of you with experience would be very helpful, and I would appreciate your input.
Thank you!
Ion Torrent machines like the PGM and S5 output bam files.
As the only output or in addition to fastq? Are they aligned or unaligned bam files?
The default output is ubam/bam. There's a bam2fastq plugin, but you lose some useful info with that. You can get ubam straight from Illumina basecalls too.
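If you want a quick way to tell whether a given bam looks aligned or unaligned, here is a minimal sketch using only the Python standard library. It relies on two things from the BAM format: the file is BGZF-compressed (readable as gzip), and the binary header stores the number of reference sequences right after the text header. The function names are my own, and treating "zero references" as "unaligned" is only a heuristic, since a bam can declare references and still contain nothing but unmapped reads.

```python
import gzip
import struct


def count_bam_references(path):
    """Return the number of reference sequences declared in a BAM header."""
    # BGZF is a series of gzip members, so the gzip module can decompress it.
    with gzip.open(path, "rb") as handle:
        if handle.read(4) != b"BAM\x01":
            raise ValueError("not a BAM file (bad magic bytes)")
        # l_text: length of the SAM-style text header (little-endian int32).
        (l_text,) = struct.unpack("<i", handle.read(4))
        handle.read(l_text)  # skip the text header itself
        # n_ref: number of reference sequences (little-endian int32).
        (n_ref,) = struct.unpack("<i", handle.read(4))
    return n_ref


def looks_unaligned(path):
    """Heuristic (an assumption): no declared references suggests a ubam."""
    return count_bam_references(path) == 0
```

Calling `looks_unaligned("reads.bam")` (a hypothetical file name) only reads the header, so it is cheap even on large files.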
Which information is lost? I use bamToFastq to get a fastq file and then do the analysis.
Most notably flow signal information, which some programs, e.g. SPAdes, can utilize.
Is there any consensus on whether paired-end fastq files are delivered split or interleaved?
Only a few places (JGI is one) use interleaved fastq files (and because of that @Brian's BBMap is designed to work with interleaved fastq files), but AFAIK paired-end data is generally supplied as split files.
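To make the split vs. interleaved distinction concrete, here is a rough sketch of converting between the two layouts in plain Python. It assumes simple 4-line fastq records (no wrapped sequence lines), that every line ends with a newline, and that R1 and R2 list the reads in the same order. The function names are my own, not from any particular tool.

```python
from itertools import islice, zip_longest


def read_fastq(handle):
    """Yield one 4-line FASTQ record at a time as a tuple of lines."""
    while True:
        record = tuple(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def interleave(r1_handle, r2_handle, out_handle):
    """Write records alternately: R1 of pair 1, R2 of pair 1, R1 of pair 2, ..."""
    pairs = zip_longest(read_fastq(r1_handle), read_fastq(r2_handle))
    for rec1, rec2 in pairs:
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        out_handle.writelines(rec1)
        out_handle.writelines(rec2)


def deinterleave(in_handle, r1_out, r2_out):
    """Split an interleaved stream back into separate R1 and R2 outputs."""
    records = read_fastq(in_handle)
    for rec1 in records:
        rec2 = next(records, None)
        if rec2 is None:
            raise ValueError("odd number of records in interleaved input")
        r1_out.writelines(rec1)
        r2_out.writelines(rec2)
```

The handles can be anything file-like, so this also works with `gzip.open(path, "rt")` for compressed fastq files. For real data a dedicated tool is likely faster and checks read names as well.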