Question

What is the default output format of current-generation sequencing machines?

0

8.4 years ago

Jenez ▴ 540

Hello fellow bioinformaticians,

I was wondering if anyone would be so kind as to tell me, in general terms, what the default output formats of most current-generation sequencing machines are.

I've found it difficult to find information that covers the general picture rather than specific machines, and I have limited knowledge of which machines are actually in use today. I suspect that paired-end fastq data split across two files makes up the majority of it, but what about interleaved paired-end and single-end data?

And what about next gen machines like Nanopore? What format is produced there?

A quick run-down from you people who are experienced enough would be highly beneficial and I would be very appreciative of your input.

Thank you!

sequencing single end paired end output format • 2.3k views


score 2 · Answer 1 · 2016-07-06

2

8.4 years ago

GenoMax 147k

The end-user consumable output format for sequence data (irrespective of technology) is fastq, and mostly Sanger-format fastq (Phred+33 quality encoding) at that.

Edit: Technically, sequencers output data in intermediate, platform-specific file formats (bcl for Illumina, HDF5 for PacBio and MinION), but all of these are ultimately convertible to fastq.
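
To make the "Sanger format" point concrete, here is a minimal sketch of reading fastq records and decoding Sanger-style (Phred+33) quality strings. It assumes plain four-line records with no wrapped sequence lines and no blank lines; the function names `read_fastq` and `phred33` are made up for illustration and do not come from any particular library.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a 4-line-per-record fastq stream.

    Assumption: records are not line-wrapped and contain no blank lines.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored here
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual  # drop the leading '@'


def phred33(qual):
    """Decode a Sanger-format quality string into integer Phred scores."""
    return [ord(c) - 33 for c in qual]


# Tiny in-memory example instead of a real file.
example = io.StringIO("@read1\nACGT\n+\nII5!\n")
for name, seq, qual in read_fastq(example):
    print(name, seq, phred33(qual))  # read1 ACGT [40, 40, 20, 0]
```

For real data an established parser is the safer choice; the sketch is only meant to show what the format looks like.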


1

Ion Torrent machines like the PGM and S5 output bam files.

ADD REPLY • link 8.4 years ago by 5heikki 11k

0

As the only output or in addition to fastq? Are they aligned or unaligned bam files?

ADD REPLY • link 8.4 years ago by GenoMax 147k

0

Default output is ubam/bam. There's a bam2fastq plugin, but you lose some useful info with that. You can get ubam straight from Illumina basecalls too.

ADD REPLY • link 8.4 years ago by 5heikki 11k

0

Which information is lost? I use bamToFastq to get a fastq file and then do the analysis.

ADD REPLY • link 8.4 years ago by Echo ▴ 70

0

Most notably the flow signal information, which some programs (e.g. SPAdes) can utilize.

ADD REPLY • link 8.4 years ago by 5heikki 11k

0

Is there any consensus on whether fastq files for paired-end data are split or interleaved?

ADD REPLY • link 8.4 years ago by Jenez ▴ 540

1

A few places (JGI is one) use interleaved fastq files (and because of that @Brian's BBMap is designed to work with interleaved fastq files), but AFAIK paired-end data is generally supplied as split files.
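
For anyone unsure what the difference looks like in practice: in split files, read 1 and read 2 of each pair sit at the same position in two separate files, while an interleaved file alternates them (R1, R2, R1, R2, ...). A rough sketch of converting between the two, assuming four-line records and mates in the same order in both inputs; the helper names are hypothetical and not taken from BBMap or any other tool:

```python
from itertools import islice, zip_longest


def records(lines):
    """Group an iterable of lines into 4-line fastq records."""
    it = iter(lines)
    while True:
        rec = list(islice(it, 4))
        if not rec:
            return
        if len(rec) != 4:
            raise ValueError("truncated fastq record")
        yield rec


def interleave(r1_lines, r2_lines):
    """Yield lines of an interleaved fastq from two split inputs."""
    for rec1, rec2 in zip_longest(records(r1_lines), records(r2_lines)):
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        yield from rec1
        yield from rec2


def deinterleave(lines):
    """Split interleaved fastq lines back into (R1 lines, R2 lines)."""
    r1, r2 = [], []
    for i, rec in enumerate(records(lines)):
        (r1 if i % 2 == 0 else r2).extend(rec)
    return r1, r2


# Round trip on a single made-up read pair.
r1 = ["@pair1/1\n", "ACGT\n", "+\n", "IIII\n"]
r2 = ["@pair1/2\n", "TTGA\n", "+\n", "IIII\n"]
mixed = list(interleave(r1, r2))
assert deinterleave(mixed) == (r1, r2)
```

The sketch does not check that read names actually match between mates, which a production tool would normally do.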

ADD REPLY • link 8.4 years ago by GenoMax 147k