Question

How do bioinformatics pipelines take into account different sequencing technologies?

3.3 years ago

hamarillo ▴ 80

Hi,

I am currently writing a Snakemake pipeline to analyze ATAC-seq data. I have had a good time learning about Snakemake's modular, reproducible philosophy, its rules, generalization, etc.

I started with paired-end FASTQ files from an Illumina NovaSeq, which means I have four files (two lanes, with R1 and R2 for each). My simple analysis worked, so I started turning it into a pipeline that I could reuse every time I want to perform this type of analysis. However, I keep wondering: how do I take into account all the different types of input FASTQ files that I might get? For example, data from a NextSeq is going to come in eight files instead of four.

I've always had issues limiting the scope of my work, and I realize I might be falling into that trap here, but I am still curious, how do the pipelines that can deal with "everything" solve this?

Thanks!

pipelines snakemake design technology sequencing • 977 views

updated 3.3 years ago by GenoMax 147k • written 3.3 years ago by hamarillo ▴ 80

> I've always had issues limiting the scope of my work, and I realize I might be falling into that trap here, but I am still curious, how do the pipelines that can deal with "everything" solve this?

You define a variable, seq_tech, to specify the sequencing technology, and based on its value you set the rest of the parameters that are unique to each technology.
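As a rough sketch of this idea in Python (which a Snakefile can use directly), a lookup table keyed by seq_tech might look like the following. The technology names, parameter keys, and values here are assumptions chosen for illustration, not settings from any particular pipeline.

```python
# Hypothetical per-technology settings; keys and values are placeholders
# for illustration, not recommendations.
TECH_PARAMS = {
    "illumina": {"paired_end": True, "aligner": "bowtie2"},
    "nanopore": {"paired_end": False, "aligner": "minimap2"},
    "pacbio": {"paired_end": False, "aligner": "minimap2"},
}


def params_for(seq_tech):
    """Return the settings for one technology, failing loudly on unknown names."""
    key = seq_tech.strip().lower()
    if key not in TECH_PARAMS:
        known = ", ".join(sorted(TECH_PARAMS))
        raise ValueError(f"Unknown seq_tech {seq_tech!r}; expected one of: {known}")
    return TECH_PARAMS[key]
```

In a Snakefile, seq_tech would typically come from the config file (for example `config["seq_tech"]`), and each rule would read what it needs from `params_for(...)` instead of hard-coding technology-specific values.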

3.3 years ago by Mensur Dlakic ★ 28k

> [...] account different sequencing technologies?

Different sequencing technologies are not the same thing as different Illumina sequencers (which is what you seem to be mostly referring to). Ultimately, every sample is going to have one file per read (or more than one, if it ran on multiple lanes). For Illumina sequencers, you can simply cat those lane-specific files together to create one pair of files (R1/R2) per sample, as long as the R1 and R2 files are concatenated in the same lane order. Think of lane-specific files as technical replicates of sequencing.

If you had files from Nanopore, PacBio, and Illumina, then they would indeed be from different technologies, and you would need to process them differently, even using different programs for alignment, etc.
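A minimal Python sketch of that lane merging is below. It assumes the default Illumina file naming (`<sample>_S<n>_L<lane>_R<read>_001.fastq.gz`); the function names and output layout are hypothetical. Gzip files can be appended byte for byte, so this does the same thing as `cat`.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Assumes default Illumina naming: <sample>_S<n>_L<lane>_R<read>_001.fastq.gz
LANE_FILE = re.compile(r"^(?P<sample>.+)_S\d+_L\d{3}_(?P<read>R[12])_001\.fastq\.gz$")


def group_lane_files(fastq_dir):
    """Map (sample, read) to the sorted list of lane-specific files."""
    groups = defaultdict(list)
    # Sorting keeps lanes in the same order for R1 and R2.
    for path in sorted(Path(fastq_dir).glob("*.fastq.gz")):
        match = LANE_FILE.match(path.name)
        if match:
            groups[(match["sample"], match["read"])].append(path)
    return dict(groups)


def merge_lanes(fastq_dir, out_dir):
    """Concatenate lane files into one <sample>_<read>.fastq.gz per sample and read."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    merged = {}
    for (sample, read), paths in group_lane_files(fastq_dir).items():
        target = out_dir / f"{sample}_{read}.fastq.gz"
        with open(target, "wb") as out:
            for path in paths:
                with open(path, "rb") as src:
                    shutil.copyfileobj(src, out)  # byte-level append, like `cat`
        merged[(sample, read)] = target
    return merged
```

Because the grouping does not depend on the number of lanes, four files from a two-lane run and eight files from a four-lane run both reduce to one R1/R2 pair per sample, and the downstream rules never need to know which sequencer was used.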

3.3 years ago by GenoMax 147k