I'm trying to create a pipeline that converts FASTQ to VCF and supports both single-end and paired-end reads. I am a little confused about how to approach paired-end reads, as they are more complicated. For single-end reads I've used the following steps:
- Demultiplex with `sabre se`
- Trim adapters with Cutadapt
- Map reads with BWA
- Convert .SAM to .BAM using `samtools view`
- Sort alignments using `samtools sort`
- Create an index using `samtools index`
- Use `samtools mpileup` to create a .BCF
- Convert the .BCF to .VCF using `bcftools call`
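The single-end steps above can be sketched as a small Python driver that builds each command and, by default, only prints it (a dry run). This is a minimal sketch under stated assumptions, not the poster's actual script: it assumes `bwa mem` as the BWA mode, a single 3' adapter sequence, and a per-sample FASTQ already written by `sabre se` (sabre names its outputs from the barcode file). It also swaps in `bcftools mpileup` for `samtools mpileup`, because recent samtools releases no longer write BCF from `mpileup`. The file names, the example adapter, and the helpers `single_end_steps` and `run` are hypothetical.

```python
import shlex
import subprocess
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    argv: List[str]                # command and its arguments
    stdout: Optional[str] = None   # file to capture stdout into (for tools that write to stdout)


def single_end_steps(sample: str, ref: str, adapter: str) -> List[Step]:
    """Build the single-end FASTQ -> VCF commands for one demultiplexed sample.

    Assumes `{sample}.fastq` was written by `sabre se` and that `ref` was
    indexed beforehand (`bwa index ref`, `samtools faidx ref`).
    """
    trimmed = f"{sample}.trimmed.fastq"
    sam, bam = f"{sample}.sam", f"{sample}.bam"
    sorted_bam, bcf, vcf = f"{sample}.sorted.bam", f"{sample}.bcf", f"{sample}.vcf"
    return [
        # Trim a 3' adapter from single-end reads
        Step(["cutadapt", "-a", adapter, "-o", trimmed, f"{sample}.fastq"]),
        # bwa mem writes SAM to stdout, so capture it in a file
        Step(["bwa", "mem", ref, trimmed], stdout=sam),
        # SAM -> BAM
        Step(["samtools", "view", "-b", "-o", bam, sam]),
        # Coordinate-sort, then index the sorted BAM
        Step(["samtools", "sort", "-o", sorted_bam, bam]),
        Step(["samtools", "index", sorted_bam]),
        # Pileup to compressed BCF (stands in for the older `samtools mpileup -g`)
        Step(["bcftools", "mpileup", "-f", ref, "-Ob", "-o", bcf, sorted_bam]),
        # -m multiallelic caller, -v variant sites only, -Ov uncompressed VCF
        Step(["bcftools", "call", "-mv", "-Ov", "-o", vcf, bcf]),
    ]


def run(steps: List[Step], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for step in steps:
        print(shlex.join(step.argv) + (f" > {step.stdout}" if step.stdout else ""))
        if dry_run:
            continue
        if step.stdout:
            with open(step.stdout, "w") as out:
                subprocess.run(step.argv, stdout=out, check=True)
        else:
            subprocess.run(step.argv, check=True)


if __name__ == "__main__":
    # AGATCGGAAGAGC is used here only as an example Illumina adapter prefix
    run(single_end_steps("sample1", "ref.fa", "AGATCGGAAGAGC"))
```

Keeping the commands as argument lists (rather than one shell string) avoids quoting problems, and the dry run lets you check file names before anything is executed.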
For single-end reads this has worked well, but I am unsure of the process for paired-end reads. I am new to bioinformatics, so I am still learning.
For paired-end reads, I was told I need to merge the read pairs before trimming the adapters from each end. Would the process be as follows:
- Demultiplex with `sabre pe`
- Join the forward and reverse reads using `pandaseq`
- Trim the adapters from each side using `cutadapt` and, since `pandaseq` creates a single merged file, treat the result as single-end?
It seems `pandaseq` doesn't know anything about adapters, so you should trim adapters before joining the reads: the adapters will be at the ends of the reads and will prevent the pairs from merging. Also, use Trimmomatic or BBDuk, which trim the reads as pairs.
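Following that order (trim both mates together, then merge), the paired-end front end can be sketched the same way. This is a hedged sketch, not a tested recipe: it assumes per-sample `_R1`/`_R2` files already written by `sabre pe`, uses BBDuk (one of the two tools suggested) with commonly used adapter-trimming settings, and asks pandaseq for FASTQ output with `-F` so the merged file can go straight into the single-end mapping steps. The file names, the adapter reference path, and the helper `paired_front_end` are hypothetical; check the parameter values against your own data.

```python
import shlex
from typing import List


def paired_front_end(sample: str, adapters: str) -> List[List[str]]:
    """Commands that trim a read pair together, then merge it into one FASTQ.

    Assumes `{sample}_R1.fastq` and `{sample}_R2.fastq` were written by `sabre pe`.
    """
    r1, r2 = f"{sample}_R1.fastq", f"{sample}_R2.fastq"
    t1, t2 = f"{sample}_R1.trimmed.fastq", f"{sample}_R2.trimmed.fastq"
    merged = f"{sample}.merged.fastq"
    bbduk = [
        "bbduk.sh",
        f"in1={r1}", f"in2={r2}",       # both mates are read together...
        f"out1={t1}", f"out2={t2}",     # ...and written together, so pairs stay in sync
        f"ref={adapters}",              # adapter sequences to search for
        "ktrim=r", "k=23", "mink=11", "hdist=1",  # right-trim adapter k-mers
        "tpe",  # trim both mates to the same length
        "tbo",  # also trim adapters detected from the pair overlap
    ]
    pandaseq = [
        "pandaseq",
        "-f", t1, "-r", t2,  # forward and reverse reads, already adapter-free
        "-F",                # write FASTQ instead of FASTA
        "-w", merged,        # merged output: map this as single-end reads
    ]
    return [bbduk, pandaseq]


if __name__ == "__main__":
    for argv in paired_front_end("sample1", "adapters.fa"):
        print(shlex.join(argv))
```

Only the trimming and merging differ from the single-end case; the merged FASTQ then goes through the same mapping, sorting, indexing, and calling steps.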
Merging will only work if you are sure that all read pairs are supposed to overlap in the middle (i.e. insert sizes are shorter than the total sequencing cycles, the combined length of both reads).
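The overlap condition is simple arithmetic: for a fragment with insert size I, sequenced with reads of length L1 and L2, the mates share min(L1, I) + min(L2, I) - I bases. The `min` covers reads longer than the insert, which run into adapter and, once trimmed, are only I bases long. The sketch below uses hypothetical helper names and an assumed minimum overlap of 10 bases (not a pandaseq default) to estimate what fraction of a set of insert sizes, for example from library QC, could be merged.

```python
from typing import Sequence


def expected_overlap(insert_size: int, len_r1: int, len_r2: int) -> int:
    """Bases shared by R1 and R2 for one fragment.

    A read longer than the insert runs into adapter; after trimming its
    usable length is the insert size, hence the min().
    Positive: the mates overlap. Zero or negative: no overlap
    (a negative value is the unsequenced gap in the middle).
    """
    return min(len_r1, insert_size) + min(len_r2, insert_size) - insert_size


def can_merge(insert_size: int, len_r1: int, len_r2: int, min_overlap: int = 10) -> bool:
    # min_overlap is an assumed threshold for illustration only
    return expected_overlap(insert_size, len_r1, len_r2) >= min_overlap


def mergeable_fraction(insert_sizes: Sequence[int], len_r1: int, len_r2: int,
                       min_overlap: int = 10) -> float:
    """Fraction of fragments whose mates overlap enough to be merged."""
    if not insert_sizes:
        return 0.0
    ok = sum(can_merge(i, len_r1, len_r2, min_overlap) for i in insert_sizes)
    return ok / len(insert_sizes)


if __name__ == "__main__":
    # 2 x 150 bp reads
    print(expected_overlap(250, 150, 150))  # → 50  (mates overlap, can merge)
    print(expected_overlap(350, 150, 150))  # → -50 (50 bp gap, cannot merge)
```

If a sizeable share of the library has inserts longer than the two reads combined, those pairs will not merge and would be lost from a merge-then-treat-as-single-end pipeline.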