We have sequenced genomic DNA on a MiSeq (2x250) and have received FASTQ files. I was wondering how to go about aligning these reads and checking for indels and SNPs. Is there a specific workflow?
Well, you could use the BBMap package and do something like this:
#Remove duplicates
clumpify.sh in=reads.fq.gz out=clumped.fq.gz dedupe optical
#Remove low-quality regions
filterbytile.sh in=clumped.fq.gz out=filtered_by_tile.fq.gz
#Trim adapters
bbduk.sh in=filtered_by_tile.fq.gz out=trimmed.fq.gz ktrim=r k=23 mink=11 hdist=1 tbo tpe minlen=100 ref=bbmap/resources/adapters.fa ftm=5 ordered
#Remove synthetic artifacts and spike-ins.
bbduk.sh in=trimmed.fq.gz out=filtered.fq.gz k=27 ref=bbmap/resources/sequencing_artifacts.fa.gz,bbmap/resources/phix174_ill.ref.fa.gz ordered qtrim=r trimq=6
#Map to reference
bbmap.sh in=filtered.fq.gz out=mapped.sam.gz bs=bs.sh pigz unpigz ref=reference.fa
#Call variants
callvariants.sh in=mapped.sam.gz out=vars.txt vcf=vars.vcf.gz ref=reference.fa ploidy=1 prefilter
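Since the question was about checking for indels and SNPs, here is a rough way to tally them once vars.vcf.gz exists. This is only a sketch: count_variants is a hypothetical helper, not part of BBMap, and it assumes a standard tab-separated VCF with REF in column 4 and ALT in column 5. It calls an allele a SNP when REF and ALT are both one base, an indel when their lengths differ, and lumps everything else (multi-base substitutions, symbolic alleles) into "other".

```shell
#!/bin/sh
# Hypothetical helper (not part of BBMap): tally SNPs and indels
# in an uncompressed VCF read from stdin.
count_variants() {
    awk -F '\t' '
        /^#/ { next }                          # skip header lines
        {
            n = split($5, alts, ",")           # ALT may list several alleles
            for (i = 1; i <= n; i++) {
                a = alts[i]
                if (a == "." || a == "*") continue        # no usable alternate allele
                if (substr(a, 1, 1) == "<") other++       # symbolic allele
                else if (length($4) == 1 && length(a) == 1) snps++
                else if (length($4) != length(a)) indels++
                else other++                   # same length, more than one base
            }
        }
        END { printf "snps=%d indels=%d other=%d\n", snps, indels, other }
    '
}
```

Usage would be something like `gzip -dc vars.vcf.gz | count_variants`. The counts are per alternate allele, so a site with two ALT alleles contributes twice.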
I'll have to look into that. I don't see much MiSeq 2x250bp WGS data, mainly 2x300 amplicon data, which gets pretty ragged toward the ends. I have noticed, though, that MiSeq appears to have a pretty consistent positional quality component (at least in the sample I analyzed for that purpose) that appears to be due to focus (the number of mismatches increased radially out from the center).
Which species are you sequencing?
We are working on mouse.