Hi everyone,
I am doing some variant calling before QTL mapping, but I found there is extremely high missing rate in my genotype table. I hope someone can help me.The below picture is my result. I processed the data in my conda environment.
Here is my pipeline:
(1) Use axe-demux to sort out my lines in each library.
(2) Use cutadapt to trim adapter with phreq score<28 and filter minimum sequence length <20:
cutadapt -j 4 -q 28 $DATADIR/$line.fastq.gz -m 20 -o $OUTDIR/$line.trim.fastq.gz -a AGATCGGAAGAG
I write a loop to deal with these 96 samples, so the $line is name of samples.
(3) Use bowtie2/samtools/bcftools to do variant calling. I run a similar loop here.
bowtie2 -p 8 -x $DATADIR/ind/BTx623 -U $WORKDIR/$line.trim.fastq.gz -S $OUTDIR/bt2/$line.bt2.sam
samtools view -b -T $REF $OUTDIR/bt2/$line.bt2.sam -o $OUTDIR/bt2/$line.bt2.bam
samtools sort -o $OUTDIR/sorted/$line.sorted.bam $OUTDIR/bt2/$line.bt2.bam
samtools index $OUTDIR/sorted/$line.sorted.bam
bcftools mpileup -Ou -f $DATADIR/Sbicolor_454_v3.0.1.fa -o $OUTDIR/mpileup/$line.mpileup.bcf \
$OUTDIR/sorted/$line.sorted.bam
bcftools call -m -v -Ov -o $OUTDIR/call/$line.call.vcf $OUTDIR/mpileup/$line.mpileup.bcf
(4) Use vcffilter of vcflib to filter the variant (depth>15, mapping quality>20), then sort the variant as tassel request. I run a similar loop here.
vcffilter -f "DP > 15 & MQ > 20" $line.call.vcf > $OUTDIR/$line.filter.vcf
run_pipeline.pl -SortGenotypeFilePlugin -inputFile $OUTDIR/$line.filter.vcf -outputFile
$SORTDIR/$line.filter_sort.vcf -fileType VCF
(5) When I open the vcf file, I found the weird scene in my Taasel window. There are so many unreasonable missing sites. One of the candidate reasons is that there are many scaffolds in reference genome, but I wonder that it is not the only reason to get this peculiar result.
I use sorghum reference genome here, and the F2 population is from an accession cross with reference. BTW, There is no luck in my second library.
I am testing the unfiltered-scaffold postulation now, it will take hours to run that, so before I get the result, plz let me if there is something wrong in my process, so I can stop the unfruitful loop.