Extremely high missing rate in vcf merged file
0
0
Entering edit mode
2.6 years ago
Abieskawa • 0

Hi everyone,

I am doing some variant calling before QTL mapping, but I found there is extremely high missing rate in my genotype table. I hope someone can help me.The below picture is my result. I processed the data in my conda environment.

The extremely high missing rate in my genotype table

Here is my pipeline:

(1) Use axe-demux to sort out my lines in each library.

(2) Use cutadapt to trim adapter with phreq score<28 and filter minimum sequence length <20:

cutadapt -j 4 -q 28 $DATADIR/$line.fastq.gz -m 20 -o $OUTDIR/$line.trim.fastq.gz -a AGATCGGAAGAG

I write a loop to deal with these 96 samples, so the $line is name of samples.

(3) Use bowtie2/samtools/bcftools to do variant calling. I run a similar loop here.

bowtie2 -p 8 -x $DATADIR/ind/BTx623 -U $WORKDIR/$line.trim.fastq.gz -S $OUTDIR/bt2/$line.bt2.sam
samtools view -b -T $REF $OUTDIR/bt2/$line.bt2.sam -o $OUTDIR/bt2/$line.bt2.bam
samtools sort -o $OUTDIR/sorted/$line.sorted.bam $OUTDIR/bt2/$line.bt2.bam
samtools index $OUTDIR/sorted/$line.sorted.bam
bcftools mpileup -Ou -f $DATADIR/Sbicolor_454_v3.0.1.fa -o $OUTDIR/mpileup/$line.mpileup.bcf \
$OUTDIR/sorted/$line.sorted.bam
bcftools call -m -v -Ov -o $OUTDIR/call/$line.call.vcf $OUTDIR/mpileup/$line.mpileup.bcf

(4) Use vcffilter of vcflib to filter the variant (depth>15, mapping quality>20), then sort the variant as tassel request. I run a similar loop here.

vcffilter -f "DP > 15 & MQ > 20" $line.call.vcf > $OUTDIR/$line.filter.vcf
run_pipeline.pl -SortGenotypeFilePlugin -inputFile $OUTDIR/$line.filter.vcf -outputFile 
$SORTDIR/$line.filter_sort.vcf -fileType VCF

(5) When I open the vcf file, I found the weird scene in my Taasel window. There are so many unreasonable missing sites. One of the candidate reasons is that there are many scaffolds in reference genome, but I wonder that it is not the only reason to get this peculiar result.

genome variant • 686 views
ADD COMMENT
0
Entering edit mode

I use sorghum reference genome here, and the F2 population is from an accession cross with reference. BTW, There is no luck in my second library. Second library

ADD REPLY
0
Entering edit mode

I am testing the unfiltered-scaffold postulation now, it will take hours to run that, so before I get the result, plz let me if there is something wrong in my process, so I can stop the unfruitful loop.

ADD REPLY

Login before adding your answer.

Traffic: 1844 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6