What are "must have" data types one should extract from WGS .fq data?
3.1 years ago

I have multiple whole-genome .fq files prepared for variant calling. However, repeating the whole pipeline is quite expensive, so I wonder which data types are must-haves to extract.

Right now I am planning to extract the following:

  1. Variants (indels, SNPs), with HaplotypeCaller
  2. Structural variants (<1000 bp long), with Manta

What other "must haves" could I extract?

wgs • 1.4k views

Why just <1000 bp long?


It's the maximum length generated by Manta for my dataset (~100 bp reads).


Do you have a reference for that?


Nope, I manually measured the length of the structural variants after conversion to PLINK format. Now that I think about it, there might be a crack in my logic due to PLINK's limit on allele name length.

bcftools query -f '%INFO/SVLEN\n' in.vcf | tr -d '-' | sort -n | tail
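
If bcftools is not at hand, the same check can be sketched with plain awk, reading the lengths straight from the VCF rather than from the PLINK conversion. This is only a rough sketch under assumptions: the VCF is uncompressed and arrives on stdin, the length is stored as SVLEN in the INFO column, and records without SVLEN (for example breakend records) are simply skipped. The function name svlen_abs is made up for illustration.

```shell
# Print the absolute SVLEN of each record in an uncompressed VCF read from stdin.
# Assumption: lengths are stored as SVLEN=<int> in the INFO column (column 8).
# Records with no SVLEN, or with SVLEN=., are skipped.
svlen_abs() {
  awk -F'\t' '
    /^#/ { next }                          # skip header lines
    {
      n = split($8, info, ";")             # split INFO into key=value fields
      for (i = 1; i <= n; i++) {
        if (info[i] ~ /^SVLEN=/) {
          v = substr(info[i], 7)           # text after "SVLEN="
          split(v, parts, ",")             # keep the first value if several are listed
          if (parts[1] == ".") continue    # missing value
          x = parts[1] + 0
          if (x < 0) x = -x                # deletions are usually reported as negative
          print x
        }
      }
    }'
}
```

Usage would then mirror the one-liner above: `svlen_abs < in.vcf | sort -n | tail` lists the largest lengths present in the call set.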