Hi, I have some mouse samples. I would like to determine their sex from the sequencing data in order to test for sample swaps or corruption. I have VCF files called with vanilla freebayes. I figured out that there is a bcftools plugin that determines sex from VCF files:
bcftools +vcf2sex
However, it seems to require ploidy information for the given genome (see for example this issue: https://github.com/samtools/bcftools/issues/175). This information should be in the format:
space/tab-delimited list of CHROM,FROM,TO,SEX,PLOIDY
I am using GRCm38.82 (http://ftp.ensembl.org/pub/release-82/gtf/mus_musculus/README) for read mapping, and I cannot find the ploidy information for that genome release.
I would like to ask:
- Which is the most reliable way to determine the sex of a sample, from either a BAM or a VCF file?
- What exactly does the ploidy information mean? Are those the coordinates of the pseudoautosomal regions? And why is this information required for vcf2sex to work properly?
- Where can I find the ploidy information for a given genome release?
As a validation step, I would first test the solutions on samples of known sex, to build an expectation of what results they produce and how accurate they are.
This step can also clarify what exactly vcf2sex does.
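A rough sketch of that validation, assuming you already have sex calls from whichever method you are testing. The function name `validate_calls`, the sample IDs and the `F`/`M` labels are all made up for illustration; nothing here comes from bcftools itself.

```python
from collections import Counter


def validate_calls(known, predicted):
    """Compare predicted sex calls against known labels.

    Both arguments are dicts of sample ID -> "F" or "M" (hypothetical labels).
    Returns (accuracy, confusion counts keyed by (known, predicted),
    list of mismatched sample IDs).
    """
    shared = sorted(set(known) & set(predicted))
    if not shared:
        raise ValueError("no samples in common between known and predicted")
    confusion = Counter((known[s], predicted[s]) for s in shared)
    mismatches = [s for s in shared if known[s] != predicted[s]]
    accuracy = 1 - len(mismatches) / len(shared)
    return accuracy, confusion, mismatches


# Toy data, invented for this example only.
known = {"s1": "F", "s2": "M", "s3": "M", "s4": "F"}
predicted = {"s1": "F", "s2": "M", "s3": "F", "s4": "F"}
accuracy, confusion, mismatches = validate_calls(known, predicted)
print(accuracy, mismatches)
```

The list of mismatched samples is the useful part: on a trusted set it shows how the method fails, and on the real cohort it gives the candidates for a sample swap.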
Thank you for your help, but I don't find it very intuitive. I still need to find the coordinates of the pseudoautosomal regions on the X chromosome in some reference. So how exactly would you find them for my build, GRCm38.82 (http://ftp.ensembl.org/pub/release-82/gtf/mus_musculus/README)?
Your advice about coverage is probably the best I can do. I have exome-seq data, but I guess everything still applies.
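For what it's worth, here is a minimal sketch of how I understand the coverage approach, assuming the per-contig read counts come from `samtools idxstats` (tab-separated: contig name, length, mapped reads, unmapped reads) and that contigs use the Ensembl names `1` to `19`, `X` and `Y`. The function names and the toy numbers are my own, and the cutoff is deliberately left as a parameter: with exome data the read density depends on the capture targets, so I assume it has to be calibrated on samples of known sex rather than taken as the 0.5 vs 1.0 one might expect from whole-genome data.

```python
def parse_idxstats(text):
    """Return {contig: (length, mapped_reads)} from `samtools idxstats` text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3 or fields[0] == "*":
            continue  # skip blank lines and the row for unplaced reads
        stats[fields[0]] = (int(fields[1]), int(fields[2]))
    return stats


def sex_chrom_ratios(stats, autosomes, x_name="X", y_name="Y"):
    """Read density on X and Y relative to the pooled autosomal density."""
    auto_len = sum(stats[c][0] for c in autosomes if c in stats)
    auto_reads = sum(stats[c][1] for c in autosomes if c in stats)
    if auto_len == 0 or auto_reads == 0:
        raise ValueError("no autosomal reads found; check contig names")
    auto_density = auto_reads / auto_len

    def ratio(name):
        length, reads = stats.get(name, (0, 0))
        return (reads / length) / auto_density if length else 0.0

    return ratio(x_name), ratio(y_name)


def call_sex(x_ratio, x_cutoff):
    """Call "F" when the X ratio reaches the cutoff, else "M".

    x_cutoff is an assumption to be calibrated on known-sex samples.
    """
    return "F" if x_ratio >= x_cutoff else "M"


# Assumed Ensembl-style mouse autosome names.
MOUSE_AUTOSOMES = [str(i) for i in range(1, 20)]

# Toy idxstats text, invented for this example only.
toy = "1\t1000\t2000\t0\nX\t500\t500\t0\nY\t100\t100\t0\n*\t0\t0\t5\n"
x_ratio, y_ratio = sex_chrom_ratios(parse_idxstats(toy), MOUSE_AUTOSOMES)
print(x_ratio, y_ratio)
```

On real data I would plot the X and Y ratios for all samples together and look for two clusters before settling on any cutoff.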