Obtaining SNPs with large samples in a short time
4.0 years ago
Daier ▴ 20

Hi, I have 150 samples. I have completed the mapping step and have the final BAM files, and I now need to call SNPs. I tried ANGSD, but the run was killed and I do not know why. I also learned that samtools and GATK can call SNPs, but that would take at least a month. What other software can call SNPs for a large number of samples in a short time?


Are these whole-genome BAMs? Please add more information.


Yes, they're all whole-genome BAMs.

4.0 years ago

Create a GVCF file for each sample in a "small genomic region" with gatk HaplotypeCaller in GVCF mode, combine the GVCFs with gatk CombineGVCFs, and create the final VCF with gatk GenotypeGVCFs.

...see the "GATK best practices".
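As a rough illustration of the three steps above, here is a minimal sketch for a single region. The file names (ref.fa, the sample BAMs), the region chr1, and the helper names run and call_region are placeholders made up for this example, not part of GATK. It only prints the commands unless DRY_RUN=0 is set, and it assumes file names without whitespace.

```shell
#!/bin/sh
# Sketch of per-region joint calling with GATK4. All file names are placeholders.
set -eu

REF=${REF:-ref.fa}          # hypothetical reference FASTA (needs .fai and .dict)
REGION=${REGION:-chr1}      # hypothetical interval; run one job per region
DRY_RUN=${DRY_RUN:-1}       # 1 = only print the commands, 0 = also execute them

# Print a command, and execute it only when DRY_RUN=0.
run() {
    echo "$*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Steps 1-3 for the BAM files given as arguments.
call_region() {
    gvcf_args=""
    for bam in "$@"; do
        sample=$(basename "$bam" .bam)
        # Step 1: one GVCF per sample, restricted to the region.
        run gatk HaplotypeCaller -R "$REF" -I "$bam" -L "$REGION" \
            -ERC GVCF -O "$sample.$REGION.g.vcf.gz"
        gvcf_args="$gvcf_args -V $sample.$REGION.g.vcf.gz"
    done
    # Step 2: merge the per-sample GVCFs for this region.
    # $gvcf_args is left unquoted on purpose so it splits into separate words.
    run gatk CombineGVCFs -R "$REF" -L "$REGION" $gvcf_args \
        -O "combined.$REGION.g.vcf.gz"
    # Step 3: joint genotyping produces the final VCF for this region.
    run gatk GenotypeGVCFs -R "$REF" -V "combined.$REGION.g.vcf.gz" \
        -O "final.$REGION.vcf.gz"
}

# Usage: call_region sample1.bam sample2.bam ...
```

Calling call_region once per region (with REGION set differently each time) keeps every job small, which is the point of the "small genomic region" advice.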

4.0 years ago
Qiongyi ▴ 180

I totally agree with what Pierre recommended above. It definitely won't take a month if you know how to go through the GATK workflow; I would say a couple of days for WGS data. To speed it up, you can also split your regions of interest into many smaller BED files (e.g. split the whole-genome BED into chromosome-level BEDs) and run the jobs in parallel.
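One possible way to set up the parallel jobs is sketched below: it prints one HaplotypeCaller command per chromosome-level BED file, and the resulting lines can then be handed to a job runner. The helper name make_jobs, ref.fa, sample.bam and the BED file names are assumptions for illustration only, and file names are assumed to contain no whitespace or quotes.

```shell
#!/bin/sh
# Sketch: one HaplotypeCaller job per chromosome-level BED file.
# ref.fa, sample.bam and the BED names are placeholders.
set -eu

REF=${REF:-ref.fa}

# Print one gatk command line per BED file listed after the BAM.
make_jobs() {
    bam=$1
    shift
    sample=$(basename "$bam" .bam)
    for bed in "$@"; do
        region=$(basename "$bed" .bed)
        echo "gatk HaplotypeCaller -R $REF -I $bam -L $bed -ERC GVCF -O $sample.$region.g.vcf.gz"
    done
}

# Example, commented out so the sketch has no side effects.
# It runs up to 8 jobs at a time with xargs (GNU or BSD):
# make_jobs sample.bam input.*.bed | xargs -P 8 -I{} sh -c '{}'
```

On a cluster, the same command lines could instead be submitted as separate scheduler jobs; the number of concurrent jobs (8 here) is arbitrary and should match the available cores and memory.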


To split a sorted BED file, here is an efficient way to do that:

$ for chr in `bedextract --list-chr input.bed`; do bedextract $chr input.bed > input.$chr.bed; done

This will be several times faster than awk or other approaches.
