4 months ago
ja569116
Hi,
I genotyped samples from methylation reads/bisulfite sequencing. I was surprised that many of the alternative alleles were degenerate bases: R or Y.
V00001.vcf
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001
NW_022882922.1 28895 . C T 0 PASS NS=1:DP=52 GT:GQ:DP 0/1:0:52
NW_022882922.1 36586 . C T,Y 0 PASS NS=1:DP=23:GU=T/C GT:GQ:DP 1/2:0:23
NW_022882922.1 36640 . G A 0 PASS NS=1:DP=40 GT:GQ:DP 1/1:0:40
NW_022882922.1 39071 . A G 0 PASS NS=1:DP=43 GT:GQ:DP 1/1:0:43
V0021
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001
NW_022882922.1 25160 . G Y 0 PASS NS=1:DP=34:GU=T/C GT:GQ:DP 0/1:0:34
NW_022882922.1 25676 . T C 0 PASS NS=1:DP=41 GT:GQ:DP 0/1:0:41
NW_022882922.1 28342 . G A,R 0 PASS NS=1:DP=35:GU=A/G GT:GQ:DP 1/2:0:35
NW_022882922.1 29887 . C A 0 PASS NS=1:DP=48 GT:GQ:DP 0/1:0:48
One sample had way more degenerate bases:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001
NW_022882922.1 8082 . G A 0 PASS NS=1:DP=6 GT:GQ:DP 0/1:0:6
NW_022882922.1 11106 . T G 0 PASS NS=1:DP=19 GT:GQ:DP 0/1:0:19
NW_022882922.1 17828 . C G 0 PASS NS=1:DP=27 GT:GQ:DP 0/1:0:27
NW_022882922.1 25160 . G Y 0 PASS NS=1:DP=37:GU=T/C GT:GQ:DP 0/1:0:37
NW_022882922.1 27396 . G A,R 0 PASS NS=1:DP=33:GU=A/G GT:GQ:DP 1/2:0:33
NW_022882922.1 28342 . G A,R 0 PASS NS=1:DP=27:GU=A/G GT:GQ:DP 1/2:0:27
NW_022882922.1 28895 . C T 0 PASS NS=1:DP=32 GT:GQ:DP 0/1:0:32
NW_022882922.1 29887 . C A 0 PASS NS=1:DP=35 GT:GQ:DP 0/1:0:35
NW_022882922.1 40905 . T C,Y 0 PASS NS=1:DP=17:GU=T/C GT:GQ:DP 1/2:0:17
NW_022882922.1 43671 . A C 0 PASS NS=1:DP=11 GT:GQ:DP 0/1:0:11
NW_022882922.1 43859 . A T 0 PASS NS=1:DP=18 GT:GQ:DP 0/1:0:18
NW_022882922.1 46336 . G A,R 0 PASS NS=1:DP=26:GU=A/G GT:GQ:DP 1/2:0:26
When I tried to combine the VCFs with GATK, I got an error because of these degenerate alleles.
I have preprocessed my samples in two different ways. My goals are:
- Count the degenerate sites (those with R/Y) and estimate their percentage. I can count the total number of sites with bcftools, but I don't know how to count the degenerate ones.
- Once I know which preprocessing is better, I would like to filter out those degenerate bases/sites to build my final dataset.
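For what it's worth, both goals can be sketched in plain Python without relying on any particular tool's handling of ambiguity codes. This is a minimal sketch, not a tested pipeline: the function names (`has_degenerate_alt`, `split_records`, `degenerate_percentage`) are made up for illustration, and it assumes that a "degenerate site" means any record whose ALT column contains an IUPAC ambiguity code (R, Y, S, W, K, M, B, D, H, V). It drops the whole record rather than trimming only the R/Y allele, which would also require rewriting the GT field.

```python
# IUPAC ambiguity codes; plain A/C/G/T and N are deliberately not included.
DEGENERATE_CODES = set("RYSWKMBDHV")


def has_degenerate_alt(alt_field):
    """Return True if any comma-separated ALT allele has an ambiguity code."""
    for allele in alt_field.split(","):
        # Symbolic alleles (<DEL>), breakends, '*' and '.' are not plain sequence.
        if allele.startswith("<") or allele in ("*", ".") or "[" in allele or "]" in allele:
            continue
        if any(base in DEGENERATE_CODES for base in allele.upper()):
            return True
    return False


def split_records(vcf_lines):
    """Split VCF lines into (header, clean, degenerate) lists of lines."""
    header, clean, degenerate = [], [], []
    for raw in vcf_lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue
        if line.startswith("#"):
            header.append(line)
            continue
        # Columns are CHROM POS ID REF ALT ...; ALT is the fifth column.
        # Splitting on any whitespace handles both tabs and pasted spaces.
        alt = line.split(None, 5)[4]
        (degenerate if has_degenerate_alt(alt) else clean).append(line)
    return header, clean, degenerate


def degenerate_percentage(clean, degenerate):
    """Percentage of records that carry a degenerate ALT allele."""
    total = len(clean) + len(degenerate)
    return 100.0 * len(degenerate) / total if total else 0.0


if __name__ == "__main__":
    # Records copied from the first sample in the post.
    example = [
        "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001",
        "NW_022882922.1 28895 . C T 0 PASS NS=1:DP=52 GT:GQ:DP 0/1:0:52",
        "NW_022882922.1 36586 . C T,Y 0 PASS NS=1:DP=23:GU=T/C GT:GQ:DP 1/2:0:23",
        "NW_022882922.1 36640 . G A 0 PASS NS=1:DP=40 GT:GQ:DP 1/1:0:40",
        "NW_022882922.1 39071 . A G 0 PASS NS=1:DP=43 GT:GQ:DP 1/1:0:43",
    ]
    header, clean, degenerate = split_records(example)
    pct = degenerate_percentage(clean, degenerate)
    print(f"{len(degenerate)} of {len(clean) + len(degenerate)} sites "
          f"({pct:.1f}%) have a degenerate ALT allele")
```

To run it on a real file, pass an open file handle instead of the list (for example `split_records(open("V00001.vcf"))`); writing `header + clean` back out, one line each, gives a copy of the VCF without the degenerate sites.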
Thanks,
Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code, or fenced code blocks for multi-line code. Fenced code blocks are useful in syntax highlighting. If your code has long lines with a single command, break those lines into multiple lines with proper escape sequences so they're easier to read and still run when copy-pasted. I've done it for you this time.

What was the error that GATK returned?