Hi, I am using a bespoke program (g2gtools extract) that extracts sequences from a FASTA based on a GTF and the features you want (in my case I am extracting all transcripts). My FASTA is an entire genome of a diploid organism customized to include all SNPs and INDELs the organism carries.
When I run this program, it returns an error:
"Sequence contains non-DNA character '*' at position 393"
At first, I thought these might denote stop codons, but when I checked my FASTA there are only 35 of them. On second thought, that doesn't make much sense anyway, because it's a genomic FASTA.
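One way to see exactly where such characters sit is to scan the FASTA for anything outside the usual nucleotide alphabet. This is only a minimal sketch in Python, assuming a plain-text FASTA; the function name find_non_dna and the allowed alphabet are illustrative choices and not part of g2gtools:

```python
from collections import Counter

# Assumption: only A, C, G, T and N (either case) count as valid DNA here.
ALLOWED = set("ACGTNacgtn")

def find_non_dna(lines, allowed=ALLOWED):
    """Yield (sequence_name, 1-based position, character) for each unexpected character."""
    name, pos = None, 0
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # New record: keep the first word of the header, restart the position counter.
            parts = line[1:].split()
            name, pos = (parts[0] if parts else ""), 0
            continue
        for ch in line:
            pos += 1
            if ch not in allowed:
                yield name, pos, ch

# Tiny made-up example; a real run would iterate over an open file handle instead.
example = [">chr1 demo\n", "ACGT*ACG\n", "TTN*\n"]
hits = list(find_non_dna(example))
counts = Counter(ch for _, _, ch in hits)
print(hits)
print(counts)
```

Positions are counted per sequence across wrapped lines, so they can be compared with the position reported in the error message.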
Thanks in advance!
UPDATE TO THE ISSUE
OK, so I found the issue. The patient VCF has overlapping INDELs and, although they are not denoted like this within the VCF itself, when I extracted a set of them for the patient with bcftools query, the * appears. I looked at the GATK documentation and this is how overlapping INDELs are annotated: https://gatk.broadinstitute.org/hc/en-us/articles/360035531912-Spanning-or-overlapping-deletions-allele
My issue now is what to do to prevent overlapping INDELs from being reported in this way. I wonder if there is a way to prevent this annotation.
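One possible workaround, and it is only an assumption that this is acceptable for the downstream steps, is to drop the records whose ALT allele is * before building the custom genome, since the deletion itself is already described by the upstream record. A minimal Python sketch, assuming a standard tab-separated VCF where ALT is the fifth column; the function names and the example lines are hypothetical:

```python
def has_spanning_deletion(vcf_line):
    """Return True if a VCF data line lists '*' among its ALT alleles."""
    fields = vcf_line.rstrip("\n").split("\t")
    return "*" in fields[4].split(",")

def drop_spanning_deletions(lines):
    """Yield header lines and every data line that has no '*' ALT allele."""
    for line in lines:
        if line.startswith("#") or not has_spanning_deletion(line):
            yield line

# Made-up VCF lines modelled on the records in this question.
vcf_lines = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tpatient\n",
    "chr1\t154590148\t.\tCG\tC\t.\t.\t.\tGT\t0|1\n",
    "chr1\t154590149\t.\tG\t*\t.\t.\t.\tGT\t1|0\n",
    "chr1\t154590149\t.\tG\tC\t.\t.\t.\tGT\t0|1\n",
]
kept = list(drop_spanning_deletions(vcf_lines))
print("".join(kept), end="")
```

Note that a multiallelic line such as ALT=C,* would be dropped whole by this sketch; splitting multiallelic records first would avoid losing the other allele.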
Original trio VCF - so 2 parents and the patient:
#CHROM POS REF ALT
chr1 154590147 CCG C
chr1 154590148 CG C
chr1 154590149 G *
chr1 154590149 G C
and then, if I just extract the patient:
#CHROM POS REF ALT GT
chr1 154590148 CG C 0|1
chr1 154590149 G * 1|0
chr1 154590149 G C 0|1
(This is the output after extracting the proband genotypes using bcftools query.)
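To check which deletion each * record refers to, the positions can be compared directly: a deletion at POS with a REF allele of length n removes the bases from POS+1 to POS+n-1. A minimal Python sketch under that assumption, working on (CHROM, POS, REF, ALT) tuples like the four columns listed above; the function name spanning_candidates is illustrative:

```python
def spanning_candidates(records):
    """For each '*' record, list the deletions whose REF allele covers its position."""
    result = {}
    for chrom, pos, ref, alt in records:
        if alt != "*":
            continue
        result[(chrom, pos)] = [
            (c, p, r, a)
            for c, p, r, a in records
            # A deletion (REF longer than ALT) covers positions p+1 .. p+len(REF)-1.
            if c == chrom and a != "*" and len(r) > len(a) and p < pos <= p + len(r) - 1
        ]
    return result

# The four records from the trio VCF shown in this question.
records = [
    ("chr1", 154590147, "CCG", "C"),
    ("chr1", 154590148, "CG", "C"),
    ("chr1", 154590149, "G", "*"),
    ("chr1", 154590149, "G", "C"),
]
overlaps = spanning_candidates(records)
for key, deletions in overlaps.items():
    print(key, deletions)
```

For the trio records, both upstream deletions cover position 154590149, which is consistent with the * being a placeholder for a deletion that starts earlier rather than a separate variant.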
Must be something related to this then.
I am using the same software, g2gtools, for the entire pipeline, so it would be weird if the output of that software were incompatible with the software itself.
Yeah, you're right. g2gtools does introduce the * into the FASTA. I am just not sure what they mean. I am going to try to see which SNPs or INDELs align with the location of the * in the FASTA.
Can you show me what g2gtools commands you are running? I've been using g2gtools a bit in the past few weeks.
Hey dsull! Check out my update to the question, please!