How to annotate SNVs in a BAC sequenced by NGS
3.1 years ago
Hagen ▴ 10

Hello,

I'm trying to annotate variations in NGS data from bacterial artificial chromosomes with respect to the reference sequence.

To do this I build a map of the BAC (including the vector) and map the NGS reads to this BAC map. I also use a variant caller to find any differences from my reference map. These are then curated by hand until all variations have been correctly called.

What I would like to do is check these variants against known SNP IDs.

To do this I downloaded dbSNP153 in bigBed format and used the genomic positions of my reference DNA to extract all variants for this region from the database using bigBedToBed. For SNVs the positions are unambiguous, so I can assign rsxxxxx IDs by matching positions with custom scripts (I'm using Python, but that should not matter much).
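A minimal sketch of that position matching, for illustration only. It assumes the first four columns of the bigBedToBed output are the standard BED fields (chrom, 0-based start, end, name) with the rs ID in the name column; the function names `load_bed_index` and `rs_ids_for_snv` are made up for this example and are not my actual scripts.

```python
import csv
from collections import defaultdict


def load_bed_index(lines):
    """Index BED records by (chrom, start, end) -> list of record names.

    `lines` is any iterable of tab-separated BED lines, e.g. an open file.
    Assumption: column 4 (name) holds the rs ID.
    """
    index = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comments
        chrom, start, end, name = row[0], int(row[1]), int(row[2]), row[3]
        index[(chrom, start, end)].append(name)
    return index


def rs_ids_for_snv(index, chrom, pos_1based):
    """Return names of records spanning exactly the base at a 1-based position.

    BED intervals are 0-based and half-open, so a single base at 1-based
    position p is stored as (p - 1, p).
    """
    return index.get((chrom, pos_1based - 1, pos_1based), [])
```

This only matches on position. Checking that the called alternative allele is also among the alleles listed for the rs ID would be a sensible extra step, but the column holding those alleles depends on the file layout, so it is left out here.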

BUT: for deletions/insertions and variations in homopolymers it is more difficult, since the position can be ambiguous and I might therefore miss the match. Example: an additional T in GATTTTACG could be placed either as GA-T-TTTTACG or as GATTTT-T-ACG. Which one is reported seems to depend on the strand of the reference, as variant callers tend to place such variants at the 5' end. If, however, my reference is on the minus strand relative to the genome reference (hg19 or hg38), because it was reverse-complemented so that the gene of interest contained in the BAC reads 5'-->3', the position of the variant in dbSNP153 does not match mine. Additionally, at least for insertions, the record can either be class "ins" --> Ref "-" Alt "A", or a delins such as Ref "GG" Alt "GGA". For homopolymers it seems that the entire homopolymer is usually (or sometimes?) the Ref, and the variant is then classed as delins.
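To make the ambiguity concrete, here is a small sketch that computes every gap position at which a given insertion produces the same sequence, and shows how that range flips on the reverse-complemented reference. It is not how dbSNP or any particular caller assigns positions; `insertion_range` and `revcomp` are hypothetical helper names, and positions are 0-based gap indices (the index of the reference base that follows the inserted bases).

```python
def revcomp(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def insertion_range(ref, pos, ins):
    """Return (leftmost, rightmost) equivalent gap positions for an insertion.

    Inserting `ins` before ref[pos] gives the same sequence as inserting a
    rotated copy of `ins` at any gap position in the returned range.
    """
    # Shift left: possible while the base before the gap equals the last
    # inserted base; the inserted sequence rotates by one each step.
    left, seq = pos, ins
    while left > 0 and ref[left - 1] == seq[-1]:
        seq = seq[-1] + seq[:-1]
        left -= 1
    # Shift right: possible while the base after the gap equals the first
    # inserted base.
    right, seq = pos, ins
    while right < len(ref) and ref[right] == seq[0]:
        seq = seq[1:] + seq[0]
        right += 1
    return left, right


ref = "GATTTTACG"
# An extra T anywhere in the T run is the same variant.
print(insertion_range(ref, 4, "T"))
# On the reverse-complemented reference the same event is an extra A, and a
# gap at position p maps to len(ref) - p, so left-most becomes right-most.
print(insertion_range(revcomp(ref), len(ref) - 4, revcomp("T")))
```

With such a range, a called insertion and a database record could be compared by checking whether their ranges coincide on the same strand, rather than comparing a single reported coordinate. Whether that is enough to handle the delins representation as well is exactly what I am unsure about.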

I'm a little bit lost on how to deal with these ambiguities. Are there any specific rules as to when an insertion/deletion is classed as ins, del or delins? And what are the rules for the genomic positions of such variations?

I have also seen many people here recommending tools like SnpEff, ANNOVAR and Ensembl VEP, but since those need VCF or BED input I assume they would also fail if the positions don't match.

Any hint on how to tackle this problem would be highly appreciated!

Many thanks,
Hagen

dbSNP SNV NGS variant python • 479 views