Calling TE sequences from VCF files
5.0 years ago
elcortegano ▴ 200

I have run Sniffles to call structural variants from PacBio reads. In the resulting VCF file, most entries in the ALT field look like this:

N[chromosome_3:7199420[

The record above is actually for a variant on a different chromosome (CHROM=chromosome_1, POS=270281), so I guess this is a transposable element from chromosome 3 that is present at that location.

I am not familiar with this format for the ALT field, and was wondering whether there is a straightforward way to get the sequence of that element (or of any other structural variant in the VCF). Any ideas?

next-gen variant-calling sniffles • 1.3k views
3.6 years ago
Shunhua ▴ 20

What you saw is a BND (breakend) record, which represents an arbitrary rearrangement event with two breakends. The t[p[ form of the ALT field means "piece extending to the right of p is joined after t" (see details in https://samtools.github.io/hts-specs/VCFv4.2.pdf).
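To make the notation concrete, here is a minimal Python sketch that splits a breakend ALT string into its parts. The function name parse_bnd_alt and the keys of the returned dictionary are my own choices for illustration, not part of any library; the four bracket layouts it distinguishes (t[p[, t]p], ]p]t, [p[t) are the ones described in the VCF 4.2 spec.

```python
import re

# Matches the four breakend layouts from the VCF spec: t[p[  t]p]  ]p]t  [p[t
# where t is the replacement base(s) and p is the mate position "chrom:pos".
BND_PATTERN = re.compile(
    r"^(?P<lead>[^\[\]]*)"      # bases before the bracket (empty for ]p]t and [p[t)
    r"(?P<bracket>[\[\]])"      # opening bracket
    r"(?P<chrom>.+):(?P<pos>\d+)"
    r"(?P=bracket)"             # the closing bracket must be the same character
    r"(?P<trail>[^\[\]]*)$"     # bases after the bracket (empty for t[p[ and t]p])
)


def parse_bnd_alt(alt):
    """Return a dict describing a breakend ALT, or None if it is not one.

    Hypothetical helper written for this answer.
    """
    match = BND_PATTERN.match(alt)
    if match is None:
        return None
    lead, trail = match.group("lead"), match.group("trail")
    return {
        "mate_chrom": match.group("chrom"),
        "mate_pos": int(match.group("pos")),
        "bases": lead or trail,
        # t before the bracket: the mate piece is joined after t
        "joined": "after" if lead else "before",
        # "[" means the mate piece extends to the right of p, "]" to the left
        "mate_extends": "right" if match.group("bracket") == "[" else "left",
    }


record = parse_bnd_alt("N[chromosome_3:7199420[")
print(record["mate_chrom"], record["mate_pos"], record["joined"], record["mate_extends"])
```

For the ALT in the question, this reports a mate breakend at chromosome_3:7199420 whose sequence extends to the right and is joined after the reference base at chromosome_1:270281.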

In Sniffles, this is likely a translocation event that may or may not involve a transposable element. To extract the SV sequence, you can use the -n -1 option to have Sniffles output all SV-supporting reads for each SV entry in the VCF; you can then find the read IDs under RNAMES= in the INFO field. The first ID usually represents the "primary SV read" that contains a representative sequence.
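As a rough sketch of the read-ID step, the Python below pulls the RNAMES list out of one VCF data line. I am assuming the IDs are comma-separated within RNAMES=, and the example line (including the read names read_a and read_b) is invented for illustration; the helper names are hypothetical too.

```python
def parse_info(info):
    """Turn a VCF INFO string into a dict; flags without '=' map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def supporting_reads(vcf_line):
    """Return the read IDs listed under RNAMES for one VCF data line.

    Assumes RNAMES holds a comma-separated list of read IDs.
    """
    columns = vcf_line.rstrip("\n").split("\t")
    info = parse_info(columns[7])  # INFO is the 8th VCF column
    rnames = info.get("RNAMES")
    if not isinstance(rnames, str):
        return []
    return [name for name in rnames.split(",") if name]


# Made-up example line, not taken from a real Sniffles run.
line = "\t".join([
    "chromosome_1", "270281", "1", "N", "N[chromosome_3:7199420[",
    ".", "PASS", "SVTYPE=BND;RNAMES=read_a,read_b", "GT", "0/1",
])
reads = supporting_reads(line)
print(reads)
```

With the IDs in hand, the read sequences themselves can be looked up in the original FASTQ or BAM file with whatever tool you normally use for that.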

If you are using long-read data and want all non-reference transposable element sequences based on Sniffles output, you can use TELR (https://github.com/bergmanlab/TELR), which runs Sniffles, finds candidate TE loci, and reports their sequences using a local assembly strategy.
