co-occurrence matrix from my VCF file
12 weeks ago
HarperReed • 0

Hello, I'm working with a haploid organism and want to build a co-occurrence matrix from my VCF file to identify which variants/mutations co-occur on the same read. When I worked with diploid organisms, I had to phase the VCF file before constructing this matrix. Now that I'm focusing on a haploid organism, what should I do? Can I still use the same logic of phasing (even though it's haploid), or is there a different approach I should take to build the matrix?

thank you in advance

co-occurrence vcf • 291 views
12 weeks ago
robben • 0

From the VCF specification (https://samtools.github.io/hts-specs/VCFv4.2.pdf):

Each row represents one variant, but because variants are called from many reads, the VCF no longer holds any read-level information. It does keep positional information, though, so you can use that to look in your BAM/SAM file for reads that span multiple variants. Some pseudocode might be:

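for read in SAM:
    covered = []
    for variant in VCF:
        # the read covers the variant only if the position lies inside its alignment
        if read.start <= variant.position and variant.position <= read.end:
            covered.append(variant)
    # keep reads that span at least two variant positions; covering a position
    # does not yet mean the read carries the ALT allele, so check the base too
    if len(covered) >= 2:
        Spanning[read] = covered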

Also, phasing shouldn't be a problem: each read comes from a single DNA molecule, so a single read is physically incapable of containing sequence from both genome copies of a diploid organism.
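
To make the read-level idea concrete, here is a minimal Python sketch of how the spanning reads could be turned into a co-occurrence matrix. It is only an illustration under stated assumptions: reads are given as in-memory `(start, sequence)` tuples with ungapped alignments, variants are SNVs given as `(position, alt_base)` tuples, and the function name `cooccurrence` is hypothetical. With real data the reads would come from the BAM/SAM file (for example through a library such as pysam), and indels or clipped reads would need the CIGAR string to map read bases to reference positions.

```python
from itertools import combinations


def cooccurrence(reads, variants):
    """Count, for every pair of variants, the reads that carry both ALT alleles.

    reads:    list of (start, sequence) tuples; start is 1-based and the
              alignment is assumed to be ungapped (no indels, no clipping)
    variants: list of (position, alt_base) tuples for SNVs, 1-based positions
    """
    n = len(variants)
    matrix = [[0] * n for _ in range(n)]
    for start, seq in reads:
        end = start + len(seq) - 1
        # indices of variants whose position is covered by the read AND
        # whose ALT base is actually observed in the read at that position
        carried = [
            i
            for i, (pos, alt) in enumerate(variants)
            if start <= pos <= end and seq[pos - start] == alt
        ]
        # diagonal: number of reads carrying each variant on its own
        for i in carried:
            matrix[i][i] += 1
        # off-diagonal: number of reads carrying both variants of a pair
        for i, j in combinations(carried, 2):
            matrix[i][j] += 1
            matrix[j][i] += 1
    return matrix


# Toy data, made up for illustration only
variants = [(3, "T"), (6, "G"), (12, "A")]
reads = [
    (1, "ACTGAGCC"),
    (2, "CTGACCC"),
    (5, "AGCCCCCAT"),
]

for row in cooccurrence(reads, variants):
    print(row)
```

Because each read is tested for the ALT base itself, no phasing step is involved: the matrix is counted directly from reads, which works the same way for a haploid sample.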
