Incorporating Raw Read Coverage Per Sample In Merged Vcf
2
0
Entering edit mode
12.1 years ago
Drio ▴ 920

I have a merged vcf file and I'd like to add the raw depth coverage per each sample for each of the snps. My plan was to extract the raw coverage from each bam and then add a new field to the merged vcf.

Notice that I am not asking for the DP field. I want to know what was the raw depth coverage at that locus on each of the samples.

Have you guys done that before? If so what approach do you follow? Is there any tool available for that or you have done it by hand?

vcf merge coverage depth-of-coverage • 3.6k views
ADD COMMENT
1
Entering edit mode
12.1 years ago
Drio ▴ 920

I think this feature should be incorporated in the vcf-tools, concretely in the merge tool. In the meantime I wrote some code that implements what I described.

Thanks for your post Pierre. I found some "issues" with your solution:

  1. No header describing the new INFO field is added to the metadata. Maybe you added it but didn't pasted it?
  2. We have to pass the bams to your java tool. There is no need to do that since the path to the bams should be contained in the original vcf header. So the tool can extract them.
  3. It won't scale for big vcf files. Too many IOs to retrieve the coverage at site.

My solution can be found here. It implements 1 and 2 and overcomes 3 by only computing the RDP field when the functional consequence of the SNP is interesting. More details in the gist.

The usage looks like this:

$ gzip -cd merged.vcf.gz | java vcfAddCoverage

A gzipped input example can be found here.

ADD COMMENT
0
Entering edit mode
12.1 years ago

For one sample the following java program reads a VCF from stdin and appends 'REALDP' to each sample (one bam per sample)

(warning: Not fully tested, I've played with some random bams/reference):

import java.io.BufferedReader;
import java.io.File;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

import net.sf.samtools.SAMFileReader;
import net.sf.samtools.SAMRecordIterator;
import net.sf.samtools.SAMFileReader.ValidationStringency;


public class Biostar42544 {

    public static void main(String[] args) throws Exception
        {
        if(args.length==01) return;
        Pattern tab=Pattern.compile("[\t]");
        List<SAMFileReader> inputSams=new ArrayList<SAMFileReader>();
        for(int i=0;i< args.length;++i)
            {
            SAMFileReader inputSam=new SAMFileReader(new File(args[i]));
            inputSam.setValidationStringency(ValidationStringency.SILENT);
            inputSams.add(inputSam);
            }
        String line;
        BufferedReader in=new BufferedReader(new InputStreamReaderSystem.in));
        while((line=in.readLine())!=null)
            {
            if(line.startsWith("#"))
                {
                System.out.println(line);
                continue;
                }
            String tokens[]=tab.split(line);
            int pos=Integer.parseInt(tokens[1]);
            tokens[8]+=":REALDP";
            for(int i=0;i< tokens.length-9  && i< inputSams.size();++i)
                {
                int count=0;
                SAMRecordIterator iter=inputSams.get(i).query(tokens[0],pos, pos, false);
                while(iter.hasNext())
                    {
                    iter.next();
                    count++;
                    }
                iter.close();
                tokens[9+i]+=(":"+count);
                }
            for(int i=0;i< tokens.length;++i)
                {
                if(i>0) System.out.print('\t');
                System.out.print(tokens[i]);
                }
            System.out.println();
            }
        for(SAMFileReader i:inputSams) i.close();
        }

    }

Compile:

$ javac -cp .:sam-1.63.jar Biostar42544.java

Usage:

$ samtools mpileup -g -f ex1.fa sorted.bam sorted1.bam sorted2.bam | \ bcftools view -vcg - |\ java -cp .:sam-1.63.jar Biostar42544 sorted1.bam sorted2.bam

##fileformat=VCFv4.1
##samtoolsVersion=0.1.18 (r982:295)
(...)
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
#CHROMPOSIDREFALTQUALFILTERINFOFORMATex1ex1b
seq138.CA27.3.DP=6;VDB=0.0286;AF1=0.5002;AC1=2;DP4=3,0,3,0;MQ=60;FQ=25.6;PV4=1,1,1,1GT:PL:GQ:REALDP0/1:40,0,38:39:450/1:22,0,21:22:45
seq1107.CG5.83.DP=18;VDB=0.0029;AF1=0.4946;AC1=2;DP4=6,0,3,0;MQ=60;FQ=8.15;PV4=1,6.9e-06,1,1GT:PL:GQ:REALDP0/1:25,0,65:25:240/1:14,0,40:15:24
seq1288.AACATAG999.INDEL;DP=78;VDB=0.1070;AF1=0.5;AC1=2;DP4=18,12,24,12;MQ=60;FQ=999;PV4=0.62,0.36,1,0.017GT:PL:GQ:REALDP0/1:255,0,255:99:780/1:255,0,255:99:78
seq1548.CA999.DP=117;VDB=0.0921;AF1=0.5;AC1=2;DP4=33,24,12,39;MQ=60;FQ=999;PV4=0.0004,0.0075,1,1
(...)
seq2784.CAATTCAATTAATT999.INDEL;DP=147;VDB=0.1121;AF1=1;AC1=4;DP4=0,0,54,54;MQ=57;FQ=-147GT:PL:GQ:REALDP1/1:255,217,0:99:1471/1:255,108,0:99:147
seq21344.AC280.DP=90;VDB=0.1087;AF1=0.5;AC1=2;DP4=15,27,9,27;MQ=59;FQ=283;PV4=0.34,0.0082,0.0027,1GT:PL:GQ:REALDP0/1:179,0,215:99:960/1:136,0,172:99:96
ADD COMMENT

Login before adding your answer.

Traffic: 2579 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6