Question

Bait/Target files for Picard HsMetrics (Exome Sequencing)

1

10.0 years ago

rafamoura1987bio ▴ 10

Hi,

I used the Nextera expanded exome kit for my samples. Now I want to run Picard HsMetrics, which asks for a baits file and a targets file.

On the Illumina website, I can find the targets BED file for the kit. I also find the targeted_regions file.

Are these the correct files for Picard HsMetrics? When I use them, I get strange values for efficiency (like >2).

Maybe the targets file is some sort of "expanded exome definition"? If so, how do I find this definition for "expanded exome / Illumina"? If the targets file is this "exome definition", what is the baits file? (Would it be the targets file provided by Illumina?)

Sorry for the naive question, thank you

picard exome-sequencing • 11k views

ADD COMMENT • link updated 2.9 years ago by Ram 45k • written 10.0 years ago by rafamoura1987bio ▴ 10

Ram · Answer 1 · 2015-08-13

0

10.0 years ago

Zaag ▴ 870

From what I understand, you get a Targeted Regions Manifest and an Exome Probe Manifest. Use the regions from the Probe Manifest to create the bait (tiled region) file for Picard, and the Targeted Regions Manifest to create the target file.

http://support.illumina.com/downloads/nextera-rapid-capture-expanded-exome-product-files.html

You could also make your own target file with all coding exons (from UCSC) or something else.

ADD COMMENT • link updated 2.9 years ago by Ram 45k • written 10.0 years ago by Zaag ▴ 870

1

The problem with creating your own target file is that none of the exome capture kits capture all known/annotated exons. So you will throw off your on/off target statistics really badly if you do this.

ADD REPLY • link 10.0 years ago by User 59 13k

0

Isn't that exactly what you want to know? I'm not interested in knowing how efficient Illumina is at targeting the regions they say they target. My interest is in knowing how efficient Illumina is at targeting a list of annotated exons I trust from source X.

For this reason, I personally prefer creating my own target file and using it with the probe manifest from Illumina.

ADD REPLY • link 10.0 years ago by Carlos Borroto ★ 2.1k

1

No, you need to know how efficient your capture is, and how many exons you have failed to capture in the targeted region. Knowing that you haven't captured things that you weren't meant to capture isn't very useful in the grand scheme of things. You should have checked your targets of interest were covered in the kit before you ran your experiment.
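
One way to run that check before sequencing is to measure how many bases of a target list are overlapped by the kit's baits. Below is a minimal Python sketch, assuming both sets are already loaded as 0-based, half-open (chrom, start, end) tuples with matching contig names. The function names and the toy intervals are hypothetical, not part of Picard or the Illumina files.

```python
from collections import defaultdict

def merged_by_chrom(intervals):
    """Group (chrom, start, end) intervals by chromosome and merge overlaps."""
    by_chrom = defaultdict(list)
    for chrom, start, end in sorted(intervals):
        runs = by_chrom[chrom]
        if runs and start <= runs[-1][1]:
            # Overlaps or touches the previous run: extend it
            runs[-1][1] = max(runs[-1][1], end)
        else:
            runs.append([start, end])
    return by_chrom

def fraction_of_targets_baited(targets, baits):
    """Fraction of unique target bases that fall inside at least one bait."""
    t = merged_by_chrom(targets)
    b = merged_by_chrom(baits)
    total = covered = 0
    for chrom, t_runs in t.items():
        b_runs = b.get(chrom, [])
        for t_start, t_end in t_runs:
            total += t_end - t_start
            # Both sides are merged, so overlaps are never counted twice
            for b_start, b_end in b_runs:
                covered += max(0, min(t_end, b_end) - max(t_start, b_start))
    return covered / total if total else 0.0

# Toy intervals for illustration only
toy_targets = [("1", 0, 100), ("1", 200, 300), ("2", 0, 100)]
toy_baits = [("1", 50, 250)]
baited = fraction_of_targets_baited(toy_targets, toy_baits)
print(f"{baited:.3f} of target bases are covered by baits")
```

The nested loop is quadratic per chromosome, which is fine for a sketch. For full exome files, a sweep over sorted intervals or a tool such as bedtools would be the more practical choice.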

ADD REPLY • link updated 2.9 years ago by Ram 45k • written 10.0 years ago by User 59 13k

0

I look at the on/off bait statistic for that. That tells you how efficient your capture is.

Whether you are able to cover your target of interest also depends on fragment and read size.

ADD REPLY • link 10.0 years ago by Zaag ▴ 870

0

Hence my comment "So you will throw off your on/off target statistics really badly if you do this" ;) With Agilent SureSelect, the on/off bait and on/off target are generally for all intents and purposes identical...

ADD REPLY • link 10.0 years ago by User 59 13k

0

OK, I didn't know that. We use Nimblegen, and the probe region is sometimes 3 times the size of the target (for custom designs).

ADD REPLY • link 10.0 years ago by Zaag ▴ 870

Ram · Answer 2 · 2015-08-13

Hi @Zaag, you're correct: I get the targeted regions and probe manifests. Are you saying that the intervals file I can create from the "targeted regions" file should be used as "TARGET_INTERVALS" for Picard? Similarly, the intervals file created from the probe manifest should be used as "BAIT_INTERVALS" for Picard?

If so, I'm afraid I'm missing something, because the "bait efficiency" field exceeds 100% (the probe manifest gives me something around 30 Mb, while the targeted regions are around 60 Mb, so the efficiency would be the ratio 60/30).

I'd like to use the definition of exome as set by Illumina... It's not that I want to know "how efficient is Illumina at targeting the regions they say they target", but I want to know how far off my experiment is from such definition... (does this even make sense?)

I'm afraid my question may concern even simpler concepts: is "BAIT" (as defined by Picard) the same as "probe" (as defined by Illumina)?
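
For reference on the 60/30 ratio above: Picard's metric definitions describe BAIT_DESIGN_EFFICIENCY as TARGET_TERRITORY divided by BAIT_TERRITORY, so a target set roughly twice the size of the bait set would give a value near 2. A minimal Python sketch of that arithmetic, assuming territory means the number of unique bases after merging overlapping intervals. The function names and toy intervals are illustrative, not Picard code.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent (chrom, start, end) intervals (0-based, half-open)."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(m) for m in merged]

def territory(intervals):
    """Number of unique bases covered by the intervals."""
    return sum(end - start for _, start, end in merge_intervals(intervals))

def design_efficiency(targets, baits):
    """Target territory divided by bait territory."""
    return territory(targets) / territory(baits)

# Toy intervals for illustration only: baits cover 300 bases, targets cover 600
toy_baits = [("1", 100, 200), ("1", 150, 250), ("2", 0, 150)]
toy_targets = [("1", 0, 400), ("2", 0, 200)]
ratio = design_efficiency(toy_targets, toy_baits)
print(territory(toy_baits), territory(toy_targets), ratio)
```

A value above 1 in this sketch simply means the target intervals span more bases than the bait intervals.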

Ram · Answer 3 · 2016-12-05

# Targets: strip the "chr" prefix from the contig column and rename M to MT to match human_g1k_v37.dict
sed -e 's/^chr//' -e 's/^M\([[:space:]]\)/MT\1/' nexterarapidcapture_exome_targetedregions_v1.2.bed > annotations/nexterarapidcapture_exome_targetedregions_v1.2.no_chr.MT.bed
# Probes: keep the CEX rows, fix the contig names, and keep chrom/start/end (columns 2-4)
grep CEX NexteraRapidCapture_Exome_Probes_v1.2.txt | sed -e 's/chrM/chrMT/g;s/chr//g' | cut -f2,3,4 > annotations/NexteraRapidCapture_Exome_Probes_v1.2.bed
# Convert both BED files (written to annotations/ above) to Picard interval lists
picard BedToIntervalList I=annotations/NexteraRapidCapture_Exome_Probes_v1.2.bed O=annotations/NexteraRapidCapture_Exome_Probes_v1.2.interval_list SD=human_g1k_v37.dict
picard BedToIntervalList I=annotations/nexterarapidcapture_exome_targetedregions_v1.2.no_chr.MT.bed O=annotations/nexterarapidcapture_exome_targetedregions_v1.2.no_chr.MT.interval_list SD=human_g1k_v37.dict

rule hsMetrics:
    input:
        bam = config['process_dir'][freeze] + config['results']['recalibrated'] + "/{sample}.recal.la.bam",
        bam_probe_intervals = "annotations/NexteraRapidCapture_Exome_Probes_v1.2.interval_list",
        bam_target_intervals = "annotations/nexterarapidcapture_exome_targetedregions_v1.2.no_chr.MT.interval_list",
    output:
        hs = config['landing_dir'][freeze] + config['results']['hsmetrics'] + "/{sample}.hsmetrics",
    params:
        picard = config['jars']['picard']['path'],
        # Newer Picard releases name this tool CollectHsMetrics
        md = "CalculateHsMetrics",
        opts = config['tools']['opts']['med'] + ' ' + config['javatmpdir'],
        metrics = config['process_dir'][freeze] + config['results']['picard']
    log:
        config['datadirs']['log'] + "/{sample}.hsmetrics.log"
    shell:
        """
        {params.picard} {params.opts} \
        CalculateHsMetrics \
        BAIT_INTERVALS={input.bam_probe_intervals} \
        TARGET_INTERVALS={input.bam_target_intervals} \
        INPUT={input.bam} \
        OUTPUT={output.hs} \
        METRIC_ACCUMULATION_LEVEL=ALL_READS \
        QUIET=true  \
        VALIDATION_STRINGENCY=SILENT 2> {log}
        """