How to create B-allele frequency file for ADTEx?
9.4 years ago
Dataman ▴ 380

Hi,

I am using ADTEx to perform copy number analysis of some exome sequencing data. ADTEx requires a B-allele frequency (BAF) file in order to perform ploidy estimation and genotype prediction. However, I do not know how to create such a file, and the ADTEx tutorial does not explain it either. Is there a general way to create B-allele frequency files that I am not aware of?

As explained in the tutorial, the file should have the following fields:

  • chrom - chromosome name (same format as in BED or BAM file)
  • SNP_loc - location of the SNP
  • control_BAF - B allele frequency (BAF) at each SNP in control sample
  • tumor_BAF - B allele frequency (BAF) at each SNP in tumor sample
  • control_doc - Total read count at each SNP in control sample
  • tumor_doc - Total read count at each SNP in tumour sample
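
For orientation, here is a minimal sketch of how one row with these six fields could be derived from raw read counts. It assumes that the B allele is the non-reference allele, that BAF is the number of B-allele reads divided by the total read count, and that coverage is non-zero at the site. The helper name `baf_row`, the read counts, and the tab-separated printout are illustrative assumptions, not something taken from the ADTEx tutorial.

```python
def baf_row(chrom, snp_loc, control_ref, control_alt, tumor_ref, tumor_alt):
    """Build one BAF-table row from per-sample reference/alternate read counts.

    Hypothetical helper: assumes BAF = alt reads / (ref reads + alt reads)
    and that both samples have at least one read at the site.
    """
    control_doc = control_ref + control_alt  # total depth in the control sample
    tumor_doc = tumor_ref + tumor_alt        # total depth in the tumor sample
    return {
        "chrom": chrom,
        "SNP_loc": snp_loc,
        "control_BAF": control_alt / control_doc,
        "tumor_BAF": tumor_alt / tumor_doc,
        "control_doc": control_doc,
        "tumor_doc": tumor_doc,
    }


# Made-up counts for a single SNP, purely for illustration
row = baf_row("chr1", 12345, control_ref=20, control_alt=20,
              tumor_ref=30, tumor_alt=10)
print("\t".join(row))                           # header line
print("\t".join(str(v) for v in row.values()))  # data line
```

With these made-up counts the control BAF is 0.5, as expected for a germline heterozygous SNP, while the tumor BAF of 0.25 would suggest allelic imbalance.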

I would like to thank you in advance for your responses and wishing you all a nice summer.

adtex next-gen-sequencing • 5.2k views

Hi, I want to use ADTEx as well, but how do I create the target definition file, or where can I get one? Thanks for your help.


Hi, sorry for the late reply! Here is a link for a standard probe file: http://sourceforge.net/projects/conifer/files/probes.txt/download

This set of probes was made by Krumm et al. (see: http://conifer.sourceforge.net/tutorial.html ). Ideally, you should make your own set of probes, and for that you need to know how the exome sequencing was done, i.e. which kit was used for pulling down the targets and performing the sequencing. Hope this is helpful! :)

9.4 years ago
russhh 5.7k

Here's what I do:

  • Call germline heterozygous SNPs for all patients using GATK HaplotypeCaller (all patients are provided to a single run of GATK; for speed I split this up by chromosome and then merge the results back together)
  • Extract a patient-specific .vcf of heterozygous SNPs from that call set
  • Convert the .vcf to a .bed for each patient
  • Generate a samtools pileup for both the tumour and normal samples, restricted to the positions in that .bed file
  • Run VarScan somatic --validate over the two pileups (outputting in VarScan native format)
  • Then I convert a dataframe of the VarScan data into a .baf dataframe using the following:
import pandas as pd


def snps_to_baf(
        vscan_data,
        drop_if_no_alt_in_normal = False,
        drop_N_refs = False
        ):
    """
    Takes VarScan somatic SNP data (as a DataFrame) and computes the depth of coverage
    and B-allele fractions at each site therein. Returns a pandas DataFrame of the same
    form as the example file provided by ADTEx, that is:
        chrom    SNP_loc    control_BAF    tumor_BAF    control_doc    tumor_doc

    For use in ADTEx, I suggest setting drop_if_no_alt_in_normal and drop_N_refs to True
    """
    baf_cols = ['chrom', 'SNP_loc', 'control_BAF', 'tumor_BAF', 'control_doc',
         'tumor_doc']
    empty_baf = pd.DataFrame({k : [] for k in baf_cols}, columns = baf_cols)

    if drop_if_no_alt_in_normal:
        # keep rows that have at least one read supporting the alt allele
        # in the normal sample (if specified by the user)
        vscan_data = vscan_data[vscan_data.normal_reads2 > 0]

    if drop_N_refs:
        # drop all rows that have 'N' as the reference allele
        vscan_data = vscan_data[vscan_data.ref != 'N']

    if len(vscan_data) == 0:
        return empty_baf
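    # BAF = variant-supporting reads (reads2) / total reads (reads1 + reads2);
    # a site with zero depth in a sample gets NaN for that sample's BAF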
    baf_data = pd.DataFrame({
        'chrom'       : vscan_data['chrom'],
        'SNP_loc'     : vscan_data['position'],
        'control_BAF' : vscan_data['normal_reads2'] / (
             vscan_data['normal_reads1'] + vscan_data['normal_reads2']
             ),
        'tumor_BAF'   : vscan_data['tumor_reads2'] / (
            vscan_data['tumor_reads1'] + vscan_data['tumor_reads2']
            ),
        'control_doc' : vscan_data['normal_reads1'] + vscan_data['normal_reads2'],
        'tumor_doc'   : vscan_data['tumor_reads1']  + vscan_data['tumor_reads2']
        },
        columns = baf_cols
        )
    return baf_data
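
The heterozygous-SNP extraction and .vcf to .bed conversion steps listed above are not shown in the answer. A minimal sketch of one way to do both in a single pass is given below, using only the Python standard library. It assumes an uncompressed multi-sample VCF with a GT field, keeps only biallelic single-base substitutions, and writes 0-based, half-open BED coordinates. The function name `het_snps_to_bed` and the sample names are hypothetical, and this is not necessarily how the answer's author did it.

```python
def het_snps_to_bed(vcf_lines, sample):
    """Yield (chrom, start, end) BED intervals for heterozygous SNPs of `sample`.

    Hypothetical helper: `vcf_lines` is any iterable of VCF text lines and
    `sample` is a sample name from the #CHROM header line.
    """
    sample_col = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            sample_col = fields.index(sample)  # ValueError if sample is absent
            continue
        if sample_col is None:
            raise ValueError("#CHROM header line not found before records")
        chrom, pos, ref, alt, fmt = (fields[0], int(fields[1]),
                                     fields[3], fields[4], fields[8])
        # keep biallelic single-base substitutions only (no indels, no multi-allelic sites)
        if len(ref) != 1 or len(alt) != 1 or alt in {".", "*"}:
            continue
        keys = fmt.split(":")
        if "GT" not in keys:
            continue
        genotype = fields[sample_col].split(":")[keys.index("GT")]
        alleles = genotype.replace("|", "/").split("/")
        # heterozygous: two called alleles that differ from each other
        if len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]:
            yield chrom, pos - 1, pos  # BED is 0-based, half-open


# Toy VCF content with made-up sample names and genotypes
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tP1\tP2",
    "1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:30\t0/0:28",
    "1\t200\t.\tC\tT\t50\tPASS\t.\tGT:DP\t1/1:30\t0|1:28",
]
for chrom, start, end in het_snps_to_bed(example_vcf, "P1"):
    print(f"{chrom}\t{start}\t{end}")
```

Working from a generator keeps memory use flat for large call sets; the same function can be pointed at an open file handle instead of a list.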

Obviously, you need Python and pandas for this to work.
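
For completeness, here is a hedged sketch of how the VarScan output might be loaded into the dataframe that snps_to_baf expects. It assumes the native-format file is tab-delimited with a header row containing the column names used in the function (chrom, position, ref, normal_reads1, and so on). The helper name `load_varscan_snps`, the abbreviated toy header, and the output file name in the comments are illustrative assumptions.

```python
import io

import pandas as pd


def load_varscan_snps(source):
    """Read VarScan somatic native-format SNP output into a DataFrame.

    Hypothetical helper: `source` may be a file path or an open file-like
    object. Chromosome names are kept as strings so that "1" and "X" are
    handled uniformly.
    """
    return pd.read_csv(source, sep="\t", dtype={"chrom": str})


# Toy input with an abbreviated header and made-up counts
toy = io.StringIO(
    "chrom\tposition\tref\tvar\tnormal_reads1\tnormal_reads2\t"
    "tumor_reads1\ttumor_reads2\n"
    "1\t12345\tA\tG\t20\t20\t30\t10\n"
)
vscan_data = load_varscan_snps(toy)
print(vscan_data)

# With snps_to_baf from the answer in scope, the remaining steps would be
# roughly as follows (tab-delimited output is an assumption here):
#   baf = snps_to_baf(vscan_data, drop_if_no_alt_in_normal=True, drop_N_refs=True)
#   baf.to_csv("sample.baf", sep="\t", index=False)  # hypothetical file name
```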


@russ_hyde: Thanks so much for the comprehensive answer. I will give it a try ASAP! :)
