Number of genes covered in WES
2
0
Entering edit mode
15 months ago

Hi,

While doing WES analysis, is there any possibility to find out how many genes have been covered in the sample ? That is, the kit I use claims that it covers this many genes. So I want to identify how many of these genes have been captured/sequenced in the sample. I have the bed file of the particular kit.

Please tell me how can I do this.

Thank you.

targetsequencing WES coverage bedfile • 1.3k views
ADD COMMENT
3
Entering edit mode
15 months ago
DareDevil ★ 4.3k

You can assess the coverage of genes targeted by a specific kit during Whole Exome Sequencing (WES) analysis using the provided bed file. This will help you determine how effectively the kit has captured and sequenced the intended genes in your sample. Here's how you can approach this:

Prepare the Bed File: Make sure you have the correctly formatted BED file that defines the target regions of the kit you used. This file should contain the chromosome, start position, end position, and possibly other information for each targeted region.

Map Reads: Perform the standard steps of WES analysis, including read mapping (alignment) of your sequenced data (FASTQ files) to the reference genome using a tool like BWA or Bowtie2.

Calculate Coverage: Use tools like bedtools to calculate the coverage of the mapped reads over the targeted regions defined in the BED file. Bedtools provides a function called coverageBed that can help you calculate the coverage statistics.

Analyze Coverage Data: Once you have the coverage data, you can analyze it to determine how many genes or target regions have been sufficiently covered by the sequencing.

Step 1: Prepare the Bed File

Ensure that you have a correctly formatted BED file that defines the target regions of the kit. The BED file should look something like this:

chr1    1000    2000    Gene1
chr1    3000    4000    Gene2
...

Step 2: Map Reads

Map the sequenced reads to the reference genome using a suitable alignment tool like BWA or Bowtie2.

Step 3: Calculate Coverage

Use the coverageBed command from bedtools to calculate the coverage of the mapped reads over the target regions in the BED file. Here's an example command:

bedtools coverage -a target_regions.bed -b mapped_reads.bam > coverage.txt

This will generate a coverage.txt file containing coverage information for each target region.

Step 4: Analyze Coverage Data

Write a script to analyze the coverage data and count how many target regions meet a certain coverage threshold (e.g., 20x). You can count the number of genes or target regions that meet this threshold.

coverage_threshold = 20
covered_genes = set()

with open("coverage.txt", "r") as coverage_file:
    for line in coverage_file:
        parts = line.strip().split("\t")
        gene = parts[3]
        coverage = int(parts[7])
        if coverage >= coverage_threshold:
            covered_genes.add(gene)

num_covered_genes = len(covered_genes)
print(f"Number of genes covered at {coverage_threshold}x or higher: {num_covered_genes}")
ADD COMMENT
0
Entering edit mode

Hi,

Thank you for the detailed reply. I generated the coverage.txt file but after running the Python script I got the number of genes covered as zero. Do you have any idea why this happened?

This is how my coverage.txt looks like:

1   12080   12251   DDX11L1 176 171 171 1.0000000
1   12595   12802   DDX11L1 259 207 207 1.0000000
1   13163   13658   DDX11L1 876 495 495 1.0000000
1   14620   15015   WASH7P  295 395 395 1.0000000
1   15795   15914   WASH7P  0   0   119 0.0000000
1   16743   17098   WASH7P  405 355 355 1.0000000
1   17247   18121   WASH7P  440 874 874 1.0000000
ADD REPLY
2
Entering edit mode

If the code is not working for you properly. You can try packages like mosdepth

mosdepth --by capture.bed sample-output sample.exome.bam

BamStats05 or by using bedtools

bedtools coverage -sorted -d -g reference.genome -a genes.bed -b my_sorted.bam | sort -k4,4 bedtools groupby -g 4 -c 8 -o mean | sort -k1,1 > gene_coverage.txt
ADD REPLY
0
Entering edit mode
15 months ago
Zhenyu Zhang ★ 1.2k

You need to define the concept of "a gene is covered". A few things

  1. which region? I assume you are talking about exons only?
  2. which gene model?
  3. how much coverage is considered covered (90% of all exon bases?)
  4. what is the depth you need?

This normally goes like this sequencing covers 500 genes with at least 30x coverage of more than 90% of the exon bases in this gene model.

ADD COMMENT
0
Entering edit mode
  1. Yes, I meant exons.
  2. I didn't get you
  3. Yes
  4. I don't know how to choose that
ADD REPLY

Login before adding your answer.

Traffic: 1787 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6