Question

Nucleotide coverage if >5 reads

0

Entering edit mode

7.1 years ago

Vca80553 • 0

Hi Everyone,

I have generated BAM files after mapping my Illumina reads to a reference genome. Now I want to know how much of the reference genome is covered (aligned/mapped) by reads (e.g. 98% of the reference genome is covered by reads). For that I used Bbmap/pileup.sh. It works fast and it is accurate.

However, the 98% I get is the % of nucleotides covered by at least 1 read. It it possible to modify this, and get % of nucleotides that were covered at least with 5 reads?

Thanx,

Sara

alignment sequence pileup • 3.1k views

ADD COMMENT • link updated 7.1 years ago by Pierre Lindenbaum 164k • written 7.1 years ago by Vca80553 • 0

1

Entering edit mode

Take a look at the pileup.sh command options I have compiled in this thread (search for name pileup) to see if one of those options helps.

ADD REPLY • link 7.1 years ago by GenoMax 147k

1

Entering edit mode

Hello Vca80553!

It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?t=78704

This is typically not recommended as it runs the risk of annoying people in both communities.

ADD REPLY • link 7.1 years ago by Pierre Lindenbaum 164k

0

Entering edit mode

Yes, sorry. I didn't know this. My fault. Will not happen again. Thanx.

ADD REPLY • link 7.1 years ago by Vca80553 • 0

0

Entering edit mode

Welcome to biostars. Interesting guidelines for posting can be found in the following posts:

ADD REPLY • link 7.1 years ago by WouterDeCoster 47k

0

Entering edit mode

For pileup.sh

Output format (tab-delimited):
ID, Avg_fold, Length, Ref_GC, Covered_percent, Covered_bases, Plus_reads, Minus_reads, Read_GC, Median_fold, Std_Dev

Plus_reads:        Number of reads mapped to plus strand
Minus_reads:       Number of reads mapped to minus strand

Perhaps you can add those two numbers and filter on that to get the info you need.

ADD REPLY • link 7.1 years ago by GenoMax 147k

0

Entering edit mode

Tagging: Brian Bushnell

ADD REPLY • link 7.1 years ago by GenoMax 147k

0

Entering edit mode

Hi,

Thanx for answering.

I am not sure I understand your answer. (I am pretty new with this, sorry)

The output format gives me something like this for each sample:

ID, Avg_fold, Length, Ref_GC, Covered_percent, Covered_bases, Plus_reads, Minus_reads, Read_GC, Median_fold, Std_Dev
S1588, 24909.1657, 7904, 100.0000, 98.7854, 7808, 1105713, 1105713, 6112, 20615.50

The Plus reads and Minus reads give me the total number of mapped reads to the whole reference genome. How can I filter from there?

ADD REPLY • link updated 7.1 years ago by GenoMax 147k • written 7.1 years ago by Vca80553 • 0

1

Entering edit mode

Sorry. You are right. For some reason I was thinking that the output of pileup.sh is per base.

ADD REPLY • link 7.1 years ago by GenoMax 147k

score 2 · Answer 1 · 2017-10-20

2

Entering edit mode

7.1 years ago

harold.smith.tarheel ★ 5.0k

You can use BEDtools genomecov command:

bedtools genomecov -ibam FILE.BAM -d | awk '$4 > 4' | wc -l

Divide the output by the size of the genome to get the percentage.

ADD COMMENT • link 7.1 years ago by harold.smith.tarheel ★ 5.0k