Bedtools Genomecoveragebed Usage : How To Create A Genome File?
5
6
Entering edit mode
11.5 years ago
komal.rathi ★ 4.1k

I am using BEDTOOLS and the following command to get the coverage file:

$ ./genomeCoverageBed -ibam ~/GG_project/trim/ecoli.bam -g <genome file=""> > ~/GG_project/trim/coverage

where ecoli.bam is my sorted bam file, and coverage is my output file

From where do I get the genome file? How do I create a genome file?? Specifically I would need a ecoli.genome file.

bedtools • 40k views
ADD COMMENT
24
Entering edit mode
8.2 years ago
kirannbishwa01 ★ 1.6k

To make a genome file (for bed tools) using reference genome

1) Use samtools to generate fasta index samtools faidx lyrata_genome.fa - this will create a lyrata_genome.fa.fai (index file)

But, this index file won't work as genome file due to file format issue (mainly more than required number of columns).

2) take the index file, then use awk

awk -v OFS='\t' {'print $1,$2'} lyrata_genome.fa.fai > lyrata_genomeFile.txt

  • this prints the 1st and 2nd column of a fai index file and separates the column by tab (OFS flag).
  • use this file as genome file in bedtools.

if space desired between columns do this

awk {'print $1,"",$2'} lyrata_genome.fa.fai > lyrata_genomeFile.txt

if 'chr' needs to be added infront of the chromosome/scaffold names do this

awk {'print "chr"$1,"",$2'} lyrata_genome.fa.fai > lyrata_genomeFile.txt

ADD COMMENT
3
Entering edit mode
11.5 years ago
Ian 6.1k

If you are referring to the '-g' flag you can simply use the 'fetchChromSizes' script (link is to a 64bit Linux binary).

fetchChromSizes hg19 > hg19.chrom.sizes

'hg19' is an example, but any of the other UCSC genomes can be used, e.g. mm9, sacCer3, etc.

The '>' is a redirect to a new file.

The output is a tab-delimited file of chromosome name followed by it length.

I hope that helps.

ADD COMMENT
2
Entering edit mode
6.1 years ago
Medhat 9.8k

This is too old , but I just saw it :)

you can use this with the bam file itself if it has the header:

samtools view -H my.bam | grep -P '^@SQ' | cut -f 2,3 | awk 'BEGIN{OFS="\t"}{split($1, a, ":"); split($2, b, ":"); print a[2], b[2] }'

ADD COMMENT
3
Entering edit mode

Or with sed:

$ samtools view -H file.bam|grep @SQ|sed 's/@SQ\tSN:\|LN://g'' > genome.txt
ADD REPLY
1
Entering edit mode

There is an unnecessary single quote at the end. Should be: $ samtools view -H file.bam|grep @SQ|sed 's/@SQ\tSN:\|LN://g' > genome.txt

ADD REPLY
0
Entering edit mode

Actually I found also using genomeCoverageBed with Ibam do not require -g :D

ADD REPLY
0
Entering edit mode

Thanks, though newer versions of many bedtools seem generally more forgiving with regard to the -g option. Being stuck with 2.25 it's still required and the samtools faidx option requires htslib...

ADD REPLY
2
Entering edit mode
5.0 years ago
Phoenix Mu ▴ 100

I think this answer might be helpful although I am late. I saw this from the help page of genomeCoverageBed:

Tips:
One can use the UCSC Genome Browser's MySQL database to extract
chromosome sizes. For example, H. sapiens:

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e \
"select chrom, size from hg19.chromInfo" > hg19.genome
ADD COMMENT
1
Entering edit mode
11.5 years ago

samtools faidx output is pretty much what a genome file looks like. So run that on your reference genome.

ADD COMMENT
0
Entering edit mode

When I use this command: $ samtools faidx ref_sequence.fasta I get a ref_sequence.fasta.fai file which is only 32K large Its content are as follows:

gi|49175990|ref|NC_000913.2| 4639675 89 70 71

Just one line.

With this output my depth coverage plot is a bell shaped graph.

ADD REPLY
1
Entering edit mode

Hi, what tool did you use to generate the histogram (represented graphically as in x-axis and y-axis) from the output of genomeCoveragebed?

thanks

ADD REPLY
0
Entering edit mode

Sounds just like what you should expect, right?

ADD REPLY

Login before adding your answer.

Traffic: 2531 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6