Converting Bedgraph to coverage
3
0
Entering edit mode
8.1 years ago

I downloaded a dataset from your the UCSC Genome Browser- "Table Browser" with the following input parameters:

"Mouse + Mapping and sequencing".

This is an effort to find a measure of mappability throughout the genome (at bp level) and to align these measurements with a previous experiment output of GC content (also mapped to the genome at a nucleotide level), I want to plot and contrast GC content vs mappability.

As I have already found the GC content using HOMER (where 'annotate.pl' records the GC content at a nucleotide level):

annotatePeaks.pl /projects/reference_genome.bed mm9 -size 20000 -hist 10 -d -CpG > GC_content.txt

I would now like to take the dataset from the UCSC genome browser and plot it against the GC content: The downloaded dataset is in the following format:

chr1    3000000 3000058 1 
chr1    3000058 3000059 0.5 
chr1    3000059 3000060 0.0625 
chr1    3000060 3000073 0.05

This is a bedgraph file and I need it to be converted such that the mappability is documented at a 'per base' level. Note that lengths of nucleotides are skipped which is not a the 'per base' level...

I attempted to do this with "BedTools" as per the following template:

bedtools genomecov [OPTIONS] [-i|-ibam] -g

typical attempt (where '-d' signifies that coverage should be at the 'per base' level)

bedtools genomecov -d -i UCSC_mm9_mappability_dataset -g 

Error: The requested genome file () could not be opened. Exiting!

But I get the above error

(I have also tried without '-g' and and with '-bg' but it says that I can not use 'bg' and '-d' at the same time... I also tried ommitting the -g/bg parameter completely but it gave me the same can not open file error).

Can anyone recommend a different software that would fulfil my task or how to get my current strategy working?

alignment mappabilitty • 3.5k views
ADD COMMENT
2
Entering edit mode
8.1 years ago

the -g option is needed when you need to report 0 coverage regions using bedtools genomecov. if that is not the case, and your only requirement is to split each coverage region into a per-base file, then you only need a quick awk/perl transformation:

perl -lane 'for ($F[1]+1..$F[2]) { print "$F[0]\t$_\t$F[3]" }' input.txt
ADD COMMENT
1
Entering edit mode
8.1 years ago
Dan D 7.4k

You're very close. Don't give up on bedtools! The "genome file" is just a list of contigs and their sizes. Run this on your command line:

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e "select chrom, size from mm9.chromInfo"  > mm9.genome

And you've got what you need for the -g argument. Or just copy and paste this into a file:

chrom   size
chr1    197195432
chr2    181748087
chr3    159599783
chr4    155630120
chr5    152537259
chr6    149517037
chr7    152524553
chr8    131738871
chr9    124076172
chr10   129993255
chr11   121843856
chr12   121257530
chr13   120284312
chr14   125194864
chr15   103494974
chr16   98319150
chr17   95272651
chr18   90772031
chr19   61342430
chrX    166650296
chrY    15902555
chrM    16299
chr13_random    400311
chr16_random    3994
chr17_random    628739
chr1_random     1231697
chr3_random     41899
chr4_random     160594
chr5_random     357350
chr7_random     362490
chr8_random     849593
chr9_random     449403
chrUn_random    5900358
chrX_random     1785075
chrY_random     58682461
ADD COMMENT
0
Entering edit mode

Thanks so much, I tried the following: (after running the command you suggested to create the size file)

bedtools genomecov -i mm9_mappability -g mm9.genome

It returns the following error; "Less than the req'd two fields were encountered in the genome file (mm9_mappability_data_points) at line 100098. Exiting."

I tried looking through the bedgraph documentation but can't find anything where to specify such a size-documenting file... Thanks

ADD REPLY

Login before adding your answer.

Traffic: 2074 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6