Create a bed file bins from fasta?
2
Hi, given a fasta file, I would like to create 1Mb bins until the end of the chromosome and store that in a bed file as such.
1 chr1 0 1000000
3 chr1 1000000 2000000
5 chr1 2000000 3000000
7 chr1 3000000 4000000
9 chr1 4000000 5000000
11 chr1 5000000 6000000
The input fasta is from human hg38 coordinates.
Bonus would be great to do this in R but necessarily.
thanks.
bed
fasta
• 1.1k views
Solution of rpolicastro is the easiest. Here a custom solution.
library(Biostrings)
#/ Some dummy data, here yeast genome
url <- "http://ftp.ensembl.org/pub/release-106/fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna.toplevel.fa.gz"
fasta <- readDNAStringSet(url)
names(fasta) <- gsub(" .*", "", names(fasta))
make_bins <- function(fasta, binsize){
intervals <-
lapply(names(fasta), function(x){
current <- fasta[[x]]
div <- length(current)/binsize
chunks <- floor(div)
rest <- length(current) - chunks*binsize
if(chunks>0){
from <- seq(0, chunks*binsize, binsize)
to <- c(from[2:length(from)])
if(rest==0){
from <- from[1:(length(from)-1)]
} else {
to <- c(to, to[length(to)]+rest)
}
} else {
from <- 0
to <- length(current)
}
data.frame(chr=x, start=from, end=to)
})
intervals <- do.call(rbind, intervals)
return(intervals)
}
b <- make_bins(fasta, 50000)
head(b)
chr start end
1 I 0 50000
2 I 50000 100000
3 I 100000 150000
4 I 150000 200000
5 I 200000 230218
6 II 0 50000
Since you said you want to do this in R, you would first use the GenomicRanges::tileGenome
function to generate a GRanges
object with your genomic tiles. This can then be exported using rtracklayer::export
.
As a generic example.
library("GenomicRanges")
library("rtracklayer")
chrm_sizes <- c(I=1e9, II=2e8)
genome_tiles <- tileGenome(chrm_sizes, tilewidth=1e6)
export(genome_tiles, "genome_tiles.bed", format="bed")
Login before adding your answer.
Traffic: 1303 users visited in the last hour
+1
With
data.frame(unlist(genome_tiles))[,1:3]
to get a data.frame from that directly in R.