Dear Biostars,
My aim is to generate start and end coordinates for each chromosome of the human reference genome (hg38) to use in downstream analyses.
Using the R packages BSgenome and GenomicFeatures, I generated the coordinates of the hg38 genome and saved them as a .bed file.
However, I think the coordinates in my .bed file are inaccurate, because BED files use 0-based coordinates while my GRanges coordinates start at 1.
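To make the mismatch concrete: IRanges/GRanges store 1-based, closed intervals, whereas BED stores 0-based, half-open intervals, so only the start value differs between the two. The base-R sketch below uses two hypothetical helpers, `one_based_to_bed()` and `bed_to_one_based()` (not from any package), on a 10 bp interval at the start of a chromosome.

```r
# Hypothetical helpers (not from any package) converting between 1-based closed
# intervals, as used by IRanges/GRanges, and BED's 0-based half-open intervals.
one_based_to_bed <- function(start, end) {
  # Only the start shifts: an inclusive 1-based end equals an exclusive 0-based end.
  data.frame(start = start - 1L, end = end)
}

bed_to_one_based <- function(start, end) {
  data.frame(start = start + 1L, end = end)
}

# The first 10 bases of a chromosome are 1..10 in 1-based closed coordinates.
bed <- one_based_to_bed(start = 1L, end = 10L)
print(bed)

# Width is end - start + 1 in 1-based closed coordinates, but just end - start in BED.
bed$end - bed$start
```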
Question: any suggestions on how to generate a 0-based .bed file for the chromosomes of the human reference genome?
Code used:
################ beginning of code #################
library(BSgenome)
library(GenomicFeatures)
library(GenomeInfoDb)  # provides getChromInfoFromUCSC()
# install a local copy of BSgenome.Hsapiens.UCSC.hg38 from Bioconductor (only needed once)
if (interactive()) {
  if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
  BiocManager::install("BSgenome.Hsapiens.UCSC.hg38")
}
## load BSgenome.Hsapiens.UCSC.hg38 into the current session
library(BSgenome.Hsapiens.UCSC.hg38)
## fetch hg38 chromosome names (column 1) and lengths (column 2) from UCSC; requires an internet connection
hg38 <- getChromInfoFromUCSC("hg38")
## build GRanges coordinates for hg38 (IRanges are 1-based and closed, so each chromosome spans 1..length)
hg38.cordinates <- GRanges(seqnames = hg38[, 1], ranges = IRanges(start = 1, end = hg38[, 2]))
ranges(hg38.cordinates)
### convert to a data frame (chr, start, end, width) and write out as a BED file
hg38.cordinates.dt <- cbind(chr = hg38[, 1], as.data.frame(ranges(hg38.cordinates)))
dim(hg38.cordinates.dt)
write.table(hg38.cordinates.dt, file = "/path/to/write_to/hg38.cordinates.bed", sep = "\t", quote = FALSE, row.names = FALSE, col.names = c("chr", "start", "end", "width"))
########################## end of code ################################
0-based simply means that you have to subtract 1 from the start coordinate. BED intervals are 0-based and half-open, so for a whole chromosome the start is 0 and the end is the length of the chromosome; the end coordinate is the same in both conventions.
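Applied to whole chromosomes, every BED line is simply `chrom  0  length`. A minimal base-R sketch, assuming the chromosome sizes come as a data frame with columns `chrom` and `size` (recent GenomeInfoDb versions name the first two columns returned by `getChromInfoFromUCSC("hg38")` this way, but check your version). The helper `chrom_sizes_to_bed()` is hypothetical, and the two example rows use the hg38 lengths of chr1 and chrM.

```r
# Hypothetical helper: build a 3-column BED (chrom, start, end) covering whole chromosomes.
# Assumes `chrom_sizes` has columns `chrom` (name) and `size` (length in bp).
chrom_sizes_to_bed <- function(chrom_sizes) {
  data.frame(
    chrom = as.character(chrom_sizes$chrom),
    start = 0L,                             # every chromosome starts at 0 in BED
    end   = as.integer(chrom_sizes$size),   # half-open end equals the chromosome length
    stringsAsFactors = FALSE
  )
}

# Small self-contained example with two hg38 chromosomes.
chrom_sizes <- data.frame(chrom = c("chr1", "chrM"),
                          size  = c(248956422L, 16569L))
bed <- chrom_sizes_to_bed(chrom_sizes)

# BED files normally have no header row, hence col.names = FALSE.
out <- tempfile(fileext = ".bed")
write.table(bed, file = out, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
readLines(out)
```

With the objects from the question, `chrom_sizes_to_bed(hg38)` followed by the same `write.table()` call should produce the full hg38 file, provided the column names match the assumption above. Dropping the header and the width column keeps the file in plain 3-column BED, which most downstream tools expect.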