Error creating a custom BSgenome package for Olea europaea
8.2 years ago

I am analysing a set of MeDIP-seq data using the MeQA pipeline. The pipeline crashes because there is no Bioconductor BSgenome package available for Olea europaea, so I am creating my own custom BSgenome package following the Bioconductor manual.

Here is my package seed file:

BSgenome.Oeuropaea.IOGC.v1_seed-file.txt
Package: BSgenome.Oeuropaea.IOGC.v1
Title: Full genome sequences for Olea Europaea var. sylvestris (IOGC version 1)
Description: Full genome sequences for Olea Europaea var. sylvestris (olive) as provided by IOGC (v1, 2016) and stored in olea europaea genome browser
Version: 1.0
organism: Olea Europaea var.sylvestris
common_name: Olive
provider: IOGC
provider_version: v1
release_date: 2016
release_name: OE1.0
source_url: http://h3abionet.fso.ump.ma/cgi-bin/gb2/gbrowse/olea_europea/
organism_biocview: Olea_europaea
BSgenomeObjname: Oeuropaea
seqs_srcdir: /opt/exp_soft/magrid/MeQA-1.0.0/oussama/Olea-Europaea_BS_genome_package
seqnames: paste("chr", c(1:23, "Un", paste(c(1:23, "Un"))), sep="")
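One thing worth checking in this seed file: the seqnames field is evaluated as R code, and as written it lists every chromosome name twice. The base-R snippet below evaluates that expression and compares it with a de-duplicated version. The name `olea_seqnames_fixed` is hypothetical, and whether the duplicates are what triggers this particular DNAString error is an assumption; later in this thread, changing seqnames to paste0(1:10) is what fixed a maize seed file built the same way.

```r
# seqnames expression from the seed file above
olea_seqnames <- paste("chr", c(1:23, "Un", paste(c(1:23, "Un"))), sep = "")

length(olea_seqnames)          # 48 names in total
length(unique(olea_seqnames))  # but only 24 distinct ones
anyDuplicated(olea_seqnames)   # 25: position of the first repeated name ("chr1")

# Hypothetical corrected value: one name per FASTA file (chr1.fa ... chr23.fa, chrUn.fa)
olea_seqnames_fixed <- paste0("chr", c(1:23, "Un"))
stopifnot(!anyDuplicated(olea_seqnames_fixed))
```

Under that assumption, the seed line would read `seqnames: paste0("chr", c(1:23, "Un"))`.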

When I run:

library(BSgenome)
forgeBSgenomeDataPkg("path/to/my/seed")

the FASTA files are loaded, but then I get the following error:

Loading 'chr23' sequence from FASTA file '/opt/exp_soft/magrid/MeQA-1.0.0/a/Olea-Europaea_BS_genome_package/chr23.fa' ... DONE
Loading 'chrUn' sequence from FASTA file '/opt/exp_soft/magrid/MeQA-1.0.0/a/Olea-Europaea_BS_genome_package/chrUn.fa' ... DONE
Error in XVector:::new_XVectorList_from_list_of_XVector(tmp_class, x) :
  all elements in 'x' must be DNAString objects

Do I need to convert my FASTA files into DNAString objects? If so, how? Any help is very welcome.

Cheers

Oussama

8.2 years ago

Yes, you need to. Here is an example of how to do it:

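# biocLite.R is deprecated; BiocManager is the current Bioconductor installer
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("Biostrings")
library(Biostrings)
# Build a DNAString object from a character string ("-" and "N" are valid DNA letters)
dnastring <- DNAString("TTGAAA-CTC-N")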

Hi

My forgeBSgenomeDataPkg("seed.file") call ran without errors, but when I run R CMD check BSgenome.Aiptasia.CC7 I get the following error:

* using log directory ‘/home/nawazk/KAUST_Projects/Aiptasia_epigenetic_analysis/results/UMR_identification/Aiptasia_CC7/BSgenome.Aiptasia.CC7.Rcheck’
* using R version 3.6.1 (2019-07-05)
* using platform: x86_64-pc-linux-gnu (64-bit)
* using session charset: UTF-8
* checking for file ‘BSgenome.Aiptasia.CC7/DESCRIPTION’ ... OK
* this is package ‘BSgenome.Aiptasia.CC7’ version ‘1.0.’
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for executable files ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking whether package ‘BSgenome.Aiptasia.CC7’ can be installed ... ERROR
Installation failed.
See ‘/home/nawazk/KAUST_Projects/Aiptasia_epigenetic_analysis/results/UMR_identification/Aiptasia_CC7/BSgenome.Aiptasia.CC7.Rcheck/00install.out’ for details.
* DONE

Status: 1 ERROR
See
  ‘/home/nawazk/KAUST_Projects/Aiptasia_epigenetic_analysis/results/UMR_identification/Aiptasia_CC7/BSgenome.Aiptasia.CC7.Rcheck/00check.log’
for details.
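R CMD check only reports that the installation failed; the actual reason is written to 00install.out inside the .Rcheck directory named in the output. A small base-R helper for printing the end of that log is sketched below; the function name `tail_install_log` and the default of 30 lines are illustrative choices, not part of any package.

```r
# Return the last `n` lines of the install log that R CMD check
# writes into <package>.Rcheck/00install.out
tail_install_log <- function(check_dir, n = 30L) {
  log_file <- file.path(check_dir, "00install.out")
  if (!file.exists(log_file)) {
    stop("no 00install.out found in ", check_dir)
  }
  log_lines <- readLines(log_file, warn = FALSE)
  utils::tail(log_lines, n)
}

# Usage, from the directory where R CMD check was run:
# cat(tail_install_log("BSgenome.Aiptasia.CC7.Rcheck"), sep = "\n")
```

Posting those lines gives others far more to go on than the one-line check summary.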

Hello Vijay Lakhujani, thank you so much for your valuable response.


Please upvote the response if it really helped you. It will help others who might face the same issue.


Can you please provide more details?


Hi,

I get the same error as Oussama Badad, even after following your suggestion to load Biostrings. Do you have any further suggestions or ideas about what might be wrong?

Thank you in advance!
R


Can you please share what you tried? What I gave was a generic example; please share the exact commands you tried.


Here is my seed file:

Package: BSgenome.Zmays.EnsemblPlants.AGPv4r32
Title: Zea mays (EnsemblPlants AGPv4 release 32)
Description: Zea mays full genome as provided by EnsemblPlants (AGPv4, release 32)
Version: 4.32
organism: Zea mays
common_name: maize
provider: EnsemblPlants
provider_version: 4.32
release_date: Aug. 2016
release_name: AGPv4
source_url: ftp://ftp.ensemblgenomes.org/pub/release-32/plants/fasta/zea_mays/dna/
organism_biocview: Zea_mays
BSgenomeObjname: Zmays
seqs_srcdir: ~/Documents/ref/AGPv4
seqfiles_prefix: Zea_mays.AGPv4.dna.chromosome.
seqfiles_suffix: .fa
seqnames: paste(c(1:10, paste(c(1:10), sep="")), sep="")

Then, I run:

library(BSgenome)
forgeBSgenomeDataPkg("path/to/my/seed")

...which gives the following error message:

...
Loading '9' sequence from FASTA file '~/Documents/ref/AGPv4/Zea_mays.AGPv4.dna.chromosome.9.fa' ... DONE
Loading '10' sequence from FASTA file '~/Documents/ref/AGPv4/Zea_mays.AGPv4.dna.chromosome.10.fa' ... DONE
Error in XVector:::new_XVectorList_from_list_of_XVector(tmp_class, x) : 
  all elements in 'x' must be DNAString objects

So as you suggested, I tried:

source("http://bioconductor.org/biocLite.R")
biocLite("Biostrings")
require(Biostrings)

...which still gives me the same error message:

...
Loading '9' sequence from FASTA file '~/Documents/ref/AGPv4/Zea_mays.AGPv4.dna.chromosome.9.fa' ... DONE
Loading '10' sequence from FASTA file '~/Documents/ref/AGPv4/Zea_mays.AGPv4.dna.chromosome.10.fa' ... DONE
Error in XVector:::new_XVectorList_from_list_of_XVector(tmp_class, x) : 
  all elements in 'x' must be DNAString objects

Thanks again!


Here is a slight update on my situation.

I haven't really solved my problem yet, but I found a workaround (namely, using another computer).
Apparently, with newer versions of the packages I run into the error mentioned above. I also noticed that it never saves the seqlengths or the compressed data. The versions were:

R version 3.2.2 (2015-08-14) -- "Fire Safety"
BSgenome_1.38.0
Biostrings_2.38.4

But if I use:

R version 3.0.3 (2014-03-06) -- "Warm Puppy"
BSgenome_1.30.0
Biostrings_2.30.1

Then it runs without an error message.
However, when I check whether the final package can be installed with

R CMD check BSgenome.Zmays.EnsemblPlants.AGPv4r32_4.32.tar.gz

I now get:

* installing *source* package 'BSgenome.Zmays.EnsemblPlants.AGPv4r32' ...
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
Error : .onLoad failed in loadNamespace() for 'BSgenome.Zmays.EnsemblPlants.AGPv4r32', details:
  call: .normargSeqnames(seqnames)
  error: supplied 'seqnames' cannot contain duplicated sequence names
Error: loading failed
Execution halted

I would appreciate it if someone could point out what I need to change or what I should try.
Thanks!
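Because the outcome in this thread differs between R/Bioconductor installations, reporting exact versions helps others reproduce it. sessionInfo() prints the full picture; the base-R sketch below reports only the packages involved, without attaching them, and returns NA for any that are not installed. `pkg_versions` is a hypothetical helper name, not a BSgenome function.

```r
# Report installed versions of selected packages without attaching them;
# packages that are not installed come back as NA
pkg_versions <- function(pkgs) {
  vapply(pkgs, function(p) {
    if (nzchar(system.file(package = p))) {
      as.character(utils::packageVersion(p))
    } else {
      NA_character_
    }
  }, character(1))
}

cat(R.version.string, "\n")
print(pkg_versions(c("BSgenome", "Biostrings", "XVector")))
```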


Hello again, I received a reply on Bioconductor, and changing the seqnames line in the seed file to:

seqnames: paste0(1:10)

solved my problem.
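The original maize line, paste(c(1:10, paste(c(1:10), sep="")), sep=""), evaluates to "1" through "10" twice, which matches the "cannot contain duplicated sequence names" error above; paste0(1:10) yields each name once. To catch this before running forgeBSgenomeDataPkg(), the seqnames field can be evaluated on its own. A minimal base-R sketch, assuming the seed file is in the DCF format shown in this thread (which read.dcf() parses); `check_seed_seqnames` is a hypothetical helper, not part of BSgenome:

```r
# Parse a seed file (DCF format), evaluate its 'seqnames' expression,
# and stop if any sequence name appears more than once
check_seed_seqnames <- function(seed_file) {
  fields <- read.dcf(seed_file)
  expr <- fields[1, "seqnames"]
  seqnames <- eval(parse(text = expr), envir = globalenv())
  dups <- unique(seqnames[duplicated(seqnames)])
  if (length(dups) > 0L) {
    stop("duplicated seqnames: ", paste(dups, collapse = ", "))
  }
  seqnames
}

# Usage with the corrected maize line (temporary file for illustration)
seed <- tempfile(fileext = ".dcf")
writeLines(c("Package: BSgenome.Zmays.EnsemblPlants.AGPv4r32",
             "seqnames: paste0(1:10)"), seed)
check_seed_seqnames(seed)
```

Run against the original maize seed file, the same call stops with "duplicated seqnames: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10".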
