Bismark on sample with 2 genomes
1
1
Entering edit mode
7.7 years ago
mia ▴ 70

Hi,

I have a Bisulfite converted sample that includes 2 genomes. One is wheat and other is fungi, for which I have separate assembled genomes. I am wondering how to deal with this of data. Also, to provide more information, the wheat assembled genome is not Bisulfite converted. Can you use Bismark to align againest and unconverted geonme?

Thank you in advance for all your help, Mia

next-gen software error Bismark • 2.7k views
ADD COMMENT
0
Entering edit mode

To my knowledge Bismark will convert the genome to a bisulphite for you when you build the index. It has the option to do this for both strands of the genome (5' -> 3' and 3' -> 5'), which i'm told is only needed if your sequencing was generated using a non-strand-specific protocol. Else the standard 5' -> 3' or "sense" conversion is all that is needed. If that is the case, I highly recommend you do not use Bismark as it is slow and in my hands prone to errors. BWA-meth only does the sense-strand conversion, but is significantly faster than Bismark and produces "cleaner" output files. I'd personally recommend BWA-meth over Bismark any day.

Also, I recently attended a workshop on WGBS in Freiburg where it was mentioned that some special considerations are required for plants. I do not work with plants, so I forgot what those special considerations were, but hopefully the man who taught me can teach you when he sees this post :) It might have only been relevant for the analysis of the data, not the mapping... mrgh. Sorry i can't be of more help :-/

ADD REPLY
1
Entering edit mode

John,

I really appreciate the information you provided and I will look into BWA-meth. Also, can you provide me with the name of the workshop in Freiburg?

Thanks, Mia

ADD REPLY
1
Entering edit mode

Hi John,

I have installed bwa-meth. However, when I am trying to align the samples to I get the following error:

[M::process] read 1269842 sequences (160000092 bp)... 
[M::process] 0 single-end sequences; 1269842 paired-end sequences 
[M::process] read 1269842 sequences (160000092 bp)... 
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 2790, 0, 0) 
[M::mem_pestat] skip orientation FF as there are not enough pairs 
[M::mem_pestat] analyzing insert size distribution for orientation FR... 
[M::mem_pestat] (25, 50, 75) percentile: (31, 37, 45) [M::mem_pestat] low and high boundaries for computing mean and std.dev: (3, 73) 
[M::mem_pestat] mean and std.dev: (37.20, 9.60) 
[M::mem_pestat] low and high boundaries for proper pairs: (1, 87) 
[M::mem_pestat] skip orientation RF as there are not enough pairs 
[M::mem_pestat] skip orientation RR as there are not enough pairs 
[M::mem_process_seqs] Processed 1269842 reads in 4401.036 CPU sec, 276.901 real sec 
Traceback (most recent call last):
 File "/usr/local/bin/bwameth.py", line 4, in import('pkg_resources').
run_script('bwameth==0.10', 'bwameth.py') 
File "/usr/local/lib/python2.7/dist-packages/pkg_resources/init.py", line 738, in run_script self.require(requires)[0].run_script(script_name, ns) 
File "/usr/local/lib/python2.7/dist-packages/pkg_resources/init.py", line 1499, in run_script exec(code, namespace, namespace) 
File "/usr/local/lib/python2.7/dist-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 601, in main(sys.argv[1:]) 
File "/usr/local/lib/python2.7/dist-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 586, in main set_as_failed=args.set_as_failed) 
File "/usr/local/lib/python2.7/dist-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 259, in bwa_mem as_bam(cmd, fa, prefix, calmd, set_as_failed) 
File "/usr/local/lib/python2.7/dist-packages/bwameth-0.10-py2.7.egg/EGG-INFO/scripts/bwameth.py", line 299, in as_bam out.write(str(aln) + '\n') 
IOError: [Errno 32] Broken pipe

I am wondering if you have had this issue before?

Thanks, Mia

ADD REPLY
0
Entering edit mode

Hmm.. I haven't and i'm not totally sure what it means. I've looked at the latest BWAmeth code on github and it doesn't have this specific line of code. It's similar but different. Instead of out.write it's now sys.stdout.write. Perhaps this was changed to fix the problem you are having? ~O~

What version of BWAmeth are you using?

ADD REPLY
1
Entering edit mode

John,

I downloaded the bwa-meth.0.10 according to the install instructions on the github site.

ADD REPLY
0
Entering edit mode

ah, apparently that hasn't been updated since 2014 :/ In Sep 13, 2016 a new version came out, 0.2.0.

The toolshed part is still OK, but the part where you "wget https://github.com/brentp/bwa-meth/archive/v0.10.tar.gz", instead of doing that just click the green "Clone or Download" button and you'll get the latest tar.gz file, whatever that's called, and the rest of the installation instructions should be the same, but apply it to the new file you download from github

ADD REPLY
1
Entering edit mode

John,

Downloading and installing the new version did the trick. Thanks a lot for your help.

Mia

ADD REPLY
0
Entering edit mode

I highly recommend you do not use Bismark as it is slow and in my hands prone to errors.

Interesting - in my hands I have found the opposite. BWA-meth seems to be much more permissive with its alignments than Bismark, which has caused all kinds of weird post-alignment artefacts for me in the past. In contrast I've found Bismark to be more reliable on the whole (though the underlying Bowtie2 aligner is slower than BWA). What kind of errors have you had?

[bwa-meth] produces "cleaner" output files.

Could you elaborate on this? I thought that they produced pretty similar output..

Disclaimer: I used to work with the Bismark author ;) But I am genuinely curious as I've been playing with both tools recently..

ADD REPLY
2
Entering edit mode
7.6 years ago
Phil Ewels ★ 1.4k

Bismark comes with a tool to generate bisulfite converted genomes which you need to use before running it - bismark_genome_preparation (see the docs). This creates bisulfite converted versions of your reference genome.

I would suggest aligning your raw reads against both genomes. If you're worried about cross-species mapping you can compare the BAM files are remove any reads that aligned against both species to avoid artefacts by using read identifiers.

Phil

ADD COMMENT

Login before adding your answer.

Traffic: 2521 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6