Installing programs for coverage analysis

Question

Trouble with breseq Bam2Cov

0

Entering edit mode

18 months ago

Freddie • 0

Hi All,

and thank you for taking the time to try and help.

I am using this pipeline to calculate the coverage of my sequencing across:

Installing programs for coverage analysis

You will need:

bowtie2 (I used version 2.51) download here: https://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#obtaining-bowtie-2
samtools (I used version 1.18) download here: http://www.htslib.org/download/
bcftools (I used version 1.18) download here: http://www.htslib.org/download/
htslib (I used version 1.18) download here: http://www.htslib.org/download/

Example install, how all of the above are installed:

./configure --prefix=/Path/to/new/folder I just called put in ./configure --prefix=/Users/phr361/tool
cd tools_folder
make
sudo make install
export PATH=/Path/to/new/folder/bin:$PATH

breseq (I used version x.xx) download here: https://github.com/barricklab/breseq/releases works straight from the download also add it to your path:

export PATH=/Path/to/breseq/bin:$PATH

Raw sequence reads to coverage data

Set a variable for bowtie2 home directory, for me it was: $BT2_HOME=/Users/phr361/bowtie2-2.5.1-macos-arm64

$BT2_HOME/bowtie2-build $BT2_HOME/pUC19/Results/consensus.fasta pUC19 take the fasta from your sequencing data and give it a name, in this case I used the name pUC19, cause well... that's what its called.
$BT2_HOME/bowtie2 --end-to-end --very-sensitive -x pUC19 -U ./pUC19/Results/rawdata/rawreads.fq -S $BT2_HOME/pUC19/Results/result.sam

n.b.

-x to the file you set up in 1.
-U to the fastq file for the raw reads.
it doesn't like .fastq files, just change the ending.
-S where to save the .sam file
--end-to-end is the type of alignment.
--very-sensitive is another flag.

these settings were taken from these papers:

samtools view -bS $BT2_HOME/pUC19/Results/result.sam > $BT2_HOME/pUC19/Results/result.bam converts the .sam file to a .bam file, this is in binary, and compresses the data.
samtools sort $BT2_HOME/pUC19/Results/result.bam -o $BT2_HOME/pUC19/Results/result.sorted.bam sorts the binary file
samtools index ./pUC19/Results/result.sorted.bam ./pUC19/Results/result.sorted.bam.bai create an index file, necessary for the next step. Make sure you save it to the same location / folder.
breseq BAM2COV -b ./pUC19/Results/result.sorted.bam -f ./pUC19/Results/consensus.fasta pUC19 -r contig_1:1-2673 --table --resolution 0

I'm not entirely sure how the -r flag works, but basically use the name in the fasta file, and then 1:the end of your desired range, for me it was the entire sequence.
--table saves it at a table separated file

My question(s)

I have used the above pipeline before for the rawreads of the sequencing of a short plasmid (pUC19) but now I'm working with a larger genome (roughly 50 kbp) and when I try to run the find command: breseq BAM2COV I get this error:

breseq BAM2COV -b ./lambda_phage/results/24/24_0.sorted.bam -f ./lambda_phage/lambda.fasta lambda -r lambdaX:1-50000 --table --resolution 0
================================================================================
breseq 0.38.1  revision 0ec86d234830   http://barricklab.org/breseq

Active Developers: Barrick JE, Deatherage DE
Contact:           <jeffrey.e.barrick@gmail.com>

breseq is free software; you can redistribute it and/or modify it under the
terms the GNU General Public License as published by the Free Software 
Foundation; either version 2, or (at your option) any later version.

Copyright (c) 2008-2010 Michigan State University
Copyright (c) 2011-2022 The University of Texas at Austin

If you use breseq in your research, please cite:

 Deatherage, D.E., Barrick, J.E. (2014) Identification of mutations
  in laboratory-evolved microbes from next-generation sequencing
 data using breseq. Methods Mol. Biol. 1151: 165–188.

 If you use structural variation (junction) predictions, please cite:

  Barrick, J.E., Colburn, G., Deatherage D.E., Traverse, C.C.,
  Strand, M.D., Borges, J.J., Knoester, D.B., Reba, A., Meyer, A.G. 
  (2014) Identifying structural variation in haploid microbial genomes 
  from short-read resequencing data using breseq. BMC Genomics 15:1039.
================================================================================
COMMAND: BAM2COV
+++   Tabulating coverage...
[fai_fetch] Warning - Reference 0 not found in FASTA file, returning empty sequence
Assertion failed: (m_seq), function reference_sequence, file pileup_base.cpp, line 42.
zsh: abort      breseq BAM2COV -b ./lambda_phage/results/24/24_0.sorted.bam -f  lambda -r

I think its something to do with my .fasta file: which looks like:

>
GGGCGGCGACCTCGCGGGTTTTCGCTATTTATGAAAATTTTCCGGTTTAAGGCGTTTCCGTTCTTCTTCG
TCATAACTTAATGTTTTTATTTAAAATACCCTCTGAAAAGAAAGGAAACGACAGGTGCTGAAAGCGAGGC
TTTTTGGCCTCTGTCGTTTCCTTTCTCTGTTTTTGTCCGTGGAATGAACAATGGAAGTCAACAAAAAGCA
GCTGGCTGACATTTTCGGTGCGAGTATCCGTACCATTCAGAACTGGCAGGAACAGGGAATGCCCGTTCTG
CGAGGCGGTGGCAAGGGTAATGAGGTGCTTTATGACTCTGCCGCCGTCATAAAATGGTATGCCGAAAGGG... and so on.

Please let me know if you can see any errors, I really appreciate it. Or if you can suggest any other ways to get the coverage analysis I'm looking for.

Best wishes,

Freddie

bamtools bam2cov breseq • 861 views

ADD COMMENT • link 18 months ago by Freddie • 0

score 1 · Answer 1 · 2023-08-28

If you are only looking to get coverage then use mosdepth (LINK).

I think its something to do with my .fasta file

Correct. Fasta file requires a header (it can be anything but it can't be a blank line following > like you show above)

So change

>
GGGCGGCGACCTCGCGGGTTTTCGCTATTTATGAAAATTTTCCGGTTTAAGGCGTTTCCGTTCTTCTTCG
TCATAACTTAATGTTTTTATTTAAAATACCCTCTGAAAAGAAAGGAAACGACAGGTGCTGAAAGCGAGGC
TTTTTGGCCTCTGTCGTTTCCTTTCTCTGTTTTTGTCCGTGGAATGAACAATGGAAGTCAACAAAAAGCA
GCTGGCTGACATTTTCGGTGCGAGTATCCGTACCATTCAGAACTGGCAGGAACAGGGAATGCCCGTTCTG

to

>My_sequence (or anything else)
GGGCGGCGACCTCGCGGGTTTTCGCTATTTATGAAAATTTTCCGGTTTAAGGCGTTTCCGTTCTTCTTCG
TCATAACTTAATGTTTTTATTTAAAATACCCTCTGAAAAGAAAGGAAACGACAGGTGCTGAAAGCGAGGC
TTTTTGGCCTCTGTCGTTTCCTTTCTCTGTTTTTGTCCGTGGAATGAACAATGGAAGTCAACAAAAAGCA
GCTGGCTGACATTTTCGGTGCGAGTATCCGTACCATTCAGAACTGGCAGGAACAGGGAATGCCCGTTCTG

If you used this reference file to create indexes for bowtie2 and align the data, then you may end up having to redo the analysis. With an empty fasta header many of the tools downstream will not work right.

score 1 · Answer 2 · 2023-08-28

You are missing ACCESSION i.e. fasta header. You can change this

>
GGGCGGCGACCTCGCGGGTTTTCGCTATTTATGAAAATTTTCCGGTTTAAGGCGTTTCCGTTCTTCTTCG
TCATAACTTAATGTTTTTATTTAAAATACCCTCTGAAAAGAAAGGAAACGACAGGTGCTGAAAGCGAGGC
TTTTTGGCCTCTGTCGTTTCCTTTCTCTGTTTTTGTCCGTGGAATGAACAATGGAAGTCAACAAAAAGCA
GCTGGCTGACATTTTCGGTGCGAGTATCCGTACCATTCAGAACTGGCAGGAACAGGGAATGCCCGTTCTG
CGAGGCGGTGGCAAGGGTAATGAGGTGCTTTATGACTCTGCCGCCGTCATAAAATGGTATGCCGAAAGGG... and so on.

into this-

>lambdaX
GGGCGGCGACCTCGCGGGTTTTCGCTATTTATGAAAATTTTCCGGTTTAAGGCGTTTCCGTTCTTCTTCG
TCATAACTTAATGTTTTTATTTAAAATACCCTCTGAAAAGAAAGGAAACGACAGGTGCTGAAAGCGAGGC
TTTTTGGCCTCTGTCGTTTCCTTTCTCTGTTTTTGTCCGTGGAATGAACAATGGAAGTCAACAAAAAGCA
GCTGGCTGACATTTTCGGTGCGAGTATCCGTACCATTCAGAACTGGCAGGAACAGGGAATGCCCGTTCTG
CGAGGCGGTGGCAAGGGTAATGAGGTGCTTTATGACTCTGCCGCCGTCATAAAATGGTATGCCGAAAGGG... and so on.