Hi All,
and thank you for taking the time to try and help.
I am using this pipeline to calculate the coverage of my sequencing across:
Installing programs for coverage analysis
You will need:
- bowtie2 (I used version 2.51) download here: https://bowtie-bio.sourceforge.net/bowtie2/manual.shtml#obtaining-bowtie-2
- samtools (I used version 1.18) download here: http://www.htslib.org/download/
- bcftools (I used version 1.18) download here: http://www.htslib.org/download/
- htslib (I used version 1.18) download here: http://www.htslib.org/download/
Example install, how all of the above are installed:
./configure --prefix=/Path/to/new/folder
I just called put in./configure --prefix=/Users/phr361/tool
cd tools_folder
make
sudo make install
export PATH=/Path/to/new/folder/bin:$PATH
- breseq (I used version x.xx) download here: https://github.com/barricklab/breseq/releases works straight from the download also add it to your path:
export PATH=/Path/to/breseq/bin:$PATH
Raw sequence reads to coverage data
Set a variable for bowtie2 home directory, for me it was: $BT2_HOME=/Users/phr361/bowtie2-2.5.1-macos-arm64
$BT2_HOME/bowtie2-build $BT2_HOME/pUC19/Results/consensus.fasta pUC19
take the fasta from your sequencing data and give it a name, in this case I used the name pUC19, cause well... that's what its called.$BT2_HOME/bowtie2 --end-to-end --very-sensitive -x pUC19 -U ./pUC19/Results/rawdata/rawreads.fq -S $BT2_HOME/pUC19/Results/result.sam
n.b.
- -x to the file you set up in 1.
- -U to the fastq file for the raw reads.
- it doesn't like .fastq files, just change the ending.
- -S where to save the .sam file
- --end-to-end is the type of alignment.
- --very-sensitive is another flag.
these settings were taken from these papers:
- https://academic.oup.com/nar/article/50/15/8401/6513576
- https://academic.oup.com/nar/article/48/1/249/5610345
samtools view -bS $BT2_HOME/pUC19/Results/result.sam > $BT2_HOME/pUC19/Results/result.bam
converts the .sam file to a .bam file, this is in binary, and compresses the data.samtools sort $BT2_HOME/pUC19/Results/result.bam -o $BT2_HOME/pUC19/Results/result.sorted.bam
sorts the binary filesamtools index ./pUC19/Results/result.sorted.bam ./pUC19/Results/result.sorted.bam.bai
create an index file, necessary for the next step. Make sure you save it to the same location / folder.breseq BAM2COV -b ./pUC19/Results/result.sorted.bam -f ./pUC19/Results/consensus.fasta pUC19 -r contig_1:1-2673 --table --resolution 0
- I'm not entirely sure how the -r flag works, but basically use the name in the fasta file, and then 1:the end of your desired range, for me it was the entire sequence.
- --table saves it at a table separated file
My question(s)
I have used the above pipeline before for the rawreads of the sequencing of a short plasmid (pUC19) but now I'm working with a larger genome (roughly 50 kbp) and when I try to run the find command: breseq BAM2COV I get this error:
breseq BAM2COV -b ./lambda_phage/results/24/24_0.sorted.bam -f ./lambda_phage/lambda.fasta lambda -r lambdaX:1-50000 --table --resolution 0
================================================================================
breseq 0.38.1 revision 0ec86d234830 http://barricklab.org/breseq
Active Developers: Barrick JE, Deatherage DE
Contact: <jeffrey.e.barrick@gmail.com>
breseq is free software; you can redistribute it and/or modify it under the
terms the GNU General Public License as published by the Free Software
Foundation; either version 2, or (at your option) any later version.
Copyright (c) 2008-2010 Michigan State University
Copyright (c) 2011-2022 The University of Texas at Austin
If you use breseq in your research, please cite:
Deatherage, D.E., Barrick, J.E. (2014) Identification of mutations
in laboratory-evolved microbes from next-generation sequencing
data using breseq. Methods Mol. Biol. 1151: 165–188.
If you use structural variation (junction) predictions, please cite:
Barrick, J.E., Colburn, G., Deatherage D.E., Traverse, C.C.,
Strand, M.D., Borges, J.J., Knoester, D.B., Reba, A., Meyer, A.G.
(2014) Identifying structural variation in haploid microbial genomes
from short-read resequencing data using breseq. BMC Genomics 15:1039.
================================================================================
COMMAND: BAM2COV
+++ Tabulating coverage...
[fai_fetch] Warning - Reference 0 not found in FASTA file, returning empty sequence
Assertion failed: (m_seq), function reference_sequence, file pileup_base.cpp, line 42.
zsh: abort breseq BAM2COV -b ./lambda_phage/results/24/24_0.sorted.bam -f lambda -r
I think its something to do with my .fasta file: which looks like:
>
GGGCGGCGACCTCGCGGGTTTTCGCTATTTATGAAAATTTTCCGGTTTAAGGCGTTTCCGTTCTTCTTCG
TCATAACTTAATGTTTTTATTTAAAATACCCTCTGAAAAGAAAGGAAACGACAGGTGCTGAAAGCGAGGC
TTTTTGGCCTCTGTCGTTTCCTTTCTCTGTTTTTGTCCGTGGAATGAACAATGGAAGTCAACAAAAAGCA
GCTGGCTGACATTTTCGGTGCGAGTATCCGTACCATTCAGAACTGGCAGGAACAGGGAATGCCCGTTCTG
CGAGGCGGTGGCAAGGGTAATGAGGTGCTTTATGACTCTGCCGCCGTCATAAAATGGTATGCCGAAAGGG... and so on.
Please let me know if you can see any errors, I really appreciate it. Or if you can suggest any other ways to get the coverage analysis I'm looking for.
Best wishes,
Freddie
Thank you! I thought I had tried it, clearly I hadn't properly.
All is working now.