Dear all, I am trying to do read mapping to construct cp genomes using a reference genome and performing the mapping with bwa-mem.
I am running this command bwa mem -M -B 4 $REF $R1 $R2 > output.bam
-M to mark shorter split as secondary and -B 4 for mismatch penalty. I have two questions:
1)after I have run bwa and my reference genome fasta file is 160kb and and my fastq files are 2.70 GB should I get a bam file of 5 GB or should it be smaller?
2) when I try to convert the bam file into a fasta using samtools is there a way to have the reference genome on the top and then the reads? how should the fasta file look like in order to then use it to annotate the genomes for example in Geneious?
I hope my question are not too vague or basic. thanks a lot
The size of the bam depends on how many reads mapped to the reference.
Also, be aware that bwa mem outputs a SAM file by default, not a BAM.
That is not true. All reads will be at the SAM/BAM file, they simply will be flagged as unmapped. Still, default (and only) output of BWA mem is SAM not BAM. Simply count reads in one of the fastq files (
wc -l fastq.fastq
), divide this by 4, and compare this withsamtools flagstat output.sam
thanks for your answer! I guess my main issue is that after obtaining the output.sam files of my series of reads.fastq I would like to end with a fasta multi alignment file containing the constructed cp genomes and I have looked for a pipeline to do that but with no useful solutions. so I after performing the bwa-mem I can't picture the further steps.
Once you obtain the alignments (do the SAM/BAM conversion etc.) you will need to call variants and then generate the consensus. This process is described on this page.
That said if you have access to Geneious licenses then why don't you do this entire thing in Geneious. Normally people won't recommend commercial products but if you are new to command line it may be much easier to get this done in Geneious.
hi I am doing the sorting of the bam file through:
but I keep getting this output:
?BC?EN? ?0???~b????i???
??%?:f?R? A??O6{??p??8?&?>?
S?Sl?r??ԭ??X?~?0?v????/?u?_96???BCB=?}?d?uW?xw]?K?jʗ?o???뭊g???RTO?]?3f?t??C??c'?(xE?'$??tHA???D?????? EB|#"!?#AP??E> 8?s????V??[]?w?^????w?s?_~?^??^?????Q?7??????hDe6??r???G?gg??:??uP?ʸLfQ??I?G_5????B?wF?_??jr???|????5?,????x?o??V????z?Z-????w?????xy???͏?????uW??G9??q|OM[/_??>??n??.?????G???'?0???^???/??7??s?>y??W?x|?fI???>?
Qi? N??a??A?????´ ??}??|? z??w4???=??Fa9 ?(???4L{o??? ??w?4?'g?&L??B??P??C?du??P?G[???D???:e????q?2u.5u?@?o????qףu???R?;I?G?%? ????f?+?)?S?qp?^:??8?ԡwj?}??|GM]?-:dS?)N?????#??@T)k?$|d~????O[?~?????u?5?gi??"?^b?? *?M?Dy@?fT???9>7G???=?pr?H?3?n ??? S?O?÷x?C?2???>A????F?<l r????5y?%??4zr??="" ?^??????,??t?gyth??????jЯe?="" ???="" anybody="" knows="" why?="" i="" though="" you="" need="" to="" sort="" before="" doing="" the="" variant="" calling="" and="" getting="" the="" consensus="" sequence="" but="" this="" look="" really="" weird.<="" p="">Of course. BAM is a binary (compressed) version of the SAM format. The default output of BWA is SAM.
Or more elegant:
Right! I run the bwa mem and I got the SAM file, then I thought to convert it in BAM before sorting, thanks a lot!
Also, if you wish to visualise the content of a BAM file you can use "samtools view out.bam"
about variant calling, after sorting I followed the pipeline of the page you gave me and I get a (consensus sequence with the reference genome header, don't know why) and when in the end I run :
I get this messages for different sites The site MG564755.1:122 overlaps with another variant, skipping...do you know whether is it an error and if yes what kind of error might be?
Yes, you're right (I was, wrongly, thinking about STAR, which produces by default a sam which contains only the aligned read)
Really, STAR does that? Ok good to know, that means I will never use aligned BAM files from STAR to get back the original reads in fastq format. Thanks for letting me know :-D
With STAR you can use the "--outReadsUnmapped" option to get the unmapped reads in a separate bam. I guess that you can get the original fastq by using both the "Aligned.out.sam" and the "unmapped.bam"