Samtools consensus sequence
1
2
Entering edit mode
8.8 years ago
Sissi ▴ 60

Hi guys!

It's probably the same question over and over but I can't find the solutions.

I mapped some contigs on a Plant reference genome with BWA, sorted the file with samtools and got the coverage by typing:

samtools depth my.bam > qry-depth>
wc -l qry-depth

The results of this command / sequence legth * 100 to have the % genome covered.

Then, I wanted to extract the consensus sequence from the bam file:

samtools mpileup -uf reference.fasta my.bam | bcftools call -mv -Oz -o calls.vcf.gz>

tabix calls.vcf.gz> cat reference.fasta | bcftools consensus calls.vcf.gz > con.fasta

And here I am. Now I have a problem and a general question.

  1. For two files I got this error message after the first mpileup commad:

    The QS annotation not present at gi|33590464|gb|AY244516.1|:2123 (NCBI header)
    

    So, I can't get the consensus.

  2. For other samples, I know the contigs don't cover the entire genome, so how does sometools export the consensus? By keeping in the not covered region the reference sequence? Is there a way to visualize this?

I hope I'm clear and not repetitive,

Thanks for your help :)

snp sequence • 9.6k views
ADD COMMENT
0
Entering edit mode

Thank you!

Yes, I realized that I changed the fasta file after mapping and that was the mistake ;)

ADD REPLY
4
Entering edit mode
8.8 years ago
stolarek.ir ▴ 700
samtools mpileup -uf dna.fa sorted.bam | bcftools view -cg - | perl vcfutils.pl vcf2fq -d 2 > cns.fq

samtools 1.9, bcftools 1.9. Make sure you use the same reference fasta to which you mapped your BAM.

Not covered positions represented as N with this pipeline

ADD COMMENT
0
Entering edit mode

But if the reads that have covered the gaps region in the reference genome, how can it output the consensus sequence that replace the gap with the covered reads? Thanks!

ADD REPLY
0
Entering edit mode

I absolutely don't understand your question, but you might be asking how it reports consensus sequeunce where there is no coverage. It reports it based on the reference file, and it is a downside of the pipeline. In the next step you should mask the output and replace those 0 coverage positions with N

ADD REPLY

Login before adding your answer.

Traffic: 1652 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6