Hi all,
This is my first question in the forum, so I hope it's in the right place. I've looked through related threads on similar topics, but I couldn't find one about this issue that helped me.
I have a problem using Bowtie2 and Samtools. I've assembled paired-end reads using Trinity. To get longer and/or more complete contigs, I've mapped the reads back against these contigs using Bowtie2.
The problem appears when I turn this alignment into a consensus FASTQ file. I use this command line (taken from http://samtools.sourceforge.net/mpileup.shtml):
samtools mpileup -uf ref.fa aln.bam | bcftools view -cg - | vcfutils.pl vcf2fq > cns.fq
I convert from fastq to fasta using:
seqtk fq2fa file.fastq > file.fasta
But too many of my transcripts contain N's: some have none, some are entirely N's, and others have many N's scattered along the sequence. So when I translate them into protein sequences, I get sequences full of X's.
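To see how widespread the problem is, it can help to measure the N content of every transcript before and after any change to the pipeline. Below is a minimal sketch using only awk; the function name n_content is hypothetical (it is not part of samtools, bcftools or seqtk). It accepts line-wrapped FASTA and prints, per sequence, the name, length, number of N/n characters and the N fraction.

```shell
# Per-sequence N content of a FASTA file (wrapped or single-line).
# Output columns: name, length, N count, N fraction.
# Usage: n_content file.fasta   (reads stdin when no file is given)
n_content() {
  awk '
    # print the totals for the sequence read so far
    function report() {
      if (name != "")
        printf "%s\t%d\t%d\t%.3f\n", name, len, nN, (len > 0 ? nN / len : 0)
    }
    /^>/ { report(); name = substr($1, 2); len = 0; nN = 0; next }
    {
      gsub(/[ \t\r]/, "")          # ignore stray whitespace and carriage returns
      len += length($0)
      nN += gsub(/[Nn]/, "&")      # gsub returns the number of N/n characters matched
    }
    END { report() }
  ' "$@"
}
```

For example, `n_content file.fasta | sort -k4,4nr | head` lists the most N-rich transcripts first, which makes it easy to check whether they share something (low coverage, a particular contig family) before changing the consensus step.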
My guess is that bcftools and vcfutils detect all the variant calls and cannot decide which base is the right one.
How can I tell them to select the most frequent base at each position, given that I don't want variant calls? Any other approach (for example, one that avoids bcftools or vcfutils altogether) is also welcome; I can't think of other options myself.
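One alternative along the lines asked for here is to skip variant calling entirely and build a majority-rule consensus straight from a plain-text pileup. This is a minimal sketch, not a samtools or bcftools feature, and the function name majority_consensus is hypothetical. It assumes the six-column text output of `samtools mpileup -f ref.fa aln.bam` (that is, without -u), counts only A/C/G/T (a `.` or `,` counts as the reference base), ignores insertions and deletions, breaks ties in the order A, C, G, T, and writes N wherever no base could be counted or a position is missing from the pileup.

```shell
# Majority-rule consensus from plain-text samtools mpileup output (sketch only).
# Counts A/C/G/T per position, skips indels, ties go to the first of A,C,G,T.
majority_consensus() {
  awk -F '\t' '
    # print the contig built so far as FASTA, 60 bases per line
    function emit(   k) {
      if (ctg == "") return
      print ">" ctg
      for (k = 1; k <= length(seq); k += 60) print substr(seq, k, 60)
    }
    {
      if ($1 != ctg) { emit(); ctg = $1; seq = ""; last = 0 }
      # positions missing from the pileup had no coverage: pad with N
      while (last < $2 - 1) { seq = seq "N"; last++ }
      ref = toupper($3); s = $5
      cnt["A"] = cnt["C"] = cnt["G"] = cnt["T"] = 0
      i = 1
      while (i <= length(s)) {
        c = substr(s, i, 1)
        if (c == "^") { i += 2; continue }   # read start: also skip the mapping-quality char
        if (c == "+" || c == "-") {          # indel: skip the sign, the length and the bases
          j = i + 1; n = ""
          while (substr(s, j, 1) ~ /[0-9]/) { n = n substr(s, j, 1); j++ }
          i = j + n; continue
        }
        if (c == "." || c == ",") c = ref    # match to the reference on either strand
        c = toupper(c)
        if (c in cnt) cnt[c]++               # $, *, N and other symbols are not counted
        i++
      }
      best = "N"; top = 0
      for (k = 1; k <= 4; k++) {
        b = substr("ACGT", k, 1)
        if (cnt[b] > top) { best = b; top = cnt[b] }
      }
      seq = seq best; last = $2
    }
    END { emit() }
  ' "$@"
}
```

Usage would be something like `samtools mpileup -f ref.fa aln.bam | majority_consensus > consensus.fa`. Two limitations of this design: mpileup does not print uncovered positions, so bases after the last covered position of a contig are not written at all; and mpileup's own defaults (for example its base-quality threshold, -Q, and per-file depth cap, -d) still decide which reads are counted. Recent samtools releases (1.16 and later) also ship a `samtools consensus` subcommand that may be worth checking after upgrading.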
The software versions I'm using are:
- bowtie2 -> BOWTIE/2.2.6
- samtools -> SAMTOOLS/0.1.18
I hope the problem is clearly explained. Thanks in advance,
Samu
Unrelated: your version of samtools is old.
I know, but I work on a cluster without administrator permissions. Sometimes it's quite difficult to keep things up to date.
That's what the home directory is for :)
You don't need to be an administrator to install samtools in your home directory.
I know, there's no excuse for that.
Try creating a new folder for your reference (ref.fa in the samtools command), put it there on its own, index it there, and run your commands again with the new reference file, so the only thing that changes is the location of the reference file. This might help.
How could that produce a different result, since all the files are the same? I'll try it; I'm just trying to understand.
I'll update you. Many thanks,
Samu
There is a bug in several versions of samtools (I don't know exactly which ones). It produces an incorrect mpileup: the reference is not recognised properly, and all or most of it is replaced with N. I don't know why moving and re-indexing the reference fixes it, so I'm not sure whether it will help you; maybe you are facing a different error, but it is not hard to test.
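One quick way to run that test, assuming the bug shows up as N in the reference column (column 3) of a plain-text pileup, is to tally per contig how many pileup positions have an N reference base. A minimal sketch; ref_n_check is a hypothetical name, not a samtools command.

```shell
# For each contig in plain-text mpileup output, print:
# contig name, number of pileup positions, positions whose reference base is N.
ref_n_check() {
  awk -F '\t' '
    { total[$1]++; if (toupper($3) == "N") nref[$1]++ }
    END {
      for (c in total)
        printf "%s\t%d\t%d\n", c, total[c], nref[c] + 0   # +0 turns a missing count into 0
    }
  ' "$@"
}
```

For example, run `samtools mpileup -f ref.fa aln.bam | ref_n_check | sort` once with the original reference and once with the moved and re-indexed copy. If contigs with no N's in ref.fa show many N reference positions, the reference is probably not being read correctly; the output order of awk's `for (c in total)` is unspecified, hence the `sort`.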