Entering edit mode
6.2 years ago
c.clarido
▴
110
Hello community,
Below is a fragment of my VCF file:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT aln.sorted.bam
11561 6 . C X 0 . DP=1;I16=1,0,0,0,38,1444,0,0,6,36,0,0,0,0,0,0;QS=1.000000,0.000000,0.000000,0.000000 PL 0,3,6
11561 7 . C X 0 . DP=1;I16=1,0,0,0,38,1444,0,0,6,36,0,0,1,1,0,0;QS=1.000000,0.000000,0.000000,0.000000 PL 0,3,6
11561 8 . A X 0 . DP=1;I16=1,0,0,0,38,1444,0,0,6,36,0,0,2,4,0,0;QS=1.000000,0.000000,0.000000,0.000000 PL 0,3,6
11561 19 . G A,X 0 . DP=2;I16=1,0,1,0,38,1444,38,1444,6,36,6,36,7,49,13,169;QS=0.500000,0.500000,0.000000,0.000000;RPB=9.668049e-01 PL 1,0,1,4,4,4
11561 337 . ca cAa 0 . INDEL;IS=1,0.200000;DP=5;I16=1,1,1,0,68,2312,40,1600,2,2,15,225,35,697,6,36;QS=0.347826,0.652174,0.000000,0.000000;VDB=6.365454e-02 PL 6,0,0
11561 375 . taaaaaaaa taaaaaaa 0 . INDEL;IS=1,0.200000;DP=5;I16=0,0,1,0,0,0,18,324,0,0,6,36,0,0,25,625;QS=0.000000,1.000000,0.000000,0.000000 PL 6,3,0
11561 390 . A T,G,X 0 . DP=4;I16=0,0,2,1,0,0,114,4332,0,0,8,38,0,0,75,1875;QS=0.000000,0.571429,0.428571,0.000000;VDB=5.227153e-02 PL 5,6,0,5,2,2,5,6,5,5
11561 740 . AC ACC 0 . INDEL;IS=4,0.102564;DP=39;I16=20,7,2,1,741,22083,49,819,457,12337,5,13,524,11426,48,866;QS=0.970732,0.029268,0.000000,0.000000;VDB=5.891593e-02 PL 0,54,149
I need to find the SNPs and INDELs, however I don't quiet understand the X under the ALT. Does it mean X-chomosome? Does the following mean that there is a SNP from G to A on chromosome X? How do I know that significance of this result? Is it the QS=0.500000 or the QUAL=0? Is there anything I need to know when collecting the SNPs and INDELs?
11561 19 . G A,X 0 . DP=2;I16=1,0,1,0,38,1444,38,1444,6,36,6,36,7,49,13,169;QS=0.500000,0.500000,0.000000,0.000000;RPB=9.668049e-01 PL 1,0,1,4,4,4
Thank you in advance!
Hello,
the fragment you are showing is not a vcf. It is the output of samtools/bcftools mpileup. You need to do a
bcftools call
on it. See Devon's answer here.fin swimmer
Hello thank you for the link, however I did the following:
I am using samtools version 0.1.19. I thought mpileup should look like this:
Hello cedrickclarido ,
samtools 0.1.19 is very, very old. Current version is 1.9. I strongly recommend to update. (The easiest way is to use conda.) In the current version
samtools mpileup
is deprecated and replaced bybcftools mpileup
.Beside this: In your code you never use
aln.mpileup
. Instead you do a secondmpileup
and forced the output to be a uncompressedbcf
format. In the newer versions this is the standard (I believe the old output format your are showing isn't used anymore.) Yourbcftools view
command is then just converting thebcf
version of thempileup
result tovcf
. You are missing thebcftools call
step.fin swimmer
Hello, indeed the version is really old. However, as student I am limited to this version as I am working in a school's server where the tool is pre-installed. I can't seems to find any good manuals or documentations to convert pileup to bcf and bcf to vcf. As shown from my previous comment:
I used the first code to generate a PILEUP-file. I found online that without using the flag -u or -g, the command would generate a PILEUP file, otherwise it would create a BCF. According to the manual of version 0.1.19, the bcf file can be converted by using view option, where the input is on default BCF and output is VCF. Did I perhaps misused the command? The project also assigned us to convert the PILEUP to bcf using bcftools, however I can't find any commands that fits the criteria other than reusing the samtools mpileup again but without the -u/-g flags.
Thank you.
You can always request for the latest software to be installed on the school server. In all probability, it's already installed and has a mechanism to be loaded (such as
module load
) or the admin will install it in a couple of minutes. Working with outdated software puts you at a significant disadvantage.I think your problem is understanding what is mpileup for. I will try to explain it:
Using samtools/bcftools for variant discovering is a 2-step process.
In the first step - called mpileup - samtools will generate information about each single position of the reference sequence. This contains how many reads cover this site, if the base on the position matches the reference and what's the quality for each base.
Based on this information
bcftools call
decide in a second step on which position it will have a closer look if there is a variant. To decide if there is a variant more critera than only the number of reads that support the variant and the base quality are needed. For example the mapping quality, the position within the read, compare those metrics between read that support the reference and those supporting the variant, ...So in a very basic pipeline your command should look like this:
In the current version of samtools/bcftools like this:
fin swimmer
I'm not sure that's the right samtools command line. Don't you need -u? I'm pretty sure the regular eyeball-friendly pileup format is not what bcftools has ever wanted as input.
Yes, you're right. Fixed it.
It worked! thank you community!
Thank you the explanation sir, I tried the following and seems to get errors all the time:
Hello cedrickclarido ,
again: Update the version!
I edited my commands as I missed the
view
in thebcftools
command. And as swbarnes2 says the output ofmpileup
must be in abcf
format.fin swimmer
I don't think it is easy to help when we don't have access to the extensive bcftools documentation for the version you're using. There might be changes in arguments between the old
bcftools view
caller and the currentbcftools call
caller that cannot be easily mapped. Please upgrade to the latest version. I don't think you'll need any special permissions to compile it from source.