Large BAM file but small VarScan mpileup2snp output
9.6 years ago
bingnas

Hello all

I have a BAM file (file.bam, about 4.6 GB) and converted it to a pileup file (file.pileup, about 8.3 GB) with samtools:

samtools mpileup -B -f genome.fa file.bam > file.pileup

Then I used VarScan to call variants:

java -Xmx2g -jar $VARSCAN_DIR/VarScan.v2.3.7.jar mpileup2snp file.pileup --min-coverage 10 --min-base-qual 30 --output-vcf 1 > sample1.vcf

But the resulting sample1.vcf seems too small: only 11,450 KB.

So, does anyone know how I can make sure that the BAM file is suitable for generating a pileup file?

And how can I tell whether the pileup file is good input for VarScan?

Thank you in advance for your help
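For the pileup question, a rough structural check is possible before running VarScan at all. The sketch below assumes single-sample `samtools mpileup` output with six tab-separated columns (chromosome, position, reference base, depth, read bases, base qualities) and assumes the quality column holds one character per read, so its length should match the depth. The function name `check_pileup` is only illustrative, and it catches malformed or truncated lines, not biologically poor data.

```shell
# Rough structural check of a single-sample pileup from `samtools mpileup`.
# Assumes 6 tab-separated columns: chrom, pos, ref base, depth, read bases,
# base qualities, with one quality character per read.
check_pileup() {
  awk -F'\t' '
    NF != 6                      { bad_cols++;  next }  # wrong column count
    $2 !~ /^[0-9]+$/             { bad_pos++;   next }  # position is not an integer
    $4 !~ /^[0-9]+$/             { bad_depth++; next }  # depth is not an integer
    $4 > 0 && length($6) != $4   { bad_qual++ }         # qualities do not match depth
    END {
      printf "lines=%d bad_cols=%d bad_pos=%d bad_depth=%d qual_mismatch=%d\n",
             NR, bad_cols, bad_pos, bad_depth, bad_qual
    }' "$1"
}
```

For example, `check_pileup file.pileup`; non-zero `bad_*` or `qual_mismatch` counts would suggest a malformed or truncated file.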

I am now satisfied with the result. Thank you both so much!

It could also be that most positions fall below the filters (--min-coverage 10 and a minimum Phred base quality of 30), as Devon pointed out above. I'm just guessing from the filters, though. Did you run QC?
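To see whether the filters explain the small output, a rough count of passing positions can be taken straight from the pileup. This is a sketch, not VarScan's actual logic: it assumes column 6 holds Phred+33 ASCII qualities (as samtools prints them) and treats a position as passing when at least `min_cov` of its bases have quality of at least `min_qual`, which only approximates how VarScan applies `--min-coverage` and `--min-base-qual`. `count_passing` is a hypothetical helper name.

```shell
# Approximate how many pileup positions would survive VarScan-style filters.
# Assumes base qualities in column 6 are Phred+33 ASCII.
# Arguments: pileup file, minimum coverage (default 10), minimum base quality (default 30).
count_passing() {
  awk -F'\t' -v mincov="${2:-10}" -v minq="${3:-30}" '
    BEGIN {
      mincov += 0; minq += 0                                  # force numeric comparison
      for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i    # char -> ASCII code
    }
    {
      total++
      if ($4 >= mincov) depth_ok++            # raw depth reaches the cutoff
      hq = 0
      for (j = 1; j <= length($6); j++)       # count bases at or above minq
        if (ord[substr($6, j, 1)] - 33 >= minq) hq++
      if (hq >= mincov) hq_ok++               # enough high-quality bases
    }
    END { printf "positions=%d depth_ok=%d hq_ok=%d\n", total, depth_ok, hq_ok }
  ' "$1"
}
```

For example, `count_passing file.pileup 10 30` mirrors the thresholds in the VarScan command above; comparing `hq_ok` with `positions` shows roughly how much of the pileup the filters remove. It loops over every quality character, so expect it to be slow on an 8.3 GB file.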

9.6 years ago

The file sizes look reasonable to me. A pileup file is typically considerably larger than the BAM file it came from (the BAM is compressed, while the pileup is plain text with one line per covered position), and a VCF is much smaller than both because it only lists variant sites. I don't see any obvious problem here. Perhaps your threshold of 10 reads is too stringent for calling SNPs, and as a result you are not getting many variants.
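To gauge how stringent a 10-read threshold is for this sample, the depth column of the pileup can be tallied at several cutoffs in one pass. A minimal sketch: `depth_table` is an illustrative name, the default cutoffs (1, 5, 10, 20, 30) are an arbitrary choice, and it uses raw depth only, so it ignores the base-quality filter.

```shell
# Count pileup positions whose depth (column 4) reaches each cutoff, in one pass.
# Arguments: pileup file, then optional cutoffs (defaults are an arbitrary choice).
depth_table() {
  pileup="$1"; shift
  [ "$#" -gt 0 ] || set -- 1 5 10 20 30
  awk -F'\t' -v cutoffs="$*" '
    BEGIN { n = split(cutoffs, cut, " ") }          # parse the cutoff list
    { for (i = 1; i <= n; i++) if ($4 >= cut[i] + 0) hits[i]++ }
    END { for (i = 1; i <= n; i++) printf "depth>=%s\t%d\n", cut[i], hits[i] }
  ' "$pileup"
}
```

For example, `depth_table file.pileup 5 10 20` prints one line per cutoff; a steep drop between 5 and 10 would support the idea that the coverage threshold is removing most sites.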

I agree completely. I'll add that another possibility is that the sample simply doesn't have many variants relative to the reference. The simplest way to sort all of this out is to look through the data a bit.
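One quick way to start looking through the calls is to count VCF records per chromosome, which says more than the file size does. A sketch: `vcf_summary` is an illustrative name, and it assumes the usual tab-separated VCF layout with `#`-prefixed header lines.

```shell
# Summarize a VCF: record count per chromosome (sorted by name), then the total.
# Header and meta-information lines start with '#' and are skipped.
vcf_summary() {
  awk -F'\t' '!/^#/ { n[$1]++ } END { for (c in n) print c "\t" n[c] }' "$1" | sort
  printf 'total\t%s\n' "$(grep -vc '^#' "$1")"
}
```

For example, `vcf_summary sample1.vcf`; calls concentrated on only a few contigs could point to uneven coverage rather than a problem with the pipeline.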
