7 months ago
RT
Hello, I am trying to reproduce the variant calling and detection part of a paper.
I asked ChatGPT about it, and it gave me the following:
Step 1: Generate Pileup File
samtools mpileup -uf reference.fa alignment.bam > output.pileup
Step 2: Call Variants
bcftools call -mv -Ov output.pileup > variants.vcf
Step 3: Filter Variants
bcftools filter -i 'QUAL > 50 && INFO/MQ > 30 && FMT/DP > 3 && FMT/GQ > 50 && (FMT/AO[0] == 0 || FMT/AO[0] == FMT/DP) && (FMT/AO[1] == 0 || FMT/AO[1] == FMT/DP) && (FMT/RO + FMT/AO[0] >= 0.8 * FMT/DP) && (FMT/RO + FMT/AO[1] >= 0.8 * FMT/DP)' variants.vcf > filtered_variants.vcf
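The Step 3 expression cannot work on bcftools output as written. FMT/AO and FMT/RO are FreeBayes-style allele-count tags that bcftools call does not produce, so bcftools filter should stop with an error about an undefined tag. FMT/DP and FMT/GQ are also only present if they were requested (`-a FORMAT/DP` at the mpileup step, `-f GQ` at the call step). Below is a minimal sketch that keeps the QUAL/MQ/DP/GQ thresholds from the ChatGPT answer (not necessarily the paper's) and drops the AO/RO clauses; the `filter_variants` name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: Step 3 rewritten to use only tags bcftools can emit.
# Assumes a single-sample VCF where FORMAT/DP (bcftools mpileup -a FORMAT/DP)
# and FORMAT/GQ (bcftools call -f GQ) were requested. Thresholds are the
# ones from the ChatGPT answer, not necessarily the paper's.
set -euo pipefail

# Hypothetical helper; usage: filter_variants variants.vcf filtered_variants.vcf
filter_variants() {
  local in_vcf=$1 out_vcf=$2
  # Keep records passing all four thresholds; the AO/RO clauses are dropped
  # because bcftools call does not write those tags.
  bcftools filter \
    -i 'QUAL>50 && INFO/MQ>30 && FMT/DP>3 && FMT/GQ>50' \
    -Ov -o "$out_vcf" "$in_vcf"
}

# Run only when invoked with two arguments: ./filter.sh in.vcf out.vcf
if [ "$#" -eq 2 ]; then
  filter_variants "$1" "$2"
fi
```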
This code doesn't work for me, starting with the first step. WSL says:
[warning] samtools mpileup option `u` is functional, but deprecated. Please switch to using bcftools mpileup in future.
[mpileup] 1 samples in 1 input files
I don't understand whether I can use bcftools for the first step, since the paper used samtools for it. Thanks!
It's been a long time since I did this sort of thing, but I have a vague recollection that the mpileup step was simplified and/or rolled into other tools (though I could be wrong), making this process somewhat obsolete.
My advice would be to ignore ChatGPT, refer to the actual software manuals, and make sure you know your software versions. Remember that GPT is trained on a lot of historical code and may not reflect the best current way to approach things.
If you're trying to reproduce that dataset exactly, down to the last variant call, you may need to pin your software to the versions the paper used (this depends on how old the paper is and how 'clean' the SNP signal is).
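One way to make the version question concrete is to record exactly what you are running and compare it with the versions listed in the paper's methods. A small sketch, assuming the tools are on your `PATH`; `report_versions` is a hypothetical helper that just prints the first line of each tool's `--version` output.

```shell
#!/usr/bin/env bash
# Sketch: print "tool<TAB>first line of --version" for each tool given,
# or "not found" when the tool is not on PATH.
set -euo pipefail

report_versions() {
  local tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf '%s\t%s\n' "$tool" "$("$tool" --version | head -n 1)"
    else
      printf '%s\tnot found\n' "$tool"
    fi
  done
}

# Tools relevant to this thread; compare the output with the paper's methods.
report_versions samtools bcftools bowtie2
```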
Seconding this. Apply current best practices (bcftools mpileup followed by something I forget; see the bcftools manual section on variant calling) rather than sticking with deprecated approaches. Be careful with ChatGPT: it's fine for getting an idea, but it is not up to date with fast-evolving software such as samtools and other bioinformatics tools.
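For reference, the step that follows `bcftools mpileup` is `bcftools call`. The message in the question is a warning rather than an error: with `-u`, `samtools mpileup` still writes uncompressed BCF genotype likelihoods (despite the `.pileup` file name), which is what `bcftools call` reads. Below is a minimal sketch of the current bcftools-only equivalent of Steps 1 and 2, assuming a coordinate-sorted BAM and a single sample. `call_variants` is a hypothetical name, and the extra `FORMAT/AD,FORMAT/DP` and `GQ` fields are only requested so that a later depth/genotype-quality filter has tags to work with.

```shell
#!/usr/bin/env bash
# Sketch: bcftools-only replacement for "samtools mpileup -u" + "bcftools call".
set -euo pipefail

# Hypothetical helper; usage: call_variants reference.fa alignment.bam variants.vcf
call_variants() {
  local ref=$1 bam=$2 out_vcf=$3
  # mpileup: genotype likelihoods as uncompressed BCF (-Ou) with per-sample
  # AD and DP; call: multiallelic caller (-m), variant sites only (-v),
  # GQ added to FORMAT, plain VCF output.
  bcftools mpileup -Ou -f "$ref" -a FORMAT/AD,FORMAT/DP "$bam" \
    | bcftools call -mv -f GQ -Ov -o "$out_vcf"
}

# Run only when invoked with three arguments.
if [ "$#" -eq 3 ]; then
  call_variants "$1" "$2" "$3"
fi
```

Usage: `./call_variants.sh reference.fa alignment.bam variants.vcf`. The reference should have a `.fai` index (`samtools faidx reference.fa`).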
I just realised that I didn't align my reads with the tool they used; I used bowtie2 instead. I can't download the older version mentioned in the paper.
Also, if I use bcftools instead of samtools, unlike the paper, will I be unable to reproduce the same data/results?
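Whichever aligner is used, mpileup expects a coordinate-sorted BAM, and an index is needed for region queries. A sketch of going from paired-end reads to a sorted, indexed BAM with bowtie2, assuming a prebuilt bowtie2 index (from `bowtie2-build`) and paired FASTQ files; `align_and_sort` and the default of 4 threads are illustrative choices, not the paper's settings. Switching aligners can by itself shift some calls, independent of the samtools/bcftools question.

```shell
#!/usr/bin/env bash
# Sketch: paired-end reads -> coordinate-sorted, indexed BAM.
set -euo pipefail

# Hypothetical helper;
# usage: align_and_sort <bowtie2_index_prefix> <R1.fq> <R2.fq> <out.bam> [threads]
align_and_sort() {
  local index=$1 r1=$2 r2=$3 out_bam=$4 threads=${5:-4}
  # bowtie2 writes SAM to stdout when -S is omitted; samtools sort reads it
  # from stdin ("-") and writes a coordinate-sorted BAM.
  bowtie2 -p "$threads" -x "$index" -1 "$r1" -2 "$r2" \
    | samtools sort -@ "$threads" -o "$out_bam" -
  samtools index "$out_bam"
}

# Run only when invoked with at least four arguments.
if [ "$#" -ge 4 ]; then
  align_and_sort "$@"
fi
```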
Older versions of software are usually available via distribution tools or the project websites. It may take a lot of digging, but it's almost certainly out there somewhere.
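Bioconda is one such distribution route. A sketch of creating an isolated environment with pinned versions, assuming conda is installed. The environment name and the version numbers in the usage line are placeholders, to be replaced with whatever the paper reports (and only if that version is actually packaged); `make_pinned_env` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: create a conda environment with explicitly pinned tool versions.
set -euo pipefail

# Hypothetical helper; usage: make_pinned_env <env_name> <pkg=version>...
make_pinned_env() {
  local env_name=$1
  shift
  # conda-forge is listed first so it takes priority over bioconda.
  conda create -y -n "$env_name" -c conda-forge -c bioconda "$@"
}

# Run only when given an environment name and at least one pin.
if [ "$#" -ge 2 ]; then
  make_pinned_env "$@"
fi
```

Usage (placeholder versions): `./make_env.sh paper-repro samtools=1.9 bcftools=1.9`, then `conda activate paper-repro`.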
You could try with bcftools and just see what kind of results you get. How important it is to reproduce the paper perfectly will depend on the question you're asking. You could also try contacting the authors and asking them to send you any intermediate files they have.
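If the authors can share their final VCF, `bcftools isec` gives a quick measure of how close your calls are to theirs. A sketch, assuming both files are plain (uncompressed) VCFs called against the same reference build; `compare_calls` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: compare your calls with the authors' calls using bcftools isec.
set -euo pipefail

# Hypothetical helper; usage: compare_calls mine.vcf theirs.vcf out_dir
compare_calls() {
  local mine=$1 theirs=$2 outdir=$3
  local f
  for f in "$mine" "$theirs"; do
    # isec needs bgzip-compressed, indexed input; -f overwrites an old index.
    bcftools view -Oz -o "$f.gz" "$f"
    bcftools index -f "$f.gz"
  done
  bcftools isec -p "$outdir" "$mine.gz" "$theirs.gz"
}

# Run only when invoked with three arguments.
if [ "$#" -eq 3 ]; then
  compare_calls "$1" "$2" "$3"
fi
```

In the output directory, `0000.vcf` holds records private to your file, `0001.vcf` records private to theirs, and `0002.vcf`/`0003.vcf` the shared records (taken from each file respectively); `grep -vc '^#' out_dir/0000.vcf` counts the records in one of them.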
I didn't use the same tool as in the paper...