Question

Getting fasta file from the VCF after variant calling

0

8 months ago

Neil ▴ 20

Hello. I need to get information on the genomes belonging to the Pangolin lineages of coronavirus. I have one reference strain as a FASTA file and two strains as FASTQ files. I filtered the reads, aligned them to the reference, and did variant calling. My advisor said that I MUST assemble the genomes (I don't understand what he means by this). But the goal is to look up the two strains with Pangolin: https://cov-lineages.org/resources/pangolin.html. I guess I could do this by generating FASTA files from the VCF, applying the called variants to the reference, but I don't know how to do it.

I would greatly appreciate any help.

fastq pangolin assembly • 972 views

updated 8 months ago by Michael 55k • written 8 months ago by Neil ▴ 20

2

First, I agree with Pierre that you are not trying hard enough to follow up. I thought I had already given extensive advice. If your supervisor wants an assembly, they likely mean de novo assembly. Anyway, why don't you just ask them what they want? Using a tool just because it can produce output in a certain format, without understanding what it does, is the kind of black-box bioinformatics that leads to more problems than it solves.

8 months ago by Michael 55k

1

Don't forget to follow up on your threads; not doing so is bad etiquette. If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one answer if they all work. If an answer was not really helpful or did not work, provide detailed feedback so others know not to use that answer.

Processing fastq files for genome assembly
MCP counter's problems
SSD or HHD for genome analysis

8 months ago by Pierre Lindenbaum 164k

score 2 · Answer 1 · 2024-03-08

You can generate a consensus sequence using BBTools like this:

# Map the reads to the reference
bbmap.sh in=reads.fastq ref=reference.fasta out=mapped.sam
# Build a consensus sequence from the alignment
consensus.sh in=mapped.sam ref=reference.fasta out=consensus.fasta

I suggest that you learn what it means to "assemble a genome". You will not satisfy your advisor if you do not do that, so why bother with hypotheticals? It's not like "assemble a genome" has some special meaning to your advisor; it is something you should understand at least the basics of if you have a degree in anything related to genetics.

You work with VCF files, but VCF files are not actually useful without the genomes they refer to. It is important to learn what VCF files are, what they mean, and how to generate them. Anyone can filter them on a 0.1234 cutoff for column 7, but the purpose of a bioinformatician is to understand what column 7 means and why 0.1234 is a good cutoff.

score 1 · Answer 2 · 2024-03-07

1

8 months ago

GenoMax 147k

See this tutorial: Generating consensus sequence from bam file

1

Following this tutorial will not give an assembly. Instead, it will give an ALT-genome, i.e. the reference with the called ALT alleles substituted in. The function is called "consensus", but this is in fact a misnomer.

8 months ago by Michael 55k

1

I believe that is what OP wants: a sequence that incorporates the changes present in their VCF.
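
To make "a sequence that incorporates the changes present in their VCF" concrete, here is a minimal Python sketch of the idea. It is not the method from the tutorial and not a replacement for a dedicated tool such as bcftools consensus; it only illustrates the principle. Everything in it is an assumption made for illustration: the toy reference and VCF records are made up, the function names (read_fasta, parse_vcf, apply_variants) are hypothetical, the reference is assumed to hold a single sequence, variants are assumed not to overlap, only the first ALT allele is used, and records whose FILTER is not PASS or "." are ignored.

```python
# Hypothetical toy data: a 10-base reference and four VCF records.
REFERENCE_FASTA = ">ref1\nACGTACGTAC\n"
CALLS_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "ref1\t2\t.\tC\tG\t5\tLowQual\t.",   # fails the filter, ignored
    "ref1\t3\t.\tG\tT\t50\tPASS\t.",     # SNP
    "ref1\t6\t.\tC\tCAA\t50\tPASS\t.",   # insertion
    "ref1\t8\t.\tTA\tT\t50\tPASS\t.",    # deletion
])


def read_fasta(text):
    """Parse a single-record FASTA string into (name, sequence)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    name = lines[0][1:].split()[0]
    return name, "".join(lines[1:])


def parse_vcf(text):
    """Yield (pos, ref, alt) for records with FILTER equal to PASS or '.'."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        pos, ref, flt = int(fields[1]), fields[3], fields[6]
        alt = fields[4].split(",")[0]  # simplification: first ALT allele only
        if flt not in ("PASS", "."):
            continue
        if alt in (".", "*") or alt.startswith("<"):
            continue  # skip missing, spanning-deletion and symbolic alleles
        yield pos, ref, alt


def apply_variants(seq, variants):
    """Substitute ALT for REF, working right to left so that indels
    do not shift the coordinates of variants still to be applied."""
    out = seq
    for pos, ref, alt in sorted(variants, reverse=True):
        start = pos - 1  # VCF positions are 1-based
        if out[start:start + len(ref)].upper() != ref.upper():
            raise ValueError(f"REF allele does not match reference at {pos}")
        out = out[:start] + alt + out[start + len(ref):]
    return out


name, reference = read_fasta(REFERENCE_FASTA)
consensus = apply_variants(reference, parse_vcf(CALLS_VCF))
print(consensus)  # ACTTACAAGTC
```

Note that positions with no called variant simply keep the reference base, even where there was no read coverage at all. That is why the result is better described as an ALT-genome than as an assembly.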