Getting fasta file from the VCF after variant calling
8 months ago
Neil ▴ 20

Hello. I need to get information on genomes belonging to the Pangolin lineages of coronavirus. I have one reference strain as a FASTA file and two strains as FASTQ files. I filtered the reads, aligned them to the reference, and did variant calling. My advisor said that I MUST assemble the genomes (I don't understand what he means by this). But the goal is to look up the two strains on https://cov-lineages.org/resources/pangolin.html. I guess I could do this by generating FASTA files from the VCF, with the called variants applied to the reference, but I don't know how to do it.

I would greatly appreciate any help.

fastq pangolin assembly • 977 views

First, I agree with Pierre you are not trying hard enough to follow up. I thought I had given extensive advice already. If your supervisor wants an assembly they likely mean de novo assembly. Anyway, why don't you just ask them what they want? Just using a tool because it can produce some output in a certain format without understanding what it does is the kind of black-box bioinformatics that leads to more problems than it solves.


Don't forget to follow up on your threads; leaving them unanswered is bad etiquette. If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one answer if they all work. If an answer was not really helpful or did not work, provide detailed feedback so others know not to use that answer.

8 months ago

You can generate a consensus sequence using BBTools like this:

bbmap.sh in=reads.fastq ref=reference.fasta out=mapped.sam
consensus.sh in=mapped.sam ref=reference.fasta out=consensus.fasta

I suggest that you learn what it means to "Assemble a genome". You will not satisfy your advisor if you do not do that, so why bother with hypotheticals? It's not like "Assemble a genome" has some special meaning to your advisor; it is something you should understand at least the basics of if you have a degree in anything related to genetics. You work with VCF files, but VCF files are not actually useful without genomes. It is important to learn what VCF files are, what they mean, and how to generate them. Anyone can filter them on a 0.1234 cutoff for column 7, but the purpose of a bioinformatician is to understand what column 7 means and why 0.1234 is a good cutoff.

8 months ago
GenoMax 147k

See this tutorial: Generating consensus sequence from bam file


Following this tutorial will not give an assembly. Instead, it will give an ALT-genome. The function is called "consensus" but this is in fact a misnomer.


I believe that is what OP wants. To get a sequence that incorporates the changes present in their VCF.
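To make concrete what "a sequence that incorporates the changes present in their VCF" means, here is a minimal Python sketch. It is only an illustration under stated assumptions: the function name `apply_snps` and the toy data are made up for this example, it handles single-base substitutions only (indels, multi-allelic records and missing ALT alleles are skipped), and it ignores QUAL/FILTER and the contig name. For real data a dedicated tool such as `bcftools consensus` would be the more sensible choice.

```python
def apply_snps(reference: str, vcf_text: str) -> str:
    """Return `reference` with the single-base substitutions from `vcf_text` applied.

    Hypothetical helper for illustration: assumes a single contig and
    1-based POS values, as in the VCF format.
    """
    seq = list(reference)
    for line in vcf_text.splitlines():
        # Skip blank lines and the "##" meta / "#CHROM" header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        pos, ref, alt = int(fields[1]), fields[3], fields[4]
        # Keep only simple SNPs: one reference base replaced by one base.
        if len(ref) != 1 or alt.upper() not in {"A", "C", "G", "T", "N"}:
            continue
        # Sanity check: the VCF must describe the same reference sequence.
        if seq[pos - 1].upper() != ref.upper():
            raise ValueError(f"REF mismatch at position {pos}")
        seq[pos - 1] = alt
    return "".join(seq)


# Toy example: two SNPs are applied, the deletion record is skipped.
reference = "ACGTACGT"
vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "ref\t2\t.\tC\tT\t50\tPASS\t.\n"
    "ref\t5\t.\tA\tG\t50\tPASS\t.\n"
    "ref\t7\t.\tGT\tG\t50\tPASS\t.\n"
)
print(apply_snps(reference, vcf))
```

The result is the reference with the called alternate alleles substituted in. No reads are joined into contigs at any point, which is why the reply above distinguishes this kind of sequence from an assembly.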


But that's not an assembly...
