convert fasta/gb to vcf
6.2 years ago

Dear all,

Would it be possible to convert a FASTA or GenBank file into a Variant Call Format (VCF) file, or is a GFF file the only possible source?

Thank you

fasta vcf genbank gff • 6.7k views
ADD COMMENT
3
Entering edit mode

would be possible to convert a fasta or genbank file into a variant calling file VCF or the only source is a GFF?

This makes no sense as composed. What exactly are you trying to do?

ADD REPLY
1
Entering edit mode

You can take a FASTQ file, a FASTA reference genome and a GFF annotation file to call variants and get VCF files.

FASTA/FASTQ files are sequence/read files.

GFF and GTF are annotation files (where the genes, exons, etc. are located).

VCF, or Variant Call Format, is the output of variant-calling software.

The files you mentioned in your post are completely different from one another.
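To make the distinction between these formats concrete, here is a minimal sketch that guesses a file's format from its first non-empty line. The function name `guess_format` is hypothetical, and the check is only a heuristic: for example, SAM headers also start with `@`, so it is not a validator.

```python
def guess_format(text):
    """Guess a bioinformatics file format from its first non-empty line.

    Heuristic only: it looks at the conventional opening of each format
    and does not validate the rest of the file.
    """
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("##fileformat=VCF"):
            return "VCF"      # variant calls relative to a reference
        if line.startswith("##gff-version"):
            return "GFF"      # annotation: coordinates of genes, exons, ...
        if line.startswith("LOCUS"):
            return "GenBank"  # sequence plus annotation in one flat file
        if line.startswith(">"):
            return "FASTA"    # plain sequences
        if line.startswith("@"):
            return "FASTQ"    # reads with base qualities
        return "unknown"
    return "unknown"


if __name__ == "__main__":
    print(guess_format(">organism_X chromosome 1\nACGTACGT"))
    print(guess_format("##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT"))
```

Only the VCF describes differences between a sample and a reference; the other formats describe a sequence or its annotation.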

ADD REPLY
0
Entering edit mode

If I have a FASTA reference file for organism X and its corresponding GenBank file, would it be possible to generate the VCF? Or is the only way to obtain a VCF to use a GFF file for organism X? I know they are different files; the problem is how to make them. For humans and other selected organisms, these files are already available in the public domain. What happens when I have to make them from scratch?

ADD REPLY
1
Entering edit mode

If I have a FASTA reference file for organism X and its corresponding GenBank file, would it be possible to generate the VCF?

Not unless you map your data to that reference. I assume you are thinking of using the annotation present in the GenBank file to annotate any variants you find? As far as the actual sequence goes, it is the same in both files.

Or the only way to obtain a VCF is by using a GFF file for the organism X

Most traditional variant annotation tools will use a GFF/GTF file but you still need to call variants independently. You do not need a GFF file to call variants, which are commonly stored in VCF files.

Based on your past posts, you do this type of work with human data. The same approach should be directly applicable to other species.
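To illustrate why a reference on its own cannot yield a VCF: each VCF record states how a sample differs from the reference at one position, so there must be a second sequence to compare against. The sketch below assumes two sequences that are already aligned and of equal length, and reports substitutions only. The function name `snps_to_vcf` and the example sequences are hypothetical; real variant callers work from aligned reads and also model indels and base qualities.

```python
def snps_to_vcf(chrom, reference, sample):
    """Return minimal VCF lines for single-base differences.

    Assumes `reference` and `sample` are pre-aligned and of equal length.
    Positions where either base is not A/C/G/T (gaps, N) are skipped.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be pre-aligned and of equal length")
    lines = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    pairs = zip(reference.upper(), sample.upper())
    for pos, (ref_base, alt_base) in enumerate(pairs, start=1):  # VCF is 1-based
        if ref_base != alt_base and ref_base in "ACGT" and alt_base in "ACGT":
            lines.append(f"{chrom}\t{pos}\t.\t{ref_base}\t{alt_base}\t.\t.\t.")
    return lines


if __name__ == "__main__":
    # Comparing the reference with itself gives only the two header lines.
    print("\n".join(snps_to_vcf("chrX", "ACGTAC", "ACGTAC")))
    # A sample with one substitution gives one record.
    print("\n".join(snps_to_vcf("chrX", "ACGTAC", "ACCTAC")))
```

Note that no GFF or GenBank annotation appears anywhere in this comparison; annotation only matters later, when the variants are interpreted.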

ADD REPLY
0
Entering edit mode

If you are looking for already known variants, in human for example, you can take a look at dbSNP. If you want to discover new variants, you will have to "call" them using a variant caller.

ADD REPLY
0
Entering edit mode

And to call variants you need aligned sequences, which are stored in a BAM file.
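The path from reads to a BAM file to a VCF can be sketched as a list of shell commands. This is one common combination of tools (bwa, samtools, bcftools), not the only option and not necessarily what anyone in this thread uses; the file names are placeholders. The script only builds and prints the commands, it does not run them.

```python
import shlex


def variant_calling_commands(reference, reads, prefix):
    """Build (but do not run) shell commands: align reads, sort, call variants.

    All file names are hypothetical placeholders supplied by the caller.
    """
    ref = shlex.quote(reference)
    fastq = shlex.quote(reads)
    bam = shlex.quote(f"{prefix}.sorted.bam")
    vcf = shlex.quote(f"{prefix}.vcf.gz")
    return [
        # 1. Index the FASTA reference so reads can be mapped to it.
        f"bwa index {ref}",
        # 2. Map the reads and write a coordinate-sorted BAM file.
        f"bwa mem {ref} {fastq} | samtools sort -o {bam} -",
        # 3. Index the BAM file.
        f"samtools index {bam}",
        # 4. Pile up the alignments against the reference and call variants.
        f"bcftools mpileup -f {ref} {bam} | bcftools call -mv -Oz -o {vcf}",
    ]


if __name__ == "__main__":
    for command in variant_calling_commands(
        "organism_x.fasta", "sample.fastq", "sample"
    ):
        print(command)
```

The inputs are a FASTA reference and sequencing reads only; no GFF or GenBank file is involved in producing the VCF.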

ADD REPLY
0
Entering edit mode

Thank you for the tip, but this looks like a bit of overkill, especially since I am technically working with WGS, not RNA-seq. The problem I am facing is this: in order to call the variants I first need to re-align the reads to obtain a more accurate picture of the genomic variation; this step is done with base quality score recalibration (BQSR), but the command -- at least in GATK's implementation -- requires a VCF file to feed the -knownSites/--known-sites option. This looks to me like a circular approach: I need a VCF to generate a VCF. The only things I have so far -- for non-human genomes -- are the FASTA and GenBank files. So the question is: how can I generate a VCF from these files, if that is possible at all?

ADD REPLY
0
Entering edit mode

As far as I know, you don't need to use the VCF of the sample you sequenced. It just requires variants that are common in the population.

This information should have been in your initial post.

ADD REPLY
1
Entering edit mode

I wanted to ask a very general question, independent of the GATK implementation, namely: can I build a VCF from FASTA/GenBank files?

ADD REPLY
2
Entering edit mode

Okay, here comes the general answer: no.

ADD REPLY
0
Entering edit mode

Fair enough, case closed.

ADD REPLY
0
Entering edit mode

I have a consensus FASTA file from a de novo genome and the associated GFF file. What tool can I use to convert the FASTA to VCF? Bastien, above, said this was possible?

ADD REPLY
