Make fasta file from SNPs in two vcf files
9.6 years ago
sun.nation ▴ 140

Hello,

I have two VCF files with SNPs called against the same reference.

Vcf1:

Position  SNP
1         A
3         T
6         G
8         C

Vcf2:

Position  SNP
2         C
6         A
8         T
10        T

Together the two files cover 6 distinct positions (1, 2, 3, 6, 8, 10). I want to make a FASTA sequence for each VCF file over those positions, with N where a file has no data:

>Vcf1
ANTGCN
>Vcf2
NCNATT
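The transformation asked for above can be sketched in a few lines of Python. This is a minimal illustration, not an existing tool: it assumes each file has already been reduced to a {position: base} map on a single chromosome (with several chromosomes you would key on (chrom, pos) instead), and the function names `snp_alignment` and `to_fasta` are hypothetical. Columns are the sorted union of all positions, and a file with no call at a position gets N.

```python
def snp_alignment(samples):
    """Build a SNP-only alignment from {sample_name: {position: base}}.

    Columns are the sorted union of all positions seen in any sample;
    a sample with no call at a position gets 'N'.
    """
    positions = sorted({pos for calls in samples.values() for pos in calls})
    return {
        name: "".join(calls.get(pos, "N") for pos in positions)
        for name, calls in samples.items()
    }


def to_fasta(seqs):
    """Format {name: sequence} as FASTA text, one record per sample."""
    return "".join(f">{name}\n{seq}\n" for name, seq in seqs.items())


# The two example files from the question, as position -> SNP base.
vcf1 = {1: "A", 3: "T", 6: "G", 8: "C"}
vcf2 = {2: "C", 6: "A", 8: "T", 10: "T"}
print(to_fasta(snp_alignment({"Vcf1": vcf1, "Vcf2": vcf2})), end="")
```

Run as-is, this prints exactly the two records shown above (ANTGCN for Vcf1, NCNATT for Vcf2).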

Are there any helpful tools or scripts for this?

Thanks in advance

SS

fasta vcf phylogenetic SNP • 5.7k views
9.6 years ago
Brice Sarver ★ 3.8k

If you have a true VCF, vcf-tab-to-fasta.pl is one of the easiest ways to convert it to a FASTA sequence. You first convert the VCF to a tab-delimited format (e.g. with VCFtools' vcf-to-tab), then run the Perl script. You can also convert to FASTA including the invariant reference bases with GATK's FastaAlternateReferenceMaker.
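For readers curious what "starting from a true VCF" involves, here is a rough stdlib-only Python sketch of the two ideas in this answer: pulling SNPs out of the fixed VCF columns (CHROM, POS, REF, ALT), and substituting them into a reference to keep the invariant bases. It is not how vcf-tab-to-fasta.pl or FastaAlternateReferenceMaker work internally. It assumes a tab-separated, single-sample file where the ALT base is taken as the sample's base (i.e. it ignores genotypes, so heterozygous calls are not handled), and it skips indels and multi-allelic sites. The function names and the toy chromosome/reference are hypothetical.

```python
import io


def parse_vcf_snps(handle):
    """Return {(chrom, pos): alt_base} for simple SNPs in a VCF text stream.

    Minimal sketch: uses only the fixed CHROM, POS, REF and ALT columns,
    skips '#' header lines, and ignores QUAL, FILTER, INFO and genotype
    columns. Indels and multi-allelic records (ALT like "G,T") are skipped.
    """
    snps = {}
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        if len(ref) == 1 and alt in {"A", "C", "G", "T"}:
            snps[(chrom, pos)] = alt
    return snps


def apply_snps(reference, chrom, snps):
    """Substitute SNP bases into one chromosome's reference sequence.

    VCF positions are 1-based, so position p maps to string index p - 1.
    """
    seq = list(reference)
    for (c, pos), alt in snps.items():
        if c == chrom:
            seq[pos - 1] = alt
    return "".join(seq)


# Toy input; the chromosome name and reference are made up for illustration.
vcf_text = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr1\t1\t.\tG\tA\t50\tPASS\t.\n"
    "chr1\t3\t.\tC\tT\t50\tPASS\t.\n"
)
snps = parse_vcf_snps(io.StringIO(vcf_text))
print(snps)                                # {('chr1', 1): 'A', ('chr1', 3): 'T'}
print(apply_snps("GGCGGG", "chr1", snps))  # AGTGGG
```

For real data with multiple samples, genotypes and filters, the dedicated tools named above are the safer choice.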


Thanks Brice, this worked well.

I was wondering if I can filter based on missing data, e.g. if 50% of the samples are missing data at a particular position, remove that position.
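One way to do this after the FASTA alignment has been produced is to drop columns by their fraction of N. Below is a minimal Python sketch, assuming equal-length sequences (one column per SNP position) with missing data coded as N; some tools write '-' or '?' instead, in which case the check needs adjusting. The function name `filter_missing` is hypothetical and not part of any tool mentioned in this thread. Following the wording of the question, a column is removed when at least `max_missing` of the samples are N there.

```python
def filter_missing(alignment, max_missing=0.5):
    """Drop columns where the fraction of samples with 'N' is >= max_missing.

    `alignment` maps sample name -> sequence; all sequences must be the
    same length (one column per SNP position).
    """
    if not alignment:
        return {}
    names = list(alignment)
    length = len(alignment[names[0]])
    keep = [
        i for i in range(length)
        if sum(alignment[n][i] == "N" for n in names) / len(names) < max_missing
    ]
    return {n: "".join(alignment[n][i] for i in keep) for n in names}


# Toy four-sample alignment: columns 2 and 3 are N in 2 of 4 samples (50%).
aln = {"s1": "ANTG", "s2": "ACNA", "s3": "NCNG", "s4": "ANTN"}
print(filter_missing(aln))  # {'s1': 'AG', 's2': 'AA', 's3': 'NG', 's4': 'AN'}
```

Applied to the two sequences in the original question with the default threshold, every column where either file has N is dropped, leaving only positions 6 and 8 (Vcf1 "GC", Vcf2 "AT").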

9.5 years ago

What about the R package PopGenome?

readData(, format="VCF")
region.as.fasta(...,type=1)

Best,
Bastian

9.6 years ago

You can use the pyfaidx package for this. The VCF files should be tabix indexed:


I was not able to figure out how to use the script; I don't know much Python. Can it be run from the Unix command line after installing? Please let me know and I will try.

Thanks


No; if you're looking for something ready to run, you're better off with Brice's solution. Glad you found something that works!
