Calculating FST from Transcriptomic Sequence Data stored in a VCF
8.2 years ago
mek362 ▴ 10

Hi all,

I am working with de novo-assembled transcriptomes and would like to estimate a suite of neutrality and F statistics for each locus (i.e. contig) using sequence variation (as opposed to individual SNPs). So far, I have used the R package PopGenome to estimate single-population and pairwise FST, as well as some neutrality stats. PopGenome reports these per contig, but after looking at the FST code in the PopGenome GitHub repo, the value appears to be SNP FST pooled across each contig rather than FST estimated from the contig sequences (though I am not completely sure about this). With the haplotype FST method, each population for a region is assigned an identical FST estimate.

Does anybody know of a package that would allow FST calculation from sequence data, preferably with a distance-based hierarchical approach like AMOVA?
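To make "distance-based" concrete, here is a minimal Python sketch of a one-level AMOVA-style Phi_ST computed from haplotype sequences. It assumes phased, aligned, equal-length haplotypes per contig, and it uses the number of pairwise differences as the squared distance. The function names (`hamming`, `ssd`, `phi_st`) are made up for illustration and do not come from any package.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def ssd(seqs):
    """Sum of squared deviations: distances over unordered pairs, divided by n."""
    return sum(hamming(a, b) for a, b in combinations(seqs, 2)) / len(seqs)


def phi_st(pops):
    """pops: list of populations, each a list of aligned haplotype strings."""
    n_pops = len(pops)
    sizes = [len(pop) for pop in pops]
    n_total = sum(sizes)
    all_seqs = [seq for pop in pops for seq in pop]

    ssd_within = sum(ssd(pop) for pop in pops)
    ssd_among = ssd(all_seqs) - ssd_within

    ms_among = ssd_among / (n_pops - 1)
    ms_within = ssd_within / (n_total - n_pops)

    # Effective per-population sample size for unbalanced designs.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (n_pops - 1)

    var_among = (ms_among - ms_within) / n0
    var_total = var_among + ms_within
    # Undefined when there is no variation at all.
    return var_among / var_total if var_total else float("nan")


if __name__ == "__main__":
    pops = [["AAAA", "AAAT"], ["TTTT", "TTTA"]]
    print(phi_st(pops))
```

With small samples the estimate can be negative, which is usually read as no detectable structure. A permutation test on population labels would be the usual next step and is not shown here.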

I have considered HierFstat, but am not able to find a file converter capable of parsing the 2 GB VCF (uncompressed) that my data are stored in. Interested in hearing about other possibilities before taking the time to write my own.
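If it does come to writing a parser, a 2 GB VCF does not need to be loaded whole; it can be streamed line by line and grouped by contig. The following is a minimal sketch, assuming a tab-delimited VCF whose records are sorted by contig and whose FORMAT column includes `GT`. The helper names (`open_vcf`, `iter_records`, `iter_contigs`) are hypothetical.

```python
import gzip
from itertools import groupby


def open_vcf(path):
    """Open a plain or gzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def iter_records(lines):
    """Yield (contig, pos, {sample: genotype}) one record at a time."""
    samples = []
    for line in lines:
        if not line.strip() or line.startswith("##"):
            continue  # skip blank lines and meta-information headers
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the 9 fixed columns
            continue
        gt_idx = fields[8].split(":").index("GT")  # assumes GT is present
        genotypes = {}
        for name, call in zip(samples, fields[9:]):
            alleles = call.split(":")[gt_idx].replace("|", "/").split("/")
            # Missing calls ("./.") are stored as None.
            genotypes[name] = (
                None if "." in alleles else tuple(int(a) for a in alleles)
            )
        yield fields[0], int(fields[1]), genotypes


def iter_contigs(lines):
    """Group consecutive records by contig; only one contig is held in memory."""
    for contig, recs in groupby(iter_records(lines), key=lambda r: r[0]):
        yield contig, [(pos, gts) for _, pos, gts in recs]
```

Usage would look like `with open_vcf("my.vcf") as fh: for contig, sites in iter_contigs(fh): ...`, where `my.vcf` is a placeholder path. Because `groupby` only merges consecutive records, an unsorted VCF would yield the same contig more than once.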

Please let me know if I've left out any useful info.

Cheers!

Mikey

RNA-Seq FST PopGenome Sequence Data de novo • 2.7k views

mek362, I've implemented Weir and Cockerham's (1984) FST in VCFLIB. It works directly from a VCF. There are several downstream tools in VCFLIB for smoothing and running permutations. The program is called wcFst.
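For reference, the Weir and Cockerham (1984) estimator can be sketched in a few lines of Python. This is not the VCFLIB code; it is a minimal illustration assuming diploid, biallelic sites with missing genotypes already removed, and the function names (`wc_components`, `wc_fst`) are hypothetical. The per-contig value is taken as a ratio of sums over sites, which is the "pooled SNP FST" behaviour described in the question.

```python
def wc_components(pops):
    """Variance components (a, b, c) for one biallelic site.

    pops: list of populations, each a list of genotypes (a1, a2) coded 0/1.
    """
    r = len(pops)
    n = [len(p) for p in pops]
    # Alternate-allele frequency and observed heterozygosity per population.
    p_i = [sum(g[0] + g[1] for g in p) / (2 * len(p)) for p in pops]
    h_i = [sum(1 for g in p if g[0] != g[1]) / len(p) for p in pops]

    n_tot = sum(n)
    n_bar = n_tot / r
    n_c = (n_tot - sum(x * x for x in n) / n_tot) / (r - 1)
    p_bar = sum(ni * pi for ni, pi in zip(n, p_i)) / n_tot
    s2 = sum(ni * (pi - p_bar) ** 2 for ni, pi in zip(n, p_i)) / ((r - 1) * n_bar)
    h_bar = sum(ni * hi for ni, hi in zip(n, h_i)) / n_tot

    pq = p_bar * (1 - p_bar)
    shrink = (r - 1) / r * s2
    a = (n_bar / n_c) * (s2 - (pq - shrink - h_bar / 4) / (n_bar - 1))
    b = (n_bar / (n_bar - 1)) * (
        pq - shrink - (2 * n_bar - 1) / (4 * n_bar) * h_bar
    )
    c = h_bar / 2
    return a, b, c


def wc_fst(sites):
    """Ratio-of-sums FST over sites; pass a single site for a per-SNP value."""
    num = den = 0.0
    for pops in sites:
        a, b, c = wc_components(pops)
        num += a
        den += a + b + c
    return num / den if den else float("nan")


if __name__ == "__main__":
    fixed = [[(1, 1)] * 10, [(0, 0)] * 10]
    print(wc_fst([fixed]))
```

Summing the components before dividing weights each site by its total variance, so a contig-level value from this approach is still a SNP-based statistic rather than one computed from haplotype sequences.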
