Annotation of vcftools fst output using GTF or GFF
3.0 years ago
reza ▴ 300

Hi everyone,

I have an output file from a vcftools windowed Fst calculation in the following format:

CHROM   BIN_START   BIN_END N_VARIANTS  WEIGHTED_FST    MEAN_FST
ch1 1   40000   75  0.0516003   0.0355082
ch1 20001   60000   22  -0.00980986 -0.0205035
ch1 40001   80000   46  0.0180676   0.0236424
ch1 60001   100000  72  0.0273771   0.0317944

How can I annotate this file using a GTF or GFF file? My goal is to identify genes that overlap the detected windows.

Thanks in advance for your help

fst vcftools annotation GTF • 1.4k views
3.0 years ago

Convert the GTF to BED:

awk -F '\t' '($3=="gene") {printf("%s\t%d\t%s\t%s\n",$1,int($4)-1,$5,$9);}' in.gtf

Sort both files by chromosome and position, then use bedtools intersect.
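The sorting and intersect step could look roughly like the sketch below. The file names (genes.bed, windows.bed, windows_with_genes.tsv) and the toy contents are made up for illustration; substitute your own files. Putting the windows in -a with -wa -wb is one possible choice that keeps each window's columns next to the gene it overlaps. The bedtools call is skipped if bedtools is not on the PATH.

```shell
# Toy stand-ins for the real files (hypothetical names and contents).
printf 'ch1\t50000\t70000\tgene_id "g2";\nch1\t999\t5000\tgene_id "g1";\n' > genes.bed
printf 'ch1\t40000\t80000\t0.0180676\nch1\t0\t40000\t0.0516003\n' > windows.bed

# Sort by chromosome name, then numerically by start position.
sort -k1,1 -k2,2n genes.bed > genes.sorted.bed
sort -k1,1 -k2,2n windows.bed > windows.sorted.bed

# Report each window together with every gene it overlaps.
# -sorted assumes both inputs were sorted as above.
if command -v bedtools >/dev/null 2>&1; then
    bedtools intersect -a windows.sorted.bed -b genes.sorted.bed -wa -wb -sorted > windows_with_genes.tsv
fi
```

Each output line would then hold the window columns followed by the columns of one overlapping gene; windows without any gene are not reported.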


Thank you very much.

Based on your recommendation, I ran:

awk -F '\t' '($3=="gene") {printf("%s\t%d\t%s\t%s\n",$1,int($4)-1,$5,$9);}' in.gtf > GTF.bed

then

awk '{$4=$5=$6=""; print $0}' vcftools_output.fst > Fst_Windows.bed

bedtools intersect -a GTF.bed -b Fst_Windows.bed > annotated_windows

Is my method correct?
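Two things may be worth checking in the window conversion above. Assigning to $4, $5 and $6 makes awk rebuild each line with its default output separator (a single space), and the header line is kept, whereas BED is expected to be tab-delimited with 0-based start coordinates. A possible alternative, sketched here on a toy copy of the table from the question (the input is assumed to be tab-separated, as vcftools writes it), skips the header, shifts BIN_START by one, and keeps the Fst columns:

```shell
# Toy copy of the vcftools output shown in the question (tab-separated).
printf 'CHROM\tBIN_START\tBIN_END\tN_VARIANTS\tWEIGHTED_FST\tMEAN_FST\n' > vcftools_output.fst
printf 'ch1\t1\t40000\t75\t0.0516003\t0.0355082\n' >> vcftools_output.fst
printf 'ch1\t20001\t60000\t22\t-0.00980986\t-0.0205035\n' >> vcftools_output.fst

# Skip the header (NR > 1), convert the 1-based start to a 0-based BED start,
# and write tab-separated output that still carries the Fst values.
awk 'BEGIN { FS = OFS = "\t" }
     NR > 1 { print $1, $2 - 1, $3, $4, $5, $6 }' vcftools_output.fst > Fst_Windows.bed
```

Keeping the Fst columns in the BED file means they are still available after the intersect, for example when the windows are reported with -wa -wb.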


sort both files on chrom and position


Before running bedtools intersect, I used bedtools sort to sort the files.
