Identify SNPs with Fst estimates > 0.9 and annotate them
18 months ago
anasjamshed ▴ 140

I have SNPs in a .fst file, with the estimated Fst value in the 6th column, like this:

2R  4459    1   1.000   96.0    1:2=0.01762811
2R  9728    1   1.000   99.0    1:2=0.01340363
2R  9828    1   1.000   100.0   1:2=0.01554609
2R  9928    1   1.000   99.0    1:2=0.01454173
2R  10028   1   1.000   100.0   1:2=0.01317223
2R  10128   1   1.000   100.0   1:2=0.01554917
2R  10228   1   1.000   100.0   1:2=0.01202964
2R  10328   1   1.000   100.0   1:2=0.01316962
2R  10428   1   1.000   100.0   1:2=0.01317223
2R  10528   1   1.000   100.0   1:2=0.01316962
2R  10628   1   1.000   100.0   1:2=0.01778599
2R  10728   1   1.000   100.0   1:2=0.01554609
2R  10828   1   1.000   100.0   1:2=0.01554917

I want to filter for SNPs that have a value greater than 0.9, so I am trying this command in Linux:

awk -F"\t" '$6>0.02' file.fst

But it's not selecting the values above 0.9 from the 6th column, because every row of that column starts with the prefix 1:2= before the number.

What changes do I need to make to the awk command?

After finding the SNPs, I need to annotate them with SnpEff. Is it possible to apply SnpEff to the .fst file?

snpeff fst SNP • 791 views

Don't forget to follow up on your threads. If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one answer if they all work. If an answer was not really helpful or did not work, provide detailed feedback so others know not to use that answer.

10 weeks ago

I don't have much experience with Linux, but I use R to do the same thing. Something like this should work:

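library(dplyr)

# The .fst file has no header row, so the columns are named V1..V6
FST_file <- read.delim("YourFSTfile.fst", header = FALSE)

# Column 6 looks like "1:2=0.01762811": strip everything up to "=" and convert to numeric
FST_file <- FST_file %>% mutate(FST = as.numeric(sub(".*=", "", V6)))

significant_snp <- FST_file %>% filter(FST > 0.9)

write.csv(significant_snp, "significant_snps.csv", row.names = FALSE)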
10 weeks ago

Using '=' as the field separator, the Fst value becomes the 2nd field:

LC_ALL=C awk -F"=" '$2>0.9' file.fst
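A possible variant, in case you want to keep tab as the separator and guard against non-numeric entries: awk compares a field as text when it does not look like a number, so a placeholder such as na (an assumption about what your file might contain) would pass a plain $2>0.9 test. The sketch below splits the 6th column on "=" and adds 0 to force a numeric comparison. The file names sample.fst and high_fst.tsv, the helper filter_fst, and the sample values are all made up for illustration, and the input is assumed to be tab-separated.

```shell
#!/bin/sh
# Hypothetical sample in the same layout as the question: tab-separated,
# with a sixth column of the form "pair=value". The values are invented.
printf '2R\t4459\t1\t1.000\t96.0\t1:2=0.01762811\n'  > sample.fst
printf '2R\t9728\t1\t1.000\t99.0\t1:2=0.95000000\n' >> sample.fst
printf '2R\t9828\t1\t1.000\t100.0\t1:2=na\n'        >> sample.fst

# filter_fst THRESHOLD FILE
# Split column 6 on "=" and compare the part after it numerically.
# Adding 0 turns a non-numeric entry such as "na" into 0, so it is
# rejected instead of being compared as a string.
filter_fst() {
    LC_ALL=C awk -F'\t' -v min="$1" \
        '{ split($6, kv, "="); if (kv[2] + 0 > min + 0) print }' "$2"
}

# Keep only rows whose Fst is above 0.9 and show them.
filter_fst 0.9 sample.fst > high_fst.tsv
cat high_fst.tsv
```

With this sample, only the row at position 9728 is kept. The whole line is printed unchanged, so the chromosome and position columns remain available for whatever annotation step comes next.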