How to extract SNPs from multiple alignment fasta file?
6.1 years ago

I am using the following script to read a FASTA file:

library(Biostrings)
dna <- readDNAStringSet("<<PATH TO FASTA FILE>>")

I would now like to extract the SNPs from this alignment file, but I don't know how to do that.

Does anyone know?


With the adegenet package in R: fasta2DNAbin("text.fasta", snpOnly = TRUE)
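To make concrete what keeping "SNPs only" means for an alignment, here is a minimal base-R sketch. It is not adegenet's implementation; the toy sequences and the helper names `to_matrix` and `snp_columns` are hypothetical. It treats any alignment column with more than one distinct character as polymorphic, so gaps and ambiguity codes count as differences unless they are filtered first.

```r
# Hypothetical toy alignment: three aligned sequences of equal length.
aln <- c(Seq1 = "ATGCCGTA",
         Seq2 = "ATGTCGTA",
         Seq3 = "ATGTCGAA")

# Split each sequence into single characters; one row per sequence.
to_matrix <- function(seqs) {
  stopifnot(length(unique(nchar(seqs))) == 1)  # alignment must be rectangular
  m <- do.call(rbind, strsplit(toupper(seqs), ""))
  rownames(m) <- names(seqs)
  colnames(m) <- seq_len(ncol(m))  # label columns with alignment positions
  m
}

# A column is polymorphic when it holds more than one distinct character.
# Note: "-" and "N" are treated like any other character here.
snp_columns <- function(m) {
  which(apply(m, 2, function(col) length(unique(col)) > 1))
}

m <- to_matrix(aln)
snp_pos <- snp_columns(m)               # alignment coordinates of variable sites
snp_only <- m[, snp_pos, drop = FALSE]  # alignment reduced to those columns

print(snp_pos)
print(snp_only)
```

Because the reduced matrix keeps the original column labels, the SNP coordinates are not lost when the invariant columns are dropped.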


Thanks, I will try it


Hello Naung.M,

I used the function you suggested from the adegenet library:

fasta2DNAbin("Path to fasta", snpOnly = T)

I got the following result (I have 109 sequences, each approximately 1,169 bases long):

Converting FASTA alignment into a DNAbin object...

Finding the size of a single genome...

genome size is: 1,169 nucleotides

( 60 lines per genome )

Importing sequences...

Forming final object...

Extracting SNPs...

...done.

109 DNA sequences in binary format stored in a matrix.

All sequences of same length: 1058

Labels: Seq1 Seq2 Seq3 ...

Base composition:
    a     c     g     t
0.284 0.202 0.253 0.261

(Total: 115.32 kb)

This gives only the summary above, but I would like to extract the SNPs themselves.

Then I tried:

myPath <- system.file("path to fasta", package="adegenet")
myPath
[1] " "

I then read the file as follows:

obj <- fasta2DNAbin(myPath, chunk=109)
Error in if (!ext %in% c("FASTA", "FA", "FAS")) warning("wrong file extension - '.fasta', '.fa' or '.fas' expected") : argument is of length zero

This is the error I get.

Could anyone suggest how to resolve this so that I can extract SNPs from a multiple-sequence alignment FASTA file?
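A likely cause of the error above is the use of system.file(), which only looks for files shipped inside an installed package and returns an empty string when nothing matches. The `argument is of length zero` message is consistent with an empty path that has no file extension to check. The sketch below is an illustration in base R; the file names are hypothetical placeholders, and the fasta2DNAbin call is left commented out so the block runs without adegenet installed.

```r
# system.file() searches only inside an installed package's directory.
# For a file that is not part of the package it returns the empty string "".
p <- system.file("no_such_alignment.fasta", package = "stats")
print(identical(p, ""))

# Hypothetical path: replace with the real location of the alignment.
fasta_path <- "path/to/alignment.fasta"

# Check that the file exists before handing the path to a reader function.
if (file.exists(fasta_path)) {
  message("Found: ", normalizePath(fasta_path))
  # obj <- adegenet::fasta2DNAbin(fasta_path, snpOnly = TRUE)
} else {
  message("No file at '", fasta_path,
          "'; check the path and the working directory: ", getwd())
}
```

In other words, for a file of your own, pass its path to the reader directly rather than going through system.file().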


You can write a DNAbin object to a CSV file with the following call: write.csv(DNAbin, "filename.csv"). I would guess that the alleles at each SNP position will be coded as numbers.


Thanks, I can try it.


Hi Pierre, thanks. I installed SNP-sites on Ubuntu Linux and extracted the SNPs, but I could not find the SNP positions relative to the first sequence, which is the reference sequence. I used the command: vagrant@ubuntu-xenial:/vagrant$ snp-sites test10.fasta. Is there a specific script or command for retrieving the SNP positions relative to the reference sequence? Also, how can I install Jvarkit (https://omictools.com/jvarkit-tool) on Ubuntu Linux? Is there a specific installation script?
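For the question about positions relative to the first sequence, one possible approach is sketched below in base R. It is an illustration only, not what SNP-sites or Jvarkit do internally; the toy alignment and the function name `diffs_vs_reference` are hypothetical. Positions are alignment columns, which match reference coordinates only when the reference row has no gaps before that column.

```r
# Hypothetical toy alignment; the first sequence is treated as the reference.
aln <- c(Ref  = "ATGCCGTA",
         Seq2 = "ATGTCGTA",
         Seq3 = "ATGTCGAA")

# Report every site where a sample differs from the first (reference) row.
diffs_vs_reference <- function(seqs) {
  stopifnot(length(seqs) >= 2, length(unique(nchar(seqs))) == 1)
  # Fall back to generic labels if the sequences are unnamed.
  labels <- if (is.null(names(seqs))) paste0("Seq", seq_along(seqs)) else names(seqs)
  m <- do.call(rbind, strsplit(toupper(unname(seqs)), ""))
  ref <- m[1, ]
  rows <- lapply(seq_len(nrow(m))[-1], function(i) {
    pos <- which(m[i, ] != ref)  # columns where this sample differs
    data.frame(sample = rep(labels[i], length(pos)),
               position = pos,
               ref = ref[pos],
               alt = m[i, pos],
               stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

snps <- diffs_vs_reference(aln)
print(snps)
# write.csv(snps, "snps_vs_reference.csv", row.names = FALSE)  # optional export
```

The result is one row per sample and differing site, which can be filtered (for example to drop rows where `alt` is "-" or "N") before export.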
