Question

Extract mutations from fasta sequences

0

5.1 years ago

dasilvauirajan ▴ 20

I have a large number of aligned protein sequences in FASTA format, plus a reference sequence, all of the same length. I would like to extract only the amino acid mutations from these sequences, so that in the end I have a list that looks something like this: I456L, W675T, etc. Is there software or some other way to do this? Thanks.

Mutations fasta sequence • 2.4k views

ADD COMMENT • link updated 5.1 years ago by Pierre Lindenbaum 166k • written 5.1 years ago by dasilvauirajan ▴ 20

1

Pierre has a complete solution, but in case that does not work, you could use blastp with -outfmt 3, which shows identical residues as dots so that only the differences stand out:

Query_1    181  VAATMCIGPEGDLHGVPPGECAVRLVKAGASIIGVNCHFDPTISLKTVKLMKEGLEAARL  240
Subject_1  181  ..............................V.............................  240

Biopython's BLAST parser may be able to help with the rest.
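If a parser is more than you need, the dotted rows can also be converted by hand. Below is a minimal sketch in plain Python rather than Biopython. It assumes the reference was used as the query, that each row has the four whitespace-separated fields seen in the excerpt above (name, start, residues, end), and that the alignment is ungapped. The function name `mutations_from_anchored` is made up for illustration.

```python
def mutations_from_anchored(query_line, subject_line):
    """Convert one query/subject row pair into a list like ['V3I'].

    Assumptions of this sketch: four whitespace-separated fields per row
    (name, start, residues, end) and no gaps in the query row, so the
    query coordinate advances by one per column.
    """
    _, q_start, q_seq, _ = query_line.split()
    _, _, s_seq, _ = subject_line.split()
    pos = int(q_start)
    found = []
    for q_aa, s_aa in zip(q_seq, s_seq):
        # '.' means "same as the query"; '-' in the subject is a gap, not a substitution
        if s_aa not in (".", "-", q_aa):
            found.append(f"{q_aa}{pos}{s_aa}")
        pos += 1
    return found


# Small made-up example (not taken from real BLAST output)
print(mutations_from_anchored("Query_1    1  MKVLA  5",
                              "Subject_1  1  ..I..  5"))  # prints ['V3I']
```

Real reports contain several such row pairs per hit, so the function would have to be called once per pair and the results concatenated.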

ADD REPLY • link 5.1 years ago by GenoMax 151k

score 4 · Accepted Answer · 2020-03-25

4

5.1 years ago

Pierre Lindenbaum 166k

Using bioalcidaejdk (http://lindenb.github.io/jvarkit/BioAlcidaeJdk.html) and a FASTA file in which the very first sequence is the reference:

java -jar dist/bioalcidaejdk.jar -e '
FastaSequence ref=null;
while(iter.hasNext()) {
    final FastaSequence seq = iter.next();
    if(ref==null) {
        ref=seq;
    } else {
        for(int i=0; i<seq.length() && i<ref.length(); i++) {
            char aa1 = ref.charAt(i);
            char aa2 = seq.charAt(i);
            if(aa1!=aa2) println(seq.getName()+"\t"+aa1+(i+1)+aa2);
        }
    }
}' input.fasta
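For readers without a Java toolchain, the same comparison can be sketched in plain Python. This is a rough port under the same assumption (the first record is the reference), not part of jvarkit. The helper names `read_fasta` and `list_mutations` are made up for illustration, and the whole header line is used as the sequence name.

```python
import sys


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def list_mutations(records):
    """Compare every record to the first one; yield (name, 'A123B') pairs."""
    records = iter(records)
    first = next(records, None)
    if first is None:
        return  # empty input
    _, ref = first
    for name, seq in records:
        # zip stops at the shorter sequence, like the i < length checks above
        for i, (aa1, aa2) in enumerate(zip(ref, seq), start=1):
            if aa1 != aa2:
                yield name, f"{aa1}{i}{aa2}"


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for name, mutation in list_mutations(read_fasta(handle)):
            print(f"{name}\t{mutation}")
```

Positions are 1-based alignment columns. If the reference row contains gap characters, these numbers will not match residue numbers in the ungapped reference, and gap characters are reported like any other difference.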

ADD COMMENT • link 5.1 years ago by Pierre Lindenbaum 166k

0

Pierre Lindenbaum, I cannot find the dist folder or bioalcidaejdk.jar. There is a file named bioalcidaejdk.java in the bioalcidae folder.

ADD REPLY • link 5.0 years ago by dasilvauirajan ▴ 20

0

Did you follow the install instructions?

Requirements / Dependencies

Java compiler SDK 11. Please check that this java is in the ${PATH}. Setting JAVA_HOME is not enough (e.g. https://github.com/lindenb/jvarkit/issues/23).

Download and Compile

$ git clone "https://github.com/lindenb/jvarkit.git"
$ cd jvarkit
$ ./gradlew bioalcidaejdk

ADD REPLY • link 5.0 years ago by GenoMax 151k

0

Thank you both, GenoMax and Pierre Lindenbaum. I had a problem with my JDK compiler; I just solved it, got it running, and everything went well!

ADD REPLY • link 5.0 years ago by dasilvauirajan ▴ 20