Creating multiple partitions of the alignment
5.5 years ago
Mac ▴ 20

I have an alignment whose sequences have an average length of 900 base pairs (bp). I would like to break this alignment up into several consecutive blocks, each 300 bases long. I can do that in Se-Al by copying and pasting into a new window, but I would like to know whether there is a command-line tool that can do it. I want to specify a window length in the command (e.g. 300) and have it output several alignments (300 bp each), extracted consecutively from my original alignment (900 bp), with the correct sequence labels. Thank you in advance.

alignment • 1.1k views

an example of input/output is needed


An example of input and outputs:

----Input: alignment_900bp_long.fasta 
----Options: partitions [window_length]
----Outputs: alignment1_positions_1-300.fasta    alignment2_positions_301-600.fasta     alignment3_positions_601-900.fasta

ok, so it's fasta....

5.5 years ago

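Using BioAlcidaeJdk: http://lindenb.github.io/jvarkit/BioAlcidaeJdk.html

The command below writes one FASTA file per 300-column window, named tmp.NNNNN.fa, where NNNNN is the 0-based start offset of the window. Sequences shorter than the longest one are padded with "-".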

$ java -jar src/jvarkit-git/dist/bioalcidaejdk.jar -e '
// load all sequences and find the longest one
final List<FastaSequence> seqs = stream().collect(Collectors.toList());
final int max_len = seqs.stream().mapToInt(S->S.length()).max().orElse(0);
int p=0;
while(p< max_len) {
  // one output file per window, named after its 0-based start offset
  java.io.PrintWriter w=new java.io.PrintWriter(String.format("tmp.%05d.fa",p));
  for(FastaSequence s:seqs) {
    w.println(">"+s.getName());
    // print the window, padding with "-" past the end of short sequences
    for(int i=p;i< p+300 && i < max_len;i++) w.print(i<s.length()?""+s.charAt(i):"-");
    w.println();
  }
  w.flush();w.close();
  p+=300;
}' msa.fa
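
If installing jvarkit is not an option, the same windowing idea can be sketched in plain Python with no third-party packages. This is only a minimal sketch under a few assumptions: the input is an ordinary FASTA alignment, the function names (`read_fasta`, `split_alignment`, `write_blocks`) are made up for illustration, and the output file names follow the `alignmentN_positions_START-END.fasta` pattern requested in the question rather than the `tmp.NNNNN.fa` names used above.

```python
def read_fasta(path):
    """Return a list of (label, sequence) pairs read from a FASTA file."""
    records, label, chunks = [], None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if label is not None:
                    records.append((label, "".join(chunks)))
                label, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if label is not None:
        records.append((label, "".join(chunks)))
    return records


def split_alignment(records, window):
    """Yield (start, end, block) per window; start/end are 1-based, inclusive."""
    if window <= 0:
        raise ValueError("window must be a positive integer")
    max_len = max((len(seq) for _, seq in records), default=0)
    for start in range(0, max_len, window):
        end = min(start + window, max_len)
        # Pad sequences that end early with gaps so all rows have equal width.
        block = [(label, seq[start:end].ljust(end - start, "-"))
                 for label, seq in records]
        yield start + 1, end, block


def write_blocks(records, window, prefix="alignment"):
    """Write one FASTA file per window and return the list of file names."""
    names = []
    blocks = split_alignment(records, window)
    for number, (start, end, block) in enumerate(blocks, 1):
        name = f"{prefix}{number}_positions_{start}-{end}.fasta"
        with open(name, "w") as out:
            for label, seq in block:
                out.write(f">{label}\n{seq}\n")
        names.append(name)
    return names


# Example call (file name taken from the command above):
# write_blocks(read_fasta("msa.fa"), 300)
```

As in the command above, the last window is simply shorter when the alignment length is not a multiple of the window size, and the sequence labels are copied unchanged into every output file.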

Thank you so much, let me check it out
