I have a fasta file of approx 3 Megabasepairs (Accession no. CM001634.1) and want to extract the sequences of all regions given below and then save the extracted sequences in fasta format in new text file. Also want that new file should be containing accession of file and the name of repeat such as low_complexity, simple repeat.
Kindly suggest me perl script.
SW perc perc perc query position in query matching repeat position in repeat score div. del. ins. sequence begin end (left) repeat class/family begin end (left) ID
21 3.6 0.0 0.0 gi|411024077|gb|CM001634.1| 11554 11581 (28623556) + AT_rich Low_complexity 1 28 (0) 1
22 0.0 0.0 0.0 gi|411024077|gb|CM001634.1| 12224 12245 (28622892) + AT_rich Low_complexity 1 22 (0) 2
22 8.0 0.0 0.0 gi|411024077|gb|CM001634.1| 12609 12658 (28622479) + AT_rich Low_complexity 1 50 (0) 3
22 5.6 0.0 0.0 gi|411024077|gb|CM001634.1| 12691 12726 (28622411) + AT_rich Low_complexity 1 36 (0) 4
26 3.0 0.0 0.0 gi|411024077|gb|CM001634.1| 13965 13997 (28621140) + AT_rich Low_complexity 1 33 (0) 5
222 3.7 0.0 0.0 gi|411024077|gb|CM001634.1| 16373 16399 (28618738) + (TA)n Simple_repeat 1 27 (0) 6
219 12.5 0.0 0.0 gi|411024077|gb|CM001634.1| 17247 17286 (28617851) + (G)n Simple_repeat 1 40 (0) 7
198 0.0 0.0 0.0 gi|411024077|gb|CM001634.1| 20074 20095 (28615042) + (CAAAAA)n Simple_repeat 3 24 (0) 8
189 0.0 0.0 0.0 gi|411024077|gb|CM001634.1| 20344 20364 (28614773) + (TA)n Simple_repeat 1 21 (0) 9
23 0.0 0.0 0.0 gi|411024077|gb|CM001634.1| 21437 21459 (28613678) + AT_rich Low_complexity 1 23 (0) 10
198 0.0 0.0 0.0 gi|411024077|gb|CM001634.1| 22420 22441 (28612696) + (GAA)n Simple_repeat 2 23 (0) 11
189 0.0 0.0 0.0 gi|411024077|gb|CM001634.1| 27191 27211 (28607926) + (TTTTTG)n Simple_repeat 4 24 (0) 12
21 0.0 0.0 0.0 gi|411024077|gb|CM001634.1| 27481 27501 (28607636) + AT_rich Low_complexity 1 21 (0) 13
24 16.1 0.0 0.0 gi|411024077|gb|CM001634.1| 33144 33174 (28601963) + AT_rich Low_complexity 1 31 (0) 14
186 4.3 0.0 0.0 gi|411024077|gb|CM001634.1| 34503 34525 (28600612) + (CGAAT)n Simple_repeat 2 24 (0) 15
What have you tried so far? This would seem to be solved simply by iterating over the rows (you can use
samtools faidx
to retrieve an arbitrary region of an indexed fasta file).