Can anyone tell me how I can do this?
I have two files. file1 = 1.fasta and looks like:
>123.1
tgcgtgctagctgacctgcgtgcagctgc
>123.2
tcgtcgatcgacgtgcagctgactgcttgct
>123.3
acccggtgcggggggtcgatcgacgtc
file2 = result.ods (a spreadsheet file on Ubuntu) and contains data like:
id seq_start_pos seq_end_pos
123.1 10 15
123.2 11 18
123.3 8 16
and I want a script which can generate output like:
seq_id small_seq large_seq
123.1 ctgac ctagctgacctgc
123.2 cgtgcagc tcgacgtgcagctgac
123.3 cggggggt ggtgcggggggtcgat
This is the format of the result I want. The small seq is the region between positions 10 and 15 (15 - 10 = 5 bp) in file1, and the large seq is that same region plus 4 bp upstream and 4 bp downstream.
Basically, I want to extract small_seq along with 4 bp upstream and 4 bp downstream into an Excel file.
If anyone can provide a script for extracting this region, I would be very thankful.
Have you tried anything from your end? You can play around with bedtools getfasta and the paste command to achieve what you want, or if you know some Python, it's easy with pyfaidx.
I ran this command but it is not working.
see my answer here: A: How to extract a region of protein or small domain '150-300' from a long sequenc
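If it helps, here is a minimal sketch in plain Python (standard library only, so not the bedtools or pyfaidx route suggested above). It makes several assumptions that are not in the original post: result.ods has been exported to a whitespace-separated text file with a header line, the start position is 0-based and the end position is exclusive (BED-style), and the flanks are clipped at the sequence ends. The function names read_fasta and extract_regions are made up for illustration. With those assumptions it reproduces the posted 123.1 row (ctgac, ctagctgacctgc). The posted small_seq for 123.2 is 8 bp although 18 - 11 = 7, so that row will come out one base shorter than in the question.

```python
import csv
import io
import sys

# Example data copied from the question; in practice open 1.fasta and an
# exported text version of result.ods instead (assumption: whitespace-separated).
FASTA_TEXT = """>123.1
tgcgtgctagctgacctgcgtgcagctgc
>123.2
tcgtcgatcgacgtgcagctgactgcttgct
>123.3
acccggtgcggggggtcgatcgacgtc
"""

COORDS_TEXT = """id seq_start_pos seq_end_pos
123.1 10 15
123.2 11 18
123.3 8 16
"""


def read_fasta(handle):
    """Return {id: sequence} from a FASTA file handle (multi-line records allowed)."""
    records, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            # Keep only the first word of the header as the id.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def extract_regions(sequences, rows, flank=4):
    """Yield (id, small_seq, large_seq) for each (id, start, end) row.

    Assumes 0-based start and exclusive end; flanks are clipped at the
    ends of the sequence instead of raising an error.
    """
    results = []
    for seq_id, start, end in rows:
        seq = sequences[seq_id]
        start, end = int(start), int(end)
        small = seq[start:end]
        large = seq[max(0, start - flank):min(len(seq), end + flank)]
        results.append((seq_id, small, large))
    return results


if __name__ == "__main__":
    sequences = read_fasta(io.StringIO(FASTA_TEXT))
    # Skip the header line, then split each row on whitespace.
    rows = [ln.split() for ln in COORDS_TEXT.splitlines()[1:] if ln.strip()]
    # Tab-separated output opens directly in Excel or LibreOffice Calc.
    writer = csv.writer(sys.stdout, delimiter="\t", lineterminator="\n")
    writer.writerow(["seq_id", "small_seq", "large_seq"])
    writer.writerows(extract_regions(sequences, rows))
```

To use real files, replace the two io.StringIO/string inputs with open("1.fasta") and the exported coordinates file, and redirect the output to a .tsv file. If your coordinates are actually 1-based and inclusive, change the slice to seq[start - 1:end] and adjust the flanks the same way.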