3.9 years ago
I am trying to subset a FASTA file at specific nucleotide positions. For example:
>random sequence 1
tatgtgcgag
>random sequence 2
agggtgttat
>random sequence 3
tatgtgcgag
>random sequence 4
gactcgcggt
>random sequence 5
tatgtgcgag
>random sequence 6
gcagccatcg
>random sequence 7
gactcgcggt
>random sequence 8
tatgtgcgag
>random sequence 9
tatgtgcgag
>random sequence 10
tatgtgcgag
I am able to cut the sequence from position 3 to 6, but the IDs are missing. I want to keep the same IDs as in the original file. Can anyone help me modify my code, please? Thanks
cat random.fasta |sed -n 2~2p |cut -c3-6 >out.fasta
tgtg
ggtg
tgtg
ctcg
tgtg
agcc
ctcg
tgtg
tgtg
tgtg
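One possible way to keep the IDs is to let awk print header lines untouched and trim only the sequence lines. This is a minimal sketch, not a tested answer from the thread: it assumes a single-line FASTA (each sequence on exactly one line), and the function name `subset_fasta` and the file `sample.fasta` are made up for illustration.

```shell
# Hypothetical helper: print headers as-is, keep columns 3-6 of sequence lines.
# Assumption: single-line FASTA (one sequence line per header).
subset_fasta() {
  awk '/^>/ { print; next }          # header line: print unchanged
       { print substr($0, 3, 4) }    # sequence line: 4 characters starting at position 3
      ' "$1"
}

# Small sample built from the first two records of the question.
printf '>random sequence 1\ntatgtgcgag\n>random sequence 2\nagggtgttat\n' > sample.fasta
subset_fasta sample.fasta
```

On the real file this would be called as `subset_fasta random.fasta > out.fasta`.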
Extraction of nt bases from sequence
Thanks for sharing the link. I tried the command for multiline FASTA and it partially worked. I am extending the thread there.
Are you sure you are using a multiline FASTA?
By 'multiline' we mean that the sequence is block-formatted and is thus spread over several lines under one header. It does not refer to the fact that you have several entries in a single FASTA file.
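For a block-formatted file, cutting each line separately would give the wrong bases, so the lines under a header need to be joined first. The following is a sketch under that assumption, not the code from the linked thread; `subset_multiline_fasta`, `emit`, `first`, `last` and `multi.fasta` are hypothetical names chosen for the example.

```shell
# Hypothetical helper: $1 = FASTA file, $2 = first position (1-based), $3 = last position (inclusive).
# Joins all sequence lines under each header before extracting the range.
subset_multiline_fasta() {
  awk -v first="$2" -v last="$3" '
    # Print the stored record, if any, with the sequence trimmed to first..last.
    function emit() {
      if (header != "") {
        print header
        print substr(seq, first, last - first + 1)
      }
    }
    /^>/ { emit(); header = $0; seq = ""; next }   # new record: flush the previous one
         { seq = seq $0 }                          # sequence line: append to current record
    END  { emit() }                                # flush the final record
  ' "$1"
}

# Toy multiline file: each record has its sequence split over two lines.
printf '>s1\nacgt\nacgt\n>s2\ntttt\ngggg\n' > multi.fasta
subset_multiline_fasta multi.fasta 3 6
```

Because the whole sequence is joined in memory per record, the same function also works on a single-line FASTA.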
I am not sure about that. When I open the file in a text editor, the sequence appears in blocks of 60 nt each, while in SnapGene and ApE the layout follows the software's settings. I got confused because the code suggested for multiline FASTA sort of worked.
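A quick way to settle whether a file is multiline is to compare the number of header lines with the number of non-empty sequence lines: more sequence lines than headers means at least one record is wrapped. This is only a rough sketch; `fasta_layout` and the two sample file names are invented for illustration.

```shell
# Hypothetical helper: report whether a FASTA file wraps sequences over several lines.
fasta_layout() {
  awk '/^>/ { headers++; next }      # count header lines
       NF   { seqlines++ }           # count non-empty sequence lines
       END  { if (seqlines > headers) print "multiline"; else print "single-line" }
      ' "$1"
}

printf '>s1\nacgtacgt\n>s2\nttttgggg\n' > single_example.fasta
printf '>s1\nacgt\nacgt\n>s2\ntttt\ngggg\n' > multi_example.fasta
fasta_layout single_example.fasta
fasta_layout multi_example.fasta
```

A file shown in fixed 60 nt blocks in a plain text editor would be expected to report "multiline" here, assuming the editor is not soft-wrapping long lines.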
In the original thread shared by Pierre Lindenbaum, the following code was suggested.
For single line FASTA file
For multiline FASTA file
When I tried the first command, no subsetting happened, while the second command worked only up to position 1000.