Hi,
I have a fasta file containing around 500 intron loci. I don't know how to extract just the last 100 bases of each sequence using awk on the command line.
Thanks, Farid
with seqkit:
$ seqkit subseq -r -100:-1 input.fa
if each fasta sequence is on a single line, with awk:
$ awk -v OFS="\n" '{getline seq} {print $0, substr(seq, length(seq)-99)}' test.fa
if each fasta sequence is on a single line, with sed:
$ sed -n '/^>/p;/^>/! s/.*\(.\{100\}\)/\1/p' test.fa
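The awk and sed one-liners above assume each sequence sits on a single line. For wrapped (multi-line) fasta, one possible approach in plain awk is to accumulate the sequence lines of each record and trim once the next header arrives. This is only a minimal sketch: the toy file `wrapped.fa`, the variable `n`, and the helper function `emit` are illustrative names, not something from this thread. The demo uses n=5 on made-up data so the result is easy to check by eye; n=100 would give the last 100 bases.

```shell
# Toy wrapped fasta (hypothetical data) to demonstrate the idea.
tmpdir=$(mktemp -d)
cat > "$tmpdir/wrapped.fa" <<'EOF'
>intron1
ACGTACGT
TTGGCC
>intron2
ACG
EOF

# Accumulate sequence lines per record; on each new header (and at end of
# input) print the previous header plus the last n characters of its sequence.
# Sequences shorter than n are printed whole.
last_bases=$(awk -v n=5 '
  function emit() {
    if (hdr != "") {
      start = length(seq) - n + 1
      if (start < 1) start = 1
      print hdr
      print substr(seq, start)
    }
  }
  /^>/ { emit(); hdr = $0; seq = ""; next }
       { seq = seq $0 }
  END  { emit() }
' "$tmpdir/wrapped.fa")

printf '%s\n' "$last_bases"
rm -rf "$tmpdir"
```

Clamping `start` to 1 means introns shorter than n are kept in full rather than dropped or blanked.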
Maybe this will work for you, unless you have IDs longer than 100 characters.
perl -pe 'if(/\>/){s/\n/\t/}; s/\n//; s/\>/\n\>/' file.fasta | sed -Ee '/^$/d ; s/^.*(.{100})$/\1/'
This puts each fasta record on a single line, removes the empty lines, and keeps the last 100 characters of each line.
Use bioawk or substr in awk with pre-calculated sequence lengths (you can use length() for that). Read: https://www.gnu.org/software/gawk/manual/html_node/String-Functions.html
Thanks RamRS. I think this command works when I know the length of the introns. But I have different lengths and I want the last 100 bases from each sequence.
You could use length($seq) in bioawk to calculate the length on the fly. I don't see why you need to know the length before you start the entire operation. It just needs to be calculated before the substr step.
Thanks everyone. The
sed -Ee 's/^.*(.{100})$/\1/' file.fasta
worked great. Appreciate it.
Please accept answers that worked for you. You can accept more than one answer if they all work.