3.5 years ago
FadyNabil
I have a Trinity.fasta file and I want to extract the shortest and the longest isoform from it.
Hey,
samtools faidx Trinity.fasta # create an index to get the length of each sequence (length values are stored in the second column of the Trinity.fasta.fai file)
awk 'NR==1 {min=max=$2+0}
$2+0<min {min=$2+0}
$2+0>max {max=$2+0}
END {print min, max}' Trinity.fasta.fai # print the min and max values in the second column of Trinity.fasta.fai
#201 2114
awk '/^>/ {if (NR>1) printf("\n"); print; next} {printf("%s",$0)} END {printf("\n")}' < Trinity.fasta > linear.fasta # linearize Trinity.fasta (one header line, one sequence line per record, no leading blank line)
grep -w -A1 --no-group-separator 'len=201' linear.fasta > sequences_201.fasta # get the shortest sequences using the field 'len=###' in the fasta header; -w stops 'len=201' from also matching e.g. 'len=2010'
grep -w -A1 --no-group-separator 'len=2114' linear.fasta > sequences_2114.fasta # get the longest sequences using the field 'len=###' in the fasta header
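If you would rather not depend on the 'len=###' field in the headers (or on typing the min/max values by hand), the same idea can be sketched with awk alone by measuring the sequence lines directly. This is only a minimal sketch, not a tested pipeline: demo.fasta, demo.linear.fasta, shortest.fasta and longest.fasta are made-up names for illustration, and the four demo records are invented so the example runs on its own.

```shell
# Hypothetical demo input; replace demo.fasta with your own Trinity.fasta
cat > demo.fasta <<'EOF'
>seq1 len=5
ACGTA
>seq2 len=12
ACGTAC
GTACGT
>seq3 len=3
ACG
>seq4 len=12
TTTTTTTTTTTT
EOF

# Linearize: one header line followed by one sequence line per record
awk '/^>/ {if (NR>1) printf("\n"); print; next} {printf("%s",$0)} END {printf("\n")}' demo.fasta > demo.linear.fasta

# Read the linearized file twice:
#   pass 1 finds the min and max sequence length (even lines are sequences),
#   pass 2 writes every record whose length equals min or max.
awk '
  FNR == 1 { pass++ }
  pass == 1 && FNR % 2 == 0 {
    n = length($0)
    if (FNR == 2 || n < min) min = n
    if (FNR == 2 || n > max) max = n
    next
  }
  pass == 2 && FNR % 2 == 1 { header = $0; next }
  pass == 2 {
    n = length($0)
    if (n == min) print header "\n" $0 > "shortest.fasta"
    if (n == max) print header "\n" $0 > "longest.fasta"
  }
' demo.linear.fasta demo.linear.fasta
```

Ties are kept, so in the demo both seq2 and seq4 end up in longest.fasta. Note that this, like the commands above, picks the shortest and longest sequences in the whole file, not per gene.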