7.2 years ago
namgyalwangstuklama
Hello, I want to extract all the sequences that have 'tRNA-Phe' in them, along with the FASTA header. I also want to extract the tRNA-Phe sequences from multiple files, again along with the FASTA header.
For example, the following FASTA sequence contains tRNA-Phe; I want every FASTA header and sequence that has this key.
''>JSAA01000083.1 Elizabethkingia anophelis strain nuh11 contig_83, whole genome shotgun sequence
1 genes found
1 tRNA-Phe c[5099,5172] 34 (gaa)
ggtttcttagctcagttggtagagcaatggattgaaaatccatgtgtccc
tggttcgattcctggagaaaccac ''
Please help.
Have you tried anything?
I tried the grep command:
''grep -w tRNA-Phe file.txt ''
It gave me this output:
1 tRNA-Phe c[5099,5172] 34 (gaa)
1 tRNA-Phe c[10265,10338] 34 (gaa)
but not the sequences or the FASTA header.
There are lots of similar posts on this site; please search before asking.
It is not working.
What's the error? Try to provide more information when giving feedback.
I guess you did not install the tool, haha
I already have seqkit and regexp installed.
Can you send me your email address? I'll mail you the files.
Just paste the main error message here.
grep -w will only report matches that form a whole word. Did you try a normal grep? Also, if you want the sequence that follows the match, and assuming the sequence is all on one line, you can use grep -A1, which returns one line of "after context" following each line that matched your pattern. Finally, why don't you install Bioawk and use that? It's super easy for these tasks.
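As a rough illustration of the after-context suggestion, the sketch below runs grep on a small file built from the example pasted at the top of this thread. The file name sample_trna.txt is made up for the demo, and the two-line sequence layout is an assumption read off the spacing in that example; if it holds, -A2 is needed rather than -A1.

```shell
# Hypothetical sample file, assuming the sequence wraps over two lines.
cat > sample_trna.txt <<'EOF'
>JSAA01000083.1 Elizabethkingia anophelis strain nuh11 contig_83, whole genome shotgun sequence
1 genes found
1 tRNA-Phe c[5099,5172] 34 (gaa)
ggtttcttagctcagttggtagagcaatggattgaaaatccatgtgtccc
tggttcgattcctggagaaaccac
EOF

# -w  : match tRNA-Phe only as a whole word
# -A2 : also print the two lines that follow each matching line
grep -w -A2 'tRNA-Phe' sample_trna.txt
```

This returns the gene line plus its sequence, but still not the contig header, because the header sits above the match and a fixed amount of context cannot reach it reliably when a contig lists several genes.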
The first line contains the FASTA header, the second line the number of genes found, and the third line the gene's number, name, location and size. The fourth and fifth lines contain the sequence.
''>LNOG01000023.1 Elizabethkingia anophelis strain 0422 contig_7, whole genome shotgun sequence
3 genes found
1 tRNA-Phe c[9122,9195] 34 (gaa)
ggtttcttagctcagttggtagagcaatggattgaaaatccatgtgtccc
tggttcgattcctggagaaaccac
2 tRNA-Phe c[9762,9835] 34 (gaa)
ggtttcttagctcagttggtagagcaatggattgaaaatccatgtgtccc
tggttcgattcctggagaaaccac
3 tRNA-Ser [118406,118494] 35 (gga)
agagaggtggccgagtggtcgaaggcgcacgcctggaaagtgtgtatact
ccaaaagggtatcgagggttcgaatcccttcctctctgc ''
That's interesting. Check this: https://en.m.wikipedia.org/wiki/FASTA_format
It's not FASTA format.
You need to write some scripts to handle this special format. Yes, you do.
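One possible shape for such a script is sketched below with plain awk. It assumes the layout described in this thread: a ">" header line, an "N genes found" line, then for each gene a line starting with a number and the gene name, followed by its sequence lines. The function name extract_trna, the file name sample_genes.txt, and the " | " separator in the output header are all made up for illustration.

```shell
# Hypothetical sample file following the layout described in the thread.
cat > sample_genes.txt <<'EOF'
>LNOG01000023.1 Elizabethkingia anophelis strain 0422 contig_7, whole genome shotgun sequence
3 genes found
1 tRNA-Phe c[9122,9195] 34 (gaa)
ggtttcttagctcagttggtagagcaatggattgaaaatccatgtgtccc
tggttcgattcctggagaaaccac
2 tRNA-Phe c[9762,9835] 34 (gaa)
ggtttcttagctcagttggtagagcaatggattgaaaatccatgtgtccc
tggttcgattcctggagaaaccac
3 tRNA-Ser [118406,118494] 35 (gga)
agagaggtggccgagtggtcgaaggcgcacgcctggaaagtgtgtatact
ccaaaagggtatcgagggttcgaatcccttcctctctgc
EOF

# extract_trna GENE FILE...  (hypothetical helper)
# Prints one FASTA record per gene whose name equals GENE exactly.
extract_trna() {
    key="$1"; shift
    awk -v key="$key" '
        NF == 0 { next }                          # ignore blank lines
        /^>/    { header = $0; keep = 0; next }   # remember the contig header
        $1 ~ /^[0-9]+$/ {                         # "N genes found" or a gene line
            keep = ($2 == key)                    # keep only the requested gene
            if (keep) print header " | " $0       # header plus gene details
            next
        }
        keep    { print }                         # sequence lines of a kept gene
    ' "$@"
}

extract_trna tRNA-Phe sample_genes.txt
```

Because awk reads every file named on the command line in turn, something like extract_trna tRNA-Phe *.txt > tRNA-Phe.fasta should cover the multiple-file case. The gene name is compared exactly and case-sensitively, and two genes on the same contig end up with the same leading ID in their headers, so downstream tools that need unique IDs may require a further renaming step.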