Extraction of specific sequences from a FASTA file
3.5 years ago
aranyak111 • 0

I have downloaded the complete set of hairpin sequences from the miRBase website. The first few lines look like this:

>cel-let-7 MI0000001 Caenorhabditis elegans let-7 stem-loop
UACACUGUGGAUCCGGUGAGGUAGUAGGUUGUAUAGUUUGGAAUAUUACCACCGGUGAAC
UAUGCAAUUUUCUACCUUACCGGAGACAGAACUCUUCGA
>cel-lin-4 MI0000002 Caenorhabditis elegans lin-4 stem-loop
AUGCUUCCGGCCUGUUCCCUGAGACCUCAAGUGUGAGUGUACUAUUGAUGCUUCACACCU
GGGCUCUCCGGGUACCAGGACGGUUUGAGCAGAU
>cel-mir-1 MI0000003 Caenorhabditis elegans miR-1 stem-loop
AAAGUGACCGUACCGAGCUGCAUACUUCCUUACAUGCCCAUACUAUAUCAUAAAUGGAUA

I want to extract only the human hairpin sequences from this file. I used grep as follows:

 grep hsa-mir hairpin.fa > human_hairpin.fa

However, this extracts only the header lines. I need the sequences as well, like this:

>hsa-mir-548ab MI0016752 Homo sapiens miR-548ab stem-loop
AUGUUGGUGCAAAAGUAAUUGUGGAUUUUGCUAUUACUUGUAUUUAUUUGUAAUGCAAAA
CCCGCAAUUAGUUUUGCACCAACC

Which commands should I use?

Genomics • 940 views
3.5 years ago
Mensur Dlakic ★ 28k

You obviously don't know it or else you wouldn't be asking, but your question is easily one of the top 3-5 most frequently asked.

Simply enter "extract fasta sequences" into the search box at the top and you will get many answers. For a single sequence, you can also open the file and copy and paste it by hand.

I have used:

grep -wEA1 --no-group-separator hsa-mir hairpin.fa > a.fa

This works fine for single-line FASTA records, but I need it to work for multi-line FASTA records.
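
The -A1 option prints only one line after each matching header, so any second or later sequence line of a record is dropped. One possible way around this, sketched here with awk rather than grep, is to decide at each header whether the record should be kept and then print every line until the next header. This is a minimal sketch, not a definitive solution: the function name extract_records is made up for illustration, and it assumes that header lines in hairpin.fa start with ">" as in standard FASTA.

```shell
# Hypothetical helper (the name is illustrative, not from any tool).
# Usage: extract_records PREFIX FASTA_FILE
# Prints every record whose header line starts with ">" followed by PREFIX,
# including all of its sequence lines, however many there are.
extract_records() {
    awk -v prefix=">$1" '
        # Header line: keep the record only if the header begins with the prefix.
        /^>/ { keep = (index($0, prefix) == 1) }
        # Print the current line (header or sequence) while keep is set.
        keep
    ' "$2"
}
```

Called as extract_records hsa-mir hairpin.fa > human_hairpin.fa, this should write each matching header together with all of its sequence lines. Note that the prefix hsa-mir skips human precursors with other names, such as hsa-let-7a-1; using hsa- as the prefix would include those as well.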
