Remove FASTA sequences that don't contain ATGC
3.0 years ago
harry ▴ 40

I have a FASTA file in which some records do not contain a real "ATGC" sequence, like the one below. I want to remove every record from my file that doesn't contain a proper sequence:

>hsa-NHS_0029
partial

Thanks in advance

fasta • 1.6k views

partial has two 'a's and one 't' in it. Try

$ seqkit -w 0 grep -srip '[ATGC]' test.fa
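A note on the pattern itself: an unanchored character class such as [ATGC], used case-insensitively, is satisfied by any string that contains at least one of those letters, so it cannot tell "partial" apart from a real sequence. The following is a minimal Python sketch of that difference, not a description of seqkit internals; the helper names `contains_any_base` and `is_only_bases` are made up for illustration.

```python
import re


def contains_any_base(seq: str) -> bool:
    """True if at least one character is A, T, G or C (case-insensitive)."""
    return re.search(r"[ATGC]", seq, flags=re.IGNORECASE) is not None


def is_only_bases(seq: str) -> bool:
    """True only if every character is A, T, G or C (case-insensitive)."""
    return re.fullmatch(r"[ATGC]+", seq, flags=re.IGNORECASE) is not None


for seq in ("partial", "ATGTTGTCAAAGG"):
    print(seq, contains_any_base(seq), is_only_bases(seq))
# partial True False
# ATGTTGTCAAAGG True True
```

Only the whole-string check rejects "partial"; the "contains" check accepts both strings.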

It doesn't remove those FASTA records, together with their headers, from my output file.


Please post an example FASTA file, the code you ran, and the output you got from the example file.


Below is an example FASTA file. As you can see, one record has "partial" as its sequence, which is not an actual nucleotide sequence. I want to remove all such records, header included, from my FASTA file.

>hsa-NHS_0029
partial

>hsa-MAP3K2_0035
CAAAGCAGACACAGTCTTTTTCCTTATGGGGCTTATAGTGTGGTATTGGTGGTCTTCAAGCTTTAGGTGCATGCTCTCTAGGGGCAGCACAGGCCAGTCTTTGGGAAGTAGAAAGAAAACACTAGAATTTTAATCTTGTAAAAATATTTATTTTGGAAGTATGCCTACATGATATATGTACCCAAAAATGTTGAGTTGATAAGGATACAGAAATCAAAATTTGAATCCCTGGAGAACAACTTTATTCTTTTTAACAGTTTCATTTCATTCTGTCATACAGATGTCACATAGTTAAGTTTAATCAGTTTCCTGAATAAACATTTTGCATTGCTCTTTAACATTCATATGGTAGTTTCAGTGCAAATTAAAAACTGCAGAAAATTTTGAGAATTTCCTTTTTAAAATTTGTATTAGGATAAAACAGACTTCAGAGTTTTTGAACTTTGTGTTGAAAATGTGTGCCTGTGTTAAAAAATGTGCTGTTGTAGTGTTAAGTAAAGTGAATATTTACGTAATTTTTAGTTAATCTTGTATCACTCTAAAAGGAG

>hsa-TMOD2_0026
ATGTTGTCAAAGGTGAAAAAGTAAAGCCAGTATTTGAGGAACCACCAAATCCCACAAATGTGGAAATAAGCCTGCAGCAGATGAAAGCCAATGATCCTAGCTTGCAAGAAGTCAACCTCAACAACATTAAGTTCGTAAGAAGAGAGTTGAAGCAGACCGAAG

I ran the seqkit command you suggested, but it doesn't remove those sequences from my file.

seqkit -w 0 grep -srip '[ATGC]' test.fa >test_new.fa

I want the output to look like the following, with the "partial" record removed along with its header:

>hsa-MAP3K2_0035
CAAAGCAGACACAGTCTTTTTCCTTATGGGGCTTATAGTGTGGTATTGGTGGTCTTCAAGCTTTAGGTGCATGCTCTCTAGGGGCAGCACAGGCCAGTCTTTGGGAAGTAGAAAGAAAACACTAGAATTTTAATCTTGTAAAAATATTTATTTTGGAAGTATGCCTACATGATATATGTACCCAAAAATGTTGAGTTGATAAGGATACAGAAATCAAAATTTGAATCCCTGGAGAACAACTTTATTCTTTTTAACAGTTTCATTTCATTCTGTCATACAGATGTCACATAGTTAAGTTTAATCAGTTTCCTGAATAAACATTTTGCATTGCTCTTTAACATTCATATGGTAGTTTCAGTGCAAATTAAAAACTGCAGAAAATTTTGAGAATTTCCTTTTTAAAATTTGTATTAGGATAAAACAGACTTCAGAGTTTTTGAACTTTGTGTTGAAAATGTGTGCCTGTGTTAAAAAATGTGCTGTTGTAGTGTTAAGTAAAGTGAATATTTACGTAATTTTTAGTTAATCTTGTATCACTCTAAAAGGAG

>hsa-TMOD2_0026
ATGTTGTCAAAGGTGAAAAAGTAAAGCCAGTATTTGAGGAACCACCAAATCCCACAAATGTGGAAATAAGCCTGCAGCAGATGAAAGCCAATGATCCTAGCTTGCAAGAAGTCAACCTCAACAACATTAAGTTCGTAAGAAGAGAGTTGAAGCAGACCGAAG

Do you really have the string 'partial' as your sequence? I didn't understand that. Here is a solution:

$ seqkit -w 0 grep -svp "partial" test.fa

>hsa-MAP3K2_0035
CAAAGCAGACACAGTCTTTTTCCTTATGGGGCTTATAGTGTGGTATTGGTGGTCTTCAAGCTTTAGGTGCATGCTCTCTAGGGGCAGCACAGGCCAGTCTTTGGGAAGTAGAAAGAAAACACTAGAATTTTAATCTTGTAAAAATATTTATTTTGGAAGTATGCCTACATGATATATGTACCCAAAAATGTTGAGTTGATAAGGATACAGAAATCAAAATTTGAATCCCTGGAGAACAACTTTATTCTTTTTAACAGTTTCATTTCATTCTGTCATACAGATGTCACATAGTTAAGTTTAATCAGTTTCCTGAATAAACATTTTGCATTGCTCTTTAACATTCATATGGTAGTTTCAGTGCAAATTAAAAACTGCAGAAAATTTTGAGAATTTCCTTTTTAAAATTTGTATTAGGATAAAACAGACTTCAGAGTTTTTGAACTTTGTGTTGAAAATGTGTGCCTGTGTTAAAAAATGTGCTGTTGTAGTGTTAAGTAAAGTGAATATTTACGTAATTTTTAGTTAATCTTGTATCACTCTAAAAGGAG
>hsa-TMOD2_0026
ATGTTGTCAAAGGTGAAAAAGTAAAGCCAGTATTTGAGGAACCACCAAATCCCACAAATGTGGAAATAAGCCTGCAGCAGATGAAAGCCAATGATCCTAGCTTGCAAGAAGTCAACCTCAACAACATTAAGTTCGTAAGAAGAGAGTTGAAGCAGACCGAAG
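
The command above removes records whose sequence is exactly "partial". If other placeholder strings may also occur, a more general filter keeps only records whose whole sequence consists of A, C, G and T. The following is a minimal pure-Python sketch under that assumption; the function names `read_fasta` and `keep_nucleotide_records` are hypothetical, the example sequence is shortened for illustration, and records containing ambiguity codes such as N would also be dropped by this pattern.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# Assumption: a "proper" sequence is one or more of A, C, G, T (any case).
VALID = re.compile(r"[ACGT]+", re.IGNORECASE)


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining multi-line sequences."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def keep_nucleotide_records(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Keep only records whose entire sequence matches VALID."""
    return [(h, s) for h, s in read_fasta(lines) if VALID.fullmatch(s)]


# Shortened example based on the records posted in the question.
example = """>hsa-NHS_0029
partial

>hsa-TMOD2_0026
ATGTTGTCAAAGGTGAAAAAG
""".splitlines()

for header, seq in keep_nucleotide_records(example):
    print(header)
    print(seq)
```

To run it on a real file, pass an open file handle instead of `example`, for instance `keep_nucleotide_records(open("test.fa"))`. An anchored regular expression in seqkit, something like `seqkit -w 0 grep -s -r -i -p '^[ACGT]+$' test.fa`, should express the same idea, but that command is untested here.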