12.0 years ago
macmath
Dear colleagues,
I need to remove redundant sequences based on taxonomic name (genus name). For example, of [Serratia plymuthica A30] and [Serratia sp. AS12], keep only the first one (or choose based on sort order). The result should be a unique set of sequences, one per genus name, each keeping its accession info (e.g. ZP_07379498).
Sample input file:
>EKF64793 phenylalanine--tRNA ligase, beta subunit [Serratia plymuthica A30].
MKFSELWLREWVNPAISSEALSDQITMAGLEVDGVEPIAGVFNGVVVGHVVECGQHPNADKLRVTKVNVGGDRLLDIVCGAPNCRTGL
>ZP_07379498 phenylalanyl-tRNA synthetase, beta subunit [Pantoea sp. aB].
MKFSELWLREWVNPALDSAALSEQITMAGLEVDGVEPVAGAFHGVVVGEVVECGQHPNADKLRVTKINVGGERLLDIVCGAPNCRQG
>YP_004500523 Phenylalanyl-tRNA synthetase subunit beta [Serratia sp. AS12].
MKFSELWLREWVNPAISSEALSDQITMAGLEVDGVEPVAGVFNGVVVGHVVECGQHPNADKLRVTKVNVGGDRLLDIVCGAPNCRTG
>ZP_04615044 Phenylalanyl-tRNA synthetase beta chain [Yersinia ruckeri ATCC 29473].
MKFSELWLREWVNPAISSDELAHQITMAGLEVDGVEAVAGEFNGVVVGEVVECGQHPNADKLRVTKVNVGGERLLDIVCGAPNCRQG
>ZP_10294785 phenylalanyl-tRNA ligase subunit beta [Pseudoalteromonas rubra ATCC 29570].
MKFSEKWLREWVNPAIDTEALSEQLSMAGLEVDGVDPVAGDFEGVVIGEVVECGQHPDADKLRVTKVNVGEDELLDIVCGAANCRTG
>ZP_04635334 Phenylalanyl-tRNA synthetase beta chain [Yersinia intermedia ATCC 29909].
MKFSELWLREWVNPAISSDDLAHQITMAGLEVDGVDAVAGEFNGVVIGHVVECGQHPNADKLRVTKIDVGGDRLLD
>ZP_04626227 Phenylalanyl-tRNA synthetase beta chain [Yersinia kristensenii ATCC 33638].
MKFSELWLREWVNPAISSDDLAHQITMAGLEVDGVDAVAGEFNGVVIGHVVECGQHPNADKLRVTKIDVGGERLLDIVCGAPNCRQGLKVAVATVGAVLPGDFKIKAAKLRGEPSEGMLCSFSELAIAEDHDGIIELPADAPIGVDLREYLKLDDKTIEISVTPNRAD
>ZP_04630893 Phenylalanyl-tRNA synthetase beta chain [Yersinia frederiksenii ATCC 33641].
MKFSELWLREWVNPAISSDDLAHKITMAGLEVDGIDPVAGEFNGVVVGHVVECGQHPNADKLRVTKIDVGGDRLLDIVCGAPNCRQGLKVAVATVGAVLPGDFKIKAAKLRGEPSEGMLCSFSELAISEDHDGIIELPADAPIGVDLREYLHLDDKTIEISVTPNRAD
>ZP_09390203 phenylalanine--tRNA ligase, beta subunit [Yokenella regensburgei ATCC 43003].
MKFSELWLREWVNPAVDSEALSDQITMAGLEVDGVEPVAGEFHGVVVGEVVECGQHPNADKLRVTKINVGGERLL
>ZP_04610982 Phenylalanyl-tRNA synthetase beta chain [Yersinia rohdei ATCC 43380].
MKFSELWLREWVNPAISSDDLAHQITMAGLEVDGIDAVAGEFNGVVVGQVVECGQHPNADKLRVTKIDVGGDRLLD
>ZP_04640876 Phenylalanyl-tRNA synthetase beta chain [Yersinia mollaretii ATCC 43969].
MKFSELWLREWVNPAISSDELAHQITMAGLEVDGVESVAGEFNGVVVGHVVECGQHPNADKLRVTKIDVGGERLLDIVCGAPNCRQGLKVAVATVGAVLPGDFKIKAAKLRGEPSEGMLCSFSELAISDDHDGIIELPADAPIGVDVREYLQLNDKTIEISVTPNRAD
>ZP_04627403 Phenylalanyl-tRNA synthetase beta chain [Yersinia bercovieri ATCC 43970].
MKFSELWLREWVNPAISSDALAHQITMAGLEVDGVESVAGEFNGVVVGHVVECGQHPNADKLRVTKIDVGGDRLLD
>ZP_09375718 phenylalanine--tRNA ligase, beta subunit [Hafnia alvei ATCC 51873].
MKFSELWLREWVNPAISSEALSEQITMAGLEVDGVEPVAGEFNGVFVGEVVECGQHPNADKLRVTKVNVGGERLLD
>CBX80523 phenylalanyl-tRNA synthetase, beta subunit [Erwinia amylovora ATCC BAA-2158].
MKFSELWLREWVSPAIDSAALCEQITMAGLEVDGVDAVAGAFHGVVVGDVVECAQHPNADKLRVTKINVGGDRLLDI
>ZP_08825428 Phenylalanyl-tRNA synthetase beta chain [Thiorhodococcus drewsii AZ1].
MRFSEAWLREWVNPPVDTQQLADQLSMAGLEVDAVEPAASAFSGVFVGLVRAIAPHPDAAKLRICSVDVGQGDPLQIICGAANVAEGMRVPVATIGARLPGDFKIKRAKLRGVESFGMICSAKELGLAESSDGILPLPADAPLGEDFRAWLALDDQCIEVDLTPDRG
>ZP_10495587 phenylalanyl-tRNA synthetase subunit beta [Alishewanella aestuarii B11].
MKFSESWLREWVNPALDSTALSEQLSMAGLEVDGMDKVAGDFHGVVVGEVVECGKHPEADKLQVTKVNIGGAELLDIVCGARNCRLGLKVAVATVGAVLPGNFEIKQAKLRGQPSHGMLCSFSELGMADDSDGIIELPADAPIGQDLRQYLALDDLSIEVDLTPND
>ZP_10115070 phenylalanyl-tRNA synthetase, beta subunit [Beggiatoa alba B18LD].
MKFSEQWLRTWVNPQMTTTELVDCLTMAGLEVDDVETVAPAFDNVVVGEVLTIERHPDAEKLKVCQVNTGTESPLTIVCGASNVQAG
>ZP_10063574 phenylalanyl-tRNA synthetase subunit beta [gamma proteobacterium BDW918].
MKFSEQWLREWVNPAVGTDELAAQITMAGLEVDAIDPVAGVFSGVVVAEIVATAPHPDAEKLQVCRVNAGSEEVQIVCGAANARPGIKVPLATLGAVLPGDFKIKKAKLRGVESFGMLCAEEELGLAEKSDGLMELPLDAPVGEDIRVFLGLDDSIIELGLTPNRADC
>ZP_10350440 phenylalanyl-tRNA ligase subunit beta [Alishewanella agri BL06].
MKFSESWLREWVNPALDSTALSEQLSMAGLEVDGMDKVAGDFHGVVVGEVVECGKHPEADKLQVTKVNIGGAELLDIVCGARNCRL
>ZP_09228799 phenylalanyl-tRNA synthetase beta chain [Pseudoalteromonas sp. BSi20311].
MKFSEKWLREWVNPAIDTQALSEQLSMAGLEVDGVEPAAAKFNGVVVGEVIECGQHPDADKLRVTKINVGGDELLDIVCGAPNCRQGI
>ZP_09240506 phenylalanyl-tRNA synthetase beta chain [Pseudoalteromonas sp. BSi20480].
MKFSEKWLREWVNPAIDTQALSEQLSMAGLEVDGVEPAAAKFNGVLVGEVVECGQHPDADKLRVTKINVGGDELLDIVCGAPNCREGI
>ZP_09243405 phenylalanyl-tRNA synthetase beta chain [Pseudoalteromonas sp. BSi20495].
MKFSEKWLREWVNPAIDTQALSEQLSMAGLEVDGVEPAAAKFNGVVVGEVVECGQHPDADKLRVTKINVGGDELLDIVCGAANCRLGI
Example output file:
>EKF64793 phenylalanine--tRNA ligase, beta subunit [Serratia plymuthica A30].
MKFSELWLREWVNPAISSEALSDQITMAGLEVDGVEPIAGVFNGVVVGHVVECGQHPNADKLRVTKVNVGGDRLLDIVCGAPNCRTGL
>ZP_07379498 phenylalanyl-tRNA synthetase, beta subunit [Pantoea sp. aB].
MKFSELWLREWVNPALDSAALSEQITMAGLEVDGVEPVAGAFHGVVVGEVVECGQHPNADKLRVTKINVGGERLLDIVCGAPNCRQG
>ZP_04615044 Phenylalanyl-tRNA synthetase beta chain [Yersinia ruckeri ATCC 29473].
MKFSELWLREWVNPAISSDELAHQITMAGLEVDGVEAVAGEFNGVVVGEVVECGQHPNADKLRVTKVNVGGERLLDIVCGAPNCRQG
>ZP_10294785 phenylalanyl-tRNA ligase subunit beta [Pseudoalteromonas rubra ATCC 29570].
MKFSEKWLREWVNPAIDTEALSEQLSMAGLEVDGVDPVAGDFEGVVIGEVVECGQHPDADKLRVTKVNVGEDELLDIVCGAANCRTG
>ZP_09390203 phenylalanine--tRNA ligase, beta subunit [Yokenella regensburgei ATCC 43003].
MKFSELWLREWVNPAVDSEALSDQITMAGLEVDGVEPVAGEFHGVVVGEVVECGQHPNADKLRVTKINVGGERLL
>ZP_09375718 phenylalanine--tRNA ligase, beta subunit [Hafnia alvei ATCC 51873].
MKFSELWLREWVNPAISSEALSEQITMAGLEVDGVEPVAGEFNGVFVGEVVECGQHPNADKLRVTKVNVGGERLLD
>CBX80523 phenylalanyl-tRNA synthetase, beta subunit [Erwinia amylovora ATCC BAA-2158].
MKFSELWLREWVSPAIDSAALCEQITMAGLEVDGVDAVAGAFHGVVVGDVVECAQHPNADKLRVTKINVGGDRLLDI
>ZP_08825428 Phenylalanyl-tRNA synthetase beta chain [Thiorhodococcus drewsii AZ1].
MRFSEAWLREWVNPPVDTQQLADQLSMAGLEVDAVEPAASAFSGVFVGLVRAIAPHPDAAKLRICSVDVGQGDPLQIICGAANVAEGMRVPVATIGARLPGDFKIKRAKLRGVESFGMICSAKELGLAESSDGILPLPADAPLGEDFRAWLALDDQCIEVDLTPDRG
>ZP_10495587 phenylalanyl-tRNA synthetase subunit beta [Alishewanella aestuarii B11].
MKFSESWLREWVNPALDSTALSEQLSMAGLEVDGMDKVAGDFHGVVVGEVVECGKHPEADKLQVTKVNIGGAELLDIVCGARNCRLGLKVAVATVGAVLPGNFEIKQAKLRGQPSHGMLCSFSELGMADDSDGIIELPADAPIGQDLRQYLALDDLSIEVDLTPND
>ZP_10115070 phenylalanyl-tRNA synthetase, beta subunit [Beggiatoa alba B18LD].
MKFSEQWLRTWVNPQMTTTELVDCLTMAGLEVDDVETVAPAFDNVVVGEVLTIERHPDAEKLKVCQVNTGTESPLTIVCGASNVQAG
>ZP_10063574 phenylalanyl-tRNA synthetase subunit beta [gamma proteobacterium BDW918].
MKFSEQWLREWVNPAVGTDELAAQITMAGLEVDAIDPVAGVFSGVVVAEIVATAPHPDAEKLQVCRVNAGSEEVQIVCGAANARPGIKVPLATLGAVLPGDFKIKKAKLRGVESFGMLCAEEELGLAEKSDGLMELPLDAPVGEDIRVFLGLDDSIIELGLTPNRADC
>ZP_09228799 phenylalanyl-tRNA synthetase beta chain [Pseudoalteromonas sp. BSi20311].
MKFSEKWLREWVNPAIDTQALSEQLSMAGLEVDGVEPAAAKFNGVVVGEVIECGQHPDADKLRVTKINVGGDELLDIVCGAPNCRQGI
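For reference, the filtering I have in mind could be sketched roughly as below, in plain Python. This is only a minimal sketch under some assumptions: the organism is the last [bracketed] field of each header, the genus is the first word inside the brackets, and the first record seen for each genus is the one to keep. The function names (parse_fasta, genus_of, dedupe_by_genus, to_fasta) are made up for illustration and do not come from any library.

```python
import re

# Assumption: the organism is the last [...] group in the header,
# optionally followed by a trailing period.
ORGANISM = re.compile(r"\[([^\[\]]+)\]\.?\s*$")


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA text.

    Splitting on '>' also copes with a header that starts on the same
    line as the preceding sequence. Assumes '>' never occurs inside a
    header or a sequence.
    """
    for chunk in text.split(">"):
        chunk = chunk.strip()
        if not chunk:
            continue
        header, _, body = chunk.partition("\n")
        # Join wrapped sequence lines and drop any whitespace.
        yield header.strip(), "".join(body.split())


def genus_of(header):
    """Return the first word of the bracketed organism name, or None."""
    match = ORGANISM.search(header)
    if match is None:
        return None
    words = match.group(1).split()
    return words[0] if words else None


def dedupe_by_genus(text):
    """Keep only the first record seen for each genus, in input order."""
    seen = set()
    kept = []
    for header, seq in parse_fasta(text):
        genus = genus_of(header)
        # Records without a bracketed organism are keyed by full header.
        key = genus if genus is not None else header
        if key in seen:
            continue
        seen.add(key)
        kept.append((header, seq))
    return kept


def to_fasta(records):
    """Format (header, sequence) pairs back into FASTA text."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)
```

Usage would be something like print(to_fasta(dedupe_by_genus(open("input.fasta").read()))), where input.fasta is a placeholder file name. Two caveats: a name such as [gamma proteobacterium BDW918] is treated as genus "gamma", and a strict first-per-genus rule would keep only one Pseudoalteromonas record, whereas my example output above lists both Pseudoalteromonas rubra and Pseudoalteromonas sp. BSi20311.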
http://whathaveyoutried.com ?