Duplicate_SeqIds Blast database
8.3 years ago
sukesh1411 ▴ 30

Hi,

I could not create a BLAST database from the nucleotide (nt) text file, which contains sequences in FASTA format.

It fails with the error: Duplicate seq_ids are found.

How can I remove these duplicate seq_ids? Can anyone help me with this?

blast • 4.4k views
8.3 years ago
Prasad ★ 1.6k

Have a look at the thread "How To Remove The Same Sequences In The Fasta Files?".

Using cd-hit or uclust (with a 100% identity and coverage cutoff), you can remove the duplicate sequences.
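Note that cd-hit and uclust collapse identical sequences, whereas the error above is about repeated sequence IDs. As a minimal sketch of a different approach (not the tools named above), the following Python keeps the first record for each ID and drops later records that reuse it. It assumes the ID is the first whitespace-delimited token after ">" and that all IDs fit in memory, which may not hold for a file as large as nt; the function name is hypothetical.

```python
def dedupe_fasta_by_id(lines):
    """Yield FASTA lines, skipping every record whose ID was already seen."""
    seen = set()
    keep = True  # lines before the first header are passed through
    for line in lines:
        if line.startswith(">"):
            # Assumption: the ID is the first token after '>'.
            parts = line[1:].split(None, 1)
            seq_id = parts[0] if parts else ""
            keep = seq_id not in seen
            seen.add(seq_id)
        if keep:
            yield line


def dedupe_fasta_file(in_path, out_path):
    """Stream in_path to out_path, keeping the first record per ID."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(dedupe_fasta_by_id(src))
```

Because the input is streamed line by line, only the set of IDs is held in memory, not the sequences themselves.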


I am trying uclust. The nt file that I downloaded from the BLAST database is a text file containing sequences in FASTA format, but uclust does not accept it. How can I convert the text file to FASTA format?


Can you post a few lines from your text file as an example?


gi|4|emb|X17276.1| Giant Panda satellite 1 DNA
GATCCTCCCCAGGCCCCTACACCCAATGTGGAACCGGGGTCCCGAATGAAAATGCTGCTGTTCCCTGGAGGTGTTTTCCT
GGACGCTCTGCTTTGTTACCAATGAGAAGGGCGCTGAATCCTCGAAAATCCTGACCCTTTTAATTCATGCTCCCTTACTC
ACGAGAGATGATGATCGTTGATATTTCCCTGGACTGTGTGGGGTCTCAGAGACCACTATGGGGCACTCTCGTCAGGCTTC
CGCGACCACGTTCCCTCATGTTTCCCTATTAACGAAGGGTGATGATAGTGCTAAGACGGTCCCTGTACGGTGTTGTTTCT
GACAGACGTGTTTTGGGCCTTTTCGTTCCATTGCCGCCAGCAGTTTTGACAGGATTTCCCCAGGGAGCAAACTTTTCGAT
GGAAACGGGTTTTGGCCGAATTGTCTTTCTCAGTGCTGTGTTCGTCGTGTTTCACTCACGGTACCAAAACACCTTGATTA
TTGTTCCACCCTCCATAAGGCCGTCGTGACTTCAAGGGCTTTCCCCTCAAACTTTGTTTCTTGGTTCTACGGGCTG
gi|7|emb|X51700.1| Bos taurus mRNA for bone Gla protein
GTCCACGCAGCCGCTGACAGACACACCATGAGAACCCCCATGCTGCTCGCCCTGCTGGCCCTGGCCACACTCTGCCTCGC
TGGCCGGGCAGATGCAAAGCCTGGTGATGCAGAGTCGGGCAAAGGCGCAGCCTTCGTGTCCAAGCAGGAGGGCAGCGAGG
TGGTGAAGAGACTCAGGCGCTACCTGGACCACTGGCTGGGAGCCCCAGCCCCCTACCCAGATCCGCTGGAGCCCAAGAGG
GAGGTGTGTGAGCTCAACCCTGACTGTGACGAGCTAGCTGACCACATCGGCTTCCAGGAAGCCTATCGGCGCTTCTACGG
CCCAGTCTAGAGCTTGCAGCCCTGCCCACCTGGCTGGCAGCCCCCAGCTCTGGCTTCTCTCCAGGACCCCTCCCCTCCCC
GTCATCCCCGCTGCTCTAGAATAAACTCCAGAAGAGG


Only the > is missing from the header lines. If your file is small, you can simply replace gi| with >gi| in a text editor. If the file is huge, use a one-liner such as:

perl -pe 's/^gi\|/>gi|/' input_file > out_file
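
For anyone who prefers Python to Perl, the same fix might look like the sketch below. It assumes, as in the example posted above, that every header line starts with gi| and that no sequence line does; the function names and the prefix parameter are hypothetical.

```python
def add_fasta_markers(lines, prefix="gi|"):
    """Yield lines, prepending '>' to each line that starts with prefix."""
    for line in lines:
        if line.startswith(prefix):
            yield ">" + line
        else:
            yield line


def fix_fasta_file(in_path, out_path, prefix="gi|"):
    """Stream in_path to out_path, restoring the missing '>' on headers."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(add_fasta_markers(src, prefix))
```

The file is processed one line at a time, so it should also cope with very large inputs.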

You have already posted the same question here. It seems you do have a FASTA file, so I assume you have got the answer to this question.
