Hello,
I am trying to retrieve some genes from the Ensembl database using the BioMart tool: http://www.ensembl.org/biomart/martview/a5ef51e4e51a2077dbe7e13e2a015f6a
I got the genes, but some of them are duplicated and some come back without a sequence.
Can someone tell me how to remove the duplicates and the genes without a sequence? Or how to get a single unique sequence for each gene using BioMart?
Thanks for your help.
Can you provide some additional detail about what you are doing to get to this point? Perhaps it is something that you are doing (perhaps incorrectly) that is causing this to happen. Genes are represented by multiple splice variants so they may appear as multiples. You should take that into account.
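One way to take the splice variants into account after downloading is to collapse the FASTA export to a single record per gene. The snippet below is only a minimal sketch, and it makes a few assumptions: that the gene ID is the first `|`-separated field of each header (this depends on which attributes were selected and in what order), that entries with no sequence carry the text "Sequence unavailable" in place of the sequence, and that keeping the longest sequence per gene is an acceptable choice. The function names are made up for illustration.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def one_sequence_per_gene(text):
    """Return {gene_id: (header, sequence)}, keeping the longest sequence per gene.

    Assumption: the gene ID is the first '|'-separated field of the header.
    Records with no usable sequence are skipped.
    """
    best = {}
    for header, seq in parse_fasta(text):
        # Skip entries that have no sequence at all.
        if not seq or seq.lower().startswith("sequence unavailable"):
            continue
        gene_id = header.split("|")[0]
        # Keep the first record seen, or replace it with a strictly longer one.
        if gene_id not in best or len(seq) > len(best[gene_id][1]):
            best[gene_id] = (header, seq)
    return best
```

Reading the downloaded file into a string and passing it to `one_sequence_per_gene` gives one record per gene ID. Whether "longest" is the right rule depends on the analysis; a canonical or best-supported transcript may be a better pick in some cases.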
Thanks for your kind reply. I am trying to retrieve the 3' UTRs of the genes involved in the RIG-I-like pathway in zebrafish.
Can you help me use the BioMart tool to get a unique sequence, so that each gene ID has just one sequence? Thanks for your help.
Have you tried using the gene names (ddx58, ifih1, mavs, dhx58, etc., derived from your list) and selecting "input external reference ID list" as the filter? In Attributes --> Sequence, select one of the gene options to get the full-length representation.
Thanks, but I got the same results: duplicated sequences, or gene names without a sequence, as shown in the picture.
I did as you said.
On the final page where you download the results there is a check box that says "unique results". Check that and then download the result file.
I already did that before, but it gave me the same problem.
If the sequences are identical you could use dedupe.sh from BBMap to remove them. It looks like the gene names are identical (hard to tell from the pictures), but there are some numbers after the names that look different, so they must represent different entries. You could try leaving just gene names in your output and unchecking all other identifiers.
Thanks for your help. I will do as you suggested.
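For the identical-sequence case mentioned above, here is a rough Python sketch of the same idea for anyone who would rather not install another tool. It only drops records whose sequence is an exact, case-insensitive match to one already seen, and it does not try to reproduce anything else dedupe.sh does. The function name and the (header, sequence) tuple layout are assumptions made for this example.

```python
def dedupe_identical(records):
    """Keep the first record for each distinct sequence.

    records: iterable of (header, sequence) tuples.
    Comparison is case-insensitive; the original casing is preserved.
    """
    seen = set()
    unique = []
    for header, seq in records:
        key = seq.upper()
        if key in seen:
            continue  # exact duplicate of an earlier sequence
        seen.add(key)
        unique.append((header, seq))
    return unique
```

Because only the first header is kept for each sequence, the other identifiers that shared that sequence are lost, so it may be worth noting them down first if they matter later.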