Hi everyone, I have the CDS of several genes, and I would like to find the location of those genes on a genome assembly (start, stop, and exon positions). Does anybody have an idea how to do that? Thanks
Go to the UCSC Table Browser and choose:
group: Genes and Gene Prediction
table: knownGene
Click "paste list" next to "identifiers (names/accessions):" and paste the name of the gene from step 1.
(because I have a lot of genes)
Hmmm... Change "a cds from a gene" in your question accordingly! I'll give you some hints to do it programmatically. There are two parts to the solution:
Getting the names / locations of genes from the CDS: although web BLAT accepts more than one sequence at a time, there is still a limit on the maximum number of sequences. Instead, you can run megablast (from the BLAST suite), which is meant for aligning highly similar sequences. Use the option -m 8 or -m 9 (see the manual for details) to get the results in tabular format. If a sequence has more than one hit, keep only the top hit, as hits are ordered from best to worst. From this tabular result, you can get the chromosomal location of each CDS.
Getting the name of the gene and the location of its exons: you can paste multiple coordinates in the query field (see step 3 of my answer and click "define regions" in the Table Browser). Alternatively, if you don't select any region at all, you can download the entire gene table. Then use bedtools to intersect this table with the CDS-location table obtained in the first part.
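As a rough illustration of the hand-off between the two parts, here is a minimal Python sketch that reads the tabular BLAST output, keeps one hit per query, and turns it into a BED-style interval that bedtools can intersect with the gene table. It assumes the standard 12-column tabular layout; the function names `best_hits` and `to_bed` are made up for this example, and it picks the top hit by bit score rather than relying on row order.

```python
import csv
import io

# Assumed column order of the 12-column tabular BLAST output (-m 8 / -m 9).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle):
    """Return a dict mapping each query ID to its highest-bitscore hit."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and the '#' comment lines that -m 9 adds.
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(COLUMNS, row))
        query = hit["qseqid"]
        if query not in best or float(hit["bitscore"]) > float(best[query]["bitscore"]):
            best[query] = hit
    return best


def to_bed(hit):
    """Convert one hit to a BED6 tuple (0-based start, end-exclusive)."""
    sstart, send = int(hit["sstart"]), int(hit["send"])
    # BLAST reports minus-strand hits with sstart > send.
    strand = "+" if sstart <= send else "-"
    start, end = min(sstart, send) - 1, max(sstart, send)
    return (hit["sseqid"], start, end, hit["qseqid"], "0", strand)


if __name__ == "__main__":
    # Hypothetical two-row example; real input would come from the BLAST output file.
    example = io.StringIO(
        "cds1\tchr1\t99.0\t100\t1\t0\t1\t100\t1001\t1100\t1e-50\t180\n"
        "cds1\tchr2\t90.0\t80\t8\t0\t1\t80\t500\t579\t1e-20\t90\n"
    )
    for hit in best_hits(example).values():
        print("\t".join(str(field) for field in to_bed(hit)))
```

Note that a CDS aligned to genomic sequence is usually split across introns, so a single query often produces several rows, roughly one per exon. Keeping only one row per query, as above, locates the gene but covers just one of those segments; the exon coordinates then come from the gene table in the second part.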
Did you try blast already?
Yes, I have, but it doesn't give me the positions of the exons.