10.6 years ago
merajazizmeraj
Hi, I am trying to submit a Sanger-sequenced gene with multiple exons to NCBI, and I would like to get the exact genomic coordinates of my exon sequences and of the CDS. Is there a tool out there that does that? I have thousands of samples; only a few rows are shown below. Thanks.
>SeqX [organism=Homo sapiens] [isolate=ABC] Stromal Antigen 2 (STAG2) gene, Exon3, Exon4, Exon5, Exon6, Exon7
TCCTTTCCGAATATTTTTGGTGCATTTGTAATAAATGTCATTTNTCTCCTTTTTAAAGGAATTGTCTTAGAAGAAAGAAGGCAAGCCACCATTTTACCCACGTAAATATATGAATATATTTCTGACATTGAGGTGTTCCAGAAGATGATAAAGAAATGATAGCAGCTCCAGAAATACCAACTGATTTTAATCTACTACAGTAAGTAAATTATATTCTGATAATTTTTAAATACTTGTTTATTCCACAAAATGGGGAATGCATTAACTTCAGTTAAATTTCCTTCTGCTCGAGAAGATCTAATATATAAAATAGCTTTTATGCTTTGCAAGAGTTTATATCA
>?unk100
GTTTTGGGGAACATCTTAATTACTTATAATGCTAATATGAAGTTTTGTAATGAGTTAACCAAGCCTTTCTTTTAGAAAATATGGCAAAAATTAGAAACTCAATATAAATTTCTAAGGAAGGGTTTTAATTCTTATCTTTCTGTCACAGGGAGTCAGAAACACATTTTTCTTCTGACACAGATTTTGAAGATATCGAAGGAAAAAACCAAAAGCAAGGCAAAGGCAAAGTATGTATCAAATATTTGACTTTATTTTGTTTCCTAAGATCTCACACACACACAGATTTAAGTTATGTCTCAGATAGTTTTATCTTTTAAAAATGGCTTTTTAAGGGGGTGGGAGCTGATTGGTATGGTA
>?unk100
AAGTGGATGGAATTCTTTAGGGCAAGTTTAAGCATGTTATGTACCCTATCAGCTACTTCTACTGTAGCTGTGTTTTGAACTCTCAAGGATAGTGATATAACTTAACCACCTCGTATTTTTTATGCAGACTTGTAAAAAAGGCAAAAAGGGCCCAGCAGAAAAGGGCAAAGGTGGAAATGGAGGAGGAAAACCTCCTTCTGGTCCAAACCGAATGAATGGTCATCACCAACAGAATGGAGTGGAAAACATGATGTTGTTTGAAGTTGTTAAAATGGGCAAGAGTGCTATGCAGGTAAGATTTATGTTGTTCTTCCCAGTTCATTTGTACATTTTAAACTTTAATGAGTTATATAGAGTGTAGCTCTG
>?unk100
AAGTGACTATTTGAGAGCTGCTGATTTCAAAATAAATATATCTTACCTTTACAGCCTGAACACTGAATAAAAAAGTTGATAAGGTCAAGAAGTGCTATATCTCGGTCATGCTTGTATGATTCTATCCAATCATCTACCACCGACTACAGCAGAGGGAAAAAAATAAAATCATTAGCTTCTTCTAATTTTCTCAAAATCAATTAAGTCTGATAAAGTCATAAAATTCAAGATTATATAGTATCACATTACTTTAATATAAATACTTATACACTGAAATTTAAAGTTCAATTTTAACAATAATAAAATAGAATCGAATTCAGTAAAACAATTATCTGATAACACAAAATGACCTATCAATCTTCTATTTATTTTGCATTGAAAAGAATGTG
>?unk100
TAAGTTATCAAAACACTTAAGGTAGTAAGTTACCTCATCGAATTCTTCAGTCATTTTTCGAATTATCTCAGAGTTCTGCATATGTCTAAACATTTCTGCTGTGACAACTCCTGAAATTTGCAAATGTCAGAAGTTAATATATGGTGTGATAAAAAAATAAAGAAAACTTCCAAGTAAGTCTCTAACACTAAGAAGTCTATGGTCACACAATAAAAGGCATACTTCTTCAACCATCATCTAATAATCTTTACCATGATACTCTAATCTATAAATAAAGCACAAACAAATGCTATCTATTCTCAGTATGCACAAGAAAACAGCCCCATACTTCTGACAGATATCTTTTTTCCTAACACAATTAACTTTGGCCATTTCT
@Alex Reynolds Thanks. Since I have 3000 sequences, can you please elaborate on the command-line syntax to connect to the UCSC BLAT server and run BLAT to generate the PSL file?
You can build and install BLAT locally, so that you don't go through their web server. BLAT is part of the Jim Kent tools, and you'll need the 2bit files for your assembly of interest. For hg19, at least, UCSC has a prebuilt 2bit file. At minimum, you'd then run something like:
There are other options, depending on how much stringency you need, whether you want to mask regions, etc.
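For 3000 sequences it may help to script the local BLAT call. The following is only a minimal Python sketch, assuming a `blat` binary on your PATH and the usual `blat database query output.psl` argument order. The file names (`hg19.2bit`, `exons.fa`, `exons.psl`) and the helper `build_blat_command` are placeholders made up for illustration, not something from this thread.

```python
import shlex
import subprocess


def build_blat_command(two_bit, query_fasta, out_psl, min_identity=None):
    """Build the argument list for a local BLAT run (hypothetical helper).

    two_bit      -- path to the assembly's 2bit file (placeholder name below)
    query_fasta  -- FASTA file with your exon sequences
    out_psl      -- where BLAT should write its PSL output
    min_identity -- optional stringency setting, passed as -minIdentity=N
    """
    cmd = ["blat", two_bit, query_fasta]
    if min_identity is not None:
        cmd.append(f"-minIdentity={min_identity}")
    cmd.append(out_psl)
    return cmd


def run_blat(cmd):
    """Run the command; raises CalledProcessError if BLAT exits non-zero."""
    subprocess.run(cmd, check=True)


# Placeholder file names; replace with your own paths.
example_cmd = build_blat_command("hg19.2bit", "exons.fa", "exons.psl", min_identity=95)
print(shlex.join(example_cmd))
```

Calling `run_blat(example_cmd)` would actually start BLAT. It is left out of the sketch so the snippet can be read and tested without the binary installed.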
To convert to BED:
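One way to see what the conversion involves is a minimal Python sketch that reads PSL lines and emits six-column BED records. It assumes the standard 21-column tab-separated PSL layout and is not necessarily the tool meant above. The function name `psl_to_bed` and the alignment values in the example line are made up for illustration.

```python
def psl_to_bed(lines):
    """Convert PSL alignment lines to BED6-style tuples.

    Returns (chrom, start, end, name, score, strand) per alignment,
    using the target (genome) coordinates and the query name.
    PSL target coordinates are already 0-based, half-open, like BED.
    """
    bed = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip header lines and anything that is not a full alignment row.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        matches = int(fields[0])      # number of matching bases, used as score
        strand = fields[8]
        query_name = fields[9]
        target_name = fields[13]
        target_start = int(fields[15])
        target_end = int(fields[16])
        bed.append((target_name, target_start, target_end,
                    query_name, matches, strand))
    return bed


# A made-up PSL row (values are illustrative, not real STAG2 coordinates).
example_psl = [
    "psLayout version 3\n",
    "\t".join([
        "300", "2", "0", "1", "0", "0", "0", "0", "+",
        "SeqX", "303", "0", "303",
        "chrX", "155270560", "1000", "1303",
        "1", "303,", "0,", "1000,",
    ]) + "\n",
]

for record in psl_to_bed(example_psl):
    print("\t".join(str(value) for value in record))
```

Each query segment ends up as one BED line on the genome, which is the form the set operations further down expect.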
Thanks. I have gotten as far as generating the .bed file from the PSL. But I don't see how bedmap or BEDOPS can help annotate my exon sequences with the exon number, the start and end of the exon on the original gene (STAG2), whether it's a CDS and, if it is, what the coordinates of that CDS are within the exon.
Basically, you need a BED file containing exon and CDS information. Then you can do set operations on those annotations, i.e. map your results to exons or CDSs. I have an answer to another question (Locating SNP's to genes) which suggests how to get GENCODE annotations; perhaps that might help get you started with your analysis. Good luck!
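To make the "set operations" idea concrete, here is a minimal Python sketch of mapping BLAT hits onto exon and CDS annotations by interval overlap. It is a plain nested loop for illustration, not how BEDOPS does it internally. The function name `annotate_hits`, the feature labels, and all coordinates are made up. In practice the annotation rows would come from your exon/CDS BED file.

```python
def annotate_hits(hits, annotations):
    """Report which annotations each hit overlaps, and where.

    hits, annotations -- lists of (chrom, start, end, name) tuples
                         with 0-based, half-open BED coordinates.
    Returns (hit_name, annotation_name, overlap_start, overlap_end) tuples.
    """
    results = []
    for hit_chrom, hit_start, hit_end, hit_name in hits:
        for ann_chrom, ann_start, ann_end, ann_name in annotations:
            if hit_chrom != ann_chrom:
                continue
            overlap_start = max(hit_start, ann_start)
            overlap_end = min(hit_end, ann_end)
            # A non-empty intersection means the hit covers part of the feature.
            if overlap_start < overlap_end:
                results.append((hit_name, ann_name, overlap_start, overlap_end))
    return results


# Illustrative coordinates only; these are not real STAG2 positions.
example_hits = [("chrX", 1000, 1300, "SeqX_segment1")]
example_annotations = [
    ("chrX", 1100, 1200, "exon3"),
    ("chrX", 1150, 1200, "exon3_CDS"),
    ("chrX", 5000, 5100, "exon4"),
]

for row in annotate_hits(example_hits, example_annotations):
    print("\t".join(str(value) for value in row))
```

The overlap start and end give the exon, or the CDS portion of it, in genomic coordinates for each sequenced segment. For thousands of hits against a genome-wide annotation, a sorted-interval tool such as bedmap will scale much better than this loop.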