6.4 years ago
joangibert14
Hi!
I have some coordinates that fall below a given coverage threshold in DNA-seq experiments. Something like:
chr2 212578373 212578415
I would like to obtain the genomic sequence, the protein position together with the exon information (although I think I have solved this part: https://doi.org/doi:10.18129/B9.bioc.ensembldb), and the protein sequence.
Any ideas how to do it? Thanks! Joan
Hello Joan,
could you please describe what data you have access to? Do you have the reference FASTA? Do you have an annotation file? Do you already know which genes and/or transcripts these regions overlap?
Depending on your answers, there are several solutions.
fin swimmer
Hello Fin,
I have the reference FASTA (GRCh37) and the annotation file (in this case refGene). I also know which gene the region corresponds to, but this could change between samples, so I don't know whether knowing it in advance should be strictly necessary.
Thanks for your help :) J
Hello Joan,
obtaining the genomic sequence is the easiest part. This can be done with bedtools:
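A typical invocation would be something like `bedtools getfasta -fi reference.fa -bed regions.bed -fo regions.fa`, where the file names are placeholders and not taken from this thread. To illustrate what that extraction does, here is a minimal Python sketch. It assumes BED-style coordinates (0-based start, end not included) and a FASTA small enough to hold in memory. The helper names `read_fasta` and `extract_region` and the toy contig `chrT` are made up for this example.

```python
def read_fasta(lines):
    """Parse an iterable of FASTA lines into a {name: sequence} dict."""
    parts = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first word of the header as the sequence name.
            name = line[1:].split()[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


def extract_region(seqs, chrom, start, end):
    """Return the sequence for a BED-style interval (0-based, half-open)."""
    if chrom not in seqs:
        raise KeyError(f"unknown sequence name: {chrom}")
    if not 0 <= start <= end <= len(seqs[chrom]):
        raise ValueError(f"interval {start}-{end} is outside {chrom}")
    return seqs[chrom][start:end]


# Toy data standing in for a real reference (hypothetical contig "chrT").
toy_fasta = [">chrT toy contig", "ACGTACGTAC", "GGGTTTAAAC"]
seqs = read_fasta(toy_fasta)
region = extract_region(seqs, "chrT", 8, 13)
print(region)
```

For a real genome such as GRCh37 you would not load everything into memory like this; bedtools or an indexed FASTA reader is the better choice. Also check whether your coordinates are 0-based (BED) or 1-based before extracting, since that shifts the result by one base.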
For the other things you asked for, it would be useful if you could provide an example of the desired output and show what your annotation file looks like. As the protein sequence and exon information depend on the transcript, you might get multiple outputs per region.
fin swimmer