Dear all,
Given a start and end position in genomic coordinates (hg19), known to fall within coding regions of the same gene, how does one extract the coding sequence between those positions?
Some background info: I'm trying to use deep sequencing as a readout for a method in which we enrich a library of cells for those with a particular phenotype. Each cell in the library contains an unknown partial cDNA sequence (+/-500bp each, covering the entire transcriptome). After enrichment, the pool of cells is subjected to PCR to extract the partial cDNA sequences, and these are paired-end sequenced on a MiSeq. After QC and mapping to the human genome, I thus obtain BAM files of mapped 2x150bp paired-end reads.
I know need to know what the amino acid sequence is of all these partial cDNAs. I've played around with bedtools, and R packages like GRanges and GenomicFeatures, but I just cannot seem to find a solution as most tools extract the entire transcript or CDS sequence, not the partial one between specific positions. I can turn the BAM file to a GRanges object with the start and stop position of each pair as genomic coordinates, but that's it.
Any help would be greatly appreciated.
Thank you for your answer abascalfederico. It's more or less what I thought. I'll try to make it work using the exon coordinates and intersections with my data. Thanks!