mRNA sequences are submitted to the NCBI nucleotide database. But can we determine which strand of the chromosome or gene an mRNA comes from? In some cases, when we align the genomic and mRNA sequences, the mRNA aligns completely, but in other cases it is the reverse complement of the mRNA that aligns completely. So how do we determine or describe this strand issue?
For mRNA sequences, the submitted sequence can be the sense strand or its complement. Typically this depends on whether the submitter has submitted the raw sequence obtained by sequencing a cDNA library, in which case the submission will be a mixture of sense and complementary strands, or has done some post-processing to determine the appropriate orientation for each sequence. In many cases this is as simple as looking for poly-A tails or poly-T headers and orienting accordingly. Alternatively, examining translation products and comparing them with known homologs or predicted genes can help deduce the orientation.
The mRNA entries representing EST sequences have the same issues. It may be useful to perform clustering on larger EST submissions in order to derive more complete transcripts from the set of ESTs, rather than dealing with them directly.
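The poly-A / poly-T check described above could be sketched roughly as follows. This is only an illustrative heuristic: the function names and the `min_run` threshold of 10 bases are assumptions made for this example, not a published standard, and real data usually needs adapter trimming and tolerance for sequencing errors first.

```python
# Minimal sketch: guess the orientation of a cDNA/EST read from its ends.
# All names and the min_run threshold are hypothetical choices for illustration.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def guess_orientation(seq, min_run=10):
    """Return 'sense', 'antisense' or 'unknown' based on poly-A/poly-T ends."""
    s = seq.upper()
    has_polya_tail = s.endswith("A" * min_run)    # 3' poly-A tail of the mRNA
    has_polyt_head = s.startswith("T" * min_run)  # same tail, read from the other strand
    if has_polya_tail and not has_polyt_head:
        return "sense"
    if has_polyt_head and not has_polya_tail:
        return "antisense"
    return "unknown"


def orient(seq, min_run=10):
    """Flip the sequence to the sense orientation when the heuristic says so."""
    if guess_orientation(seq, min_run) == "antisense":
        return reverse_complement(seq)
    return seq
```

Sequences that come back as 'unknown' would still need the translation or homology checks mentioned above.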
The mRNA sequences don't normally contain the strand information. That information will typically be held within an annotation file for the organism (presuming that a given mRNA is in said annotation, of course).
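As a rough illustration of pulling the strand out of an annotation file, here is a small sketch that assumes the annotation is in GFF3 format, where the seventh tab-separated column holds the strand. The function name and the choice to key the result by the `ID` attribute are assumptions made for this example; other annotation formats would need different parsing.

```python
# Minimal sketch, assuming a GFF3 annotation (9 tab-separated columns,
# strand in column 7, attributes such as ID=... in column 9).
# strands_from_gff3 is a hypothetical helper, not part of any library.

def strands_from_gff3(lines, feature_type="mRNA"):
    """Map feature ID -> strand ('+', '-', '.' or '?') for one feature type."""
    strands = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        if "ID" in attrs:
            strands[attrs["ID"]] = cols[6]
    return strands
```

With a file on disk this could be called as `strands_from_gff3(open(path))`, where `path` is whatever annotation file you have for the organism.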
Ok. That's fine. But what if we want to do target prediction for a miRNA? The mRNA is transcribed from the antisense strand of the DNA. That means the sequences of the mRNA and the sense strand should be the same. But in some cases the mRNA sequence and the chromosomal sequence given in the database are not the same (we have assumed that the chromosomal sequence given in the nucleotide database is the sense strand). So which convention should be used in that case?
It sounds a bit like you're confusing sense/anti-sense with +/- strands. It should normally be the case that the sense sequence is deposited for an mRNA, since that's its actual sequence. When you query a chromosomal sequence, you'll get the + strand (unless you specifically request otherwise and the tool you're querying with supports that). So if the gene is on the - strand, then the sense sequence would be the reverse complement of the chromosomal sequence you retrieved.
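To make the +/- point concrete, here is a toy sketch that checks whether a transcript matches the + strand of a genomic region directly or only as its reverse complement. It uses exact substring matching on made-up sequences, which only works for an unspliced toy case; for real, spliced mRNAs you would use a splice-aware aligner and read the strand from its output. The function names are hypothetical.

```python
# Toy sketch: decide which genomic strand a transcript corresponds to.
# Exact substring matching is an assumption that ignores introns and mismatches.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def strand_of_match(plus_strand, mrna):
    """Return '+', '-' or None for where the mRNA sits relative to the + strand."""
    target = plus_strand.upper()
    query = mrna.upper().replace("U", "T")  # allow RNA alphabet in the input
    if query in target:
        return "+"  # gene on + strand: mRNA reads like the retrieved sequence
    if reverse_complement(query) in target:
        return "-"  # gene on - strand: mRNA is the reverse complement of it
    return None
```

In both cases the deposited mRNA is still the sense sequence; only its relationship to the + strand of the chromosome differs.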
BTW, I'm not sure I get the point behind getting genomic sequence for miRNA target prediction. You already have the transcript sequences, so just directly use them.
Actually the problem arose when we were taking mRNA sequences to study the impact of SNPs in miRNA target sites. To study that interaction we took the sequences present in the dbSNP database. But the database is itself confusing, so we could not decide which sequence to take: the given sequence or its reverse complement? Then we got mixed up between forward/reverse and +/- strands, which is still confusing us.
Yeah, that can get confusing. Have a look at the answer from Hamish, which I think will help you out.