2.4 years ago
Bioinfo
I have the intron sequence of a complex and I would like to align and annotate it. Is there any platform or workflow that you could share?
As an example, the fastq file looks like this
I need to get the exons and then be able to translate them to protein. I would appreciate any ideas.
What does this mean? The link you provided is for a bacterial sequence, so there should be no introns.
@GenoMax it is just an example. I have an intron and I am trying to align it to the human genome reference, then get the exons and then translate them to protein.
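For the last step mentioned here (joining exons and translating them), a minimal Python sketch could look like the following. It assumes the exon coordinates are already known, for example from an annotation or an alignment. The toy sequence, the coordinates, and the function names `splice_exons` and `translate` are made up for illustration, and only the forward strand with the standard genetic code is handled.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def splice_exons(genomic_seq, exons):
    """Concatenate exon sequences given 0-based, end-exclusive (start, end) pairs."""
    return "".join(genomic_seq[start:end] for start, end in sorted(exons))


def translate(cds):
    """Translate a coding sequence, stopping at the first stop codon."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        # Codons with ambiguous bases (e.g. N) are reported as X.
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy example (made-up sequence): two exons separated by a GT...AG intron.
toy_genomic = "ATGGCC" + "GTAAGTTTTCAG" + "AAATAA"
toy_exons = [(0, 6), (18, 24)]
toy_cds = splice_exons(toy_genomic, toy_exons)
toy_protein = translate(toy_cds)
```

For a gene on the minus strand the spliced sequence would additionally need to be reverse-complemented before translation, which this sketch leaves out.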
Are you simply looking to see which exons flank that sequence?
If you have a single sequence then using blat would likely be the fastest way to do this.
Note: Remove the link from the original post, since it does not have any connection with this post.
@GenoMax Yes, but the sequence is larger than what Blat supports. Here is what I am trying to do: find the exons, then translate them to protein.
If the sequence is larger than an intron then I am going to assume that you are referring to PacBio or nanopore long-read sequence in fastq format.
In that case your best option is likely to use an aligner like minimap2.
Still a little unclear as to what part you want to translate to protein. Are you looking for mutations affecting coding sequences?
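As a rough sketch of what working with minimap2 output could look like: when run without `-a`, minimap2 writes PAF, a tab-separated format whose first 12 columns are fixed. The Python below parses one such line. The file names and preset in the comment, the function names `parse_paf_line` and `best_hit`, and the example line (including its coordinates) are illustrative assumptions, not taken from the data in this thread.

```python
# Assumed invocation (file names are placeholders; choose the preset for your data):
#   minimap2 -x map-ont reference.fa reads.fastq > alignments.paf

# The 12 mandatory PAF columns, in order, with the type each is converted to.
PAF_FIELDS = [
    ("query_name", str), ("query_length", int),
    ("query_start", int), ("query_end", int),
    ("strand", str),
    ("target_name", str), ("target_length", int),
    ("target_start", int), ("target_end", int),
    ("matches", int), ("block_length", int), ("mapq", int),
]


def parse_paf_line(line):
    """Parse one PAF line into a dict; optional SAM-like tags are kept as strings."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < len(PAF_FIELDS):
        raise ValueError("a PAF line needs at least 12 tab-separated columns")
    record = {name: cast(value) for (name, cast), value in zip(PAF_FIELDS, cols)}
    record["tags"] = cols[len(PAF_FIELDS):]
    return record


def best_hit(records):
    """Pick the alignment with the most matching bases (one simple criterion)."""
    return max(records, key=lambda r: r["matches"])


# Made-up example line: a 201750 nt query aligned end to end on chr1.
example = "seq1\t201750\t0\t201750\t+\tchr1\t248956422\t1000000\t1201750\t200000\t201750\t60\ttp:A:P"
hit = parse_paf_line(example)
```

The target name, start, and end of the best hit give the genomic interval to look up in an annotation.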
@GenoMax Here is what I am trying to do: I have an intron of over 200,000 nt. I am trying to align it to the human genome, then check which genes are located in that region, and then find where those genes are expressed.
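Once the sequence has been placed on the genome, checking which genes fall in that region is an interval-overlap question. A minimal Python sketch, assuming the aligned interval and a gene annotation are already available as 0-based, end-exclusive coordinates: the gene names and coordinates below are invented placeholders, and in practice the annotation would be read from a GTF/GFF file. Where those genes are expressed would have to come from a separate expression resource and is not covered here.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two 0-based, end-exclusive intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def genes_in_region(chrom, start, end, genes):
    """Return names of annotated genes overlapping chrom:start-end."""
    return [
        gene["name"]
        for gene in genes
        if gene["chrom"] == chrom
        and overlaps(start, end, gene["start"], gene["end"])
    ]


# Invented annotation records, for illustration only.
toy_genes = [
    {"name": "GENE_A", "chrom": "chr1", "start": 900000, "end": 1050000},
    {"name": "GENE_B", "chrom": "chr1", "start": 1100000, "end": 1150000},
    {"name": "GENE_C", "chrom": "chr1", "start": 1300000, "end": 1400000},
    {"name": "GENE_D", "chrom": "chr2", "start": 1000000, "end": 1200000},
]

# Invented aligned interval for the query sequence.
found = genes_in_region("chr1", 1000000, 1201750, toy_genes)
```

For a whole-genome annotation a linear scan like this is slow, and an interval index or a tool such as bedtools would be the more usual choice.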
Can you tell us what your definition of an intron is and how you obtained that file?
What is the size of 'seq'?
Otherwise, use lastz ...
@Pierre Lindenbaum it is about 201750, while Blat only supports 75000. I can use https://blast.ncbi.nlm.nih.gov/ but the issue is how I can extract the exons, etc.
You can use command line blat.
@GenoMax do you have an example or some sort of workflow that I could use?
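A hedged sketch of the command-line blat route, in Python: standalone blat is run as `blat database query output.psl` and writes PSL, whose `blockSizes` and `tStarts` columns give the aligned blocks on the target. The file names and the example PSL line are made up, and `blat_command` and `parse_psl_blocks` are hypothetical helper names. Note that the blocks correspond to exons only when the query is a spliced sequence (mRNA or cDNA); for a genomic query they are simply the gap-free pieces of the alignment.

```python
def blat_command(database, query, output):
    """Build (but do not run) a blat command line producing PSL output."""
    return ["blat", database, query, "-out=psl", output]


def parse_psl_blocks(line):
    """Extract target-side alignment blocks from one PSL data line.

    Header lines written by blat must be skipped before calling this.
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 21:
        raise ValueError("a PSL data line has 21 tab-separated columns")
    # blockSizes and tStarts are comma-separated lists with a trailing comma.
    sizes = [int(x) for x in cols[18].split(",") if x]
    t_starts = [int(x) for x in cols[20].split(",") if x]
    return {
        "query": cols[9],
        "target": cols[13],
        "strand": cols[8],
        "blocks": [(start, start + size) for start, size in zip(t_starts, sizes)],
    }


# Placeholder file names; the command is only built here, not executed.
cmd = blat_command("hg38.2bit", "query.fa", "query_vs_hg38.psl")

# Made-up PSL line: a 300 nt query aligned in two blocks with a 700 nt target gap.
example_psl = "\t".join([
    "300", "0", "0", "0", "0", "0", "1", "700", "+",
    "query1", "300", "0", "300",
    "chr1", "248956422", "1000", "2000",
    "2", "100,200,", "0,100,", "1000,1800,",
])
alignment = parse_psl_blocks(example_psl)
```

The command list can be passed to `subprocess.run(cmd, check=True)` once blat and the reference file are available locally.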
We are 10+ comments into this thread but we still don't know what kind of data you have.
Is it DNAseq or RNAseq? You said it is fastq but is it short or long read sequences? How is this data related to the said "intron" sequence? Is there more than one of these? What kind of genes are you looking for in the "intron" since by definition there should be none.