Hello! I have a question for you.
I have a FASTA file of transposons, with names and sequences. I would like to quantify the expression of these transposons in several different transcriptomes. What kind of analysis do you suggest? What software could I use? Thanks a lot.
I have used SalmonTE in the past and had good experiences. It uses salmon in the background but then aggregates the counts per element, family and class. The result tables are also ready to use with DESeq2.
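To illustrate what that aggregation step amounts to, here is a minimal Python sketch. It is not SalmonTE's actual code: the element names, the `ANNOTATION` mapping and the counts are all made up for illustration, and the real tool works from its own reference annotation.

```python
from collections import defaultdict

# Hypothetical annotation: element name -> (family, class).
ANNOTATION = {
    "L1HS": ("L1", "LINE"),
    "L1PA2": ("L1", "LINE"),
    "AluY": ("Alu", "SINE"),
}


def aggregate(element_counts, annotation):
    """Sum per-element counts up to family level and class level."""
    by_family = defaultdict(float)
    by_class = defaultdict(float)
    for element, count in element_counts.items():
        family, te_class = annotation[element]
        by_family[family] += count
        by_class[te_class] += count
    return dict(by_family), dict(by_class)


# Made-up estimated counts for a single sample.
counts = {"L1HS": 120.0, "L1PA2": 30.5, "AluY": 410.0}
families, classes = aggregate(counts, ANNOTATION)
print(families)
print(classes)
```

Running the same function once per sample gives one column per sample, which is the shape of table that a count-based tool such as DESeq2 expects.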
How long are these sequences on average (OK, 500-1000 bp), and are they polyadenylated? There are two things to consider:
First, if they are not polyadenylated, they will be missed in most RNA-seq samples, because most libraries are polyA-enriched.
Second, they must be about 200 bp or longer, as shorter sequences typically get excluded during library preparation unless it is short-RNA sequencing. Transposons are not my field, so make sure that it is common to detect them in RNA-seq: some RNA species are rapidly degraded and may require special library prep techniques to preserve them, which most standard RNA-seq samples will not have had.
From the technical side, first check whether these sequences are already present in the respective reference transcriptome. If so, use a tool such as salmon to quantify your data against it. If not, add the sequences (without polyA tails) to that reference and then use salmon. Alternatively, align the data against a reference genome with a tool such as STAR or HISAT2, and make sure you have an annotation file (GTF) that includes the coordinates of these sequences. Tools such as featureCounts can then assign the aligned reads to the features in the GTF. This is all pretty much standard, so please first get a background in RNA-seq and the related analysis techniques.
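The salmon route can be sketched in Python roughly as follows. Treat it as a sketch under assumptions rather than a finished pipeline: the "already present" check is a crude exact-sequence match (a real check would more likely use an aligner or BLAST), the 10-base polyA threshold is arbitrary, and all file and sequence names are placeholders. The salmon commands are only built, not run; `-t`, `-i`, `-l A`, `-1`, `-2` and `-o` are regular salmon options.

```python
import shlex


def read_fasta(path):
    """Return {name: sequence} from a FASTA file (name = first word of header)."""
    records, name, chunks = {}, None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    records[name] = "".join(chunks)
                name, chunks = line[1:].split()[0], []
            else:
                chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def strip_polya(seq, min_tail=10):
    """Remove a trailing run of A's, but only if it is at least min_tail long."""
    trimmed = seq.rstrip("A")
    return trimmed if len(seq) - len(trimmed) >= min_tail else seq


def merge_references(transcriptome, transposons):
    """Add transposons whose exact sequence is not already in the transcriptome."""
    known = set(transcriptome.values())
    merged = dict(transcriptome)
    for name, seq in transposons.items():
        seq = strip_polya(seq)
        if seq in known:
            continue  # already present in the reference, skip it
        if name in merged:
            raise ValueError(f"name clash with existing transcript: {name}")
        merged[name] = seq
    return merged


def write_fasta(records, path, width=60):
    """Write {name: sequence} as FASTA with wrapped sequence lines."""
    with open(path, "w") as handle:
        for name, seq in records.items():
            handle.write(f">{name}\n")
            for i in range(0, len(seq), width):
                handle.write(seq[i:i + width] + "\n")


def salmon_commands(fasta, index_dir, reads1, reads2, out_dir):
    """Build (but do not run) the salmon index and quant commands."""
    index_cmd = ["salmon", "index", "-t", fasta, "-i", index_dir]
    quant_cmd = ["salmon", "quant", "-i", index_dir, "-l", "A",
                 "-1", reads1, "-2", reads2, "-o", out_dir]
    return index_cmd, quant_cmd


# Tiny in-memory demo with made-up sequences.
transcriptome = {"tx1": "ACGTACGTAC"}
transposons = {"TE_dup": "ACGTACGTAC" + "A" * 12, "TE_new": "GGGTTTCCCAAGG"}
merged = merge_references(transcriptome, transposons)
print(sorted(merged))

for cmd in salmon_commands("merged.fa", "merged_index",
                           "sample_1.fq.gz", "sample_2.fq.gz", "quant/sample"):
    print(shlex.join(cmd))
```

In practice you would load both references with `read_fasta`, save the result with `write_fasta`, and run the two printed commands (for example via `subprocess.run`) once salmon is installed.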
Thanks a lot. I have already done an analysis of this kind: I produced a GFF file with the coordinates of the transposons in the genome using RepeatMasker, then used featureCounts to assign the aligned reads, as you suggested. My question was whether there is a tool that can quantify expression directly from a FASTA file, without alignments, and you have answered it. Thank you.
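For anyone following the same alignment-based route: besides GTF/GFF, featureCounts also accepts a simple tab-separated SAF table (`GeneID`, `Chr`, `Start`, `End`, `Strand`). Below is a minimal Python sketch of converting GFF lines to SAF. The example lines are made up in the general GFF layout, the `TE_<chr>_<start>_<end>` ID scheme is hypothetical, and treating unstranded features as `+` is an assumption that only matters for strand-specific counting.

```python
import csv
import io

SAF_HEADER = ["GeneID", "Chr", "Start", "End", "Strand"]


def gff_to_saf(gff_lines):
    """Convert GFF feature lines to SAF rows (both are 1-based, inclusive)."""
    rows = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GFF feature line
        chrom, _source, _type, start, end, _score, strand = fields[:7]
        if strand not in ("+", "-"):
            strand = "+"  # assumption: treat unstranded features as forward
        feature_id = f"TE_{chrom}_{start}_{end}"  # hypothetical ID scheme
        rows.append([feature_id, chrom, int(start), int(end), strand])
    return rows


def saf_text(rows):
    """Render SAF rows as tab-separated text with the SAF header line."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(SAF_HEADER)
    writer.writerows(rows)
    return out.getvalue()


# Made-up example lines, not real RepeatMasker output.
example = [
    "##gff-version 2\n",
    "chr1\tRepeatMasker\tsimilarity\t1001\t1300\t12.5\t+\t.\tTarget ExampleTE_A\n",
    "chr2\tRepeatMasker\tsimilarity\t50\t400\t8.0\t.\t.\tTarget ExampleTE_B\n",
]
rows = gff_to_saf(example)
print(saf_text(rows), end="")
```

The resulting table, saved to a file, can be passed to featureCounts with `-F SAF -a <file>`; add `-p` for paired-end data.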
Hi Dominigues! Thank you a lot!
Is it possible to use your own reference index, built from a FASTA file of transposable elements generated by RepeatScout, instead of the ones in the SalmonTE database?
No idea. I suggest asking the developers on GitHub. They have been quite responsive whenever I had similar questions.