Hi all,
I have transcriptome data from C. elegans and would like to analyse the differential expression of siRNA between the wild type and a mutant. Is it possible to identify siRNA using this data?
I have followed the protocol below. Is it correct?
After preprocessing the FASTQ files, I aligned the reads with HISAT2 against C. elegans mRNA and ncRNA sequences (the ncRNA set does not include siRNA).
I kept only the unmapped reads and converted the BAM file back to FASTQ (because siRNA sequences have not been annotated).
I filtered the FASTQ file using the following criteria (I am looking for 22-nucleotide G-siRNAs): reads must start with G and be exactly 22 nucleotides long.
I used Salmon/Sailfish for quantification with the annotated GTF.
I used DESeq2 for the differential expression analysis.
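To make the filtering step concrete, here is a minimal Python sketch of the "starts with G, length 22" criterion. It is only an illustration, not the exact script I ran: the helper names `parse_fastq` and `is_22g` and the example reads are made up, and it assumes plain four-line FASTQ records.

```python
from typing import Iterable, Iterator, Tuple

FastqRecord = Tuple[str, str, str]  # (header, sequence, quality)


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line, not needed here
        qual = next(it).rstrip("\n")
        yield header, seq, qual


def is_22g(seq: str, length: int = 22, first_base: str = "G") -> bool:
    """True if the read has exactly `length` bases and starts with `first_base`."""
    return len(seq) == length and seq.upper().startswith(first_base)


# Hypothetical example reads (not real data).
reads = {
    "read1": "GATTACAGATTACAGATTACAG",   # 22 nt, starts with G
    "read2": "AATTACAGATTACAGATTACAG",   # 22 nt, starts with A
    "read3": "GATTACAGATTACAGATTACAGA",  # 23 nt, starts with G
}
example_lines = []
for name, seq in reads.items():
    example_lines += [f"@{name}", seq, "+", "I" * len(seq)]

kept = [h for h, s, q in parse_fastq(example_lines) if is_22g(s)]
print(kept)
```

With a real file, the same functions could be applied to an open file handle instead of `example_lines`.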
Using this procedure, we found 25 differentially expressed genes, but I am not sure whether these reads belong to siRNAs or are fragments of coding regions.
Please help me to identify the siRNA from transcriptome data.
Looking forward to your reply.
Thanks with Regards, Akila Ranjith, Research Scholar, Department of Biotechnology, Indian Institute of Technology - Madras.
Based on the (absent) information about the data, it's impossible to judge. Was the sequencing done in such a way that fragments the size of siRNAs or siRNA precursors were also included? Which technology was used (Illumina, PacBio, something else)? What was the read length?
You should seriously consider filtering reads by quality as well.
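As a rough illustration of what a quality filter could look like, here is a small Python sketch that keeps a read only if its mean Phred score reaches a threshold. It assumes Phred+33 encoding (typical for current Illumina output) and an arbitrary cutoff of 20; both are assumptions to adjust for your data, and the function names are made up for this example. In practice a dedicated trimming tool would normally do this step.

```python
def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (assumes Phred+33 by default)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def passes_quality(qual: str, min_mean: float = 20.0) -> bool:
    """True if the mean Phred score is at least `min_mean` (cutoff is an assumption)."""
    return mean_phred(qual) >= min_mean


# Hypothetical quality strings for 22-nt reads.
high_quality = "I" * 22  # 'I' encodes Phred 40 under Phred+33
low_quality = "#" * 22   # '#' encodes Phred 2 under Phred+33

print(passes_quality(high_quality), passes_quality(low_quality))
```

A mean-score cutoff is only one possible criterion; for reads as short as 22 nt, a per-base minimum may be more appropriate.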
Why did you use an alignment-free method such as Sailfish when you have the advantage of working with one of the organisms with the best-characterized genomes (C. elegans)?