Hello!
I know this has been asked in many ways before, but I've been struggling for a while now, so it's time to ask.
I'm trying to use small RNA seq data from: https://bmcplantbiol.biomedcentral.com/articles/10.1186/1471-2229-14-142
These sequences are ~34 nt long, so they must still contain some kind of adapter.
The authors use ‘vector strip’ from the EMBOSS package, but I cannot find a suitable vector file.
I've tried trimmomatic, but I still get the same read length:
java -jar trimmomatic-0.38.jar SE -phred33 /home/juan/Desktop/juan/bio/mrcv/data/sun/SRR1195024.fastq.gz /home/juan/Desktop/juan/bio/mrcv/data/sun/SRR1195024.trimmed.fastq.gz ILLUMINACLIP:adapters/TruSeq-Small-RNA.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:18
I've tried cutadapt, but I still get the same read length:
cutadapt -a TGGAATTCTCGGGTGCCAAGG -o SRR1195024.trimmed.fastq.gz SRR1195024.fastq.gz
I've tried trim galore, but I still get the same read length:
trim_galore --small_rna SRR1195025.fastq.gz .fastq.gz -o SRR1195025.trimm_gal.fastq.gz
Total reads processed: 14,011,412
Reads with adapters: 8,639,554 (61.7%)
Reads written (passing filters): 14,011,412 (100.0%)
Trim galore seems to be doing its work (61% of sequences with adapter), but then I open fastqc and see that the sequences are not the expected length: they're all 34 nt.
I expect to see peaks at 21 and 24 nt, but the distribution is as flat as the earth. Any ideas what I am doing wrong?
Convert a subset of the data to fasta and see if you can align the reads at the 3'-end to identify an adapter sequence. Did you check the methods section to see if they describe the kit/method used?
Yes, I see nothing with reformat. They do not specify the adapters.
The adapter would likely not be in the same exact location (if it is indeed on 3'-end) so you may or may not see it right away, without actually trying to align the sequences.
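One way to check this without a full alignment, as a rough sketch: search each read for the start of a candidate adapter and tabulate where it begins. The adapter below is the one from the cutadapt command earlier in the thread; the function name, the 8-nt seed length, and the toy reads are assumptions made up for illustration. If the adapter really is present, the start positions should pile up around the expected insert sizes (21 and 24 nt).

```python
from collections import Counter

# Candidate adapter taken from the cutadapt command earlier in the thread.
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"


def adapter_start_histogram(reads, adapter=ADAPTER, seed_len=8):
    """Count, over all reads, the position where the adapter seed starts.

    The seed is the first `seed_len` bases of the adapter (hypothetical
    default of 8). Reads without the seed are counted under position -1.
    """
    seed = adapter[:seed_len]
    hist = Counter()
    for seq in reads:
        hist[seq.find(seed)] += 1  # str.find returns -1 when absent
    return hist


# Toy 34-nt reads, made up for illustration only:
# a 21-nt insert, a 24-nt insert, and a read with no adapter at all.
toy_reads = [
    ("A" * 21 + ADAPTER)[:34],
    ("C" * 24 + ADAPTER)[:34],
    "G" * 34,
]

for position, count in sorted(adapter_start_histogram(toy_reads).items()):
    print(position, count)
```

A short seed tolerates reads where only part of the adapter fits inside the 34 nt, but it does not allow mismatches, so treat the counts as a lower bound.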
I will leave this for you to consider:
There are two papers linked which seem to have sequences etc in their supplementary materials. Have you looked at those?
I'm checking this MSA with your feedback: https://mafft.cbrc.jp/alignment/server/spool/_ho.190611011723805E0SZhm924bXDkjdeAHfqVlsfnormal.html
Which papers are those? The ones in the Electronic supplementary material?
When running cutadapt, are you confident that you're using the correct adapter sequence? Running adapter-trimming software with no cuts happening makes me think that you're using an incorrect sequence. Do they specify the sequence in the manuscript? Does fastqc specify an overrepresented sequence?
I see tons of overrepresented sequences, but I do not get hits with adapters anywhere.
Do a multiple sequence alignment of the last ~15 nucleotides of a few hundred reads, and you should be able to identify the sequence of your adapter.
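The input for such an alignment could be prepared roughly like this: take the first few hundred reads of the FASTQ file and write their last 15 nt as FASTA records. This is a minimal sketch assuming a plain 4-line-per-record FASTQ; the function names, the `read1`, `read2`, ... record IDs, and the demo record are hypothetical.

```python
import io


def read_sequences(fastq_handle):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(fastq_handle):
        if i % 4 == 1:  # second line of every record is the sequence
            yield line.strip()


def tails_as_fasta(fastq_handle, n_reads=300, tail_len=15):
    """Return FASTA text with the last `tail_len` nt of the first `n_reads` reads."""
    records = []
    for idx, seq in enumerate(read_sequences(fastq_handle)):
        if idx >= n_reads:
            break
        records.append(f">read{idx + 1}\n{seq[-tail_len:]}")
    return "\n".join(records) + "\n"


# Demo record, made up for illustration: 20-nt insert plus 14 nt of adapter.
DEMO_FASTQ = "@r1\nACGTACGTACGTACGTACGTTGGAATTCTCGGGT\n+\n" + "I" * 34 + "\n"

print(tails_as_fasta(io.StringIO(DEMO_FASTQ), n_reads=1), end="")
```

For a real gzipped file, the handle could come from `gzip.open(path, "rt")` instead of `io.StringIO`; the resulting FASTA text can then be pasted into an alignment server.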
Done, tried it, and I'm still getting that weird read length distribution where almost all reads are 34 nt.
Well, it does seem like 60% of reads were trimmed, right?
Yes! That part looks good. The problem now is that I still see a single huge peak at 34 nt. I'm expecting to see peaks at 21 and 24 nt (and some more).
Reads which actually have the adapters should be data you are interested in. That looks to be a healthy (relatively) % above. Separate those reads and then do fastqc on them.
So I should keep only those 61.7% of reads and then quality trim them? Is there a way to keep only those with trimmomatic? What I'm seeing is that it keeps all the reads.
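As a quick check on the trimmed output, one could tabulate lengths only for reads that came out shorter than the original 34 nt, which is a rough proxy for "reads that had an adapter". This is a sketch under that assumption (quality trimming also shortens reads, so the proxy is imperfect); the function name and toy sequences are made up. Separately, cutadapt documents a `--discard-untrimmed` option that writes only reads in which an adapter was found.

```python
from collections import Counter


def trimmed_length_histogram(sequences, full_length=34):
    """Tabulate lengths of reads shorter than `full_length`.

    Reads still at `full_length` are assumed to be untrimmed and are skipped.
    """
    return Counter(len(s) for s in sequences if len(s) < full_length)


# Toy trimmed reads, made up for illustration:
# two 21-nt inserts, one 24-nt insert, and one untrimmed 34-nt read.
toy_trimmed = ["A" * 21, "C" * 21, "G" * 24, "T" * 34]

for length, count in sorted(trimmed_length_histogram(toy_trimmed).items()):
    print(length, count)
```

If the 21 and 24 nt peaks show up here but not in fastqc, it is worth double-checking that fastqc was run on the trimmed file rather than the original one.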