This is a naive question, but I am still new :)
How can I check whether short adapter fragments remain in the reads when the FastQC adapter content plot shows no adapters?
When I search the original reads for "AGATCGGAAGAGC", I find plenty of hits. But when I search the reads for this sequence after Trim Galore/cutadapt, there are none.
If I find a small subsequence of it (presumably a remnant of the adapter), how can I be sure it is not part of the native sequence? In other words, how can I tell that adapter remnants are still present?
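You cannot prove for a single read that a short match is adapter rather than native sequence, but you can check it statistically. One way (a sketch, not taken from any particular tool) is to count, for each length k, how many reads end with the first k bases of the adapter and compare that with what chance alone would give. The `expected` column below assumes independent, uniformly distributed bases, which real transcript sequence is not, so treat it as a rough baseline; a large excess at longer k is what would point to remnants. The function name and toy reads are hypothetical.

```python
ADAPTER = "AGATCGGAAGAGC"  # 13-bp adapter core searched for in this thread


def adapter_prefix_at_3prime(reads, adapter=ADAPTER):
    """For each k, count reads whose last k bases equal the first k adapter bases.

    Returns (k, observed, expected) tuples. `expected` assumes independent,
    uniformly distributed bases (probability 4**-k per read) and reads at
    least len(adapter) bp long; real transcript sequence is not uniform.
    """
    n = len(reads)
    rows = []
    for k in range(1, len(adapter) + 1):
        prefix = adapter[:k]
        # str.endswith is False for reads shorter than k, so they never match.
        observed = sum(1 for read in reads if read.endswith(prefix))
        expected = n * 4.0 ** -k
        rows.append((k, observed, expected))
    return rows


if __name__ == "__main__":
    # Toy reads (hypothetical): the first one ends with a 5-bp adapter remnant.
    toy = ["CCCCCCCCAGATC", "TTTTTTTTTTTTA", "GGGGGGGGGGGGG"]
    for k, obs, exp in adapter_prefix_at_3prime(toy):
        print(f"k={k:2d} observed={obs} expected={exp:.3g}")
```

In practice you would pass in the sequence lines of your trimmed FASTQ. Very short k (1 to 3 bases) will mostly reflect base composition, so the longer k values are the informative ones.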
Where in the reads are the many "AGATCGGAAGAGC" hits located? Are they at the read tails?
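A quick way to answer this is to record where the sequence starts in each read and how many bases remain from that point to the 3' end. The sketch below does that with plain Python; the function names and the 4-line FASTQ assumption are mine, not part of any tool mentioned here. Reads that run through a short insert should have the hit close to the tail, while native occurrences should not show any particular pile-up.

```python
import gzip
from collections import Counter

ADAPTER = "AGATCGGAAGAGC"


def fastq_sequences(path):
    """Yield the sequence line of each record in a 4-line FASTQ file (.gz supported)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # record layout: header, sequence, '+', qualities
                yield line.strip()


def adapter_positions(sequences, adapter=ADAPTER):
    """Locate the first adapter hit in each read.

    Returns (total_reads, reads_with_hit, start_positions, bases_to_3prime_end),
    where the last two are Counters keyed by the 0-based start position of the
    hit and by the number of bases from the hit's start to the end of the read.
    """
    total = hits = 0
    start_positions = Counter()
    bases_to_end = Counter()
    for seq in sequences:
        total += 1
        pos = seq.find(adapter)
        if pos != -1:
            hits += 1
            start_positions[pos] += 1
            bases_to_end[len(seq) - pos] += 1
    return total, hits, start_positions, bases_to_end
```

For example, `adapter_positions(fastq_sequences("reads_R1.fastq.gz"))` (hypothetical file name) on the untrimmed reads gives a histogram you can sort or plot to see whether the hits cluster near the tails.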
Please note that normal sequences can also contain AGATCGGAAGAGC many times, since a 13-bp sequence is too short to be unique to adapters.
But if you find they are all near the read tails, they might be adapters, and you can try removing them with fastp: https://github.com/OpenGene/fastp , which can remove adapters without you supplying the adapter sequences.
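To put a number on "too short to be unique": under the simplifying assumption of independent, uniformly distributed bases, a specific 13-mer appears at a given position with probability 4^-13 (about 1 in 67 million). The sketch below turns that into an expected count of reads that contain it by chance. For a hypothetical library of 20 million 100-bp reads this comes to roughly 26 reads. Real transcripts are far from random, so a single highly expressed gene that happens to contain the 13-mer could contribute many more.

```python
def expected_chance_reads(n_reads, read_len, kmer_len=13):
    """Expected number of reads containing one specific k-mer purely by chance.

    Assumes independent, uniformly distributed bases. Uses the per-read
    approximation (read_len - kmer_len + 1) * 4**-kmer_len, which is accurate
    when that probability is small (overlapping hits are ignored).
    """
    positions = max(read_len - kmer_len + 1, 0)
    return n_reads * positions * 4.0 ** -kmer_len


if __name__ == "__main__":
    # Hypothetical library: 20 million 100-bp reads.
    print(round(expected_chance_reads(20_000_000, 100), 1))
```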
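For paired-end data, fastp can find adapters by per-read overlap analysis: when the insert is shorter than the read length, R1 and the reverse complement of R2 agree over exactly the insert, and everything beyond that is adapter. The sketch below illustrates only that idea and is not fastp's implementation. It assumes equal-length reads, allows no mismatches, and uses a hypothetical `min_overlap=20` to limit chance matches.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string (A/C/G/T/N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def insert_size_from_overlap(r1, r2, min_overlap=20):
    """Return the insert size if it is shorter than the reads, else None.

    With insert size I < read length, R1[:I] is the insert and R2[:I] is its
    reverse complement, so R1[:I] == revcomp(R2[:I]). Scanning from the longest
    candidate downward returns the largest exact agreement.
    """
    read_len = min(len(r1), len(r2))
    for insert in range(read_len - 1, min_overlap - 1, -1):
        if r1[:insert] == revcomp(r2[:insert]):
            return insert
    return None


def trim_pair(r1, r2, min_overlap=20):
    """Cut both mates back to the detected insert; leave them alone otherwise."""
    insert = insert_size_from_overlap(r1, r2, min_overlap)
    if insert is None:
        return r1, r2
    return r1[:insert], r2[:insert]
```

Because no adapter sequence is used anywhere, this shows why an overlap-based trimmer does not need you to supply one; real tools also tolerate sequencing errors in the overlap.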
I am not sure what it will take to convince your supervisor to let you move on to the next step in the analysis. If you are working with Illumina data, then you do not have Illumina adapter sequences (ref: your previous post "Short Insert size and aggressive trimming"). If you have used bbduk.sh from BBMap to scan/trim your data, you can be assured that no adapters are left (post the scan stats).
It is time for you to take the initiative and move on to the next step in the analysis.
Thanks so much, genomax. I used Trim Galore and cutadapt. The FastQC figure in the post you mentioned is from the reads after Trim Galore/cutadapt.
We have already continued with the pipeline, using TopHat and Cuffdiff, and then repeated the analysis with Salmon and edgeR. We get roughly 70% alignment with TopHat and roughly 70% mapping with Salmon.
The expression results from Cuffdiff and edgeR are consistent, but they seem weird to my professor. He thinks this is due to the low alignment/mapping rate, which could be caused by a short insert size letting the reads run through into the adapters. I verified that the expression values match the counts in edgeR and the FPKM values in Cuffdiff. But I don't know how to find short adapters when the FastQC adapter content is clean and my sequence search finds nothing. And what would make the results wrong?
> The expressions from cuffdiff and edgeR are consistent. But they seem weird to my professor. So he thinks this is because low alignment/mapping which could be due to short insert size causing adapters to go through reads.
At first glance, it appears that you have done all the right things. Keep in mind that most aligners will soft-clip sequences that do not align (and that includes adapters). I don't understand the statement above. If you have short inserts, then the adapter sequence read through at the other end of the insert should get trimmed/removed (as long as you have done the trimming right).
It sounds like the result does not "fit" the hypothesis/expected outcome. As long as your experiment was done right and the analysis is correct, there is not much else you can do.
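If your aligner does soft-clip, you can look at what it clipped. The sketch below reads SAM text lines (for example from `samtools view` on a BAM file), takes the soft clip at each read's original 3' end, and counts how many of those clips start like the adapter. It is a simplification under stated assumptions: it requires the clip to begin exactly at the adapter's first base, treats any clip of at least a hypothetical `min_len=5` bases as testable, and does not filter secondary alignments. All function names are illustrative.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
ADAPTER = "AGATCGGAAGAGC"
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string (A/C/G/T/N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def three_prime_soft_clip(flag, cigar, seq):
    """Return the soft-clipped bases at the read's original 3' end, in read orientation."""
    # Hard-clipped bases are absent from SEQ, so drop H operations before looking at the ends.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if not ops or seq == "*":
        return ""
    if flag & 16:  # reverse strand: SEQ is reverse-complemented, so the 3' end is on the left
        length, op = ops[0]
        return revcomp(seq[:length]) if op == "S" else ""
    length, op = ops[-1]
    return seq[-length:] if op == "S" else ""


def clip_looks_like_adapter(clip, adapter=ADAPTER, min_len=5):
    """True if the clip and the adapter agree over their shared leading bases (>= min_len)."""
    k = min(len(clip), len(adapter))
    return k >= min_len and clip[:k] == adapter[:k]


def count_adapter_clips(sam_lines, min_len=5):
    """Return (reads with a 3' soft clip, of which adapter-like) for mapped reads."""
    clipped = adapter_like = 0
    for line in sam_lines:
        if line.startswith("@"):  # header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, cigar, seq = int(fields[1]), fields[5], fields[9]
        if flag & 4:  # unmapped
            continue
        clip = three_prime_soft_clip(flag, cigar, seq)
        if clip:
            clipped += 1
            if clip_looks_like_adapter(clip, min_len=min_len):
                adapter_like += 1
    return clipped, adapter_like
```

A high fraction of adapter-like clips would support the read-through hypothesis; a low fraction suggests the clipped bases are something else.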
Thanks, Chen, I will give it a try!