Hi, is it possible to automate the submission of a large number of transcriptomes to the NCBI TSA database without having to create a new TSA submission for each transcriptome? I have many tens of transcriptomes, and submitting them one by one would be quite laborious. Also, is there a way to simulate TSA's check for matches to the UniVec vector database? With a large number of transcriptomes to submit, uploading them one by one only to get back errors about presumed adapter contamination, and then having to re-upload the cleaned transcriptomes, is very cumbersome.
I'd be thankful for any advice or hints!
You can detect the presence of adapters using fastp.
Thanks, that looks handy. But I was actually looking for a method that mimics the adapter detection on the NCBI website as closely as possible. I did adapter trimming as part of the transcriptome assembly, but NCBI still reports a few adapters here and there: 137 supposedly contaminated contigs in total, spread over 37 of the 55 transcriptomes I was uploading. So the prevalence is extremely low, but it still means I need to re-upload 37 transcriptomes and let the website reanalyze them (which takes a long time, and the process apparently sometimes crashes).
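One way to approximate the screen locally might be to run blastn against a local copy of UniVec with VecScreen-like settings and collect the hit coordinates per contig. Below is a minimal sketch, not NCBI's actual pipeline. The parameter values are the ones I recall from NCBI's VecScreen documentation and should be verified there; the function names and the database name are made up for illustration. It also does not reproduce how NCBI classifies matches (strong/moderate/weak, terminal vs. internal), so it will likely flag more than the submission portal does.

```python
# Assumption: VecScreen-like blastn settings, as recalled from NCBI's
# VecScreen documentation. Verify them before relying on the results.
VECSCREEN_LIKE_ARGS = [
    "-task", "blastn",
    "-reward", "1", "-penalty", "-5",
    "-gapopen", "3", "-gapextend", "3",
    "-dust", "yes", "-soft_masking", "true",
    "-evalue", "700", "-searchsp", "1750000000000",
    "-outfmt", "6",  # default 12-column tabular output
]


def build_blastn_command(query_fasta, univec_db):
    """Return the blastn argument list (hypothetical helper, not an NCBI tool)."""
    return ["blastn", "-query", query_fasta, "-db", univec_db] + VECSCREEN_LIKE_ARGS


def parse_tabular_hits(text):
    """Map each contig ID to a list of 1-based (start, end) hit spans.

    Expects blastn '-outfmt 6' text, where columns 7 and 8 are qstart and qend.
    """
    hits = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        qstart, qend = int(fields[6]), int(fields[7])
        span = (min(qstart, qend), max(qstart, qend))
        hits.setdefault(fields[0], []).append(span)
    return hits
```

To use it, you would first build the database once (`makeblastdb -in UniVec -dbtype nucl`), then run the command with `subprocess.run(build_blastn_command("assembly.fasta", "UniVec"), capture_output=True, text=True, check=True)` and pass its `stdout` to `parse_tabular_hits`. Looping over all assemblies then gives a per-transcriptome list of suspicious contigs before anything is uploaded.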
In the end, I uploaded everything to TSA, let it analyze the transcriptomes, and used their adapter contamination report to clean my transcriptomes. Now I'm re-uploading the cleaned transcriptomes and waiting for them to be processed.
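For anyone doing the same cleanup, here is a minimal sketch of the trimming step. It assumes the flagged regions have already been extracted from the contamination report into 1-based, inclusive `(start, end)` spans per contig; parsing the report itself is not shown, and `trim_contig` is a made-up name. Spans touching either end of the contig are trimmed off, while internal spans are only returned for manual review, since cutting them out would join unrelated flanks.

```python
def trim_contig(seq, spans):
    """Trim contaminated spans that touch either end of a contig.

    seq   -- contig sequence as a string
    spans -- list of 1-based, inclusive (start, end) tuples
    Returns (trimmed_sequence, internal_spans), where internal_spans are
    the spans left untouched, in original contig coordinates.
    """
    left, right = 0, len(seq)  # 0-based, half-open window that is kept

    # Walk in from the left: a span is terminal if it starts at or
    # before the first base that is still kept.
    for start, end in sorted(spans):
        if start - 1 <= left:
            left = max(left, end)

    # Walk in from the right: a span is terminal if it reaches the
    # last base that is still kept.
    for start, end in sorted(spans, key=lambda s: s[1], reverse=True):
        if end >= right:
            right = min(right, start - 1)

    # Whatever lies fully inside the kept window is internal.
    internal = [(s, e) for s, e in spans if s - 1 >= left and e <= right]
    return seq[left:right], internal


trimmed, internal = trim_contig("AAAACCCCGGGG", [(1, 4)])
print(trimmed, internal)  # CCCCGGGG []
```

After trimming, it is worth re-checking contig lengths, because TSA enforces a minimum contig length (200 nt, as far as I know), and a trimmed contig can drop below it.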
Did you contact NCBI? Maybe the tool they use for detecting the adapters is on their GitHub?