Hi guys,
so I decided to upload my transcriptomes (non-model animals without reference genomes) to TSA, but evidently I didn't do a good job with Trimmomatic, so I have to trim some of my transcripts. For example, out of 139,196 sequences, 1,624 have to be trimmed.
The NCBI report for all these contaminated sequences gives information about sequence ID, length, span of contamination and source of contamination.
> **Sequence name** length **span(s)** apparent source
> **TRINITY_DN21678_c0_g1_i1** 2529 **2497..2529** adaptor:multiple
> **TRINITY_DN21678_c0_g1_i4** 1222 **1190..1222** adaptor:multiple
etc...
Most of the bases to be trimmed are at the beginning or end of the transcript.
I am not very proficient in coding, so can someone help me out with the script and program that I should use? The idea is to screen my transcriptome for each sequence name (column A) and, when that sequence is found, trim the bases in the designated span (column C). So I guess my inputs could be the assembly as a FASTA file and a TSV table with the transcript ID and the span to be trimmed.
I see this question was asked here, but that was more than 5 years ago and I am still not sure exactly how to do it.
Thank you, and yes, I agree that going a step back is the best option, but I am in a hurry at the moment so I just need to stick with the transcriptomes I already have.
I checked both of these programs (`bbduk` and `fastp`) and it seems they use reads as input (FASTQ format)... how can I trim my transcripts (FASTA file)?
`bbduk.sh` can use the transcripts as input. There is an `adapters.fa` included in the `resources` directory of the distribution. Use `ktrim=rl`, because you said that the adapters were at the beginning and end of the reads. Otherwise you can run `ktrim=r` and then `ktrim=l` individually.
Thank you very much! I tried it out! This is a handy tool, useful for many different applications. With regard to my problem: although it "thinks" these are reads (it uses this term in its reports), it works perfectly fine with transcripts too. From what I see it is trimming too much, so I have to read a bit more of the documentation and figure out how to tell the program exactly which transcript to trim at which position, but this is a very good start!
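If the goal is to cut exactly the spans listed in the NCBI report rather than re-detect adapters, a small script may be enough. The following is only a minimal Python sketch, not a tested pipeline. It assumes the report is saved as a whitespace- or tab-separated table with the columns shown in the question (name, length, span(s), source), that spans are 1-based and inclusive, and that multiple spans for one sequence are comma-separated without spaces. All function names are made up for illustration. Spans that touch the start or end of a transcript are cut off; internal spans are only reported and left untouched, since those need a manual decision.

```python
import re
import sys


def parse_spans(field):
    """Parse '2497..2529' or '1..30,2497..2529' into 1-based inclusive pairs."""
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", field)]


def read_report(lines):
    """Map sequence ID -> spans. Assumed columns: name, length, span(s), source."""
    report = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or not fields[1].isdigit():
            continue  # skip the header row and blank lines
        report.setdefault(fields[0], []).extend(parse_spans(fields[2]))
    return report


def trim_sequence(seq, spans):
    """Cut spans touching either end; return (trimmed_seq, internal_spans)."""
    left, right = 0, len(seq)  # 0-based, half-open window that is kept
    not_leading = []
    for start, end in sorted(spans):
        if start - 1 <= left:  # span begins at (or before) the kept start
            left = max(left, end)
        else:
            not_leading.append((start, end))
    internal = []
    for start, end in sorted(not_leading, reverse=True):
        if end >= right:  # span reaches the kept end
            right = min(right, start - 1)
        else:
            internal.append((start, end))
    return seq[left:max(left, right)], sorted(internal)


def read_fasta(lines):
    """Yield (header_without_'>', sequence) pairs from FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def trim_fasta(fasta_lines, report):
    """Yield (header, trimmed_sequence); sequences trimmed to nothing are dropped."""
    for header, seq in read_fasta(fasta_lines):
        seq_id = header.split()[0] if header.strip() else ""
        trimmed, internal = trim_sequence(seq, report.get(seq_id, []))
        if internal:
            print(f"{seq_id}: internal span(s) {internal} left untouched",
                  file=sys.stderr)
        if trimmed:
            yield header, trimmed


if __name__ == "__main__" and len(sys.argv) == 4:
    fasta_path, report_path, out_path = sys.argv[1:]
    with open(report_path) as fh:
        report = read_report(fh)
    with open(fasta_path) as fin, open(out_path, "w") as fout:
        for header, seq in trim_fasta(fin, report):
            fout.write(f">{header}\n{seq}\n")
```

Saved as, say, `trim_spans.py` (a hypothetical name), it could be run as `python trim_spans.py assembly.fasta report.tsv trimmed.fasta`. Headers are written back unchanged, so a `len=` field in a Trinity header will no longer match the trimmed sequence, and it is worth checking a few trimmed transcripts by hand before resubmitting.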