Trimming single end reads for STAR?
6.7 years ago
caggtaagtat ★ 1.9k

Hi,

I just started working with single-end reads that have already been trimmed for adapter sequences and quality. Do I now have to trim the reads to a uniform length, e.g. 100 nt, before mapping them with STAR? Is there a negative effect if I don't?

RNA-Seq STAR trimming • 5.5k views
6.7 years ago

If the qualities are OK and there are no adapters, you can proceed with mapping. There is a recent paper about trimming of RNA-seq data and its possible consequences for downstream analysis: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4766705/

Thank you! I will proceed with the mapping then.

6.7 years ago
h.mon 35k

If they are already trimmed for adapters and quality, don't trim more. Trimming makes sequences shorter, and shorter sequences are more likely to map to multiple locations.
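
To make the multi-mapping point concrete, here is a toy sketch. It uses exact string matching on a made-up reference, which is not how STAR aligns reads; the sequences and the function name `match_positions` are assumptions chosen purely for illustration.

```python
from typing import List


def match_positions(reference: str, read: str) -> List[int]:
    """Return every 0-based start position where `read` occurs exactly in `reference`."""
    if not read:
        raise ValueError("read must not be empty")
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        # Continue one base further so overlapping occurrences are also found.
        start = reference.find(read, start + 1)
    return positions


# Made-up reference containing the motif "ACGT" in two different contexts.
reference = "AAACGTCCCGGACGTTTT"
full_read = "AACGTCC"   # longer read: the flanking bases make it unique
trimmed_read = "ACGT"   # same read after aggressive trimming on both ends

unique_hits = match_positions(reference, full_read)
ambiguous_hits = match_positions(reference, trimmed_read)
```

In this toy case the 7 nt read has a single placement, while the 4 nt trimmed version fits two loci. A real genome is much larger and a real aligner tolerates mismatches, but the same effect is the reason for not trimming further than necessary.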

What is the length range of your reads? I generally keep reads only within a certain range, and discard the shorter reads. For example, for a 100bp dataset, I keep reads from 70-100bp after trimming, and discard the rest.
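
A minimal sketch of such a length filter, assuming plain four-line FASTQ records. The function names `read_fastq` and `filter_by_length` are made up for this example, and the 70-100 defaults simply mirror the range mentioned above.

```python
from typing import Iterable, Iterator, Tuple

# (header, sequence, separator, quality), each without the trailing newline
FastqRecord = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Group an iterable of lines into 4-line FASTQ records.

    Assumes unwrapped FASTQ (exactly four lines per read). An incomplete
    trailing record is silently dropped by zip().
    """
    it = iter(lines)
    for header, seq, sep, qual in zip(it, it, it, it):
        yield (header.rstrip("\n"), seq.rstrip("\n"),
               sep.rstrip("\n"), qual.rstrip("\n"))


def filter_by_length(records: Iterable[FastqRecord],
                     min_len: int = 70,
                     max_len: int = 100) -> Iterator[FastqRecord]:
    """Yield only records whose sequence length lies in [min_len, max_len]."""
    for record in records:
        if min_len <= len(record[1]) <= max_len:
            yield record
```

With a real file this could be used as `filter_by_length(read_fastq(open("reads.fastq")))`, where `reads.fastq` is a placeholder name; for gzipped input, `gzip.open(path, "rt")` would take the place of `open`.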

That makes sense! My reads are 40-155nt long.

Here is a plot of the percentage I would discard vs the possible minimal read length. Would a minimal length of 80nt be appropriate?

https://ibb.co/gLZ7q7
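
For anyone who wants to produce the numbers behind such a plot, here is a rough sketch of how the discarded percentage per candidate minimum length could be computed. The function name `percent_discarded` and the example lengths are assumptions for illustration, not the data behind the linked image.

```python
from bisect import bisect_left
from typing import Dict, Iterable


def percent_discarded(read_lengths: Iterable[int],
                      thresholds: Iterable[int]) -> Dict[int, float]:
    """For each minimum length, return the percentage of reads shorter than it."""
    lengths = sorted(read_lengths)
    total = len(lengths)
    if total == 0:
        raise ValueError("no read lengths given")
    result = {}
    for threshold in thresholds:
        # bisect_left gives the number of reads with length < threshold.
        n_short = bisect_left(lengths, threshold)
        result[threshold] = 100.0 * n_short / total
    return result


# Made-up lengths spanning the 40-155 nt range mentioned above.
example_lengths = [40, 60, 80, 100, 155]
discarded = percent_discarded(example_lengths, [50, 80, 101])
```

Reads exactly at the threshold are kept, so a minimum length of 80 discards only the reads of 79 nt or shorter.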

80 seems reasonable. What is the organism? Also, if you use Trimmomatic for trimming, it has an option (MINLEN) to drop reads that end up shorter than a given length.
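
As a sketch, a single-end Trimmomatic call that only applies a minimum-length filter could be assembled like this. The helper name `trimmomatic_se_minlen_cmd`, the file names, and the `trimmomatic` launcher name are assumptions; depending on the installation, the call may instead start with `java -jar` and the path to the Trimmomatic jar.

```python
from typing import List


def trimmomatic_se_minlen_cmd(fastq_in: str,
                              fastq_out: str,
                              min_len: int = 80,
                              executable: str = "trimmomatic") -> List[str]:
    """Build an argument list for Trimmomatic in single-end (SE) mode.

    Only the MINLEN step is applied, so reads shorter than `min_len`
    are dropped and nothing else is trimmed.
    """
    if min_len < 1:
        raise ValueError("min_len must be a positive integer")
    return [executable, "SE", fastq_in, fastq_out, f"MINLEN:{min_len}"]


# Placeholder file names; nothing is executed here.
cmd = trimmomatic_se_minlen_cmd("reads.fastq.gz", "reads.min80.fastq.gz")
```

The list can be handed to `subprocess.run(cmd, check=True)` once the paths and launcher match the local setup. If adapter or quality trimming steps are added as well, MINLEN is usually placed last so that it filters on the final read length.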

Ok thank you. The reads were obtained from human cardiovascular endothelial cells. Thank you, I was going to use trimmomatic :)

50 bp should be fine for counting applications with the human genome. You may be throwing good data away by being too strict.

OK, but since I am analysing alternative splicing, I will stick with a minimal length of 75 nt for now. I read somewhere in this forum that reads should not be shorter than 70 nt for isoform analysis.

That sounds reasonable. Curious why you did not choose to do paired-end sequencing to get spatial information in that case.

I was told that single-end sequencing would be better for splicing analysis, although I can't remember why. Besides, I was not involved in that decision, and I would guess financial reasons also played a role ;)

"Sufficient" makes more sense than "better". The financial angle is always critical :-)
