1: Does the STAR aligner clip reads from the 3' and 5' ends while trying to re-align unmapped reads? If so, are there any parameters that influence the clipping?
2: I am not interested in spliced reads. How can I exclude them (other than by filtering the SAM output)?
If you're not interested in spliced reads, then why use an aligner designed for them (OK, STAR is crazy fast, so I guess that's reason enough)?
Regarding clipping, there are a couple of ways of understanding what you wrote. Firstly, you might mean "soft clipping", that is, the S CIGAR operator. STAR does that by default. You might instead mean simply trimming a few bases off either end of each read. STAR can do that too; see the clip3pNbases, clip5pNbases, clip3pAdapterSeq, clip3pAdapterMMp, and clip3pAfterAdapterNbases options (as you can see, STAR can trim adapters as well, though I can't say I've ever used that feature).
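To make the trimming options concrete, here is a minimal sketch that only assembles a STAR command line as a Python list, without running it. The helper name star_clip_command, the genome directory, the FASTQ file name, and the adapter sequence are all made-up placeholders; check the option values against the STAR manual for your version before using them.

```python
def star_clip_command(genome_dir, fastq, clip5=0, clip3=0,
                      adapter=None, adapter_mmp=0.1):
    """Build (but do not run) a STAR command with read-trimming options.

    genome_dir and fastq are placeholder paths supplied by the caller.
    """
    cmd = [
        "STAR",
        "--genomeDir", genome_dir,
        "--readFilesIn", fastq,
        # Hard-trim a fixed number of bases from each end before mapping.
        "--clip5pNbases", str(clip5),
        "--clip3pNbases", str(clip3),
    ]
    if adapter is not None:
        # Optional 3' adapter trimming; adapter_mmp is the maximum
        # proportion of mismatches allowed when matching the adapter.
        cmd += ["--clip3pAdapterSeq", adapter,
                "--clip3pAdapterMMp", str(adapter_mmp)]
    return cmd


# Hypothetical example: trim 3 bases from the 5' end and 5 from the 3' end.
example = star_clip_command("genome_index", "reads.fastq", clip5=3, clip3=5)
print(" ".join(example))
```

The resulting list could be handed to subprocess.run once the paths point at a real genome index and read file.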
For looking only at non-spliced reads, the easiest thing to do would be to filter the output SAM file (remove any alignment with an N in its CIGAR string). Alternatively, setting both alignIntronMin and alignIntronMax to 1 would probably produce similar results.
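The SAM filtering could look something like the sketch below. It assumes plain-text SAM lines (tab-separated, CIGAR in the sixth column) and the function name drop_spliced is just an illustrative choice; the two alignment records are invented for the example.

```python
def drop_spliced(sam_lines):
    """Yield header lines and alignments whose CIGAR has no N operator."""
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines are passed through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 6 (index 5) is the CIGAR string; N marks a skipped
        # region, which is how spliced alignments are reported.
        if len(fields) > 5 and "N" not in fields[5]:
            yield line


# Made-up records: r1 is unspliced, r2 spans a 500-base intron.
records = [
    "@HD\tVN:1.4\n",
    "r1\t0\tchr1\t100\t255\t50M\t*\t0\t0\t*\t*\n",
    "r2\t0\tchr1\t200\t255\t20M500N30M\t*\t0\t0\t*\t*\n",
]
kept = list(drop_spliced(records))
```

For a real file you would pass an open file handle (or sys.stdin) instead of the list and write the kept lines back out.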
Yes, speed was the main idea :) Do you know how much soft clipping STAR performs on both ends? I am wondering whether STAR (or other aligners) work by chopping off one nucleotide at a time from the 5' and 3' ends until the read maps (if it maps). Is this described somewhere? Thanks.
It will soft-clip in a way that maximizes the alignment score. Basically, each alignment is given a score according to how many bases match/mismatch/etc. the reference. Soft-clipping removes mismatches near the ends of a read from the score, so if you decrease the mismatch penalty a bit then you might end up with less soft-clipping. This isn't really STAR-specific, but rather how local alignment works in general (look up the "Smith-Waterman algorithm" for the classic example). STAR itself doesn't run Smith-Waterman: according to the paper it searches for maximal mappable prefixes as seeds and then stitches and scores them, but the score-maximizing idea behind the clipping is the same.
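As a toy illustration of "clip wherever the score is maximized", the sketch below scores an ungapped read against a reference stretch of the same length and picks the end clipping that gives the highest score. This is not STAR's algorithm; the function name best_soft_clip, the sequences, and the +1/-1 scoring are assumptions chosen only to show the idea.

```python
def best_soft_clip(read, ref, match=1, mismatch=-1):
    """Return (score, clip5, clip3) for the best-scoring ungapped stretch.

    Toy model only: no indels, read and ref must have equal length.
    """
    if len(read) != len(ref):
        raise ValueError("read and ref must be the same length")
    scores = [match if a == b else mismatch for a, b in zip(read, ref)]
    best_score, best_start, best_end = 0, 0, 0
    cur, cur_start = 0, 0
    for i, s in enumerate(scores):
        if cur <= 0:
            # A non-positive running score never helps: restart here,
            # which amounts to clipping everything before position i.
            cur, cur_start = 0, i
        cur += s
        if cur > best_score:
            best_score, best_start, best_end = cur, cur_start, i + 1
    return best_score, best_start, len(read) - best_end


ref = "ACGTACGTACGTACGT"
read = "TTGTACGTACGTACAA"   # two mismatches at each end
print(best_soft_clip(read, ref))                # (12, 2, 2)

# One mismatch near the 5' end: a harsh penalty clips it, a mild one keeps it.
print(best_soft_clip("AGGTACGTAC", "ACGTACGTAC", mismatch=-2))  # (8, 2, 0)
print(best_soft_clip("AGGTACGTAC", "ACGTACGTAC", mismatch=0))   # (9, 0, 0)
```

The last two calls show the point about penalties: with a lower mismatch penalty, keeping the mismatched base costs less than losing the matches around it, so the clipping disappears.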
OK, thanks, I'll check it out.