I want to index the genome assembly "GRCh38.p14" before aligning my reads to it. However, one parameter that STAR needs is the overhang length, --sjdbOverhang ReadLength-1.
I only have the chromosome assembly and the GTF file. How do I find out the read length for this assembly?
Thank you, I have already read what the manual says about the length. It is not my data; I am using the published human genome release 44. From what I understood from the paper, they used a combination of sequencing technologies (short read and long read) to generate the chromosome assembly. I would appreciate it if anyone with knowledge of this could help.
I am not sure, then, what your exact question is.
STAR is an aligner and can't do any assemblies. The parameter you are asking about in the original post is only relevant for creating an index from an existing reference genome file. Whether short or long read data was used to generate that assembly does not matter for the purpose of creating the aligner index.
Maybe I didn't really understand what this "overhang length for constructing the splice junctions" parameter refers to; I thought it meant the length of the raw reads. Anyway, I used 100 and it apparently worked!
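For later readers: --sjdbOverhang refers to the reads you plan to align, not to the reads that were used to build the assembly. The STAR manual gives the ideal value as max(ReadLength)-1 and notes that the default of 100 works about as well in most cases, which is consistent with the result above. Below is a minimal Python sketch of how one might derive the value from one's own FASTQ file. It assumes a plain or gzipped FASTQ with exactly four lines per record; the function names and file paths are made up for illustration, while the STAR options are the usual genomeGenerate flags.

```python
import gzip
from itertools import islice


def max_read_length(fastq_path, n_reads=100_000):
    """Return the longest read among the first n_reads records of a FASTQ file.

    Assumes 4 lines per record (no wrapped sequences).
    """
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    longest = 0
    with opener(fastq_path, "rt") as handle:
        # The sequence is the 2nd line of each 4-line record: lines 1, 5, 9, ...
        for seq in islice(handle, 1, n_reads * 4, 4):
            longest = max(longest, len(seq.strip()))
    return longest


def star_index_command(genome_dir, fasta, gtf, read_length, threads=8):
    """Build the argument list for a STAR genome index run.

    --sjdbOverhang is set to read_length - 1, as the STAR manual suggests.
    """
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--genomeDir", str(genome_dir),
        "--genomeFastaFiles", str(fasta),
        "--sjdbGTFfile", str(gtf),
        "--sjdbOverhang", str(read_length - 1),
        "--runThreadN", str(threads),
    ]
```

The returned list can be handed to `subprocess.run(cmd, check=True)` once the paths point at real files. If the reads have not been sequenced yet, or libraries of different lengths will share one index, leaving the overhang at 100 is the usual fallback.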