Hello, I have an annotation file which I'm trying to replicate.
So far, the results are horrible:
I'm intentionally not using the reference annotation I have in order to get there "without help".
I build the index with STAR, clean my FASTQ files, and then align with:
STARlong --runThreadN 4 --genomeDir {args.dest} --outSAMtype BAM SortedByCoordinate --readFilesCommand zcat --readFilesIn {fastqs} --outFileNamePrefix {args.org} --sjdbOverhang {max_len-1} --twopassMode Basic --outSAMattributes All
I then sort the BAM with samtools and assemble transcripts with StringTie:
stringtie -p 4 {sorted_bam} -o stringOutput{itr}.gtf
Finally, I merge all the GTFs with stringtie --merge.
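To make the assembly and merge steps concrete, here is a minimal sketch of how the two StringTie calls could be generated from Python. The function name `stringtie_commands`, the sample BAM names, and the output name `merged.gtf` are placeholders for illustration, not part of my actual script; the commands are only built and printed, not executed.

```python
import shlex


def stringtie_commands(sorted_bams, threads=4, merged_gtf="merged.gtf"):
    """Build (but do not run) the per-sample StringTie commands and the merge command.

    `merged_gtf` is a hypothetical output name chosen for this sketch.
    """
    per_sample = []
    gtfs = []
    for itr, bam in enumerate(sorted_bams):
        # One assembly per sorted BAM, mirroring: stringtie -p 4 {sorted_bam} -o stringOutput{itr}.gtf
        gtf = f"stringOutput{itr}.gtf"
        gtfs.append(gtf)
        per_sample.append(["stringtie", "-p", str(threads), bam, "-o", gtf])
    # Merge mode takes the per-sample GTFs as positional arguments.
    merge = ["stringtie", "--merge", "-o", merged_gtf, *gtfs]
    return per_sample, merge


if __name__ == "__main__":
    # The BAM names below are made up for the example.
    per_sample, merge = stringtie_commands(["sample0.sorted.bam", "sample1.sorted.bam"])
    for cmd in per_sample + [merge]:
        print(shlex.join(cmd))
```

Each command is a plain argument list, so it could be handed to `subprocess.run` once the paths point at real files.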
I compare the final merged GTF file to the reference annotation I have, and the results are in the image above.
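For anyone who wants to reproduce the comparison, one common way to do it is with gffcompare, sketched below. This assumes gffcompare is the comparison tool, which is an assumption made for this example; the file names `reference.gtf` and `merged.gtf`, the output prefix `cmp`, and the function name `gffcompare_command` are placeholders as well.

```python
import shlex


def gffcompare_command(reference_gtf, merged_gtf, prefix="cmp"):
    """Build (but do not run) a gffcompare call scoring merged_gtf against reference_gtf.

    -r gives the reference annotation, -o the prefix for the output files.
    All file names here are hypothetical.
    """
    return ["gffcompare", "-r", reference_gtf, "-o", prefix, merged_gtf]


if __name__ == "__main__":
    print(shlex.join(gffcompare_command("reference.gtf", "merged.gtf")))
```

The summary of sensitivity and precision at the base, exon, intron, and transcript levels should end up in the `.stats` file written under the chosen prefix.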
I don't have the SRA data used to make the reference annotation, so I downloaded two different projects from NCBI, but both gave me more or less the same poor results.
- What am I doing wrong? Is the main problem that I don't have the SRA inputs used to make the annotation I'm trying to replicate?
- I'm planning to add or change some of the current parameters, but I don't think that will have a significant effect. Am I correct?
- I'd be happy to receive any information you can share on this subject. The main goal is to build a gene prediction and annotation pipeline.
Thanks a lot!
Hi Lior, thank you for the informative and very helpful comment! I can't seem to find anything useful on "Reference gene lift-over". Could you give more context or explain a bit more what you meant by that? Thanks again.