Hey all, I'm new to this :)
I have about 8 SRR read files (RNA-seq) from some organism.
It would be great if you could help me determine the best course of action.
As a test, I took 2 of the read files, trimmed them with Trim Galore, then used STAR to index the reference genome and align the reads to it. Next I used samtools to sort and index the alignments, and finally ran StringTie on the sorted BAM file to assemble transcripts, which gave me a GTF file.
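For reference, that single-sample chain can be sketched as a dry run that only builds the command lines. This is a minimal sketch, assuming gzipped paired-end FASTQ files named like sample01_1.fastq.gz; the helper functions, sample names, directories, and thread count are hypothetical placeholders, and the trimmed file names assume Trim Galore's usual _val_1/_val_2 naming.

```python
"""Dry-run sketch of the single-sample pipeline: Trim Galore -> STAR -> samtools -> StringTie.

Sample names, paths, and thread counts are hypothetical placeholders.
Commands are only built and printed, not executed.
"""
from pathlib import Path
from typing import List, Optional


def star_index_cmd(genome_fa: str, index_dir: str,
                   annotation_gtf: Optional[str] = None, threads: int = 4) -> List[str]:
    # Built once per reference genome, not once per sample
    cmd = ["STAR", "--runMode", "genomeGenerate", "--runThreadN", str(threads),
           "--genomeDir", index_dir, "--genomeFastaFiles", genome_fa]
    if annotation_gtf is not None:
        # Optional: known splice junctions from an existing annotation
        cmd += ["--sjdbGTFfile", annotation_gtf]
    return cmd


def sample_cmds(sample: str, r1: str, r2: str, index_dir: str,
                out_dir: str = "results", threads: int = 4) -> List[List[str]]:
    out = Path(out_dir) / sample
    # Assumption: Trim Galore names paired outputs <input stem>_val_1.fq.gz / _val_2.fq.gz
    trimmed1 = out / (Path(r1).name.replace(".fastq.gz", "") + "_val_1.fq.gz")
    trimmed2 = out / (Path(r2).name.replace(".fastq.gz", "") + "_val_2.fq.gz")
    prefix = f"{out}/"  # STAR prepends this to every output file name
    bam = prefix + "Aligned.sortedByCoord.out.bam"
    return [
        ["trim_galore", "--paired", "-o", str(out), r1, r2],
        ["STAR", "--runThreadN", str(threads), "--genomeDir", index_dir,
         "--readFilesIn", str(trimmed1), str(trimmed2),
         "--readFilesCommand", "zcat",          # input FASTQs are gzipped
         "--outSAMtype", "BAM", "SortedByCoordinate",
         "--outSAMstrandField", "intronMotif",  # XS tag on spliced reads (unstranded data)
         "--outFileNamePrefix", prefix],
        ["samtools", "index", bam],
        ["stringtie", bam, "-p", str(threads), "-l", sample,
         "-o", str(out / f"{sample}.gtf")],
    ]


if __name__ == "__main__":
    print(" ".join(star_index_cmd("genome.fa", "star_index", "genes.gtf")))
    for cmd in sample_cmds("sample01", "sample01_1.fastq.gz",
                           "sample01_2.fastq.gz", "star_index"):
        print(" ".join(cmd))
```

Because STAR's --outSAMtype BAM SortedByCoordinate already writes a coordinate-sorted BAM, a separate samtools sort is not needed in this variant; only the index step remains. To actually run the commands, each list could be passed to subprocess.run(cmd, check=True).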
My question is: what is the best way to approach this with multiple read files? If I have 8 read files (or more), should I run STAR on each pair, merge the BAM files with samtools merge, and run StringTie once? Or should I create a GTF file for each pair and merge them with stringtie --merge?
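Both options can be written out side by side as a dry run. This sketch assumes each sample already has a coordinate-sorted BAM; the function names and output names (merged.bam, pooled.gtf, merged.gtf) are hypothetical.

```python
"""Dry-run sketch of the two strategies: pool BAMs vs. merge per-sample GTFs.

All file names are hypothetical; each input BAM is assumed to be coordinate-sorted.
"""
from typing import List


def pool_bams(bams: List[str], threads: int = 4) -> List[List[str]]:
    # Option 1: one merged BAM, one StringTie assembly
    return [
        ["samtools", "merge", "-@", str(threads), "merged.bam", *bams],
        ["samtools", "index", "merged.bam"],
        ["stringtie", "merged.bam", "-p", str(threads), "-o", "pooled.gtf"],
    ]


def merge_assemblies(bams: List[str], threads: int = 4) -> List[List[str]]:
    # Option 2: one StringTie assembly per sample, then stringtie --merge
    cmds: List[List[str]] = []
    gtfs: List[str] = []
    for bam in bams:
        gtf = bam.removesuffix(".bam") + ".gtf"
        gtfs.append(gtf)
        cmds.append(["stringtie", bam, "-p", str(threads), "-o", gtf])
    cmds.append(["stringtie", "--merge", "-o", "merged.gtf", *gtfs])
    return cmds


if __name__ == "__main__":
    samples = ["sample01.bam", "sample02.bam"]
    for label, plan in (("pool BAMs", pool_bams(samples)),
                        ("merge GTFs", merge_assemblies(samples))):
        print(f"# {label}")
        for cmd in plan:
            print(" ".join(cmd))
```

The StringTie manual describes --merge as the way to combine per-sample assemblies into one non-redundant transcript set, which is the route the second function takes; pooling the BAMs instead hands the assembler a single large sample.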
Also, is there anything I should do or add between the steps I mentioned?
The goal is to create a gene annotation pipeline.
If you have any information on this, I'd be happy to hear it!
Hey Mike, I followed the same pipeline as you: Trim Galore to trim and filter, STAR to map. How did you run StringTie on Aligned.sortedByCoord.out.bam? I'm looking for the exact commands.
Best,
Todd
Hi Todd, first I sorted using samtools:
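The exact command isn't shown in this reply; a typical samtools sort and index step might look like the dry-run sketch below. The input name Aligned.out.sam (STAR's default unsorted SAM output) and the thread count are assumptions, and sort_and_index is a hypothetical helper.

```python
"""Dry-run sketch of a samtools sort + index step; names and threads are placeholders."""
from pathlib import Path
from typing import List


def sort_and_index(in_file: str, threads: int = 4) -> List[List[str]]:
    # e.g. Aligned.out.sam -> Aligned.out.sorted.bam
    out_bam = str(Path(in_file).with_suffix(".sorted.bam"))
    return [
        # Coordinate-sort (samtools' default order); -@ adds threads, -o names the output
        ["samtools", "sort", "-@", str(threads), "-o", out_bam, in_file],
        # Writes the .bai index next to the sorted BAM
        ["samtools", "index", out_bam],
    ]


if __name__ == "__main__":
    for cmd in sort_and_index("Aligned.out.sam"):
        print(" ".join(cmd))
```

If STAR was run with --outSAMtype BAM SortedByCoordinate, its Aligned.sortedByCoord.out.bam is already sorted and only the index command is needed.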
Then:
If you don't have a reference annotation file:
If you have a reference annotation file:
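The StringTie commands for these two cases aren't shown here. A minimal dry-run sketch, assuming the STAR output Aligned.sortedByCoord.out.bam from the question and a hypothetical annotation file genes.gtf: without a reference, transcripts are assembled from the alignments alone; with one, the -G option supplies a reference annotation to guide the assembly. The helper name, label, and thread count are placeholders.

```python
"""Dry-run sketch of StringTie on STAR's sorted BAM, with or without a reference annotation.

File names, the label, and the thread count are hypothetical placeholders.
"""
from typing import List, Optional


def stringtie_cmd(bam: str, out_gtf: str, label: str,
                  reference_gtf: Optional[str] = None, threads: int = 4) -> List[str]:
    # -l sets the prefix for assembled transcript IDs, -o the output GTF
    cmd = ["stringtie", bam, "-p", str(threads), "-l", label, "-o", out_gtf]
    if reference_gtf is not None:
        # -G: reference annotation (GTF/GFF) used to guide the assembly
        cmd += ["-G", reference_gtf]
    return cmd


if __name__ == "__main__":
    bam = "sample01/Aligned.sortedByCoord.out.bam"
    print(" ".join(stringtie_cmd(bam, "sample01.gtf", "sample01")))               # no reference
    print(" ".join(stringtie_cmd(bam, "sample01.gtf", "sample01", "genes.gtf")))  # with reference
```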
Then I merged all the outputs with:
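The merge command isn't shown either. A minimal sketch, assuming one StringTie GTF per sample: stringtie --merge accepts either the GTF paths directly or a text file listing one GTF per line, and this version writes such a list. The names mergelist.txt, stringtie_merged.gtf, and genes.gtf are hypothetical, and -G is only added when a reference annotation exists.

```python
"""Sketch of the merge step: write a GTF list file, then build stringtie --merge.

mergelist.txt, stringtie_merged.gtf, and genes.gtf are hypothetical names.
"""
from pathlib import Path
from typing import List, Optional


def write_gtf_list(gtfs: List[str], list_path: str = "mergelist.txt") -> str:
    # stringtie --merge accepts a text file listing one GTF per line
    Path(list_path).write_text("\n".join(gtfs) + "\n")
    return list_path


def merge_cmd(list_path: str, out_gtf: str = "stringtie_merged.gtf",
              reference_gtf: Optional[str] = None) -> List[str]:
    cmd = ["stringtie", "--merge", "-o", out_gtf]
    if reference_gtf is not None:
        # Optional: include reference transcripts in the merged set
        cmd += ["-G", reference_gtf]
    cmd.append(list_path)
    return cmd


if __name__ == "__main__":
    lst = write_gtf_list(["sample01.gtf", "sample02.gtf"])
    print(" ".join(merge_cmd(lst)))
```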