Aligning RNA-seq reads to a genome: what to do after STAR?
8.3 years ago
Biogeek ▴ 470

Hey guys,

Just a quick question, and I'd appreciate some advice. I've indexed my target organism's genome and I am now aligning my cleaned reads back to the genome, again with STAR. My reads were cleaned with Trimmomatic.

I've read that people regularly use the Cufflinks package for all-in-one analysis; however, I would prefer to use edgeR. Once my reads have been aligned, is there any way I can use the SAM (or converted BAM) files to calculate counts and then feed them into R and edgeR? Most of my experience so far has been in de novo assembly.

Are there any good tutorials I can visit online?

Thanks.

genome rnaseq reference genome • 3.9k views

Yes, you can. STAR can now generate counts during alignment, or you could run featureCounts on the aligned BAM files to generate the count matrix.
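As a rough illustration of the first option: when STAR is run with `--quantMode GeneCounts`, it writes a `ReadsPerGene.out.tab` file per sample, with four summary rows (`N_unmapped`, `N_multimapping`, `N_noFeature`, `N_ambiguous`) followed by one row per gene holding unstranded, forward-stranded and reverse-stranded counts. The sketch below merges such files into a single gene-by-sample table that could be written out and read into R for edgeR. The sample names and toy numbers are made up for the example, and the function names are hypothetical helpers, not part of STAR.

```python
import csv
import io


def parse_reads_per_gene(handle, column=1):
    """Return {gene_id: count} from one ReadsPerGene.out.tab file.

    column 1 = unstranded counts, 2 = forward-stranded, 3 = reverse-stranded.
    Pick the column that matches your library prep.
    """
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("N_"):
            continue  # skip the four summary rows at the top of the file
        counts[row[0]] = int(row[column])
    return counts


def build_count_matrix(per_sample):
    """Combine {sample: {gene: count}} into (header, rows), genes as rows."""
    samples = sorted(per_sample)
    genes = sorted(set().union(*(per_sample[s] for s in samples)))
    header = ["gene_id"] + samples
    rows = [[g] + [per_sample[s].get(g, 0) for s in samples] for g in genes]
    return header, rows


# Toy stand-ins for two STAR output files (names and numbers are invented).
ctrl_1 = (
    "N_unmapped\t10\t10\t10\n"
    "N_multimapping\t5\t5\t5\n"
    "N_noFeature\t3\t20\t4\n"
    "N_ambiguous\t1\t0\t1\n"
    "geneA\t100\t2\t98\n"
    "geneB\t7\t0\t7\n"
)
treat_1 = (
    "N_unmapped\t8\t8\t8\n"
    "N_multimapping\t2\t2\t2\n"
    "N_noFeature\t1\t9\t2\n"
    "N_ambiguous\t0\t0\t0\n"
    "geneA\t40\t1\t39\n"
    "geneB\t15\t0\t15\n"
)

header, rows = build_count_matrix({
    "ctrl_1": parse_reads_per_gene(io.StringIO(ctrl_1)),
    "treat_1": parse_reads_per_gene(io.StringIO(treat_1)),
})
```

With real data you would pass open file handles instead of `io.StringIO` objects, write `header` and `rows` to a tab-separated file, and load that file in R as the count matrix.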


Hi Genomax2,

This presumably does away with the need for the Cufflinks software? I have multiple replicates per treatment, and I read that cuffmerge is good for this. Are there any obvious advantages to using Cufflinks over straight-up STAR?

Thanks


Yes. You would want to use DESeq2 or edgeR anyway. It sounds like you are all set with replicates etc. See the paper Devon linked below. The DESeq2 vignette would be similarly useful.

8.3 years ago

This F1000 article has commands for generating counts (near the end, note that they use featureCounts from within R, though you can use it at the command line too) and using edgeR. That'll be a good tutorial to base your analysis on.


Thanks for the article, Devon, much appreciated. I've had a read and, whilst it is appealing, I am going to try STAR first with the new transcript counts feature in version 2.4.2a. I'll then feed the BAM into RSEM and follow my usual pipeline from there. I am trying to determine whether de novo assembly is better than using the draft genome of the organism in terms of coverage. I may venture into using Rsubread down the line. Thanks.


If you want to go that route you might appreciate that Salmon or Kallisto will get you similar results in a fraction of the time.


I would second Salmon or kallisto in that case, since both run faster and generate counts and TPM for each replicate; you can then aggregate the results into a matrix. If I am not wrong, the latest version of Salmon already has transcript-to-gene summarisation if you want a gene count matrix; otherwise you will have transcript counts. Good luck!
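To make the aggregation step concrete, here is a minimal sketch of summing transcript-level counts to gene level. It assumes Salmon's `quant.sf` layout (tab-separated columns `Name`, `Length`, `EffectiveLength`, `TPM`, `NumReads`) and a transcript-to-gene mapping that you would normally derive from your annotation. The transcript and gene IDs below are invented, and the helper names are hypothetical. In R, the tximport package is the usual way to do this before handing counts to edgeR or DESeq2.

```python
import csv
import io
from collections import defaultdict


def read_quant_sf(handle):
    """Return {transcript_id: NumReads} from a Salmon quant.sf table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Name"]: float(row["NumReads"]) for row in reader}


def summarise_to_genes(tx_counts, tx2gene):
    """Sum transcript-level counts per gene.

    Transcripts missing from tx2gene are dropped in this sketch.
    """
    gene_counts = defaultdict(float)
    for tx, n_reads in tx_counts.items():
        gene = tx2gene.get(tx)
        if gene is not None:
            gene_counts[gene] += n_reads
    return dict(gene_counts)


# Toy quant.sf content for one replicate (IDs and values are invented).
quant_sf = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "tx1\t1200\t1050.0\t12.5\t10.0\n"
    "tx2\t900\t750.0\t8.0\t5.5\n"
    "tx3\t2000\t1850.0\t2.1\t3.0\n"
)

# Hypothetical mapping; with real data this would come from the GTF/GFF.
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}

gene_counts = summarise_to_genes(read_quant_sf(io.StringIO(quant_sf)), tx2gene)
```

Running this once per replicate and joining the resulting dictionaries on gene ID gives the gene-level matrix. Note that estimated counts are fractional, so they may need rounding for tools that expect integers.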


Thanks guys. I've already completed the de novo analysis using RSEM and edgeR, so I guess it would be most appropriate to stick with RSEM and edgeR again, so as not to go off the beaten track. The reason I'm doing this analysis in addition to the de novo work is so that I can compare coverage of the genome, in case I'm asked when defending my thesis why I didn't use the reference.

A bit of a general question: have any of you attempted a hybrid assembly, or is that highly time-consuming and does it require a lot of knowledge?

