Question

Protocol/Pipeline/Precausions for analysis 3'-end RNAseq data

0

Entering edit mode

8.2 years ago

Mike ★ 1.9k

Hi All,

I have 3'-end RNAseq data and going to analyse. Could you please recommand specific protocol/pipeline or any precausion/suggesion for analyse 3'-end RNAseq data.

Thanks a lot.

RNA-Seq sequencing • 2.2k views

ADD COMMENT • link 8.2 years ago by Mike ★ 1.9k

score 1 · Answer 1 · 2016-09-20

1

Entering edit mode

8.2 years ago

WouterDeCoster 47k

Although you are fairly unspecific about which method/kit was used to prepare the library, the main difference with 'normal' RNA-seq is probably that you don't need gene/transcript length normalization. A "standard" pipeline using (personal preferences) STAR, featureCounts and DESeq2/edgeR/limma-voom should be fine. You cannot use transcript-assembly-based pipelines (e.g. cufflinks, rsem).

ADD COMMENT • link 8.2 years ago by WouterDeCoster 47k

0

Entering edit mode

Thanks WouterDeCoster,

yes, after lots of serach I also agree that " STAR, featureCounts/htseq and DESeq2/edgeR/limma-voom should be fine", but still Im not sure about trimming poly-A tail.

ADD REPLY • link 8.2 years ago by Mike ★ 1.9k

0

Entering edit mode

I think you definitely have to trim the poly A tail, and for that I use prinseq.

ADD REPLY • link 8.2 years ago by WouterDeCoster 47k

0

Entering edit mode

Actually for trimming I used Cutadapt

cutadapt  -a "A{10}"  sample.fastq.gz    >sample_trimmed.fastq

But when I used this trmmed fastq in STAR , I got following error,

 STAR --genomeDir /databank/genomes/Homo_sapiens/path --readFilesIn  sample_trimmed.fastq  --runThreadN 8

EXITING because of FATAL ERROR in reads input: short read sequence line: 1
Read Name=@K00198:97:HFJ5FBBXX:8:1101:20537:12321
Read Sequence====
DEF_readNameLengthMax=50000
DEF_readSeqLengthMax=500

Sep 20 16:45:30 ...... FATAL ERROR, exiting

ADD REPLY • link 8.2 years ago by Mike ★ 1.9k

0

Entering edit mode

STAR is working fine for un-trimmed reads

ADD REPLY • link 8.2 years ago by Mike ★ 1.9k

0

Entering edit mode

there is nothing left of that read...

ADD REPLY • link 8.2 years ago by WouterDeCoster 47k

0

Entering edit mode

Thanks,

so I should define minimum length of reads and discard trimmed reads that are shorter than minimum length.

ADD REPLY • link 8.2 years ago by Mike ★ 1.9k

1

Entering edit mode

That would prevent this issue, probably.

ADD REPLY • link 8.2 years ago by WouterDeCoster 47k

0

Entering edit mode

Thanks, its works,

Sorry I have one more question regarding featureCounts output.

Acutally I want to compare HTseq and featureCounts read count results, but featureCounts output result is in different format, in HTseq there is only two column gene name and corresponding count, but in case of featureCounts output there are seven column file, so how to compare count with HTseq.

Thanks

ADD REPLY • link 8.2 years ago by Mike ★ 1.9k

0

Entering edit mode

Based on the manual, these are the columns: ‘Geneid’, ‘Chr’, ‘Start’, ‘End’, ‘Strand’, ‘Length’ and read counts for each gene in each library. Comparing that to htseq count should be a trivial analysis.

ADD REPLY • link 8.2 years ago by WouterDeCoster 47k

0

Entering edit mode

Thanks, why header of 7th colum is name of input bam file in featureCounts ??

Is it read count or something else??

Geneid  Chr Start   End Strand  Length  Aligned.out.bam

ADD REPLY • link 8.2 years ago by Mike ★ 1.9k

0

Entering edit mode

That's most likely as a surrogate for sample identifier and I assume that indeed contains read counts.

ADD REPLY • link 8.2 years ago by WouterDeCoster 47k

0

Entering edit mode

Thanks , highly appreciated for you help.

ADD REPLY • link 8.2 years ago by Mike ★ 1.9k

0

Entering edit mode

Is there any difference in read count normalization and DEA using DESeq2/edgeR in case of 3'-end seq data?

I am using DESeq2/edgeR for DEA and my codes:

How to compare between two samples CR & TR ? (each sample having 3 technical replicate)

It would be great help if you please check my code.

DESeq2 code:

#count matrix
raw_count <- read.delim("htseqCount.txt", header=TRUE, row.names=1)
condition <- factor(c("CR","CR","CR","TR", "TR", "TR"))

##if you have any data matrix count
dds <- DESeqDataSetFromMatrix(raw_count, DataFrame(condition), ~ condition)

#Pre-filtering
dds <- dds[ rowSums(counts(dds)) > 1, ]

#DESeq2 analysis
dds <- DESeq(dds)

And EdgeR code:

rawCountTable  <- read.delim("htseqCount.txt", header=TRUE, row.names=1)
sampleInfo <- read.table("design.csv", header=TRUE, sep=",", row.names=1)
 group=sampleInfo$condition

##Create a DGEList data object
dgeFull <- DGEList(rawCountTable, group=sampleInfo$condition)

y <- calcNormFactors(dgeFull)
design <- model.matrix(~group)

y <- estimateDisp(y,design)
##To perform likelihood ratio tests
fit <- glmFit(y,design)
lrt <- glmLRT(fit)
topTags(lrt)

Is this OK?

Thanks

ADD REPLY • link 8.2 years ago by Mike ★ 1.9k