Protocol/Pipeline/Precausions for analysis 3'-end RNAseq data
1
0
Entering edit mode
8.2 years ago
Mike ★ 1.9k

Hi All,

I have 3'-end RNAseq data and going to analyse. Could you please recommand specific protocol/pipeline or any precausion/suggesion for analyse 3'-end RNAseq data.

Thanks a lot.

RNA-Seq sequencing • 2.2k views
ADD COMMENT
1
Entering edit mode
8.2 years ago

Although you are fairly unspecific about which method/kit was used to prepare the library, the main difference with 'normal' RNA-seq is probably that you don't need gene/transcript length normalization. A "standard" pipeline using (personal preferences) STAR, featureCounts and DESeq2/edgeR/limma-voom should be fine. You cannot use transcript-assembly-based pipelines (e.g. cufflinks, rsem).

ADD COMMENT
0
Entering edit mode

Thanks WouterDeCoster,

yes, after lots of serach I also agree that " STAR, featureCounts/htseq and DESeq2/edgeR/limma-voom should be fine", but still Im not sure about trimming poly-A tail.

ADD REPLY
0
Entering edit mode

I think you definitely have to trim the poly A tail, and for that I use prinseq.

ADD REPLY
0
Entering edit mode

Actually for trimming I used Cutadapt

cutadapt  -a "A{10}"  sample.fastq.gz    >sample_trimmed.fastq

But when I used this trmmed fastq in STAR , I got following error,

 STAR --genomeDir /databank/genomes/Homo_sapiens/path --readFilesIn  sample_trimmed.fastq  --runThreadN 8

EXITING because of FATAL ERROR in reads input: short read sequence line: 1
Read Name=@K00198:97:HFJ5FBBXX:8:1101:20537:12321
Read Sequence====
DEF_readNameLengthMax=50000
DEF_readSeqLengthMax=500

Sep 20 16:45:30 ...... FATAL ERROR, exiting
ADD REPLY
0
Entering edit mode

STAR is working fine for un-trimmed reads

ADD REPLY
0
Entering edit mode

there is nothing left of that read...

ADD REPLY
0
Entering edit mode

Thanks,

so I should define minimum length of reads and discard trimmed reads that are shorter than minimum length.

ADD REPLY
1
Entering edit mode

That would prevent this issue, probably.

ADD REPLY
0
Entering edit mode

Thanks, its works,

Sorry I have one more question regarding featureCounts output.

Acutally I want to compare HTseq and featureCounts read count results, but featureCounts output result is in different format, in HTseq there is only two column gene name and corresponding count, but in case of featureCounts output there are seven column file, so how to compare count with HTseq.

Thanks

ADD REPLY
0
Entering edit mode

Based on the manual, these are the columns: ‘Geneid’, ‘Chr’, ‘Start’, ‘End’, ‘Strand’, ‘Length’ and read counts for each gene in each library. Comparing that to htseq count should be a trivial analysis.

ADD REPLY
0
Entering edit mode

Thanks, why header of 7th colum is name of input bam file in featureCounts ??

Is it read count or something else??

Geneid  Chr Start   End Strand  Length  Aligned.out.bam
ADD REPLY
0
Entering edit mode

That's most likely as a surrogate for sample identifier and I assume that indeed contains read counts.

ADD REPLY
0
Entering edit mode

Thanks , highly appreciated for you help.

ADD REPLY
0
Entering edit mode

Is there any difference in read count normalization and DEA using DESeq2/edgeR in case of 3'-end seq data?

I am using DESeq2/edgeR for DEA and my codes:

How to compare between two samples CR & TR ? (each sample having 3 technical replicate)

It would be great help if you please check my code.

DESeq2 code:

#count matrix
raw_count <- read.delim("htseqCount.txt", header=TRUE, row.names=1)
condition <- factor(c("CR","CR","CR","TR", "TR", "TR"))

##if you have any data matrix count
dds <- DESeqDataSetFromMatrix(raw_count, DataFrame(condition), ~ condition)

#Pre-filtering
dds <- dds[ rowSums(counts(dds)) > 1, ]

#DESeq2 analysis
dds <- DESeq(dds)

And EdgeR code:

rawCountTable  <- read.delim("htseqCount.txt", header=TRUE, row.names=1)
sampleInfo <- read.table("design.csv", header=TRUE, sep=",", row.names=1)
 group=sampleInfo$condition

##Create a DGEList data object
dgeFull <- DGEList(rawCountTable, group=sampleInfo$condition)

y <- calcNormFactors(dgeFull)
design <- model.matrix(~group)

y <- estimateDisp(y,design)
##To perform likelihood ratio tests
fit <- glmFit(y,design)
lrt <- glmLRT(fit)
topTags(lrt)

Is this OK?

Thanks

ADD REPLY

Login before adding your answer.

Traffic: 1880 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6