Question

Advice on downstream analysis: data from RNA-Seq

4.6 years ago

marcelolaia ▴ 10

My scenario:

I ran featureCounts in two ways:

Approach a: featureCounts -p -B -a Specie.transcript.fa.gtf -t exon -g gene_id -o A1.counts.txt -f A1.bbduk.bam
Approach b: featureCounts -p -B -a Specie.gene_exons.gtf -t exon -g transcript_id -o A1.counts_transcript_id.txt -f A1.bbduk.bam

From 'a', I obtained a list of differentially expressed genes (DEGs) with the NOISeq package: 1,714 genes. From 'b', I obtained a list of 3,067 DE exons.

I submitted both lists to Blast2GO and got BLASTX, InterPro, and EC annotations for almost all sequences in each list.

I have also downloaded GeneSCF and will try it.

From here, I need help.

I would like to conduct a more refined analysis of this data. I tried to make a heatmap (pheatmap package in R), but with this much data the plot is unintelligible. So I subset the data by M value (NOISeq fold change), keeping only features with |M| > X, and got 84 DE exons/genes, a number suitable for a plot. However, looking at that plot, I suspect that subsetting the data in this manner is not a good idea.

Have you ever been in a situation like this, with a large amount of data? How did you extract the best biological information from it?
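
The M-value subsetting described above can be sketched as follows. This is only an illustration in Python, not the R code used for the analysis: the feature IDs, the M values, and the cutoff are made up, and `subset_by_abs_m` is a hypothetical helper name.

```python
# Hypothetical toy values: feature ID -> NOISeq M value.
m_values = {"gene_a": 3.2, "gene_b": -0.4, "gene_c": -2.7, "gene_d": 1.1}


def subset_by_abs_m(m_by_feature, x):
    """Keep only the features whose absolute M value is greater than x."""
    return {feat: m for feat, m in m_by_feature.items() if abs(m) > x}


# Assumed cutoff X = 2.0; the real X is not stated in the post.
selected = subset_by_abs_m(m_values, x=2.0)

# List the kept features, strongest change first.
for feat in sorted(selected, key=lambda f: abs(selected[f]), reverse=True):
    print(feat, selected[feat])
```

The size of the subset depends entirely on the chosen X, which is one reason a plot built this way can look arbitrary.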

Any suggestion/advice is very welcome!

I have been a Debian user since Potato, but I am not a programmer.

If this is off topic, please don't hesitate to tell me and I will delete the post immediately.

Best

differentially-expressed-genes RNA-Seq • 810 views

updated 18 months ago by Ram 44k • written 4.6 years ago by marcelolaia ▴ 10