My scenario:
I ran featureCounts in two ways:
- Approach a: featureCounts -p -B -a Specie.transcript.fa.gtf -t exon -g gene_id -o A1.counts.txt -f A1.bbduk.bam
- Approach b: featureCounts -p -B -a Specie.gene_exons.gtf -t exon -g transcript_id -o A1.counts_transcript_id.txt -f A1.bbduk.bam
From 'a', I obtained a list of 1,714 differentially expressed (DE) genes using the NOISeq package. From 'b', I obtained a list of 3,067 DE exons.
I submitted both lists to the Blast2GO program and got BLASTX, InterPro and EC results for almost all sequences in each list.
I have also downloaded GeneSCF and will try it, too.
From here, I need help.
I would like to conduct a more refined analysis of this data. I tried to make a heatmap (pheatmap package in R), but with this much data the plot is unintelligible. So I took a subset of the data based on the M value (NOISeq fold change), keeping only features with |M| > X, and got 84 DE exons/genes, a manageable number for a plot. However, looking at that plot, I suspect that subsetting the data in this manner isn't a good idea.
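To make the filter concrete, here is a minimal sketch of this kind of |M| cutoff. It assumes a tab-delimited NOISeq results table with a header row that contains a column named "M"; the function name filter_by_abs_m, the file name noiseq_results.tsv and the cutoff value in the usage comment are hypothetical placeholders, not the values I actually used.

```shell
# Keep the header line plus every row whose |M| is greater than the cutoff X.
# Assumption: tab-delimited input whose header has a column literally named "M".
filter_by_abs_m() {
    # $1 = input table, $2 = cutoff X
    awk -F'\t' -v x="$2" '
        # Header: locate the M column by name, print the header, move on
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "M") col = i; print; next }
        # Data rows: printed only if the M column was found and |M| > X
        col && ($col + 0 > x || $col + 0 < -x)
    ' "$1"
}

# Hypothetical usage (file name and cutoff are placeholders):
#   filter_by_abs_m noiseq_results.tsv 2 > subset.tsv
```

A filter like this looks only at effect size and ignores the NOISeq probability column, which may be part of why the resulting subset does not feel right.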
Have you ever been in a situation like this, with a large amount of data? How did you extract the most meaningful biological information from it?
Any suggestion/advice is very welcome!
I have been a Debian user since Potato, but I am not a programmer.
If this is off topic, please don't hesitate to tell me and I will delete the post immediately.
Best