I mapped 2 types of RNA-seq data. After running HTSeq-count and doing differential expression with limma, I saw that PER3 and PER4 have exactly the same logFC and adjusted p-value. I tried it with a different type of RNA-seq data and got the same result. My question is: is this normal, or am I making a mistake?
The 2 types both involve circadian rhythm in fibroblasts.
I am using Galaxy because my department doesn't have a supercomputer to work with.
I first got the raw data and then ran FastQC and Trimmomatic.
I mean the expression levels of PER3 (the parent gene) and PER4 (a PER3 pseudogene). In the end, what matters are the logFC (log2 fold change), the adjusted p-value and the FDR (false discovery rate).
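For reference: limma's `topTable` reports `adj.P.Val`, which by default is the Benjamini-Hochberg (BH) adjusted p-value, often reported as the FDR, so "pvalueAdj" and "FDR" are usually the same column. BH adjustment depends only on the raw p-values, so genes with identical raw p-values always receive identical adjusted values. A minimal Python sketch of the BH procedure (it follows the same steps as R's `p.adjust(method = "BH")`, but it is an illustration, not limma's own code; the p-values are made up):

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Walk from the largest p-value to the smallest so a running minimum
    # enforces monotonicity (a smaller p-value never gets a larger adjusted value).
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    for step, i in enumerate(order):
        rank = m - step  # 1-based rank of pvalues[i] in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

raw = [0.01, 0.01, 0.04, 0.5]  # made-up p-values; the first two are tied
adjusted = benjamini_hochberg(raw)
print(adjusted[0] == adjusted[1])  # True: tied raw p-values share one adjusted value
```

So a whole block of genes with identical raw p-values also shares one adjusted p-value; the adjustment itself cannot make them differ.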
If these 2 genes exhibit ~100% sequence similarity, then it may be impossible to faithfully distinguish them via short-read NGS technology. Is PER4 a processed (contains only the spliced portion of PER3) or unprocessed (contains the genomic sequence of PER3) pseudogene?
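One way to picture the problem: ask what fraction of the parent transcript's read-length windows also occur, letter for letter, in the pseudogene. A read drawn from such a window matches both loci equally well, so no aligner can assign it to one of them. A minimal sketch with hypothetical toy sequences and a hypothetical read length of 10 (these are not the real PER3/PER4 sequences):

```python
def ambiguous_fraction(parent, pseudogene, read_length):
    """Fraction of the parent's read-length windows that also occur, exactly, in the pseudogene."""
    windows = [parent[i:i + read_length] for i in range(len(parent) - read_length + 1)]
    if not windows:
        return 0.0  # sequence shorter than a read: nothing to compare
    pseudo_windows = {pseudogene[i:i + read_length]
                      for i in range(len(pseudogene) - read_length + 1)}
    return sum(w in pseudo_windows for w in windows) / len(windows)

# Toy sequences (hypothetical, NOT the real PER3/PER4 sequences).
parent = "ACGTTGCAAGGCTTACGATCGGATCCTAGG"
pseudo = "ACGTTGCAAGGCTTACGATCGGATCCTAGG"   # identical copy
mutated = "ACGTTGCAAGGCTTTCGATCGGATCCTAGG"  # one mismatch at position 14

print(ambiguous_fraction(parent, pseudo, 10))   # 1.0: no read can tell them apart
print(ambiguous_fraction(parent, mutated, 10))  # below 1.0: windows over the mismatch are no longer exact copies
```

Real aligners such as STAR tolerate mismatches, so treat this exact-match fraction as a rough guide rather than a prediction of what the aligner will actually do.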
It doesn't have any introns, so I think it is a processed pseudogene. After running limma, I got fold changes for ~200,000 genes, but for nearly 20,000 of them I saw the same numbers, like PER3 and PER4. ABCC6 and ABCC6P1 were in the same situation as PER3/PER4. For some other genes and pseudogenes, though, the numbers were different. So my question is: is it possible that limma couldn't identify the reads, or is this OK and the numbers are the same just because of the similar sequences? I should also say that the numbers are different at different time points.
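To see which genes share identical results, one can group the rows of the limma results table by their (logFC, adj.P.Val) pair and then check the raw counts of the genes in the largest group. A minimal sketch, assuming the results and the count matrix have already been loaded into plain Python dictionaries; the gene IDs and numbers below are hypothetical:

```python
from collections import defaultdict

def identical_result_groups(results):
    """Group gene IDs that share exactly the same (logFC, adj.P.Val) pair, largest group first."""
    groups = defaultdict(list)
    for gene, (logfc, adj_p) in results.items():
        groups[(logfc, adj_p)].append(gene)
    # Only pairs shared by more than one gene are of interest here.
    shared = [sorted(genes) for genes in groups.values() if len(genes) > 1]
    return sorted(shared, key=len, reverse=True)

def all_zero(counts, genes):
    """True if every listed gene has zero counts in every sample (counts are non-negative)."""
    return all(sum(counts[g]) == 0 for g in genes)

# Hypothetical toy data: gene -> (logFC, adj.P.Val) and gene -> per-sample raw counts.
results = {
    "GENE_A": (0.41, 0.97),
    "GENE_B": (0.41, 0.97),
    "GENE_C": (0.41, 0.97),
    "GENE_D": (1.80, 0.002),
}
counts = {
    "GENE_A": [0, 0, 0, 0],
    "GENE_B": [0, 0, 0, 0],
    "GENE_C": [0, 0, 0, 0],
    "GENE_D": [12, 15, 60, 71],
}

groups = identical_result_groups(results)
print(groups)                       # [['GENE_A', 'GENE_B', 'GENE_C']]
print(all_zero(counts, groups[0]))  # True
```

If the big groups turn out to consist of genes with no counts at all, the identical numbers come from the count data, not from limma failing to "identify" reads.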
These are some of those transcripts. As you can see, they all have the same numbers.
In that case, the result makes sense. Genes/transcripts that have zero counts in every sample are all converted to the same constant value (as in your data) by the normalisation / transformation. So you should consider removing these 'zero' genes before you perform the normalisation step.
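If the data went through `voom`, the transformation is to log2 counts per million in the form log2((count + 0.5) / (library size + 1) × 10^6). For a gene with zero counts everywhere, that value depends only on each sample's library size, so every all-zero gene gets exactly the same row. A minimal sketch of the transformation plus a simple pre-filter; the cutoff (CPM ≥ 1 in at least 2 samples) is an illustrative assumption, not limma's or edgeR's exact rule (edgeR's `filterByExpr` is a common choice), and normalisation factors are ignored here:

```python
import math

def library_sizes(counts):
    """Total counts per sample; counts maps gene -> list of per-sample counts."""
    genes = list(counts)
    n_samples = len(counts[genes[0]])
    return [sum(counts[g][j] for g in genes) for j in range(n_samples)]

def voom_style_logcpm(counts):
    """log2-CPM in the form voom uses: log2((count + 0.5) / (lib_size + 1) * 1e6)."""
    libs = library_sizes(counts)
    return {
        g: [math.log2((c + 0.5) / (lib + 1) * 1e6) for c, lib in zip(row, libs)]
        for g, row in counts.items()
    }

def filter_low_counts(counts, min_cpm=1.0, min_samples=2):
    """Keep genes with CPM >= min_cpm in at least min_samples samples (illustrative rule)."""
    libs = library_sizes(counts)
    kept = {}
    for g, row in counts.items():
        cpm = [c / lib * 1e6 for c, lib in zip(row, libs)]
        if sum(x >= min_cpm for x in cpm) >= min_samples:
            kept[g] = row
    return kept

# Hypothetical counts for 4 samples (2 per condition) with unequal library sizes.
counts = {
    "ZERO_1": [0, 0, 0, 0],
    "ZERO_2": [0, 0, 0, 0],
    "EXPRESSED": [500, 800, 300, 900],
    "OTHER": [1500, 1200, 2700, 2100],
}
logcpm = voom_style_logcpm(counts)
print(logcpm["ZERO_1"] == logcpm["ZERO_2"])  # True: every all-zero gene gets the same row
print(sorted(filter_low_counts(counts)))      # ['EXPRESSED', 'OTHER']
```

Because the zero-gene value follows the library sizes, the shared logFC is not necessarily 0 and can change from one comparison (e.g. time point) to another, which fits what you describe.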
What species? How do you map? What aligner did you use? Which annotation? Did you mask pseudogenes? Why don't you format your text?
Human. I used STAR and featureCounts, and I tried both single-end and paired-end data. The annotation was GENCODE v29. I don't know what masking pseudogenes means.
JC means that you can, for example, exclude the pseudogenes from the featureCounts stage.
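A minimal sketch of that idea in Python: drop every GTF record whose `gene_type` attribute contains "pseudogene" and give the filtered GTF to featureCounts. It assumes GENCODE-style attributes such as `gene_type "processed_pseudogene";`, and the file names in the usage comment are placeholders. In Galaxy, a line-filtering tool applied to the GTF should be able to do the same thing.

```python
import re

GENE_TYPE = re.compile(r'gene_type "([^"]+)"')

def drop_pseudogenes(gtf_lines):
    """Yield GTF lines, skipping records whose gene_type contains 'pseudogene'.

    Header/comment lines (starting with '#') are passed through unchanged.
    Records without a gene_type attribute are kept.
    """
    for line in gtf_lines:
        if line.startswith("#"):
            yield line
            continue
        match = GENE_TYPE.search(line)
        if match and "pseudogene" in match.group(1):
            continue
        yield line

# Usage on a file (paths are placeholders):
# with open("gencode.v29.annotation.gtf") as src, open("no_pseudogenes.gtf", "w") as dst:
#     dst.writelines(drop_pseudogenes(src))
```

Removing these records only changes which features reads can be counted against; the STAR alignments are unchanged, so reads that align equally well to PER3 and PER4 are still multi-mappers.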
If you have used the 'comprehensive' GENCODE, then you will have ~199,000 transcripts and isoforms, the majority of which are non-coding. ~50,000 relate to pseudogenes.
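To check how much of your own annotation is pseudogenes, one can tally the GTF records by type. A minimal sketch that counts `transcript` records by their `transcript_type` attribute (GENCODE-style attribute names are assumed; pass `feature="gene", attribute="gene_type"` to count genes instead; the path in the usage comment is a placeholder):

```python
import re
from collections import Counter

def count_by_type(gtf_lines, feature="transcript", attribute="transcript_type"):
    """Count GTF records of one feature type (column 3) by the value of one attribute."""
    pattern = re.compile(re.escape(attribute) + r' "([^"]+)"')
    tally = Counter()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != feature:
            continue
        match = pattern.search(fields[8])
        tally[match.group(1) if match else "unknown"] += 1
    return tally

def pseudogene_total(tally):
    """Sum the counts of every type whose name contains 'pseudogene'."""
    return sum(n for kind, n in tally.items() if "pseudogene" in kind)

# Usage on a real file (the path is a placeholder):
# with open("gencode.v29.annotation.gtf") as gtf:
#     tally = count_by_type(gtf)
# print(sum(tally.values()), pseudogene_total(tally))
```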
You may also want to explore the multi-mapping parameters of both STAR (outFilterMultimapNmax) and featureCounts (-M).
I'm working with Galaxy and I don't know whether its tools have these options. So, in general, do you think these data are trustworthy?
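Here is a toy model of why the multi-mapping settings matter for a parent/pseudogene pair. It assumes each read is reduced to the set of genes its alignments hit, one gene per alignment, so the set size plays the role of the NH tag STAR writes. By default featureCounts does not count multi-mapping reads; with `-M` it counts every alignment, and with `-M --fraction` each alignment contributes 1/NH. The model ignores overlaps, strand and MAPQ, and the gene names and reads are hypothetical:

```python
from collections import Counter

def count_reads(read_hits, mode="unique"):
    """Toy gene counting. read_hits: list of sets, the genes each read's alignments hit.

    mode="unique":   reads with more than one alignment are skipped (featureCounts default)
    mode="all":      a multi-mapping read adds 1 to every gene it hits (like -M)
    mode="fraction": a multi-mapping read adds 1/NH to every gene it hits (like -M --fraction)
    """
    if mode not in ("unique", "all", "fraction"):
        raise ValueError(f"unknown mode: {mode}")
    counts = Counter()
    for genes in read_hits:
        nh = len(genes)
        if nh == 0 or (nh > 1 and mode == "unique"):
            continue
        weight = 1.0 / nh if mode == "fraction" else 1.0
        for gene in genes:
            counts[gene] += weight
    return counts

# Hypothetical reads: 6 fall in sequence shared by a parent gene and its pseudogene,
# 2 fall in a region unique to the parent.
reads = [{"PARENT", "PSEUDO"}] * 6 + [{"PARENT"}] * 2

for mode in ("unique", "all", "fraction"):
    c = count_reads(reads, mode)
    print(mode, c["PARENT"], c["PSEUDO"])
```

With the default setting, a pseudogene whose sequence is identical to its parent can end up with zero counts in every sample, which leads straight to the zero-count situation discussed above.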