Hi all
I am enrolled in an undergraduate computational biology course and am struggling to locate publicly accessible data for a course project. Ideally, the data would come from human patients/participants, with both control (unaffected) and experimental (affected) tissue sequences. I have been exploring The Cancer Genome Atlas, but I find the interface confusing: I do not know where, or indeed if, sequence data in a format I understand, e.g., FASTA, can be found. I am required to predict protein secondary structures, e.g., alpha helices and beta sheets, and PSIPRED requires that all submissions be amino acid sequences. I am familiar with both Python and R, so I could quickly learn a specific package if doing so would help me complete a noteworthy project. Please note: although I mentioned a specific form of breast cancer in the title, triple negative, I am perfectly willing to change the focus to another form of cancer as long as sequence data can be readily obtained. I also intend to obtain microarray data for the subsequent construction of a heat map illustrating differing levels of gene expression.
Thank you.
~Caitlin
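PSIPRED takes protein sequences, so any coding DNA or mRNA sequences you download need to be translated first. Below is a minimal Python sketch that assumes an in-frame coding sequence and the standard genetic code. `translate_cds` and `to_fasta` are hypothetical helper names, and the toy sequence is made up. Biopython's `Seq.translate()` does the same job more robustly.

```python
# Standard genetic code in TCAG order: the codon index is 16*first + 4*second + third,
# with T=0, C=1, A=2, G=3. '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_cds(dna: str) -> str:
    """Translate an in-frame coding sequence (assumed), stopping at the first stop codon."""
    dna = dna.upper().replace("U", "T")  # accept RNA input too
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[pos:pos + 3], "X")  # 'X' for ambiguous codons such as 'NNN'
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def to_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Format one record as FASTA, wrapping the sequence at `width` characters."""
    lines = [">" + header]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    cds = "ATGGCTAGCAAAGGAGAATGA"  # made-up toy sequence, not a real gene
    print(to_fasta("toy_protein", translate_cds(cds)))
```

The resulting FASTA text (here a six-residue toy protein) is the kind of input PSIPRED accepts. Translation stops at the first stop codon, so only the protein product is submitted.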
Have a look at OncoTrack and/or the Etriks portal. It may also be worth looking at the Open Targets Platform. These are the targets associated with triple-negative breast cancer based on our latest release; try searching for other cancers as well. Associations are based on differential expression (from microarray and RNA-Seq experiments) and other data sources. We also show whether expression is up- or downregulated in patients vs. controls (look for 'increased' under the 'Activity' column in this table showing the evidence for the BRIP1-breast carcinoma association). Be aware, though, that the Platform does not store patient data. This data comes from Expression Atlas, and we link each study back to its original database.
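To make the 'increased'/'decreased' idea concrete, here is a rough Python sketch of how direction of change, plus heat-map-ready values, could be computed from a patients-vs-controls expression table. It is an illustration under stated assumptions, not how the Platform or Expression Atlas make their calls. The values are made up, and the pseudocount of 1 and the |log2 fold change| cutoff of 1 are arbitrary choices. A real analysis should use a statistical tool such as limma (microarrays) or DESeq2 (RNA-Seq) in R.

```python
import math
from statistics import mean, pstdev


def log2_fold_change(patients, controls, pseudocount=1.0):
    """log2 of mean patient expression over mean control expression.

    The pseudocount (an arbitrary assumption) avoids division by zero
    for genes not detected in controls.
    """
    return math.log2((mean(patients) + pseudocount) / (mean(controls) + pseudocount))


def direction(lfc, threshold=1.0):
    """Label a gene by an illustrative |log2FC| cutoff (assumption, not the Platform's criteria)."""
    if lfc >= threshold:
        return "increased"
    if lfc <= -threshold:
        return "decreased"
    return "unchanged"


def zscore_row(values):
    """Centre and scale one gene's values across samples, a common step before drawing a heat map."""
    mu, sd = mean(values), pstdev(values)
    return [0.0 if sd == 0 else (v - mu) / sd for v in values]


if __name__ == "__main__":
    # Made-up normalized expression values: three patient and three control samples per gene.
    expression = {
        "GENE_A": ([30.0, 34.0, 32.0], [7.0, 8.0, 9.0]),
        "GENE_B": ([5.0, 6.0, 4.0], [5.0, 5.0, 5.0]),
    }
    for gene, (patients, controls) in expression.items():
        lfc = log2_fold_change(patients, controls)
        z = [round(v, 2) for v in zscore_row(patients + controls)]
        print(gene, round(lfc, 2), direction(lfc), z)
```

Each gene's row of z-scores can then be passed to any heat map function (for example, `pheatmap` in R or `matplotlib` in Python), so colours reflect relative rather than absolute expression.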