Hello,
I am trying to find datasets for a project on head and neck squamous cell carcinoma (HNSCC). I have been using GEO as my main source but have not found anything suitable. I am looking for an HNSCC RNA dataset from the tonsil with both tumor and control samples, and with the raw sequencing reads available (FASTQ files). Most GEO datasets only provide processed data, usually as .txt files, while the raw reads live in SRA, and I find it hard to locate GEO datasets that are also linked to SRA. I have found numerous GEO datasets that otherwise fit my needs but include no raw read files. Do you know how I can search SRA more effectively, or of any other website that hosts datasets with raw read files?
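One way to search SRA directly instead of going through GEO is NCBI's E-utilities, which accepts the same query syntax as the SRA web search, including field tags such as [Strategy] and [Organism]. The sketch below only builds the query and the esearch URL (it sends no request); the helper names and the exact search terms are illustrative assumptions, so adjust them to your project. Once you have run accessions (SRR...), the SRA Run Selector lists their metadata, and the SRA Toolkit's fasterq-dump downloads the reads as FASTQ.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_sra_query(disease, tissue, strategy="RNA-Seq", organism="Homo sapiens"):
    """Combine free-text terms with SRA field tags into one Entrez query (hypothetical helper)."""
    terms = [
        f'"{disease}"',
        f'"{tissue}"',
        f'"{strategy}"[Strategy]',   # library strategy, e.g. RNA-Seq
        f'"{organism}"[Organism]',
    ]
    return " AND ".join(terms)


def build_esearch_url(query, retmax=100):
    """Return an esearch URL against the SRA database; no network request is made."""
    params = {"db": "sra", "term": query, "retmax": retmax}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


query = build_sra_query("head and neck squamous cell carcinoma", "tonsil")
print(query)
print(build_esearch_url(query))
```

Pasting the printed query into the SRA search box on the NCBI website should give the same hits as the URL, so you can refine the terms interactively first.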
Thanks
If I'm doing RNA-seq analysis, what would the pipeline look like if I start with BAM files from TCGA? Normally I would use something like FastQC, Trimmomatic, STAR, and then featureCounts. With BAMs, would I just go straight to featureCounts?
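A minimal sketch of the difference, assuming paired-end gzipped FASTQ input: starting from FASTQ you run QC, trimming, alignment and counting in that order, whereas a BAM has already been aligned, so only counting remains (featureCounts reads BAM directly). The function, file names, index directory, GTF and adapter file are hypothetical placeholders, and the Trimmomatic clipping settings are common example values, not recommendations. The GTF you count against must match the genome build the BAM was aligned to.

```python
def rnaseq_commands(sample, start="fastq", threads=8,
                    genome_dir="star_index", gtf="annotation.gtf",
                    adapters="TruSeq3-PE.fa"):
    """Return the commands (as argument lists) for one paired-end sample.

    start="fastq": FastQC -> Trimmomatic -> STAR -> featureCounts
    start="bam":   reads are already aligned, so only featureCounts is needed
    """
    if start not in ("fastq", "bam"):
        raise ValueError("start must be 'fastq' or 'bam'")
    # STAR with --outFileNamePrefix "<sample>_" writes <sample>_Aligned.sortedByCoord.out.bam
    bam = f"{sample}.bam" if start == "bam" else f"{sample}_Aligned.sortedByCoord.out.bam"
    # -p marks the input as paired-end; check your subread version's docs for fragment counting
    count_step = ["featureCounts", "-T", str(threads), "-p",
                  "-a", gtf, "-o", f"{sample}_counts.txt", bam]
    if start == "bam":
        return [count_step]

    r1, r2 = f"{sample}_R1.fastq.gz", f"{sample}_R2.fastq.gz"
    t1, t2 = f"{sample}_R1.trim.fastq.gz", f"{sample}_R2.trim.fastq.gz"
    return [
        ["fastqc", "-o", "qc", r1, r2],
        ["trimmomatic", "PE", "-threads", str(threads), r1, r2,
         t1, f"{sample}_R1.unpaired.fastq.gz", t2, f"{sample}_R2.unpaired.fastq.gz",
         f"ILLUMINACLIP:{adapters}:2:30:10", "SLIDINGWINDOW:4:20", "MINLEN:36"],
        ["STAR", "--runThreadN", str(threads), "--genomeDir", genome_dir,
         "--readFilesIn", t1, t2, "--readFilesCommand", "zcat",
         "--outSAMtype", "BAM", "SortedByCoordinate",
         "--outFileNamePrefix", f"{sample}_"],
        count_step,
    ]


for cmd in rnaseq_commands("sample1", start="bam"):
    print(" ".join(cmd))
```

Returning argument lists rather than strings keeps them ready for subprocess.run if you later want to execute them from Python.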
You will still need to apply for access to the BAMs (through dbGaP); they are controlled-access, not publicly available. If gene-level counts are enough for you, those are publicly available. Portals such as cBioPortal and UCSC Xena (Xena Browser) give you access to processed TCGA data.
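If you take the counts route, check the unit of the matrix you download before feeding it to DESeq2, which expects raw integer counts. As far as I know, the GDC-derived TCGA count matrices on UCSC Xena are stored as log2(count+1); the dataset page states the unit, so confirm it there. A minimal sketch of undoing that transform for one gene's values, assuming that unit (the function name is hypothetical):

```python
import math


def log2p1_to_counts(values):
    """Undo a log2(count + 1) transform and round back to integer read counts.

    Assumes the input really is log2(count + 1); do not apply this to
    TPM/FPKM or other normalized values, which cannot be turned back into counts.
    """
    counts = []
    for v in values:
        if math.isnan(v):
            raise ValueError("missing value in count matrix")
        # 2**v - 1 recovers the count; round() absorbs floating-point error
        counts.append(max(0, round(2 ** v - 1)))
    return counts


print(log2p1_to_counts([0.0, 1.0, 2.0]))
```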
Ooh, that makes sense. My end goal is to identify the genes associated with that dataset (for example, differentially expressed genes). Would I need to compare it to a separate control dataset, or could I run DESeq2 with just the counts file and a sample information file? If I do need a control, how could I make that comparison in R?
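DESeq2 makes the comparison within one dataset: the sample information table (colData) assigns each count-matrix column to a group, and the design formula names the column to compare. TCGA-HNSC includes some solid-tissue-normal samples, and the TCGA barcode encodes the sample type (code 01 is primary solid tumor, 11 is solid tissue normal), so a sample table can be built from the column names. A sketch in Python, assuming your count matrix columns are TCGA barcodes; the barcodes below are hypothetical, and samples with other type codes (e.g. 06, metastatic) are skipped.

```python
import csv
import io

# TCGA sample-type codes (first two characters of the 4th barcode field)
SAMPLE_TYPE = {"01": "tumor", "11": "normal"}


def condition_from_barcode(barcode):
    """Return 'tumor' or 'normal' for a TCGA barcode, or None for other sample types."""
    parts = barcode.split("-")
    if len(parts) < 4 or parts[0] != "TCGA":
        raise ValueError(f"not a TCGA sample barcode: {barcode}")
    return SAMPLE_TYPE.get(parts[3][:2])


def build_coldata(barcodes):
    """(sample, condition) rows for DESeq2's colData, dropping unmapped sample types."""
    rows = []
    for bc in barcodes:
        cond = condition_from_barcode(bc)
        if cond is not None:
            rows.append((bc, cond))
    return rows


def coldata_csv(rows):
    """Serialize the rows as CSV with a header, ready to read into R."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["sample", "condition"])
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical barcodes for illustration only
rows = build_coldata(["TCGA-AA-0001-01A", "TCGA-AA-0001-11A", "TCGA-AA-0002-06A"])
print(coldata_csv(rows))
```

In R you would read this table, subset the count matrix to the same samples in the same order, and then call DESeqDataSetFromMatrix(countData = counts, colData = coldata, design = ~ condition), followed by DESeq(dds) and results(dds, contrast = c("condition", "tumor", "normal")).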