Hi everyone,
I have dbGaP metadata for a project with multiple data types (WES, WGS, RNA-seq, etc.), and I am trying to select only the samples corresponding to RNA-seq, but a few columns in the metadata confuse me. Here are the columns:
> plyr::count(dat[,c('Assay_Type_s','analyte_type_s','molecular_data_type_s')])
  Assay_Type_s analyte_type_s molecular_data_type_s freq
1      RNA-Seq            DNA        <not provided>    1
2      RNA-Seq            RNA        <not provided>  309
3      RNA-Seq            RNA           miRNA (NGS)  150
4      RNA-Seq            RNA         RNA Seq (NGS)  324
5      RNA-Seq            RNA  Targeted Exome (NGS)   32
6      RNA-Seq            RNA     Whole Exome (NGS)    4
7      RNA-Seq            RNA    Whole Genome (NGS)   62
As you can see, Assay Type is RNA-Seq throughout, but Analyte Type is either DNA or RNA, and Molecular Data Type is one of miRNA, RNA Seq, Targeted Exome, Whole Exome, or Whole Genome (or not provided). How is the relationship between these columns determined, and which samples are really from an RNA-seq experiment?
Thanks
I don't get it. My question is: what are those samples with Molecular Data Type != RNA Seq (NGS)? Are those also RNA sequencing? If not, then why is the Assay Type = RNA-Seq?
Check the paper, you'll get the idea.
"...Unique to capture transcriptomes is an overnight capture reaction (RNA-DNA hybridization) using exon-targeting RNA probes, followed by a washing step, and an additional set of PCR cycles..."
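If that passage applies to this study, the exome-labelled rows with an RNA analyte may be capture-based RNA-seq rather than DNA exomes. Below is a minimal base-R sketch of how the samples could be split on that reading. The `rna_class` column and its labels are hypothetical names made up for illustration, and the toy `dat` is only rebuilt from the frequency table in the question; with the real metadata you would skip that step. Which groups count as "real" RNA-seq is an assumption that should be confirmed against the paper or with the submitters.

```r
# Toy reconstruction of the metadata from the frequency table in the question.
# With the real run table, `dat` already exists and this step is not needed.
combos <- data.frame(
  Assay_Type_s = "RNA-Seq",
  analyte_type_s = c("DNA", rep("RNA", 6)),
  molecular_data_type_s = c("<not provided>", "<not provided>", "miRNA (NGS)",
                            "RNA Seq (NGS)", "Targeted Exome (NGS)",
                            "Whole Exome (NGS)", "Whole Genome (NGS)"),
  freq = c(1, 309, 150, 324, 32, 4, 62),
  stringsAsFactors = FALSE
)
# Expand each combination into `freq` identical rows
dat <- combos[rep(seq_len(nrow(combos)), combos$freq),
              c("Assay_Type_s", "analyte_type_s", "molecular_data_type_s")]

# Rows where both the assay and the analyte say RNA
is_rna <- dat$Assay_Type_s == "RNA-Seq" & dat$analyte_type_s == "RNA"
mdt <- dat$molecular_data_type_s

# Hypothetical labels; everything not matched below needs a manual check
# (DNA analyte, "<not provided>", and Whole Genome rows).
dat$rna_class <- "exclude_or_check"
dat$rna_class[is_rna & mdt == "RNA Seq (NGS)"] <- "rnaseq"
dat$rna_class[is_rna & mdt == "miRNA (NGS)"] <- "mirna"
# Assumption: exome labels on an RNA analyte reflect capture transcriptomes
dat$rna_class[is_rna & mdt %in% c("Targeted Exome (NGS)",
                                  "Whole Exome (NGS)")] <- "possible_capture_rnaseq"

# Summary of the split, and the unambiguous RNA-seq subset
print(table(dat$rna_class))
rnaseq <- dat[dat$rna_class == "rnaseq", ]
```

The strict subset `rnaseq` keeps only rows where all three columns agree. The 309 RNA rows with `<not provided>` are left in the "check" group because the metadata alone cannot tell you what they are.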