I've downloaded some miRNA expression data from TCGA (for CHOL) and the isoform quantification files look like this:
miRNA_ID isoform_coords read_count reads_per_million_miRNA_mapped cross-mapped miRNA_region
hsa-let-7a-1 hg38:chr9:94175939-94175962:+ 1 0.706072 N precursor
hsa-let-7a-1 hg38:chr9:94175942-94175962:+ 1 0.706072 N precursor
hsa-let-7a-1 hg38:chr9:94175961-94175984:+ 2 1.412144 N mature,MIMAT0000062
hsa-let-7a-1 hg38:chr9:94175962-94175981:+ 45 31.773244 N mature,MIMAT0000062
However, in other projects and papers, I always see selected features labeled as hsa-let-7a-1-3p or hsa-let-7a-5p, etc. Where is the 3p/5p coming from? Does it correspond with the +/- strand?
Additionally, how do I pool this data across samples so I can run differential expression analysis between CHOL samples and other cancer types (e.g., BRCA)? My end goal is to apply feature selection and then use the selected features to predict cancer type, but I am unsure how to process this data.
Thanks in advance.
Did you find a way to do this? I want to figure out the 3p/5p forms from the isoform quantification files too, but don't know how or where to begin!
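For what it's worth, here is a possible starting point rather than a definitive answer. As far as I understand, the 5p/3p suffix refers to the arm of the precursor hairpin that the mature miRNA comes from, not to the genomic +/- strand, and the MIMAT accession in the miRNA_region column identifies the mature product. The sketch below sums read_count per MIMAT accession for each sample and builds a feature-by-sample table. It assumes the files are tab-delimited with the header shown in the question; the names mature_counts, pool and MIMAT_TO_NAME are made up for illustration, and the lookup table would need to be filled from miRBase annotation.

```python
import csv
from collections import defaultdict

# Hypothetical lookup from mature accession to mature name.
# Only one entry is shown; in practice build this from miRBase annotation.
MIMAT_TO_NAME = {"MIMAT0000062": "hsa-let-7a-5p"}

def mature_counts(lines):
    """Sum read_count per mature accession for one sample's isoform rows."""
    counts = defaultdict(int)
    for row in csv.DictReader(lines, delimiter="\t"):
        region = row["miRNA_region"]
        # Rows tagged "precursor" etc. carry no mature accession and are skipped.
        if region.startswith("mature,"):
            accession = region.split(",", 1)[1]
            counts[accession] += int(row["read_count"])
    return dict(counts)

def pool(samples):
    """Build {feature_name: {sample_id: count}}, with 0 for absent features."""
    per_sample = {sid: mature_counts(lines) for sid, lines in samples.items()}
    accessions = sorted({a for c in per_sample.values() for a in c})
    return {
        MIMAT_TO_NAME.get(a, a): {sid: per_sample[sid].get(a, 0) for sid in per_sample}
        for a in accessions
    }

# Rows copied from the excerpt in the question, re-joined with tabs.
example = [
    "miRNA_ID\tisoform_coords\tread_count\treads_per_million_miRNA_mapped\tcross-mapped\tmiRNA_region",
    "hsa-let-7a-1\thg38:chr9:94175939-94175962:+\t1\t0.706072\tN\tprecursor",
    "hsa-let-7a-1\thg38:chr9:94175961-94175984:+\t2\t1.412144\tN\tmature,MIMAT0000062",
    "hsa-let-7a-1\thg38:chr9:94175962-94175981:+\t45\t31.773244\tN\tmature,MIMAT0000062",
]
matrix = pool({"sample_A": example})
print(matrix)  # {'hsa-let-7a-5p': {'sample_A': 47}}
```

With real data you would pass one open file handle per sample instead of the inline list. Summing by accession also merges reads from different precursor loci that yield the same mature sequence, which is presumably why names like hsa-let-7a-5p carry no locus number. Raw counts rather than the reads-per-million column are usually what count-based differential expression tools expect.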