5.1 years ago
gsmur
We've recently downloaded a few TCGA RNA-Seq datasets (BAM files) from the GDC and have obtained both the older Genome Analyzer and the HiSeq datasets. For some datasets, such as COAD, we can tell the sequencing instruments apart from the @RG tags. However, the format of the @RG tag isn't the same for every project (e.g., STAD).
Example @RG tags:
COAD HiSeq: ID::130128_UNC12-SN629_0253_AC1M9FACXX_GGCTAC_L002
COAD GA: ID::100810_UNC6-RDR300211_00022_FC_629L9AAXX.7
STAD unsure: ID::D1UVJACXX_1_GCCAAT
We've also tried the GDC API, but the 'instrument_model' field appears to be empty for many samples. Does anyone know how we can obtain this information?
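To make the pattern in the example @RG tags concrete, here is a minimal Python sketch of one possible heuristic. It rests on an assumption drawn only from the two labeled COAD examples: the flowcell-like token ends in `AAXX` on the Genome Analyzer run and in `ACXX` on the HiSeq run. The function name `guess_instrument` and the `FLOWCELL_SUFFIXES` table are hypothetical, and the mapping is not an official Illumina or GDC lookup, so any result should be treated as a guess to verify.

```python
import re

# Assumed suffix-to-platform mapping, inferred only from the two labeled COAD
# examples in the question. This is a heuristic, not an official lookup table.
FLOWCELL_SUFFIXES = {
    "AAXX": "Genome Analyzer",
    "ACXX": "HiSeq",
}


def guess_instrument(rg_id):
    """Return a platform guess for one @RG ID, or None if no known suffix is found."""
    # Drop a leading "ID:" or "ID::" if the raw tag text is passed in.
    value = re.sub(r"^ID:+", "", rg_id.strip())
    # The flowcell barcode shows up as one of the tokens separated by "_" or ".".
    for token in re.split(r"[_.]", value):
        platform = FLOWCELL_SUFFIXES.get(token[-4:])
        if platform is not None:
            return platform
    return None


if __name__ == "__main__":
    examples = [
        "ID::130128_UNC12-SN629_0253_AC1M9FACXX_GGCTAC_L002",
        "ID::100810_UNC6-RDR300211_00022_FC_629L9AAXX.7",
        "ID::D1UVJACXX_1_GCCAAT",
    ]
    for rg in examples:
        print(rg, "->", guess_instrument(rg))
```

Under this assumption the STAD tag would be grouped with the HiSeq runs, since its first token also ends in `ACXX`. IDs that carry no flowcell-like token return `None` and would still need another source of instrument metadata.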