8 days ago
cthangav
Using a manifest file from a paper, I am trying to build a gene-by-sample expression matrix from GDC data.
I tried TCGAbiolinks, but GDCprepare won't accept a manifest file; it wants the samples specified through a "query" created with GDCquery or through a SummarizedExperiment object.
Is there a way to turn the manifest into a GDCquery object? It looks like this:
id filename md5 size state
752efdfa-92e4-4f0d-8c9f-23791ac82eae 90985f9d-0403-4f49-9cd5-020d220ac220.rna_seq.augmented_star_gene_counts.tsv 0934480ded0ec7be97fc0407a9b1da11 4230676 released
f8ba326c-a068-4d8c-a544-07fd13c39cbd b79c9932-bc2b-4c87-b954-3b6efd0d76e5.rna_seq.augmented_star_gene_counts.tsv a5ba87a5d2a3b70ef6f90c71d5b69ec9 4245285 released
0cf3b928-c697-4d9d-a920-519db2d3d060 b4da5920-c3b0-434b-9fb8-e2909d898b3a.rna_seq.augmented_star_gene_counts.tsv 6088855cd134a064acd793a8fbfae906 4211903 released
a3ed21c8-e48e-4a4c-83ea-13d378909970 5b4ed5af-d39b-432e-9d41-9862403c9208.rna_seq.augmented_star_gene_counts.tsv e6b6601dd5353457a692d609840b92ce 4228730 released
58e2eaf8-916c-4601-85d0-0a01ffcbb9ef b1ddf742-6c66-4f3d-a405-5d26d03b431a.rna_seq.augmented_star_gene_counts.tsv fbb3cd3557cd141d8141488597c2b665 4239375 released
196cdd75-bb1f-4778-83f5-e26986ed2b2f 04464a56-e420-4e61-aa79-f1afacdb3c91.rna_seq.augmented_star_gene_counts.tsv e0556cbe056f239a98a5152d3fd02155 4254799 released
650165ae-5691-4dc1-b36f-1d9fb92ec7f1 7720992f-1f3e-46c0-a8f2-11149d70dd4a.rna_seq.augmented_star_gene_counts.tsv e645772a9ace36e794a7f5567fe66497 4239486 released
839e6752-8bce-4eab-8b31-91661aab52f9 f6ec3da4-b8e6-4f35-8473-7a8bb9bf5cc8.rna_seq.augmented_star_gene_counts.tsv 9dcff604aa9d505cf5ae1b7769827bc9 4202828 released
54532364-a3ea-4f72-990e-a173198139f9 ddbb58a4-beb7-49f8-8f82-38fa4ea61642.rna_seq.augmented_star_gene_counts.tsv 26912c1419ef897c31a8c3ff1e62b507 4241592 released
6c2b6438-4faf-4e1c-bee4-8dbcace35871 fd342f63-b31b-4d95-bb94-029aff2b4ed0.rna_seq.augmented_star_gene_counts.tsv 48c67baa080721d1e2dfdf61f2717b7d 4237262 released
b25942f0-b57a-4a96-820c-4115cf471572 2dca88b8-727f-446f-899a-86d8871aa148.rna_seq.augmented_star_gene_counts.tsv b444069c4a08d3121ce7d9118e79decf 4259327 released
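For reference, here is a minimal sketch of how a manifest like the one above could be read into a list of file UUIDs outside of TCGAbiolinks. It uses only the Python standard library and assumes the columns are whitespace- or tab-separated with the header shown above; `read_manifest` is a hypothetical helper name, not part of any GDC tool.

```python
import io


def read_manifest(handle):
    """Parse a GDC-style manifest into a list of dicts keyed by header name.

    Assumption: fields are separated by tabs or spaces and contain no
    embedded whitespace, as in the manifest shown above.
    """
    header = None
    rows = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if header is None:
            header = fields  # first non-blank line holds the column names
            continue
        rows.append(dict(zip(header, fields)))
    return rows


# Small in-memory example using the first manifest row from the post.
example = io.StringIO(
    "id\tfilename\tmd5\tsize\tstate\n"
    "752efdfa-92e4-4f0d-8c9f-23791ac82eae\t"
    "90985f9d-0403-4f49-9cd5-020d220ac220.rna_seq.augmented_star_gene_counts.tsv\t"
    "0934480ded0ec7be97fc0407a9b1da11\t4230676\treleased\n"
)
manifest = read_manifest(example)
file_ids = [row["id"] for row in manifest]
```

The resulting `file_ids` list is the set of file UUIDs, which is the piece of information a download or query step would need.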
The original paper used around 700 files. Is downloading them all and reading them in a loop the best approach in terms of time and space?
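To make the "read loop" concrete, here is a rough sketch of what I have in mind, in plain Python. It rests on several assumptions that should be checked against the actual files: that each file was downloaded to `<download_dir>/<id>/<filename>`, that each TSV starts with a `#` comment line followed by a header containing `gene_id` and `unstranded` columns, and that the STAR summary rows have IDs starting with `N_`. `read_counts` and `build_matrix` are hypothetical helper names.

```python
import csv
import os


def read_counts(path, column="unstranded"):
    """Return {gene_id: count} for one gene-counts TSV.

    Assumptions: '#' comment lines precede the header, and summary rows
    (e.g. N_unmapped) have gene_id values starting with 'N_'.
    """
    counts = {}
    with open(path, newline="") as fh:
        data_lines = (ln for ln in fh if not ln.startswith("#"))
        for row in csv.DictReader(data_lines, delimiter="\t"):
            gene_id = row["gene_id"]
            if gene_id.startswith("N_"):
                continue  # skip STAR summary rows
            counts[gene_id] = int(row[column])
    return counts


def build_matrix(manifest_rows, download_dir, column="unstranded"):
    """Combine per-file counts into (genes, samples, matrix).

    manifest_rows: list of dicts with 'id' and 'filename' keys.
    matrix[i][j] is the count of genes[i] in samples[j]; genes missing
    from a file are filled with 0.
    """
    samples = []
    per_sample = []
    for row in manifest_rows:
        path = os.path.join(download_dir, row["id"], row["filename"])
        samples.append(row["id"])
        per_sample.append(read_counts(path, column))
    genes = sorted(set().union(*per_sample)) if per_sample else []
    matrix = [[counts.get(g, 0) for counts in per_sample] for g in genes]
    return genes, samples, matrix
```

Only one count column per file is kept, so the memory cost is one value per gene per sample rather than the full contents of every file. The columns here are labelled by file UUID; mapping those to sample barcodes would need a separate metadata lookup.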