7 weeks ago
1769mkc
How to combine multiple single-cell (ATAC) datasets into one file?
As an example, this is the dataset I have to start with:
zcat GSE268807_G129_D_barcodes.tsv.gz | head
AAACAGCCAAAGCTCC-1
AAACAGCCAAGGTCGA-1
AAACAGCCACAGGAAT-1
AAACAGCCACATTAAC-1
AAACAGCCACCTGGTG-1
AAACAGCCACGCAACT-1
AAACAGCCAGCTAACC-1
AAACAGCCAGCTAATT-1
AAACATGCAAATTCGT-1
AAACATGCAACACCTA-1
zcat GSE268807_G150_D_barcodes.tsv.gz | head
AAACATGCAGACAAAC-1
AAACATGCATAAAGCA-1
AAACATGCATGAATCT-1
AAACCAACACCTACGG-1
AAACCGAAGCAGCTCA-1
AAACCGAAGCTTGCTC-1
AAACCGAAGGCGGATG-1
AAACCGCGTTTGACCT-1
AAACGCGCAAAGCCTC-1
AAACGCGCAATGCCTA-1
zcat GSE268807_G129_D_matrix.mtx.gz| head
%%MatrixMarket matrix coordinate integer general
%metadata_json: {"software_version": "cellranger-arc-2.0.2", "format_version": 2}
131903 17176 60994557
25 1 1
33 1 1
54 1 1
60 1 1
61 1 1
63 1 1
85 1 1
zcat GSE268807_G150_D_matrix.mtx.gz| head
%%MatrixMarket matrix coordinate integer general
%metadata_json: {"software_version": "cellranger-arc-2.0.2", "format_version": 2}
114970 3068 23594006
69 1 2
137 1 1
158 1 1
248 1 1
465 1 1
469 1 1
476 1 1
zcat GSE268807_G129_D_features.tsv.gz| head
ENSG00000243485 MIR1302-2HG Gene Expression chr1 29553 30267
ENSG00000237613 FAM138A Gene Expression chr1 36080 36081
ENSG00000186092 OR4F5 Gene Expression chr1 65418 69055
ENSG00000238009 AL627309.1 Gene Expression chr1 120931 133723
ENSG00000239945 AL627309.3 Gene Expression chr1 91104 91105
ENSG00000239906 AL627309.2 Gene Expression chr1 140338 140339
ENSG00000241860 AL627309.5 Gene Expression chr1 149706 173862
ENSG00000241599 AL627309.4 Gene Expression chr1 160445 160446
ENSG00000286448 AP006222.2 Gene Expression chr1 266854 266855
ENSG00000236601 AL732372.1 Gene Expression chr1 360056 360057
zcat GSE268807_G150_D_features.tsv.gz| head
ENSG00000243485 MIR1302-2HG Gene Expression chr1 29553 30267
ENSG00000237613 FAM138A Gene Expression chr1 36080 36081
ENSG00000186092 OR4F5 Gene Expression chr1 65418 69055
ENSG00000238009 AL627309.1 Gene Expression chr1 120931 133723
ENSG00000239945 AL627309.3 Gene Expression chr1 91104 91105
ENSG00000239906 AL627309.2 Gene Expression chr1 140338 140339
ENSG00000241860 AL627309.5 Gene Expression chr1 149706 173862
ENSG00000241599 AL627309.4 Gene Expression chr1 160445 160446
ENSG00000286448 AP006222.2 Gene Expression chr1 266854 266855
ENSG00000236601 AL732372.1 Gene Expression chr1 360056 360057
For each sample I have a feature file, a barcode file, and an mtx file; this is the data source.
My final objective is to make one file of each type: one barcode file, one feature file, and one mtx file.
For the barcode and feature files I can think of merging them and filtering out the duplicates, but I'm not able to figure out how to merge the mtx files.
Any suggestions or help would be really appreciated.
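To make the question concrete, here is a rough sketch of the kind of merge I have in mind, not a tested pipeline. It relies on the fact that each mtx entry is a 1-based `row col value` triplet, where the row points into the feature file and the column into the barcode file. The two matrices above have different row counts (131903 vs 114970), so rows would need to be remapped onto a union of features, and columns shifted by the number of barcodes already merged. Two assumptions are mine: a feature is identified by its full line in features.tsv, and barcodes are prefixed with a sample label rather than deduplicated, since the same barcode in two samples is a different cell. The function names, the labels, and the output filename are all made up for illustration.

```python
import gzip

def read_lines(path):
    """Read a gzipped text file into a list of lines without newlines."""
    with gzip.open(path, "rt") as fh:
        return [line.rstrip("\n") for line in fh]

def read_mtx(path):
    """Read a gzipped MatrixMarket coordinate file into 1-based triplets."""
    entries = []
    size_line_seen = False
    with gzip.open(path, "rt") as fh:
        for line in fh:
            if line.startswith("%"):
                continue  # header and metadata comment lines
            if not size_line_seen:
                size_line_seen = True  # skip "n_rows n_cols n_entries"
                continue
            row, col, value = line.split()
            entries.append((int(row), int(col), int(value)))
    return entries

def merge_samples(samples):
    """Merge samples given as (label, features, barcodes, entries) tuples.

    Assumes feature lines are unique within each sample.
    """
    feature_index = {}   # feature line -> 1-based row in the merged matrix
    merged_features, merged_barcodes, merged_entries = [], [], []
    col_offset = 0
    for label, features, barcodes, entries in samples:
        # Map this sample's row numbers onto the union of features.
        row_map = []
        for feat in features:
            if feat not in feature_index:
                merged_features.append(feat)
                feature_index[feat] = len(merged_features)
            row_map.append(feature_index[feat])
        # Prefix barcodes with the sample label so they stay unique.
        merged_barcodes.extend(f"{label}_{bc}" for bc in barcodes)
        # Shift columns past the barcodes that were already merged.
        for row, col, value in entries:
            merged_entries.append((row_map[row - 1], col + col_offset, value))
        col_offset += len(barcodes)
    return merged_features, merged_barcodes, merged_entries

def write_mtx(path, n_rows, n_cols, entries):
    """Write triplets as a gzipped MatrixMarket coordinate file."""
    with gzip.open(path, "wt") as fh:
        fh.write("%%MatrixMarket matrix coordinate integer general\n")
        fh.write(f"{n_rows} {n_cols} {len(entries)}\n")
        for row, col, value in entries:
            fh.write(f"{row} {col} {value}\n")

# Toy data standing in for two samples (not taken from the real files).
a = ("G129", ["geneA", "peak1"], ["AAAC-1", "AAAG-1"], [(1, 1, 2), (2, 2, 1)])
b = ("G150", ["geneA", "peak2"], ["AAAC-1"], [(1, 1, 3), (2, 1, 4)])
features, barcodes, entries = merge_samples([a, b])

# With the real files the call would look roughly like this:
# g129 = ("G129", read_lines("GSE268807_G129_D_features.tsv.gz"),
#         read_lines("GSE268807_G129_D_barcodes.tsv.gz"),
#         read_mtx("GSE268807_G129_D_matrix.mtx.gz"))
# ... same for G150, then merge_samples([g129, g150]) and
# write_mtx("merged_matrix.mtx.gz", len(features), len(barcodes), entries)
```

In the toy case the shared feature `geneA` keeps a single row, `peak2` gets a new third row, and the G150 cell becomes column 3. Is this the right way to think about it, or is there an existing tool that already does this?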