Generate Gene Feature Matrix from ATAC-seq data
0
0
Entering edit mode
2.6 years ago
el24 ▴ 40

I am trying to convert single-cell ATAC-seq regions to their overlapping gene values using promoter regions and gene information. I am following this tutorial, but I cannot generate the final feature matrix. I appreciate if you have any suggestions.

I have used bedmap to get overlap between my fragments.bed and genes.bed file as follows:

bedmap --ec --delim "\t" --echo --echo-map-id genes.sort.bed fragments.sort.bed > atac_genes_bc.bed

Some information about my files:

head genes.sort.bed

chr1    10868   12368   DDX11L1     .   +
chr1    16935   18435   MIR6859-1   .   -
chr1    28553   30053   MIR1302-2HG .   +
chr1    29069   30569   WASH7P.     .   -
chr1    29365   30865   MIR1302-2   .   +

head fragments.sort.bed #single-cell ATAC-seq data; last column is the value in the open chromatin region for column 4's cell name.

chr1    191337  191587  AAAGTGCCACAT    1
chr1    191337  191587  AAATTCAGGCGG    1
chr1    191337  191587  ATCCTCTTACAT    1
chr1    191337  191587  CTCACCATACGG    1
chr1    191337  191587  GACGTTCAATCC    1

head -100 atac_genes_bc.bed | tail -5

chr1    1385710 1399338 NM_030937   0   -   1387230 1399306 0   11          1872,93,112,142,105,100,65,121,110,75,320,  0,2066,2243,4519,4748,5055,7685,9683,12522,12886,13308,     AACCCTGGCGGA;AACCTCACTATA;AATGCCAGACTA;ACAAATAACAGT;ACACTGAGCAGG;ACCGGTAGCTGA;ACTTGTTAGTCT;AGACGAATCTAA;AGTGAACCAGTG;AGTGAAGATGGC;ATCCTCTTACAT;ATGCGATCTCTG;ATTTGTTGAGCA;CAAAGGGCACTA;CAGCTGGTGATG;CAGTTACGAATG;CATCTTAATTAG;CCAACCGTAGAT;CCTGTGATAAGG;CGAGCTGTTATC;CTCAACCCATTG;CTGCGACACTGG;GAAATTTATTAG;GACGTATATGCC;GAGTCGCTAGGA;GAGTTGGGGATG;GATCTCCACAGG;GCGACGCCACCA;GCGCATATAAGT;GGCCACTTGCTG;GGTCTTTGTGCC;GTATCATTTGCA;GTCAGATTGCCT;GTGCTTATAGTC;TAAGCTCTTAAG;TACCAAAACATT;TAGGTCGGATAA;TAGTGTCTGGCG;TCATGAGTTCCG;TCCCCCTCGGCT;TGACCTGGATAA;TGCACCATATCA;TGCACTTCACAT;TGGACGCGCTTT;TTCATTATCTGC;TTCATTGAGCAT;TTCCTTTCGGTA;TTTTCCGCCACC
chr1    1392287 1399338 NM_001039577    0   -   1392781 1399306 0   6   516,65,121,110,75,320,  0,1108,3106,5945,6309,6731, AACCCTGGCGGA;AACCTCACTATA;AATGCCAGACTA;ACAAATAACAGT;ACACTGAGCAGG;ACCGGTAGCTGA;ACTTGTTAGTCT;AGACGAATCTAA;AGTGAACCAGTG;AGTGAAGATGGC;ATCCTCTTACAT;ATGCGATCTCTG;ATTTGTTGAGCA;CAAAGGGCACTA;CAGCTGGTGATG;CAGTTACGAATG;CATCTTAATTAG;CCAACCGTAGAT;CCTGTGATAAGG;CGAGCTGTTATC;CTCAACCCATTG;CTGCGACACTGG;GAAATTTATTAG;GACGTATATGCC;GAGTCGCTAGGA;GAGTTGGGGATG;GATCTCCACAGG;GCGACGCCACCA;GCGCATATAAGT;GGCCACTTGCTG;GGTCTTTGTGCC;GTATCATTTGCA;GTCAGATTGCCT;GTGCTTATAGTC;TAAGCTCTTAAG;TACCAAAACATT;TAGGTCGGATAA;TAGTGTCTGGCG;TCATGAGTTCCG;TCCCCCTCGGCT;TGACCTGGATAA;TGCACCATATCA;TGCACTTCACAT;TGGACGCGCTTT;TTCATTATCTGC;TTCATTGAGCAT;TTCCTTTCGGTA;TTTTCCGCCACC
chr1    1399529 1402046 NR_015434   0   +   1402046 1402046 0   3   160,146,1442,   0,628,1075, AACCCTGGCGGA;AACCTCACTATA;AATGCCAGACTA;ACAAATAACAGT;ACACTGAGCAGG;ACCGGTAGCTGA;ACTTGTTAGTCT;AGACGAATCTAA;AGTGAACCAGTG;AGTGAAGATGGC;ATCCTCTTACAT;ATGCGATCTCTG;ATTTGTTGAGCA;CAAAGGGCACTA;CAGCTGGTGATG;CAGTTACGAATG;CATCTTAATTAG;CCAACCGTAGAT;CCTGTGATAAGG;CGAGCTGTTATC;CTCAACCCATTG;CTGCGACACTGG;GAAATTTATTAG;GACGTATATGCC;GAGTCGCTAGGA;GAGTTGGGGATG;GATCTCCACAGG;GCGACGCCACCA;GCGCATATAAGT;GGCCACTTGCTG;GGTCTTTGTGCC;GTATCATTTGCA;GTCAGATTGCCT;GTGCTTATAGTC;TAAGCTCTTAAG;TACCAAAACATT;TAGGTCGGATAA;TAGTGTCTGGCG;TCATGAGTTCCG;TCCCCCTCGGCT;TGACCTGGATAA;TGCACCATATCA;TGCACTTCACAT;TGGACGCGCTTT;TTCATTATCTGC;TTCATTGAGCAT;TTCCTTTCGGTA;TTTTCCGCCACC
chr1    1401895 1407313 NM_017971   0   -   1402082 1407217 0   4   361,78,111,183, 0,3913,5013,5235,   AATCCCACTAAT;ACCGCTAGCGGA;ATTCGCAATGCT;CTGGGCTCTTTA;CTTTTCTATCTC;GACGTTCAATCC;GATAGGTCGCAC;GATCCCTTGTGG;GATTGGCCTCTC;GCAGTCTCGGAC;GCCGTACCGCCT;GGAGGATACTTA;GGGTCGATTGTT;GTCAGATTGCCT;GTGGAGCTGTAA;TAGGTAACTAGC;TTCATGCCATGT
chr1    1418419 1421444 NM_001145210    0   -   1419099 1421005 0   4   1130,541,209,312,   0,1632,2377,2713,   CGCGATACGGCG;GAACACTTGCGT;GAGTGGGGCCCT;GCTGGCCGTTGA;GTCACTGGTCGC;GTGACTTCAAAT;GTGGATGATCAC;TAAGAGGGCTAC

My goal is to use the LIGER package to integrate single-cell RNA-seq and ATAC-seq data, according to this tutorial. The problem is that when I run the following R commands I get an error:

Reading barcodes of cells from original sc ATAC-seq data where I extracted fragments.bed to convert ATAC-seq regions to overlapping gene values with promoter regions.

library(rliger)
atac <- read.table('chromatin_counts.tsv', sep = '\t', header = TRUE, as.is = TRUE)
genes.bc <- read.table(file = "atac_genes_bc.bed", sep = "\t", as.is = c(4,12), header = FALSE)
barcodes <- colnames(atac)
genes.counts <- makeFeatureMatrix(genes.bc, barcodes)

Error: Error in makeFeatureMatrix(genes.bc, barcodes) : Expecting a string vector: [type=integer; required=STRSXP].

Thank you in advance!

integration atac bed bedmap rna • 421 views
ADD COMMENT

Login before adding your answer.

Traffic: 1703 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6