scATAC-seq analysis, data preprocessing

0

5.5 years ago

chipolino ▴ 150

Hi,

During scATAC-seq data preprocessing, does it make sense to filter the data matrix so that it contains only the most variable peaks (the same way we do for scRNA-seq) before any dimensionality reduction or clustering analysis?

Thanks

scATAC-seq • 2.1k views

ADD COMMENT • link updated 5.5 years ago by GouthamAtla 12k • written 5.5 years ago by chipolino ▴ 150

1

To better define cell types, it makes sense.

ADD REPLY • link 5.5 years ago by GouthamAtla 12k

1

That depends on the type of analysis you're referring to. PCA, for example, will always focus on the most variable regions. I haven't looked at scATAC-seq data myself but given that it's basically binary, I'm not sure how well the typical variance measures even hold up.

ADD REPLY • link 5.5 years ago by Friederike 9.0k
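
To make the concern about binary data concrete: for a 0/1 peak that is open in a fraction p of cells, the variance is exactly p(1 - p), so ranking peaks by variance is the same as ranking them by how close their accessibility frequency is to 50%. The sketch below checks this on a made-up toy matrix; the function names `peak_variances` and `top_variable_peaks` are hypothetical helpers written for this illustration, not part of any scATAC-seq package.

```python
from statistics import pvariance

def peak_variances(matrix):
    """Population variance of each peak (column) in a cells x peaks 0/1 matrix."""
    return [pvariance(col) for col in zip(*matrix)]

def top_variable_peaks(matrix, k):
    """Indices of the k peaks with the highest variance, most variable first."""
    variances = peak_variances(matrix)
    order = sorted(range(len(variances)), key=lambda j: variances[j], reverse=True)
    return order[:k]

# Toy data (invented for illustration): 4 cells (rows) x 4 peaks (columns),
# 1 = accessible, 0 = not accessible.
toy = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
]

# Fraction of cells in which each peak is open.
fractions = [sum(col) / len(col) for col in zip(*toy)]
variances = peak_variances(toy)

# For binary data the variance depends only on the open fraction p.
for p, v in zip(fractions, variances):
    assert abs(v - p * (1 - p)) < 1e-12

print(top_variable_peaks(toy, 2))  # [1, 3]
```

Peaks that are open in every cell or in no cell get a variance of zero, and the ranking carries no information beyond the open fraction, which is one reason the usual variance-based selection may behave differently here than on scRNA-seq counts.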

0

Can I run sparse PCA on the scATAC-seq matrix and see which peaks contribute to, let's say, the first component, and then choose those as the most informative (variable) ones?

ADD REPLY • link 5.5 years ago by chipolino ▴ 150
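
A rough sketch of the idea of reading peaks off the first component is shown below. Note the swap: it uses plain PCA computed by power iteration on the peak covariance matrix, not sparse PCA, which would additionally penalize the loadings so that most of them become exactly zero. The toy matrix and the helper names (`center_columns`, `covariance`, `first_component`) are assumptions made for this illustration.

```python
import math

def center_columns(matrix):
    """Subtract each peak's mean so every column has mean zero."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[x - m for x, m in zip(row, means)] for row in matrix]

def covariance(centered):
    """Peaks x peaks population covariance of a centered cells x peaks matrix."""
    n = len(centered)
    cols = list(zip(*centered))
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in cols] for ci in cols]

def first_component(cov, iterations=200):
    """Leading eigenvector of a covariance matrix via power iteration."""
    p = len(cov)
    vec = [1.0 / (j + 1) for j in range(p)]  # arbitrary non-zero starting vector
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            break  # every peak is constant, so there is no direction to find
        vec = [x / norm for x in nxt]
    return vec

# Toy data (invented): 6 cells x 4 peaks. Peaks 0 and 1 separate two groups
# of cells, peak 2 is open everywhere, peak 3 is open in one cell per group.
toy = [
    [1, 0, 1, 1],
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 1],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]

loadings = first_component(covariance(center_columns(toy)))

# Rank peaks by the absolute size of their loading on the first component.
ranked = sorted(range(len(loadings)), key=lambda j: abs(loadings[j]), reverse=True)
print(ranked[:2])
```

On this toy matrix the two group-defining peaks carry essentially all of the first component, with opposite signs. On real data the matrix is far larger and sparser, so an established PCA or sparse PCA implementation would be the practical choice.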

0

Well, I'm not sure how "peaks" would be defined in scATAC-seq as there's a maximum of 2 reads per open region per cell. Maybe you want to collapse the information from multiple cells at the same region? What exactly is the question you're trying to address?

ADD REPLY • link 5.5 years ago by Friederike 9.0k

0

Usually, dimensionality reduction is done on the top variable features (usually the top 500). So you can take the top variable peaks, run a PCA, and see what the tSNE clusters look like. If you want to overcome the sparsity of the data, you could use a KNN approach to merge data from the n most similar cells. Before doing that, I would check the tSNE on the top 500 variable peaks.

I did not know that the data is binary; this paper seems to have a nice method for processing such data.

ADD REPLY • link 5.5 years ago by GouthamAtla 12k
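
The KNN merging suggested above could look roughly like the following sketch, which pools each cell with its k most similar cells. Jaccard similarity is used here only as one plausible choice for binary vectors; it is not specified in the thread. The toy matrix and the function names `jaccard` and `knn_pool` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two binary accessibility vectors."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0

def knn_pool(matrix, k):
    """For each cell, sum its profile with those of its k most similar cells."""
    pooled = []
    for i, cell in enumerate(matrix):
        others = [j for j in range(len(matrix)) if j != i]
        # Most similar first; ties keep the original cell order (stable sort).
        others.sort(key=lambda j: jaccard(cell, matrix[j]), reverse=True)
        group = [cell] + [matrix[j] for j in others[:k]]
        pooled.append([sum(col) for col in zip(*group)])
    return pooled

# Toy data (invented): 4 cells x 4 peaks, two groups of two similar cells.
toy = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]

pooled = knn_pool(toy, k=1)
print(pooled)  # [[2, 1, 0, 0], [2, 1, 0, 0], [0, 0, 1, 2], [0, 0, 1, 2]]
```

The pooled values are counts rather than 0/1 flags, so variance-based peak ranking becomes more meaningful afterwards, at the cost of smoothing away some cell-to-cell differences.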

0

Thanks! But how do you find the most variable peaks if the data is binary?

ADD REPLY • link 5.5 years ago by chipolino ▴ 150

0

Sorry, I was not aware that it's binary. I updated my answer and moved it to a comment, as it no longer qualifies as an answer.