Hi there, I am analyzing a dataset from another paper and am trying to recreate one of their methods for CLIP-Seq. I have the BAM files, and in their methods they state:
'For each RNA, we enumerated 100 nucleotide windows across the entire RNA. For each window, we calculated the enrichment by computing the number of reads overlapping the window in the protein elution sample divided by the total number of reads within the protein elution sample. We normalized this ratio by the number of reads in the input sample divided by the total number of reads in the input sample.'
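Written out, the quoted ratio for one window is (CLIP reads in window / total CLIP reads) / (input reads in window / total input reads). Here is a minimal sketch of just that arithmetic, assuming you already have per-window read counts. The function name and the `pseudocount` parameter are my own assumptions, not from the paper. The pseudocount only exists to avoid dividing by zero in windows with no input reads; pass `pseudocount=0` to follow the formula literally.

```python
import numpy as np


def window_enrichment(clip_counts, clip_total, input_counts, input_total, pseudocount=1.0):
    """Per-window enrichment following the quoted methods:
    (CLIP reads in window / total CLIP reads) / (input reads in window / total input reads).

    pseudocount is an assumption (not stated in the paper); with pseudocount=0,
    windows with zero input reads give inf (numpy emits a divide-by-zero warning).
    """
    clip = np.asarray(clip_counts, dtype=float) + pseudocount
    inp = np.asarray(input_counts, dtype=float) + pseudocount
    return (clip / clip_total) / (inp / input_total)
```

For example, `window_enrichment([2, 1, 1], 1000, [1, 1, 3], 2000, pseudocount=0)` gives roughly 4, 2 and 0.67. Note that the ratio equals (CLIP count / input count) multiplied by (input total / CLIP total), so the choice of totals rescales every window of an RNA by the same constant. It changes how RNAs compare with each other, not the shape of the profile along one RNA.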
I have two questions about this:
First, when they say "the protein elution sample", should I be taking the total number of reads in the entire BAM file, or just the reads within the RNA I am looking at?
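The methods don't say which total they used. "The total number of reads within the protein elution sample" reads most naturally as a library-wide total (a library-size normalization), but that is an interpretation. The sketch below computes both in one pass so you can compare. It assumes you iterate a BAM with pysam, e.g. `pysam.AlignmentFile("clip.bam", "rb").fetch(until_eof=True)`, where the file name is a placeholder. The function name is hypothetical, and the choice to skip unmapped, secondary and supplementary alignments is also an assumption. With paired-end data, each mate is a separate record, so it is counted separately.

```python
from collections import Counter


def primary_mapped_totals(reads):
    """Count primary, mapped alignments overall and per reference sequence (RNA).

    `reads` is any iterable of pysam.AlignedSegment-like objects, for example
    pysam.AlignmentFile("clip.bam", "rb").fetch(until_eof=True).
    Returns (library_total, Counter mapping reference_name -> count).
    """
    per_rna = Counter()
    for read in reads:
        # skip unmapped, secondary and supplementary records so each read (or mate) counts once
        if read.is_unmapped or read.is_secondary or read.is_supplementary:
            continue
        per_rna[read.reference_name] += 1
    return sum(per_rna.values()), per_rna
```

Run it once on the CLIP BAM and once on the input BAM. The first return value is the library-wide total, and `per_rna[name]` is the total for a single RNA.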
Second, what methods are used to normalize CLIP data? Is there an efficient way to do this in Python? I created my sliding windows and read plots in Python, so I would like to stick with it to recreate this as completely as possible. Unfortunately, the paper didn't link a GitHub repository or any source code, so this is all I really have.
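On efficiency, one option is to avoid looping over windows and reads. A read overlaps a window [ws, we) exactly when start < we and end > ws. Every read with end <= ws also has start < we, so the overlap count is (reads starting before we) minus (reads ending at or before ws), and both terms come from `np.searchsorted` on sorted coordinates. This is a minimal sketch under that logic. The function name and the `step` default are assumptions: the quoted methods give 100-nt windows but not the step. Coordinates are assumed 0-based and half-open, as in pysam's `reference_start` / `reference_end`. The resulting per-window counts for the CLIP and input samples can go straight into the ratio from the quoted methods, which is a library-size scaling of CLIP against input.

```python
import numpy as np


def reads_per_window(read_starts, read_ends, rna_length, window=100, step=100):
    """Count reads overlapping each window of one RNA without a per-window loop.

    Coordinates are 0-based, half-open [start, end). step=window tiles the RNA
    without overlap; a smaller step gives a sliding window (both are assumptions).
    If rna_length < window, a single window starting at 0 is returned.
    """
    starts = np.sort(np.asarray(read_starts))
    ends = np.sort(np.asarray(read_ends))
    win_starts = np.arange(0, max(rna_length - window, 0) + 1, step)
    win_ends = win_starts + window
    # number of reads with start < window end
    n_start_before_end = np.searchsorted(starts, win_ends, side="left")
    # number of reads with end <= window start (these never overlap the window)
    n_end_before_start = np.searchsorted(ends, win_starts, side="right")
    return win_starts, n_start_before_end - n_end_before_start
```

With pysam, `read_starts` and `read_ends` could be collected from `bam.fetch(rna_name)` as `read.reference_start` and `read.reference_end`. Sorting costs O(R log R), and each window lookup is a binary search, so long RNAs with many reads stay fast.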
Thanks so much.
Thanks so much, I'll take a look! I appreciate it tons.