CpG islands density calculation

7.6 years ago

Lila M ★ 1.3k

Hi everybody, I downloaded the promoter sequences for two gene lists from UCSC, so I have all the FASTA sequences stored in two text files (file_a and file_b). I would like to know if there is any difference in CpG content between the two files. To do that, I've written a short script in R:

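library(Biostrings)  # provides readDNAStringSet() and vcountPattern()

# Total number of CG dinucleotides across all promoter sequences in file_a
fastaFile_a = readDNAStringSet("file_a")
#seq_name_a = names(fastaFile_a)
#sequence_a = paste(fastaFile_a)
CG_file_a = sum(vcountPattern("CG", fastaFile_a))

# Same count for file_b
fastaFile_b = readDNAStringSet("file_b")
CG_file_b = sum(vcountPattern("CG", fastaFile_b))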

I don't feel very confident about it, because I'm not sure this is an accurate way to measure CpG density... any ideas or suggestions?

Thanks!

RNA-Seq CpG promoters • 3.0k views

Two quick notes:

You should probably normalise for lengths.

Are these sequences directional? Should you also include the inverse pattern "GC", given that DNA is double-stranded?
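Both notes can be checked on a small example. The sketch below uses base R only and two made-up toy sequences (`seqs`, `count_pattern` and `reverse_complement` are hypothetical names, not part of the original script). It divides the CG count by sequence length, and it counts CG on the reverse complement as well. Because the reverse complement of "CG" is again "CG", the count should come out the same on either strand, whereas "GC" is a different dinucleotide.

```r
# Toy sequences standing in for promoter sequences (made-up data).
seqs <- c(s1 = "ACGCGTTACG", s2 = "GGCCGGCCAT")

# Count occurrences of a fixed pattern in each string.
# gregexpr() returns -1 when there is no match, so only positive positions count.
count_pattern <- function(pattern, x) {
  hits <- gregexpr(pattern, x, fixed = TRUE)
  vapply(hits, function(h) sum(h > 0), integer(1))
}

# Reverse complement of each DNA string (upper-case A/C/G/T only).
reverse_complement <- function(x) {
  vapply(strsplit(chartr("ACGT", "TGCA", x), ""),
         function(ch) paste(rev(ch), collapse = ""),
         character(1))
}

# CG count per sequence, and the same count normalised by length.
cpg_counts  <- count_pattern("CG", seqs)
cpg_density <- cpg_counts / nchar(seqs)

# CG count on the opposite strand.
cpg_counts_rc <- count_pattern("CG", reverse_complement(seqs))

print(cpg_counts)
print(cpg_density)
print(cpg_counts_rc)
```

With Biostrings, the per-sequence equivalent would be `vcountPattern("CG", fastaFile_a) / width(fastaFile_a)`.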

7.6 years ago by jotan ★ 1.3k

Hi, since all the sequences have the same length (1,000 nt), I don't have to normalize for length. I downloaded the sequences from UCSC; how can I know if they are directional? Thanks for the tips!
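When every sequence has the same length, raw counts are directly comparable, but a single summed total per file hides how the counts are distributed across promoters and also depends on how many sequences each file contains. One possible approach, sketched below with base R and made-up toy sequences (`set_a`, `set_b` and `count_cpg` are hypothetical names), is to keep one count per sequence and compare the two distributions, here with a Wilcoxon rank-sum test as an assumed choice of test.

```r
# Toy stand-ins for the two promoter sets (made-up data, all 10 nt long).
set_a <- c("ACGCGTTACG", "CGCGCGATAT", "TTCGAACGAA")
set_b <- c("ATATATATAT", "GGCCGGCCAT", "TTTTACGTTT")

# One CG count per sequence; gregexpr() returns -1 when nothing matches.
count_cpg <- function(x) {
  hits <- gregexpr("CG", x, fixed = TRUE)
  vapply(hits, function(h) sum(h > 0), integer(1))
}

cpg_a <- count_cpg(set_a)
cpg_b <- count_cpg(set_b)

# Per-set summaries instead of a single grand total.
print(summary(cpg_a))
print(summary(cpg_b))

# Non-parametric comparison of the two count distributions.
# exact = FALSE avoids the warning about ties in count data.
result <- wilcox.test(cpg_a, cpg_b, exact = FALSE)
print(result$p.value)
```

With the original Biostrings objects, dropping the `sum()` call gives the same kind of per-sequence vector: `vcountPattern("CG", fastaFile_a)`.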

7.6 years ago by Lila M ★ 1.3k