7.6 years ago
Lila M
Hi everybody, I downloaded the promoter sequences for two gene lists using UCSC, so I have all the FASTA sequences stored in two txt files (file_a and file_b). I would like to know if there is any difference in CpG content between the two files. To do that, I've written a little code in R:
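library(Biostrings)  # provides readDNAStringSet() and vcountPattern()
# Total number of "CG" dinucleotides across all sequences in file_a
fastaFile_a = readDNAStringSet("file_a")
#seq_name_a = names(fastaFile_a)
#sequence_a = paste(fastaFile_a)
CG_file_a = sum(vcountPattern("CG", fastaFile_a))
# Same for file_b
fastaFile_b = readDNAStringSet("file_b")
CG_file_b = sum(vcountPattern("CG", fastaFile_b))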
I don't feel very confident about it, because I'm not sure this is an accurate way to measure CpG density... any ideas or suggestions?
Thanks!
Two quick notes:
You should probably normalise for lengths.
Are these sequences directional? Should you include an inverse pattern of "GC", given that DNA is double-stranded?
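To make the two notes concrete, here is a minimal sketch in R with Biostrings. The toy sequences and the names `seqs`, `cg_counts` and `cg_density` are made up for illustration and are not from the original post. It shows one way to normalise per-sequence CpG counts by length, and it checks a property worth knowing for the strand question: "CG" is its own reverse complement, so a CpG on one strand is also a CpG on the other.

```r
library(Biostrings)

# Toy sequences standing in for the promoters (hypothetical data)
seqs <- DNAStringSet(c(p1 = "ACGTCGCGAA", p2 = "TTGCGCATAT"))

# CpG count per sequence, then divided by the sequence length
cg_counts  <- vcountPattern("CG", seqs)
cg_density <- cg_counts / width(seqs)

# The reverse complement of "CG" is again "CG"
rc_cg <- as.character(reverseComplement(DNAString("CG")))

# So counting "CG" on the reverse-complemented sequences
# should give the same per-sequence counts as the forward strand
rc_counts <- vcountPattern("CG", reverseComplement(seqs))

print(cg_density)
print(rc_cg)
print(all(rc_counts == cg_counts))
```

Note that "GC" is a different dinucleotide from "CG" read on the opposite strand, so adding "GC" matches would measure something other than CpG content.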
Hi, as all the sequences have the same length (1,000 nt), I don't have to normalize for length. I downloaded the sequences from UCSC; how can I know if they are directional? Thanks for the tips!
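Since all sequences share the same length, the raw per-sequence counts are directly comparable. One possible way to compare the two sets is to keep the per-sequence counts instead of summing them and then run a two-sample test. The sketch below is only a suggestion, assuming a Wilcoxon rank-sum test is appropriate for the data; the toy sequences and the names `set_a`, `set_b`, `cg_a`, `cg_b` and `res` are hypothetical, and in practice the sets would come from `readDNAStringSet("file_a")` and `readDNAStringSet("file_b")`.

```r
library(Biostrings)

# Toy stand-ins for the two promoter sets (hypothetical data)
set_a <- DNAStringSet(c("ACGTCGCGAA", "CGCGTTACGA", "TTCGACGCGC"))
set_b <- DNAStringSet(c("TTGCATATAT", "AACGTTTTAA", "ATATGCATTA"))

# One CpG count per promoter, not a single total per file
cg_a <- vcountPattern("CG", set_a)
cg_b <- vcountPattern("CG", set_b)

# Look at the two distributions first
print(summary(cg_a))
print(summary(cg_b))

# Rank-based comparison of the two sets; exact = FALSE uses the
# normal approximation, which avoids the warning about ties
res <- wilcox.test(cg_a, cg_b, exact = FALSE)
print(res$p.value)
```

Comparing only the two totals (`CG_file_a` vs `CG_file_b`) would also depend on how many sequences each file contains, which is why the per-sequence counts are used here.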