Sequencing By Hybridization Data Analysis
11.1 years ago
jack ▴ 520

I have NGS data for a gene that was sequenced with SBH (sequencing by hybridization) technology. A complete hexamer library (4^6 = 4096 hexamers) was used as the sequencing library. The data look like this:

TTTGAGGTGCAGATAGCTTGCTTTATTTTGTTGTTACTATCTCAAGGAGG
TCCAACAATTATAACTAACAATTGAATTTATACTTGCATGAAAAGAACTA
CATCAAATTGACATTTTGGGCAATTAGTAATATTGTTTAAAATTTAACAA
CAGCTTTATTTTGTTGTTGTTCTTTACTTTTTGCTGTGGCTCATTGCTTA
GGTGCCCAGGTTTTTCAGGTGCAATTAAAATTTAGAACTACCACACAAAG
GCATTGGCTGCACTCTGGGACCTCCAAGAGTTGGCACTGCTCTGGCATAG
GAATACTTGAATAGCTTGGTTAAATGAAGGGATGGCCAGGAGATGTTACT
.
.
.

I want to calculate the following:

i) what percentage of the gene I can recover uniquely with this hexamer library (assuming all hexamers are present in the library)

ii) how many different hexamers are present in this gene

Can somebody guide me on how to calculate these?

ngs bioinformatician statistics bioconductor genomics • 1.7k views
11.1 years ago
Ido Tamir 5.2k

In Scala, with gene holding the full sequence as a single string:

>val hexamerMap = gene.sliding(6).toList.groupBy(s => s).mapValues(_.length)
>hexamerMap.size
res11: Int = 301
>hexamerMap.values.groupBy(c => c).mapValues(_.toSeq.length).toList.sortBy(_._1)
res14: List[(Int, Int)] = List((1,264), (2,30), (3,7))

So 264 hexamers occur once, 30 twice and 7 three times, for 301 distinct hexamers in total. From this one could calculate the percentage of unique coverage, if I understood your question i correctly.

Edit: actually I don't really understand question i.
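One possible reading of question i, sketched here under stated assumptions rather than as a definitive answer: treat a base as "uniquely recovered" if it lies inside at least one hexamer that occurs exactly once in the gene. The object name HexamerStats and the helpers fromLines, kmerCounts and uniqueCoverage are hypothetical names introduced only for this illustration. The input is assumed to be the lines of the sequence file, which are joined into one string before counting.

```scala
object HexamerStats {
  // Join the lines of the sequence file into one upper-case string
  // (assumption: the file holds a single gene split across lines).
  def fromLines(lines: Seq[String]): String =
    lines.map(_.trim).mkString.toUpperCase

  // Count how often each k-mer occurs, using overlapping windows.
  def kmerCounts(seq: String, k: Int = 6): Map[String, Int] =
    seq.sliding(k)
      .filter(_.length == k) // drop the short window produced when seq is shorter than k
      .toList
      .groupBy(s => s)
      .map { case (kmer, hits) => kmer -> hits.size }

  // Fraction of bases covered by at least one k-mer that occurs exactly once.
  // Assumption: this is what "discover uniquely" means in question i.
  def uniqueCoverage(seq: String, k: Int = 6): Double = {
    val counts = kmerCounts(seq, k)
    val covered = Array.fill(seq.length)(false)
    for (i <- 0 to seq.length - k if counts(seq.substring(i, i + k)) == 1)
      for (j <- i until i + k) covered(j) = true
    if (seq.isEmpty) 0.0 else covered.count(b => b).toDouble / seq.length
  }
}
```

For question ii, HexamerStats.kmerCounts(gene).size gives the number of distinct hexamers, and dividing it by 4096 gives the fraction of the library that the gene actually uses. For question i, HexamerStats.uniqueCoverage(gene) * 100 gives a percentage under the reading described above; a different definition of "uniquely" would need a different calculation.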
