Hello, I am following along with the EBI's online course for ChIP-Seq analysis and have encountered an error when trying to retrieve chromosome lengths:
> library(BSgenome.Hsapiens.UCSC.hg19)
> seqlengths(NFKB) = seqlengths(Hsapiens)
Error in .normargSeqlengths(value, seqnames(x)) :
length of supplied 'seqlengths' must equal the number of sequences
I assume the reason for this is the additional chromosome entries in the Hsapiens object:
> seqlengths(NFKB)
chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr2 chr20 chr21 chr22 chr3 chr4 chr5 chr6 chr7 chr8
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
chr9 chrM chrX chrY
NA NA NA NA
> seqlengths(Hsapiens)
chr1 chr2 chr3 chr4 chr5
249250621 243199373 198022430 191154276 180915260
chr6 chr7 chr8 chr9 chr10
171115067 159138663 146364022 141213431 135534747
chr11 chr12 chr13 chr14 chr15
135006516 133851895 115169878 107349540 102531392
chr16 chr17 chr18 chr19 chr20
90354753 81195210 78077248 59128983 63025520
chr21 chr22 chrX chrY chrM
48129895 51304566 155270560 59373566 16571
chr1_gl000191_random chr1_gl000192_random chr4_ctg9_hap1 chr4_gl000193_random chr4_gl000194_random
106433 547496 590426 189789 191469
chr6_apd_hap1 chr6_cox_hap2 chr6_dbb_hap3 chr6_mann_hap4 chr6_mcf_hap5
4622290 4795371 4610396 4683263 4833398
chr6_qbl_hap6 chr6_ssto_hap7 chr7_gl000195_random chr8_gl000196_random chr8_gl000197_random
4611984 4928567 182896 38914 37175
chr9_gl000198_random chr9_gl000199_random chr9_gl000200_random chr9_gl000201_random chr11_gl000202_random
90085 169874 187035 36148 40103
chr17_ctg5_hap1 chr17_gl000203_random chr17_gl000204_random chr17_gl000205_random chr17_gl000206_random
1680828 37498 81310 174588 41001
chr18_gl000207_random chr19_gl000208_random chr19_gl000209_random chr21_gl000210_random chrUn_gl000211
4262 92689 159169 27682 166566
chrUn_gl000212 chrUn_gl000213 chrUn_gl000214 chrUn_gl000215 chrUn_gl000216
186858 164239 137718 172545 172294
chrUn_gl000217 chrUn_gl000218 chrUn_gl000219 chrUn_gl000220 chrUn_gl000221
172149 161147 179198 161802 155397
chrUn_gl000222 chrUn_gl000223 chrUn_gl000224 chrUn_gl000225 chrUn_gl000226
186861 180455 179693 211173 15008
chrUn_gl000227 chrUn_gl000228 chrUn_gl000229 chrUn_gl000230 chrUn_gl000231
128374 129120 19913 43691 27386
chrUn_gl000232 chrUn_gl000233 chrUn_gl000234 chrUn_gl000235 chrUn_gl000236
40652 45941 40531 34474 41934
chrUn_gl000237 chrUn_gl000238 chrUn_gl000239 chrUn_gl000240 chrUn_gl000241
45867 39939 33824 41933 42152
chrUn_gl000242 chrUn_gl000243 chrUn_gl000244 chrUn_gl000245 chrUn_gl000246
43523 43341 39929 36651 38154
chrUn_gl000247 chrUn_gl000248 chrUn_gl000249
36422 39786 38502
Therefore I tried subsetting the Hsapiens object:
> Hsapiens[c("chr1","chr2")]
Error in Hsapiens[c("chr1", "chr2")] :
object of type 'S4' is not subsettable
Since it is not sub-settable, I tried retrieving the lengths via:
> lengths = seqlengths(Hsapiens)
> mylengths = lengths[1:25]
> mylengths
chr1 chr2 chr3 chr4 chr5 chr6 chr7
249250621 243199373 198022430 191154276 180915260 171115067 159138663
chr8 chr9 chr10 chr11 chr12 chr13 chr14
146364022 141213431 135534747 135006516 133851895 115169878 107349540
chr15 chr16 chr17 chr18 chr19 chr20 chr21
102531392 90354753 81195210 78077248 59128983 63025520 48129895
chr22 chrX chrY chrM
51304566 155270560 59373566 16571
Now I have two objects, the NFKB containing no chromosome lengths and the mylengths object containing the sequence lengths:
> seqlengths(NFKB)
chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr2
NA NA NA NA NA NA NA NA NA NA NA NA
chr20 chr21 chr22 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chrM chrX
NA NA NA NA NA NA NA NA NA NA NA NA
chrY
NA
> mylengths
chr1 chr2 chr3 chr4 chr5 chr6 chr7
249250621 243199373 198022430 191154276 180915260 171115067 159138663
chr8 chr9 chr10 chr11 chr12 chr13 chr14
146364022 141213431 135534747 135006516 133851895 115169878 107349540
chr15 chr16 chr17 chr18 chr19 chr20 chr21
102531392 90354753 81195210 78077248 59128983 63025520 48129895
chr22 chrX chrY chrM
51304566 155270560 59373566 16571
I then tried setting the mylengths to the seqlengths(NFKB):
> seqlengths(NFKB) = mylengths
Error in .normargSeqlengths(value, seqnames(x)) :
when the supplied 'seqlengths' vector is named, the names must match the seqnames
To my eyes, the names do match, but maybe it's because they are in the wrong order? Am I being slow or is there a much easier way to retrieve the chromosome lengths?
Thanks for the help, James
Have you tried reordering
mylengths
so it's in the same order asseqlengths(NFKB)
?Hello, yes I just tried that now and it has worked, thankfully:
I just wondered if there was a better way of doing this, given for every analysis I'll have to do something similar each time
Hi Irsan, if I have interpreted your comment correctly, that creates a vector of the names but not in the order they appear in the NFKB object? Instead I have done the following and it appears to work :
Isn't the my.sorted.lengths the same as names(seqlengths(NFKB)) ?