Nebula 30x read length distribution
berndmann, 17 months ago

I used

samtools view {path}/{sample_name}.bam -F 4 | cut -f 10 | perl -ne 'chomp; print length($_) . "\n"' | sort -n | uniq -c > {path}readlength.txt

to create a read length distribution for my nebula.bam. The output looks like this:

  98929 30
  98321 31
  85283 32
  93128 33
  72783 34
  90507 35
  81362 36
  81355 37
  73827 38
  70665 39
  82116 40
  74862 41
  68171 42
  69581 43
  65017 44
  74617 45
  65990 46
  66215 47
  63188 48
  63776 49
  61673 50
  69611 51
  63448 52
  67838 53
  57148 54
  58645 55
  56490 56
  57091 57
  56761 58
  55588 59
  53437 60
  53376 61
  52779 62
  53832 63
  52846 64
  51626 65
  50242 66
  49143 67
  51991 68
  48566 69
  45442 70
  45470 71
  42825 72
  43132 73
  41201 74
  42314 75
  37014 76
  34177 77
  29547 78
  26587 79
  23665 80
  22278 81
  18312 82
  19352 83
  16901 84
  16819 85
  14827 86
  14269 87
  12903 88
  12951 89
  11324 90
  11640 91
   9157 92
  10129 93
   8531 94
   8585 95
   7440 96
   6783 97
   6379 98
   6730 99
   5959 100
   6763 101
   3692 102
   3033 103
   2804 104
   2142 105
   1844 106
   1234 107
   1035 108
    868 109
    635 110
    570 111
    411 112
    441 113
    331 114
    297 115
    247 116
    235 117
    408 118
    281 119
    183 120
     54 121
     72 122
     64 123
     68 124
     16 125
     21 126
      3 127
      2 128
      1 129
752448666 150
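If it helps, the `cut | perl | sort | uniq` chain can also be collapsed into a single awk pass. This is only a sketch: `read_length_hist` is a hypothetical helper name, and it assumes SAM text on stdin, e.g. from `samtools view -F 4`. It reads the SEQ field (column 10), which does not include hard-clipped bases, and it skips records whose SEQ is stored as `*`, which the original one-liner would count as length 1.

```shell
#!/usr/bin/env bash
# Hypothetical helper: tally read lengths from SAM text on stdin.
# Output uses the same "count length" layout as `sort -n | uniq -c`,
# without the leading padding.
read_length_hist() {
  awk -F'\t' '
    /^@/ { next }                         # skip header lines, if present
    $10 != "*" { n[length($10)]++ }       # SEQ is column 10; "*" = not stored
    END { for (len in n) printf "%.0f %d\n", n[len], len }
  ' | sort -k2,2n                         # order rows by read length
}

# Usage (placeholders as in the question):
# samtools view -F 4 {path}/{sample_name}.bam | read_length_hist > readlength.txt
```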

Is it expected for 30x WGS that almost all reads are 150 bp long? I guess the cutoff at 150 bp is due to an Illumina sequencing limitation, right?

Is there a useful way to plot such data?
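One option that needs nothing beyond awk is a text histogram on a log10 scale. A log scale matters here: the 150 bp bin (752,448,666 reads) is more than 7,000 times larger than the largest of the other bins, so on a linear axis almost every other bar would round to nothing. This is a sketch; `plot_lengths` and the `WIDTH` variable are hypothetical names, and the same idea (a bar chart with a logarithmic y-axis) carries over to any plotting tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a text histogram of a "count length" table.
# Bar length is proportional to log10(count + 1); WIDTH sets the longest bar.
plot_lengths() {
  awk -v width="${WIDTH:-60}" '
    NF == 2 {
      k++; cnt[k] = $1; len[k] = $2
      lg[k] = log($1 + 1) / log(10)       # log10; +1 keeps zero counts finite
      if (lg[k] > maxlg) maxlg = lg[k]
    }
    END {
      for (i = 1; i <= k; i++) {
        bar = (maxlg > 0) ? int(lg[i] / maxlg * width + 0.5) : 0
        s = ""
        for (j = 0; j < bar; j++) s = s "#"
        printf "%5d %12.0f %s\n", len[i], cnt[i], s
      }
    }
  ' "$@"
}

# Usage: WIDTH=50 plot_lengths readlength.txt
```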

Illumina nebula.WGS 30x • 620 views

Is this correct for 30x WGS that the read length is almost 150bp all the time? I guess the cut after 150 is due to Illumina sequencing limitation, right?

No. There are Illumina sequencing kits that sequence longer reads (up to 300 cycles per read). Your BAM appears to contain reads of at most 150 bp, but that has nothing to do with the 30x WGS part.
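To check the maximum read length and how much of the data sits at that length, the `count length` table can be summarised directly. A sketch only: `summarise_lengths` is a hypothetical name, and it assumes the two-column output of the pipeline in the question (count first, length second, each length appearing once).

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise a "count length" table (e.g. readlength.txt).
# Reports total reads, the maximum length, the fraction of reads at that
# length, and the count-weighted mean length.
summarise_lengths() {
  awk '
    NF == 2 {
      total += $1
      sum += $1 * $2
      if ($2 + 0 > max + 0) { max = $2; at_max = $1 }   # assumes one row per length
    }
    END {
      if (total == 0) exit 1              # empty or malformed input
      printf "reads=%.0f max=%d at_max=%.4f mean=%.2f\n",
             total, max, at_max / total, sum / total
    }
  ' "$@"
}

# Usage: summarise_lengths readlength.txt
```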


I'm just looking for a BAM file from standard 30x Illumina WGS to get a feel for its read-length distribution. Is there an example BAM they provide that I missed?


to get a feeling for their read-length distribution

That depends entirely on the sample and library quality. You may encounter samples where 99% of reads are 150 bp if they come from libraries with longer inserts.
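One way to check this on your own data is to compare the insert sizes behind full-length and shorter reads. The sketch below assumes paired-end reads with TLEN (SAM column 9) filled in, and it tests the hypothesis that shorter reads mostly come from fragments shorter than the read length whose adapter sequence was trimmed; that mechanism is an assumption, not something established in this thread. `insert_by_read_length` and `FULL_LEN` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mean |TLEN| (insert size) for full-length vs shorter reads.
# Assumes paired-end SAM on stdin with TLEN (column 9) set, e.g. from
# `samtools view -f 2` (properly paired reads only). FULL_LEN defaults to 150.
insert_by_read_length() {
  awk -F'\t' -v full="${FULL_LEN:-150}" '
    /^@/ { next }                          # skip header lines, if present
    $9 != 0 && $10 != "*" {                # need an insert size and a stored SEQ
      t = ($9 < 0) ? -$9 : $9              # TLEN is signed; use its magnitude
      if (length($10) >= full) { n_full++; s_full += t }
      else                     { n_short++; s_short += t }
    }
    END {
      printf "full_length n=%d mean_insert=%.1f\n", n_full, (n_full ? s_full / n_full : 0)
      printf "shorter n=%d mean_insert=%.1f\n", n_short, (n_short ? s_short / n_short : 0)
    }
  '
}

# Usage: samtools view -f 2 {path}/{sample_name}.bam | insert_by_read_length
```

Both mates of a pair are counted, so each insert contributes twice; that does not change the means.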
