This is the description I received from the SRA staff (Adam Stine).
The spot model is Illumina GA centric. The flowcell has locations where the adapters have attached the fragments to the glass of the lane. X and Y coordinates identify these 'spots'. As the camera reads the fluorescent flashes during sequencing, the coordinates indicate which spot each new base belongs to. All of the bases for a single location constitute the spot. There may be one or more divisions of those bases for technical reads (adapters, primers, barcodes, etc.), and there will always be at least one biological read (forward, reverse). I usually think of the technical reads as the "known" sequence and the biological reads as the "unknown".
When we store the data, the bases for a single spot are all stored as one string, together with a description of where the breaks occur and the type of read each segment represents. The spot length is the expected total length of all reads (used as a check to make sure we have all the data). As an example, a 2x150 run with a 6bp barcode and 12bp primer on the forward read would have 4 reads:
0 - barcode basecoord 1
1 - primer basecoord 7
2 - forward basecoord 19
3 - reverse basecoord 151
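To make the layout above concrete, here is a minimal Python sketch of how one spot string could be cut into its reads from the 1-based basecoord values. The function name `split_spot`, the `layout` list and the placeholder bases are my own illustration, not anything from the SRA toolkit, and I am assuming a spot length of 300 (2x150).

```python
def split_spot(spot, reads):
    """Split one spot string into labelled segments.

    reads is a list of (label, basecoord) pairs, where basecoord is the
    1-based start of each read, given in ascending order.
    """
    segments = []
    for i, (label, start) in enumerate(reads):
        # A read ends just before the next read starts; the last read
        # runs to the end of the spot.
        end = reads[i + 1][1] - 1 if i + 1 < len(reads) else len(spot)
        segments.append((label, spot[start - 1:end]))
    return segments


# Layout from the 2x150 example: barcode, primer, forward, reverse.
layout = [("barcode", 1), ("primer", 7), ("forward", 19), ("reverse", 151)]

# Placeholder spot of 300 bases (assumed spot length), one letter per read type.
spot = "B" * 6 + "P" * 12 + "F" * 132 + "R" * 150

for label, seq in split_spot(spot, layout):
    print(label, len(seq))
```

With this layout the biological forward read is 132 bases, because the barcode and primer use up the first 18 of the 150 forward cycles.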
But you only need to describe the barcode and primer to SRA if you submit sequences that contain them. In my case, a third party provided me with the BAM files and I do not have the untrimmed sequences.
So the spot data model is useful for submitting untrimmed BAM files while still letting you specify where the biological reads begin.
In my case, I have 2x100 bp reads without an index, and I am only supplying the application reads with the adapters trimmed, so I simply submit:
0 - forward basecoord 1 (Application read)
1 - reverse basecoord 101 (Application read)
Hi, I agree with Stefano. A spot does contain more than a read. I didn't find any official document to confirm this, but when we use fastq-dump to convert an sra file into a FASTQ file, it reports on completion "Written 38424688 spots for SRR032.sra". Now if we look at the FASTQ file, each read has three more things attached to it, beginning with a line that starts with @. Something like this:
@SRR032238.12186 HWI-EAS6:3:1:246:1981 length=50
GGCCAGCTCTACACCTTCAAGGCCGAGACGGAGGAGCTGAAGGGANGCTG
+SRR032238.12186 HWI-EAS6:3:1:246:1981 length=50
BBB@=@BBBABBBBBBBB>0>BBB@6@A?446/8+;AAA@=9(7-!817&
In total, each read takes 4 lines. Now count the number of lines in your FASTQ file and divide by 4. That gives the same number as I mentioned above, i.e. 38424688 (it will of course be different for different files). So a spot corresponds to 4 lines in the FASTQ file, of which the read is one part.
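The line count described above can be sketched in Python roughly as follows. This is only an illustration: `count_fastq_records` is a name I made up, and it assumes a strict four-line-per-record FASTQ (no wrapped sequence or quality lines).

```python
import io


def count_fastq_records(handle):
    """Count FASTQ records, assuming exactly four lines per record."""
    n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError("line count is not a multiple of 4")
    return n_lines // 4


# The single record quoted above, held in memory instead of a file.
example = io.StringIO(
    "@SRR032238.12186 HWI-EAS6:3:1:246:1981 length=50\n"
    "GGCCAGCTCTACACCTTCAAGGCCGAGACGGAGGAGCTGAAGGGANGCTG\n"
    "+SRR032238.12186 HWI-EAS6:3:1:246:1981 length=50\n"
    "BBB@=@BBBABBBBBBBB>0>BBB@6@A?446/8+;AAA@=9(7-!817&\n"
)
print(count_fastq_records(example))
```

For a real file you would pass `open("your_file.fastq")` instead of the in-memory example. The result matches the reported spot count only if each spot was written out as a single FASTQ record.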
Hope this helps
Isn't your explanation a roundabout way of saying that the number of spots is exactly the same as the number of reads? That is not always true.