3.5 years ago
DN99
Sorry if this might be a simple question. I'm looking in the ExAC database, specifically in the Gene constraint scores TSV (from here https://gnomad.broadinstitute.org/downloads) and I see for each row per gene there is a bp column.
What does the bp column mean in relation to the gene? Is it the number of base pairs that gene has? Or is it a specific position of that gene on the chromosome, like the start point?
The header and first few rows of the data I'm looking at look like:
transcript gene chr n_exons tx_start tx_end bp mu_syn mu_mis mu_lof n_syn n_mis n_lof exp_syn exp_mis exp_lof syn_z mis_z lof_z pLI n_cnv exp_cnv cnv_z
ENST00000263100.3 A1BG 19 8 58858387 58864803 1488 1.22623810613e-05 2.31370910656e-05 1.00149904809e-06 87 170 8 104.728743317 199.807808895 12.3013823748 1.07397341153102 1.03143095067218 1.21484488615106 9.0649236354772e-05 3 3.60990172920741 0.111439851405077
ENST00000373995.3 A1CF 10 11 52566488 52610547 1785 6.39891945771e-06 1.54440933739e-05 1.8987381109e-06 86 168 9 76.6988402846 178.585954564 25.9365837039 -0.658403707874478 0.387458005554534 3.29427011505961 0.00361970078438154 NaN NaN NaN
ENST00000318602.7 A2M 12 36 9220418 9268445 4425 1.76240458841e-05 4.04871757669e-05 3.98398665823e-06 187 393 16 187.602696614 414.516709098 51.7060915327 0.0272791650795749 0.516917311081667 4.9188222793601 0.000540114865271392 3 8.70631909864876 0.833503390443042
ENST00000299698.7 A2ML1 12 35 8975247 9027607 4365 1.7870125509e-05 4.01510386566e-05 3.7123159726e-06 226 502 42 216.075661755 467.040245986 56.0645988342 -0.41854907644017 -0.791240082514084 1.86068429240947 1.32902210264609e-22 63 11.8468312922777 -2.28143080359217
We can only guess what the column means right now. Can you show us the first few lines of the file? Edit your post and add it in there. See this post for formatting tips: How to Use Biostars Part-3: Formatting Text and Using GitHub Gists
Thank you for your response. I've had a go at adding the first few lines of the data. I've also been trying to find their README file for more information, but I haven't found it yet.
It doesn't seem to be described on pages 74-77 of their Supplementary Material (as mentioned on the webpage). You may want to email them to be sure, but I think bp corresponds to the number of exonic bases in the transcript. The first transcript ID seems to match a GRCh37-annotated transcript, but the GRCh37 Ensembl site seems to be down right now, so I'm unable to verify.

I agree, I will get in touch with them, but I also agree that exonic bases sounds correct. Thank you for looking into it!
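
As a rough sanity check on the rows posted above, bp can be compared with the genomic span of each transcript. This is only a sketch: the values are copied by hand from the post, the `summarize` helper is a hypothetical name, and the span formula assumes tx_start and tx_end are 1-based inclusive coordinates, which the file does not state.

```python
# Values copied from the four rows in the post: (gene, tx_start, tx_end, bp).
ROWS = [
    ("A1BG", 58858387, 58864803, 1488),
    ("A1CF", 52566488, 52610547, 1785),
    ("A2M", 9220418, 9268445, 4425),
    ("A2ML1", 8975247, 9027607, 4365),
]


def summarize(rows):
    """Compare bp with the genomic span implied by tx_start and tx_end."""
    results = []
    for gene, tx_start, tx_end, bp in rows:
        # Assumption: coordinates are 1-based and inclusive, hence the +1.
        span = tx_end - tx_start + 1
        results.append(
            {
                "gene": gene,
                "span": span,
                "bp": bp,
                "bp_smaller_than_span": bp < span,
                "bp_multiple_of_3": bp % 3 == 0,
            }
        )
    return results


results = summarize(ROWS)
for r in results:
    print(r["gene"], r["span"], r["bp"], r["bp_smaller_than_span"], r["bp_multiple_of_3"])
```

In all four rows bp is much smaller than the span between tx_start and tx_end, so it is neither the full genomic length of the transcript nor a chromosomal position. It is also a multiple of 3 in each of these rows, which would fit a count of coding bases, but four rows cannot settle that, so confirming with the maintainers is still the safest route.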