@henryiii @jpivarski Can you tell me if this is a hist Issue or a uproot Issue or neither? https://gist.github.com/matthewfeickert/ab6ac8677aad2e04738111d0af3e0549
(There's a Binder link in the Gist if you want to play with it in browser)
@henryiii @jpivarski Another follow-up question on moving from ROOT files to hist.Hist histograms via uproot: Is there any way to use uproot's .to_hist() API to get a hist.Hist with storage=hist.storage.Weight()? Or at the moment should I just write a little converter like I did here?
https://github.com/matthewfeickert/heputils/issues/24#issuecomment-800867686
Well that's super cool to hear. Congrats in advance Boost.Histogram team. :)
boost-histogram returning consistent objects via view() for different storages. I personally value a consistent API for the tiny subset of features I use in practice (e.g. double/weight storages) higher than the extra flexibility. I suspect API consistency may also help with typing. Is this more consistent API something that may fit into hist (or is it maybe already available there)?
h.variances()[…] = will not work if the variance is computed, as you are setting a computed value (mean storages). .values()[…] = should work on all the existing storages, though. I would mostly recommend setting them all at once, using the h[…] = syntax, though.
This could be fixed, though, right h.variances() returns a NumPy array that has been generated. Though https://github.com/scikit-hep/boost-histogram/discussions/504 would make this all much more elegant; you could write h.variances = … and that would just work (and support flow / noflow).
The naive assumption when dividing histograms with error bars is that the error bars are independent (the same assumption that is usually made when adding or subtracting), but the most common use-case for dividing is to make an efficiency plot, in which the numerator is a strict subset of the denominator and both are counting statistics. Even when we know that this is the case, there are different ways of handling the statistics that differ for ratios close to 0 or 1. See the table on this page. There are strong arguments for some of these options, but not everybody agrees.
So if there is a way to divide histograms in a histogramming library, it should probably be some kind of method call, so that the statistical treatment can be configurable. If it's just the / operator, a lot would have to be assumed.
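To illustrate why the treatment needs to be configurable, here is a rough standard-library sketch for a single bin. The function name and the method labels are made up for this example and are not any library's API; the Wilson score interval is just one of the possible choices, picked here because it needs no extra dependencies.

```python
import math


def efficiency(passed, total, method="wilson", z=1.0):
    """Return (eff, lower, upper) for one bin with `passed` out of `total` counts.

    Hypothetical helper for illustration only.
    """
    if total <= 0 or not 0 <= passed <= total:
        raise ValueError("need 0 <= passed <= total and total > 0")
    eff = passed / total

    if method == "independent":
        # What a bare `/` would have to assume: numerator and denominator
        # are independent Poisson counts, errors added in quadrature.
        err = eff * math.sqrt(1 / passed + 1 / total) if passed else 0.0
        return eff, eff - err, eff + err

    if method == "wilson":
        # Wilson score interval for a binomial proportion.
        denom = 1 + z * z / total
        centre = (eff + z * z / (2 * total)) / denom
        half = z * math.sqrt(eff * (1 - eff) / total + z * z / (4 * total**2)) / denom
        return eff, centre - half, centre + half

    raise ValueError(f"unknown method: {method}")


# A bin where every event passes: the two treatments disagree strongly.
naive = efficiency(10, 10, method="independent")
wilson = efficiency(10, 10, method="wilson")
print(naive)
print(wilson)
```

For 10 out of 10, the independent treatment gives an upper error bar above 1, while the Wilson interval stays within [0, 1]; this is the kind of difference near 0 or 1 that a plain operator would hide.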
yes, definitely! If it's not appropriate to include in a histogramming library, is this something that people are doing manually in their analyses? Or is there some other stats package that's more appropriate to be using here? Because right now, the simplest way of making an efficiency plot seems to me to be converting boost_histograms into TH1 and dividing there, which is a little silly!
I think the coffea histogram implementation has some of the relevant methods already implemented (https://coffeateam.github.io/coffea/modules/coffea.hist.html).
hist now too: https://hist.readthedocs.io/en/latest/reference/hist.html#module-hist.intervals
If you want to use hist to make a ratio plot where the ratio is an efficiency then you could just follow the example in the User Guide with kwarg rp_uncertainty_type="poisson-ratio" (where of course you'd change the histograms so that hist_1 is a strict subset of hist_2).
I think the coffea histogram implementation has some of the relevant methods already implemented (https://coffeateam.github.io/coffea/modules/coffea.hist.html).
thanks, I always steered away from coffea because all the examples are CMS-based and I haven't sat down and translated to ATLAS jargon :D I didn't actually realise it had histogramming!
and
If you want to use hist to make a ratio plot where the ratio is an efficiency then you could just follow the example in the User Guide with kwarg rp_uncertainty_type="poisson-ratio"
I also hadn't realised that hist could do this properly because I didn't see any efficiencies in the example.
two options now, thanks everyone!
coffea has implemented were basically ported into the hist.intervals module, so using either should give identical results and if coffea moves to using hist it will basically just be changes in the API called. :+1: Big thanks to @nsmith- here as he was the first to implement these in coffea and has been very helpful in giving feedback and advice.
Hi all,
When using UHI on a 3D histogram with an IntCategory first axis, I notice that it seems to ignore the starting indices of my projection, e.g. h[1::sum, ...] should project the contents after 0 in the first dimension, but this start index seems to be ignored. I see the same result if I slice and then manually call project.
Upon further investigation, it seems that this only happens if I don't provide the stop attribute of the slice, i.e. h[1:len:sum] works. This is really, really useful, btw. Thanks for all the hard work.
h[1:len:sum] cuts off the overflow bin, while h[1::sum] includes overflow