Conversation
d9ec6e0 to
d777de3
Compare
mxinden
left a comment
There was a problem hiding this comment.
Oh silly me.
Great catch. Thanks for the patch.
Can you include a changelog entry?
Out of curiosity, did you hit this race condition, or found it while reading?
We discovered it the hard way with gauges reporting increments of 18.4 quintillions after making the switch in 3 repos. |
|
I'll add a changelog entry. Can you release 0.18.1 with that fix? |
d777de3 to
fe41d10
Compare
mxinden
left a comment
There was a problem hiding this comment.
Out of curiosity, did you hit this race condition, or found it while reading?
We discovered it the hard way with gauges reporting increments of 18.4 quintillions after making the switch in 3 repos.
Sorry to hear that. My bad. Thanks for debugging.
Do you have some time to tackle the second race condition as well? If not, I will give it a shot tomorrow.
I'll add a changelog entry. Can you release 0.18.1 with that fix?
Yes, I will do that as soon as we have both race conditions fixed. See below.
src/metrics/family.rs
Outdated
| .entry(label_set.clone()) | ||
| .or_insert_with(|| self.constructor.new_metric()); | ||
|
|
||
| RwLockReadGuard::map(self.metrics.read(), |metrics| { |
There was a problem hiding this comment.
With Family::remove there is actually a second race condition.
In the scenarios where we have two threads:
- A: Call
Family::get_or_createwith a new label set.
1. Acquiring the read lock. Nothing there.
2. Acquiring the write lock. Inserting the new metric.
3. Being preempted by the OS. - B: Call
Family::removewith the above label set. - A: Acquiring the read lock for the second (last) time,
.expect()ing the previously inserted metric to be there.
This can be prevented by leveraging RwLockWriteGuard::downgrade of the previously acquired write lock instead of doing acquiring another read lock via self.metrics.read.
Does the above make sense?
There was a problem hiding this comment.
It makes sense yes. The patch now uses RwLockWriteGuard::downgrade.
fe41d10 to
e08410a
Compare
The method should not overwrite an existing key after it obtains the write lock. Consider the following scenario with `Family<LabelSet, Gauge>` used by threads A and B: 1. A and B both call `family.get_or_insert(&label_set)` 2. A and B both acquire read lock and finds no key 3. A gets write lock and inserts a new gauge with value 0 4. A increments returned gauge to 1 5. B gets write lock and inserts a new gauge with value 0 6. B increments returned gauge to 1 7. A decrements gauge back to 0 8. B decrements gauge which now overflows to `u64::MAX` Signed-off-by: Anthony Ramine <nox@nox.paris>
e08410a to
290369f
Compare
The method should not overwrite an existing key after it obtains the write lock.
Consider the following scenario with
Family<LabelSet, Gauge>used by threads A and B:family.get_or_insert(&label_set)u64::MAX