Time out HTLCs before they expire#587
Conversation
|
Rebase plz? |
d3a418c to
3a3403a
Compare
|
Done :). |
lightning/src/ln/channelmanager.rs
Outdated
| pending_msg_events.push(events::MessageSendEvent::SendAnnouncementSignatures { | ||
| let res = channel.block_connected(header, height, txn_matched, indexes_of_txn_matched); | ||
| if let Ok((chan_res, mut timed_out_pending_htlcs)) = res { | ||
| timed_out_htlcs.reserve(timed_out_pending_htlcs.len()); |
There was a problem hiding this comment.
I think this commit won't compile bc timed_out_htlcs is declared in the next commit
There was a problem hiding this comment.
Hmm, every commit seems to compile fine here?
There was a problem hiding this comment.
oops, think I was looking at the commits in the wrong order (I blame github)
lightning/src/ln/functional_tests.rs
Outdated
| } | ||
|
|
||
| #[test] | ||
| fn test_htlc_timeout() { |
There was a problem hiding this comment.
Could you add a holding cell HTLC that gets failed back, to the test?
I found that case a bit weird since if an HTLC is in the holding cell when it times out, if we are the sender it seems like we could just remove it from the holding cell without failing it backwards.
There was a problem hiding this comment.
Hmm. I'm not sure which case you're asking about. This test/the previous commit is really only intended to be useful for inbound HTLCs (which can't be in the holding cell, that's just for outbound stuff or the fail/fulfill action itself) which are to us, as forwarded ones we are mainly focused on going to the chain to broadcast HTLC-Success if it matters.
There was a problem hiding this comment.
Oops, I was looking at commits out of order.
Thanks for the explanation 👍 though, I thought we still have to close a channel if a forward HTLC times out?
There was a problem hiding this comment.
If we have the preimage and its outbound, yea, or if its inbound, always, yea. If we don't, its probably not worth the fee to close the channel when we aren't gonna make money for it.
3a3403a to
a342d32
Compare
| }, onion_packet), channel_state, chan) | ||
| } { | ||
| Some((update_add, commitment_signed, monitor_update)) => { | ||
| if let Err(e) = self.monitor.update_monitor(chan.get().get_funding_txo().unwrap(), monitor_update) { |
There was a problem hiding this comment.
Idea for another day, but could there be a way to push a a "test the waters" monitor update, before pushing the real one? to help help dodge these sorts of problems/confusing partial timeouts due to monitor update failure?
There was a problem hiding this comment.
Yes, and no. There was a long discussion at #441 (comment) which is somewhat related, but the real issue with this API is its not really possible to implement a client for it. Imagine a hardware wallet that has to implement it - there's no way to guarantee that an update goes through in advance, even if you can send the update to the computer before you need to commit it - what if the user unplugged the device at an inopportune time?
|
|
||
| let (payment_preimage, payment_hash) = get_payment_preimage_hash!(&nodes[0]); | ||
| let payment_secret = PaymentSecret([0xdb; 32]); | ||
| let mut route = nodes[0].router.get_route(&nodes[3].node.get_our_node_id(), None, &[], 100000, TEST_FINAL_CLTV).unwrap(); |
There was a problem hiding this comment.
Worth asserting what route.paths[0] is, to make sure we're not just using 2 of the same path?
There was a problem hiding this comment.
pass_along_path does it for us, albeit indirectly.
lightning/src/ln/channelmanager.rs
Outdated
| pending_msg_events.push(events::MessageSendEvent::SendAnnouncementSignatures { | ||
| let res = channel.block_connected(header, height, txn_matched, indexes_of_txn_matched); | ||
| if let Ok((chan_res, mut timed_out_pending_htlcs)) = res { | ||
| timed_out_htlcs.reserve(timed_out_pending_htlcs.len()); |
There was a problem hiding this comment.
oops, think I was looking at the commits in the wrong order (I blame github)
lightning/src/ln/channel.rs
Outdated
| &HTLCUpdateAwaitingACK::AddHTLC { ref payment_hash, ref source, ref cltv_expiry, ref amount_msat, .. } => { | ||
| if cltv_expiry <= &height { | ||
| timed_out_htlcs.push((source.clone(), payment_hash.clone(), *amount_msat)); | ||
| false |
There was a problem hiding this comment.
Might be good to add a test for this case, to ensure holding cell HTLCs are timed out properly?
There was a problem hiding this comment.
Done! Caught a few bugs along the way, of course :/.
lightning/src/ln/channelmanager.rs
Outdated
|
|
||
| channel_state.claimable_htlcs.retain(|&(ref payment_hash, _), htlcs| { | ||
| htlcs.retain(|htlc| { | ||
| if height >= htlc.cltv_expiry - CLTV_CLAIM_BUFFER - LATENCY_GRACE_PERIOD_BLOCKS { |
There was a problem hiding this comment.
Could you help me understand LATENCY_GRACE_PERIOD_BLOCKS? I would have thought it should be added here if its purpose is to give our peer more time. Specifically, could you tell me if I'm reading this comment incorrectly?
There was a problem hiding this comment.
Or if I'm reading this wrong, maybe it would be clearer if LATENCY_GRACE_PERIOD_BLOCKS was moved to the other side of the inequality:
if height + LATENCY_GRACE_PERIOD_BLOCKS >= htlc.cltv_expiry - CLTV_CLAIM_BUFFER {There was a problem hiding this comment.
claimable_htlcs isnt about peers, its about the user of RL finding the preimage and claim. Another way that may help would be to look at LATENCY_GRACE_PERIOD_BLOCKS as "the time it takes us to do a commitment update". I added a comment to this effect.
There was a problem hiding this comment.
Would it make sense then to define an appropriately named constant for CLTV_CLAIM_BUFFER + LATENCY_GRACE_PERIOD_BLOCKS?
I see these constants used together in a few places, so if there is a name that suitably captures this concept, it may be worth defining such a constant to make the code more understandable and document it there instead.
lightning/src/ln/functional_tests.rs
Outdated
| // Expect pending failures, but we don't bother trying to update the channel state with them. | ||
| let events = nodes[0].node.get_and_clear_pending_events(); | ||
| assert_eq!(events.len(), 1); | ||
| match events[0] { | ||
| Event::PendingHTLCsForwardable { .. } => { }, | ||
| _ => panic!("Unexpected event"), | ||
| }; |
There was a problem hiding this comment.
The comment essentially states how the following code block differs from expect_pending_htlcs_forwardable (i.e., "but we don't bother trying to update the channel state with them"). However, I think the comment would be more useful if it stated why we're not updating the channel state, assuming that this is pertinent to the test.
Also, given that this block of code and preceding comment is repeated a few times, I'd recommend making an appropriately named utility function for it, especially since it is almost identical to expect_pending_htlcs_forwardable. Then a comment might be unnecessary.
There was a problem hiding this comment.
Hmm, not sure what to say aside from "the test doesn't need to do the forwards". I converted it to a macro with "_ignore" so its a bit more DRY, thgouh.
| 'path_loop: for path in route.paths.iter() { | ||
| macro_rules! check_res_push { | ||
| ($res: expr) => { match $res { | ||
| Ok(r) => r, | ||
| Err(e) => { | ||
| results.push(Err(e)); | ||
| continue 'path_loop; | ||
| }, | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Glad to see a helper method added rather than using this macro! 😀
lightning/src/ln/channelmanager.rs
Outdated
| } | ||
|
|
||
| // Only public for testing, this should otherwise never be called direcly | ||
| pub(crate) fn send_payment_path(&self, path: &Vec<RouteHop>, payment_hash: &PaymentHash, payment_secret: &Option<PaymentSecret>, total_value: u64, cur_height: u32) -> Result<(), APIError> { |
There was a problem hiding this comment.
To others, I found this command helpful for reviewing since the diff contains mostly moved code:
git diff --color-moved=dimmed-zebra --color-moved-ws=allow-indentation-change 9224f01^ 9224f01 lightning/src/ln/channelmanager.rs
There was a problem hiding this comment.
^ this is what i was missing 😄
e799bf6 to
664987b
Compare
lightning/src/ln/channelmanager.rs
Outdated
|
|
||
| channel_state.claimable_htlcs.retain(|&(ref payment_hash, _), htlcs| { | ||
| htlcs.retain(|htlc| { | ||
| if height >= htlc.cltv_expiry - CLTV_CLAIM_BUFFER - LATENCY_GRACE_PERIOD_BLOCKS { |
There was a problem hiding this comment.
Would it make sense then to define an appropriately named constant for CLTV_CLAIM_BUFFER + LATENCY_GRACE_PERIOD_BLOCKS?
I see these constants used together in a few places, so if there is a name that suitably captures this concept, it may be worth defining such a constant to make the code more understandable and document it there instead.
lightning/src/ln/channelmanager.rs
Outdated
|
|
||
| channel_state.claimable_htlcs.retain(|&(ref payment_hash, _), htlcs| { | ||
| htlcs.retain(|htlc| { | ||
| if height >= htlc.cltv_expiry - CLTV_CLAIM_BUFFER - LATENCY_GRACE_PERIOD_BLOCKS { |
There was a problem hiding this comment.
Worth adding a check/panic in case the height is much higher than we expect, and we should be closing the channel rather than failing the HTLC backwards?
There was a problem hiding this comment.
Hmm, we could maybe be more strict about things like "the user is connecting a block at height 20 but we thought we were at height 10", but, in general, there's no reason to fail a channel to the chain just because we Definitely Cannot Claim such an HTLC.
lightning/src/ln/functional_tests.rs
Outdated
| commitment_signed_dance!(nodes[0], nodes[1], htlc_timeout_updates.commitment_signed, false); | ||
| let events = nodes[0].node.get_and_clear_pending_events(); | ||
| match events[0] { | ||
| Event::PaymentFailed { payment_hash, rejected_by_dest, error_code } => { |
There was a problem hiding this comment.
Getting this compilation error on this commit: https://i.imgur.com/dXJErz0.png
There was a problem hiding this comment.
Should be resolved by the later commit: "f rebase on ..."
| if height >= htlc.cltv_expiry - CLTV_CLAIM_BUFFER - LATENCY_GRACE_PERIOD_BLOCKS { | ||
| timed_out_htlcs.push((HTLCSource::PreviousHopData(htlc.prev_hop.clone()), payment_hash.clone(), htlc.value)); | ||
| let mut htlc_msat_height_data = byte_utils::be64_to_array(htlc.value).to_vec(); | ||
| htlc_msat_height_data.extend_from_slice(&byte_utils::be32_to_array(height)); |
There was a problem hiding this comment.
nice catch in this commit
|
|
||
| macro_rules! expect_payment_failed { | ||
| ($node: expr, $expected_payment_hash: expr, $rejected_by_dest: expr) => { | ||
| ($node: expr, $expected_payment_hash: expr, $rejected_by_dest: expr $(, $expected_error_code: expr, $expected_error_data: expr)*) => { |
There was a problem hiding this comment.
Why not just always check the error code/data?
There was a problem hiding this comment.
Just because that requires going back and writing in a new error check at every callsite. It could be done in a followup but its quite a bit of work.
valentinewallace
left a comment
There was a problem hiding this comment.
finished going through the commits :)
| // Pass the first HTLC of the payment along to nodes[3]. | ||
| let mut events = nodes[0].node.get_and_clear_pending_msg_events(); | ||
| assert_eq!(events.len(), 1); | ||
| pass_along_path(&nodes[0], &[&nodes[1], &nodes[3]], 0, payment_hash.clone(), Some(payment_secret), events.pop().unwrap(), false); |
There was a problem hiding this comment.
nit: could check that we aren't able to claim before the monitor gets successfully updated
There was a problem hiding this comment.
pass_along_path checks for us that we don't get the PaymentReceived event (the last false/true bool).
| let mut htlc_msat_height_data = byte_utils::be64_to_array(htlc.value).to_vec(); | ||
| htlc_msat_height_data.extend_from_slice(&byte_utils::be32_to_array(height)); | ||
| timed_out_htlcs.push((HTLCSource::PreviousHopData(htlc.prev_hop.clone()), payment_hash.clone(), HTLCFailReason::Reason { | ||
| failure_code: 0x4000 | 15, |
There was a problem hiding this comment.
I don't see why this failure code is "missing payment details" but above it's "expiry too soon"? expiry too soon makes more sense to me in both situations.. could be wrong but this just doesn't feel like the intended purpose of the "missing payment details" error.
There was a problem hiding this comment.
The spec for incorrect_or_unknown_payment_details says "or the CLTV expiry of the htlc is too close to the current block height for safe handling." :)
| })); | ||
| } | ||
| if let Some(funding_locked) = chan_res { | ||
| pending_msg_events.push(events::MessageSendEvent::SendFundingLocked { |
There was a problem hiding this comment.
nit: could add a log here as well similar to the one just below.
There was a problem hiding this comment.
Hmm? I'm not sure which you're referring to (the log line three down is sent in either if case.
664987b to
edb93e2
Compare
|
Addressed all the outstanding feedback, I think. |
edb93e2 to
05904e3
Compare
|
Great explanation of |
We only do this for incoming HTLCs directly as we rely on channel closure and HTLC-Timeout broadcast to fail any HTLCs which we relayed onwards where our next-hop doesn't update_fail in time.
expect_payment_failed!() was introduced after many of the tests which could use it were written, so we take this opportunity to switch them over now, increasing test coverage slightly by always checking the payment hash expected.
This is a key test for our automatic HTLC time-out logic, as it ensures we don't allow an HTLC which indicates we should wait for additional HTLCs before responding to cause us to force-close a channel due to HTLC near-timeout.
Relatively simple test that, after a monitor update fails, we get the right return value and continue with the bits of the MPP that did not send after the monitor updating is restored.
In case of committing out-of-time outgoing HTLCs, we force ourselves to close the channel to avoid remote peer claims on a non-backed HTLC
05904e3 to
d316f30
Compare
This was broken out from the rest of #441 as @valentinewallace pointed out that it wasn't strictly necessary and #441 was big. I am tagging it 0.0.11 because it is kinda necessary, just not strictly. Namely, if we get a partial payment, but not a full one, we really need to fail back the partial bits before the channel has to go to chain for it. Based on #441, and also includes another test that makes sense in #441 but requires some of the refactors that were already in here so might as well wait.