Discussion:
[PATCH] ipv4: mitigate an integer underflow when comparing tcp timestamps
(too old to reply)
Zhang Le
2010-11-14 07:40:01 UTC
Permalink
Behind a loadbalancer which does NAT, peer->tcp_ts could be much smaller than
req->ts_recent. In this case, theoretically the req should not be ignored.

But in fact, it could be ignored, if peer->tcp_ts is so small that the
difference between this two number is larger than 2 to the power of 31.

I understand that under this situation, timestamp does not make sense any more,
because it actually comes from difference machines. However, if anyone
ever need to do the same investigation which I have done, this will
save some time for him.

Signed-off-by: Zhang Le <***@gentoo.org>
---
net/ipv4/tcp_ipv4.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 8f8527d..1eb4974 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1352,8 +1352,8 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
peer->v4daddr == saddr) {
inet_peer_refcheck(peer);
if ((u32)get_seconds() - peer->tcp_ts_stamp < TCP_PAWS_MSL &&
- (s32)(peer->tcp_ts - req->ts_recent) >
- TCP_PAWS_WINDOW) {
+ ((s32)(peer->tcp_ts - req->ts_recent) > TCP_PAWS_WINDOW &&
+ peer->tcp_ts > req->ts_recent)) {
NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_PAWSPASSIVEREJECTED);
goto drop_and_release;
}
--
1.7.3.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Eric Dumazet
2010-11-14 09:00:03 UTC
Permalink
Post by Zhang Le
Behind a loadbalancer which does NAT, peer->tcp_ts could be much smaller than
req->ts_recent. In this case, theoretically the req should not be ignored.
But in fact, it could be ignored, if peer->tcp_ts is so small that the
difference between this two number is larger than 2 to the power of 31.
I understand that under this situation, timestamp does not make sense any more,
because it actually comes from difference machines. However, if anyone
ever need to do the same investigation which I have done, this will
save some time for him.
---
net/ipv4/tcp_ipv4.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 8f8527d..1eb4974 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1352,8 +1352,8 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
peer->v4daddr == saddr) {
inet_peer_refcheck(peer);
if ((u32)get_seconds() - peer->tcp_ts_stamp < TCP_PAWS_MSL &&
- (s32)(peer->tcp_ts - req->ts_recent) >
- TCP_PAWS_WINDOW) {
+ ((s32)(peer->tcp_ts - req->ts_recent) > TCP_PAWS_WINDOW &&
+ peer->tcp_ts > req->ts_recent)) {
NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_PAWSPASSIVEREJECTED);
goto drop_and_release;
}
This seems very wrong to me.

Adding a : if (peer->tcp_ts > req->ts_recent) condition is _not_ going
to help. And it might break some working setups, because of wrap around.

Really, if you have multiple clients behind a common NAT, you cannot use
this code at all, since NAT doesnt usually change TCP timestamps.

What about following patch instead ?

[PATCH] doc: extend tcp_tw_recycle documentation

tcp_tw_recycle should not be used on a server if there is a chance
clients are behind a same NAT. Document this fact before too many users
discover this too late.

Signed-off-by: Eric Dumazet <***@gmail.com>
---
Documentation/networking/ip-sysctl.txt | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index c7165f4..406f0d5 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -446,7 +446,12 @@ tcp_tso_win_divisor - INTEGER
tcp_tw_recycle - BOOLEAN
Enable fast recycling TIME-WAIT sockets. Default value is 0.
It should not be changed without advice/request of technical
- experts.
+ experts. If you set it to 1, make sure you dont miss connections
+ attempts (check LINUX_MIB_PAWSPASSIVEREJECTED netstat counter).
+ In particular, this might break if several clients are behind
+ a common NAT device, since their TCP timestamp wont be changed
+ by the NAT. tcp_tw_recycle should be used with care, most
+ probably in private networks.

tcp_tw_reuse - BOOLEAN
Allow to reuse TIME-WAIT sockets for new connections when it is


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Zhang Le
2010-11-14 15:00:02 UTC
Permalink
Post by Eric Dumazet
Post by Zhang Le
Behind a loadbalancer which does NAT, peer->tcp_ts could be much smaller than
req->ts_recent. In this case, theoretically the req should not be ignored.
But in fact, it could be ignored, if peer->tcp_ts is so small that the
difference between this two number is larger than 2 to the power of 31.
I understand that under this situation, timestamp does not make sense any more,
because it actually comes from difference machines. However, if anyone
ever need to do the same investigation which I have done, this will
save some time for him.
---
net/ipv4/tcp_ipv4.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 8f8527d..1eb4974 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1352,8 +1352,8 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
peer->v4daddr == saddr) {
inet_peer_refcheck(peer);
if ((u32)get_seconds() - peer->tcp_ts_stamp < TCP_PAWS_MSL &&
- (s32)(peer->tcp_ts - req->ts_recent) >
- TCP_PAWS_WINDOW) {
+ ((s32)(peer->tcp_ts - req->ts_recent) > TCP_PAWS_WINDOW &&
+ peer->tcp_ts > req->ts_recent)) {
NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_PAWSPASSIVEREJECTED);
goto drop_and_release;
}
This seems very wrong to me.
Adding a : if (peer->tcp_ts > req->ts_recent) condition is _not_ going
to help. And it might break some working setups, because of wrap around.
Yeah, you are right. And sorry for overlooking this.

I should have reviewed time_{before,after}'s implementation before posting this.

So it seems we can't do anything to improve this except to add some warning in
documentation. Maybe some comments in the code too.
Post by Eric Dumazet
Really, if you have multiple clients behind a common NAT, you cannot use
this code at all, since NAT doesnt usually change TCP timestamps.
What about following patch instead ?
[PATCH] doc: extend tcp_tw_recycle documentation
tcp_tw_recycle should not be used on a server if there is a chance
clients are behind a same NAT. Document this fact before too many users
discover this too late.
---
Documentation/networking/ip-sysctl.txt | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index c7165f4..406f0d5 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -446,7 +446,12 @@ tcp_tso_win_divisor - INTEGER
tcp_tw_recycle - BOOLEAN
Enable fast recycling TIME-WAIT sockets. Default value is 0.
It should not be changed without advice/request of technical
- experts.
+ experts. If you set it to 1, make sure you dont miss connections
+ attempts (check LINUX_MIB_PAWSPASSIVEREJECTED netstat counter).
+ In particular, this might break if several clients are behind
+ a common NAT device, since their TCP timestamp wont be changed
+ by the NAT. tcp_tw_recycle should be used with care, most
+ probably in private networks.
tcp_tw_reuse - BOOLEAN
Allow to reuse TIME-WAIT sockets for new connections when it is
--
Zhang, Le
Gentoo/Loongson Developer
http://zhangle.is-a-geek.org
0260 C902 B8F8 6506 6586 2B90 BC51 C808 1E4E 2973
David Miller
2010-11-14 20:00:02 UTC
Permalink
From: Eric Dumazet <***@gmail.com>
Date: Sun, 14 Nov 2010 09:52:25 +0100
Post by Eric Dumazet
Really, if you have multiple clients behind a common NAT, you cannot use
this code at all, since NAT doesnt usually change TCP timestamps.
NAT is %100 incompatible with TW recycling, full stop.

There is no maybe, or maybe not.

If you are behind NAT you must not turn this feature on, ever.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
ZHANG, Le
2010-11-25 17:00:02 UTC
Permalink
Post by David Miller
Date: Sun, 14 Nov 2010 09:52:25 +0100
Post by Eric Dumazet
Really, if you have multiple clients behind a common NAT, you cannot use
this code at all, since NAT doesnt usually change TCP timestamps.
NAT is %100 incompatible with TW recycling, full stop.
There is no maybe, or maybe not.
If you are behind NAT you must not turn this feature on, ever.
Sorry, this question may be OT on this list, but I am just curious:

Is there any other OS has implemented this feature like Linux?

To be very specific, by this feature, I mean rejecting old duplicates based
on per-host cache of last timestamp received from any connections.
As suggested in RFC1323 Appendix B.2 (b).

Does anyone, by any chance, know the answer? Thanks in advance!
--
ZHANG, Le
http://zhangle.is-a-geek.org
0260 C902 B8F8 6506 6586 2B90 BC51 C808 1E4E 2973
Loading...