Discussion:
[Cerowrt-devel] inbound cake or fq_codel shaping fails on cable on netflix reno
Dave Taht
2018-07-21 16:09:59 UTC
Permalink
This is something I noticed years ago; it inspired me to think about a
"bobbie" policer, but I decided I could live with it and never poked
into it further. After our successes with shaping inbound cable at
then-typical 20mbit rates, I was happy, although never satisfied with
the 85% magic number we use to come up with a set rate for inbound
shaping.

Simultaneously with all that work we did on sqm, linux added tcp tsq,
pacing, etc, etc, and in general inbound shaping of cable seems to work ok
against one or more linux tcp flows. We end up with some persistent
queuing delay (at the cmts, in the 5-15ms range) which I've generally
assumed was unavoidable and which fq cannot cut through (oh, I dream of FQ
at the CMTS!)

BUT: when running fast.com's test, I see inbound shaping fail to hold
delay to anything sane, with spikes of up to 160ms.

So, anyway, I've seen this pathology before with netflix flows. What I
didn't realize then was that it was independent of the shaped rate
(see plots). It's independent of the rtt setting in cake too (at least,
down to 25ms). My assumption is that the netflix (freebsd) traffic + the
cable is so bursty as to not let codel keep improving its drop rate.

I'm curious if it's also the case on other link layer techs (dsl, fiber)?

The structure of my test is simple: shape inbound to 80% of the
cablemodem rate, set up irtt (not strictly needed), start this:

root # flent -H flent-fremont.bufferbloat.net -t 'your_location' -s .02 ping

let it run a few sec, then run the fast.com test.
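
For reference, the inbound half of that setup (sqm-scripts does the
equivalent; interface names and the 80mbit figure here are just
placeholders for your own link) looks roughly like:

ip link add name ifb4eth0 type ifb
ip link set ifb4eth0 up
tc qdisc add dev eth0 handle ffff: ingress                  # attach ingress hook on the wan side
tc filter add dev eth0 parent ffff: protocol all u32 match u32 0 0 \
        action mirred egress redirect dev ifb4eth0          # redirect all inbound traffic to the ifb
tc qdisc add dev ifb4eth0 root cake bandwidth 80mbit docsis ingress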

I tried to verify this using linux reno on a recent kernel, but that
seemed healthy. Perhaps it didn't actually get reno, or perhaps this
weirdness is a function of the RTT.

1) Can someone else on a cablemodem (even without the latest cake,
this happens to me on older cake and fq_codel) try this test?
2) Can someone with a dsl or fiber device try this test?
3) Is there a freebsd box "out on the net", 45ms or so from me, that we
can set up netperf/irtt/etc on to run flent with? (I can donate a linode
in LA, but we'd need someone that can set up freebsd)

Some pics attached, flent data files at:
http://www.taht.net/~d/fast_vs_cable.tgz

PS I also have two other issues going on. This is the first time I've
been using irtt with a 20ms interval, and I regularly see single 50+ms
spikes (in both the ping and irtt data) and also see irtt stop
transmitting. On this front, it could merely be that my (not tested in
months!) test cablemodem setup is malfunctioning too! Or we're
hitting power save. Or (finally) seeing request-grant delays. Or
scheduling delay somewhere in the net-next kernel I'm using... Or....
(regardless, this seems independent of my main issue, and I've not had
such high-res data before)
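
(For cross-checking outside of flent, something like this should generate
the same 20ms probe stream directly, assuming the flent server is also
running an irtt server on its default port:)

irtt client -i 20ms -d 30s flent-fremont.bufferbloat.net
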
--
Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Jonathan Morton
2018-07-21 17:23:12 UTC
Permalink
I got the same result as you. This is using latest cake.
I'd like to see a tcptrace of what's going on here. A packet capture with snaplen 100 should allow me to generate one.
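
(Something along these lines should produce a capture tcptrace can chew
on; the interface, filename and the port-443 filter are just my guesses
at a reasonable setup:)

tcpdump -i eth0 -s 100 -w fastcom.pcap port 443      # snaplen 100, headers only
tcptrace -G fastcom.pcap                             # emits .xpl graph files viewable with xplot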

- Jonathan Morton
Dave Taht
2018-07-21 17:47:48 UTC
Permalink
for reference can you do a download and capture against flent-newark,
while using the ping test?
Post by Jonathan Morton
I'd like to see a tcptrace of what's going on here. A packet capture
with snaplen 100 should allow me to generate one.
I ran it again, with net.ipv4.tcp_congestion_control=reno.
Same settings as before. 'tcpdump -s 100' ran on the host (not the
router).
Georgios
--
Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619
Dave Taht
2018-07-21 18:20:14 UTC
Permalink
hmm? you only have 15mbits down?
Post by Dave Taht
for reference can you do a download and capture against flent-newark,
while using the ping test?
1) Started a ping test using the flent-fremont server.
2) Started a tcp_8down test (for 15 seconds) using the flent-newark
server. I chose tcp_8down since fast.com was also using 8 flows.
3) Captured on the host where the above tests ran.
It seems to be working as expected here.
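
(Roughly, those steps map onto commands like these - hostnames are the
ones from this thread, the capture interface is a placeholder:)

flent -H flent-fremont.bufferbloat.net -s .02 ping      # 1) latency test, left running
flent -H flent-newark.bufferbloat.net -l 15 tcp_8down   # 2) 8-flow download for 15 seconds
tcpdump -i eth0 -s 100 -w tcp_8down.pcap                # 3) capture on the test host
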
Georgios
--
Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619
Dave Taht
2018-07-21 20:01:07 UTC
Permalink
To summarize:

A) I think the magic 85% figure only applies at lower bandwidths.
B) We are at least partially in a pathological situation where:

CMTS = 380ms of buffering, token bucket fifo at 100mbit
Cakebox: AQMing and trying to shape below 85mbit, gradually ramping up
the signalling from once per 100ms downwards.

The cmts buffer fills more rapidly, particularly in slow start, while
presenting packets to the inbound shaper at 100mbit. cake starts
signalling, late, trying to achieve its target rate, but at that point
the apparent RTTs are still growing rapidly (because of the buffer
building up in the cmts inflating that RTT), so as fast as we signal,
we've got such a big buffer built up in the CMTS that tcp only sees one
signal per (inflated) RTT, which is mismatched against what cake is
trying to thwart. The pathology persists.
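
To put rough numbers on that (waving hands at least as much as above, and
assuming the 380ms / 100mbit / 85mbit figures):

380ms of buffering at 100mbit ~= 38mbits ~= 4.75 MB of standing queue at the CMTS
drain headroom with tcp held near cake's 85mbit: 100 - 85 = 15mbit/s
time to empty a full CMTS buffer: 38 / 15 ~= 2.5 seconds

and during that whole drain, every reaction to a codel mark/drop comes
back a full, already-inflated RTT later.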

The idea for bobbie was that the goal for codel is wrong for inbound
shaping: instead of aiming for a rate, we needed to sum all the
overage over our rate and reduce that until it all drains from the cmts
shaper. So, let's say (waving hands a bit here)

we get 160mbits/sec for 8 seconds with an outbound shaped rate of 100.
That's 480mbits of overage (60mbits/sec over, for 8 seconds, independent
of any signalling we did to try to reduce it) "stuck" up there. We're
trying to gradually get it down to 85mbits/sec, but the signalling is now
so far behind the tcp's now-observed actual rtt that it takes forever to
get anywhere and we end up in steady state.

The more aggressive flows you have, the worse this disparity gets.

Using, perhaps, cake's ingress estimator, it seems possible to "bob" the
rate down until it drains, or to work on draining the built-up queue more
aggressively than the gentle approach fq_codel uses, policer style.
Post by Dave Taht
for reference can you do a download and capture against flent-newark,
while using the ping test?
1) Started a ping test using the flent-fremont server.
2) Started a tcp_8down test (for 15 seconds) using the flent-newark
server. I chose tcp_8down since fast.com was also using 8 flows.
3) Captured on the host where the above tests ran.
It seems to be working as expected here.
Georgios
--
Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619
Jonathan Morton
2018-07-21 20:24:11 UTC
Permalink
Post by Dave Taht
The cmts buffer fills more rapidly, particularly in slow start, while
presenting packets to the inbound shaper at 100mbit. cake starts
signalling, late, trying to achieve its target rate, but at that point
the apparent RTTs are still growing rapidly (because of the buffer
building up in the cmts inflating that RTT), so as fast as we signal,
we've got such a big buffer built up in the CMTS that tcp only sees one
signal per (inflated) RTT, which is mismatched against what cake is
trying to thwart. The pathology persists.
The idea for bobbie was that the goal for codel is wrong for inbound
shaping: instead of aiming for a rate, we needed to sum all the
overage over our rate and reduce that until it all drains from the cmts
shaper.
Another possibility, which I've previously mentioned but haven't got around to implementing, is to give ECN more flexibility in signalling - so that it can indicate impending congestion as well as actual congestion.

That is, as well as the present CE mark meaning "back off now", there may be softer signals carried on the dual encodings of ECT (ECT(0) and ECT(1)), meaning "ramp down now", "don't ramp up", and "ramp up only with caution". These signals can be given without delay, according to instantaneous conditions at the bottleneck, without needing to estimate path RTT. You could think of it as a version of DCTCP that can actually be deployed in the internet, because it doesn't destroy the existing meaning of CE.

The main problem is with getting the endpoints (both receiver and sender) to recognise these new signals and react appropriately to them. Producing these signals at the AQM is relatively easy. I think I worked out a way to do it with the two padding bytes that normally accompany the Timestamp option in TCP - this requires replacing the Timestamp option with one that has the same semantics, but also carries the extra data about recent ECT marks, and doesn't require padding to be naturally aligned in the packet.

This would give a way to halt slow-start when it reaches roughly the correct window size, instead of having it overshoot first. It would also give a way to gently control the cwnd to the ideal value while in steady-state, instead of oscillating around it.

- Jonathan Morton
Dave Taht
2018-07-21 20:36:51 UTC
Permalink
This is my "inbound trying to shape a cable connection" smoking gun:
the delay curve is the same whether shaping the 110mbit cmts down to
85mbit OR to 55mbit.
Dave Taht
2018-07-21 17:27:15 UTC
Permalink
Yours is not as horrific as mine in either case.

Can you provide an unshaped result as well?
Two more data points. Shaped my connection to 250Mbit out of the advertised 250Mbit (my usual setting) and shaped to 200Mbit out of the 250Mbit. This is a pre-linux-net-next cake running on an Edgemax ER4 with kernel 3.10.107-UBNT.
The regular spikes in ICMP ping are due to the crappy Puma 6 chipset in the cable modem. UDP is not affected.
Post by Dave Taht
1) Can someone else on a cablemodem (even without the latest cake,
this happens to me on older cake and fq_codel) try this test?
I just tried this on my cable comcast connection. I set ingress to ~80%
of what fast.com reports when no shaper is in place.
#tc qdisc add dev ens4 root handle 8011 cake bandwidth 16000kbit dual-dsthost docsis ingress
#tc qdisc add dev ens3 root handle 8012 cake bandwidth 2500kbit dual-srchost nat docsis ack-filter
I got the same result as you. This is using latest cake.
Georgios
--
Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619
Dave Taht
2018-07-21 17:42:43 UTC
Permalink
net.ipv4.tcp_congestion_control=cubic
net.ipv4.tcp_congestion_control=reno
Georgios
In the fast test this has no effect on the remote server's tcp; it's
always going to be reno.

Trying to cross-check behavior using our tests...

There isn't a specific reno-setting test in flent for tcp_download, as
best I recall, so I was just calling netperf -H wherever -l 60 --
-K reno,reno

then running the flent ping test as previously mentioned.

(flent-fremont.bufferbloat.net and flent-newark both support reno bbr
and cubic, I haven't checked the others)

PS A side note is that we are not fully succeeding in moving the
inbound bottleneck to cake (at least in the cable case), as we do get
quite a bit of queuing delay even with linux tcp driving the tests.
I'd long written this off as inevitable, due to the bursty cable mac,
but I'm grumpy this morning. 0 delay via fq would be better than even
the 15-40ms I'm getting now with linux flows.....

--
Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619
Toke Høiland-Jørgensen
2018-07-21 19:57:25 UTC
Permalink
Post by Dave Taht
net.ipv4.tcp_congestion_control=cubic
net.ipv4.tcp_congestion_control=reno
Georgios
In the fast test this has no effect on the remote server's tcp, it's
always going to be reno.
Trying to cross-check behavior using our tests...
There isn't a specific reno setting test in flent for tcp_download as
best as I recall, so I was just calling netperf -H wherever -l 60 --
-K reno,reno
--test-parameter tcp_cong_control=reno should work for all tests...
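
e.g. something along these lines (untested here) for an explicit reno download:

flent -H flent-newark.bufferbloat.net -l 60 --test-parameter tcp_cong_control=reno tcp_download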

-Toke
Sebastian Moeller
2018-07-22 10:29:53 UTC
Permalink
I believe that cable modems all default to 192.168.100.1, this seems to be backed by "Cable Modem Operations Support System Interface Specification", CM-SP-CM-OSSIv3.1-I04-150611:

" • The CM MUST support 192.168.100.1, as the well-known diagnostic IP address accessible only from the CMCI interfaces. The CM MUST support the well-known diagnostic IP address, 192.168.100.1, on all physical interfaces associated with the CMCI. The CM MUST drop SNMP requests coming from the RF interface targeting the well-known IP address."

There might be exceptions to this, but I would be amazed if they were common...

so:

sudo ping -l 100 -c 5000 -i 0.001 192.168.100.1

should work on all/most docsis setups.
Post by Dave Taht
PS I also have two other issues going on. This is the first time I've
been using irtt with a 20ms interval, and I regularly see single 50+ms
spikes (in both ping and irtt) data and also see irtt stop
transmitting.
irtt should keep sending for the duration of the test. I noticed that it looks like irtt was actually used in only one of these initial tests: ping-2018-07-21T082842.445812.flent-newark-reno.flent. In the rest, netperf UDP_RR was used, which can stop sending upon packet loss.
If irtt was configured but didn’t run, that may be because flent does a connectivity check to the server with “irtt client -n”, where it sends three requests within 900ms (200ms timeout, then 300ms then 400ms) and if it doesn’t receive a reply, it falls back to netperf UDP_RR. Do you think that’s what happened here?
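
(So a quick manual check from the client, using the same probe flent uses, would be something like:)

irtt client -n flent-fremont.bufferbloat.net
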
Post by Dave Taht
On this front, it could merely be that my (not tested in
months!) test cablemodem setup is malfunctioning also! Or we're
hitting power save. Or (finally) seeing request-grant delays. Or
scheduling delay somewhere in the net-next kernel I'm using... Or....
(regardless, this seems independent of my main issue, and I've not had
such high res data before)
Regarding the spikes both you and Arie are seeing (I also saw, in one of your later emails, "0 delay via fq would be better than even the 15-40ms I'm getting now with linux flows"), this may be related to what I reported here:
https://community.ubnt.com/t5/airMAX-Installation/NanoStation-M5-ping-spikes-about-once-per-second-even-just-to/m-p/2359800/highlight/true#M119202
To summarize, with airOS on the NanoStation M5, there are isochronous pauses around once per second in the processing of all packets, not just for the WiFi device but for Ethernet as well. Packets are not lost, but queued for either 20ms, if one Ethernet port is connected, or 40ms, if both are connected. This behavior is exactly described by the ar7240sw_phy_poll_reset function in ag71xx_ar7240.c, so it looks to me like the ar7240 internal switch is being reset once per second for no apparent reason. So far I've gotten crickets in response.
sudo ping -l 100 -c 5000 -i 0.001 cablemodem
Now, back to vacation :)
Dave Taht
2018-07-24 02:36:51 UTC
Permalink
George, does your result mean you also have a crappy cablemodem?
Post by Dave Taht
1) Can someone else on a cablemodem (even without the latest cake,
this happens to me on older cake and fq_codel) try this test?
I just tried this on my cable comcast connection. I set ingress to ~80%
of what fast.com reports when no shaper is in place.
#tc qdisc add dev ens4 root handle 8011 cake bandwidth 16000kbit dual-dsthost docsis ingress
#tc qdisc add dev ens3 root handle 8012 cake bandwidth 2500kbit dual-srchost nat docsis ack-filter
I got the same result as you. This is using latest cake.
Georgios
--
Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619