Discussion:
[Cerowrt-devel] apu2 sqm/htb issue + a minor win for speeding up fq_codel itself
Dave Taht
2018-09-04 19:59:15 UTC
Less than scientifically (via monitoring top), on the APU2:

100 Mbit sqm (htb + fq_codel)

                   fq_codel_mainline | fq_codel_fast
idle (CPU %)       78.8              | 83.5
si (softirq %)     20                | 16.1

Yay! But:

900 Mbit sqm (htb + fq_codel)

                   fq_codel_mainline | fq_codel_fast
idle (CPU %)       74.4              | 74.4
si (softirq %)     25                | 25.1

Here I'm completely bottlenecked on ksoftirqd, and I only get 340 Mbit/s
out of the 900 Mbit/s setting (quantum 96k, burst 15000). I haven't
fiddled with higher values yet...
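For a rough sense of what these top readings mean, here is a small Python sketch that works out the relative change in softirq time and the fraction of the configured rate actually delivered. The figures are transcribed by hand from the tables above; reading top's si column as the softirq share is the only interpretation involved, and `relative_change` is just an illustrative helper.

```python
# Hand-transcribed top readings (percent of CPU) from the tables above.
READINGS = {
    "100 Mbit": {"mainline": {"idle": 78.8, "si": 20.0}, "fast": {"idle": 83.5, "si": 16.1}},
    "900 Mbit": {"mainline": {"idle": 74.4, "si": 25.0}, "fast": {"idle": 74.4, "si": 25.1}},
}


def relative_change(before: float, after: float) -> float:
    """Percent change from `before` to `after`; negative means a reduction."""
    return (after - before) / before * 100.0


for rate, r in READINGS.items():
    delta = relative_change(r["mainline"]["si"], r["fast"]["si"])
    print(f"{rate}: si {r['mainline']['si']}% -> {r['fast']['si']}% ({delta:+.1f}%)")

# Throughput actually achieved versus the 900 Mbit/s shaper setting.
achieved_mbit, configured_mbit = 340, 900
print(f"delivered {achieved_mbit / configured_mbit:.0%} of the configured rate")
```

At 100 Mbit/s the fast variant trims softirq time by roughly a fifth; at 900 Mbit/s the difference is within noise, which is consistent with fq_codel itself not being the limiting factor at that rate.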
--
Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619
Dave Taht
2018-09-04 21:14:07 UTC
Making HTB's burst and cburst parameters 64k gets the APU2 up to
where it can shape 900 Mbit/s. Three ksoftirqd threads start getting CPU
time, and we end up 54% idle to achieve that.

I should really get around to running my own old code. I was deeply
involved in sqm when we still had to run at sub-200 Mbit levels. Since
then it's been mostly tbf (burst 64k) + fq_codel or cake, and me ignoring
various bug reports about it not scaling well enough at higher rates.
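One way to see why the burst size matters at 900 Mbit/s is to convert each burst into line time. This is a minimal Python sketch; it assumes tc's `64k` means 65,536 bytes, and the idea that the shaper releases roughly one burst per servicing opportunity is an interpretation, not something measured in this thread.

```python
def burst_duration_us(burst_bytes: int, rate_bps: float) -> float:
    """Microseconds of transmission time that one burst covers at rate_bps."""
    return burst_bytes * 8 / rate_bps * 1e6


RATE_BPS = 900e6  # the 900 Mbit/s shaper setting
# "64k" is assumed to mean 64 * 1024 bytes.
for label, burst in (("burst 15000", 15_000), ("burst 64k", 64 * 1024)):
    print(f"{label}: {burst_duration_us(burst, RATE_BPS):.0f} us of traffic at 900 Mbit/s")
```

Under that interpretation, a 15000-byte burst needs the shaper serviced about every 133 us to keep the link busy, while 64k stretches that to roughly 580 us, a much easier target when ksoftirqd is competing for the CPU.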
Dave Taht
2018-09-04 21:16:27 UTC
My guess is that burst and cburst should scale roughly as a function
of the bytes that can fit into 1 ms.
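Taken literally, that rule of thumb gives a burst of one millisecond's worth of bytes at the shaped rate. A small Python sketch, assuming a 1 ms interval and ignoring any MTU floor; `burst_for_rate` is a hypothetical helper, not something in sqm-scripts or tc:

```python
def burst_for_rate(rate_bps: int, interval_us: int = 1000) -> int:
    """Bytes a link at rate_bps carries in interval_us microseconds, rounded up."""
    numerator = rate_bps * interval_us
    denominator = 8 * 1_000_000  # 8 bits per byte, 1e6 microseconds per second
    return -(-numerator // denominator)  # integer ceiling division


for mbit in (100, 250, 900):
    print(f"{mbit} Mbit/s: burst ~{burst_for_rate(mbit * 1_000_000):,} bytes")
```

By this estimate 1 ms at 900 Mbit/s is 112,500 bytes, so the 64k that worked there is a little over half a millisecond's worth (about 0.58 ms), while the original 15000-byte burst was only about 0.13 ms.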
Dave Taht
2018-09-06 18:03:44 UTC
I filed a bug here. Anyone with a non-APU product (EdgeRouter? Omnia?)
struggling with shaping?

https://github.com/tohojo/sqm-scripts/issues/71
Cool, well I for one would like to see the APU be able to handle higher speeds, for FreeNet’s backhaul, at least. Although frankly, I’ve not definitively witnessed any significant bloat in their backhaul yet with production traffic.
A good number of their routers are still ALIX (https://www.pcengines.ch/alix2d2.htm), all of which are on an upgrade list. These don’t do hfsc + sfq on kernel 2.6.26 much beyond about 70 Mbit. Not a problem to focus on… :)
Mikael Abrahamsson
2018-10-03 13:56:37 UTC
Post by Dave Taht
I put a bug here. Someone with a non apu product struggling with
shaping (edgerouter? omnia?)
https://github.com/tohojo/sqm-scripts/issues/71
Yes, I have the same problem. My WRT1200AC (Marvell Armada 385) has
seriously degraded performance in OpenWrt 18.06.1 compared to whatever was
in 17.01.x; I'd say a factor of 3-4 worse.
--
Mikael Abrahamsson email: ***@swm.pp.se
Dave Taht
2018-10-03 14:44:27 UTC
OK, I'm basically seeing that too on the same hardware. I can barely
shape 100 Mbit inbound. But it's not cake: fq_codel is also running out
of CPU.

I *think*, but am not sure, this box could do a lot more prior to
this, but I never really tried. I'm mostly off debugging a babel
problem at the moment, and (sigh) having no IPv6, a babel bug, and a
severe performance hit thus far in this release is depressing as hell.

Life was better for everyone when we spent more time rigorously
testing stuff before it got released. I still have one good ole
CeroWrt box online, no money coming in, and while I did get a reprieve
on having to close up the lab, I have to give up the yurt shortly.
Dave Taht
2018-10-03 14:45:35 UTC
At least the BSD version's issues may have settled down somewhat:
https://forum.netgate.com/topic/112527/playing-with-fq_codel-in-2-4/553
Mikael Abrahamsson
2018-10-03 15:30:00 UTC
Post by Dave Taht
I *think*, but am not sure, this box could do a lot more prior to
this, but I never really tried. I'm off mostly debugging a babel
problem at the moment,
I know for a fact that this box (WRT1200AC) did gigabit at MSS=400 one-way
using fq_codel/cake before. I tested it a lot back then. Right now, I am
using it as a 250/100 megabit/s machine, and it seems to spend a lot of CPU
doing that.
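Small-MSS tests are hard on a router because much of the per-packet work scales with packets, not bytes. A rough Python sketch of the packet rate a given goodput implies, assuming every segment carries a full MSS of payload and ignoring TCP/IP headers and Ethernet framing; `segments_per_second` is an illustrative helper name:

```python
def segments_per_second(goodput_bps: float, mss_bytes: int) -> float:
    """Full-MSS TCP segments per second needed to carry goodput_bps of payload."""
    return goodput_bps / (8 * mss_bytes)


# Gigabit of payload at MSS 400 (the old WRT1200AC result) versus a typical
# 1460-byte MSS on a 1500-byte Ethernet MTU path, for comparison.
for mss in (400, 1460):
    print(f"1 Gbit/s at MSS {mss}: ~{segments_per_second(1e9, mss):,.0f} segments/s")
```

Under these assumptions, MSS 400 means 1460/400 = 3.65 times as many packets per second as full-sized segments for the same throughput.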
Dave Taht
2018-10-03 16:05:12 UTC
Well, fq_codel does use a lot less CPU, but everything seems slower...
Toke Høiland-Jørgensen
2018-10-03 17:43:32 UTC
Post by Dave Taht
Well, fq_codel does use a lot less cpu but everything seems slower....
I don't suppose 18.06 enables any of the SPECTRE mitigations (was that
an issue on ARM)?

-Toke
Dave Taht
2018-10-03 17:53:00 UTC
Post by Toke Høiland-Jørgensen
I don't suppose 18.06 enables any of the SPECTRE mitigations (was that
an issue on ARM)?
I have no idea, but certainly those could be a factor.
Jonathan Morton
2018-10-03 18:32:23 UTC
Post by Toke Høiland-Jørgensen
I don't suppose 18.06 enables any of the SPECTRE mitigations (was that
an issue on ARM)?
That depends on the ARM core involved. Most of them in CPE devices (e.g. Cortex-A5/7/53) have in-order execution engines, so should be immune, but it's not inconceivable that some of the mitigations are enabled regardless.

The WRT1200AC uses the Marvell 88F6820 which has a pair of Cortex-A9 cores. These are mildly out-of-order engines which would be at least theoretically vulnerable to Spectre v1, but that is not a kernel-level exploit. According to https://www.techarp.com/guides/complete-meltdown-spectre-cpu-list/4/#arm the Cortex-A9 is also vulnerable to Spectre v2 which is the branch-predictor poisoning attack, for which kernel-level mitigations may be appropriate. It is however immune to Meltdown.

I'm not familiar with precisely what mitigations are now in use on ARM. I am however certain that, on a device running only trustworthy code (i.e. not running a Web browser), mitigating Spectre is unnecessary. If an attacker gets into a position to exploit it, he's already compromised the device enough to run a botnet anyway.

- Jonathan Morton
Toke Høiland-Jørgensen
2018-10-03 20:12:50 UTC
Post by Jonathan Morton
I'm not familiar with precisely what mitigations are now in use on
ARM. I am however certain that, on a device running only trustworthy
code (ie. not running a Web browser), mitigating Spectre is
unnecessary. If an attacker gets into a position to exploit it, he's
already compromised the device enough to run a botnet anyway.
Yup, especially on openwrt, where most daemons run as root anyway :)

I would assume that something like the retpoline indirect function call
protection is not actually enabled on openwrt; but since we were talking
about performance regressions, that is certainly a major one...

-Toke
Mikael Abrahamsson
2018-10-05 12:52:41 UTC
Post by Mikael Abrahamsson
I know for a fact that this box (WRT1200AC) did gigabit at MSS=400 one-way
using fq_codel/cake before. I tested it a lot back then. Right now, I am
using it as a 250/100 megabit/s machine, and it seems to spend a lot CPU
doing that.
I did some new tests. Now I can't reproduce the problem.

I installed 18.06.1, and it'll do 550 megabit/s (as shown in iperf3) with a
single TCP flow at MSS 200 (-M 200 in iperf3), with 87% sirq shown in top on
the WRT1200AC. When I set the in/out speed to 800M and enable
cake/layer_cake.qos, performance drops to ~400 megabit/s with the same packet
flow. So the performance degradation is only around 25-30%, which I think is
perfectly acceptable. I get very similar results with fq_codel and simple.qos.
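Working the two iperf3 figures through, treating ~400 as exactly 400 megabit/s, gives the size of the shaping cost. A minimal Python sketch; `throughput_drop_pct` is just an illustrative name:

```python
def throughput_drop_pct(before_mbps: float, after_mbps: float) -> float:
    """Percentage of throughput lost going from before_mbps to after_mbps."""
    return (before_mbps - after_mbps) / before_mbps * 100


# Single flow, MSS 200 on the WRT1200AC: 550 without the 800M cake shaper,
# ~400 (taken as 400) with it enabled.
unshaped, shaped = 550, 400
drop = throughput_drop_pct(unshaped, shaped)
print(f"{unshaped} -> {shaped} Mbit/s: {drop:.1f}% drop, {100 - drop:.1f}% retained")
```

That comes to roughly 27%, so shaping at 800M with cake costs a bit over a quarter of the unshaped single-flow throughput in this test.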

I also took some power meter readings: the WRT1200AC idles at 9.2 W and
goes up to 10.4 W at full CPU.