Discussion:
[Cerowrt-devel] some notes on the archer c7v2's suitability for make-wifi-fast
Dave Taht
2015-03-27 02:10:50 UTC
I took the archer c7v2 out for a set of test runs over the weekend.

A) The good news: I couldn't crash it with a full workload, nor
overheat it with external temps at 23C. I had tested the 3800 with
an external temp of 44C, and I would prefer to test any new product at
that temperature before wanting to use it here.

It was easy to configure from my test build on snapon, only needing
the addition of the kmod-ath10k package to have support for both
radios. The gui seems to work well in that test build.

the "new" cake2 shaper/fq/codel qdisc (barely) managed to deal with
115mbit down and 12mbit up with 5% or so of cpu to spare with bridging
and hostapd turned off. (htb + fq_codel fell apart at 90mbits.) I
think cake can be improved quite a bit more and we really need to do
some profiling to find other bottlenecks.

having both an ath9k (fixable) and an ath10k (ac, probably not
fixable) in the box is something of a plus also.

B) The bad news: I didn't get around to testing wifi at all. I ran
into an interesting problem, where testing it with full nat enabled,
with no sqm-scripts, would peak at about 400 mbits on the rrul test,
and:

0) If hostapd was running it would use a lot of cpu, cutting performance by
about 50Mbits on the test runs. I think I posted the strace here
already.

1) the queues for both eth1 and eth0 (wan and lan, respectively) would
fill up - quite a lot, 100s of packets, on the rrul tests.

Even though the base rate of the ethernet interfaces was a gigabit,
the box could not service interrupts fast enough to clear out the
device queues in either direction, thus engaging fq_codel as part of
the overall cpu overload-handling mechanism to reduce the queue sizes
somewhat.

So I saw fairly long delays (7ms or more) when running at these
speeds through the router.
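
As a back-of-envelope check on those numbers, the sketch below converts a standing queue depth into drain time. The 1514-byte frame size and the use of 400 Mbit/s as the service rate are assumptions for illustration; the actual per-interface rate and packet mix on the rrul test were not recorded here.

```python
def queue_drain_ms(packets, bytes_per_packet, rate_bits_per_s):
    """Milliseconds needed to drain a standing queue at a given service rate."""
    return packets * bytes_per_packet * 8 / rate_bits_per_s * 1000.0


if __name__ == "__main__":
    # Assumed figures: full-size 1514-byte ethernet frames and a service
    # rate of 400 Mbit/s (the approximate rrul peak mentioned above).
    for depth in (100, 250, 500):
        delay = queue_drain_ms(depth, 1514, 400e6)
        print(f"{depth} packets -> {delay:.2f} ms")
```

Under those assumptions a queue of a few hundred packets lands in the same range as the 7ms observed, which is at least consistent with the device queues being the source of the delay.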

While reducing queue size when running out of cpu is a pleasing
result, it also points to possible tuning options for napi (maybe
adding xmit_more support, or removing it entirely) in order to
fully service all interrupts in one direction or another. It also
points to compile options specific to the mips74k, which has a
particularly long pipeline. So far as I know the archer has no issues
(as the wndr3800 had) with unaligned access, so we can turn off
various hacks on that front.

A linux feature I have long longed for is to do all timestamping (as
well as calculating the 5 tuple hash) on the rx path, with the tx path
leveraging that hash to fq on, and merely checking the rx entry time
on dequeue for codel.

(I know how hard this is to do, but it has become easier in more
recent linux kernel versions. This would better account for running
out of cpu in the router, and IMHO work better on cache-hot data on
the rx side)
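
A minimal user-space sketch of that idea follows. It is not kernel code and not a full codel: every name in it (`Packet`, `on_receive`, `TxQueue`) is hypothetical, and crc32 stands in for whatever flow hash a real implementation would use. It only shows the timestamp and hash being computed once at receive time and then reused at enqueue and dequeue.

```python
import zlib
from collections import deque, namedtuple

# Hypothetical packet record: the rx path fills in rx_time and flow_hash once.
Packet = namedtuple("Packet", "five_tuple rx_time flow_hash payload")


def on_receive(five_tuple, payload, now):
    """rx path: timestamp and hash the packet while its headers are cache-hot."""
    flow_hash = zlib.crc32(repr(five_tuple).encode())  # stand-in flow hash
    return Packet(five_tuple, now, flow_hash, payload)


class TxQueue:
    """tx path: reuses the rx-side hash for fq and the rx-side time for codel."""

    def __init__(self, n_flows=1024):
        self.flows = [deque() for _ in range(n_flows)]

    def enqueue(self, pkt):
        # No re-hashing here: the flow bucket comes from the rx-side hash.
        self.flows[pkt.flow_hash % len(self.flows)].append(pkt)

    def dequeue(self, flow_index, now):
        # Sojourn time is measured from rx entry, so time spent waiting for
        # cpu before the packet ever reached the qdisc is counted too.
        pkt = self.flows[flow_index].popleft()
        return pkt, now - pkt.rx_time
```

A codel-style controller would then compare the returned sojourn time against its target; that part is left out of the sketch.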

I have 4 more routers in my stack. So far the two dlink ones were
horrible; next up are the buffalo and belkin boxes, which I hope to
get to next weekend.

With a bit more testing of the wifi, the tplink archer c7v2 may prove
out to be "not horrible" and a suitable candidate for make-wifi-fast,
but I think the cpu limitation is kind of bad and would really like to
try a quad mips or dual arm core.

And normally I don't leave nat running, and left my dataset behind on
the test box behind this router when I left for SF yesterday. :(
--
Dave Täht
Let's make wifi fast, less jittery and reliable again!

https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb
Rich Brown
2015-03-27 02:25:32 UTC
Dave,
Post by Dave Taht
I took the archer c7v2 out for a set of test runs over the weekend.
Is there a build out there for those of us who a) own an Archer c7v2 and b) are crazy enough to try it out? Or should we hold off for a while? Thanks.

Rich
Dave Taht
2015-03-27 02:30:16 UTC
Post by Rich Brown
Dave,
Post by Dave Taht
I took the archer c7v2 out for a set of test runs over the weekend.
Is there a build out there for those of us who a) own an Archer c7v2 and b) are crazy enough to try it out? Or should we hold off for a while? Thanks.
My test builds are here:

http://snapon.lab.bufferbloat.net/~cero3/ubnt/ar71xx/

They are primarily targeted at being able to deploy the picostation and
nanostation stuff on the core of my backbone. The build happens to
cover a lot of other hardware, but not (notably) the wndr4300, which is
another low-end candidate. I can start building that too...

But as for these releases: Keep them away from your SO and children's
internet access paths please!!!

My intent at the moment is to update the whole backbone I have to
something with the latest dnsmasq (presently 2.73rc1) and babel-1.6
when it stabilizes AND find a router along the way for
make-wifi-fast.... I still have routers in my deployment running
3.7.4....

This is going to take many weekends. Hal came by to help last weekend,
and we got 2 new routers up that are working pretty well, 20 to go....
More yurtlab visitors welcomed!
--
Dave Täht
Let's make wifi fast, less jittery and reliable again!

https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb
Jonathan Morton
2015-03-27 02:30:35 UTC
Post by Dave Taht
I couldn't crash it with a full workload, nor
overheat it with external temps at 23C. I had tested the 3800 with
an external temp of 44C, and I would prefer to test any new product at
that temperature before wanting to use it here.
I wish thermal testing had been done on my 3G dongle. It frequently overheats and shuts itself down at 25°C ambient. It’s approaching the point where I want to move my firewall out onto the (usually cooler) balcony.

- Jonathan Morton
Jonathan Morton
2015-03-27 05:05:44 UTC
Post by Dave Taht
I think cake can be improved quite a bit more and we really need to do some profiling to find other bottlenecks.
I’ve got far enough with the improved Diffserv logic to see that, at the very least, cake3 will need to do less work to figure out that it’s throttled. That’s because the hard shaper is now global rather than class-local, so I can hoist it before any of the class-specific work. If it gets past that, it can be confident that it’s got a packet to deliver.

This is important, because cake_dequeue() often gets called twice per packet - once just after cake_enqueue(), when it might be too soon to transmit, and again when the watchdog timer fires to denote the correct transmit time.
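
The ordering described above can be sketched roughly as follows. This is a toy model, not cake's actual code: the `ShapedScheduler` class, its strict-priority class selection, and the `class_work` counter are all hypothetical, and exist only to show the global shaper check running before any class-specific work.

```python
from collections import deque


class ShapedScheduler:
    """Toy dequeue path with one global shaper ahead of class selection."""

    def __init__(self, rate_bytes_per_s, n_classes):
        self.rate = rate_bytes_per_s
        self.classes = [deque() for _ in range(n_classes)]  # index 0 first
        self.next_tx_time = 0.0
        self.class_work = 0  # how many times class selection actually ran

    def enqueue(self, cls, size_bytes):
        self.classes[cls].append(size_bytes)

    def dequeue(self, now):
        # Global shaper check hoisted to the top: when throttled, return
        # before touching any per-class state. A watchdog timer would call
        # dequeue() again at next_tx_time.
        if now < self.next_tx_time:
            return None
        self.class_work += 1
        for queue in self.classes:  # simplistic strict-priority selection
            if queue:
                size = queue.popleft()
                self.next_tx_time = now + size / self.rate
                return size
        return None
```

In this model the early call that arrives just after enqueue costs a single comparison when the shaper is still throttled, and class selection only runs on the call that can actually deliver a packet.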

The class selection loop is also smaller and simpler (fewer edge cases to cope with), and I worked out a shortcut to put in further down, so it doesn’t have to re-run the class selection if a flow happens to be in deficit. That’s another likely win.

So those might turn out to be significant efficiency improvements, altogether. Of course, if the real overhead is elsewhere, the improvements in throughput might turn out to be small, but for the moment I’m actually focusing on behaviour rather than throughput.

On that note, I’ve added a four-class Diffserv mapping alongside the existing eight-class one. This new mapping is:

Latency Sensitive (CS7, CS6, EF, VA, CS5, CS4)
Streaming Media (AF4x, AF3x, CS3, AF2x, TOS4, CS2, TOS1)
Best Effort (CS0, AF1x, TOS2, and all not otherwise specified)
Background Traffic (CS1)
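
For illustration, the four-class mapping listed above can be written as a 64-entry lookup table indexed by DSCP. This is a sketch, not cake's source. The CS, AF, EF and VA codepoints are the standard ones; reading TOS1, TOS2 and TOS4 as DSCP values 1, 2 and 4 is an assumption, and the constant and function names are made up for the example.

```python
# Four traffic classes, in the order listed above.
LATENCY_SENSITIVE, STREAMING_MEDIA, BEST_EFFORT, BACKGROUND = range(4)

# Standard DSCP codepoints (6-bit values).
CS = [8 * n for n in range(8)]  # CS0..CS7
EF, VA = 46, 44                 # Expedited Forwarding, Voice-Admit


def af(cls):
    """AFx1..AFx3 codepoints for Assured Forwarding class cls (1 to 4)."""
    return [8 * cls + 2 * drop for drop in (1, 2, 3)]


# ASSUMPTION: TOSn means the legacy TOS bits read as DSCP values 1, 2, 4.
TOS1, TOS2, TOS4 = 1, 2, 4


def build_table():
    table = [BEST_EFFORT] * 64  # "all not otherwise specified"
    for dscp in (CS[7], CS[6], EF, VA, CS[5], CS[4]):
        table[dscp] = LATENCY_SENSITIVE
    for dscp in af(4) + af(3) + [CS[3]] + af(2) + [TOS4, CS[2], TOS1]:
        table[dscp] = STREAMING_MEDIA
    table[CS[1]] = BACKGROUND
    return table


TABLE = build_table()


def classify(tos_byte):
    """Map an IPv4 TOS or IPv6 traffic-class byte to one of the four classes."""
    return TABLE[(tos_byte & 0xFF) >> 2]
```

CS0, AF1x and TOS2 need no explicit entries because they fall through to the Best Effort default.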
Post by Dave Taht
So I saw fairly long delays (7ms or more) when running at these speeds through the router.
TBH, it’s a sign of how far we’ve come that we now consider 7ms to be painful. :-)

- Jonathan Morton
Felix Fietkau
2015-03-27 20:06:56 UTC
Post by Dave Taht
B) The bad news: I didn't get around to testing wifi at all. I ran
into an interesting problem, where testing it with full nat enabled,
with no sqm-scripts, would peak at about 400 mbits on the rrul test,
0) If hostapd was running it would run a lot - cutting performance by
about 50Mbits on the test runs. I think I posted the strace here
already.
Might be fixed in current trunk.
Post by Dave Taht
While reducing queue size when running out of cpu is a pleasing
result, it also points to possible tuning options for napi, maybe
adding xmit_more support in (or removing it entirely), in order to
fully service all interrupts in one direction or another, and also
compile options specific to the mips74k which has a long pipeline in
particular, and so far as I know the archer has no issues (as the
wndr3800 had) with unaligned access so we can turn off various hacks
on that front.
A while back, I tested xmit_more, and it didn't seem to be making any
visible difference on the router that I tested it on (MT7621).

- Felix