I do a lot of closed network design for projects. Island networks for developer teams, with no Internet access, but all the collaboration accoutrements a productive team might need. Authentication, repositories, build systems, file sharing, email, SSO, etc.
Yesterday one of them blew up. The collaboration suite stopped working. My first theory was that something ran out of space. And I was right, but it wasn't the collaboration suite itself. Turns out the LDAP server, which handles authentication, ran out of space, and the collaboration suite died because it couldn’t contact the LDAP server.
But wait, why did the LDAP server run out of space? All it’s doing is LDAP and DNS.
And the journey begins.
A while back, I had disabled recursive DNS queries because someone’s chatty MS product was spewing so many DNS lookups that would never resolve. Each of those queries had to wait out a timeout, and the backed-up queries created a logjam that prevented legitimate queries for local assets from getting through. Disallowing recursive queries seemed to shut everyone up, since the queries were denied immediately instead of waiting for the timeout, so I moved on.
Yesterday’s problem was a bit more intense. Someone had pulled an email from outside the system into Outlook on the closed system. Not a problem, right? Well, Outlook is downright screwy sometimes. That alone caused the user’s Outlook to spew over 600 DNS queries per second, and because the DNS server had query logging on by default, the result was 20+ GB of query logs, to the tune of 46 million queries in less than 60 hours.
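A quick back-of-the-envelope check on those numbers, using only the figures above. Since the window was under 60 hours and the logs were over 20 GB, the results are a floor on the average query rate and on the size of each logged query:

```python
# Figures from the incident above: 46 million queries, under 60 hours, 20+ GB of logs.
queries = 46_000_000
hours = 60               # upper bound on the window, so the rate below is a minimum
log_bytes = 20 * 10**9   # lower bound on log size (decimal GB)

avg_qps = queries / (hours * 3600)     # average queries per second over the window
bytes_per_query = log_bytes / queries  # average log bytes written per query

print(f"average rate: at least {avg_qps:.0f} queries/second")
print(f"log volume: at least {bytes_per_query:.0f} bytes per logged query")
```

So the 600-per-second figure is a burst rate; even averaged over the whole window, the server was logging more than 200 queries every second.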
This seemed slightly excessive to me.
I know I could have just turned off query logging, but I thought of another approach that might stop the noise without sacrificing query logging, because you never know when that information might help. Also, turning off logging doesn’t STOP the traffic; it only stops recording it. So I took all the domains from the chattiest queries (by far the highest was that clearly broken Outlook process, an infinitely repeating query to an Outlook Mobile / O365 address on msedge.net) and created fake authoritative zones for them on my DNS server. I was almost surprised that it shut things up immediately. Because I left query logging on, I could see the effect right away. I guess an authoritative no is enough to shut things up in cases where a denied query might not, perhaps because, unlike a refusal, a negative answer from an authoritative zone can be cached by the client.
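For anyone wanting to try the same trick, here is a minimal sketch of finding the chattiest domains and stubbing them out. It assumes a BIND-style query log (lines containing `query: <name> IN <type>`) and a BIND server with an empty zone file at `/etc/bind/db.empty` (the Debian default); the post doesn't say which DNS server was involved, so the log format, file path, and the helper names `parent_domain`, `chattiest_domains`, and `sinkhole_stanzas` are all illustrative assumptions.

```python
import re
import sys
from collections import Counter

# Assumed BIND-style query log line, e.g.:
#   client 10.0.0.5#53214 (a.b.msedge.net): query: a.b.msedge.net IN A + (10.0.0.1)
QUERY_RE = re.compile(r"query: (\S+) IN (\S+)")


def parent_domain(name: str, labels: int = 2) -> str:
    """Collapse a query name to its last `labels` labels.

    Naive: ignores public-suffix rules, so e.g. foo.co.uk becomes co.uk.
    """
    parts = name.rstrip(".").lower().split(".")
    return ".".join(parts[-labels:])


def chattiest_domains(lines, top=10):
    """Count queries per parent domain and return the `top` most frequent."""
    counts = Counter()
    for line in lines:
        match = QUERY_RE.search(line)
        if match:
            counts[parent_domain(match.group(1))] += 1
    return counts.most_common(top)


def sinkhole_stanzas(domains, zone_file="/etc/bind/db.empty"):
    """Emit named.conf zone statements that make this server authoritative
    for each domain, backed by an (assumed) empty zone file."""
    return "\n".join(
        f'zone "{d}" {{ type master; file "{zone_file}"; }};' for d in domains
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical): python chatty.py /var/log/named/queries.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        top = chattiest_domains(log, top=20)
    for domain, count in top:
        print(f"{count:>10}  {domain}")
    print(sinkhole_stanzas(domain for domain, _ in top))
```

Review the generated list before pasting it into `named.conf`: anything the closed network legitimately serves locally must not be sinkholed.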
All this is to say, I think 99% of people have no idea just how much communication goes on behind the scenes in their so-called private networks. This is a set of clients that have never touched the Internet. Fresh out of the box, with updates applied from WSUS offline bundles, and I’ve got hundreds of thousands of queries to Facebook, Twitter, eBay, Amazon, Google, MS and more. None of it was initiated by the user. This is all of that “user as product” bullshit.