Practical Headaches of Running a Business VPN Setup

Setting up split tunneling that doesn’t leak half your traffic

Split tunneling sounds simple until half your Slack file uploads vanish into the void. I had a team setup where we funneled everything except video calls through the VPN — or so I thought. Turned out that Slack’s background telemetry traffic was still going straight out over the WAN until we manually whitelisted four separate domains. One of them wasn’t even under slack.com — it was cdn.brandfolder.io. That one took me three hours to trace. Not proud.

If you’re configuring OpenVPN or WireGuard manually, double-check the route priorities, and look hard at DNS resolution, especially if you’re using systemd-resolved. When your split tunnel config pushes routes for 0.0.0.0/1 and 128.0.0.0/1, but your clients are still defaulting to a leftover route from an old provider, your tunnel isn’t being used, no matter what your status widget says.
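
Here’s a quick sanity check on a Linux client with iproute2. 1.1.1.1 is just an arbitrary external IP for illustration, and your interface names will obviously differ:

# List the routing table: the VPN's 0.0.0.0/1 and 128.0.0.0/1 pair should be
# there, alongside whatever stale default route is competing with it
ip route show

# Ask the kernel which route a real destination would take; you want to see
# your tun0/wg0 interface in the answer, not your LAN uplink
ip route get 1.1.1.1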

“It’s routing through the tunnel!”
No, it’s not. Run traceroute on an IP, not a domain.

This caught me off guard on a Debian box where systemd-resolved was caching an upstream DNS server outside the tunnel. Internal tools failed silently. Had to nuke /etc/resolv.conf so dnsmasq on the VPN could forward queries properly. Ugly, but it worked.
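
For what it’s worth, this is roughly the shape of that fix; 10.8.0.1 is a stand-in for whatever address your VPN-side dnsmasq actually listens on:

# See which upstream DNS each interface is really using
resolvectl status

# Blunt-instrument fix: stop resolv.conf from pointing at the stub resolver
# and force everything through the tunnel's dnsmasq instead
sudo rm /etc/resolv.conf
echo "nameserver 10.8.0.1" | sudo tee /etc/resolv.conf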

Cloudflare for Teams: You will break something and not know for hours

Cloudflare Access and WARP with Zero Trust rules are kind of magical until you realize that some of your users are still binding to local DNS, and therefore bypassing all those tidy traffic policies you’ve dreamed up in your dashboard. If Traffic Logs say it’s allowed but they’re getting a 403, double-check that their local resolver is actually pointing at the WARP client. Or just delete the damned profile and start fresh: they do get corrupted sometimes, and no, there won’t be an error message.
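
A quick way to check this on a Mac (warp-cli ships with the WARP client; the loopback addresses below are what the client normally registers as local resolvers, at least in my experience):

# Is WARP actually connected?
warp-cli status

# Which resolvers is the OS really using? With WARP healthy you should see
# its local stub addresses (127.0.2.2 / 127.0.2.3), not your router's DNS
scutil --dns | grep nameserver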

Also: if you’re using service tokens for programmatic access to internal APIs (kudos!), remember one critical thing: if your secret gets rotated, there’s about a ten-minute propagation delay until the old one gets invalidated. Learned that while accidentally brute-forcing one of my own endpoints thinking I’d wrecked the auth middleware.
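
The header names below are the documented ones for Access service tokens; the hostname and environment variables are placeholders. Handy for checking whether a freshly rotated secret has actually propagated:

# Call an Access-protected endpoint with a service token; a 401/403 here
# right after rotation usually just means the new secret hasn't propagated yet
curl -i https://internal.example.com/api/health \
  -H "CF-Access-Client-Id: ${CF_ACCESS_CLIENT_ID}" \
  -H "CF-Access-Client-Secret: ${CF_ACCESS_CLIENT_SECRET}"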

  • Don’t rely too heavily on pre-canned identity groups; build custom ones per app
  • Always map by email domain AND individual user — some aliases won’t match
  • Tokens expire silently unless wrapped in an automated health check (a minimal cron sketch follows this list)
  • Device posture checks don’t log failures to the main dashboard unless it’s turned on in a second setting buried under Security → Client Policies
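
For that health-check point, here’s a minimal cron-able sketch. Everything in it (URL, env vars, Slack webhook) is a placeholder, not anything Cloudflare ships:

#!/bin/sh
# Fails loudly when a service token stops working, instead of letting it
# expire silently.
STATUS=$(curl -s -o /dev/null -w "%{http_code}" https://internal.example.com/api/health \
  -H "CF-Access-Client-Id: ${CF_ACCESS_CLIENT_ID}" \
  -H "CF-Access-Client-Secret: ${CF_ACCESS_CLIENT_SECRET}")

if [ "$STATUS" != "200" ]; then
  # Ping the team instead of waiting for a user to report a mystery 403
  curl -s -X POST "${SLACK_WEBHOOK_URL}" \
    -H 'Content-Type: application/json' \
    -d "{\"text\": \"Access service token check failed with HTTP $STATUS\"}"
fi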

If you’re thinking, “But we followed the Cloudflare docs,” the docs assume you work at a company with an IAM team and a SOC. For a six-person vanilla dev shop, they’re overkill and underhelpful.

VPN client auto-updates: They might not be your friend

The time Windows auto-updated FortiClient VPN and silently re-enabled IPv6, letting traffic bypass our AWS VPN interface, was, let’s just say, not a good Tuesday. I still don’t know how the IPv6 stack came back to life; we had disabled it via GPO months earlier. No system log entry. Just IPv6 traffic quietly routing around the VPN.

Watch for split-stack misbehavior

If your team is on dual-stack networks (which is increasingly default), and your VPN provider doesn’t properly funnel IPv6 through the tunnel (multiple don’t — looking at you, Cisco AnyConnect), then some of your apps may fully bypass your traffic inspection. Worse, they won’t show up in your logs because your SIEM might not even monitor IPv6 separately unless you’ve configured it to.

Seriously: test both curl -6 ifconfig.io and curl ifconfig.me while connected. You might vomit.
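
If you want the blunt version of that test (any “what’s my IP” endpoint works; interpreting the answers is the whole point):

# While the VPN is up, both should return an address owned by your VPN exit.
# If the -6 call comes back with your ISP's prefix, IPv6 is walking around the tunnel.
curl ifconfig.me
curl -6 ifconfig.io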

Managing VPN credentials at scale without violating every security principle known to man

Our first version of credential distribution was just someone pasting static VPN logins into Slack with a 7-day expiration. You know why we stopped? Because someone copy/pasted the creds into the wrong workspace chat and a random ex-contractor connected to our test environment. To his credit, he told us. Could’ve been worse.

So now we rotate VPN secrets via Vault, but only after hacking together a webhook trigger that pings Slackbot with one-time links. It’s held together with elbow grease and an old Lambda function we haven’t dared re-deploy in a year.

There is no turnkey solution for issuing per-user temporary VPN credentials that respects both your CI/CD velocity and SSO simplicity. Tailscale gets close — WireGuard keys auto-rotate behind the scenes, tied to your identity provider. But if you’re using vanilla IPSec or OpenVPN with LDAP, you’re gonna have to build the scaffolding yourself.
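
The rough shape of that scaffolding is less scary than it sounds. Here’s a minimal sketch using Vault’s response wrapping as the “one-time link”; the KV path, the username argument, and the Slack webhook are all assumptions about how you’ve laid things out, not anything built in:

#!/bin/sh
# Usage: ./share-vpn-secret.sh <username>   (hypothetical helper, not a real tool)
# Wrap the user's VPN secret in a single-use token good for 30 minutes.
WRAP_TOKEN=$(vault kv get -wrap-ttl=30m -format=json secret/vpn/users/"$1" \
  | jq -r .wrap_info.token)

# Hand out the wrapping token, not the secret itself; the recipient runs
# `vault unwrap <token>` exactly once and then the token is dead.
curl -s -X POST "${SLACK_WEBHOOK_URL}" \
  -H 'Content-Type: application/json' \
  -d "{\"text\": \"One-time VPN secret for $1: run vault unwrap $WRAP_TOKEN (expires in 30m)\"}"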

Undocumented behavior: macOS and captive portals with VPN installed

If you install a VPN client that registers as a system network extension (like Cisco Secure Client or Palo Alto’s GlobalProtect), your MacBook may stop detecting captive portals entirely. Like when you’re at a hotel or airport. The browser just sits there on a blank page, no captive.apple.com redirect, no handshake, no nothing.

You can squint into packet captures all you want, but the cause is usually Apple’s captive portal detector being blocked or routed through the VPN stub adapter before DNS resolves. Here’s a thing you can do:

# Knock out IPv6 on the Wi-Fi service so the portal check isn't racing a dead v6 path
networksetup -setv6off Wi-Fi
# Flush the DNS cache and nudge mDNSResponder to re-resolve
dscacheutil -flushcache
killall -HUP mDNSResponder

Then reload your browser. You might get the captive portal splash page. Or not. Sometimes I have to disable the VPN profile entirely, reboot, and *then* the portal kicks in.

This feels undocumented, because it is. Apple’s docs don’t mention what happens when system extensions hijack DNS before captive.apple.com is hit. It’s one of those weird layering problems that everyone notices but nobody logs.

Routing traffic through multiple VPNs without nuking your DNS setup

Had a fun moment trying to connect to two different client environments simultaneously — one over AWS Client VPN (OpenVPN based), the other through a corporate site using Check Point. I got both connections up. Then every DNS query started returning garbage because both profiles claimed to own /etc/resolv.conf and my resolver flipped a coin on every request. The wildcard search domains overlapped too. So I was SSH-ing into boxes named the same thing across two entirely different subnets. Didn’t realize it until I almost rebooted the prod box that wasn’t ours.

So here’s what I pieced together:

  • If your VPN client uses a proprietary DNS rewriter (like Check Point), run it inside a VM
  • WireGuard handles dual VPN profiles better than OpenVPN in my experience, thanks to interface-based routing
  • Consider moving to DNS over HTTPS via a stub like Stubby or Cloudflare’s cloudflared, which stays out of VPN turf wars (rough sketch after this list)
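
The cloudflared option from that last bullet looks roughly like this, assuming dnsmasq is already your local forwarder (check cloudflared proxy-dns --help on your build; flags move around):

# Run a local DoH stub on a high port so it doesn't fight anything on :53
cloudflared proxy-dns --address 127.0.0.1 --port 5053

# Point dnsmasq at it; neither VPN client gets to rewrite where queries go
echo "server=127.0.0.1#5053" | sudo tee /etc/dnsmasq.d/doh-stub.conf
sudo systemctl restart dnsmasq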

Also: enable logging for systemd-resolved in journald. It will at least tell you which upstream it’s using. Mine was bouncing between 10.0.0.53 and 192.168.1.1 in a way that looked random, but turned out to follow priority rules I hadn’t even configured.
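
Turning that logging on is a drop-in override; remember to remove it afterwards, because it gets chatty fast:

# Bump systemd-resolved to debug logging and watch which upstream it picks
sudo mkdir -p /etc/systemd/system/systemd-resolved.service.d
printf '[Service]\nEnvironment=SYSTEMD_LOG_LEVEL=debug\n' | \
  sudo tee /etc/systemd/system/systemd-resolved.service.d/10-debug.conf
sudo systemctl daemon-reload
sudo systemctl restart systemd-resolved
journalctl -u systemd-resolved -f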

ChromeOS and VPN: That little lock icon lies

I’m going to say this nice and slow for the folks running remote Chromebooks: That VPN lock icon near your Wi-Fi isn’t law. I had a client using a managed Chromebook fleet, and when we pushed a WireGuard configuration via the console, everything looked fine. Icon was there. Tunnel claimed to be connected. But background Android subsystems were still leaking analytics (yes, even with the Play Store disabled).

We eventually saw it in a Zscaler trace — outbound connections to firebase.googleapis.com and stats.tiktok.com skipping our tunnel entirely. The only way to fix it was enabling the “Always-on VPN” enforcement at the managed profile level and denying non-VPN traffic. But guess what? That setting only works fully on accounts with Google Workspace Enterprise, not just standard Chrome licenses.

So unless you’re paying for the top-tier admin tools, your devices aren’t actually locked down — no matter what the icon says.
