What was once an elegant machine built up by the wisdom of thousands of volunteers is now a rusty bucket with holes: my Debian installation.
Since I had recently fixed my interface configuration, I thought I was done solving my networking problems.
On my local network, I rely on mDNS a lot, on both macOS and Debian. To ssh, mosh, and sync git annexes between computers, I use .local addresses extensively.
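To give a rough idea, a typical day involves commands like these (the host names are real; the paths and the annex remote are made up for illustration):
$ ssh helium.local
$ mosh lithium.local tmux
$ git remote add lithium lithium.local:annex
$ git annex sync lithium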
In this scenario, I have two computers: helium, my trusty Debian workstation, connected via Ethernet, and lithium, a 2023 M2 MacBook Air, for when I need to surf Facebook or MySpace, talking to helium over WiFi.
To my surprise, when turning on my computers this morning, I noticed that I wasn’t able to mosh into my MacBook Air to look at emails through neomutt, or access my Pomodoro timer (which I named Pomoglorbo).
I usually connect using mosh lithium.local tmux from my helium workstation.
When I try to ping lithium.local, I get:
ping: lithium.local: Name or service not known
But interestingly, when I try pinging just lithium, which I don’t remember working before, I get:
PING lithium (10.0.57.235) 56(84) bytes of data.
64 bytes from lithium (10.0.57.235): icmp_seq=1 ttl=64 time=2.11 ms
64 bytes from lithium (10.0.57.235): icmp_seq=2 ttl=64 time=2.70 ms
64 bytes from lithium (10.0.57.235): icmp_seq=3 ttl=64 time=2.09 ms
and at least I know that there is a route. So now I check whether I can ping helium back from lithium, while ssh’ing into it from helium. First, we try using the .local mDNS domain:
# ssh lithium ping -c 2 helium.local
PING helium.local (10.0.56.202): 56 data bytes
64 bytes from 10.0.56.202: icmp_seq=0 ttl=64 time=4.288 ms
64 bytes from 10.0.56.202: icmp_seq=1 ttl=64 time=2.279 ms
--- helium.local ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 2.279/3.284/4.288/1.005 ms
And for good measure we see what happens when we ping our Debian workstation without the .local suffix:
# ssh lithium ping -c 2 helium
PING helium (10.0.56.202): 56 data bytes
64 bytes from 10.0.56.202: icmp_seq=0 ttl=64 time=2.281 ms
64 bytes from 10.0.56.202: icmp_seq=1 ttl=64 time=2.765 ms
--- helium ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 2.281/2.523/2.765/0.242 ms
Finally, we ping ourselves using our mDNS domain and frustratingly get a good response:
# ping -c 1 helium.local
PING helium.local (10.0.56.202) 56(84) bytes of data.
64 bytes from helium (10.0.56.202): icmp_seq=1 ttl=64 time=0.052 ms
--- helium.local ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.052/0.052/0.052/0.000 ms
So, we know that:
- There is a route between the two computers.
- Somehow, helium can still resolve a name for lithium, just not lithium.local.
- lithium can resolve helium.local without any difficulties.
- helium can resolve helium.local.
Most likely, something is wrong with the mDNS configuration on helium.
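Since ping resolves names through the name service switch, a quick way to reproduce the failure outside of ping is getent, which walks the same NSS chain:
$ getent ahosts lithium.local
$ getent ahosts helium.local
If the first command comes back empty while the second prints addresses, the problem sits in whatever NSS module answers for .local names, not in ping itself.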
Currently we run Avahi as our mDNS resolver, which we can confirm by running systemctl status avahi-daemon.service.
# systemctl status avahi-daemon.service
● avahi-daemon.service - Avahi mDNS/DNS-SD Stack
Loaded: loaded (/lib/systemd/system/avahi-daemon.service; enabled; preset: enabled)
Active: active (running) since Fri 2024-02-02 09:52:51 JST; 1h 27min ago
TriggeredBy: ● avahi-daemon.socket
[...]
And if we try to resolve devices on the local network using avahi-browse, we see:
# avahi-browse --all --resolve --terminate | head
+ enp7s0 IPv6 Brother HL-L2375DW series Web Site local
+ enp7s0 IPv4 Brother HL-L2375DW series Web Site local
+ enp7s0 IPv6 Brother HL-L2375DW series Internet Printer local
+ enp7s0 IPv4 Brother HL-L2375DW series Internet Printer local
+ enp7s0 IPv6 Brother HL-L2375DW series UNIX Printer local
+ enp7s0 IPv4 Brother HL-L2375DW series UNIX Printer local
+ enp7s0 IPv6 Brother HL-L2375DW series PDL Printer local
+ enp7s0 IPv4 Brother HL-L2375DW series PDL Printer local
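For completeness, Avahi can also be asked directly, bypassing the NSS layer; avahi-utils, still installed at this point, provides avahi-resolve:
$ avahi-resolve --name lithium.local
If that fails as well, Avahi itself never sees lithium’s announcements; if it succeeds, the blame shifts to the libnss-mdns glue that ping goes through.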
Avahi is the mDNS resolver I deserved pre-installed, but not the one I need right now. So I’ll uninstall it and see how deep the rabbit hole of mDNS configuration goes. (Also, breaking your system and fixing it is exciting.)
systemd-resolved
systemd-resolved is absolutely not the standard name resolver on Debian; the Debian 12 (bookworm) release notes treat it as something of a second-class citizen:
Note that systemd-resolved was not, and still is not, the default DNS resolver in Debian.
That has never kept me from chasing my dreams.
First I remove avahi-* and install systemd-resolved:
# sudo apt remove 'avahi-*'
[...]
The following packages will be REMOVED:
avahi-autoipd avahi-daemon avahi-utils
0 upgraded, 0 newly installed, 3 to remove and 0 not upgraded.
After this operation, 528 kB disk space will be freed.
[...]
# sudo apt install systemd-resolved
[...]
The following additional packages will be installed:
libnss-resolve
The following NEW packages will be installed:
libnss-resolve systemd-resolved
0 upgraded, 2 newly installed, 0 to remove and 0 not upgraded.
Need to get 402 kB of archives.
After this operation, 1,028 kB of additional disk space will be used.
[...]
Converting /etc/resolv.conf to a symlink to /run/systemd/resolve/stub-resolv.conf...
Created symlink /etc/systemd/system/dbus-org.freedesktop.resolve1.service → /lib/systemd/system/systemd-resolved.service.
Created symlink /etc/systemd/system/sysinit.target.wants/systemd-resolved.service → /lib/systemd/system/systemd-resolved.service.
[...]
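The libnss-resolve package is what plugs systemd-resolved into the name service switch, which is the path that ping and ssh use for lookups. It is worth confirming that the hosts line in /etc/nsswitch.conf now mentions the resolve module somewhere before dns; the exact set of modules varies between systems:
$ grep '^hosts:' /etc/nsswitch.conf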
We test what happens in this twilight state of mDNS resolution and try to resolve both lithium.local (the MacBook) and helium.local (ourselves):
# ping -c 1 lithium.local
ping: lithium.local: Temporary failure in name resolution
# ping -c 1 -w 5 helium.local
PING helium.local(helium (fe80::dabb:c1ff:fed0:357c%enp7s0)) 56 data bytes
--- helium.local ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 4092ms
For the first result, I don’t have a good mental model of the whole chain from ping through mDNS to sending an actual ICMP packet, but a name resolution failure at least makes some sense. For the second one, helium.local resolves to our interface’s link-local IPv6 address, yet no replies come back. This could be an mDNS issue, or it could be that we simply don’t respond to ICMPv6. Listening with tcpdump for ICMPv6 echo requests yields nothing:
$ sudo tcpdump "icmp6[0] == 8"
<- imagine nothing here
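While tcpdump is out anyway, it is also worth watching the mDNS traffic itself. Multicast DNS runs over UDP port 5353 (multicast to 224.0.0.251 and ff02::fb), so a capture along these lines shows whether queries and announcements are flowing on the link at all:
$ sudo tcpdump -ni enp7s0 udp port 5353
If lithium’s announcements never show up here, no amount of resolver configuration on helium will help.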
The systemd-resolved service itself is running, which is good.
# systemctl is-active systemd-resolved.service
active
We run resolvectl status to check the status of multicast DNS and see that it is not enabled on enp7s0, our Ethernet interface:
# resolvectl status
Global
Protocols: +LLMNR +mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub
Link 2 (enp7s0)
Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6
Protocols: +DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 10.0.48.1
DNS Servers: 10.0.48.1
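As a quick experiment, mDNS can be toggled per link at runtime with resolvectl. Note that this setting is not persistent and is lost on the next reload or reboot, which is why the proper fix below goes into the link’s .network file:
# resolvectl mdns enp7s0 yes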
Fine, so we go on actually configuring systemd-resolved and refer to the ArchWiki article on this matter.
Our /etc/systemd folder has the following resolved-related files:
# fd resolve /etc/systemd/
/etc/systemd/resolved.conf
/etc/systemd/system/dbus-org.freedesktop.resolve1.service
/etc/systemd/system/sysinit.target.wants/systemd-resolved.service
We see that /etc/resolv.conf is now a symlink managed by systemd-resolved:
# cat /etc/resolv.conf (symlinked to -> ../run/systemd/resolve/stub-resolv.conf)
# This is /run/systemd/resolve/stub-resolv.conf managed by man:systemd-resolved(8).
[...]
nameserver 127.0.0.53
options edns0 trust-ad
search .
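Plain DNS through the 127.0.0.53 stub can be spot-checked against any public domain, for example:
# resolvectl query debian.org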
That looks perfectly fine. We are able to resolve regular domains on the Internet, and we now just have to make mDNS work on our local network. We refer to the article’s section on mDNS configuration and update /etc/systemd/network/10-ethernet.network, which we previously created to have systemd-networkd handle network configuration for us automatically.
The contents currently are:
# /etc/systemd/network/10-ethernet.network
[Match]
Type=ether
[Network]
DHCP=yes
We add the following line at the end of the [Network] section:
# ...
[Network]
DHCP=yes
MulticastDNS=yes
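Equivalently, if you prefer not to edit the original file, the same setting can live in a drop-in; the .d directory follows systemd’s usual convention, and the file name below is arbitrary:
# /etc/systemd/network/10-ethernet.network.d/mdns.conf
[Network]
MulticastDNS=yes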
We run sudo networkctl reload and resolvectl status enp7s0 to see if anything has changed:
# resolvectl status enp7s0
Link 2 (enp7s0)
Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6 mDNS/IPv4 mDNS/IPv6
Protocols: +DefaultRoute +LLMNR +mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 10.0.48.1
DNS Servers: 10.0.48.1
At this point I suspect that there is something wrong with my MacBook, and I use a third computer running macOS on the same network to see if I can ping it at lithium.local. It turns out I can’t. I restart lithium, and now I can reach it from the third computer. Perhaps I have encountered a rare networking problem in macOS? Maybe I am too quick to blame my humble Debian configuration skills.
Sometimes turning it off and on again is a cure for all kinds of problems. It’s anti-climactic, but we still feel good about having learned a new tool.
To round it off, we see whether resolvectl can now resolve lithium.local:
# resolvectl query lithium.local
lithium.local: fe80::14af:74d4:5643:ecf8%2 -- link: enp7s0
-- Information acquired via protocol mDNS/IPv6 in 920us.
-- Data is authenticated: no; Data was acquired via local or encrypted transport: no
-- Data from: cache
We have configured systemd-resolved correctly to resolve local network host names, and learned the value of restarting machines from time to time.
Bonus round: Allow both IPv4 and IPv6 in ~/.ssh/authorized_keys
Given an SSH public key like ssh-rsa AAAA..., you can restrict it in your ~/.ssh/authorized_keys file so that it is only accepted from certain networks, like so:
from="10.0.0.0/8,fe80::*" ssh-rsa AAAA...
Here we allow two networks:
- 10.0.0.0/8 (a private IPv4 range), and
- fe80::* (IPv6 link-local addresses, the closest equivalent on this network).
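The from= option composes with the other authorized_keys options documented in sshd(8), so a slightly more locked-down line could look like the following (the key and the trailing comment are placeholders):
from="10.0.0.0/8,fe80::*",no-agent-forwarding,no-port-forwarding ssh-rsa AAAA... me@lithium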
This adds another layer of defense, not even letting devices authenticate with a key if they are on the wrong network. Of course, your /etc/ssh/sshd_config should have something like these lines:
KbdInteractiveAuthentication no
UsePAM yes
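After touching sshd_config, it is worth validating the configuration and reloading the daemon before logging out (on Debian the service is called ssh):
# sshd -t
# systemctl reload ssh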