Hello Quinn,
I am in the middle of investigating an issue with the setsockopt syscall, which returns an undocumented and unexpected errno.
What’s that value?
An IPv4 SOCK_DGRAM socket is bound, and a subsequent setsockopt call on that socket with the IP_ADD_MEMBERSHIP option returns -1 with errno set to 8, which strerror reports as "Exec format error". The setsockopt reproducer is very simple:
```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    if (argc != 3) {
        fprintf(stderr, "Error, expected usage: <program> <multicast-ip-address> <network-interface-ip-address>\n");
        fprintf(stderr, "example usage: ./a.out 225.4.5.6 192.168.1.2\n");
        return -1;
    }
    char *mcast_join_group_addr = argv[1];
    char *network_intf_addr = argv[2];
    fprintf(stderr, "test will join multicast group address = %s of network interface address = %s\n",
            mcast_join_group_addr, network_intf_addr);

    // create a datagram IPv4 socket
    int type = SOCK_DGRAM;
    int domain = AF_INET;
    int fd = socket(domain, type, 0);
    if (fd < 0) {
        fprintf(stderr, "FAILED to create socket, errno %d - %s\n", errno, strerror(errno));
        return -1;
    }
    fprintf(stderr, "SOCK_DGRAM socket created, fd=%d\n", fd);

    // bind the socket to the wildcard address (0.0.0.0) and an ephemeral port (0)
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = 0;
    sa.sin_addr.s_addr = htonl(INADDR_ANY);
    socklen_t len = sizeof(sa);
    if (bind(fd, (struct sockaddr *) &sa, len) < 0) {
        fprintf(stderr, "failed to bind: errno=%d - %s\n", errno, strerror(errno));
        close(fd);
        return -1;
    }
    fprintf(stderr, "socket successfully bound\n");

    // set IP_ADD_MEMBERSHIP socket option on the socket
    struct ip_mreq mreq;
    memset(&mreq, 0, sizeof(mreq));
    // multicast group address; inet_pton returns 1 only on success
    if (inet_pton(AF_INET, mcast_join_group_addr, &mreq.imr_multiaddr) != 1) {
        fprintf(stderr, "invalid multicast group address: %s\n", mcast_join_group_addr);
        close(fd);
        return -1;
    }
    // IP address of the local interface on which to join the group
    if (inet_pton(AF_INET, network_intf_addr, &mreq.imr_interface) != 1) {
        fprintf(stderr, "invalid network interface address: %s\n", network_intf_addr);
        close(fd);
        return -1;
    }
    socklen_t optlen = sizeof(mreq);
    fprintf(stderr, "setting IP_ADD_MEMBERSHIP on socket\n");
    int n = setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, optlen);
    if (n < 0) {
        // save errno so that strerror/fprintf/close cannot overwrite it
        int err = errno;
        fprintf(stderr, "FAILED - setsockopt(IP_ADD_MEMBERSHIP) returned %d with errno %d - %s\n",
                n, err, strerror(err));
        close(fd);
        return -1;
    }
    close(fd);
    fprintf(stderr, "SUCCESSFUL completion of the test\n");
    return 0;
}
```
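Before attributing the errno to a filter, it may be worth ruling out the interface argument. The sketch below is a hypothetical helper (check_local_interface is our own name, not an existing API) that uses getifaddrs() to confirm the address passed as the second argument is actually assigned to a local IPv4 interface, and whether that interface reports IFF_MULTICAST. It assumes IPv4 and stops at the first matching interface.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE  // expose IFF_MULTICAST under strict glibc modes; harmless on macOS
#endif
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <net/if.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

// Hypothetical pre-check for the reproducer's interface address.
// Returns: -2 if ip_str is not a valid IPv4 address,
//          -1 if getifaddrs() fails,
//           0 if no local interface carries that address,
//           1 if one does but it lacks IFF_MULTICAST,
//           2 if one does and it has IFF_MULTICAST.
// On 1 or 2 the interface name (e.g. "en0") is copied into name_out.
int check_local_interface(const char *ip_str, char *name_out, size_t name_len) {
    struct in_addr target;
    if (inet_pton(AF_INET, ip_str, &target) != 1) {
        return -2;
    }
    struct ifaddrs *list = NULL;
    if (getifaddrs(&list) != 0) {
        return -1;
    }
    int result = 0;
    for (struct ifaddrs *ifa = list; ifa != NULL; ifa = ifa->ifa_next) {
        // skip entries without an address or with a non-IPv4 address
        if (ifa->ifa_addr == NULL || ifa->ifa_addr->sa_family != AF_INET) {
            continue;
        }
        const struct sockaddr_in *sin = (const struct sockaddr_in *) ifa->ifa_addr;
        if (sin->sin_addr.s_addr == target.s_addr) {
            result = (ifa->ifa_flags & IFF_MULTICAST) ? 2 : 1;
            if (name_out != NULL && name_len > 0) {
                snprintf(name_out, name_len, "%s", ifa->ifa_name);
            }
            break;
        }
    }
    freeifaddrs(list);
    return result;
}
```

Calling it with argv[2] just before setsockopt and logging the result and interface name would show whether failing hosts differ in interface configuration; a result of 0 or 1 would suggest looking at that configuration before looking inside setsockopt.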
The fact that errno is set to (or at least interpreted as) ENOEXEC is surprising, since the setsockopt man page makes no mention of that error for this call.
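On macOS, errno value 8 is ENOEXEC (defined in <sys/errno.h> with the description "Exec format error"), so the number is being decoded correctly; what is unexpected is that setsockopt returns it at all. A small sketch, using a hypothetical errno_name() helper of our own, that logs the symbolic name next to the number can make such reports easier to search and compare. The set of cases is illustrative only, not the documented error list for IP_ADD_MEMBERSHIP.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: map errno values plausibly returned by
// setsockopt(IP_ADD_MEMBERSHIP) to their symbolic names.
const char *errno_name(int err) {
    switch (err) {
    case EBADF:         return "EBADF";
    case EFAULT:        return "EFAULT";
    case EINVAL:        return "EINVAL";
    case ENOMEM:        return "ENOMEM";
    case ENOBUFS:       return "ENOBUFS";
    case ENOTSOCK:      return "ENOTSOCK";
    case ENOPROTOOPT:   return "ENOPROTOOPT";
    case EADDRINUSE:    return "EADDRINUSE";
    case EADDRNOTAVAIL: return "EADDRNOTAVAIL";
    case ENOEXEC:       return "ENOEXEC";  // the undocumented value seen here
    default:            return "UNKNOWN";
    }
}

// Print one failure line with the number, symbolic name and strerror() text.
void report_errno(const char *what, int err) {
    fprintf(stderr, "%s failed: errno %d (%s) - %s\n",
            what, err, errno_name(err), strerror(err));
}
```

In the reproducer, report_errno("setsockopt(IP_ADD_MEMBERSHIP)", err) could replace the existing failure fprintf.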
My guess is that some filter or extension code runs as part of the setsockopt syscall. After reading through https://developer.apple.com/library/archive/documentation/Darwin/Conceptual/NKEConceptual/socket_nke/socket_nke.html, I suspected it could be a socket filter.
This issue was reported to the JDK team about a decade ago (https://bugs.openjdk.org/browse/JDK-8144003), but only recently have we started noticing it more frequently in our setups. It could be something specific to our macOS hosts, but at this point I don't know which tools, commands, or options I should use to understand what code within setsockopt is interfering here.
Do you know of any tracing facility (ktrace?) that might help narrow this down further? The system logs (viewed through the Console app) haven't shown anything specific.
Having said that, the netstat output you posted makes it clear that all the filters currently attached were attached by the OS. You are not dealing with third-party code here.
That's good to know. In the context of multicasting (or, more specifically, the setsockopt IP_ADD_MEMBERSHIP option), do these OS-attached filters play any role or apply any specific rules that I should be aware of?