TL;DR: This post shares how to fix the error client backend (PID 3151) was terminated by signal 4: Illegal instruction when deploying the pgvector (PostgreSQL) extension on some older CPUs that lack the AVX/FMA instruction sets.
Context & Problems#
I have an old mini PC that currently runs as a Kubernetes worker node in my homelab cluster. The node worked well until I tried to enable the pgvector extension on a PostgreSQL instance running on it.
When I tried to enable pgvector with the SQL query CREATE EXTENSION IF NOT EXISTS vector CASCADE;, my SQL client returned a strange message: An I/O error occurred while sending to the backend. I checked the PostgreSQL pod and confirmed it was still up and running. I also queried some random data and the client returned it normally, but the error above came back every time I tried to enable vector (the extension name of pgvector). Digging deeper, I pulled the PostgreSQL container log (roughly as sketched below) and found some valuable lines.
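For reference, this is roughly how I checked the pod and fetched the log; the namespace, label, and workload names are placeholders for my setup, adjust them to yours.
# Confirm the PostgreSQL pod is still up (namespace and label are placeholders)
kubectl -n database get pods -l app.kubernetes.io/name=postgresql
# Tail the container log to see what happens to the backend process
kubectl -n database logs statefulset/postgresql --tail=100
The relevant part of the log: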
2025-12-15 16:15:05.837 GMT [1] LOG: client backend (PID 714) was terminated by signal 4: Illegal instruction
2025-12-15 16:15:05.837 GMT [1] DETAIL: Failed process was running: CREATE EXTENSION IF NOT EXISTS vector CASCADE
2025-12-15 16:15:05.837 GMT [1] LOG: terminating any other active server processes
2025-12-15 16:15:05.841 GMT [1] LOG: all server processes terminated; reinitializing
2025-12-15 16:15:05.910 GMT [1274] LOG: database system was interrupted; last known up at 2025-12-15 16:08:59 GMT
2025-12-15 16:15:05.948 GMT [1277] FATAL: the database system is in recovery mode
2025-12-15 16:15:06.340 GMT [1274] LOG: database system was not properly shut down; automatic recovery in progress
2025-12-15 16:15:06.386 GMT [1274] LOG: redo starts at 0/1CF60C0
2025-12-15 16:15:06.525 GMT [1274] LOG: invalid magic number 0000 in WAL segment 000000010000000000000002, LSN 0/2152000, offset 1384448
2025-12-15 16:15:06.525 GMT [1274] LOG: redo done at 0/2151AE8 system usage: CPU: user: 0.00 s, system: 0.02 s, elapsed: 0.13 s
2025-12-15 16:15:06.585 GMT [1275] LOG: checkpoint starting: end-of-recovery immediate wait
2025-12-15 16:15:07.145 GMT [1278] FATAL: the database system is not yet accepting connections
2025-12-15 16:15:07.145 GMT [1278] DETAIL: Consistent recovery state has not been yet reached.
2025-12-15 16:15:07.152 GMT [1275] LOG: checkpoint complete: wrote 961 buffers (5.9%), wrote 3 SLRU buffers; 0 WAL file(s) added, 0 removed, 0 recycled; write=0.176 s, sync=0.285 s, total=0.599 s; sync files=320, longest=0.181 s, average=0.001 s; distance=4464 kB, estimate=4464 kB; lsn=0/2152048, redo lsn=0/2152048
2025-12-15 16:15:07.201 GMT [1] LOG: database system is ready to accept connections
2025-12-15 16:15:11.873 GMT [1] LOG: client backend (PID 1288) was terminated by signal 4: Illegal instruction
2025-12-15 16:15:11.873 GMT [1] DETAIL: Failed process was running: CREATE EXTENSION IF NOT EXISTS vector CASCADE
2025-12-15 16:15:11.873 GMT [1] LOG: terminating any other active server processes
2025-12-15 16:15:11.876 GMT [1] LOG: all server processes terminated; reinitializing
2025-12-15 16:15:11.988 GMT [1299] LOG: database system was interrupted; last known up at 2025-12-15 16:15:07 GMT
2025-12-15 16:15:12.447 GMT [1299] LOG: database system was not properly shut down; automatic recovery in progress
2025-12-15 16:15:12.481 GMT [1299] LOG: redo starts at 0/21520C0
2025-12-15 16:15:12.481 GMT [1299] LOG: invalid record length at 0/2156B88: expected at least 24, got 0
2025-12-15 16:15:12.481 GMT [1299] LOG: redo done at 0/2154C20 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2025-12-15 16:15:12.534 GMT [1300] LOG: checkpoint starting: end-of-recovery immediate wait
2025-12-15 16:15:12.757 GMT [1300] LOG: checkpoint complete: wrote 4 buffers (0.0%), wrote 3 SLRU buffers; 0 WAL file(s) added, 0 removed, 0 recycled; write=0.046 s, sync=0.087 s, total=0.253 s; sync files=6, longest=0.055 s, average=0.015 s; distance=18 kB, estimate=18 kB; lsn=0/2156B88, redo lsn=0/2156B88
2025-12-15 16:15:12.794 GMT [1] LOG: database system is ready to accept connections
- From the log, we can see that PostgreSQL reported the SQL query CREATE EXTENSION IF NOT EXISTS vector CASCADE crashed a backend process. PostgreSQL then performed automatic crash recovery and started accepting connections again. That is why the pod stayed up and running but the query could never complete.
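This is also visible from the Kubernetes side: only a child backend process dies, while the postmaster (PID 1 in the container, the process writing the log lines above) survives, so Kubernetes never restarts the container. A quick hedged check, with namespace and pod name as placeholders:
# The restart count stays at 0 because only a backend child crashed,
# not the postmaster process (PID 1) that the container is built around
kubectl -n database get pod postgresql-0 \
  -o jsonpath='{.status.containerStatuses[0].restartCount}'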
Investigate the root cause and solution#
Copying and googling the error message, the top search results gave me some information about the problem:
- The first issue reported that some parameters used to build the pgvector binaries optimize the output, but those optimizations may not suit the machine environment where the code is eventually executed.
- The second issue is closer to the root cause, confirmed by the repository maintainer:
Ahh that CPU is missing AVX and FMA. We compile the binary for our docker images with AVX and FMA as target-features. I think running a binary with those features enabled on a CPU without those features can segfault.
In general, the root cause is likely a hardware-level incompatibility between the host's CPU and the pgvector binary shipped in the PostgreSQL image.
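A hedged way to double-check the maintainer's statement is to pull the prebuilt vector.so out of the image and look for AVX/FMA-style instructions in its disassembly. The image tag below is a placeholder (use the same image or digest as your cluster), the library path follows the Bitnami layout used later in this post, and objdump comes from binutils on the workstation.
# Create a stopped container just to copy the prebuilt library out of the image
cid=$(docker create bitnami/postgresql:17)
docker cp "$cid":/opt/bitnami/postgresql/lib/vector.so ./vector.so
docker rm "$cid"
# A nonzero count suggests the binary targets AVX/FMA
# (vfmadd* are FMA instructions, ymm registers are used by AVX/AVX2 code)
objdump -d ./vector.so | grep -c -E 'vfmadd|ymm'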
I also googled for more information about AVX and FMA; the results can be summarized as follows: AVX (Advanced Vector Extensions) is a family of CPU instruction sets (AVX, AVX2, AVX-512) for wider data processing (128-, 256-, and 512-bit vectors), while FMA (Fused Multiply-Add) is a closely related extension (FMA3, FMA4) that computes (A * B) + C in one fused step for large floating-point performance gains, which is essential for scientific, AI, and graphics workloads; AVX2 and FMA3 are usually bundled together in modern CPUs.
- In short, they are advanced CPU features that speed up floating-point work, specifically vector and matrix processing. An extension such as pgvector, as its name suggests, relies heavily on exactly that kind of processing.
Verify the root cause
- On my personal laptop, I ran a Docker container from the same image tag used on my k8s cluster and executed the previous SQL command to enable pgvector in that container => everything worked well (a minimal reproduction sketch is shown after the CPU check below).
- I also wanted to verify the difference between the two CPUs, so I checked them as follows:
# On personal laptop
sudo lshw -c CPU | grep -E 'avx|fma'
capabilities: fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp x86-64 constant_tsc rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid tsc_known_freq pni pclmulqdq vmx ssse3 'fma' cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave 'avx' f16c rdrand hypervisor lahf_lm abm 3dnowprefetch ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow ept vpid ept_ad fsgsbase tsc_adjust bmi1 'avx2' smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves 'avx_vnni' vnmi umip waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize flush_l1d arch_capabilities
# On k8s worker node:
sudo lshw -c CPU | grep -E 'avx|fma'
## <nothing returned>
The output shows that the k8s worker node's CPU is missing the capabilities related to the AVX and FMA features.
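For completeness, the local reproduction mentioned above looked roughly like this; the image tag, container name, and password are placeholders.
# Run the same image locally (tag and password are placeholders)
docker run -d --name pg-test -e POSTGRESQL_PASSWORD=secret bitnami/postgresql:17
# Run the same statement; on the AVX/FMA-capable laptop CPU it succeeds,
# while on the old mini PC CPU the backend dies with SIGILL instead
docker exec -it -e PGPASSWORD=secret pg-test \
  psql -U postgres -c 'CREATE EXTENSION IF NOT EXISTS vector CASCADE;'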
Find out the solution
The previous issues propose re-compiling pgvector with the OPTFLAGS="" flag (to override the -march=native default), which drops the modern-CPU optimizations but lets legacy CPUs run the program normally. Checking the pgvector Makefile at https://github.com/pgvector/pgvector/blob/master/Makefile, we can confirm that flag in the official Makefile:
...
# To compile for portability, run: make OPTFLAGS=""
OPTFLAGS = -march=native
...
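To see why -march=native is the culprit, you can ask the compiler which instruction-set macros it enables on the build machine. This is only an illustration and assumes gcc is available; on an AVX/FMA-capable build host the second command prints the macros, meaning the compiler is allowed to emit those instructions into the binary.
# Baseline x86-64 target: usually prints nothing
gcc -dM -E - </dev/null | grep -E '__AVX|__FMA__'
# With -march=native on a modern build machine, __AVX__, __AVX2__ and __FMA__ appear
gcc -march=native -dM -E - </dev/null | grep -E '__AVX|__FMA__'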
Implement solution#
- The following is the Dockerfile I used to re-compile pgvector and override the default pgvector extension shipped in the Bitnami image.
- After rebuilding, the PostgreSQL instance on the mini PC k8s node works normally with pgvector (build and verification commands are sketched after the Dockerfile notes below).
# Multi-stage build for Bitnami PostgreSQL with a custom pgvector extension
## Define build arguments with default values
ARG BASE_IMAGE=bitnami/postgresql@sha256:d2acf863c8df356ee94773c791c34eef55fd5ed4435daefd6fd9792ed73477c7
### PostgreSQL installation directory of the base image (according to https://hub.docker.com/r/bitnami/postgresql, or use `pg_config` to find out)
ARG PG_BASE_DIR=/opt/bitnami/postgresql
### Adjust according to the PostgreSQL version in the base image
ARG PG_VECTOR_VERSION=0.8.1
FROM $BASE_IMAGE AS builder
ARG PG_VECTOR_VERSION
# Install required packages
USER root
# NOTE: the base image must already ship the PostgreSQL dev package (e.g. postgresqlXX-devel); otherwise, install it here as well.
RUN yum update -y && \
yum install -y build-essential
## Rebuild pgvector to support CPU without AVX, FMA instructions
WORKDIR /pgvector
## Download the specified version of pgvector source code
RUN curl -L "https://github.com/pgvector/pgvector/archive/refs/tags/v${PG_VECTOR_VERSION}.tar.gz" -o pgvector.tar.gz
## Unzip the tarball
RUN tar -xzvf pgvector.tar.gz
## # To compile for portability, run: make OPTFLAGS="" (default: OPTFLAGS = -march=native)
RUN cd pgvector-* && make OPTFLAGS="" && make install
# Copy necessary files to the final image
FROM $BASE_IMAGE AS final
ARG PG_BASE_DIR
ARG PG_VECTOR_VERSION
## Copy pgvector extension files (paths based on the output of pgvector's make install)
COPY --from=builder /pgvector/pgvector-${PG_VECTOR_VERSION}/vector.so $PG_BASE_DIR/lib/vector.so
COPY --from=builder /pgvector/pgvector-${PG_VECTOR_VERSION}/vector.control $PG_BASE_DIR/share/extension/vector.control
COPY --from=builder /pgvector/pgvector-${PG_VECTOR_VERSION}/sql $PG_BASE_DIR/share/extension
COPY --from=builder /pgvector/pgvector-${PG_VECTOR_VERSION}/src/halfvec.h $PG_BASE_DIR/include/server/extension/vector/halfvec.h
COPY --from=builder /pgvector/pgvector-${PG_VECTOR_VERSION}/src/sparsevec.h $PG_BASE_DIR/include/server/extension/vector/sparsevec.h
COPY --from=builder /pgvector/pgvector-${PG_VECTOR_VERSION}/src/vector.h $PG_BASE_DIR/include/server/extension/vector/vector.h

NOTE:
- You might need to update some instructions in the Dockerfile to match your base PostgreSQL image, in particular the build tools and PostgreSQL dev packages.
- For more information: https://github.com/pgvector/pgvector#installation-notes---linux-and-mac
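To close the loop, this is roughly how I built, pushed, and verified the patched image; the registry, tag, namespace, pod name, and password are placeholders for your own setup.
# Build and push the patched image (registry and tag are placeholders)
docker build -t registry.example.com/bitnami-postgresql-pgvector:17 .
docker push registry.example.com/bitnami-postgresql-pgvector:17
# Point the PostgreSQL workload (e.g. Helm values) at the new image, then re-test
kubectl -n database exec -it postgresql-0 -- \
  env PGPASSWORD=secret psql -U postgres \
  -c 'CREATE EXTENSION IF NOT EXISTS vector CASCADE;'
If the statement now returns CREATE EXTENSION instead of killing the backend, the portable build is in place.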
Learned lessons#
- A Docker image running normally on your local machine and environment does not guarantee the same behavior in your production environment.
- The problem also shows that the well-known Docker/container promise of "build once, run everywhere" is not absolute: some edge cases must be handled correctly at build time for that promise to hold.
- In the end, though, this is a CPU compatibility problem, not a containerization problem.

