

Python Setup

I work across both Windows and Linux and find myself writing the same shell script for every project I spin up. Here's how it's done:

#!/bin/bash

cd "$(dirname "$0")"

CREATE_VENV=0

if [[ ! -d venv ]]
then
    python3 -m venv venv || python -m venv venv || {
        echo "Could not create a Python virtual environment."
        exit 1
    }

    CREATE_VENV=1
fi

# Activate the venv; it lives in bin/ on Linux and macOS, Scripts/ on Windows
source ./venv/bin/activate 2> /dev/null
source ./venv/Scripts/activate 2> /dev/null

# Install dependencies only when the venv was just created
if [[ $CREATE_VENV -eq 1 ]]
then
    pip install dependency1 dependency2 etc
fi

python -u app.py "$@"

Keep in mind this is a shell script for Bash. To run it on Windows, you must also have Bash on your PATH; Git for Windows ships one. Once you do, this script works great and is portable across the big three: Windows, macOS, and Linux.
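
For example, with Git for Windows in its default location, you can launch the script (hypothetically named start.sh here) from a Command Prompt like so:

"C:\Program Files\Git\bin\bash.exe" start.sh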


PM2 Setup

Here's my preferred way to run a script like a service. First, to install PM2 on a Debian-based system, here's what I would run as root:

apt install nodejs
apt install npm
npm install -g n
n latest        # upgrade Node.js to the latest release
npm install -g pm2
pm2 startup     # prints a command to copy and run, registering PM2 at boot

Then, in whatever directory contains the software project that needs the "run this forever" treatment, I simply run:

pm2 start script.sh --name script -- argument1 argument2
pm2 save

That will start script.sh under the name "script", passing it all the command-line arguments that come after the -- separator.
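
From there, PM2's usual management commands apply:

pm2 list            # show all managed processes
pm2 logs script     # tail the script's output
pm2 restart script  # restart it
pm2 delete script   # stop it and drop it from the list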


LLMs

Screw the frontier models.

Get yourself a llama.cpp release that's appropriate for your computer. If you have an NVIDIA GPU, be sure to download the accompanying CUDA runtime .dll files as well and extract them to the same directory. Vulkan builds should run everywhere.

Get yourself a Qwen3.6 35B A3B quant that fits well on your GPU or in your system's RAM. This model's Q4_K_M quant replaced Qwen3-Coder-Next's Q8 quant for me. Don't forget the mmproj file to go along with it if you want OCR capabilities!

Make yourself a launch script like so:

#!/bin/bash

./llama-server \
    --n-gpu-layers -1 \
    --host 0.0.0.0 \
    --port 8080 \
    --ctx-size 262144 \
    --model "Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf" \
    --mmproj "mmproj-Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive-f16.gguf" \
    --ubatch-size 1024 \
    --batch-size 1024 \
    --jinja \
    --webui-mcp-proxy \
    --chat-template-kwargs '{"enable_thinking": false}'

Modify it so that it points to your model's two .gguf files and run it. I run this exact script on my Framework Desktop with a Ryzen AI Max+ 395 and 128 GB of unified memory, and I get close to 70 t/s of generation at small context sizes.
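
Once it's running, llama-server speaks the OpenAI-compatible API, so you can sanity-check it with curl (swap localhost for the server's address if you're connecting from another machine):

curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"messages": [{"role": "user", "content": "Hello there!"}]}'

There's also a built-in web UI served at the same address and port.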


"Impose" by Bad Omens

Give it a listen and read the lyrics along with it.


Fine-tune xrdp

Set the kernel option net.core.wmem_max to 8388608 by running sysctl -w net.core.wmem_max=8388608. To make this change persist across reboots, write:

net.core.wmem_max = 8388608

...to /etc/sysctl.d/xrdp.conf.
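
As a one-liner, run as root (the file name is arbitrary; any .conf file under /etc/sysctl.d works):

echo "net.core.wmem_max = 8388608" > /etc/sysctl.d/xrdp.conf
sysctl --system    # reload sysctl settings from all configuration files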

Configure xrdp.ini

Ensure the following in /etc/xrdp/xrdp.ini:

tcp_send_buffer_bytes=4194304
max_bpp=16
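
Then restart xrdp so the settings take effect (assuming a systemd-based distribution):

systemctl restart xrdp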

This was all it took to turn my unbearably slow experience into something usable over a private VPN connection. No compositor tweaks needed.

Source


Multi-session User in xrdp

Insert the following line:

export $(dbus-launch)

...before the test -x and exec lines in /etc/xrdp/startwm.sh to allow signing in to your user account more than once. This is especially useful for users coming from Windows environments, where you can be signed in locally (at home) and still connect to the same account remotely via RDP.
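
For reference, on a Debian-based install the tail of startwm.sh ends up looking roughly like this (your file may differ):

# Give each login its own session bus so concurrent
# sessions for the same user don't collide
export $(dbus-launch)

test -x /etc/X11/Xsession && exec /etc/X11/Xsession
exec /bin/sh /etc/X11/Xsession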

Source