Closing the logs-to-issues loop with AI agents

Generating code is the easy part of agentic engineering. The harder goal is letting the agent manage the whole development cycle, including watching how a change behaves in production once it ships. The missing link is usually feedback: logs sit on the machines the service runs on, and the agent shouldn't need a login to those machines to read them. The setup below pushes logs into Grafana Cloud Loki, then gives the agent a read-only token plus logcli. Every N minutes the agent queries Loki for errors and opens (or comments on) GitHub issues against the repo. The agent never has to touch a VM, and there's no extra infrastructure to babysit.
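A minimal sketch of that poll-and-file loop, assuming logcli and the GitHub CLI (gh) are on PATH and the read-only Grafana Cloud credentials are in the usual LOKI_ADDR / LOKI_USERNAME / LOKI_PASSWORD environment variables; the label selector and repo name are placeholders:

```python
import hashlib
import re
import subprocess

QUERY = '{app="myservice"} |= "error"'  # hypothetical label selector
REPO = "example/myservice"              # hypothetical repo

def normalize(line: str) -> str:
    """Collapse hex ids and numbers so recurring errors look identical."""
    line = re.sub(r"0x[0-9a-fA-F]+", "<hex>", line)
    line = re.sub(r"\d+", "<n>", line)
    return line.strip()

def fingerprint(line: str) -> str:
    """Stable short id used to dedupe issues across runs."""
    return hashlib.sha1(normalize(line).encode()).hexdigest()[:12]

def fetch_errors(since: str = "15m") -> list[str]:
    """Pull recent error lines through logcli with the read-only token."""
    out = subprocess.run(
        ["logcli", "query", "--quiet", f"--since={since}", "--output=raw", QUERY],
        capture_output=True, text=True, check=True,
    ).stdout
    return [l for l in out.splitlines() if l]

def file_issues(lines: list[str]) -> None:
    """Open one GitHub issue per distinct error fingerprint."""
    for fp, line in {fingerprint(l): l for l in lines}.items():
        existing = subprocess.run(
            ["gh", "issue", "list", "--repo", REPO,
             "--search", fp, "--json", "number"],
            capture_output=True, text=True,
        ).stdout.strip()
        if existing in ("", "[]"):
            subprocess.run(
                ["gh", "issue", "create", "--repo", REPO,
                 "--title", f"[prod-error {fp}] {normalize(line)[:80]}",
                 "--body", f"Seen in Loki: {line} (fingerprint {fp})"],
                check=True,
            )
```

Run it from cron (or the agent's scheduler) every N minutes; the fingerprint keeps a recurring error from opening duplicate issues, and a commenting step for already-open issues could search by the same fingerprint.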
Read more

Secure read-only DB access for AI agents

Cursor running Claude Opus 4.6 wiped a production database in 9 seconds. I keep Claude on a read-only Postgres role, with the password in a service file and an allowlist that pins the connection target. Since my data is trading and market data (public numbers, no user records), the risks I care about are destructive writes and runaway queries, not data exposure.
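A sketch of that role setup, assuming Postgres; the database, schema, role, and password are placeholders (the connection allowlist itself lives in pg_hba.conf or the provider's firewall, not in SQL):

```sql
-- Read-only role for the agent; names and password are illustrative.
CREATE ROLE agent_ro LOGIN PASSWORD 'change-me';   -- real password lives in a service file
GRANT CONNECT ON DATABASE marketdata TO agent_ro;
GRANT USAGE ON SCHEMA public TO agent_ro;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO agent_ro;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO agent_ro;

-- Belt and braces: force read-only transactions and cap runaway queries.
ALTER ROLE agent_ro SET default_transaction_read_only = on;
ALTER ROLE agent_ro SET statement_timeout = '5s';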
Read more

Kill the bits and gain the speed?

Recently, Facebook AI Research, in collaboration with the University of Rennes, released the paper “And the Bit Goes Down: Revisiting the Quantization of Neural Networks”, which was accepted to ICLR 2020. The authors propose a weight-quantization method for ResNet-like architectures based on Product Quantization. Unlike many other papers, the error introduced by the codewords is not minimized directly; instead, the training procedure minimizes the reconstruction error of fully-connected and convolutional layer activations using a weighted k-means. Quantization is applied to all 3x3 and 1x1 convolutions except the first convolutional layer. The paper emphasizes the importance of optimizing on in-domain input data in both the quantization and fine-tuning stages. With this technique, the weights of ResNet-50 can be compressed by a factor of 20x while maintaining competitive accuracy (76.1%). The potential impact of byte-aligned codebooks on efficient CPU inference is briefly mentioned, but no actual method is presented. We propose and explore one possible way of exploiting frequent redundant codewords across input channels in order to accelerate inference on mobile devices.
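To make the mechanics concrete, here is a toy, stdlib-only sketch of product quantization on a weight matrix: each row is split into subvectors and each block position gets its own k-means codebook. This illustrates only the codebook/assignment machinery; the paper's contribution is the activation-weighted k-means objective, which this plain version omits.

```python
import random

def kmeans(vecs, k, iters=20, seed=0):
    """Plain (unweighted) k-means on a list of equal-length vectors."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vecs, k)]
    assign = [0] * len(vecs)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, v in enumerate(vecs):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [vecs[i] for i in range(len(vecs)) if assign[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign

def product_quantize(matrix, d_sub, k):
    """Split each row into d_sub-dim subvectors; one codebook per block."""
    n_blocks = len(matrix[0]) // d_sub
    codebooks, codes = [], [[0] * n_blocks for _ in matrix]
    for b in range(n_blocks):
        sub = [row[b * d_sub:(b + 1) * d_sub] for row in matrix]
        cb, assign = kmeans(sub, k)
        codebooks.append(cb)
        for i, a in enumerate(assign):
            codes[i][b] = a
    return codebooks, codes

def reconstruct(codebooks, codes):
    """Rebuild the (approximate) matrix from codebooks and assignments."""
    return [[x for b, c in enumerate(row) for x in codebooks[b][c]]
            for row in codes]
```

Storage drops from one float per weight to one small code index per subvector plus the shared codebooks, which is where the ~20x compression comes from.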
Read more

Convolutional network without multiplication operation

In September 2018, the Google Research team released a paper titled “No Multiplication? No floating point? No problem? Training networks for efficient inference”, which we will refer to as NMNF. Convolutional layers are the main building blocks of convolutional neural networks, and the great majority of inference time is spent in them. The NMNF paper targets devices like hearing aids, earbuds, and wearables. Such devices are highly resource-constrained in terms of memory, power, and computation, and therefore benefit from the specialized implementation of the convolutional layer introduced in the paper. Inference-time floating-point operations are not only energy-hungry compared to integer operations but also computationally demanding. The NMNF approach avoids floating-point operations entirely, and as a consequence we also enjoy a reduced model size.
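The core trick can be sketched in a few lines: if weights are restricted to K codebook values and activations to A quantized levels, all K x A products fit in a table computed once at load time, and the inner loop of a convolution reduces to table lookups and additions. The codebook values below are illustrative, not the paper's:

```python
# Illustrative codebooks; a real model learns these during training.
W_CODEBOOK = [-0.5, -0.25, 0.0, 0.25, 0.5]   # K = 5 allowed weight values
A_LEVELS   = [0.0, 0.5, 1.0, 1.5]            # A = 4 activation levels

# Precomputed once at model-load time: every possible product.
PROD_TABLE = [[w * a for a in A_LEVELS] for w in W_CODEBOOK]

def dot_lookup(w_idx, a_idx):
    """Dot product over codebook indices: lookups and additions only,
    no multiplications in the inner loop."""
    return sum(PROD_TABLE[w][a] for w, a in zip(w_idx, a_idx))
```

Storing the table in fixed-point rather than float would remove the remaining floating-point additions as well, which matches the paper's "no floating point" goal.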
Read more