Uellue's Blog

Proxy caches for apt

To speed up Debian updates on several computers in my network and to lighten the load on the Debian mirrors, I use a caching proxy server.

There are several small programs written explicitly for caching or mirroring Debian-style archives. My first attempt was Approx.

From the outside, Approx acts like an apt mirror and loads packages on request from other apt mirrors specified in its configuration file. If the requested file is already in the cache, the data is served directly from disk. On the clients, the Approx server must be added to the sources.list file.
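As a rough sketch of that setup, assuming a hypothetical server named approxhost listening on Approx's traditional port 9999 and a repository alias called debian (both are placeholders, not taken from my actual configuration), the two sides could look like this:

```
# /etc/approx/approx.conf on the server
# (alias "debian" and the upstream mirror are examples)
debian   http://deb.debian.org/debian

# /etc/apt/sources.list on each client
# (approxhost is a hypothetical host name)
deb http://approxhost:9999/debian stable main
```

The client only ever talks to approxhost; which real mirror sits behind the alias is decided on the server.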

Approx worked more or less for a single client, but as soon as parallel updates on several machines requested the same file, the Approx server ran into locking problems. The result was that the updates stopped because the server returned garbage.

Even with a single client it produced errors because package signatures didn't match the signing keys. Only deleting the cached package lists and keys, and thereby forcing a reload, solved the problem.

Apt-proxy was too difficult to configure for my taste, so I didn't really try it, and Apt-cacher ran into the same locking problems as Approx after a short test.

In a forum I heard people recommend Http-replicator for caching Debian archives. Http-replicator acts as an HTTP proxy and mirrors the file hierarchy of the accessed servers in its cache.

This program is not included in Debian, but a .deb package is available. Installing and configuring it was not much of a problem, and this program could handle parallel access better.

On the client side, the normal Debian archive servers were added to the /etc/apt/sources.list file and the caching server was added as an HTTP proxy to the client configuration. To make Apt use an HTTP proxy, a file containing the line Acquire::http::Proxy "http://aptproxy:9999/"; must be added to /etc/apt/apt.conf.d/.
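For illustration, such a file might look like the following sketch. The file name 01proxy is only an assumption (apt reads the files in that directory in alphanumeric order), and the last line, which uses apt's DIRECT keyword to bypass the proxy for a single host, is optional. The host name localrepo.example is hypothetical.

```
// /etc/apt/apt.conf.d/01proxy (file name is an example)
// Send all HTTP downloads through the caching proxy
Acquire::http::Proxy "http://aptproxy:9999/";
// Optional: fetch from this (hypothetical) host directly, without the proxy
Acquire::http::Proxy::localrepo.example "DIRECT";
```

After saving the file, the next apt-get update should already go through the proxy; no change to sources.list is needed.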

Unfortunately, it was not possible to restrict access to Debian archive mirrors. Http-replicator would cache the whole internet and also grant access to local services on the machine running it.

The solution was to use Squid. In its default configuration it is not well suited to caching Debian archives, whose files are often fairly large, but it can be configured to work quite smoothly. It was even possible to restrict access from the "outside", i.e. untrusted networks, to Debian archives while allowing unlimited access for machines in the local VPN.

Here are the relevant lines of /etc/squid/squid.conf:

# 3128 is the squid default port. It's only open from "inside"
# aptproxy at port 9999 is public
http_port 9999

# Also cache large files, otherwise it wouldn't be so useful as apt cache
maximum_object_size 100 MB

# This optimizes byte hit rate
cache_replacement_policy heap LFUDA

# Give it 10 GB space :-)
cache_dir ufs /var/spool/squid 10000 16 256

# Access control
acl all src
acl localhost src
acl to_localhost dst

# is the "inside" IP of the server
acl vpn_access myip

# "outside" network and "inside" network
acl our_networks src

# All the official debian mirrors are below that domain
# Other allowed sources can be added here
acl debian_cache dstdomain .debian.org

# Allow everything from the inside
http_access allow vpn_access
http_access allow localhost

# only VPN and localhost can query the local web server
http_access deny to_localhost

# allow access to the Debian archives
http_access allow debian_cache our_networks

# here proxying for the "outside" can be turned on. Don't forget to open the port!
# http_access allow our_networks

# And finally deny all other access to this proxy
http_access deny all

Squid handles concurrent access to the same file without problems, hasn't caused signature problems so far and uses less CPU time than the other solutions. The hit rate is quite good for parallel updates. It is not a good solution if you want to create a local Debian mirror that keeps files indefinitely, because the cache evicts objects once its disk space is used up.

Squid should work with any distribution that updates via HTTP if you grant access to the appropriate domains, not only Debian or APT-based distributions. For existing Squid installations only the maximum object size and maybe the disk space need to be large enough.
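As a sketch of what granting access to further domains could look like: Squid accepts several values on one dstdomain line, and repeated acl lines with the same name are merged. The extra domain below is only an example; check which mirror host names your distribution actually uses.

```
# extend the ACL from the configuration above with other update servers
acl debian_cache dstdomain .debian.org
acl debian_cache dstdomain .ubuntu.com
```

The existing http_access rules stay unchanged, since they refer to the ACL by name.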

On the client side you have to configure your package management or update system to use the proxy, and that's all.