CentOS mirrorlist service


The mirrorlist service is crucial for all deployed CentOS instances around the world: each instance queries the mirrorlist web service to get a list of validated, up-to-date mirrors from which to retrieve its content. The service uses GeoIP and also checks whether the request comes from a cloud provider (such as EC2); it then redirects the client to the nearest mirror (GeoIP) or to an internal one (CloudFront setup for AWS/EC2).


mirrorlists schema

It contains the following kinds of scripts:

  • backend: scripts used by our "crawler" node, which validates all external mirrors in a loop over IPv4 and IPv6 and produces the 'mirrorlists', one per repo/arch/country
  • frontend: Python scripts used for the mirrorlist and isoredirect services

Backend (crawler)

There are two Perl scripts for checking mirrors:

  • for creating files for
  • for creating files for

Both scripts can create lists for all supported CentOS releases, including SIG and AltArch content. Each mirror is tested separately over IPv4 and IPv6, so that only IPv6-capable mirrors are presented to clients that access the service over IPv6. More details about the internals of these scripts can be found in backend/mirrorlist_crawler_deployment_notes.txt
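The per-family test described above can be illustrated with a minimal Python sketch. This is not the Perl crawler itself: the function names (`probe_tcp`, `check_mirror`, `ipv6_capable`), the port, and the timeout are assumptions made for illustration, and the sketch only checks TCP reachability, whereas the real crawler also validates mirror content.

```python
import socket


def probe_tcp(host, family, port=80, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds over `family`.

    Hypothetical helper: the real crawler does more than a connect test.
    """
    try:
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        # No address record for this family (e.g. no AAAA record).
        return False
    for af, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(af, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return True
        except OSError:
            continue
    return False


def check_mirror(host, probe=probe_tcp):
    """Test one mirror separately over IPv4 and IPv6."""
    return {
        "ipv4": probe(host, socket.AF_INET),
        "ipv6": probe(host, socket.AF_INET6),
    }


def ipv6_capable(results):
    """Keep only the mirrors that passed the IPv6 test."""
    return sorted(host for host, res in results.items() if res["ipv6"])
```

Passing the probe as a parameter keeps the two address families independent and lets the logic be exercised without network access, for example by supplying a fake probe in a test.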


All frontend scripts are located in the frontend folder. The following items are needed for the mirrorlist/isoredirect service:

  • An HTTP server (Apache) using mod_proxy_balancer (see the frontend/httpd/mirrorlist.conf vhost example)
  • python-bottle to run the {ml,isoredirect}.py code for the various instances
  • MaxMind GeoLite2 database: City version
  • python-geoip2 package (to read the GeoLite2 database)
  • python-memcached (to cache results for GeoIP/cloud providers)
  • For each worker, a specific instance/port can be initialized and added to the Apache config for the proxy balancer (see frontend/systemd/centos-ml-worker@.service)

These services (mirrorlist/isoredirect) only consume the mirrorlist files, which are pushed to these nodes and continuously updated by the crawler process (see the Backend section above).

When a request is made to the service, the Python script:

  • checks whether the client connected over IPv4 or IPv6
  • checks whether the IP is already in memcached (for country/cloud provider)
  • checks whether the IP belongs to a cloud provider
  • computes the geolocation based on the origin IP
  • searches for validated mirrors in the same country/state for the requested arch/repo/release
  • returns that list
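
The request flow listed above can be sketched in a few lines of Python. This is only an illustration, not the actual {ml,isoredirect}.py code: a plain dict stands in for memcached, small lookup tables stand in for the GeoLite2 database and the cloud provider ranges, and every name, address range, and URL below is hypothetical (the ranges are reserved documentation networks).

```python
import ipaddress

# Hypothetical stand-ins for the real data sources.
CLOUD_NETWORKS = {"ec2": [ipaddress.ip_network("203.0.113.0/24")]}
CLOUD_MIRRORS = {"ec2": ["http://cloud-internal.example.org/centos/"]}
GEO_TABLE = {
    ipaddress.ip_network("198.51.100.0/24"): "FR",
    ipaddress.ip_network("2001:db8::/32"): "DE",
}
# Validated mirrors keyed by (release, arch, repo, country, IP version).
MIRRORS = {
    ("7", "x86_64", "os", "FR", 4): ["http://mirror.fr.example.org/centos/7/os/x86_64/"],
    ("7", "x86_64", "os", "DE", 6): ["http://mirror6.de.example.org/centos/7/os/x86_64/"],
}
cache = {}  # stands in for memcached: client IP -> (cloud, country)


def find_cloud(ip):
    """Return the cloud provider name owning `ip`, or None."""
    for provider, networks in CLOUD_NETWORKS.items():
        if any(ip.version == net.version and ip in net for net in networks):
            return provider
    return None


def find_country(ip):
    """Return the country code for `ip`, or None if unknown."""
    for net, country in GEO_TABLE.items():
        if ip.version == net.version and ip in net:
            return country
    return None


def handle_request(client_ip, release, arch, repo):
    ip = ipaddress.ip_address(client_ip)
    version = ip.version                      # 1. IPv4 or IPv6?
    cached = cache.get(client_ip)             # 2. already cached?
    if cached is None:
        cloud = find_cloud(ip)                # 3. cloud provider?
        country = None if cloud else find_country(ip)  # 4. geolocation
        cached = (cloud, country)
        cache[client_ip] = cached
    cloud, country = cached
    if cloud is not None:
        return CLOUD_MIRRORS[cloud]
    # 5. validated mirrors for the same country and request parameters
    return MIRRORS.get((release, arch, repo, country, version), [])
```

In this sketch an unknown country simply yields an empty list; how the real service handles that case is not covered here.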