A copy of project README as is from: https://github.com/codingfuture/puppet-cfdb

cfdb

Description

Setup & auto tune service instances based on available resources:
- Elasticsearch
- MongoDB
- MySQL
- PostgreSQL
- Redis
- a general framework to easily add new types is available
Support auto-configurable clustering:
- Native for Elasticsearch
- Native for MongoDB
- Galera Cluster for MySQL
- repmgr for PostgreSQL
- Sentinel for Redis Master-Slave
High Availability and fail-over out-of-the-box
- Specialized HAProxy on each client system
Support for secure TLS tunnel with mutual authentication for database connections
Automatic creation of databases
Automatic upgrade of databases after DB software upgrade
Complete management of user accounts (roles)
- automatic random generated password management
- full & read-only database roles
- custom grants support
- ensures max connections meet
- PostgreSQL extension support
Easy access configuration
- automatic firewall setup on both client & server side
- automatic protection for roles x incoming hosts
- automatic protection for roles x max connections
Automatic incremental backup and automated restore
Strict cgroup-based resource isolation on top of systemd integration
Automatic DB connection availability checks
Automatic cluster state checks
Support easy migration from default data dirs (see init_db_from).
Scheduled actions:
- For example, log cleanup in ELK stack

Terminology & Concept

cluster - infrastructure-wide unique associative name for a collection of distributed database instances. In standalone cases, there is only one instance per cluster.
instance - database service process and related configuration. Each instance runs in own cgroup for fair weighted usage of RAM, CPU and I/O resources.
- primary instance - instance where dynamic setup of roles and databases occurs. It also gets all Read-Write users by default.
- secondary instance - a slave instance suitable for temporary automatic fail-over and Read-Only users. It is not allowed to define databases & roles in configuration, unless the instance is switched to primary node in static configuration.
- arbitrator - an instance participating in quorum, but with no data. Mitigates split-brain cases.
database - a database instance per cluster. All privileged on the database role with the same name is automatically created. There is no need to define such role explicitly.
role - a database user. Each role name is automatically prefixed with related database name to prevent name collisions.
access - a definition of client system and user to access particular role on specific database of specific cluster. All connections parameters are saved in “.env” file in user’s home folder. It’s possible to specify multiple access definitions per single system user using different .env variable prefixes. In case of multi-node cluster, a local HAProxy reverse-proxy instance is implicitly created with required high-Availability configuration.

Technical Support

Example configuration
Free & Commercial support: support@codingfuture.net

Setup

Up to date installation instructions are available in Puppet Forge: https://forge.puppet.com/codingfuture/cfdb

Please use librarian-puppet or cfpuppetserver module to deal with dependencies.

There is a known r10k issue RK-3 which prevents automatic dependencies of dependencies installation.

IMPORTANT NOTES!!!

Please understand that PuppetDB is heavily used for auto-configuration. The drawback is that new facts are updated only on second run after the run which makes any changes. Typical case is to do the following:

Provision instance with databases & roles
Provision the same instance again (collect facts)
Provision access locations
Provision access locations again (collect facts)
Provision instance (update configuration based on the access facts)
Restart instances, if asked during provisioning
Provision all nodes in cycle until there are no new changes and no restart is required

For cluster configuration:

Provision primary node
Provision primary node again (collect facts)
Provision secondary nodes
Provision secondary nodes again (collect facts)
Provision primary node (configure firewall & misc. based on facts)
Provision secondary nodes again (clone/setup)
Restart instances, if asked during provisioning
Provision all nodes in cycle until there are no new changes and no restart is required

As available system memory is automatically distributed between all registered services:

Please make sure to restart services after distribution gets changes (or you may run in trouble).
Please avoid using the same system for codingfuture derived services and custom. Or see #3.

Please make sure to reserve RAM using cfsystem_memory_weight for any custom co-located services. Example:

cfsystem_memory_weight { 'my_own_services':
    ensure => present,
    weight => 100,
    min_mb => 100,
    max_mb => 1024
}

Examples

Please check codingufuture/puppet-test for example of a complete infrastructure configuration and Vagrant provisioning.

Example for host running standalone instances, PXC arbitrator & PostgreSQL slaves + related accesses.

cfdb::iface: main
cfdb::instances:
    mysrv1:
        type: mysql
        databases:
            - db1_1
            - db1_2
        iface: vagrant
        port: 3306
    mysrv2:
        type: mysql
        databases:
            db2:
                roles:
                    readonly:
                        readonly: true
                    sandbox:
                        custom_grant: 'GRANT SELECT ON $database.* TO $user; GRANT SELECT ON mysql.* TO $user;'
    myclust1:
        type: mysql
        is_arbitrator: true
        port: 4306
    myclust2:
        type: mysql
        is_arbitrator: true
        port: 4307
        settings_tune:
            cfdb:
                secure_cluster: true
    pgsrv1:
        type: postgresql
        iface: vagrant
        databases:
            pdb1: {}
            pdb2: {}

    pgclust1:
        type: postgresql
        is_arbitrator: true
        is_secondary: true
        port: 5300
        settings_tune:
            cfdb:
                node_id: 3
    pgclust2:
        type: postgresql
        is_arbitrator: true
        is_secondary: true
        port: 5301
        settings_tune:
            cfdb:
                secure_cluster: true
                node_id: 3

    esearch:
        type: elasticsearch
        port: 9200
                        
cfdb::access:
    vagrant_mysrv2_db2:
        cluster: mysrv2
        role: db2
        local_user: vagrant
        max_connections: 100
    vagrant_mysrv2_db2ro:
        cluster: mysrv2
        role: db2readonly
        local_user: vagrant
        config_prefix: 'DBRO_'
        max_connections: 200
    vagrant_mysrv2_db2sandbox:
        cluster: mysrv2
        role: db2sandbox
        local_user: vagrant
        config_prefix: 'DBSB_'
    vagrant_myclust1_db1:
        cluster: myclust1
        role: db1
        local_user: vagrant
        config_prefix: 'DBC1_'
    vagrant_myclust1_db2:
        cluster: myclust1
        role: db2
        local_user: vagrant
        config_prefix: 'DBC2_'
    vagrant_pgsrv1_pdb1:
        cluster: pgsrv1
        role: pdb1
        local_user: vagrant
        config_prefix: 'PDB1_'
    vagramt_esearch:
        cluster: esearch
        local_user: vagrant
        config_prefix: 'ESRCH_'

For another related host running primary nodes of clusters:

cfdb::iface: main
cfdb::mysql::is_cluster: true
cfdb::instances:
    myclust1:
        type: mysql
        is_cluster: true
        databases:
            db1:
                roles:
                    ro:
                        readonly: true
            db2: {}
        port: 3306
    myclust2:
        type: mysql
        is_cluster: true
        databases:
            - db1
            - db2
        port: 3307
        settings_tune:
            cfdb:
                secure_cluster: true
    pgclust1:
        type: postgresql
        is_cluster: true
        databases:
            pdb1:
                roles:
                    ro:
                        readonly: true
            pdb2: {}
        port: 5300
    pgclust2:
        type: postgresql
        is_cluster: true
        databases:
            - pdb3
            - pdb4
        port: 5301
        settings_tune:
            cfdb:
                secure_cluster: true

For third related host running secondary nodes of clusters:

cfdb::iface: main
cfdb::mysql::is_cluster: true
cfdb::instances:
    myclust1:
        type: mysql
        is_secondary: true
        port: 3306
    myclust2:
        type: mysql
        is_secondary: true
        port: 3307
        settings_tune:
            cfdb:
                secure_cluster: true
    pgclust1:
        type: postgresql
        is_secondary: true
        port: 5300
    pgclust2:
        type: postgresql
        is_secondary: true
        port: 5301
        settings_tune:
            cfdb:
                secure_cluster: true
    esearch:
        type: elasticsearch
        is_secondary: true
        port: 9200

Implicitly created resources

# for every instance
#------------------
cfnetwork::describe_service:
    cfdb_${cluster}:
        server: "tcp/${port}"

# local cluster system user access to own instance
cfnetwork::service_port:
    "local:cfdb_${cluster}": {}
cfnetwork::client_port:
    "local:cfdb_${cluster}":
        user: $user

# client access to local cluster instance
cfnetwork::service_port:
    "${iface}:cfdb_${cluster}":
        src: $client_hosts
        
# for each repmgr Elasticsearch cluster (inter-node comms)
# > access to local instance ports
cfnetwork::describe_service:
    "cfdb_${cluster}_peer":
        server: "tcp/${peer_port}"
cfnetwork::service_port:
    "${iface}:cfdb_${cluster}_peer":
        src: $peer_addr_list
cfnetwork::service_ports:
    "${iface}:cfdb_${cluster}_peer":
        src: 'ipset:cfdb_${cluster}'
cfnetwork::client_ports:
    "${iface}:cfdb_${cluster}_peer":
        dst: 'ipset:cfdb_${cluster}'
        user: $user

# for each Galera cluster (inter-node comms)
cfnetwork::describe_service:
    "cfdb_${cluster}_peer":
        server: "tcp/${port}"
    "cfdb_${cluster}_galera":
        server:
            - "tcp/${galera_port}"
            - "udp/${galera_port}"
    "cfdb_${cluster}_sst":
        server: "tcp/${sst_port}"
    "cfdb_${cluster}_ist":
        server: "tcp/${ist_port}"
cfnetwork::ipset:
    cfdb_${cluster}:
        type: ip
        addr: $peer_addr_list
cfnetwork::service_ports:
    "${iface}:cfdb_${cluster}_peer":
        src: 'ipset:cfdb_${cluster}'
    "${iface}:cfdb_${cluster}_galera":
        src: 'ipset:cfdb_${cluster}'
    "${iface}:cfdb_${cluster}_sst":
        src: 'ipset:cfdb_${cluster}'
    "${iface}:cfdb_${cluster}_ist":
        src: 'ipset:cfdb_${cluster}'
cfnetwork::client_ports:
    "${iface}:cfdb_${cluster}_peer":
        dst: 'ipset:cfdb_${cluster}'
        user: $user
    "${iface}:cfdb_${cluster}_galera":
        dst: 'ipset:cfdb_${cluster}'
        user: $user
    "${iface}:cfdb_${cluster}_sst":
        dst: 'ipset:cfdb_${cluster}'
        user: $user
    "${iface}:cfdb_${cluster}_ist":
        dst: 'ipset:cfdb_${cluster}'
        user: $user

# for each repmgr PostgreSQL cluster (inter-node comms)
# > access to local instance ports
cfnetwork::describe_service:
    "cfdb_${cluster}_peer":
        server: "tcp/${port}"
cfnetwork::service_port:
    "${iface}:cfdb_${cluster}_peer":
        src: $peer_addr_list
cfnetwork::service_ports:
    "${iface}:cfdb_${cluster}_peer":
        src: 'ipset:cfdb_${cluster}'
cfnetwork::client_ports:
    "${iface}:cfdb_${cluster}_peer":
        dst: 'ipset:cfdb_${cluster}'
        user: $user



# for every cfdb::access when HAProxy is NOT used
#------------------
cfnetwork::describe_service:
    cfdb_${cluster}:
        server: "tcp/${port}"
cfnetwork::client_port:
    "any:cfdb_${cluster}:${local_user}":
        dst: [cluster_hosts]
        user: $local_user

# for every cfdb::access when HAProxy IS used
#------------------
cfnetwork::describe_service:
    "cfdbha_${cluster}_${port}":
        server: "tcp/${port}"
cfnetwork::client_port:
    "any:${fw_service}:${host_underscore}":
        dst: $addr,
        user: $cfdb::haproxy::user