Countly is an extensible, open-source product analytics solution for mobile and web apps. It tracks applications, customer behavior, and game mechanics, so you can focus on increasing user loyalty and engagement. With Countly, collected data is converted into meaningful information in true real-time, backed by an underlying infrastructure built on MongoDB, Node.js, and Nginx.
Countly provides two APIs: one to write data and one to read it. To write data, Countly provides SDKs (e.g. for Android, iOS, and the web) to be used within applications on mobile devices. The SDK collects usage information and sends it to the Countly server. The Read API retrieves data from the Countly server; the entire dashboard uses it to fetch information and visualize it in graphical form.
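As a rough sketch of how the two APIs are used, a write request reports data from a device and a read request fetches aggregates for the dashboard. The hostname and keys below are placeholders; consult the Countly API reference for the full parameter list.

```shell
# Write API (/i): record a custom event for a device
# (server URL, app_key, and device_id are placeholders)
curl -G "https://countly.example.com/i" \
  --data-urlencode "app_key=YOUR_APP_KEY" \
  --data-urlencode "device_id=test-device-1" \
  --data-urlencode 'events=[{"key":"login","count":1}]'

# Read API (/o): fetch session data, as the dashboard does
# (api_key and app_id are placeholders)
curl -G "https://countly.example.com/o" \
  --data-urlencode "api_key=YOUR_API_KEY" \
  --data-urlencode "app_id=YOUR_APP_ID" \
  --data-urlencode "method=sessions"
```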
Countly is a true real-time service: the user interface shows updates every 10 seconds, an interval that is also configurable.
Basic installation scenarios
It’s advised to install Countly on a separate server to provide maximum compatibility, performance, and usability. Since Countly runs on port 80 (443 for https) to serve its dashboard, there shouldn’t be any other web service running on the same port.
Installation is well tested on Ubuntu, Red Hat Enterprise Linux, and CentOS. We strongly advise getting LTS (long term support) versions to receive timely updates. The installation script that is included in the Countly package automates the process by adding new repositories, downloading required files, and installing them to the target operating system. In the event that Countly needs to be updated, the corresponding update scripts, which come with the package, may be used.
In its basic form, the installer sets up a single instance, i.e. it doesn’t configure the system to work on multiple nodes. A single node is capable of handling a limited volume of data originating from devices. For example, an 8-core, 32GB RAM instance can roughly handle over 1 million requests per day. Spikes throughout the day and the properties of your data (such as event segments and their cardinality) are among the factors that will affect deployment planning.
It may be necessary to separate the database instance from the application server, and later down the line MongoDB sharding will be necessary to cope with the load. We have extensive experience scaling Countly, so please consult Countly engineers for access to well-prepared network and instance configurations.
In this chapter we’ll take a look at different installation options for Countly. The first one, the single node option, was already covered above; it is a straightforward way to install Countly if you have at most a few million users daily and no more than 10,000 concurrent users. It’s important to provide a failover mechanism, so that if the instance Countly runs on goes down for any reason, you can rely on another node and continue without interrupting your service.
Note that the SDKs are designed so that if they cannot reach the internet or the server for some reason, they do not crash the application they are embedded in; they fail gracefully, so end users don’t notice anything at all. Additionally, the SDKs can store data that cannot be sent, and forward the stored data to the Countly instance once the network connection is back up.
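The store-and-forward behavior described above can be sketched roughly as follows. This is a hypothetical illustration, not actual SDK code; the queue file, server URL, and function names are all placeholders.

```shell
# Hypothetical store-and-forward sketch mirroring SDK behavior:
# queue requests locally, flush them when the server is reachable.
QUEUE=/tmp/countly-queue.txt
SERVER="https://countly.example.com"   # placeholder

queue_request() {
  # Append the request query string to the local queue instead of failing.
  echo "$1" >> "$QUEUE"
}

flush_queue() {
  [ -s "$QUEUE" ] || return 0
  while read -r req; do
    # Stop flushing (keeping the queue) if the server is unreachable.
    # Items already sent in this pass will be re-sent on the next attempt,
    # i.e. at-least-once delivery, which is how the SDKs also behave.
    curl -fsG "$SERVER/i" --data-urlencode "$req" > /dev/null || return 1
  done < "$QUEUE"
  : > "$QUEUE"   # everything sent; clear the queue
}
```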
While the sharding is optional and depends on the data load, replica sets should always be used for high availability and disaster recovery. We provide necessary configuration documentation for replica sets for Enterprise Edition customers.
Note that there are also other deployment options using Docker & Kubernetes (supported by the Countly team) and Ansible (a 3rd-party implementation, not directly supported by the Countly team), which are described in their respective documentation.
Installing MongoDB on a separate server
To install MongoDB on a server separate from the Countly one, you may use our MongoDB installation script. It configures MongoDB with the best settings for use with Countly and runs through a checklist of other prerequisites (e.g. filesystem format) to ensure maximum database performance.
wget -qO- https://c.ly/install/mongodb | bash
Splitting the database (sharding)
Sharding is the method MongoDB uses to split its data across two or more servers (forming a sharded cluster). Countly uses MongoDB, so it’s possible to use MongoDB’s sharding to balance the load across several instances and keep the cluster available for each and every read and write. Countly servers mainly handle a high volume of writes issued by mobile devices, compared to a low volume of reads coming from the administration dashboard.
Converting an unsharded database to a sharded one is seamless, so it’s fine to defer sharding until the need arises. At a minimum, a sharded cluster has 3 config servers and at least 2 sharded MongoDB instances. The config servers also run MongoDB instances, with the minor difference that they are configured to act as configuration servers.
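Assembling the minimal cluster described above involves starting the config servers, pointing a mongos router at them, and registering each shard. The sketch below uses standard MongoDB commands; all hostnames, ports, and replica set names are placeholders, so adapt them to your topology.

```shell
# 1) Start a config server (repeat on all 3 config hosts):
mongod --configsvr --replSet cfg --port 27019 --dbpath /data/configdb

# 2) Start a mongos query router pointing at the config server replica set:
mongos --configdb cfg/cfg1.example.com:27019,cfg2.example.com:27019,cfg3.example.com:27019

# 3) From mongosh (connected to mongos), add each shard,
#    where each shard is itself a replica set:
mongosh --eval 'sh.addShard("shard1/s1a.example.com:27018,s1b.example.com:27018")'
mongosh --eval 'sh.addShard("shard2/s2a.example.com:27018,s2b.example.com:27018")'
```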
Since reads in MongoDB are much faster than writes, it’s best to dedicate a higher share of system resources to writes (e.g. by using faster disks). MongoDB writes land in RAM first and are eventually synced to disk, as MongoDB uses memory-mapped files.
Note that sharding is required for more than 800M writes per month.
If the expected deployment load is high in the beginning, we recommend starting with a MongoDB sharded cluster. If the data size of the replica set is too large, converting the replica set to a sharded cluster will take a lot of time, thus causing downtime. Please consult the Countly team before using a sharded cluster.
You may go with MongoDB Atlas, a SaaS database solution from MongoDB. In this case, please get in touch with us beforehand to go over the best practices together, including any necessary configuration advice.
High availability and disaster recovery (replica sets)
While analytics data is not as critical as customer information, it must still be kept safe: we must make sure that all data is replicated and can be recovered if a failure occurs. The major advantages of replica sets are business continuity through high availability, data safety through data redundancy, and read scalability through load sharing (reads).
With replica sets, the MongoDB language drivers know the current primary. All write operations go to the primary, which then replicates them to the secondaries. If the primary goes down, an election nominates a new primary, and the drivers automatically find it. This is automatic failover for high availability: the primary is not fixed, and data is replicated after every write.
Typically, you run at least three MongoDB instances in a replica set on different server machines. You may add more secondaries for read scalability if you like, but you only need three for high-availability failover. With three instances, if one goes down, the load on each remaining instance only goes up by 50% (the preferred situation). If business continuity is important, having at least three instances is the best plan.
Required deployment for replica sets
The minimum required deployment for a replica set is 1 primary, 1 secondary, and 1 arbiter server. After the initial deployment, the primary and secondary may be scaled vertically without causing any downtime: shut down the secondary, upgrade it, and bring it back up. Once the secondary is up again, repeat the process with the primary.
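Initializing such a primary/secondary/arbiter set uses MongoDB’s standard replica set commands. The hostnames and replica set name below are placeholders.

```shell
# Hypothetical sketch: initiate a 1 primary + 1 secondary + 1 arbiter
# replica set named "countly" (hostnames are placeholders).
mongosh --eval '
rs.initiate({
  _id: "countly",
  members: [
    { _id: 0, host: "db1.example.com:27017" },   // primary candidate
    { _id: 1, host: "db2.example.com:27017" },   // secondary
    { _id: 2, host: "arb.example.com:27017", arbiterOnly: true }
  ]
})'
```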
Deciding on RAM, CPU, and disk size
For Countly only servers
The number of CPU cores depends entirely on the expected system load. For each core we recommend having 4GB of RAM available. We recommend a boot disk of at least 20GB.
For MongoDB primary and secondary servers
The number of CPU cores is not critical for the MongoDB servers and may be kept at a minimum. For each core in your Countly server, we recommend 4GB of RAM in your MongoDB primary and secondary. Disk size should be 100GB at minimum; however, depending on the initial expected load, we recommend starting with a bigger disk to avoid having to grow it too early. MongoDB recommends that data-bearing disks be SSDs when possible. Apart from the data-bearing disk, these servers also require a boot disk of at least 20GB.
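Applying the sizing rule of thumb above (4GB of RAM per Countly CPU core, mirrored on the MongoDB primary and secondary), a quick calculation looks like:

```shell
# Rule of thumb from the text: 4 GB of RAM per CPU core on the Countly
# server, and the same amount on the MongoDB primary and secondary.
recommended_ram_gb() {
  cores=$1
  echo $(( cores * 4 ))
}

echo "8-core Countly server -> $(recommended_ram_gb 8) GB RAM"
```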
For hybrid servers
If you plan to deploy Countly and MongoDB on the same server, you should combine the recommendations above (4GB of RAM per core, plus both disk sizes).
Putting it all together
The figure below shows the complete system, with sharding and replication enabled.
Here, you’ll easily see that:
- Each shard consists of a replica set; there are 3 shards with 3 replica sets.
- The shard config servers are on the left, on separate instances. However, since their load is low, they may be co-located with the replica set members.
- Nginx load-balances traffic across the mongos routing servers.
- Each mongos routes traffic to the shards, while Node.js running on the same servers acts as the web server.
There are two dashboards on the two routing servers; one is redundant and can be omitted.
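A minimal nginx configuration for the topology above might balance traffic across the two routing/application servers. This is a sketch under assumed hostnames, not a production configuration.

```nginx
# Hypothetical upstream across the two routing/app servers
# (hostnames are placeholders)
upstream countly {
    server app1.example.com:80;
    server app2.example.com:80;
}

server {
    listen 80;
    location / {
        proxy_pass http://countly;
    }
}
```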
Countly is developed with scalability and performance in mind, and this document describes potential implementation scenarios. While a single server can handle several tens of millions of requests per day, in some circumstances a high-performance, high-throughput server farm is necessary to handle incoming traffic. Once the steps towards sharding and replication are complete, scaling the cluster is as simple as bringing up another server and adding it to the configuration database.