Countly is an extensible, open source product analytics solution for mobile and web apps. It tracks applications, customer behaviour or game mechanics - so you can focus on increasing user loyalty and engagement. With Countly, collected data is converted into meaningful information in true real-time, benefiting from underlying infrastructure with MongoDB, Node.js and Nginx.
Countly provides two APIs, one to write data and one to read it. To write data, Countly also provides SDKs (e.g. Android, iOS and web) to be used within your applications. The SDK collects usage information and sends it to the Countly server. The Read API can be used to retrieve data from the Countly server. The whole dashboard uses this Read API to retrieve information and visualize it in graphical form.
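As a rough illustration of the two APIs, the sketch below builds one write request URL and one read request URL. The host name, the keys, the `/i` and `/o` paths and the parameter names are assumptions made for this example; check the API reference of your Countly version before relying on them.

```python
from urllib.parse import urlencode

# Hypothetical server address; replace with your own Countly host.
COUNTLY_HOST = "https://countly.example.com"


def build_write_url(app_key: str, device_id: str, **params: str) -> str:
    """Build a Write API URL (assumed endpoint: /i), as an SDK would."""
    query = {"app_key": app_key, "device_id": device_id, **params}
    return f"{COUNTLY_HOST}/i?{urlencode(query)}"


def build_read_url(api_key: str, app_id: str, method: str) -> str:
    """Build a Read API URL (assumed endpoint: /o), as the dashboard would."""
    query = {"api_key": api_key, "app_id": app_id, "method": method}
    return f"{COUNTLY_HOST}/o?{urlencode(query)}"


if __name__ == "__main__":
    # Placeholder keys, for illustration only.
    print(build_write_url("APP_KEY", "device-1", begin_session="1"))
    print(build_read_url("API_KEY", "APP_ID", "sessions"))
```

The write side is normally handled by the SDK, so building URLs by hand is mostly useful for testing a deployment.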
Countly is a true real-time service: the user interface shows updates every 10 seconds, and this interval is configurable.
It’s advised to install Countly on a separate server to provide maximum compatibility, performance and usability. Since Countly runs on port 80 to serve its dashboard, there shouldn’t be any other web service running on the same port.
Installation is well tested on Ubuntu, Red Hat Enterprise Linux and CentOS. We strongly advise using LTS (long term support) versions so that you receive updates in a timely manner. The installation script included in the Countly package automates the process by adding new repositories, downloading required files and installing them on the target operating system. When Countly needs to be updated, the corresponding update scripts that come with the package can be used.
In its basic form, the installer sets up a single instance, i.e. it doesn’t configure the system to work on multiple nodes. A single node can handle a limited volume of data originating from devices. For example, an 8 core, 32GB RAM instance can handle more than 1 million requests per day, as a rough figure. Spikes throughout the day and the properties of your data (such as event segments and their cardinality) are some of the things that will affect deployment planning. It may be necessary to separate the database instance from the application server, and further down the line MongoDB sharding may become necessary to cope with the load. We have extensive knowledge about scaling Countly, so please consult Countly engineers for a well-prepared network and instance configuration.
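To put the daily figure into perspective, the following sketch converts requests per day into requests per second. The `peak_factor` parameter is a hypothetical multiplier for daily spikes, not a Countly recommendation; measure your own traffic to choose it.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rps(requests_per_day: int) -> float:
    """Average requests per second for a given daily volume."""
    return requests_per_day / SECONDS_PER_DAY


def peak_rps(requests_per_day: int, peak_factor: float) -> float:
    """Estimated peak rate; peak_factor is an assumed spike multiplier."""
    return average_rps(requests_per_day) * peak_factor


if __name__ == "__main__":
    daily = 1_000_000
    print(f"average: {average_rps(daily):.2f} req/s")
    print(f"peak (assumed factor 5): {peak_rps(daily, 5):.2f} req/s")
```

One million requests per day averages to about 11.6 requests per second, so it is the spikes, rather than the average, that usually drive sizing.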
In this chapter we’ll look at different installation options for Countly. The first one, the single node option, has already been covered; it is a straightforward way to install Countly if you have a few million users daily and no more than 10,000 users at the same time. It’s important to provide a failover mechanism, so that if the instance Countly runs on goes down for any reason, you can fall back on another instance and continue without interrupting your service.
Note that the SDKs are designed so that if they cannot reach the internet, or cannot reach the server for some reason, they do not crash the application they are bound to; they fail gracefully, so the end user doesn’t notice anything. Additionally, the SDKs can store data that could not be sent, and the stored data is sent to the Countly instance once the network connection is back up.
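The store-and-forward behaviour described above can be sketched as a small in-memory queue. This is not the code of any Countly SDK; the `OfflineQueue` class and the `send` callback are hypothetical names used only to illustrate the idea.

```python
from collections import deque
from typing import Callable, Deque


class OfflineQueue:
    """Keeps requests that could not be sent and retries them later."""

    def __init__(self, send: Callable[[str], bool]) -> None:
        self._send = send                    # returns True on success
        self._pending: Deque[str] = deque()  # oldest request first

    def record(self, request: str) -> None:
        """Queue a request, then try to deliver everything pending."""
        self._pending.append(request)
        self.flush()

    def flush(self) -> int:
        """Send pending requests in order; stop at the first failure."""
        sent = 0
        while self._pending:
            try:
                ok = self._send(self._pending[0])
            except Exception:
                ok = False  # a network error must never reach the host app
            if not ok:
                break
            self._pending.popleft()
            sent += 1
        return sent

    def __len__(self) -> int:
        return len(self._pending)
```

A real SDK would also persist the queue to device storage so that data survives an application restart; that part is omitted here.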
While sharding is optional and depends on the data load, replica sets should always be used for high availability and disaster recovery. We provide the necessary configuration documentation for replica sets to Enterprise Edition customers.
Splitting the database (sharding)
Sharding is the method MongoDB uses to split its data across two or more servers, which together form a sharded cluster. Countly uses MongoDB, and it’s possible to use MongoDB’s sharding to balance load across several instances. This way, the cluster can stay available for every read and write operation. Countly servers mainly deal with a high number of writes issued from mobile devices, compared to a low number of reads coming from the administration dashboard.
Converting an unsharded database to a sharded one is seamless, so it’s best to move to sharding once there’s a need for it. At a minimum, a sharded cluster has 3 config servers and at least 2 sharded MongoDB instances. The config servers also run MongoDB instances, with the minor difference that they are configured to act as configuration servers.
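The minimum topology stated above can be written down as a simple check that you might run against a deployment plan. The `ClusterPlan` type and the host names are hypothetical; only the two minimums come from the text.

```python
from dataclasses import dataclass
from typing import List

MIN_CONFIG_SERVERS = 3  # minimum stated in this document
MIN_SHARDS = 2          # minimum stated in this document


@dataclass
class ClusterPlan:
    config_servers: List[str]  # hosts running MongoDB as config servers
    shards: List[str]          # hosts (or replica sets) holding data


def plan_problems(plan: ClusterPlan) -> List[str]:
    """Return human-readable problems; an empty list means the plan is OK."""
    problems = []
    if len(plan.config_servers) < MIN_CONFIG_SERVERS:
        problems.append(f"need at least {MIN_CONFIG_SERVERS} config servers")
    if len(plan.shards) < MIN_SHARDS:
        problems.append(f"need at least {MIN_SHARDS} shards")
    return problems


if __name__ == "__main__":
    plan = ClusterPlan(["cfg1", "cfg2"], ["shard1", "shard2"])
    print(plan_problems(plan))
```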
Since reads in MongoDB are much faster than writes, it’s best to dedicate a higher percentage of system resources to writes (e.g. by using faster disks). MongoDB writes go to RAM first and are eventually synced to disk, since MongoDB uses memory-mapped files.
If the expected load of the deployment is high from the beginning, we recommend starting with a MongoDB sharded cluster. If the data size of a replica set is too big, converting it to a sharded cluster will take a lot of time and cause downtime. Please consult the Countly team before using a sharded cluster.
You may also go with MongoDB Atlas, a SaaS database solution from MongoDB. In this case, please get in touch with us beforehand for best practices and the necessary configuration advice.
High availability and disaster recovery (replica sets)
While gathering analytics data is not as critical as gathering customer information and keeping it safe, we must make sure that all data is replicated and can be recovered in case a failure occurs. The major advantages of replica sets are business continuity through high availability, data safety through data redundancy, and read scalability through load sharing (reads).
With replica sets, MongoDB language drivers know the current primary, and all write operations go to it. If the primary goes down, a new primary is elected and the drivers know how to reach it; this is automatic failover for high availability. Data is replicated after it is written: drivers always write to the replica set’s primary (formerly called the master), which then replicates to the secondaries (formerly called slaves). The primary is not fixed; it is elected by the members of the replica set.
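Drivers discover the primary from a connection string that lists the replica set members. A minimal sketch of building such a string follows; the host names, the replica set name `rs0` and the database name `countly` are placeholders assumed for this example.

```python
from typing import Sequence


def replica_set_uri(hosts: Sequence[str], database: str, replica_set: str) -> str:
    """Build a MongoDB connection string that lists replica set members.

    The driver uses the listed hosts as seeds, finds the current primary
    and follows it after a failover.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    return f"mongodb://{','.join(hosts)}/{database}?replicaSet={replica_set}"


if __name__ == "__main__":
    # Placeholder hosts, for illustration only.
    print(replica_set_uri(
        ["db1.example.com:27017", "db2.example.com:27017", "db3.example.com:27017"],
        "countly",
        "rs0",
    ))
```

Listing more than one host matters: if the only host listed is down when the application starts, the driver has nothing to connect to.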
Typically you have at least three MongoDB instances in a replica set, on different server machines. You can add more replicas of the primary for read scalability if you like, but you only need three for high availability failover. If there are three instances and one goes down, the load on each remaining instance only goes up by 50% (which is the preferred situation). If business continuity is important, then having at least three instances is the best plan.
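The 50% figure follows from simple arithmetic: with three instances each carries one third of the load, and with two each carries one half. A small sketch, assuming load is spread evenly across instances:

```python
def load_increase_percent(total: int, failed: int) -> float:
    """Percentage increase in per-instance load after `failed` instances
    go down, assuming load is spread evenly over the remaining ones."""
    remaining = total - failed
    if failed < 0 or remaining <= 0:
        raise ValueError("at least one instance must remain")
    return (total / remaining - 1) * 100


if __name__ == "__main__":
    for total in (2, 3, 5):
        pct = load_increase_percent(total, 1)
        print(f"{total} instances, 1 down: +{pct:.0f}% per instance")
```

With only two instances, losing one doubles the load on the survivor, which is one reason three is the recommended minimum.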
Required deployment for replica sets
The minimum required deployment for a replica set is 1 primary, 1 secondary and 1 arbiter server. After the initial deployment, the primary and secondary can be scaled vertically without causing any downtime. The procedure is straightforward: shut down the secondary, upgrade it and bring it back up; once it is up, shut down the primary, upgrade it and bring it back up.
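The minimum deployment above can be expressed as a replica set configuration document, shown here as a Python dictionary of the kind passed to MongoDB when a replica set is initiated. The set name and host names are placeholders, and which data-bearing member actually becomes primary is decided by election rather than by this document.

```python
from typing import Any, Dict


def minimal_replica_set_config(
    name: str, data_host_a: str, data_host_b: str, arbiter_host: str
) -> Dict[str, Any]:
    """Two data-bearing members plus one arbiter.

    The arbiter votes in elections but stores no data.
    """
    return {
        "_id": name,
        "members": [
            {"_id": 0, "host": data_host_a},
            {"_id": 1, "host": data_host_b},
            {"_id": 2, "host": arbiter_host, "arbiterOnly": True},
        ],
    }


if __name__ == "__main__":
    # Placeholder hosts, for illustration only.
    print(minimal_replica_set_config(
        "rs0",
        "db1.example.com:27017",
        "db2.example.com:27017",
        "arb1.example.com:27017",
    ))
```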
Deciding on RAM, CPU and disk size
For Countly only servers
The number of CPU cores depends entirely on the expected system load. For each core we recommend having 4GB of RAM available. We recommend a boot disk size of at least 20GB.
For MongoDB primary and secondary servers
The number of CPU cores is not relevant for MongoDB servers, so it can be kept at a minimum. For each core you have in your Countly server, we recommend having 4GB of RAM in your MongoDB primary and secondary. Disk size should be 100GB at minimum, but depending on the initial expected load we recommend starting with a bigger disk to avoid having to increase disk size too early. MongoDB recommends that data-bearing disks be SSDs when possible. Apart from the data-bearing disk, these servers also require a boot disk of at least 20GB.
For hybrid servers
If you plan to deploy Countly and MongoDB on the same server, you should combine the recommendations above (4GB RAM for each core and disk size).
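The sizing rules in this section can be collected into a small calculator. The function names are hypothetical, and the hybrid case reflects one reading of "combine the recommendations": the same 4GB per core rule plus both the boot disk and the data disk.

```python
from typing import Dict

GB_RAM_PER_CORE = 4     # recommended RAM per Countly CPU core
MIN_BOOT_DISK_GB = 20   # recommended minimum boot disk
MIN_DATA_DISK_GB = 100  # recommended minimum MongoDB data disk


def countly_only(cores: int) -> Dict[str, int]:
    """Sizing for a server that runs Countly without MongoDB."""
    return {"ram_gb": cores * GB_RAM_PER_CORE, "boot_disk_gb": MIN_BOOT_DISK_GB}


def mongodb_node(countly_cores: int, data_disk_gb: int = MIN_DATA_DISK_GB) -> Dict[str, int]:
    """Sizing for a MongoDB primary or secondary.

    RAM follows the core count of the Countly server, not of this server.
    """
    return {
        "ram_gb": countly_cores * GB_RAM_PER_CORE,
        "boot_disk_gb": MIN_BOOT_DISK_GB,
        "data_disk_gb": max(data_disk_gb, MIN_DATA_DISK_GB),
    }


def hybrid(cores: int, data_disk_gb: int = MIN_DATA_DISK_GB) -> Dict[str, int]:
    """Countly and MongoDB on one server (assumed reading, see above)."""
    return mongodb_node(cores, data_disk_gb)


if __name__ == "__main__":
    print(countly_only(8))
    print(mongodb_node(8, 250))
```

For the 8 core example used earlier, this yields 32GB of RAM for the Countly server and 32GB for each MongoDB data node.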
Putting it all together
The figure below shows the complete system with sharding and replication enabled.
Here, you’ll easily see that:
- Each shard consists of a replica set. There are 3 shards, and therefore 3 replica sets.
- Shard config servers are on the left, with separate instances. However, since the load is low, they can be located on replica sets.
- Nginx drives mongos routing servers.
- Each mongos routes traffic to the shards. Node.js running on each mongos server acts as the web server.
There are two dashboards on two routing servers, however one is redundant and can be omitted.
Countly is developed with scalability and performance in mind, and this document describes potential implementation scenarios. While a single server can handle several tens of millions of requests per day, in some circumstances a high performance, high throughput server farm is necessary to handle incoming traffic. Once the steps towards sharding and replication are complete, scaling the cluster is as simple as adding another server and registering it in the configuration database.