Perfomance vs Test
Techniques for Scaling Applications with a Database
Applications grow. As an application attracts more users, so do the databases that store the information created, whether that’s sales transactions or scientific data gathered. Larger datasets require more resources to store and process that data. Plus, with more simultaneous users using the system, the database needs more resources.
When your application becomes popular, it needs to scale to meet the demand. Nobody sticks around if an application is slow — not willingly, anyway.
If you need to scale, celebrate it as a good problem! But that doesn’t make the process simple. Scaling has multiple possible options, each requiring different levels of sophistication. Here, we cover scaling both as a generic challenge and specifically for Redis databases, with attention to advanced scaling using Redis Enterprise.
Scaling is a multidimensional problem with several distinct solutions.
Vertical scaling involves increasing a database’s resources. Typically, this involves moving the database to a more powerful computer or to a larger instance type.
“More” is the key word. As with any hardware choice, you consider more powerful processors, more memory and/or more network bandwidth. You have to find a balance between them that optimally improves the database’s performance and the number of simultaneous users it can support, not to mention optimizing your hardware budget.
One element in vertical scaling is adjusting the amount of RAM available to the database. In the case of Redis, RAM limits the amount of data that the database can store, so it’s an important consideration.
Vertical scaling is colloquially called scaling up or scaling down, depending on whether you move up to a more powerful computer or (in rare circumstances) shift down to a less powerful computer.
Redis provides a competitive edge to any business by delivering open source and enterprise-grade data platforms to power applications that drive real-time experiences at any scale. Developers rely on Redis to build performance, scalability, reliability, and security into their applications.
THE LATEST FROM REDIS
Setting New Benchmark in Partner Enablement
23 February 2023
Redis Enterprise 6.4.2 Highlights Client Validation and Access Management Features
23 February 2023
Techniques for Scaling Applications with a Database
22 February 2023
Horizontal scaling involves adding additional computer nodes to a cluster of instances that operate the database, without changing the size or capacity of any individual node. Horizontal scaling is also called scaling out (when you add nodes) or scaling in (when you decrease the number of nodes).
Depending on how it’s implemented, horizontal scaling can also improve the database’s overall reliability. It eliminates a single point of failure because you are increasing the number of nodes that can be used in failover situations. However, horizontal scaling also increases time and effort (and thus costs), because you need more nodes (and hence more failure points) to keep the database functional.
In other words, vertical scaling increases the size and computing power of a single instance or node, while horizontal scaling changes the number of nodes or instances.
Vertical scaling is an easy way to improve database performance, assuming that you have or can acquire a larger computer or instance. It typically can be implemented easily in the cloud, with no impact on the application or database architecture.
Complexity isn’t a bad thing when it’s the right choice, as long as you know what you are doing.
When done correctly, horizontal scaling gives your database and your application significantly more room to grow. This scheme has plenty of history in responding to performance bottlenecks: Just throw more hardware at it!
However, horizontal scaling typically is harder to implement than vertical scaling. Adding additional nodes means more complexity. Are those nodes read-only nodes? Read/write master nodes? Active masters? Passive masters? The complexity of your database and your application architecture can increase dramatically.
Complexity isn’t a bad thing when it’s the right choice, though, as long as you know what you are doing.
There are several ways to implement horizontal scaling, each with a distinct set of advantages and disadvantages. Selecting the right model is important in building a data storage architecture. Redis supports many horizontal scaling options. Some are available in Redis open source (OSS), and some are available only in Redis Enterprise.
The Basics of Sharding
Sharding is a technique for improving a database’s overall performance, as well as increasing its storage and resource limits. It’s a relatively simple horizontal scaling technique.
With sharding, data is distributed across various partitions, or nodes. Each node holds only a portion of the data stored in the entire database. In Redis’ case, a key/value input is processed, and the data is stored in a shard.
When a request is made to the database, it is sent to a shard selector, which chooses the appropriate shard to send the request. In Redis, shard selection is often implemented by a proxy that looks at the key for the requested data, and based on the key, the proxy sends the request to the appropriate shard instance.
The shard selection algorithm is deterministic, which means every request for a given key always goes to the same shard. Only that shard has information for a given data key, as illustrated by Figure 1.
Sharding is a relatively easy way to scale out a database’s capacity. By adding three shards to a Redis OSS implementation, for instance, you can nearly triple the database’s performance and triple the storage limits.
But sharding isn’t always simple. Choosing a shard selector that effectively balances traffic across all nodes may require tuning. Sharding can also lower application availability because it increases dependency on multiple instances. If you don’t manage it properly, failure of a single instance can bring down the entire database. That’ll cause a bad day at work.
Redis clustering addresses these issues, and also makes sharding easier to implement. If resharding is necessary to rebalance a database for reasons of storage capacity or performance, the data is physically moved to a new node.
An application’s awareness of the shard selector algorithm can allow the application to perform better overall balancing across the shards, though at the cost of increased complexity.
Sharding effectiveness is only as good as the shard selector algorithm that is used. An application’s awareness of the shard selector algorithm can allow the application to perform better overall balancing across the shards, though at the cost of increased complexity.
Clustering in Redis OSS is led by the cluster with the client library being cluster-aware. Essentially, the shard selector is implemented in the client library. This requires client-side support of the clustering protocol.
In Redis Enterprise, a server-side proxy is used to implement the shard selector and to provide support for clustering server side. The proxy acts as a load balancer of sorts between the horizontally scaled Redis instances.
Clustering is a common solution to horizontal scaling, but it has pros and cons. On the plus side, sharding is an effective way to quickly scale an application, and it is used in many large, highly scaled applications. Also, it is available out of the box.
On the other hand, clustering requires additional management. You need to know what you’re doing. Individual, large keys can create imbalances that are difficult or impossible to compensate for.
Redis clustering eliminates much of shardings’ complexities. It allows applications to focus on the data management aspects of scaling a large dataset more effectively. It improves both write and read performance.
Ultimately, how well it all works depends on the access patterns that the application uses.
Another horizontal scaling option is read replicas. As the name suggests, the emphasis of read replicas is to improve the performance of reading data without regard to the time spent writing data to the database. The premise is that it is far more common to retrieve data than to change it or to add new data.
In a simple database, data is stored on a single server, and both read and write access to the data occurs on that server. With read replicas, a copy of the server’s data is stored on auxiliary servers, called read replicas. Whenever data is updated, the replicas receive updates from the primary server.
Each auxiliary server has a complete copy of the database. So when an application makes a read request, that request can go to any of the read replica servers. That means a significantly greater number of read requests can be handled simultaneously, which improves scalability and overall performance.
Read replicas cannot improve write performance, but they can increase read performance significantly.
But read replicas have limitations, depending on several factors, such as the consistency model that the database uses, or the network latency you need to contend with.
Read replicas cannot improve write performance, but they can increase read performance significantly. However, that does require you to consider how write-intensive your application is. It takes some time for a database write to the master database to propagate to the read replicas. This delay, called skew, can result in older data being returned to the application while the primary server updates the replica servers. The delay is only for a short period of time, but sometimes those delays are critical. This may or may not be an issue for your own situation, but take note of the issue as you design your system.
Think about the process of writing to a database.
- When you update information or add new data, the write is performed to the master database instance only. That’s sacrosanct; all writes must go to the one master database instance.
- This master instance then sends a message to all of the read replicas, indicating what data has changed in the database, and enabling the read replicas to update their copies of the data to match the master copy.
Since all database writes go through the master instance, there is no write performance improvement when additional read replicas are added. In fact, there can be a minor decrease in write performance when you add a new read replica. That’s because the master now has an additional node it must notify when a write occurs. Typically, this impact is not significant, but it’s certainly not a zero impact.
Consider the illustration in Figure 2, which shows a Redis implementation consisting of three servers. All writes to the Redis database are made to the single master database. This single master sends updates of the changed data to all the replicas. Each replica contains a complete copy of the stored Redis database.
Then, when the application wants to retrieve data, the read access to the Redis instance can occur on any server in the cluster. A load balancer takes care of routing the individual read requests, which directs traffic using one of a number of load balancing algorithms. (There are several load balancing algorithms, including round robin, least used, etc., but they are outside the scope of this discussion.)
Another benefit to using read replicas is in availability improvement. If a read replica crashes, the load balancer simply redirects traffic to another read replica. If the write master crashes, you can promote one of the read replicas to the role of master, so the system can stay operational.
Read replicas are an easy-to-implement model for horizontal scalability, and the method improves availability with little or no application impact.
Active-Active replication or Active-Active clustering is a way to improve performance for higher database loads.
As with read replicas, Active-Active (also called multimaster replication) relies on a database cluster with multiple nodes, with a copy of the database stored on all the nodes and a load balancer distributing the load.
With Active-Active replication, however, both read and write requests are distributed across multiple servers, and load balanced among all the nodes. The performance boost is meaningful, because a significantly larger number of requests can be handled, and they are handled faster.
Note that Active-Active replication is not supported directly by Redis OSS. If this turns out to be the appropriate scaling architecture for your needs, you’ll need Redis Enterprise. But the focus here is in explaining the computer science technique, no matter where you get it from (including building it yourself, if you have that sort of time).
With Active-Active replication, the read propagation happens exactly as described in the previous section.
When an application writes to one node, this database write is propagated to every master in the system. There are many ways this can occur, such as:
- The application can force the write to all masters.
- A write proxy can distribute the writes.
- The master node that receives the write call can forward the request to other non-receiving master servers.
Figure 3 illustrates a database implementation with a cluster consisting of three servers. Each server contains a complete copy of all the data. Any server can handle any type of data request — read or write — for any data in the database.
When the load balancer directs a write request to the database master instance, such as in this example, it sends the update to all the other replicated instances. If a write is sent to any of the other nodes, that node sends the update to all the other replicated instances in a similar manner.
But what happens when two requests are sent to update the same data value?
In a single-node database, the requests are serialized and the changes take place in order, with the last change typically overriding previous changes.
In the Active-Active model, though, the two requests could come to different masters. The masters could then send conflicting update messages to the other master servers. This is called a write conflict.
In the case of a write conflict, the application needs to determine which database-write to keep and which to reject. That requires a resolution algorithm of some type, involving application logic or database rules.
Additionally, since updates are sent to each node asynchronously, it’s possible for data lag to cause one node to go slightly out of sync with another node. That’s an issue even when that mismatch is only for a short period of time. Developers have to take care that the application considers this potential lag so that it does not affect operations. This is similar to the issues with read replicas, but is potentially more complex.
Besides improving performance, this model of horizontal scalability also increases overall database availability. If a single node fails, the other nodes can take up the slack. However, since each node contains a complete copy of the data, this has no impact on the storage limit of a database by adding additional servers.
The cost of this model is increased application complexity in dealing with conflicting data.
Redis Enterprise’s Active-Active Geo-Distribution
Redis OSS does not natively support multimaster redundancy in any form.
However, Redis Enterprise does Active-Active Geo-Distribution, which provides Active-Active multimaster redundancy.
Then Redis Enterprise’s Active-Active Geo-Distribution goes one step further. It enables individual clusters to be located in geographically distributed locations yet replicate data between them. Take a look at Figure 4.
This allows a Redis database to be geographically distributed to support software instances running in different geographic locations.
In this model, multiple master database instances are in different data centers. Those can be located across different regions and around the world. Individual consumers, via the application, connect to the Redis database instance that is nearest to their geographic location. The Active-Active Redis database instances are then synchronized in a multimaster model so that each Redis instance always has a complete and up-to-date copy of the data.
Redis Enterprise’s Active-Active Geo-Distribution has sophisticated algorithms for effectively dealing with write conflicts, including implementing conflict-free, replicated data types (CRDTs) that guarantee strong eventual consistency and make the process of replication synchronization significantly more reliable. The application still must be aware of and deal with data lag and write conflicts, so these issues don’t become a problem.
What’s Right for You?
You need to make your applications run faster and support additional burden on their databases. Fortunately, as this article demonstrates, you have plenty of options for scaling techniques. Each has a different impact on the amount of storage space available to your application and on the system resources.
The technique you ultimately choose depends on many factors, including your company’s goals, your software requirements, the skills of the people in your IT department, your application architecture and how much complexity you’re willing to take on.
To learn more about how Redis Enterprise scales databases — with more diagrams, which are always fun — consult “Linear Scaling with Redis Enterprise.”
Perché le piccole squadre vincono
Vincono perché risolvono la complessità della comunicazione.
La legge di Metcalfe dice che ogni volta che aggiungi un nuovo utente a una rete, il numero di connessioni aumenta proporzionalmente al quadrato del numero di utenti. Questa è tecnologia, M cosa significa questo per la tua organizzazione e i tuoi team?
La struttura della nostra mente è che non può davvero gestire un gran numero di relazioni. Scienziati come Robin Dunbar ci hanno detto per decenni che il nostro mondo sociale è molto piccolo.
Dicono che ci sono circa 5 persone con cui possiamo avere relazioni strette e altre 15 persone con cui possiamo avere relazioni leggermente meno intense. Pensa alle squadre sportive, per esempio. Raramente sono più grandi di 15 persone.
Per questo motivo, le organizzazioni progressiste si sono allontanate dalle tradizionali gerarchie manageriali. Al contrario, si strutturano come reti decentralizzate di team.
Queste reti non hanno (o pochi) quadri intermedi. Sono dotati di team altamente autonomi in cui i membri si occupano della comunicazione, del coordinamento e della contrattazione.
Ma come mostra la legge di Metcalfe, quando non ci sono manager, i team devono essere abbastanza piccoli da non sovraccaricare i membri di comunicazioni e informazioni
Quale pensi sia il futuro delle organizzazioni?
Learning GIT using McDonald’s
The most lightly taken & ignored skill: Git️️⚙️
When you go to McDonald’s🍔 & order a “Double Quarter Pounder Burger with whole wheat bread bun, some extra Cheese with less tomatoes slice & extra mayonnaise with a large Coke”🍔🍶
(By the way thats a huge burger man 😂)
The moment you place Order:
🍔 The manager will shout out your order within the team🗣️
🍔 2 3 people will start separately working on your order like one on chicken patty part, one on wheat bread bun, one on veggies & one on your soft drink so on🕴🏻🕴🏻
🍔 They all will first “Pull” the latest stock they have in their respective inventory like bread, veggies, chicken patty & all.
🍔 The moment all 3 4 guys are ready with their individual task, they come at a single place with their result🤌🏻😎
🍔 They will add & merge all their results in a wrapper
🍔 Then push it in a box & boom it will be handed over to you🥳
And This is how exactaly Git concept works✅🎉
The moment your Team gets a Project like “Create a Calculator”
👨🏻💻Your Team Lead will shout out the task within the Team➕✖️➗
👨🏻💻 2 3 developers, will start separately working on different features of Calculator like one on Addition➕, one on Division➗ & so on.
👨🏻💻They will first “Pull”, at present the latest code, of your project, from Github/Bitbucket or wherever your project is stored into their local machines💻💻
👨🏻💻 Once all 2 3 guys, are ready with their respective task, they all will “push” their code at a single place📥
👨🏻💻They will “merge” all their work & boom the Calculator is ready✅🎉
Lets now dig a bit deeper on the Practical & Processing part of Git🧙🏼♂️⚙️
Note: Keep patience for next 3 mints and read calmly, lets start😎️
The moment you are asked to code & add a new feature in your existing project code repo, you will first
🔥Clone the code repo by command : git clone
Your company will be storing their code in a repo on Github or Bitbucket, they will create your account as well there, so login into that account, go to the code repo and use the command
🧞♂”️git clone https://[email protected]/sample_repo.git -b develop”
The “-b develop” will clone the develop branch, from many other branches present in your main code repo of team.
🔥Now once you have project’s develop branch code, you will create your own branch from it (like your own working space)
🧞♂️git checkout -b mohit/feature_calculator_addition
“mohit/feature_calculator_addition” this branch is your private space, separated from your main team project, this will help to avoid any issue, from your side, in the main code of the team.
🔥Start doing the changes or code addition you want and once done do
🧞♂️git status : It will show all the scripts you have made changes to.
🧞♂️git add a.py b.py: This will add the changed script to a staging/space where they are made ready to be now pushed to main repo.
🧞♂️git commit -m “added addition feature”: This will be your msg to the team for the code you have worked on.
🧞♂️git push origin mohit/feature_calculator_addition: This will push your entire code to the branch you have created.
🔥Till now you have taken pull of the team’s code, created your own branch, worked on it, pushed the code to “your” branch.
But all this is at “your” level till now, so to put all this, in teams code repo you will raise a “Pull Request” to your Team Lead.
🔥Login to your Bitbucket/Github console -> Go to Pull Request tab,
select from where to where you want to move your code.
From: mohit/feature_calculator_addition To: Develop
🔥Once you raise the PR, by adding your team lead name, he will be notified & he can see your changes in all those scripts you have made.
🔥If he/she feels it’s proper, lead will accept your code & merge it in the teams’s develop branch or else he will reject it with a message by him & you will be notified why it’s been rejected & you will do the necessary changes & will raise a PR.
🔥Also when you raise a PR, there can be a scenario where you and one other team member have made changes in the exact same script in exactly the same line, now this will give a “Merge Conflict” when code will be added to the main branch🤕
Merge conflicts happen when you merge two branches that have competing/smiliar commits, and Git needs your help to decide which/whose changes to incorporate in the final merge.
🔥You have to resolve this merge conflict, by either changing the code’s position or by deciding which one of you will be keeping your code on that paticular line giving merge conflict🥶
🥳Once resolved, the final code will be added to the project’s repo code🥳
🤷🏻♂️Now the most asked question: What is Git and GitHub ?🙄
See Internet is a concept, on which we are making and using Whatsapp, Facebook, Gmail🤖
Same way, Git is the concept, an approach on which Github & Bitbucket have opened their shops, of giving developer a space, where we can save our code & version them, so that if by chance we have issue in our Production code, we can just rollback to our second last release code & can publish it for customer🥳
There are many more, deep down things in Git, but this article’s approach was to just give you a brief idea, about the concept of git, so that later on it will be easy for you to connect dots about git by any tutorial you like🤌🏻😎
And Thank You for being here with me and for your love🏻🙌🏻
It makes me so happy 😃 seeing a clap 👏🏻 or a comment✍🏻 or seeing article shared by someone. It makes all the efforts worth 🙏🏻
So that’s all from now, hope you have enjoyed the Git burger 🍔😂 as well the concept.
Wish you all a happy learning and an awesome year ahead🥳
Do you suffer from Imposter Syndrome? Always believing that you must be completely knowledgeable before acting? Adjust your viewpoint.
Tip: Critical thinking skills can be really valuable for Software engineers, Product and many other walks of life. It’s about approaching new information with a mix of humble curiosity and doubt.
Think independently and ask good questions that help make thoughtful decisions.
In broad strokes, some of the questions I like to ask based on critical thinking are:
➡️ How do we know we’re solving the right problem?
➡️ How do we know we’re solving the problem in the right way? (i.e. balancing rigor and efficiency, given our understanding of the problem and constraints)
➡️ If we don’t know the sources of our problem, how can we determine the root cause?
➡️ How can we break the key question down into smaller questions that we can analyze further?
➡️ Once we have one or more hypotheses, how do we structure work to evaluate them?
➡️ What shortcuts might we take if we’re under constraints (time pressure) without unduly compromising our analytics rigor around the question?
➡️ Does the evidence sufficiently support the conclusions?
How do we know when we are done? When is the solution “good enough”?
➡️ How do I communicate the solution clearly and logically to all stakeholders?
I’ve found these questions often help. Sometimes we’ll address the symptom of a problem, only to discover there are other symptoms that pop up. At other times, we might quickly ship a solution that creates more problems later down the road.
With a lens on critical thinking, we might challenge assumptions, look closer at the risk/benefit, seek out contradictory evidence, evaluate credibility and look for more data to build confidence we are doing the right thing.
Being in engineering or product, we can sometimes rush to solve a problem right away so it feels like we’re making progress or looks like we’re being responsive to stakeholders. This can introduce risks if we aren’t asking the right questions before doing so, fully considering causes and consequences. Put another way, critical thinking is thinking on purpose and forming your own conclusions. This goal-directed thinking can help you focus on root-cause issues that avoid future problems that arise from not keeping in mind causes and consequences.
➡️ Raise mindful questions, formulating them clearly and precisely
➡️ Collect and assess relevant information, validating how they might answer the question
➡️ Arrive at well-reasoned conclusions and solutions, testing them against relevant criteria and standards
➡️ Think open mindedly within alternative systems of thought, recognizing and assessing, as need be, their assumptions, implications, and practical consequences
➡️ Communicate effectively with others in figuring out solutions to complex problems