Database Sharding System Design

Modern data servers have to collect and process large volumes of data, so their databases must handle very large data sets. A single database server's capacity can be increased, but it eventually reaches a physical limit. The alternative is to distribute the data across a group of database servers. The primary benefit of sharding is increased database capacity, but it also lets the database handle more traffic, because each server in the cluster only needs to respond to a portion of the total requests. A sharded architecture needs a few essential components to be effective. This article discusses how sharding works with shard keys and why de-normalizing data matters.

What is Database Sharding?

Sharding is the process of dividing data into two or more smaller pieces called logical shards. The logical shards are then spread across several database nodes, known as physical shards, each of which can hold multiple logical shards. Taken together, the data in all the shards still represents one complete logical dataset.

Each row is assigned a shard key that identifies the logical shard on which it can be found. A logical shard cannot be split across physical shards, but a single physical shard can hold several logical shards. The objective of a sharded architecture is to have many small shards that distribute data evenly among the nodes. This prevents hotspots from overwhelming any one node and keeps response times fast across all nodes.

Sharding can be implemented at the database level or at the application level. Many databases today support sharded designs natively. PostgreSQL, one of the most popular relational databases, is a notable exception, since it does not provide automatic sharding out of the box.
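The relationship between shard keys, logical shards, and physical shards can be sketched as follows. This is a minimal illustration, not a production design: the shard count, the node names, and the modulo-based placement are all assumptions made for the example.

```python
from typing import Tuple

# Hypothetical layout: 8 logical shards spread over 2 physical nodes.
NUM_LOGICAL_SHARDS = 8
PHYSICAL_NODES = ["db-node-0", "db-node-1"]  # made-up node names


def logical_shard_for(shard_key: int) -> int:
    """Map a row's shard key to one logical shard."""
    return shard_key % NUM_LOGICAL_SHARDS


def physical_node_for(logical_shard: int) -> str:
    """Each logical shard lives on exactly one physical node,
    while each physical node hosts several logical shards."""
    return PHYSICAL_NODES[logical_shard % len(PHYSICAL_NODES)]


def locate(shard_key: int) -> Tuple[int, str]:
    """Return (logical shard, physical node) for a row's shard key."""
    logical = logical_shard_for(shard_key)
    return logical, physical_node_for(logical)
```

With this layout, every physical node ends up hosting four of the eight logical shards, and a row is always found by computing its logical shard first and then looking up the node that holds it.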

Why is Database Sharding Used?

In the past, information was kept in an RDBMS (Relational Database Management System), where it was organized into tables with rows and columns. For data with 1-to-N or N-to-N relationships, a normalization process would store the data in separate tables connected by foreign keys instead of in a single table. This ensured that the data did not get out of sync, and the tables could be joined to obtain a complete picture of the data.

Traditional database systems, however, run into limits in their ability to process, store, and retrieve data as data size grows. As a result, they require more costly and advanced hardware to maintain performance. Even with the best hardware, many successful modern applications demand significantly more data than a single RDBMS server can handle.

The most common way to implement sharding is at the application level, which means that the application's logic determines which shard to send reads and writes to. Some database management systems, though, provide sharding directly at the database level.
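Application-level sharding can be sketched as a small routing layer in the application code. The class below is a hypothetical illustration: the name `ShardedUserStore` is invented, each in-memory dictionary stands in for a connection to a separate database server, and the modulo rule is just one possible routing choice.

```python
from typing import Any, Dict, List, Optional


class ShardedUserStore:
    """Toy application-level router; each dict stands in for one database server."""

    def __init__(self, num_shards: int) -> None:
        self.shards: List[Dict[int, Any]] = [{} for _ in range(num_shards)]

    def _shard_for(self, user_id: int) -> Dict[int, Any]:
        # The application, not the database, decides where a row lives.
        return self.shards[user_id % len(self.shards)]

    def write(self, user_id: int, record: Any) -> None:
        """Send the write to the single shard that owns this user."""
        self._shard_for(user_id)[user_id] = record

    def read(self, user_id: int) -> Optional[Any]:
        """Read from the owning shard; returns None if the user is unknown."""
        return self._shard_for(user_id).get(user_id)
```

Reads and writes for the same user always reach the same shard, so the rest of the application can call `read` and `write` without knowing how many servers exist.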

Approaches used in Sharding

Several approaches to sharded architectures exist, depending on how the shard key is assigned. Shard keys must be unique across shards regardless of where they are generated, so their values must be coordinated. This leads to a design trade-off between a centralized "name server" that can dynamically optimize logical shards for efficiency and a distributed algorithm that is quicker to compute.

To optimize for the most frequent queries, shard keys are derived from some invariant attribute of the data. Tenant identifiers, locations, and timestamps are typical choices. Custom configuration based on usage patterns, actual shard sizes, and similar measurements can further tune a sharded architecture's performance.

Geo-based Sharding

Data is divided based on where the user is located, such as the user's continent or a region of comparable size. Usually a fixed location is chosen, such as the user's location at the time the account was created.

With this strategy, users can be directed to the node closest to them, which lowers latency. However, users may not be distributed evenly across the geographic regions.
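A geo-based lookup can be as simple as a fixed table from region to shard. The sketch below assumes made-up region codes and shard names, and it assumes that unknown regions fall back to a default shard; a real deployment would choose its own regions and fallback policy.

```python
# Hypothetical region-to-shard table; region codes and shard names are made up.
REGION_TO_SHARD = {
    "EU": "shard-eu",
    "NA": "shard-na",
    "APAC": "shard-apac",
}
DEFAULT_SHARD = "shard-na"  # assumed fallback for regions not in the table


def shard_for_user(signup_region: str) -> str:
    """Route by the fixed region recorded when the account was created."""
    return REGION_TO_SHARD.get(signup_region, DEFAULT_SHARD)
```

Because the key is the region stored at sign-up rather than the user's current location, a user who travels still resolves to the same shard.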

Range Sharding

A database with sequential, time-based data, such as log history, could be sharded by month. Because data that is "near" within the specified range sits on the same shard, range-based shard keys have the significant benefit of making sequential access patterns relatively fast.

A drawback of ranges is that the data may not be balanced. For instance, if an e-commerce business receives many more orders in December because of holiday shopping, the shard holding the December range can become overloaded while the other shards sit idle.
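The month-based example can be sketched as a function that turns a timestamp into a shard name. The `logs-YYYY-MM` naming scheme is an assumption made for this illustration.

```python
from datetime import datetime


def shard_for_timestamp(ts: datetime) -> str:
    """Range shard key: one shard per calendar month (hypothetical naming)."""
    return f"logs-{ts.year:04d}-{ts.month:02d}"
```

Every record from the same month maps to the same shard, which is what makes sequential scans fast and also what concentrates load on a single shard during a busy month.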

Hash Sharding

Hash sharding computes the partition by first hashing the key value. A good hash algorithm distributes data evenly among partitions, which lowers the risk of hotspots. However, because hashing is likely to scatter related rows across different partitions, the server cannot improve performance by anticipating and pre-loading upcoming queries.

There is no single point of failure for routing in a hash-based sharded architecture, because any server that knows the hash function can compute the shard for a given key. This algorithmic distribution also removes the need to maintain a map of where all the data is located, unlike other strategies such as range-based or directory-based sharding.

Hashing has a significant drawback: depending on the architecture, adding shards can cause significant overhead, because existing data has to be redistributed. Consistent hashing limits this overhead by ensuring that only a small portion of the data must be moved each time a new node is added.
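The effect of consistent hashing can be sketched with a small hash ring. This is a simplified illustration under several assumptions: the class name `ConsistentHashRing`, the use of SHA-256, and the choice of 100 virtual positions per node are all choices made for the example and are not prescribed by any particular database.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(value: str) -> int:
    # Stable hash; Python's built-in hash() for strings varies between processes.
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Minimal hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes: List[str], vnodes: int = 100) -> None:
        self._vnodes = vnodes
        self._ring: List[int] = []        # sorted ring positions
        self._owner: Dict[int, str] = {}  # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        """Place `vnodes` positions for this node on the ring."""
        for i in range(self._vnodes):
            pos = _hash(f"{node}#{i}")
            bisect.insort(self._ring, pos)
            self._owner[pos] = node

    def node_for(self, key: str) -> str:
        """Return the node owning the first position clockwise from the key."""
        idx = bisect.bisect_right(self._ring, _hash(key)) % len(self._ring)
        return self._owner[self._ring[idx]]
```

When a node is added, the only keys whose owner changes are those that now fall just before one of the new node's positions, so every moved key moves to the new node and the rest stay where they were. A plain `hash(key) % number_of_nodes` scheme, by contrast, would reassign most keys whenever the node count changes.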

Advantages of Sharding

  • A relational database running on a single machine can be scaled as needed only by increasing that machine's computational power.
  • Upgrading the hardware of an existing server is called vertical scaling, or scaling up, and usually means adding more RAM or CPU.
  • Adding more machines to an existing stack is called horizontal scaling, or scaling out. It distributes the load, which allows for more traffic and faster processing.
  • The ability to scale horizontally makes your solution much more flexible because, in the end, any non-distributed database has a limited amount of storage and compute capacity.
  • Another reason to choose a sharded database architecture is the need to speed up query response times.
  • When you submit a query to a database that hasn't been sharded, it may have to search through every row of the table you're querying before it finds the desired result set. In a sharded database, a query only has to search the rows on the relevant shard.
  • For an application with a big, monolithic database, queries can become unacceptably slow.
  • By reducing the impact of outages, sharding can also make an application more reliable. If your website or application depends on an unsharded database, an outage could make the entire application unavailable, whereas an outage in a sharded database is likely to affect only a single shard.
  • Sharding lets you use standard hardware rather than high-end equipment.
  • The database can be scaled quickly by adding more shards.
  • Performance improves because each unit carries a lower load.
  • Even if computational limits have not been reached, sharding may still be necessary to keep data in distinct geographic zones.
  • Your service will be faster if the data servers are located closer to the users, and some of the countries where your service is available may restrict how and where data can be stored and used.

Disadvantages of Sharding

  • Complexity is the main drawback of database sharding.
  • Queries become more complicated because they must determine the correct shard key and avoid spanning multiple shards where possible.
  • If the shards can't be completely isolated, you must implement eventual consistency for duplicated data or find another way to maintain relational constraints.
  • Deploying and operating the database, including failovers, backups, and other maintenance, becomes significantly more complex. In short, sharding is best reserved for cases where simpler scaling options are no longer enough.
  • Teams must manage data across many shard locations, which may be disruptive because they can no longer access and manage their data from a single entry point.
  • Shards can become unbalanced. For example, if a database is split into an A-M shard and an N-Z shard and the A-M shard gradually accumulates more data than the N-Z one, a sizable part of your customers may experience application lag and stalling.
  • To restore a more even distribution of data, the database would probably need to be repaired and re-sharded.
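The imbalance in an alphabetical split can be made concrete by counting how many rows each shard receives. The sketch below assumes a two-way split on the first letter of a name, and the sample names are invented for the illustration.

```python
from collections import Counter
from typing import Iterable


def alphabetical_shard(name: str) -> str:
    """Two-way range split on the first letter of the name.

    Anything that sorts at or before "M" (including digits and
    empty strings) goes to the A-M shard in this simple sketch.
    """
    return "A-M" if name[:1].upper() <= "M" else "N-Z"


def shard_counts(names: Iterable[str]) -> Counter:
    """Count how many rows land on each shard."""
    return Counter(alphabetical_shard(n) for n in names)


# Invented sample data that happens to skew toward the first half of the alphabet.
sample = ["Alice", "Bob", "Carol", "Dave", "Erin", "Nina", "Zed"]
counts = shard_counts(sample)
```

Here the A-M shard holds five of the seven rows. Monitoring counts like these over time is one way to notice that a shard is becoming a hotspot before users feel the slowdown.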

Conclusion

If you want to scale your database horizontally, sharding can be a very effective approach. However, it also increases your application's complexity and the number of potential points of failure. Sharding may be required for some applications, while for others the cost and time involved in developing and maintaining a sharded architecture may outweigh the advantages.

The advantages and disadvantages of sharding should be clearer after reading this conceptual piece. This knowledge will help you decide whether a sharded database architecture is appropriate for your application going forward.