

The Least Recently Used (LRU) can be a reasonable policy for our system. The application servers, before making a query to the database, may check if the cache has the desired data.

We can use a distributed cache layer to store and retrieve frequently accessed URLs. This allows servers and objects to scale without affecting the overall system. If this approach still leads to overloaded partitions, we need to use Consistent Hashing.Ĭonsistent Hashing is a distributed hashing scheme that operates independently of the number of servers or objects in a distributed hash table by assigning them a position on an abstract circle, or hash ring. This process can make the distribution more random. The hash function will generate a server number, and we will store the key on that server. In this approach, the hash of the object determines the partition in which the data will be stored. We may combine less frequently occurring letters into one database partition. On the other hand we may have too many URLs that begin with the letter "E". For example, there are very few words starting with "Z". The problem with this approach is that it can lead to unbalanced partitioning. We can partition the data based on the first letter of the hash key, i.e., Range Based Partitioning. Key-value databases are highly partitionable and can scale horizontally to sizes that other types of databases cannot. A "key-value based" choice like "Amazon DynamoDB", "Aerospike" or "Couchbase" would be easier to scale.Ī key-value database stores data as a collection of key-value pairs in which a key serves as a unique identifier.īoth keys and values can be anything, ranging from simple objects to complex compound objects. We would have billions of records, small objects or records of 1 KB with no relation, and read heavy queries. We can overcome this limitation by having a standby replica of KGS. The issue with this approach is that the KGS server is now the single point of failure. KGS also has to make sure not to provide the same key to multiple servers or multiple requests from the same server. In this approach, Key Generation Service (KGS) makes sure that all the generated keys are unique. The first issue of duplicate URLs can be solved by appending the user id (unique) to the input URL, and to overcome the already encoded input URL, we can decode all input URLs before generating the hash. Two potential issues can arise out of this solution, i.e., multiple users can get the same shortened URL for the same input URL and cases need to be handled if parts of the input URL are already URL-encoded. Using base64 encoding, a "six-letter long key" would result in 64^6 ~= 68.7 billion possible strings. The total number of unique URL hashes we would need to generate in 5 years would be 1B * 12 months * 5 years = 60 B. Hard Questions How do I create a short, unique URL? 1) Encoding the actual URL Storage needed = 1 Billion * 5 years * 12 months * 1 KB = 60 Billion KB ~ 60 TBĬache Storage: If we assume that 20% of URLs generate 80% of the traffic, we would like to cache these 20% of hot URLs. Let's assume these 1 B URLs will be kept for five years and each URL takes around 1 KB of storage. URL redirections per second or QPS = 100 x URL shortening = 40K/s. URLs shortenings per second = 1B / (30 days * 24 hours * 3600 seconds) ~ 400 URLs/s. Trafficġ billion new URL shortenings or 100 billion redirections per month. Let's assume a 100:1 ratio between reading and writing. That means the system will have to be read extensive. There will be lots of redirection requests in comparison to new URL shortening requests.

a) Functionalġ) For a given URL, the application should generate a shorter and unique alias.Ģ) When this shorter URL is accessed or clicked, the service should redirect them to the original link.ģ) Users should be able to specify their own URL alias.Ĥ) Users should also be able to specify an expiration time, and if no expiration time is specified, the links should expire after a default timespan automatically.ġ) The system should be highly available and scalable to accommodate an increasing user base.Ģ) Users should have a smooth experience with minimal latency.ģ) Shortened links should not be predictable or guessable. If the interviewer wants to add some more features, he/she will mention that. Tell the interviewer that you are going to support the below features. Users are redirected to the original URL when they hit these aliases. URL shortening is used to create shorter aliases for long URLs.
