Upscaling LinkedIn’s Profile Datastore While Reducing Costs

Co-Authors: Estella Pham and Guanlin Lu
At peak, LinkedIn serves over 4.8 million member profiles per second. The number of requests to our storage infrastructure doubles every year. In the past, we addressed latency, throughput and cost issues by migrating off Oracle onto Espresso, an open-source document platform, and adding more nodes. We are now at the point where some of the core components are straining under the increasing load, and we can no longer address scaling concerns with the node addition strategy.
Instead, we chose to introduce Couchbase as a centralized storage tier cache for read scaling. This solution achieved a cache hit rate of over 99%, reduced tail latencies by more than 60%, and trimmed the cost to serve by 10% annually. The decision also raised a number of engineering problems, because a cache used for upscaling cannot fall back to the primary storage infrastructure on failures. In this blog post we’ll discuss our decision to leverage Couchbase, the challenges that arose, and how we addressed each challenge in our final solution.
Background
Before we dive into the changes made to adopt Couchbase, we want to provide the readers with some important background information and context.
The majority of profile requests are reads. Historically, the Profile backend employed a centralized memcached cache as a read cache between the application and the database. That solution did not work well for us: we experienced performance degradation during cache expansion and node replacement, ran into cache warm-up issues, and found memcached challenging to maintain.
When we migrated from Oracle to Espresso in 2014, we no longer needed a read cache because of the built-in scalability and latency characteristics of Espresso. We succeeded in scaling the Espresso profile datastore to support over 1.4 million QPS at peak by expanding the cluster size of the backend application along with Espresso. Eventually, that approach hit the upper limits of the Espresso shared components, and raising those limits would require a substantial engineering investment.
Member profiles are stored in Espresso in Avro binary format. Espresso supports document schema and the schema can evolve over time. The schema version number is incremented with every update.
When a LinkedIn member updates their profile, the request is sent to the Profile backend application which serializes the change by converting the document into an Avro binary format before persisting it in Espresso. When someone views a LinkedIn profile, the Profile frontend application sends a read request to the Profile backend application which fetches the profile from Espresso, deserializes the response converting the Avro binary formatted content to a document to return to the frontend application.
A profile view or update travels through multiple applications as summarized in Figure 1. Espresso consists of routers and storage nodes. An Espresso router serves as a proxy for profile requests, deciding which database partition holds a member’s profile using the unique identifier of the record, and directing the read/write request to the right Espresso storage node. Every Espresso router has an off-heap cache (OHC).
Figure 1. Data flow for a profile view or update
A datum reader is used to deserialize Avro binary content. It requires both the writer’s schema (the one representing the profile document when it was persisted to Espresso) and the reader’s schema (the one currently used by the application). Both are required because they can differ due to schema evolution. Converting a profile from an earlier schema version to the latest version is referred to as schema upconversion. Prior to Couchbase adoption, schema upconversion took place in the Espresso storage nodes.
Every Espresso router has an OHC configured with the “Window Tiny-LFU” cache eviction policy to retain frequently accessed records. OHC has proven to be highly efficient as a hot key cache. However, for the LinkedIn member profile use case, OHC has a low cache hit rate for two reasons. First, OHC is a local cache on a router and only sees the read requests sent to that router. Second, profile requests employ projection (i.e., applications request only the fields of interest rather than the entire document), which further reduces the cache hit rate.
At LinkedIn, we have used Couchbase as a distributed key-value cache for various applications. We chose it for the enhancements it offers over memcached, including data persistence between server restarts, replication such that one node can fail out of a cluster with all documents still being available, and dynamic scalability in that nodes can be added or removed with no downtime.
Because profile requests are characteristically dominated by read (>99% read and <1% write), caching was one option we explored when we considered how to expand the profile Espresso datastore. Note that the profile Espresso cluster had grown to a point where adding more hardware to expand its capacity as we had done was no longer an option without a major reengineering effort. Because Couchbase is widely used at LinkedIn, it is natural that we considered it for our use case. Furthermore, to use Couchbase as an upscaling solution for Espresso, the Couchbase cache can reside in the storage layer, affording the application owners the benefit of not having to deal with the caching internals.
However, for any cache to be used for the purpose of upscaling, it must operate completely independently of the source of truth (SOT) and must not be allowed to fall back to the SOT on failures. We addressed these requirements by following three design principles.
Cache design principles
The three design principles for a cache that is used for upscaling:
- A guaranteed resilience against any Couchbase failures
- An all-time cached data availability
- A strictly defined Service Level Objective (SLO) on data divergence between the SOT and the cache.
Resilience against any Couchbase failures
Resilience against any Couchbase failure is critical because the cache is not allowed to fall back to the SOT. We implemented the following guardrails to protect important components of the integrated system and prevent a cache failure from creating a cascade of failures.
- Espresso router: We maintain a Couchbase health monitor in every router to track the health of all Couchbase buckets the router has access to. The health monitor evaluates a bucket’s health by comparing the request exception rate against a predetermined threshold. If a bucket is unhealthy, the router stops dispatching new requests to it. This prevents requests from accumulating in the router’s heap, which would otherwise lead to client timeouts and could bring down the router.
- Couchbase leader’s replica failure: We choose to have 3 replicas for Profile data (a leader’s replica and 2 followers’ replicas) and implement the leader-then-follower interface (API). Every request is fulfilled by the leader’s replica. If the leader’s replica fails, the router fetches the data from a follower’s replica.
- Retrying eligible failed Couchbase requests: Although a Couchbase failure can have many possible causes, every Couchbase exception falls into one of three categories: an Espresso router issue, a networking issue, or a Couchbase server side issue. For the first two categories, we retry the failed Couchbase request on a different router, on the assumption that the issue may be localized and transient and that a retry may succeed.
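The health check and retry rule above can be illustrated with a minimal sketch. It is not the router’s actual implementation: the class name, the sliding window, the `min_samples` guard, and the threshold value are all assumptions made for illustration, and the post does not describe how a bucket is marked healthy again.

```python
from collections import deque


class BucketHealthMonitor:
    """Tracks recent request outcomes for one cache bucket (hypothetical helper)."""

    def __init__(self, threshold=0.5, window=100, min_samples=10):
        self.threshold = threshold            # assumed exception-rate cutoff
        self.min_samples = min_samples        # do not judge on too few requests
        self.outcomes = deque(maxlen=window)  # True means the request failed

    def record(self, failed):
        self.outcomes.append(bool(failed))

    def exception_rate(self):
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def is_healthy(self):
        if len(self.outcomes) < self.min_samples:
            return True
        return self.exception_rate() < self.threshold


# Failure categories named in the text; only the first two are retried
# on a different router. The string labels are illustrative.
RETRYABLE_CATEGORIES = {"router", "network"}


def should_retry_on_other_router(category):
    return category in RETRYABLE_CATEGORIES
```

A router would consult `is_healthy()` before dispatching a request to a bucket and call `record()` with the outcome of each request.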
All-time cached data availability
To make cached data available everywhere all the time including during a datacenter traffic failover, we elect to cache the entire Profile dataset in every data center. It is a good option because profile payload is small: the 95th percentile is less than 24 KB.
It would be trivial to make data available in every data center with an infinite Time-To-Live (TTL) for every cache record. That, however, would lead to permanent data divergence, caused for example by missing deletes (i.e., a deletion occurred in the Espresso database but got lost before reaching Couchbase). Although missing deletions are rare, they can still happen when, for example, a Kafka infrastructure outage causes certain events to fall off the retention window before the cache updater can consume them. As a result, we choose to set a finite TTL for the cached data to ensure that any expired records will be purged from Couchbase.
To prevent Couchbase cache from becoming cold or divergent from the SOT, we periodically bootstrap Couchbase. The bootstrapping period is within the cached data TTL to guarantee no records that exist in Espresso would expire before the next bootstrapping.
Data divergence prevention
Race conditions among components that have write access to the Couchbase cache (an Espresso router, the cache bootstrapper and the cache updater) can lead to data divergence. To prevent this, we order the cache updates for the same key so that a stale record cannot be inserted into Couchbase. To order the updates, we compare the System Change Number (SCN) values associated with cache records. Conceptually, an SCN is a logical timestamp: for each database row committed, the Espresso storage engine produces an SCN value and persists it in the binlog. The order of SCN values reflects the commit order within a data center, cluster, database and database partition.
To be consistent with the SOT Espresso database, our system coordinates update operations to Couchbase via SCN comparisons, and all updates to Couchbase follow Last-Writer-Wins (LWW) reconciliation: given the same key, the record with the largest SCN always replaces the existing one in Couchbase. To remember deletes, we have the cache updater upsert tombstone records into Couchbase rather than issuing hard deletions. A tombstone record differs from a regular cache record in that it contains only an SCN as its data payload. It is, however, subject to the same purge policy as a regular record.
Our system also compares the SCN in the storage node’s response with that of the cache record. It updates Couchbase with the storage node’s response if the response’s SCN value is the same or greater. Accepting an equal SCN is intended to handle a cache miss due to an expired TTL, or a read request for the key that has a higher document schema version.
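The SCN comparisons above can be sketched as a small replacement rule. This is a simplified model under stated assumptions: `CacheRecord`, `should_replace`, and the `allow_equal` flag are hypothetical names, and representing a tombstone as a record with no payload is only one possible encoding.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CacheRecord:
    scn: int                  # logical timestamp taken from the commit
    payload: Optional[bytes]  # None marks a tombstone (SCN only)

    @property
    def is_tombstone(self):
        return self.payload is None


def should_replace(existing, incoming, allow_equal=False):
    """Decide whether `incoming` may overwrite `existing` for the same key.

    allow_equal=True models the router path, which accepts a storage node
    response whose SCN is the same or greater.
    """
    if existing is None:
        return True
    if allow_equal:
        return incoming.scn >= existing.scn
    return incoming.scn > existing.scn


live = CacheRecord(scn=10, payload=b"profile")
delete = CacheRecord(scn=11, payload=None)  # tombstone written by the updater
late = CacheRecord(scn=9, payload=b"stale")
```

Here `delete` replaces `live`, while `late` can replace neither, so a delayed write cannot resurrect a deleted profile for as long as the tombstone is retained.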
To handle concurrent modifications that may occur while routers and/or cache updaters attempt to update a cache entry, we use Couchbase Compare-And-Swap (CAS) to detect concurrent updates and retry the update if necessary. The CAS value represents the current state of a record stored in Couchbase: Each time a record is modified, its CAS value is updated.
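A read, compare, write, and retry loop of this kind might look like the sketch below. It uses a toy in-memory store rather than the Couchbase SDK, so `InMemoryCasStore`, `upsert_if_newer`, and `max_attempts` are illustrative assumptions and not the production API.

```python
import itertools


class InMemoryCasStore:
    """Toy stand-in for a key-value store with compare-and-swap."""

    def __init__(self):
        self._data = {}                       # key -> (cas, scn, payload)
        self._cas_counter = itertools.count(1)

    def get(self, key):
        return self._data.get(key)            # None when the key is absent

    def write(self, key, scn, payload, expected_cas):
        """Store the value only if the key's CAS still equals expected_cas.

        expected_cas=None means "only if the key does not exist yet".
        """
        current = self._data.get(key)
        current_cas = current[0] if current else None
        if current_cas != expected_cas:
            return False                      # record changed since it was read
        self._data[key] = (next(self._cas_counter), scn, payload)
        return True


def upsert_if_newer(store, key, scn, payload, max_attempts=5):
    """Write (scn, payload) unless the store already holds an SCN at least as large."""
    for _ in range(max_attempts):
        current = store.get(key)
        if current is not None and current[1] >= scn:
            return False                      # existing record wins
        expected_cas = current[0] if current else None
        if store.write(key, scn, payload, expected_cas):
            return True
        # CAS mismatch: a concurrent writer got there first, so re-read and retry.
    raise RuntimeError("gave up after repeated CAS conflicts")
```

Because every successful write changes the CAS value, a writer that read the record before a concurrent update will see its conditional write rejected and will re-evaluate the SCN comparison against the newer record.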
Hybrid cache strategy
Architecturally, the Espresso integrated Couchbase cache consists of the Espresso router, the cache updater, the cache bootstrapper and the Couchbase cluster (see Figure 2). We converted from the existing cache strategy that employed OHC in the routers to the hybrid cache strategy implementing both OHC for hot keys and Couchbase for all reads that cannot be satisfied by OHC.
The Espresso router has the sole responsibility of determining whether a read request can be served by the cache tier or a storage node, and by which cache tier: OHC or Couchbase. The cache contract between an application and Espresso is set with an HTTP header called the staleness bound header. Applications can enable or disable caching, and set the cache staleness tolerance per request, using the staleness bound header. Based on the staleness bound value, Espresso determines where the request can be served.
One major advantage of the Espresso integrated cache is that Espresso abstracts away all of the caching internals, freeing the application owners to focus on just the business logic.
Figure 2. Espresso integrated Couchbase cache. Application requests are in blue, responses are in green. A cache record is a full HTTP response that holds an Espresso response containing a profile document in Avro/binary format
Read path
When the Profile backend application sends a read request to an Espresso router, the router evaluates whether it is a cacheable request based on the table schema configuration and the value set in the staleness bound request header. The router then determines if the key is in its OHC. A request is sent to Couchbase when the key does not exist in the local OHC, or when the key exists in the OHC but its content is too stale. The Couchbase lookup results in either a cache hit or a cache miss. A cache miss requires the request to be served by a storage node. In that case, the router sends the document back to the Profile backend application while asynchronously upserting it into the hybrid cache tier.
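The decision order on the read path can be summarized in a short sketch. The function and parameter names are hypothetical, and expressing the staleness bound as a maximum age in milliseconds is an assumption; the post does not specify the header’s format.

```python
from enum import Enum


class Tier(Enum):
    OHC = "ohc"
    COUCHBASE = "couchbase"
    STORAGE_NODE = "storage_node"


def choose_tier(cacheable, staleness_bound_ms, ohc_age_ms, couchbase_hit):
    """Pick where a read is served from.

    ohc_age_ms is None when the key is not in the router's OHC;
    couchbase_hit says whether Couchbase holds a usable record.
    """
    if not cacheable:
        return Tier.STORAGE_NODE
    if ohc_age_ms is not None and ohc_age_ms <= staleness_bound_ms:
        return Tier.OHC
    if couchbase_hit:
        return Tier.COUCHBASE
    # Cache miss: the storage node serves the read, and the router
    # upserts the response into the cache tier asynchronously.
    return Tier.STORAGE_NODE
```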
Write path
Cached profiles in Couchbase must be kept in sync with the data in Espresso. We implemented a cache updater and a cache bootstrapper via Samza jobs. Each consumes Espresso change events from a Brooklin change capture stream, or a Brooklin bootstrap stream, respectively, and upserts into the Couchbase cache (See Figure 2). The Brooklin change capture stream is populated with database rows that had been committed to the SOT whereas the Brooklin bootstrap stream is populated with a periodically generated database snapshot.
Since the Brooklin streams are nearline, the database changes synchronized from the Espresso datastore to Couchbase follow the eventual consistency principle. Espresso-integrated Couchbase cache needs to cope with race conditions raised by multiple writers to the same Couchbase cache. To prevent data divergence, we introduced a logical timestamp to order the writes (see “Data Divergence Prevention” for more details.)
Full document
While it is optimal for applications to apply projection to reduce the payload of the responses for performance reasons, transitioning to the Espresso integrated hybrid cache required us to evaluate projection and where it should be applied. The number of projections used by applications is large and varied. If we were to cache projected requests, the cache solution would not be very efficient because the number of possible projection permutations is immense. Moreover, since the majority of profiles are small, we decided to cache the full profile dataset. Projection continues to be supported, but the responsibility to apply projection on full profiles has moved from the Espresso storage nodes to the Profile backend application.
Profile backend changes
With Espresso now returning a full document to the Profile backend as it was written (i.e., based on the profile schema version used in the last update), the responsibility to perform the schema upconversion lies with the Profile backend.
Schema upconversion
Prior to Couchbase, when the Profile backend sent Espresso a read request, it set the accept-type in the HTTP request header to the latest Avro schema version registered, for example, version 55. Data returned by Espresso conformed to the schema version 55, and the datum reader used schema version 55 when it deserialized the Avro binary.
With Couchbase, and with the profile schema having evolved to, for example, version 70, the Profile backend sets the accept-type in the HTTP request header to version 0 when it sends Espresso a read request, which tells Espresso to return the data as it was written. The Profile backend now performs the schema upconversion, transforming a record written with schema version 55 to version 70.
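To make the idea of upconversion concrete, the following sketch models schema resolution on plain dictionaries. It is deliberately not the Avro library: schemas are reduced to a mapping from field name to default value, every reader field is assumed to have a default, and the field names are invented for illustration.

```python
def upconvert(record, writer_fields, reader_fields):
    """Resolve a record written with writer_fields into the reader's field set.

    Both schema arguments map field name -> default value.
    """
    resolved = {}
    for name, default in reader_fields.items():
        if name in writer_fields:
            resolved[name] = record[name]  # field known to both schemas
        else:
            resolved[name] = default       # field added after the record was written
    return resolved                        # writer-only fields are dropped


# Hypothetical field sets standing in for schema versions 55 and 70.
v55 = {"firstName": "", "headline": ""}
v70 = {"firstName": "", "headline": "", "pronouns": None}

stored = {"firstName": "Ada", "headline": "Engineer"}  # written with v55
current = upconvert(stored, writer_fields=v55, reader_fields=v70)
```

After the call, `current` carries the two stored values plus the default for the field that only exists in the newer schema.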
Figure 3. Deserialization flow after Couchbase adoption in the Profile backend
Projection
Because profile documents are returned as full binaries from the Espresso routers, the Profile backend applies projection to each document as dictated by the query projection parameter before deserializing it and returning it to the client.
Registered schemas
Prior to Couchbase, the Profile backend cached all Avro schemas statically during startup. The schema cache was not expected to change during the runtime of the application. If a new profile schema version 70 becomes available, the Profile backend would not have the version 70 until the next restart. This approach had worked because Espresso performed the schema upconversion to the latest version configured in the Profile backend.
Once the schema upconversion moved to the Profile backend, the application had to be able to fetch a registered schema from the registry whenever it requires that specific version for deserialization. This requirement comes from the deployment pipeline, which first deploys a new software version to a few instances, the canaries, where it must pass predefined tests before it is allowed to deploy to the full cluster. For example, during a typical deployment, when a Profile backend canary updates a profile using the latest schema version 70, any other Profile backend instance that needs that schema version but still runs with the latest schema configured to version 69 must be able to fetch schema version 70 from the schema registry.
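One way to picture this requirement is a schema cache that is preloaded at startup and falls back to the registry on a miss. The sketch below is an assumption about the shape of such a cache, not the Profile backend’s code: `SchemaCache` and the `fetch_from_registry` callable are hypothetical, and a production version would also need to be thread-safe.

```python
class SchemaCache:
    """Schemas loaded at startup, with on-demand fetch for unknown versions."""

    def __init__(self, fetch_from_registry, preloaded=None):
        self._fetch = fetch_from_registry   # callable: version -> schema
        self._schemas = dict(preloaded or {})

    def get(self, version):
        if version not in self._schemas:
            # A newer instance (for example a canary) may have written with a
            # version this instance has never seen, so ask the registry.
            self._schemas[version] = self._fetch(version)
        return self._schemas[version]


fetched_versions = []


def fake_registry(version):
    """Stand-in for a registry client; records which versions were requested."""
    fetched_versions.append(version)
    return {"version": version}


cache = SchemaCache(fake_registry, preloaded={69: {"version": 69}})
schema_70 = cache.get(70)   # miss: fetched from the registry once
schema_70_again = cache.get(70)
schema_69 = cache.get(69)   # hit: served from the preloaded set
```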
Figure 4. A Profile backend instance in the deployment queue
Pegasus datum reader
One optimization we made involves the use of a new datum reader. The previous datum reader converts an Avro/binary object to a GenericRecord, which then gets translated to a DataMap used to create a Profile document. The new datum reader bypasses the intermediate step with GenericRecord and converts an Avro/binary to a DataMap directly. This change gained us a performance boost (see Table 2) and compensated for the performance hit incurred with the upconversion and projection relocating to the Profile backend.
Performance analysis
Tail latency reduction
The end-to-end latencies recording the round-trip time the Profile backend spends to fetch profiles from Espresso dropped significantly. We categorize the read requests received by Espresso as single get when the requests contain single keys, and multi get when the requests contain multiple keys. With the majority of the read requests being multi get requests, the 99th percentile latency dropped by 60.73%, and the 99.9th percentile latency by 63.66% (see Table 1).
Profile Latency | Without Couchbase (ms) | With Couchbase (ms) | Reduction (ms) | Reduction % |
Single Get P99 | 4.184 | 3.984 | -0.2 | -4.78% |
Single Get P99_9 | 25.64 | 15.15 | -10.49 | -40.91% |
Multi Get P99 | 31.6 | 12.41 | -19.19 | -60.73% |
Multi Get P99_9 | 66.87 | 24.3 | -42.57 | -63.66% |
Table 1. The round trip latencies between the Profile backend and Espresso
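The reduction columns in Table 1 follow directly from the two latency columns. As a quick check, this short snippet recomputes them from the published values; the helper name `reduction` is ours.

```python
def reduction(before_ms, after_ms):
    """Return (absolute change in ms, relative change in %), rounded to 2 places."""
    delta = after_ms - before_ms
    return round(delta, 2), round(delta / before_ms * 100, 2)


# (without Couchbase, with Couchbase) in milliseconds, from Table 1.
table_1 = {
    "Single Get P99": (4.184, 3.984),
    "Single Get P99_9": (25.64, 15.15),
    "Multi Get P99": (31.6, 12.41),
    "Multi Get P99_9": (66.87, 24.3),
}

results = {name: reduction(before, after) for name, (before, after) in table_1.items()}
```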
Cache effectiveness
Our Couchbase cache achieves a 99% cache hit rate (see Figure 5).
Figure 5. Couchbase cache hit rate
Pegasus datum reader
With the Pegasus datum reader, performance gains are seen across all percentiles (measured in microseconds) when deserializing individual profile records. Comparing the control datum reader (Espresso) with the treatment datum reader (Pegasus), the Pegasus datum reader performs much better across the board. The 95th percentile latency dropped by 34.1%, the 99th percentile latency by 37.4%, and the 99.9th percentile latency by 28%.
Percentile | P50 | P90 | P95 | P99 | P99_9 |
Espresso (µs) | 234.35 | 406.55 | 477.05 | 729.25 | 1080 |
Pegasus (µs) | 138.3 | 261.95 | 314.4 | 456.25 | 777.15 |
Reduction (µs) | -96.05 | -144.60 | -162.65 | -273 | -302.85 |
Reduction (%) | -41.0 | -35.6 | -34.1 | -37.4 | -28.0 |
Table 2. Deserialization of individual profile latencies measured in microseconds
Cost savings
With the Espresso hybrid cache tier, we are able to reduce the number of Espresso storage nodes by 90%. Since we also set up new infrastructure (e.g. the new Couchbase cluster, the Samza bootstrapping and updating jobs) and incur additional costs for the increase in Profile backend computing resources to handle the upconversion and projection (e.g., we set a maximum 30% buffer), we estimate that conservatively we save LinkedIn about 10% annually on the costs of servicing member profile requests.
The adoption of Espresso integrated Couchbase has enabled us to achieve our goals: to support a growing LinkedIn member base, upscale the profile datastore, lower the cost to serve, and do it all without impacting performance or member experiences.
This project could not have been done without significant contributions from groups of people across LinkedIn for over 1.5 years. The teams were the Espresso development team, the Espresso SRE team, the Couchbase team, the Identity Platform team, the Samza team, and the Identity SRE team.
Many thanks to Karthik Naidu DJ, Ben Weir, Michael Chletsos, Rick Ramirez, Bef Ayenew, Alok Dariwal, Madhur Badal, Gayatri Penumetsa for your leadership and support for this project.
Many thanks to Jason Less for implementing changes on the Profile backend, and conducting performance analyses and validation; Jean-Francois Dejeans-Gauthier for valuable RB and RFC review feedback; Gaurav Mishra for designing and implementing the Couchbase health monitor, and root-causing and fixing issues throughout the Espresso Couchbase cache development; Antony Curtis for implementing concurrent update and SCN comparison in the router and sharing idiomatic patterns with CompletableFuture; Keshav Bachu for implementing and improving the cache updater and bootstrapper; Zhantong Shang for designing and implementing the periodic cache bootstrapping scheduler; Laxman Prabhu and Yun Sun for valuable feedback on the Espresso caching design; Ning Xu for the early exploration work; Kamlakar Singh for providing valuable feedback on observability and resiliency; Hongyi Ma and Himanshu Gupta for general support near the project completion; Olu Owolabi for setting up the dark cluster and configuration; William Nguyen for Profile backend performance assessment; Robert Engel for site operations feedback; Ben Weir for sizing, setting up and configuring Couchbase; and Zhengyu Cai for cost saving analyses.
Thanks to Shun Xuan Wang, Mahesh Vishwanath, Katherine Vaiente, Madhur Badal, Karthik Naidu DJ and Rick Ramirez for editorial reviews.
Career stories: Influencing engineering growth at LinkedIn

Since learning frontend and backend skills, Rishika’s passion for engineering has expanded beyond her team at LinkedIn to grow into her own digital community. As she develops as an engineer, giving back has become the most rewarding part of her role.
From intern to engineer—life at LinkedIn
My career with LinkedIn began with a college internship, where I got to dive into all things engineering. Even as a summer intern, I absorbed so much about frontend and backend engineering during my time here. When I considered joining LinkedIn full-time after graduation, I thought back to the work culture and how my manager treated me during my internship. Although I had a virtual experience during COVID-19, the LinkedIn team ensured I was involved in team meetings and discussions. That mentorship opportunity ultimately led me to accept an offer from LinkedIn over other offers.
Before joining LinkedIn full-time, I worked with Adobe as a Product Intern for six months, where my projects revolved around the core libraries in the C++ language. When I started my role here, I had to shift to using a different tech stack: Java for the backend and JavaScript framework for the frontend. This was a new challenge for me, but the learning curve was beneficial since I got hands-on exposure to pick up new things by myself. Also, I have had the chance to work with some of the finest engineers; learning from the people around me has been such a fulfilling experience. I would like to thank Sandeep and Yash for their constant support throughout my journey and for mentoring me since the very beginning of my journey with LinkedIn.
Currently, I’m working with the Trust team on building moderation tools for all our LinkedIn content while guaranteeing that we remove spam on our platform, which can negatively affect the LinkedIn member experience. Depending on the project, I work on both the backend and the frontend, since my team handles the full-stack development. At LinkedIn, I have had the opportunity to work on a diverse set of projects and handle them from end to end.
Mentoring the next generation of engineering graduates
I didn’t have a mentor during college, so I’m passionate about helping college juniors find their way in engineering. When I first started out, I came from a biology background, so I was not aware of programming languages and how to translate them into building a technical resume. I wish there had been someone to help me out with debugging and finding solutions, so it’s important to me to give back in that way.
I’m quite active in university communities, participating in student-led tech events like hackathons to help them get into tech and secure their first job in the industry. I also love virtual events like X (formerly Twitter) and LinkedIn Live events. Additionally, I’m part of LinkedIn’s CoachIn Program, where we help with resume building and offer scholarships for women in tech.
Influencing online and off at LinkedIn
I love creating engineering content on LinkedIn, X, and other social media platforms, where people often contact me about opportunities at LinkedIn Engineering. It brings me so much satisfaction to tell others about our amazing company culture and connect with future grads.
When I embarked on my role during COVID-19, building an online presence helped me stay connected with what’s happening in the tech world. I began posting on X first, and once that community grew, I launched my YouTube channel to share beginner-level content on data structures and algorithms. My managers and peers at LinkedIn were so supportive, so I broadened my content to cover aspects like soft skills, student hackathons, resume building, and more. While this is in addition to my regular engineering duties, I truly enjoy sharing my insights with my audience of 60,000+ followers. And the enthusiasm from my team inspires me to keep going! I’m excited to see what the future holds for me at LinkedIn as an engineer and a resource for my community on the LinkedIn platform.
About Rishika
Rishika holds a Bachelor of Technology from Indira Gandhi Delhi Technical University for Women. Before joining LinkedIn, she interned at Google as part of the SPS program and as a Product Intern at Adobe. She currently works as a software engineer on LinkedIn’s Trust Team. Outside of work, Rishika loves to travel all over India and create digital art.
Editor’s note: Considering an engineering/tech career at LinkedIn? In this Career Stories series, you’ll hear first-hand from our engineers and technologists about real life at LinkedIn — including our meaningful work, collaborative culture, and transformational growth. For more on tech careers at LinkedIn, visit: lnkd.in/EngCareers.
Career Stories: Learning and growing through mentorship and community

Lekshmy has always been interested in a role in a company that would allow her to use her people skills and engineering background to help others. Working as a software engineer at various companies led her to hear about the company culture at LinkedIn. After some focused networking, Lekshmy landed her position at LinkedIn and has been continuing to excel ever since.
How did I get my job at LinkedIn? Through LinkedIn.
Before my current role, I had heard great things about the company and its culture. After hearing about InDays (Investment Days) and how LinkedIn supports its employees, I knew I wanted to work there.
While at the College of Engineering, Trivandrum (CET), I knew I wanted to pursue a career in software engineering. Engineering is something that I’m good at and absolutely love, and my passion for the field has only grown since joining LinkedIn. When I graduated from CET, I began working at Groupon as a software developer, starting on databases, REST APIs, application deployment, and data structures. From that role, I was able to advance into the position of software developer engineer 2, which enabled me to dive into other software languages, as well as the development of internal systems. That’s where I first began mentoring teammates and realized I loved teaching and helping others. It was around this time that I heard of LinkedIn through the grapevine.
Joining the LinkedIn community
Everything I heard about LinkedIn made me very interested in career opportunities there, but I didn’t have connections yet. I did some research and reached out to a talent acquisition manager on LinkedIn and created a connection which started a path to my first role at the company.
When I joined LinkedIn, I started on the LinkedIn Talent Solutions (LTS) team. It was a phenomenal way to start because not only did I enjoy the work, but the experience served as a proper introduction to the culture at LinkedIn. I started during the pandemic, which meant remote working, and eventually, as the world situation improved, we went hybrid. This is a great system for me; I have a wonderful blend of being in the office and working remotely. When I’m in the office, I like to catch up with my team by talking about movies or playing games, going beyond work topics, and getting to know each other. With LinkedIn’s culture, you really feel that sense of belonging and recognize that this is an environment where you can build lasting connections.
LinkedIn: a people-first company
If you haven’t been able to tell already, even though I mostly work with software, I truly am a people person. I just love being part of a community. At the height of the pandemic, I’ll admit I struggled with a bit of imposter syndrome and anxiety. But I wasn’t sure how to ask for help. I talked with my mentor at LinkedIn, and they recommended I use the Employee Assistance Program (EAP) that LinkedIn provides.
I was nervous about taking advantage of the program, but I am so happy that I did. The EAP helped me immensely when everything felt uncertain, and I truly felt that the company was on my side, giving me the space and resources to help relieve my stress. Now, when a colleague struggles with something similar, I recommend they consider the EAP, knowing firsthand how effective it is.
Building a path for others’ growth
With my mentor, I was also able to learn about and become a part of our Women in Technology (WIT) Invest Program. WIT Invest is a program that provides opportunities like networking, mentorship check-ins, and executive coaching sessions. WIT Invest helped me adopt a daily growth mindset and find my own path as a mentor for college students. When mentoring, I aim to build trust and be open, allowing an authentic connection to form. The students I work with come to me for all kinds of guidance; it’s just one way I give back to the next generation and the wider LinkedIn community. Providing the kind of support my mentor gave me early on was a full-circle moment for me.
Working at LinkedIn is everything I thought it would be and more. I honestly wake up excited to work every day. In my three years here, I have learned so much, met new people, and engaged with new ideas, all of which have advanced my career and helped me support the professional development of my peers. I am so happy I took a leap of faith and messaged that talent acquisition manager on LinkedIn. To anyone thinking about applying to LinkedIn, go for it. Apply, send a message, and network—you never know what one connection can bring!
About Lekshmy
Based in Bengaluru, Karnataka, India, Lekshmy is a Senior Software Engineer on LinkedIn’s Hiring Platform Engineering team, focused on the Internal Mobility Project. Before joining LinkedIn, Lekshmy held various software engineering positions at Groupon and SDE 3. Lekshmy holds a degree in Computer Science from the College of Engineering, Trivandrum, and is a trained classical dancer. Outside of work, Lekshmy enjoys painting, gardening, and trying new hobbies that pique her interest.
Editor’s note: Considering an engineering/tech career at LinkedIn? In this Career Stories series, you’ll hear first-hand from our engineers and technologists about real life at LinkedIn — including our meaningful work, collaborative culture, and transformational growth. For more on tech careers at LinkedIn, visit: lnkd.in/EngCareers.
Solving Espresso’s scalability and performance challenges to support our member base

Espresso is the database that we designed to power our member profiles, feed, recommendations, and hundreds of other LinkedIn applications that handle large amounts of data and need both high performance and reliability. As Espresso continued to expand in support of our 950M+ member base, the number of network connections that it needed began to drive scalability and resiliency challenges. To address these challenges, we migrated to HTTP/2. With the initial Netty-based implementation, we observed a 45% degradation in throughput, which we needed to analyze and then correct.
In this post, we will explain how we solved these challenges and improved system performance. We will also delve into the various optimization efforts we employed on Espresso’s online operation section, implementing one approach that resulted in a 75% performance boost.
Espresso Architecture
Figure 1. Espresso System Overview
Figure 1 is a high-level overview of the Espresso ecosystem, which includes the online operation section of Espresso (the main focus of this blog post). This section comprises two major components – the router and the storage node. The router is responsible for directing the request to the relevant storage node and the storage layer’s primary responsibility is to get data from the MySQL database and present the response in the desired format to the member. Espresso utilizes the open-source framework Netty for the transport layer, which has been heavily customized for Espresso’s needs.
Need for new transport layer architecture
In the communication between the router and storage layer, our earlier approach involved utilizing HTTP/1.1, a protocol extensively employed for interactions between web servers and clients. However, with HTTP/1.1 each connection can serve only one request at a time, so every in-flight request needs its own connection. In the context of large clusters, this approach led to millions of concurrent connections between the router and the storage nodes. This resulted in constraints on scalability, resiliency, and numerous performance-related hurdles.
Scalability: Scalability is a crucial aspect of any database system, and Espresso is no exception. In our recent cluster expansion, adding an additional 100 router nodes caused the memory usage to spike by around 2.5GB. The additional memory can be attributed to the new TCP network connections within the storage nodes. Consequently, we experienced a 15% latency increase due to an increase in garbage collection. The number of connections to storage nodes posed a significant challenge to scaling up the cluster, and we needed to address this to ensure seamless scalability.
Resiliency: In the event of network flaps and switch upgrades, the process of re-establishing thousands of connections from the router often breaches the connection limit on the storage node. This, in turn, causes errors and prevents the router from communicating with the storage nodes.
Performance: When using the HTTP/1.1 architecture, routers maintain a limited pool of connections to each storage node within the cluster. In some larger clusters, the wait time to acquire a connection can be as high as 15ms at the 95th percentile due to the limited pool. This delay can significantly affect the system’s response time.
We determined that all of the above limitations could be resolved by transitioning to HTTP/2, as it supports connection multiplexing and requires a significantly lower number of connections between the router and the storage node.
We explored various technologies for HTTP/2 implementation, but due to the strong support from the open-source community and our familiarity with the framework, we went with Netty. When using Netty out of the box, the throughput of the HTTP/2 implementation was 45% lower than that of the original HTTP/1.1 implementation, so we had to implement several optimizations to improve performance.
The experiment was run on a production-like test cluster, with traffic that combined several access patterns, including both reads and writes. The results were as follows:
Protocol | QPS | Single Read Latency (P99) | Multi-Read Latency (P99) |
HTTP/1.1 | 9K | 7ms | 25ms |
HTTP/2 | 5K (-45%) | 11ms (+57%) | 42ms (+68%) |
Further analysis of the routing layer using flame graphs showed the following major differences in CPU overhead between the two protocols:
CPU overhead | HTTP/1.1 | HTTP/2 |
Acquiring a connection and processing the request | 20% | 32% (+60%) |
Encode/Decode HTTP request | 18% | 32% (+77%) |
Improvements to Request/Response Handling
Reusing the Stream Channel Pipeline
One of the core concepts of Netty is its ChannelPipeline. As seen in Figure 2, when data is received from the socket, it is passed through the pipeline, which processes it. The ChannelPipeline contains a list of handlers, each working on a specific task.
Figure 2. Netty Pipeline
In the original HTTP/1.1 Netty pipeline, a set of 15-20 handlers was established when a connection was made, and this pipeline was reused for all subsequent requests served on the same connection.
However, in HTTP/2 Netty’s default implementation, a fresh pipeline is generated for each new stream or request. For instance, a multi-get request to a router with over 100 keys can often result in approximately 30 to 35 requests being sent to the storage node. Consequently, the router must initiate new pipelines for all 35 storage node requests. The process of creating and dismantling pipelines for each request involving a considerable number of handlers turned out to be notably resource-intensive in terms of memory utilization and garbage collection.
To address this concern, we developed a forked version of Netty’s Http2MultiplexHandler that maintains a queue of local stream channels. As illustrated in Figure 3, on receiving a new request, the multiplex handler no longer generates a new pipeline. Instead, it retrieves a local channel from the queue and uses it to process the request. Once the request completes, the channel is returned to the queue for future use. Reusing existing channels minimizes the creation and destruction of pipelines, which reduces memory pressure and garbage collection.
Figure 3. Sequence diagram of stream channel reuse
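The reuse pattern described above can be sketched in plain Java, independent of Netty. This is only an illustration under stated assumptions: `StreamChannel` and `StreamChannelPool` are hypothetical names, not classes from the forked Http2MultiplexHandler, and the sketch assumes the queue is only touched by the single I/O thread that owns the connection.

```java
import java.util.ArrayDeque;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a stream channel whose handler pipeline is costly to build. */
final class StreamChannel {
    // Counts how many times a pipeline had to be constructed.
    static final AtomicInteger PIPELINES_BUILT = new AtomicInteger();
    private int streamId = -1;

    StreamChannel() {
        // In a real system the handlers would be created here, once per channel object.
        PIPELINES_BUILT.incrementAndGet();
    }

    void bind(int streamId) { this.streamId = streamId; }
    void reset() { this.streamId = -1; }
    int streamId() { return streamId; }
}

/** Queue of idle stream channels. Assumed to be used from one I/O thread, so no locking. */
final class StreamChannelPool {
    private final ArrayDeque<StreamChannel> idle = new ArrayDeque<>();

    StreamChannel acquire(int streamId) {
        StreamChannel channel = idle.poll();   // reuse an idle channel if one exists
        if (channel == null) {
            channel = new StreamChannel();     // otherwise pay the construction cost once
        }
        channel.bind(streamId);
        return channel;
    }

    void release(StreamChannel channel) {
        channel.reset();                       // clear per-request state before reuse
        idle.offer(channel);
    }
}
```

In this sketch, requests handled one after another share a single channel object. Requests that overlap still cause new channels to be built, but those channels go back to the queue and are reused by later requests.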
Addressing uneven work distribution among Netty I/O threads
When a new connection is created, Netty assigns this connection to one of the 64 I/O threads. In Espresso, the number of I/O threads is equal to twice the number of cores present. The I/O thread associated with the connection is responsible for I/O and handling the request/response on the connection. Netty’s default implementation employs a rudimentary method for selecting an appropriate I/O thread out of the 64 available for a new channel. Our observation revealed that this approach leads to a significantly uneven distribution of workload among the I/O threads.
In a standard deployment, we observed that 20% of I/O threads were managing 50% of all connections/requests. To address this issue, we introduced a BalancedEventLoopGroup. This entity is designed to evenly distribute connections across all available worker threads. During channel registration, the BalancedEventLoopGroup iterates through the worker threads to ensure a more equitable allocation of workload.
After this change, when a channel is registered, an event loop with a below-average number of connections is selected:
    private EventLoop selectLoop() {
        int average = averageChannelsPerEventLoop();
        // Start with the default candidate.
        EventLoop loop = next();
        if (_eventLoopCount > 1 && isUnbalanced(loop, average)) {
            // The candidate is overloaded: scan the event loops in random order
            // and stop at the first one that is not unbalanced.
            ArrayList<EventLoop> list = new ArrayList<>(_eventLoopCount);
            _eventLoopGroup.forEach(eventExecutor -> list.add((EventLoop) eventExecutor));
            Collections.shuffle(list, ThreadLocalRandom.current());
            Iterator<EventLoop> it = list.iterator();
            do {
                loop = it.next();
            } while (it.hasNext() && isUnbalanced(loop, average));
        }
        return loop;
    }
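The snippet above relies on helpers that are not shown. The following self-contained simulation is a rough sketch of the same selection logic using plain counters instead of Netty event loops. `BalancedSelector` is a hypothetical name, the first candidate is assumed to be chosen round-robin, and "unbalanced" is assumed to mean strictly more channels than the integer average; the real helpers may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

/** Hypothetical simulation of balanced selection; loops are identified by index. */
final class BalancedSelector {
    private final int[] channelsPerLoop;
    private final Random random;
    private int roundRobin;

    BalancedSelector(int loopCount, long seed) {
        this.channelsPerLoop = new int[loopCount];
        this.random = new Random(seed);
    }

    private int average() {
        int total = 0;
        for (int count : channelsPerLoop) {
            total += count;
        }
        return total / channelsPerLoop.length;   // integer average, rounded down
    }

    // Assumption: a loop is unbalanced when it holds more channels than the average.
    private boolean isUnbalanced(int loop, int average) {
        return channelsPerLoop[loop] > average;
    }

    /** Picks a loop for a new channel, records the registration, and returns the loop index. */
    int register() {
        int average = average();
        int loop = roundRobin++ % channelsPerLoop.length;   // default candidate
        if (channelsPerLoop.length > 1 && isUnbalanced(loop, average)) {
            List<Integer> candidates = new ArrayList<>();
            for (int i = 0; i < channelsPerLoop.length; i++) {
                candidates.add(i);
            }
            Collections.shuffle(candidates, random);
            Iterator<Integer> it = candidates.iterator();
            do {
                loop = it.next();
            } while (it.hasNext() && isUnbalanced(loop, average));
        }
        channelsPerLoop[loop]++;
        return loop;
    }

    void deregister(int loop) { channelsPerLoop[loop]--; }

    int channels(int loop) { return channelsPerLoop[loop]; }
}
```

Because the minimum count can never exceed the rounded-down average, the random scan always finds at least one loop that is not unbalanced. The imbalance this corrects arises when connections close unevenly, which plain round-robin assignment does not take into account.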
Reducing context switches when acquiring a connection
In the HTTP/2 implementation, each router maintains 10 connections to every storage node. These connections serve as communication pathways for the router I/O threads interfacing with the storage node. Previously, we utilized Netty’s FixedChannelPool implementation to oversee connection pools, handling tasks like acquiring, releasing, and establishing new connections.
However, the underlying queue within Netty’s implementation is not inherently thread-safe. To obtain a connection from the pool, the requesting worker thread must engage the I/O worker overseeing the pool. This process led to two context switches. To resolve this, we developed a derivative of the Netty pool implementation that employs a high-performance, thread-safe queue. Now, the task is executed by the requesting thread instead of a distinct I/O thread, effectively eliminating the need for context switches.
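The idea of acquiring a connection on the calling thread can be sketched with a lock-free queue from the JDK. This is a minimal illustration, not the derivative pool described above: `CallerThreadPool` is a hypothetical name, connections are handed out exclusively, and health checks, timeouts, and the creation of new connections are left out.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical pool whose acquire and release run on the calling thread. */
final class CallerThreadPool<C> {
    // Thread-safe, non-blocking queue, so no hand-off to a dedicated I/O thread is needed.
    private final ConcurrentLinkedQueue<C> idle = new ConcurrentLinkedQueue<>();

    CallerThreadPool(Iterable<C> connections) {
        for (C connection : connections) {
            idle.offer(connection);
        }
    }

    /** Returns an idle connection, or null if none is available. Never blocks. */
    C tryAcquire() {
        return idle.poll();
    }

    /** Returns a connection to the pool so other threads can use it. */
    void release(C connection) {
        idle.offer(connection);
    }
}
```

Since both operations complete on the requesting thread, no task has to be scheduled on another thread and no result has to be passed back, which is where the two context switches came from.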
Improvements to SSL Performance
The following section describes various optimizations to improve the SSL performance.
Offloading DNS lookup and handshake to separate thread pool
During an SSL handshake, the DNS lookup procedure for resolving a hostname to an IP address functions as a blocking operation. Consequently, the I/O thread responsible for executing the handshake might be held up for the entirety of the DNS lookup process. This delay can result in request timeouts and other issues, especially when managing a substantial influx of incoming connections concurrently.
To tackle this concern, we developed an SSL initializer that conducts the DNS lookup on a different thread prior to initiating the handshake. This method involves passing the InetAddress, which contains both the IP address and hostname, to the SSL handshake procedure, effectively circumventing the need for a DNS lookup during the handshake.
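A minimal sketch of moving the blocking lookup off the I/O thread, using only the JDK, might look like the following. `AsyncResolver` is a hypothetical name and the thread pool sizing is left to the caller; the actual SSL initializer is not shown in this post.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;

/** Hypothetical helper that resolves hostnames on a dedicated thread pool. */
final class AsyncResolver {
    private final ExecutorService dnsPool;

    AsyncResolver(ExecutorService dnsPool) {
        this.dnsPool = dnsPool;
    }

    /** Starts the lookup on dnsPool and returns immediately with a future. */
    CompletableFuture<InetAddress> resolve(String host) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                // Blocking call, but it blocks a dnsPool thread, not the caller.
                return InetAddress.getByName(host);
            } catch (UnknownHostException e) {
                throw new CompletionException(e);
            }
        }, dnsPool);
    }
}
```

The caller would chain the handshake onto the returned future, for example with `thenAccept`, so that the handshake starts only once the resolved InetAddress is available.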
Enabling Native SSL encryption/decryption
Java’s default built-in SSL implementation carries a significant performance overhead. Netty offers a JNI-based SSL engine that demonstrates exceptional efficiency in both CPU and memory utilization. Upon enabling OpenSSL within the storage layer, we observed a notable 10% reduction in latency. (The router layer already utilizes OpenSSL.)
To employ Netty Native SSL, one must include the pertinent Netty Native dependencies, as it interfaces with OpenSSL through the JNI (Java Native Interface). For more detailed information, please refer to https://netty.io/wiki/forked-tomcat-native.html.
Improvements to Encode/Decode performance
This section focuses on the performance improvements we made when converting bytes to HTTP objects and vice versa. Approximately 20% of our CPU cycles are spent on encoding and decoding bytes. Unlike a typical service, Espresso has very rich headers. Our HTTP/2 implementation involves wrapping the existing HTTP/1.1 pipeline with HTTP/2 functionality. While the HTTP/2 layer handles network communication, the core business logic resides within the HTTP/1.1 layer. As a result, each incoming request had to be converted from HTTP/2 to HTTP/1.1 and each response back again, which resulted in high CPU usage, memory consumption, and garbage creation.
To improve performance, we implemented a custom codec designed for efficient handling of HTTP headers. We introduced a new type of request class named Http1Request. This class effectively encapsulates an HTTP/2 request as an HTTP/1.1 request by utilizing wrapped Http2Headers. The primary objective behind this approach is to avoid the expensive task of converting HTTP/1.1 headers to HTTP/2 and vice versa.
For example:
    public class Http1Headers extends HttpHeaders {
        private final Http2Headers _headers;
        ….
    }
Operations such as get, set, and contains operate on the wrapped Http2Headers:
    @Override
    public String get(String name) {
        return str(_headers.get(AsciiString.cached(name).toLowerCase()));
    }
To make this possible, we developed a new codec that is essentially a clone of Netty’s Http2StreamFrameToHttpObjectCodec. This codec is designed to translate HTTP/2 StreamFrames to HTTP/1.1 requests/responses with minimal overhead. By using this new codec, we were able to significantly improve the performance of encode/decode operations and reduce the amount of garbage generated during the conversions.
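The wrapping approach can be illustrated without Netty. In the sketch below a plain `Map` with lowercase keys stands in for Netty's Http2Headers, and `Http1HeaderView` is a hypothetical name; it shows only the idea of exposing HTTP/1.1-style, case-insensitive access on top of the HTTP/2 header storage without copying it.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical HTTP/1.1-style view over headers stored in HTTP/2 form (lowercase names). */
final class Http1HeaderView {
    // Stand-in for Http2Headers; the view keeps a reference and never copies entries.
    private final Map<String, String> http2Headers;

    Http1HeaderView(Map<String, String> http2Headers) {
        this.http2Headers = http2Headers;
    }

    private static String normalize(String name) {
        return name.toLowerCase(Locale.ROOT);   // HTTP/2 header names are lowercase
    }

    String get(String name) {
        return http2Headers.get(normalize(name));
    }

    boolean contains(String name) {
        return http2Headers.containsKey(normalize(name));
    }

    Http1HeaderView set(String name, String value) {
        http2Headers.put(normalize(name), value);   // writes go straight to the wrapped headers
        return this;
    }
}
```

Reads and writes through the view touch the underlying headers directly, so no second header object is allocated per request.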
Disabling HPACK Header Compression
HTTP/2 introduced a new header compression algorithm known as HPACK. It works by maintaining an index list or dictionaries on both the client and server. Instead of transmitting the complete string value, HPACK sends the associated index (integer) when transmitting a header. HPACK encompasses two key components:
- Static Table – A dictionary comprising 61 commonly used headers.
- Dynamic Table – This table retains the user-generated header information.
HPACK header compression is tailored to scenarios where header contents remain relatively constant. Espresso, however, has very rich headers with stateful information such as timestamps, SCN, and so on, so HPACK didn’t align well with Espresso’s requirements.
Upon examining flame graphs, we observed a substantial stack dedicated to encoding/decoding dynamic tables. Consequently, we opted to disable dynamic header compression, leading to an approximate 3% enhancement in performance.
In Netty, this can be disabled using the following:
    Http2FrameCodecBuilder.forClient()
        .initialSettings(Http2Settings.defaultSettings().headerTableSize(0));
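Why frequently changing header values get little benefit from the dynamic table can be illustrated with a toy model. This is a deliberate simplification and not the HPACK algorithm: real HPACK sizes the table in bytes and can also index a header name on its own. `ToyDynamicTable` and the header names in the usage note are hypothetical.

```java
import java.util.ArrayDeque;

/** Toy model of a dynamic header table that holds a fixed number of entries. */
final class ToyDynamicTable {
    private final ArrayDeque<String> entries = new ArrayDeque<>();
    private final int capacity;   // maximum number of entries; 0 disables the table
    int indexedHits;              // headers that could be sent as a table index
    int literals;                 // headers that had to be sent in full

    ToyDynamicTable(int capacity) {
        this.capacity = capacity;
    }

    void encode(String name, String value) {
        String entry = name + ": " + value;
        if (entries.contains(entry)) {
            indexedHits++;            // exact name/value pair already in the table
            return;
        }
        literals++;                   // not found: send the full header
        if (capacity == 0) {
            return;                   // table disabled, nothing to maintain
        }
        if (entries.size() == capacity) {
            entries.removeLast();     // evict the oldest entry
        }
        entries.addFirst(entry);      // remember the pair for future requests
    }
}
```

In this model, encoding a constant header such as `x-client: router-1` one hundred times produces one literal and 99 index hits. Encoding a header whose value changes on every request, such as a timestamp, produces 100 literals and no hits, while still paying for the lookup, insertion, and eviction each time.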
Results
Latency Improvements
P99.9 Latency | HTTP/1.1 | HTTP/2 |
Single Key Get | 20ms | 7ms (-66%) |
Multi Key Get | 80ms | 20ms (-75%) |
We observed a reduction of up to 75% in 99th and 99.9th percentile read latencies. At the 99.9th percentile, multi-key read latency decreased from 80ms to 20ms and single-key read latency from 20ms to 7ms.
Figure 4. Latency reduction after HTTP/2
We observed similar latency reductions across the 90th percentile and higher.
Reduction in TCP connections
Metric | HTTP/1.1 | HTTP/2 |
No. of TCP Connections | 32 million | 3.9 million (-88%) |
We observed an 88% reduction in the number of connections required between routers and storage nodes in some of our largest clusters.
Figure 5. Total number of connections after HTTP/2
Reduction in Garbage Collection time
We observed a reduction of at least 75% in garbage collection times for both young and old gen.
GC | HTTP/1.1 | HTTP/2 |
Young Gen | 2000ms | 500ms (-75%) |
Old Gen | 80ms | 15ms (-81%) |
Figure 6. Reduction in time for GC after HTTP/2
Waiting time to acquire a Storage Node connection
HTTP/2 eliminates the need to wait for a storage node connection by enabling multiplexing on a single TCP connection, which is a significant factor in reducing latency compared to HTTP/1.1.
Metric | HTTP/1.1 | HTTP/2 |
Wait time in router to get a storage node connection | 11ms | 0.02ms (-99%) |
Figure 7. Reduction in wait time to get a connection after HTTP/2
Conclusion
Espresso has a large server fleet and is mission-critical to a number of LinkedIn applications. With the HTTP/2 migration, we successfully solved Espresso’s scalability problems caused by the huge number of TCP connections required between the router and the storage nodes. The new architecture also reduced tail latencies by up to 75% and made Espresso more resilient.
Acknowledgments
I would like to thank my colleagues Antony Curtis, Yaoming Zhan, BinBing Hou, Wenqing Ding, Andy Mao, and Rahul Mehrotra who worked on this project. The project demanded a great deal of time and effort due to the complexity involved in optimizing the performance. I would like to thank Kamlakar Singh and Yun Sun for reviewing the blog and providing valuable feedback.
We would also like to thank our management Madhur Badal, Alok Dhariwal and Gayatri Penumetsa for their support and resources, which played a crucial role in the success of this project. Their encouragement and guidance helped the team overcome challenges and deliver the project on time.