Overcoming challenges with Linux cgroups memory accounting

Introduction

LinkedIn’s de facto search solution, Galene, is a Search-as-a-Service infrastructure that powers a multitude of search products at LinkedIn, from member-facing searches (such as searching for jobs or other members) to internal index searches. Galene’s responsiveness and reliability are paramount as it caters to many critical features.

This post discusses debugging an issue where hosts ran out of memory and became inaccessible, even though the applications running on them were limited by cgroups. We’ll cover memory accounting in cgroups and how it is not always straightforward when there are multiple variables at play. We will also discuss a case where cgroups may not account for memory according to our expectations, which can be disastrous for co-hosted applications or the host itself.

This issue arose from one of the services in the search stack, the searcher-app, which is responsible for querying search indexes. The indexes are stored as flat files in a binary format specific to Galene and loaded into the searcher-app’s memory using mmap() calls. The application also uses the mlockall() call to keep the files in memory and disable paging, as paging can cause extremely high tail latencies. When mlockall() is not used, the Linux kernel can swap out pages that are part of the index and not frequently accessed. A query requiring one of those sections will require disk access, which will increase latency. Searcher applications, like a number of other apps, are hosted on containers and use memory and CPU cgroups to limit the resources used by an application or system process on the host.
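The index-loading pattern described above can be sketched as follows. This is a minimal Python sketch, not LinkedIn's actual code; the file layout and the use of ctypes to reach mlockall() are assumptions for illustration:

```python
import ctypes
import ctypes.util
import mmap
import os

# Flags for Linux mlockall() (from <sys/mman.h>).
MCL_CURRENT = 1  # lock pages currently mapped
MCL_FUTURE = 2   # lock pages mapped in the future

def map_index(path):
    """mmap() an index file read-only so queries are served from page cache."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        return mmap.mmap(fd, size, prot=mmap.PROT_READ)
    finally:
        os.close(fd)  # the mmap object keeps its own reference to the file

def lock_all_memory():
    """Pin current and future mappings in RAM to avoid paging-induced
    tail latency; returns False if the process lacks the privilege."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    return libc.mlockall(MCL_CURRENT | MCL_FUTURE) == 0
```

Without the mlockall() step, infrequently accessed index pages can be reclaimed by the kernel and must be re-read from disk on the next query that touches them.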

Issue 1: Low memory leads to excessive page swapping and high latency

We received an alert notification that one of our search clusters was having issues and noticed that many of our searcher-apps were down. When we tried to restart the apps, we saw that the physical host itself was not responding and needed a power cycle via the console to get any response. A few observations to note from the debugging are that before going into the “unresponsive” state, the system had a memory crunch, and once it had entered into the “unresponsive” state, no logs of any kind were generated on the host.


Fig 1: Host disk read time graph (y-axis in milliseconds)


Fig 2: Host available memory graph

We noticed that the host was running low on memory and that there was also an increase in disk read times. This observation, along with an increase in page faults, led us to realize that the pages were being swapped too often because the host was low on memory, which led to high disk writes and slowed down read times. The search application was a major contributor to the lack of memory on the host. So, we optimized the searcher-app’s memory utilization and reduced the cgroup memory limit for the app, which in turn reserved more memory for system processes and resolved the issue.


Issue 2: An unknown cause for reserving large amounts of memory, leading to unresponsive hosts

Six months later, we had the same problem on another cluster, and during our debugging this time around, we uncovered something specific: the application tried to reserve a huge chunk of memory right before the system hung and pushed the host into an unreachable state. This led us to suspect Linux’s cgroup memory enforcement as the culprit. We wrote a small C program to try to reproduce the issue by running it inside a cgroup under a few different memory overallocation patterns, but in all cases, the Linux OOMkiller was correctly invoked and killed the application process. We could not simulate the host-hang situation, so we had to look back at our OS metrics.
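Our reproducer was a small C program; the same over-allocation pattern can be sketched in Python (the function name and chunk sizes here are illustrative, not the original program):

```python
def allocate_and_touch(total_mb, chunk_mb=64, page_size=4096):
    """Allocate memory in chunks and touch every page so each page is
    actually faulted in and charged to the cgroup. If run inside a cgroup
    with memory.limit_in_bytes below total_mb, the kernel OOMkiller is
    expected to terminate the process before the loop finishes."""
    chunks = []
    allocated = 0
    while allocated < total_mb:
        step = min(chunk_mb, total_mb - allocated)
        buf = bytearray(step * 1024 * 1024)
        for i in range(0, len(buf), page_size):
            buf[i] = 1  # touch one byte per page to force a fault
        chunks.append(buf)  # keep a reference so nothing is freed
        allocated += step
    return allocated
```

Touching each page matters: memory that is merely reserved but never written is not charged, so an untouched allocation would not trigger the cgroup limit.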


Debugging

Once we established that the issue was a memory crunch, we began investigating the memory usage pattern on the host. Interestingly, we found that the application cgroup showed much less memory usage than expected.


Application cgroup total memory usage graph

The above graph shows memory usage of about 51GB before the node went unreachable. The red circle marking the point it went unreachable is the point we will use for all of our further graphs. The ideal way to calculate the entire memory usage for the cgroup is Resident Set Size (RSS) Anonymous + page cache + swap used by the cgroup. Because we use mlockall(), we don’t use swap, so we don’t need to worry about that here. RSS is how much memory a process currently has in main memory (RAM). The cgroup stat file used for the following cgroup graphs only shows the anonymous part of RSS; the total RSS of a process is the sum of RSS Anonymous, RSS File, and Shared RSS. RSS File (which contains the mmapped files) is accounted for in page cache, and Shared RSS is too small to be significant in these calculations.
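The accounting formula above (anonymous RSS + page cache + swap) can be computed from the cgroup's stat file; a sketch, assuming the cgroup v1 memory controller's field names:

```python
def parse_memory_stat(stat_text):
    """Parse the 'key value' lines of a cgroup v1 memory.stat file."""
    fields = {}
    for line in stat_text.splitlines():
        key, _, value = line.partition(" ")
        if value.strip().isdigit():
            fields[key] = int(value)
    return fields

def cgroup_memory_usage(stat_text):
    """A cgroup's real footprint: anonymous RSS + page cache + swap."""
    fields = parse_memory_stat(stat_text)
    return fields.get("rss", 0) + fields.get("cache", 0) + fields.get("swap", 0)

# On a live host this text would come from e.g.
# /sys/fs/cgroup/memory/<app-cgroup>/memory.stat
```

With the values from the graphs (19GB rss, 31GB cache, no swap), this sum gives the roughly 50GB the total-usage graph reports.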


Application cgroup RSS usage graph


Application cgroup page cache usage graph

From the previous graphs, if we add up the memory usage (19GB and 31GB), we get 50GB. That’s in line with the “Application cgroup total memory usage graph” shown at the beginning of this section.


Searcher application base index size graph


Searcher application middle index size graph

From these two graphs, we can see that the base index size is 32GB and the middle index size is 12GB, which brings us to a total size of 44GB—the size of flat index files mmapped into memory. When we add the RSS value of 19GB, we get a total usage of 63GB.

So, the application is using 63GB of memory, based on the above calculation from the actual file size of the indexes and the RSS, which we verified by looking at the process on the host. This means that our cgroup is not reporting the correct amount of memory used for cache: we need 44GB of cache, but the cgroup only shows 31GB.

The current hierarchy of our cgroups is:

  • Root cgroup

    • Application parent cgroup

      • Application 1 cgroup

      • Application 2 cgroup

Now, let’s compare the application cgroup page cache usage with the parent cgroup metrics. We wanted to compare the different cgroups to identify at which level the memory was not being reported as we expected.


Parent cgroup page cache usage graph


Application cgroup page cache usage graph

The dip in cache usage by the application cgroup is due to a restart. After the restart, we see that the application cgroup is reporting the wrong numbers for the cache. We expect around 44GB of cache, but the application cgroup only shows around 10GB just after restart, while the parent cgroup still reports the right amount of cache usage.

OOMkiller will not kick in, even when the application is using more memory than allocated, because the application cgroup is not reporting the correct memory usage. This can cause the search application to hog memory on the box and other services to become starved for memory, which leads to swapping, and eventually the system becomes unreachable.

Understanding page cache accounting in cgroups

Let us first understand how memory is accounted for in cgroups:

  • RSS: This one is simple. Just add up the RSS of all the processes under that cgroup.

  • Cache: Shared pages are accounted for on a first-touch basis. This means that any page created by a process inside a cgroup is charged to that cgroup. If the page already existed in memory, the accounting gets complicated: the page may only be charged to the new cgroup eventually, after that cgroup keeps accessing the page aggressively.

In our stack, restarts or redeploys follow these steps:

  1. Stop application

  2. Delete application cgroup

  3. Create application cgroup

  4. Start application

In our case, when we deploy new indexes, the application’s cgroup reports the correct memory usage. Once the index grows and reaches the application cgroup memory limit, the OOMkiller is invoked and the application is killed. From there, our automation kicks in and restarts the application, which deletes the existing application cgroup and creates a new one. But this time, the application cgroup’s memory accounting is wrong: the pages for the index are already in memory, but the new application cgroup does not account for them. As the index keeps growing, the host faces a memory crunch, which leads to thrashing (Figures 1, 2). The OOMkiller is not invoked for the application cgroup because it reports less memory than is actually being used. Because our application uses mlockall(), its memory cannot be swapped; other critical system applications get swapped instead, and the host goes into an “unresponsive” state.
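On the cgroup filesystem, steps 2 through 4 of the restart flow amount to an rmdir/mkdir plus re-writing the limit and the tasks file. A sketch assuming the cgroup v1 layout (paths and file names are the v1 memory controller's, not our actual tooling):

```python
import os

def recreate_cgroup(cg_path, limit_bytes):
    """Steps 2 and 3: delete and recreate an application cgroup, then set
    its memory limit. cgroup directories are removed with rmdir once they
    hold no tasks (not with a recursive delete)."""
    if os.path.isdir(cg_path):
        os.rmdir(cg_path)
    os.makedirs(cg_path)
    with open(os.path.join(cg_path, "memory.limit_in_bytes"), "w") as f:
        f.write(str(limit_bytes))

def add_task(cg_path, pid):
    """Step 4: charge the restarted application to the new cgroup by
    writing its pid into the tasks file. Index pages the old cgroup
    already brought into page cache are not re-charged to this cgroup."""
    with open(os.path.join(cg_path, "tasks"), "a") as f:
        f.write("%d\n" % pid)
```

The last comment is the crux of the bug: after the delete/create cycle, the first-touch accounting leaves the still-resident index pages charged elsewhere, so the new cgroup starts with an artificially low cache counter.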

Validating the findings

We did a small experiment to validate our findings. We picked one host showing lower application cgroup memory usage and stopped the application and destroyed the cgroup, then got the machine to drop all its page cache. After that, we created a new cgroup and started the application inside it.
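The "drop all its page cache" step maps to the kernel's drop_caches knob; a hedged sketch (this needs root on a real host, and the path parameter exists here only so the function can be exercised safely):

```python
import os

def drop_page_cache(knob_path="/proc/sys/vm/drop_caches", level=3):
    """Write to the drop_caches knob: 1 drops clean page cache, 2 drops
    dentries and inodes, 3 drops both. sync() first so dirty pages are
    written back and become droppable."""
    os.sync()
    with open(knob_path, "w") as f:
        f.write(str(level))
```

Starting the application against a cold page cache forces every index page to be a first touch by the new cgroup, which is why the accounting comes out correct in this experiment.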


Application cgroup page cache usage graph

The application cgroup showed the right amount of memory after the above steps. This verified that the issue was caused by a new application cgroup not charging pages to itself, even if the application inside it is the only one using those pages.

Solution

First, we wanted to set up proper monitoring to catch index growth before the host ran out of memory. We used metrics emitted by the application to monitor the index size and tracked the RSS memory used by the cgroup, setting up an alert to let us know when a certain threshold had been exceeded. This gave us enough time to mitigate the issue before we ran out of memory, but a sudden increase in memory could still happen in some cases, so we needed a failsafe to ensure the host wouldn’t go into an unresponsive state.
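Because the cgroup under-reports cache, the alert has to compare the numbers we can trust, RSS plus the on-disk index size, against the limit. A sketch of that check (the 90% threshold is illustrative, not our production value):

```python
def should_alert(rss_bytes, index_bytes, limit_bytes, threshold=0.9):
    """True real usage is roughly RSS + mmapped index size, even when the
    cgroup's own cache counter reads low; alert before it reaches the
    cgroup limit so there is time to mitigate."""
    return rss_bytes + index_bytes >= threshold * limit_bytes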

The total memory usage shown in the parent cgroup is still correct, as previously discussed: when the old cgroup is destroyed, the parent retains the total memory usage numbers, which include the page cache. To ensure that the OOMkiller is invoked when the parent is breaching its limits, we are planning to put a memory limit on the parent cgroup. Doing so can cause a noisy-neighbor situation, where a different co-hosted application is killed rather than the one abusing memory, but considering that the host would otherwise go unreachable and both applications would suffer, this is the best current solution to the issue.

While we did consider a few other solutions (listed below), we determined that they didn’t fit our needs.

  • Adding a cache flush each time a cgroup is created: this would unnecessarily affect other applications running on the host because of disk I/O using up CPU cycles.

  • Leverage tmpfs to host indexes: this would require changes on the application side and a different configuration for searcher hosts.

  • Create a parent cgroup with limits per application cgroup: this would require extensive changes from the current provisioning and deployment tooling.

After evaluating all these approaches, we decided to go with setting a cgroup limit on the parent cgroup.


Conclusion

Debugging an issue is always filled with surprises and learnings. From this issue, we realized that memory accounting in cgroups can be complicated when page cache is involved. Using mlockall() can lead to critical services being swapped out when the application starts hogging memory. But most importantly, this process was a good reminder of the importance of challenging the assumptions we make during debugging—for instance, if we had questioned cgroup’s memory reporting during the initial issue, we would have had one less issue in production. After adding monitoring to detect the issue, we figured out that there were other clusters affected by this and we could fix it before it caused any production impact.

Acknowledgments

I would like to thank Kalyan Somasundaram and Mike Svoboda for helping me during the triaging. Also, a big thanks again to Kalyan for reviewing this blog post. Finally, I would like to acknowledge the constant encouragement and support from my manager, Venu Ryali.



    Career Stories: Breaking barriers with LinkedIn


    After interning with us, Beatrix resonated with the culture and community she found at LinkedIn, and rejoined us post-undergrad. As she continues exploring her passion for frontend (UI) and accessibility engineering, she shares why launching her career with LinkedIn is one of the best decisions she has made.


    From intern to engineer

    In 2019, my career with LinkedIn started with a UI (frontend) engineering internship in the San Francisco office. I had done a bit of user experience (UX) work before, so I was excited that there was a specific frontend role for interns at LinkedIn. I learned a lot about frontend development and immersed myself in LinkedIn’s culture. After my internship, I felt like I had barely scratched the surface of all there was for me to learn at LinkedIn, and I was so happy to get a return offer as a UI engineer on the LinkedIn Marketing Solutions team after I graduated in 2020. 

    I had a bit of an untraditional path to engineering — I’ve always been creative and loved taking Latin, but I started taking an interest in computer science during high school, which continued into my undergraduate studies at Vassar. Computer science combined many things I liked about science with my love for humanities and logic; it’s truly a multidisciplinary field. 

    I honed these skills during my extracurriculars in college as a teaching assistant with Kode with Klossy, a summer camp that helps teach teen girls how to code, and through attending the Grace Hopper Celebration, a women in tech conference. It was a full-circle moment when I had the privilege to attend the virtual Grace Hopper Conference in 2020 with LinkedIn. 


    A culture of Next Plays

    One of the attributes that kept me here is LinkedIn’s culture of transformation. With every team I’ve been on, there’s a lot of celebration of people’s “Next Plays” as we call it here, whether that’s a new job opportunity or promotion at LinkedIn itself or elsewhere. 

    While I enjoyed my time working on LinkedIn Marketing Solutions’ Campaign Manager, after 1.5 years on the team, I was eager to dive deeper into a new challenge and more accessibility work, to better support diverse learners and LinkedIn users (or members as we call them) with disabilities. 


    Thanks to LinkedIn’s wonderful culture that has an emphasis on collaboration and mentorship, I was able to connect with engineers from across the engineering organization to find a role that combined my interests in accessibility engineering, and development for the main LinkedIn site. With that mentorship, I found a new role earlier this year on our LinkedIn Talent Solutions team, centered on the job search and evaluation engineering work. 


    I’ve always found LinkedIn to be very human in its approach to work, because everything we do stems from our mission to build economic opportunities and connections for people. The job search team is focused on trying to help people get jobs and with our accessibility impact, we make more jobs accessible to every member of the global workforce. This focus was also seen in other roles that I’ve had at LinkedIn. While on the LMS team, I got to work on our reflow efforts, which ensures that the pages in Campaign Manager are usable on many screen sizes and at different zoomed-in levels. 

    50M job seekers visit LinkedIn every week — resulting in 95 job applications every second, and six hires every minute. To help job seekers find their dream jobs at that scale is incredibly rewarding, and I feel very fortunate to contribute directly to those outcomes as an engineer.  


    Being there through the tough times

    And LinkedIn’s human approach transcends the work itself. I grew up in Los Angeles, and I’m incredibly close to my parents and two sisters: one in high school and one in college.

    I remember getting pulled into a family emergency where I had to unexpectedly fly back home from San Francisco to help, and told my manager, “I’m not sure when I’ll be back [in San Francisco].” My team and managers were incredibly compassionate, and I was able to spend time with my family and fly back to visit them as needed to help support during this difficult time. 


    I was also able to be in Los Angeles from Thanksgiving to New Years with my family, working remotely from LA. The flexibility and earnest support I’ve received from my team during both the good and the tough times have meant the world.


    Craftsmanship in engineering

    On the technical side of the house, one of the things that impresses me about LinkedIn is engineering’s emphasis on craftsmanship here, especially on our Talent Solutions team. We invest a lot into the foundations of our code base and code quality, ensuring that we are writing code that we can build on in the future. 

    While working on new features for the site is always exciting, I am also grateful to have the opportunity to work on the efforts that improve the site behind the scenes, like documentation and other quality-of-life changes. Many teams at LinkedIn are trying to push foundational work initiatives like this forward. Documentation is one of those things that always comes up in developer productivity and happiness here at LinkedIn, so I’m glad to be able to contribute in ways that help make my colleagues’ work lives easier. 

    Recently, my team discussed a situation with a contractor who came into the code base and thought our code tests did not make sense. This confusion sparked us to begin renaming the tests, changing the wording, and agreeing on the clearest way of labeling our code tests. I have so appreciated having space for these discussions; although product users will never see this, it is something that makes our code so much more reliable. 


    Breaking barriers through Women in Tech

    Since I joined LinkedIn full-time in the midst of remote work during the pandemic, I wanted to find ways to connect with other engineers. I’m so thankful that LinkedIn has given me those opportunities; I was one of the founding members of the LinkedIn Marketing Solutions branch of Women in Tech (LMS WiT), and joined our Out@In (i.e., LGBTQIA+) Employee Resource Group (ERG). It is incredible how leadership opportunities aren’t gated by age or company tenure at LinkedIn. I was able to grow and learn so much about what it means to be organized, what it means to be a leader, and how I am in a position to help the WiT community and facilitate these learnings.


    Within LMS WiT, I helped to co-found the Amplify Voices track. Shortly after joining, I raised that we should rename our Male Allies track. I had heard from several nonbinary employees on our LinkedIn Marketing Solutions team who were wondering if there was room for them within WiT. It was powerful to me that my group was receptive to my idea and changed the name to WiT Allies the very next day, so that more LinkedIn employees felt included. If you’re interested in equality, empowerment, and events focused on how to speak up for yourself in a professional setting, it’s essential to have these discussions about inclusiveness.

    Anytime I had a suggestion in ERGs, it was always considered thoughtfully and there was a lot of trust placed in me even as a young professional. In LinkedIn’s ERGs, there’s this openness that breaks down artificial limits and helps us grow as leaders. This spirit of inclusiveness is what makes LinkedIn such a welcoming place. 


    About Beatrix

    Beatrix is a frontend (UI) engineer on our LinkedIn Talent Solutions team. Prior to her current role, Beatrix was a UI engineering intern and a UI engineer on our LinkedIn Marketing Solutions team. She graduated from Vassar College with a degree in computer science. In her free time, Beatrix enjoys spending time with her two cats, Mr. Darcy and Georgiana, cross-stitching and crocheting, and gaming.

    Editor’s note: Considering an engineering/tech career at LinkedIn? In this Career Stories series, you’ll hear first-hand from our engineers and technologists about real life at LinkedIn — including our meaningful work, collaborative culture, and transformational growth. For more on tech careers at LinkedIn, visit: lnkd.in/EngCareers.



    Measuring marketing incremental impacts beyond last click attribution


    Co-authors: Maggie Zhang, Joyce Chen, and Ming Wu

    What’s my ROI?

    In every company, there’s a fundamental need to understand the impact of marketing campaigns. You want to be able to measure how many incremental conversions different channels and touchpoints are successfully driving. The best practice of A/B testing at the individual level is not applicable in traditional channels such as TV ads, radio, or billboards. Even in digital marketing channels, new regulations and public awareness around data privacy have made A/B testing on third-party platforms, which requires transferring user-level data, harder than ever. As a compromise, companies often rely on the last-click attribution model, which gives 100% credit for a conversion to the last marketing touchpoint/campaign in a user’s journey. This means that not only does it ignore everything (i.e., engagement, other media exposure) that happened earlier in the user journey, it also tends to over-credit the last touchpoint (usually a paid media exposure) for conversions that would have been achieved organically without the media exposure.

    To accurately quantify the true incremental impact of marketing campaigns, we adopted a powerful approach — a Bayesian Structural Time Series (BSTS) model approach to measure the causal effect of an intervention. 

    The basic idea is simple and intuitive: we design an experiment where the experimental units are defined by targetable geographical areas. Planned marketing intervention is applied in the selected areas (the test areas). The remaining areas are used as Control. The BSTS model is created to predict the Test areas’ would-be performance in an alternative scenario with no marketing intervention. The delta between the observed and the predicted performance of the Test areas enables us to measure the true impact of the marketing intervention.
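In symbols: with y_t the observed test-area metric and ŷ_t the model's counterfactual prediction for the same period, the causal effect is the pointwise delta, accumulated over the campaign window:

```latex
\text{effect}_t = y_t - \hat{y}_t, \qquad
\text{cumulative impact} = \sum_{t \,\in\, \text{campaign}} \left( y_t - \hat{y}_t \right)
```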

    What is BSTS?

    The BSTS model is a statistical technique designed to work with time series data, used for time series forecasting and inferring causal impact. You can refer to this paper, and Google’s open source R CausalImpact package, for more details.


    Let’s use geo-based marketing campaign measurement as an example. At a high level, in order to construct an adequate counterfactual for the test markets’ performance, three sources of information are needed. The first is the time series performance of the test markets prior to the marketing campaign. The second is the time series performance of the control markets that are predictive of the test markets’ performance before the campaign (there are a lot of considerations that go into picking the most relevant subset to use as contemporaneous controls). The third is prior knowledge about the model parameters, from previous studies as an example.

    BSTS causal impact analysis steps

    To infer the causal impact of a marketing campaign with the BSTS model approach, the following steps need to take place.

    Metric selection

    A true north metric will be used to select comparable markets. Whether it’s the traffic, job views, or job applications, we have to be very clear about what we want to drive and what we want to measure. 


    Geo-split

    One key assumption of a geo test is that control markets’ time series data are predictive of test markets’ time series data. We can form test and control groups by leveraging a sampling/matching algorithm to select comparable groups of markets based on historical time series data. There are two algorithms to form the comparable groups depending on the actual business needs: 

    • MarketMatching is used to find matching markets when marketers already have a list of markets they want to run campaigns with. For example, a billboard campaign is set to launch in New York and the matching algorithm might find that San Francisco and Chicago are good markets to use as control. 

    • Stratified sampling approach pre-divides the list of markets into homogenous groups called strata based on characteristics that they share (e.g., location, revenue share), then it draws randomly from each strata to form the test sample. It can guard against an “unrepresentative” sample (e.g., all-coastal states from a nationwide Google search campaign) by ensuring each subgroup of a given population is adequately represented within the whole population. This allows the marketers to properly infer the performance of a large scale non-local campaign. 
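The stratified sampling step above can be sketched in a few lines (the market names and strata here are made up for illustration):

```python
import random

def stratified_sample(markets_by_stratum, per_stratum, seed=0):
    """Draw the same number of test markets from every stratum so the test
    group mirrors the composition of the whole population; the remaining
    markets become control candidates."""
    rng = random.Random(seed)
    test_markets = []
    for stratum in sorted(markets_by_stratum):
        test_markets.extend(rng.sample(markets_by_stratum[stratum], per_stratum))
    return test_markets
```

Drawing from each stratum separately is what prevents the "all-coastal states" failure mode: no stratum can be left out of the test group.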

    Theoretically, geo-split can be implemented at various levels (nation, state, county). In reality, a good selection of geo-split level should fulfill these requirements:

    • Targetable: it is possible to fully control the marketing activities at this level on the desired ad platforms. Geo-targeting capability and restrictions vary across platforms. It is important to understand them before planning your test.

    • Measurable: it is possible to observe the ad spend amount and accurately measure the response metric at this level.

    • Economical: for example, it is not a good idea to run a job promotion campaign with a state-level split. Some people may reside in New Jersey while working in New York City. Instead, the campaign should be run in the entire New York metropolitan area, which covers the key areas in both New Jersey and New York and therefore reduces the risk of cross-group contamination.

    Modeling

    After decisions have been made on the geo group assignments and true north metrics, we can construct two time series (test/control) using historical data aggregated at the assigned geo group level. We recommend finding a period without major regional marketing activities. The period required for training the model depends on the availability of the data and variance of the time series. If the training period is too short, there will not be enough data to learn the relationship between test and control time series, thus high bias. If the training period is too long, the relationship may change over time and won’t apply anymore. In practice, we find one to three months to be a good duration. 

    The next step is to build a model that can accurately predict test time series based on the control time series. 

    A good time series model needs to be flexible and transparent, and should take into account the seasonality, the macroeconomic trend, and the business drivers, to be able to quantify the impact from each. BSTS allows you to explicitly specify the posterior uncertainty of each individual component (regression, seasonality, trend). You can also control the variance of each component and impose prior beliefs in its Bayesian framework. Mean Absolute Prediction Error (MAPE) is used to evaluate the goodness of fit of the model during the training period. A good MAPE score (usually <5%) is a strong signal that the selected control group can be used to accurately predict the counterfactual of the test market.
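For reference, MAPE over an n-point training period, with y_t the observed series and ŷ_t the model's prediction, is the usual percentage-error form:

```latex
\mathrm{MAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right|
```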

    Validation 

    Prior to the campaign launch, we’d like to establish an AA-testing process to validate the model performance and rule out the possibility of pre-existing bias that could potentially undermine the causal inference. During the AA test period, no marketing intervention is applied to either treatment or control. We expect the model to report no statistically significant difference between the predicted time series and the observed time series. Further deep-dives and a re-design of the test are required if the AA test fails.

    AA testing process

    Power analysis and budget scenarios 

    Similar to an A/B test, we’d like a power analysis at the design stage of a geo experiment. If those markets are used as control and treatment in the experiment and the true north metric is sessions, what is the probability of detecting an effect if a session lift of a particular magnitude is truly present? Unlike A/B tests, there is no theoretical approach for conducting the power analysis. The current approach to estimating the minimum detectable effect (MDE) and the required test duration is through simulation, where a synthetic lift is added to the treatment group to approximate the effect of a marketing campaign. We can then work with marketing partners to create budget scenarios at different MDE levels to ensure incrementality can be detected with a reasonable chance, budget, and pacing plan. A budget scenario usually takes into account several factors including media cost, MDE, targeting plan (audience size/launch areas), and campaign duration.
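A toy version of the simulation-based power analysis might look like the following. The detection rule here, the mean daily delta clearing two standard errors, is a simplified stand-in for the real BSTS posterior check:

```python
import random
import statistics

def detection_rate(noise_sd, lift, n_days=30, n_sims=200, seed=1):
    """Estimate power by simulation: inject a synthetic lift into a noisy
    daily delta series and count how often the mean delta clears two
    standard errors. Sweeping `lift` values maps budget levels to an MDE."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        deltas = [lift + rng.gauss(0, noise_sd) for _ in range(n_days)]
        se = statistics.stdev(deltas) / n_days ** 0.5
        if statistics.mean(deltas) > 2 * se:
            hits += 1
    return hits / n_sims
```

Running this across a grid of lift values yields the smallest lift detected with, say, 80% probability; pairing each lift with the spend needed to produce it gives the budget scenarios discussed above.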

    Measurement 

    At the end of the campaign, we apply the previously trained BSTS model and construct a synthetic control based on the control time series data from the post-intervention period. Comparing the synthetic control (the predicted) with the observed time series of the test markets measures the true impact of the marketing intervention. Similar to A/B tests, impacts are only considered statistically significant if the p-value of (delta > 0) is below 0.05.

    • [Image: measurement with BSTS]

    Successful use case of BSTS at LinkedIn

    At LinkedIn, our Data Science team has successfully applied the BSTS approach to many unique business cases, answering questions that would otherwise have remained open for our business.

    In one of our full-funnel brand marketing national campaigns, which ran for two months with multi-channel deployment including TV, billboard, audio, digital, and social, we applied our BSTS approach and concluded that the campaign drove a nearly double-digit lift in targeted metrics.

    In one of our paid job distribution programs, we designed the go-dark city selection using the aforementioned stratified approach. By applying BSTS, we proved that the program's Return on Advertising Spend (ROAS) was healthy and well above 1.0.

    In our paid app activation program, we leveraged BSTS to infer members' incremental lifetime value (LTV) by country and by operating system (iOS, Android). The results guided the future investment of our app activation program.

    In a recent Google Universal App campaign in which we promoted LinkedIn Apps on Android, we again applied BSTS and concluded that, in the tested geography, about half of the app installs attributed through last click were incremental.


    Conclusion

    Understanding marketing campaign ROI is a crucial business challenge. When the gold standard of A/B testing, or measurement at the individual unit level, is not available, BSTS is a powerful alternative for measuring a marketing campaign's causal impact at a geo-aggregated level. By establishing a BSTS measurement framework and best practices, the LinkedIn Data Science team has successfully applied the approach to deliver insightful measurement results that led to improvements in our marketing channel efficiency and budget allocation.

    We'd like to end this blog post by highlighting that, in addition to measuring past marketing campaign performance with BSTS, in our subsequent work we also feed the BSTS results into a marketing mix model (MMM) to optimally allocate spend on future investments. Media mix models provide a high-level cross-channel view of how marketing channels are performing. By triangulating modeling results with rigorous BSTS causal experimentation, one can improve the model's robustness and its ability to recover some of the lost signals.

    Acknowledgements

    We would like to acknowledge Rahul Todkar and Ya Xu for their leadership in this cross-team work and Minjae Ormes, Ryan McDougall, Tim Clancy, Kim Chitra, Ginger Cherny, for their business partnership and support. We would like to thank all of the collaborators, reviewers and users who assisted with BSTS Geo Causal Inference Studies from the Data Science Applied Research team (Rina Friedberg, Albert Chen, Shan Ba), the Go-to-Market Data team (Fangfang Tan, Jina Lin, Catherine Wang, Kelly Chang, Sylvana Yelda), the Consumer Product Marketing Team (Rajarshi Chatterjee, Shauna-kay Campbell, Emma Yu), the Paid Media team (Nicolette Song, Krizia Manzano, Sandy Shen), the Careers Engineering Team (Wenxuan Gao, Dheemanth Bykere Mallikarjun), and our wonderful partners, the DSPx team (Xiaofeng Wang, Daniel Antzelevitch, Kate Xiaonan Ding) who helped build the automated solution. 



    Career stories: Next plays, jungle gyms, and Python


    Since she was a child, Deepti has been motivated to help people. This drive led her on a career journey with many pivots and moves — akin to navigating a children’s jungle gym — between industries and around the world. Based in Bangalore, this biomedical engineer turned data scientist shares how LinkedIn helped her gain new technical skills, dive into meaningful work, and grow. 

    • [Image: Deepti walking in a jungle]

    Growing up in Mumbai, India, I always imagined myself in a career where I could give back. I once dreamed of becoming a neurosurgeon, but early in my career, I took a different path and earned a bachelor's in electronics engineering. While studying engineering gave me the foundation for my future career, I quickly realized that my job options wouldn't help me make the difference I wanted to make. So, I decided to complete a master's program in biomedical engineering at Drexel University in Philadelphia. 

    After graduating, I found an opportunity at the Toyota Technical Center in Boston, where I helped build driver safety systems that incorporated human physiological considerations into injury prevention. Toyota is where I first began to reconsider my perspective on what it means to help others, realizing that I could draw on my STEM background to build safer systems that would benefit everyone.

    • [Image: Deepti and her daughter]

    Embracing a data-driven career change

    Soon, however, home and family called me back to India, where, at the time, biomedical research was not as exciting as the work I had been doing in the U.S. While CT scans and MRIs are, of course, critical instruments, I increasingly felt that I wasn't giving back in the way I'd hoped. After two years, I knew it was time to push myself out of my comfort zone once again, which led me to data science. 

    When I first broke into the field, data science was more like informal analytics. Yet I was intrigued by this new discipline, where I could use the skills I gained as an engineer, like problem-solving and logical thinking, while also gaining unique expertise. When I started, my mantra was to stay focused on learning and not worry about my experience (or lack thereof) when surrounded by data scientists who, though just out of school, had more experience in the field than I did.


    My instincts served me well, and I quickly grew from an analyst to a senior manager — this pace of career progression is the norm in startups, where fast growth is expected. In a short period, my time became less focused on getting my hands dirty with data, and more centered on managing clients and stakeholders and putting out fires. After seven years, I missed building things and solving problems, which is when the perfect opportunity opened up at LinkedIn. 

    • [Image: Deepti sitting on a couch with plants behind her]

    Giving back to the global community at LinkedIn

    With a desire to do more, I was recruited at the right time for a data scientist position on LinkedIn’s Economic Graph, our digital representation of the global economy. The Economic Graph research team I was on was a global team with people based in the U.S., Europe, and Singapore. What appealed to me most about the Economic Graph was that we work and collaborate alongside the government and other non-governmental organizations (NGOs) to deliver insights that enable our members to succeed and connect with the right opportunities for them. 

    The Economic Graph partners with public sector organizations to provide data insights that improve policy decisions. For example, if a government ministry is considering where to invest in education, it needs data on issues like labor market demand and skills gaps. Our team would deliver such power-packed insights using LinkedIn platform data. At LinkedIn, we ensure that member data is used safely, and we're proud that the trust we've built with our members enables us to deliver these insights. 


    Python, Scala, and people management

    When the Economic Graph team consolidated, I knew it was time for the next stage of my career, or my Next Play as we call it here. My manager pushed me to consider taking on a tech lead role in data science within the Business Operations team at LinkedIn in India. I admit I was reluctant to go back to a position focused on business revenues, as I had grown attached to the research mission in my previous role. Soon, though, I realized that everything we do at LinkedIn helps advance our mission and vision for the community. 

    Now, I’m managing a newsletter and leading a team of data scientists solving business-critical problems across the company. It’s precisely the kind of exposure I’m looking for at this point in my career, gaining horizontal expertise by engaging all these different domains. LinkedIn is all about learning. Here, managers encourage people to take charge of their careers, experiment, and move into other roles according to their interests and goals. 


    We don’t shy away from challenges and learning curves. For example, I’ve had to upskill myself in coding. For nearly 14 years, I primarily used the R programming language. Now, we’ve moved on to Python and Scala, building on our everyday work with statistics and math. 

    It’s not all about tech, though. We deal with unique questions, so problem-solving skills are critical. It’s also essential to think about the business contexts and ask the right questions. Then, we bring it all together with technology to solve a problem in a structured manner.

    • [Image: Deepti and her family on the beach]

    Moving forward on the LinkedIn learning path

    When thinking about career trajectories, I always return to this metaphor of a children’s jungle gym. I tell my team that it’s about moving through a matrix rather than climbing a ladder one step at a time. You’re still moving from one point to the next, but the next step isn’t necessarily upward. The reality is that, sometimes, you have to move down a level to reach a specific endpoint. 


    I’ve moved from senior management roles into positions with no one reporting to me. At first, I thought, “Am I down-leveling?” Then I would remind myself that my final goal is always to do something meaningful. At that point, taking on a more technical role was a step in that direction. 

    Then, with the knowledge and expertise I gained, I could return to leading teams and tackling bigger challenges. Growth looks different for different people, but as long as you have that fire inside you to keep learning and growing, changing domains, jobs, or even countries will only help you in your journey. 


    About Deepti 

    Based in Bengaluru, India, Deepti is a senior data scientist at LinkedIn. Before LinkedIn, she spent nearly seven years as a senior analyst and senior manager for [24]7.ai working on customer engagement solutions. Born and raised in India, she holds a bachelor’s degree in electronics engineering from the University of Mumbai, and a master’s in biomedical engineering from Drexel University in the U.S. Outside of work, Deepti spends time with her two daughters, and shares her passions for interior design and gender equality issues on social media. 

    Editor’s note: Considering an engineering/tech career at LinkedIn? In this Career Stories series, you’ll hear first-hand from our engineers and technologists about real life at LinkedIn — including our meaningful work, collaborative culture, and transformational growth. For more on tech careers at LinkedIn, visit: lnkd.in/EngCareers.
