Async stack traces in folly: Introduction

This article was written by Lee Howes and Lewis Baker from Facebook.

Facebook’s infrastructure is highly distributed and relies heavily on asynchronous programming. Most of our services that are written in C++ are implemented using async programming frameworks like folly::Futures and folly::coro. Async programming is an important tool for scaling a process with a relatively small number of threads to handle a much larger number of concurrent requests. However, these async programming models have typically come with some downsides.

One of these downsides is that existing debugging and profiling tools that rely on stack traces to provide information about what your program is doing generally produce poor-quality data for async code.

This is the first in a short series of blog posts about how we are leveraging C++ coroutines to improve this situation, and about the technical details of our solution. This first post gives high-level background; later posts in the series go into fairly deep technical detail about the implementation in folly and the surrounding tooling that assists debugging and profiling of coroutine code.

Why async stack traces?

For some years, the library we have relied on for async programming at Facebook has been folly’s Futures library, which allows code like the following:

Executor& e = ...;
Future<void> s = makeSemiFuture()
    .via(&e)
    .thenValue([](auto&&) { doWork(); });
doComplexOperation();
Future<void> s2 = std::move(s)
    .thenValue([](auto&&) { doMoreWork(); });

where the lambda containing doWork will run at some point on the thread pool represented by e.

Async code like the example above generally involves launching an operation and then attaching a callback or continuation to that operation that will be executed when that operation completes. doComplexOperation can run on the main thread concurrently with doWork running on some other thread owned by the Executor. In general, this avoids much of the overhead of thread-context switches that you get when using thread-per-task concurrency. However, this also means that the callback is often executed in a different context from the context that launched it, usually from an executor’s event-loop.
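To make the shape of this pattern concrete, here is a minimal, self-contained sketch of the general callback style described above. It deliberately uses a hypothetical launchAsync helper and a plain std::thread rather than folly’s executors:

#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper, not a folly API: run the callback later on another thread.
void launchAsync(std::function<void()> callback) {
  std::thread([cb = std::move(callback)] { cb(); }).detach();
}

void doWork() {}
void doComplexOperation() {}

void caller() {
  // Launch the operation and attach the continuation...
  launchAsync([] { doWork(); });
  // ...then keep running. caller() may return, and its stack frame may be gone,
  // long before doWork() executes on the other thread.
  doComplexOperation();
}

int main() {
  caller();
  // Give the detached thread a moment to run before the process exits.
  std::this_thread::sleep_for(std::chrono::milliseconds(100));
}

The stack frame of caller() can be long gone by the time the callback runs, which is exactly the context a later stack walk can no longer see.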

Under normal circumstances, when code is launched on an executor, for example a thread pool, a stack trace will represent the path from the thread’s execution loop to the function being run. Conceptually, the threads in the executor do something like:

void Executor::run() {
  while (!cancelled) {
    auto t = queue_.getTask();
    t();
  }
}

If we run this in a debugger and break inside doWork the stack trace will look like:

- doWork
- <some function type-erasure internals...>
- Executor::run
- <some executor / thread startup internals...>
- __thread_start 

The trace will only cover stack frames from doWork’s body down to the run method of the executor. The connection to the calling code, and to doMoreWork, is lost. This loss of context can make debugging or profiling such code very challenging.

It is hard to fix this loss of context in futures code like the example above. With a normal stack trace, when one function calls another, the full call stack is preserved for the duration of the call, so any stack walk performed while the function is executing can easily trace back through the calling code without doing any extra work. With futures, however, the code that launches an operation continues executing and may unwind its call stack before the continuation attached to the future runs. The calling context is not preserved, so we would effectively need to take a snapshot of the call stack at the time the operation is launched in order to later reconstruct the full call stack, incurring a large runtime overhead. Coroutines offer us a nesting that makes this cleaner:

Executor& e = ...;
auto l = []() -> Task<void> {
  co_await doWork();
  co_await doMoreWork();
};
l().scheduleOn(&e).start();

The compiler transforms this into something like the futures code above, but structurally there is a big difference: the next continuation is in the same scope as its parent. This makes the idea of a “stack” of coroutines much more meaningful, at least syntactically. Taking advantage of this to help with debugging is still a technical challenge.

Coroutines as callbacks

C++ uses a style of coroutine that suspends by returning from the function: rather than suspending the entire stack, we return from the coroutine and store its suspended state separately. This is distinct from the “fiber” style of coroutine, where the entire stack is suspended. The implementation ends up looking something like a series of callbacks, with a hidden linked list of chained coroutine frames threaded through the sequence of function calls.

There are big advantages to the style of coroutines chosen for C++, which we will not go into here, but one downside is that while the nested structure is there in the code, it is not directly apparent in the stack frames visible to a debugger or profiling tool.
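To make this concrete, the chain of suspended coroutines can be pictured as an intrusive singly linked list of frames. The following is a conceptual sketch only, not folly’s actual data structures, showing what such a chain and a walk over it might look like:

#include <cstdio>

// Conceptual sketch only, not folly's actual types: each suspended coroutine
// contributes one frame that points at the frame of the coroutine which
// awaited it, forming an intrusive singly linked list.
struct AsyncFrame {
  AsyncFrame* caller = nullptr;  // frame of the awaiting ("parent") coroutine
  const char* name = nullptr;    // stand-in for a return address to symbolize
};

// Walking the async stack is a pointer chase through these frames rather than
// a walk over the thread's native call stack.
void printAsyncStack(const AsyncFrame* frame) {
  for (; frame != nullptr; frame = frame->caller) {
    std::printf("- %s\n", frame->name);
  }
}

int main() {
  AsyncFrame root{nullptr, "run_application"};
  AsyncFrame outer{&root, "coro_function_2"};
  AsyncFrame inner{&outer, "coro_function_1"};
  printAsyncStack(&inner);
}

Recovering the logical call chain therefore means chasing these pointers rather than walking the thread’s native stack.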

To illustrate this problem, consider the following more complicated code snippet of folly::coro code, during the execution of which we want to sample a stack trace:

void normal_function_1() {
  // ... expensive code - sample taken here.
}

void normal_function_2() {
  // ...
  normal_function_1();
  // ...
}

folly::coro::Task<void> coro_function_1() {
  // ...
  normal_function_2();
  // ...
  co_return;
}

folly::coro::Task<void> coro_function_2() {
  // ...
  co_await coro_function_1();
  // ...
}

void run_application() {
  // ...
  folly::coro::blockingWait(
      coro_function_2().scheduleOn(folly::getGlobalCPUExecutor()));
  // ...
}

int main() {
  run_application();
}

Currently, if the profiler captures a sample when code is executing within normal_function_1() then it might show a stack trace that looks something like:

- normal_function_1
- normal_function_2
- coro_function_1
- std::coroutine_handle::resume
- folly::coro::TaskWithExecutor::Awaiter::await_suspend::$lambda0::operator()
- folly::Function::operator()
- folly::CPUThreadPoolExecutor::run
- std::thread::_invoke
- __thread_start 

Notice how only coro_function_1 appears; coro_function_2 is missing. From coro_function_1 the trace moves into internal details of the framework and executor. We can’t see from this stack trace that coro_function_1() is called from coro_function_2(), or that coro_function_2() is in turn called from run_application() and main().

Further, if there are multiple call sites for coro_function_1(), a sampled profiling system will probably merge the samples from all of those call sites together, making it difficult to determine which call sites are expensive and where you should focus your efforts when looking for performance optimisations.

Ideally, both profiling tools and debuggers would be able to capture the logical stack trace instead. The result would show the relationship between coro_function_1 and its caller coro_function_2, and we’d end up with a stack trace for this sample that looks more like this:

- normal_function_1
- normal_function_2
- coro_function_1
- coro_function_2
- blockingWait
- run_application
- main 

folly support for async stack traces

folly now implements a set of tools to support async stack traces for coroutines. The library provides fundamental hooks that are used by internal code profiling libraries. Those same hooks provide access to stack traces for debugging purposes.

These are briefly summarised here; we will go into more detail in a later post.

Printing async stack traces when the program crashes

Probably the most frequent place where developers see stack traces is when programs crash. The folly library already provides a signal handler that prints the stack trace of the thread that is causing the program to crash. The signal handler now prints the async stack trace if there is a coroutine active on the current thread when the crash happens.
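As a minimal sketch, assuming a typical folly-based program (on most builds, initializing folly installs this fatal signal handler automatically), enabling the behaviour can be as simple as initializing folly at startup:

#include <folly/init/Init.h>

// A minimal sketch, assuming a typical folly-based program: folly::Init
// installs folly's fatal signal handler on most builds, so a crash on a
// thread with an active coroutine will include the async stack trace in
// the crash output.
int main(int argc, char** argv) {
  folly::Init init(&argc, &argv);
  // ... run the application ...
  return 0;
}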

Printing async stack traces on demand

During development, a frequent question developers have is: What is the series of function calls that led to this function being called? folly provides a convenience function to easily print the async stack trace on demand, helping developers quickly see how a function or coroutine is called:

#include <iostream>

#include <folly/experimental/symbolizer/Symbolizer.h>

folly::coro::Task<void> co_foo() {
  std::cerr << folly::symbolizer::getAsyncStackTraceStr() << std::endl;
  co_return;
}

GDB extension to print async stack traces

C++ developers often need to use debuggers like GDB to debug crashes after-the-fact or to investigate buggy behavior in running programs. We recently implemented a GDB extension to easily print the async stack trace for the current thread from within the debugger:

# Print the async stack trace for the current thread
(gdb) co_bt
0x... in crash() ()
0x... in co_funcC() [clone .resume] ()
0x... in co_funcB() [clone .resume] ()
0x... in co_funcA() [clone .resume] ()
0x... in main ()
0x... in folly::detached_task() () 

Tracking where exceptions are thrown

We want to be able to tell where an exception was constructed and thrown when it is later caught. To meet this need, folly provides an exception tracer library that hooks into the Itanium C++ ABI for exception handling to track exception information. We have recently expanded the set of helper functions this library provides to make it easy to track where exceptions are thrown, capturing both normal and async stack traces. Below is an example program that uses these helpers:

folly::coro::Task<void> co_funcA() {
  try {
    co_await co_funcB();
  } catch (const std::exception& ex) {
    // Prints where the exception was thrown
    std::cerr << "what(): " << ex.what() << ", "
              << folly::exception_tracer::getAsyncTrace(ex) << std::endl;
  }
}

Concluding

This new functionality brings coroutines much closer to the level of debugging and tracing support we see with normal stacks. Facebook developers already benefit from these changes, and they are open source in the folly library for anybody to use.

In subsequent posts in this series, we will go into more detail about how this is implemented. Next week, we will discuss the differences between synchronous and asynchronous stack traces and the technical challenges of implementing traces on top of C++20 coroutines. Stay tuned!


Resources for Completing App Store Data Practice Questionnaires for Apps That Include the Facebook or Audience Network SDK


Updated July 18: Developers and advertising partners may be required to share information on their app’s privacy practices in third party app stores, such as Google Play and the Apple App Store, including the functionality of SDKs provided by Meta. To help make it easier for you to complete these requirements, we have consolidated information that explains our data collection practices for the Facebook and Audience Network SDKs.

Facebook SDK

To provide functionality within the Facebook SDK, we may receive and process certain contact, location, identifier, and device information associated with Facebook users and their use of your application. The information we receive depends on which SDK features third-party applications use, and we have structured the document below according to these features.

App Ads, Facebook Analytics, & App Events

Facebook App Events allow you to measure the performance of your app using Facebook Analytics, measure conversions associated with Facebook ads, and build audiences to acquire new users as well as re-engage existing users. There are a number of different ways your app can use app events to keep track of when people take specific actions such as installing your app or completing a purchase.

With the Facebook SDK, there are app events that are automatically logged (app installs, app launches, and in-app purchases) and collected for Facebook Analytics unless you disable automatic event logging. Developers determine what events to send to Facebook from a list of standard events, or via a custom event.

When developers send Facebook custom events, these events could include data types outside of standard events. Developers control sending these events to Facebook either directly via application code or in Events Manager for codeless app events. Developers can review their code and Events Manager to determine which data types they are sending to Facebook. It’s the developer’s responsibility to ensure this is reflected in their application’s privacy policy.

Advanced Matching

Developers may also send us additional user contact information in code, or via the Events Manager. Advanced matching functionality may use the following data, if sent:

  • email address, name, phone number, physical address (city, state or province, zip or postal code and country), gender, and date of birth.

Facebook Login

There are two scenarios for applications that use Facebook Login via the Facebook SDK: Authenticated Sign Up or Sign In, and User Data Access via Permissions. For authentication, a unique, app-specific identifier tied to a user’s Facebook Account enables the user to sign in to your app. For Data Access, a user must explicitly grant your app permission to access data.

Note: Since Facebook Login is part of the Facebook SDK, we may collect other information referenced here when you use Facebook Login, depending on your settings.

Device Information

We may also receive and process the following information if your app is integrated with the Facebook SDK:

  • Device identifiers;
  • Device attributes, such as device model and screen dimensions, CPU core, storage size, SDK version, OS and app versions, and app package name; and
  • Networking information, such as the name of the mobile operator or ISP, language, time zone, and IP address.

Audience Network SDK

We may receive and process the following information when you use the Audience Network SDK to integrate Audience Network ads in your app:

  • Device identifiers;
  • Device attributes, such as device model and screen dimensions, operating system, mediation platform and SDK versions; and
  • Ad performance information, such as impressions, clicks, placement, and viewability.

First seen at developers.facebook.com


Enabling Faster Python Authoring With Wasabi


This article was written by Omer Dunay, Kun Jiang, Nachi Nagappan, Matt Bridges and Karim Nakad.


Motivation

At Meta, Python is one of the most used programming languages in terms of both lines of code and number of users. Every day, thousands of developers work with Python to launch new features, fix bugs, and develop the most sophisticated machine learning models. As such, it is important to ensure that our Python developers are productive and efficient by giving them state-of-the-art tools.

Introducing Wasabi

Today we introduce Wasabi, a Python language service that implements the Language Server Protocol (LSP) and is designed to help our developers use Python more easily and quickly. Wasabi assists our developers in writing Python code with a series of advanced features, including:

  • Lints and diagnostics: These are available as the user types.
  • Auto import quick fix: This is available for undefined-variable lint.
  • Global symbols autocomplete: When a user types a prefix, all symbols (e.g. function names, class names) that are defined in other files and start with that prefix will appear in the autocomplete suggestion automatically.
  • Organize Imports + Remove unused: A quick fix that removes all unused imports and reformats the import section according to PEP 8 rules. This feature is powered by other tools built inside Meta, such as libCST, which helps with safe code refactoring.
  • Python snippets: Snippet suggestions are available as the user types for common code patterns.

Additionally, Wasabi is a surface-agnostic service that can be deployed into multiple code repositories and various development environments (e.g., VSCode, Bento Notebook). Since its debut, Wasabi has been adopted by tens of thousands of Python users at Meta across Facebook, Instagram, Infrastructure teams and many more.

Figure 1: Example for global symbols autocomplete, one of Wasabi’s features

Language Services at Meta Scale

A major design requirement for language services is low latency and user responsiveness. Autocomplete suggestions, lints, and quick fixes should appear to the developer immediately as they type.

At Meta, code is organized in a monorepo, meaning that developers have access to all Python files as they develop. This approach has major advantages for the developer workflow, including better discoverability, transparency, easier sharing of libraries, and increased collaboration between teams. It also introduces unique challenges for building developer tools such as language services, which need to handle hundreds of thousands of files.


The scaling problem is one of the reasons we avoided the off-the-shelf language services available in the industry (e.g., pyright, jedi) for those operations. Most of those tools were built with relatively small to medium workspaces in mind, perhaps assuming thousands of files in a large project, for operations that require O(repo) information.

For example, consider the “auto import” quick fix for undefined variables. In order to suggest all available symbols, the language server needs to read all source files, parse them, and keep an in-memory cache of all parsed symbols so that it can respond to requests.

While this may be feasible in a single process on the development machine for small to medium repositories, the approach doesn’t scale to the monorepo use case. Reading and parsing hundreds of thousands of files can take many minutes, which means slow startup times and frustrated developers. Holding everything in an in-memory cache might help latency, but it also may not fit in a single machine’s memory.

For example, assume an average Python file takes roughly 10ms to parse and extract symbols with a standard error-recoverable parser. At that rate, 1,000 files take about 10 seconds to initialize, which is a fairly reasonable startup time. Running it on 1M files would take roughly 166 minutes, which is obviously far too long a startup time.

How Wasabi Works

Offline + Online Processing:

In order to support low latency in Meta-scale repositories, Wasabi is powered by two phases of parsing: background processing (offline) done by an external indexer, and local processing of locally changed “dirty” files (online):

  1. A background process indexes all committed source files and maintains the parsed symbols in a special database (Glean) that is designed for storing code symbol information.
  2. Wasabi, a local process running on the user’s machine, calculates the delta between the base revision, the stack of diffs, and the uncommitted changes the user currently has, and extracts symbols only from those “dirty” files. Since this set of “dirty” files is relatively small, the operation is very fast.
  3. Upon an LSP request such as auto import, Wasabi parses the abstract syntax tree (AST) of the file and then, based on the context of the cursor, creates a query for both Glean and the locally changed symbols, merges the results, and returns them to the user.

As a result, all Wasabi features are low latency and available to the user seamlessly as they type.

Note: Wasabi currently doesn’t handle the potential delta between the revision that Glean indexed (indexing happens once every few hours) and the local base revision that the user currently has. We plan on adding that in the future.

Figure 2: Wasabi’s high level architecture

Ranking the Results

In some cases, due to the scale of the repository, there may be many valid suggestions in the result set. For example, consider “auto import” suggestions for the “utils” symbol. There may be many modules across the repository that define a class named “utils”, so we invest in ranking the results to ensure that users see the most relevant suggestions at the top.

For example, auto import ranking is done by taking the following into account (a rough illustration follows the list below):

  • Locality:
    • The distance of the suggested module directory path from the directory paths of modules that are already imported in this file.
    • The distance of the suggested module directory path from the current directory path of the local file.
    • Whether the file has been locally changed (“dirty” files are ranked higher).
  • Usage: The number of occurrences the import statement was used by other files in the repository.
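As a rough illustration only, and not Wasabi’s actual implementation, a scoring function combining these signals might look something like the following sketch:

#include <cstddef>

// Illustrative only, not Wasabi's implementation: a candidate import is scored
// from the signals listed above. Smaller directory distances and higher usage
// counts rank a suggestion higher, and locally changed ("dirty") files get a boost.
struct ImportCandidate {
  std::size_t distanceToExistingImports;  // path distance to already-imported modules
  std::size_t distanceToCurrentFile;      // path distance to the file being edited
  bool locallyChanged;                    // file is "dirty" in the working copy
  std::size_t usageCount;                 // how often other files use this import
};

// Hypothetical weights, chosen only for illustration.
double score(const ImportCandidate& c) {
  double locality = 1.0 / (1.0 + c.distanceToExistingImports) +
                    1.0 / (1.0 + c.distanceToCurrentFile);
  double usage = static_cast<double>(c.usageCount);
  double dirtyBoost = c.locallyChanged ? 1.0 : 0.0;
  return 2.0 * locality + 0.1 * usage + dirtyBoost;
}

The weights here are hypothetical; the point is simply that locality, usage, and “dirtiness” can be folded into a single comparable score.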

To measure our success, we tracked the position in the suggestion list of each accepted suggestion and noted that in almost all cases the accepted suggestion was ranked among the top three.

Positive feedback from developers

After launching Wasabi in several pilot runs inside Meta, we received a great deal of positive feedback from our developers. Here is one example quote from a software engineer at Instagram:

“I’ve been using Wasabi for a couple months now, it’s been a boon to my productivity! Working in Instagram Server, especially on larger files, warnings from pyre are fairly slow. With Wasabi, they’re lightning fast 😃!”

“I use features like spelling errors and auto import several times an hour. This probably makes my development workflow 10% faster on average (rough guess, might be more, definitely not less), a pretty huge improvement!”

As noted above, Wasabi has made a meaningful difference in keeping our developers productive and making their development experience delightful.

The metric to measure authoring velocity

In order to quantitatively understand how much value Wasabi delivers to our Python developers, we considered a number of metrics to measure its impact. Ultimately, we landed on a metric we call ‘Authoring Velocity’ to measure how fast developers write code. In essence, Authoring Velocity is the inverse of the time spent on a specific diff (a collection of code changes) during the authoring stage. The authoring stage starts at the timestamp when a developer checks out from the source control repo and ends at the timestamp when the diff is created. We also normalize it against the number of lines of code changed in the diff, as a proxy for diff size, to offset any possible variance. The greater the value of ‘Authoring Velocity’, the faster we think developers write their code.
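As a rough formalization only (the exact formula is the one shown in Figure 3), that description corresponds to something like:

\[
\text{Authoring Velocity} \approx \frac{\text{lines of code changed in the diff}}{t_{\text{diff created}} - t_{\text{checkout}}}
\]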


Figure 3: Authoring Velocity Metric Formula

The result

With the metric defined, we ran an experiment to measure the difference Wasabi brings to our developers. Specifically, we selected ~700 developers who had never used Wasabi before and randomly split them into two independent groups at a 50:50 ratio. Developers in the test group were enabled with Wasabi when they wrote Python, whereas nothing changed for those in the control group. For both groups, we compared the relative metric values before and after the Wasabi enablement. We found that for developers in the test group, the median authoring velocity increased by 20% after they started using Wasabi. Meanwhile, we saw no significant change in the control group before and after, which is expected.

Figure 4: Authoring Velocity measurements for control and test groups, before and after Wasabi was rolled out to the test group.

Summary

With Python’s unprecedented growth, it is an exciting time to be working on making the language better and easier to use. Together with its advanced features, Wasabi has successfully improved developer productivity at Meta, allowing engineers to write Python faster and more easily, with a positive developer experience. We hope that our prototype and findings can benefit more people in the broader Python community.

To learn more about Meta Open Source, visit our open source site, subscribe to our YouTube channel, or follow us on Twitter, Facebook and LinkedIn.

First seen at developers.facebook.com
