“VICReg could be used to model the dependencies between a video clip and the frame that comes after, therefore learning to predict the future in a video.”
Adrien Bardes, Facebook AI Research
Humans have an innate capability to identify objects in the wild, even from a blurred glimpse of the thing. We do this efficiently by remembering only high-level features that get the job done (identification) and ignoring the details unless required. In the context of deep learning algorithms that do object detection, contrastive learning explored the premise of representation learning to obtain a large picture instead of doing the heavy lifting by devouring pixel-level details. But, contrastive learning has its own limitations.
According to Andrew Ng, pre-training methods can suffer from three common failings: generating an identical representation for different input examples (which leads to predicting the mean consistently in linear regression), generating dissimilar representations for examples that humans find similar (for instance, the same object viewed from two angles), and generating redundant parts of a representation (say, multiple vectors that represent two eyes in a photo of a face). The problems of representation learning, wrote Andrew Ng, boil down to variance, invariance, and covariance issues.
Andrew Ng’s observations are a reference to a new self-supervised algorithm released by the researchers at Facebook AI, PSL Research University, and New York University, along with Turing award recipient Yann Lecun introduced called Variance-Invariance-Covariance Regularization (VICReg), which builds on Lecun’s own Barlow Twins method.
The researchers designed VICReg (Variance-Invariance-Covariance Regularization) to avoid the collapse problem, which is handled more inefficiently in the case of contrastive methods. They do this by introducing a simple regularisation term on the variance of the embeddings along each dimension individually and combining the variance term with a decorrelation mechanism based on redundancy reduction and covariance regularisation. The authors state that VICReg is performed on par with several state-of-the-art methods.
VICReg is a simple approach to self-supervised image representation learning, and its objectives are as follows:
Learn invariance to different views with an invariance term.
Avoid collapse of the representations with a variance regularisation term.
Spread the information throughout the different dimensions of the representations with a covariance regularisation term.
The results show that VICReg performs on par with state-of-the-art methods and ushers a new paradigm of non-contrastive self-supervised learning.
What Authors Had To Say
Talking to Analytics India Magazine about VICReg’s significance, the lead author, Adrien Bardes, who is also a resident PhD student at Facebook AI Research, Paris, said that self-supervised representation learning is a learning paradigm that aims to learn meaningful representations of some unlabelled data. Recent approaches rely on Siamese networks and maximise the similarity between two augmented views of the same input. A trivial solution is for the network to output constant vectors, known as the collapse problem. VICReg is a new algorithm based on siamese networks but aims to prevent a collapse by regularising the variance and covariance of the network outputs. It achieves state-of-the-art results in several computer vision benchmarks while being a straightforward and interpretable approach.
When asked about how VICReg addresses shortcomings of contrastive learning methods, Bardes explained that contrastive learning methods are based on a simple principle. They make the inputs that should encode similar information close to each other in the embedding space and prevent a collapse by pushing apart the inputs that should encode dissimilar information. This process requires the mining of a massive amount of negative pairs, pairs of distinct inputs. Recent contrastive approaches for self-supervised learning have different strategies for mining these negative pairs; they can sample them from a memory bank, as in MoCo, or sample them from the current batch, as in SimCLR, which in both cases is costly in time or memory. VICReg, on the other hand, does not require these negative pairs; it implicitly prevents a collapse by enforcing the representations to be different from each other without making any direct comparison between different examples. It, therefore, does not require the memory bank of MoCo and works with much smaller batch sizes than SimCLR.
For Bardes, self-supervised learning is probably the most exciting topic in machine learning research. Annotating data is a very expansive process performed by humans who have biases and can make mistakes. It is, therefore, impossible to annotate the vast amount of data available today, for example, medical or astronomical data and images and videos on the Internet. Training models that leverage all these data can only be done using self-supervised learning. This is one of the motivations behind the development of VICReg.
Bardes believes that VICReg is applicable in any scenario where one wants to model the relationships within a data set. It can be used with any kind of data, images, videos, text, audio, or proteins. For example, you could use it to model the dependencies between a video clip and the frame after, therefore learning to predict the future in a video. Another example would be to understand the relationship between the graph of a molecule and its image seen from a microscope.
“We are at the early stages of the development of self-supervised learning. Shifting from contrastive methods to non-contrastive methods is the first step towards more practical algorithms. Current approaches rely on hand-craft data augmentations that can be viewed as a kind of supervision. The next step will probably be to get rid of these augmentations. Another promising direction consists in handling the uncertainties in modelling the data. Current methods are mostly deterministic and always model the same relation between two inputs. For example, if we go back to the frame prediction example, current methods would only model the possible future for a video clip. Future approaches will probably use latent variables that model the space of possible predictions,” concluded Bardes.
Join Our Discord Server. Be part of an engaging online community. Join Here.
Subscribe to our Newsletter
Get the latest updates and relevant offers by sharing your email.
The ways customers want to connect are changing. The WhatsApp Business Platform gives businesses an integrated way to communicate with customers right where they are. In order to integrate properly when using the Cloud API, hosted by Meta, you’ll need to leverage webhooks so applications have a way to respond to events. Webhooks allow your application to monitor three primary events on WhatsApp so you can react with different functionality depending on your goals.
This article looks at these three components, goes through the information they carry, and provides some use-case scenarios to give you an idea of the possibilities.
Interpreting Different Webhook Components
To send and receive messages on WhatsApp, it’s critical to keep track of statuses and errors to help ensure you’re communicating effectively with your customers, which you can do with webhooks.
With webhooks, the WhatsApp Business Platform monitors events and sends notifications when one occurs. These events are one of three components: messages, statuses, and errors.
Let’s explore each of these and examine examples of how you can use them.
The messages component is the largest of the three event types and contains two core objects:
Contacts — which contain information about the message’s sender.
Messages — which provide information about a message’s type and contents.
These two event types allow your application to manage and respond to people that interact with your application. The contacts object contains two pieces of information: name and WhatsApp Id. The contact’s name allows your application to use their name without further lookups. In contrast, the contact’s WhatsApp ID lets you keep track of these contacts or use the contacts/ endpoint to add additional functionality.
For instance, you can verify the customer and start the opt-in process within the customer-initiated conversation, which allows you to message them outside the initial 24-hour response window. It’s important to note that only the text, contacts, and location message types provide contact information.
The message object is where the bulk of the information is stored, including the message contents, type of message, and other relevant information. Depending on the message type, the actual payload of the message component can vary widely. It’s crucial to determine the message type to understand the potential payload. Message types include:
Text: a standard text-only message
Contact: contains a user’s full contact details
Location: address, latitude, and longitude
Unknown: unsupported messages from users, which usually contain errors.
Ephemeral: disappearing messages
Media message types: contain information for the specified media file. These types include:
These different data types can have very different uses, from reviewing images and screenshots from concerned customers to collecting information about where to ship goods and send services. To use these different data types most effectively, you can create applications to handle different forms of communication, with functionalities such as:
Ask your customers to provide a shipping or mailing address. You can use the location-based message feature to capture your users’ location to determine where to send their goods and services.
Show customers products and communicate product details through a message. You can use the referred_product field within messages to offer your users specific product details. Using this field develops a more personal, conversational shopping experience and customer interactions.
Build support functionality that allows customers to take and send images and videos of product concerns, and submit those for a support case. Once the user has submitted a support ticket, the app can track the case — including steps taken towards resolution and conversations between support teams and the customer through WhatsApp — using a unique case identifier.
These are just some potential features you can build using the interactivity provided by webhooks and the message object. These features extend your current communication channels and provide additional options for customers.
Where the messages component provides your application with insight into events that originate directly from your customers, the statuses component keeps track of the results of messages you send and the conversation history. There are six status components:
Sent: the application sent your message and is in transit.
Delivered: the user’s device successfully received the message.
Read: the user has read your message.
Deleted: a user deleted a message that you sent.
Warning: a message sent by your application contains an item that isn’t available or doesn’t exist.
Failed: a message sent by your application failed to arrive.
Status components also contain information on the recipient ID, the conversation, and the pricing related to the current conversation. Conversations on WhatsApp are a grouping of messages within a 24-hour window that are either user-initiated or business-initiated. Keeping track of these conversations is vital, as a new conversation occurs when you send additional responses after the 24-hour period ends.
Some functionality you may want to add to your application based on status events includes:
Ensuring your application has sent generated messages, they arrived, and the recipient potentially read them by using a combination of these status types and timestamps within the status object. This information allows your application to follow up with customers if they didn’t engage.
Keep analytical information about your application’s messages, especially regarding business-initiated conversations. For example, if your application uses a WhatsApp customer contact list to send offer messages, the status component helps you understand how many were sent, delivered, read, responded to, or failed to measure your campaign’s success.
Finally, the errors component allows your application to receive any out-of-band errors within WhatsApp that affect your platform. These errors don’t stop your application from compiling or working but are typically caused when your application is misusing specific functionality. The following are some typical errors.
Error Code 368, Temporarily Blocked for Policy Violations
If your workflows unintentionally generate duplicate messages, you can monitor this to find the source.
Error 131043, Message Expired
Sometimes, messages are not sent during their time to live (TTL) duration. Use this code to know which messages to schedule for resending if needed.
Error handling is a broad, complex subject, and there are many other use cases for which you should be implementing error handling. The errors component helps extend your error handling on the WhatsApp Business Platform for greater consistency.
This article took a high-level look at messages, statuses, and errors returned by webhooks and explored ways you can use these three components to expand your application’s functionality.
Messages provide information on customer interactions, statuses give insight into messages your app sends, and error notices enable you to increase your application’s resilience. Webhooks are critical to ensuring your app interacts with customers seamlessly.
The WhatsApp Business Platform’s webhooks provide your applications with real-time data, enabling you to build better experiences as you interact with customers. Ready to know more? Dive deeper into everything the WhatsApp Business Platform has to offer.
If you’ve used emulators for older platforms, you probably experienced a level of precise control over software execution that we lack on contemporary platforms. For example, if you play 8-bit video games emulated on your modern console, you are able to suspend and rewind gameplay, and when you resume, that incoming creature or projectile will predictably appear in the same spot because any randomness plays out deterministically within the emulator.
Yet, as a software engineer, when your multithreaded service crashes under transient conditions, or when your test is flaky, you don’t have these luxuries. Everything the processor and operating system contributes to your program’s execution—thread scheduling, random numbers, virtual memory addresses, unique identifiers—constitutes an unrepeatable, unique set of implicit inputs. Standard test-driven methodologies control for explicit program inputs, but they don’t attempt to control these implicit ones.
Since 2020, our team within DevInfra has worked to tackle this hard problem at its root: the pervasive nondeterminism in the interface between applications and the operating system. We’ve built the first practical deterministic operating system called Hermit (see endnote on prior art). Hermit is not a new kernel—instead it’s an emulation layer on top of the Linux kernel. In the same way that Wine translates Windows system calls to POSIX ones, Hermit intercepts system calls and translates them from the Deterministic Linux abstraction to the underlying vanilla Linux OS.
Now we explore some of the applications Hermit can be used for, and the role [non]determinism plays. In the next section, we go deeper into how Hermit works.
First, flaky tests. They’re a problem for every company. Google, Apple, Microsoft and Meta have all published their experiences with flaky tests at scale. Fundamentally, the cause of flakiness is that test functions don’t really have the signatures that appear in the source code. An engineer might think they’re testing a function from a certain input type to output type, for example:
test : Fn(Input) -> Output;
Indeed, unless we’re doing property-based testing, then for unit tests it’s even simpler. (The input is empty, and the output is boolean.) Unfortunately, in reality, most tests may be affected by system conditions and even external network interactions, so test functions have a true signature more like the following:
test : Fn(Input, ThreadSchedule, RNG, NetworkResponses) -> Output;
The problem is that most of these parameters are outside of engineers’ control. The test harness and test code, running on a host machine, is at the mercy of the operating system and any external services.
Caption: Irreproducible, implicit inputs from the operating system can affect test outcomes.
That’s where Hermit comes in. Hermit’s job is to create a containerized software environment where every one of the implicit inputs (pictured above) is a repeatable function of the container state or the container configuration, including command line flags. For example, when the application requests the time, we provide a deterministic time that is a function of program progress only. When an application thread blocks on I/O, it resumes at a deterministic point relative to other threads.
Hermit’s guarantee is that any program run by Hermit (without external networking) runs an identical execution—irrespective of the time and place it is run—yielding an identical stream of instructions and complete memory states at the time of each instruction. This means if you run your network-free regression test under Hermit, it is guaranteed not to be flaky:
hermit run ./testprog
Further, Hermit allows us to not merely explore a single repeatable execution of a program, but to systematically navigate the landscape of possible executions. Let’s look at how to control one specific feature: pseudo-random number generation (PRNG). Of course, for determinism, when the application requests random bytes from the operating system, we provide repeatable pseudo-random ones. To run a program with different PRNG seeds, we simply use a different --rng-seed parameter:
hermit run --rng-seed=1 prog hermit run --rng-seed=2 prog
In this case, it doesn’t matter what language prog is written in, or what random number generator library it uses, it must ultimately ask the operating system for entropy, at which point it will receive repeatable pseudo-random inputs.
It is the same for thread scheduling: Hermit takes a command line seed to control thread interleaving. Hermit is unique in being able to reproducibly generate schedules for full Linux programs, not merely record ones that happen in nature. It generates schedules via established randomized strategies, designed to exercise concurrency bugs. Alternatively, full thread schedules can be passed explicitly in as input files, which can be derived by capturing and modifying a schedule from a previous run. We’ll return to this in the next section.
There is an upshot to making all these implicit influences explicit. Engineers dealing with flaky programs can steer execution as they desire, to achieve the following goals:
Regression testing: Use settings that keep the test passing.
Stress testing: Randomize settings to find bugs more effectively.
Diagnosis: Vary inputs systematically to find which implicit inputs cause flakiness.
Pinpointing a general class of concurrency bugs
As mentioned above, we can vary inputs to find what triggers a failure, and we can control schedules explicitly. Hermit builds on these basic capabilities to analyze a concurrency bug and pinpoint the root cause. First, Hermit searches through random schedules, similar to rr chaos mode. Once Hermit finds failing and passing schedules, it can analyze the schedules further to identify pairs of critical events that run in parallel and where flipping the order of those events causes the program to fail. These bugs are sometimes called ordering violations or race conditions (including but not limited to data races).
Many engineers use race detectors such as ThreadSanitizer or go run -race. Normally, however, a race detector requires compile-time support, is language-specific and works only to detect data races, specifically data races on memory (not files, pipes, etc.). What if, instead, we have a race between a client program written in Python, connecting to a server written in C++, where the client connects before the server has bound the socket? This nondeterministic failure is an instance of the “Async Await” flakiness category, as defined by an empirical analysis and classification of flaky tests.
By building on a deterministic operating system abstraction, we can explicitly vary the schedule to empirically find those critical events and print their stack traces. We start by using randomized scheduling approaches to generate a cloud of samples within the space of possible schedules:
Caption: A visualization of the (exponential) space of possible thread schedules, generated by different Hermit seeds. With the space organized by edit distance, the closest red and green dots correspond to the minimum edit distance observed between a passing and failing schedule.
We can organize this space by treating the thread schedules literally as strings, representing sequential scheduling histories. For example, with two threads A & B, “AABBA” could be an event history. The distance between points is the edit distance between strings (actually, a weighted edit distance preferring swaps over insertion or deletion). We can then take the closest pair of passing and failing schedules and then study it further. In particular, we can bisect between those schedules, following the minimum edit distance path between them, as visualized below.
Caption: A binary search between passing and failing schedules, probing points in between until it finds adjacent schedules that differ by a single transposition, a Damerau-Levenshtein distance of one.
At this point, we’ve reduced the bug to adjacent events in the thread schedule, where flipping their order makes the difference between passing and failing. We report the stack traces of these events as a cause of flakiness. (Indeed, it is only a single cause because there may be others if flakiness is overdetermined.)
Challenges and how it works
Here, we’ll cover a bit about how Hermit works, emphasizing pieces that are different from our prototype from ASPLOS ’20. The basic scenario is the same, which is that we set out to build a user space determinization layer, not allowing ourselves the liberty of modifying the Linux kernel or using any privileged instructions.
Challenge 1: Interposing between operating system and application
Unfortunately, on Linux there is not a standard, efficient, complete method to interpose between user-space applications and the operating system. So we’ve built a completely new program sandboxing framework in Rust, called Reverie, that abstracts away the backend—how program sandboxing is implemented. Reverie provides a high-level Rust API to register handlers: callbacks on the desired events. The user writes a Reverie tool that observes guest events and maintains its own state.
Reverie is not just for observing events. When you write a Reverie handler, you are writing a snippet of operating system code. You intercept a syscall, and you become the operating system, updating the guest and tool state as you like, injecting zero or more system calls to the underlying Linux operating system and finally returning control to the guest. These handlers are async, and they run on the popular Rust tokio framework, interleaving with each other as multiple guest threads block and continue.
The reference Reverie backend uses the ptrace system call for guest event interception, and a more advanced backend uses in-memory program instrumentation. In fact, Reverie is the only program instrumentation library that abstracts away whether instrumentation code is running in a central place (its own process), or inside the guest processes themselves via injected code.
Challenge 2: Inter-thread synchronization
Consider communication through sockets and pipes within the reproducible container. This is an area where our earlier prototype mainly used the strategy of converting blocking operations to non-blocking ones, and then polling them at deterministic points in time within a sequential execution. Because we run in user space, we don’t have a direct way to ask the kernel whether a blocking operation has completed, so attempting a non-blocking version of the syscall serves as our polling mechanism.
Hermit builds on this strategy and includes a sophisticated scheduler with thread priorities and multiple randomization points for exploring “chaos mode” paths through the code. This same scheduler implements a back-off strategy for polling operations to make it more efficient.
Hermit also goes beyond polling, implementing some inter-thread communication entirely inside Hermit. By including features like a built-in futex implementation, Hermit takes a small step closer to behaving like an operating system kernel. But Hermit is still vastly simpler than Linux and passes most of the heavy lifting on to Linux itself.
For specific features that Hermit implements directly, it never passes those system calls through to Linux. In the case of futexes, for example, it is hard or impossible to come up with a series of raw futex syscalls to issue to the kernel and achieve a deterministic outcome. Subtleties include spurious wake-ups (without the futex value changing), the nondeterministic selection of threads to wake, and the ineradicable moment in time after Hermit issues a blocking syscall to Linux, but before we know for sure if Linux has acted on it.
These issues are avoided entirely by intercepting each futex call and updating the Hermit scheduler’s own state precisely and deterministically. The underlying Linux scheduler still runs everything, physically, but Hermit’s own scheduler takes precedence, deciding which thread to unblock next.
Challenge 3: Large and complex binaries
Meta has no shortage of large and challenging binaries that use recent features of the Linux kernel. Nevertheless, after a couple of years of development, Hermit runs thousands of different programs deterministically today. This includes more than 90 percent of test binaries we run it on.
While flaky tests and concurrency bugs are where we’ve invested most heavily, there are many other applications which we will briefly outline below:
Smart contracts: which are normally written in specialized languages that are constrained to be deterministic. Reproducible containers could make it possible to run traditional software “on chain” with the same assurance of distributed replicability.
There’s more than we can explore on our own! That is why we’re excited to open up these possibilities for the community at large.
Repeatable execution of software is an important capability for debugging, auditability and the other applications described above. Nevertheless, it’s been treated as an afterthought in the technology stacks we all depend on—left as the developer’s responsibility to create “good enough” repeatability for stable tests or a reproducible bug report.
Hermit, as a reproducible container, provides a glimpse of what it would be like if the system stack provided repeatability as an abstraction: a guarantee the developer could rely upon, just like memory isolation or type safety. We’ve only scratched the surface of what is possible with this foundational technology. We hope you’ll check out the open source GitHub repo and help us apply and evolve Hermit.
Note on prior Art: Earlier academic research on Determinator and dOS represents exploratory work in this area over a decade ago. But these systems were, respectively, a small educational OS and an experimental fork of Linux. Neither was designed as a maintainable mechanism for people to run real software deterministically in practice.
Interactions are a crucial aspect of making immersive VR experiences. These interactions must be natural and intuitive to improve retention and enhance users’ overall experience. To make it easy for developers to build immersive experiences that create a realistic sense of presence, we released Presence Platform, which provides developers with a variety of capabilities, including Passthrough, Spatial Anchors, Scene understanding, Interaction, Voice and more.
Interaction SDK lets developers create high-quality hand and controller interactions by providing modular and flexible components to implement a large range of interactions. In this blog, we’ll discuss the capabilities, samples and other resources available to you to help you get started with Presence Platform’s Interaction SDK. It’s important to us at Meta that the Metaverse continues to be built in the open, so we created an open source sample featuring the Interaction SDK to inspire you to innovate in VR along with us.
Interaction SDK is a library of components for adding controllers and hands interactions to your experiences. Interaction SDK enables developers to incorporate best practices for user interactions and includes interaction models for Ray, Poke and Grab as well as hand-centric interaction models for HandGrab, HandGrabUse and Pose Detection. If you’re looking to learn more about what these models and interactions entail, see the “available interactions” section below.
To get started with Interaction SDK, you’ll need to have a supported Unity version and the Oculus Integration package installed. To learn more about the prerequisites, supported devices, Unity versions, package layout and dependencies, check out our documentation on the Meta Quest for Developers website.
We strive to make your experience of using Interaction SDK as smooth as possible. New features are regularly added, along with bug fixes and improvements. To learn more about latest versions and keep up to date with the latest features and fixes, check out our documentation on upgrade notes where you can find the latest information on new features, deprecations and improvements.
How does Interaction SDK work?
All interaction models in the Interaction SDK are defined by an Interactor–Interactable pair of components. For example, a ray interaction is defined by the RayInteractor-RayInteractable pair of components. An Interactor is the component that acts on (hovers over, selects) Interactables. An Interactable is the component that gets acted on (can be hovered or selected) by Interactors. Together, the Interactor and Interactable components make up interactions.
To learn more about the Interactor-Interaction lifecycle, check out the documentation on Interactables. You can also choose to coordinate multiple Interactors based on state by using InteractorGroups. Our documentation on InteractorGroups goes over what an InteractorGroup is, how to use them more broadly and how to coordinate multiple Interactors to create InteractorGroupMulti.
If you’re looking to make one interaction dependent on another ongoing interaction, you can link interactions with each other in a primary-secondary relationship. For example, if you’d first like to grab an object and then use it, the grab interaction would be the primary one, and the use interaction would be secondary to the grab. Check out our documentation to learn more about Secondary Interactions and how to use them.
What are the various Interactions available?
Interaction SDK allows developers to implement a range of robust and standardized interactions such as grab, poke, raycast and many more. Let’s dive into some of these interactions and how they work:
Hand Grab Interactions provide a means of grabbing objects and snapping them to pre-authored hand grab poses. It uses the HandGrabInteractor and the HandGrabInteractable for this interaction.
Hand Grab interactions use per-finger information to inform when a selection should begin or end by using the HandGrabAPI, which indicates when the grabbing starts and stops, as well as the strength of the grabbing pose. While the HandGrabInteractor searches for the best interactable candidate and provides the necessary snapping information needed, the HandGrabInteractable indicates whether an object can be grabbed, how it moves, which fingers perform the grab, handedness information, finger constraints, how the hand should align and more. Check out the documentation on Hand Grab Interaction, where we discuss how this interaction works and how to customize the interactable movements and alignments.
Poke Interactions allow users to interact with surfaces via direct touch, such as pressing or hovering over buttons and interacting with curved and flat UI.
PokeInteractables can also be combined with a PointableCanvas to enable direct touch Unity UI. Read about how to integrate Interaction SDK with Unity Canvas on our documentation about Unity Canvas Integration.
Hand Pose Detection provides a way to be able to recognize when tracked hands match shapes and wrist orientations.
The SDK provides 6 example poses as prefabs that show how pose detection works:
Based on the patterns defined in these poses, you can define your own custom poses. To learn more about what the Hand pose prefabs contain, check out our documentation on Pose Prefabs.
Interaction SDK combines the use of Sequences and active state to allow you to create gestures. Active states can be chained together using Sequences to create gestures–the swipe gesture, for example.
When a specified criteria such as shape, transform feature, velocity and others are met, a state can become active. For example, one or more ShapeRecognizerActiveState components can become active when all of the states of the finger features match those in the listed shapes, that is, the criteria of a specified shape is met.
Since Sequences can recognize a series of active states over time, they can be used to create complex gestures. These gestures can be used to interact with objects by hovering over them or even different interactions based on which direction the hand swipes in. To learn more about how Sequences work, check out our documentation.
The Distance Grab interaction allows a user to select and move objects that are usually beyond their hand reach. For example, this can mean attracting an object towards the hand and then grabbing it. Distance Grab supports other movements as well.
Touch Hand Grab enables using hands to grab objects based on their collider configuration and dynamically conforming fingers to their surface, so you can grab the interactables by simply touching the surface between your fingers, without fully needing to grab the object.
It uses the TouchHandGrabInteractor and the TouchHandGrabInteractable for this interaction. The TouchHandGrabInteractor defines the touch interaction properties, while the TouchHandGrabInteractable defines the colliders that are used by the interactor to test overlap against during selection. To learn more about how these work, check out the documentation about Touch Hand Grab interaction.
Transformers let you define different ways to manipulate Grabbable objects in 3D space, including how to translate, rotate or scale things with optional constraints.
For example, this interaction could use the Grab, Transformers and Constraints on objects to achieve interactions such as moving an object on a plane, using two hands to change its scale, rotating an object and much more.
Ray interactions allow users to select objects via raycast and pinch. The user can then choose to interact using pose and can pinch for selection.
It uses RayInteractor and RayInteractable for this interaction. The RayInteractor defines the origin of raycasts, their direction and a max distance for the interaction, whereas the RayInteractable defines the surface of the object being raycasted against. When using hands, theHandPointerPose component can be used for specifying the origin and direction of the RayInteractor. Check out our documentation on Ray Interaction, where we discuss how RayInteractor and RayInteractable work as well as how Ray Interaction works with hands.
If you’d like to read more about the features mentioned above, check out our documentation on Interactions. There you’ll find information on Interactors, Interactables, Debugging interactions, Grabbables, the different types of interactions available and more.
Trying out the Interaction SDK Samples
To make it easy for developers to try out these interactions, our team has created the Interaction SDK Samples app that showcases each of the features that we discussed earlier. The Samples App can be found on AppLab. It goes over each feature as a separate example to help understand the interactions better.
Before you try the samples on your headset, make sure that you have hand tracking turned on for your headset. To do that, go to Settings on the headset, click on Movement Tracking and turn on Hand Tracking. You can leave the Auto Switch Between Hands And Controllers selected so that you can use hands if you put controllers down.
You will begin the sample in the welcome scene, which shows you all the sample interactions available for you to try. This sample contains examples for interactions, such as Hand Grab, Touch Grab, Distance Grab, Transformers, Hand Grab Use, Poke, Ray, Poses and Gesture interactions.
Let’s take a look at each of these samples:
The Hand Grab example: This example showcases the HandGrabInteractor. The Virtual Object demonstrates a simple pinch selection with no hand grab pose. The key demonstrates pinch-based grab with a single hand grab pose. The torch demonstrates curl-based grab with a single hand grab pose. The cup demonstrates pinch- and palm-capable grab interactables with associated hand grab poses.
The Touch Grab example: This example shows how you can grab an object by just touching it and not actually grabbing it. Touch Grab uses the physics shape of the object to create poses for the hand dynamically. In this example, you will find an example of kinematic as well as non-kinematic pieces to try this interaction with. Kinematic demonstrates the interaction working with non-physics objects, whereas Physical demonstrates the interaction of a non-kinematic object with gravity as well as the interplay between grabbing and non-grabbing with physical collisions.
The Distance Grab example: This example showcases multiple ways for signaling, attracting and grabbing distance objects. For example, Anchor At Hand anchors the item at the hand without attracting it, so that you can move it without actually grabbing it. Interactable To Hand shows an interaction where the item moves toward the hand in a rapid motion and then stays attached to it. And finally, Hand To Interactable shows an interaction where you can move the object as if the hand was there.
The Transformers example: This example showcases the GrabInteractor and HandGrabInteractable with the addition of Physics, Transformers and Constraints on objects. The map showcases translate-only grabbable objects with planar constraints. The stone gems demonstrate physics objects that can be picked up, thrown, transformed and scaled with two hands. The box demonstrates a one-handed rotational transformation with constraints. The figurine demonstrates hide-on-grab functionality for hands.
The Hand Grab Use example: This example demonstrates how a Use interaction can be performed on top of a HandGrab interaction. For this example, you can grab the water spray bottle and hold it with a finger on the trigger. You can then aim at the plant and press the bottle trigger to moisturize the leaves–it feels intuitive and natural.
The Poke example: This example showcases the PokeInteractor on various surfaces with touch limiting. It demonstrates poke interactions and pressable buttons with standalone buttons or with Unity canvas. Multiple visual affordances are demonstrated, including various button depths as well as Hover On Touch and Hover Above variants for button hover states. These affordances also include Big Button with multiple poke interactors like the side of the hand or palm, and Unity canvases that showcase Scrolling and pressing buttons.
The Ray example: This example showcases the RayInteractor interacting with a curved Unity canvas using the CanvasCylinder component. It demonstrates ray interactions with a Unity canvas and hand interactions that use the system pointer pose and pinch for selection. Ray interactions use the controller pose for ray direction and trigger for selection. It also showcases curved surfaces that allow for various material types: Alpha Cutout, Alpha Blended and Underlay.
The Poses example: This example showcases six different hand poses, with visual signaling of pose recognition. It has detection for Thumbs Up, Thumbs Down, Rock, Paper, Scissors and Stop. Poses can be triggered by either the left or right hand. It also triggers a particle effect when a pose starts and then hides the particle effect when a pose ends. In the past, hand poses needed to be manually authored for every item in a game, and it could take several iterations before the pose felt natural. Interaction SDK provides a Hand Pose Authoring Tool, which lets you launch the editor, reach out and hold an item in a way that feels natural, record the pose and use the pose immediately.
The Gestures example: This example demonstrates the use of the Sequence component combined with ActiveState logic to create simple swipe gestures. For example, the stone shows a random color change triggered by either hand. The picture frame cycles through pictures in a carousel view, with directional change depending on which hand is swiping. Also note that the gestures are only active when hovering over objects.
To learn more about how these samples work, and the various reference materials available to you to help you get started with these, check out our documentation on Example Scenes and Feature Scenes.
These examples showcase just a small subset of all the interactions that could be possible by using Interaction SDK. You can play around and build your own versions of these interactions by downloading the SDK in Unity. We’ll discuss how to do that later in the blog.
Trying out First Hand
Our team has also worked on a showcase experience called First Hand, which aims to demonstrate the capabilities and variety of interaction models possible with Hands. This is an official demo for Hand Tracking built with Interaction SDK. First Hand can be found on AppLab.
The experience starts with you in a clocktower. There’s a table in front of you and various objects that you can interact with. You can see your hands being tracked, and you will notice a light blinking on your left giving you an indication that it can be interacted with. It tells you that a delivery package is waiting for you to accept. You can grab the lift control and push the accept button. This demonstrates the HandGrab and Poke interaction.
The package is delivered, and it prompts you to pull the handle to provide power. Pulling the handle provides power, and it opens up. This demonstrates the HandGrab interaction with a one-handed rotate transform for rotating the lever. Next, you will need to type in the code written on the number pad to allow the box to unlock. This uses the Poke interaction.
A wheel is presented in front of you. You can use your hands to HandGrab and Turn. This will open up the box, and you will be presented with an interactable UI, which you can interact with to create your own robotic gloves.
You can use your hands to scroll through the UI and poke to select which section of your hand you want to create first. This demonstrates the Poke interaction. Select the first section, and it will present the first section of the glove in front of you. You are presented with three color options. Swipe to choose the color you’d like your glove section to be. This demonstrates Swipe gesture detection feature. You can pick the piece up, scale it, rotate it or move it around. This uses the Touch Grab interaction and the Two Grab Free Transformer. Once you’ve selected a color, you can push the button to build it for you. You can move on to the next section and create the remaining parts of your glove in a similar manner.
Once your glove is ready, it presents you with three rocks that you can choose from to add to your glove. You can choose one of them by grabbing a rock, and then you’re able to crush the rock in your hand by squeezing it hard to reveal the crystal–a Use Grab interaction. Once the crystal is ready, you can place it on your glove to activate the crystal. You now have super power gloves!
An object starts flying around you and tries to attack you with lasers! You can use your super powers to shoot into the targets and to save yourself from getting hit. To shoot, open your palms and aim. To create a shield, fold your fingers and bring your hands together. This shows how Pose Detection can be used for even two-handed poses! Based on the right pose, it performs the appropriate actions.
Finally, you will be presented with a blue button on your glove, which you can Poke to select. It then teleports you to a world where you can see the clock tower that you were in, along with several potential objects you could interact with, leaving the players with the imagination of what we could unlock with these interactions at our disposal.
Building great Hands-driven experiences requires optimizing across multiple constraints—first, the technical constraint like tracking capabilities, second, the physiological constraint, like comfort and fatigue, and finally, the perceptive constraint, like hand-virtual object interaction representation. This demo shows how the Interaction SDK unlocks several ways that can make your VR experiences more intuitive and immersive, without you needing to set up everything from scratch to get it all to work. It showcases some of the Hands interactions that we’ve found to be the most magical, robust and easy to learn but that are also applicable to many categories of content.
To make it really easy for you to follow along and use these interactions in your own VR experiences, we’ve created an open source project that demonstrates the use of Interaction SDK to create the various interactions that you saw in the First Hand showcase app. In the next section, you will learn about the Interaction SDK package in Unity and how to set up a local copy of the First Hand sample.
Setting up a local copy of the First Hand sample
To have a local copy of the sample, you will need to have Unity installed, a PC or Mac and the Oculus Integration package installed. We will be using Unity 2021.3.6f1 for setup in this blog, but you can use any of the supported Unity versions. You can find more information on supported devices, Unity versions and other prerequisites on our documentation for Interaction SDK. Interaction SDK can be downloaded from the Unity Asset Store or from our Developers Page.
You should also make sure that you have turned on Development Mode on your headset. You can do that from your headset as well as from the Meta Quest Mobile App. To do it from the headset, go to Settings → System → Developer, and then turn ON the USB Connection Dialog option. Alternatively, if using the Meta Quest Mobile App, you can do it by going to Menu → Devices. Select the headset from the list, and then turn ON the Developer Mode option.
Once you have Unity installed, open a new 3D scene. To install Interaction SDK, go to Window → Asset Store and look for the Oculus Integration package, and then click on “Add to my Assets.” To install the package, open Package Manager, click on Import and then Import the files into your project.
If it detects that a newer version of OVRPlugin is available, it is recommended to use the newest version. Go ahead and enable it.
If asked to use OpenXR for OVRPlugin, click on Use OpenXR. New features, such as the Passthrough API, are only supported through the OpenXR backend. You can switch between legacy and OpenXR backends at any time from Oculus → Tools → OVR Utilities Plugin.
It will confirm the selection, and it may provide an option to upgrade the Interaction SDK and perform a cleanup to remove the obsolete and deprecated assets. Allow it to do so. If you choose to do this at a later stage, you can do it anytime by going to Oculus → Interaction → Remove Deprecated Assets.
Once installed, Interaction SDK components can be found under the Interaction folder under Oculus.
Since you will be building for Quest, make sure to update your build platform to Android. To do that, go to File → Build Settings → Select Android and choose “Switch Platform.”
What does the Interaction SDK package contain?
Interaction SDK components can be found under the Interaction folder under Oculus. The SDK contains three folders–Editor, Runtime and Samples. The Editor contains all the editor code for the SDK. The Runtime contains the core runtime components of Interaction SDK, and the Samples contain the scenes and assets used in the Interaction SDK samples demo that we discussed earlier in the blog.
Under Samples, click on Scenes. Here you will find the Example Scenes, Feature Scenes and Tools. The Example Scenes directory includes all the scenes that you saw earlier in the Interaction SDK samples app from AppLab. The Feature directory includes scenes that are dedicated to the basics of any single feature, and the Tools directory includes helpers like a Hand Pose Authoring Tool. Let’s open one of these samples, the HandGrabExamples, and check out the scene setup, scripts and other components–as well as how these are used to create the interactions you saw in the samples. When opening any of the sample scenes, it might ask you to import TMP essentials. Go ahead and do that, and you will have your scene open.
Now, let’s look at some game objects and scripts to understand how they work. The OVRCameraRig prefab provides the transform object to represent the Oculus tracking space. It contains a TrackingSpace game object and the OVRInteraction game object under it. The TrackingSpace prefab allows users to fine-tune the relationship between the head tracking reference frame and your world. Under TrackingSpace, you will find a center-eye anchor (which is the main Unity camera), two anchor game objects for each eye, and left- and right-hand anchors for controllers.
The OVRInteraction prefab provides the base for attaching sources for Hands, Controllers or Hmd components that source data from OVRPlugin via OVRCameraRig. It contains the OVRControllerHands with left- and right-controller hands under it, along with the OVRHands with left and right hands under it.
To learn more about the OVRIntegration prefab and how it works with OVRHands, check out our documentation on Inputs.
There are two main scripts attached to the OVRCameraRig prefab: OVRManager and the OVRCameraRig. OVRManager is the main interface to the VR hardware and exposes the Oculus SDK to Unity. It should only be declared once. It contains settings for your target devices, performance, tracking and color gamut. Apart from these, it also has some Quest-specific settings:
Hand Tracking Support: This allows you to choose the type of input affordance you like for your app. You can choose to have Controllers Only, Controllers and Hands to switch between the two, or only Hands.
Hand Tracking Frequency: You can select the hand tracking frequency from the list. A higher frequency allows for better gesture detection and lower latencies but reserves some performance headroom from your application’s budget.
Hand Tracking Version list: This setting lets you choose the version of hand tracking you’d like to use. Select V2 to use the latest version of hand tracking. The latest update brings you closer to building immersive and natural interactions in VR without the use of controllers, and it delivers key improvements on Quest 2.
As discussed earlier, each interaction consists of a pair of Interactor and Interactable. Interactables are rigid body objects that the hands or controllers will interact with, and they should always have a grabbable component attached to them. To learn more about Grabbables, check out our documentation.
This sample demonstrates the HandGrabInteractor and HandGrabInteractable pair. The HandGrabInteractor script can be found under OVRInteraction → OVRHands → Choose LeftHand or RightHand → HandInteractorsLeft/Right → HandGrabInteractor. The corresponding interactable can be found for the rigid body objects under Interactables → choose one of the SimpleGrabs → HandGrabInteractable, which contains the HandGrabInteractable script. HandGrabInteractables are used to indicate that these objects can be Hand-Grab interacted with either hands or controllers. They require both a Rigidbody component for the interactable and a Grabbable component for 3D manipulation, attached as shown.
Now that you have a basic understanding of how a scene is set up using the OVRCameraRig, OVRManager, OVRInteraction and how to add Interactor-Interactable pairs, let’s learn how you can set up a local copy of the First Hand sample.
Cloning from GitHub and setting up the First Hand sample in Unity
The First Hand sample GitHub repo contains the Unity project that demonstrates the interactions presented in the First Hand showcase. It is designed to be used with hand tracking.
The first step to setting up the sample locally is to clone the repo.
First, you should ensure that you have Git LFS installed by running:
Once the project has successfully cloned, open it in Unity. It might give a warning if you have a different Unity version. Accept the warning, and it will resolve the packages and open the project with your Unity version. It might also give a warning that Unity needs to update the URP materials for the project to open. Accept it, and the project will load.
The main scene is called the Clocktower scene. All of the actual project files are in Assets/Project. This folder includes all scripts and assets to run the sample, excluding those that are part of the Interaction SDK. The project already includes v41 of the Oculus SDK, including the Interaction SDK. You can find the main scene under Assets/Project/Scenes/Clocktower. Just like before, import TMP essentials if you don’t already have it, and the scene should load up successfully.
Before you build our scene, make sure that Hand Tracking Support is set to Controllers and Hands. To do this, go to Player → OVRCameraRig → OVRManager. Under Hand Tracking support, choose “Hands and Controllers or Hands only.” Set the Hand Tracking Version to “V2” to use the latest and the Tracking Origin Type to “Floor Level.” Make sure you’re building for the Android platform. To build, go to File → Build Settings and click on “Build.” You can choose to click on “Build and Run” to directly load and run it on your headset, or you can install and run it from Meta Quest Developer Hub, which is a standalone companion development tool that positions Meta Quest and Meta Quest 2 headsets in the development workflow. To learn more about Meta Quest Developer Hub, visit our documentation.
If you choose to use Developer Hub, you can drag the APK into the application and click “Install” to install and run it on the headset.
When the sample loads, you will see the objects that you interacted with in the First Hand showcase app. You will also see an option to enable Blast and Shield Gestures and the option to enable Distance Grab. You will be able to interact with the UI in front of you using the Poke interaction. You can also interact with the pieces from the glove like you saw in the First Hand showcase app. Note that this sample only showcases the interactions and not the complete gameplay of the FirstHand demo. Below you can find a list of the objects you can interact with in this sample and the kind of interaction they demonstrate:
This was a quick walkthrough of the First Hand sample GitHub project that demonstrates the various interactions created using the Interaction SDK, which you saw in the First Hand showcase app. The SDK provides you with these interactions to make it really easy for you to integrate these into your own applications. Apart from this, our team has created documentation, tutorials and many other resources to help you get started with using Interaction SDK. Let’s discuss some resources and best practices that you can keep in mind when using the SDK to add interactions in your VR experiences.
Best practices and resources
For hand tracking to be an effective means of interaction in VR, it needs to be intuitive, natural and effortless. However, hands don’t come with buttons or switches the way other input modalities do. This means there’s nothing hinting at how to interact with the system and no tactile feedback to confirm user actions. To solve this, we recommend communicating affordances through clear signifiers and continuous feedback on all interactions.
You can think of this in two ways: signifiers, which communicate what a user can do with a given object, and feedback, which confirms the user’s state throughout the interaction. For example, in the First Hand sample, visual and auditory feedback plays an important role in prompting the user to interact with certain objects in the environment–the lift control beeps and glows to let the player know they should press the button, the glowing object helps the player identify when objects can be grabbed from a distance and color changes for UI buttons when selected confirm the user’s selection. Apart from this, creating prompts to guide the player through First Hand also worked especially well.
Hands are practically unlimited in terms of how they move and the poses they can form. This opens up a world of opportunities, but it can also cause confusion. By limiting which hand motions can be interpreted by the system, we can achieve more accurate interactions. For example, in the First Hand sample, the player is allowed to grab certain objects, but not all. They can also grab, move and scale some of the objects, but this is limited to some objects—they can grab and scale the parts of the glove, but it doesn’t allow them to scale it only in one direction to prevent deforming the object.
Snap Interactors are a great option for drop zone interactions and fixing items in place. You will see this used extensively in First Hand, especially during the Glove building sequence where “Snap Interactors” trigger its progression. Interaction SDK’s Touch Hand Grab interaction can come in handy for small objects, especially objects that don’t have a natural orientation or grab point, without restricting the player with pre-authored poses. You will see this being used in First Hand for the glove parts, which the player can pick up in whatever way feels natural to them.
For objects out of a user’s reach, raycasting comes in very handy when selecting objects. Distance Grab is also another great way to make it easy for users to interact with the object, without needing to walk up to it or reach out to it. This can not only make your experience more intuitive, it also makes it more accessible. Distance Grab is easy to configure and is implemented by having a cone extend out from the hand that selects the best candidate item. When setting up an item for Distance Grab, it can save time to reuse the same hand poses that were set up for regular grab. In the First Hand sample, you can see how Distance Grab was used to grab objects from a distance and how the visual ray from the hand confirms which object will be grabbed.
Tracking and ergonomics also play a huge role when designing your experiences with hand tracking. The more of your hands the headset’s sensors can see, the more stable tracking will be. Only objects that are within the field of view of the headset can be detected and hence you should avoid forcing users to reach out to objects outside of the tracking volume.
It’s also important to make sure the user can remain in a neutral body position as much as possible. This allows for a more comfortable ergonomic experience, while keeping the hand in an ideal position for the tracking sensors. For example, in the First Hand sample, the entire game can be experienced sitting or standing, with hands almost always in front of the headset. This makes the hand tracking more reliable while making the experience more accessible.
These were some of the best practices that we recommend keeping in mind when designing interactions using your hands in VR. Interaction SDK provides you with modular components that can make implementing these interactions easy so that you don’t have to start from scratch when building your experiences.
To get started with Interaction SDK, check out our documentation on the Interaction SDK overview, where we go over the setup, prerequisites and other settings to get started with using Interaction SDK in Unity. We’ve added several sample scenes in the SDK to help you get started with developing interactions in your apps. To learn more about the examples, check out our documentation that goes over the example scenes the feature scenes and the interactions they showcase. Check out our blog to learn from the team who built First Hand on their tips to get started with Interaction SDK.
To learn more about the OVRInteraction prefab, Hands and Controllers and how to set them up, check out our documentation for Inputs. To learn more about how Interactors and Interactables work, and the various kinds of interactor-interaction pairs available, check out our documentation on Interactions, where we go over Interactors and its methods, Interactables and the lifecycle between both and Grabbable components. You can also learn more about the various kinds of interactions and how they work.
This blog supports the video “Building Intuitive Interactions in VR: Interaction SDK, the First Hand Sample and Other Resources.” In this blog, we discuss Presence Platform’s Interaction SDK and how it can drastically reduce development time if you’re looking to add interactions in your VR experiences. To help you create virtual environments that feel authentic and natural, we’ve created Presence Platform which consists of a broad range of machine perception and AI capabilities, including Passthrough, Spatial Anchors, Scene understanding, Interaction SDK, Voice SDK, Tracked keyboard and more.