FACEBOOK

Facebook Architects Around Silent Data Corruption

Silent but deadly: few things are more destructive than data corruptions that slip past the error-capture tools in hardware and even in software. They can be hard to spot until they have infected an entire application.

This is especially devastating at Facebook scale, but engineering teams at the social giant have found strategies to keep a local problem from going global. A single hardware-rooted error can cascade into a massive problem when multiplied at hyperscale, and for Facebook, keeping this at bay takes a combination of hardware resiliency, production detection mechanisms, and a broader fault-tolerant software architecture.

Facebook’s infrastructure team began an effort in 2018 to understand the root causes of silent data corruption, what fleet-wide fixes might look like, and what those detection strategies could cost in terms of overhead.

Engineers found that many of the cascading errors originate in CPUs in production, and not always from the “soft errors” caused by radiation or synthetic fault injection. Rather, they found that these errors can strike CPUs seemingly at random, yet recur in repeatable ways. ECC is useful, but it focuses on problems in SRAM, and other elements are susceptible too. The Facebook engineering team that reported on these problems found that CPU silent data corruptions actually occur at rates orders of magnitude higher than soft errors, due to a lack of error correction in other blocks.

Increased CPU complexity opens the door to more errors, and when compounded at hyperscale datacenter levels with ever-denser nodes, these problems will only become more frequent and widespread. At the hardware level, the causes range from general device errors (placement and routing problems can lead to different signal arrival times, causing bit flips, for instance) to more manufacturing-centric problems such as etching errors, which still happen. Further, early-life failures of devices and degradation of existing CPUs can also have hard-to-detect impacts.

For example, when computing 2×3, the CPU may silently return 5 instead of 6 under certain microarchitectural conditions, without any indication of the miscomputation in the system event or error logs. As a result, a service using that CPU may be unaware of the computational inaccuracy and keep consuming the incorrect values in the application.
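The 2×3 example shows why silent errors have to be made loud on purpose. Below is a minimal Python sketch of a known-answer test, one common screening idea (not necessarily Facebook's own tooling): run operations whose correct results are known ahead of time and flag every mismatch. The defective core is simulated by a hypothetical `faulty_multiply` function; `known_answer_test` and `CASES` are illustrative names, not part of any real library.

```python
from typing import Callable


def faulty_multiply(a: int, b: int) -> int:
    """Hypothetical defective core: wrong for exactly one input pair, silent otherwise."""
    if (a, b) == (2, 3):
        return 5  # miscomputation: no exception, nothing in any log
    return a * b


def healthy_multiply(a: int, b: int) -> int:
    return a * b


def known_answer_test(
    mul: Callable[[int, int], int],
    cases: list[tuple[int, int, int]],
) -> list[tuple[int, int, int, int]]:
    """Return (a, b, expected, got) for every case where `mul` disagrees with the known answer."""
    failures = []
    for a, b, expected in cases:
        got = mul(a, b)
        if got != expected:
            failures.append((a, b, expected, got))
    return failures


# Golden answers would normally be precomputed on known-good hardware and
# shipped with the test; here Python's own `*` stands in for that table.
CASES = [(a, b, a * b) for a in range(10) for b in range(10)]

if __name__ == "__main__":
    print(known_answer_test(faulty_multiply, CASES))   # [(2, 3, 6, 5)]
    print(known_answer_test(healthy_multiply, CASES))  # []
```

Real defects depend on microarchitectural conditions such as frequency and instruction mix, so a test this small would likely miss them; the sketch only shows the shape of the check.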

“Silent data corruptions are real phenomena in datacenter applications running at scale,” members from the Facebook infrastructure team explain. “Understanding these corruptions helps us gain insights into the silicon device characteristics; through intricate instruction flows and their interactions with compilers and software architectures. Multiple strategies of detection and mitigation exist, with each contributing additional cost and complexity into a large-scale datacenter infrastructure.”

Facebook used a few reference applications to highlight the impact of silent data corruption at scale, including a Spark workflow that runs millions of wordcount computations per day and FB’s compression application, which similarly runs millions of compression/decompression computations daily. In the compression example, Facebook observed a case where the algorithm returned a “0” size value for a single file (it was supposed to be a non-zero number), so the file was not written into the decompressed output database. “As a result, the database had missing files. The missing files subsequently propagated to the application. An application keeping a list of key value store mappings for compressed files immediately observes that files that were compressed are no longer recoverable. The chain of dependencies causes the application to fail.” Soon after, the querying infrastructure reports back with critical data loss. This one example makes the problem clear; now imagine the same failure reaching beyond compression or wordcount. Facebook can.
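The zero-size result above is the kind of impossible value an end-to-end check can catch before it is written out. Here is a minimal Python sketch, assuming `zlib` as the codec (the source does not name Facebook's compression library): verify that the compressed output is non-empty and round-trips to the original bytes before committing it. `compress_verified` and `CorruptionDetected` are hypothetical names.

```python
import zlib
from typing import Callable


class CorruptionDetected(Exception):
    """Raised when a compression result fails an end-to-end sanity check."""


def compress_verified(data: bytes,
                      compress: Callable[[bytes], bytes] = zlib.compress) -> bytes:
    """Compress `data`, then refuse to return a result that fails verification."""
    blob = compress(data)
    # A zero-length output for non-empty input is an impossible value:
    # reject it instead of silently dropping the file downstream.
    if data and len(blob) == 0:
        raise CorruptionDetected("compressed size is 0 for non-empty input")
    # Round-trip check: decompressing must reproduce the original bytes exactly.
    try:
        restored = zlib.decompress(blob)
    except zlib.error as exc:
        raise CorruptionDetected(f"output does not decompress: {exc}") from exc
    if restored != data:
        raise CorruptionDetected("round-trip mismatch")
    return blob


if __name__ == "__main__":
    payload = b"wordcount " * 1000
    blob = compress_verified(payload)
    print(len(payload), "->", len(blob), "bytes, verified")

    # Simulate a miscomputing compressor that returns an empty result.
    try:
        compress_verified(payload, compress=lambda d: b"")
    except CorruptionDetected as err:
        print("caught:", err)
```

The check adds a decompression to every write, a small-scale version of the cost trade-off the team describes. If verification runs on the same faulty core it could in principle be fooled too, so a production design may prefer to verify elsewhere.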

“Data corruptions propagate across the stack and manifest as application level problems. These types of errors can result in data loss and can require months of debug engineering time… With increased silicon density and technology scaling, we believe that academic researchers and industry should invest in methods to counter these issues.”

Debugging is arduous, but it is still at the heart of how Facebook handles these silent data corruptions, though only once they are loud enough to be heard. “To debug a silent error, we cannot proceed forward without understanding which machine level instructions are executed. We either need an ahead-of-time compiler for Java and Scala or we need a probe, which upon execution of the JIT code, provides the list of instructions executed.” The team’s best practices for silent error debugging are detailed in Section 5.2 of its paper.

An overall suite of fault tolerance mechanisms is also key to Facebook’s strategy. These include redundancy at the software level, which of course comes with costs. “The cost of redundancy has a direct effect on resources; the more redundant the architecture, the larger the duplicate resource pool requirements,” even though redundancy is the surest path to probabilistic fault tolerance. Less overhead-laden approaches include relying on fault-tolerant libraries (PyTorch is specifically cited), although this is not “free” either: the impact on application performance is palpable.

“This effort would need a close handshake between the hardware silent error research community and the software library community.”

In terms of that handshake, Facebook is openly calling on datacenter device makers to recognize that their largest customers expect more, especially given the cascading, wide-reaching impact of hardware-derived errors.
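The software redundancy trade-off quoted earlier can be made concrete with majority voting, the idea behind triple modular redundancy. In this minimal Python sketch (a textbook technique, not Facebook's stated implementation), each replica is a zero-argument callable standing in for the same computation on a different machine; `vote` and `NoMajority` are hypothetical names. Running N replicas costs roughly N times the compute, which is the “larger duplicate resource pool” the team warns about.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence, TypeVar

T = TypeVar("T", bound=Hashable)


class NoMajority(Exception):
    """Raised when replicas disagree and no result wins a strict majority."""


def vote(replicas: Sequence[Callable[[], T]]) -> T:
    """Run every replica and return the result produced by a strict majority."""
    if not replicas:
        raise ValueError("need at least one replica")
    results = [run() for run in replicas]  # N replicas -> roughly N x the compute
    value, count = Counter(results).most_common(1)[0]
    # A strict majority masks a minority of faulty replicas; a tie is only detected.
    if count * 2 <= len(results):
        raise NoMajority(f"replicas disagree: {results!r}")
    return value


def healthy() -> int:
    return 2 * 3


def faulty() -> int:
    return 5  # simulated silent miscomputation on one replica


if __name__ == "__main__":
    print(vote([healthy, faulty, healthy]))  # 6: two good replicas outvote the bad one
    try:
        vote([healthy, faulty])  # two replicas can detect the error but not correct it
    except NoMajority as err:
        print("detected:", err)
```

Three replicas are the smallest count that can both detect and correct a single bad result; two can only detect it, which is why the cost climbs quickly with the level of protection wanted.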

“Silent data corruptions are not limited to rare one in a million occurrences within a large-scale infrastructure. These errors are systemic and are not as well understood as the other failure modes like Machine Check Exceptions.” The infrastructure team adds that several studies have evaluated techniques to reduce the soft error rate within processors, and that those lessons can be carried over to similar, repeatable SDCs, which can occur at a higher rate.

A large part of the responsibility should be shared by device makers, Facebook says. On the manufacturer’s side, approaches can include strengthening the blocks on a device for better datapath protection using custom ECCs, providing better randomized testing, recognizing that increased density means higher error propagation, and, most important, understanding “at scale behavior” via “close partnership with customers using devices at scale to understand the impact of silent errors.” This would cover occurrence rates, time to failure in production, dependency on frequency, and the environmental conditions that affect these errors.
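Datapath ECC in silicon is vendor-specific and the source gives no details, but the underlying principle can be shown with the textbook Hamming(7,4) code: three parity bits protect four data bits, and the parity-check syndrome points directly at any single flipped bit. The Python sketch below is illustrative only; real on-chip codes are wider and implemented in hardware, and the function names are hypothetical.

```python
def hamming74_encode(data: list[int]) -> list[int]:
    """Encode 4 data bits as a 7-bit codeword laid out [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword: list[int]) -> tuple[list[int], int]:
    """Return (corrected data bits, syndrome); a nonzero syndrome is the 1-based
    position of the single flipped bit, which has been corrected."""
    c = list(codeword)  # copy so the caller's list is untouched
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], syndrome


if __name__ == "__main__":
    word = [1, 0, 1, 1]
    received = hamming74_encode(word)
    received[4] ^= 1  # simulate a bit flip at position 5
    data, syndrome = hamming74_decode(received)
    print(data, syndrome)  # [1, 0, 1, 1] 5
```

Hamming(7,4) corrects any single-bit error but cannot reliably handle two flipped bits in one codeword; adding an overall parity bit (SECDED) lets a design at least detect double errors.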

“Facebook infrastructure has implemented multiple variants of the above hardware detection and software fault tolerant techniques in the past 18 months. Quantification of benefits and costs for each of the methods described above has helped the infrastructure to be reliable for the Facebook family of apps.” The infrastructure team plans to release a follow-on with more detail about the various trade-offs and costs for their current approaches.

More detail, including Facebook’s best practices for fault tolerance in software and for architecting around potential hardware failures, can be found in the infrastructure team’s full paper.

Facebook-Meta Earns the ‘Worst Company of 2021’ Title in This Survey

Facebook has had its share of controversies this year. The company was under more scrutiny after whistleblower Frances Haugen leaked a series of internal documents.

Facebook parent Meta has been named the Worst Company of the Year (2021) by Yahoo Finance respondents. According to the publication, an “open-ended” survey ran on Yahoo Finance on December 4 and 5, and 1,541 respondents participated. Facebook received 8 percent of the write-in vote, but respondents were seemingly angry about the Robinhood trading app as well. Electric truck startup Nikola, which the same publication named last year’s worst company, also drew respondents’ ire.

Yahoo Finance notes, “Facebook has had its share of controversies this year.” In January, Meta-owned WhatsApp got caught up in a huge controversy after the messaging app announced a new privacy policy (Terms of Service). WhatsApp said it would collect user information and share it with third-party apps for a better user experience. The app initially gave users no choice, but it later modified the policy under pressure. The company also came under more scrutiny after whistleblower and former Facebook employee Frances Haugen leaked a series of internal documents showing the company’s problematic practices. The documents revealed that Meta-owned Instagram had a negative impact on teenage girls, but that the company did almost nothing to rectify the problem.

Yahoo Finance also highlights, “At the same time, some critics, including conservatives, say Facebook over-policed the platform’s speech and stifled their voices.” Critics also blame Facebook and other social media platforms for failing to curb the hate speech that led to the riot at the US Capitol.

However, around 30 percent of Yahoo Finance readers said that Facebook or Meta could redeem itself. One respondent suggested that the company could issue a formal apology for negligence and donate a sizable amount of its profits to a foundation to help reverse its harm.

On the other hand, respondents chose Microsoft as the Company of the Year (2021). The Satya Nadella-led company crossed the $2 trillion market-value mark this year and introduced several upgrades, the most notable being Windows 11, the successor to Windows 10.

Facebook pays 1.7 Cr fine to Russia after failing to delete content Moscow deems illegal

In the latest legal tussle with Russia over its controversial social media regulation laws, Facebook paid 17 million roubles (Rs 1.7 crore) for failing to remove content deemed illegal by Moscow. With the threat of larger fines looming, Facebook parent company Meta, led by Mark Zuckerberg, is scheduled to face court next week over repeated violations of Russian legislation on content, Interfax News Agency reported. According to the latest updates, the social media giant could be fined a percentage of its annual revenue.

In October, Moscow sent state bailiffs to enforce the collection of the 17 million roubles. Meanwhile, an Interfax report citing a federal bailiffs’ database said there were more enforcement proceedings against the company as of Sunday. Besides Facebook, the messaging app Telegram has also paid 15 million roubles in fines for failing to comply with the Russian social media legislation that came into force in 2016.

Facebook pays $53k to Russia for refusing to comply with controversial social media laws

Facebook also locked horns with Moscow earlier in November, paying 4 million roubles ($53,000) over its refusal to adhere to Russian data localisation laws, the Moscow Times reported. A Moscow court said on November 25 that Facebook had paid the fine levied in February, after which all proceedings against the US-based social media giant were closed. The payment relates to litigation filed against the company, alongside Twitter, in 2018. The tech companies were also forced to pay an additional 3,000 roubles ($40) for failing to comply with the law’s user data sharing rules. The Russian authorities have also previously blocked LinkedIn, owned by Microsoft, for failing to abide by the laws.

Russian social media laws

According to the Moscow Times, Russia’s social media regulation laws require all foreign technology companies to store data related to Russian customers and users on servers located in Russia. Additionally, Russian tech companies must share encryption data with the federal authorities and keep records of user calls, messages, and civil society group conversations. The apparatus is said to be a severe breach of privacy rights, giving unfettered back-door access to personal data that could be used to harass Kremlin critics.

Facebook Messenger Is Launching a Split Payments Feature for Users to Quickly Share Expenses

Meta has announced a new Split Payments feature in Facebook Messenger. As the name suggests, the feature lets you calculate and split expenses with others right from Messenger, offering an easier way to share the cost of bills and expenses, such as a dinner bill with friends. Using Split Payments, Facebook Messenger users will be able to split bills evenly or modify the contribution of each individual, including their own.

The company announced the new Split Payments feature in a blog post. 9to5Mac reports that the bill-splitting feature is still in beta and will be exclusive to US users at first, with the rollout beginning early next week. As mentioned, it will help users share the cost of bills, expenses, and payments. The feature is especially useful for those who share an apartment and need to split the monthly rent and other expenses with their roommates. It could also come in handy at a group dinner with many people.

With Split Payments, users can add the number of people the expense needs to be divided with and, by default, the amount entered will be divided in equal parts. A user can also modify each person’s contribution including their own. To use Split Payments, click the Get Started button in a group chat or the Payments Hub in Messenger. Users can modify the contribution in the Split Payments option and send a notification to all the users who need to make payments. After entering a personalised message and confirming your Facebook Pay details, the request will be sent and viewable in the group chat thread.

Once someone has made the payment, you can mark their transaction as ‘completed’. Split Payments automatically takes your own share into account and calculates the amount owed accordingly.
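The default even split is a small piece of arithmetic with one subtlety: money rarely divides evenly into cents. Here is a minimal Python sketch, assuming amounts are handled as integer cents and leftover cents go to the first participants (Meta has not described its rounding rule, so this is an assumption); `split_evenly`, `check_custom_split`, and `amounts_owed` are hypothetical names, not part of any Messenger API.

```python
def split_evenly(total_cents: int, people: int) -> list[int]:
    """Split a total into `people` integer-cent shares that sum exactly to it."""
    if people <= 0:
        raise ValueError("need at least one person")
    base, remainder = divmod(total_cents, people)
    # Hand out the leftover cents one at a time so no cent is lost or invented.
    return [base + 1 if i < remainder else base for i in range(people)]


def check_custom_split(total_cents: int, shares: dict[str, int]) -> None:
    """Modified contributions are fine as long as they still add up to the bill."""
    if any(cents < 0 for cents in shares.values()):
        raise ValueError("shares cannot be negative")
    if sum(shares.values()) != total_cents:
        raise ValueError(f"shares sum to {sum(shares.values())}, expected {total_cents}")


def amounts_owed(shares: dict[str, int], requester: str) -> dict[str, int]:
    """Everyone except the requester owes their share; the requester's own share is excluded."""
    if requester not in shares:
        raise ValueError("the requester must be one of the participants")
    return {name: cents for name, cents in shares.items() if name != requester}


if __name__ == "__main__":
    shares = dict(zip(["you", "ana", "ben"], split_evenly(10000, 3)))
    print(shares)                       # {'you': 3334, 'ana': 3333, 'ben': 3333}
    print(amounts_owed(shares, "you"))  # {'ana': 3333, 'ben': 3333}
```

Working in integer cents rather than floating-point dollars avoids rounding drift, so the shares always add back up to exactly the bill.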


Tasneem Akolawala is a Senior Reporter for Gadgets 360. Her reporting expertise encompasses smartphones, wearables, apps, social media, and the overall tech industry. She reports out of Mumbai, and also writes about the ups and downs in the Indian telecom sector. Tasneem can be reached on Twitter at @MuteRiot, and leads, tips, and releases can be sent to tasneema@ndtv.com.
