Cerebras sets record for largest AI model on a single chip

US hardware startup Cerebras claims to have trained the largest AI model yet on a single device, one powered by its dinner-plate-sized Wafer Scale Engine 2, the world's largest chip.
“Cerebras Software Platform (CSoft) allows our customers to easily train state-of-the-art GPT language models (such as GPT-3 and GPT-J) with up to 20 billion parameters on a single CS-2 system,” the company claimed this week. “These models run on a single CS-2, set up in minutes, and users can quickly switch between models with just a few keystrokes.”
The CS-2 packs a whopping 850,000 cores and 40GB of on-chip memory capable of 20PB/s of memory bandwidth. Other AI accelerators and GPUs pale in comparison, which means machine learning engineers have to split huge AI models with billions of parameters across multiple servers to train them.
Even if Cerebras really has trained the largest model on a single device, it will still struggle to attract large AI customers: the biggest neural networks today contain hundreds of billions to trillions of parameters, and in practice training them would require many CS-2 systems.
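As a rough back-of-the-envelope sketch – counting only the weights at two bytes per parameter (fp16), and ignoring optimizer state, gradients, and activations, as well as how the CS-2 actually stages its weights – it is easy to see why models beyond a few tens of billions of parameters overflow a single 40GB device:

# Back-of-the-envelope memory footprint for model weights alone, assuming
# fp16 (2 bytes per parameter) and ignoring optimizer state, gradients,
# and activations, which in practice add several times more.

CS2_ON_CHIP_MEMORY_GB = 40  # per the CS-2 spec quoted above

def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Gigabytes needed just to hold the weights."""
    return num_params * bytes_per_param / 1e9

for name, params in [
    ("GPT-J (6B)", 6e9),
    ("20B model (Cerebras' claimed ceiling)", 20e9),
    ("GPT-3-sized model (175B)", 175e9),
]:
    gb = weight_memory_gb(params)
    verdict = "fits" if gb <= CS2_ON_CHIP_MEMORY_GB else "does not fit"
    print(f"{name}: ~{gb:.0f}GB of weights -> {verdict} in 40GB of on-chip memory")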
Machine learning engineers would likely face the same challenges they already do when distributing training across numerous GPU or TPU machines – so why switch to a less familiar hardware system that doesn't offer as much software support?
Surprise, surprise: robot trained on internet data was racist, sexist
A robot trained on a flawed dataset scraped from the internet exhibited racist and sexist behavior in an experiment.
Researchers from Johns Hopkins University, the Georgia Institute of Technology, and the University of Washington instructed a robot to place blocks, each pasted with a picture of a human face, into a box. The robot was told to pick out the block it believed showed a doctor, housewife, or criminal and drop it into a colored box.
The robot was powered by a CLIP-based computer vision model of the kind commonly used in text-to-image systems. These models are trained to associate images of objects with their textual descriptions; given a caption, a text-to-image system can then generate a picture to match. Unfortunately, these models often exhibit the same biases found in their training data.
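For illustration only, here is a minimal sketch of how a CLIP-style model scores an image against a handful of text labels, using the open source openai/clip-vit-base-patch32 checkpoint via Hugging Face's transformers library. The file name and label set are hypothetical, and this is not the researchers' actual robotics pipeline – but biases picked up from training data surface directly in scores like these.

# Minimal zero-shot image-versus-text scoring with a CLIP-style model.
# Illustrative only; not the researchers' actual pipeline.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("face_block.jpg")   # hypothetical photo of a face on a block
labels = [                             # hypothetical label set
    "a photo of a doctor",
    "a photo of a housewife",
    "a photo of a criminal",
]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)  # higher = better caption match
for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.2f}")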
For example, the robot was more likely to identify blocks showing women's faces as housewives, and to associate Black faces with criminals more often than white ones. The device also appeared to favor women and people with darker skin less than white and Asian men. Although the research is just an experiment, deploying robots trained on flawed data could have real-life consequences.
“In a home, if a child asks for the beautiful doll, the robot might pick up the white doll,” said Vicky Zeng, a graduate student studying computer science at Johns Hopkins. “Or maybe in a warehouse with a lot of products with models on the box, you could imagine the robot reaching for the products with white faces more often.”
Largest open source language model published yet
Russian Internet company Yandex this week published the code for a language model with 100 billion parameters.
The system, called YaLM, was trained on 1.7TB of text scraped from the internet using 800 Nvidia A100 GPUs. Interestingly, the code was released under the Apache 2.0 license, meaning the model can be used for both research and commercial purposes.
Academics and developers have welcomed efforts to replicate and open source large-scale language models. Building these systems is difficult, and typically only large tech companies have the resources and expertise to develop them. The models are usually proprietary, and without access they are hard to study.
“We truly believe that global technological advancement is only possible through cooperation,” a Yandex spokesperson told The Register. “Big tech companies owe a lot to the open results of researchers. However, in recent years, cutting-edge NLP technologies, including large language models, have become inaccessible to the scientific community, since the resources required to train them are available only to Big Tech.”
“Researchers and developers around the world need access to these solutions. Without new research, growth will slow down. The only way to avoid this is to share best practices with the community. By sharing our language model, we support the pace of development of global NLP.”
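To put the spokesperson's point about resources in concrete terms, here is a rough sketch – weights only, assuming fp16 and ignoring activations and other runtime state – of what it takes just to hold a 100-billion-parameter model in GPU memory:

# Rough sense of scale for a 100-billion-parameter model such as YaLM,
# counting weights only at fp16 (2 bytes per parameter).
PARAMS = 100e9
BYTES_PER_PARAM = 2        # fp16
A100_MEMORY_GB = 80        # the largest A100 variant

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
print(f"Weights alone: ~{weights_gb:.0f}GB")
print(f"80GB A100s needed just to hold them: {weights_gb / A100_MEMORY_GB:.1f}")
# ~200GB of weights means a multi-GPU server before you even run the thing,
# let alone the 800 A100s Yandex used for training.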
Instagram uses AI to verify users’ ages
Instagram’s parent company, Meta, is testing new methods to verify that its users are 18 or older, including using AI to analyze photos.
Research and anecdotal evidence have shown that social media use can be harmful to children and young teens. Instagram users provide their date of birth to confirm they are old enough to use the app; they must be at least 13 years old, and there are additional restrictions for those under 18.
Meta is now trialling three different ways to verify that someone is over 18 when they change their date of birth.
“If someone tries to change their date of birth on Instagram from under 18 to 18 or older, we require them to verify their age using one of three options: upload their ID, record a video selfie, or ask mutual friends to verify their age,” the company announced this week.
Meta said it has partnered with Yoti, a digital identity platform, to analyze people’s ages. Video selfie images are examined by Yoti’s software to predict a person’s age. Meta said Yoti uses a “dataset of anonymous images of different people from around the world.”
GPT-4chan was a bad idea, researchers say
Hundreds of academics have signed a letter condemning GPT-4chan, the AI language model trained on over 130 million posts on the notoriously toxic internet message board 4chan.
“Large language models, and more generally foundation models, are powerful technologies that carry a potential risk of significant harm,” began the letter, led by two Stanford University professors. “Unfortunately, we, the AI community, currently lack common norms for their responsible development and use. Still, it is imperative for members of the AI community to condemn clearly irresponsible practices.”
These types of systems are trained on huge amounts of text and learn to mimic their data. Feed GPT-4chan what looks like a conversation between netizens and it will keep adding fake posts to the thread. 4chan is notorious for its lax content moderation rules – users are anonymous and can post anything as long as it isn't illegal. Unsurprisingly, GPT-4chan duly spewed text with similar levels of toxicity and offensive content. When it was let loose on 4chan, some users weren't sure whether they were talking to a bot or not.
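The mechanism itself is nothing exotic. A minimal sketch, using the freely available GPT-2 as a neutral stand-in for GPT-4chan and a made-up prompt, shows how a causal language model simply continues whatever conversation it is fed:

# A causal language model "keeps the conversation going" by predicting a
# continuation of whatever prompt it is given. GPT-2 is used here as a
# neutral stand-in; GPT-4chan works the same way but reproduces the tone
# of the 4chan posts it was fine-tuned on.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Anonymous 1: Has anyone tried the new graphics card yet?\nAnonymous 2:"
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,                        # sample rather than always take the top token
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,   # silence the missing-pad-token warning
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))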
Now experts have criticized its creator, YouTuber Yannic Kilcher, for deploying the model irresponsibly. “One can imagine a reasonable case for training a language model on toxic speech – for example, to detect and understand toxicity on the internet, or for general analysis. However, Kilcher’s decision to deploy this bot does not meet any test of reasonableness. His actions deserve censure. He undermines the responsible practice of AI science,” the letter concluded. ®