Discover our selection of articles on general AI updates and advancements.
GPT-Neo, GPT-3’s open-source sibling
12th April 2021
In 2020, OpenAI released its famous, massive NLP model, GPT-3. It is an autoregressive language model that uses deep learning to produce strikingly human-like text, and it can perform translation and question-answering as well as several tasks that require on-the-fly reasoning, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. Many spin-offs have since been created, such as DALL·E, which generates images from text captions. However, gaining commercial access to this product requires a paid licence as well as massive computing resources and power.
EleutherAI is an open-source alternative to the GPT-3 project. Its model is still some way from matching the full capabilities of the latter, but last week the researchers released a new version of their model, called GPT-Neo, which is about as powerful as the least sophisticated version of GPT-3. In today's race to bigger, better NLP models, the democratization of AI models could further accelerate progress and reduce the chance of AI being reserved solely for high-tech companies. Additionally, the engineers at EleutherAI have devised a way to use spare cloud computing resources to keep the computational power accessible to all. Another interesting note is that the dataset EleutherAI uses is more diverse than GPT-3's, and it avoids some sources, such as Reddit, that are more likely to include dubious material.
A thought-provoking matter: AI advances in emotion detection
9th April 2021
Queen Mary University
We used to think that our thoughts were the one place where we could truly be private: after all, mind-reading is fiction - or is it? Researchers at Queen Mary University of London have developed a deep neural network that can determine a person's emotional state by analysing wireless signals used like radar. In their article, they describe how radio waves can be used to analyse a person's breathing and heart rate, even in the absence of any other visual cues, such as facial expressions. This could then be scaled to infer emotion in large gatherings, for example at work, and to collect information on how people react to different activities.
Although this breakthrough is fascinating, it raises some important questions around ethics, privacy and morals: is it right to infer the mood of an individual, let alone large groups, and act solely based on those subjective concepts? Can one truly categorize a state of mind, or is it a continuous, dynamic spectrum?
Self-supervised learning, the dark matter of intelligence
24th March 2021
Supervised learning, or training AI on already-labelled data, has been one of the most popular paradigms in machine learning for many years. It has proven extremely useful; however, there is a limit: in real life, not everything can be labelled, and labelling is a resource-intensive, tedious task prone to error and bias.
We humans learn differently: through observation, inference and common sense. This final element has stumped machine learning engineers, who are still trying to find ways to translate these concepts into algorithms and thus consider common sense the "dark matter of AI". For instance, children can recognize an animal after being shown very few examples and then spot it in new images, yet AI models require hundreds of images and are still likely to confuse cows with horses. This is because, in short, humans rely on their previously acquired background knowledge of how the world works.
One way of leveraging this in AI would be through self-supervised learning. FacebookAI describes it as follows: “Self-supervised learning obtains supervisory signals from the data itself, often leveraging the underlying structure in the data. The general technique of self-supervised learning is to predict any unobserved or hidden part (or property) of the input from any observed or unhidden part of the input. For example, as is common in NLP, we can hide part of a sentence and predict the hidden words from the remaining words. We can also predict past or future frames in a video (hidden data) from current ones (observed data). Since self-supervised learning uses the structure of the data itself, it can make use of a variety of supervisory signals across co-occurring modalities (e.g., video and audio) and across large data sets — all without relying on labels.”
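The idea of deriving the supervisory signal from the data itself can be made concrete with a toy sketch. The example below is purely illustrative (it is not Facebook AI's method - just a word-counting stand-in for a real masked-language model): we hide the middle word of a sentence and predict it from its observed neighbours, using nothing but the unlabelled corpus.

```python
from collections import Counter, defaultdict

# Toy self-supervised task: the "label" (the hidden word) comes from the
# raw text itself, not from human annotation.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat slept on the mat",
]

# "Training": count which word appears between each (left, right) context pair.
context_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for i in range(1, len(words) - 1):
        context_counts[(words[i - 1], words[i + 1])][words[i]] += 1

def predict_masked(left, right):
    """Predict the hidden word from its observed neighbours."""
    candidates = context_counts.get((left, right))
    return candidates.most_common(1)[0][0] if candidates else None

# "the cat [MASK] on the mat" -> predicted from co-occurrence statistics.
print(predict_masked("cat", "on"))
```

Real self-supervised models (BERT-style masked language models, for instance) replace the counting table with a deep network, but the supervisory signal is obtained in exactly the same way: hide part of the input, predict it from the rest.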
You’ve probably heard this too many times to count: the past year has been like no other. Yet, technology and progress have still continued to improve, and are still set on redefining our future in all fields, from healthcare to finance. With selections such as data trusts or messenger RNA vaccines, the editors of the MIT Technology Review have compiled a list of 10 technologies, each with corresponding featured articles, which they believe have the potential to change our lives this year.
A couple of notable ones related to AI:
GPT-3: one of the world's largest families of natural-language computer models, able to generate images, autocomplete sentences and even invent entire short stories that rival those of a human author.
Multi-skilled AI: even today, most models and robots can only complete tasks they have been explicitly trained to do, or solve problems they have encountered before. Transfer learning is still in its infancy and requires specific calibrations. This is in clear contrast to us humans, who can adapt and transfer skills from one domain to another with ease. A recent key breakthrough has been AI models that combine different senses - for example, computer vision with audio recognition - making them better suited to understanding their environment and interacting with us.
When using AI models for image processing, often they will be pre-trained on general image datasets sourced from the internet. This can be useful to help improve performance and speed up training time by essentially grabbing an “off-the-shelf” model to then tailor to a specific type of image or theme. However, there are a few issues with this: for example, labels can contain biased, stereotypical or even racist words, and gathering data firsthand is both costly and time-intensive.
Thus, being able to use computer-generated datasets for pre-training is an emerging trend likely to gain momentum in the future. Recently, Japanese researchers have come up with FractalDB, a database containing “an endless number of computer-generated fractals”. Since fractals are pervasive throughout nature, in places ranging from snowflakes to trees, these abstract fractals have been grouped and used as a pre-training dataset for convolutional neural networks (CNNs). The result? The performance was nearly as good as models pre-trained on actual images from state-of-the-art datasets. However, caution is still needed: abstract images have been shown to possibly confuse image recognition systems, so at this stage, computer-generated images are not yet completely able to replace manual ones.
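FractalDB's actual generation pipeline isn't reproduced here, but the underlying process - rendering fractals from iterated function systems (IFS) - is simple enough to sketch. The snippet below is a minimal, hypothetical version: each random set of contractive affine maps acts as one synthetic "class", and the traced points form one label-free training image.

```python
import random

def make_ifs(n_maps=3, seed=0):
    # One fractal "class" = one random iterated function system.
    rng = random.Random(seed)
    maps = []
    for _ in range(n_maps):
        # Small linear coefficients (|.| <= 0.45) keep every map contractive,
        # so the iteration stays bounded.
        a, b, c, d = (rng.uniform(-0.45, 0.45) for _ in range(4))
        e, f = (rng.uniform(-0.6, 0.6) for _ in range(2))   # translation
        maps.append((a, b, c, d, e, f))
    return maps

def render(ifs, size=32, n_points=20000, seed=0):
    # Chaos-game rendering: repeatedly apply a randomly chosen affine map
    # and mark each visited pixel.
    rng = random.Random(seed)
    grid = [[0] * size for _ in range(size)]
    x = y = 0.0
    for _ in range(n_points):
        a, b, c, d, e, f = rng.choice(ifs)
        x, y = a * x + b * y + e, c * x + d * y + f
        # Map the point from roughly [-1.5, 1.5]^2 into pixel coordinates.
        px = int((x + 1.5) / 3.0 * (size - 1))
        py = int((y + 1.5) / 3.0 * (size - 1))
        if 0 <= px < size and 0 <= py < size:
            grid[py][px] = 1
    return grid

# Each seed yields a different synthetic class for label-free pre-training.
fractal = render(make_ifs(seed=42))
print(sum(map(sum, fractal)), "pixels set")
```

Generating a dataset this way costs only compute - no photography, no scraping, and crucially no human labelling with its attendant biases.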
As you might have heard, on February 18th, NASA's Perseverance rover made its triumphant landing on the red planet to begin searching for traces of life. With cutting-edge equipment and hardware, this rover is undoubtedly state-of-the-art - but its software is not to be underestimated either. Perseverance has the most AI capabilities of any rover, which has been essential in ensuring everything runs as smoothly as possible during its mission - for example, in helping Perseverance land safely with little information about local terrain millions of kilometres away. In contrast to its predecessor Curiosity, Perseverance was able to land in a more dangerous location, Jezero Crater, largely thanks to the fact that “if it recognizes it's coming down on a place that's not safe, it will autonomously steer during its supersonic descending-to-zero speed descent to Mars”, as explained by Raymond Francis, a science operations engineer at NASA's Jet Propulsion Laboratory.
Furthermore, other applications of AI include use in targeting instruments and improved autonomous navigation. Due to the distance, information takes up to 40 minutes to travel between the rover and ground control, and time is of the essence. Therefore, when Perseverance travels to unknown locations, it will use AI to decide autonomously which are the best areas and rocks to investigate rather than waiting until humans are able to see these locations and give instructions, which emphasizes the crucial role AI plays in the mission. Francis adds: “Space missions to outer planets or harsh environments are going to depend on AI-based autonomy more and more”.
To create functional robots, the algorithms behind them must be able to navigate complex environments with various physical obstacles and structures. To do so, however, large image datasets are required, tailored to the physical properties of the robot. Take for example the popular Roomba (an automatic vacuum cleaner): traditionally, images would be taken from the robot's vantage point in various environments and “stitched back” together manually to create a virtual layout of an interior, as images taken at human height failed to produce satisfying results. However, this sort of manual capture is inefficient and incredibly slow.
Researchers at the University of Texas at Austin are looking to take advantage of a form of deep learning known as “generative adversarial networks, or GANs, where two neural networks contest with each other in a game until the 'generator' of new data can fool a 'discriminator.'”.
This would enable the generation of any sort of possible environments, able to be tweaked to the user’s preferences, and that the robot could use to recognize and detect objects and obstacles. Mohammad Samiul Arshad, a graduate involved in the research, adds: "Manually designing these objects would take a huge amount of resources and hours of human labour while, if trained properly, the generative networks can make them in seconds."
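To make the adversarial game concrete, here is a deliberately tiny, hypothetical sketch (nothing like the UT Austin models, which generate full 3D scenes): a one-parameter-per-weight generator learns to produce numbers that look like samples from a real distribution, while a discriminator learns to tell the two apart.

```python
import math
import random

# Minimal scalar GAN sketch. "Real" data is drawn from N(4, 0.5); the
# generator must learn to map uniform noise onto something similar.
rng = random.Random(0)

wg, bg = 1.0, 0.0          # generator:     G(z) = wg*z + bg
wd, bd = 0.0, 0.0          # discriminator: D(x) = sigmoid(wd*x + bd)
lr = 0.02

def sigmoid(t):
    t = max(-60.0, min(60.0, t))   # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-t))

for _ in range(20000):
    x_real = rng.gauss(4.0, 0.5)
    z = rng.uniform(-1.0, 1.0)
    x_fake = wg * z + bg

    # Discriminator step: push D(x_real) -> 1 and D(x_fake) -> 0.
    d_real = sigmoid(wd * x_real + bd)
    d_fake = sigmoid(wd * x_fake + bd)
    wd -= lr * ((d_real - 1.0) * x_real + d_fake * x_fake)
    bd -= lr * ((d_real - 1.0) + d_fake)

    # Generator step: push D(G(z)) -> 1, i.e. fool the discriminator
    # (non-saturating generator loss).
    d_fake = sigmoid(wd * x_fake + bd)
    grad_logit = d_fake - 1.0
    wg -= lr * grad_logit * wd * z
    bg -= lr * grad_logit * wd

# Since E[z] = 0, the generated samples are centred at bg, which should
# have drifted toward the real mean of 4.
print(f"generator offset after training: {bg:.2f}")
```

The same contest, scaled up to deep networks over images or point clouds, is what lets a trained generator emit novel environments "in seconds", as Arshad describes.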
In recent years, AI models have increasingly been replacing quantifiable, formulaic and repetitive tasks “susceptible to analysis and reproduction by machine learning systems”. However, does this also apply to disciplines combining both design analysis and artistic expression, such as architecture?
Some uses include semi-automated design generation, composing both internal and external spaces, and neural-network-based tools that learn an architect's habits and preferences over time and gradually adapt their processes and methods to suit them (an example of which is Finch).
Furthermore, with the popularization of GPUs and image recognition, research is underway in a new field called “architectural biometrics”, initially stemming from a glitch where facial recognition software tended to confuse patterns in buildings with faces. This may lead us to better understand “the anthropomorphic aspect of the way that humans create and relate to buildings”.
The range of applications is vast and diverse, yet architects are, according to The Economist, among the people least likely to be replaced by AI in the future. This is because the discipline also incorporates social, cultural and even political elements, all of which are difficult to capture with calculations alone. Yet the above also demonstrates that AI can be a powerful tool to simplify and enhance the work of architects, showing the potential for collaboration between AI and the human mind rather than a battle for supremacy.
“Give me an illustration of a baby daikon radish in a tutu walking a dog.” This might not be the first thing you'd like to ask an AI model, but OpenAI's latest innovation, DALL·E, is certainly capable of handling it. Using a “12-billion parameter version of GPT-3” trained on text-image pairs, the model takes in any regular sentence and generates a plausible image to fit the description. On a more technical level, it is a decoder-only transformer that receives both the text and the image as a single stream of 1280 tokens.
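The single-stream idea can be sketched in a few lines. The sizes below follow the OpenAI description (256 text tokens plus 32×32 = 1024 image tokens = 1280), but the vocabularies and token ids are made up for illustration - DALL·E's real tokenizers are a BPE text tokenizer and a discrete VAE for images.

```python
import random

# Illustrative sketch of DALL·E-style single-stream input.
TEXT_LEN, IMAGE_LEN = 256, 1024          # 256 + 1024 = 1280 tokens total
TEXT_VOCAB, IMAGE_VOCAB = 16384, 8192    # assumed sizes for this sketch

def build_stream(text_tokens, image_tokens, pad_id=0):
    # Pad/truncate the caption, then append the image codes. Image ids are
    # offset past the text vocabulary so the two modalities can share one
    # embedding table without colliding.
    text = (text_tokens + [pad_id] * TEXT_LEN)[:TEXT_LEN]
    image = [TEXT_VOCAB + t for t in image_tokens[:IMAGE_LEN]]
    return text + image

rng = random.Random(0)
caption = [rng.randrange(1, TEXT_VOCAB) for _ in range(7)]    # toy caption
image = [rng.randrange(IMAGE_VOCAB) for _ in range(IMAGE_LEN)]
stream = build_stream(caption, image)
print(len(stream))  # 1280
```

Because the caption and the image live in one autoregressive sequence, generating an image is just "continuing the sentence" past the text tokens - the same trick GPT-3 uses for plain text.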
A fascinating feature: DALL·E also displayed some capacity for “zero-shot reasoning”. Essentially, given an image prompt, it was able to apply image transformations based on textual instructions, for instance “the exact same cat on the top as a sketch on the bottom” (possibly useful in illustration and product design). Simpler transformations, such as colouring a photo pink, were performed with higher accuracy.
Furthermore, DALL·E also displayed, to some extent, geographic and temporal knowledge: you could ask it for a picture of a phone from the ’20s, or an image of food from China. However, it tended to show superficial stereotypes for choices like “food” and “wildlife,” rather than fully representing the real-life diversity of these themes.
There may be several reasons why a top-quality machine learning model, with high accuracy and precision in testing conditions, performs less than ideally in real-life situations. For example, the training and testing data may not match reality - this is known as data mismatch. Now, researchers at Google have brought to light another phenomenon that causes this gap in performance: underspecification.
From Natural Language Processing to disease prediction, underspecification is a common, and arguably one of the most significant, issues in modern machine learning training. Essentially, models with identical architectures trained on the same dataset can end up with small variations - caused, for example, by the random initialization of their weights - yet all pass the testing phase. However, these small differences can make or break the model's performance in the real world. In other words, we may be too lax with the testing criteria for our models, and even we, as human beings, cannot tell in advance which version would perform better.
The key here is to find a way to specify our requirements to models more precisely - essential if we are to trust the beneficial impact AI could have outside the lab.
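A toy illustration of the phenomenon (hypothetical, not Google's experiments): when two features are perfectly correlated during training, many different weightings of them fit equally well, and which one a model leans on depends only on its random initialisation - invisible at test time, decisive at deployment once the correlation breaks.

```python
import random

def train(seed, data, lr=0.1, epochs=200):
    # Linear model fit by SGD; only the random init differs between seeds.
    rng = random.Random(seed)
    w1, w2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
    for _ in range(epochs):
        for (x1, x2), y in data:
            err = w1 * x1 + w2 * x2 - y
            w1 -= lr * err * x1
            w2 -= lr * err * x2
    return w1, w2

def accuracy(w1, w2, data):
    # Treat the task as thresholded classification for a simple score.
    return sum((w1 * x1 + w2 * x2 > 0.5) == (y > 0.5)
               for (x1, x2), y in data) / len(data)

# Training data: x2 is an exact copy of x1, and the label y = x1, so any
# split of weight between w1 and w2 with w1 + w2 = 1 fits perfectly.
train_data = [((x, x), x) for x in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)]
# Deployment data: the redundancy breaks - x2 is stuck at 0.
deploy_data = [((x, 0.0), x) for x in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)]

model_a = train(seed=1, data=train_data)
model_b = train(seed=2, data=train_data)
# Both pass the "testing phase" with identical scores...
print(accuracy(*model_a, train_data), accuracy(*model_b, train_data))
# ...but their weights, and hence their deployment behaviour, differ.
print(model_a, model_b)
```

Both models clear the same test, yet nothing in that test reveals which one will hold up when the spurious correlation disappears - which is exactly the underspecification gap.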
Neuromorphic computing - computing on architecture inspired by the brain rather than traditional von Neumann architecture - promises to usher in a new age of computing, and with it the means to develop improved forms of artificial intelligence.
Many experimental platforms for neuromorphic computing have been developed, but thus far have been limited by their lack of interoperability - no one algorithm can run on all neuromorphic architectures.
Zhang et al. provide a solution to this problem in the form of a generalised system hierarchy, comparable to the current Turing-completeness-based hierarchy but built instead around 'neuromorphic completeness'.
The authors show that this hierarchy can support programming-language portability, hardware completeness and compilation feasibility. They hope that it will increase the efficiency and compatibility of brain-inspired systems, even quickening the pace toward development of artificial general intelligence.
From uncertain to unequivocal, deep learning’s future according to AI pioneer Geoff Hinton
21st November 2020
MIT Technology Review
A decade ago, artificial intelligence was an obscure idea; its revolution truly began only recently. Geoffrey Hinton, one of last year's Turing Award winners for his foundational work in the field, shares his thoughts on how AI and deep learning will develop in the next few years:
On the AI field’s gaps: "There’s going to have to be quite a few conceptual breakthroughs...we also need a massive increase in scale."
On neural networks’ weaknesses: "Neural nets are surprisingly good at dealing with a rather small amount of data, with a huge number of parameters, but people are even better."
On how our brains work: "What’s inside the brain is these big vectors of neural activity."
A common view in AI and data science is that "the larger the dataset, the better the model". In contrast, once a human recognises something, that's it - no need to show them thousands of other pictures of the same object. Children learn in the same way and, more usefully, they don't even need to see an image to recognise an object: if we tell them a persimmon is like an orange tomato, they'd instantly be able to spot one in the future.
What if AI models could do the same? Researchers at the University of Waterloo (Ontario) have come up with a technique, dubbed "less-than-one-shot" (LO-shot) learning, that would allow models to recognize more classes than the number of examples they were trained on. This is achieved by condensing large datasets into much smaller ones, optimised to contain the same amount of information as the original. With time, this could prove to be a groundbreaking technique that saves millions in data acquisition.
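How can two examples teach three classes? A toy version of the idea (hypothetical numbers, not the Waterloo experiments): give each prototype a *soft* label - a distribution over classes - and classify new points by summing those distributions weighted by inverse distance. Two well-placed prototypes sharing some class-2 mass then carve out a third class between them.

```python
# Two prototypes on a line, each with a soft label over THREE classes.
prototypes = [
    (-1.0, [0.6, 0.0, 0.4]),   # mostly class 0, a little class 2
    ( 1.0, [0.0, 0.6, 0.4]),   # mostly class 1, a little class 2
]

def predict(x, eps=1e-6):
    # Sum the prototypes' label distributions, weighted by inverse distance
    # (eps avoids division by zero when x sits exactly on a prototype).
    scores = [0.0, 0.0, 0.0]
    for px, label in prototypes:
        w = 1.0 / (abs(x - px) + eps)
        for k in range(3):
            scores[k] += w * label[k]
    return scores.index(max(scores))

# Near a prototype its dominant class wins; midway between them, the
# shared class-2 mass wins - a third class from only two examples.
print(predict(-1.0), predict(1.0), predict(0.0))
```

At x = 0 both prototypes weigh in equally, so the scores become [0.6, 0.6, 0.8] and class 2 wins, even though no training example was ever labelled purely class 2 - the "less than one shot per class" trick in miniature.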
Defining key terms in AI in 1 page: informative and suitable for non-specialists
27th October 2020
Stanford Institute for Human-Centered Artificial Intelligence
Do you really understand the core concepts of artificial intelligence, in an era when 'AI' is everywhere?
Recently, Christopher Manning - Professor of Computer Science and Linguistics at Stanford University, director of the Stanford Artificial Intelligence Laboratory (SAIL), and Associate Director of HAI - used a single page to define the key terms in AI, expressing the hope that these definitions will help non-specialists understand the field.