An interview with Rumen Dangovski: On AI applications and advancements
Updated: Jul 3
Rumen Dangovski (http://super-ms.mit.edu/rumen.html) is a PhD student at the Physics for AI Research at MIT. His interests include self-supervised learning, meta-learning, and improving machine learning through principles from fundamental science, especially physics.
He previously worked at a start-up company centred on developing light-based computer chips for AI applications, and he kindly agreed to have a short talk with me about his thoughts and experience in the field of AI. Here is our exclusive interview with him, in which he sheds light on some of the most exciting advancements in this rapidly growing field.
A big thank you to Rumen for his time and expertise, and also to Kamen, our Head of Project Teams and ML Officer here at UCL AI Society, for putting me in touch with him!
[Editor’s note: the transcript below has been edited for length and clarity]
First of all, regarding this emerging technology of light-based computer chips: how would you explain this concept to someone with little background knowledge in the field?
As a disclaimer, I worked on this very briefly - I was at Lightelligence (https://www.lightelligence.ai/technology), which is a startup company that works in the field. So to explain: with light, you can encode information in the same way you encode information with digital electronics. Basically, you can think of light as a wave: you can modulate its amplitude and its phase. You can control this in a programmable way using a chip, and choose how to transform that information so that you can perform computations.
My understanding is that people tried to do this back in the ’80s. They wanted to create a general-purpose computer that uses light for computations but found that it was very difficult to beat digital electronics. In the last few years, what was truly revolutionary was that people realized you don’t need to do general computations - instead you can focus on linear transformations, using light very efficiently. That came at the right time, because deep learning is now a rapidly growing field, and most of the computations required for deep learning and neural networks are linear transformations.
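[Editor’s note: to make the “linear transformations” point concrete, here is a minimal NumPy sketch - not tied to any particular photonic chip - showing that a neural-network layer is essentially a matrix multiply (the operation such hardware targets) followed by a cheap pointwise nonlinearity. All names here are illustrative.]

```python
import numpy as np

# A neural-network layer computes y = f(Wx + b): a linear transformation
# (the matrix multiply W @ x, the dominant cost that photonic hardware
# aims to accelerate) followed by a pointwise nonlinearity.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))   # weight matrix: 8 inputs -> 4 outputs
b = np.zeros(4)                   # bias vector
x = rng.standard_normal(8)        # input vector

y = np.maximum(W @ x + b, 0.0)    # linear transform + ReLU nonlinearity
print(y.shape)                    # (4,)
```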
So from my understanding, these chips are specifically designed for AI and deep learning applications?
Yes. You have many different kinds of chips: for example, you have CPUs (Central Processing Units), which have an architecture designed for general-purpose computer applications. But if you know your algorithms, you can design specialised chips: for example, GPUs (Graphics Processing Units) were initially designed for games but were found to perform linear transformations very rapidly, so you can repurpose them for AI applications. Recently, Google created its own chip, the TPU (Tensor Processing Unit), specifically for tensor transformations. These are very efficient - they call them “AI accelerators”. You can think of the chips that use light for computations as competitors of GPUs and TPUs: essentially, people are trying to create the next generation of chips for AI applications. And there are certain benefits to them: for example, orders-of-magnitude improvements in speed and lower energy consumption.
Super interesting! And how close are we to being able to implement these in real life?
*laughs* I’m not an expert so I don’t know what the current state is, but I wouldn’t be surprised if in a couple of years we would be able to start using such chips and commercialize them.
So do you think big data centres are a potential field of application for these chips, given the lower energy consumption?
Yes. Also, people are creating bigger and bigger models, like GPT-3, which have billions of parameters. If you want to run a single task with these neural networks, you need very complicated and costly computations. So making them more efficient is what we are striving to do with these chips.
So these light-based chips are definitely relevant in today’s context. On a broader note, are there any other advancements or new technologies in AI that particularly interest you?
More generally, I’m very interested in self-supervised learning. I’m very new to this field but very excited about it. To me, what’s really interesting is that the original paradigm for training a neural network - and its most successful applications - assumed a labelled dataset, so you could perform supervised training, using the labels to tune the parameters of the network. For example, you have the ImageNet dataset, a well-curated dataset with more than a million labelled images, which you can use to train a neural network and achieve superhuman performance. However, labelling data is costly. On the other hand, we have an abundance of unlabelled data out there that we can get from the Internet, for instance. The question is: can we use that data to build better applications, even though we don’t have the labels?
Some people call this unsupervised learning, but I’m specifically excited about self-supervised learning, where you come up with tasks that don’t require human annotation, and those tasks are useful for training a neural network to extract meaningful features from the data we have. We’ve seen applications of this in language, with models like BERT or GPT-3. In the case of BERT, the task is to mask portions of the text and try to predict them from the context.
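[Editor’s note: here is a toy sketch of the masking task described above. The sentence and mask positions are illustrative choices, not BERT’s actual implementation, which samples roughly 15% of positions at random.]

```python
# Toy illustration of BERT-style masked-language-model training data:
# hide some tokens, and the model must predict them from the context.
tokens = "the quick brown fox jumps over the lazy dog".split()
mask_positions = {3, 8}   # illustrative; BERT samples ~15% of positions at random

masked = ["[MASK]" if i in mask_positions else t for i, t in enumerate(tokens)]
targets = {i: tokens[i] for i in mask_positions}

print(masked)   # the input the model sees, with [MASK] at positions 3 and 8
print(targets)  # {3: 'fox', 8: 'dog'} - the labels the model must predict
```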
Recently, these advances have been working very well for computer vision too. There is something called contrastive learning: for example, if I look at a pencil, I know it is the same pencil no matter which point of view I am looking at it from. Even if I don’t know it is a pencil, I can still learn useful things: I can tell my neural network to assign the same representation to the object, no matter what the view of this pencil is. I can also obtain two versions of an image by taking the image and augmenting it - rotating it, changing the colour scale, or cropping and resizing.
So now I have an image and its transformation, but I also have their representations from my neural network. What I can say is that although I might not know what’s in this image, I know these two versions come from the same image, so I want their representations to be close to each other. And if I have two different objects, in contrast, I want their representations from the neural network to be different from each other.
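[Editor’s note: the idea above can be sketched as a contrastive (InfoNCE-style) loss over pairs of representations. This is an illustrative NumPy version assuming we already have normalised representations of two augmented views per image - not the exact loss from any specific paper.]

```python
import numpy as np

def contrastive_loss(z1, z2, temperature=0.5):
    """z1[i] and z2[i] are representations of two augmented views of image i.
    For each anchor z1[i], the positive is z2[i]; every other z2[j] is a negative."""
    sim = z1 @ z2.T / temperature                                  # pairwise similarities
    log_probs = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))  # log-softmax per row
    return -np.mean(np.diag(log_probs))   # pull positives together, push negatives apart

rng = np.random.default_rng(0)
z = rng.standard_normal((8, 16))                      # 8 "images", 16-dim features
z /= np.linalg.norm(z, axis=1, keepdims=True)         # L2-normalise
noisy = z + 0.05 * rng.standard_normal(z.shape)       # stand-in for an augmented view
noisy /= np.linalg.norm(noisy, axis=1, keepdims=True)

# Matched views of the same image give a lower loss than mismatched (shuffled) views.
print(contrastive_loss(z, noisy))
print(contrastive_loss(z, np.roll(noisy, 1, axis=0)))
```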
If you’re interested, there is this model named CLIP, by OpenAI for example, that brings together images and captions [Editor’s Note: https://openai.com/blog/clip/].
We began by talking about hardware, for example with the light chips, and then we moved on to more of the software side. I was wondering what you thought about those two aspects: do people overlook one side thinking one is more important than the other? Do you believe they should be on equal footing?
There is this concept called co-design, where you co-design software and hardware together. I think this is especially important for AI, because if you want to create a new algorithm, or construct a new model, then you need to think about your hardware. And when you are creating new hardware, you have to think about what kinds of algorithms you can run on it. So instead of thinking about whether one is more important than the other, we should think of it in a co-design fashion.
We see this type of co-design in TPUs, with light chips by Lightelligence and Lightmatter… Now, for AI, we need to think about how to create hardware that is useful for our algorithms but also what kind of algorithms we can run on this hardware - it’s a symbiotic relationship.
Finally, would you mind sharing a little about your educational background, your career path up till now and how you ended up in your position?
Yeah! When I began undergraduate education, I was mainly doing pure mathematics. Then I discovered that physics is super cool. *laughs* And I really enjoyed the people in the department and decided to continue. I think machine learning is very exciting because you can apply concepts from fundamental science to build applications. So the general direction of my work is to take fundamental concepts from maths and physics and use them to build cool, improved AI applications. I like building from first principles, but I also care about solving problems that are useful to the community, not only to the scientific world. I want a lot of people to benefit from the contributions that we make, and that is what led me to research in this direction. [Editor’s note: more on his work here https://www.researchgate.net/profile/Rumen-Dangovski ]
I’ve had several other projects. For example, I was designing new types of recurrent neural networks that can process information with long-term memory and associative memory, which were useful for applications in text summarization. In NLP (Natural Language Processing), I recently had a project where we cared more about the application side of summarization, using data collected from ScienceDaily (a media outlet that publishes press releases about research). We created a dataset of scientific papers and their corresponding press releases on ScienceDaily, and we wanted to see if we could create a neural network that generates press releases from scientific work, framing that problem as summarization - abstractive summarization.
The field is advancing so fast, I think it’s such an exciting place to be in right now. Thank you so much for taking the time to answer all of my questions!
Rumen also provided some great resources if you are interested in further expanding your knowledge in the field!
Slides on contrastive learning: https://github.com/AI-Club-at-MIT/Reading-Groups (dated 4/7/2020).
Self-supervised learning: https://ai.facebook.com/blog/self-supervised-learning-the-dark-matter-of-intelligence/