Explore the past, present, and future of artificial intelligence in episode 16 of Hidden Layers, where Ron chats with Dr. Raymond Mooney, a luminary in the AI field.
Delve into Dr. Mooney's vast experience and witness his firsthand account of AI’s transformative journey. From the rule-based systems of the '80s to the groundbreaking developments in machine learning and natural language processing, Dr. Mooney offers invaluable insights into the evolution of AI technologies.
This episode not only traces significant milestones but also contemplates the ethical implications and future potential of AI. Tune in for an enlightening conversation that will deepen your understanding of AI's impact on society and inspire curiosity about what lies ahead.
Ron Green: Welcome to Hidden Layers, where we explore the people and the tech behind artificial intelligence. I'm your host, Ron Green, and I'm delighted and honored to have Dr. Raymond Mooney joining me today. Dr. Mooney is a luminary within the field of artificial intelligence. He's been at the forefront of its evolution, where he's witnessed and contributed to some of its most significant transformations. Today, we're gonna discuss some of the enormous changes he's witnessed within the field over his career, reflect on the shifting challenges of the field, and discuss the future trajectory of AI in research, education, and its impact on society. Dr. Raymond J. Mooney is a professor in the Department of Computer Science at the University of Texas at Austin. He received his PhD in 1988 from the University of Illinois at Urbana-Champaign. He's the author of over 200 published research papers, primarily in the areas of machine learning and natural language processing. He was the president of the International Machine Learning Society from 2008 to 2011, program co-chair of the Association for the Advancement of Artificial Intelligence, general chair of the Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, and co-chair of the International Conference on Machine Learning. He's a fellow of the AAAI, ACM, and ACL, and his work has been recognized with multiple prestigious awards, including a classic paper award and several best paper awards at leading conferences across artificial intelligence, machine learning, and computational linguistics. Ray, thank you so much for joining us today. I'm really excited about this conversation.
Dr. Raymond Mooney: Okay, well, it's great to be here to talk about the amazing changes that have been happening in AI lately.
Ron Green: Yeah, it's really remarkable. Let's start off. Let's go all the way back to the beginning. I know that you became interested in artificial intelligence back when you were a senior in high school. It was a very small field back then. What was it that attracted you to AI in the first place?
Dr. Raymond Mooney: Yeah, so I think like a lot of nerdy kids, I was into a lot of science fiction, both reading it and watching it on TV. I always mention Lost in Space, right, for those who remember that from the '60s. There was Will Robinson, and he had his robot, who was like, "Warning, warning, Will Robinson, aliens approaching." So I was a sci-fi kid, and I always wanted a robot, so it was a natural attraction for me. I read a lot of science fiction: Arthur C. Clarke, Asimov, Heinlein, and Herbert, of course. Dune, I'm waiting to see Dune 2, right?
Ron Green: Oh, I just saw it. Fantastic.
Dr. Raymond Mooney: Oh, okay. And so, yeah, I was a sci-fi geek kid, so it was just obviously a topic that attracted me. And I actually took a programming class senior year of high school, which, I guess at that point, 1979, wasn't that common. It was on a teletype with paper tape, if you have any idea what that means. And you had a dial-up modem, you know. And that's where I started.
Ron Green: Okay. That's fantastic. The field has really evolved over your tenure within it. And part of the reason I'm so excited to chat with you is that I don't often get the honor of talking with people who've worked in AI even longer than I have. So what are some of the biggest changes when you look back? When you think back about the '80s and '90s, what were the biggest changes over those couple of decades that you remember either being excited about or thinking were maybe movements in the wrong direction?
Dr. Raymond Mooney: Yeah, so again, AI has gone through a lot of different eras in its history. The era I grew up in is sometimes called the rule-based, or logic-based, or expert-systems era of the 1980s, where the way you built AI systems was that human engineers sat down and tried to write rules in a logical language that described the knowledge of the domain. Say you're doing medical diagnosis: the AI expert would talk to a doctor and say, okay, what symptoms indicate this disease? Then you'd type in some logical rule establishing that, and you'd run an inference engine that would use those rules to make conclusions about, say, a particular patient or a particular case. But then people started thinking, well, it's really a pain to have to hand-craft these rules; can we actually learn those rules from data? That was the original origin of machine learning, at least from my perspective: how can we build algorithms that look at data and learn the rules behind it automatically, rather than having the human, the so-called knowledge engineer, type in that information? So that was the era I was brought up in. And then, while I was in grad school, neural nets became a big thing. I think we talked before about how neural nets have a very long history; they really started in the 1950s with Rosenblatt's perceptron algorithm from the mid-to-late '50s. But then they went into sort of hibernation, I guess you would call it, when the rule-based paradigm came up, because Minsky and Papert wrote this book in the late '60s saying the perceptron is very limited, it can't do very many things. And they promoted, Minsky, of course, was one of the big ones, along with John McCarthy, the logic-based, symbolic approach to AI that was dominant when I started in the field. But neural nets made a resurgence in the late '80s with the rise of this approach called backpropagation, where you could actually train a three-layer network, two layers of weights, with this algorithm. And it was demonstrating successful performance on a number of tasks. So I would call that actually the second era of neural networks.

Ron Green: I totally agree.

Dr. Raymond Mooney: And that was dominant when I was transitioning from graduate school into being a faculty member here at the University of Texas. Then the limitations of those simple three-layer networks became apparent, and it was hard to train deeper networks. Everyone had realized, even back in the '60s, the Perceptrons book talked about how it'd be great if we could train really, really deep neural networks, but we had neither the algorithms nor the data needed to do that. So the three-layer-network era sort of plateaued, and then other techniques took over, like Bayesian reasoning and probabilistic methods, and so-called kernel methods and support vector machines. We're getting our way through the '90s at that point, and that was dominant for a while, until our recent era of deep learning, where people just started trying to train deeper networks, and some new algorithms came along to help learn deeper networks. And also, the biggest change: lots of data. Thanks to the internet, we now had tons of data out there that we could train these models on.

And now it became possible, and successful, to train much deeper networks, and that led to our current, quote, deep learning era. Do I think there were ever mistakes along this whole route? I wouldn't say so. The field has tried different ideas over time, and each idea has its own strengths and weaknesses. I think that's true of even our current deep network models today. But the field has been making constant progress in a lot of ways over that entire period, even though it's been a bumpy, very multi-paradigm road.
Ron Green: When did you kind of shift your focus away from rule-based systems? You probably started exploring machine learning a little bit. When did you feel like you were going to move really heavily towards machine learning and away from rule-based systems?
Dr. Raymond Mooney: Yeah, so I really started as a machine learning person. My thesis advisor, Gerald DeJong at the University of Illinois, had done hand-built language processing systems. He was building these things called scripts, which were sort of knowledge structures for events. The idea came from Roger Schank, his thesis advisor, who got his PhD here at the University of Texas in the late '60s in linguistics and was a very early, influential figure in the area of natural language processing. His students in the '70s built these systems with hand-built knowledge bases. And Jerry in particular, my thesis advisor, built a system using scripts, which are sort of sequences of events. The classic example Schank always used was the restaurant. How do I understand a text about a restaurant? You get a story like: John went into the restaurant. He ordered a steak. He left a tip and left. And then you ask the question, what did John eat? It's like, I never mentioned that he ate anything. I said he ordered a steak, and then he left a tip, and he left. But if you know what happens in a restaurant, you know: why do people go to restaurants? They don't just order stuff and then leave a tip. Of course he ate it, because we know that as humans. But the computer doesn't know that, right? So you had to hand-code this so-called script, which says, here's how a restaurant works. You go into it. You get the menu. You talk to the waitress. You order something. They bring it. You eat it. You pay. It has all this knowledge about what happens in particular events. So for his thesis at Yale University, my advisor had handwritten these scripts with knowledge about all these events. And he actually connected it to the UPI newswire and could understand news stories, interpreting each with its sequence of events. He'd have, say, an earthquake script: here's what happens in an earthquake. Then it could read an earthquake story, understand it better, and fill in the gaps that weren't explicitly mentioned, which is what a person does when they read something. But he was tired of building these scripts by hand. So when he came to the University of Illinois as a faculty member, he said, I really want a machine to learn these scripts automatically from data. He took me on as a new grad student with this vision, and I tried to execute on it. We worked on this approach called explanation-based learning, where we actually learned from just one example. I built my thesis system, which learned a script, or so-called schema, for kidnapping by reading one story about kidnapping that was very detailed, and it then understood how kidnapping worked and how the actions the people were taking helped them achieve their goals. It would learn a script from one example by, so to speak, explaining that example and generalizing it into a general knowledge structure that it could use to understand other stories. So I really came into the field, thanks to my thesis advisor and his experience, thinking the right way to build language systems is to learn that knowledge, because hand-coding it is just too onerous and impossible to scale.
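To make the idea concrete, here is a minimal sketch of the kind of script data structure and gap-filling inference Dr. Mooney describes. The event names and the `fill_gaps` helper are hypothetical illustrations, not taken from DeJong's or Schank's actual systems:

```python
# A hypothetical, minimal Schank-style "script": an ordered list of the
# events expected in a stereotypical situation. All names are made up here.
RESTAURANT_SCRIPT = [
    "enter", "get_menu", "order", "food_arrives", "eat", "pay_tip", "leave",
]

def fill_gaps(observed):
    """Given the events a story explicitly mentions, infer the unmentioned
    script events that must have happened in between."""
    positions = [RESTAURANT_SCRIPT.index(e) for e in observed]
    span = RESTAURANT_SCRIPT[min(positions):max(positions) + 1]
    return [e for e in span if e not in observed]

# "John went into the restaurant. He ordered a steak. He left a tip and left."
story = ["enter", "order", "pay_tip", "leave"]
print(fill_gaps(story))  # ['get_menu', 'food_arrives', 'eat'] -- so John ate
```

The explanation-based learning work went a step further: rather than hand-coding structures like this, the system induced them from a single explained example.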
Ron Green: Okay, to stick on that topic and the bridging of machine learning and natural language processing: there was a time in the early days when the machine learning community and the natural language processing community were basically in different universes. Machine learning was mostly being applied to things like optimization, and as you mentioned, NLP was principally using rule-based systems. You were instrumental in advocating for the use of machine learning on natural language processing problems, which in hindsight seems very obvious, but at the time was pretty radical. You've already mentioned some of your influences, but what was it that made you confident that machine learning approaches were going to be viable within NLP?
Dr. Raymond Mooney: Yeah, so like you said, I actually worked on my thesis on this in the '80s, so I was sort of born into that idea. It just intuitively appealed to me: hand-coding knowledge, if you've ever sat down and tried to do it yourself, and that's what AI students did in the '80s, is painful. It's laborious. You put one wrong thing in and your system makes stupid mistakes, and then you have to go back and correct it. It's like writing programs, right? There's a process of debugging that knowledge so that it works for the cases you want. It was obvious, I think, that it wouldn't scale. So the idea that machine learning was the right way to build language systems was always intuitive and obvious to me, but again, that's because I grew up with that idea in my academic upbringing. But it really wasn't the symbolic learning systems I was working on that took over. In the late '80s, the use of statistical learning in language actually came originally from speech. In the 1970s, people tried to build systems to understand speech, and it really wasn't working very well. People coming more from signal processing brought in these ideas of stochastic processes, things like hidden Markov models and statistical methods. Now we talk about language models; language models were originally these things called n-gram language models, where you just gather statistics on how likely one word is to follow the previous words. There was this thing called a trigram model, which was like state-of-the-art technology in the '80s, where you say, let's collect a bunch of statistics on how likely each word is to follow any pair of two previous words. Just to do trigrams, three-word combinations, you needed quite a bit of data to get good statistics, much less generalizing to 4-grams or 5-grams. It's the so-called curse of dimensionality, right? The amount of data you need grows exponentially as you grow that context. You need so much data just to do trigrams that going to an even bigger context was sort of infeasible. So those statistical learning techniques that started in speech began to come into NLP, starting particularly with Ken Church, who did this work on part-of-speech tagging, a classic NLP problem where you just want to recognize, for each word, what the part of speech is. Is it a noun? Is it a verb? Is it a preposition? Is it an adverb? That actually turns out to be sort of tricky, because language is ambiguous: the same word could be a noun or a verb depending on the context. So people started developing statistical techniques for that, like hidden Markov models. Then it went to probabilistic context-free grammars, which took that to the next level, making a more complicated model of language, moving beyond a finite-state machine to a context-free grammar, for those of you who know those concepts from basic computer science. So statistical learning techniques became very popular in the '90s in natural language processing. A very famous case was IBM's system called Candide, which did machine translation by training on huge amounts of so-called parallel text.

And they did this between French and English, because there happened to be a lot of publicly available electronic data between French and English, thanks to the Canadian Parliament proceedings: Canada, by law, had to have all their legal documents in both English and French. This was all publicly available, and publicly available electronic text in the '90s was not easy to come by. So the Canadian government happened to have this nice available resource, a lot of text in both English and French. That corpus, the Hansards, as it's called, was used to train the first statistical machine translation system at IBM in the early '90s. And it was remarkably successful. People had been trying to build hand-built, rule-based machine translation systems for decades, starting from the '50s, and they never really worked that well. But if you just trained on a bunch of parallel text and built a statistical model that recognized the statistical patterns connecting English and French, it worked remarkably well. And that started this whole so-called statistical revolution in NLP, where all those statistical learning techniques took over. I was a little bit along for the ride there, because I was coming from a slightly different angle, more from a symbolic AI background. But then I said, well, these statistical learning techniques are powerful too. And so I drifted more into statistical learning in the '90s.
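For readers who want to see the trigram idea concretely, here is a minimal sketch of the kind of n-gram language model Dr. Mooney describes, counting how often each word follows each pair of preceding words. The corpus and function names are illustrative, not from any historical system:

```python
from collections import Counter, defaultdict

def train_trigrams(tokens):
    """Count how often each word follows each pair of preceding words."""
    counts = defaultdict(Counter)
    for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
        counts[(w1, w2)][w3] += 1
    return counts

def prob(counts, w1, w2, w3):
    """Maximum-likelihood estimate of P(w3 | w1, w2)."""
    following = counts[(w1, w2)]
    total = sum(following.values())
    return following[w3] / total if total else 0.0

corpus = "john went to the store and mary went to the bank".split()
counts = train_trigrams(corpus)
print(prob(counts, "went", "to", "the"))  # 1.0 -- "the" always follows "went to"
# The curse of dimensionality he mentions: the possible (w1, w2) contexts grow
# with the square of the vocabulary, so reliable counts demand huge corpora,
# and 4-grams or 5-grams demand vastly more still.
```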
Ron Green: Right. It reminds me of that old joke, something to the effect of: they were trying to build some NLP system, and the program manager said, every time I fire a linguist, our performance goes up.
Dr. Raymond Mooney: Yeah, that's Fred Jelinek.
Ron Green: He's great.
Dr. Raymond Mooney: He's famous for that comment.
Ron Green: Love that, I love that one.
Dr. Raymond Mooney: Someone later wrote a paper: every time I hire a linguist, my performance goes up.
Ron Green: I know. I don't mean to insult the linguists out there. Okay, that's a perfect segue. I want to talk about the state of NLP now, because we've seen some radical changes, in what I would argue is the most change-rich period of AI within our lifetimes. We've got things like large language models now, which are, in my opinion, performing what would have been miracles by our perceptions back in the '90s. And they're really based on what is a fairly simple task, right? Token distribution prediction. But falling out of that, we get some pretty amazing emergent behavior. Are you surprised by the performance we're seeing in large language models right now?
Dr. Raymond Mooney: I mean, I think most people in the field, particularly those who came out of the more traditional NLP community, are quite surprised at how well things like large language models based on transformers are working. And I think a lot of that is data. When people ask me why deep learning is successful, I always say the first reason is data, data, data. Even in the early '90s, I did experiments comparing symbolic learning techniques to neural net learning techniques and found the algorithm really didn't matter that much. What mattered was how much data you trained these models on. So I was always a believer that data matters more than anything else. And now, thanks to the internet, we have just incredible amounts of data that these large tech companies can download off the internet and use to train these, certainly by my standards, humongous models. If you train a very large neural network on an incredible amount of data, it is amazing what it can do. And I have been seriously surprised by the amount of progress in the field in the last couple of years. Just to temper that a bit, though: these models can still make dumb mistakes. One of the lines I use now when I talk about large language models, which I came up with a couple of years ago, is: sometimes large language models surprise you with how smart they are, but sometimes they surprise you with how stupid they are, because every once in a while, they'll just make the stupidest mistake and say the stupidest thing. These statistical patterns work amazingly well in a lot of cases, but sometimes you realize they're not really capturing the deep knowledge and understanding of the text, or the area, that humans have. So they still have serious limitations. Those limitations are becoming less and less prominent, but they're still there.
Ron Green: Do you think that we have pushed this approach basically as far as we can go, or do you think there's a lot of opportunity still to either increase the size of the models or add more data and get better performance?
Dr. Raymond Mooney: So, data. I don't know where we get more data. My understanding is these companies are basically scraping up every bit of publicly available data on the internet, so I don't see where more comes from. One line people use is that it's all about scale, and someone even published a paper saying scale is all you need. There's this meme, because the original transformer paper was called "Attention Is All You Need," of titling papers "something is all you need." But I don't believe scale is all you need. Scale is very important; I've been very clear that data is very important. Another line I always taught my students is there's no data like more data: the more data you can get to train your model, that's always a good thing. But now it seems like we've maxed out on data. And as we were discussing, transformers are an interesting model, and this idea of self-attention has been amazingly powerful, but they're fundamentally limited because they have a fixed-length input window. Any language person is brought up with the idea from Noam Chomsky that language is generative: you can't constrain the length of a sentence, human sentences can go on forever, because you can constantly add new prepositional phrases and keep increasing the length. So to a computational linguist, having a fixed window you can process seems just crazily limiting. The amazing thing is we've been able to grow these windows; I think the latest Google Gemini models are at something like a million tokens, which is shocking to me, that you can get a model to work at that length. But fundamentally, a fixed-length window is bad. The other problem with transformers is that their cost grows as n squared in that window length, because they look at the connection between every possible pair of tokens within the window. Order n squared, as any computer scientist knows, is not exponential growth, which is the thing any computer scientist really hates, but when you're scaling to large amounts of data, it's still too bad, because n squared grows too fast. You want something linear, something whose computational demands grow with the size of the data; anything n squared quickly becomes intractable. So I think it's hard to scale transformers up to more data and more context at this point; I've been amazed it's grown as much as it has. But as we were talking about, there are new models out there, like state-space models such as Mamba, that I think hold the promise of taking us to the next level. I think we're sort of plateauing in the transformer era, though I always hate to predict plateauing, because I've done it before: when we were in the LSTM era and seemed to plateau, transformers came out and took us to the next level. But I'm hopeful that the next generation of models, if we train them on this much data, will be even better. So with these mistakes the models are making, I think there's room to still improve them further, and I think we need new neural architectures to take us to the next level.
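The n-squared point is easy to see in code. Here is a minimal sketch, a single unparameterized attention pass in NumPy that omits the learned query/key/value projections of a real transformer, showing that self-attention materializes a score for every pair of tokens in the window:

```python
import numpy as np

def toy_self_attention(X):
    """X: (n, d) token embeddings. One simplified attention pass, with no
    learned projections -- just the pairwise-score structure."""
    n, d = X.shape
    scores = X @ X.T / np.sqrt(d)                   # (n, n): n^2 pairwise scores
    scores -= scores.max(axis=-1, keepdims=True)    # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ X                              # (n, d) attended output

out = toy_self_attention(np.random.randn(8, 4))     # fine at toy scale

# The quadratic wall: 10x more tokens means 100x more pairwise scores.
for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} tokens -> {n * n:>18,} pairwise scores")
```

State-space models like Mamba avoid materializing that pairwise matrix, which is why they can scale closer to linearly in sequence length.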
Ron Green: Yeah, I totally agree with that. We talked about this a little bit, but I too share your enthusiasm around the state-space models. And we talked a little bit about some really recent developments: there's a new architecture called Jamba that just came out that basically alternates attention blocks with state-space blocks, and that could be a way to get around this n-squared growth we're seeing with fixed-length context windows. Do you have any thoughts on intelligence, like actual intelligence, being exhibited within large language models? I know there are some people who think that if you're working with the latest and greatest, let's say, for example, GPT-4, it is, by any metric that might have been defined ten years ago, exhibiting intelligence. And there are other people on the "stochastic parrot" side. Where do you fall on that spectrum?
Dr. Raymond Mooney: Yeah. So, I mean, I think people sometimes tend to view intelligence as some sort of binary thing, where either you have it or you don't. To me, it's a continuum. I think John McCarthy started this idea that even a thermostat is intelligent to some extent. It senses information in the environment and reacts and takes actions based on that information. That is intelligence, just extremely limited: all it can sense is temperature, and all it can do is turn on or off. So I think a thermostat is epsilon intelligent. And like I said, I think in AI we've been making progress all along. Have we made dramatic progress in the last couple of years and increased the intelligence of these models remarkably? Absolutely. But I think I've been clear that we're not at the so-called Turing-test level, where I would say systems are as intelligent as people. Still, how can you deny that these systems are intelligent at some reasonable level when they can do the amazing set of tasks we've seen? Just always be aware that the next thing they do might be incredibly stupid.
Ron Green: Right. Not unlike a human sometimes.
Dr. Raymond Mooney: Yeah, I mean, people make mistakes also, but I think the types of mistakes that models make and the types of mistakes that people make are currently very different. I could go into a long story about work we did on why-question answering, answering why questions about text, where we take stories and have models answer why questions, like, why did John go into the restaurant? Oh, he was hungry and he needed to eat. We compared human answers to machine answers from one of these large language models, answering questions about why people do things in stories, and we had humans rate those answers using what's called a Likert scale: five points, from really bad to really good. We eventually found that the best model we could construct and test was getting an average Likert score similar to a human's. But then we looked at the breakdown of what those answers looked like, and we found a very different distribution. When the model was right, it was really excellent. It produced these beautiful answers; it got more fives, more top scores from the human raters, than the humans did. On the other hand, it got a number of ones and twos, and the humans never got ones and twos. Sometimes the machine gives just really dumb answers that a person would never give. So it's not making the same types of mistakes as humans: it makes much dumber mistakes than a human does when it makes a mistake, but when it gets it right, it's even better than a human. It's superhuman. So when you look at the average, it obscures that range from being very, very smart to being very, very stupid.
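A quick illustration of how an average can hide that difference. These scores are made up for illustration, not taken from the actual study:

```python
# Made-up Likert ratings (1-5) showing two very different quality profiles
# that nonetheless share the same mean.
human_scores = [4, 4, 3, 4, 4, 3, 4, 4]   # steady, never awful
model_scores = [5, 5, 5, 1, 5, 5, 2, 2]   # brilliant or terrible

mean = lambda xs: sum(xs) / len(xs)
print(mean(human_scores), mean(model_scores))  # 3.75 vs 3.75: identical means
print(min(human_scores), min(model_scores))    # 3 vs 1: very different floors
```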
Ron Green: Well, I completely share your perspective on intelligence. I don't think it's black and white or binary; I think it's entirely a gradient. And I do think that these large language models are exhibiting intelligence, not at the human level yet. The biggest open question to me on that front is how far we can push this. Maybe we've run out of data; maybe if we increase parameter count, we'll see diminishing returns. But I'm really, really interested to see just how far we can push it, and just how far we can mimic human intelligence with such a simple underlying training task. Okay, I want to totally shift focus now. You gave a presentation on the new era of "big science" AI, where you talked about how, with the massive datasets being used to train these state-of-the-art models and the enormous hardware costs that go into them, it's very similar to the shift we've seen in other fields. Look at astronomy: you once could build a telescope and grind lenses on your own, and now the James Webb Space Telescope cost, I think, $10 billion and took a decade or two to build. We saw the same thing with particle colliders. Within AI now, most of the really, really bleeding-edge stuff is being done by private corporations, not government-funded institutions. I'd love to hear your thoughts on that situation.
Dr. Raymond Mooney: Yeah, so like you said, I gave a talk on this at my own department at the University of Texas last January, almost a year and a half ago now, when I sensed this was really becoming an issue. It's really caused academic research to sort of reorient itself, because you can't do what the big tech companies are doing and build these extremely large models like ChatGPT or GPT-4. So to a certain extent, a lot of the academic work has gone more into, I've heard it called, analysis instead of synthesis. Instead of trying to build these models, more academic research has been diving in on understanding the large models the big tech companies have built. There are open models too, like LLaMA, thanks to Facebook actually making some of their models public. Those are even easier to analyze, because they're more white-box: you can dive in and actually look at the underlying model, compared to a total black box. One joke I always make is that OpenAI should be relabeled ClosedAI, because their models are very closed; they're not open. But like I said, a lot of academic research has gone toward what we did with the why-question answering work: exploring what the limits of these models are, because they do have limits. And it is a useful academic enterprise to understand, okay, where are we right now in AI. It's hard for a small academic team at a university to build new models with the capabilities of the large models built by the large companies, but there's still a lot of useful work that can be done at universities analyzing and understanding these models better, and making clear that they do have limitations and what those limitations are.
Ron Green: Mm-hmm. How do you, as somebody who is actively teaching artificial intelligence in academia, keep your curriculum up to date? Things are moving so quickly. Practitioners like us who are building production systems are struggling to stay on top of the papers coming out of the fire hose. What is it like trying to teach in a field moving this fast?
Dr. Raymond Mooney: Yeah, so it is very difficult. You have to constantly update your classes to keep things current. As we were discussing, I teach a graduate research seminar on what I call grounded natural language processing, which is connecting language to action and perception in the world, connecting language to vision. The big thing these days, there's even an acronym for it, is VLMs, vision-language models, where you have a model, GPT-4V from OpenAI is out there now, that you can give both images and language, and it can process both. So that's an area I'm interested in, and it's moving incredibly rapidly. In a graduate seminar class, a lot of the time you read academic papers that have come out recently, and this last year I had to replace at least a third of the papers from the previous year because things had moved so rapidly. I basically felt at least a third of what I covered last year was no longer relevant; I had to replace it with newer stuff. And every day I get all these notifications from Google Scholar about papers I should be reading. I just glance down the list and say, well, that looks interesting, send that to one of my grad students to read, because I don't have time to do it.
Ron Green: Oh, that's a great hack. I like that.
Dr. Raymond Mooney: Yeah, the joy of being an advisor and having grad students to do most of the work for you. But yeah, it is incredibly hard to keep up. And you also get worried about getting scooped, because you think you have a new idea. Okay, you do some literature search, you look on Google Scholar. Ah, no one's done this. You start working on it. Then a month later, your grad student, who you thought would have results, comes in and goes, oh God, three papers just came out on arXiv doing exactly what we were trying to do, getting better results than we're getting. We're screwed.
Ron Green: Yeah. One of my favorite things about the state we're in right now is there's so much unexplored territory. There are so many opportunities to take even relatively simple ideas, combine them together, and see what you can get out of them. Well, what about the future? Is there anything in particular that you're excited about? You mentioned large vision models and these sort of multimodal models. Is that where your interest lies right now, or are there other areas you're excited about?
Dr. Raymond Mooney: So I definitely am still very interested in multimodal models, not just vision and language, but also audio. And now there are a lot of models with action: I forget the acronym people use, but vision, language, and action models that include things like robot actions. So I think there's lots of capability there. Video is another big thing that I think still has a lot of room for progress, because of the computational demands of processing video; you get an image something like 30 times a second, so it's an incredible amount of data to process. So I think there's lots of room for progress in multimodal models, and that's certainly one very interesting direction with a lot of promise for the near future.
Ron Green: I totally agree. Do you think the fact that the transformer architecture in particular has made it easy for different fields, whether you're doing computer vision or speech recognition or NLP, the fact that those disparate areas have converged on a common architecture, do you think that's part of the acceleration and part of what's making multimodal possible?
Dr. Raymond Mooney: Yeah, absolutely. We were talking about my career path, and like I said, originally language and learning were very different fields, and then they merged in the '90s. Well, for the longest time, language and vision were two completely different fields. I happened to have had exposure to both in grad school; I took a graduate class in computer vision and a graduate class in natural language, and even in the '80s, not many people did that. So I always had a little bit broader perspective than a lot of people who were just in language or just in vision, and I always thought these things should be brought together. But it was hard, because the two fields were using completely different techniques, and there was almost no connection. Now, one of the things I think is great about the deep learning era is that all the fields of AI are using similar, if not the same, tools: transformers, RNNs, deep networks. It allows much more fluid communication of ideas, and you can combine multimodal data in the same model to fuse those things. I think that has been one of the amazing developments. And of course, another application that's become big is generating images and video; we have DALL-E and Sora and these sorts of things. And again, they're not that great if you really dig into them; they really can't generate consistent long videos, say. But there's room for a lot of progress there.
Ron Green: Yeah, and I think Sora is generating up to a minute of video, which is a pretty significant jump, but even in some of the best videos you see there, there are glitches, weird things. Still, it's moving at a breakneck pace. You've been working within the field of artificial intelligence for decades, and I remember in the early 2000s, after I had completed my graduate degree in AI, I was talking to somebody, it was probably 2005, and I mentioned what I'd done in grad school, and their reaction was literally like, ooh, artificial intelligence. What a bad move. What a waste of time. Was there ever a period during your career, as AI waxed and waned in popularity and research funding, where you felt like maybe you had made a mistake, that maybe it was too far away in the future to ever be effective in your lifetime?
Dr. Raymond Mooney: No, no, not since 1979. I mean, I guess sometimes I feel like I've been lucky. I gave a talk at the natural language conference in Singapore last December about my career and things like that, and someone asked me about this sort of issue. And I was like, no, I got excited about AI when I was in high school, and I've never questioned it. I've always realized it's going to be an important thing that's going to change the world, and I've never questioned that despite all the ups and downs in the field. And I think there are a lot of people like me from my generation. I went to AAAI in Vancouver just a couple of months ago and met people I first met 40 years ago at AAAI when I first came, right? We've all been in the field all that time, optimistic that it was eventually going to really change the world. And now we're at a point where it seriously is. One thing we didn't talk about is all the negative side effects. We had some previous conversation about fake news, but we didn't really get into all the possible negative impacts, which I think is a serious, interesting topic that, of course, a lot of people are talking about.
Ron Green: What are the things that you're most concerned about? Is it principally disinformation, or other areas?
Dr. Raymond Mooney: I mean, particularly given this election year, right? I think it's something we all have to be concerned about. It was bad enough the last couple of times, and now we have much more powerful AI technology that can create amazingly convincing fake news, personalized to individual people, that could really have a negative impact. Our ability to identify whether something is automatically generated, large-language-model-generated text or not is already a complicated problem. So I would say, for the immediate future, misinformation generation, fake pictures and video, deepfakes, all this stuff would probably be my number one concern. And we need better techniques for controlling that.
Ron Green: Ray, I've enjoyed this conversation so much. We like to wrap up by asking our guests: if you could have AI automate anything in your daily life, just make something in your daily life better, what would you go for?
Dr. Raymond Mooney: That's a good question. I guess, like you said, I get all these notices for all these papers, and there is technology out there that can do summarization and things of that sort. Some people have looked at personalized summarization, where it actually summarizes a bunch of documents just for you, rather than for a general audience. So I think something like that. The technology is maybe getting close to the point where, instead of me having to scan through all these things, I could just get a nice summarized report: here's what happened in AI today that's important for you, that I think you'd really like to know about based on your interests and your expertise. That's one thing that could maybe make my life a little bit easier.
Ron Green: Yeah, I could go for that too. Well, Ray, thank you so much. This was an absolute pleasure and an honor.
Dr. Raymond Mooney: Okay, well, it was nice talking to you.