
Night Science
Where do ideas come from? In each episode, scientists Itai Yanai and Martin Lercher explore science's creative side with a leading colleague. New episodes come out every second Monday.
8 | Yana Bromberg on getting creative with machine learning
Yana Bromberg is a Professor at Rutgers, where she teaches computers to speak the functional language of biological sequences. In this episode, she talks with Itai and Martin about the amazing creativity of machine learning, the search for weirdness, and her superpower of translating things from one field to another.
Her work is being recognized from virtually all sides, including NASA and NIH. She has received a CAREER award from the National Science Foundation. Yana asks deep fundamental questions whose answers are very important for improving our health, preserving our environment, and, as she writes on her website, also to figure out if “Well… did we really start as green slime?!”
For more information on Night Science, visit www.night-science.org
Yana: The Night Science is finding something that wasn't initially thought of.
Yana: So just basically wandering into the forest of the data and trying to figure out what's up.
Itai: Welcome to the Night Science Podcast, where we explore the untold story of the scientific creative process.
Martin: We are your hosts.
Itai: I'm Itai Yanai.
Martin: And I am Martin Lercher.
Martin: Yana Bromberg earned her Ph.D. in biomedical informatics from Columbia University.
Martin: Today, she is a professor at the Department of Biochemistry and Microbiology at Rutgers University.
Martin: And her work is being recognized from virtually all sides, including NASA and NIH.
Martin: And she has received, for example, a CAREER award from the National Science Foundation.
Itai: And Yana researches how it is that we can decipher DNA, which is a kind of blueprint for all of life.
Itai: To do this, she develops computational methods to understand where the machinery came from and how it works.
Itai: And these are extraordinarily deep fundamental questions whose answers are very important for improving our health, for preserving the environment.
Itai: And as she writes on her website, also to figure out if, quote, well, “did we really start out as green slime?”
Itai: So Yana, thanks so much for joining us today.
Martin: Yeah, it's great.
Yana: Yeah, happy to be here.
Martin: It's great to have you here.
Martin: Yana, on your website, you don't only ask about green slime.
Martin: You also write that your findings consistently indicate that our world functions via dependencies and interactions at all scales.
Martin: Can you explain that a little more?
Yana: No.
Yana: So, when you think about the networks of interactions, so this cool word, systems biology, has been around lately.
Yana: So everything interacts with everything.
Yana: And if you try to focus on individual components of this overall interaction, then perhaps you don't do as well in understanding the observations that you're making on a larger scale.
Yana: So, for instance, a human phenotype cannot be explained by simply looking at two genes.
Yana: So you need to look at the interactions overall.
Yana: The interactions, the way I think of them now, is more not just on the level of the proteins and the genes, but also on the level of interactions of organisms.
Yana: So basically, let's say human and its microbiome, or human and the environment microbiome, for instance.
Yana: I guess that's kind of like a very vague, non-committal statement.
Martin: Yeah, so as you know, this podcast is concerned mostly with the creative side of science.
Martin: So do you think that the kind of creativity that drives research is different when you focus on these dependencies on the systems level compared to when you would focus on just one or two proteins or genes?
Yana: Well, it allows for a lot more space to be creative.
Yana: So you can actually try to answer a lot more questions at the same time. That doesn't always lead anywhere good, but it will definitely require a lot of creativity to sort through.
Yana: So if you're allowing a lot of connections, you have to be very creative in interpreting what you're getting and also in asking the questions that you would like to answer.
Itai: And you know, Yana, it also kind of deserves more creativity to think about links rather than the objects themselves, because the objects, the genes, the proteins, those are physical things.
Itai: And many times we sort of focus on those because it's tangible.
Itai: It's right in front of us.
Itai: But links between things, that's more intangible.
Itai: Does that require sort of a more abstract view of things?
Yana: Yeah.
Yana: So I don't tend to think about creativity as an object.
Itai: It's intangible.
Yana: Well, yes, it's intangible.
Yana: But I also don't know what it is, right?
Yana: So from the perspective where I sit, if you tell me I'm creative, I don't know whether I agree with that.
Yana: Creativity to me is artists thinking about paintings or music or dance.
Yana: And science is potentially creative in the sense that you can interpret hardcore data, for instance, as something that's musical.
Martin: Yeah, but actually, Itai and I think that scientists are fooling themselves by maintaining that worldview.
Martin: I mean, this is the general way in which people see this when you're young, you learn this at school, that, you know, arts is where creativity is, and science is all hardcore, and you just follow rules.
Martin: Scientists like you very rarely view themselves as being creative.
Martin: But then, you know, if you write a paper, at the beginning of that paper, there was an idea.
Martin: And I think one could argue that that idea was the crucial starting point for the whole project.
Martin: Without that idea, the project would never have started.
Martin: So when we talk about creativity, it's basically that question, where do those ideas come from?
Itai: Yes.
Itai: Aren't you being too modest, Yana?
Itai: I know you're being modest.
Yana: Oh, okay.
Yana: So thank you for that.
Yana: A vote of confidence in my modesty.
Yana: I really think it's a matter of definition.
Yana: So however you want to define creativity.
Yana: So in my head, and I guess this is coming from when I was a kid, creativity refers to something that you can engage other people with.
Yana: So it has to be something that's unique and something that's novel, perhaps, and something that interests others.
Yana: So you can totally be creative with science, but you actually have to convert it into something that is visible to others as being exciting and new and unique.
Yana: So I do that with my prospective students.
Yana: I talk to them and I tell them about the stuff that we do in the lab, and I'm excited and I guess it leaks down to them.
Yana: So they get excited.
Yana: And I think that's the creative part, because there you don't actually have the hard data. You can talk about all these things and all these connections and all the objects, however you want to use that terminology.
Yana: But overall, it's this picture that you're painting for someone who doesn't actually know what you're doing.
Yana: And I think that's creative.
Yana: Papers are pretty rigorously formed, or at least the papers that I read.
Yana: I know you guys write really cool papers that are more of a general broader appeal, the genome biology stuff.
Yana: But my papers are fairly rigorous, as in like, this is what we want to show.
Yana: Sorry, go ahead.
Martin: Yeah, no, but we totally agree with that.
Martin: The papers as the end point of a scientific project, they follow very rigorous rules.
Martin: And we would call that day science.
Martin: So paper writing is part of day science.
Martin: But when you do a project, there must be points, maybe at the beginning of the project, but also somewhere in the middle, where you're confronted with something you didn't expect, right?
Martin: Where you have to make sense out of data.
Martin: And that is what we would also call creative.
Martin: So that must be something that you experience in your research, or not?
Yana: All the time.
Yana: Well, okay, so I guess I'm back to the point of how you define creativity, right?
Yana: That's slightly different.
Yana: So I've spoken to Itai about day science and Night Science before, and that is a very clear notion in my head.
Yana: I mean, I can't differentiate the two precisely, so it's not this is black and this is white.
Yana: So they leak into each other, obviously.
Yana: But it makes total sense that day science to me is working on a hypothesis.
Yana: So trying to design tests that would disprove that hypothesis or, you know, find something else.
Yana: And the Night Science is finding something that wasn't initially thought of.
Yana: So just basically wandering into the forest of the data and trying to figure out what's up.
Martin: That's a beautiful image.
Itai: Right.
Itai: Walking into the forest is a beautiful image.
Itai: And going back to your question of what is the definition of creativity, I'm sure there is one in Webster's dictionary.
Itai: But what do you think about this definition?
Itai: The way I think about it is it's trying to be comfortable with a very uncomfortable situation.
Itai: You know, the uncomfortable situation is that you're in the dark forest and you don't know where you're going.
Itai: And it's all very confusing.
Itai: But instead of snapping out of it and trying to go into a different part of your brain, you just hang around.
Yana: That's a nice way to define it.
Yana: I'm not sure if Merriam-Webster would agree, but I like that definition, yes.
Yana: So I guess the story was being in the forest, just to take that a little further.
Yana: You can focus on a single tree in that forest.
Yana: And that would still be technically creative then by this definition, because you may still be uncomfortable in the forest, but here you are focusing on this tree.
Yana: And that's new, because you didn't know that the tree exists before you got there, right?
Yana: But I guess that's not what I would call creative to me.
Yana: Creative, maybe I am creative.
Yana: So the idea there would be to see the forest for what it is, the forest, right?
Yana: The day science would be the trees in this forest, trying to figure out and define the role of every individual, what the individual is doing and how is it living and so on and so forth.
Yana: But this forest, the overall interaction, the overall, I guess, ten thousand foot view of this thing is what I would consider creative.
Yana: The problem is you're in the forest.
Yana: So it takes a leap of the mind to sort of go above it and see if you can see anything.
Martin: So when you're in the situation that you're in the forest and you want to have an idea of what the forest would look like, right from ten thousand feet above, do you have any tricks, any methods that you use at that point?
Martin: Or any habits that you maybe have?
Yana: I don't know if you guys notice it, but I like to talk a lot.
Martin: No, no, we didn't notice it.
Yana: So usually, I usually talk to people, to everybody.
Yana: And I try to describe to them what I'm seeing and why I'm excited about it.
Yana: And usually, I get these blank stares.
Yana: But once in a while, I get these questions out of politeness, I guess, or general interest, whatever.
Yana: But in the end, it's not for them, it's for me.
Yana: So the more I talk about something to people, the more perspective I get.
Martin: Okay.
Martin: That sounds like it only partially depends on those people's reactions.
Yana: It doesn't depend on them at all.
Yana: It's for me.
Yana: Okay, that's not true.
Yana: So if I talk to scientists or my students or my student scientists or something like that, then of course there is a more concrete and precise interaction.
Yana: And once in a while, I'll get comments from people who are like, I didn't see that coming at all.
Yana: Thank you for that insight.
Yana: But in the majority of the cases, I think it's all about me formulating my thought again and again and again and again in different ways.
Yana: And then all of a sudden, it becomes clear to me.
Yana: There was a Russian joke about this, right?
Yana: So the teacher goes to class and he doesn't like his students.
Yana: So he comes back to his friends and they ask him, what's wrong?
Yana: Why do you not like your students?
Yana: And he says, well, I explained the material to them once.
Yana: They didn't get it.
Yana: They're really not getting it.
Yana: I don't understand.
Yana: I explained it to them once.
Yana: They don't get it.
Yana: I explained it another time.
Yana: They don't get it.
Yana: I explained it a third time.
Yana: They don't get it. The third time, even I understood, and they still don't get it.
Martin: That makes perfect sense.
Itai: That's good.
Itai: And they always say that the best way to learn something is to teach a class on it, or the third time you learn it.
Yana: But I mean, if you think about science, doing science in general, not just teaching classes, is trying to explain to people the current state of the field most of the time.
Yana: But if you try to explain to them your own things, things that don't yet exist as a state of the field, it takes a while.
Yana: And it crystallizes in your head as well.
Martin: So is that something that you do, that you teach topics that you're still developing and that helps you actually in the development?
Yana: Yes, in my class, I think my last lecture is completely and totally dedicated to the things that we do in the lab.
Yana: And it's not a required thing for the students.
Yana: So they don't have to be there if they don't want to.
Yana: So it's basically just for me and they ask questions.
Yana: But you can't escape the fact that you're doing things.
Yana: If you're really trying to do and understand, it just permeates everything you do.
Yana: Everything comes down to that one question.
Yana: I don't know if you guys noticed this, but literally, if you're excited about something, this comes up in your laundry day.
Yana: When you're talking to your parents, it comes up everywhere.
Yana: So it definitely comes up in class lectures.
Yana: But I don't think that's the main sort of thing I was referring to.
Yana: I was saying that I talk to people a lot in general, not just to class, and I try to explain it to them.
Yana: That's what I do a lot.
Yana: I explain.
Martin: So talking is a very important part of your, let's call it the Night Science process, rather than creative, because you're not comfortable with that word, or you don't know what it means.
Martin: Neither do we, I guess.
Martin: So you said that talking is an important part of your Night Science process, but you also wrote a paper on scientific comics, which I think was really, really cool.
Martin: And to me, that suggests that drawing might also play a role in your creative process.
Martin: Is that true?
Yana: So disclaimer, the drawings in the paper are all Jason McDermott.
Yana: He's also known as “RedPen/BlackPen”.
Yana: I'm sure you've seen his comics around.
Yana: So all the drawings in that paper are from him.
Yana: We started a conversation on that paper because I was interested in whether I can perhaps draw some things to make explaining easier.
Yana: And I have given it an honest shot, and sometimes it works and it's fun.
Yana: But I wouldn't do that like Jason does it.
Yana: So I wouldn't first draw a comic and then show it to people to explain things.
Yana: I would just try to explain first and then be like, okay, it's easier to do with paper.
Yana: Let's do it on paper.
Martin: But is that only when you explain things to people or is that also when you're trying to figure something out?
Yana: Oh, writing things down for when I'm trying to figure things out is really important.
Yana: But this usually happens at a somewhat later stage.
Yana: So I have a question in hand and then I will try to write things down as if I was trying to write an introduction for a paper or something, right?
Yana: Not really writing the introduction, just trying to write a description of what it is that I'm thinking.
Yana: And I erase it and I do it again and I erase it and I do it again.
Yana: And thank God for computers because otherwise the paper trail would be too big.
Martin: Yeah.
Martin: And in a way, I think that's actually a mirror image of what you do when you talk to people, right?
Martin: Like what you said, you explain it once and then you explain it again.
Martin: And every time you understand it better.
Martin: So, you know, writing that introduction once and then rewriting it and rewriting it is like a different version of the same thing.
Yana: Absolutely.
Yana: I mean, the people would make it harder, I think.
Yana: And that's a good thing in that sense that something that makes sense in your head completely makes sense in your head.
Yana: Once you try to speak it or put it on paper, it just doesn't come.
Yana: It's almost like you can imagine yourself flying, right?
Yana: But you probably will not fly if you try.
Itai: It's really interesting that a part of your scientific method is to kind of roll around a question in your head, but also out loud and just to be so immersed in it that it keeps bouncing back and forth.
Itai: And throughout this process, you sort of develop it more and more.
Itai: And I'm wondering, is this something that's always been a part of your method?
Itai: Or do you see that it's changed from when you were a grad student to a postdoc, to an independent researcher?
Yana: So the thing that changed, I don't think the method changed, not that I could trace it necessarily well.
Yana: But I think what did change is what am I asking?
Yana: So as a grad student, I had a very precise question.
Yana: So my advisor was a genius.
Yana: I wish I could follow his steps in this, but he wouldn't take people into the lab without a question.
Yana: So you know how normally when PhD students come, the expectation is that you have a question and they kind of have to answer that question?
Yana: In his world, you basically had to have a question to come to the lab.
Yana: And if it was of interest to the things that he wanted to do, then yeah, he would take you on.
Yana: At least that was my understanding.
Yana: I don't know if he would say something else.
Yana: We can ask him once the podcast is done.
Itai: Tell us more about your PhD advisor, Burkhard Rost.
Itai: Tell us more about his method.
Itai: He'll take students that kind of are motivated by a particular direction and then set them free?
Yana: So I can't really tell you what Burkhard was thinking.
Yana: I can only tell how I experienced it.
Yana: So that's a disclaimer, right?
Yana: But he would, as far as I understood it back then, so this is now that I joined his lab.
Yana: And so this is now years ago.
Yana: We should celebrate, I think.
Yana: But basic idea was you had to have an idea.
Yana: And I started out actually with the idea when I came to Columbia, I wanted to do computational stuff.
Yana: So this is all thanks to the little stint at Weizmann the year before.
Yana: So I wanted to do computational stuff.
Yana: And I didn't want to lose the biology of it.
Yana: And machine learning in biology was a thing that was just beginning to be sort of a thing, or at least in my head was beginning to be sort of a thing.
Yana: And so I wanted to figure out whether we can use machine learning in order to be able to predict the effects of human variants.
Yana: And as every admissions essay that I've ever read, my motivation for doing this work was because grandma, parent, myself have some kind of disease or whatever, or I want to help the world get rid of disease or some other version of this, right?
Yana: So my motivation was indeed an autoimmune disease history in my family, and I wanted to see if I can identify the variants which were responsible for causing this disease.
Yana: Now, it never occurred to me that it was going to be one variant which seemed to be the trend in the field, that you can pick one pathogenic variant for a very long time.
Yana: But that wasn't the case for me.
Yana: So I had this question.
Yana: I wanted to see if I can use machine learning to make predictions of variant effects, genome variant effects.
Yana: And now, mind you, in , we're talking about the human genome that has just like yesterday.
Yana: So I remember in the Weizmann, Doron Lancet coming in and telling me, oh, here's the Nature and the Science paper that came out with the human genome cover.
Yana: So that was the year before I started my PhD program.
Yana: So basically, the idea was that now that we have this information, we need to convert it into human health.
Yana: And that was my plan.
Yana: Like literally, that was my plan.
Yana: So I went to this guy named Rudy Leibel.
Yana: He's a big diabetes guy over in Columbia.
Yana: He's a really great guy.
Yana: But he's an experimental biologist.
Yana: And I said, I want to do computational stuff, but I need you to advise me.
Yana: And he said, OK, I'm not a computational guy.
Yana: Find yourself a computational guy.
Yana: I'm happy to co-advise.
Yana: And I emailed, I don't know, about people in Columbia asking if they would take me on.
Yana: What I forgot to mention is that I had a fellowship, so they wouldn't have to pay me.
Yana: I guess if I didn't forget it.
Martin: That sounds like an important factor to mention, actually.
Yana: I didn't know.
Yana: I mean, I was coming into this, I didn't know.
Yana: So I didn't even realize that they would have to somehow pay for my education.
Yana: This was not clear to me to begin with.
Yana: I guess I should have been paying more attention.
Yana: But the idea is that nobody had replied except Burkhard.
Yana: And he said, well, just come in and we'll talk.
Yana: It was an interesting experience.
Yana: I didn't know what to expect when I walked in.
Yana: Burkhard is a unique personality.
Yana: And I love it.
Yana: But back then, I walked in, and this was not something I would expect from a professor.
Yana: He's kind of a long-haired guy sitting in some back end room in Columbia's College of Physicians and Surgeons.
Yana: So Columbia wasn't set up for computational biology.
Yana: Burkhard didn't look like a standard professor.
Yana: So all of this was really exciting to me.
Yana: And he was excited by the topic.
Yana: So he was the one that was doing a lot of the protein analysis using machine learning.
Yana: He had actually written the neural network in Fortran that I ended up using in order to build my SNAP method, which is still around in multiple iterations.
Yana: But the original one got like citations.
Yana: I'm very proud of it.
Yana: That's a lot.
Yana: So, I don't know.
Yana: So, Burkhard was an incredible mentor in the sense that whenever you wanted to talk, you could talk to him.
Yana: But I would say that I am a lot more present than he is in the sense that I actually seek out my students and ask them if they want to talk, and he would just let us run wild.
Martin: Even starting with not giving the students a question, but assuming that they would bring one with themselves.
Yana: I think that's a great way of doing it, but I'm not sure how you can get away with it frequently.
Yana: I tried.
Martin: No, but that would also be my impression, because just from my own experience, even the best students that I had, the best PhD students, when they started their PhDs, maybe they had an idea of in what direction they wanted to go, but they wouldn't have had an idea of what the precise question would be.
Martin: So I think the people who are good and who know what they want to work on, and that is something that is of scientific interest and is interesting to the PI, I think that's just a very unlikely combination.
Yana: It worked for him.
Itai: But also even if the student knows what they want to do, that's more often than not just an initial direction.
Itai: And then together you can work with the student and discover the specific question.
Itai: You know, I love that you seek out your students.
Itai: It reminds me of the time I actually met Martin.
Itai: We were in Peer Bork's lab in Heidelberg.
Itai: And Peer, I don't know how he survived every day because he had this habit of just going into the lab and just tapping you on the shoulder and saying, “Hey, do you want to go for a coffee?”
Itai: And then, you know, you would go with him for a coffee, talk about the work, have some ideas, and then you'd come back and he would go up to someone else and tap them on the shoulder and say, hey, do you want to have a coffee?
Itai: And his whole day was just coffee after coffee after coffee.
Itai: I don't know how many cups a day he had.
Yana: Well, I mean, that seems to be the standard for scientists, at least the ones that I know, at least the ones from that same lab in Heidelberg that you're referring to.
Yana: So I would love to be able to do that, right?
Yana: So Burkhard and I and Burkhard and everybody else had walked around a lot in Manhattan on Riverside Drive.
Yana: We would just talk and walk.
Yana: But this was once we had things to discuss.
Yana: So this was, I would say, about four years after I started in the lab.
Itai: Yeah, there are some PIs that only sort of get interested once things get going and data is starting to come in.
Yana: So as I said, if I had a question, if he was available, we could talk.
Yana: We had lab meetings in general to bring in the entire lab every week.
Yana: So he wasn't absent.
Yana: But these conversations that are more scientist to scientist came only when I was sort of beginning to resemble a scientist.
Yana: I like his idea, but I have not been able to reproduce that consistently.
Yana: I've had some amazing graduate students who were really...
Yana: My first graduate student was really great.
Yana: For instance, he came from microbiology, not from computational biology.
Yana: And I had nothing to do with microbiology before I joined Rutgers.
Yana: So we were both, at the same time, learning things.
Yana: So he was learning computational world.
Yana: I was, you know, exploring microbiology world.
Yana: And it was really great.
Yana: He had a lot of really great ideas about things that he was doing, about things that he wasn't doing too, which is, you know, not very often.
Yana: But most graduate students, I find, are very focused on their own question.
Yana: They may be very creative in that question, but they focus on that question.
Yana: And it's okay.
Yana: I mean, not everybody needs to be consistently in creative research.
Yana: I mean, we need engineers, right?
Yana: So people working at Pfizer, I don't think they're really trying to solve the reason why something happens.
Yana: They're just trying to optimize a drug, for instance.
Martin: Like you said before, there's not a black and white as regards day science and Night Science.
Martin: You know, I've never worked at Pfizer, right?
Martin: But I presume that people working there, they're also interested in the why.
Martin: It's just not their main focus, right?
Martin: So they might spend less time thinking about that part than about thinking about optimizing something.
Martin: And also like these different types of students, right?
Martin: The ones that are just trying to work on a task that was given to them.
Martin: They also have to do day science and Night Science.
Martin: They also have to be creative because they will come to roadblocks, right?
Martin: And then they have to figure out a way around them.
Martin: And there you also have to do Night Science, you have to be creative.
Martin: So I would think that maybe these are just the two extremes, but there's a lot of intermediate balances in terms of creative and not so creative work between that.
Martin: Wouldn't you agree?
Yana: Oh, there is lots of gray in between.
Yana: I can give you guys an example.
Yana: I have a graduate student right now who was stuck at a certain place in his thesis.
Yana: And I sent him the last paper you guys published in Genome Biology.
Yana: And they said...
Yana: No, but seriously...
Martin: This is a great story.
Yana: Was it the last paper, the gorilla paper?
Yana: I don't know if it was the last one.
Yana: So I said, imagine yourself in the place of these students that have to figure out things.
Yana: Forget your question.
Yana: See if you can figure out things.
Yana: And he did.
Itai: Find the gorilla.
Martin: And he found the gorilla?
Yana: Well, there was no gorilla.
Martin: That's something else.
Yana: Yes, yes.
Yana: So we have a paper coming out shortly, hopefully.
Itai: Oh, and I hope we receive a citation there.
Yana: And acknowledgement.
Itai: Yana, you mentioned that your first grad student was kind of interested in microbiology.
Itai: And I'm wondering, maybe you two worked so well together as a team because you had different backgrounds and different interests.
Yana: Maybe.
Yana: Maybe he's just really cool.
Yana: But that's not a maybe, he is just really cool.
Yana: But I think the reason why this was productive was because I had this bird's eye view on the things that we were doing.
Yana: And he had knowledge of the detail in the microbiology.
Yana: So I can tell you that the first paper that we wrote, it had a lot of issues in the field.
Yana: So one of the comments that we got from one of the reviewers when we submitted it was go read Bergey's manual.
Yana: Basically, Bergey's manual is the Bible of classification, prokaryotic classification.
Yana: The first edition was in .
Yana: They keep updating it and making changes to be more in line with what's happening with the knowledge in the world.
Yana: But it's basically all this taxonomic annotation, which makes sense within the individual clades of the bacteria.
Yana: But globally, if you wanted a machine to be able to make some inference on top of that, you would probably fail because there is no logic that is inherent to the entire taxonomy.
Yana: So what we were looking to do is we were trying to see if we can infer functionality of the individual bugs, microbes, using the predictive mechanisms that we had built.
Yana: And if we can infer taxonomic similarities of the bacteria using these functional signatures of the individual bacteria, we hoped that this could represent a different way of looking at taxonomy.
Yana: And the problem there, and I see it now, the problem there is that I came into a field which did things this way, and there was nothing else you could do that would be right.
Yana: And in , there was a paper that came out that describes phylogenetic construction for the bacteria that uses genome information directly.
Yana: And they create these clades, phyla, classes based on genome similarity, which is a lot more understandable to a computer because there is some function, but it also somehow reflects what people expect, these evolutionary relationships between organisms.
Yana: But from my perspective, I wasn't looking at it this way because I'm not from the microbiology field.
Yana: I wasn't looking at it this way.
Yana: I was looking at trying to understand how functionally similar organisms are, because evolution for bugs is a weird thing.
Yana: How do you describe horizontal gene transfer in a normal vertical?
Yana: You can't, right?
Yana: And so we tried to do this functional annotation.
Yana: We submitted a paper to PNAS.
Yana: It took forever to be reviewed, something like nearly a year.
Yana: And then it was a whole stoplight worth of comments.
Yana: So the first person was all green, all wonderful.
Yana: This is good.
Yana: We should think about it this way.
Yana: The second person was somewhere yellow, right?
Yana: The third one, basically, his whole review was “go read Bergey's manual”.
Itai: Well, we've been thinking about a confirmation bias.
Itai: Yana, this reviewer was not used to thinking about things the way you are presenting them.
Itai: And so he set a much higher bar for you than if you were to come with something that confirmed everything he or she believed.
Itai: So I think it's kind of like a hazard of being interdisciplinary, that you're coming into a field with very different sort of presets.
Yana: Absolutely.
Yana: This is definitely a problem, right?
Yana: So you're not a computer scientist to computer scientists, and you're not a biologist to biologists.
Yana: So in fact, you are a bioinformatician, but that doesn't exist as a science on its own, I guess.
Yana: So the reason I brought that up is because my student at that point, he was really upset.
Yana: This was his first paper getting rejected.
Yana: But then he was able a couple of weeks later to regroup and sort of alter the text on the manuscript to be more palatable.
Yana: Now, to the point where we had a comment from another reviewer in a different one, in a different submission, where he said, why are you so shy about presenting your results?
Martin: But like you said, you came into that field with a bird's eye view, but without knowledge about how things are usually done in that field.
Martin: That is a burden because it's harder to convince your reviewers who probably are from that field.
Martin: But on the other hand, we believe that it helps you to be more creative, because you're not fixed into that frame that all the people within that field don't know how to get out of.
Martin: So you can think outside of the box of that field.
Martin: And that's exactly what you did in that paper.
Yana: Yeah, and I think that's my superpower, that I can translate things from one field to another.
Yana: So the way I think about the stuff that we do, we do molecular function research, protein function research, that can be applied anywhere.
Yana: So the human cells have proteins that function, the microbial cells have proteins that function, microbiomes have proteins that function.
Yana: And if you are looking at the origins of life, I believe we have peptides that would have been cofactors of the original RNAs that function.
Yana: So in the end, if you focus on the protein functionality, you can describe anything.
Yana: It's kind of like this new trend of finding embeddings to describe the sequence for deep learning models.
Yana: So they're basically teaching the language.
Yana: We're teaching machines the language.
Yana: We're trying to design this language and teach machines the language so that then we can ask the questions.
Yana: So there was this movie that I saw recently.
Yana: I can't remember what it's called anymore.
Yana: Arrival, the arrival, right?
Yana: So these aliens that arrive, obviously it has to be aliens.
Yana: They speak in a different language, in a language that is not something that humans or human cultures ever had thought about.
Yana: It takes a while to deconstruct it, but they eventually get it.
Yana: But there are lots of problems along the way, so you can interpret different words differently.
Yana: But I think that once they get it, then the questions that are being asked can be answered, but also the questions don't really make sense anymore.
Yana: So what we're doing with deep learning, at least that's how I imagine it, is we're trying to develop this language that machines can understand or teach the machines the language so that we can ask the questions.
Yana: So if you teach the machines a language of sequence, it's not a sequence of amino acids or four nucleotides repeated over and over.
Yana: There is some meaning to the portions of this or to the components of this.
Yana: And if you want to answer the question, that is deeper.
Yana: So let's say we wanted to predict the functions of the microbes that hang out in a particular microbiome.
Yana: We can do that by first teaching the machines the language and then using the read data converted into that language to ask the more precise function question.
Yana: The idea for me is that if you can learn the language of function, so this is a little higher, but if you can learn the language of function, then you can ask whatever questions.
Yana: And it doesn't matter which organism you are in or if you are in a biome or whatever.
Martin: I think that's a fascinating way of looking at things.
Martin: And of course, the genomic language is universal across organisms, right?
Martin: So I think that's the deeper reason why you can so easily transfer what you do from one organism to another organism.
Martin: But I have another question.
Martin: You were talking about the role of language, of developing a language in which you can basically talk to the computer, or in which the computer can think about the problem.
Martin: But then, does that also mean that the questions that you're asking change because you have to ask them in that language?
Martin: And is that a constraint?
Yana: So that's a very good question, at least the way I interpreted it from your language.
Yana: But I think the point that I was making with the movie is that once you learn the language, you no longer need to ask the same questions.
Yana: But as of now, I don't think that that works for our interaction with the computer.
Yana: We have a question.
Yana: We want that question answered.
Yana: Now, if in the process of teaching the machine to understand what we're asking, we realize that there is another way to ask it, which is better for a machine, but may be also informative for us, that's on us.
Yana: That's a gain for us that we generated by ourselves.
Yana: I don't know that necessarily changing the language will change the question.
Yana: I don't know.
Yana: I just really haven't thought about that.
Itai: It's really interesting.
Itai: And I think, you know, Yana, it's cool to think about a process such as machine learning, which is, in a sense, not creative, because it's sort of out of your hands to some degree, but then you're talking about how you then use that to then pose questions.
Itai: And so it's this kind of becoming creative, using something that's not necessarily creative.
Itai: Does that make any sense?
Yana: Well, first of all, I think that machine learning is very creative.
Yana: I mean, talk about connectivity.
Yana: This thing is all over the place.
Yana: Also, I think very importantly, there is a difference between machine learning in computer science and machine learning for biological questions.
Yana: So I mean, it's a much longer discussion, but I think that the way that computer scientists look at it is that the data that you have in your hands to answer a particular question is representative of the reality.
Yana: And as biologists, we know that's not necessarily true.
Yana: So that's data from a particular experiment that may be answering the question you're looking at.
Yana: But any new data that you extract from the same setup may not be the same.
Yana: Or, more importantly, any data that you extract from a different perspective, attacking this question from another side, may not have the same types of noise and the same types of signal as the data that you had before.
Yana: So in bioinformatics, the idea is that you know the biology of it.
Yana: So you ask the question in maybe ten different ways, but you realize that you're asking the same question in ten different ways.
Yana: In computer science and machine learning, it's just normal if Google applies their methods to, let's say, predicting protein structure, as in the story with AlphaFold.
Yana: Does that mean they solve the folding problem?
Yana: No.
Yana: They answer the very specific question.
Yana: What's going to happen the moment you put a disordered protein into this conversation?
Yana: I don't know.
Yana: So I think bioinformatics is super creative in the sense that you have to figure out how to ask the question and what data you have to ask the question and what models you have to ask the questions.
Itai: Earlier you mentioned bioinformatics.
Itai: You said that it's sort of not here nor there.
Itai: It's not biology.
Itai: It's not computer science.
Itai: What do you make of that, that your field in a sense doesn't exist to you?
Itai: The problem is it doesn't exist to others.
Yana: Well, it exists to me.
Yana: Well, that's okay.
Yana: That happens all the time.
Yana: I imagine things that don't exist to others, so it's all right.
Yana: No, so I am not looking to necessarily be accepted by everybody.
Yana: As the saying goes, I'm not a hundred-dollar bill for everyone to like me.
Yana: But I think that that's the story, right?
Yana: So a computer scientist may not agree with the way that a question is asked in biology, right?
Yana: So someone can tell you that your data is too wide for the number of samples that you have.
Yana: So you're probably overfitting.
Yana: And the biologists may say, well, the doctors do this a lot, not necessarily biologists.
Yana: So if they can't understand why a particular statement is made, they're like, well, I don't believe it.
Yana: And this is really interesting because they believe the blood tests, for instance.
Yana: It's not like they know exactly how the experiment that gives you that measurement works, but they've been using it for so long that they tend to believe it.
Yana: But if you come up with a new way of doing things, so if you were to run a patient's genome through an analytical tool and the tool doesn't explain why the results are there, you could say that this tool is not going into clinical decision making.
Yana: So people may not agree, people may not understand, but in the end, if you want a translational field, you're kind of stuck.
Yana: There is no way that you can be appeasing everybody.
Yana: So it's okay.
Yana: Bioinformatics is a field.
Yana: I exist.
Martin: Yeah, we can vouch for that.
Yana: Well, you can't, right?
Yana: So you're hearing my voice.
Yana: I could be generating that.
Martin: Well, something exists.
Martin: That's all I know.
Martin: Earlier on, you said that bioinformatics is extremely creative because you have to figure out which questions you can answer, with which methods and what data you need for that, and what are the limitations and so on.
Martin: So do you think there's a specific way in which you can address these questions?
Martin: I mean, something that helps you with the very specific type of creativity that you need there?
Yana: That's a very good question.
Yana: I don't know.
Yana: I mean, I think that there are some algorithms baked in.
Yana: So some things you try just by default before you even think too much about it.
Yana: And they usually open up doors to where you should be going.
Yana: So if you try something that has been tried, and you get a result that is weird, then you know where to turn to see if you can reproduce the weirdness or the result.
Martin: So your advice then would be to just start some very preliminary analysis and meditate over the results?
Itai: Look for weirdness.
Itai: Look for the weirdness.
Yana: That's exactly it.
Yana: That's looking for weirdness, right?
Yana: And for me, my advisor also had a really good thing to say about that.
Yana: He said if someone reports 99% accuracy, I don't believe anything they said.
Yana: So biology is not going to be precise, right?
Yana: So you have to have some room first, right?
Yana: But you also need a baseline.
Yana: So if you have a binary classifier of yes and no, and you have balanced classes and your performance is 55%, that is unlikely to be a real thing, right?
Yana: You can continue pursuing it.
Yana: On the other hand, if your performance is 99%, that's definitely not a real thing.
Yana: It has to be somewhere in the middle.
Yana: And actually act the same, whether it's 35% or 75%, that doesn't matter.
Itai: What would you say when you're working with a student and when they present you with something weird?
Yana: Did you check your code?
Itai: Yeah, first eliminate the artefact scenario.
Yana: Step one.
Itai: And then step two.
Yana: So if I can explain it to myself in my head for why this result would appear, maybe with a little more forcing of my ideas, that's probably even better, then we can pursue the question of whether that's what actually happened.
Yana: But if I cannot at all explain what I'm seeing and if the student cannot at all explain what he or she is seeing, then we need to completely disassemble this thing and try something else.
Yana: And if we're still getting weird results, I go back to asking the question about the code.
Martin: Okay, so when you say disassemble, you mean trying to ask the same question from a different angle?
Yana: A different question.
Yana: I do a lot of this permutation stuff.
Yana: So I say, now change your labels, change the order of your labels, leave the data as is.
Yana: Are you getting the same performance as you were when the labels were correct?
Yana: If you are, there is a problem.
Yana: So my first questions are always about the code or the data.
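The label-shuffling check Yana describes can be sketched in a few lines of Python. This is a minimal illustration, not code from her lab: the toy one-dimensional data, the nearest-centroid classifier, and the function names `nearest_centroid_accuracy` and `permutation_check` are all assumptions made for the example.

```python
import random
import statistics


def nearest_centroid_accuracy(X, y, n_folds=5):
    """Cross-validated accuracy of a 1-D nearest-centroid classifier (hypothetical helper)."""
    n = len(X)
    correct = 0
    for fold in range(n_folds):
        # Every n_folds-th sample, starting at `fold`, is held out for testing.
        test_idx = set(range(fold, n, n_folds))
        train = [(X[i], y[i]) for i in range(n) if i not in test_idx]
        # One centroid (mean feature value) per label seen in the training part.
        centroids = {
            label: statistics.mean(x for x, lab in train if lab == label)
            for label in set(lab for _, lab in train)
        }
        for i in test_idx:
            pred = min(centroids, key=lambda lab: abs(X[i] - centroids[lab]))
            correct += pred == y[i]
    return correct / n


def permutation_check(X, y, n_permutations=200, seed=0):
    """Compare performance on the real labels with performance on shuffled labels."""
    rng = random.Random(seed)
    observed = nearest_centroid_accuracy(X, y)
    null_scores = []
    for _ in range(n_permutations):
        shuffled = y[:]          # change the order of the labels ...
        rng.shuffle(shuffled)    # ... and leave the data as is
        null_scores.append(nearest_centroid_accuracy(X, shuffled))
    # Fraction of shuffled runs that do at least as well as the real labels.
    p_value = (1 + sum(s >= observed for s in null_scores)) / (1 + n_permutations)
    return observed, statistics.mean(null_scores), p_value


# Toy, balanced two-class data (an assumption purely for illustration).
data_rng = random.Random(42)
X = [data_rng.gauss(0.0, 1.0) for _ in range(50)] + [data_rng.gauss(3.0, 1.0) for _ in range(50)]
y = [0] * 50 + [1] * 50

observed, null_mean, p_value = permutation_check(X, y)
print(f"real labels: {observed:.2f}  shuffled labels (mean): {null_mean:.2f}  p = {p_value:.3f}")
```

If the shuffled-label accuracy came out about as high as the real-label accuracy, that would point to a problem in the code or the data, such as information leaking between training and test samples, rather than to a real signal.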
Itai: Just so you don't go down a blind alley and waste too much time.
Yana: Absolutely.
Yana: But it also happens very rarely, but it happens that we get an answer which is not like something we expected.
Yana: But then we talk to someone who is usually a biologist, right?
Yana: Someone who is more in depth on the subject that we're looking at, and they go, yeah, yeah, that's okay.
Yana: That could have been.
Yana: And then it's cool.
Itai: You know, it reminds me of a song by The The.
Itai: It was a band called The The.
Itai: And the song goes, if you can't change the world, then change yourself.
Itai: He just keeps singing that.
Itai: If you can't change the world, change yourself.
Itai: But at the very end he says, but if you can't change yourself, then change the world.
Yana: Yeah, yeah, yeah.
Yana: There is a song like that, I think, in every language, no?
Itai: Right, probably.
Itai: Even in science.
Martin: Actually, it reminds me, I may have mentioned this before, it reminds me of a talk I heard when I was an undergraduate student of physics.
Martin: Klaus von Klitzing talked about his discovery of the quantum Hall effect.
Martin: And what he was saying is that they saw these weird steps in their measurement curves that shouldn't have been there, and they spent a year trying to find the error in the measurement equipment that was causing them before they finally decided it was real, right?
Martin: So that's a little bit like what you describe, I think.
Martin: You first try to get rid of the weirdness, and only if you can't get rid of it, you have to face it and think about what it could mean biologically.
Yana: Absolutely, but I also don't think it's sequential.
Yana: I think we're all parallel processing things.
Yana: So I think my general assumption is that there is an error in code, or some logic, or data, or whatever.
Itai: Which is usually true.
Yana: Yeah, most of the time it's going to be true.
Yana: But then it's going to be particularly true when, in my experience, when the person doing the work insists that they checked everything times.
Martin: Particularly.
Yana: Because if you checked it times, then there is a reason, right?
Yana: But at the same time, I'm thinking about what would be the explanation for this if it wasn't the bug.
Yana: And that's actually a very cool place to get more questions too.
Yana: They just grow.
Martin: It takes me back to the forest that you went into at the beginning of our conversation, right?
Martin: So now you're planting new trees in that region opened up by that weirdness.
Yana: Yeah.
Itai: Oh, Martin, that was beautiful.
Yana: I don't know how far we can take that analogy.
Yana: That's true.
Martin: But I think if we take it too far, we really are in the arts.
Yana: That's true.
Yana: Someone want to paint that?
Yana: Because I don't know how to draw a comic out of it.
Yana: But I do think that the comics were a big deal.
Yana: It was a big deal for me.
Yana: And the feedback that we got from people, that it helped explain their science.
Yana: There was a poster at the ISMB, this meeting that we go to, that was completely comics based.
Yana: That was excellent.
Martin: That's fantastic, actually.
Martin: Yeah, that's a great way to communicate your science.
Yana: Well, to some people.
Itai: Yeah, I love the one figure there, where it's like this huge Venn diagram monster.
Yana: Vennster, Vennster.
Itai: Vennster, right?
Itai: And everybody's like screaming, and it's like one more category, they said.
Itai: It'll be fine, they said.
Martin: Yeah, that was funny.
Yana: And that's another thing.
Yana: Do you guys find yourselves that you have pictures for a manuscript that needs three?
Martin: Yeah, that order of magnitude, yes.
Yana: Okay.
Martin: But typically they all go into the supplement.
Yana: They still do.
Yana: No, I try to cut them out.
Martin: No, of course, not all.
Yana: Because you think that a picture is worth a thousand words, but it doesn't mean that you don't need , words.
Martin: You need both.
Martin: And actually, I have to admit, when I read papers, I am one of those people that actually read the words and don't look so much at the figures.
Yana: Oh, you're weird.
Martin: Yeah, I know, I know.
Martin: I'm totally weird.
Yana: Check the code.
Martin: I checked it times.
Yana: Well, so I think that may be a good thing, right?
Yana: I think that figures are there for when you're completely lost in the words.
Yana: And I think the big problem is now, I mean, it's both a problem and a blessing, but we are talking about a multilingual scientific community that for some reason wants to use Google Translate to translate the things that they want to say from their language to English.
Yana: And I'm sure you've reviewed papers like this for journals where you are trying to understand what's happening, but you can't because the language is completely out.
Yana: But it looks interesting because of the formulas or the pictures.
Yana: So you try to understand, right?
Yana: And so the idea is that, you know, in the s or s to get a PhD in the US, you needed to speak another language fluently.
Yana: It was often German, for instance, or Russian.
Yana: And you could expect that a person would read these manuscripts in another language.
Yana: It's no longer the case.
Yana: And we also have other languages to add on top of that.
Yana: So everything is being translated to English, but unfortunately, it's keeping the idiosyncrasies of the language where it came from.
Yana: But the formulas and the pictures allow me to stay on track.
Itai: Right.
Itai: The pictures can add a lot.
Itai: You know, I remember when I was a postdoc, my advisor, Craig Hunter, he looked at one figure that I made and he said, you know, Itai, they say a figure can be worth a thousand words, but yours is just worth like five words.
Itai: And I think he had a good point, you know, figures should add a lot if designed right.
Itai: Well, Yana, this has been really great.
Yana: It was fun.
Itai: Thank you so much for joining us.
Itai: Yeah, it's great talking to you.
Martin: Yeah, it was really cool.
Yana: Thank you for having me.