Interview by Nashi Gunasekara (SC ’19) and Natasha Anis (PO ’19), Staff Writers
Founder of Text IQ, Apoorv Agrawal is making significant strides in understanding and perfecting how humans and machines can work together for a common cause. Agrawal obtained his undergraduate degree from the Netaji Subhas Institute of Technology in New Delhi, India, and went on to pursue his Master of Science and Ph.D. in Computer Science at Columbia University. Since then, Agrawal has published over thirty academic papers, received the IBM Ph.D. Fellowship award for his work as the first author on two patents for IBM’s Watson, and has been cited by a number of prestigious publications such as American Banker and Science Magazine. In addition to having a passion for teaching, Agrawal aspires to test the boundaries of human-machine collaboration in solving some of big businesses’ high-stakes data problems.
Apoorv Agrawal visited the Claremont Colleges on November 13, 2017 to speak at the Athenaeum on Artificial Intelligence and its future impact on areas of law, literature, and organizational relations. Earlier that day, Agrawal was interviewed by two members of the Claremont Journal of Law and Public Policy. In the interview, they discussed matters pertaining to his app (Text IQ), the future of AI in disciplines outside of computer science, and systematic issues that govern human relations.
CJLPP: Artificial intelligence seems to have many different definitions and interpretations. Seeing that AI is driving your application, Text IQ, how do you define artificial intelligence?
Agrawal: It’s very hard to define intelligence to begin with. Computers are really good at multiplying numbers. They can multiply really large numbers, whereas humans cannot. So, would we say that computers are more intelligent than humans? Probably not. People have certainly simulated machines to pass the Turing test, at least to some extent, but there is no notion of machines being able to think like humans. So, it is really hard to define artificial intelligence, and intelligence in general. I have a personal definition of what I think constitutes intelligence, and that is the ability to summarize things. Given a news article or a novel, the ability to go through all of this information, understand all of this material, and be able to summarize it in a short, effective fashion—that is intelligence. As humans, this operation occurs every single time we encounter new material. We do it all the time. For example, when you are pursuing so many majors and begin to understand a subject more and more, you’re able to abstract away from details and see connections between different disciplines. Internally, what’s actually happening is that you’re consuming all this material and you’re summarizing it. What the human brain can store has a very finite capacity, so there is a need for us to unlearn. But before we can unlearn, we have to summarize all of the concepts we have already learned in a way that takes up less space and allows for innovation, which we know happens at the intersection of two or more disciplines. This ability to abstract away from details and see connections is something that computers can’t do right now. It’s very hard. But humans can, and that is how we’re able to innovate. So, I would say that’s my personal definition of intelligence.
CJLPP: Your company advertises itself as “protecting enterprises from high-stakes legal disasters.” What exactly does Text IQ do and how did you come about this idea?
Agrawal: Most of the research I performed at Columbia had to do with literature. We built machines that would read a novel and summarize it around characters and their interactions with each other. This was of much interest to the government because nothing like this existed before. The government has access to all sorts of communication among all kinds of people and is very interested, of course, in finding terrorists. But [the government] has collected so much data in the process that it does not have enough time to go through it. If [the government] had a tool that could take all of this data, summarize it, and report back what kind of people exist, what kind of conversations are being had, who is being talked about, et cetera, it would make it easier for them to identify sensitive information and, therefore, identify potentially dangerous individuals. The U.S. government saw the commercial potential in our technology and granted us some funding to start a company.
It was very interesting the day [the government] first gave us funding to do this research and then start this company, because if I had continued on the career path that I was on, which was to become a professor somewhere, all of this research would have died in a lab. And I believe that explains the government’s interest in pushing us to commercialize this technology: they knew that if it died in a lab, it would have taken someone much longer to reintroduce it, and that necessitated commercialization. So, that’s how it got started.
Like our website says, there are a lot of legal compliance disasters out there. Facebook earlier this year was fined over 100 million dollars—and that was just one fine—because the EU had concerns with Facebook’s privacy and data policies. Recently, there was a big merger between Aetna and Humana that didn’t go anywhere, and Aetna had to pay a fine of a billion dollars. All of these kinds of disasters are costing our economy literally billions, if not trillions, of dollars every year. And with the government investigating criminals, collecting all of this data that people simply do not have the time or resources to go through and search for sensitive information, that is where we come in and help. What our technology does is take communication data and unstructured text, like e-mails, texts, or any log of conversations, and understand who exists in the text, what relationships exist, how these relationships evolve, which people are important, what communities there are, et cetera. With this information, our technology summarizes the data and identifies the people and the interactions that exist among them. Once we understand how an organization functions, or how the people in an organization function, then we can zoom in on the sensitive information rather quickly.
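The kind of communication-graph summarization Agrawal describes can be illustrated with a minimal sketch. This is a hypothetical toy, not Text IQ’s actual implementation: given a log of (sender, recipient) message pairs, it builds a who-talks-to-whom graph and ranks people by how many distinct contacts they have.

```python
# Hypothetical sketch of summarizing communication data into a graph;
# NOT Text IQ's actual implementation. Given (sender, recipient) pairs,
# record each person's distinct contacts, then rank people by how many
# contacts they have.
from collections import defaultdict

def build_graph(messages):
    """messages: iterable of (sender, recipient) pairs."""
    contacts = defaultdict(set)
    for sender, recipient in messages:
        contacts[sender].add(recipient)
        contacts[recipient].add(sender)
    return contacts

def rank_by_contacts(contacts):
    """Return people ordered by number of distinct contacts, most first."""
    return sorted(contacts, key=lambda p: len(contacts[p]), reverse=True)

# Tiny illustrative log (names are invented).
messages = [
    ("alice", "bob"), ("alice", "carol"),
    ("bob", "carol"), ("alice", "dave"),
]
contacts = build_graph(messages)
ranking = rank_by_contacts(contacts)  # "alice" comes first: 3 contacts
```

A real system would of course first have to extract the people, relationships, and communities from unstructured text; this sketch only shows the graph-summary step on already-structured pairs.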
CJLPP: Text IQ was developed from a program that you created that could read a 19th-century novel and map out character relationships. Why did you choose literature as a way of testing AI capabilities?
Agrawal: We received a grant from the National Science Foundation in 2009, and that grant was provided for us to study the communications of employees at Enron, an energy company that committed fraud and went bankrupt in the early 2000s. After the company shut down, the government took some of Enron’s emails and gave them to us to see if we could build something that could detect these disasters early on and alert regulatory and legal teams to intervene. Essentially, they gave us this grant to study human relationships and organizational functions based on communication data. There were two problems with the data the government provided us. One was that this was an incomplete data set—we didn’t have communication logs for all of the employees—and two, we didn’t have a gold standard. Let’s say we end up building a machine that says person one and person two had a certain type of relationship. Well, since we don’t know what person one’s or person two’s relationships to the organization are, how will we ever validate our machine? If we built something that didn’t work, then there would be two things that we could blame: either our methodology or the data. If we worked on the Enron emails, it would’ve been easy for us to blame the data because the data provided was incomplete to begin with. That’s why I picked novels.
Novels create a fictional world where characters are introduced, personalities are developed, and relationships evolve, all within this confined world. Novels provide a complete picture to start working with, giving us a gold standard for comparison. We did some analysis on Alice in Wonderland, and given this story, we know what the role of the rabbit is, we know what the role of the queen is, we know the relations between characters, and we know what communities exist. Now, if we build something and our prediction is wrong, we know the problem is with our methodology and not with the data, because we have the written novel to compare the prediction to. That is the main reason why we worked with literature: it provides a complete data set and we know what the right answer is, so if we build something, we can compare our answer with the right answer.
CJLPP: What are the implications of law in literature, as you have discovered from working on Text IQ?
Agrawal: There is no direct implication, but there is one to the extent that we built machines on literature and saw that these machines were working accurately and producing the results we needed, which means our methodology is correct. We decided to take this methodology, apply it to the real world, and see what comes back, because the results are now verified as trustworthy and actually make sense. For example, one thing that we discovered is that the queen is in a position of power. The way we came to that conclusion is by realizing that a lot of characters talked about the queen; however, the queen herself interacted with very few people, and the set of people she interacted with was very different from those who talked about her. If you think about it, this is similar to celebrities in the real world. A lot of people talk about Obama, for example, but he only interacts with a very particular set of people. Now, given a data set, if we want to find those who have status or influence, we can look for this pattern. That is something we learned that can more or less be directly applied to our work in the real world, by training our machines to detect similar patterns and come up with similarly meaningful findings and answers.
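The status pattern Agrawal describes—many people mention a figure, few interact with her directly, and the two groups barely overlap—can be sketched as a simple score. The names and the scoring formula below are purely illustrative assumptions, not the model his team actually used.

```python
# Hypothetical sketch of the "queen" status pattern: the score grows
# when many people mention a person, few interact with them directly,
# and the mentioners and interlocutors barely overlap. The formula is
# an invented illustration, not the actual research model.

def status_score(person, mentions, interactions):
    """mentions: dict person -> set of people who talk ABOUT them.
    interactions: dict person -> set of people who talk WITH them."""
    talkers = mentions.get(person, set())
    partners = interactions.get(person, set())
    if not talkers:
        return 0.0
    # Fraction of mentioners who also interact with the person.
    overlap = len(talkers & partners) / len(talkers)
    # Many mentioners, few partners, little overlap -> high status.
    return len(talkers) / (1 + len(partners)) * (1 - overlap)

mentions = {"queen": {"alice", "hatter", "rabbit", "duchess"}}
interactions = {"queen": {"king"}}
score = status_score("queen", mentions, interactions)  # 4 / 2 * 1 = 2.0
```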
CJLPP: What are your next steps in transforming the legal field with AI? Do you have plans for future applications that will influence the way law is practiced?
Agrawal: I do see a mathematical algorithm that will serve as a model for true democratic governance, where there will be no centralized power, but rather a public entity or quantity that will govern the world. This kind of bleeds into the research around blockchains, which is a very interesting mathematical model that builds a self-sustaining world where people are actually motivated to work, and if someone tries to rig the system, they are automatically thrown out of it. You don’t need police or law to come in and punish you if you did something wrong, because it is a self-sustaining system. Everyone in this model of governance is essentially motivated and incentivized to tell the truth, do good work, and so on. Eventually, I think there will be a model of governance where police, courts, lawyers, and even notions of nation-states will be obsolete. That is, of course, far into the future.
I was actually talking to a federal judge the other day about how the law defines everything. Everyone wants it to be objective and fair across the board, but even then, we need inherently subjective humans to help with the interpretation of law. Clearly there is a problem. He said that the problem stems from the Constitution, which is written subjectively. A lot of the things in there are up for interpretation. Now, when we’re building laws to honor and protect this Constitution, by implication we’re building on top of something subjective and are going to receive subjective results. In order to build law that is objective, so that we don’t need humans—very expensive humans—to come in and interpret it, in some sense the Constitution needs to be rewritten. It needs to be rewritten in an objective way, which gets into this whole problem of language, because language is inherently subjective and open to interpretation. There are some languages, like Sanskrit for example, in which if you say something, it means only one thing. It can’t mean two things. For example, an ambiguous sentence like “I shot the elephant in my pajamas” won’t exist: if the elephant is in my pajamas, there is one way of saying that, and if I’m in my pajamas and I shot the elephant, there is another way of saying that. So, there are certainly languages that allow for coding or writing something very objectively. English is a very ambiguous language. Maybe the Constitution needs to be rewritten in a language that offers that level of objectivity.
As a company, we are not doing anything to make law more objective; however, I do see a lot of other startups out there trying to disrupt law. Text IQ comes in in the context of litigation, when two parties try to sue each other, and we help them find sensitive information. But there are other companies doing case law research and training machines to read case law, so that when a new case comes in, the machine can make suggestions based on precedent and applicability, as well as create arguments to be presented in court. There is also a lot of disruption on the immigration side—a lot of startups building AI systems that can help with immigration so you don’t have to go to an immigration attorney who costs a lot of money. Law in general is being disrupted from many different angles. And of course lawyers and the whole institution are trying to guard against this disruption, but I think it’s absolutely inevitable. I can’t wait to see law become more objective, more fair, and less expensive.
CJLPP: Many argue that while the law itself may be “color blind,” interpreters of the law such as judges, lawyers, and jurors often apply the law unevenly due to biases generated by either their or their client’s race, class, sexuality, gender, education background, etc. How do you think AI will solve issues of uneven application of the law due to legal interpreters’ biases? I know you talked about subjective language, but can you address AI with the intersection of race, gender, sexuality, et cetera?
Agrawal: That’s a really interesting question, and one I haven’t thought a lot about, so I’m going to think out loud. First, think about the similarities between how lawyers function and how doctors function. A patient walks into a doctor’s office, the patient says, “these are my symptoms,” and the doctor needs to come up with a diagnosis. The doctor is like: “hey, of all the cases I’ve seen, there are 80 cases where patients had very similar symptoms, and therefore I believe this is the diagnosis, with 80 percent probability.” Very analogously, this is what happens with law. Someone walks into a lawyer’s office: “hey, here’s what happened to me, and I want to sue this person for these reasons. What do you think? What should I do? What is the diagnosis of this problem?” Now, the lawyer is going to go back and look at all the case law, and of course they have their own knowledge. But then they’re going to start citing case law from the past and so on, and come up with an argument to be presented in court in defense of their client.
A lot of biases can go into this. First of all, a lawyer may not find the right case law, so AI can come in and help, because AI can read many more documents and much more case law and suggest what’s applicable in these matters based on what happened in the past. So it can help lawyers present better cases in court. But at the same time, to make things fair, if a judge is veering toward one decision, AI can come up with a summary or suggest to the judge that this case is similar to ten other matters that happened in the past, show how it’s similar or different, and suggest that maybe the decision should be this and not that. Or it can correct for human biases by recommending or arguing that, given this line of thought, this is what the outcome should be. I think this is how it can help recommend or correct for certain biases. But it is complex, because if these biases have existed in past case law as well—such as gender biases, which we know for a fact exist—the machine will learn these biases. How is it ever going to self-correct for that?
CJLPP: You suggested earlier that lawyers and law will become obsolete because we will have this governing system, and I think that you used the example of blockchains. Could you speak a little more about that? What would that look like, and what are the implications of such a system?
Agrawal: I’ve been kind of living under a stone running a company, so I haven’t had a lot of time to make progress on these things; honestly, I’d love to spend a month reading about blockchains. I’m certainly not an expert. But from what little I know or have learned, I think the implication will be that we will no longer need institutions such as the police making sure that law is implemented. We won’t need people to come in and interpret a contract or things that parties have agreed upon, because it will be automatically baked into a self-governing system. So there are certain institutions that will go away. It will be a true, absolutely true, example of democracy, because there will be no governing bodies. I really think that currently democracy is an illusion, and there’s a lot of marketing that goes into giving us the illusion that we’re making choices. But we’re really not. So I think that’s another implication—that’s how we will achieve true democracy in the world.
CJLPP: Where does human choice in the democracy come into this, if there is an AI regulating body making these decisions instead?
Agrawal: There will actually be no regulating body making the choice, and no set of humans in charge of regulating anything. There will be no voting; you will not be electing a leader, because there is no leader. It’s a self-sustaining system. For example, in blockchains you produce a finite number of bitcoins, and people are doing computations to collect these bitcoins, so there is a model of incentives that makes the system run. If you want to run the world like a blockchain, or humans like a blockchain, someone will need to come up with this model of incentives. I think that’s a hard problem, definitely non-trivial. The other problem is that once you’ve come up with a model of incentives, how will this model actually evolve? Incentives will need to change. They will be tied to a common goal that we want to achieve in society, and once we achieve this goal, how will we come in to define new goals and create another model of incentives to let the world run? There are all these things that need to happen, and it is pretty challenging.
CJLPP: So this might be a good segue into talking about how you think AI should be regulated, if at all, and what current regulations exist. At what point do you think the government should intervene to regulate AI?
Agrawal: So, the federal government is already regulating AI. I actually met with some Senators and Congressmen at the White House a few weeks ago, and some presenters there said these kinds of, I would say, very irresponsible things, like AI will bring the world to war. I don’t think that’s going to happen. I think it just causes so much anxiety for no good reason. They are trying to regulate self-driving vehicles, because one of the purposes of AI, essentially, is automation. You can automate the discovery of evidence from documents, and you can automate driving cars. Discovery of evidence from documents is not going to physically hurt a human, because that’s just software running on hardware. It’s not moving around. But automation of vehicles could hurt humans, and that’s a matter of life and death. I think when it becomes a matter of life and death, it needs to be regulated. The government is working very hard to regulate it; in fact, they have to make so much progress in such a short time because they need to be ready for level-four vehicles on the roads by 2021. So that’s kind of the timeline they’re working with—it’s very crunched.
CJLPP: What do you mean by level four vehicles?
Agrawal: There are several levels of automated vehicles—I don’t exactly know which level means what, but from what I hear, level four is literally a driverless car that will come pick you up and drop you off. Right now, for the last mile, we still need humans behind the wheel, but as we go higher in levels, that goes away and these vehicles become completely autonomous. All of this obviously will need to be regulated. For example, if I own a self-driving vehicle that crashes into someone else’s, but it wasn’t my mistake—it was the software’s mistake—who is to blame? It’s also a big security question: if someone hacks into the system, they can literally crash cars and have them jump off highways. We’ll see new kinds of terrorist attacks. All of that needs regulation from a security aspect. Now, there are some high-profile figures arguing we should regulate all of AI. I really don’t think there’s a need for that. AI that is just software, literally helping humans do their jobs better—data entry, data cleaning—you don’t need to regulate that. How would you even go about regulating that? I don’t know. The part of AI that absolutely needs to be regulated is AI that becomes hardware and can walk and talk and be out in the world interacting with humans, because now it has the ability to hurt humans and change the environment, and the government is working on regulating that. AI in general, I don’t think, needs to be regulated. That would only hurt research and the progress of science, and it won’t be good for society.
CJLPP: I know you talked about AI and its influence beyond the sphere of the legal field and leading into politics as a governing body. Where do you feel like AI will seep into next? Where do you think it’s not appropriate or effective? Additionally, do you think AI is a cure-all for human caused error? How do you see AI with regards to human relationships?
Agrawal: It’s very hard to say, because “most effective” is very relative. I think there are three purposes of AI—there could be more, of course, but these are the ones that I see. First, of course, automation. Any task that humans are doing that’s monotonous and non-creative, you bring in AI to automate it. Another thing, I think, is—I don’t have the right phrase, but I’m calling it self-realization: AI that helps us discover where we came from, laws of nature, or the process of evolution. The third purpose is accelerating creativity. Certainly, I see a lot of mathematicians who didn’t use computers before and now do. One thing that machines can do really well is create a lot of combinations. If we define creativity as discovering new combinations of a given set of objects, then machines are actually, in a lot of ways, more creative than humans. Now researchers use machines to produce a set of combinations and then look at the sets and decide which ones are appropriate. That’s obviously accelerating creativity, because for them to get to this place without machines would have taken much longer. I think that’s the third purpose of AI. One could argue about which is most effective. In our day-to-day lives, automation is very effective. For the bigger progress of science, accelerating creativity. To look into the past and understand where we came from, I think AI would be effective in helping us simulate evolution, and then we can see which of these simulations ends up producing the world that we live in. If we are able to find the simulation that is governed by a certain set of physical laws, we will have discovered how we came into being, so that’s also very effective. It has the potential to affect our understanding of the past; obviously it’s affecting the present, but it can also affect the future.
CJLPP: Thank you so much for speaking with us. We really appreciate it.