This year’s annual report shows that AI is now mainstream. New themes in the document reflect growing interest from regulators and politicians, as well as expanding areas within the field itself, such as ML Ops, applications to biology and medicine, and ethical considerations.

The dangers of increasing model complexity

Of note in the research section of the report is the increasing complexity of the AI models being created. More often than not these are owned by corporations with deep pockets, as it can cost millions of dollars to develop and train state-of-the-art models. In some cases the results are put into the public domain, allowing academics and smaller businesses to build on them.

[Figure: the number of parameters of AI models. Source: Microsoft]

There are two significant consequences of these giant models. In certain domains and tasks they perform as well as humans and could therefore be used for nefarious purposes. For example, one of the models (GPT-3) is capable of generating several pages of human-quality text, and a concern is that a bad actor could use such technology for mass fake media messaging. At the moment this model is being kept behind closed doors: only an API is available, rather than the model itself.

The second consequence is that it is incredibly difficult to understand how such a model makes its inferences. With over 175B parameters, the sheer size and complexity of the model make it difficult to explain how it arrived at its predictions. These models are not true AI, and Dr Noura Al-Moubayed at Durham University remarks: “they don’t ‘comprehend’ the knowledge behind the text and they capture all the facets of the data. This means they can produce sexist, racist and homophobic text, which is another reason that ethical machine learning is becoming increasingly important.”

Data set bias

Our work with the Durham University Computer Science Department, published at BMVC 2020, shows how important it is to really understand the data you are working with and how your machine learning behaves on it. Our analysis of the TVQA dataset exposed an inherent textual bias in the data, and we went on to achieve state-of-the-art results on video question answering simply by changing the way the textual part of the data was modelled. We also showed that only a small fraction of the dataset, built to allow researchers to advance multi-modal question answering, was truly multi-modal.
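The general idea behind this kind of bias analysis can be illustrated with a toy probe (this is not the BMVC 2020 code, and the mini-dataset below is entirely hypothetical): check how often a question is solvable from the text modality alone. If a text-only model nearly matches a multi-modal one, the benchmark is not really testing multi-modal reasoning.

```python
# Toy illustration of probing a QA dataset for textual bias:
# how often can the answer be found without looking at the video?

def answerable_from_text_alone(subtitle: str, answer: str) -> bool:
    """A crude probe: the answer literally appears in the subtitle,
    so no visual information is needed to get it right."""
    return answer.lower() in subtitle.lower()

def textual_bias_rate(samples) -> float:
    """Fraction of (subtitle, answer) pairs solvable from text alone."""
    hits = sum(answerable_from_text_alone(s, a) for s, a in samples)
    return hits / len(samples)

# Hypothetical (subtitle, correct answer) pairs in the spirit of TVQA.
samples = [
    ("House says the patient needs an MRI scan.", "MRI scan"),
    ("Sheldon knocks three times on Penny's door.", "a flag"),
    ("Rachel hands Ross the coffee cup.", "coffee cup"),
    ("Joey shrugs and walks away silently.", "a sandwich"),
]

print(f"text-only solvable: {textual_bias_rate(samples):.0%}")  # → 50%
```

A real analysis would train a text-only model rather than use string matching, but the diagnostic logic is the same: a high text-only score signals that the “multi-modal” label is doing little work.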

The Great Brain Drain

The report highlights the significant movement of academics into business in the US and the potential conflicts of interest of ‘sponsored professorships’.

In the UK, we are not seeing quite the same level of change and Dr Noura Al-Moubayed at Durham University says:

“I see working with/in the industry is generally a positive development, but the scale of it is concerning. Research in industry seems to follow the latest trends without much attention to other areas of research that might lead to the next technical revolution. This risks the field going in narrow rabbit holes which might stifle innovation.”

— Dr Noura Al-Moubayed, Durham University

Ethical machine learning and artificial intelligence gains momentum

We take a strong ethical stance at Carbon, and this aligns well with the changes we are seeing in the ad tech landscape and with the political theme of the report. As regulation tightens, new opportunities will emerge. For example, in our Knowledge Transfer Partnership project with the Durham University Maths and Statistics Department, we are developing new algorithms to extract more information from individual web pages, using this as contextual information rather than information associated with a particular web user.

In some quarters we see a slow rate of adoption of the new privacy policies and regulations. Our work with Teesside University on compliance scoring of websites, together with data-science-based analysis, will allow us to work with our clients to ensure we operate in a privacy-safe internet.

Another area highlighted in the report may provide a solution for organisations that want to use people’s information for their commercial goals. Privacy-safe federated learning may be a way to create democratic, privacy-safe models of users’ behaviours. Keeping the data and the learning on the user’s device gives users the comfort of security and perhaps opens the way for a value exchange in return for allowing their data to be used.
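The core mechanism of federated learning can be sketched in a few lines (a minimal, toy version of federated averaging; the model, data and learning rate here are illustrative assumptions, not any production system): each device trains on its own private data, and only model weights travel to the server, which averages them.

```python
# Minimal sketch of federated averaging (FedAvg): each user's data
# stays on their device; only model weights are shared and averaged.

def local_step(weights: float, data, lr: float = 0.1) -> float:
    """One pass of gradient descent for y = w*x on a user's own data."""
    w = weights
    for x, y in data:
        grad = 2 * (w * x - y) * x  # d/dw of (w*x - y)^2
        w -= lr * grad
    return w

def federated_round(global_w: float, client_datasets) -> float:
    """Server sends global weights out; clients train locally;
    server averages the returned weights (raw data never leaves)."""
    local_ws = [local_step(global_w, d) for d in client_datasets]
    return sum(local_ws) / len(local_ws)

# Two hypothetical devices, each holding private (x, y) samples of y = 2x.
clients = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]

w = 0.0
for _ in range(50):
    w = federated_round(w, clients)
print(f"learned weight: {w:.2f}")  # converges towards 2.0
```

The privacy-relevant design choice is in `federated_round`: the server only ever sees weights, never the `(x, y)` samples, which is what makes the value exchange described above plausible.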


Let’s finish off with a couple of predictions of our own:

1. AI saves a high-profile life: An AI health system using devices such as the Apple Watch predicts a heart attack, calls the medical service and saves a life.

2. AI partners with the legal profession: Rather than just being on the receiving end of the judiciary, AI will start to become embedded within it, and we will see a new AI solicitor service offering legal services such as conveyancing.

Check out the State of AI 2020 report for yourself.

Carbon is now part of Magnite.