State of the Internet’s Languages

On 23 February 2022 (just after International Mother Language Day), we launched the first ever State of the Internet’s Languages Report, together with our research partners at the Centre for Internet and Society (CIS) and the Oxford Internet Institute (OII). The report presents baseline research, with both numbers and stories, that maps some of the ways in which languages are represented online while taking into account a variety of perspectives and contexts. It is intended to raise awareness and help prioritize future actions. We believe it will demonstrate how far we currently are from a truly multilingual internet, and offer ideas for taking action.

As this first baseline research is shared and amplified, we’ll be working with communities and partners from around the world to prioritize and support actions that address these critical gaps in the internet’s languages.

Read the report ➝

Why is this important?

People often assume the internet is a “global” public space, but it does not (yet) speak, write, or recognise most human languages. Of the 7000+ languages in the world (spoken and signed), only about 500 are estimated to be represented online, with English and Chinese dominating. Taking language as a proxy for knowledge, most human knowledge – especially from and by marginalized communities – is not represented in this critical information infrastructure. This is a key challenge for digital rights, and a significant manifestation of knowledge injustice.

So far, no one has good enough data on online inequalities and gaps around language. At the same time, there isn’t sufficient awareness of the research that already exists on language gaps online, including data from Wikipedia and the broader internet compiled by the Centre for Internet and Society and the Oxford Internet Institute. We also need to learn more about which technical tools and resources (particularly free and open source ones) already exist for language preservation and amplification, and which need to be created from scratch, in order to prioritize future interventions.

When marginalized communities cannot create in their own languages on the internet, this reinforces and deepens inequalities that already exist offline. Most critically, those of us who are the primary consumers of digital content and infrastructure are still neither the producers nor the decision-makers of its design, architecture, substance, and experience. The effort to change this – to re-imagine the internet and re-design digital knowledges – needs a multitude of us working together.

What have we done in the past?

In August 2019, we launched a call for contributions and reflections on “Decolonising the Internet’s Languages”. We received 50 submissions in over 38 languages, from which nine proposals were selected — covering a wide range of topics, regions, experiences and languages. These works and a few others have been published as part of the State of the Internet’s Languages Report.

In October 2019, we hosted thirty participants from around the world to scheme about “Decolonizing the Internet’s Languages”. It was a diverse group of thoughtful, powerful folks who recognise that language is a proxy for knowledge, and who want to reclaim our many languages beyond English on the internet. Much of what we learned from participants at this convening has been rolled into our State of the Internet’s Languages Report as well.