State of the Internet’s Languages

Stay tuned for an upcoming research report in 2022 on the State of the Internet’s Languages!

People often assume the internet is a “global” public space, but it does not (yet) speak, write, or recognise most human languages. Of the 7000+ spoken languages in the world, only about 500 are estimated to be represented online, with English and Chinese dominating. Taking language as a proxy for knowledge, most human knowledge – especially from and by marginalized communities – is not represented in this critical information infrastructure. This is a key challenge for digital rights, and a significant manifestation of knowledge injustice.

When marginalized communities cannot create in their own languages on the internet, this reinforces and deepens inequalities that already exist offline. Most critically, those of us who are the primary consumers of digital content and infrastructure are still neither the producers nor the decision-makers of its design, architecture, substance, and experience. The effort to change this – to re-imagine the internet and re-design digital knowledges – needs a multitude of us working together.

Why is this important?

So far, no one has good enough data on online inequalities and gaps around language. At the same time, there isn’t sufficient awareness of the research that does already exist on language gaps online – including data from Wikipedia and the broader internet compiled by the Centre for Internet and Society and the Oxford Internet Institute. We also need to learn more about what technical tools and resources (particularly free and open source) already exist for language preservation and amplification, and what needs to be created from scratch, in order to prioritize future interventions.

What are we doing now?

We’re creating a State of the Internet’s Languages Report, to be released later in 2022, together with our research partners at the Centre for Internet and Society (CIS) and Oxford Internet Institute (OII). This report will include baseline research with both numbers and stories, looking at the issue from a variety of perspectives and contexts. The report is intended to raise awareness and help prioritize future actions. We believe it will demonstrate how far we are currently from a truly multilingual internet, and offer ideas for taking action.

After this first baseline research is shared, we’ll be working with communities and partners from around the world to prioritize and support actions that address these critical gaps in the internet’s languages.

What have we done so far?

In August 2019, we launched a call for contributions and reflections on “Decolonizing the Internet’s Languages”. We received 50 submissions in over 38 languages, from which nine proposals were selected – covering a wide range of topics, regions, experiences and languages. These works and a few others will be published as part of the State of the Internet’s Languages Report.

In October 2019, we hosted thirty participants from around the world to scheme about “Decolonizing the Internet’s Languages”. It was a diverse group of thoughtful, powerful folks who recognise that language is a proxy for knowledge, and who want to reclaim our many languages beyond English on the internet. Much of what we’ve learned from participants at this convening will be rolled into our upcoming State of the Internet’s Languages Report as well.