Language Bias


Researchers have discovered sexist bias in machine-learned word embeddings trained on Google News articles.

There are many nuances and implications tied up in language. Not only do the meanings of words change over time, but so do the associations attached to a particular word. Often, negative connotations of words are difficult to shake, even as a society moves away from racist, sexist, or other outdated behaviors. One area in which this has recently been observed is word embeddings.

A word embedding is a language model in which each word is mapped to a vector, a list of numbers that reflects the contexts in which the word appears. The vectors are learned from large collections of text, so words used in similar contexts end up with similar vectors. These word embeddings serve as a dictionary of sorts for computer programs, such as search engines, that need to work with word meanings.

Researchers have unearthed a glaring problem, described in a recent study by Bolukbasi and colleagues titled “Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings.” According to the paper, word embeddings risk amplifying biases that are already present in the data they are trained on. The authors found that embeddings trained on Google News articles exhibited female/male gender stereotypes to a “disturbing extent.” Bolukbasi and colleagues are concerned because these embeddings are in widespread use, so any application built on them could carry those biases forward and amplify them.

The way the word mappings ended up producing sexist outputs is wrapped up in how the vectors operate. “First, words with similar semantic meanings tend to have vectors that are close together. Second, the vector differences between words in embeddings have been shown to represent relationships between words,” Bolukbasi and colleagues explain. If the machine is given an analogy puzzle such as “man is to king as woman is to x,” arithmetic on the embedding vectors can find that x=queen. Similarly, if it is asked “Tokyo is to x as Paris is to France,” it can find that x=Japan. “It is surprising that a simple vector arithmetic can simultaneously capture a variety of relationships,” says Bolukbasi. “It has also excited practitioners because such a tool could be useful across applications involving natural language. Indeed, they are being studied and used in a variety of downstream applications e.g., document ranking, sentiment analysis, and question retrieval.”
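The analogy puzzle can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' code: the two-dimensional vectors are invented for the example (real embeddings have hundreds of dimensions), and the function name `solve_analogy` is hypothetical.

```python
import math

# Toy 2-D vectors invented for illustration only.
# Axis 0 loosely stands for "royalty", axis 1 for "gender".
TOY_VECTORS = {
    "man":   [0.0, 1.0],
    "woman": [0.0, -1.0],
    "king":  [1.0, 1.0],
    "queen": [1.0, -1.0],
    "apple": [-1.0, 0.1],
}

def cosine(u, v):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def solve_analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to x' with the word nearest to b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the three input words so the answer is a new word.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(solve_analogy("man", "king", "woman", TOY_VECTORS))  # prints "queen"
```

With these toy values, king minus man plus woman lands exactly on the vector for queen, which is why the puzzle resolves the way it does.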

However, the embeddings also pinpoint sexism implicit in text. For instance, if the machine were asked “man is to woman as computer programmer is to x,” the vectors would answer “x=homemaker.” The paper goes on to list further extreme cases of sexist bias, such as woman is to nurse as man is to doctor and woman is to cosmetics as man is to pharmaceuticals. It also lists the occupations most strongly associated with “she,” including socialite, hairdresser, and receptionist, and those most strongly associated with “he,” including boss, magician, and philosopher.

These sexist associations arise because any bias in the articles used to train Word2vec (the program that produced the vectors) is captured in the geometry of the vector space. Bolukbasi and colleagues express their disappointment in the paper, stating, “One might have hoped that the Google News embedding would exhibit little gender bias because many of its authors are professional journalists.”
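To see what bias “captured in the geometry” can look like, here is a minimal Python sketch under stated assumptions. The vectors are made up for illustration and are not values from Word2vec, and `gender_score` is a hypothetical helper: it measures how far a word leans toward “he” or “she” by projecting it onto the he-minus-she direction.

```python
import math

# Toy 2-D vectors invented for illustration (not values from Word2vec).
# Axis 0 loosely stands for "gender", axis 1 for "work-related".
TOY_VECTORS = {
    "he":         [1.0, 0.0],
    "she":        [-1.0, 0.0],
    "programmer": [0.6, 0.8],
    "homemaker":  [-0.6, 0.8],
    "teacher":    [0.0, 1.0],
}

def gender_score(word, vectors):
    """Project a word onto the unit-length he-minus-she direction.

    Positive leans toward "he", negative toward "she", zero is neutral.
    """
    direction = [h - s for h, s in zip(vectors["he"], vectors["she"])]
    length = math.sqrt(sum(d * d for d in direction))
    unit = [d / length for d in direction]
    return sum(x * u for x, u in zip(vectors[word], unit))

for word in ("programmer", "homemaker", "teacher"):
    print(word, round(gender_score(word, TOY_VECTORS), 2))
```

In this toy space “programmer” scores positive and “homemaker” negative, while “teacher” sits at zero. An occupation word that ought to be gender neutral but scores far from zero is the kind of pattern the paper describes.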

Luckily, the team offers a solution. They aim to locate the direction in the vector space that he/she word pairs share, generate the analogies that lie along it, and have each one judged as appropriate or inappropriate using Amazon’s Mechanical Turk (an online marketplace where employers hire workers for human intelligence tasks, or HITs). The team considers an analogy biased if more than half of the turkers think that it is biased. For example, sewing/carpentry is clearly gender biased, unlike convent/monastery, which shows little bias.
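The “more than half” rule is simple enough to write down. The sketch below assumes each rater gives a yes/no answer; the ratings shown are hypothetical and are not data from the study, and `is_biased` is an illustrative name.

```python
def is_biased(votes):
    """Return True when more than half of the raters flagged the analogy.

    `votes` is a list of booleans, one per rater (True = "biased").
    """
    return sum(votes) > len(votes) / 2

# Hypothetical ratings from ten raters, made up for illustration.
ratings = {
    "sewing/carpentry": [True] * 8 + [False] * 2,
    "convent/monastery": [True] * 1 + [False] * 9,
}

flagged = {pair: is_biased(votes) for pair, votes in ratings.items()}
print(flagged)
```

Note that an exact tie does not count as biased under this rule, since a tie is not more than half.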

This has significant implications for how word vectors are used in real-world settings. Imagine a company searching for “computer programmer resume” in a program that uses Word2vec. Before de-biasing, resumes from men may be more likely to rank higher because of the association between male and computer programmer.
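The resume scenario can be sketched as a toy ranking. Everything here is an assumption made for illustration: the vectors are invented, the resume labels are hypothetical, and removing the gender component from the query is a much-simplified stand-in for the paper's full debiasing procedure.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def remove_component(vec, direction):
    """Subtract the part of `vec` that lies along `direction`."""
    scale = (sum(a * d for a, d in zip(vec, direction))
             / sum(d * d for d in direction))
    return [a - scale * d for a, d in zip(vec, direction)]

def rank(query, documents):
    """Order document names from most to least similar to the query."""
    return sorted(documents,
                  key=lambda name: cosine(query, documents[name]),
                  reverse=True)

# Toy 2-D space: axis 0 = gender (positive = male), axis 1 = programming.
GENDER_DIRECTION = [1.0, 0.0]
biased_query = [0.6, 0.8]  # "computer programmer", leaning male
resumes = {
    "resume_A (male-coded wording)": [0.8, 0.6],
    "resume_B (female-coded wording)": [-0.6, 0.8],
}

debiased_query = remove_component(biased_query, GENDER_DIRECTION)
print("before:", rank(biased_query, resumes))
print("after: ", rank(debiased_query, resumes))
```

In this toy setup resume_B is the more relevant document on the programming axis, yet the biased query ranks resume_A first. Once the gender component is removed from the query, the order flips.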

“One perspective on bias in word embeddings is that it merely reflects bias in society, and therefore one should attempt to debias society rather than word embeddings,” say Bolukbasi and colleagues. “However, by reducing the bias in today’s computer systems (or at least not amplifying the bias), which is increasingly reliant on word embeddings, in a small way debiased word embeddings can hopefully contribute to reducing gender bias in society.”
“At the very least,” Bolukbasi concludes, “machine learning should not be used to inadvertently amplify these biases.”

