Been thinking about machine bias again recently

Amazon AR haircut

Yesterday I met up with some friends to celebrate a birthday. We went to the Wharf in Castlefield, Manchester. It’s a nice outdoor space with a massive teepee to shelter from the city’s typical rainfall.

I had a few drinks, so I visited the toilet a few times, and of course washed my hands well and needed to dry them. A few times I tried the hand dryer, but there was a red light, so I assumed it wasn’t working, either due to a fault or switched off to limit the spread of germs. Once I noticed the paper towels had been refilled, I used those instead.

However, the last time I went in there was a white man using the hand dryer. I was surprised and naturally thought it must have been fixed, so afterwards I attempted to use it myself. Did it work? Did it heck!

This doesn’t come as too much of a shock, as it’s not the first time and there are many examples on YouTube. However, with a lot more knowledge now, I’m pretty peed off about it. I wanted to record it, but I needed a white hand to trigger it, and at the end of the night very few people would join my video experiment. I can tell you I moved, flipped, waved, even touched the sensor with my hand. Nothing would trigger it.

After returning to the table, I asked if the men had used the hand dryer but didn’t get a clear yes or no. So I think I’ll have to go back to the Wharf soon to film this.

Another interesting point came up after the hand dryer discussion.

Amazon opens its first hair salon, where customers can use augmented reality to experiment with hair colors

I instantly wanted to know: will Amazon’s AR app actually work on non-white people? From all the press pictures, it’s all white-skinned women. If it doesn’t work on non-white skin, expect an explosion of coverage, but it would speak volumes about the total bias of this whole industry. It’s something many have covered, but watching Coded Bias during MozFest made it super clear.

It’s fascinating to see how the question ‘how normal am I?’ hits home.

How normal am I?

I was watching the NGI Policy Summit last week and it was good. There was lots to take away, but What your face reveals – the story of HowNormalAmI.eu stuck out as one of the highlights.

Dutch media artist Tijmen Schep will launch his latest work – an online interactive documentary that judges you through your webcam, and explains how face detection algorithms are invisibly pervading our lives. Can we really assess someone’s beauty, BMI or even life expectancy from just a photo of their face? After experiencing his creation, we’ll dive into the ‘making of’ and emerge with a better understanding of what face detection AI can – and cannot – do.

If you haven’t seen it, give it a try.

But I found the social media responses really interesting. It seems half the people are talking about and sharing their own data, while the other half are talking about the details. People can’t help comparing the details, even though they know it’s biased.

It’s a thought-provoking art project and deserves to be watched fully. There are plenty of details here once you’ve watched/experienced it.

Clearview AI’s GDPR reply

Today I got my reply from Clearview AI after I submitted my request

Clearview AI GDPR request submitted

The reply was short…

Subject: No Results

Hello,
You are receiving this email as a response to your request for data access. After running the photo provided through our algorithm, no results were found.
You can click here to learn more about how Clearview collects the images that appear as search results, and how those images are used and shared.
Regards,
Clearview Privacy Team
I don’t buy it… and feel like I should try again with a slightly different picture for reference. I was looking forward to reporting them to the ICO, although the ICO never followed up on my Houseparty complaint.

Clearview AI GDPR request submitted

Clearview AI

There is so much to say about Clearview AI. If you’ve never heard of them, well, put it this way…

They have amassed a database of people’s faces by illegally scraping the likes of Facebook, Twitter, Instagram, YouTube, Flickr, etc, etc… All of these companies have sent legal cease-and-desist letters, but Clearview don’t seem to care too much. Recently they were hacked, exposing all those pictures and training data to attackers.

Because of this and my experience with the IBM DiF project, I wanted to know if I’m in the database, and the best way to find out is to send a GDPR request. This all follows my recent GDPR request to Houseparty.

I think they have gotten serious about the EU and the UK, because I didn’t need to send my usual email. I filled in the form using my junk email address and used my Estonian digital ID for verification.

Look forward to seeing what comes back. I’m expecting quite a lot.

Of course IBM, Microsoft and Amazon have backed away (for now) from their facial recognition systems because of the huge amount of bias their datasets have against black people. We will see how long they keep this line over this year and next.

Update
In my inbox, for the two requests…

EU/UK/Switzerland Data Access Form Request
EU/UK/Switzerland Data Objection Form Request

This e-mail is to confirm that we have received your EU/UK/Switzerland Data (Access/Objection) Request. We will get back to you as soon as possible.

Sincerely,

Team Clearview

IBM DiF project returns the full list of photos scraped without consent

Then I got a further two replies from IBM. One of them is IBM asking if I want my GDPR data for everything regarding IBM. But the second one is from the IBM Diversity in Faces (DiF) project.
Thank you for your response and for providing your Flickr ID. We located 207 URLs in the DiF dataset that are associated with your Flickr ID. Per your request, the list of the 207 URLs is attached to this email (in the file called urls_it.txt). The URLs link to public Flickr images.
For clarity, the DiF dataset is a research initiative, and not a commercial application and it does not contain the images themselves, but URLs such as the ones in the attachment.
Let us know if you would like us to remove these URLs and associated annotations from the DiF dataset. If so, we will confirm when this process has been completed and your Flickr ID has been removed from our records.
Best regards,
IBM Research DiF Team

So I looked up how to wget all the pictures from the text file they supplied and downloaded the lot, so I can get a sense of which photos were public/private and whether the licence was in conflict. I think hiding behind the notion of the link is a little cheeky, but there’s so much discussion about hyperlinking to material.
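For anyone curious, here is a minimal sketch of roughly what that looks like in Python (my assumptions: the attached urls_it.txt has one image URL per line, and "dif_photos" is just the folder name I picked). It downloads each image and flags any URL that no longer resolves publicly:

import os
import urllib.error
import urllib.request

# The attachment from IBM: one Flickr image URL per line (assumption)
with open("urls_it.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

os.makedirs("dif_photos", exist_ok=True)
unreachable = []
for i, url in enumerate(urls):
    # Use the last path segment as the filename, falling back to an index
    name = url.rsplit("/", 1)[-1] or f"photo_{i}.jpg"
    try:
        urllib.request.urlretrieve(url, os.path.join("dif_photos", name))
    except urllib.error.HTTPError as err:
        # 403/404 usually means the photo has since been deleted or made private
        unreachable.append((url, err.code))

print(f"Downloaded {len(urls) - len(unreachable)} of {len(urls)} images")
for url, code in unreachable:
    print(f"HTTP {code}: {url}")

Anything that comes back as unreachable is worth a second look, since it may have been taken down or switched to private after the scraping happened.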

Most of the photos are indeed public, but there are a few which I can’t find in a public image search. If they are private, then something’s wrong and I’ll be beating IBM over the head with it.

Facial recognition’s ‘dirty little secret’: Millions of online photos scraped without consent

By Olivia Solon

Facial recognition can log you into your iPhone, track criminals through crowds and identify loyal customers in stores.

The technology — which is imperfect but improving rapidly — is based on algorithms that learn how to recognize human faces and the hundreds of ways in which each one is unique.

To do this well, the algorithms must be fed hundreds of thousands of images of a diverse array of faces. Increasingly, those photos are coming from the internet, where they’re swept up by the millions without the knowledge of the people who posted them, categorized by age, gender, skin tone and dozens of other metrics, and shared with researchers at universities and companies.

When I first heard about this story I was annoyed but didn’t think too much about it. Then, further down the story, it’s clear they used Creative Commons Flickr photos.

“This is the dirty little secret of AI training sets. Researchers often just grab whatever images are available in the wild,” said NYU School of Law professor Jason Schultz.

The latest company to enter this territory was IBM, which in January released a collection of nearly a million photos that were taken from the photo hosting site Flickr and coded to describe the subjects’ appearance. IBM promoted the collection to researchers as a progressive step toward reducing bias in facial recognition.

But some of the photographers whose images were included in IBM’s dataset were surprised and disconcerted when NBC News told them that their photographs had been annotated with details including facial geometry and skin tone and may be used to develop facial recognition algorithms. (NBC News obtained IBM’s dataset from a source after the company declined to share it, saying it could be used only by academic or corporate research groups.)

And then there is a checker to see if your photos were used in the teaching of machines. After typing my username, I found out I have 207 photo(s) in the IBM dataset. This is one of them:

Not my choice of photo, just the one which comes up when using the website

Georg Holzer, uploaded his photos to Flickr to remember great moments with his family and friends, and he used Creative Commons licenses to allow nonprofits and artists to use his photos for free. He did not expect more than 700 of his images to be swept up to study facial recognition technology.

“I know about the harm such a technology can cause,” he said over Skype, after NBC News told him his photos were in IBM’s dataset. “Of course, you can never forget about the good uses of image recognition such as finding family pictures faster, but it can also be used to restrict fundamental rights and privacy. I can never approve or accept the widespread use of such a technology.”

I have a similar view to Georg. I publish almost all my Flickr photos under a Creative Commons non-commercial ShareAlike licence, and I swear this has been broken. I’m also not sure if the pictures used are all public or if some were private. But I’m going to find out, thanks to GDPR.

There may, however, be legal recourse in some jurisdictions thanks to the rise of privacy laws acknowledging the unique value of photos of people’s faces. Under Europe’s General Data Protection Regulation, photos are considered “sensitive personal information” if they are used to confirm an individual’s identity. Residents of Europe who don’t want their data included can ask IBM to delete it. If IBM doesn’t comply, they can complain to their country’s data protection authority, which, if the particular photos fall under the definition of “sensitive personal information,” can levy fines against companies that violate the law.

Expect a GDPR request soon, IBM! I’ll do anything I can to send the message that I wasn’t happy with this.

Google apologizes again for biased results

Google was once again in hot water over its algorithm, which meant searching for happy families in image search would return results of happy white families.

Of course, last time it was Google Photos classifying black people as gorillas.

Some friends have been debating this and suggested it wasn’t so bad, but it’s clear that after a few days things were tweaked. Of course, Google are one of many who rely on non-diverse training data and are likely coding their biases into the code/algorithms. Getting genuinely diverse training data is expensive and time consuming; I guess in their eyes, so is building a diverse team in the short term?

Anyway, here’s what I got when searching for happy families on Friday 2nd June at about 10pm BST.

Logged into Google account using Chrome on Ubuntu
Using incognito mode and searching for happy families with Chrome on Ubuntu
Search for happy families using a Russian Tor exit node on Chromium on Ubuntu