IBM DiF project removes my Flickr URLs


Hopefully the final follow-up to my post about facial recognition’s dirty little secret, millions of online photos scraped without consent, and the update.

Thank you for your prompt response. We confirm that we have deleted from the DiF dataset all the URLs linked to your Flickr ID and associated annotations. We have also deleted your Flickr ID from our records. IBM will require our research partners to comply with your deletion request and provide IBM with confirmation of compliance.

Best regards,

IBM Research DiF team

End of the matter, although part of me wants to contact everybody in the photos and tell them what happened. Not sure what that would achieve, however.

Waking up to the fire, Zucked…

It’s been interesting to hear and read about Roger McNamee and his new book Zucked: Waking Up to the Facebook Catastrophe.

I’ve listened to a few podcasts with him talking away and I find it insightful.

I took what he said with a massive bucket of salt, given he was an investor in Facebook in the past and still is now. To be fair, I had heard about Roger before in articles such as An Apology for the Internet — From the Architects Who Built It, The Ugly Unethical Underside of Silicon Valley and ‘Our minds can be hijacked’: the tech insiders who fear a smartphone dystopia.

Roger McNamee, a venture capitalist who benefited from hugely profitable investments in Google and Facebook, has grown disenchanted with both companies, arguing that their early missions have been distorted by the fortunes they have been able to earn through advertising.

He identifies the advent of the smartphone as a turning point, raising the stakes in an arms race for people’s attention. “Facebook and Google assert with merit that they are giving users what they want,” McNamee says. “The same can be said about tobacco companies and drug dealers.”

That would be a remarkable assertion for any early investor in Silicon Valley’s most profitable behemoths. But McNamee, 61, is more than an arms-length money man. Once an adviser to Mark Zuckerberg, 10 years ago McNamee introduced the Facebook CEO to his friend, Sheryl Sandberg, then a Google executive who had overseen the company’s advertising efforts. Sandberg, of course, became chief operating officer at Facebook, transforming the social network into another advertising heavyweight.

McNamee chooses his words carefully. “The people who run Facebook and Google are good people, whose well-intentioned strategies have led to horrific unintended consequences,” he says. “The problem is that there is nothing the companies can do to address the harm unless they abandon their current advertising models.”

Thanks to Herb, who recommended the links to me.

Updated….

I knew I had heard him three times recently, and thanks to Eastmad for reminding me he was on Team Human too. I actually recommended him to Herb.

https://twitter.com/cubicgarden/status/1108643229833015302

IBM DiF project returns the full list of photos scraped without consent

Then I got a further two replies from IBM. One of them is IBM asking if I want my GDPR data for everything regarding IBM. The second one is from IBM’s Diversity in Faces project.
Thank you for your response and for providing your Flickr ID. We located 207 URLs in the DiF dataset that are associated with your Flickr ID. Per your request, the list of the 207 URLs is attached to this email (in the file called urls_it.txt). The URLs link to public Flickr images.
For clarity, the DiF dataset is a research initiative, and not a commercial application and it does not contain the images themselves, but URLs such as the ones in the attachment.
Let us know if you would like us to remove these URLs and associated annotations from the DiF dataset. If so, we will confirm when this process has been completed and your Flickr ID has been removed from our records.
Best regards,
IBM Research DiF Team

So I looked up how to wget all the pictures from the text file they supplied and downloaded the lot, so I can get a sense of which photos were public or private and whether there was a conflict with the licence. I think hiding behind the notion of the link is a little cheeky, but there’s so much discussion about hyperlinking to material.
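For anyone wanting to do the same without wget (its `-i` option reads URLs from a file), here is a minimal Python sketch of the idea. It assumes the supplied file holds one URL per line; the function names and the destination folder are my own invention for illustration, not anything IBM provided.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse
from urllib.request import urlopen


def read_urls(text):
    """Return the non-empty, de-duplicated URLs from a one-URL-per-line listing."""
    seen = []
    for line in text.splitlines():
        url = line.strip()
        if url and url not in seen:
            seen.append(url)
    return seen


def local_name(url):
    """Use the last path segment of the URL as the local file name."""
    return PurePosixPath(urlparse(url).path).name or "index"


def download_all(urls, dest_dir, fetch=None):
    """Fetch each URL into dest_dir and return the URLs that failed."""
    if fetch is None:
        # Default fetcher: a plain HTTP GET using the standard library.
        def fetch(url):
            with urlopen(url, timeout=30) as response:
                return response.read()

    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    failed = []
    for url in urls:
        try:
            (dest / local_name(url)).write_bytes(fetch(url))
        except OSError:
            # urllib's URLError and HTTPError are subclasses of OSError.
            failed.append(url)
    return failed
```

Something like `download_all(read_urls(Path("urls_it.txt").read_text()), "dif_photos")` would then pull everything down. The returned list of failures is only a hint about which links might not be publicly reachable, since a download can fail for other reasons too.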

Most of the photos are indeed public, but there are a few which I can’t find in a public image search. If they are private, then something’s wrong and I’ll be beating IBM over the head with it.

Is everything public on social media fair game?

https://www.flickr.com/photos/xdxd_vs_xdxd/6829374609/

Of course I would say not exactly, especially in the face of IBM’s Diversity in Faces project, which I wrote about here and got an initial reply to here. But it’s an interesting question, which prompts the post: scientists like me are studying your tweets, are you OK with that?

“Public” is the magic word when it comes to research ethics. “But the data is already public.” That was the response from Harvard researchers in 2008, when they released a data set of college students’ Facebook profiles, and from Danish researchers in 2016, when they released a data set scraped from OKCupid. The regulatory bodies that oversee research ethics (like institutional review boards at U.S. universities) usually don’t consider “public” data to be under their purview. Many researchers see these review boards as the arbiters of what’s ethical; if it’s not something that the boards care about, then it can’t be unethical, right?

Whether the data is public or not is important for ethical decision-making — in fact, it’s necessary.

There is an old-school hacker view that anything public is public, and if you don’t want it public, don’t put it online. But to be fair, that idealistic view came before the likes of cloud services broke the notion badly.

However, there is a question for research, which holds itself above the likes of commercial companies. Being in the research field myself, I know researchers and the ethics boards are really strict about all this. To be fair, I’m glad of this, because I’ve seen too many bad uses of public data, including semi-public data (dating site data, for example) and, heck, private data!

As researchers, we have a responsibility to acknowledge that factors like the type of data, the creator of that data, and our intended use for the data are important when it comes to using public information. These factors must inform the decisions we make about whether and how to collect data and to report findings. I hope the work that my collaborators and I are doing will help to inform best practices, so that, in the end, we can continue to contribute great science to the world while also respecting the people who share their data with us every day.

Absolutely!

Now can someone tell IBM this too?

Reply from IBM about my online photos scraped without consent

Diversity in Faces (DiF)

Following my post about facial recognition’s dirty little secret, millions of online photos scraped without consent, I got a reply from Flickr and IBM’s Diversity in Faces project.
First, Flickr’s automated email…

Hi ian,

Thanks for reaching out to us!

We’ve received your message and will be responding as quickly as possible. In the meantime, do visit the Flickr Help Forum and our Help Center as the answer to your question may be found there.

We look forward to connecting and will be in touch soon.

Cheerfully,
The Flickr Team

Already Pro? Then expect a response shortly, because you are already in our VIP queue! (Make sure to write to us using the email address on your Pro account.)

Dear Ian,
Thank you for your email.
The Diversity in Faces (DiF) project, referenced in your request below, is a non-commercial, research initiative. The DiF dataset includes a list of URLs (but not the images themselves), linking to publicly available images on Flickr under certain creative commons licenses, along with associated annotations. We have taken great care to ensure that the DiF dataset does not include Flickr IDs or any other Flickr identifiers of individuals.
In order to respond to your request, we will need to locate the URLs in the DiF dataset that are linked to your Flickr ID (if any). To do this, we will need your Flickr ID, along with your express consent to use it for the sole purpose of locating such URLs and responding to your request.  Separately, if you would like us to, we can remove any URLs of images linked to your Flickr ID from the DiF dataset.  Please confirm this by reply.
After conducting our search, we will delete your Flickr ID from our records, and if you so request, we will also remove any URLs and associated annotations from the DiF dataset connected to your Flickr ID. We will confirm when this process has been completed.
With respect to your request to access your personal data processed by IBM outside the DiF project, you will be contacted separately by the IBM Data Subject Rights Operations Team (Email at ibm.com) to proceed with your request.
Let us know if you have any questions or how we can further assist you with your request.
IBM Research DiF Team

Talent is universal, but opportunity is not

Tanitoluwa Adewumi

I do love the story of Tanitoluwa Adewumi, and the New York Times has it right on the button: talent is universal, but opportunity is not.

In a homeless shelter in Manhattan, an 8-year-old boy is walking to his room, carrying an awkward load in his arms, unfazed by screams from a troubled resident. The boy is a Nigerian refugee with an uncertain future, but he is beaming.

He can’t stop grinning because the awkward load is a huge trophy, almost as big as he is. This homeless third grader has just won his category at the New York State chess championship.

“It’s an inspiring example of how life’s challenges do not define a person,” said Jane Hsu, the principal of P.S. 116, which held a pep rally to celebrate Tani’s victory. Hsu noted that while Tani lacks a home, he has enormously supportive parents dedicated to seeing him succeed.

Tani’s mom can’t play chess but takes him every Saturday to a three-hour free practice session in Harlem, and she attends his tournaments. His dad lets Tani use his laptop each evening to practice. And although religion is extremely important to the family, the parents let Tani miss church when necessary to attend a tournament.

There’s a silver lining to Tanitoluwa Adewumi’s success too.

 

The race pay gap deserves the same attention as the gender pay gap

I wanted to annotate the original Pearn Kandola article with some links…

In 2018, the gender pay gap took up a lot of column inches. Whether it be large businesses having to publicly declare their pay discrepancies, or well-known figures like Jodie Whittaker confirming that she’ll receive the same pay for her role as Doctor Who as her male predecessors, the pressure has been rising and change seems to have begun.

But gender is not the only cause of pay discrepancy; there’s another pay gap just as damaging that hasn’t received anywhere near as much media attention

There’s a long history of BAME (black, Asian and minority ethnic) people being paid less than their white colleagues. Analyses of pay by race have been carried out in many countries, and the similarity of the results is striking. Generally speaking, in every walk of life, in every craft and profession, minorities are consistently paid less than white people.

In November 2017, the BBC found itself at the centre of a significant gender pay gap scandal. Whilst its race pay gap was just as, if not more, prevalent, far less attention was given to it. The average white male earned:

  • four and a half times more than the highest earning white female
  • seven and a half times more than the highest paid minority male
  • nine times more than the highest paid minority female

The BBC is by no means a lone example, though. Independent Television News (ITN) in 2018 revealed a mean ethnicity pay gap of 16%, which rose to 66% for bonus payments.

The lack of attention given to the race pay gap is highlighted when one looks at organisations’ responses to dealing with it. Global professional services firm, PwC, also revealed a pay gap of 13% between its BAME and white staff. This gap is almost as substantial as the firm’s gender pay gap of 14%.

It’s sad and sobering to read and hear. Why it wasn’t picked up by the mainstream press is a whole different question. Like I’ve seen elsewhere, it’s much easier to focus on diversity in the form of binary male and female. But the honest truth is diversity is never that binary.

Reporting and transparency around the BAME pay gap is the best way of making this all viable.

The pay gap is a symptom of a wider culture in which black and ethnic minority workers are undervalued and underpromoted.

Poor rich America, the first nation?

I was reading why America is the World’s First Poor Rich Country by Umair and was pretty much agreeing with everything he wrote.

The crux of his blog is about the basics of life which you need to pay for in America.

In Europe, Canada, and even Australia, society invests in all these things — and the costs of basic necessities societies don’t provide are regulated. For example, I pay $50 dollars for broadband and TV in London — but $200 for the same thing in New York — yet in London, I get vastly more and better media for my money (even including, yes, American junk like Ancient Aliens). That’s regulation at work. And when basic goods like healthcare or elderly care or education are provided and managed at a social scale, that is when they are cheapest, and often of the best quality, too. Hence, healthcare costs far less in London, Paris, or Geneva — and life expectancy is longer, too.

So if you are earning $50k in America, it is a very different thing than earning $50k in France, Germany, or Sweden — in America, you must pay steeply for the basics of life, for basic necessities. Thus, incomes stretch much further in other countries, which enjoy a vastly higher quality of life, even though people there earn roughly the same amount, because they pay vastly less for basic necessities. Americans are rich, but only nominally — their money doesn’t buy nearly as much as their peers does, where it matters and counts most, for the basics of life.

I remember many friends moving to America and reporting the wages they were getting as a result.

One friend, for example, said he was earning six figures as a contractor, and I replied, great, are you paying health insurance? He replied no, he would be fine. I said GET health insurance, because one slip and you are so screwed.

America is pioneering a new kind of poverty. The kind of poverty that’s developed in America isn’t just bizarre and gruesome — it’s novel and unseen. It isn’t something that we understand well, economists, intellectuals, thinkers, because we have no good framework to think about it. It’s not absolute poverty like Somalia, and it’s not just relative poverty, like in gilded banana republics. It’s a uniquely American creation. It’s extreme capitalism meets Social Darwinism by way of rugged self-reliance crossed with puritanical cruelty.

It’s a big deal and Umair is right. I do have a worry that the UK is sleepwalking in the same direction too!

I’ve been thinking about this a lot as the Brexit drama turns into full-on insanity. It was really good to finally watch Noam Chomsky’s Requiem for the American Dream.

https://twitter.com/cubicgarden/status/1105614199512883200

We need a PBS for the Internet age

It’s quite amazing to read this recent opinion piece in the Washington Post… (if, like me, you are reading it in Europe, you might want to try this one)

Some bits I found amazing to read, especially since the United States’ public broadcast networks are so crippled. This says it all…

Americans like public media. NPR still consistently ranks among the most trusted news sources. Likewise, Americans have rated PBS among the most trusted institutions in the United States for the past decade and a half, according to polls conducted on PBS’s behalf. But these services operate in an increasingly challenging environment. Government cuts have forced public media to become far more dependent on listener contributions, sponsorships and private donors. These organizations have had to chase audiences migrating to private platforms along with the rest of the media, meeting audiences “where they’re at.”

To their credit, public media have made an impressive effort to upgrade on a dime. PBS states that its Digital Studios division averaged more than 38 million views per month on YouTube. NPR recently co-published a report about the promise of smart-speaker devices such as Amazon Echo for audience growth.

Rather than let public broadcasters who have accrued so much public trust languish — or, worse, be co-opted by a tech industry that has a vast interest in how its portrayed — both our federal and state governments need to play a more active role in public media’s health and digital future.

What the Internet needs is a fresh infusion of public media, properly funded and paired with federal policy that puts the public interest first.

Reading this piece further reminds me why the public service internet research work is so critical. Without public media, we are lost. I can’t even really imagine what it must be like working for PBS and NPR, consistently being knocked and sliced down. I mean, the BBC has troubles, but not like these (yet).

Facial recognition’s ‘dirty little secret’: Millions of online photos scraped without consent

By Olivia Solon

Facial recognition can log you into your iPhone, track criminals through crowds and identify loyal customers in stores.

The technology — which is imperfect but improving rapidly — is based on algorithms that learn how to recognize human faces and the hundreds of ways in which each one is unique.

To do this well, the algorithms must be fed hundreds of thousands of images of a diverse array of faces. Increasingly, those photos are coming from the internet, where they’re swept up by the millions without the knowledge of the people who posted them, categorized by age, gender, skin tone and dozens of other metrics, and shared with researchers at universities and companies.

When I first heard about this story I was annoyed but didn’t think too much about it. Then, further down the story, it’s clear they used Creative Commons Flickr photos.

“This is the dirty little secret of AI training sets. Researchers often just grab whatever images are available in the wild,” said NYU School of Law professor Jason Schultz.

The latest company to enter this territory was IBM, which in January released a collection of nearly a million photos that were taken from the photo hosting site Flickr and coded to describe the subjects’ appearance. IBM promoted the collection to researchers as a progressive step toward reducing bias in facial recognition.

But some of the photographers whose images were included in IBM’s dataset were surprised and disconcerted when NBC News told them that their photographs had been annotated with details including facial geometry and skin tone and may be used to develop facial recognition algorithms. (NBC News obtained IBM’s dataset from a source after the company declined to share it, saying it could be used only by academic or corporate research groups.)

And then there is a checker to see if your photos were used in the teaching of machines. After typing my username, I found out I have 207 photo(s) in the IBM dataset. This is one of them:

Not my choice of photo, just the one which comes up when using the website

Georg Holzer uploaded his photos to Flickr to remember great moments with his family and friends, and he used Creative Commons licenses to allow nonprofits and artists to use his photos for free. He did not expect more than 700 of his images to be swept up to study facial recognition technology.

“I know about the harm such a technology can cause,” he said over Skype, after NBC News told him his photos were in IBM’s dataset. “Of course, you can never forget about the good uses of image recognition such as finding family pictures faster, but it can also be used to restrict fundamental rights and privacy. I can never approve or accept the widespread use of such a technology.”

I have a similar view to Georg. I publish almost all my Flickr photos under a Creative Commons non-commercial share-alike licence, and I swear this has been broken. I’m also not sure if the pictures are all private or not. But I’m going to find out, thanks to GDPR.

There may, however, be legal recourse in some jurisdictions thanks to the rise of privacy laws acknowledging the unique value of photos of people’s faces. Under Europe’s General Data Protection Regulation, photos are considered “sensitive personal information” if they are used to confirm an individual’s identity. Residents of Europe who don’t want their data included can ask IBM to delete it. If IBM doesn’t comply, they can complain to their country’s data protection authority, which, if the particular photos fall under the definition of “sensitive personal information,” can levy fines against companies that violate the law.

Expect a GDPR request soon, IBM! Anything I can do to send a message that I wasn’t happy with this.