Public Service Internet monthly newsletter (Dec 2022)

The branches of the Fediverse diagram

We live in incredible times with such possibilities, that is clear. Although it’s easily dismissed when you see the UK government setting up a Discord server, a podcasting app sharing user locations with podcast creators, and what it’s like to work in India as a woman in tech.

To quote Buckminster Fuller: “You never change things by fighting the existing reality. To change something, build a new model that makes the existing model obsolete.”

You are seeing aspects of this with India following the EU on USB-C, Flickr putting its weight behind ActivityPub, and even more calls to make privacy a human right.


Gifts which don’t track your friends and family

Ian thinks: I actually chuckle at the #askfirefox videos, but this video makes good points about buying a surveillance device for friends and family this festive holiday. Shop smart with Mozilla’s Privacy Not Included.

W3C Solid working group

Ian thinks: Solid, the personal data store, has found its place in the W3C groups. Tim Berners-Lee’s welcoming email is beautifully written and starts a genuine new phase of the internet.

The UK parliament debates the future of public service broadcasting

Ian thinks: It’s good to see this discussion at this level, but I am concerned there isn’t more focus beyond broadcasting. Public service is much bigger, and it’s time to bring what makes public service unique to this space.

Mozilla’s future looks bright and sustainable

Ian thinks: Mozilla, although well known in certain circles, has been losing a lot of market share. However, it has good plans to build on its community roots for a bright and sustainable future. Don’t forget the Mozilla Festival’s call for proposals ends Dec 16th.

Thoughts on Blockchain technology a decade ago

Ian thinks: Tim Bray’s measured thoughts on blockchain technology are a good read. It’s easy to say blockchains were not mature back when AWS started, but Tim’s thoughts today haven’t changed much.

Elon Musk’s takeover of Twitter

Ian thinks: There is so much to say about this takeover of Twitter but I didn’t want to spend the whole newsletter talking about it. However, I’m saddened by the lack of understanding from Elon and the way employees have been treated. No way was Twitter the public square.

More thoughtful discussion about the future of decentralised social media

Ian thinks: Interesting points made and worth listening to in full. Likewise, this small panel with the folks from Bluesky, Manyverse and others explores possibilities way beyond what’s currently available.

The EFF look at Mastodon from a security & privacy point of view

Ian thinks: It’s always great to see new systems deeply looked at by the EFF and Open Rights Group. Mastodon comes out looking great. However, you certainly have to go about it differently.

Don’t like microblogging but like the idea of the fediverse?

Ian thinks: This is great news: Automattic (WordPress) is once again supporting the ActivityPub standard and joining the large open network of the fediverse. How Tumblr will work on the fediverse is another question.


Find the archive here

IBM DiF project returns the full list of photos scraped without consent

Then I got a further two replies from IBM. One of them is IBM asking if I want my GDPR data for everything regarding IBM. The second one is from IBM’s Diversity in Faces project.
Thank you for your response and for providing your Flickr ID. We located 207 URLs in the DiF dataset that are associated with your Flickr ID. Per your request, the list of the 207 URLs is attached to this email (in the file called urls_it.txt). The URLs link to public Flickr images.
For clarity, the DiF dataset is a research initiative, and not a commercial application and it does not contain the images themselves, but URLs such as the ones in the attachment.
Let us know if you would like us to remove these URLs and associated annotations from the DiF dataset. If so, we will confirm when this process has been completed and your Flickr ID has been removed from our records.
Best regards,
IBM Research DiF Team

So I looked up how to wget all the pictures from the text file they supplied and downloaded the lot, so I can get a sense of which photos were public or private and whether the licence was in conflict. I think hiding behind the notion of the link is a little cheeky, but there’s so much discussion about hyperlinking to material.

Most of the photos are indeed public but there are a few which I can’t find in a public image search. If they are private, then something’s wrong and I’ll be beating IBM over the head with it.
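For anyone who would rather script it, here is a minimal Python sketch of the same bulk download step. It is a rough stand-in for wget, not the command used above. It assumes the list holds one URL per line (as urls_it.txt from IBM’s email appears to), and the function names, the destination folder and the injectable fetch parameter are all made up for illustration.

```python
import os
import urllib.request
from urllib.parse import urlparse


def read_urls(text):
    """Return the non-empty, de-duplicated URLs from a one-URL-per-line list."""
    seen = []
    for line in text.splitlines():
        url = line.strip()
        if url and url not in seen:
            seen.append(url)
    return seen


def local_name(url):
    """Use the last path segment of the URL as the local file name."""
    return os.path.basename(urlparse(url).path)


def download_all(urls, dest, fetch=urllib.request.urlretrieve):
    """Fetch every URL into dest and return the URLs that failed."""
    os.makedirs(dest, exist_ok=True)
    failed = []
    for url in urls:
        try:
            fetch(url, os.path.join(dest, local_name(url)))
        except OSError:
            # urllib's URLError and HTTPError are subclasses of OSError
            failed.append(url)
    return failed
```

Reading urls_it.txt into read_urls and passing the result to download_all with a folder name would fetch the lot. The list of failures it returns is a handy first hint at which photos are no longer publicly reachable.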

Reply from IBM about my online photos scraped without consent

Diversity in Faces (DiF)

Following my post about facial recognition’s dirty little secret, millions of online photos scraped without consent, I got a reply from Flickr and IBM’s Diversity in Faces project.
First Flickr’s automated email…

Hi ian,

Thanks for reaching out to us!

We’ve received your message and will be responding as quickly as possible. In the meantime, do visit the Flickr Help Forum and our Help Center as the answer to your question may be found there.

We look forward to connecting and will be in touch soon.

Cheerfully,
The Flickr Team

Already Pro? Then expect a response shortly, because you are already in our VIP queue! (Make sure to write to us using the email address on your Pro account.)

Dear Ian,
Thank you for your email.
The Diversity in Faces (DiF) project, referenced in your request below, is a non-commercial, research initiative. The DiF dataset includes a list of URLs (but not the images themselves), linking to publicly available images on Flickr under certain creative commons licenses, along with associated annotations. We have taken great care to ensure that the DiF dataset does not include Flickr IDs or any other Flickr identifiers of individuals.
In order to respond to your request, we will need to locate the URLs in the DiF dataset that are linked to your Flickr ID (if any). To do this, we will need your Flickr ID, along with your express consent to use it for the sole purpose of locating such URLs and responding to your request.  Separately, if you would like us to, we can remove any URLs of images linked to your Flickr ID from the DiF dataset.  Please confirm this by reply.
After conducting our search, we will delete your Flickr ID from our records, and if you so request, we will also remove any URLs and associated annotations from the DiF dataset connected to your Flickr ID. We will confirm when this process has been completed.
With respect to your request to access your personal data processed by IBM outside the DiF project, you will be contacted separately by the IBM Data Subject Rights Operations Team (Email at ibm.com) to proceed with your request.
Let us know if you have any questions or how we can further assist you with your request.
IBM Research DiF Team

Facial recognition’s ‘dirty little secret’: Millions of online photos scraped without consent

By Olivia Solon

Facial recognition can log you into your iPhone, track criminals through crowds and identify loyal customers in stores.

The technology — which is imperfect but improving rapidly — is based on algorithms that learn how to recognize human faces and the hundreds of ways in which each one is unique.

To do this well, the algorithms must be fed hundreds of thousands of images of a diverse array of faces. Increasingly, those photos are coming from the internet, where they’re swept up by the millions without the knowledge of the people who posted them, categorized by age, gender, skin tone and dozens of other metrics, and shared with researchers at universities and companies.

When I first heard about this story I was annoyed but didn’t think too much about it. Then, further down the story, it’s clear they used Creative Commons Flickr photos.

“This is the dirty little secret of AI training sets. Researchers often just grab whatever images are available in the wild,” said NYU School of Law professor Jason Schultz.

The latest company to enter this territory was IBM, which in January released a collection of nearly a million photos that were taken from the photo hosting site Flickr and coded to describe the subjects’ appearance. IBM promoted the collection to researchers as a progressive step toward reducing bias in facial recognition.

But some of the photographers whose images were included in IBM’s dataset were surprised and disconcerted when NBC News told them that their photographs had been annotated with details including facial geometry and skin tone and may be used to develop facial recognition algorithms. (NBC News obtained IBM’s dataset from a source after the company declined to share it, saying it could be used only by academic or corporate research groups.)

And then there is a checker to see if your photos were used in the teaching of machines. After typing my username, I found out I have 207 photo(s) in the IBM dataset. This is one of them:

Not my choice of photo, just the one which comes up when using the website

Georg Holzer uploaded his photos to Flickr to remember great moments with his family and friends, and he used Creative Commons licenses to allow nonprofits and artists to use his photos for free. He did not expect more than 700 of his images to be swept up to study facial recognition technology.

“I know about the harm such a technology can cause,” he said over Skype, after NBC News told him his photos were in IBM’s dataset. “Of course, you can never forget about the good uses of image recognition such as finding family pictures faster, but it can also be used to restrict fundamental rights and privacy. I can never approve or accept the widespread use of such a technology.”

I have a similar view to Georg. I publish almost all my Flickr photos under a Creative Commons non-commercial share-alike licence, and I swear this has been broken. I’m also not sure whether all the pictures are public or some are private. But I’m going to find out thanks to GDPR.

There may, however, be legal recourse in some jurisdictions thanks to the rise of privacy laws acknowledging the unique value of photos of people’s faces. Under Europe’s General Data Protection Regulation, photos are considered “sensitive personal information” if they are used to confirm an individual’s identity. Residents of Europe who don’t want their data included can ask IBM to delete it. If IBM doesn’t comply, they can complain to their country’s data protection authority, which, if the particular photos fall under the definition of “sensitive personal information,” can levy fines against companies that violate the law.

Expect a GDPR request soon, IBM! I’ll do anything I can to send a message that I’m not happy with this.

Over 15 years of Flickr data

All those files to download

It’s been a long haul but finally Flickr is beyond use for me. I briefly tried Flickr Pro for a while but there are so many other options now. It’s a shame, as Flickr went through a lot of trouble at the end and was only saved from the Yahoo craziness by SmugMug. But even looking at the Pro account prices, I decided that after…

It was time to leave Flickr and just let it start deleting my photos, which I mainly had backed up in multiple places anyway.

I was quite impressed with Flickr’s data portability option; for example, the uploaded files are exactly the same. It would have been great if they had embedded the tags into the original EXIF data, but it seems they kept the tags in the account data instead. So with some work, it would be possible to pull the whole lot together again. I’m actually surprised no one’s already done this.
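As a rough idea of what pulling it together could involve, here is a small Python sketch that matches exported photo files to their tags. Everything about the export layout is an assumption on my part: it supposes one JSON document per photo with an "id" field and a "tags" list of objects carrying a "tag" key, and that each file name contains the photo id. Check a real export before trusting any of that. Writing the tags back into EXIF is left out, as it needs a third-party tool.

```python
import json
import re


def load_tags(json_texts):
    """Map photo id -> list of tag strings from per-photo JSON documents."""
    tags_by_id = {}
    for text in json_texts:
        record = json.loads(text)
        # Assumed layout: {"id": "...", "tags": [{"tag": "..."}, ...]}
        tags_by_id[str(record["id"])] = [t["tag"] for t in record.get("tags", [])]
    return tags_by_id


def tags_for_files(filenames, tags_by_id):
    """Match each file to its tags via a numeric id found in the file name."""
    matched = {}
    for name in filenames:
        # Assumption: the photo id appears as a run of digits in the file name
        for candidate in re.findall(r"\d+", name):
            if candidate in tags_by_id:
                matched[name] = tags_by_id[candidate]
                break
    return matched
```

Files with no matching id are simply skipped, so comparing the result against the full file list shows how much of the archive could be re-tagged this way.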

Slack, data portability done right?

Slack… love it or hate it, it seems to be going everywhere…

We recently had to move Slacks for reasons not worth mentioning. As one of the founders of the data portability group way back, I was pretty impressed by how easy it was to export one Slack into another. If only more services would take note!

I found the story of Slack so reminiscent of Flickr’s inception via Game Neverending.

Flickr was famously developed as a side feature for the MMO Game Neverending that Butterfield was developing with his then-wife Caterina Fake and the rest of their company, Ludicorp. The team realized that the photo-sharing aspect of the game could be spun off into its own service.

It always reminds me of sitting in the audience at Doors of Perception 6 in Amsterdam when Stewart Butterfield talked about the concept and plans.

My photo used in Seattle and Ride Sharing article

Google photos vs Flickr Pro for my image backup

Speeding car

It all started when I came back from Tokyo to find my SpiderOak storage full. I decided that keeping a terabyte of photos which are hardly private in super secure storage is a little crazy, and it’s time to put them somewhere else so I can make better use of the secure storage.

Originally I did look at using Amazon Glacier but quickly found out that it’s really not for general use in any shape. I looked at Trovebox again, only to find it has been shut down… but there is a GitHub community project for those wanting to keep developing it.

We’ll be shutting down the hosted Trovebox service on March 31, 2015. We may extend this deadline to help accomodate customers to obtain archives of their photos.

A few of my friends asked why I don’t use Flickr, especially since I’m already a Pro member and have been since 2004!

I thought about it, because I tend to use Flickr only for photos I actively want to share rather than as a place to upload all my photos. Basically I have never really trusted the privacy options and only upload things which I’m happy being public. It was time to trust Flickr’s privacy model, but to be fair I’m still only uploading stuff where it doesn’t matter too much if it’s public.

I started doing that, then Google announced at I/O 2015 a revamped photo service with unlimited storage (if you are happy with them compressing the photos a bit).

This has got me wondering if I should switch.

Flickr Pro is $24.99 a year but Google is $1.99 a month for 100GB.

Economically it makes sense to stay with Flickr as it’s unlimited even for high-resolution photos and I have most of my good photos already there (incumbency advantage). The Google space purchase would only be used for photos bigger than 2048x2048px, which I guess is quite a few as I switched to 5 megapixels and above very early on. I guess there’s the option of trusting Google’s image compression. Having the extra space in Google Drive would be useful, but it’s not a big deal yet.

I’m going to keep uploading the photos and let Google Photos shake out a little. When the next year of Flickr comes up, I’ll decide then. I even made a Google task to remind me. Hopefully there will be Flickr to Google Drive exports, or I’ll have gigabit internet and can upload the lot super fast.
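To put the two prices on the same footing, here is a quick back-of-the-envelope sum in Python using only the figures quoted above. Working in cents avoids floating point rounding; the variable and function names are just for illustration.

```python
FLICKR_PRO_CENTS_PER_YEAR = 2499      # $24.99 a year
GOOGLE_100GB_CENTS_PER_MONTH = 199    # $1.99 a month


def yearly_cents(monthly_cents):
    """Convert a monthly price in cents to a yearly price in cents."""
    return monthly_cents * 12


google_year = yearly_cents(GOOGLE_100GB_CENTS_PER_MONTH)
difference = FLICKR_PRO_CENTS_PER_YEAR - google_year

print(f"Google 100GB: ${google_year / 100:.2f} a year")            # $23.88
print(f"Flickr Pro costs ${difference / 100:.2f} more a year")     # $1.11
```

The sticker prices are within about a dollar of each other per year, so the real difference is the 100GB cap versus unlimited storage rather than the cost.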

Getty pictures go non-commercial but there is a downside

I welcome Getty opening up their picture archive for us bloggers to use. However, there are concerns or downsides…

  • This isn’t Creative Commons licensed. Attribution-NonCommercial-NoDerivatives would have been good enough, but Getty wanted to add additional conditions.
  • All the images are available via an IFRAME tag. And I thought Flickr’s embed was bad!
  • Reading the terms and conditions… They reserve the right to do what they like within that IFRAME including, as you can imagine, advertising…

Be careful out there people…

Following on… Juliapowles on Twitter mentioned this to me. As Dr. Adam D. Kline says later, the first comment says it all. And it really does sum it all up…

As someone who has grown old and weary fighting Getty’s ‘licence first and clear the rights later, if at all’ business model on behalf of impecunious photographers it is difficult to view this development with unalloyed enthusiasm

Linked data on YouTube?

Triples on YouTube?

I thought I’d try writing some RDF/Turtle as I see it on YouTube.

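# The imdb:, tmdb: and cities: namespaces below are placeholders, not published vocabularies.
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix imdb: <http://example.org/imdb#> .
@prefix tmdb: <http://example.org/tmdb#> .
@prefix cities: <http://example.org/cities#> .

<http://youtu.be/jP6kzKygaOs>
  dc:title "Jason Silva on London Real talks about Vanilla Sky" ;
  dc:publisher "London Real" ;
  dc:subject [
    dc:subject "Vanilla Sky" ;
    dc:type "Film" ;
    imdb:homepage <http://www.imdb.com/title/tt0259711/> ;
    tmdb:homepage <http://www.themoviedb.org/movie/1903-vanilla-sky>
  ] ;
  cities:city "London" .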

Kind of reminds me of when people started hacking triples into Flickr by using machine tags. Still, it’s interesting to see YouTube adding the ability to add a triple in a nice clean way (if it’s a well-used ontology, of course).
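For a feel of how the machine tag hack maps onto triples, here is a minimal Python sketch. It assumes the usual namespace:predicate=value shape of a machine tag, with the namespace and predicate made of letters, digits and underscores. The subject identifier and the function names are hypothetical, and this is a simplification rather than Flickr’s own parser.

```python
import re

# namespace:predicate=value, where namespace and predicate start with a letter
MACHINE_TAG = re.compile(r"^([A-Za-z][A-Za-z0-9_]*):([A-Za-z][A-Za-z0-9_]*)=(.*)$")


def parse_machine_tag(tag):
    """Split 'namespace:predicate=value' into a tuple, or return None."""
    match = MACHINE_TAG.match(tag)
    if match is None:
        return None
    namespace, predicate, value = match.groups()
    # Values with spaces are often wrapped in double quotes; drop them
    return namespace, predicate, value.strip('"')


def to_triples(subject, tags):
    """Turn the machine tags on one item into (subject, predicate, object) triples."""
    triples = []
    for tag in tags:
        parsed = parse_machine_tag(tag)
        if parsed:
            namespace, predicate, value = parsed
            triples.append((subject, f"{namespace}:{predicate}", value))
    return triples
```

Plain tags such as "london" are ignored, so only the tags that already carry a namespace and predicate end up as triples.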