IBM DiF project returns the full list of photos scraped without consent

Then I got a further two replies from IBM. One of them is IBM asking if I want my GDPR data for everything regarding IBM, but the second one is from the IBM Diversity in Faces project.
Thank you for your response and for providing your Flickr ID. We located 207 URLs in the DiF dataset that are associated with your Flickr ID. Per your request, the list of the 207 URLs is attached to this email (in the file called urls_it.txt). The URLs link to public Flickr images.
For clarity, the DiF dataset is a research initiative, and not a commercial application and it does not contain the images themselves, but URLs such as the ones in the attachment.
Let us know if you would like us to remove these URLs and associated annotations from the DiF dataset. If so, we will confirm when this process has been completed and your Flickr ID has been removed from our records.
Best regards,
IBM Research DiF Team

So I looked up how to Wget all the pictures from the text file they supplied and downloaded the lot, so I could get a sense of which photos were public/private and whether the licence was in conflict. I think hiding behind the notion of the link is a little cheeky, but there's so much discussion about hyperlinking to material.
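For anyone wanting to do the same, this is roughly the kind of Wget command I used. urls_it.txt is the file IBM attached; the download directory and the politeness flags are my own additions, so adjust to taste:

wget --input-file=urls_it.txt --directory-prefix=dif-photos --wait=1 --tries=2

The --wait gives Flickr a second between requests, which seems only fair when all I want to do is check my own photos.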

Most of the photos are indeed public, but there are a few which I can't find in a public image search. If they are private, then something's wrong and I'll be beating IBM over the head with it.

Like watching a baby playing with a loaded gun…

Baby face closeup

…Is what Miles said about me setting up my own virtual private server on the weekend. Yep, I finally took the bull by the horns, slapped down my credit card and decided to go with Hub.org for Cubicgarden.com's new resting place. To be fair, I didn't really know what I was getting myself into. See, I kind of thought Tomcat and Apache would be installed and ready to go. But nope, when I finally logged into my FreeBSD box I quickly found out that it was a barebones box and I would need to do the configuration of applications, permissions and users myself. Well trust me, this is no easy thing. I mean, there's something very different about running Unix on the desktop and running it as a server. In a server environment, permissions and running applications all need to be kept under tight wraps. I would agree this should be the case for a desktop environment too, but you can be a little more flexible with the configuration of a desktop machine. Put it this way: being an admin with root access to your own server is certainly comparable to building your first F1 car and then racing it along the streets of Monte Carlo. Or, as Miles puts it, a baby with a loaded gun.

Either way, with thanks to Miles and tons of resources online like this one, I'm almost up, running and hopefully pretty secure. Rather than the usual Apache 2.x and Tomcat 5.5.x configuration with mod_jk, I've gone for Tomcat 5.5.x with Pen in front as a reverse proxy and load balancer. Miles suggested Pound and Balance, but I couldn't get Pound to compile without seriously messing with OpenSSL, and Balance didn't seem to forward HTTP traffic without stripping away the header information. Pen is just like Pound it would seem, but it also runs on Windows, which is good to remember for other projects I may have in mind with my old Windows 2000 box. So yeah, it's a pretty sweet setup so far and means I lose the overhead of running Apache when all I really want is Tomcat. By the way, I was very close to installing Resin 3.x but decided against it for now.
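To give you an idea, the heart of the setup is just Pen listening on port 80 and handing everything back to Tomcat's HTTP connector. Something along these lines, as a minimal sketch assuming Tomcat is on its default port 8080 (my actual startup line has a few more options):

pen 80 localhost:8080

And if I ever run a second Tomcat instance, Pen will balance across the backends just by listing them, e.g. pen 80 localhost:8080 localhost:8180.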

Compiling Cocoon only took 2 mins once I finally untarred and gunzipped the source. Can I just say how much of a nightmare compression is? This guide was very useful for not only uncompressing files (tar -xvvzf cocoon2.1.8.tar.gz) but also compressing them. It took me a while to work out the correct parameters to compress a directory of files and its contents but keep the permissions and modified dates (which is extremely useful for moving blojsom blog entries): tar -cRvzf archive.tar foldertocompress/. Anyway, Cocoon is running happily in Tomcat now and Blojsom is also running fine with everything this blog has up till Feb 26th. So I'll have to do another update just before I swap over to the new server. This will also mean there will be a period of maybe 2 days when the blog and RSS feeds may time out or seem out of date. Don't worry, I'll warn you in advance of the exact day.
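For my own future reference (and anyone else allergic to man pages), here's a simplified sketch of the tar usage. The archive name and folder below are placeholders; tar stores permissions and modified dates in the archive by default, and -p asks for them back when unpacking:

tar -xvzf cocoon2.1.8.tar.gz                      # x = extract, v = verbose, z = gunzip, f = this file
tar -cvzf blogentries.tar.gz foldertocompress/    # c = create, z = gzip, hence the usual .tar.gz name
tar -xvzpf blogentries.tar.gz                     # -p restores the stored permissions on extraction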

So what's next on the horizon? Well, I need to do some more securing and enable Log4J on Tomcat and Cocoon. I've also still got to sort out basic Unix-type things. For example, while I was setting up the server, the only text editor I had was vi and the only shells were tcsh and some other weird ones. Yep, that's right, no Nano or Bash. I don't know how I managed, but trust me, I'll be avoiding vi whenever possible. I've already run chpass on all the users and made Bash the default shell. Beyond this, I'm considering Hamachi for Linux, which would mean I could securely log in to Blojsom, Tomcat and anything else from anywhere without setting up that crazy port forwarding in PuTTY. This sounds like overkill but I'm tempted to at least run Hamachi on my Smoothwall Firewall server at home.
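The shell change itself was only a couple of commands once Bash was installed from packages. Roughly this, as a sketch from memory (the username is a placeholder and the paths are the standard FreeBSD ones, so double-check on your own box):

pkg_add -r bash                           # fetch and install Bash from the FreeBSD package mirrors
chpass -s /usr/local/bin/bash someuser    # make Bash someuser's login shell

And the crazy PuTTY tunnelling I mention is just the GUI version of something like ssh -L 8080:localhost:8080 me@server, forwarding a local port to Tomcat on the box, which is exactly the fiddling Hamachi should make unnecessary.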

Regarding Cocoon, well my next step, which I had planned to do if I was not writing this long blog post, would be to install Saxon 8.7 (good to see a .NET version btw) in Cocoon using this guide (I know it works, I already installed Saxon 8.4 on the development machine at work). As for Blojsom, I will start trimming down some of the outstanding issues I had.
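My working notes for that job, to be treated as a sketch rather than the guide verbatim (the paths are guesses for this box, and net.sf.saxon.TransformerFactoryImpl is Saxon 8's JAXP factory class):

cp saxon8.jar /usr/local/tomcat/webapps/cocoon/WEB-INF/lib/
# then edit WEB-INF/cocoon.xconf so the XSLT processor's transformer-factory
# parameter points at net.sf.saxon.TransformerFactoryImpl, and restart Tomcat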

Oh, before I finish, did I say how great Wget and Sudo are? Loaded gun indeed.
