Huge news from Google, who announced today that they are going to stop using your web browsing behaviour to display targeted advertisements.
It’s difficult to conceive of the internet we know today — with information on every topic, in every language, at the fingertips of billions of people — without advertising as its economic foundation. But as our industry has strived to deliver relevant ads to consumers across the web, it has created a proliferation of individual user data across thousands of companies, typically gathered through third-party cookies. This has led to an erosion of trust: In fact, 72% of people feel that almost all of what they do online is being tracked by advertisers, technology firms or other companies, and 81% say that the potential risks they face because of data collection outweigh the benefits, according to a study by Pew Research Center. If digital advertising doesn’t evolve to address the growing concerns people have about their privacy and how their personal identity is being used, we risk the future of the free and open web.
That’s why last year Chrome announced its intent to remove support for third-party cookies, and why we’ve been working with the broader industry on the Privacy Sandbox to build innovations that protect anonymity while still delivering results for advertisers and publishers. Even so, we continue to get questions about whether Google will join others in the ad tech industry who plan to replace third-party cookies with alternative user-level identifiers. Today, we’re making explicit that once third-party cookies are phased out, we will not build alternate identifiers to track individuals as they browse across the web, nor will we use them in our products.
This is a big step that will have massive consequences for the advertising industry as a whole, but at the same time, companies do not just give up on revenue streams without having alternatives ready. My hunch would be that Google has become so big and collects data from so many other sources that it simply doesn’t need your web browsing behaviour and third-party cookies to sell targeted ads effectively.
There is no need to look for ulterior motives. Reducing complexity is usually a net benefit, and removing inter-product dependencies is a good way to achieve this.
Also, if you can achieve the same or similar performance with less data, you not only save on resource costs, but things run faster, which of course is good for users. Why not do it, then, if possible?
sukru,
Tracking cookies are super easy, barely an inconvenience.
As for motives, I would guess it has to do with competitors and politicians portraying Google as the bad guys on privacy, which they kind of are.
Alfman,
There are real engineering benefits to doing “less work”.
Usually a product iteration loop goes like this:
a. better algorithms
b. more data
c. repeat a–b
But every now and then, the thing that used to run in hours starts taking days, and then you want to shed as much work as possible.
Same is true at every scale.
If you have built a very detailed mobile game but realize it does not hit the 60fps target, then you start chopping off asset and effect resolution until a frame can render in 1/60th of a second.
If you have built a very good ML model but the client is low-powered, then you simplify the model and start using 8-bit or even 4-bit math for inference instead of full 64-bit floats.
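Roughly, that quantization step looks like this (a toy sketch with made-up numbers, not tied to any particular framework):

```python
# Toy sketch of post-training quantization: map float weights to 8-bit integers
# with a single per-tensor scale, then dequantize them for inference.
import numpy as np

weights = np.random.randn(4, 4)                          # "full 64-bit float" weights

scale = np.abs(weights).max() / 127.0                    # one scale for the whole tensor
q_weights = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

dequantized = q_weights.astype(np.float64) * scale       # what inference actually uses
print("max quantization error:", np.abs(weights - dequantized).max())
print("memory: %d bytes -> %d bytes" % (weights.nbytes, q_weights.nbytes))
```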
At the end of the day, it is the product’s actual usefulness that counts.
Side note: at roughly 200 MB/s, reading a single terabyte takes about 5,000 machine-seconds. So every bit counts.
@sukru
A very good summary. With high-performance graphics it’s important to keep the main loop as tight as possible. For the creative assets: fake it till you make it, and/or cull early, cull often. (Some drivers offer optimisations which do similar things and cheat to achieve perceived performance at the cost of quality.)
Scalability is an interesting problem which gets little attention. There are lots of things, from asset creation to proxies to various invisible optimisations, which can make initial engine design a bit tricky as the decisions are often interrelated. An easy one is the amount of detail on the screen. Everyone always wants more detail, but more of the wrong kind of detail causes cognitive load on top of performance issues, for sometimes little real gain in subjective experience and a bigger headache for the user. In that case you have to consider leaving the detail out or changing the placement of detail within the context of narrative flow. It’s a level of workflow refinement very few consider, because they simply slap cycles at the problem and call it a day without considering cognition. In theory this kind of global view should have been what UX designers considered, but they don’t, as we see from over-simplification.

Going back to scalability and realism, a lot of the focus now is less on surfaces and more on lighting, especially shadows, to create deeper realism and, most forget, emotion and narrative. This is partly why ray tracing gets a lot of attention, but it’s the same old problem of slapping cycles at the problem while the art and subtlety get lost. It’s a fine discussion, and one where you have to step away from the screen, read books, and consume a lot of well-directed movies to have a real clue about it.
Lastly, the brain carries a lot of deeply ingrained models. It doesn’t just work in parallel but throws an enormous amount of data away, which is why we don’t see – we perceive.
Find one game which is as good as a colour or even monochrome standard-definition movie originally produced on film by a competent director. It doesn’t come close. Hitchcock’s “Rope” and Scott’s “Blade Runner” are good examples of direction with varying degrees of cheating. Say 600MB, to be generous, for an hour or two of quality visuals, versus however many gigabytes and however much processing power for a modern game to achieve a fraction of the effect? Of course you cannot beat reality, but the difference in effect on a frame-by-frame basis is worlds apart when comparing the two. Lower resolution? Less colour bandwidth? How can this be so?
Simplifying even further, an animated line drawing can produce a level of realism far beyond what you would expect from only a handful of lines. Adding texture: Prince of Persia and Flashback used early motion-capture techniques to create a very realistic effect with very limited graphics quality. The graphics and motion compression technique used in Prince of Persia was very inventive!
@HollyB,
We are getting sidetracked, but as you said, pushing more detail is not always the best thing. Crysis had the best effects of its time, but no gaming machine was able to run it properly (for a while, “can it run Crysis?” was a common gag). Because of that it remained a technological marvel rather than a widely played game.
Yet the most beautiful game I played recently was “Ori and the Will of the Wisps”. It looks “simple and retro”, things look a bit blurry, and it can hit 120 fps, which means sacrifices were made. However, the colors and lighting are done just right, and it looks as if it came straight from an art canvas.
(And yes, Prince of Persia was ahead of its time).
sukru,
It’s not that I don’t understand the merit of simplicity, but that tracking cookies are already very simple.
I know that Google doesn’t use SSNs, but consider 1) how easy it would be for a service provider to add an SSN to their database, and 2) once they have a shared identifier, how easy it is to look up profiles between databases. Of all the things a service provider does, this isn’t really all that complicated. I’d say it’s a drop in the bucket compared to implementing something like PageRank, YouTube, Gmail, etc.
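As a toy illustration with made-up records: once two datasets share an identifier, merging the profiles is essentially a dictionary lookup.

```python
# Made-up records keyed by a shared identifier (here an SSN-like string).
ad_profile = {"123-45-6789": {"interests": ["cameras", "travel"]}}
purchases  = {"123-45-6789": {"last_order": "tripod"}}

# One lookup per record is all it takes to join the two databases.
merged = {
    key: {**ad_profile.get(key, {}), **purchases.get(key, {})}
    for key in ad_profile.keys() | purchases.keys()
}
print(merged)  # {'123-45-6789': {'interests': ['cameras', 'travel'], 'last_order': 'tripod'}}
```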
Obviously it’s possible not to track information, but it doesn’t seem likely to me that simplicity is the primary motivation. I am more inclined to believe it’s about all the public criticism, Firefox blocking third-party tracking cookies, etc.
Alfman,
There are now techniques to help “not learn” things in machine learning.
For example, there is this recent paper: https://arxiv.org/pdf/1801.07593.pdf . The authors describe using “adversarial learning” (also used in deepfake generation) to ensure the model is unable to learn specific things.
Take, for example, “race” as a feature. You might not want to include it in your model, to prevent biases. However, the trainer could then pick up “zip code” as a proxy. (This example is made up, but it is of course very much possible.) The technique gives you a separate model that penalizes the original one when those kinds of substitutions happen.
This is a very new development, though (the paper is from 2018).
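Roughly, the setup looks like this. A minimal PyTorch sketch on toy data (my own made-up example, not the paper’s exact formulation): the main model is penalized whenever a separate adversary manages to recover the protected attribute from its internal representation.

```python
# Toy adversarial "do not learn this" sketch: the encoder/predictor learns the
# main task while being penalized whenever the adversary can recover the
# protected attribute from the encoder's representation.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Made-up data: 1,000 samples, 8 features; feature 0 stands in for the proxy attribute.
x = torch.randn(1000, 8)
protected = (x[:, 0] > 0).float().unsqueeze(1)              # what we do NOT want to learn
y = ((x[:, 1] + 0.3 * x[:, 0]) > 0).float().unsqueeze(1)    # the main task label

encoder = nn.Sequential(nn.Linear(8, 16), nn.ReLU())
predictor = nn.Linear(16, 1)   # main task head
adversary = nn.Linear(16, 1)   # tries to recover the protected attribute

opt_main = torch.optim.Adam(list(encoder.parameters()) + list(predictor.parameters()), lr=1e-2)
opt_adv = torch.optim.Adam(adversary.parameters(), lr=1e-2)
bce = nn.BCEWithLogitsLoss()
lam = 1.0  # how strongly "do not learn this" is enforced

for step in range(200):
    h = encoder(x)

    # 1) Train the adversary to predict the protected attribute from the representation.
    opt_adv.zero_grad()
    adv_loss = bce(adversary(h.detach()), protected)
    adv_loss.backward()
    opt_adv.step()

    # 2) Train encoder + predictor on the main task while penalizing whatever
    #    information lets the adversary succeed (the "leak").
    opt_main.zero_grad()
    task_loss = bce(predictor(h), y)
    leak_loss = bce(adversary(h), protected)
    (task_loss - lam * leak_loss).backward()
    opt_main.step()

print("task loss:", task_loss.item(), "adversary loss:", adv_loss.item())
```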
Side note: an SSN would not be a very good identifier for the web anyway. The same user will have different personas: they could be at work and interested in C++ or office furniture, or at home and interested in sound systems, etc. Even the credit cards they use would be different.
sukru,
Yeah, I’ve read a few articles on this topic in terms of sexism and racism in algorithms. It’s not that the algorithms are coded this way, but that they’re picking up pre-existing biases in the data we feed them. For example, a sophisticated automated candidate selection system could use personnel records to determine that those who do best at the company are male, and therefore become complicit in discriminatory hiring practices that exacerbate the problem. I find it quite ironic that we have to add more rules to our “unbiased” algorithms to undo this, sort of like the digital equivalent of affirmative action.
It becomes morally complicated when better qualified candidates are discriminated against on the basis of improving diversity. Sometimes, no matter what the algorithm does, it’s going to hurt someone unfairly. Ideally there would be enough demand for everyone to easily get a job, etc.
Advertisers and government agencies probably wouldn’t care, to be honest; the more identifiers they get to snag from us, the better for them (up until the point we start spamming them with garbage data, of course).
Alfman,
But the data naturally becomes “garbage” as more of it is piled on.
For example, let’s do the reverse analysis.
Is the New York Times a good source for politics?
What about sports?
Or technology?
Being a generic news site, it is roughly equally distant from all topics. A random blog that only talks about politics will be much closer to the cluster center than a national outlet like the New York Times.
Similarly, it is much better to look at the task at hand and not be bogged down by the huge amount of irrelevant data coming from the user’s past interactions.
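A toy illustration of the point, with made-up topic mixes: the single-topic blog sits far closer to the “politics” centroid than a general outlet that spreads evenly over every topic.

```python
# Made-up topic mixes: a niche politics blog versus a general news site,
# both compared against a pure "politics" centroid.
import numpy as np

politics_centroid = np.array([1.0, 0.0, 0.0, 0.0])         # politics, sports, tech, culture

niche_politics_blog = np.array([0.90, 0.02, 0.03, 0.05])   # almost all politics
general_news_site   = np.array([0.30, 0.25, 0.20, 0.25])   # spread across everything

def cosine(a, b):
    # Cosine similarity: 1.0 means the same topic mix.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print("niche blog vs politics centroid:  ", round(cosine(niche_politics_blog, politics_centroid), 3))
print("general site vs politics centroid:", round(cosine(general_news_site, politics_centroid), 3))
```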
Alfman,
On the second topic.
Yes, it has become too easy to blame AI or algorithms when they are actually putting a spotlight on existing issues.
For example, an algorithm preferred giving white patients more care over sicker black ones in New York: https://www.businessinsider.com/an-algorithm-treatment-to-white-patients-over-sicker-black-ones-2019-10
However there was a valid reason for it, and it actually helped uncover the underlying issue.
All the news coverage of the issue was “big boogeyman algorithm is racist”. However, without the algorithm making this mistake, the overall problem in society would have stayed hidden. Now doctors can actually see that black patients make fewer follow-up visits, and hence work on ways to encourage them to seek more care.
And a better algorithm can be built with an additional output: “do more follow-ups with this patient”.
sukru,
I looked at the New York Times just now. Users who don’t have a blocker are being tracked via DoubleClick, which Google purchased a long time ago. I don’t have the precise details of what Google is tracking with each of their trackers, but I have seen Google Analytics reports, and at least with that tracker Google records user events down to the page level, not just the site level. IMHO this is very rich data from which to build a user profile.
Anyway, I’m not complaining. Browsers blocking third-party cookies, companies committing to do less tracking… these are very good things regardless of the reason they’re doing it. The only thing is that I’m skeptical of taking press releases at face value, because they can spin the truth.
Yeah, we definitely need to keep an eye on these things.
I’ll believe it when I see it, quite honestly.
I wouldn’t be surprised. It also reduces the visible attack surface for privacy rights campaigners, as everything is hidden in algorithms and back-end data obtained from who knows where. One step forward, two steps back. Another factor is that people at the top will give up money if it means remaining dominant.
Government isn’t so different. In the UK, under the current regime, there are problems with the government using unpublished, likely very rigged formulas to allocate spending (a.k.a. bribes) on a gerrymandered basis, as well as additional privatisation to hide activity behind corporate confidentiality.
Advertisers: “Hey Google, what cool tech have you cooked up to replace cookies?”
Google: “Chrome is at 69% marketshare, Android is at 46%, Search is at 92%, Google Analytics is at 72%, Recaptcha is at 43%, and everyone stays logged in to their Google Account. We don’t need a replacement. We’re not Facebook.”
Yeah, I feel that the title for this news pick is pulled out of somebody’s ass yet again. The linked article doesn’t even contain the words “browsing” or “behavior”, let alone the two combined. Google is still very much fishing for our browsing behaviour, just not through tracking cookies, which are already being phased out by browser vendors anyway.
It’s nothing but marketing jargon. Google won’t replace third-party cookies; instead, they will swiftly announce new technologies that complement their existing techniques for identifying people for advertising purposes.
I have been using DDG for about a decade now. Love it.
The exodus from WhatsApp to Signal and Parler’s attempted embrace of conservative Twitter users scared the crap out of them. All these info corporations live in fear. They were once the usurpers, so they know demise is inevitable. And because they know that, they might even avoid it.
Iapx432,
I’d say that exodus by “conservatives” was triggered more by censorship than privacy.
“Conservative” doesn’t mean what it used to. Nowadays the label has been overrun with highly gullible conspiracy nuts. I really don’t know what to make of this trend towards mass ignorance. There’s a lot of misinformation out there for those who are seeking it, even ridiculous things. The mass ignorance is getting so bad that it can even pose legitimate threats to national security, especially in the hands of authoritarian politicians who don’t mind exploiting it. I really do wonder about the role that tech companies may have inadvertently played in these social developments. They took away the main revenue streams of traditional newspapers and replaced them with modern services that amplify confirmation bias and limit people’s exposure to reality through filter bubbles.
“I’d say that exodus by “conservatives” was triggered more by censorship than privacy.
“Conservative” doesn’t mean what it used to ..”
The move from WhatsApp to Signal was due to an alleged new set of terms that allowed WhatsApp to send your data to Facebook. Later they denied it, but I remember a YouTuber reading the text of a version of the agreement, and it specifically allowed the transfer. They probably removed it later. I agree on the conservative thing. In fact, I struggled with what word to use for “those who went to Signal” (which includes me). Antifa probably went too.
Iapx432,
You are right, the migration from WhatsApp to Signal was motivated by privacy, and I didn’t mean to contradict that. I was only referring to social media platforms censoring “conservatives”.
I too dislike being so dependent on big tech corporations like Facebook/MS/Google/Apple for infrastructure; it’s problematic when they have all the money to buy everybody out.
My VoIP provider has been merged more times than I can count; what used to be a small local company in Pennsylvania is now in the hands of a behemoth corporation I never signed up with. Small companies don’t remain viable in mature markets, because they’re either killed off by far bigger players or exit the market by selling out.
Monopolies have always been a problem, but even so mass consolidation has made the world very different from the one I grew up in. It pains me that we’ve lost so much independence to corporate giants.
It’s not clear that this will be much of a loss for Google, nor much of a gain for privacy. The cohort algorithms Google plans to use instead will assign Chrome users an ID based on the web pages they visit, grouping them with other users who visited similar pages. The ID can be used to derive an “interest profile” of the users in the group.
https://github.com/google/ads-privacy/blob/master/proposals/FLoC/FLOC-Whitepaper-Google.pdf
So while it’s true Google won’t technically be tracking individual users anymore, it will still know about your interests based on your browsing history (something I personally consider to be private data), and it will use that for targeted advertising. Moreover, Google will still receive your IP address for probably the majority of pages you visit on the web (at least anything with Google Analytics, Google adverts, Google Fonts, etc.).
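To make the cohort idea concrete, here is a minimal SimHash-style sketch of the flavour of approach the whitepaper discusses (a toy illustration with made-up parameters, not Google’s production algorithm): each visited domain contributes a hashed ±1 vector, and the sign pattern of the sum becomes the cohort ID, so people with largely overlapping browsing histories tend to land in the same cohort.

```python
# Toy SimHash-style cohort assignment over a browsing history of domains.
import hashlib

NUM_BITS = 16  # cohort ID length (made-up value)

def domain_vector(domain):
    # Hash the domain, then derive one +1/-1 component per cohort bit.
    digest = hashlib.sha256(domain.encode()).digest()
    return [1 if (digest[i // 8] >> (i % 8)) & 1 else -1 for i in range(NUM_BITS)]

def cohort_id(history):
    totals = [0] * NUM_BITS
    for domain in history:
        for i, v in enumerate(domain_vector(domain)):
            totals[i] += v
    # The sign pattern of the summed vectors becomes the cohort ID:
    # overlapping histories produce similar bit patterns.
    return "".join("1" if t > 0 else "0" for t in totals)

print(cohort_id(["news.example", "gardening.example", "recipes.example"]))
print(cohort_id(["news.example", "gardening.example", "diy.example"]))
```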
EFF has an article about it.
https://www.eff.org/deeplinks/2021/03/googles-floc-terrible-idea
That’s a great article, really insightful. Thank you.