Researchers have just unveiled a pre-release, game-changing AI upgrade for Google Messages. But it’s one with a serious privacy risk—it seems that Bard may ask to read and analyze your private message history. So how might this work, how do you maintain your privacy, and when might this begin?
↫ Zak Doffman
As long as this “AI” hoovering is an optional ‘feature’, I don’t really have any issues with it – it’s a free world, and if you want to spice up your autocomplete like this, go ahead. The real danger, of course, is that this won’t be optional for long, and eventually Google’s “AI” will just ingest your messages and emails by default, consent or no.
Depends on how they’re handling it. Is it just a per-user module that gets applied only to processes running for that user, and is that module subject to deletion along with other user data if requested? Or is it ingested into a wider set of data across users, and applied across users, with the accompanying risk that near-verbatim source material may get spat out somewhere else for some other user to see, if the right inputs are provided?
It’s going to be this, I guarantee it. That’s already happened with ChatGPT and super sensitive medical info, including logins and passwords for medical records accounts.
https://arstechnica.com/security/2024/01/ars-reader-reports-chatgpt-is-sending-him-conversations-from-unrelated-ai-users/
Morgan,
I think there was a session bug where more than one user ended up sharing the same session ID, or something like that. Obviously that needed to be fixed, but nothing about it was intrinsic to AI, and in principle the bug could happen on any service.
ChatGPT or not, it just seems ill-advised to post one’s login and password into arbitrary services that have no need to know that stuff. Do users take responsibility for that?
The1stImmortal,
We also need to distinguish between data used to train a NN and data merely fed to a NN as input to generate immediate output, without risk of it ending up anywhere else. In other words, there’s a huge difference between training a NN and merely using it.
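To make that distinction concrete, here’s a minimal sketch (assuming PyTorch, with made-up shapes and data, not anything specific to Bard): during training the user’s data permanently changes the model’s weights, while at inference it is only an input and nothing about it persists in the model afterwards.

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 2)  # stand-in for a real network

# Training: the data changes the model's parameters, so traces of it
# can persist in the weights.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
x_train, y_train = torch.randn(8, 16), torch.randint(0, 2, (8,))
loss = nn.functional.cross_entropy(model(x_train), y_train)
loss.backward()
optimizer.step()  # weights now reflect x_train

# Inference: the data is only an input; no parameters are updated,
# so nothing about it remains in the model afterwards.
with torch.no_grad():
    prediction = model(torch.randn(1, 16))
print(prediction)
```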
My question is: how will spending ungodly amounts of energy on artificial intelligence help us get through the coming metacrisis, when 20 W brains are left unused because of the artificial rules we use to concentrate potential power in the hands of the few for their personal aggrandizement? The future belongs to the efficient, not the egotistically wasteful.
gagol2,
You bring up an interesting philosophical point, although I think there’s much more nuance than that. Looking at just a single component overlooks true end-to-end efficiency. It’s kind of like claiming electric cars are zero carbon while ignoring all the carbon used to create and power said car – these external factors matter.
1) The brain is useless by itself without other body organs, like the heart and lungs, and those require more energy.
2) For every calorie of energy we use biologically, we have to consume almost an order of magnitude more energy in the form of food. That’s not very efficient on our part.
3) When humans and machines do the same task, it’s not enough to sustain that human just for the task; humans expect to be sustained outside of the task as well, incurring additional unrelated energy costs for things like heating, showers, and even retirement. By contrast, a machine can dedicate its full existence to doing its task.
4) We know that neural nets take a lot of energy to train, but once trained the network is trivially duplicated across an arbitrary number of instances at marginal unit energy cost. These AI instances can be evaluated using modest resources in real time. We also need to look at computation time. A fictitious example: if it takes a human brain 5 minutes to write a page of text at 20 W, but the machine can do it in 1 s at 500 W, then which is actually more efficient at the task? Answer: human = 1.67 Wh, machine = 0.14 Wh (see the quick arithmetic check after this list).
5) Sometimes results are more important than brain efficiency. Our brain, despite only using 20 W, may not be sufficient for the task. Take winning at chess, for example, or raise the stakes with tasks such as stock market trading, where getting the answer right is more important than doing it on 20 W.
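For what it’s worth, here’s a trivial check of the made-up numbers in point 4, just multiplying power by time and converting to watt-hours:

```python
# Energy = power x time, converted from watt-seconds (joules) to watt-hours.
human_wh = 20 * (5 * 60) / 3600   # 20 W for 5 minutes -> ~1.67 Wh
machine_wh = 500 * 1 / 3600       # 500 W for 1 second -> ~0.14 Wh
print(f"human: {human_wh:.2f} Wh, machine: {machine_wh:.2f} Wh")
```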
You bring up very intriguing questions, and I don’t say these things as definitive answers, but rather to illustrate that there’s actually a lot more to consider.
Are we existing to serve an industrial economy? I think the industrial component is mostly useful for concentrating power in the hands of a few at the cost of the actual happiness and self-realization of our species. In a sense, the super-organism of industrial societies is trying to replace biology with cybernetics at the cost of our own reason to live.
Should we allow ourselves to optimize out biology to attain maximum efficiency? Efficiency of what?
gagol2
Note that you are the one who brought up the matter of brain efficiency. However, if I am reading you correctly, efficiency wasn’t actually what mattered to you; instead you had a reason for wanting to portray AI in a bad light. For what it’s worth, I actually agree that much of our economy is optimized for those at the top at the cost of everyone else. There are some valid points there, and we don’t need to talk about AI efficiency to get there. A lot of our social institutions are incredibly unfair by way of creating feedback loops that serve to reinforce systemic inequality… although this is a very different topic from the one I thought we were going to discuss 🙂
Not sure what “spicing up autocomplete” means, but AIs are only going to get smarter no matter how many quotes you put around those two letters.
I am not gloating. This can be a huge problem, but I can’t support the denial.
After all, what’s the point of writing? Let the bots write to each other.
I’m just waiting to tell an AI to write a polite letter to my boss telling him that he is an idiot in a roundabout way, only for him to have an AI summarize it as: “He’s telling you that you’re an idiot.”
There are basically two sets of data that go into these kinds of models:
Training and Personalization
Training data covers the general model and is fixed for all users. Hence utmost care is taken to make sure no individual user’s PII leaks into it, nor even that of a small, identifiable group of users (say, the population of a small town in Colorado, or a single class at a university). Basically, there is a lot of scrubbing, filtering, and anonymization.
The per-user data is (usually) processed by separate pipelines to generate your individual personalization profile. Since this is extremely personal, in theory it is protected in the same way as your private inbox or communication history (it literally contains your private data already). It also has the benefit of being fresher, since it can be learned in small batches.
They come together to give personalized recommendations. Think of roughly 99% of the “model” as fixed and covered by the general training, while a certain subset of parameters can be “patched in” to allow per-user results.
The same approach can be applied to language models, including Bard. The model can learn to incorporate a profile, and at “inference” time it can be supplied with your personal history to give better results.
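As a rough illustration of that split (not Google’s actual pipeline), here’s a sketch where the shared model is fixed and the user’s message history only enters the picture as context for a single request; `generate` and `personalized_reply` are hypothetical stand-ins, not real API calls.

```python
def generate(prompt: str) -> str:
    """Placeholder for a call to a fixed, shared language model."""
    return f"[model output for a prompt of {len(prompt)} chars]"

def personalized_reply(user_history: list[str], incoming_message: str) -> str:
    # The user's history never touches the shared model's training data;
    # it is only concatenated into this one inference request.
    context = "\n".join(user_history[-10:])  # e.g. the last few messages
    prompt = (
        "Given this user's recent conversation:\n"
        f"{context}\n\n"
        f"Suggest a reply to: {incoming_message}"
    )
    return generate(prompt)

print(personalized_reply(["Hey, dinner at 7?", "Sure, see you then!"],
                         "Are we still on for tonight?"))
```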
So, why did I go into this detail? Because, if implemented right, this can be both extremely useful and privacy-preserving at the same time.
The big data troves of Google, Microsoft, Facebook, and Amazon have suddenly become a goldmine.
Google Messages? I lost track of what that is. I’m sure it will be replaced in a few days with a new one.