The short version is this: In its current form, Recall takes screenshots and uses OCR to grab the information on your screen; it then writes the contents of windows plus records of different user interactions in a locally stored SQLite database to track your activity. Data is stored on a per-app basis, presumably to make it easier for Microsoft’s app-exclusion feature to work. Beaumont says “several days” of data amounted to a database around 90KB in size. In our usage, screenshots taken by Recall on a PC with a 2560×1440 screen come in at 500KB or 600KB apiece (Recall saves screenshots at your PC’s native resolution, minus the taskbar area).
Recall works locally thanks to Azure AI code that runs on your device, and it works without Internet connectivity and without a Microsoft account. Data is encrypted at rest, sort of, at least insofar as your entire drive is generally encrypted when your PC is either signed into a Microsoft account or has Bitlocker turned on. But in its current form, Beaumont says Recall has “gaps you can drive a plane through” that make it trivially easy to grab and scan through a user’s Recall database if you either (1) have local access to the machine and can log into any account (not just the account of the user whose database you’re trying to see), or (2) are using a PC infected with some kind of info-stealer virus that can quickly transfer the SQLite database to another system.
↫ Andrew Cunningham at Ars Technica
It really does seem Recall is kind of a mess in the security department, and it has a certain rushed quality about it. All the screenshots are saved in an AppData folder, and data pulled from those screenshots is stored in a local SQLite database that happens to be entirely unencrypted. TotalRecall, a tool developed by Alexander Hagenah, will neatly pull the data from Recall for you without any hassle or issues.
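To illustrate why an unencrypted SQLite file is so easy to harvest, here is a minimal Python sketch. It does not touch a real Recall database: the file path, the `captured_text` table, and its columns are made up for the demonstration, and TotalRecall's actual logic may well differ. The point is only that any process able to read the file needs no key or password to list its tables and dump every row.

```python
import os
import sqlite3
import tempfile


def dump_plain_sqlite(path):
    """Return {table_name: [rows]} for every table in an unencrypted SQLite file."""
    conn = sqlite3.connect(path)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # No key or password is involved anywhere: read access to the file is enough.
        return {t: conn.execute(f'SELECT * FROM "{t}"').fetchall() for t in tables}
    finally:
        conn.close()


def make_demo_db(path):
    """Create a stand-in database; table and column names are hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE captured_text (app TEXT, window_title TEXT, ocr_text TEXT)"
    )
    conn.execute(
        "INSERT INTO captured_text VALUES (?, ?, ?)",
        ("browser", "Online banking", "Account balance: 1,234.56"),
    )
    conn.commit()
    conn.close()


demo_path = os.path.join(tempfile.mkdtemp(), "demo.db")
make_demo_db(demo_path)
dump = dump_plain_sqlite(demo_path)
print(dump)
```

Encrypting the drive does not change this picture for software running inside the logged-in session, since the file is transparently readable there.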
This truly is a security nightmare. Aside from all the obvious issues this presents, such as making it even easier for law enforcement to gain access to pretty much everything you do online, something especially troubling for minorities or in countries with less-than-stellar police departments, Recall also presents a whole host of other problems. Imagine being in an abusive relationship, and the abusive partner demanding Recall be left on at all times to exert even more control. Imagine an unscrupulous employee abusing Recall to steal sensitive information from a company for a competitor. Imagine living in some backwards part of a country with controlling religious parents, and you happen to be gay. The problems here are endless.
The fact you can turn Recall off doesn’t mean much, because in the above examples, turning it off is not an option: there are controlling people involved who will demand you keep it on. Browser history and other forms of history on your computer exist as well, of course, but they’re not always as easy to parse, and they’re easier to manipulate, sanitise, and temporarily hide. Recall just combines all of this and puts a neat little bow on it, ready to be abused by anyone with bad intentions.
Recall is ill-conceived, badly implemented, and a solution looking for a problem that in and of itself creates tons of other problems. I hope Microsoft reconsiders, but in a world where “AI” makes investors go nuts, I doubt we’ll see a sudden sense of clarity coming out of Redmond.
Recall actually makes a lot of sense, and yes, the technology would be really useful.
However, I agree that Microsoft has not demonstrated it would be the best steward of users’ sensitive information, especially for something that amounts to a screen recorder and keylogger.
I will be honest, I would trust (the old) Google with this. And there are maybe a few more companies.
I wouldn’t trust any of them. Google is just as bad as MS, and even Apple can’t properly delete files that have been deleted from the cloud. There is no good outcome from this, and they don’t need it to make search work properly. They could just fix the search.
I thought the deleted photos bug turned out to be a nothing sandwich. We all know that most stuff that gets ‘deleted’ isn’t actually erased; just its entry in the disk directory gets deleted, and the original data can sit there until it is written over at some random time. That’s why there are so many options around to securely delete stuff. It’s also why the cops can so often get convictions based on retrieving data that felons thought they had deleted.
Strossen,
Got a link? What news did you hear about this?
It’s true that an unlinked file doesn’t get overwritten on normal file systems. And it is additionally true that someone can scan the empty space. However, it is not normal for files to return from the dead on their own, regardless of whether their contents had been overwritten. Assuming the claims were true and Apple’s iCloud did this, then IMHO people are right to be concerned. Not only is it a bug, but it brings up serious questions about Apple’s document storage practices too.
Edit: It’s true that an unlinked file doesn’t immediately get overwritten on normal file systems.
Here’s a link as requested
https://www.forbes.com/sites/kateoflahertyuk/2024/05/24/apple-reveals-what-caused-iphone-photos-bug-fixed-in-ios-1751/#
Seems the bug was not related to iCloud at all but to a corrupted on-device database. An unfortunate bug maybe (most bugs are unfortunate), but it reveals nothing nefarious going on.
Strossen,
Thank you for the link, I had not read that.
It is impossible for “corruption” to cause old files to suddenly show up on new phones as claimed.
One of these must be true: 1) either the users claiming the old photos suddenly showing up on new phones were mistaken about not having copies of those pictures on their new phones, or 2) the files were never deleted from apple’s backups. If both of these were false, there would be no file to restore regardless of corruption.
To believe Apple’s version of events we have to assume the users were just mistaken, so let’s assume that. I still have questions about what this means, though: “photos that did not fully delete from a user’s device”…? I understand that file deletion does not wipe the sectors, but even so, reassembling those “free” sectors back into a valid file seems unlikely unless iOS has a mechanism to keep files around after deletion. Does it?
I don’t profess any specific technical knowledge about this photo bug but it’s possible that on device deleted photos could be transferred to new phones. When I upgrade I don’t use iCloud backups to set up the new phone I transfer the data and apps directly from my existing iPhone, and the option to do so is always offered. I find it quicker and more reliable. Presumably quite a lot of people do the same and if it’s a direct clone of the storage maybe deleted files could be included. The problem is we are both discussing this with no direct technical knowledge so it’s all speculation. Which can be fun.
Strossen,
Well, I understand you don’t have the answer, but that is why the statement “photos that did not fully delete from a user’s device” needs further explanation. Why and how is this the case? “Corruption” doesn’t satisfactorily explain how deleted files came back many years later. Anyway, I don’t expect Apple will be providing more details, but they’re kind of leaving iOS users in the dark as to what measures they need to take to make sure files are actually deleted as expected.
I think I mentioned old Google.
Disagree strongly. I can’t think of a legit use of Recall. It will suck resources and provide little of value, in addition to the numerous security issues. It’s a gee-whiz feature in search of an actual use case that someone might have. A feature that manages to be less useful than VR goggles.
Bill Shooter of Bul,
Needs change over time.
Back in the day, my stuff was on HDDs measured in hundreds of MBs, and a box of diskettes. I knew where everything was, for the rest, dir /s was more than enough.
Later on we had gigabytes of data, and tools like Google Desktop Search and similar became necessary. So that we could look up “tax returns 2010” (today they are pretty much standard on all operating systems).
Today, the data is not only much larger, it is “ephemeral”, not even stored locally. The queries would then be “where did I see that nice backyard pool design schematics?” or “what was the meme about failing robot”, which requires recording and understanding everything that I see.
Even if we don’t realize it, we actually need such tools today.
(Again, Microsoft?)
“””Recall actually makes a lot of sense, and yes, the technology would be really useful.
Sukru, useful for whom is the question. Let’s find software that is not AI but has the same kind of feature set.
https://networklookout.com/
Supervise staff activity by watching and recording live computer screens, web browsing history, applications use, keystrokes pressed…
https://networklookout.com/keystroke-logger.htm
These parties are smart enough to encrypt their files.
I would say this is staff supervision software with an AI feature put on top, in the hope of getting the user hooked on the AI feature so they don’t turn the supervision off. This would explain why an administrator can access all users’ collected data. Processing the images down into SQLite makes the data more compact to send to a central location in a business for processing.
Staff supervision software does get legally questionable if you put it on a personal computer. Data leaks from staff supervision software have made attacks on businesses worse in the past, for reasons like recorded usernames and passwords giving the attacker access to more resources. Staff supervision software is a double-edged sword, not something you should enable by default even if it is correctly designed.
Recall is just staff supervision software with a fancy bit of AI, missing only the automatic transfer to a central business server. With the current design, it would not be hard at all to implement that transfer. Something I have seen no one check is what happens to Recall with a roaming profile: is Recall data stored where, if you are using a roaming profile, it will be transferred to the Windows server storing that profile? Recall could be nothing more than poorly implemented staff supervision software, with a little bit of AI offered to the user to hide what it is, and an NPU used so the OCR is not system-crippling. There have been other staff supervision packages that OCR the screen; they ended up out of use because they cost too much system performance.
If having a roaming profile results in Recall data automatically being transferred to a central server without encryption, this is nothing more than staff monitoring software with poor security dressed up as something useful to the user.
oiaohm,
Thanks for the link, and I am sure these people have pretty good security standards.
However it is for an entirely different purpose, as you mentioned, employee monitoring.
Something like “recall” is more for semantic understanding of what you do (which really requires modern ML models / AI), and being able to give you personalized answers about the things you saw in the past.
Specifically,
Instead of “what was I doing 9pm yesterday”?
It would answer “what was the recipe that I saw last week for butterscotch cookies that looked like fancy animals”?
In other words a modern, really useful assistant.
https://staffcounter.net/change-screenshot-frequency/
This is another staff monitor. You find this screenshot functionality in lots of them.
“””Instead of “what was I doing 9pm yesterday”?
Can you ask this question of Recall?
https://github.com/xaitax/TotalRecall
“””Date Filtering:
“”” Specify start and end dates to limit the extraction to a particular time frame.
Yes, you absolutely can ask that question of Recall, because the date and time are recorded for every recorded event.
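As a rough sketch of why timestamps make “what was I doing at 9pm yesterday” a trivial query, the Python below filters a stand-in SQLite table by a time window. The `events` table and its `seen_at` and `window_title` columns are hypothetical, not Recall’s real schema, and this is not how TotalRecall itself is written.

```python
import sqlite3
from datetime import datetime


def events_between(conn, start, end):
    """Return events with start <= seen_at < end, oldest first."""
    return conn.execute(
        "SELECT seen_at, window_title FROM events "
        "WHERE seen_at >= ? AND seen_at < ? ORDER BY seen_at",
        # ISO-style timestamps stored as text sort correctly as plain strings.
        (start.isoformat(sep=" "), end.isoformat(sep=" ")),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (seen_at TEXT, window_title TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("2024-06-03 20:55:00", "Recipe: butterscotch cookies"),
        ("2024-06-03 21:05:00", "Inbox"),
        ("2024-06-04 09:00:00", "Spreadsheet"),
    ],
)

# Everything seen between 21:00 and 22:00 on 3 June.
evening = events_between(
    conn, datetime(2024, 6, 3, 21, 0), datetime(2024, 6, 3, 22, 0)
)
print(evening)
```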
“”“what was the recipe that I saw last week for butterscotch cookies that looked like fancy animals”?
With staff monitoring, you find employers asking questions like this, such as which staff members went to unfair-work reporting sites or anything that looked like one. Businesses use AI to process the collected staff monitoring data to do this.
sukru, the screenshots and the OCR-processed data in the SQLite database are the same data you will find employee monitoring software collecting, only with more security.
sukru, the reason I want to know whether Recall data is transferred back to a server when someone is using roaming profiles is that Recall could be just employee monitoring software built into the Windows OS.
AI/ML models are used in many different employee monitoring packages.
In reality, the only difference between Recall and some of the most advanced employee monitoring software is that the end user can run their own AI agent against the collected data. The most advanced employee monitoring uses PGP-style encryption, where one key encrypts the stored images and keystrokes to be sent up to a server later, and a different key, held on the processing server, is required to decrypt and process that data. At the server they OCR the images instead of doing it locally, and apply AI agents there so staff members cannot see what they are being checked for.
To me, Recall looks like advanced employee monitoring software with distributed processing, done badly and given a name that might make the end user miss what it is. The staff member is thrown a small bone of a feature so they don’t turn it off and get used to thinking this amount of background monitoring load is normal. A common issue with staff monitoring software is staff complaining that their work computers are slow because their home machine is faster. Recall makes it harder for staff to notice that they are being monitored: turn off Recall and install staff monitoring software, and the net change in system performance could be zero, or better still, use Recall itself as the staff monitoring to keep it as hidden as possible.
Once again:
I am not advocating for Microsoft’s particular implementation, but the idea in general.
sukru, there are problems here even with the idea in general.
“”“what was the recipe that I saw last week for butterscotch cookies that looked like fancy animals”?
Take this question of yours as an example. To answer it, a record of what events happened, and when, has to be stored somewhere.
This is basically building staff monitoring software, with all the security risks this involves.
Because any AI assistant or staff monitoring of this class is a possible security risk, it should not be on by default.
The key difference is that, for a general-purpose AI assistant, you want opt-in, not opt-out (which is what Recall is), so that you opt in only when you are doing things that are not a security risk.
Sukru, I agree with you that there =does= seem to be some utility to this kind of thing that others aren’t recognizing. I can’t tell you how many times I’ve given up on recalling something that I’ve done or found out before because I’ve misplaced or discarded a file. A life-long database of everything that I’ve done that could be queried with vague and general queries would be absolutely amazing.
But I totally agree with others that a large corporation with financial self-interests and a track record of collaborative ties to government and law enforcement is not an entity that can be trusted with developing this system.
For me it would have to be entirely self-hosted, solidly encrypted and definitely open source.
rlees42,
In reading this thread I can’t help but think that the main purpose of using an AI with OCR to scan windows as they are being interacted with is just to make up for metadata deficiencies in the operating system and applications that would be better solved with metadata API standards. And the “database” should just be a regular file system that supports metadata. If you wanted to search this metadata using AI you could, but it shouldn’t be obligatory to do so.
To the extent that we want AI to do this because the applications developers won’t, that seems to be the main justification. However I question the usefulness given that the AI only sees snapshots and doesn’t have way to scan the entirety of a document nor a way to go back to a specific point as far as I can tell. I do think an AI can make a compelling user interface for natural language searches, but I don’t think this “windows recall ocr” can be better than a purpose built API, which would not only be more reliable, faster, less invasive, but also have more functional improvements too.
“””A life-long database of everything that I’ve done that could be queried with vague and general queries would be absolutely amazing.
rlees42, are you sure you want it to be everything? Some commercial employee monitoring software allows users to log into the business’s central servers and run queries like the ones you described. This has led to business privacy problems, with usernames, passwords, people finding out who is going to be terminated before they are terminated, and the list goes on.
“””For me it would have to be entirely self-hosted, solidly encrypted and definitely open source.
Extras I would ask for: off by default, so that you have to opt in, not the current opt-out design.
Tools to sanitize collected data, for example to detect when the settings are collecting data that should not be collected. This is a feature of advanced employee monitoring software. In what it does, Recall is about 8 years behind server-side employee monitoring. The data privacy and security requirements that employee monitoring software has had to meet are ones any advanced AI assistant needs to meet too.
Alfman, a lot of applications send their output to the OS as just an image, so OCR of windows is required. For a blind person attempting to use some software, OCR of the screen is required.
https://dynobo.github.io/normcap/
Alfman, like the software above. These don’t store an archive of images.
So there is only so far “metadata API standards” can go; there are always going to be metadata deficiencies. There is always going to be text in images that needs to be processed to make heads or tails of what is going on. A lot of care also needs to be taken not to record items into the database/AI that should not go there.
oiaohm
“””Just because there are deficiencies today doesn’t mean that the deficiencies are inevitable in principle.
Alfman, no, these are inevitable in principle as long as applications can display images and use images for buttons and the like, because at some point an application will display text as an image containing something that needs to be known.
The metadata on an image could say the image is “yes” while the image contents say “no”, and that image is used as the face of a pressable button. This has been a nightmare for blind users, who hit what they think is the yes button of an application when it is really the no button.
There is an old saying: trust but verify. Metadata is only trustworthy as far as you can verify it. OCR of the output would be the verification step you need against graphical metadata.
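A minimal Python sketch of that verification step, under the assumption that the OCR text for each element has already been produced by some separate OCR tool (no OCR is performed here). The element ids, the dictionary layout, and the function names are all invented for illustration.

```python
def normalize(text):
    """Lower-case and collapse whitespace so 'Yes ' and 'yes' compare equal."""
    return " ".join(text.lower().split())


def find_mismatches(elements):
    """Return ids of elements whose declared label differs from the OCR text."""
    return [
        e["id"]
        for e in elements
        if normalize(e["label"]) != normalize(e["ocr_text"])
    ]


# Hypothetical UI elements: 'label' is what the application's metadata claims,
# 'ocr_text' is what was actually read from the rendered pixels.
elements = [
    {"id": "confirm_button", "label": "Yes", "ocr_text": "No"},
    {"id": "open_button", "label": "Open", "ocr_text": " open "},
]

mismatches = find_mismatches(elements)
print(mismatches)
```

A real checker would need fuzzier matching, since OCR output is noisy, but the shape of the check is the same: the metadata is the claim and the pixels are the evidence.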
oiaohm,
So what? That says absolutely nothing of the (in)ability to create a useful metadata API for applications to explicitly read/write to.
I have the impression that you are focused on screen readers, but I’m talking about meta data for the purposes of this article, ie “windows recall” metadata.
Alfman, what is the “windows recall” metadata?
“””The short version is this: In its current form, Recall takes screenshots and uses OCR to grab the information on your screen; it then writes the contents of windows plus records of different user interactions in a locally stored SQLite database to track your activity.
Recall is a screen reader: it is after the contents of windows, including windows that have put text on screen as images.
Microsoft Recall is after the same level of metadata as a screen reader, or greater, so it can feed this into the AI/search to answer user requests.
https://dynobo.github.io/normcap/
I did not give this link before for no reason, Alfman. Go to the front page of dynobo’s NormCap and look at the example. Let’s say I asked Recall: what day did I look at the Daily Telegraph page about the moon landing? That page of “The Daily Telegraph” is in a file called demo.jpg. Normal GUI metadata is not going to be able to answer that, because demo.jpg does not have metadata describing its contents. That page is an exact example of how normal GUI metadata fails to be useful to things like Recall, resulting in the need to OCR the screen.
There is going to be a problem where the metadata an application puts on images and what the images display on screen are not aligned. This is where trust but verify comes in.
In reality, Recall is a combination of screen capture, screen reader, and AI tool. The metadata problem is the screen reader problem. Screen readers have OCR these days because, without it, too many parts of applications are unreadable due to being images or having incorrect metadata.
Alfman, basically, to make something like Recall maximally useful you have to tool the AI up like a fully blind user reading the screen, with all the security risks that come with that. The limitations of screen readers align with the limits on how useful Recall searches could be.
Screen readers and Recall are not totally independent technologies.
oiaohm,
A screen reader may use OCR like Windows Recall does; however, you are conflating Windows Recall with a screen reader when its purpose is fundamentally quite different. So when I talk about recording metadata, I am referring to metadata in the same sense that Windows Recall does.
Your screen reader might also benefit from having more metadata for every UI element, which we can talk about, but I want you to realize that’s actually a different topic. The job of making screen readers probably got a lot worse after microsoft and other devs started to deprecate menu-bars, which was a rich source for highly structured interface commands. These got replaced with flat metro interfaces built in frameworks that no longer used standard win32 controls, that must have been a terrible transition for screen readers. But again that UI data is different from what’s important for Windows Recall even though you might use OCR to capture it.
Even if you don’t see them as independent, you haven’t provided any reason there should not be a proper metadata API. There’s no reason in principle that Windows Recall’s metadata collection couldn’t be done in a more robust way using a proper API instead. Applying AI to the screen window using OCR on whatever happens to be displayed at screenshot intervals is far from ideal, and moreover it’s functionally very limited in terms of what can be done to navigate back to the data. A well-defined metadata API could solve most of these issues.
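To make the idea concrete, here is a purely hypothetical Python sketch of what such an API could look like. No such Windows API is being described: `ActivityRecord`, `ActivityIndex`, and every field name are invented. Applications would explicitly publish what the user looked at, including a URI that can be reopened later, and a search tool (AI-driven or not) would query those records instead of OCRing screenshots.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ActivityRecord:
    """One item an application chooses to report (all fields hypothetical)."""
    app: str
    title: str
    uri: str          # lets the user navigate back to the exact document
    keywords: tuple
    seen_at: datetime


class ActivityIndex:
    """In-memory stand-in for an OS-level metadata store."""

    def __init__(self):
        self._records = []

    def publish(self, record):
        """Called by an application when the user views something."""
        self._records.append(record)

    def search(self, term):
        """Case-insensitive match against titles and keywords."""
        needle = term.lower()
        return [
            r
            for r in self._records
            if needle in r.title.lower()
            or any(needle in k.lower() for k in r.keywords)
        ]


index = ActivityIndex()
index.publish(ActivityRecord(
    "browser", "Butterscotch cookie recipe", "https://example.com/cookies",
    ("recipe", "baking"), datetime(2024, 6, 3, 20, 55),
))
index.publish(ActivityRecord(
    "editor", "Tax returns 2010", "file:///home/user/taxes-2010.ods",
    ("tax",), datetime(2024, 6, 4, 9, 0),
))

hits = index.search("recipe")
print([r.uri for r in hits])
```

Because each record carries a URI, a hit can take the user back to the document itself rather than to a screenshot of it, which is the navigation limitation mentioned above.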
“””The job of making screen readers probably got a lot worse after microsoft and other devs started to deprecate menu-bars, which was a rich source for highly structured interface commands.
Alfman do not guess.
https://en.wikipedia.org/wiki/Microsoft_UI_Automation
The problem has not changed.
“””metadata for every UI element
This already exists as part of assistive interfaces; what we have not been able to do is agree on a unified one.
The reality is that the menu bars going away did not change the fact that metadata for UI elements has the problem of developers either not writing it, or writing it and getting it wrong. This is like the localization mistakes you see in different applications.
A fun one was the extra metadata on “open” telling the user that it was “close”, while the entry for “delete” was “open” in the extra metadata.
With assistive tech you do need tooling to double-check that what the metadata says and what the icon or text in the image displayed to the user shows do in fact line up.
The Russian saying “trust but verify” becomes important here. Alfman, like it or not, you cannot trust GUI metadata. We have had full interface metadata for Qt, GTK, and many other toolkits for a long time. We have also had Qt/GTK applications with incorrect metadata that leads users without vision right up the garden path to failure.
One limitation of metadata is human error: forgetting to write the metadata, or not having the language skills to write metadata that makes sense to the end user.
Alfman, this is the problem: we already have examples of toolkits with full metadata support that developers have written programs in, and those examples show how it goes wrong and what information is missed. OCR and image identification turn out to be important for screen readers.
Recall taking a screenshot every 5 seconds is a performance optimization. Real-time OCR and image identification on the screen would make the system less responsive. Those who use screen readers for program navigation end up losing quite a bit of system performance to OCR and image identification.
The Recall screenshots can be OCRed when system load is low. Another downside of Recall is reduced battery runtime on laptops and the like. Processing is not free.
oiaohm,
I’m not disagreeing with your points about screen readers; however, what I am talking about is what Windows Recall is designed for. I am only talking about the latter, but you keep conflating the two, and that does not help the discussion.
This is a direct link to the Kevin Beaumont analysis – worth a read. A shit show.
https://doublepulsar.com/recall-stealing-everything-youve-ever-typed-or-viewed-on-your-own-windows-pc-is-now-possible-da3e12e9465e
Like Google’s shredding of search by insisting on rebuilding it around AI, this is about tech companies panicking because of the speed of developments in AI and floundering around trying to win the race. I’m very interested in what Apple announces on Monday, and in whether they have resisted the temptation to throw all good judgement and common-sense caution out of the window like everybody else.
Yes exactly. I don’t like Apple but am interested to see what they do. Microsoft and Google are losing the plot.
It’s like when Microsoft got terrified of the iPad and ruined Windows with Windows 8. Nobody asked for that and nobody wanted it.
Paradroid,
Windows 8 (metro) may not have been as bad as a dedicated tablet OS. It sure sucked on the desktop though.
The interplay with the “legacy desktop” was so bad that it’s inconceivable that Microsoft’s own designers didn’t know internally that it was a turd. I suspect those awful interactions were settled on because of an executive decision to eventually kill off legacy apps while forcing new apps to rely on Microsoft’s Metro walled garden. Obviously neither users nor developers bought into this (thankfully), but just imagine where we would be if Microsoft had succeeded… Windows would be locked down like iOS (or Windows 10 S), and if you wanted to keep using the legacy desktop you might have to buy a Pro or Enterprise edition to unlock it.
Alfman,
They “wanted to eat their cake and have it too”.
It is obvious they also wanted to support portable machines, but they did this at the expense of the well-established desktop. They were not ready, and pushed too early.
Hopefully they learned their lesson, and gave us the excellent Windows 10, which is the last ever Windows version.
(Okay found the reference, they never officially said it was the last windows: https://answers.microsoft.com/en-us/windows/forum/all/what-happened-to-the-last-version-of-windows/e969d870-5013-484f-8476-6ea5d0446182)
Isn’t it interesting that we’ve been strongarmed into signing in with a Microsoft account for years, because it was supposedly essential to make our experience better? Yet this data-slurping so-called AI nightmare is not gated behind a Microsoft account.