Internet Archive

Tumblr and WordPress owner is striking deals with OpenAI and Midjourney for training data, says report

Thom Holwerda 2024-02-28 Internet No Comments

Speaking of collecting data, here’s another major content player signing a deal to sell your content to “AI” companies. The owner of Tumblr and WordPress.com is in talks with AI companies Midjourney and OpenAI to provide training data scraped from users’ posts, a report from 404 Media alleges. The report, based on an anonymous source inside the company, says that deals between Automattic and the two AI companies are “imminent.” It follows nebulous rumors that have spread on Tumblr over the past week, suggesting a deal with Midjourney could provide a new revenue stream for the site. ↫ Adi Robertson at The Verge We use WordPress for OSNews, but it seems this only applies to content hosted at WordPress.com, not on WordPress installations hosted elsewhere. If you host a site at WordPress.com, you might want to go to your admin panel and opt out of this nonsense real fast.

Meta will start collecting “anonymized” data about Quest headset usage

Thom Holwerda 2024-02-28 Internet No Comments

Meta will soon begin “collecting anonymized data” from users of its Quest headsets, a move that could see the company aggregating information about hand, body, and eye tracking; camera information; “information about your physical environment”; and information about “the virtual reality events you attend.” In an email sent to Quest users Monday, Meta notes that it currently collects “the data required for your Meta Quest to work properly.” Starting with the next software update, though, the company will begin collecting and aggregating “anonymized data about… device usage” from Quest users. That anonymized data will be used “for things like building better experiences and improving Meta Quest products for everyone,” the company writes. ↫ Kyle Orland at Ars Technica Is it just me, or is the idea of Facebook collecting this type of data in particular just exceptionally creepy? I mean, browsing history or whatever is one thing – already bad enough – but hand, body, and eye movements, and camera information? Of course, this was the only expected course for Quest owners, but now that the time is here, it still feels just as creepy as when we first imagined it when Facebook bought Oculus.

The surprising truth about pixels and accessibility

Thom Holwerda 2024-02-20 Internet 8 Comments

Should web developers use pixels or ems/rems for accessible fonts? It’s an emotionally-charged question because there are a lot of conflicting opinions out there, and it can be overwhelming. Maybe you’ve heard that rems are better for accessibility. Or maybe you’ve heard that the problem is fixed and pixels are fine? The truth is, if you want to build the most-accessible product possible, you need to use both pixels and ems/rems. It’s not an either/or situation. There are circumstances where rems are more accessible, and other circumstances where pixels are more accessible. ↫ Joshua Comeau The linked article isn’t just an explanation of why, but also a tutorial.

The text file that runs the internet

Thom Holwerda 2024-02-14 Internet 8 Comments

The robots.txt file governs a give and take; AI feels to many like all take and no give. But there’s now so much money in AI, and the technological state of the art is changing so fast that many site owners can’t keep up. And the fundamental agreement behind robots.txt, and the web as a whole — which for so long amounted to “everybody just be cool” — may not be able to keep up either. ↫ David Pierce for The Verge Another thing “AI” does not respect.
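
To make that “everybody just be cool” agreement concrete, here is a minimal sketch of the check a well-behaved crawler runs before fetching a page, using Python’s standard urllib.robotparser module. The rules, the example.com URLs, and the “ExampleBot” name are made up for illustration, and GPTBot is used only as an example of an AI crawler’s user agent. Nothing here is enforced by the server: a crawler that never runs this check is not stopped by it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: turn one AI crawler away entirely,
# and keep every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "ExampleBot"):
        for url in ("https://example.com/blog/post", "https://example.com/private/x"):
            print(agent, url, is_allowed(agent, url))
```

The whole protocol is a lookup the crawler performs on itself, which is why it only works as long as crawlers choose to honor it.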

Here’s how WhatsApp plans to interoperate with other messaging apps

Thom Holwerda 2024-02-06 Internet 8 Comments

As noted by Wired, WhatsApp wants the messaging services it connects with to use the same Signal Protocol to encrypt messages. Meta is also open to apps using alternate encryption protocols so long as companies can prove “they reach the security standards that WhatsApp outlines in its guidance.” The third-party services will also have to sign a contract with Meta before they plug into WhatsApp, with more details about the agreement coming in March. ↫ Emma Roth at The Verge The way this should work is that these megacorporations create free and open APIs any instant messaging application can tap into. I’m not looking to bring other services into WhatsApp; I’m looking to bring all services together in one unified application that respects my platform’s conventions and integrates properly with the operating systems I use. I feel like this contractual interoperability Facebook (and Apple) is offering is not interoperability at all, and does not reflect the spirit of the Digital Markets Act.

Browsers are weird right now

Thom Holwerda 2024-02-06 Internet 14 Comments

I love this quick to-the-point summary of most of the popular browsers out there right now. I’m a Firefox user, of course, since it’s the best choice between Chrome (I’d rather choose death), Safari (not cross-platform so utterly pointless), the various Chrome skins, and Firefox (the one independent browser). Still, I’m continuously worried about Firefox’ future – specifically on platforms other than Windows or macOS – and strongly believe we need more true alternatives for a healthier browser ecosystem.

SeaweedFS: a simple and highly scalable distributed file system

Thom Holwerda 2024-02-03 Internet, OS News 1 Comment

SeaweedFS is a simple and highly scalable distributed file system. There are two objectives: to store billions of files!, to serve the files fast! SeaweedFS started as an Object Store to handle small files efficiently. Instead of managing all file metadata in a central master, the central master only manages volumes on volume servers, and these volume servers manage files and their metadata. This relieves concurrency pressure from the central master and spreads file metadata into volume servers, allowing faster file access (O(1), usually just one disk read operation). There is only 40 bytes of disk storage overhead for each file’s metadata. It is so simple with O(1) disk reads that you are welcome to challenge the performance with your actual use cases. ↫ SeaweedFS’s GitHub page It’s Apache-licensed and the code is, as usual, on GitHub.
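
The split of responsibilities described above can be sketched as a toy model in Python. This is not SeaweedFS’s code or API: the class names, the dictionary used as an index, and the single byte buffer standing in for a volume file are all assumptions made for illustration. The point is only that the master answers “which server holds this volume?”, while the volume server answers “where in the volume is this file?” with a constant-time lookup followed by one contiguous read.

```python
from dataclasses import dataclass, field


@dataclass
class VolumeServer:
    """Toy volume server: one append-only blob plus a per-file index."""
    blob: bytearray = field(default_factory=bytearray)
    # Hypothetical index: file key -> (offset, size) inside the blob.
    index: dict[int, tuple[int, int]] = field(default_factory=dict)

    def write(self, key: int, data: bytes) -> None:
        # Append the file and remember where it landed.
        self.index[key] = (len(self.blob), len(data))
        self.blob.extend(data)

    def read(self, key: int) -> bytes:
        offset, size = self.index[key]                  # O(1) metadata lookup
        return bytes(self.blob[offset:offset + size])   # one contiguous read


@dataclass
class Master:
    """Toy master: tracks volumes only, never individual files."""
    volumes: dict[int, VolumeServer] = field(default_factory=dict)

    def locate(self, volume_id: int) -> VolumeServer:
        return self.volumes[volume_id]


if __name__ == "__main__":
    master = Master(volumes={3: VolumeServer()})
    master.locate(3).write(0x01, b"hello")
    master.locate(3).write(0x02, b"world!")
    print(master.locate(3).read(0x02))
```

Because the master never sees per-file metadata in this model, adding files only grows the index on the volume server that stores them.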

Two months in Servo: better inline layout, stable Rust, and more

Thom Holwerda 2024-01-27 Internet 2 Comments

Another month, another pile of improvements to Servo, the rendering engine written in Rust, originally a Mozilla project. This month the proof-of-concept browser UI got forward and backward buttons, making this bare-bones UI just a tiny bit more usable. Of course, the vast majority of changes and improvements are all focused on the actual rendering engine, which makes sense because Servo definitely isn’t ready for any prime time use – nor is anyone claiming it is. I’m incredibly curious to see where Servo goes in the future.

Meta now lets EU users unlink their Facebook, Messenger and Instagram accounts

Thom Holwerda 2024-01-22 Internet 5 Comments

In a major move addressing European regulations, Meta will soon give users in the EU, EEA, and Switzerland significantly more control over how their data is used across Facebook and Instagram. The changes, set to begin rolling out in the coming weeks, aim to comply with the Digital Markets Act (DMA). ↫ Omer Dursun at NeoWin You’ll be able to unlink Facebook’s various services – such as Instagram and Facebook’s main social network thing – and you’ll be able to use Facebook Messenger as a standalone service without needing to have a Facebook account. Sadly, there’s no word on WhatsApp. This only applies to people in the EU/EEA. Americans need not apply.

Ruffle: an open source Flash Player emulator

Thom Holwerda 2024-01-17 Internet 7 Comments

Made to run natively on all modern operating systems and browsers, Ruffle brings Flash content back to life with no extra fuss. ↫ Ruffle website It’s using Rust and WASM, making it supposedly safer than the real Flash Player ever was, and of course, it’s open source too. Their most recent progress report details just how far along this project already is.

A shocking amount of the web is machine translated: insights from multi-way parallelism

Thom Holwerda 2024-01-16 Internet 16 Comments

We show that content on the web is often translated into many languages, and the low quality of these multi-way translations indicates they were likely created using Machine Translation (MT). Multi-way parallel, machine generated content not only dominates the translations in lower resource languages; it also constitutes a large fraction of the total web content in those languages. We also find evidence of a selection bias in the type of content which is translated into many languages, consistent with low quality English content being translated en masse into many lower resource languages, via MT. Our work raises serious concerns about training models such as multilingual large language models on both monolingual and bilingual data scraped from the web. ↫ Brian Thompson, Mehak Preet Dhaliwal, Peter Frisch, Tobias Domhan, Marcello Federico As a translator myself, this is entirely unsurprising. Translating is a craft, a skill, and much like with any other craft, you get what you pay for. If you pay your translator(s) a good rate, you get a good translation. If you pay your translator(s) a shit rate, you get a shit translation. If you pay nothing, you get nothing. I’m definitely seeing more and more people in my industry integrate machine translations, but so far, it’s not been an actual issue – I have no qualms about accepting a job where I take a machine-translated text and whip it into shape and turn it into a human-readable, quality translation… As long as people pay me a reasonable rate for it. Working from a machine translation is often quicker and easier, so the going rate obviously reflects that. The quality of machine translations is absolutely atrocious, however, and the idea of relying on it for texts other people – customers, clients, employees, etc. – are actually supposed to read and work from is terrifying. 
Google Translate is an effective tool for personal use, but throwing, I don’t know, your product’s manual at it and dumping the unedited result onto your customers is borderline criminal. Pay nothing, get nothing.

I used Netscape Composer in 2024

Thom Holwerda 2024-01-15 Internet 14 Comments

Netscape Composer was my first introduction to web development. As a kid, I created my first web pages using it. Those pages never made it online, but I proudly carried them around on a floppy disk to show them off on family members’ and friends’ computers. This is likely how I got the understanding that websites are just made of files. Using Netscape Composer also taught me basic web vocabulary, such as “page” and “hyperlink”. Of course, the web landscape has evolved immensely since then. I was curious to try out that dated software again and see what its limitations were, and what the code it produces looks like from a 2024 perspective. The first thing I needed was a goal. I decided to try and reproduce the home page of my personal website as closely as the application allowed it. That seemed like a sensible aim as my website has a rather minimalistic design, with very little that should be completely out of reach for an antiquated tool. ↫ Pier-Luc Brault What a fun exercise.

NetSurf 3.11 released

Thom Holwerda 2024-01-01 Internet 10 Comments

NetSurf, the small and efficient browser for RISC OS, Haiku, AmigaOS 4, and obscure platforms you’ve probably never heard of like “Linux” and “macOS” has seen a new release – version 3.11. NetSurf is written in C and has its own browser engine – it’s not based on the big browser engines, Chromium’s Blink and Firefox’ Gecko/Quantum. NetSurf 3.11 features improved page layout with CSS flex support. It also features many other optimisations and enhancements. ↫ NetSurf’s official website It’s an obvious upgrade for everyone who uses NetSurf, since if you’re using NetSurf, odds are the platform you’re using it on doesn’t really offer many alternatives.

AI-created “virtual influencers” are stealing business from humans

Thom Holwerda 2023-12-29 Internet 13 Comments

Pink-haired Aitana Lopez is followed by more than 200,000 people on social media. She posts selfies from concerts and her bedroom, while tagging brands such as hair care line Olaplex and lingerie giant Victoria’s Secret. Brands have paid about $1,000 a post for her to promote their products on social media—despite the fact that she is entirely fictional. Aitana is a “virtual influencer” created using artificial intelligence tools, one of the hundreds of digital avatars that have broken into the growing $21 billion content creator economy. ↫ Christina Criddle for Ars Technica While there’s a ton of questions to be asked about where, exactly, this could lead, and what “AI” will mean for especially women having their likeness recreated as “AI” avatars for people to sleaze over, or worse, the concept of having “AI” influencers doing fairly mundane and harmless things like promote a brand or show some fake photos of their apartments seems fairly benign and even interesting and beneficial to me. Of course, I say this with all the caveats that this is incredibly early days, we have no idea if there are any shady businesses behind these new “AI” influencers, and so on, and so forth. We’ve all seen what technology such as this can be used for, and it ain’t pretty.

Unblocking user freedom: the right to use adblockers

Thom Holwerda 2023-12-21 Internet 21 Comments

Advertisements are a part of our lives, including our digital ones. They are in the websites we browse, the search results we receive, and the online news we read. Tired of receiving so many ads, some users try to avoid them by installing an adblocker. But is this a legal practice? Is using adblockers an act of restricting market autonomy, or do they help achieve user freedom? Imagine a scenario where website owners hold copyright over their websites, including whatever ads they place, and could effectively sue for copyright infringement if users were to remove or suppress ads when visiting these websites. This hypothetical situation would enable any website copyright holder to use the legal system to stop any ordinary user on the internet who tries to bypass these ads. This would lead to an internet where unsolicited information and advertisements are imposed on users. Fortunately, recent court decisions have at least prevented this hypothetical from becoming a reality in Germany. ↫ FSFE Good. My position has always been clear: your computer, your rules. Block ads to your heart’s content. Even on OSNews – block away if you want. There are far better ways to support us, anyway (Patreon, Ko-Fi, Liberapay, merch).

Ousted propaganda scholar Joan Donovan accuses Harvard of bowing to Meta

Thom Holwerda 2023-12-04 Internet 4 Comments

A prominent disinformation scholar has accused Harvard University of dismissing her to curry favor with Facebook and its current and former executives in violation of her right to free speech. Joan Donovan claimed in a filing with the Education Department and the Massachusetts attorney general that her superiors soured on her as Harvard was getting a record $500 million pledge from Meta founder Mark Zuckerberg’s charitable arm. ↫ Joseph Menn for The Washington Post This is why “voting with your wallet” is such an empty platitude, usually used by corporatists trying to absolve corporations from misdeeds and shift the blame to us, mere consumers. How on earth can we regular folks vote with our wallet when someone like Zuckerberg can just buy the entire “election” without blinking?

Google researchers’ attack prompts ChatGPT to reveal its training data

Thom Holwerda 2023-12-01 Internet 5 Comments

A team of researchers primarily from Google’s DeepMind systematically convinced ChatGPT to reveal snippets of the data it was trained on using a new type of attack prompt which asked a production model of the chatbot to repeat specific words forever. Using this tactic, the researchers showed that there are large amounts of personally identifiable information (PII) in OpenAI’s large language models. They also showed that, on a public version of ChatGPT, the chatbot spit out large passages of text scraped verbatim from other places on the internet. So not only are these things cases of mass copyright infringement, they also violate countless privacy laws. Cool.

This month in Servo: better floats, :has(), color-mix(), and more!

Thom Holwerda 2023-11-30 Internet 5 Comments

Our nightly example browser, servoshell, is now easier to navigate, accepting URLs without http:// or https:// both in the location bar and on the command line, and should no longer lock up when run with --no-minibrowser. Local paths can also be given on the command line, and are still preferred when the path points to a file that exists. Work is now underway to improve our embedding story and prepare Servo for integration with Tauri, starting with precompiled ANGLE for faster initial builds, better support for offscreen rendering, and support for multiple webviews. These changes haven’t landed yet, but once they do, apps will be able to open, move, resize, and interleave Servo with other widgets. I’m curious what the future will bring to Servo. It seems under very active development, but it’s not part of any of the main browser projects. Let’s hope they can keep up the momentum so that it can grow into a viable alternative. Because lord do we need one.
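
The location handling described in that update can be approximated in a few lines of Python. This is a sketch of the described behavior, not Servo’s actual code: the function name is hypothetical, and falling back to https:// when no scheme is given is an assumption, since the update does not say which scheme servoshell picks.

```python
from pathlib import Path


def resolve_location(arg: str, default_scheme: str = "https") -> str:
    """Turn a location-bar or command-line argument into a URL.

    Order of checks mirrors the description: an existing local file
    wins, then an explicit scheme, then the assumed default scheme.
    """
    path = Path(arg)
    if path.is_file():
        # Local paths are preferred when they point to a file that exists.
        return path.resolve().as_uri()
    if "://" in arg:
        # Already a full URL, such as http://... or https://...
        return arg
    # No scheme given: prepend the default one (an assumption here).
    return f"{default_scheme}://{arg}"


if __name__ == "__main__":
    print(resolve_location("servo.org"))
    print(resolve_location("http://example.org/page"))
```

Checking for an existing file first means a name that is both a valid hostname and a file in the current directory resolves to the file, which matches the “still preferred” wording above.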

Ethernet is still going strong after 50 years

Thom Holwerda 2023-11-17 Internet 10 Comments

The PARC facility also is known for the invention of Ethernet, a networking technology that allows high-speed data transmission over coaxial cables. Ethernet has become the standard wired local area network around the world, and it is widely used in businesses and homes. It was honored this year as an IEEE Milestone, a half century after it was born. Truly one of the success stories of the technology world. Sure, those first Ethernet cables and accessories have changed a lot over the decades, but we’re still using it to this day, and we’ll be using it for many more decades to come.

Facebook and Instagram to offer subscription for no ads in Europe

Thom Holwerda 2023-11-02 Internet 8 Comments

Facebook has unveiled the prices it’s going to charge European users who want to have an ad-free experience on Facebook and Instagram. People in these countries will be able to subscribe for a fee to use our products without ads. Depending on where you purchase, it will cost €9.99/month on the web or €12.99/month on iOS and Android. Regardless of where you purchase, the subscription will apply to all linked Facebook and Instagram accounts in a user’s Accounts Center. As is the case for many online subscriptions, the iOS and Android pricing take into account the fees that Apple and Google charge through respective purchasing policies. Until March 1, 2024, the initial subscription covers all linked accounts in a user’s Accounts Center. However, beginning March 1, 2024, an additional fee of €6/month on the web and €8/month on iOS and Android will apply for each additional account listed in a user’s Accounts Center. That’s a high price to pay to read your racist uncle’s rants and see the heavily photoshopped photos of some random influencer peddling vitamin pills.
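
For anyone wondering what that adds up to, here is a small Python sketch of the arithmetic as quoted above. The function and constant names are made up for illustration, amounts are kept in euro cents to avoid rounding issues, and “mobile” stands in for the iOS and Android price.

```python
from datetime import date

# Prices as quoted in the announcement, in euro cents per month.
BASE_CENTS = {"web": 999, "mobile": 1299}
EXTRA_ACCOUNT_CENTS = {"web": 600, "mobile": 800}
EXTRA_FEE_START = date(2024, 3, 1)


def monthly_cost_cents(platform: str, linked_accounts: int, on: date) -> int:
    """Monthly price for one subscription covering linked_accounts accounts."""
    if linked_accounts < 1:
        raise ValueError("at least one account is required")
    cost = BASE_CENTS[platform]
    if on >= EXTRA_FEE_START:
        # From March 1, 2024, every account beyond the first costs extra.
        cost += EXTRA_ACCOUNT_CENTS[platform] * (linked_accounts - 1)
    return cost


if __name__ == "__main__":
    cents = monthly_cost_cents("web", 3, date(2024, 4, 1))
    print(f"EUR {cents / 100:.2f} per month")
```

Under these numbers, a Facebook account plus two linked Instagram accounts bought on the web goes from €9.99 to €21.99 a month once the extra fee kicks in.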