As best I can tell, there is no broad consensus on how large a kilobyte is. Some say that a kilobyte is 1000 bytes while others say it’s 1024 bytes. Others are ambiguous.
This also means that the industry does not agree on the size of megabytes, gigabytes, terabytes, and so on.
Not entirely new information to most of us, I would presume, but in my head canon a kilobyte is 1024 bytes, even though that technically doesn’t make any sense from a metric perspective. To make matters worse, as soon as we get into the gigabytes and terabytes, I tend to go back to thinking in terms of thousands again since it just makes more sense. The kibibytes and their cohorts are a way to properly distance the base 2 system from the base 10 one, but I’ve never heard anyone in day-to-day speech make that distinction outside of really nerdy circles.
https://en.wikipedia.org/wiki/Kilobyte
Historically and informally, we use the prefix kilo for 1024 bytes (2^10) because computers work in base 2, but the SI system mandates that it means 1000 (10^3) in base 10. New prefixes were ‘invented’ by the IEC specifically for base 2 (kibi, mebi, gibi, etc.) and have since been recognized by other standards bodies.
As long as there’s no need for absolute correctness and no ambiguity, using the standard base 10 prefixes is well accepted and understood in the IT world when referring to base 2 quantities.
The “math” got worse as unit sizes grew and more of them ended up combined in calculations. Thus, more of a need for the “i” designations. A little error can go a long, long way.
Stuff gets messy when you buy a 1TB SSD and it turns out it only has 0.91 TiB.
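The arithmetic behind that discrepancy is just a unit conversion (quick Python check):

print(1e12 / 2**40)   # ~0.9095 -- 10^12 bytes expressed in TiB; nothing is missing, only the unit changed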
This has been the way for decades now so I wouldn’t consider it anything shady anymore, especially since there are actually valid reasons for that discrepancy.
There is valid reason for everything.
Providing both options to the buyer and in the OS, without insulting us with the silly ‘i’, would be preferable.
Also, storage should really be sized in base 2 aligned sizes for, you know, actually making sense. What next, 27.3 MiB RAM chips? Meh. (I feel dirty now.)
Arawn,
I find the inconsistent usage very confusing. Different system tools use different units that you just have to memorize or look up. And they often do care about absolute correctness so making the wrong assumptions can cause errors.
Do these disk tools use SI units or binary ones? ls, dd, df, ls -lh, fdisk, sfdisk, parted, mkfs, lvm, etc.?
What about monitoring tools like top, free, iotop, iftop, etc.?
Iftop brings up another confusing point: megabits or megabytes? They’re both common units in networking. Sometimes the capitalization of the ‘b’ distinguishes bits from bytes, but often it doesn’t.
If unix had a list of commandments, “Thou shalt be consistent with SI units” would be on there and we wouldn’t have these inconsistencies!
To me 2^n units are becoming less relevant for 2 reasons:
– Fewer applications. They were useful for small memories with binary addressing. These days that’s a small niche. The discrepancy between GB and GiB is larger at today’s sizes, and the physical layout is either irrelevant or obscured by overheads.
– Data transmission. It is inconvenient to mix data sizes in 2^n units with clock rates in 10^n Hz. It’s good to know that a 10Gb/s link with its 10:8 encoding will take ~1000s to transfer 1TB of data.
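Spelled out, that arithmetic looks like this (a small Python sketch of the same numbers):

wire_rate = 10e9                   # 10 Gb/s line rate
payload_bits = wire_rate * 8 / 10  # 8 Gb/s of usable bits after the 10:8 encoding
bytes_per_s = payload_bits / 8     # exactly 1e9 bytes/s, i.e. 1 GB/s in SI units
print(1e12 / bytes_per_s)          # 1000.0 seconds to move 1 TB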
ndrw,
I agree with both of your rationales. But I think the use of binary units in tools might stick around as legacy baggage for generations to come.
IIRC, I had some scripts that produced incorrect sector counts when I switched from GNU tooling to busybox in my distro. The same commands represented different unit sizes.
Alfman,
While it might make sense for telecommunications, for computers base-2 is still fundamentally there.
Our storage, be it RAM or non-volatile drives, is still organized in blocks of 4096 bytes (in most cases). It would be either 4096-byte sectors (even on NVMe) or 4096-byte memory pages (or similar numbers on the GPU).
And when I mmap 4KB from the disk, I expect it to be 4096 bytes. Using the “SI” notation the mmap would be 4.096KB (ouch), and getting worse as the size increases.
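A minimal sketch of that expectation in Python (the file name is a placeholder, assumed to be at least one page long):

import mmap, os

fd = os.open("data.bin", os.O_RDONLY)                # placeholder file, >= 4096 bytes
page = mmap.mmap(fd, 4096, access=mmap.ACCESS_READ)  # "4KB" here means exactly 4096 bytes
print(len(page))                                     # 4096, one page/sector, not 4000
page.close()
os.close(fd)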
(Again for end users streaming a 10GB movie, the details don’t mean as much).
sukru,
It’s not so much that base 2 isn’t there; we know why it’s used. But pretending that base 2 units == SI units was always completely wrong, and conflating them as we do today is just an awful practice. The error between the incorrect units and the correct ones increases as we represent larger sizes.
https://i.postimg.cc/52fQJNSy/si-units.png
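The discrepancy per prefix is easy to tabulate (quick Python sketch):

for power, prefix in [(1, "kilo"), (2, "mega"), (3, "giga"), (4, "tera"), (5, "peta")]:
    error = (1024**power / 1000**power - 1) * 100
    print(f"{prefix}: {error:.1f}% off")   # 2.4%, 4.9%, 7.4%, 10.0%, 12.6%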
By sheer coincidence, this very week my nephew installed storage and didn’t understand why there was a discrepancy between the TB on the packaging and the capacity shown by Windows. I believe Windows showed the correct SI units and the packaging showed the binary units. And before anyone suggests we should just show the binary units for capacity because sector sizes are always powers of 2 (512 or 4096 in practice), this would have to be reflected in file sizes too, since it would be highly inconsistent to show binary units for disk capacity and real SI units for file sizes. But then this would lead people to wonder why a 1MB download from a webserver or NAS or even a database shows up as “977kB” on their file system… “What happened to the missing bytes? Every time I download the file it gets chopped off!?”
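For the record, nothing gets chopped off; only the divisor changes:

print(1_000_000 / 1024)   # 976.5625 -- displayed as "977kB" when the tool divides by 1024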
The only way to fix these mathematical discrepancies is to be absolutely consistent with units across the board anywhere normal users are exposed. Should that consistency favor true SI units or fake binary ones?
While there may be a need for binary units in specific low-level cases like partitioning, in general I’d argue that operating system primitives are abstractions over the underlying hardware. The fact that files may be stored in binary-sized sectors on disk or pages in RAM doesn’t change the fact that 1,000,000 bytes is the same quantity of bytes regardless of whether it’s stored in RAM, on disk, on a network share, on an FTP server, in a database, and so on. Using binary lengths because the storage media is built that way creates a leaky abstraction, and IMHO it’s bad practice.
1MB on disk should equal 1MB on the network and so on and we should never have to second guess this. Unfortunately this can’t happen now because these inconsistencies are already so widespread, ideally though we would have been consistent from the start.
sukru,
It isn’t that 2^n sizes stopped existing. It is just that they became a niche for low-level programmers, and even that niche is shrinking. This used to be a problem, but now it is only good for starting a discussion on osnews.
For general population these units are a leaky abstraction rather than a tool. If for any reason it was important to measure data sizes with these units there are better alternatives: N sectors/pages, N KiB, N x 2^10 B, N x 1024 B.
Regardless of usage, the idea of redefining “kilo” was plain wrong from the beginning, even if it had that hacky appeal. It is like German or Japanese automakers adopting inches in their products after redefining them as 25mm. Creating new units is fine, redefining existing ones is not.
ndrw,
Generally I believe operating systems should expose true SI units and not leaky abstractions. The representation of byte counts should be consistent regardless of which bytes we’re talking about and where they are stored. A megabyte should be 1,000,000 bytes and users shouldn’t have to second guess it.
Couldn’t agree more. If 2^n units are going to be used at all, they should always use the appropriate prefixes. Our industry has done a lot of damage to the SI standard by using it wrongly.
Alfman,
Developers and end users continue to have different world views. And this might be normal.
For the CPU/cache/RAM world, everything is still naturally base-2. When we get an Intel 9700k with 12MB cache, it means there are 12,582,912 bytes. Same with RAM, 128GB is 137,438,953,472 bytes. I don’t think we’d have 137GB RAM advertised anytime soon. And it would be very confusing, since we’d have 134MB, 137GB, and 141TB RAM (in the future) instead of 128MB, 128GB, and 128TB.
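Those figures check out (quick Python aside):

print(12 * 2**20)                     # 12582912 bytes -- the "12MB" cache
print(128 * 2**30)                    # 137438953472 bytes -- the "128GB" kit
print(f"{128 * 2**30 / 1e9:.0f} GB")  # "137 GB" if relabelled in strict SI units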
This will probably continue for a while, until developers themselves lose familiarity with the underlying hardware.
Anyway, I feel like the “man shouts at clouds” meme here…
sukru,
Yes, I agree that’s the way the industry evolved, but I’m still very bothered that the units are mathematically wrong. Binary units should never have been represented with SI prefixes. It should be “128GiB”, etc.
Your data transmission example does not make sense to me.
If you know your link speed is 10Gb/s, and indeed approximate it as a 1 GB/s transfer rate, you just need to know how many gigabytes of data you need to transfer.
Convert terabytes to gigabytes, big deal.
Your link speed unit being an SI unit and your storage unit being a non-SI unit is not really relevant.
You will have a much harder time (!) converting seconds to hours/minutes/seconds!
Like it or not, binary units are no longer used in telecoms. Converting between Gb/s and b/ns is one reason; another is that 2^n sizes are rarely used because of all the encoding and protocol overheads. So the next step up from a bit is a frame/block/packet, etc.
ndrw,
Yes, the “gigabit” link never transferred 128MB/s anyway. There are physical limits (noise), electronic signaling (Ethernet pre and post packet synchronization), protocol overhead (TCP/IP headers), all of which reduce available bandwidth.
Hence, when I see “gigabit”, I think slightly over 100MB/s in ideal conditions.
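Rough numbers only (Python sketch; the overhead percentage is a ballpark assumption, not a measured figure):

raw = 1e9 / 8                       # 125 MB/s of raw gigabit line rate
usable = raw * 0.94                 # ~118 MB/s, assuming roughly 6% framing/TCP overhead
print(f"{usable / 1e6:.0f} MB/s")   # "slightly over 100MB/s", as above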
The only tool I know of that (stupidly) switched to SI units is parted.
Thankfully I don’t need it, it was only useful when fdisk did not yet support GPT.
It gets even weirder with the 1.44MB floppy disk:
https://en.wikipedia.org/wiki/Floppy_disk
It is actually 1440 KB, or 1.41 MB. Or 1,474,560 bytes if you counted them.
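Counting it out (quick Python sketch; 80 tracks x 2 sides x 18 sectors x 512 bytes is the standard 3.5" HD geometry):

total = 80 * 2 * 18 * 512   # bytes on a 3.5" HD floppy
print(total)                # 1474560
print(total / 1024)         # 1440.0  -> the "1440 KB"
print(total / 1024**2)      # ~1.406  -> the "1.41 MB" (binary megabytes)
print(1440 / 1000)          # 1.44    -> the marketing number: KiB divided by 1000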
I think this might be the first instance of storage manufacturers trying to use base-10 to pad their numbers. And yes, from there we got to the 0.91 TiB SSDs that sell as 1TB ones.
Let’s be real here, 1 kilobyte was defined as 1024 bytes a long time ago because computers are much better at dividing by 2 than they are at dividing by 10. Also, a 4GB disk can be memory-mapped with a 32-bit pointer without wasted address space if we define 1 kilobyte as 1024 bytes, and the same goes for two 2GB disks, so it makes sense to sell disks at power-of-two capacities.
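The address-space point is easy to verify (quick Python aside):

print(2**32)   # 4294967296 -- a "4GB" (binary) disk exactly fills a 32-bit address space
print(2**31)   # 2147483648 -- and a "2GB" disk fills half of it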
Then someone in the hard disk industry realized they could sell less storage for the same price if they argued that 1 kilobyte is 1000 bytes and point to the SI units as a justification, and apparently this nonsense passed legal challenge, so here we are…
Have you ever wondered why CDs are measured in base-2 while DVDs and Blu-Rays are measured in base-10? It’s because the above-mentioned marketing nonsense happened sometime in between the invention of CD and the invention of DVD.
What I find particularly funny is HDD DVD recorders that advertise the capacity of their HDD in base-10 on the fascia but their own firmware measures in base-2 (I have a Sony HDD DVD recorder that does exactly that). This is your proof that even manufacturers don’t buy their own BS. And of course, most smartphones until very recently did the same. Some versions of Android show total device storage as base-10 (to align with the marketing) and, since Android up until very recently measured in base-2, accounted for the difference and added to “storage used by system files”. Ridiculous.
kurkosdr,
I don’t think anyone is suggesting the hardware itself be physically changed to fit nice round decimal numbers. We’re just talking about having the units representing those sizes be consistent with the SI prefixes used by the rest of the world. If binary sizes need to be used, then the KiB/MiB prefixes should be used without exception to avoid ambiguity.
If we meet aliens one day, it will speak so poorly of humanity that we haven’t been able to solve this inconsistent unit debacle. Haha.
When you define a unit, you are free to define what the “kilo” prefix means for that unit. The byte is not an SI unit and doesn’t have to use the SI definitions for the “kilo” prefix. The fact that the argument that a byte is an SI unit and should use the SI definitions of prefixes passed legal challenge is absurd.
Also, 1 kilobyte being 1024 bytes before the lawyers and marketers got their way is not a US-only thing. The CD, which was co-invented by Sony and Philips, is a good example.
kurkosdr,
What legal matter? It’s just extremely shortsighted to reuse prefixes with different values. IMHO the sooner everyone gets back to using SI prefixes correctly the better.
I don’t understand your angle here. I really don’t care who did it, appropriating long established SI prefixes with different values is an objectively bad idea for standardization purposes.
Do you have a problem with the binary prefixes?
https://en.wikipedia.org/wiki/Binary_prefix
Wouldn’t you agree it’s best to have an internally consistent standard where prefixes are unambiguous? If so, then you should agree that it would have been far better to use new prefixes than to redefine SI prefixes.
The byte is not an SI unit and doesn’t even exist in the physical world, so it doesn’t have to be held to the same standard that physical-world units are, but should be held to the standards of the digital (binary) world. Even the companies who push this base-10 kilobyte nonsense don’t buy their own nonsense. As I’ve said in another comment, Samsung uses base-10 when selling SSDs but base-2 when selling main memory, and Sony will sell you devices advertising capacity at base-10 but their own fully home-grown firmware will report in base-2 in some devices. If Samsung and Sony don’t believe their own BS that the byte is allegedly an SI unit (it’s not) and/or should use SI prefixes, why should I?
Again, SI prefixes may be long-established in the physical world, but aren’t long-established (or even well-established) in the digital world and also don’t make sense. 1000 meters is a nice round number in the physical world, 1024 bytes is a nice round number in the digital world (base-2 maps well to pointers etc). 1000 bytes isn’t a round number in the digital world and doesn’t exist in the physical world.
First of all, they sound moronic and are unpronounceable. This is reason enough. Secondly, they aren’t used consistently even by the companies that push this base-10 kilobyte nonsense, see the example with SSDs and main memory above, or the example of device firmware.
It’s good to be consistent in the physical world, and it’s also good to be consistent in the digital world, but the two don’t have to be consistent between each other, because they are unrelated. Which brings me to my main point: All this nonsense of shoehorning SI units in the digital world, besides not making sense from a digital perspective, has led to tons of inconsistency, like the two examples I mentioned above. Hate, hate, hate.
Personally, I would love to see those “binary prefixes” be legally mandated (for example when the main memory of a laptop is advertised), just so I can see people attempt to pronounce things like “gibibyte” and “tebibyte” in radio and TV ads.
But back in the real world, these “binary prefixes” are never used. Manufacturers just use “decimal bytes” in storage specs as an excuse to sell less storage at the same price. For literally all other purposes, 1 kilobyte is 1024 bytes.
shoehorning SI units = shoehorning SI prefixes
kurkosdr,
Honestly this is extremely shortsighted. Just because we could let everyone redefine the SI prefixes like “kilo” to suit their own ends doesn’t mean it’s a good idea… in fact it’s a terrible idea for standardization and it recreates many of the very same problems that the world faced before switching to metric. Everyone just used to invent their own values for inches, feet, ounces, and so on. This was a disaster as nobody could agree on values and you are guilty of doing the same thing to SI prefixes. We should not allow this to happen which is why we need to be firm in insisting that SI prefixes are consistent with their long established meanings not only for traditional units but also for new units. It’s unfortunate that the tech industry didn’t come to its senses sooner, but at least today our standards organizations are promoting the correct usage of SI prefixes everywhere including computers and I for one agree with them. It’s the right thing to do for the sake of consistency and standardization.
Having new binary prefixes can be useful for things like sectors. However to suggest that normal SI prefixes don’t make sense for modern units like bytes is nonsense. We can absolutely talk bytes without there being an intrinsic need to convert byte quantities into base 2. How large is a jpeg file? What is the throughput of your ISP connection? What is the capacity of a cloud storage account?
Not only does redefining prefixes around base 2 create inconsistencies and contribute to confusion and ambiguity for standard prefixes like “kilo” 1,000 and “mega” 1,000,000, but it’s also a regression to units that were painful to work with because they didn’t fit into our decimal system. Consider that scale conversions are trivial using the standard SI prefixes:
1GB/Gs == 1B/s
But with your binary prefixes:
1GB/Gs == 1.073741824B/s
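Or as a one-line check in Python:

print(1e9 / 1e9)     # 1.0          -- SI gigabytes over SI gigaseconds
print(2**30 / 1e9)   # 1.073741824  -- "GB" redefined as 2^30 over the same gigasecond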
Granted it’s a contrived example to make a point, but it does highlight how you’re throwing away the consistency and properties that make SI prefixes so easy to work with, with no hidden surprises.
The whole value proposition of the standard is in providing consistency. When you start allowing everyone to redefine what basic things like prefixes mean, we lose the nice properties that everyone can understand.
Pixels and parsecs didn’t exist when SI units were invented. So going by your logic, prefixes can take on new values. They could define a “megapixel” to mean 921,600 pixels on the basis that this is the number of pixels on a 1280×720 HD screen. HD screens mean nothing to scanner manufacturers though, so for them a “megapixel” would mean 1,000,000 pixels.
Hopefully you see what I’m getting at. Redefining SI prefixes for new industries just leads to nothing but chaos. It’s significantly better to apply SI units and prefixes in consistent and predictable ways that everyone can easily understand regardless of the industry that’s using them. As long as we stand our ground on the integrity of SI prefix values, then we can just do away with all industry-specific conversion factors and use SI prefixes consistently everywhere. This really is for the best.
The official standards bodies I looked at don’t recognize a 1024-byte kilobyte; are there any that officially do? Most users expect “kilo” to mean 1000, even with bytes. There aren’t that many applications where users need to know the binary values since it’s mostly abstracted away. For things like DIMMs, where we still use them, I think using binary prefixes such as “16GiB” is both appropriate and reasonable.
1 kilo was defined as 1000 in the 18th century, long before computers were even conceived.
Then, a bunch of programmers took the utterly moronic decision of using it with a different meaning.
Now that we have standard binary prefixes, there is absolutely no excuse to perpetrate this confusion: if you mean binary multiples, use binary prefixes.
1 kilo was defined as 1000 in the 18th century for SI units only, and the byte is not an SI unit. The fact that the same companies that gaslit everyone about this will sell main memory using the base-2 definition (because base-10 doesn’t offer a financial advantage in this case) tells you everything you need to know. For example, Samsung uses base-10 when selling SSDs but base-2 when selling main memory. Or the fact that Sony will sell you devices advertising capacity in base-10 but their own fully home-grown firmware counts using base-2 in some devices. If Samsung and Sony won’t believe their own BS that the byte is allegedly an SI unit (it’s not), why should I?
This isn’t the first time the metric system has caused confusion; the “tonne” being pronounced the same as “ton” comes to mind. The Fahrenheit scale being redefined to be precisely 9/5ths of Celsius causes confusion with old measurements. And more.
The only solution to the KB/MB/TB confusion is permanently deprecating those terms and forever treating them as undefined. Then come up with a different prefix for base-2, and another for base-10. Both sets of terms must be obviously different from KB/MB/TB and from each other, and both sets must NOT sound idiotic in any common language, unlike the meba biba bytes nonsense.
Trying to redefine commonly used terms was incredibly foolish and short-sighted. Drop the terms, and put a little effort into proper new ones. Until someone does that, the situation will never be resolved.
It is at least not as crazy as the inch or the ounce, where most countries had their own measurement of those, and some countries had several using the same name. In the 1700s there were at least 40 different inches in use, and some of them live on to this day. And let’s not talk about the foot: https://en.wikipedia.org/wiki/Foot_(unit) where Wikipedia lists over 70 different foot measurements used around the world, and there are plenty more that are not on the list.
This isn’t the first time the metric system has caused confusion, the “tonne” being pronounced the same as “ton” comes to mind.
As an SI-units native, I might be missing something of significance here, hence my desire to dig a bit. I’m under the impression that the “ton” was already a mess (with the short (US) and long (UK) variants) to begin with. Furthermore, the fact that it is pronounced the same as “tonne” simply stems from the fact it IS the same word originally, since both come from the old French “tonne” (which meant “cask”, now “tonneau” in modern French to distinguish it from the metric tonne).
Although in this case the metric system has barely contributed to the confusion, I’m under the impression that this kind of evolution has actually always existed, with existing words being reused for new, similar (but not equal) units throughout the history of agricultural, industrial and commercial needs. It’s not like the foot, the yard, the mile and many other units were clearly and universally defined before the French Revolution…