Modern memory management isn’t as simple as knowing that you have 150MB of programs to run and 256MB of memory to do it in. Modern Unix-like operating systems have their own characteristics for allocating and using memory. Howard Feldman explains how this works and shows how to analyze and reduce the memory consumption of your programs, no matter what language you use.
From the article:
“You must be most careful with C, where you control all memory allocation and freeing. Languages such as C++, Java, Perl, and PHP take care of a lot of the housekeeping automatically.”
Like C, C++ does not take care of any housekeeping automatically.
“The majority of programming languages ultimately end up using a single system call to allocate memory, malloc. malloc is part of the main C library”
malloc is not a system call…
“You must be most careful with C, where you control all memory allocation and freeing. Languages such as C++, Java, Perl, and PHP take care of a lot of the housekeeping automatically.”
Like C, C++ does not take care of any housekeeping automatically.
Oh, but it does. It's called a destructor, and it's called automatically (also note the article says "a lot of", not "all of").
There are other mistakes in the article, like claiming top is a useful utility (it is, but only if you really understand it, which pretty much everybody, including the author, fails to do – SIZE is not total RAM usage, it's total virtual memory usage), or claiming that freed memory is not returned to the "heap" (probably meaning the OS, and it sometimes is). The article is not that bad in general though; this topic is far from trivial.
Lubos Lunak
Where’s the analysis? I only see a lot of nothing and some babbling
A C++ destructor doesn't free memory (unless a programmer writes memory-freeing code); it's a method that's called when memory is freed.
The fact that a programmer has to write memory-freeing code in the destructor for it to be of any use whatsoever means that C++ does not take care of a lot of the housekeeping automatically.
The author doesn’t know much about how memory allocation and memory management work these days.
His claims about fragmentation are only true for systems without an MMU and virtual addresses.
He should look up external vs. internal fragmentation.
The author of the post quoted in bold below either doesn't know that much about memory management themselves, or perhaps knows exactly as much as the author of the article, because the statement "His claims about fragmentation are only true for systems without an MMU and virtual addresses." shows a limited understanding of what the author was saying, along with a limited understanding of what an MMU does for you, and where it does it for you.
And now, a bit of education to erase the ignorance shown in the bold-quoted post below:
An MMU is a piece of hardware that helps manage translations between virtual and physical addresses, working cooperatively with the OS, which decides the gory details. The virtual addresses are what a user-space process sees; the physical address is looked up via some sort of lookup table in RAM that is maintained by a combination of hardware and the OS, and, depending on the hardware/OS combination, may track things such as how recently a range of memory was accessed, and whether or not the range of memory (virtual address) is actually in physical RAM (physical address) when a process attempts to access it.
The same physical area of RAM (physical address) that is used by more than one process may occupy more than a single virtual address range (the address a user-level process sees), and, indeed, it is possible to have the same physical area of RAM occupy more than one virtual address range in the same process. Shared libraries are often loaded exactly once into physical memory and shared amongst many different processes in their virtual address spaces, which has the benefits of saving RAM, increasing the chance that the needed code will be in the CPU and filesystem/VM caches, etc., and results in much less swapping. A shared library may reside at the same logical (user-process virtual memory) address in each process, or the address may differ; that's one of those details that varies, as described above. Often there will be a preferred virtual address at which a shared library is loaded in a user process; that range may not be available, due to fragmentation within the process's virtual address space, and the OS will then need to load the library at some other virtual address, all without moving it in the physical address space, which is the space that only concerns kernel-level code.
There are certain ranges of physical address space in all machines that are not available for normal RAM use, because they are used for special purposes such as system firmware, memory-mapped I/O, special buffers used for DMA/bus-mastering devices, or simply have no physical RAM present, and thus a user process doesn't usually have these areas mapped into its virtual address space. Depending on the machine architecture, these special-use memory pages may leave the address space fragmented at the physical level, which is the case the kernel cares about. If the kernel needs a portion of RAM larger than is available between these special pages for kernel-space data, it needs to either break up its data, or (if it can) move data currently occupying physical RAM into a more convenient place and update the MMU data to reflect that for whatever user-level processes use the data in their virtual address spaces; this is completely transparent to the user-level processes. It is important to note that fragmentation is still an issue for the kernel to deal with.
But wait, there's more! Even though the kernel can move data around in physical RAM for both itself and all user-level processes, and this is transparent to the user-level processes, it still doesn't solve the problem of fragmentation within each user-level process. Sure, each process has as much virtual address space as the system allows it, which may be the entire virtual address space the processor can express (not that common, though), and processes don't have to fight each other for address space, since that is all arbitrated by the OS's policy. So we've established that user-level processes don't interfere with each other in terms of fragmenting each other's virtual address space, which is all that really matters to a user-level process. Each process has its own sandbox to play in; it can ask for a larger sandbox, and it can ask to shrink the sandbox after it's done with part of it and keep playing, as long as it doesn't choose to terminate. However, this does absolutely nothing for avoiding fragmentation by itself.

The smallest amount of memory a process can request from the OS and have handed to it is an exact multiple of the MMU page size (most commonly 4K bytes for the x86 at 32 bits and below; this varies from processor to processor, and also by OS), and the OS allocation granularity may even be larger than a single VM page: Windows NT doesn't hand a user-level process anything smaller than increments of 64K bytes, for example, and observation suggests that BeOS 5.03 is the same in that regard. Each of these OS-level allocations may be further subdivided and managed by language-specific memory allocation libraries, such as malloc/free and new/delete for C/C++. A language-specific library that abstracts the OS calls from the user-level process may never return pages the process has released via free/delete back to the OS, keeping them within its own heap management for further use for as long as that process exists.

If those pages are returned to the system, the OS can reassign them to the same process at the same virtual address, or to any other process at any other virtual address, or both. There will then be a hole in the virtual address space of the process that returned that page or group of pages to the OS. When that process later needs to allocate memory, it may ask for something larger than any unused hole in its virtual address space, and also larger than anything available in the memory already managed by the allocation library within its address space, and the allocation will fail.
It really is a pity that an MMU doesn't track pointers within the data structures of a user-level process and adjust them when there's no way to get memory other than rearranging everything allocated within the process to make room for it; if an MMU did do that, then fragmentation truly wouldn't ever be a problem, and life would be so much easier. Here is an experiment, an example application, to demonstrate that an MMU does not solve the fragmentation issue. It does these things:
1. Create a list of randomly sized allocated memory blocks, varying in size from the minimum allocation the OS hands out up to some integer multiple of that many pages, and record all their addresses, keeping track of the largest size allocated. The application needs to keep allocating memory until it cannot allocate any more.
2. Go through the list of memory blocks allocated by this, and free every other allocated memory block.
3. This is the proof that an MMU does not solve memory fragmentation: attempt to allocate one or more blocks of memory that are larger than any single allocation made above. This will fail, and it does not matter what programming language, computer hardware, or operating system is used, if it uses an MMU that works in a similar manner to an x86 processor (i.e. doesn't do the magic of fixing up addresses within the virtual address space when things get tight). A minimal sketch of this experiment follows below.
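Here is a minimal C++ sketch of that experiment. The page size, the block-size multiplier, and the 256MB safety cap are my own arbitrary assumptions; whether step 3 actually fails in practice depends on the allocator, the overcommit policy, and the width of the virtual address space:

#include <cstdio>
#include <cstdlib>
#include <vector>

int main()
{
    const std::size_t page   = 4096;                  // assumed allocation granularity
    const std::size_t maxMul = 8;                     // blocks of 1..8 pages
    const std::size_t budget = 256UL * 1024 * 1024;   // cap so the demo stops safely

    std::vector<void *> blocks;
    std::size_t total = 0, biggest = 0;

    // 1. allocate randomly sized blocks and record them
    while (total < budget) {
        std::size_t sz = page * (1 + std::rand() % maxMul);
        void *p = std::malloc(sz);
        if (!p)
            break;                                    // out of memory / address space
        if (sz > biggest)
            biggest = sz;
        blocks.push_back(p);
        total += sz;
    }

    // 2. free every other block, leaving holes between the survivors
    for (std::size_t i = 0; i < blocks.size(); i += 2) {
        std::free(blocks[i]);
        blocks[i] = 0;
    }

    // 3. ask for something larger than any single block allocated above
    void *big = std::malloc(biggest * 4);
    std::printf("allocated %lu blocks; the big request %s\n",
                (unsigned long)blocks.size(), big ? "succeeded" : "failed");
    return 0;
}

On a 64-bit system, or with an allocator that services large requests with fresh mappings, the final request may well succeed; the point is only that nothing in the MMU itself rearranges what the process has already allocated.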
Somewhere, somehow, a process either runs into a wall when it comes to fragmentation and fails to allocate memory, or it hits the wall and has to rearrange data within its own address space until it can make a large enough area of free memory for the allocation to succeed, and having an MMU does nothing to make this any easier for anyone. Using handles to refer to ranges of memory that you lock down when you need to use them makes this less fragile than using a virtual address space alone, but is rather tedious, to say the least. This is a method that was used in 16-bit Windows (before Windows NT and 95), among other systems such as PalmOS, and is still likely to be used in more limited memory environments. In a way, the Java VM and the Microsoft .NET runtime automate all of that, taking care of memory management for the developer and leaving them to code the application in a less involved way.

This becomes much more important for long-running applications that have dynamic loads, where static buffers cannot be allocated and where the allocation sizes vary over time. Ever wonder why your favorite web browser's memory usage climbs over a period of time? Browsers are a great example of memory fragmentation, as well as disk cache fragmentation, because they work with small bits of text mixed with large and randomly sized graphics files. The best thing you can do for overall system performance when running a web browser is therefore to keep its cache files on an isolated partition. If your email application uses separate files, the same thing applies. That way, you can maintain a less-fragmented filesystem for everything else with little effort on your part.
Jonathan Thompson
The author doesn’t know much about how memory allocation and memory management work these days.
His claims about fragmentation are only true for systems without an MMU and virtual addresses.
He should look up external vs. internal fragmentation.
His claims about fragmentation are only true for systems without an MMU and virtual addresses.
My above statement was a bit harsh, I know. The fragmentation description in the article was a bit too simple. My point is that most applications have no problem with external fragmentation when running in a 32-bit virtual address space, and even less so in a 64-bit space. The example that was given certainly wouldn't be a problem.
The article should be replaced by the two-parter by Jonathan Thompson on this message board. That was a great and usable explanation!
“The fact that a programmer has to write memory-freeing code in the destructor for it to be of any use whatsoever means that C++ does not take care of a lot of the housekeeping automatically.”
Do you use C++?
If you write good C++ code it will clean itself up automatically. No *custom* destructors required.
The problem is most people treat C++ like C (which you can do) and do not understand the power that comes with C++ memory management.
Have a look at Auto pointers and Smart pointers. You will never have to free memory manually again.
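For instance, here is a minimal C++98-era sketch (Widget and useWidget are made-up names for illustration) showing std::auto_ptr releasing an object automatically, with no explicit delete anywhere:

#include <iostream>
#include <memory>

struct Widget {
    Widget()  { std::cout << "allocated\n"; }
    ~Widget() { std::cout << "freed automatically\n"; }
};

void useWidget()
{
    std::auto_ptr<Widget> w(new Widget());
    // ... use w-> here ...
}   // w's destructor runs here and deletes the Widget; no explicit delete

int main()
{
    useWidget();
    return 0;
}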
Have a look at Auto pointers and Smart pointers. You will never have to free memory manually again.
But you have to choose to use them, and then use them correctly. You could just as easily argue that in C, you should have a look at free() to solve all your memory problems.
I learned C++ about seven years ago, when smart pointers weren't well known; these days I mainly use Java and Delphi, plus one scripting language (currently hovering between PHP and Perl, though I'd love to look at Ruby if I get the chance).
I’ve avoided C++ for the most part, I think it’s an awful jumble of a language (you know you’ve got problems when you have four casting methods to choose from). As a result I’ve never used smart pointers. I will say, though, that a C++ programmer still needs to be acutely aware of how memory works to ensure their application doesn’t leak anything: using auto_ptr<> with lists of objects shared across the application, for example, can result in nasty surprises, e.g.
void crazyFunc()
{
    std::auto_ptr<MyClass> mc(new MyClass());
    MyList *list = myOtherClass->MyList;
    list->add(mc.get());
    // mc goes out of scope here and deletes the object,
    // leaving a dangling pointer behind in the shared list
}

void crashAndBurnBaby()
{
    MyList *list = myOtherClass->MyList;
    cout << dynamic_cast<MyClass*>(list->get(0))->attr;   // dereferences freed memory
}
The obvious solution is to have a good hard look at the smart pointers available to you (e.g. http://www.boost.org/libs/smart_ptr/smart_ptr.htm), or else use normal pointers or references in certain areas, but at that rather complex point, aren’t you – the developer – working quite hard to avoid leaking memory?
You should of course *not* use std::auto_ptr<> for lists. That’s what the STL containers like std::vector or std::list are for.
IMO you shouldn’t judge a programming language when you haven’t really used it lately. It’s like judging Java based on experiences with Java 1.0.
Is it just me or is there a conflict between the linked web site and Firefox?
On my Ubuntu Breezy system, after a minute or so, my browser's shared memory is at 125 MB and its CPU use at 96%, until it finally crashes.
I am using the latest 1.5 RC1. I have also tried the earlier 1.7 (not the Ubuntu version).
Don’t see the irony?
In good C++ code there are nearly no direct memory allocations or deallocations via new/delete.
Most is automatically done by container classes like std::vector or std::list.
The only situation where I have to think about memory management is when I implement a low-level utility class like Tree for example.
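A minimal sketch of what I mean (readLines and the file name are made up for illustration): the containers own all the storage, so everything is released automatically when the result goes out of scope:

#include <fstream>
#include <string>
#include <vector>

std::vector<std::string> readLines(const char *path)
{
    std::vector<std::string> lines;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line))
        lines.push_back(line);   // the vector grows as needed; no new[]/delete[]
    return lines;
}

int main()
{
    std::vector<std::string> lines = readLines("example.txt");  // hypothetical file
    return static_cast<int>(lines.size());
    // every string and the vector's buffer are released automatically here
}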
In good C++ code there are nearly no direct memory allocations or deallocations via new/delete.
Most is automatically done by container classes like std::vector or std::list.
That’s complete and utter bull feces!
What you say is all fine and good if you are writing an application that falls neatly within the STL.
However, find me a good C++ GUI library that prevents me from having to use new/delete? Qt, wxWidgets, MFC, and gtkmm all require it.
However, find me a good C++ GUI library that prevents me from having to use new/delete? Qt, wxWidgets, MFC, and gtkmm all require it.
Nope. I have a few thousand lines of Qt code where I didn’t use new or delete even once. There’s no reason to.
A C++ destructor doesn't free memory (unless a programmer writes memory-freeing code); it's a method that's called when memory is freed.
The first part is right. The second part, however, is not true in general. If you only free memory, no destructor will be called. It is just that you usually do not merely free memory, but rather delete an object.
The fact that a programmer has to write memory-freeing code in the destructor for it to be of any use whatsoever means that C++ does not take care of a lot of the housekeeping automatically.
Of course it depends on the programmer whether it is a lot or not. But the fact stands: you *can* save yourself a lot of manual memory management if you encapsulate it properly. Unless, of course, you define a few "free" or "delete" expressions in the whole code as already being a lot.
However, find me a good C++ GUI library that prevents me from having to use new/delete? Qt, wxWidgets, MFC, and gtkmm all require it.
I cannot say for the others, but Qt certainly frees you from having to use "delete" a lot. It has this nice object-tree capability, which usually fits very well with the task of building a GUI, and sometimes with GUI-unrelated tasks. If this was actually a *real* call for help, rather than a rhetorical one, then go and try Qt again.
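A minimal Qt 4-style sketch of that object tree (the widgets are made up for illustration): every child created with a parent is deleted by the parent's destructor, so there is no delete in user code:

#include <QApplication>
#include <QPushButton>
#include <QVBoxLayout>
#include <QWidget>

int main(int argc, char **argv)
{
    QApplication app(argc, argv);

    QWidget window;                                  // lives on the stack
    QVBoxLayout *layout = new QVBoxLayout(&window);  // parented to window
    layout->addWidget(new QPushButton("Hello"));     // re-parented into window
    layout->addWidget(new QPushButton("Quit"));

    window.show();

    // When main() returns, window's destructor deletes the layout and both
    // buttons through the QObject parent/child tree; no delete in user code.
    return app.exec();
}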
I hate these technical articles where I can’t troll because I can’t understand a flying f–k of what’s going on…
What a load.
Memory fragmentation isn't an issue if you have an MMU, and how many versions of UNIX run on a computer without one?
I can only assume most of the rest is aimed at someone fresh out of a computer studies course, because only a moron would load a 100MB file into memory and process it that way unless there was a *very* compelling reason. It's something only a new programmer would do, and I would expect any graduate to know better.
Man, you are so right. Somebody had better tell every programmer of every CAD package that they don't have to load a 100MB file. I mean, c'mon, when I'm working on a large solid model I like it when a hard drive hit slows me down.
Sorry, couldn’t resist! 🙂
—I can only assume most of the rest is aimed at someone fresh out of a computer studies course, because only a moron would load a 100MB file into memory and process it that way unless there was a *very* compelling reason. It's something only a new programmer would do, and I would expect any graduate to know better.
Two points:
o First, an MMU is not a panacea for memory fragmentation. Frequent memory allocations and deallocations of different sizes, occurring "randomly", can result in the memory required for the process growing much larger than necessary. Further, much of this memory may not be paged out (depending on page sizes and memory block sizes), which can have a significant impact on overall memory usage.
o Second, another technique for managing contiguous memory regions is to use the mmap() function (usually implemented as a system call). This can be used both for purely private space (mapping to /dev/zero with the MAP_PRIVATE option) or for large-scale file I/O (rather than reading a large file, for example, it may be mapped into memory and accessed directly). This technique may substantially reduce fragmentation in a process’s address space and (for read-only files) eliminate swap space allocation.
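A hedged sketch of the file-mapping case (the file name is hypothetical and error handling is minimal): the mapped pages are backed by the file itself rather than by swap, and nothing is copied into malloc'd buffers:

#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : "bigfile.dat";   // hypothetical input file
    int fd = open(path, O_RDONLY);
    if (fd < 0) { std::perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { std::perror("fstat"); close(fd); return 1; }
    if (st.st_size == 0)    { std::fprintf(stderr, "empty file\n"); close(fd); return 1; }

    // The file's pages appear directly in the address space; they are backed
    // by the file on disk, so no swap space is consumed for them.
    std::size_t len = static_cast<std::size_t>(st.st_size);
    char *data = static_cast<char *>(mmap(0, len, PROT_READ, MAP_PRIVATE, fd, 0));
    if (data == MAP_FAILED) { std::perror("mmap"); close(fd); return 1; }

    // ... process data[0 .. len-1] in place ...
    std::printf("mapped %lu bytes, first byte = %d\n", (unsigned long)len, data[0]);

    munmap(data, len);
    close(fd);
    return 0;
}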
> I can only assume most of the rest is aimed at someone fresh out of a computer studies course, because only a moron would load a 100MB file into memory and process it that way unless there was a *very* compelling reason. It's something only a new programmer would do, and I would expect any graduate to know better.
It would be very fun to play FPS games with messages like “PLEASE WAIT; LOADING SCENE…” displayed every 20 seconds.
Here's something I'd like to know from someone knowledgeable.
If I have a 100K process and fork it, I have two 100K processes. If I run 'ps', they'll both show identical sizes.
But, in those calculated sizes are the common areas of shared libraries, the shared text pages, etc. The only distinctions between two processes running the same executable are basically the data segments.
Let's say my contrived 100K program has 50K of code and 50K of data. When I fork it, even though adding up the sizes says 100K each, in terms of actual physical memory I'm only consuming 150K of physical RAM (50K for the shared code, and 100K for the two 50K data segments).
Another example: if you do a 'top' or 'ps' on, say, a slew of Oracle processes, you'll see each one takes up a large chunk of memory. But what you're not told is that much of that memory is a shared memory segment being used by all of them.
So, the REAL question: is there any way I can determine the real impact on my system of running any arbitrary executable? Something that summarizes which pages are shared or "public" vs. which are private (the private ones reflecting the real impact of a new process on the system).
It gets even more complicated than what you've described, depending on the OS and how the processes were started: was one forked from the other? If that's the case, they may employ Copy On Write (COW) for the data segment only at the time the forked process modifies data in what it believes is its sole data segment. Thus, measuring total memory used is a transient thing for two "identical" processes whose data was identical when they started, and the amount of physical memory used changes up to a certain point.
However, it'd be unwise for the OS to start the second process without committing as much swap space to the data segment as the original process already has, if the data segment is writeable to start with. Under Windows, this is referred to as "Commit Charge", which is the amount of actual RAM/swap space reserved for the process. This same sort of memory sharing can happen on systems where the UNIX fork isn't used, such as Windows, which has similar ways to share memory between processes. Thus, you could readily have completely different executables share COW memory segments where both have write access, and have the same transient memory use and reporting issue.
Jonathan Thompson
It gets even more complicated than what you’ve described, depending on the OS and how the processes were started: was one forked from the other? If that’s the case, they may employ Copy On Write (COW) for the data segment only at the time the forked process modifies data in what it believes is its sole data segment.
Yea, I hand-waved over that, basically because the new process has the POTENTIAL of changing and thus getting copies of all of the COW pages. Also, some processes can create memory blocks that are static and read-only, which the OS could well share across processes, even ones not forked from the same parent.
So, yea, there are all sorts of complexities and shenanigans that can go on, but if I could simply get a solid summary of private vs. shared pages, I'd be just giddy!
This news item is a little better than the one about filesystems a week or two ago. But it is still not what I thought it would be from the title.
I thought that this article would be about how modern operating systems handle memory management, including swapping out pages to virtual memory, sharing libraries and kernel code between processes, and using main memory as cache.
I think Jonathan Thompson’s posts were pretty good.
Now, some questions: don't all processes have virtual addresses starting at the same point, therefore removing the need to write relocatable code or have a loader that relocates code?
Do all processes have a virtual 4GB of address space (2GB actual) with certain portions set aside as stack, library, kernel, program, and data space?
If the previous answer is yes, how does memory fragmentation work in a process? When malloc gets a certain number of pages from the OS, older pages get swapped out. Even if malloc doesn't release the memory from the process but keeps it in its own free pool for that process, it would get swapped out if it is not accessed regularly. So even though tools such as top would show more memory being used, if the pages are swapped out and never used again, does this have any effect on performance? At least until the process runs out of virtual address space?
Are most OSes smart enough to use the same RAM for multiple instances of the same program? What about for libraries? When the OS swaps out portions of a process's code, does it use swap space or point to where the executable is on disk (or in cache)?
How do modern OSes use main memory as cache to speed up access to recently or frequently accessed data or code?
Anyway, an article describing some of these issues or one discussing MMU issues (which I think I understand mostly but a good article would teach me more) would have been much better.
Now, some questions: don't all processes have virtual addresses starting at the same point, therefore removing the need to write relocatable code or have a loader that relocates code?
Yes (I'm assuming a reasonably modern desktop system here).
Do all processes have a virtual 4GB of address space (2GB actual) with certain portions set aside as stack, library, kernel, program, and data space?
Size of the virtual space is system-specific, but there are portions set aside for different parts of the process. The most notable bits are the code-segment, which is the in-memory copy of the executable file and is read-only, and the data-segment, which is the memory the process uses to do its job. The data-segment contains the stack (on which normal variables and function parameters and results are stored) and the heap (accessed using malloc() and free()).
If the previous answer is yes, how does memory fragmentation work in a process? When malloc gets a certain number of pages from the OS, older pages get swapped out. Even if malloc doesn't release the memory from the process but keeps it in its own free pool for that process, it would get swapped out if it is not accessed regularly. So even though tools such as top would show more memory being used, if the pages are swapped out and never used again, does this have any effect on performance?
You can over-use memory in a virtual address space by calling malloc and free a lot, causing gaps. The OS can only do so much profiling on your app, so it will load unnecessary pages from time to time. Also note that these gaps can mean two 4K pages might have a lot of empty space between them, but both are still needed because they hold fragments of usable memory too.
Overuse of memory by a single task really affects multi-tasking, as all those pages have to be loaded back in again when the OS gives a time-slice to the next task. This isn't too much of an issue with most GUI apps, which just idle along, but can be an issue when the user switches tasks. As an example, start a couple of heavy apps on your system (Firefox with 20-odd tabs open, Eclipse, OpenOffice with a large spreadsheet open, etc.). While you're working on one you're fine, but when you switch to the next, there is a noticeable pause while the old pages are swapped to disk and the new ones are loaded.
Are most OSes smart enough to use the same RAM for multiple instances of the same program? What about for libraries?
Yes and yes, and then some. For example, people were talking about the fork() call above. This creates a duplicate copy of a process. It doesn't bother to copy the code-segment (the in-memory copy of the executable file); it only copies the data-segment. More advanced systems, like Linux, go one further: they don't create a copy of the data-segment, they mark all the old pages as shared, and only create copies when the duplicate process tries to write to one of the pages of "its" copy of the data-segment.
Systems have supported this for libraries for years. It's one of the reasons code-segments are read-only (the other being that self-modifying code makes kittens cry).
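Here's a small sketch of the behaviour the copy-on-write machinery has to preserve (it only shows the semantics, not the page accounting itself, which you'd have to read from the kernel's statistics): after fork(), a write in the child forces a private copy of the page, and the parent's data is untouched:

#include <cstdio>
#include <cstring>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main()
{
    static char buf[16] = "parent";      // data shared copy-on-write after fork()

    pid_t pid = fork();
    if (pid < 0) { std::perror("fork"); return 1; }

    if (pid == 0) {                      // child: the write forces a private copy
        std::strcpy(buf, "child");
        std::printf("child sees:  %s\n", buf);
        _exit(0);
    }

    waitpid(pid, 0, 0);                  // parent: its page was never copied or changed
    std::printf("parent sees: %s\n", buf);
    return 0;
}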
When the OS swaps out portions of a process's code, does it use swap space or point to where the executable is on disk (or in cache)?
I don't quite understand the question. I can say that how pages are written to disk, and how they're identified as being on disk, is very much system-dependent.
How do modern OSes use main memory as cache to speed up access to recently or frequently accessed data or code?
If you're talking about data loaded into memory, it's just a matter of keeping used pages in memory. Most OSs keep track of how often certain pages are used, and only the Least Recently Used (LRU) pages are written out to disk. As programs tend to execute the same part of the code all the time (think of all the loops you write), called the "window of execution", it's possible to write out quite a bit of memory without any ill effects. Most OSs don't bother trying to second-guess where the window is moving. As regards the CPU cache, that's handled by the processor; the OS has very little control over it.
If you’re talking about access to data on a disk there are two ways:
First, they share the code segments of processes, and of libraries, between processes. As a result, if you start a second copy of a program, the OS won't reload the executable from the file-system; it will simply create a new data-segment and then a new process structure using the existing in-memory code-segment and the new data-segment. The same applies to all the libraries used by that program.
Secondly, most operating systems have an in-memory buffer of frequently used files (which will include executable files). If you're lucky, when you try to start a program or load a file, it can be pulled from this in-memory buffer instead of directly from the disk. On systems that use swap partitions (like Linux and the BSDs) instead of a swap file (like Windows) you will get a speed benefit even if the buffer has been swapped out, as data can be loaded faster from a swap partition than by going through the file-system layer.
Thanks for the reply, very informative. A few clarifications:
When the OS swaps out portions of a process's code, does it use swap space or point to where the executable is on disk (or in cache)?
What I meant here by cache of course is file cache not processor cache. Maybe I can explain a little better.
The loader loads a process's code from disk and starts executing it. Later the OS notices that the code (or a few pages of it) hasn't been accessed in a while and can be swapped out to use the RAM for something else. Instead of copying the code to swap space, doesn't the OS just know which parts of the file itself are in RAM or not and point to the code that is already on the HDD? Therefore hopefully not wasting resources on multiple copies of read-only data. I think Windows does this but I'm not sure.
Also related to file caching: don't modern OSes use the majority of system RAM as file cache? But when code/data is loaded from disk, does the VM system point to where the code/data is in the file cache instead of copying it to another part of RAM?
Instead of copying the code to swap space, doesn’t the OS just know which parts of the file itself are in RAM or not and point to the code that is already on the HDD
I don’t actually know the answer to that one! I would suspect not, as it could make the code a bit awkward (especially if you use a swap partition), but kernel developers are an inventive bunch.
Don’t modern OSes use the majority of system RAM as file cache? But when code/data is loaded from disk, does the VM system point to where the code/data is in the file cache instead of copying it to another part of RAM?
On most systems, I suspect that it would go ahead and create the copy, as the file-cache (which does tend to fill up RAM) is a fairly dynamic entity and may change while the program is running. Again, it comes down to keeping the code nice and clean and easy to maintain, but, once again, I wouldn’t want to second guess a kernel developer.
If you’re interested in kernel internals, the kernel section in http://lwn.net (Linux Weekly News) is a good place to pick up information.
The article is pretty thin for its title (to be nice about it). You can't talk about modern memory management without saying a word about virtual memory or paging.
To get an idea of what modern memory management really means, I recommend reading or looking over a serious book about operating systems (like Tanenbaum's Modern Operating Systems) or memory management (like Understanding the Linux Virtual Memory Manager). I recommend this to the author of the article as well.
Just look at the contents of Chapter 4 of Tanenbaum’s book
http://www.prenhall.com/divisions/esm/app/author_tanenbaum/custom/m…
to see how many topics are involved in modern memory management.
–sadyc