One of the innovations that the V7 Bourne shell introduced was built-in shell wildcard globbing, which is to say expanding things like *, ?, and so on. Of course Unix had shell wildcards well before V7, but in V6 and earlier, the shell didn’t implement globbing itself; instead this was delegated to an external program, /etc/glob (this affects things like looking into the history of Unix shell wildcards, because you have to know to look at the glob source, not the shell).
↫ Chris Siebenmann
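To make the mechanism concrete, here is a rough sketch of the idea (the file names are invented for the example, not the exact V6 code paths): when the old shell saw an unquoted wildcard, it didn’t expand it itself; it ran /etc/glob with the command name and the raw pattern, and glob did the matching and then executed the command.

ls *.c              ← what the user types
/etc/glob ls *.c    ← what the V6 shell actually executes
ls main.c util.c    ← what glob ends up executing after expansion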
I never knew expanding wildcards in UNIX shells was once done by a separate program, but if you stop and think about the original UNIX philosophy, it kind of makes sense. On a slightly related note, I’m currently very deep into setting up, playing with, and actively using HP-UX 11i v1 on the HP c8000 I was able to buy thanks to countless donations from you all, OSNews readers, and one of the things I want to get working is email in dtmail, the CDE email program. However, dtmail is old, and wants you to do email the UNIX way: instead of dtmail retrieving and sending email itself, it expects other programs to do those tasks for you.
In other words, to set up and use dtmail (instead of relying on a 2010 port of Thunderbird), I’ll have to learn how to set up things like sendmail, fetchmail, or alternatives to those tools. Those programs will in turn dump the emails in the maildir format for dtmail to work with. Configuring these tools could very well be above my paygrade, but I’ll do my best to try and get it working – I think it’s more authentic to use something like dtmail than a random Thunderbird port.
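As a sketch of what the fetching side might look like, a minimal ~/.fetchmailrc along these lines pulls mail over IMAP and hands each message to a local delivery agent (the server, account, and procmail path here are placeholders, not my actual setup):

poll imap.example.com protocol IMAP
    user "remote-user" password "secret"
    ssl
    mda "/usr/bin/procmail -d %T"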
In any event, this, too, feels very UNIX-y, much like delegating wildcard expansion to a separate program. What this also shows is that the “UNIX philosophy” was subject to erosion from the very beginning, and really isn’t a modern phenomenon like many people seem to imply. I doubt many of the people complaining about the demise of the UNIX philosophy today even knew wildcard expansion used to be done by a separate program.
Mandatory Prof. Kernighan on the whole one-program-one-action thing and very cool pipelining demos:
https://www.youtube.com/watch?v=tc4ROCJYbm0
I strongly recommend his UNIX: A History and a Memoir. Tons of tales about how specific UNIX utilities came to be.
The UNIX philosophy is alive for the ones who want it. Tons of utilities remain alive and maintained in Linux and the BSDs.
Case in point – I write some quite complex automations for Azure in Bash and I prefer to use curl or wget and the REST APIs vs. Azure CLI.
Azure CLI is so heavy and so slow and eats so much RAM that using plain REST actually saves me from requiring more RAM in the machines that run the automations.
I can’t find the article now but, a year or so ago, I read a blog post whose author claimed to have parsed a huge (few TB) dataset an order of magnitude faster using basic awk and sort than natively via SQL.
Sometimes I prefer to use the basic tools because I know they will be there. Not everyone using my scripts can install applications, so I know I can count on awk being installed, but I can’t always count on jq. Usually, there will be curl or wget, so using REST makes my automations compatible out-of-the-box with BSD, rather than counting on the user installing Azure CLI, etc.
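As a rough sketch of that pattern (the tenant, client, and subscription variables are placeholders, and the token is pulled out with awk precisely so jq isn’t needed):

# Get an ARM token with a service principal (client credentials flow).
TOKEN=$(curl -s "https://login.microsoftonline.com/$TENANT_ID/oauth2/token" \
    --data-urlencode "grant_type=client_credentials" \
    --data-urlencode "client_id=$CLIENT_ID" \
    --data-urlencode "client_secret=$CLIENT_SECRET" \
    --data-urlencode "resource=https://management.azure.com/" \
    | awk -F'"' '{ for (i = 1; i <= NF; i++) if ($i == "access_token") print $(i+2) }')

# List resource groups in the subscription – the same thing az group list does.
curl -s -H "Authorization: Bearer $TOKEN" \
    "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/resourcegroups?api-version=2021-04-01"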
Thanks to the magic of the open source world and people being as different as they are, we can live the raw, pure UNIX way if we want, or count on modern combined tools if we wish. For example, I have full access to my email on my self-hosted server from my HP 712, point-to-point, in my “vintage VLAN”. Different rules apply when connecting from the Internet. All thanks to the power of mixing and matching UNIX tools. =)
Or… https://www.terminal.shop
There’s really something for everyone these days!
Shiunbird,
We take it all for granted today, but in those years it was revolutionary.
Ahh, I absolutely hate shell programming, haha. I would go straight to any real programming language for all but the most trivial of commands. I don’t mind shells executing commands as they were designed to do in the beginning, but IMHO cramming programming capabilities into them has created some of the most ham-fisted & awkward programming languages we have. Even simple algorithms and structures have to get translated into obtuse shell form. Just because we can doesn’t make them a good choice. But to each their own 🙂
SQL databases can be optimized. However I do find that getting data into a relational database in the first place can take time and create bottlenecks. When that’s the case external tools that compute results directly on source data can eliminate tons of data transfer/conversion overhead. If you don’t need a database or the power of SQL, then go ahead and use simpler tools instead 🙂
We actually went from a C# application doing tons of string concatenations to call Linux commands to Bash scripts, which was DEFINITELY the correct move. =)
I use Bash a lot to automate system administration tasks, and this is probably the ideal use case for it. Also, I do a lot of one liners to go through text/parse output of commands, as second nature already – tons and tons of pipes. But it’s nothing that I’d recommend to someone else. I just grew into it naturally.
It seems that we are in agreement – there is always a best tool for the job. And “best” depends on the operator, too. Sometimes it’s better to use the solution you know best than to do a shitty job with the “theoretically ideal” solution.
We have choice these days, and computers are fast enough to run most things just fine.
Shiunbird,
Everyone’s entitled to an opinion, I say to each their own 🙂
But I can assure you that you will not convince me that bash is good for programming of anything significant. If you are only using it to call other command line tools that’s one thing, but the native programming facilities are awful and I do think the majority of programmers would feel the same way.
That’s true, computers can be fast enough to make inefficient approaches viable. However, my main criticism of bash scripting isn’t the spawning of tons of children and opening up lots of pipes, but rather how painful it is to implement advanced programming algorithms and structures natively in bash on its own merits. As a professional programmer, why would I bother using tools that are both beneath my skill set and harder to use?
I agree with you about how cool unix pipes are though.
Very true – it is painful to implement advanced algorithms.
If you need a lot of that, it is truly not the tool for the job. We are actually going to take some of our flows out of Bash in the near future.
About efficiency and “best tool for the job”, we also had an interesting situation some time ago.
We needed, at some point, a quick and dirty way to store outgoing emails coming out of our application, all running locally, to troubleshoot a bug.
I am not a good programmer by any stretch of the imagination. My background is sysadmin (thus, Bash). I put together 238 lines of C in one hour, pretending to be an SMTP server, doing a very basic handshake, dumping the message into a file and waiting for the next one. Tested, checked for memory leaks, done. The binary is 18K and it consumes 868K of memory when it runs.
We discussed this exercise with 2 other colleagues – one would use Python and the other .NET/C#. The Python colleague completed the exercise in 15 minutes and the C# colleague took 30 minutes. The Python file is smaller but takes a few seconds to load. I don’t recall the memory usage for them, but in both cases they were loading multiple email handling libraries and nothing would consume less than 100M of memory. When we went to do a mass test in Kubernetes with multiple instances, the lightweight C program helped us use fewer nodes than we would have otherwise and saved money.
Anyway – it is like that. I love having multiple options to do things and, as time goes on, we specialize and, agreed, we pick our tools based on our skills and preferences. All roads lead to Rome.
Shiunbird,
Garbage collected languages typically use more memory, although it really shouldn’t be that bad. If I were to guess, it probably wasn’t a language difference so much as a library or implementation difference. Maybe the library does a lot more than what you needed. Even if you had written your solution in a language other than C, you could likely have gotten the resource consumption down.
I’m a fan of efficient programming and I also like to develop with resource optimization in mind, but I find this to be relatively uncommon. The prevailing attitude is that we shouldn’t waste developer time on optimization when hardware is cheap. The main reason this bothers me is that the costs of inefficient software get multiplied by time and deployed instances. Spending a bit more time optimizing can make a huge difference on a much bigger scale than the original developer…the problem is that from a project management standpoint short term thinking almost always prevails.
There are lots of ways to do things on almost every level, not only in terms of language and libraries, but also in terms of models. I like designing software around event-oriented asynchronous programming models, which can handle thousands of event handlers with low overhead, but the problem is that many languages and even operating systems make async implementations awkward. I like C#’s native async support. Most languages don’t support this, though, and their libraries favor multithreading instead. Threads are quite expensive, however: every thread needs a stack, which imposes cache/memory/sync overhead, not to mention the notoriety of race conditions.
I might be accused of over-optimizing stuff sometimes, which is fair. But I see so much code being used in production capacities with millions of instances (such as wordpress websites) that ends up performing significantly worse and I can’t help but feel that popular platforms should be better optimized. Oh well.
I prefer it when the program being launched performs the expansion, and only when it’s appropriate, rather than the shell presuming that * inside of command line parameters should match files. IMHO DOS did this better, since commands could explicitly decide if that was the appropriate thing to do, which seems superior to me.
Consider something like “echo test*”.
Today on a unix system what this does is ambiguous and it depends on the contents of the current working directory.
If there are no files or directories matching test*, then the command is executed as is, printing “test*”. But if there is a file or directory matching the pattern, then you’ll get output like “test test.c test.html”, despite the fact that echo has nothing to do with files. This is just an illustrative example, but consider other applications where * makes sense yet the tool is meant to operate beyond the domain of the local file system: maybe the tool performs math, matches text in a file, runs SQL commands, or takes a filespec on a remote system…
Take something like “rsync othermachine:*.log .” as an illustration. Most of the time this will execute correctly because “othermachine” does not exist as a local file, but if, coincidentally or maliciously, a matching local file does exist, then it will be expanded and the wrong rsync command gets executed. This expansion can actually make things unnecessarily dangerous in some cases of user error:
Say a user types “cp *.txt” and forgets the destination directory, and two files happen to match. Since the shell performs the expansion without cp being able to see that a destination file operand is missing, cp doesn’t know any better than to accept the expansion as a source and destination, which is wrong.
Of course I understand that the * can/should be escaped so the shell won’t replace it, but nevertheless I find such inconsistencies to be extremely annoying, and this design makes for bad outcomes. I find it would be much better to let the tool decide if/when local file expansion is appropriate… alas, the convention is already set in stone and all our command line tools assume the shell is responsible for expanding wildcards. It’s way too late to fix this.
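To make the difference concrete, here is a toy demonstration (assuming a directory that happens to contain test, test.c and test.html, as in the earlier example):

$ echo test*
test test.c test.html
$ echo 'test*'
test*

The quoted form behaves consistently, but only because the user remembered to quote it – the tool itself never gets a say.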
The discussion about doing “email the UNIX way” goes a bit deeper. Some of the “UNIX way” here is multi-user.
Having all this power on a desktop (never mind a phone) is not how it always was. UNIX was not designed as a single-user desktop. Mail on UNIX was designed with the expectation that relatively simple computers with little storage would be working as terminals to much more powerful computers with “lots” of storage. The mail system in UNIX ran on a central computer where everybody would have their emails stored. A Mail Transfer Agent (MTA), like Sendmail, would fetch, send, and organize mail by user account on the central server as local files in a standard layout and format. The email client that each user would use on their terminal was just a simple front-end to this system.
On a desktop system where even “root” and “user” are the same person, none of this local complexity makes much sense and is in fact a bit contrary to the UNIX philosophy. Mail clients that can directly interact with remote servers are a better reflection of the UNIX philosophy on a modern desktop.
Mail/mailx (POSIX mail) can be configured to interact with remote SMTP servers directly without needing a local MTA like Sendmail. Older versions of mailx may lack this option though.
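For example, with a mailx built with SMTP support (Heirloom mailx or s-nail; plain POSIX mailx does not define these variables), a ~/.mailrc along these lines is enough to hand outgoing mail straight to a remote server – the host and account are placeholders:

set from="user@example.com"
set smtp=mail.example.com:587
set smtp-use-starttls
set smtp-auth=login
set smtp-auth-user="user@example.com"
set smtp-auth-password="secret"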
I do not know much about dtmail. The man page talks about mailx but it is not clear if it uses mailx under the hood or just uses the same commands and directory structure. I suspect the latter. In that case, or if mailx lacks SMTP options, dtmail will require a local MTA.
My favourite email client (or Mail User Agent – MUA) from back in the day was ELM which was actually created at HP (quite likely on HP-UX though I used it on Sun workstations and early Linux).