option8
Novitiate III

At this point in the series, I would hope you need no further convincing of PowerShell’s utility, even to someone whose main job is maintaining and managing Macs. 

If you’re still on the fence, you can probably agree that PowerShell has advantages over other scripting languages in some areas. Maybe you’ll keep PowerShell (and the sample scripts we’ve built) in mind if you’re ever tasked with building reports or sifting through reams of structured data. But it’s not your “everyday carry”, in the parlance of pocket knife and multitool enthusiasts. One thing we haven’t explored yet is integrating PowerShell into your everyday workflows.

While it’s possible to set your default login shell to pwsh (chsh -s /usr/local/bin/pwsh) I wouldn’t necessarily go that far. Sure, if you’re a Windows expat who yearns for some familiar vestige of your former OS, go for it. I expect, though, if you’re reading this series, you’re not already a regular PowerShell user. Unless you’re only reading for the bad jokes and puns, in which case be sure to comment, like, and subscribe.

No, most Mac admins who have command line experience are pretty happy with bash, and (maybe reluctantly) getting accustomed to the newfangled zsh all the kids are raving about. Rather than discarding what you already know in favor of an unfamiliar language, consider adding “pwsh” to your existing vocabulary.

Papa’s got a brand new bag

As I’ve mentioned before, I believe there’s a certain utilitarian beauty in constructing one-line shell commands. Shuttling a block of data through a complex pipeline of edits and transforms to extract just the functionally relevant bits is a kind of art. Like Michelangelo, chipping away all of the marble that isn’t “David”. Properly crafted, a one-liner simultaneously saves space and unfortunately renders the operations therein utterly opaque to the inexperienced eye: a kind of gatekeeping for command line jockey wannabes.

Take, for instance, this monstrosity I found whilst spelunking the source for Installomator (https://github.com/Installomator/Installomator). This is part of a function for determining the latest version of the Zulu open source Java runtime tools:

curl -fs "https://cdn.azul.com/zulu/bin/" | grep -Eio '">zulu18.*ca-jdk18.*aarch64.dmg(.*)' | cut -c 3- | sed 's/<\/a>//' | sed -E 's/([0-9.]*)M//' | awk '{print $2 $1}' | sort | cut -c 11- | tail -1 

Sublime. Let’s break that down into steps, shall we?

Go ahead. Give it to me

First, we acquire the block of marble that contains our “David”.

curl -fs "https://cdn.azul.com/zulu/bin/"

I think we’re all familiar with how “curl” works, but since the “curl” documentation is nearly 6,000 lines long, it’s understandable if you don’t know off the top of your head what each switch and option does.

-f Tells “curl” to fail with no output at all when the server returns an HTTP error (a status code of 400 or above). If the page is gone or the server denies the request, it won’t bother passing the contents of the 404 error page on to the next command.

-s Silent mode prevents any progress indicator or other output, so the result is just the contents of that URL.

The content in question is a web page with a list of files to download, so we’re probably looking at the web interface to an FTP site.

 

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html>
<head>
<title>Index of /zulu/bin</title>
</head>
<body>
<h1>Index of /zulu/bin</h1>
<pre><img src="/icons/blank.gif" alt="Icon "> <a href="?C=N;O=D">Name</a>               <a href="?C=M;O=A">Last modified</a>      <a href="?C=S;O=A">Size</a>  <a href="?C=D;O=A">Description</a><hr><img src="/icons/back.gif" alt="[PARENTDIR]"> <a href="/zulu/">Parent Directory</a>                                                      -   
<img src="/icons/unknown.gif" alt="[   ]"> <a href="index.yml">index.yml</a>            2023-05-02 08:55  2.9K  
<img src="/icons/compressed.gif" alt="[   ]"> <a href="zre1.7.0_65-7.6.0.2-headless-x86lx32.zip">zre1.7.0_65-7.6.0.2-headless-x86lx32.zip</a>                 2014-11-17 03:04   33M  
<img src="/icons/compressed.gif" alt="[   ]"> <a href="zre1.7.0_65-7.6.0.7-headless-x86lx64.zip">zre1.7.0_65-7.6.0.7-headless-x86lx64.zip</a>                 2014-11-11 04:59   32M  
[…]

 

grep -Eio '">zulu18.*ca-jdk18.*aarch64.dmg(.*)'

That page content is piped to “grep”, which I only recently learned stands for “Global Regular Expression Print”. It prints only the lines of its input that match the given regular expression (see spoiler below). In this case, that means a string that begins with a literal double quote and closing angle bracket “">” then “zulu18” followed by some number of characters, then “ca-jdk18”, another set of characters of indeterminate length, then “aarch64.dmg”, and finally any number of characters to the end of that line.

Spoiler
I won’t go much deeper on the topic of regular expressions beyond attempting to translate the regex used in these examples. Anything more complicated than what you see here tends to make my brain itch and my eyes go crossed. If you haven’t already, go ahead and bookmark regex101.com – you’ll thank me later.

The following parameters affect how the search operates and what “grep” sends down the pipe:

-E Allows for “extended” regular expressions. 

-i Ignores case while matching, which also makes for slightly easier to read regex, thankfully.

-o Only prints the matching parts of the resulting lines, not the entire line.

So we get back this:

">zulu18.28.13-ca-jdk18.0.0-macosx_aarch64.dmg</a>             2022-03-22 16:09  186M  

">zulu18.30.11-ca-jdk18.0.1-macosx_aarch64.dmg</a>             2022-04-22 11:16  186M  

">zulu18.32.11-ca-jdk18.0.2-macosx_aarch64.dmg</a>             2022-07-19 18:10  186M  

">zulu18.32.13-ca-jdk18.0.2.1-macosx_aarch64.dmg</a>           2022-08-31 12:11  186M  

And that gets trimmed using “cut”.

cut -c 3-

“cut” is a simple string editing command, able to select pieces of input separated by specific delimiters or, in this instance, by character position. “-c” takes a range of numbers indicating character positions. “3-” keeps everything from the third character onward to the end of the line, discarding the first two. This deletes the quote and closing angle bracket from the beginning of each line.

zulu18.28.13-ca-jdk18.0.0-macosx_aarch64.dmg</a>             2022-03-22 16:09  186M  

zulu18.30.11-ca-jdk18.0.1-macosx_aarch64.dmg</a>             2022-04-22 11:16  186M  

zulu18.32.11-ca-jdk18.0.2-macosx_aarch64.dmg</a>             2022-07-19 18:10  186M  

zulu18.32.13-ca-jdk18.0.2.1-macosx_aarch64.dmg</a>           2022-08-31 12:11  186M  

Now the fun really starts.

sed 's/<\/a>//'

The “sed” stream editor takes commands in its own language and syntax, and here you can see a command passed into “sed” enclosed in single quotes. The command in question is “s”, which does a search and replace operation and takes the form: s/regular expression/replacement/flags

The simple search term in this case is “<\/a>” – the closing of an HTML anchor tag (<a> is for “anchor”). The forward slash in the term needs to be escaped by a backslash since the forward slash would otherwise be interpreted as part of the “s” command syntax.

The empty “replacement” slot between the next two slashes in the “s” command means that “</a>” is replaced by nothing (deleted).

zulu18.28.13-ca-jdk18.0.0-macosx_aarch64.dmg             2022-03-22 16:09  186M  

zulu18.30.11-ca-jdk18.0.1-macosx_aarch64.dmg             2022-04-22 11:16  186M  

zulu18.32.11-ca-jdk18.0.2-macosx_aarch64.dmg             2022-07-19 18:10  186M  

zulu18.32.13-ca-jdk18.0.2.1-macosx_aarch64.dmg           2022-08-31 12:11  186M  

sed -E 's/([0-9.]*)M//'

Another “sed” command, this time with the -E option, meaning any regular expressions are processed as “extended” regex instead of basic. In extended mode the parentheses are grouping operators rather than literal characters, so “([0-9.]*)M” matches any run of digits and periods, of any length, followed by the upper case letter M. (The group is never referenced afterward, so the parentheses could have been left out.)
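If you want to see what -E changes for yourself, here is a minimal sketch. The sample string is made up for illustration and is not part of the Installomator source; it simply contains one bare size and one size wrapped in parentheses.

```shell
# Hypothetical sample text: one bare size and one size in parentheses.
sample='186M (186)M'

# Extended regex: ( ) only groups, so the first run of digits and
# periods followed by M is removed, which is the bare "186M".
with_extended=$(printf '%s\n' "$sample" | sed -E 's/([0-9.]*)M//')

# Basic regex: ( ) are literal characters, so only "(186)M" matches.
with_basic=$(printf '%s\n' "$sample" | sed 's/([0-9.]*)M//')

# Brackets make the leftover leading or trailing space visible.
printf 'extended: [%s]\n' "$with_extended"
printf 'basic:    [%s]\n' "$with_basic"
```

Same pattern, two different deletions, depending only on whether -E is present.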

As before, the matched terms (e.g. “186M”) are deleted.

zulu18.28.13-ca-jdk18.0.0-macosx_aarch64.dmg             2022-03-22 16:09    

zulu18.30.11-ca-jdk18.0.1-macosx_aarch64.dmg             2022-04-22 11:16    

zulu18.32.11-ca-jdk18.0.2-macosx_aarch64.dmg             2022-07-19 18:10    

zulu18.32.13-ca-jdk18.0.2.1-macosx_aarch64.dmg           2022-08-31 12:11    

awk '{print $2 $1}'

Which brings us to “awk”, which I read inside my head (and sometimes aloud) as the sound of an angry crow. Like “sed” before, “awk” (“AWK! AWK! CAW!”) has its own command language and syntax. Again, the command is enclosed in single quotes, and “awk” operations are set off with braces. Like “sed”, “awk” operates on a line of text at a time, and in this case the command is the self-explanatory “print”. Without any other operators, “awk” will try to break up a line into words or “fields”, splitting on runs of whitespace – spaces and tabs by default. Each “field” is treated as a variable, and they are addressed by their numerical position in the text. So, the first word is “$1”, the second “$2” and so on. ($0 is the entire string.)
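A quick way to get a feel for fields is to try them on a throwaway line. This sketch uses a shortened, made-up file name in the same three-column shape as the listing; the variable names are only for illustration.

```shell
# Hypothetical three-field line shaped like one row of the listing.
line='zulu18.dmg   2022-03-22 16:09'

# Fields written side by side are concatenated with nothing between.
joined=$(printf '%s\n' "$line" | awk '{print $2 $1}')

# A comma inserts the output field separator, one space by default.
spaced=$(printf '%s\n' "$line" | awk '{print $2, $1}')

# NF holds the number of fields awk found on the line.
count=$(printf '%s\n' "$line" | awk '{print NF}')

printf '%s\n%s\n%s\n' "$joined" "$spaced" "$count"
```

The run of spaces between the first two columns counts as a single separator, which is why the padding in the directory listing never shows up in a field.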

All this “awk” command is doing is printing the second word of each line, then the first. The third word, the file’s modification time, is omitted. You’ll notice the output has also stripped out the whitespace.

2022-03-22zulu18.28.13-ca-jdk18.0.0-macosx_aarch64.dmg
2022-04-22zulu18.30.11-ca-jdk18.0.1-macosx_aarch64.dmg
2022-07-19zulu18.32.11-ca-jdk18.0.2-macosx_aarch64.dmg
2022-08-31zulu18.32.13-ca-jdk18.0.2.1-macosx_aarch64.dmg

sort

This one’s pretty straightforward. It sorts the lines of input alphabetically, or (from the “sort” documentation) “lexicographically, according to the current locale's collating rules”; that is, in ascending order, small numbers first, then A to Z. In this case, sorting on the release date or the version numbers gives the same result, and the lines remain in the same order as before.
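One caveat about lexicographic order, sketched here with made-up version strings: text order and version order can disagree once a number gains a digit. Putting an ISO-style date at the front of each line, as the “awk” step did, sidesteps that.

```shell
# Hypothetical version strings. LC_ALL=C pins the collation so the
# result does not depend on local locale settings.
by_name=$(printf '%s\n' 'zulu18.9' 'zulu18.10' | LC_ALL=C sort | tail -1)

# With a YYYY-MM-DD prefix the text order follows the release order,
# and the prefix can be cut away again afterward.
by_date=$(printf '%s\n' '2022-03-01zulu18.9' '2022-04-01zulu18.10' |
  LC_ALL=C sort | cut -c 11- | tail -1)

printf 'sorted by name: %s\n' "$by_name"
printf 'sorted by date: %s\n' "$by_date"
```

Sorted by name alone, “zulu18.9” lands last because the character “9” sorts after “1”. Sorted by date, the later release wins.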

cut -c 11-

Another cut, this time from the character at position 11 onward, which removes the ten characters of the YYYY-MM-DD formatted modification date.

zulu18.28.13-ca-jdk18.0.0-macosx_aarch64.dmg
zulu18.30.11-ca-jdk18.0.1-macosx_aarch64.dmg
zulu18.32.11-ca-jdk18.0.2-macosx_aarch64.dmg
zulu18.32.13-ca-jdk18.0.2.1-macosx_aarch64.dmg

tail -1 

Finally, “tail”, which returns only the last part of any input you give it, “-1” indicating it should only return one line, the last line.

zulu18.32.13-ca-jdk18.0.2.1-macosx_aarch64.dmg
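To experiment with the stages without hitting the network, the pipeline can be fed canned input. This is a sketch: the function names are made up, and the two sample rows are typed in the shape of the directory listing, deliberately out of order so that “sort” has something to do.

```shell
# Two rows in the shape of the directory listing, newest first.
sample_listing() {
  cat <<'EOF'
<img src="/icons/unknown.gif" alt="[   ]"> <a href="zulu18.32.13-ca-jdk18.0.2.1-macosx_aarch64.dmg">zulu18.32.13-ca-jdk18.0.2.1-macosx_aarch64.dmg</a>           2022-08-31 12:11  186M
<img src="/icons/unknown.gif" alt="[   ]"> <a href="zulu18.28.13-ca-jdk18.0.0-macosx_aarch64.dmg">zulu18.28.13-ca-jdk18.0.0-macosx_aarch64.dmg</a>             2022-03-22 16:09  186M
EOF
}

# The original pipeline, minus curl, written one stage per line.
latest_zulu() {
  grep -Eio '">zulu18.*ca-jdk18.*aarch64.dmg(.*)' |
    cut -c 3- |
    sed 's/<\/a>//' |
    sed -E 's/([0-9.]*)M//' |
    awk '{print $2 $1}' |
    sort |
    cut -c 11- |
    tail -1
}

sample_listing | latest_zulu
```

Commenting out stages from the bottom up and rerunning is an easy way to watch each intermediate result appear.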

A little brain salad surgery

You may be asking some of the same questions I did when I started digging into this particular example. 

  1. Could that have been done in fewer steps? Absolutely. 
  2. Will the command break if the HTML of the file listing page changes, even slightly? Spectacularly so.
  3. Isn’t this blog series about PowerShell, and not bash and regular expressions? You’re right again.

I could show you how to replicate the entire operation in a PowerShell script, starting with “Invoke-WebRequest” in place of “curl”. Instead, though, let’s look at an aspect of PowerShell I haven’t covered yet.

% curl -fs "https://cdn.azul.com/zulu/bin/" | pwsh -C '$Input'

That should spit out the entire contents of the web page to the console. While not terribly impressive, it does demonstrate two things:

  1. You can pipe data into “pwsh” like you would another tool like “awk”, without launching an interactive PowerShell session, and 
  2. The content that is piped into “pwsh” can be referenced by the special variable “$Input”.

When “pwsh” is called with the “-C” option, or “-Command”, what follows is passed to PowerShell to execute as if it were typed in the interactive “pwsh” shell. To prevent your shell from substituting variables or generally clobbering your work, it’s best to enclose the command in single quotes.
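The difference the quotes make can be seen without PowerShell in the picture at all. In this sketch the shell variable “Input” and its value are invented purely to show what the calling shell does to the command string before any other program sees it.

```shell
# Hypothetical shell variable that happens to share a name with
# PowerShell's $Input.
Input='oops'

# Double quotes: the calling shell substitutes its own $Input first.
double_quoted=$(printf '%s' "$Input | Select-String foo")

# Single quotes: the text is handed along untouched.
single_quoted=$(printf '%s' '$Input | Select-String foo')

printf 'double: %s\n' "$double_quoted"
printf 'single: %s\n' "$single_quoted"
```

Only the single-quoted version still contains the literal text “$Input” for PowerShell to interpret.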

Once in PowerShell’s capable hands, the HTML can be searched for strings with “Select-String”, which operates much like “grep”. If the file had been in XML format, or the output from an API that returned XML or JSON, it would be a simple matter to pass it to “Select-Xml” or “ConvertFrom-Json” and get what we’re after in just one or two steps. But where’s the fun in that? I’ve got column inches to fill, after all.

To find file names that fit the criteria in the original script, I’ve slightly modified the regex pattern from before.

% curl -fs "https://cdn.azul.com/zulu/bin/" | pwsh -C '$Input | Select-String -Pattern "zulu18.*ca-jdk18.*aarch64.dmg"'        

<img src="/icons/unknown.gif" alt="[   ]"> <a href="zulu18.28.13-ca-jdk18.0.0-macosx_aarch64.dmg">zulu18.28.13-ca-jdk18.0.0-macosx_aarch64.dmg</a>         
        2022-03-22 16:09  186M  

<img src="/icons/unknown.gif" alt="[   ]"> <a href="zulu18.30.11-ca-jdk18.0.1-macosx_aarch64.dmg">zulu18.30.11-ca-jdk18.0.1-macosx_aarch64.dmg</a>         
        2022-04-22 11:16  186M  

<img src="/icons/unknown.gif" alt="[   ]"> <a href="zulu18.32.11-ca-jdk18.0.2-macosx_aarch64.dmg">zulu18.32.11-ca-jdk18.0.2-macosx_aarch64.dmg</a>         
        2022-07-19 18:10  186M  

<img src="/icons/unknown.gif" alt="[   ]"> <a href="zulu18.32.13-ca-jdk18.0.2.1-macosx_aarch64.dmg">zulu18.32.13-ca-jdk18.0.2.1-macosx_aarch64.dmg</a>     
          2022-08-31 12:11  186M  

It looks like I’ve got the same four results back, only this time the command has returned the full matched line, and not just the matching text. Since the “Select-String” results are PowerShell objects, and not simply strings, I can examine the properties in the result to see if it contains the specific pieces I’m looking for. Piping the output into “Get-Member” (while still within the PowerShell portion of the pipeline) will display all the available properties and methods on a “MatchInfo” object.

% curl -fs "https://cdn.azul.com/zulu/bin/" | pwsh -C '$Input | Select-String -Pattern "zulu18.*ca-jdk18.*aarch64.dmg" | Get-Member'

 

TypeName: Microsoft.PowerShell.Commands.MatchInfo

    Name               MemberType Definition
    ----               ---------- ----------
    Equals             Method     bool Equals(System.Object obj)
    GetHashCode        Method     int GetHashCode()
    GetType            Method     type GetType()
    RelativePath       Method     string RelativePath(string directory)
    ToEmphasizedString Method     string ToEmphasizedString(string directory)
    ToString           Method     string ToString(), string ToString(string directory)
    Context            Property   Microsoft.PowerShell.Commands.MatchInfoContext Context {get;set;}
    Filename           Property   string Filename {get;}
    IgnoreCase         Property   bool IgnoreCase {get;set;}
    Line               Property   string Line {get;set;}
    LineNumber         Property   int LineNumber {get;set;}
    Matches            Property   System.Text.RegularExpressions.Match[] Matches {get;set;}
    Path               Property   string Path {get;set;}
    Pattern            Property   string Pattern {get;set;}

 

After a little digging through the documentation, it turns out the text strings within each line that match the given pattern are present in the “Matches” property. Wrapping the string-matching commands in parentheses, we can extract the values for each of the matched objects by specifying the property’s path. Again, all of this stays inside the single quotes denoting the “pwsh” command string.

% curl -fs "https://cdn.azul.com/zulu/bin/" | pwsh -C '($Input | Select-String -Pattern "zulu18.*ca-jdk18.*aarch64.dmg").Matches.value'
    
zulu18.28.13-ca-jdk18.0.0-macosx_aarch64.dmg">zulu18.28.13-ca-jdk18.0.0-macosx_aarch64.dmg
zulu18.30.11-ca-jdk18.0.1-macosx_aarch64.dmg">zulu18.30.11-ca-jdk18.0.1-macosx_aarch64.dmg
zulu18.32.11-ca-jdk18.0.2-macosx_aarch64.dmg">zulu18.32.11-ca-jdk18.0.2-macosx_aarch64.dmg
zulu18.32.13-ca-jdk18.0.2.1-macosx_aarch64.dmg">zulu18.32.13-ca-jdk18.0.2.1-macosx_aarch64.dmg

Oh, bother. The pattern matching looks to have gotten greedy, and kept going after finding “.dmg”, scooping up everything between it and another instance of “.dmg”. We may need to rethink that pattern in “Select-String”.

It turns out, if I include the double quotes around the file name in the HTML anchor tags I can get a single instance of the file per line. Since the double quotes need to be interpreted literally by “pwsh”, and they’re inside single quotes so that zsh doesn’t try to do substitutions, they can’t be escaped with backslashes like normal. Instead, each literal quote is written as two double quotes inside the double-quoted pattern string, which is how three quotes in a row end up at each end of the pattern. It looks odd, but I assure you, it’s perfectly legal.
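The effect of anchoring the pattern with literal quotes can be reproduced with “grep” alone. This is a minimal sketch using a made-up, shortened anchor tag rather than the real page:

```shell
# One hypothetical anchor tag with the file name in both places.
html='<a href="zulu18.dmg">zulu18.dmg</a>'

# Greedy: .* runs all the way to the last "dmg" on the line.
greedy=$(printf '%s\n' "$html" | grep -Eo 'zulu18.*dmg')

# With literal quotes at both ends, only the href value can match.
quoted=$(printf '%s\n' "$html" | grep -Eo '"zulu18.*dmg"')

printf 'greedy: %s\n' "$greedy"
printf 'quoted: %s\n' "$quoted"
```

The first pattern swallows both copies of the file name. The second can only end where “dmg” is followed by a quote, and that happens once, at the end of the href value.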

% curl -fs "https://cdn.azul.com/zulu/bin/" | pwsh -C '($Input | Select-String -Pattern """zulu18.*ca-jdk18.*aarch64.dmg""").Matches.value'

"zulu18.28.13-ca-jdk18.0.0-macosx_aarch64.dmg"
"zulu18.30.11-ca-jdk18.0.1-macosx_aarch64.dmg"
"zulu18.32.11-ca-jdk18.0.2-macosx_aarch64.dmg"
"zulu18.32.13-ca-jdk18.0.2.1-macosx_aarch64.dmg"

It’s possible the page we’re pulling file listings from will get updated, and newer versions will appear at the top. Or a patch to an old version will be released with a newer modification date, sending it to the bottom of the page. Just to be certain, we should sort the results so the highest available version is last. We can combine that operation with trimming the list so all we get back is the bottom result.

% curl -fs "https://cdn.azul.com/zulu/bin/" | pwsh -C '($Input | Select-String -Pattern """zulu18.*ca-jdk18.*aarch64.dmg""").Matches.value | Sort-Object -bottom 1'

"zulu18.32.13-ca-jdk18.0.2.1-macosx_aarch64.dmg"

And there we have it. Only, those pesky quotes are still hanging around. Well, that’s easily sorted. We piped the contents of the HTML into “pwsh” and we can pipe the filename back out again just as easily. 

The extraneous quotation marks can be trimmed with “sed”.

% curl -fs "https://cdn.azul.com/zulu/bin/" | pwsh -C '($Input | Select-String -Pattern """zulu18.*ca-jdk18.*aarch64.dmg""").Matches.value | Sort-Object -bottom 1' | sed 's/\"//g'

zulu18.32.13-ca-jdk18.0.2.1-macosx_aarch64.dmg

Or (since I know someone will write in) with PowerShell “string.split()”. But I’m stymied trying to express a literal double quote surrounded by escaped single quotes inside a command in single quotes. If you figure it out, please let me know.
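One possible way around that quoting puzzle, offered as a sketch rather than a definitive answer: avoid typing the quote character at all and refer to it by its character code, 34. The function name is made up, the sketch uses “Trim” rather than “Split”, and it assumes “pwsh” is on your PATH.

```shell
# Hypothetical helper: strips leading and trailing double quotes from
# each line on stdin. [char]34 is the double quote character, so no
# literal quote has to survive two layers of quoting.
strip_quotes() {
  pwsh -NoProfile -C '$Input | ForEach-Object { $_.Trim([char]34) }'
}

# Only run the demonstration when PowerShell is actually installed.
if command -v pwsh >/dev/null 2>&1; then
  printf '%s\n' '"zulu18.dmg"' | strip_quotes
fi
```

The same ForEach-Object stage could be appended after “Sort-Object -bottom 1” inside the existing command string in place of the trip back out to “sed”.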

The Good, the Bad, and the Ugly

In the end, my experiment only managed to shave 22 characters off the original command. Plus, it required firing up an instance of PowerShell just to perform a few string manipulations. That eats up both memory and time, so it’s a net loss if the goal of the exercise is improved productivity.

On the other hand, for a task that requires pulling a needle from a haystack, consider pushing that pile of straw into a pipe that leads to a hay baler that’s designed for that purpose. I obviously have no idea how industrial farm equipment works, but this should help:

% cat some-XML-file.xml | pwsh -C '$Input | Out-String | Select-Xml -XPath "/container/content"'

Give it a try the next time you’re handed a challenging block of XML to parse. It’ll save a lot of awkward string manipulation.
