Getting Data From the Middle of a PowerShell Pipeline

Pipeline Output

If you’ve used PowerShell for very long, you know how to get values out of a pipeline.

$values = a | b | c

Nothing too difficult there.

Where things get interesting is if you want to get data from the middle of the pipeline. In this post I’ll give you some options (some better than others) and we’ll look briefly at the performance of each.

Method #1

First, there’s the lovely and often overlooked Tee-Object cmdlet. You can pass the name of a variable (i.e., without the $) to the -Variable parameter and the values coming into the cmdlet will be written to the variable.

For instance:

Get-ChildItem c:\ -Recurse |
    Select-Object -Property FullName,Length |
    Tee-Object -Variable Files |
    Sort-Object -Property Length -Descending

After this code has executed, the variable $Files will contain the filenames and lengths before they were sorted. (Note that Tee-Object’s -Append switch only applies when you’re writing to a file with -FilePath; it can’t be combined with -Variable.)

Tee-Object is easy to use, but it’s an entire command that’s essentially not doing anything “productive” in the pipeline. If you need to get values from multiple places in the pipeline, each would add an additional Tee-Object segment to the pipeline. Yuck.
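For example, capturing two stages this way might look something like this (the variable names here are just for illustration):

Get-ChildItem c:\ -Recurse |
    Tee-Object -Variable RawItems |
    Select-Object -Property FullName,Length |
    Tee-Object -Variable SelectedFiles |
    Sort-Object -Property Length -Descending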

Method #2

If the commands you’re using in the pipeline are advanced functions or cmdlets (and you’re only writing advanced functions and cmdlets, right?), you can use the -OutVariable common parameter to send the output of the command to a variable.  Just like with Tee-Object, you only want to use the name of the variable.

If you’re dealing with cmdlets or advanced functions, this is the easiest and most flexible solution. Getting values from multiple places would just involve adding -OutVariable parameters to the appropriate places.

Get-ChildItem c:\ -Recurse | 
    Select-Object -Property FullName,Length -OutVariable Files | 
    Sort-Object -Property Length -Descending 

This has the benefit of one less command in the pipeline, so that’s a nice bonus. If you want to append to an existing variable, here you would use a plus (+) in front of the variable name (like +Files).
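For instance, here’s a sketch that captures from two places and then appends more values on a second run (the paths and variable names are just examples):

Get-ChildItem c:\Windows -Recurse -OutVariable RawItems |
    Select-Object -Property FullName,Length -OutVariable Files |
    Sort-Object -Property Length -Descending

# A second pipeline with +Files appends to the existing $Files
Get-ChildItem c:\Temp -Recurse |
    Select-Object -Property FullName,Length -OutVariable +Files |
    Sort-Object -Property Length -Descending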

Method #3

This method is simply to break the pipeline at the point where you want to get the values and assign them to a variable. Then, pipe the variable to the “remainder” of the pipeline. Nothing crazy. Here’s the code.

$files=Get-ChildItem c:\ -Recurse | 
    Select-Object -Property FullName,Length 
$files | Sort-Object -Property Length -Descending 

If you want to append, you could use the += operator instead of the assignment operator.
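For instance, a second run like this would tack its results onto the existing $files (the d:\ path is purely illustrative):

$files += Get-ChildItem d:\ -Recurse |
    Select-Object -Property FullName,Length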

If you want to capture multiple “stages” in the pipeline, you could end up with a bunch of assignments and not much pipeline left.
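Something like this, for instance (a sketch; the variable names are arbitrary):

$all = Get-ChildItem c:\ -Recurse
$files = $all | Select-Object -Property FullName,Length
$sorted = $files | Sort-Object -Property Length -Descending
$sorted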

Method #4

This method is similar to method #3, but takes advantage of the fact that a parenthesized assignment is also an expression. It’s easier to explain after you’ve seen it, so here’s the code:

($files=Get-ChildItem c:\ -Recurse | 
    Select-Object -Property FullName,Length) | 
    Sort-Object -Property Length -Descending 

Notice how the first part of the pipeline (and the assignment) is inside parentheses? The value of the assignment expression is the value that was assigned, so this has the benefit of setting the variable and passing the values on to the remainder of the pipeline.

If you want to get multiple sets of values from the pipeline, you would need to nest these parenthesized assignments multiple times (there’s a sketch of that after the next example). Statements like this can only be used as the first part of a pipeline, so don’t try something like this:

#  THIS WON'T WORK!!!!!
Get-ChildItem c:\ -Recurse |
    Select-Object -Property FullName,Length |
    ($Sortedfiles=Sort-Object -Property Length -Descending)
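For completeness, here’s a rough sketch of the nesting mentioned above (the final Select-Object -First 10 is only there to give the pipeline another stage):

($sorted = ($files = Get-ChildItem c:\ -Recurse |
        Select-Object -Property FullName,Length) |
    Sort-Object -Property Length -Descending) |
    Select-Object -First 10

After it runs, $files holds the unsorted objects, $sorted holds the sorted ones, and the last stage outputs the ten largest files.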

Performance

I used the Benchmark module from the PowerShell Gallery (specifically, its Measure-These command) to measure the performance of these four techniques. I limited the number of objects to 1000 and staged those values in a variable to isolate the pipeline code from the data gathering.

$files = dir c:\ -ErrorAction Ignore -Recurse | select-object -first 1000

$sb1 = {$files | select-object FullName,Length -OutVariable v1 | sort-object Length -Descending}
$sb2 = {$files | select-object FullName,Length | tee-object -Variable v2 | sort-object Length -Descending}
$sb3 = {$v3 = $files | select-object FullName,Length; $v3 | sort-object Length -Descending}
$sb4 = {($v4 = $files | select-object FullName,Length) | sort-object Length -Descending}
Measure-These -ScriptBlock $sb1,$sb2,$sb3,$sb4 -Count 100 | Format-Table

Title/no. Average (ms) Count   Sum (ms) Maximum (ms) Minimum (ms)
--------- ------------ -----   -------- ------------ ------------
        1     98.60119   100   9860.119     131.7581      87.6203
        2    120.32475   100 12032.4754     150.4985     104.6586
        3    100.92144   100 10092.1436     132.2665      90.0685
        4     98.48383   100   9848.383     135.5229      84.7717

The results aren’t particularly interesting. Tee-Object is about 20% slower than the rest, but other than that they’re all about the same. I’m a little bit disappointed there wasn’t more separation, but 20% isn’t that big of a difference to pay for whichever syntax you find cleanest and most flexible (in my opinion).

BTW, those timings are for Windows PowerShell 5.1. The numbers for PowerShell 6.0 (Core) are similar:

Title/no. Average (ms) Count  Sum (ms) Maximum (ms) Minimum (ms)
--------- ------------ -----  -------- ------------ ------------
        1    120.97498    10 1209.7498     136.1319     112.0041
        2     139.9865    10  1399.865      147.659     132.1466
        3    128.86957    10 1288.6957     148.0096     115.0421
        4    119.44978    10 1194.4978     142.9651     109.1328

Here we see slightly less spread (17%), but all of the numbers are a bit higher.

I’ll probably continue to use -OutVariable.

What about you?