Managing data with PowerShell

At Bitrise we love automation: it frees up your time for creative work. Originally, PowerShell was developed to help system administrators automate and manage tasks and processes of the Windows operating system. Why should we consider it for data work, and how can we use it?

Why PowerShell?

PowerShell commands run in the terminal yet operate on objects, so we get the best of both worlds:

  • PowerShell works with objects,
  • but without a labyrinth of versions and environments.
  • No need to compile.
  • It is relatively easy to find the needed cmdlets by their Verb-Noun format.
  • Built-in commands for HTTP requests.
  • Packages to manage cloud services such as Google Cloud or Microsoft Azure.
  • Access to the wider .NET ecosystem for specialized tasks.
  • It is cross-platform.
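To illustrate the first points, here is a minimal sketch: cmdlet discovery via the Verb-Noun convention, and object output that can be filtered directly (the HTTP example endpoint is hypothetical, so it is left commented out):

```powershell
# Discover cmdlets by the Verb-Noun convention:
# list everything whose verb is ConvertTo
Get-Command -Verb ConvertTo

# Commands return objects, not text: filter and project them directly
Get-Process | Select-Object -First 3 Name, Id

# Built-in HTTP support, no packages required (hypothetical endpoint):
# $response = Invoke-RestMethod -Method Get -Uri 'https://example.com/api/data'
```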

Installing on Linux

Unfortunately, as of today, installation on Ubuntu is not entirely straightforward. PowerShell Core depends on libicu60, which cannot be installed from the regular repositories.


# Download the Microsoft repository GPG keys
wget -q https://packages.microsoft.com/config/ubuntu/18.04/packages-microsoft-prod.deb

# Register the Microsoft repository GPG keys
sudo dpkg -i packages-microsoft-prod.deb

# Update the list of products
sudo apt-get update

# Get and register libicu as there is an issue with it in Ubuntu repositories
wget http://mirrors.kernel.org/ubuntu/pool/main/i/icu/libicu60_60.2-3ubuntu3_amd64.deb
sudo dpkg -i libicu60*.deb

# Install PowerShell
sudo apt-get install -y powershell

Installing packages and modules

In PowerShell, the Install-Package and Install-Module cmdlets add new capabilities to your setup or session.
If you need to do this non-interactively, for example in a Dockerfile, you can invoke pwsh with the -Command parameter in the following format:


# Install an additional package to connect to Azure
pwsh -Command "& {Install-Package AzureRM.NetCore}"

# Install the Cosmos DB module
pwsh -Command "& {Install-Module CosmosDB}"

Mind the '&' before your commands: it is needed when running pwsh from another shell, and the whole command should be quoted so that the calling shell does not interpret the '&' itself.
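In a Dockerfile this could look like the following sketch (the base image tag is an assumption; pick the one matching your target distribution):

```dockerfile
# Hypothetical base image tag - choose one matching your OS
FROM mcr.microsoft.com/powershell:lts-ubuntu-18.04

# -Force answers the untrusted-repository prompt non-interactively
RUN pwsh -Command "& {Install-Module CosmosDB -Force -Scope AllUsers}"
```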

Once we have PowerShell installed, we can start using it for many purposes, including the following use cases.

Sample applications

CSV to JSON

PowerShell can work with CSV and JSON out of the box. The next snippet

  1. imports a CSV,
  2. adds current datetime property as updated_at and
  3. exports data to JSON.

$data = Import-Csv "test.csv"

# Datetime as a timestamp string
$date = Get-Date -Format o

# Noteproperty is the general property
$data | Add-Member -name "updated_at" -MemberType NoteProperty -Value $date
$data = ConvertTo-Json $data
$data | Add-Content -Path "test.json"

CSV to Document Database

It is a common task for data experts to collect information from flat files into a database. Here, CSV files are our source and Cosmos DB is our destination.


$database = 'BitriseDataBase'
$collection = 'BitriseDataCollection'

# Create new DbContext - we have no database yet
$cosmosDbContext = New-CosmosDbContext -Emulator

try {
    New-CosmosDbDatabase -Context $cosmosDbContext -Id $database
}
catch {
    $ErrorMessage = $_.Exception.Message
}
Write-Host $ErrorMessage

# Create new DbContext with database
$cosmosDbContext = New-CosmosDbContext -Emulator -Database $database

# List collections
Get-CosmosDbCollection -Context $cosmosDbContext

try {
    New-CosmosDbCollection -Context $cosmosDbContext -Id $collection
}
catch {
    $ErrorMessage = $_.Exception.Message
}
Write-Host $ErrorMessage

# Import CSV as PowerShell obj
$importData = Import-Csv "results.csv"

# Unique id is needed in CosmosDB
foreach ($r in $importData){
    $r | Add-Member -name "id" -MemberType NoteProperty -Value $([Guid]::NewGuid().ToString())
}

# Enumerate the objects and convert each one separately - Cosmos DB expects a JSON body per document
foreach ($document in $importData){
    Write-Host 'record'
    $document = ConvertTo-Json $document
    Write-Host $document
    New-CosmosDbDocument -Context $cosmosDbContext -CollectionId $collection -DocumentBody $document
}

Query Cosmos DB

You can query Cosmos DB with its SQL-like syntax.


$query = "SELECT p.id, p._ts FROM BitriseDataCollection p WHERE (p._ts = 1544364332)"

Get-CosmosDbDocument -Context $cosmosDbContext -CollectionId $collection -Query $query | ConvertTo-Json

Excel files

For many organizations, data means Excel. Fortunately, there is a truly feature-rich module from Doug Finke for working with XLSX files and automating related tasks. It is available on both GitHub and the PowerShell Gallery.

Install ImportExcel module for the current user:

Install-Module ImportExcel -scope CurrentUser

Then you can import your XLSX, CSV, or JSON files and define Excel tasks with the Export-Excel command. For example, importing a sheet from an XLSX file and exporting a pivot table into another can be achieved with these few lines:


$InParams = @{
    Path              = "sampleIn.xlsx"
    WorkSheet         = "Sheet1"
}

$excel = Import-Excel @InParams

$OutParams = @{
    Path              = "sampleOut.xlsx"
    IncludePivotTable = $true
    PivotRows         = 'country'
    PivotTableName    = 'Orders'
    
    # hashtable defining aggregation
    PivotData         = @{'order' = 'SUM'}
    Activate          = $true 
}

$excel | Select-Object country, order | Export-Excel @OutParams 

Download Recurly export files

At Bitrise, we use Recurly to manage our subscriptions. Unfortunately, the provider sets relatively strict limits on the number of API calls within a given time frame. The suggested way of retrieving data is to set up exports on Recurly's site and collect the daily batches:

  1. We need to GET a temporary download URL for each daily export of each endpoint.
  2. The export files are available for an hour; we can download them with another call.

# This is just for demonstration - never add secrets to your production code this way
$stringKey = 'yourkey'

# Recurly API expects Base64 encoded API key
$bytes = [System.Text.Encoding]::UTF8.GetBytes($stringKey)
$encodedKey = [Convert]::ToBase64String($bytes)

# Basic authentication - concatenate with the encoded key
$basicAuth = [string]::Format("Basic {0}", $encodedKey)

# Request download link
$uri = 'https://concreteci.recurly.com/v2/export_dates/2018-12-15/export_files/transactions_created.csv.gz'
$headers = @{
    'Authorization'      = $basicAuth
    'Accept'             = 'application/xml'
    'X-Api-Version'      = '2.5'  
    'Content-Type'       = 'application/xml; charset=utf-8'  
}
$headers | ConvertTo-Json | Write-Host
$results = Invoke-RestMethod -Method Get -Uri $uri -Headers $headers

# Extract the download URL from the XML response
$downloadUrl = $results.export_file.download_url

# Set target - Recurly provides Gz compressed CSV
$outFile = 'transactions.csv.gz'

# Download the file to the target path
Invoke-RestMethod -Method Get -Uri $downloadUrl -OutFile $outFile
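Since Recurly delivers the export Gz-compressed, the file still needs to be inflated. A small sketch using .NET streams, which works without any external tools (the sample data written first is only there to make the snippet self-contained; in practice $inFile is the export downloaded above):

```powershell
# For demonstration, write a small Gz-compressed sample first
# (in practice $inFile is the export downloaded above)
$inFile  = 'transactions.csv.gz'
$outFile = 'transactions.csv'

$bytes = [System.Text.Encoding]::UTF8.GetBytes("id,amount`n1,100`n")
$fs = [System.IO.File]::Create($inFile)
$gzOut = New-Object System.IO.Compression.GzipStream($fs, [System.IO.Compression.CompressionMode]::Compress)
$gzOut.Write($bytes, 0, $bytes.Length)
$gzOut.Dispose(); $fs.Dispose()

# Decompress the .gz export with .NET streams
$in  = [System.IO.File]::OpenRead($inFile)
$gz  = New-Object System.IO.Compression.GzipStream($in, [System.IO.Compression.CompressionMode]::Decompress)
$out = [System.IO.File]::Create($outFile)
$gz.CopyTo($out)
$gz.Dispose(); $in.Dispose(); $out.Dispose()

Get-Content $outFile
```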
