
Thursday, July 16, 2020

Testing table to JSON conversion - importing JSON in blob back to SQL

When it comes to conversions, one of the best ways to verify them is to convert back and see if you get the data as expected. Importing JSON into MSSQL from a local disk has become trivial, and the Microsoft documentation describes it well. But in practice there is limited use for loading data from the server's local disk, so I decided to upload the file to an Azure blob container.

Loading from blob storage was a challenge, mainly because different documents describe parts of it in different ways - some have changed over time, and others are written for different purposes. Once you upload the data to blob, this is all you need to do:
  1. Create SAS (Shared Access Signature) from Azure
  2. Create credential in SQL using SAS
  3. Create external data source using credential
  4. Import it using BULK and OPEN JSON to a new table
These are some points to remember - I spent a long time figuring some of them out:
  1. Create a SAS (Shared Access Signature) from Azure - this is not the storage key, but a token created using the key. You can time-limit tokens and put other restrictions on them too, which is handy.
  2. When copying the SAS, omit the first question mark. This is documented everywhere, but I don't know why it is there in the first place - probably for easy construction of REST URLs.
  3. You have to create a DATABASE SCOPED credential. If not, you get a cryptic error message saying the credential doesn't exist or you don't have permission.
  4. You might have to create a master key, unless it was already done for something else (see the sketch after this list).
  5. The identity of the credential should be SHARED ACCESS SIGNATURE.
  6. You can use any (valid) name for the credential. Some documents ask you to use the storage location as the name, but that is for a different process, not for loading from blob.
  7. When looking for the file, SQL combines the location in the data source with the path and file name from the load statement. So you can put the storage root in the location and give the file as 'container/file', or put the container in the location itself and give only the file name when loading.
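
For point 4, a minimal sketch of the master key step looks like this (the existence check and the password are placeholders to adapt; skip it if the database already has a master key):

-- Create a database master key only if one doesn't exist yet
-- (the password here is just a placeholder)
IF NOT EXISTS (SELECT 1 FROM sys.symmetric_keys WHERE name = '##MS_DatabaseMasterKey##')
    CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Use-A-Str0ng-Password-Here!';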
When you put all of this together - if the JSON file in blob is called orders.json and you import it into a SQL table called orderJson - the SQL looks like this:

-- Step 2: create a database scoped credential from the SAS token
--         (paste the token without the leading question mark)
CREATE DATABASE SCOPED CREDENTIAL MyContainerKey
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'sv=xxxxxxxxxxxxxxxxxxx';


-- Step 3: create an external data source pointing at the storage account root
CREATE EXTERNAL DATA SOURCE blobdata
WITH ( TYPE = BLOB_STORAGE,
LOCATION = 'https://myblobstorage.blob.core.windows.net',
CREDENTIAL = MyContainerKey);


-- Step 4: bulk load the file (container/file, relative to the location above)
--         and shred the JSON into a new table
SELECT * INTO orderJson
FROM OPENROWSET ( BULK 'mycontainer/orders.json', DATA_SOURCE = 'blobdata', SINGLE_CLOB) AS j
CROSS APPLY OPENJSON(BulkColumn) WITH (
[OrderID] [int],
[CustomerID] [nchar](5),
[EmployeeID] [int],
[OrderDate] varchar(30),
[RequiredDate] varchar(30),
[ShippedDate] varchar(30),
[ShipVia] [int],
[Freight] [money],
[ShipName] [nvarchar](40),
[ShipAddress] [nvarchar](60),
[ShipCity] [nvarchar](15),
[ShipRegion] [nvarchar](15),
[ShipPostalCode] [nvarchar](10),
[ShipCountry] [nvarchar](15)
) AS orderdata;

This should be relatively fast; if it takes minutes, the usual reason is that the URL in the data source location is wrong.
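
To actually verify the round trip (assuming the original Northwind orders table is in the same database), a row count comparison plus an EXCEPT over a few key columns is usually enough - something like:

-- Counts should match and the EXCEPT should return no rows
-- (key columns only, since the dates were imported as varchar)
SELECT (SELECT COUNT(*) FROM orders) AS originalRows,
       (SELECT COUNT(*) FROM orderJson) AS importedRows;

SELECT OrderID, CustomerID, EmployeeID FROM orders
EXCEPT
SELECT OrderID, CustomerID, EmployeeID FROM orderJson;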


Monday, July 13, 2020

SQL/Azure or other database table to JSON using jq

With SQL Server it is easy enough to use FOR JSON to convert a table to JSON, but if you are using a database that doesn't have a native conversion, you can do it with jq. This approach actually creates JSON from delimited (CSV-like) output, so it can convert from any database, including conventional DBs like Oracle or cloud DBs like Snowflake and Redshift.
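
For reference, the native SQL Server route is roughly this (a minimal sketch against the Northwind orders table used below):

-- Native conversion in SQL Server: returns the whole result set
-- as a single JSON array
SELECT OrderID, CustomerID, EmployeeID, OrderDate
FROM orders
FOR JSON AUTO;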

jq is primarily a Linux tool, but there are Windows and Mac versions too, which makes this a universal process. It is also useful to get familiar with jq if you ever need to write a script that manipulates JSON; for example, I have used it to move data from tables to Kafka topics.

It is worth noting that installing jq doesn't add it to the path, and the Windows executable is called jq-win64 (or jq-win32). I usually create a symbolic link named jq.exe in a tools folder and add that folder to the path. It is easy to create links for all the tools you need in a single folder, so you don't have to add each one to the path separately.

Using sqlcmd.exe for SQL Server, the command line is relatively simple:
  1. get jq to read the full input and treat it as one generic string, using the --slurp and --raw-input options
  2. split it on line breaks, which creates one record for each row
  3. split each record on the separator (I used ~ since the data contains commas) and assign key names
jq can pipe data within the application, so all of this can be done in a single command. But unlike on Linux, double quotes have to be escaped. Putting it all together:

sqlcmd -S . -U user -P PW -d northwind -s~ -y 0  -Q "select * from orders" | jq -c --slurp --raw-input --raw-output  "split(\"\n\") |  map(split(\"~\")) |map({\"OrderID\":.[0], \"CustomerID\":.[1], \"EmployeeID\":.[2], \"OrderDate\":.[3], \"RequiredDate\":.[4], \"ShippedDate\":.[5], \"ShipVia\":.[6], \"Freight\":.[7], \"ShipName\":.[8], \"ShipAddress\":.[9],\"ShipCity\":.[10],\"ShipRegion\":.[11],\"ShipPostalCode\":.[12], \"ShipCountry\":.[13]})"  > test.txt

Format the output using JSTools in Notepad++ to check it.



All the records are there, with a few blank records at the end; those come from the message SQL Server prints at the end. If you need to take them out, awk would do it easily, as in my last post.