Thursday, July 9, 2020

SQL to JSON using "SELECT FOR JSON"

It has been a while, and I thought of publishing here again. I have been working on various technologies, especially data and cloud related ones. I wanted to get some sample data for something I was writing, but getting it became the blog post itself!

For a familiar sample, I started with Adventure Works. Microsoft's instructions and the script from GitHub were great, and I then started converting a table to JSON.
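As a minimal sketch of the query shape (the column names here are from the Northwind Orders table used in the sqlcmd commands below, so adjust for your own table), FOR JSON AUTO turns a result set into a single JSON array:

SELECT OrderID, CustomerID, OrderDate
FROM Orders
FOR JSON AUTO

The output comes back shaped like [{"OrderID":...,"CustomerID":...,"OrderDate":...},...], with one object per row.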


The result came back as 137 lines after converting to JSON, and SQL Server Management Studio is supposed to combine them into one document. I copied it, pasted it into Notepad++ and used the JSTools plugin to format it. The formatted JSON had only 5 or 6 items, when it should have had 830. A quick Google search showed that this has never worked, and I'm surprised it has not been fixed after all these years.

Using Management Studio was not practical anyway, as it is not easy to script the process, so I turned to good old sqlcmd.
sqlcmd -S . -U User -P PW -d northwind -Q "select * from orders for json auto" > test.txt
The result was encouraging; it too came out as 137 rows.


But after formatting the JSON in Notepad++ it wasn't pretty - it loses the structure in the middle.


When converting tables to CSV, you usually set the variable-length column display width to 0 using the -y option, so I tried that.
sqlcmd -S . -U User -P PW -d northwind -y 0 -Q "select * from orders for json auto" > test.txt
It was far better; visually it looked fine, but tests showed that something was missing. Counting occurrences of the string "OrderID", which should appear 830 times (once per row), gave only 828! Further checks showed that some other fields were affected too, and the actual reason was a line break in the middle of the string in two of the records. A quick way to run that count is sketched below.
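This is just a sketch of the check, using the same awk for Windows build that appears later in this post (the field name OrderID is specific to the Northwind Orders table); it splits each line on the string and adds up the pieces:

awk "{ if (length($0)) n += split($0, parts, /OrderID/) - 1 } END { print n }" test.txt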


It took me some time to find out that the output from SQL Server inserts a new-line character after every 2034 characters, and that shows up in the JSON file. Most likely it is the limit of an internal data type it uses, and I couldn't think of any way to get rid of it. I tried giving large values for the -y and -Y parameters, but to no avail.

I turned to the next option: removing the new lines from the text output. Using awk for Windows, it was easy.
sqlcmd -S . -U user -P PW -d northwind -y 0  -Q "select * from orders for json auto" | awk "{a=a $0}END{print a}" > test.txt
This actually gave good output and passed all the basic tests (the awk script simply accumulates every line into one string and prints it at the end). I didn't check whether it handles special characters and so on, but it looks OK. More than anything, this is a scriptable solution, as long as the data itself doesn't contain new lines. If it does, the awk script would be slightly more complex.
However, you will note that SQL Server omits a field entirely if its value is NULL. This is not ideal in circumstances where consumers don't expect fields to go missing. The solution is to add the INCLUDE_NULL_VALUES option to the FOR JSON clause.
sqlcmd -S . -U user -P PW -d northwind -y 0  -Q "select * from orders for json auto, INCLUDE_NULL_VALUES" | awk "{a=a $0}END{print a}" > test.txt
One difference in the null handling there is that the value is not quoted - it comes out as the JSON literal null, not the string "Null". This is valid JSON, and depending on the environment it may be desirable or a problem. Parsing this with jq, another effective tool (which can be used with many other databases too), handles this differently.
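As a rough sketch of that last point (the ShipRegion field and the "N/A" placeholder are just illustrative picks from the Northwind Orders table), jq can validate the combined file and rewrite nulls into whatever the consumer expects:

jq . test.txt
jq "map(.ShipRegion = (.ShipRegion // \"N/A\"))" test.txt > test_no_nulls.json

The first command simply pretty-prints the array, which doubles as a syntax check; the second uses jq's // alternative operator to substitute a string wherever the field is null.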

