Loading from blob storage was a challenge, mainly due to different documents describing parts of it in different ways. Some have changed over time, and other requirements are for different purposes. Once you upload data to blob, this is all you need to do.
- Create SAS (Shared Access Signature) from Azure
- Create credential in SQL using SAS
- Create external data source using credential
- Import it using BULK and OPEN JSON to a new table
These are some points to remember - I spent long time to figure out some of them
- Create a SAS from Azure - this is not the storage key, but a token created using the key. You can time limit tokens and put some other restrictions too, which are cool.
- When copying SAS, omit the first question mark. It is documented everywhere but don't know why it is there in the first place - probably for easy construction of REST urls...
- You have to create a DATABASE SCOPED credential. If not, get a cryptic error message saying credential doesn't exist or you don't have permission.
- You might have to create a master key unless it was already done for something else.
- Identity of the credential should be SHARED ACCESS SIGNATURE
- You can use any (valid) name for the credential. Some documents ask to use the storage location for the name but it's for a different process, not to load from blob.
- When looking for the file, it uses location in data source + path and file from load statement. So you can use the storage root in location and give the file as 'container/file' or use the container in the location itself and give only the file name when loading.
CREATE DATABASE SCOPED CREDENTIAL MyContainerKey
CREATE EXTERNAL DATA SOURCE blobdata
WITH ( TYPE = BLOB_STORAGE,
select * into orderJson
FROM OPENROWSET ( BULK 'mycontainer/orders.json', DATA_SOURCE='blobdata', SINGLE_CLOB) as j
CROSS APPLY OPENJSON(BulkColumn) With (
WITH IDENTITY='SHARED ACCESS SIGNATURE',
SECRET = 'sv=xxxxxxxxxxxxxxxxxxx'
SECRET = 'sv=xxxxxxxxxxxxxxxxxxx'
CREATE EXTERNAL DATA SOURCE blobdata
WITH ( TYPE = BLOB_STORAGE,
LOCATION = 'https://myblobstorage.blob.core.windows.net',
CREDENTIAL= MyContainerKey)
CREDENTIAL= MyContainerKey)
select * into orderJson
FROM OPENROWSET ( BULK 'mycontainer/orders.json', DATA_SOURCE='blobdata', SINGLE_CLOB) as j
CROSS APPLY OPENJSON(BulkColumn) With (
[OrderID] [int],
[CustomerID] [nchar](5),
[EmployeeID] [int],
[OrderDate] varchar(30),
[RequiredDate] varchar(30),
[ShippedDate] varchar(30),
[ShipVia] [int],
[Freight] [money],
[ShipName] [nvarchar](40),
[ShipAddress] [nvarchar](60),
[ShipCity] [nvarchar](15),
[ShipRegion] [nvarchar](15),
[ShipPostalCode] [nvarchar](10),
[ShipCountry] [nvarchar](15)
) as orderdata
[CustomerID] [nchar](5),
[EmployeeID] [int],
[OrderDate] varchar(30),
[RequiredDate] varchar(30),
[ShippedDate] varchar(30),
[ShipVia] [int],
[Freight] [money],
[ShipName] [nvarchar](40),
[ShipAddress] [nvarchar](60),
[ShipCity] [nvarchar](15),
[ShipRegion] [nvarchar](15),
[ShipPostalCode] [nvarchar](10),
[ShipCountry] [nvarchar](15)
This should be relatively fast, if it takes minutes, usually the reason is url for location is wrong.
No comments:
Post a Comment