Saturday, April 10, 2021

Covid-19 and StackOverflow questions

We all hear good news about the Covid-19 vaccines - rollouts keep going well, and everyone is waiting to see the light at the end of the tunnel - it has been quite a time! Without question, the pandemic forced everyone to work from home, at least for a while, whether they liked it or not.

I hear both pro and con comments on working from home, and I wondered how this situation has affected information sharing and problem solving in software development. If you are in an office, whenever in doubt, just raising your head and asking a question solves most issues. When working remotely, even though you have all the communication tools, I suspected it is not as easy to resolve a question that way, because the context is not the same for everyone.

When looking for data to see the trends, the first thing that came to mind was Stack Overflow. It is the de facto place to go for software development questions these days, and it has a fantastic API. The API lets you get a variety of data, and it has a simple web interface for testing your REST queries.

I quickly settled on Python for its visualization capabilities, though this would be a simple graph. I played around with Visual Studio Code for no particular reason (I already use Visual Studio) and found that its support for Jupyter notebooks is excellent.

I wanted to get the number of questions submitted per month for a few categories, including open-source tools, proprietary ones, and things in between. Making requests and getting data from the JSON responses was a piece of cake with Python, but the script soon failed with a 'Throttling violation' error. I had briefly seen in the documentation that there is a limit for anonymous requests (I made anonymous requests since this was a one-off), supposedly 10,000 requests. There is also a separate restriction that you can't make more than 30 requests per second.

I was not going anywhere near 10,000 requests, but they were sent from a list comprehension in Python, like:

azure = [getDataFromSE(start, end, 'azure') for (start, end) in zip(data['utc_start'], data['utc_end'])]

I wondered whether the comprehension fired off all the requests without waiting for the results, violating the 30 req/sec limit, but slowing it down didn't help; I had to wait until the next day for the quota to reset. Further research showed that the limit is somewhere around 300 requests per day for anonymous requests, and I had just passed it when I got the error. The 10,000 figure probably counts internal requests, and the API certainly makes multiple internal requests for each REST request.
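For reference, here is a minimal sketch of what getDataFromSE could look like against version 2.3 of the Stack Exchange API; the function body is my reconstruction, not the original code. The filter=total parameter asks the API to return only the question count, and a short sleep paces the loop well under the per-second limit.

```python
import gzip
import json
import time
import urllib.parse
import urllib.request

API = "https://api.stackexchange.com/2.3/search"

def se_count_url(start, end, tag):
    """URL asking for the number of Stack Overflow questions tagged
    `tag` and created between two Unix timestamps (seconds, UTC)."""
    params = {
        "fromdate": start,
        "todate": end,
        "tagged": tag,
        "site": "stackoverflow",
        "filter": "total",   # response body is just {"total": N}
    }
    return API + "?" + urllib.parse.urlencode(params)

def getDataFromSE(start, end, tag):
    with urllib.request.urlopen(se_count_url(start, end, tag)) as resp:
        body = resp.read()
    # The Stack Exchange API compresses its responses, so un-gzip
    # the body before parsing if needed.
    if body[:2] == b"\x1f\x8b":          # gzip magic bytes
        body = gzip.decompress(body)
    time.sleep(0.1)   # pace calls well below the 30 requests/second cap
    return json.loads(body)["total"]
```

Even paced like this, the anonymous daily quota is the binding limit, so a run of a few hundred months-times-tags requests still has to be split across days.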

I exported the data to CSV, so continuing and collecting the rest of the data the next day was easy. The following is the final graph I got.
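A sketch of the CSV checkpoint and the plot could look like the following; the tag names and counts here are made up for illustration (the real numbers came from the API calls above).

```python
import csv

import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

# hypothetical monthly question counts, one list per tag
months = ["2020-01", "2020-02", "2020-03"]
counts = {"azure": [1200, 1350, 1500], "aws": [2100, 2300, 2250]}

# checkpoint to CSV so the run can resume after the daily quota resets
with open("question_counts.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["month"] + list(counts))
    for i, month in enumerate(months):
        writer.writerow([month] + [counts[tag][i] for tag in counts])

# one line per tag, questions per month
for tag, values in counts.items():
    plt.plot(months, values, label=tag)
plt.legend()
plt.ylabel("questions per month")
plt.savefig("questions.png")
```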

Question volume went up in the early days but came down after a few months, and for some categories the increase is minuscule.

It isn't easy to say why the numbers dropped reasonably fast, but I thought it was an interesting fact ....




Wednesday, March 10, 2021

Powershell script to access Amazon S3 as a Cognito userpool user


Continuing from my last blog, I started accessing the S3 bucket as an Amazon Cognito user using PowerShell. After all, the idea was to create a script to access S3 resources using access/refresh tokens.


At first, I thought it was going to be trivial. Like many other APIs, I expected the AWS PowerShell module to have one-to-one cmdlets, letting me pretty much convert the CLI script to PowerShell - well, I was wrong.

 
To start with, the commands are spread across various modules, each with very similar-sounding API commands that do different things. Then it was challenging to find out where to send the REST request - if you search the web, it is apparent that there is confusion over whether it should be sent to the common Amazon gateway or to your Cognito endpoint. It is the former - I had to run a good old network trace to find out where the CLI command sends the request.

 
Another challenge was figuring out how to send the data. I hadn't used the AWS gateway with HTTP POST before, so it took a while to work out the exact method; the parameters are spread across the headers and the POST body.

 
After that, it is reasonably straightforward. Once you have an id_token for the Cognito user, use the API

com.amazonaws.cognito.identity.model.AWSCognitoIdentityService.GetId

to get the identity pool identity. This step is not strictly necessary, since the identity is shown under the identity browser in the identity pool, but making the call keeps the script generalized.

Then use the following API to get the credentials.

com.amazonaws.cognito.identity.model.AWSCognitoIdentityService.GetCredentialsForIdentity 

Send these requests to the common Cognito gateway for your region, for example,

https://cognito-identity.<region>.amazonaws.com

Once the credentials are in hand, use the Read-S3Object cmdlet to read S3 objects.

Finally, here is the full script.

$id_tkn = "<id_token>"
$identityPoolId = "<region>:<GUID>"
$id_provider = "cognito-idp.<region>.amazonaws.com/<userpool_id>"

$idbody = @"
{
  "IdentityPoolId": "$identityPoolId",
  "Logins": {
    "$id_provider": "$id_tkn"
  }
}
"@

$idheaders = @{
  'X-AMZ-TARGET' = 'com.amazonaws.cognito.identity.model.AWSCognitoIdentityService.GetId'
  'CONTENT-TYPE' = 'application/x-amz-json-1.1'
}
$credheaders = @{
  'X-AMZ-TARGET' = 'com.amazonaws.cognito.identity.model.AWSCognitoIdentityService.GetCredentialsForIdentity'
  'CONTENT-TYPE' = 'application/x-amz-json-1.1'
}


$id = Invoke-RestMethod -Uri https://cognito-identity.<region>.amazonaws.com `
  -Method Post -Body $idbody -Headers $idheaders

$IdentityId = $id.IdentityId

$credBody= @"
{
  "IdentityId": "$IdentityId",
  "Logins": {
         "$id_provider":"$id_tkn"
     }
}
"@

$cred = Invoke-RestMethod -Uri https://cognito-identity.<region>.amazonaws.com `
  -Method Post -Body $credBody -Headers $credheaders

$accKey = $cred.Credentials.AccessKeyId
$secretKey = $cred.Credentials.SecretKey
$sessionTkn = $cred.Credentials.SessionToken

Read-S3Object -BucketName <bucket or accesspoint name> -Key <item_key> `
  -File <local file name to store> -AccessKey $accKey `
  -SecretKey $secretKey -SessionToken $sessionTkn