Saturday, April 10, 2021

Covid-19 and StackOverflow questions

We all hear good news about the Covid-19 vaccines, the rollouts keep going well, and everyone is waiting to see the light at the end of the tunnel - what a time it has been! Without question, the pandemic forced everyone to work from home, at least for some time, whether they liked it or not.

I hear both pro and con comments on working from home, and I wondered how this situation affected information sharing and problem solving in software development. In an office, whenever you are in doubt, just raising your head and asking a question solves most problems. When working remotely, though you have all the communication tools, it is not as easy to resolve a question that way, because the context is not the same for everyone.

When looking for data to see the trends, what first came to mind was Stack Overflow. It is the de facto place for software development questions these days, and it has a fantastic API. The API lets you retrieve a variety of data, and it has a simple web interface for testing your REST queries.

I quickly settled on Python for its visualization capabilities, though this would be a simple graph. I played around with Visual Studio Code for no particular reason (I already use Visual Studio) and found that its support for Jupyter notebooks is excellent.

I wanted to get the number of questions submitted per month for a few categories, including open-source tools, proprietary ones, and things in between. Making requests and extracting data from the JSON responses was a piece of cake with Python, but it soon failed with a 'Throttling violation' error. The documentation mentions a limit for anonymous requests (I made anonymous requests since this was a one-off exercise), which is supposed to be 10,000 requests. There is also a separate restriction that you can't make more than 30 requests per second.
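A minimal sketch of such a query in Python, assuming the /2.3/questions endpoint with filter=total (which makes the API return just a total count); the function names here are my own:

```python
import gzip
import json
import urllib.parse
import urllib.request

API = "https://api.stackexchange.com/2.3/questions"

def count_url(tag, from_ts, to_ts, site="stackoverflow"):
    """Build the query URL for the number of questions tagged `tag`
    created between two Unix timestamps. filter=total asks the API
    to return only a {"total": N} body."""
    params = urllib.parse.urlencode({
        "site": site,
        "tagged": tag,
        "fromdate": from_ts,
        "todate": to_ts,
        "filter": "total",
    })
    return f"{API}?{params}"

def question_count(tag, from_ts, to_ts):
    """Fetch the count; the API compresses responses, so decompress
    when the Content-Encoding header says gzip."""
    req = urllib.request.Request(count_url(tag, from_ts, to_ts),
                                 headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(req) as resp:
        data = resp.read()
        if resp.headers.get("Content-Encoding") == "gzip":
            data = gzip.decompress(data)
    return json.loads(data)["total"]
```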

I was not going anywhere near 10,000 requests, but they were sent from a list comprehension in Python, like

azure = [getDataFromSE(start, end, 'azure') for (start, end) in zip(data['utc_start'], data['utc_end'])]

I wondered if the comprehension sent all the requests without waiting for the results and violated the 30 req/sec limit, but slowing it down didn't help; I had to wait until the next day for the quota to reset. Further research showed that the limit is somewhere around 300 requests per day for anonymous requests, and I had just passed it when I got the error. The 10,000 figure probably counts internal requests, and the API certainly makes more than one internal request for each REST request.
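A safer pattern is to issue the requests sequentially with a small pause rather than firing them as fast as a list comprehension evaluates (the API can also return a backoff field asking clients to wait). A sketch, with a hypothetical fetch callback standing in for the real API call:

```python
import time

def fetch_all(tags, windows, fetch, min_interval=0.5):
    """Fetch counts one request at a time.

    `fetch(tag, start, end)` is whatever function does the actual API
    call (e.g. a getDataFromSE-style helper); `windows` is a list of
    (start, end) timestamp pairs, one per month. The sleep between
    calls keeps the rate well under 30 requests/second."""
    results = {}
    for tag in tags:
        counts = []
        for start, end in windows:
            counts.append(fetch(tag, start, end))
            time.sleep(min_interval)
        results[tag] = counts
    return results
```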

I exported the data to CSV, so continuing and getting the rest of the data the next day was easy. The following is the final graph I got.

Question numbers went up in the early days but came down after a few months, and for some categories the increase is minuscule.

It isn't easy to say why they dropped reasonably fast, but I thought it was an interesting observation....




Wednesday, March 10, 2021

Powershell script to access Amazon S3 as a Cognito userpool user


Continuing from my last blog, I started accessing the S3 bucket as an Amazon Cognito user using PowerShell. After all, the idea was to create a script to access S3 resources using access/refresh tokens.


At first, I thought it was going to be trivial. Like many other APIs, I assumed the AWS PowerShell modules would have one-to-one cmdlets, letting me pretty much convert the CLI script to PowerShell - well, I was wrong.

 
To start with, the commands are spread across various modules, each with very similar-sounding commands that do different things. Then it was challenging to find out where to send the REST request - if you search the web, it is apparent that there is confusion over whether it should go to the common Amazon gateway or to your Cognito endpoint. It was the former - I had to run a good old network trace to find out where the CLI command sends the request.

 
Another challenge was figuring out how to send the data. I hadn't used the AWS gateway with HTTP POSTs before, so it took a while to figure out the exact method. The parameters are spread across the header and the POST body.
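For comparison, that header/body split can be sketched in plain HTTP terms in Python; this is a hypothetical helper that only builds the request pieces (the endpoint and X-Amz-Target values match what the PowerShell script uses), it does not send anything:

```python
import json

COGNITO_IDENTITY = "https://cognito-identity.{region}.amazonaws.com"

def getid_request(region, identity_pool_id, provider, id_token):
    """Build the pieces of the GetId call: the operation name goes in
    the X-Amz-Target header, the parameters in a JSON POST body."""
    headers = {
        "X-Amz-Target": "com.amazonaws.cognito.identity.model."
                        "AWSCognitoIdentityService.GetId",
        "Content-Type": "application/x-amz-json-1.1",
    }
    body = json.dumps({
        "IdentityPoolId": identity_pool_id,
        "Logins": {provider: id_token},
    })
    return COGNITO_IDENTITY.format(region=region), headers, body
```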

 
After that, it is reasonably straightforward. Once you get an id_token for the Cognito user,
use the API

com.amazonaws.cognito.identity.model.AWSCognitoIdentityService.GetId 

to get the identity pool identity. This is not strictly necessary, as it is shown under the identity browser in the identity pool, but making the call keeps the script more general.

Then use the following API to get the credentials.

com.amazonaws.cognito.identity.model.AWSCognitoIdentityService.GetCredentialsForIdentity 

Send these requests to the common Cognito gateway for your region, for example,

https://cognito-identity.<region>.amazonaws.com

Once the credentials are in hand, use the Read-S3Object cmdlet to read S3 objects.

Finally, this was the script.

$id_tkn = "<id_token>"
$identityPoolId = "<region>:<GUID>"
$id_provider = "cognito-idp.<region>.amazonaws.com/<userpool_id>"

$idbody = @"
{
  "IdentityPoolId": "$identityPoolId",
  "Logins": {
    "$id_provider": "$id_tkn"
  }
}
"@

$idheaders = @{
  'X-AMZ-TARGET' = 'com.amazonaws.cognito.identity.model.AWSCognitoIdentityService.GetId'
  'CONTENT-TYPE' = 'application/x-amz-json-1.1'
}
$credheaders = @{
  'X-AMZ-TARGET' = 'com.amazonaws.cognito.identity.model.AWSCognitoIdentityService.GetCredentialsForIdentity'
  'CONTENT-TYPE' = 'application/x-amz-json-1.1'
}

$id = Invoke-RestMethod -Uri "https://cognito-identity.<region>.amazonaws.com" `
  -Method Post -Body $idbody -Headers $idheaders

$IdentityId = $id.IdentityId

$credBody = @"
{
  "IdentityId": "$IdentityId",
  "Logins": {
    "$id_provider": "$id_tkn"
  }
}
"@

$cred = Invoke-RestMethod -Uri "https://cognito-identity.<region>.amazonaws.com" `
  -Method Post -Body $credBody -Headers $credheaders

$accKey = $cred.Credentials.AccessKeyId
$secretKey = $cred.Credentials.SecretKey
$sessionTkn = $cred.Credentials.SessionToken

Read-S3Object -BucketName <bucket or accesspoint name> -Key <item_key> `
  -File <local file name to store> -AccessKey $accKey `
  -SecretKey $secretKey -SessionToken $sessionTkn

Monday, March 1, 2021

Getting AWS Cognito user pool users to access different S3 resources

I started working with AWS and OAuth. Compared to Azure, finding information on AWS is quite challenging. There are many AWS docs, but they read like auto-generated, isolated explanations rather than answers to the questions you have when building a real-life solution.

My task was to allow S3 access, using OAuth, to users who don't have AWS accounts. Amazon's OAuth infrastructure lives in the Amazon Cognito service.


Authorization


Once the identity and user pools are created, you need to add an app client to the user pool. The app client controls token attributes such as lifetime; the refresh, access, and ID tokens each have their own validity period. Calling the authorize and token endpoints with the appropriate parameters then provides the refresh and access tokens. The scope should include openid to receive an OpenID (ID) token from those endpoints. The access token is no good for accessing S3, as it doesn't contain the token issuer's details. The Cognito identity pool supports different identity providers such as Google and Facebook, but I chose the Cognito user pool as the provider; the choice does not make much difference to the process. Cognito's own provider doesn't have any deeper relationship with S3 and is treated like any other provider.
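As an illustrative sketch (the hosted-UI domain, client id, code, and redirect URI are all placeholders you would substitute), the code-for-token exchange at the token endpoint can be built like this; the helper only assembles the request:

```python
import urllib.parse

def token_request(domain, region, client_id, code, redirect_uri):
    """Build the POST to the Cognito token endpoint that swaps an
    authorization code for tokens. The "openid" scope must have been
    requested at /oauth2/authorize for an ID token to be issued."""
    url = f"https://{domain}.auth.{region}.amazoncognito.com/oauth2/token"
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    body = urllib.parse.urlencode({
        "grant_type": "authorization_code",
        "client_id": client_id,
        "code": code,
        "redirect_uri": redirect_uri,
    })
    return url, headers, body
```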


Identity


Once the user and identity pools are created, the auth and unauth roles from the identity pool are visible in the IAM console under Roles (the primary reason I created them in the same AWS account as S3). In the Policies section you can create granular policies defining who can access which resources, and at what level, for each role. Though the policies can be attached directly to the buckets or parts of them, I created an access point, which makes management easier.
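As a rough illustration only (the account, region, access point name, and exact action list are placeholders to adapt), a role policy scoped to an access point might look like:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:<region>:<account>:accesspoint/<access point name>",
        "arn:aws:s3:<region>:<account>:accesspoint/<access point name>/object/*"
      ]
    }
  ]
}
```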


Accessing S3 resources using aws cli


I had to work on this for days. Though the access token is granted by Cognito, there was no interface to pass it to the S3 APIs; the S3 APIs look for access keys and a secret, usually associated with IAM users. Finally, it turned out that asking for keys by providing an OpenID token does the trick. If the string openid is in the scope when asking for authorization, Cognito sends an ID token with the following format.


{
  "kid": "lXY ...... =",
  "alg": "RS256"
}

{
  "at_hash": "4y9WXD5LhlIBA1jSNMnjYA",
  "sub": "c4 ...... 88",
  "email_verified": true,
  "iss": "https://cognito-idp.<region>.amazonaws.com/<UserPoolId>",
  "phone_number_verified": true,
  "cognito:username": "c3.....3",
  "given_name": "name",
  "aud": "....",
  "token_use": "id",
  "auth_time": 1613947027,
  "nickname": "<name>",
  "phone_number": "<PhoneNumber>",
  "exp": 1614557229,
  "iat": 1614553629,
  "family_name": "<Surname>",
  "email": "<email>"
}
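The two JSON documents above are just the base64url-encoded header and claims segments of the ID token, so they can be inspected without any library. A small sketch (no signature validation, so this is for inspection only, never for trusting a token):

```python
import base64
import json

def jwt_segments(token):
    """Decode the header and claims of a JWT without verifying the
    signature. JWT segments are base64url without padding, so the
    padding is restored before decoding."""
    def decode(seg):
        seg += "=" * (-len(seg) % 4)  # restore stripped padding
        return json.loads(base64.urlsafe_b64decode(seg))
    header_b64, claims_b64, _signature = token.split(".")
    return decode(header_b64), decode(claims_b64)
```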


To get the temporary credentials, Cognito needs an identity from the identity pool. Though this is available in the console when browsing identities, asking for it by sending the ID token keeps the script more general. The token is sent in the --logins option, which has the format "provider=token". Use the AWS CLI get-id command:

aws cognito-identity get-id --identity-pool-id <identity pool id> --logins "cognito-idp.<region>.amazonaws.com/<user pool id>=<id token>"

which returns the identity ID in the format,

{
    "IdentityId": "<identity id>"
}



Now we have everything required to ask for the temporary keys using get-credentials-for-identity.

aws cognito-identity get-credentials-for-identity --identity-id "<identity id>" --logins "cognito-idp.<region>.amazonaws.com/<user pool id>=<id token>"

The response is in the format,

{
    "IdentityId": "<identity id>",
    "Credentials": {
        "AccessKeyId": "ASIAOKHKDLENZPN72UMU",
        "SecretKey": "TKxoZruHTjhyD87kjDjhjHSJ7uj78zB9rXpbMZ2/",
        "SessionToken": "IQoJb3J ……… H+FMZ4azHYaoo26uXB3A1dBLWe4fqLNG03RCVTMahodcnPs+2LiBzA/vlPf/Zo8=",
        "Expiration": "2021-03-01T14:22:23+13:00"
    }
}


There are many ways to provide the credentials, but I chose the most straightforward method - setting them as environment variables. When the AWS APIs look for credentials, environment variables have the highest priority.


Set AWS_ACCESS_KEY_ID=<accessKeyId>

Set AWS_SECRET_ACCESS_KEY=<SecretKey>

Set AWS_SESSION_TOKEN=<Session Token>



Now, use the list command to list the contents of the s3 access point.


> aws s3 ls s3://<access point name>


                           PRE testdata/

2020-11-20 13:00:42       1011 test.csv

2020-11-20 13:00:42       2338 test - (Copy).csv


Download a file using cp.

aws s3 cp s3://<access point arn>/test.csv c:\temp


download: s3://<arn>/test.csv to ..\..\temp\test.csv