Tuesday, June 27, 2017

Securing Apache Solr - part II

This is the second post in a series of articles on securing Apache Solr. The first post looked at setting up a sample SolrCloud instance and securing access to it via Basic Authentication. In this post we will temporarily deviate from the concept of "securing Apache Solr", and instead look at how the Apache Ranger admin service can be configured to store audit information in Apache Solr.

1) Download and extract the Apache Ranger admin service

The first step is to download the source code, as well as the signature file and associated message digests (all available on the download page). Verify that the signature is valid and that the message digests match. Now extract and build the source, and copy the resulting admin archive to a location where you wish to install the UI:
  • tar zxvf apache-ranger-incubating-1.0.0.tar.gz
  • cd apache-ranger-incubating-1.0.0
  • mvn clean package assembly:assembly 
  • tar zxvf target/ranger-1.0.0-admin.tar.gz
  • mv ranger-1.0.0-admin ${rangerhome}
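The digest check mentioned above can be scripted. The following is a minimal sketch, assuming the `sha256sum` utility is available and that a SHA-256 digest is published for the release; the function name `verify_sha256` is made up for illustration, and the expected value has to be copied from the download page by hand.

```shell
# verify_sha256 FILE EXPECTED_HEX   (hypothetical helper, not part of Ranger)
# Succeeds (exit 0) only if the SHA-256 digest of FILE equals EXPECTED_HEX.
verify_sha256() {
  local file="$1" expected="$2" actual
  # sha256sum prints "<hex digest>  <file name>"; keep only the digest.
  actual="$(sha256sum "$file" | awk '{print $1}')"
  # An empty digest means the file could not be read.
  [ -n "$actual" ] || return 1
  # Compare case-insensitively, since published digests are sometimes upper-case.
  [ "$(printf '%s' "$actual" | tr 'A-F' 'a-f')" = "$(printf '%s' "$expected" | tr 'A-F' 'a-f')" ]
}

# Example (replace the placeholder with the published digest):
#   verify_sha256 apache-ranger-incubating-1.0.0.tar.gz "<digest from the download page>" \
#     && echo "digest OK"
# The signature itself is checked separately, e.g. with
#   gpg --verify <signature file> <archive>
```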
2) Install MySQL

The Apache Ranger Admin UI requires a database to keep track of users/groups as well as policies for various big data projects that you are securing via Ranger. For the purposes of this tutorial, we will use MySQL. Install MySQL in $SQL_HOME and start MySQL via:
  • sudo $SQL_HOME/bin/mysqld_safe --user=mysql
Now you need to log on as the root user and create two users for Ranger. We need a root user with admin privileges (let's call this user "admin") and a user for the Ranger Schema (we'll call this user "ranger"):
  • CREATE USER 'admin'@'localhost' IDENTIFIED BY 'password';
  • GRANT ALL PRIVILEGES ON *.* TO 'admin'@'localhost' WITH GRANT OPTION;
  • CREATE USER 'ranger'@'localhost' IDENTIFIED BY 'password';
  • FLUSH PRIVILEGES;
Finally, download the JDBC driver jar for MySQL and put it in ${rangerhome}.

3) Configure Apache Solr to support auditing from Ranger

Before installing the Apache Ranger admin service we will need to configure Apache Solr. The Apache Ranger admin service ships with a script to make this easier to configure. Edit 'contrib/solr_for_audit_setup/install.properties' with the following properties:
  • SOLR_USER/SOLR_GROUP - the user/group you are running solr as
  • SOLR_INSTALL_FOLDER - Where you extracted Solr, as per the first tutorial.
  • SOLR_RANGER_HOME - Where to install the Ranger configuration for Solr auditing.
  • SOLR_RANGER_PORT - The port to be used (8983 as per the first tutorial).
  • SOLR_DEPLOYMENT - solrcloud
  • SOLR_HOST_URL - http://localhost:8983
  • SOLR_ZK - localhost:2181
Make sure that the user running Solr has permission to write to the value configured for "SOLR_LOG_FOLDER" (/var/log/solr/ranger_audits). Now in 'contrib/solr_for_audit_setup' run 'sudo -E ./setup.sh'. The Solr configuration is now copied to $SOLR_RANGER_HOME.
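If you prefer to script the edits to 'install.properties' rather than make them by hand, something like the following could be used. It is only a sketch: `set_prop` is a hypothetical helper, and it assumes the file consists of plain `KEY=VALUE` lines with no spaces around the `=`.

```shell
# set_prop FILE KEY VALUE   (hypothetical helper, not shipped with Ranger)
# Replaces the line "KEY=..." in FILE, or appends "KEY=VALUE" if KEY is absent.
set_prop() {
  local file="$1" key="$2" value="$3" tmp status
  tmp="$(mktemp)" || return 1
  # KEY and VALUE are passed through the environment so that awk does not
  # interpret backslashes in them.
  KEY="$key" VALUE="$value" awk '
    BEGIN { k = ENVIRON["KEY"]; v = ENVIRON["VALUE"] }
    index($0, k "=") == 1 { print k "=" v; found = 1; next }
    { print }
    END { if (!found) print k "=" v }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Example, using the values from the list above:
#   f=contrib/solr_for_audit_setup/install.properties
#   set_prop "$f" SOLR_DEPLOYMENT solrcloud
#   set_prop "$f" SOLR_HOST_URL http://localhost:8983
#   set_prop "$f" SOLR_ZK localhost:2181
```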

4) Start Apache Zookeeper and SolrCloud

Before starting Apache Solr we will need to start Apache Zookeeper. Download Apache Zookeeper and start it on port 2181 (this step was not required in the previous tutorial, as we were launching SolrCloud with an embedded Zookeeper instance):
  • bin/zkServer.sh start
As per the first post, we want to secure access to SolrCloud via Basic Authentication (note that support for this in Apache Ranger was only fixed recently). So follow the steps in the first post to upload the security.json to Zookeeper via:
  • server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd putfile /security.json security.json
Start Solr as follows in the '${SOLR_RANGER_HOME}/ranger_audit_server/scripts' directory:
  • ./add_ranger_audits_conf_to_zk.sh 
  • ./start_solr.sh
Edit 'create_ranger_audits_collection.sh' and change 'curl --negotiate -u :' to 'curl -u "alice:SolrRocks"'. Save it and then run:
  • ./create_ranger_audits_collection.sh
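The manual edit to 'create_ranger_audits_collection.sh' described above could also be applied with sed. This is a sketch only: `use_basic_auth` is a hypothetical helper, and it assumes the script contains the literal text 'curl --negotiate -u :' exactly as quoted above.

```shell
# use_basic_auth SCRIPT USER PASSWORD   (hypothetical helper)
# Rewrites every 'curl --negotiate -u :' in SCRIPT to 'curl -u "USER:PASSWORD"',
# keeping a backup copy in SCRIPT.bak.
use_basic_auth() {
  local script="$1" creds="$2:$3"
  cp "$script" "$script.bak" || return 1
  # '|' is the sed delimiter; credentials containing '|', '&' or '\'
  # would need escaping first.
  sed "s|curl --negotiate -u :|curl -u \"$creds\"|g" "$script.bak" > "$script"
}

# Example:
#   use_basic_auth create_ranger_audits_collection.sh alice SolrRocks
```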
5) Install the Apache Ranger Admin UI

Edit ${rangerhome}/install.properties and make the following changes:
  • Change SQL_CONNECTOR_JAR to point to the MySQL JDBC driver jar that you downloaded above.
  • Set (db_root_user/db_root_password) to (admin/password)
  • Set (db_user/db_password) to (ranger/password)
  • audit_solr_urls: http://localhost:8983/solr/ranger_audits
  • audit_solr_user: alice
  • audit_solr_password: SolrRocks
  • audit_solr_zookeepers: localhost:2181
Now you can run the setup script via "sudo -E ./setup.sh". When this is done, start the Apache Ranger admin service via "sudo ranger-admin start".

6) Test that auditing is working correctly in the Ranger Admin service

Open a browser and navigate to "http://localhost:6080". Try to log on first using some made-up credentials. Then log in using "admin/admin". Click on the "Audit" tab and then select "Login Sessions". You should see both the incorrect and the correct login attempts, meaning that Ranger is successfully storing and retrieving audit information in Solr.
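As an additional check, the audit records can be read straight from Solr, bypassing the Ranger UI. The sketch below assumes the "ranger_audits" collection exposes Solr's usual "/select" handler and reuses the "alice" credentials from the first post; `recent_audits` is a made-up helper name.

```shell
# recent_audits [ROWS]   (hypothetical helper)
# Prints up to ROWS (default 5) documents from the ranger_audits collection.
recent_audits() {
  # -G sends the --data-urlencode parameters as a URL-encoded query string.
  curl -s -u "alice:SolrRocks" -G \
    --data-urlencode "q=*:*" \
    --data-urlencode "rows=${1:-5}" \
    "http://localhost:8983/solr/ranger_audits/select"
}

# Example:
#   recent_audits 10
```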


Monday, June 26, 2017

Securing Apache Solr - part I

This is the first post in a series of articles on securing Apache Solr. In this post we will look at deploying an example SolrCloud instance and securing access to it via basic authentication.

1) Install and deploy a SolrCloud example

Download and extract Apache Solr (6.6.0 was used for the purpose of this tutorial). Now start SolrCloud via:
  • bin/solr -e cloud
Accept all of the default options. This creates a cluster of two nodes, with a collection "gettingstarted" split into two shards and two replicas per shard. A web interface is available after startup at: http://localhost:8983/solr/.

Once the cluster is up and running we can post some data to the collection we have created via the REST interface:
  • curl http://localhost:8983/solr/gettingstarted/update -d '[ {"id" : "book1", "title_t" : "The Merchant of Venice", "author_s" : "William Shakespeare"}]'
  • curl http://localhost:8983/solr/gettingstarted/update -d '[ {"id" : "book2", "title_t" : "Macbeth", "author_s" : "William Shakespeare"}]'
  • curl http://localhost:8983/solr/gettingstarted/update -d '[ {"id" : "book3", "title_t" : "Death of a Salesman", "author_s" : "Arthur Miller"}]'
We can query the REST interface, for example to return all entries by William Shakespeare, as follows:
  • curl http://localhost:8983/solr/gettingstarted/query?q=author_s:William+Shakespeare
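In the query above, the "+" is decoded as a space, so Solr's standard query parser ties only "William" to the "author_s" field and searches for "Shakespeare" in the default field; whether that still finds the documents depends on the schema. A quoted phrase keeps both words tied to the field. The sketch below lets curl do the URL encoding; `solr_query` is a hypothetical helper name.

```shell
# solr_query COLLECTION QUERY   (hypothetical helper)
# Sends QUERY to the /query handler of COLLECTION, URL-encoding it on the way.
solr_query() {
  # -G turns the --data-urlencode parameter into a query string on a GET request.
  curl -s -G --data-urlencode "q=$2" "http://localhost:8983/solr/$1/query"
}

# Example: match the exact author name as a phrase.
#   solr_query gettingstarted 'author_s:"William Shakespeare"'
```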
2) Authenticating users to our SolrCloud instance

Now that our SolrCloud instance is up and running, let's look at how we can secure access to it, by using HTTP Basic Authentication to authenticate our REST requests. Download the following security configuration which enables Basic Authentication in Solr:
Two users are defined - "alice" and "bob" - both with password "SolrRocks". Now upload this configuration to the Apache Zookeeper instance that is running with Solr:
  • server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:9983 -cmd putfile /security.json security.json
Now try to run the search query above again using curl. A 401 error will be returned. Once we specify the correct credentials, the request will work as expected, e.g.:
  • curl -u alice:SolrRocks http://localhost:8983/solr/gettingstarted/query?q=author_s:Arthur+Miller
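For reference, curl's "-u" option sends the credentials in an "Authorization" request header, as the Base64 encoding of "user:password". The sketch below builds that header by hand to show what goes over the wire; `basic_auth_header` is a made-up helper, and the `base64` utility is assumed to be installed. Note that Base64 is an encoding, not encryption, so Basic Authentication should be combined with TLS outside of a local test setup.

```shell
# basic_auth_header USER PASSWORD   (hypothetical helper)
# Prints the HTTP Basic Authentication header for the given credentials.
basic_auth_header() {
  # tr strips the line wrapping that some base64 implementations add.
  printf 'Authorization: Basic %s\n' \
    "$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')"
}

# Example:
#   basic_auth_header alice SolrRocks
#   # prints: Authorization: Basic YWxpY2U6U29sclJvY2tz
#   curl -H "$(basic_auth_header alice SolrRocks)" \
#     "http://localhost:8983/solr/gettingstarted/query?q=author_s:Arthur+Miller"
```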

Thursday, June 22, 2017

SSO support for Apache Syncope REST services

Apache Syncope has recently added SSO support for its REST services in the 2.0.3 release. Previously, access to the REST services of Syncope was via HTTP Basic Authentication. From the 2.0.3 release, SSO support is available using JSON Web Tokens (JWT). In this post, we will look at how this works and how it can be configured.

1) Obtaining an SSO token from Apache Syncope

As stated above, in the past it was necessary to supply HTTP Basic Authentication credentials when invoking on the REST API. Let's look at an example using curl. Assume we have a running Apache Syncope instance with a user "alice" with password "ecila". We can make a GET request to the user self service via:
  • curl -u alice:ecila http://localhost:8080/syncope/rest/users/self
It may be inconvenient to supply user credentials on each request or the authentication process might not scale very well if we are authenticating the password to a backend resource. From Apache Syncope 2.0.3, we can instead get an SSO token by sending a POST request to "accessTokens/login" as follows:
  • curl -I -u alice:ecila -X POST http://localhost:8080/syncope/rest/accessTokens/login
The response contains two headers:
  • X-Syncope-Token: A JWT token signed according to the JSON Web Signature (JWS) spec.
  • X-Syncope-Token-Expire: The expiry date of the token
The token in question is signed using the (symmetric) "HS512" algorithm. It contains the subject "alice" and the issuer of the token ("ApacheSyncope"), as well as a random token identifier, and timestamps that indicate when the token was issued, when it expires, and when it should not be accepted before.
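To look at these claims yourself, the middle section of the token can be decoded on the command line. This is a sketch for inspection only: it does not verify the signature, `jwt_payload` is a hypothetical helper, and it assumes a `base64` utility that accepts "-d" for decoding.

```shell
# jwt_payload TOKEN   (hypothetical helper; does NOT verify the signature)
# Prints the JSON claims of a JWT of the form header.payload.signature.
jwt_payload() {
  local payload
  # Take the second dot-separated part and map base64url back to base64.
  payload="$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')"
  # JWTs drop the trailing '=' padding; restore it so base64 accepts the input.
  while [ $(( ${#payload} % 4 )) -ne 0 ]; do
    payload="${payload}="
  done
  printf '%s' "$payload" | base64 -d
}

# Example, with the value of the X-Syncope-Token header in $token:
#   jwt_payload "$token"
```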

The signing key and the issuer name can be changed by editing 'security.properties' and specifying new values for 'jwsKey' and 'jwtIssuer'. Please note that it is critical to change the signing key from the default value! It is also possible to change the signature algorithm from the next 2.0.4 release via a custom 'securityContext.xml' (see here). The default lifetime of the token (120 minutes) can be changed via the "jwt.lifetime.minutes" configuration property for the domain.

2) Using the SSO token to invoke on a REST service

Now that we have an SSO token, we can use it to invoke on a REST service instead of specifying our username and password as before. For Syncope 2.0.3 only, the token is sent in a request header with the same name as the response header above, "X-Syncope-Token". From Syncope 2.0.4 onwards, it is sent as "Authorization: Bearer <token>", e.g.:
  • curl -H "Authorization: Bearer eyJ0e..." http://localhost:8080/syncope/rest/users/self
The signature is first checked on the token, then the issuer is verified so that it matches what is configured, and then the expiry and not-before dates are checked. If the identifier matches that of a saved access token then authentication is successful.
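The login and the follow-up request can be chained in a script, so the token never has to be copied by hand. The following is a sketch under the same assumptions as the examples above (a local Syncope instance and the user "alice"); `extract_header` and `syncope_token` are hypothetical helper names.

```shell
# extract_header NAME   (hypothetical helper)
# Reads raw HTTP response headers on stdin and prints the value of header NAME.
extract_header() {
  awk -v name="$1" '
    {
      sub(/\r$/, "")                     # strip the CR of the CRLF line ending
      i = index($0, ":")
      # Header names are case-insensitive, so compare in lower case.
      if (i > 0 && tolower(substr($0, 1, i - 1)) == tolower(name)) {
        value = substr($0, i + 1)
        sub(/^[ \t]+/, "", value)        # drop the space after the colon
        print value
        exit
      }
    }'
}

# syncope_token USER PASSWORD   (hypothetical helper)
# Logs in with Basic Authentication and prints the returned SSO token.
syncope_token() {
  curl -s -I -u "$1:$2" -X POST \
    http://localhost:8080/syncope/rest/accessTokens/login \
    | extract_header X-Syncope-Token
}

# Example (Syncope 2.0.4 onwards):
#   token="$(syncope_token alice ecila)"
#   curl -H "Authorization: Bearer $token" http://localhost:8080/syncope/rest/users/self
```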

Finally, SSO tokens can be seen in the admin console under "Dashboard/Access Token", where they can be manually revoked by the admin user.


Monday, June 19, 2017

Querying Apache HBase using Talend Open Studio for Big Data

Recent blog posts have described how to set up authorization for Apache HBase using Apache Ranger. However, those posts only covered inputting and reading data using the HBase Shell. In this post, we will show how Talend Open Studio for Big Data can be used to read data stored in Apache HBase. This post is along the same lines as other recent tutorials on reading data from Kafka and HDFS.

1) HBase setup

Follow this tutorial on setting up Apache HBase in standalone mode, and creating a 'data' table with some sample values using the HBase Shell.

2) Download Talend Open Studio for Big Data and create a job

Now we will download Talend Open Studio for Big Data (6.4.0 was used for the purposes of this tutorial). Unzip the file when it is downloaded and then start the Studio using one of the platform-specific scripts. It will prompt you to download some additional dependencies and to accept the licenses. Click on "Create a new job" and call it "HBaseRead". In the search bar on the right-hand side, enter "hbase" and hit enter. Drag "tHBaseConnection" and "tHBaseInput" onto the palette, as well as "tLogRow".

"tHBaseConnection" is used to set up the connection to HBase, "tHBaseInput" uses the connection to read data from HBase, and "tLogRow" will log the data that was read so that we can see that the job ran successfully. Right-click on "tHBaseConnection" and select "Trigger/On Subjob Ok" and drag the resulting arrow to the "tHBaseInput" component. Now right-click on "tHBaseInput" and select "Row/Main" and drag the arrow to "tLogRow".

3) Configure the components

Now let's configure the individual components. Double-click on "tHBaseConnection" and select the distribution "Hortonworks" and version "HDP V2.5.0" (from an earlier tutorial we are using HBase 1.2.6). We are not using Kerberos here so we can skip the rest of the security configuration. Now double-click on "tHBaseInput". Select the "Use an existing connection" checkbox. Now hit "Edit Schema" and add two entries to map the column we created in the two different column families: "c1", which matches DB column "col1" of type String, and "c2", which also matches DB column "col1" of type String.


Select "data" for the table name back in tHBaseInput and add a mapping for "c1" to "colfam1", and "c2" to "colfam2".


Now we are ready to run the job. Click on the "Run" tab and then hit the "Run" button. You should see "val1" and "val2" appear in the console window.

Wednesday, June 14, 2017

Securing Apache HBase - part II

This is the second (and final for now) post in a short series of blog posts on securing Apache HBase. The first post looked at how to set up a standalone instance of HBase and how to authorize access to a table using Apache Ranger. In this post, we will look at how Apache Ranger can create "tag" based authorization policies for Apache HBase using Apache Atlas.

1) Start Apache Atlas and create entities/tags for HBase

First let's look at setting up Apache Atlas. Download the latest released version (0.8-incubating) and extract it. Build the distribution that contains an embedded HBase and Solr instance via:
  • mvn clean package -Pdist,embedded-hbase-solr -DskipTests
The distribution will then be available in 'distro/target/apache-atlas-0.8-incubating-bin'. To launch Atlas, we need to set some variables to tell it to use the local HBase and Solr instances:
  • export MANAGE_LOCAL_HBASE=true
  • export MANAGE_LOCAL_SOLR=true
Now let's start Apache Atlas with 'bin/atlas_start.py'. Open a browser and go to 'http://localhost:21000/', logging on with credentials 'admin/admin'. Click on "TAGS" and create a new tag called "customer_data". Now click on "Search" and then follow the "Create new entity" link of type "hbase_table" with the following parameters:
  • Name: data
  • QualifiedName: data@cl1
  • Uri: data
Now add the 'customer_data' tag to the entity that we have created.

2) Use the Apache Ranger TagSync service to import tags from Atlas into Ranger

To create tag based policies in Apache Ranger, we have to import the entity + tag we have created in Apache Atlas into Ranger via the Ranger TagSync service. After building Apache Ranger, extract the file called "target/ranger-<version>-tagsync.tar.gz". Edit 'install.properties' as follows:
  • Set TAG_SOURCE_ATLAS_ENABLED to "false"
  • Set TAG_SOURCE_ATLASREST_ENABLED to "true"
  • Set TAG_SOURCE_ATLASREST_DOWNLOAD_INTERVAL_IN_MILLIS to "60000" (just for testing purposes)
  • Specify "admin" for both TAG_SOURCE_ATLASREST_USERNAME and TAG_SOURCE_ATLASREST_PASSWORD
Save 'install.properties' and install the tagsync service via "sudo ./setup.sh". Start the Apache Ranger admin service via "sudo ranger-admin start" and then the tagsync service via "sudo ranger-tagsync-services.sh start".
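Before running the setup script it can be worth confirming that the properties really have the intended values. The sketch below is one way to do that; `get_prop` and `check_prop` are hypothetical helpers, and the parsing assumes simple `KEY = value` or `KEY=value` lines, with `#` starting a comment.

```shell
# get_prop FILE KEY   (hypothetical helper)
# Prints the value of the first "KEY = value" line in FILE, whitespace trimmed.
get_prop() {
  KEY="$2" awk '
    /^[ \t]*#/ { next }                          # skip comment lines
    {
      i = index($0, "=")
      if (i == 0) next                           # not a key/value line
      key = substr($0, 1, i - 1); val = substr($0, i + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", key)
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      if (key == ENVIRON["KEY"]) { print val; exit }
    }' "$1"
}

# check_prop FILE KEY EXPECTED   (hypothetical helper)
# Succeeds only if KEY is set to EXPECTED; otherwise reports the mismatch.
check_prop() {
  local actual
  actual="$(get_prop "$1" "$2")"
  if [ "$actual" = "$3" ]; then
    return 0
  fi
  echo "$2 is '$actual', expected '$3'" >&2
  return 1
}

# Example:
#   check_prop install.properties TAG_SOURCE_ATLAS_ENABLED false &&
#   check_prop install.properties TAG_SOURCE_ATLASREST_ENABLED true &&
#   sudo ./setup.sh
```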

3) Create Tag-based authorization policies in Apache Ranger

Now let's create a tag-based authorization policy in the Apache Ranger admin UI. Click on "Access Manager" and then "Tag based policies". Create a new Tag service called "HBaseTagService". Create a new policy for this service called "CustomerDataPolicy". In the "TAG" field enter a "c" and the "customer_data" tag should pop up, meaning that it was successfully synced in from Apache Atlas. Create an "Allow" condition for the user "bob" with the "Read" permission for the "HBase" component.

We also need to go back to the Resource based policies, edit "cl1_hbase" and select the tag service we have created above. Now we are ready to test the authorization policy we have created with HBase. Start the shell as "bob" and we should be able to read the table we created in the first tutorial:
  • sudo -E -u bob bin/hbase shell
  • scan 'data'

Tuesday, June 13, 2017

Securing Apache HBase - part I

This is the first in a short series of blog posts on securing Apache HBase. HBase is a column-based database that facilitates random read/write access to data stored in the Hadoop FileSystem (HDFS). In this post we will focus on setting up a standalone instance of Apache HBase, and then demonstrate how to use Apache Ranger to authorize access to an HBase table.

1) Install Apache HBase

Download Apache HBase (version 1.2.6 was used for the purposes of this tutorial) and extract it. As stated above, we will set up a standalone version of HBase, which means that HBase itself and Apache Zookeeper run in a single JVM, and data is stored in the local filesystem instead of HDFS. Normally we would authenticate users via Kerberos, but as we are just running HBase in standalone mode, we will focus solely on authorization in this series of tutorials. Start HBase via:
  • bin/start-hbase.sh
Then start the shell and create a sample table called "data", with two column families, and add some rows to the table:
  • bin/hbase shell
  • create 'data', 'colfam1', 'colfam2'
  • put 'data', 'row1', 'colfam1:col1', 'val1'
  • put 'data', 'row1', 'colfam2:col1', 'val2'
  • scan 'data'
The last command will print out the values stored in the table. Next we will look at using Apache Ranger to restrict access to the 'data' table to authorized users only.

2) Install the Apache Ranger HBase plugin 

Download Apache Ranger and verify that the signature is valid and that the message digests match. Extract and build the source, and copy the resulting plugin to a location where you will configure and install it, e.g.:
  • mvn clean package assembly:assembly -DskipTests
  • tar zxvf target/ranger-1.0.0-SNAPSHOT-hbase-plugin.tar.gz
  • mv ranger-1.0.0-SNAPSHOT-hbase-plugin ${ranger.hbase.home}
Now go to ${ranger.hbase.home} and edit "install.properties". You need to specify the following properties:
  • POLICY_MGR_URL: Set this to "http://localhost:6080"
  • REPOSITORY_NAME: Set this to "cl1_hbase".
  • COMPONENT_INSTALL_DIR_NAME: The location of your Apache HBase installation
Save "install.properties" and install the plugin as root via "sudo ./enable-hbase-plugin.sh". The Apache Ranger HBase plugin should now be successfully installed. By default, the Ranger plugin will try to store policies in "/etc/ranger/cl1_hbase/policycache". As we installed the plugin as "root", make sure that this directory is accessible to the user that is running HBase.

3) Configure authorization policies in the Apache Ranger Admin UI 

The next step is to create some authorization policies for Apache HBase in the Apache Ranger admin service. Please refer to this blog post for information on how to install the Apache Ranger admin service. Assuming the admin service is already installed, start it via "sudo ranger-admin start". Open a browser and log on to "localhost:6080" with the credentials "admin/admin".

Create a new HBase service, adding the following configuration items to the default values:
  • Service Name: cl1_hbase
  • Username/Password: admin
  • hbase.zookeeper.quorum: localhost
Click on "Test Connection" (if HBase is running) to verify that the connection is successful (note: only works from 1.0.0 onwards - see RANGER-1640) and then save the service. Click on "cl1_hbase" and edit the default policy which has been created, and add the user running HBase to the "Allow Condition" permission.

Now we will add a new authorization policy to test access to HBase. Under "Settings + Users/Groups" add two new users called "alice" and "bob", and also create these new users in your local system. Now we can create a new authorization policy to grant "alice" the "Read" permission for the "data" table (all column families and columns).



4) Testing authorization in HBase

The policy we have created above will get downloaded and enforced by the Ranger HBase plugin we installed into HBase. Restart HBase before proceeding further (if it was running with the Ranger plugin before downloading the policy which granted the user running HBase "admin" privileges, then HBase might not be working properly). Now start the shell as "alice" and try to read the table we created earlier:
  • sudo -E -u alice bin/hbase shell
  • scan 'data'
This should work due to the authorization policy we created. However, "alice" should not be allowed to write to "data", e.g. the following should result in an "AccessDeniedException":
  • put 'data', 'row1', 'colfam1:col1', 'val3'

Tuesday, June 6, 2017

Securing Apache Storm - part IV

This is the fourth and final post in a series of blog posts on securing Apache Storm. The first post looked at setting up a simple Storm cluster that authenticates users via Kerberos, and deploying a topology. The second post looked at deploying the Storm UI using Kerberos, and accessing it via a REST client. The third post looked at how to use Apache Ranger to authorize access to Apache Storm. In this post, we will look at how Apache Ranger can create "tag" based authorization policies for Apache Storm using Apache Atlas.

1) Start Apache Atlas and create entities/tags for Storm

First let's look at setting up Apache Atlas. Download the latest released version (0.8-incubating) and extract it. Build the distribution that contains an embedded HBase and Solr instance via:
  • mvn clean package -Pdist,embedded-hbase-solr -DskipTests
The distribution will then be available in 'distro/target/apache-atlas-0.8-incubating-bin'. To launch Atlas, we need to set some variables to tell it to use the local HBase and Solr instances:
  • export MANAGE_LOCAL_HBASE=true
  • export MANAGE_LOCAL_SOLR=true
Now let's start Apache Atlas with 'bin/atlas_start.py'. Open a browser and go to 'http://localhost:21000/', logging on with credentials 'admin/admin'. Click on "TAGS" and create a new tag called "user_topologies". Unlike for HDFS or Kafka, Atlas doesn't provide an easy way to create a Storm Entity in the UI. Instead we can use the following JSON file to create a Storm Entity for "*" topologies:

You can upload it to Atlas via:
  • curl -v -H 'Accept: application/json, text/plain, */*' -H 'Content-Type: application/json;  charset=UTF-8' -u admin:admin -d @storm-create.json http://localhost:21000/api/atlas/entities
Once the new entity has been uploaded, you can search for it in the Atlas UI, then click on "+" beside "Tags" and associate the new entity with the "user_topologies" tag.

2) Use the Apache Ranger TagSync service to import tags from Atlas into Ranger

To create tag based policies in Apache Ranger, we have to import the entity + tag we have created in Apache Atlas into Ranger via the Ranger TagSync service. After building Apache Ranger, extract the file called "target/ranger-<version>-tagsync.tar.gz". Edit 'install.properties' as follows:
  • Set TAG_SOURCE_ATLAS_ENABLED to "false"
  • Set TAG_SOURCE_ATLASREST_ENABLED to "true"
  • Set TAG_SOURCE_ATLASREST_DOWNLOAD_INTERVAL_IN_MILLIS to "60000" (just for testing purposes)
  • Specify "admin" for both TAG_SOURCE_ATLASREST_USERNAME and TAG_SOURCE_ATLASREST_PASSWORD
Save 'install.properties' and install the tagsync service via "sudo ./setup.sh". Start the Apache Ranger admin service via "sudo ranger-admin start" and then the tagsync service via "sudo ranger-tagsync-services.sh start".

3) Create Tag-based authorization policies in Apache Ranger

Now let's create a tag-based authorization policy in the Apache Ranger admin UI. Click on "Access Manager" and then "Tag based policies". Create a new Tag service called "StormTagService". Create a new policy for this service called "UserTopologiesPolicy". In the "TAG" field enter a "u" and the "user_topologies" tag should pop up, meaning that it was successfully synced in from Apache Atlas. Create an "Allow" condition for the user "alice" with all of the component permissions for "Storm".


We also need to go back to the Resource based policies, edit "cl1_storm" and select the tag service we have created above. Finally, edit the existing "cl1_storm" policy created as part of the previous tutorials, and remove the permissions for "alice" there, so that we can be sure that authorization is working correctly. Then follow the first tutorial and verify that "alice" is authorized to deploy a topology as per the tag-based authorization policy we have created in Ranger.