Monday, December 30, 2013

Workflow Extensions with WSO2 AM 1.6.0

This has been a major requirement raised many times by WSO2 API Manager users, and it was finally introduced with the latest WSO2 API Manager 1.6.0 release: the ability to plug workflow extensions into the major functionalities of API Manager. As the first cut, we have introduced workflow extension support for the three major functionalities below:
  • User Signup
  • API Subscription 
  • Application Creation
By default, the APIM binary pack is shipped with a Java workflow executor and a web service executor to trigger workflow processes for the above three functions, and the executor to use can be configured via the api-manager.xml file. If you wish to add your own workflow executor for any of these three processes, that is also possible. All you have to do is write your own custom executor and integrate it into the WSO2 APIM pack via a configuration change in the api-manager.xml file. For more information, please refer to
http://docs.wso2.org/display/AM160/Adding+Workflow+Extensions.

In addition to providing the ability to plug workflow executors into the above three functions, with this release we are shipping three BPELs and three human tasks to use with the default web service executors mentioned above. To try them, download WSO2 AM 1.6.0, navigate to {AM_Home}/business-processes and refer to the readme.txt there. To deploy the provided BPELs and human tasks you need to download WSO2 BPS [which provides a complete web-based graphical console to deploy, manage and view business processes]. Generally, the flow of usage of each of these BPELs is as follows:
  1. A user triggers one of the above functional requests [e.g. a user tries to sign up to the APIStore] from the APIStore UI, and that request remains in pending status until it completes its workflow process.

  2. api-manager.xml needs to be configured with the default web service workflow executor for each of the above functions, and the ws endpoints need to be given as the endpoints of the BPELs deployed in WSO2 BPS. Then, once a workflow request is raised from the APIStore UI as in 1), the configured ws workflow executor is executed and triggers a business process instance on the WSO2 BPS side based on the deployed BPELs.
    Deployed BPELs in WSO2 BPS


    Created business process instances in WSO2 BPS
  3. From that business process instance a human task is also created, and to continue the process, a permitted user has to approve or reject the triggered task request.
  4. Once the permitted user approves or rejects the triggered workflow request [e.g. a user signup request], that response is passed to the APIM workflow callback endpoint and the APIM database tables are updated according to the workflow response.
You can get a better idea of the usage of the above BPELs if you follow the readme.txt mentioned above, inside the APIM binary pack.
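
To make the request lifecycle in steps 1 to 4 concrete, here is a minimal Python sketch of it. It is not the WSO2 executor API or the APIM database schema; the class names, status names and reference strings are all hypothetical and only illustrate "pending until a human approves or rejects, then updated through a callback".

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "PENDING"      # submitted, waiting for a human decision
    APPROVED = "APPROVED"
    REJECTED = "REJECTED"


@dataclass
class WorkflowRequest:
    """Hypothetical stand-in for a signup/subscription/application request."""
    reference: str
    kind: str
    status: Status = Status.PENDING


class InMemoryWorkflowStore:
    """Plays the role of the tables that the callback updates (illustrative only)."""

    def __init__(self):
        self._requests = {}

    def submit(self, reference, kind):
        # Steps 1 and 2: the request is recorded as pending and a process would be triggered.
        request = WorkflowRequest(reference, kind)
        self._requests[reference] = request
        return request

    def pending(self):
        # Step 3: what an approval UI would list for permitted users.
        return [r for r in self._requests.values() if r.status is Status.PENDING]

    def callback(self, reference, approved):
        # Step 4: the human decision comes back and the stored status is updated.
        request = self._requests[reference]
        if request.status is not Status.PENDING:
            raise ValueError(f"{reference} is already {request.status.value}")
        request.status = Status.APPROVED if approved else Status.REJECTED
        return request
```

A request can be completed only once in this sketch; a second callback for the same reference raises an error instead of silently overwriting the earlier decision.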

For the 3rd step in the above BPEL flow, we have introduced a new web application UI called 'workflow-admin' [https://ip:port/workflow-admin] in the APIM binary pack. This web application lists the pending tasks that need to be approved or rejected by the permitted user groups [admin users]. A permitted user can log in to this web application, view the pending tasks, assign one to himself/herself, approve or reject the request, and so complete the task that needed human interaction.


NOTE: By default the workflow-admin web application allows only users with the admin role to log in, because the three human tasks shipped by default with APIM are written to allow only users with the 'admin' role to approve or reject the created human tasks.

Workflow-admin webapp UI




Saturday, December 28, 2013

Digging into WSO2 BAM


In this blog post, I'm going to explain how to set up a WSO2 BAM 2.3.0 clustered setup that gets data from mediation agents [WSO2 ESB] and service agents [WSO2 AS].

Components of the setup

1. ESB Cluster
2. AS Cluster
3. BAM Cluster
    --- Two DR [Data Receiver] nodes
    --- Four Cassandra nodes
    --- Two DA [Data Analyzer] nodes
    --- Two ZooKeeper nodes
    --- One dashboard node

WSO2 BAM is used to aggregate, analyze and visualize the data events coming from different agents.
By default WSO2 BAM contains data agents for
--Collecting mediation statistics
--Collecting service statistics
and more. For more information, please refer to [1].

Once the different data agents send their stat events to BAM, that raw data is first stored in the No-SQL Cassandra data store integrated with BAM.

Note - In WSO2 BAM, the default primary data store is No-SQL Cassandra and the secondary data store is an H2-based RDBMS database. The secondary database can be changed to any other RDBMS type or to a Cassandra database. Cassandra is kept as the primary data store because, in a real-world use case, a very large volume of raw statistics data comes to BAM from the different data agents. Cassandra was chosen for its horizontal scalability and distributed storage; in other words, we can have a large number of cluster nodes and write to them in parallel.
Then, through Hive scripts, map-reduce jobs are scheduled [one-time or periodically] on the underlying Hadoop file system, the data is analyzed, and the analyzed data is moved to a relational database.
Then, by querying this relational database, the analyzed data is visualized as gadgets/rendered HTML pages using the WSO2 inbuilt dashboard capability. So in a company, the managerial level can use this visualized data to analyze and make decisions on their business-related data.
Due to the flexible, componentized architecture of WSO2 BAM, the same BAM node can be scaled to act as different components.
For example, a single BAM node consists of the following components:
  • Data-Receiver
  • Data-Analyzer
  • Internal Cassandra
  • Internal Hadoop
  • Internal zoo-keeper

If an organization wants a BAM node to function only as a Data-Receiver, that can easily be achieved through configuration. The same applies to the other inbuilt BAM components.
Additionally, BAM can be set up with an external Cassandra store, an external Hadoop cluster or an external ZooKeeper cluster instead of the internal embedded ones.

In our setup,we have used BAM to collect mediation statistics and service statistics.
First we'll look into the data flow of the setup. The diagram below shows the high-level architecture of the setup.


  • Setting ESB & AS Data Agents
First we need to enable BAM statistics in ESB and AS. Then we have to enable the mediation agent and the service agent in each. For details, please refer to the respective product documentation.


     In our case, since we have set up two DR nodes to receive BAM stat events in a load-balancing manner, we need to set the ESB and AS data agents up as load-balancing data agents and add the URLs of the two BAM receiver nodes to them. To see how to do that, please refer to
http://docs.wso2.org/display/BAM230/Setting+up+Multi+Receiver+and+Load+Balancing++Data+Agent
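
As a rough illustration of what "load balancing with failover" across two receivers means, here is a small Python sketch. It is not the WSO2 data agent implementation or its configuration syntax; the class, the `send` callback and the receiver URLs are hypothetical. Events are sent round-robin, and if the chosen receiver is unreachable the next one is tried.

```python
import itertools


class LoadBalancingPublisher:
    """Round-robin publisher that falls back to the next receiver on failure."""

    def __init__(self, receivers, send):
        if not receivers:
            raise ValueError("at least one receiver URL is required")
        self._receivers = list(receivers)
        self._send = send  # callable(url, event); raises ConnectionError when the node is down
        self._next = itertools.cycle(range(len(self._receivers)))

    def publish(self, event):
        start = next(self._next)
        # Try every receiver once, beginning with the one whose turn it is.
        for step in range(len(self._receivers)):
            url = self._receivers[(start + step) % len(self._receivers)]
            try:
                self._send(url, event)
                return url  # report which receiver accepted the event
            except ConnectionError:
                continue
        raise ConnectionError("no data receiver is reachable")
```

With two receivers, events alternate between them while both are up; when one is down, every event ends up on the remaining node, and publishing fails only if both are unreachable.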
  • Setting BAM Data Receiver Nodes
Then the BAM Data Receiver [DR] nodes need to be configured. By setting up multiple BAM receiver nodes, the data events from ESB/AS are received by those BAM nodes in a highly available manner. If one DR node is down, the other node acts as the data receiver. Once data events are received by these BAM nodes, they need to be sent to the primary Cassandra storage to store the raw data.
As configuration changes, we need to point these nodes to the Cassandra cluster and define the read/write consistency levels of the data receivers that write data to Cassandra.
Pointing the DR nodes to the Cassandra cluster can be done by modifying cassandra-component.xml, found in {BAM_Home}/repository/conf, as below.
<Cassandra>
<Cluster>
    <Name>Test Cluster</Name>
    <Nodes>cass_node1_ip:9160,cass_node2_ip:9160,cass_node3_ip:9160</Nodes>
    <DefaultPort>9160</DefaultPort>
    <AutoDiscovery disable="false" delay="1000" />
</Cluster>
</Cassandra>

The read/write consistency levels that the data receivers use when writing to the Cassandra cluster can be changed in streamdefn.xml, found in {BAM_Home}/repository/conf/advanced.
For example:

<StreamDefinition>
    <ReplicationFactor>3</ReplicationFactor>
    <ReadConsistencyLevel>QUORUM</ReadConsistencyLevel>
    <WriteConsistencyLevel>QUORUM</WriteConsistencyLevel>
    <StrategyClass>org.apache.cassandra.locator.SimpleStrategy</StrategyClass>
</StreamDefinition>


In the above configuration, WriteConsistencyLevel and ReadConsistencyLevel are set to QUORUM = '(replication_factor / 2) + 1' = '3/2 + 1' = 2 [using integer division]. This means at least 2 of the 3 Cassandra nodes holding a replica must be up and running for a write to succeed.
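
The quorum arithmetic can be checked with a couple of lines of Python. This is only a worked example of the formula above; the function names are made up for illustration and are not part of any Cassandra or BAM API.

```python
def quorum(replication_factor):
    """Number of replicas that must respond for a QUORUM read or write."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1  # integer division, as in the formula above


def tolerated_replica_failures(replication_factor):
    """How many replicas may be down while QUORUM operations still succeed."""
    return replication_factor - quorum(replication_factor)


for rf in (1, 3, 5):
    print(rf, quorum(rf), tolerated_replica_failures(rf))
```

For the replication factor of 3 used here, the quorum is 2, so QUORUM reads and writes keep working with one replica node down.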

Hence we need to decide the failure tolerance level of the system and plan the WriteConsistencyLevel and ReadConsistencyLevel accordingly. For example, to keep working when only 1 replica node is reachable, the level can be specified as 'ONE' [or 'ANY', which applies to writes only]. Please refer to http://www.datastax.com/docs/1.1/dml/data_consistency

Additionally, we need to change the default user store of each node to a common user store by configuring user-mgt.xml in {BAM_Home}/repository/conf.

Since these BAM DR nodes will not use the BAM data analyzing feature or the inbuilt Cassandra support:

-- The BAM Tool Box Deployer feature can be removed using the feature manager

-- Start the BAM nodes without starting the Cassandra server bundled with them, by giving the property below with the server startup command.

sh wso2server.sh -Ddisable.cassandra.server.startup=true


  • Setting BAM Cassandra Cluster
Next, the BAM Cassandra cluster needs to be configured. As the Cassandra cluster, you can either set up an external Cassandra cluster or use BAM nodes with their inbuilt Cassandra support. In this setup, we have used four BAM nodes with their inbuilt Cassandra support as the Cassandra cluster. In each of these BAM nodes, you have to change the cassandra.yaml file, found in {BAM_Home}/repository/conf/etc.

Basically, we need to change the following configurations in the cassandra.yaml file.
--cluster_name - Change to a common name
--listen_address - Hostname of each BAM Cassandra node
--seeds - Hostnames of the seed nodes in the Cassandra cluster. Here we have set only one BAM node as the seed node.
--rpc_address - Hostname of each BAM Cassandra node
--rpc_port - A common port value across the BAM Cassandra nodes. Default value is 9160.
--storage_port - Default value is 7000. Should be common across the BAM Cassandra nodes.

NOTE: If you have set up the BAM Cassandra nodes with port-offset values, then you have to add two additional system properties to the server startup command, as below, to connect all the nodes to one Cassandra cluster.

-Dcassandra.rpc.port=default_port[9160]+offset
-Dcassandra.storage.port=default_port[7000]+offset

With the above system properties, the Cassandra rpc and storage ports are set to common values that include the offset. Please note that the same port values must also be defined in the cassandra.yaml file.
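
As a small worked example of the port arithmetic, the Python snippet below builds the two startup properties for a given offset. The helper name is hypothetical; only the property names and the default ports 9160 and 7000 come from the text above.

```python
DEFAULT_RPC_PORT = 9160      # default rpc_port
DEFAULT_STORAGE_PORT = 7000  # default storage_port


def cassandra_startup_properties(offset):
    """Return the two -D properties for a node started with the given port offset."""
    if offset < 0:
        raise ValueError("port offset must not be negative")
    return [
        f"-Dcassandra.rpc.port={DEFAULT_RPC_PORT + offset}",
        f"-Dcassandra.storage.port={DEFAULT_STORAGE_PORT + offset}",
    ]


print(" ".join(cassandra_startup_properties(1)))
```

For an offset of 1 this gives rpc port 9161 and storage port 7001, which are the values that would then also go into rpc_port and storage_port in cassandra.yaml.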

The next important thing to check is whether the Cassandra cluster was created successfully and whether the nodes have joined it.
For that, we used nodetool, which is shipped with apache-cassandra 1.1.3. First we downloaded apache-cassandra 1.1.3, unzipped it and executed the nodetool command below.

 ./nodetool -u admin -pw admin -host -p ring

The above command lists all the nodes connected to that Cassandra cluster.
Additionally, we need to change the default user store of each node to a common user store by configuring user-mgt.xml in {BAM_Home}/repository/conf.
  • Setting BAM Data Analyzer Nodes
The next step is to configure the BAM data analyzer nodes, which analyze the raw data stored in the primary Cassandra storage and put the results into a separate secondary storage. For that, BAM provides Hive script support: Hive scripts schedule tasks on the local inbuilt Hadoop system of the BAM nodes and process them by analyzing the raw data.
To collect analyzed statistics of ESB mediation data and AS service data, BAM itself has predefined Hive scripts written to run the analytics jobs on Hadoop.
These Hive scripts are included in the BAM binary pack in a deployable artifact type called a BAM toolbox.
In our deployment, we have kept two BAM nodes as DA nodes: one acts in read-write mode with the ability to deploy toolbox artifacts, while the other is in read-only mode with the BAM toolbox deployment feature disabled. In addition, we have used an external ZooKeeper cluster to schedule Hadoop jobs in a highly available manner. The steps we followed can be found in the section "Configuring data analyzer cluster" of the WSO2 BAM documentation.

Once you have configured the two nodes and the ZooKeeper cluster, check whether the DA nodes provide high availability: first deploy the relevant analytics scripts as toolboxes and enable them to run as scheduled tasks. Then bring one DA node down and check whether the scheduled task is still triggered properly on the second DA node.

In this setup, we have not configured an external Hadoop cluster and have used the BAM inbuilt Hadoop support. The reason is that the resources allocated to us were limited and the data in our setup does not need to be analyzed very frequently. Thus, if one DA server goes down and the analysis of a raw data entry fails the first time, that entry is still eligible to be analyzed at the next task execution; and since the data does not need to be analyzed frequently, we used the BAM inbuilt Hadoop support as it is.
The advantage of an external Hadoop cluster is the possible performance increase. Basically, this affects the execution of a single Hive analytics operation. If the Hive operation is expensive, its execution can be made faster by splitting it among multiple Hadoop nodes, but in the above setup it always executes on the local node. So if the execution of individual jobs needs to scale, an external Hadoop cluster can be added, and nodes added to it, to make the operations finish earlier. If the individual Hive operations are not that large and do not run for long, going with the internal Hadoop of each BAM node is fine.

Since these nodes will not use the inbuilt Cassandra support:
-- Start the BAM nodes without starting the Cassandra server bundled with them, by giving the property below with the server startup command.

sh wso2server.sh -Ddisable.cassandra.server.startup=true

  • Setting BAM Dashboard Node
This node only shows the presentation dashboard with the collected analyzed statistics. So you can edit the toolboxes to include only the presentation part and deploy them on this node. Then modify master-datasources.xml in {BAM_Home}/repository/conf/data-sources to add the secondary relational database storage, which contains the analyzed statistics and is queried to visualize them on the dashboard.
Since this node will not use the inbuilt Cassandra support:
-- Start the BAM node without starting the Cassandra server bundled with it, by giving the property below with the server startup command.

sh wso2server.sh -Ddisable.cassandra.server.startup=true


Also disable data analyzing on this BAM node. To do that, remove the analyzer-based features from the server using the feature manager.