
Digging into WSO2 BAM


In this blog post, I'm going to explain how to set up a WSO2 BAM 2.3.0 clustered deployment that receives data from mediation agents [WSO2 ESB] and service agents [WSO2 AS].

Components of the setup

1. ESB Cluster
2. AS Cluster
3. BAM Cluster
    --- Two Data Receiver (DR) nodes
    --- Four Cassandra nodes
    --- Two Data Analyzer (DA) nodes
    --- Two ZooKeeper nodes
    --- One dashboard node

WSO2 BAM is used to aggregate, analyze and visualize the data events coming from different agents.
By default, WSO2 BAM contains data agents for
--Collecting mediation statistics
--Collecting service statistics
and more. For more information, please refer to [1].

Once the different data agents send their stat events to the BAM side, the raw data is first stored in BAM's integrated NoSQL Cassandra data store.

Note - In WSO2 BAM, the primary default data store is the NoSQL Cassandra store and the secondary data store is an H2-based RDBMS database. The secondary database can be changed to any other RDBMS database type or to a Cassandra database. The reason for keeping Cassandra as the primary data store is that, in a real-world use case, a very large volume of raw statistics data arrives at BAM from different data agents. Cassandra has been chosen because it provides horizontal scalability and distributed storage; in other words, we can have a large number of cluster nodes and write to them in parallel.
Then, through Hive scripts, map-reduce jobs are scheduled [either one-time or periodically] on the underlying Hadoop file system; the raw data is analyzed and the analyzed data is moved to a relational database.
Then, by querying this relational database, the analyzed data is visualized as gadgets/rendered HTML pages using the inbuilt WSO2 dashboard capability. So, in a company, the managerial level can use this visualized data to analyze and make decisions about their business-related data.
Due to the flexible and componentized architecture of WSO2 BAM, the same BAM node can be scaled to act as different components.
For example, a single BAM node consists of the following components:
  • Data-Receiver
  • Data-Analyzer
  • Internal Cassandra
  • Internal Hadoop
  • Internal ZooKeeper

If an organization wants a BAM node to function only as a data receiver, that can be achieved easily through configuration. The same applies to the other inbuilt BAM components as well.
Additionally, BAM can be set up with an external Cassandra store, an external Hadoop cluster or an external ZooKeeper cluster instead of using the internal embedded ones.

In our setup, we have used BAM to collect mediation statistics and service statistics.
First, we'll look into the data flow of the setup. The diagram below shows the high-level architecture of the setup.


  • Setting ESB & AS Data Agents
First, we need to enable BAM statistics in ESB and AS. Then we have to enable the mediation agent and the service agent in each of them. For those steps, please refer to;


     In our case, we have set up two DR nodes to receive BAM stat events in a load-balancing manner. Therefore, we need to configure the ESB and AS data agents as load-balancing data agents and add the URLs of the two BAM receiver nodes, in load-balancing form, on the ESB and AS data agent side. To see how to do that, please refer to
http://docs.wso2.org/display/BAM230/Setting+up+Multi+Receiver+and+Load+Balancing++Data+Agent
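
As a hedged illustration (the exact URL syntax and configuration fields are described in the document linked above; the host names below are placeholders and the ports are the BAM defaults), the receiver URL configured on the ESB/AS data agent side would list both DR nodes so that events are distributed between them:

tcp://bam_dr_node1_ip:7611,tcp://bam_dr_node2_ip:7611

with the corresponding authenticator URLs pointing to the same two nodes:

ssl://bam_dr_node1_ip:7711,ssl://bam_dr_node2_ip:7711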
  • Setting BAM Data Receiver Nodes
Next, we need to configure the BAM Data Receiver [DR] nodes. By setting up multiple BAM receiver nodes, the data events from ESB/AS are received by those BAM nodes in a highly available manner: if one DR node is down, the other node acts as the data receiver. Once data events are received by these BAM nodes, they need to be sent to the primary Cassandra storage, where the raw data is stored.
As for the configuration changes, we need to point these nodes to the Cassandra cluster and define the read/write consistency levels that the data receivers use when writing data to Cassandra.
Pointing the DR nodes to the Cassandra cluster can be done by modifying cassandra-component.xml, which can be found in {BAM_Home}/repository/conf, as below.
<Cassandra>
<Cluster>
    <Name>Test Cluster</Name>
    <Nodes>cass_node1_ip:9160,cass_node2_ip:9160,cass_node3_ip:9160</Nodes>
    <DefaultPort>9160</DefaultPort>
    <AutoDiscovery disable="false" delay="1000" />
</Cluster>
</Cassandra>

The read/write consistency levels used by the data receivers when writing data to the Cassandra cluster can be changed in streamdefn.xml, which can be found in {BAM_Home}/repository/conf/advanced.
For example;

<StreamDefinition>
    <ReplicationFactor>3</ReplicationFactor>
    <ReadConsistencyLevel>QUORUM</ReadConsistencyLevel>
    <WriteConsistencyLevel>QUORUM</WriteConsistencyLevel>
    <StrategyClass>org.apache.cassandra.locator.SimpleStrategy</StrategyClass>
</StreamDefinition>


In the above configuration, WriteConsistency and ReadConsistency have been set to QUORUM = (replication_factor / 2) + 1 = (3 / 2) + 1 = 2. Therefore, at least 2 Cassandra nodes must be up and running for a write to succeed.

Hence we need to decide what level of node failure the system should tolerate and choose the WriteConsistency and ReadConsistency accordingly. With a replication factor of 3, QUORUM already tolerates one node being down; to tolerate more, the level can be relaxed to 'ONE' or 'ANY', at the cost of weaker consistency guarantees. Please refer to http://www.datastax.com/docs/1.1/dml/data_consistency
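
For instance, keeping the replication factor at 3 but relaxing both levels to ONE, the same streamdefn.xml entry would look like this (shown purely as a variation of the earlier example):

<StreamDefinition>
    <ReplicationFactor>3</ReplicationFactor>
    <ReadConsistencyLevel>ONE</ReadConsistencyLevel>
    <WriteConsistencyLevel>ONE</WriteConsistencyLevel>
    <StrategyClass>org.apache.cassandra.locator.SimpleStrategy</StrategyClass>
</StreamDefinition>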

Additionally, we need to change the default user store of each of these nodes to a common user store by configuring user-mgt.xml in the {BAM_Home}/repository/conf location.
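
As a minimal sketch of that change (the datasource name jdbc/WSO2_UM_DB is a hypothetical example; it must match a datasource you define in master-datasources.xml that points to the shared user store database), the realm configuration in user-mgt.xml would reference the shared datasource like this:

<Realm>
    <Configuration>
        <!-- other realm properties remain unchanged -->
        <Property name="dataSource">jdbc/WSO2_UM_DB</Property>
    </Configuration>
    <!-- user store manager configuration remains unchanged -->
</Realm>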

Since these BAM DR nodes will not use the BAM data analyzing feature or the inbuilt Cassandra support;

-- The BAM Tool Box Deployer feature can be removed using the feature manager.

-- Start the BAM nodes without starting the Cassandra instance bundled with the server, by passing the below property with the server startup command.

sh wso2server.sh -Ddisable.cassandra.server.startup=true


  • Setting BAM Cassandra Cluster
Next, we need to configure the BAM Cassandra cluster. As the Cassandra cluster, you can either set up an external Cassandra cluster or use BAM nodes with their inbuilt Cassandra support. In this setup, we have used four BAM nodes with their inbuilt Cassandra support as the Cassandra cluster. In each of these BAM nodes, you have to change the cassandra.yaml file, which can be found in the {BAM_Home}/repository/conf/etc location.

Basically, we need to change the following configurations in the cassandra.yaml file (a sketch of these settings is shown after the list).
--cluster_name - Change to a common name across the cluster
--listen_address - Hostname of each BAM Cassandra node
--seeds - Hostnames of the seed nodes in the Cassandra cluster. Here we have set only one BAM node as the seed node.
--rpc_address - Hostname of each BAM Cassandra node
--rpc_port - A common port value across the Cassandra BAM nodes. The default value is 9160.
--storage_port - The default value is 7000. Should be common across the Cassandra BAM nodes.
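
A minimal sketch of the relevant part of cassandra.yaml for one node is shown below (the host names are placeholders, the cluster name matches the earlier cassandra-component.xml example, and the seed_provider block is the standard Cassandra 1.1 way of listing the seeds mentioned above):

cluster_name: 'Test Cluster'
listen_address: cass_node1_hostname
rpc_address: cass_node1_hostname
rpc_port: 9160
storage_port: 7000
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "cass_node1_hostname"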

NOTE: If you have set up the BAM Cassandra nodes with port-offset values, then you have to pass two additional system properties at server startup, as below, to connect all the nodes to one Cassandra cluster.

-Dcassandra.rpc.port=default_port[9160]+offset
-Dcassandra.storage.port=default_port[7000]+offset

With the above system properties, the Cassandra RPC and storage ports are set to common values that account for the offset. Please note that the same nodes defined above also need to be defined in the cassandra.yaml file.
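
For example, assuming a node started with a Carbon port offset of 1 (a hypothetical value), the startup command could look like:

sh wso2server.sh -DportOffset=1 -Dcassandra.rpc.port=9161 -Dcassandra.storage.port=7001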

The next major thing we need to verify is whether the Cassandra cluster has been created successfully and whether the nodes have joined it.
For that, we used the nodetool utility shipped with Apache Cassandra 1.1.3. First, download Apache Cassandra 1.1.3 from here, unzip it and execute the below nodetool command, supplying the host and JMX port of one of the Cassandra nodes.

 ./nodetool -u admin -pw admin -host <cassandra_node_host> -p <jmx_port> ring

The above command lists all the nodes connected to that Cassandra cluster.
Additionally, we need to change the default user store of each of these nodes to the common user store by configuring user-mgt.xml in the {BAM_Home}/repository/conf location, as described earlier.
  • Setting BAM Data Analyzer Nodes
The next step is to configure the BAM data analyzer nodes, which analyze the raw data stored in the Cassandra primary storage and put the results into a separate secondary storage. For that, BAM provides Hive script support: the Hive scripts are scheduled as tasks on the local inbuilt Hadoop system of the BAM nodes, and those tasks analyze the raw data.
To produce analyzed statistics from ESB mediation data and AS service data, BAM itself ships with predefined Hive scripts that run these analytics jobs on Hadoop.
These Hive scripts are included in the BAM binary pack in a deployable artifact type called a BAM toolbox; a rough sketch of what such a script looks like is shown below.
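
The following is only an illustrative, heavily simplified sketch, not one of the exact shipped scripts: the keyspace, column family, column mappings, table names and JDBC properties are assumptions modeled on the default BAM toolboxes. It maps raw events from Cassandra into a Hive table, summarizes them, and writes the summary to the relational secondary storage.

CREATE EXTERNAL TABLE IF NOT EXISTS ServiceStatsRaw
    (key STRING, service_name STRING, response_time BIGINT)
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH SERDEPROPERTIES (
    "cassandra.host" = "cass_node1_ip,cass_node2_ip",
    "cassandra.port" = "9160",
    "cassandra.ks.name" = "EVENT_KS",
    "cassandra.cf.name" = "service_stats_stream",
    "cassandra.columns.mapping" = ":key,payload_service_name,payload_response_time");

CREATE EXTERNAL TABLE IF NOT EXISTS ServiceStatsSummary
    (service_name STRING, avg_response_time DOUBLE, request_count INT)
STORED BY 'org.wso2.carbon.hadoop.hive.jdbc.storage.JDBCStorageHandler'
TBLPROPERTIES (
    'mapred.jdbc.driver.class' = 'com.mysql.jdbc.Driver',
    'mapred.jdbc.url' = 'jdbc:mysql://summary-db-host:3306/bam_summary_db',
    'mapred.jdbc.username' = 'bam_user',
    'mapred.jdbc.password' = 'bam_password',
    'hive.jdbc.update.on.duplicate' = 'true',
    'hive.jdbc.primary.key.fields' = 'service_name');

INSERT OVERWRITE TABLE ServiceStatsSummary
    SELECT service_name, avg(response_time), count(1)
    FROM ServiceStatsRaw
    GROUP BY service_name;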
In our deployment, we have kept two BAM nodes as DA nodes: one acts in read-write mode with the ability to deploy toolbox artifacts, while the other node is in read-only mode with the BAM toolbox deployment feature disabled. In addition, we have used an external ZooKeeper cluster to schedule the Hadoop jobs in a highly available manner. The steps on how we did that can be found in the section "Configuring data analyzer cluster" described in

Once you have configured the two DA nodes and the ZooKeeper cluster, check whether the DA nodes really provide high availability: first deploy the relevant analytics scripts as toolboxes and schedule them to run as periodic tasks, then shut down one DA node and check whether the scheduled task is properly triggered on the second DA node while the first one is down.

In this setup, we have not configured an external Hadoop cluster and have used the BAM inbuilt Hadoop support instead. The reason is that the resources allocated to us were limited and the data in our setup is not analyzed very frequently. Thus, if one DA server goes down and the analysis of a raw data entry fails the first time, that entry is still eligible to be analyzed at the next task execution; since the data does not need to be analyzed frequently, we used the BAM inbuilt Hadoop support as it is.
The advantage you get from having an external Hadoop cluster is a possible performance increase. Basically, this affects the execution of a single Hive analytics operation. If a Hive operation is expensive, its execution can be made faster by splitting the operation among multiple Hadoop nodes, whereas in the above setup it will always execute on the local node. So if you need to scale the execution of individual jobs, you can add an external Hadoop cluster and add nodes to it so that the operations finish executing earlier. If the individual Hive operations are not that large and do not execute for a long period, then going with the internal Hadoop of each BAM node is fine.

Since these nodes will not use the inbuilt Cassandra support;
-- Start the BAM nodes without starting the Cassandra instance bundled with the server, by passing the below property with the server startup command.

sh wso2server.sh -Ddisable.cassandra.server.startup=true

  • Setting BAM Dashboard Node
This node is only used to show the presentation dashboards with the collected, analyzed statistics. Therefore, you can edit the toolboxes to include only the presentation part and deploy them on this node. Then modify master-datasources.xml in the {BAM_Home}/repository/conf/datasources location to add the secondary relational database storage, which contains the analyzed statistics and which is queried when visualizing data on the dashboard.
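
A minimal sketch of such a datasource entry in master-datasources.xml is shown below (the datasource name, JNDI name and MySQL connection details are assumptions for illustration; adjust them to your own summary database):

<datasource>
    <name>WSO2BAM_DATASOURCE</name>
    <description>Secondary RDBMS storage holding the analyzed statistics</description>
    <jndiConfig>
        <name>jdbc/WSO2BAM_DATASOURCE</name>
    </jndiConfig>
    <definition type="RDBMS">
        <configuration>
            <url>jdbc:mysql://summary-db-host:3306/bam_summary_db</url>
            <username>bam_user</username>
            <password>bam_password</password>
            <driverClassName>com.mysql.jdbc.Driver</driverClassName>
            <maxActive>50</maxActive>
            <maxWait>60000</maxWait>
            <testOnBorrow>true</testOnBorrow>
            <validationQuery>SELECT 1</validationQuery>
        </configuration>
    </definition>
</datasource>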
Since this node will not use the inbuilt Cassandra support;
-- Start the BAM node without starting the Cassandra instance bundled with the server, by passing the below property with the server startup command.

sh wso2server.sh -Ddisable.cassandra.server.startup=true


Also disable data analyzing on this BAM node. For that, remove the analyzer-related features from the server using the feature manager.









