

Subject: [OASIS Issue Tracker] (TOSCA-189) Application Monitoring - Proposal


     [ https://issues.oasis-open.org/browse/TOSCA-189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tomer Shtilman updated TOSCA-189:
---------------------------------

    Description: 

When we consider monitoring the performance of a cloud, we can broadly classify it into two categories:
1. Infrastructure/hardware monitoring - This covers the performance of the various infrastructure components in the cloud, like virtual machines, storage, network, etc. For example:
• CPU usage: total (all CPUs), per CPU, and delta between CPUs
• Disk usage: total, free, used
• Disk latency
• Percentage busy
• Percentage ready
• Memory: percentage used, swap activity
• Network: bytes in/out
...
2. Application monitoring - When measuring application performance we cannot simply go by the resources utilized by the application: in a cloud, applications move around, so the monitoring solution needs to track and map them.

E.g. application response time - a key metric in application performance management, which measures the time taken for the application to respond to user requests.

So, just as we can detect deviations in infrastructure performance, we would like to do the same for application-level indicators: KPIs, response times, request statuses, or order throughput, in order to be proactive about the business impact.

With this application monitoring we can:
- Understand the real-time performance of the cloud services from the end user's perspective.
- Gain visibility into the workload, even when we do not control the backing infrastructure.
- Isolate problems and drill down to the root cause to immediately take action.
- Define thresholds and create alerts.

We believe that TOSCA should recommend a monitoring service spec, to be optionally implemented by TOSCA containers, that provides a set of monitoring capabilities for application workloads.
This is a crucial and basic capability of any application lifecycle management orchestrator.
The idea is simply to allow the app developer to express in the service template the desired app KPIs to be collected, and to trigger dynamic reactions when certain KPI thresholds are crossed.
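
A hypothetical sketch of how such a threshold reaction could be expressed, in the same style as the metric types below; the tosca.policies.monitoring.Threshold type and all of its properties are invented here purely for illustration and are not part of this proposal's type definitions:

policy_types:
# Reacts when a monitored metric crosses a threshold (illustrative only)
  tosca.policies.monitoring.Threshold:
      derived_from: tosca.policies.Root
      description: Triggers a dynamic reaction on KPI threshold crossing
      properties:
          condition:
                type: string
                constraints:
                    - valid_values: [GREATER_THAN, LESS_THAN]
          threshold_value:
                type: integer
          reaction:
                # e.g. notify an operator, scale out, run a script
                type: string
      requirements:
          # the metric whose polled values are evaluated against the threshold
          - metric: tosca.monitoring.Metric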

The monitoring engine applies the sample metric collection on the exposed software component endpoint interface.

In the example below you can see a simple DB (software component) hosted on a compute node. A sample metric is collected every minute on this software component; in addition, there is an hourly aggregation based on this minutely sampling.


tosca_definitions_version: tosca_simple_yaml_1_0

monitoring_types:
# Metric base type
  tosca.monitoring.Metric:
      derived_from: tosca.nodes.Root
      description: The basic metric type all other TOSCA metric types derive from
      properties:
          polling_schedule:
                type: string
          return_type:
                type: string
          metric_unit:
                type: string
          aggregation_method:
                type: string
                constraints:
                    - valid_values: [SUM, AVG, MIN, MAX, COUNT]
  
# A single metric sample
  tosca.monitoring.MetricSample:
      derived_from: tosca.monitoring.Metric
      description: A single metric sample (application KPI), like CPU, MEMORY, etc.
      properties:
          node_state:
               type: string
               constraints:
                    - valid_values: [RUNNING, CREATING, STARTING, TERMINATING, ..]
      requirements: 
                  #a sample metric requires an endpoint 
           - endpoint: tosca.capabilities.Endpoint 

#An aggregated metric
  tosca.monitoring.AggregatedMetric:
        derived_from: tosca.monitoring.Metric
        description: An aggregated metric
        properties:
           # The time window in millis for aggregating the metric
           msec_window:
                type: integer
                constraints:
                   - greater_than: 0
        requirements:
           - basedonmetric: tosca.monitoring.Metric


relationship_types:
# a relationship between sample and endpoint
   tosca.relationships.monitoring.EndPoint:
       short_name: endpoint
       derived_from: tosca.relationships.Root
       valid_targets: [ tosca.capabilities.Endpoint ]

#this is a relationship to enforce that an aggregated metric is based on another sample/aggregated metric
   tosca.relationships.monitoring.BasedOnMetric:
       short_name: basedonmetric
       derived_from: tosca.relationships.DependsOn
       valid_targets: [ tosca.monitoring.MetricSample, tosca.monitoring.AggregatedMetric ]

                  
                  
node_templates:
  server1:
    type: tosca.nodes.Compute
    properties:
    ...
    interfaces:
      tosca.interfaces.node.lifecycle.Standard:
        create: scripts/create.sh

  oracle_db:
    type: tosca.nodes.SoftwareComponent
    requirements:
      - host: server1
    capabilities:
         monitoring_endpoint:
            type: tosca.capabilities.Endpoint
            properties:
               protocol: http
               ...

monitoring_templates:
#single sample connects to the monitoring endpoint
  oracle_connections_per_minute_sampled:
    type: tosca.monitoring.MetricSample
    properties:
      polling_schedule: 0 0/1 * 1/1 * ? *    # cron schedule: sample every minute
      return_type: integer
      # Defines the aggregation that is done over the instances of the tier
      aggregation_method: SUM
    #sampling (collecting the metric) is done through the endpoint
    requirements:
      endpoint:       #based on proposal TOSCA-188
        target: oracle_db.monitoring_endpoint
        relationship: tosca.relationships.monitoring.EndPoint

#aggregation over the sample, polled hourly
  oracle_connections_per_hour_aggregated:
    type: tosca.monitoring.AggregatedMetric
    properties:
      polling_schedule: 0 0 0/1 1/1 * ? *    # cron schedule: aggregate every hour
      return_type: integer
      # Defines the aggregation that is done for the metric over time
      aggregation_method: AVG
      msec_window: 3600000                   # one hour in milliseconds
    requirements:
      basedonmetric:  #based on proposal TOSCA-188
        target: oracle_connections_per_minute_sampled
        relationship: tosca.relationships.monitoring.BasedOnMetric
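
Since tosca.relationships.monitoring.BasedOnMetric accepts an aggregated metric as a target as well, aggregations can be chained. As an illustration (this template is not part of the original example), a daily average could be layered on top of the hourly aggregate:

#illustrative only: daily aggregation chained on the hourly aggregate
  oracle_connections_per_day_aggregated:
    type: tosca.monitoring.AggregatedMetric
    properties:
      polling_schedule: 0 0 0 1/1 * ? *      # cron schedule: once a day at midnight
      return_type: integer
      aggregation_method: AVG
      msec_window: 86400000                  # 24 hours in milliseconds
    requirements:
      basedonmetric:
        target: oracle_connections_per_hour_aggregated
        relationship: tosca.relationships.monitoring.BasedOnMetric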
                                  


> Application Monitoring - Proposal
> ---------------------------------
>
>                 Key: TOSCA-189
>                 URL: https://issues.oasis-open.org/browse/TOSCA-189
>             Project: OASIS Topology and Orchestration Specification for Cloud Applications (TOSCA) TC
>          Issue Type: New Feature
>          Components: Profile-YAML
>            Reporter: Tomer Shtilman
>



--
This message was sent by Atlassian JIRA
(v6.2.2#6258)

