There are no ingest nodes in this cluster, unable to forward request to an ingest node.


Jan 12, 2016  Rather than failing the request, when a node with node.ingest set to false receives an index or bulk request with a pipeline id, it should try to redirect the request to another node with node.ingest set to true. If there are no nodes with ingest set to true based on the current cluster state, an exception is returned and the request fails. The node info API can be used to figure out which processors are available in a cluster; it provides a per-node list of available processors. Custom processors must be installed on all nodes, and the put pipeline API will fail if a processor specified in a pipeline is not available on every node. With a heavy ingest load, it makes sense to use dedicated ingest nodes and to mark the master and data nodes as node.ingest: false. Tribe node: a tribe node, configured via the tribe.* settings, is a special type of coordinating-only node that can connect to multiple clusters and perform search and other operations across all connected clusters.
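To see which nodes can run pipelines and which processors each one provides, you can query the node info API directly. A minimal sketch in PowerShell, assuming an ES 5.x node listening on localhost:9200:

# Query the node info API, filtered to the ingest section (GET /_nodes/ingest).
$nodes = Invoke-RestMethod -Uri 'http://localhost:9200/_nodes/ingest'
foreach ($id in $nodes.nodes.PSObject.Properties.Name) {
    $node = $nodes.nodes.$id
    # Each node reports the processor types its pipelines can use.
    '{0}: {1}' -f $node.name, (($node.ingest.processors | ForEach-Object { $_.type }) -join ', ')
}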


Conversation

commented Feb 27, 2017
edited

We parse the Elasticsearch JSON error and try to produce an
error message that is as helpful as possible. The following cases
are detected:

  • A plugin providing a processor is missing. In case the plugin is one of
    ingest-geoip or ingest-user-agent, we can also suggest the command that
    installs them (sketched below).
  • Elasticsearch < 5.0. We now detect this and tell the user that ES 5.0 is
    required by FBM.
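For reference, these are the standard ES 5.x plugin install commands the message can point to (a sketch; shown here for a Windows ES home directory, on Linux drop the .bat suffix):

# Install the two ingest plugins the Filebeat modules commonly need (run from the Elasticsearch home directory).
.\bin\elasticsearch-plugin.bat install ingest-geoip
.\bin\elasticsearch-plugin.bat install ingest-user-agent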

A drawback of this approach is that if both the GeoIP and User-Agent plugins
are missing, only one will be reported. This might get solved by including the
user-agent plugin in ES, by improving the error we get from ES, or by us querying
the node stats API.

Note: this contains a change in the ES client, which makes it return the body
in case of errors. I think we need that part anyway, otherwise we often show
errors like 400 Bad request without any other details. I tried to do a minimal
change there, I hope I didn't introduce any changes in behaviour.

Improve error when ES ingest node plugins are not loaded

force-pushed the tsg:filebeat_modules_improve_error branch from 5085438 to 595dd48 on Feb 27, 2017

added the review, needs_backport, and v5.3.0 labels and removed the in progress label on Feb 27, 2017

approved these changes Feb 27, 2017

@@ -245,7 +247,7 @@ func (reg *ModuleRegistry) GetProspectorConfigs() ([]*common.Config, error) {
// PipelineLoader is a subset of the Elasticsearch client API capable of loading
// the pipelines.
type PipelineLoader interface {
	LoadJSON(path string, json map[string]interface{}) error

Feb 27, 2017

Feb 27, 2017

I updated this to have the error as the last returned arg.

	} `json:"error"`
}
err := json.Unmarshal(body, &response)
if err != nil {

Feb 27, 2017

client.Connection.version could be used to check the ES version. Unfortunately the variable is currently not public.

Feb 27, 2017

Hmm, yeah, that would be an option. But we also don't have the client here and adding it would complicate unit testing, and we still need to do the error checking anyway, so I think it doesn't win us that much to add version checks.


// missing plugins?
if len(response.Error.RootCause) > 0 &&
	response.Error.RootCause[0].Type == "parse_exception" &&

Feb 27, 2017

ES should (in the future) expose a special root cause type here, so that no check of the text is needed.

Feb 27, 2017

approved these changes Feb 28, 2017

reviewed Feb 28, 2017

 if status >= 300 {
-	return status, nil, fmt.Errorf("%v", resp.Status)
+	retErr = fmt.Errorf("%v", resp.Status)
 }

 obj, err := ioutil.ReadAll(resp.Body)
 if err != nil {
 	return status, nil, err

Feb 28, 2017

Feb 28, 2017

Hmm, debatable I guess. I'm changing it to retErr because I think that's closer to the previous behavior. Thanks.

var response1x struct {
	Error string `json:"error"`
}
json.Unmarshal(body, &response1x)

Feb 28, 2017

json.Unmarshal might also fail, because the body might not be JSON at all. The client was changed to always return the raw body; e.g., if the error comes from an nginx proxy in front of ES, the content might be a plain-text message.

Feb 28, 2017

I know, but in that case we fall back to the "Could not load pipeline. Additionally, error decoding body:" message, which I think is what we want. Looking into making the code clearer.

plugins := map[string]string{
	"geoip":      "ingest-geoip",
	"user_agent": "ingest-user-agent",
}

Feb 28, 2017

Feb 28, 2017

Hmm, I don't know if that is future-proof. There's no guarantee that other plugins will follow this pattern.

// missing plugins?
if len(response.Error.RootCause) > 0 &&
	response.Error.RootCause[0].Type == "parse_exception" &&
	strings.HasPrefix(response.Error.RootCause[0].Reason, "No processor type exists with name") &&

Feb 28, 2017

Instead of checking the error message prefix, is there an error type we can check?

Feb 28, 2017

The error type is parse_exception, so that's too generic.

@martijnvg are you adding a new error type as part of your PRs?

Feb 28, 2017

The PR doesn't change that. It failed during parsing, hence the error type is parse_exception.

I'm open to changing this, but not sure what other existing error type to use. ES is very defensive about introducing new error types. The only existing general error type that comes to mind is resource_not_found. It is generic too, but maybe better in this context? (indicating that a processor type doesn't exist).

Feb 28, 2017

I think I'd prefer not changing it in that case, because we'd still have to leave this branch in the code for ES < 5.4, so it's probably not worth it.

return fmt.Errorf("The Ingest Node functionality seems to be missing from Elasticsearch. "+
	"The Filebeat modules require Elasticsearch >= 5.0. "+
	"This is the response I got from Elasticsearch: %s", body)
}

Feb 28, 2017

Which error message will be returned if ingest node is disabled?

Feb 28, 2017

Just tried it out. There's no real way of disabling the ingest functionality, but what one can do is set node.ingest: false on all the ES nodes. In that case, the pipeline loading works as usual but the _bulk insert fails with: "There are no ingest nodes in this cluster, unable to forward request to an ingest node."

I'd say improving the error handling in that case is beyond the scope of this PR.

Feb 28, 2017

Ouch. Maybe we can probe ingest node availability via the simulate API?
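For context, such a probe could push a trivial document through an empty pipeline via the simulate endpoint (a sketch, assuming ES on localhost:9200; note this only verifies the ingest API responds, not that an ingest-enabled node will accept _bulk traffic):

# Probe the ingest simulate API (POST /_ingest/pipeline/_simulate) with an empty pipeline.
$body = @{
    pipeline = @{ processors = @() }
    docs     = @(@{ _source = @{ probe = $true } })
} | ConvertTo-Json -Depth 5
Invoke-RestMethod -Method Post -Uri 'http://localhost:9200/_ingest/pipeline/_simulate' -ContentType 'application/json' -Body $body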

Feb 28, 2017

merged commit 96bb79b into elastic:master on Mar 1, 2017

4 checks passed

continuous-integration/appveyor/pr: AppVeyor build succeeded
continuous-integration/travis-ci/pr: The Travis CI build passed

added a commit to tsg/beats that referenced this pull request Mar 1, 2017

referenced this pull request Mar 1, 2017

Merged

Cherry-pick to 5.3: Improve error when ES Ingest node plugins are not loaded #3703

added a commit that referenced this pull request Mar 1, 2017

added a commit to tsg/beats that referenced this pull request Mar 1, 2017

Show all missing plugins in the same err message

referenced this pull request Mar 1, 2017

Merged

Show all missing plugins in the same err message #3706

added a commit that referenced this pull request Mar 2, 2017

added a commit to tsg/beats that referenced this pull request Mar 2, 2017

referenced this pull request Mar 2, 2017

Merged

Cherry-pick to master: Show all missing plugins in the same err message #3711

added a commit that referenced this pull request Mar 3, 2017


Hello, cluster fans. In my previous blog, Part 1, I talked about how to work around the storage blocker in order to implement a Windows Server Failover Cluster on an Azure IaaS VM. Now let's discuss another important part: networking in a Cluster on Azure.

Before that, you should know some basic concepts of Azure networking. Here are a few Azure terms we need to use to set up the Cluster.

VIP (Virtual IP address): A public IP address that belongs to the cloud service. It is also the address of the Azure Load Balancer, which determines how network traffic is directed before it is routed to the VM.

DIP (Dynamic IP address): An internal IP assigned by Microsoft Azure DHCP to the VM.

Internal Load Balancer: It is configured to port-forward or load-balance traffic inside a VNET or cloud service to different VMs.

Endpoint: It associates a VIP/DIP + port combination on a VM with a port on either the Azure Load Balancer for public-facing traffic or the Internal Load Balancer for traffic inside a VNET (or cloud service).

You can refer to this blog for more details about these Azure networking terms:


VIPs, DIPs and PIPs in Microsoft Azure
http://blogs.msdn.com/b/cloud_solution_architect/archive/2014/11/08/vips-dips-and-pips-in-microsoft-azure.aspx

OK, enough reading. Storage is ready and we know the basics of Azure networking, so can we start building the Cluster? Yes!

Instead of using Failover Cluster Manager, the preferred method is to use the New-Cluster PowerShell cmdlet and specify a static IP during Cluster creation. That way, you can add all the nodes and use the proper IP address from the get-go, without the extra steps required in Failover Cluster Manager.

Take the above environment as an example:

New-Cluster -Name DEMOCLUSTER -Node node1,node2 -StaticAddress 10.0.0.7


Note: The static IP address that you assign to the CNO is not for network communication. Its only purpose is to bring the CNO online, to satisfy the dependency requirement. Therefore, you cannot ping that IP, you cannot resolve its DNS name, and you cannot use the CNO for management, since its IP is an unusable IP.

If for some reason you do not want to use PowerShell, or you used Failover Cluster Manager instead, there are additional steps that you must take. The difference with FCM versus PowerShell is that you need to create the Cluster with one node and add the other nodes as a next step. This is because the Cluster Name Object (CNO) cannot come online, since it cannot acquire a unique IP address from the Azure DHCP service. Instead, the IP address assigned to the CNO is a duplicate of the address of the node that owns the CNO. That IP fails as a duplicate and can never be brought online. This eventually causes the Cluster to lose quorum because the nodes cannot properly connect to each other. To prevent the Cluster from losing quorum, you start with a one-node Cluster, let the CNO's IP address fail, and then manually set up the IP address.

Example:

The CNO DEMOCLUSTER is offline because the IP Address resource it depends on has failed. 10.0.0.4 is the VM's DIP, which is the address the CNO's IP duplicated.

In order to fix this, we need to go into the properties of the IP Address resource and change the address to another address in the same subnet that is not currently in use, for example, 10.0.0.7.

To change the IP address, right-click the resource, choose the Properties of the IP Address, and specify the new 10.0.0.7 address.

Once the address is changed, right-click the Cluster Name resource and bring it online.

Now that these two resources are online, you can add more nodes to the Cluster.
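The same fix-up can be scripted. Here is a sketch of the PowerShell equivalent of the steps above, assuming the default resource names that Failover Clustering creates:

# Point the CNO's IP Address resource at the unused 10.0.0.7, then bring the name online.
Get-ClusterResource 'Cluster IP Address' | Set-ClusterParameter -Name Address -Value 10.0.0.7
Start-ClusterResource 'Cluster Name'

# With the CNO online, add the remaining node(s).
Add-ClusterNode -Name node2 -Cluster DEMOCLUSTER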

Now you’ve successfully created a Cluster. Let’s add a highly available role inside it. For the demo purpose, I’ll use the File Server role as an example since this is the most common role that lot of us can understand.

Note: In a production environment, we do not recommend a File Server cluster in Azure because of cost and performance. Take this example as a proof of concept.

Unlike with a Cluster on-premises, I recommend that you pause all other nodes and keep only one node up. This prevents the new File Server role from moving among the nodes, since the file server's VCO (Virtual Computer Object) will automatically be assigned an IP address that duplicates the IP of the node that owns the VCO. That IP fails and keeps the VCO from coming online on any node. This is a similar scenario to the CNO one we just talked about, and the pause can be scripted as shown below.
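A sketch of pausing every node except the current one (run it on the node that should stay up):

# Pause all other cluster nodes so the new role stays on this one.
Get-ClusterNode | Where-Object { $_.Name -ne $env:COMPUTERNAME } | Suspend-ClusterNode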

Screenshots are more intuitive.

The VCO DEMOFS won't come online because of the failed status of its IP Address. This is expected, because the dynamic IP address duplicates the IP of the owner node.

After manually editing the IP to an unused static address, 10.0.0.8 in this example, the whole resource group is online.

But remember, that IP address is the same kind of unusable IP as the CNO's. You can use it to bring the resource online, but it is not a real IP for network communication. If this is a File Server, none of the VMs except the owner node of this VCO can access the File Share. The way Azure networking works, the traffic is looped back to the node it originated from.

Show time. We need to utilize the Load Balancer in Azure so this IP address can communicate with other machines and carry the client-server traffic.

Load Balancer is an Azure IP resource that can route network traffic to different Azure VMs. The IP can be public facing, like a VIP, or internal only, like a DIP. Each VM needs to have endpoint(s) so the Load Balancer knows where the traffic should go. In an endpoint, there are two kinds of ports. The first is a regular port, used for normal client-server communication. For example, port 445 is for SMB file sharing, port 80 is HTTP, port 1433 is for MSSQL, etc. The other kind is a probe port, with a default port number of 59999. The probe port's job is to find out which node in the Cluster actively hosts the VCO. The Load Balancer sends probe pings over TCP port 59999 to every node in the cluster, by default every 10 seconds. When you configure a role in a Cluster on an Azure VM, you need to find out which port(s) the application uses, because you will need to add those port(s) to the endpoint. Then, you add the probe port to the same endpoint. After that, you update the parameters of the VCO's IP address with that probe port. Finally, the Load Balancer does a similar port-forwarding task and routes the traffic to the VM that owns the VCO. All of the above settings had to be completed using PowerShell at the time this blog was written.

Note: At the time this blog was written and posted, Microsoft supports only one resource group per Cluster on Azure, in an Active/Passive model only. This is because the VCO's IP can only use the Cloud Service IP address (VIP) or the IP address of the Internal Load Balancer. This limitation is still in effect even though Azure now supports the creation of multiple VIP addresses in a given Cloud Service.

Here is a diagram of an Internal Load Balancer (ILB) in a Cluster, which illustrates the theory above:

The application in this Cluster is a File Server. That's why we use port 445, and why the VCO's IP (10.0.0.8) is the same as the ILB's. There are three steps to configure this:

Step 1: Add the ILB to the Azure cloud service.

Run the following PowerShell commands on an on-premises machine that can manage your Azure subscription.

# Define variables.

$ServiceName = 'demovm1-3va468p3' # the name of the cloud service that contains the VM nodes. Your cloud service name is unique; use the Azure portal or Get-AzureVM to find it.

$ILBName = 'DEMOILB' # newly chosen name for the new ILB

$SubnetName = 'Subnet-1' # subnet name that the VMs use in the VNet

$ILBStaticIP = '10.0.0.8' # static IP address for the ILB in the subnet

# Add Azure ILB using the above variables.

Add-AzureInternalLoadBalancer -InternalLoadBalancerName $ILBName -SubnetName $SubnetName -ServiceName $ServiceName -StaticVNetIPAddress $ILBStaticIP

# Check the settings.

Get-AzureInternalLoadBalancer -ServiceName $ServiceName

Step 2: Configure the load balanced endpoint for each node using ILB.

Run the following PowerShell commands on an on-premises machine that can manage your Azure subscription.

# Define variables.

$VMNodes = 'DEMOVM1', 'DEMOVM2' # the cluster nodes' names, separated by commas. Your nodes' names will be different.

$EndpointName = 'SMB' # newly chosen name of the endpoint

$EndpointPort = '445' # public port to use for the endpoint, here for SMB file sharing. If the cluster is used for another purpose, e.g. HTTP, change the port number to 80.

# Add an endpoint with port 445 and probe port 59999 to each node. It will take a few minutes to complete. Pay attention to the ProbeIntervalInSeconds parameter; this determines how often the probe port checks which node is active.

ForEach ($node in $VMNodes)

{

Get-AzureVM -ServiceName $ServiceName -Name $node | Add-AzureEndpoint -Name $EndpointName -LBSetName "$EndpointName-LB" -Protocol tcp -LocalPort $EndpointPort -PublicPort $EndpointPort -ProbePort 59999 -ProbeProtocol tcp -ProbeIntervalInSeconds 10 -InternalLoadBalancerName $ILBName -DirectServerReturn $true | Update-AzureVM

}

# Check the settings.

ForEach ($node in $VMNodes)

{

Get-AzureVM -ServiceName $ServiceName -Name $node | Get-AzureEndpoint | Where-Object {$_.Name -eq 'smb'}

}

Step 3: Update the parameters of VCO’s IP address with Probe Port.


Run the following PowerShell commands inside one of the cluster nodes if you are using Windows Server 2008 R2.

# Define variables

$ClusterNetworkName = 'Cluster Network 1' # the cluster network name (use Get-ClusterNetwork or the GUI to find the name)

$IPResourceName = 'IP Address 10.0.0.0' # the IP Address resource name (use get-clusterresource | where-object {$_.resourcetype -eq 'IP Address'} or the GUI to find the name)

$ILBIP = '10.0.0.8' # the IP address of the Internal Load Balancer (ILB)

# Update cluster resource parameters of VCO’s IP address to work with ILB.

cluster res $IPResourceName /priv enabledhcp=0 overrideaddressmatch=1 address=$ILBIP probeport=59999 subnetmask=255.255.255.255

Run the following PowerShell commands inside one of the cluster nodes if you are using Windows Server 2012/2012 R2.

# Define variables

$ClusterNetworkName = 'Cluster Network 1' # the cluster network name (Use Get-ClusterNetwork or GUI to find the name)

$IPResourceName = 'IP Address 10.0.0.0' # the IP Address resource name (use get-clusterresource | where-object {$_.resourcetype -eq 'IP Address'} or the GUI to find the name)

$ILBIP = '10.0.0.8' # the IP address of the Internal Load Balancer (ILB)


# Build the parameter set. Note the variables are passed unquoted so their values expand.
$params = @{'Address' = $ILBIP;
'ProbePort' = 59999;
'SubnetMask' = '255.255.255.255';
'Network' = $ClusterNetworkName;
'OverrideAddressMatch' = 1;
'EnableDhcp' = 0}

# Update cluster resource parameters of VCO's IP address to work with ILB.

Get-ClusterResource $IPResourceName | Set-ClusterParameter -Multiple $params

You should see this window:

Take the IP Address resource offline and bring it online again. Start the clustered role.
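A sketch of that restart in PowerShell, reusing the variable and the example role name from above:

# Recycle the IP Address resource so the new parameters take effect, then start the clustered role.
Stop-ClusterResource $IPResourceName
Start-ClusterResource $IPResourceName
Start-ClusterGroup DEMOFS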

Now you have an Internal Load Balancer working with the VCO's IP. One last task concerns the Windows Firewall: you need to open at least port 59999 on all nodes for probe port detection, or turn the firewall off. Then you should be all set. It may take about 10 seconds to establish the connection to the VCO the first time, or after you fail over the resource group to another node, because of the ProbeIntervalInSeconds we set up previously.
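On Windows Server 2012/2012 R2 the firewall rule can be added like this (a sketch; the rule name is newly chosen, and on 2008 R2 you would use netsh advfirewall instead):

# Allow the Azure load balancer probe on TCP 59999. Run on every node.
New-NetFirewallRule -DisplayName 'Azure LB Probe 59999' -Direction Inbound -Protocol TCP -LocalPort 59999 -Action Allow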

In this example, the VCO has an internal IP of 10.0.0.8. If you want to make your VCO public-facing, you can use the Cloud Service's IP address (VIP). The steps are similar and easier, because you can skip Step 1: the VIP is already an Azure Load Balancer. You just need to add the endpoint with a regular port plus the probe port to each VM (Step 2), then update the VCO's IP in the Cluster (Step 3). Please be aware that your Clustered resource group will be exposed to the Internet, since the VCO has a public IP. You may want to protect it with additional security measures.

Great! Now you've completed all the steps of building a Windows Server Failover Cluster on an Azure IaaS VM. It is a bit of a longer journey; however, you'll find it useful and worthwhile. Please leave me comments if you have questions.

Happy Clustering!

Mario Liu
Support Escalation Engineer
CSS Americas WINDOWS HIGH AVAILABILITY