In a single-machine deployment, one instance of Splunk handles the entire end-to-end process, from data input through indexing to search. A single-machine deployment can be useful for testing and evaluation purposes and might serve the needs of department-sized environments. For larger environments, where data originates on many machines and where many users need to search the data, you'll want to distribute functionality across multiple instances of Splunk.
How Splunk Scales
Splunk performs three key functions as it moves data through the data pipeline:

* First, Splunk consumes data from files, the network, and elsewhere.
* It then indexes the data. (Actually, it first parses and then indexes the data, but for purposes of this discussion, we consider parsing to be part of the indexing process.)
* Finally, it runs interactive or scheduled searches on the indexed data.
This functionality can be split across multiple specialized instances of Splunk, ranging in number from just a few to thousands, depending on the quantity of data you're dealing with and other variables in your environment. You might, for example, create a deployment with many Splunk instances that only consume data, several other instances that index the data, and one or more instances that handle search requests. The specialized instances of Splunk are known collectively as components. There are several types of components.
In a typical mid-size deployment, for example, you can deploy lightweight versions of Splunk, called forwarders, on the machines where the data originates. The forwarders consume data locally and then forward it across the network to another Splunk component, called the indexer. The indexer does the heavy lifting; it indexes the data and runs searches. It should reside on a machine by itself.
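To make this concrete, here is a minimal sketch of the forwarder side of that setup, assuming the indexer has been enabled to receive data on the conventional port 9997 (for example, with the CLI command `splunk enable listen 9997`). The group name `my_indexers` and the host `indexer.example.com` are placeholders, not values from this document.

```ini
# outputs.conf on a forwarder: tells it where to send the data it consumes.
# "my_indexers" and "indexer.example.com:9997" are placeholder values;
# substitute the host and receiving port of your own indexer.
[tcpout]
defaultGroup = my_indexers

[tcpout:my_indexers]
server = indexer.example.com:9997
```

Pointing forwarders at a named target group like this also leaves room to grow: you can list additional host:port pairs in the same server setting as you add indexers.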
The forwarders, on the other hand, can easily coexist on the machines generating the data, because the data-consuming function has minimal impact on machine performance.
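The consuming side of the same forwarder is just as lightweight to configure. Below is a matching sketch of an inputs.conf monitor stanza; the monitored path and the sourcetype are illustrative assumptions, so point the stanza at whatever files your machines actually generate.

```ini
# inputs.conf on the same forwarder: defines which local data to consume.
# The path and sourcetype below are illustrative only.
[monitor:///var/log/messages]
sourcetype = syslog
```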