HBase is a low-latency NoSQL database that allows online transactional processing of big data. HBase is offered as a managed cluster integrated into the Azure environment. The clusters are configured to store data directly in Azure Blob storage, which provides low latency and increased elasticity in performance and cost choices. This enables customers to build interactive websites that work with large datasets, to build services that store sensor and telemetry data from millions of endpoints, and to analyze this data with Hadoop jobs. For more information on HBase and the scenarios it can be used for, see HDInsight HBase overview.
HBase (version 0.98.0) is only available with HDInsight 3.1 clusters (based on Apache Hadoop and YARN 2.4.0). For version information, see What's new in the Hadoop cluster versions provided by HDInsight?
Before you begin this tutorial, you must have the following:
This section describes how to provision an HBase cluster using the Azure Management portal.
The steps in this article create an HDInsight cluster using basic configuration settings. For information on other cluster configuration settings, such as using Azure Virtual Network or a metastore for Hive and Oozie, see Provision an HDInsight cluster.
To provision an HDInsight cluster in the Azure Management portal
Enter CLUSTER NAME, CLUSTER SIZE, CLUSTER USER PASSWORD, and STORAGE ACCOUNT.
Click on the check icon on the lower left to create the HBase cluster.
This section describes how to enable and use the Remote Desktop Protocol (RDP) to access the HBase shell and then use it to create an HBase sample table, add rows, and then list the rows in the table.
It assumes you have completed the procedure outlined in the first section, and so have already successfully created an HBase cluster.
To enable the RDP connection to the HBase cluster
To open the HBase Shell
Within your RDP session, click on the Hadoop Command Line shortcut located on the desktop.
Change the folder to the HBase home directory:
cd %HBASE_HOME%\bin
Open the HBase shell:
hbase shell
To create a sample table, add data and retrieve the data
Create a sample table:
create 'sampletable', 'cf1'
Add a row to the sample table:
put 'sampletable', 'row1', 'cf1:col1', 'value1'
List the rows in the sample table:
scan 'sampletable'
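The shell commands above can be pictured with a toy in-memory model. The following Python sketch is not an HBase client: the names `table`, `put`, and `scan` are illustrative only, and the rows `row2` and `row10` are hypothetical additions that the tutorial itself does not insert. It assumes the documented HBase behavior that a scan returns rows sorted by row key in byte order.

```python
# Toy stand-in for an HBase table (illustration only, not an HBase client).
# Maps row key -> {"family:qualifier": value}.
table = {}


def put(row_key, column, value):
    """Store one cell, mimicking: put 'sampletable', row_key, column, value."""
    table.setdefault(row_key, {})[column] = value


def scan():
    """Return (row_key, cells) pairs in byte-wise row-key order, like a scan."""
    ordered_keys = sorted(table, key=lambda key: key.encode("utf-8"))
    return [(key, table[key]) for key in ordered_keys]


put("row1", "cf1:col1", "value1")    # the row added in the tutorial
put("row2", "cf1:col1", "value2")    # hypothetical extra row
put("row10", "cf1:col1", "value10")  # hypothetical extra row

for key, cells in scan():
    print(key, cells)
```

In this model `row10` is listed before `row2`, because keys are compared byte by byte rather than numerically. Zero-padding numeric suffixes (for example `row02`, `row10`) is one common way to keep the two orders aligned.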
Check cluster status in the HBase WebUI
HBase also ships with a WebUI that helps you monitor your cluster, for example by providing request statistics or information about regions. On the HBase cluster, you can find the WebUI under the address of the zookeepernode:
http://zookeepernode:60010/master-status
In a high-availability (HA) cluster, you will find a link to the currently active HBase master node hosting the WebUI.
Bulk load a sample table
Create samplefile1.txt containing the following data, and upload it to Azure Blob storage as /tmp/samplefile1.txt. Separate the fields with tab characters, which is the default separator that ImportTsv expects:
row1 c1 c2
row2 c1 c2
row3 c1 c2
row4 c1 c2
row5 c1 c2
row6 c1 c2
row7 c1 c2
row8 c1 c2
row9 c1 c2
row10 c1 c2
Change the folder to the HBase home directory:
cd %HBASE_HOME%\bin
Execute ImportTsv:
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns="HBASE_ROW_KEY,a:b,a:c" -Dimporttsv.bulk.output=/tmpOutput sampletable2 /tmp/samplefile1.txt
Load the output from the previous command into HBase:
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmpOutput sampletable2
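To make the `-Dimporttsv.columns` option more concrete, here is a minimal Python sketch of how each line of the sample file lines up with the column specification. It only models the mapping and does not call HBase; the function names `build_sample_lines` and `map_tsv_line` are made up for this illustration, and the sketch assumes tab-separated input (the ImportTsv default).

```python
SEPARATOR = "\t"  # assumption: fields are separated by the ImportTsv default, a tab


def build_sample_lines(count=10):
    """Build the lines of samplefile1.txt: row1..row10, each with values c1 and c2."""
    return [SEPARATOR.join([f"row{i}", "c1", "c2"]) for i in range(1, count + 1)]


def map_tsv_line(line, columns_spec, separator=SEPARATOR):
    """Pair the fields of one line with the names in an importtsv.columns spec."""
    names = columns_spec.split(",")
    fields = line.rstrip("\r\n").split(separator)
    if len(fields) != len(names):
        raise ValueError(f"expected {len(names)} fields, got {len(fields)}")
    row_key = None
    cells = {}
    for name, field in zip(names, fields):
        if name == "HBASE_ROW_KEY":
            row_key = field      # this field becomes the row key
        else:
            cells[name] = field  # "family:qualifier" -> value
    return row_key, cells


if __name__ == "__main__":
    lines = build_sample_lines()
    # Write the file locally; uploading it to Blob storage is a separate step.
    with open("samplefile1.txt", "w", newline="\n") as handle:
        handle.write("\n".join(lines) + "\n")
    print(map_tsv_line(lines[0], "HBASE_ROW_KEY,a:b,a:c"))
```

With the specification used above, the first field of each line becomes the row key and the remaining two fields land in column family `a` under the qualifiers `b` and `c`.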
Now that you have provisioned an HBase cluster and created an HBase table, you can query it using Hive. This section creates a Hive table that maps to the HBase table and uses it to query the data in your HBase table.
To open cluster dashboard
Click Hive Editor at the top of the dashboard.
To run Hive queries
Enter the HiveQL script below into the Hive Editor and click SUBMIT to create a Hive table that maps to the HBase table. Make sure that you have already created the sampletable table referenced here by using the HBase shell before you execute this statement.
CREATE EXTERNAL TABLE hbasesampletable(rowkey STRING, col1 STRING, col2 STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf1:col1,cf1:col2')
TBLPROPERTIES ('hbase.table.name' = 'sampletable');
Wait until the Status is updated to Completed.
Enter the HiveQL script below into the Hive Editor, and then click SUBMIT. The Hive query reads the data in the HBase table:
SELECT count(*) FROM hbasesampletable;
To retrieve the results of the Hive query, click the View Details link in the Job Session window when the job finishes executing. The Job Output should be 1 because you put only one record into the HBase table.
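The `hbase.columns.mapping` property in the CREATE EXTERNAL TABLE statement pairs Hive columns with HBase columns by position. The Python sketch below models that pairing for the table defined in this section. It is an illustration only, not how the storage handler is implemented; the helper names `pair_columns` and `project_row` are hypothetical, and rendering a missing cell as `None` (NULL in Hive) is an assumption of the sketch.

```python
HIVE_COLUMNS = ["rowkey", "col1", "col2"]
COLUMNS_MAPPING = ":key,cf1:col1,cf1:col2"


def pair_columns(hive_columns, mapping):
    """Pair each Hive column with the mapping entry at the same position."""
    entries = [entry.strip() for entry in mapping.split(",")]
    if len(entries) != len(hive_columns):
        raise ValueError("the mapping needs exactly one entry per Hive column")
    return dict(zip(hive_columns, entries))


def project_row(row_key, cells, pairs):
    """Render one HBase row (key plus cells) as a Hive-style record."""
    record = {}
    for hive_column, target in pairs.items():
        if target == ":key":
            record[hive_column] = row_key          # ":key" refers to the row key
        else:
            record[hive_column] = cells.get(target)  # None when the cell is absent
    return record


pairs = pair_columns(HIVE_COLUMNS, COLUMNS_MAPPING)
print(pairs)
# The single row created earlier in the HBase shell:
print(project_row("row1", {"cf1:col1": "value1"}, pairs))
```

Because the shell session stored only `cf1:col1` for `row1`, the `col2` field of the modeled record is empty, and the table still contains exactly one row.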
To browse the output file
Click stdout. Save the file and open it with Notepad. The output should be 1.
To use the HBase .NET SDK, add the Microsoft HBase REST Client Library for .NET to your project. The source for the library is available on GitHub; the following procedure installs it as a NuGet package.
Run the following NuGet command in the console:
Install-Package Microsoft.HBase.Client
Add the following using statements at the top of the file:
using Microsoft.HBase.Client;
using org.apache.hadoop.hbase.rest.protobuf.generated;
Replace the Main function with the following:
static void Main(string[] args)
{
    string clusterURL = "https://<yourHBaseClusterName>.azurehdinsight.net";
    string hadoopUsername = "<yourHadoopUsername>";
    string hadoopUserPassword = "<yourHadoopUserPassword>";
    string hbaseTableName = "sampleHbaseTable";

    // Create a new instance of an HBase client.
    ClusterCredentials creds = new ClusterCredentials(new Uri(clusterURL), hadoopUsername, hadoopUserPassword);
    HBaseClient hbaseClient = new HBaseClient(creds);

    // Retrieve the cluster version.
    var version = hbaseClient.GetVersion();
    Console.WriteLine("The HBase cluster version is " + version);

    // Create a new HBase table with two column families, "d" and "f".
    TableSchema testTableSchema = new TableSchema();
    testTableSchema.name = hbaseTableName;
    testTableSchema.columns.Add(new ColumnSchema() { name = "d" });
    testTableSchema.columns.Add(new ColumnSchema() { name = "f" });
    hbaseClient.CreateTable(testTableSchema);

    // Insert data into the HBase table.
    string testKey = "content";
    string testValue = "the force is strong in this column";
    CellSet cellSet = new CellSet();
    CellSet.Row cellSetRow = new CellSet.Row { key = Encoding.UTF8.GetBytes(testKey) };
    cellSet.rows.Add(cellSetRow);
    Cell value = new Cell { column = Encoding.UTF8.GetBytes("d:starwars"), data = Encoding.UTF8.GetBytes(testValue) };
    cellSetRow.values.Add(value);
    hbaseClient.StoreCells(hbaseTableName, cellSet);

    // Retrieve a cell by its key.
    cellSet = hbaseClient.GetCells(hbaseTableName, testKey);
    Console.WriteLine("The data with the key '" + testKey + "' is: " + Encoding.UTF8.GetString(cellSet.rows[0].values[0].data));
    // With the previous insert, it should yield: "the force is strong in this column"

    // Scan over rows in a table. Assume the table has integer keys and you want data between keys 25 and 35.
    // Note: BitConverter.GetBytes uses the platform's byte order (little-endian on x86/x64).
    Scanner scanSettings = new Scanner()
    {
        batch = 10,
        startRow = BitConverter.GetBytes(25),
        endRow = BitConverter.GetBytes(35)
    };
    ScannerInformation scannerInfo = hbaseClient.CreateScanner(hbaseTableName, scanSettings);
    CellSet next = null;
    Console.WriteLine("Scan results");
    while ((next = hbaseClient.ScannerGetNext(scannerInfo)) != null)
    {
        foreach (CellSet.Row row in next.rows)
        {
            // row.key is a byte array, so print it as hex instead of the type name "System.Byte[]".
            Console.WriteLine(BitConverter.ToString(row.key) + " : " + Encoding.UTF8.GetString(row.values[0].data));
        }
    }

    Console.WriteLine("Press ENTER to continue ...");
    Console.ReadLine();
}
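One detail of the scan in the C# sample deserves a note. HBase compares row keys byte by byte, with the start row inclusive and the end row exclusive, and `BitConverter.GetBytes` produces little-endian bytes on typical x86/x64 machines. The Python sketch below (standard library only; the key values 25, 35, and 286 and the helper names are chosen purely for illustration) shows why little-endian integer keys do not sort numerically, while big-endian keys do for non-negative integers.

```python
import struct


def little_endian_key(number):
    """4-byte little-endian encoding, as BitConverter.GetBytes(int) yields on x86/x64."""
    return struct.pack("<i", number)


def big_endian_key(number):
    """4-byte big-endian encoding, which sorts non-negative integers numerically."""
    return struct.pack(">i", number)


def in_scan_range(key, start_row, end_row):
    """Model an HBase scan range: start row inclusive, end row exclusive, byte order."""
    return start_row <= key < end_row


numbers = [25, 35, 286]

# Byte-wise ordering of the encoded keys.
print(sorted(numbers, key=little_endian_key))  # [25, 286, 35]
print(sorted(numbers, key=big_endian_key))     # [25, 35, 286]

# Would a row with integer key 286 be returned by a scan from 25 to 35?
print(in_scan_range(little_endian_key(286), little_endian_key(25), little_endian_key(35)))  # True
print(in_scan_range(big_endian_key(286), big_endian_key(25), big_endian_key(35)))           # False
```

If numeric range scans matter for your table, one option is to write the keys in big-endian order (in C#, for example, by reversing the array returned by `BitConverter.GetBytes` when `BitConverter.IsLittleEndian` is true) and to encode the scan boundaries the same way.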
Set the first three variables in the Main function.
Press F5 to run the application.
In this tutorial, you learned how to provision an HBase cluster, create tables, and view the data in those tables from the HBase shell. You also learned how to use Hive to query the data in HBase tables and how to use the HBase C# APIs to create an HBase table and retrieve data from it.
To learn more, see:
Original article: http://www.cnblogs.com/pangguoming/p/4231496.html