By Edward Capriolo, Dean Wampler
Need to maneuver a relational database program to Hadoop? This finished advisor introduces you to Apache Hive, Hadoop’s facts warehouse infrastructure. You’ll fast the best way to use Hive’s SQL dialect—HiveQL—to summarize, question, and study huge datasets kept in Hadoop’s disbursed filesystem.
This example-driven consultant exhibits you the way to establish and configure Hive on your atmosphere, offers an in depth assessment of Hadoop and MapReduce, and demonstrates how Hive works in the Hadoop atmosphere. You’ll additionally locate real-world case stories that describe how businesses have used Hive to unravel detailed difficulties related to petabytes of data.
- Use Hive to create, adjust, and drop databases, tables, perspectives, capabilities, and indexes
- Customize information codecs and garage suggestions, from records to exterior databases
- Load and extract information from tables—and use queries, grouping, filtering, becoming a member of, and different traditional question methods
- Gain most sensible practices for growing person outlined features (UDFs)
- Learn Hive styles you can use and anti-patterns you want to avoid
- Integrate Hive with different information processing programs
- Use garage handlers for NoSQL databases and different datastores
- Learn the professionals and cons of working Hive on Amazon’s Elastic MapReduce
Read Online or Download Programming Hive PDF
Similar Computers books
The Guru's Guide to Transact-SQL
On the grounds that its advent over a decade in the past, the Microsoft SQL Server question language, Transact-SQL, has turn into more and more well known and extra strong. the present model activities such complicated gains as OLE Automation aid, cross-platform querying amenities, and full-text seek administration. This booklet is the consummate consultant to Microsoft Transact-SQL.
Good Faith Collaboration: The Culture of Wikipedia (History and Foundations of Information Science)
Wikipedia, the web encyclopedia, is outfitted through a community--a group of Wikipedians who're anticipated to "assume stable religion" while interacting with each other. In sturdy religion Collaboration, Joseph Reagle examines this exact collaborative tradition. Wikipedia, says Reagle, isn't the first attempt to create a freely shared, common encyclopedia; its early twentieth-century ancestors comprise Paul Otlet's common Repository and H.
Information Architecture: Blueprints for the Web (2nd Edition) (Voices That Matter)
Details structure: Blueprints for the internet, moment version introduces the center innovations of knowledge structure: organizing site content material in order that it may be stumbled on, designing web site interplay in order that it's friendly to exploit, and growing an interface that's effortless to appreciate. This booklet is helping designers, undertaking managers, programmers, and different info structure practitioners stay away from high priced errors via instructing the talents of knowledge structure rapidly and obviously.
Your Life, Uploaded: The Digital Way to Better Memory, Health, and Productivity
"A outstanding task of exploring first hand the consequences of storing our complete lives digitally. " -Guy L. Tribble, Apple, Inc. Tech luminary, Gordon Bell, and Jim Gemmell unveil a advisor to the following electronic revolution. Our lifestyle begun changing into electronic a decade in the past. Now a lot of what we do is digitally recorded and available.
Additional info for Programming Hive
Demystifying CREATE desk Statements in the course of the booklet we've proven examples of making tables. you've gotten spotted that CREATE desk has numerous syntax. Examples of this syntax are kept AS SEQUENCEFILE, ROW structure DELIMITED , SERDE, INPUTFORMAT, OUTPUTFORMAT. This bankruptcy will hide a lot of this syntax and provides examples, yet as a preface observe that a few syntax is sugar for different syntax, that's, syntax used to make ideas more straightforward (sweeter) to appreciate. for instance, specifying kept AS SEQUENCEFILE is an alternative choice to specifying an INPUTFORMAT of org. apache. hadoop. mapred. SequenceFileInputFormat and an OUTPUTFORMAT of org. apache. hadoop. hive. ql. io. HiveSequenceFileOutputFormat. Let’s create a few tables and use DESCRIBE desk prolonged to peel away the sugar and reveal the internals. First, we'll create after which describe an easy desk (we have formatted the output right here, as Hive differently shouldn't have indented the output): hive> create desk textual content (x int) ; hive> describe prolonged textual content; okay x int specific desk details Table(tableName:text, dbName:default, owner:edward, createTime:1337814583, lastAccessTime:0, retention:0, sd:StorageDescriptor( cols:[FieldSchema(name:x, type:int, comment:null)], location:file:/user/hive/warehouse/text, inputFormat:org. apache. hadoop. mapred. TextInputFormat, outputFormat:org. apache. hadoop. hive. ql. io. HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo( name:null, serializationLib:org. apache. hadoop. hive. serde2. lazy. LazySimpleSerDe, parameters:{serialization. format=1} ), bucketCols:[], sortCols:[], parameters:{}), partitionKeys:[], parameters:{transient_lastDdlTime=1337814583}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE ) Now let’s create a desk utilizing saved AS SEQUENCEFILE for comparability: hive> CREATE desk seq (x int) saved AS SEQUENCEFILE; hive> DESCRIBE prolonged seq; okay x int specific desk info Table(tableName:seq, dbName:default, owner:edward, createTime:1337814571, lastAccessTime:0, retention:0, sd:StorageDescriptor( cols:[FieldSchema(name:x, type:int, comment:null)], location:file:/user/hive/warehouse/seq, inputFormat:org. apache. hadoop. mapred. SequenceFileInputFormat, outputFormat:org. apache. hadoop. hive. ql. io. HiveSequenceFileOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo( name:null, serializationLib:org. apache. hadoop. hive. serde2. lazy. LazySimpleSerDe, parameters:{serialization. format=1} ), bucketCols:[], sortCols:[], parameters:{}), partitionKeys:[], parameters:{transient_lastDdlTime=1337814571}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE ) Time taken: zero. 107 seconds until you've been blinded through Hive’s awesomeness, you are going to have picked up at the distinction among those tables. That kept AS SEQUENCEFILE has replaced the InputFormat and the OutputFormat: inputFormat:org. apache. hadoop. mapred. TextInputFormat, outputFormat:org. apache. hadoop. hive. ql. io. HiveIgnoreKeyTextOutputFormat, inputFormat:org.