1. How to debug Hadoop 2.2.0 programs in Eclipse on Windows 7
2. Storing Parquet tables in Hive
How to debug Hadoop 2.2.0 programs in Eclipse on Windows 7
In the previous post I covered deploying Hadoop in single-node pseudo-distributed mode. This post explains how to debug Hadoop 2.2.0 programs in Eclipse. If you are still on Hadoop 1.x, that is fine: an earlier post on this blog covers debugging Hadoop 1.x programs in Eclipse. The biggest difference between the two is the Eclipse plugin. The Hadoop 2.x and Hadoop 1.x APIs are not quite the same, so the plugins differ as well; just use the plugin that matches your version. Let's get started.
No.  Name                    Description
1    Eclipse                 Juno Service Release, version 4.2
2    Operating system        Windows 7
3    Hadoop Eclipse plugin   hadoop-eclipse-plugin-2.2.0.jar
4    Hadoop cluster          CentOS 6.5 Linux virtual machine, single-node pseudo-distributed
5    Program to debug        Hello World
I ran into the following problems.
First exception:
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
Fix:
Hard-code the local Hadoop path as the return value of the checkHadoopHome() method in the org.apache.hadoop.util.Shell class. My change is as follows:
private static String checkHadoopHome() {
    // first check the Dflag hadoop.home.dir with JVM scope
    //System.setProperty("hadoop.home.dir", "...");
    String home = System.getProperty("hadoop.home.dir");
    // fall back to the system/user-global env variable
    if (home == null) {
        home = System.getenv("HADOOP_HOME");
    }
    try {
        // couldn't find either setting for hadoop's home directory
        if (home == null) {
            throw new IOException("HADOOP_HOME or hadoop.home.dir are not set.");
        }
        if (home.startsWith("\"") && home.endsWith("\"")) {
            home = home.substring(1, home.length()-1);
        }
        // check that the home setting is actually a directory that exists
        File homedir = new File(home);
        if (!homedir.isAbsolute() || !homedir.exists() || !homedir.isDirectory()) {
            throw new IOException("Hadoop home directory " + homedir
                + " does not exist, is not a directory, or is not an absolute path.");
        }
        home = homedir.getCanonicalPath();
    } catch (IOException ioe) {
        if (LOG.isDebugEnabled()) {
            LOG.debug("Failed to detect a valid hadoop home directory", ioe);
        }
        home = null;
    }
    // hard-code the local Hadoop path (this overrides whatever was detected above)
    home="D:\\hadoop-2.2.0";
    return home;
}
Second exception: Could not locate executable D:\Hadoop\tar\hadoop-2.2.0\hadoop-2.2.0\bin\winutils.exe in the Hadoop binaries. The Windows executable cannot be found. Download a bin package built for Windows and use it to overwrite the bin directory under the local Hadoop root directory.
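The first two exceptions both come down to where Hadoop looks for bin\winutils.exe. The following is a minimal sketch for checking this up front, not code from the original post: the class HadoopHomeCheck and its method names are hypothetical, and it only mirrors the lookup order shown in checkHadoopHome() above (the hadoop.home.dir system property first, then the HADOOP_HOME environment variable).

```java
import java.io.File;

final class HadoopHomeCheck {

    /** Mirrors the lookup order of checkHadoopHome(): system property first, then env variable. */
    static String resolveHome(String property, String env) {
        String home = (property != null) ? property : env;
        if (home == null) {
            return null;
        }
        // Strip surrounding double quotes, as the Shell method does
        if (home.length() >= 2 && home.startsWith("\"") && home.endsWith("\"")) {
            home = home.substring(1, home.length() - 1);
        }
        return home;
    }

    /** True if <home>/bin/winutils.exe exists as a regular file. */
    static boolean hasWinutils(String home) {
        return home != null && new File(new File(home, "bin"), "winutils.exe").isFile();
    }

    public static void main(String[] args) {
        String home = resolveHome(System.getProperty("hadoop.home.dir"),
                                  System.getenv("HADOOP_HOME"));
        System.out.println("Hadoop home: " + home);
        System.out.println("winutils.exe present: " + hasWinutils(home));
    }
}
```

If the home directory resolves to null, the error message contains null\bin\winutils.exe, as in the first exception. A commonly used alternative to editing the Shell source is to call System.setProperty("hadoop.home.dir", ...) at the very start of main(), before any Hadoop class is loaded; the commented-out line in the method above points in that direction.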
Third exception:
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: hdfs://...:/user/hmail/output/part-, expected: file:///
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:)
at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:)
at com.netease.hadoop.HDFSCatWithAPI.main(HDFSCatWithAPI.java:)
This exception usually means there is a problem with how the HDFS path is written. Fix: copy the core-site.xml and hdfs-site.xml files from the cluster into the src root directory of the Eclipse project.
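The fix works because Hadoop's Configuration loads core-site.xml from the classpath, and files under the src root are copied to the classpath by Eclipse. When the file is not found, the default file system stays file:///, which matches the "expected: file:///" part of the message. A small sketch for checking whether the files are visible; the class name ClasspathCheck is hypothetical and the check uses only the JDK:

```java
import java.net.URL;

final class ClasspathCheck {

    /** Returns the URL of a classpath resource, or null if it is not on the classpath. */
    static URL locate(String resource) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = ClasspathCheck.class.getClassLoader();
        }
        return cl.getResource(resource);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"core-site.xml", "hdfs-site.xml"}) {
            URL url = locate(name);
            System.out.println(name + " -> " + (url == null ? "NOT on classpath" : url));
        }
    }
}
```

Running this from the same Eclipse project as the job shows which copy of each file, if any, the job would pick up.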
Fourth exception:
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
This exception usually means the HADOOP_HOME environment variable is misconfigured. One point is worth stressing: to debug Hadoop 2.2 successfully in Eclipse on Windows, add the following environment variables on the local machine:
(1) Under System variables, create a HADOOP_HOME variable with the value D:\hadoop-2.2.0, i.e. the local Hadoop directory.
(2) Under System variables, append %HADOOP_HOME%/bin to Path.
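A likely reason step (2) matters is that on Windows the JVM searches the Path entries for native libraries, so the Hadoop bin directory has to be listed there for the native Windows code to load. The sketch below checks that; the class name PathCheck is hypothetical, and the separator is passed in as a parameter only so that the helper can be tried on any operating system.

```java
import java.io.File;
import java.util.Locale;
import java.util.regex.Pattern;

final class PathCheck {

    /** Returns true if dir is one of the entries of a PATH-style list. */
    static boolean containsDir(String pathList, String separator, String dir) {
        if (pathList == null || dir == null) {
            return false;
        }
        String wanted = normalize(dir);
        for (String entry : pathList.split(Pattern.quote(separator))) {
            if (!entry.isEmpty() && normalize(entry).equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    // Windows paths are case-insensitive and accept both slash styles,
    // so compare a lower-cased, backslash-only form without trailing separators.
    private static String normalize(String p) {
        String s = p.trim().replace('/', '\\').toLowerCase(Locale.ROOT);
        while (s.length() > 1 && s.endsWith("\\")) {
            s = s.substring(0, s.length() - 1);
        }
        return s;
    }

    public static void main(String[] args) {
        String home = System.getenv("HADOOP_HOME");
        if (home == null) {
            System.out.println("HADOOP_HOME is not set");
            return;
        }
        String bin = home + File.separator + "bin";
        System.out.println(bin + " on PATH: "
            + containsDir(System.getenv("PATH"), File.pathSeparator, bin));
    }
}
```

Eclipse reads environment variables when it starts, so it has to be restarted after the variables are changed.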
Those are the problems I ran into during testing. With each one fixed, Eclipse can finally debug MR programs. My Hello World source code is as follows:
package com.qin.wordcount;

import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

/**
 * Hadoop 2.2.0 test:
 * a WordCount-style example.
 *
 * @author qindongliang
 */
public class MyWordCount {

    /**
     * Mapper: expects lines of the form key#count.
     */
    private static class WMapper extends Mapper<LongWritable, Text, Text, IntWritable>{
        private IntWritable count=new IntWritable(1);
        private Text text=new Text();
        @Override
        protected void map(LongWritable key, Text value,Context context)
                throws IOException, InterruptedException {
            String values[]=value.toString().split("#");
            //System.out.println(values[0]+"========"+values[1]);
            count.set(Integer.parseInt(values[1]));
            text.set(values[0]);
            context.write(text,count);
        }
    }

    /**
     * Reducer: sums the counts for each key.
     */
    private static class WReducer extends Reducer<Text, IntWritable, Text, Text>{
        private Text t=new Text();
        @Override
        protected void reduce(Text key, Iterable<IntWritable> value,Context context)
                throws IOException, InterruptedException {
            int count=0;
            for(IntWritable i:value){
                count+=i.get();
            }
            t.set(count+"");
            context.write(key,t);
        }
    }

    /**
     * Changes made:
     * (1) added the checkHadoopHome path in the Shell source
     * (2) a line in FileUtils
     */
    public static void main(String[] args) throws Exception{
        // String path1=System.getenv("HADOOP_HOME");
        // System.out.println(path1);
        // System.exit(0);
        JobConf conf=new JobConf(MyWordCount.class);
        //Configuration conf=new Configuration();
        //conf.set("mapred.job.tracker","...:");
        // read the data fields from person
        // conf.setJar("tt.jar");
        // note: this line must come first so that initialization happens, otherwise an error is reported
        /** Job **/
        Job job=new Job(conf, "testwordcount");
        job.setJarByClass(MyWordCount.class);
        System.out.println("Mode: "+conf.get("mapred.job.tracker"));
        // job.setCombinerClass(PCombine.class);
        // job.setNumReduceTasks(3); // set to 3
        job.setMapperClass(WMapper.class);
        job.setReducerClass(WReducer.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        String path="hdfs://...:/qin/output";
        FileSystem fs=FileSystem.get(conf);
        Path p=new Path(path);
        if(fs.exists(p)){
            fs.delete(p, true);
            System.out.println("Output path exists; deleted!");
        }
        FileInputFormat.setInputPaths(job, "hdfs://...:/qin/input");
        FileOutputFormat.setOutputPath(job,p );
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
The console prints the following log:
INFO - Configuration.warnOnceIfDeprecated() | mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
Mode: local
Output path exists; deleted!
INFO - Configuration.warnOnceIfDeprecated() | session.id is deprecated. Instead, use dfs.metrics.session-id
INFO - JvmMetrics.init() | Initializing JVM Metrics with processName=JobTracker, sessionId=
WARN - JobSubmitter.copyAndConfigureFiles() | Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
WARN - JobSubmitter.copyAndConfigureFiles() | No job jar file set. User classes may not be found. See Job or Job#setJar(String).
INFO - FileInputFormat.listStatus() | Total input paths to process : 1
INFO - JobSubmitter.submitJobInternal() | number of splits:1
INFO - Configuration.warnOnceIfDeprecated() | user.name is deprecated. Instead, use mapreduce.job.user.name
INFO - Configuration.warnOnceIfDeprecated() | mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
INFO - Configuration.warnOnceIfDeprecated() | mapred.mapoutput.value.class is deprecated. Instead, use mapreduce.map.output.value.class
INFO - Configuration.warnOnceIfDeprecated() | mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
INFO - C
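The MyWordCount job above splits each input line on "#", treats the second field as an integer count, and sums the counts per key in the reducer. The same logic can be tried without a cluster. The following plain-Java sketch is only an illustration: the class LocalWordCount and the sample lines are made up, and unlike the mapper it skips malformed lines instead of throwing.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class LocalWordCount {

    /** Sums the integer after '#' for each key before '#', like WMapper plus WReducer. */
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> totals = new TreeMap<>();
        for (String line : lines) {
            String[] values = line.split("#");
            if (values.length < 2) {
                continue; // no '#': the mapper would fail on values[1] here
            }
            int n;
            try {
                n = Integer.parseInt(values[1].trim());
            } catch (NumberFormatException e) {
                continue; // second field is not an integer
            }
            totals.merge(values[0], n, Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("a#1", "b#2", "a#3", "broken line");
        System.out.println(count(sample)); // prints {a=4, b=2}
    }
}
```

Checking the input format this way first helps separate parsing mistakes from the environment problems described earlier.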
Storing Parquet tables in Hive
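Parquet tables are used a lot in production because they offer columnar storage and compression. So how do we store a Parquet table in Hive?
Here we use Oracle's emp table.
Load local data into the Hive table.
Run a query.
The query fails.
emp is stored in Parquet format, so both its InputFormat and its OutputFormat are Parquet classes. In other words, the table's InputFormat expects Parquet, but the file that was loaded is plain text, so it cannot be read.
Looking at the table information for emp, we can see that all the formats are Parquet ones, which is the cause of this error.
This means that each time we first have to convert the text file into a Parquet file, and only then can it be loaded into the Parquet table. The officially recommended approach is described next.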
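Look at the table information for emp_tmp. Here we can see that the defaults are TextInputFormat and TextOutputFormat.
Then load the data into emp_tmp and query it; the data displays correctly.
Next, delete the data that was previously loaded into emp.
Then load the data from the emp_tmp table into emp.
Query again and the data displays correctly. This approach works well enough, but having to go through a temporary table every time is still rather tedious.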
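The temporary-table steps above can be sketched as two HiveQL statements: load the text file into emp_tmp, then rewrite the rows into emp with INSERT ... SELECT so that Hive itself produces the Parquet files. The Java below only builds the statement strings; the class name StagingLoad and the local file path are hypothetical, INSERT OVERWRITE is assumed here as a way to combine the "delete" and "load" steps, and running the statements (for example through the Hive CLI or JDBC) is left out.

```java
import java.util.Arrays;
import java.util.List;

final class StagingLoad {

    /** Builds the HiveQL for loading a local text file into a Parquet table via a text staging table. */
    static List<String> statements(String localFile, String stagingTable, String parquetTable) {
        return Arrays.asList(
            // 1. The staging table uses the default text format, so a plain file can be loaded as-is
            "LOAD DATA LOCAL INPATH '" + localFile + "' OVERWRITE INTO TABLE " + stagingTable,
            // 2. Hive reads the text rows and writes them out in the target table's format
            "INSERT OVERWRITE TABLE " + parquetTable + " SELECT * FROM " + stagingTable);
    }

    public static void main(String[] args) {
        // "/tmp/emp.txt" is a placeholder path, not taken from the post
        for (String sql : statements("/tmp/emp.txt", "emp_tmp", "emp")) {
            System.out.println(sql + ";");
        }
    }
}
```

The sketch assumes emp_tmp and emp have the same columns in the same order, since SELECT * copies them positionally.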
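This problem seems to come up often. Why does it happen? It has something to do with the Hive version.
We can see that Hive officially bundled the InputFormat and OutputFormat together, which makes the format more convenient to use.
Some readers may think: why not just change the InputFormat? Let's try it and see whether that works.
Create the emp2 table, stored in Parquet format.
Change the InputFormat and the SerDe. Here the InputFormat is TextInputFormat and the SerDe is LazySimpleSerDe; the OutputFormat is still the Parquet one and has to be included in the statement.
Look at the table information for emp2; the output shows that the change succeeded.
Load data into emp2.
Query the data; the query succeeds.
That completes the approach of changing the InputFormat and SerDe. It looks like it worked, but a look at HDFS shows that the file is still in text format, so changing the InputFormat and SerDe does not solve the problem.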
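Some readers will certainly want to use this method.
I tried this method as well, but every value returned was NULL.
When using Hive alone, the way to get data from a text file into a Parquet table is to go through a temporary table, which is also fairly easy to do.
With a distributed compute engine such as Spark or Flink, however, text data can be read and saved into a Parquet table directly, because the framework does the conversion for us. This is also the approach we use most often at work.
To sum up the InputFormat and SerDe approach described above: changing the InputFormat does let the data in the text file be read. If the SerDe is also changed to LazySimpleSerDe, the data is stored as text, which no longer has anything to do with Parquet, and the stored file is still a text file. If only the InputFormat is changed while the SerDe remains the Parquet one, the formats on the way in and on the way out do not match, which is also a problem.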