Hadoop in Taiwan 2012

Time

Keynote Sessions - 4F International Conference Hall

08:30~09:00

Registration

09:00~09:10

Opening

09:10~09:20

Special Guest │ 張明正 (Steve Chang), Chairman of Trend Micro

09:20~10:10

Cutting Edge Hadoop Technology and the Trend│

Andrew Purtell (Senior Architect, Trend Micro; HBase PMC member and committer)

10:10~10:30

Break / Exhibition Booths

10:30~11:20

Scalable Machine Learning with Hadoop│

Grant Ingersoll (Chief Scientist, LucidWorks)

11:20~12:10

Trend Micro's Journey of Cloud Discovery - How We Built Core Enterprise Competitiveness with Hadoop │ 陳永強 (Head of Cloud Solutions, Trend Micro)

12:10~13:30

Lunch / Exhibition Booths

Afternoon Track Sessions

Time

A. "Developers"

B. "Operators"

C. "Use Cases"

13:30~14:10

Oozie Introduction

楊詠成 (Gibson Yang) / Yahoo! Taiwan

Abstract:
Oozie is an open-source workflow / coordination service to manage data processing jobs for Apache Hadoop™. It is an extensible, scalable and data-aware service to orchestrate dependencies between jobs running on Hadoop (including HDFS, Pig and MapReduce).
In this talk, we will introduce Oozie and share our experience with it at Yahoo!

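Quickly Setting Up a Single-Node Hadoop Development Environment and Building a Cloud Cluster in Practice [This session includes hands-on exercises; please bring your own laptop and network cable]

王耀聰 (Jazz Wang) / National Center for High-performance Computing (NCHC)

Abstract:
After seven years of development, Hadoop finally reached version 1.0 in December 2011, a sign that it has matured enough to support enterprise operations. Even so, the main thing that still puts people off Hadoop is that it is "not friendly enough". The first problem beginners usually face is a lack of the background knowledge needed to deploy a cluster.
In Taiwan, most IT professionals still use Windows as their primary operating system. This talk introduces Hadoop4Win, an all-in-one installer that can serve both as a first step into the Hadoop ecosystem and as an experimental environment for developing Hadoop programs. It then shows how to build a Hadoop cluster on hiCloud.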

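Applying Hadoop to Geographic Information Systems: The Case of FORMOSAT-2 Satellite Image Management and Applications

辜文元 / GIS Research Center, Feng Chia University

Abstract:
In recent years, rapid advances in remote sensing have sharply increased the resolution of individual images, so each file needs more storage space. Video capture is also increasingly used for environmental observation and recording. Data grows by gigabytes or terabytes at a time, and the demand for remote sensing data storage and management grows with it. Such volumes frequently leave traditional servers short of storage space. A traditional server can add disks to gain capacity, but this vertical scaling has limits, so meeting the ever-growing demand for image storage is an important problem.

This talk draws on the results of a collaboration between the GIS Research Center of Feng Chia University and the National Space Organization, which combines FORMOSAT-2 satellite imagery with Hadoop cloud technology, to show how Hadoop can be applied to geographic information systems. With Hadoop as the foundation for image management and value-added applications, we developed an architecture for a satellite image management platform. Topics include managing the massive FORMOSAT-2 image archive and publishing map services on top of Hadoop HDFS so that a wide range of GIS clients can use FORMOSAT imagery.

Beyond the storage of massive spatial data, grid (raster) analysis is a commonly used spatial analysis module in geographic information systems. This talk also describes how, in practice, a spatial analysis problem can be decomposed and converted into a form that the MapReduce model can process, making full use of MapReduce's distributed computing to speed up grid analysis.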
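As a rough illustration of what "decomposing a grid analysis into MapReduce" can look like, the sketch below splits a small raster into tiles, counts cell values per tile (map), and merges the partial counts (reduce). The abstract does not say which analyses the speakers actually implemented, so the histogram task and the names `split_rows`, `map_tile`, and `raster_histogram` are assumptions chosen only for illustration, and plain Python stands in for a Hadoop job.

```python
from collections import Counter
from functools import reduce


def split_rows(raster, rows_per_tile):
    """Cut the raster (a list of rows) into horizontal tiles."""
    return [raster[i:i + rows_per_tile]
            for i in range(0, len(raster), rows_per_tile)]


def map_tile(tile):
    """Map: count how often each cell value occurs in one tile."""
    return Counter(cell for row in tile for cell in row)


def reduce_counts(left, right):
    """Reduce: merge two partial histograms."""
    return left + right


def raster_histogram(raster, rows_per_tile=2):
    """Hypothetical driver: tile the raster, map each tile, reduce the results."""
    partials = map(map_tile, split_rows(raster, rows_per_tile))
    return dict(reduce(reduce_counts, partials, Counter()))


if __name__ == "__main__":
    land_cover = [[1, 1, 2],
                  [2, 3, 3],
                  [1, 2, 3]]
    print(raster_histogram(land_cover))
```

Because each tile is processed independently, the map step is the part a cluster could spread across nodes; only the small per-tile histograms need to be brought together in the reduce step.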

14:10~14:50

Big Data, Hadoop and R

Laurence Liew / Revolution Analytics

Abstract:
This session will discuss the use of R within a Hadoop environment.
It covers the motivation for using R and how R is used today inside and alongside a Hadoop cluster. A short video of RHadoop will be shown.

[1] An introduction to the R project
    Link1→
    Link2→

[2] An introduction to the R-Hadoop project
    Link→

Hadoop Security Overview - From Security Infrastructure Deployment to High-Level Services

施宏良 (Jason Shih) / Etu, SYSTEX Corp.

Abstract:
The growing adoption of the open-source Hadoop framework for fast processing and analysis of huge data volumes has drawn attention to enterprise-wide security concerns: fine-grained control over sensitive information, and isolation between different levels or groups of access on shared storage and computing facilities. Prior to Hadoop 0.20, Unix-like file permissions were introduced, along with a simple cluster-wide authentication mechanism, but access control per job queue, job submission, and other operations was lacking. With Hadoop's new security features and their integration with Kerberos, it is now possible to bring strong authentication and authorization that enforce rigorous access control to data and resources, as well as isolation between running tasks. In this presentation, we cover the deployment details of Hadoop security in a cluster environment and its implementation for high-level services based on a Kerberized security infrastructure. We also introduce the Etu Appliance, which provides fast deployment, system automation, and a built-in cross-realm trust mechanism that enables interoperation with an existing Active Directory domain or external LDAP realm and helps reduce integration and operational overhead for administrators.

Ad hoc Query - Querying Massive Data with Ease

蘇柏綸 (Alex Su) / Trend Micro

Abstract:
Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs. We'll present a case study of Trend Micro's integration of data analytics tools into its existing Hadoop-based, Pig-centric analytics platform. In our deployed solution, common data analytics tasks such as data sampling, feature generation, training, and testing can be accomplished quickly and directly in Pig, via carefully crafted loaders, storage functions, and user-defined functions.

14:50~15:20

Break / Exhibition Booths

15:20~16:00

Scalable Data Processing: Bulk Synchronous Parallel

林家弘 (Chia-Hung Lin) / Fliptop Inc.

Abstract:
Hadoop MapReduce[1] is a popular open-source framework inspired by the map and reduce functions of functional programming; it saves developers a lot of work by handling many complicated underlying tasks. However, not every task fits the MapReduce model, and graph-related computation (e.g. social network analysis) is one such example. Google therefore developed an in-house system, Pregel[2], for large-scale graph processing. It is based on Bulk Synchronous Parallel[3], a bridging model well suited to iterative algorithms.

Outline:
1. What is Bulk Synchronous Parallel?
2. Apache Hama
3. Comparison between Hadoop MapReduce and Apache Hama
[1].Link1→
[2].Link2→
[3].Link3→
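
To make the Bulk Synchronous Parallel idea concrete, here is a minimal single-process sketch of the superstep cycle (local computation, message passing, barrier) using the classic "propagate the maximum value" graph example. It is not Apache Hama or Pregel code; the function name `bsp_max` and the dictionary-based message queues are assumptions made purely for illustration.

```python
def bsp_max(graph, values):
    """Propagate the maximum value through a graph in BSP supersteps.

    graph:  dict mapping each vertex to a list of its neighbours
    values: dict mapping each vertex to its initial value
    Returns (final values, number of supersteps executed).
    """
    values = dict(values)

    # Superstep 0: every vertex sends its value to its neighbours.
    inbox = {v: [] for v in graph}
    for v, neighbours in graph.items():
        for n in neighbours:
            inbox[n].append(values[v])
    supersteps = 1

    # Keep running supersteps while any message is still in flight.
    while any(inbox.values()):
        outbox = {v: [] for v in graph}
        for v in graph:
            # Local computation: a vertex only sees its own inbox.
            if inbox[v] and max(inbox[v]) > values[v]:
                values[v] = max(inbox[v])
                # Communication: tell the neighbours about the new value.
                for n in graph[v]:
                    outbox[n].append(values[v])
        # Barrier: messages sent now are only visible in the next superstep.
        inbox = outbox
        supersteps += 1

    return values, supersteps


if __name__ == "__main__":
    chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    start = {"a": 3, "b": 6, "c": 2, "d": 1}
    print(bsp_max(chain, start))
```

The iteration stops on its own once no vertex changes, which is the kind of iterative, message-driven computation that is awkward to express as a chain of MapReduce jobs.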

Sharing Hadoop Operations Experience
- What to Watch for When Planning Hadoop Operations

張家豪 (James Chang) / Trend Micro

Abstract:
Over the last few years, there has been a fundamental shift in data storage, management, and processing. Companies are storing more data from more sources in more formats than ever before.
This isn't just about being a "data pack-rat," but instead building products, features, and intelligence predicated on knowing more about their world (where their world can be users, searches, machine logs and so forth).
In this session, we'll present a case study of how Trend Micro operates its Hadoop cluster, covering NameNode and JobTracker HA, network topology, and metrics monitoring tools.

Mohohan: An on-line video transcoding service via Hadoop

陳俊翰 (Chun-Han Chen) / OgilvyOne

Abstract:
Hadoop, a well-known cloud computing file system and development framework, is mainly designed for managing massive textual data: counting, sorting, indexing, pattern finding, and so on. Multimedia-oriented services built on Hadoop, however, are rare. Mohohan is an on-line multimedia transcoding system for video resources, implemented with Amazon Web Services (AWS) EC2, S3, and EMR, Hadoop, and ffmpeg. Its goal is to reduce overall execution time by transcoding in parallel on a Hadoop cluster. The concept of Mohohan is simple: 1) divide the video into several chunks of frames, 2) transcode the chunks in parallel on multiple nodes (i.e., task trackers) of the Hadoop cluster, and 3) merge the transcoded results into the output. For a comparison against similar SaaS offerings, we chose a test report from CloudHarmony, an impartial third-party organization. The experimental results show that Mohohan performs considerably better than the other on-line video transcoding services covered in that report, such as Encoding, Zencoder, Sorenson, and Panda.
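
The three steps named in the abstract (split, transcode in parallel, merge) can be sketched as follows. This is not Mohohan's code: threads stand in for Hadoop task trackers, the hypothetical `transcode_chunk` simply upper-cases frame labels instead of calling ffmpeg, and all function names are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(frames, chunk_size):
    """Step 1: divide the frame sequence into consecutive chunks."""
    return [frames[i:i + chunk_size]
            for i in range(0, len(frames), chunk_size)]


def transcode_chunk(chunk):
    """Step 2 (hypothetical stand-in): 'transcode' one chunk.

    A real worker would run ffmpeg on the chunk; here each frame label is
    upper-cased so the example stays self-contained.
    """
    return [frame.upper() for frame in chunk]


def merge_chunks(chunks):
    """Step 3: concatenate the transcoded chunks in their original order."""
    return [frame for chunk in chunks for frame in chunk]


def parallel_transcode(frames, chunk_size=3, workers=4):
    chunks = split_into_chunks(frames, chunk_size)
    # Threads stand in for cluster nodes; map() returns results in chunk order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        transcoded = list(pool.map(transcode_chunk, chunks))
    return merge_chunks(transcoded)


if __name__ == "__main__":
    video = [f"frame{i}" for i in range(8)]
    print(parallel_transcode(video))
```

Keeping the chunks in their original order is what makes the merge step a simple concatenation; a real video pipeline would also have to cut at boundaries the codec can decode independently.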

16:00~16:40

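Designing High-Performance HBase Schemas: Understanding How HBase Works and the Characteristics of Your Data

繆維武 (Scott Miao) / Trend Micro

Abstract:
HBase is a database built on a distributed file system and derived from Google's Bigtable. When tables are mentioned, most people associate them with traditional relational databases. In practice, however, designing an HBase schema with relational database design methods forfeits the benefits of HBase and, worse, can make HBase perform poorly!
This talk starts from the data storage structure of HBase's underlying HFile and derives the schema design options available for different HBase use cases, so that we can make better use of the power HBase gives us!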
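
One consequence of HBase keeping rows sorted by the bytes of their row key is that key layout determines which rows end up next to each other. The sketch below illustrates a commonly used pattern, a composite key with a reversed timestamp, using plain Python byte strings rather than an HBase client. It is not taken from the talk; the key layout and the names `make_row_key` and `prefix_scan` are assumptions for illustration only.

```python
import struct

MAX_TS = 2**63 - 1  # largest signed 64-bit value, used to reverse timestamps


def make_row_key(user_id: str, timestamp_ms: int) -> bytes:
    """Build a hypothetical composite row key: <user_id>|<reversed timestamp>.

    The timestamp is packed big-endian so byte order matches numeric order,
    and reversed so that newer events sort before older ones.
    """
    reversed_ts = struct.pack(">q", MAX_TS - timestamp_ms)
    return user_id.encode("utf-8") + b"|" + reversed_ts


def prefix_scan(sorted_keys, user_id: str):
    """Emulate a prefix scan over keys held in byte-lexicographic order."""
    prefix = user_id.encode("utf-8") + b"|"
    return [key for key in sorted_keys if key.startswith(prefix)]


if __name__ == "__main__":
    keys = sorted([
        make_row_key("alice", 1000),
        make_row_key("bob", 1500),
        make_row_key("alice", 2000),
    ])
    # Alice's rows are contiguous, newest first.
    print(prefix_scan(keys, "alice"))
```

With this layout, "latest events for one user" becomes a short scan over adjacent rows, whereas a relational-style surrogate key would scatter those rows across the table.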

Hadoop hardware and network best practices

何長興 (Kenneth Ho)

Abstract:
Hadoop is taking over a big chunk of the IT world. Many are already on board, from Internet giants to cutting-edge startups, from established multinational enterprises to SMBs serving local niche markets. Many more plan to hop on the Hadoop bandwagon.
One of the most important questions, yet one that in my observation is rarely discussed in local communities, is hardware selection and network design. This talk attempts to shed some light on best practices for selecting hardware and designing the network for your new (or next) Hadoop cluster.

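Applying Hadoop to Precision Marketing

陳志昇 (Vincent Chen) / TCloud Computing (騰雲計算)

Abstract:
A precision marketing application - Hadoop for analyzing the web browsing behavior of mobile device users:
Built on the Hadoop platform, this application uses MapReduce and related technologies to integrate user data from a variety of mobile devices. It applies word segmentation and classification techniques from semantic analysis and data mining to define complete user profiles. Besides turning the analysis results into marketing capability, it ultimately enables intelligent matching between people and content, people and products, and people and people.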

※ The organizer reserves the right to change the agenda and speakers. We apologize for any inconvenience.