hortonworks / data-tutorials
Hortonworks tutorials
Please Fill in the Following Information before submitting your Issue:
Tutorial Name: Hadoop Tutorial – Getting Started with HDP
HDP Version are you Running: 2.4
Applicable HDP Components: HDFS, Hive, Pig, Spark, Zeppelin
Link to tutorial(Hortonworks or GitHub website):
http://hortonworks.com/hadoop-tutorial/hello-world-an-introduction-to-hadoop-hcatalog-hive-and-pig/#section_1
When users copy and paste commands from the tutorial into the Pig, Zeppelin, or other script editors, errors occur.
Verify that the code has been written correctly: copy and paste each command from the tutorial into the editor and confirm that no errors occur. Discuss browser cross-compatibility with the web team so that styling is consistent.
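Part of the copy-and-paste check could be automated by scanning each tutorial's fenced code blocks for typographic characters that editors and shells do not treat as their ASCII counterparts. This is a sketch only. It assumes the tutorials are markdown files that use triple-backtick fences, and the character list is an assumption rather than a complete inventory.

```python
# Typographic characters that often replace ASCII ones when text passes
# through a word processor or CMS (assumed list, not exhaustive).
SUSPECT_CHARS = {
    "\u2013": "en dash",
    "\u2014": "em dash",
    "\u2018": "left single quote",
    "\u2019": "right single quote",
    "\u201c": "left double quote",
    "\u201d": "right double quote",
    "\u00a0": "non-breaking space",
}


def find_suspect_chars(markdown: str) -> list[tuple[int, str]]:
    """Return (line_number, description) for every suspect character
    found inside a fenced code block of a markdown tutorial."""
    problems = []
    in_fence = False
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # an opening or closing fence toggles state
            continue
        if not in_fence:
            continue  # prose outside code blocks is not copied into editors
        for ch in line:
            if ch in SUSPECT_CHARS:
                problems.append((lineno, SUSPECT_CHARS[ch]))
    return problems
```

Only fenced blocks are checked, because typographic dashes and quotes in prose are harmless; a CI step could fail the build whenever the returned list is non-empty.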
Because we convert tutorials to HTML with Jekyll (http://jekyllrb.com), it would be useful to include metadata at the top of each tutorial.
I plan to add the following tags to the top of each tutorial:
| metadata tag | description | where utilized |
|---|---|---|
| tutorial-version | Version tag with which the current tutorial is compatible | Inserted into the HTML link for the button in the tutorial footer |
| tutorial-series | Series of which the tutorial is a part | No current plans, but I feel it would be useful for organization in the future |
| title | Self-explanatory | No specific uses yet, but available to insert if we ever need it. Also helpful for giving some of the tutorials an "official" title |
| tutorial-id | Identifies the tutorial in Hortonworks Community Connection | Used when inserting data into links in the tutorial footer |
| intro-page | true or false | Can be used to insert extra data at the top or bottom of the page if needed; no current use |
| components | A string array specifying the HDP/HDF components used in the tutorial | No specific plans, but could be useful on gh-pages |
The build script also needs to be modified so that it no longer adds YAML front matter, since all tutorials should already contain it after this fix.
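A minimal sketch of the check the build step needs: prepend front matter built from the tags above only when a tutorial does not already start with a `---` block. It is written in Python rather than bash for readability, and the example values are hypothetical. Each value is serialized with `json.dumps`, relying on the fact that JSON strings, numbers, booleans, and arrays are also valid YAML.

```python
import json


def has_front_matter(text: str) -> bool:
    """True if the text opens with a '---' line and a closing '---' follows."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return False
    return any(line.strip() == "---" for line in lines[1:])


def render_front_matter(meta: dict) -> str:
    """Render a flat dict as YAML front matter (one key per line)."""
    out = ["---"]
    for key, value in meta.items():
        out.append(f"{key}: {json.dumps(value)}")  # JSON scalars/arrays are valid YAML
    out.append("---")
    return "\n".join(out) + "\n"


def ensure_front_matter(text: str, meta: dict) -> str:
    """Add front matter only if the tutorial does not already contain it."""
    return text if has_front_matter(text) else render_front_matter(meta) + text


# Hypothetical metadata for one tutorial.
meta = {
    "title": "Hadoop Tutorial - Getting Started with HDP",
    "tutorial-id": 100,
    "tutorial-version": "hdp-2.4.0",
    "intro-page": True,
    "components": ["hdfs", "hive"],
}
```

Running `ensure_front_matter` twice leaves the file unchanged the second time, which is the behavior the build script needs once every tutorial carries its own front matter. A document that happens to begin with a `---` horizontal rule would be misdetected by this simple check.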
Update spelling/formatting
Please Fill in the Following Information before submitting your Issue:
Tutorial Name: Realtime Event Processing with Kafka and Storm
HDP Version are you Running: 2.4
Applicable HDP Components: Kafka, Storm, HBase, Hive, HDFS
Link to tutorial(Hortonworks or GitHub website): http://hortonworks.com/hadoop-tutorial/simulating-transporting-realtime-events-stream-apache-kafka/#section_3
Please Fill in the Following Information before submitting your Issue:
Tutorial Name: Realtime Event Processing with Storm and Kafka (Tutorial Series)
HDP Version are you Running: 2.4
HDF Version are you Running: 1.2
Applicable HDP Components: Storm, Kafka, HBase, Hive
Link to tutorial(Hortonworks or GitHub website): http://hortonworks.com/hadoop-tutorial/simulating-transporting-realtime-events-stream-apache-kafka/
Currently we do not have NiFi incorporated in this tutorial series, yet it is a powerful tool for reading in data and doing some processing on it. Right now, the Kafka producer creates messages and publishes them to a Kafka cluster. However, we could use one NiFi processor to run a stream simulator that generates truck event data and another processor to push that data onto a Kafka cluster. How would this affect Storm and HBase later in the lab? Our IoT tutorial does not show the community how to integrate Hortonworks DataFlow with Hortonworks Data Platform services (Kafka and Storm).
Integrate NiFi into this tutorial series to automate the flow of data. We want to read in data, filter it while it is flowing, and push it to a Kafka cluster. Storm will capture that data and do some processing. We will then check HBase and Hive to see how our tables have changed after using NiFi.
Please Fill in the Following Information before submitting your Issue:
Tutorial Name: Processing Data Pipeline with Apache Falcon
HDP Version are you Running: 2.4
Applicable HDP Components: Falcon, Ambari, HDFS
Link to tutorial(Hortonworks or GitHub website): http://hortonworks.com/hadoop-tutorial/processing-data-pipeline-with-apache-falcon/
Users need to be made more aware that they will need admin privileges for the tutorial. The tutorial's images and ease of use also need to be improved.
Update tutorial with clearer instructions
Update images
Please Fill in the Following Information before submitting your Issue:
Tutorial Name: Lap Around
HDP Version are you Running: 2.4
Applicable HDP Components:
Link to tutorial(Hortonworks or GitHub website):
The hyperlinks that jump to specific parts of the page don't work properly. After running the bash script that converts all the markdown files to HTML, the following issue occurs:
<li><a href="#configure-and-start-solr">Configure and Start Solr</a></li>
......(more html code in between)
<h2 id="configure-and-start-solr-a-idconfigure-and-start-solra">Configure and Start Solr <a id="configure-and-start-solr"></a></h2>
The href in the <a> tag is correct.
However, the link jumps to the wrong location, as the <h2> tag shows. There should be a single id attribute: <h2 id="configure-and-start-solr">Configure and Start Solr</h2>
Also, the anchor element does not need to be present.
Have you run into this issue?
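The broken id can be reproduced by assuming a kramdown-style auto-id rule: drop every character that is not a letter, digit, space, or dash, turn spaces into dashes, and lowercase. Applied to the raw heading text including the inline `<a id=...></a>`, that rule yields exactly the id reported above, which suggests the converter is treating the anchor markup as plain text. The sketch below is an approximation of that rule (not kramdown's code) plus a hypothetical helper that strips the inline anchor before conversion.

```python
import re


def auto_id(heading_text: str) -> str:
    """Approximate kramdown-style auto id for a heading's text."""
    text = re.sub(r"[^A-Za-z0-9 -]", "", heading_text)  # keep letters, digits, spaces, dashes
    text = text.replace(" ", "-").lower()
    return re.sub(r"^[^a-z]+", "", text)  # ids must start with a letter


# Matches a trailing inline anchor such as  <a id="some-id"></a>
INLINE_ANCHOR = re.compile(r'\s*<a id="[^"]*"></a>\s*$')


def strip_inline_anchor(heading_line: str) -> str:
    """Hypothetical pre-processing step: remove the trailing inline anchor."""
    return INLINE_ANCHOR.sub("", heading_line)


raw = 'Configure and Start Solr <a id="configure-and-start-solr"></a>'
print(auto_id(raw))  # configure-and-start-solr-a-idconfigure-and-start-solra
print(auto_id(strip_inline_anchor(raw)))  # configure-and-start-solr
```

Stripping the anchor lets the generated id match the href in the table of contents, so the explicit anchor element becomes unnecessary.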
Discovered several formatting issues after conversion from WP to HTML that need to be fixed in tutorials.
Zeppelin views are not enabled in HDP 2.4.
Update the instructions on how to manually access Zeppelin notebooks.
Please Fill in the Following Information before submitting your Issue:
Tutorial Name: Securing Data Lake Resources and User Access Auditing with Apache Ranger
HDP Version are you Running: 2.4
Applicable HDP Components: Apache Ranger
Link to tutorial(Hortonworks or GitHub website): http://hortonworks.com/hadoop-tutorial/securing-data-lake-auditing-user-access-using-hdp-security/
Tutorial needs to be updated for HDP 2.4
Update images and text so that the tutorial is compatible with the HDP 2.4 sandbox.
Please Fill in the Following Information before submitting your Issue:
Tutorial Name: Learning the ropes of Hortonworks Sandbox
HDP Version are you Running: 2.4
Applicable HDP Components: Sandbox
Link to tutorial(Hortonworks or GitHub website):
http://hortonworks.com/hadoop-tutorial/learning-the-ropes-of-the-hortonworks-sandbox/#deploy-sandbox-azure
We want to change the specific IP address to a generic term.
Changing IP address text to "host"
Everything should now point to hortonworks/tutorials.
Please Fill in the Following Information before submitting your Issue:
Tutorial Name: Hadoop Tutorial – Getting Started with HDP
HDP Version are you Running: 2.4
Applicable HDP Components: HDFS, Hive, Pig, Spark, Zeppelin
Link to tutorial(Hortonworks or GitHub website):
http://hortonworks.com/hadoop-tutorial/hello-world-an-introduction-to-hadoop-hcatalog-hive-and-pig/#section_1
Formatting problem: header numbers have inconsistent sizes and font weights.
Make headers consistent in size and font weight.
Please Fill in the Following Information before submitting your Issue:
Tutorial Name: All Tutorials
HDP Version are you Running: 2.4
Applicable HDP Components: Sandbox
Link to tutorial(Hortonworks or GitHub website):
Users are not sure how long the tutorial will take. We should add a bullet point in the prerequisites or outline section that states this clearly.
Add a bullet point to the outline section that tells the user how long the tutorial will take.
@ZacBlanco
@roberthryniewicz
I'm using Sandbox 2.4 with the tutorial at https://github.com/hortonworks/tutorials/blob/hdp/tutorials/hortonworks/a-lap-around-spark/tutorial.md
For the command:
val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
I get the error:
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://sandbox.hortonworks.com:8020/tmp/data
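The exception means Spark could not find /tmp/data in HDFS, so the input file has to be copied to that path (for example with `hdfs dfs -put`) before the command can run. For reference, the chain itself is a word count. A plain-Python equivalent, handy for checking expected counts on a small sample, might look like the sketch below; it mirrors the Spark calls but is not Spark code.

```python
from collections import Counter


def word_count(lines):
    """Plain-Python analogue of:
    file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
    """
    words = (word for line in lines for word in line.split(" "))  # flatMap
    counts = Counter()
    for word in words:      # map(word => (word, 1)) ...
        counts[word] += 1   # ... reduceByKey(_ + _) sums the 1s per word
    return dict(counts)


print(word_count(["a b a", "b c"]))  # {'a': 2, 'b': 2, 'c': 1}
```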
Please Fill in the Following Information before submitting your Issue:
Tutorial Name: Deploying Hortonworks Sandbox on Microsoft Azure
HDP Version are you Running: HDP 2.4
Applicable HDP Components: Sandbox
Link to tutorial(Hortonworks or GitHub website): http://hortonworks.com/hadoop-tutorial/deploying-hortonworks-sandbox-on-microsoft-azure/
Screenshots are outdated.
Update the splash screen screenshots to the latest HDP version, 2.4.
Please Fill in the Following Information before submitting your Issue:
Tutorial Name: How to Refine and Visualize Sentiment Data
HDP Version are you Running: 2.4
Applicable HDP/HDF Components: NiFi
Link to tutorial(Hortonworks or GitHub website): http://hortonworks.com/hadoop-tutorial/how-to-refine-and-visualize-sentiment-data/
HDF 1.2 is now GA, and the tutorial should reflect this new HDF version.
Update links and text to match HDF 1.2
Please Fill in the Following Information before submitting your Issue:
Tutorial Name: Secure JDBC and ODBC Clients’ Access to HiveServer2 using Apache Knox
HDP Version are you Running: 2.4.0
Applicable HDP Components: Knox, Hive
Link to tutorial(Hortonworks or GitHub website): http://hortonworks.com/hadoop-tutorial/secure-jdbc-odbc-clients-access-hiveserver2-using-apache-knox/
The tutorial needs to be updated to be compatible with HDP 2.4.
Update text and images so that the tutorial works with HDP 2.4.
Discovered several formatting issues after conversion from WP to HTML that need to be fixed.
Please Fill in the Following Information before submitting your Issue:
Tutorial Name: How to Refine and Visualize Server Log Data
HDP Version are you Running: HDP-2.4
Applicable HDP Components:
Link to tutorial(Hortonworks or GitHub website): http://hortonworks.com/hadoop-tutorial/how-to-refine-and-visualize-server-log-data/
The tutorial's technology is slightly outdated. We want to use NiFi for data collection.
Update the tutorial to use NiFi for log aggregation and collection rather than Flume.
Issues that are being resolved include
Please Fill in the Following Information before submitting your Issue:
Tutorial Name: Processing Data Pipeline with Apache Falcon
HDP Version are you Running: 2.4
Applicable HDP Components: Falcon, Ambari
Link to tutorial(Hortonworks or GitHub website): http://hortonworks.com/hadoop-tutorial/processing-data-pipeline-with-apache-falcon/
Fix the tutorial so that it works.
Revamp and test the tutorial to make sure it works.
We'll need to rewrite the build script at util/build_tutorials.sh to implement custom metadata for tutorial IDs and other necessary tags.
Update the tutorial to use NiFi for log aggregation and collection rather than Flume.
Please Fill in the Following Information before submitting your Issue:
Tutorial Name: Hello-HDP Tutorial Series
HDP Version are you Running: 2.4
Applicable HDP Components: HDFS, Hive, Pig, Spark, Zeppelin
Link to tutorial(Hortonworks or GitHub website): http://hortonworks.com/hadoop-tutorial/hello-world-an-introduction-to-hadoop-hcatalog-hive-and-pig/#section_1
Tutorials in the series lack summary sections covering the skills and knowledge acquired by completing them. Tutorial readability could be improved.
Add summary sections to each tutorial in Hello HDP. Improve the readability of each tutorial based on the feedback we've received.
Please Fill in the Following Information before submitting your Issue:
Shell File Name: Build_Tutorials.sh
HDP Version are you Running: hdp-2.4
Applicable HDP Components: Sandbox
The Tutorial Q&A section at the bottom of every tutorial has been updated, including its alignment, structure, and content.
Utilize the following HTML code in the shell script for the Tutorial Q&A section:
<hr>
<table>
<tr>
<th colspan="2">
<h2>Tutorial Q&A and Reporting Issues</h2>
</th>
</tr>
<tr>
<td colspan="2">
<p> If you need help or have questions with this tutorial, please first check
HCC for existing Answers to questions on this tutorial using the Find Answers
button. If you don't find your answer you can post a new HCC question for
this tutorial using the Ask Questions button. </p>
</td>
</tr>
<tr>
<td>
<a class="btn" href="https://community.hortonworks.com/topics/tutorial-100.html"
role="button">Find Answers</a>
</td>
<td>
<a class="btn" href="https://community.hortonworks.com/questions/ask.html?space=81&topics=tutorial-100&topics=hdp-2.4.0"
role="button">Ask Questions</a>
</td>
</tr>
<tr>
<td colspan="2">
<p> Tutorial Name: <strong>Hadoop Tutorial - Getting Started with HDP - Intro</strong></p>
</td>
</tr>
<tr>
<td colspan="2">
<p> HCC Tutorial Tag:<strong> tutorial-100</strong> and <strong>HDP-2.4</strong></p>
</td>
</tr>
<tr>
<td colspan="2">
<p>If the tutorial has multiple labs please indicate which lab your question corresponds to. Please provide any feedback related to that lab.</p>
</td>
</tr>
<tr>
<td colspan="2">
<p>All Hortonworks, partner and community tutorials are posted in the Hortonworks github and can be
contributed via the <a href="https://github.com/hortonworks/tutorials/wiki">Hortonworks Tutorial
Contribution Guide</a>. If you are certain there is an
issue or bug with the tutorial, please
<a href="https://github.com/hortonworks/tutorials/wiki#issues-with-tutorials">create an issue</a>
on the repository and we will do our best to resolve it!
</p>
</td>
</tr>
</table>
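The snippet above hardcodes tutorial-100, hdp-2.4.0, and the tutorial name. Since tutorial-id and tutorial-version are meant to feed the footer links, the build step could fill these from each tutorial's metadata instead. The sketch below uses Python's string.Template for illustration (the real build script is bash); it keeps only the parts of the footer that vary, and the placeholder names and the mapping of tutorial-version to the HCC version tag are assumptions.

```python
import html
from string import Template

# Trimmed-down version of the footer markup; only per-tutorial parts are kept.
FOOTER = Template(
    '<a class="btn" href="https://community.hortonworks.com/topics/$tutorial_id.html"\n'
    'role="button">Find Answers</a>\n'
    '<a class="btn" href="https://community.hortonworks.com/questions/ask.html'
    '?space=81&topics=$tutorial_id&topics=$hdp_version"\n'
    'role="button">Ask Questions</a>\n'
    '<p> Tutorial Name: <strong>$title</strong></p>\n'
)


def render_footer(meta: dict) -> str:
    """Fill the footer from a tutorial's front-matter values (assumed keys)."""
    return FOOTER.substitute(
        tutorial_id=meta["tutorial-id"],
        hdp_version=meta["tutorial-version"],
        title=html.escape(meta["title"]),  # escape &, <, > in titles
    )


footer = render_footer({
    "tutorial-id": "tutorial-100",
    "tutorial-version": "hdp-2.4.0",
    "title": "Hadoop Tutorial - Getting Started with HDP - Intro",
})
```

`Template.substitute` raises KeyError when a placeholder has no value, so a tutorial missing its metadata fails loudly at build time instead of producing a broken link.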
Please Fill in the Following Information before submitting your Issue:
Tutorial Name: Data Reporting with Zoomdata
HDP Version are you Running: hdp-2.4
Applicable HDP Components: Sandbox
Link to new tutorial(Hortonworks or GitHub website): http://hortonworks.com/hadoop-tutorial/hello-world-an-introduction-to-hadoop-hcatalog-hive-and-pig/#section_8
Create a new tutorial on data reporting with Zoomdata.
Build the tutorial in GitHub first, then test it with the latest sandbox, and publish it on the Hortonworks website.
Currently, all tutorials are placed under tutorials/hortonworks.
I think it would be beneficial to break the tutorials into separate "learning path" folders and possibly number each tutorial on the path.
This could help users and give them a sense of direction.
Thoughts, @racoss, @roberthryniewicz ?
Surely not a critical item, but there is a general misrepresentation of the calculated riskfactor value in the Risk Analysis tutorials; especially in http://hortonworks.com/hadoop-tutorial/hello-world-an-introduction-to-hadoop-hcatalog-hive-and-pig/#section_9.
We are telling folks that the higher this number is, the more risk a driver presents, when in fact it is the complete opposite. The calculation is nbrMiles / nbrBadEvents. For example, if two different drivers each drove 10000 miles, but Fred had 10 bad events and Barney had 1000, their riskfactors would be 1000 and 10, respectively.
So, the tall spikes in tools like Zeppelin are actually the good drivers, not the bad ones.
Again, not a critical item, but could cause some confusion for someone really thinking about the tutorial, not just clicking through it as fast as they can. We could either just clean up the wording on the spots that talk to this, or consider making the formula nbrBadEvents / nbrMiles. That approach would give the worse driver Barney a riskfactor of 0.1 and the good driver, Fred, 0.001. The actual nbrs wouldn't be all that high with this approach, but the disparity and spread will still exist.
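The two formulas can be compared directly on the Fred and Barney example. This is a small sketch; the function names are hypothetical and not taken from the tutorial code.

```python
def risk_factor_current(miles: float, bad_events: int) -> float:
    """Formula used in the tutorial today: a higher value means a SAFER driver."""
    return miles / bad_events


def risk_factor_proposed(miles: float, bad_events: int) -> float:
    """Proposed formula: a higher value means a RISKIER driver."""
    return bad_events / miles


drivers = {"Fred": (10000, 10), "Barney": (10000, 1000)}
for name, (miles, events) in drivers.items():
    print(name, risk_factor_current(miles, events), risk_factor_proposed(miles, events))
# Fred 1000.0 0.001
# Barney 10.0 0.1
```

A side benefit of the proposed formula is that a driver with zero bad events gets a riskfactor of 0 instead of causing a division by zero.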
Because sandbox.hortonworks.com is hardcoded into default.json from the sentiment tutorial, users won't be able to use the Dashboard unless they edit the hosts file on their machine to point to the Azure machine.
We should replace the old default.json with the updated one from @abajwa-hw located at https://raw.githubusercontent.com/abajwa-hw/ambari-nifi-service/master/demofiles/default.json
I validated on the 2.3.2 Sandbox (and the emerging 2.4 one) that the http://hortonworks.com/hadoop-tutorial/simulating-transporting-realtime-events-stream-apache-kafka/#section_4 tutorial is missing a dependency (called out in the notes section) and forces a fix as identified about half-way down https://github.com/HortonworksUniversity/Essentials/blob/master/demos/storm/README.md to get it working.
I was thinking of forking/fixing and creating a pull request, but when I visited https://github.com/hortonworks/tutorials/blob/be9e1a6afd906df2744d6bef6e4a72a5e74e465b/tutorials/hortonworks/realtime-event-processing-with-hadoop/rtep-2.md I found out that the source is actually embedded in a big zip file, not in the tutorials GitHub repo itself.
Unfortunately, https://github.com/search?q=BaseTruckEventTopology.java&type=Code&utf8=%E2%9C%93 actually shows MANY variations of this code from MANY Hortonworkers; bummer.
Ultimately, how should we address this?
I'm glad to help resolve this, but want to make sure I know the direction we should be taking to fix it and I wanted to make sure I'm not doing double-duty if someone is already addressing this.
I'm using sandbox 2.4 to follow the Falcon tutorial:
http://hortonworks.com/hadoop-tutorial/defining-processing-data-end-end-data-pipeline-apache-falcon/
When setting HDFS permissions, the command "hadoop fs –chown –R falcon /apps/falcon/*" gives the error "-chown: Unknown command".
After copying the XML that defines the cluster, I get several errors; it doesn't seem to recognize the values for colo, permissions, staging, or working.
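The command as reported contains en dashes (–) instead of ASCII hyphens in front of chown and R, which is a likely cause of the "Unknown command" error, since the shell passes the en dash through unchanged. A sketch of normalizing a pasted command, assuming the character mapping below covers the cases seen in the tutorials:

```python
# Assumed mapping from typographic characters to the ASCII ones a shell expects.
ASCII_EQUIVALENTS = str.maketrans({
    "\u2013": "-",   # en dash
    "\u2014": "-",   # em dash
    "\u2018": "'",   # left single quote
    "\u2019": "'",   # right single quote
    "\u201c": '"',   # left double quote
    "\u201d": '"',   # right double quote
    "\u00a0": " ",   # non-breaking space
})


def to_shell_safe(command: str) -> str:
    """Replace typographic characters with their ASCII equivalents."""
    return command.translate(ASCII_EQUIVALENTS)


print(to_shell_safe("hadoop fs \u2013chown \u2013R falcon /apps/falcon/*"))
# hadoop fs -chown -R falcon /apps/falcon/*
```

The lasting fix is to correct the characters in the tutorial source, so that readers never have to sanitize what they copy.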
Please Fill in the Following Information before submitting your Issue:
Tutorial Name: Fine Grained Permissions for HDFS files in Hadoop using HDFS ACLs
HDP Version are you Running: 2.4
Applicable HDP Components: HDFS
Link to tutorial(Hortonworks or GitHub website): http://hortonworks.com/hadoop-tutorial/fine-grained-permissions-hdfs-files-hadoop-using-hdfs-acls/
The tutorial needs to be imported and updated for HDP 2.4.
Work with the 2.4 sandbox to update images and text appropriately.
Tutorial Name: Hadoop Tutorial – Getting Started with HDP
HDP Version are you Running: N/A
Applicable HDP Components: N/A
Link to tutorial(Hortonworks or GitHub website):
http://hortonworks.com/hadoop-tutorial/hello-world-an-introduction-to-hadoop-hcatalog-hive-and-pig/#section_9
Updated content from Zoomdata
PR Forthcoming.
Please Fill in the Following Information before submitting your Issue:
Tutorial Name: Learning Spark with Zeppelin
HDP Version are you Running: 2.4+
Applicable HDP Components: Spark, Zeppelin TP
Please Fill in the Following Information before submitting your Issue:
Tutorial Name: Learning the Ropes of the Hortonworks Sandbox
HDP Version are you Running: hdp-2.4
Applicable HDP Components: Sandbox
Link to tutorial(Hortonworks or GitHub website): http://hortonworks.com/hadoop-tutorial/learning-the-ropes-of-the-hortonworks-sandbox/
Add a new step on how to find your sandbox version.
Create a new step called "Learn Your Sandbox Version" between the original steps 1.1 and 1.2.
They should be located at tutorials/community and tutorials/partner, respectively.
For some reason we have a copy of the IPython notebook tutorial in the partners folder. I will delete it because we have the same one under the hortonworks folder.
Problem: I just came across an issue when uploading the Hello HDP tutorials to the website: I can't navigate between labs 1-6.
For Hello HDP, all the labs are on one page; JavaScript just creates the illusion that they are separate pages. Therefore, lab 1 and lab 2 both having the div tag below creates a duplicate.
example:
Our script works well for one-page tutorials. Have we accounted for tutorials like Hello HDP in the script code?
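Because every lab ends up on the same page, one way to catch this class of bug is to check the combined HTML for id attributes that appear more than once. The sketch uses Python's standard html.parser; the sample markup is hypothetical, since the original div example is not shown above.

```python
from collections import Counter
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect the value of every id attribute in an HTML document."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value is not None:
                self.ids.append(value)


def duplicate_ids(html_text: str) -> list[str]:
    """Return the sorted list of ids used more than once."""
    collector = IdCollector()
    collector.feed(html_text)
    collector.close()
    return sorted(i for i, n in Counter(collector.ids).items() if n > 1)


# Hypothetical combined page: two labs reuse the same div id.
page = '<div id="lab">Lab 1</div><div id="lab">Lab 2</div><h2 id="intro">x</h2>'
print(duplicate_ids(page))  # ['lab']
```

Running this over the concatenated output of a multi-lab tutorial after the build script finishes would flag the collision before the pages are published.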
@ZacBlanco @roberthryniewicz
The links currently point to the old repo. They should be updated now so that linked images come from this repo.
Please Fill in the Following Information before submitting your Issue:
Tutorial Name: Learning the Ropes of the Hortonworks Sandbox
HDP Version are you Running: hdp-2.4
Applicable HDP Components: Sandbox
Link to tutorial(Hortonworks or GitHub website): http://hortonworks.com/hadoop-tutorial/learning-the-ropes-of-the-hortonworks-sandbox/
Change subsection headings to h3 or h2 instead. The h5 headings look smaller than the body text, which makes the tutorial a bit difficult to follow at points.
Simply change markdown lines from "#####" to "###" or "##"
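The change is mechanical enough to script. A sketch that rewrites only headings with exactly five hashes followed by whitespace, so h6 headings are left alone; note that it would also rewrite lines starting with "##### " inside code blocks, so the diff should be reviewed.

```python
import re


def promote_h5_headings(markdown: str, level: int = 3) -> str:
    """Rewrite '##### Heading' lines to use `level` hashes instead."""
    # (?=[ \t]) requires whitespace right after the five hashes,
    # which excludes '######' (h6) headings.
    return re.sub(r"^#####(?=[ \t])", "#" * level, markdown, flags=re.MULTILINE)


print(promote_h5_headings("##### Step 1\ntext\n###### keep\n"))
```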
Adding new images for testing WP links in staging.
Note: old assets for this tutorial will not be removed until Zac merges this repo with tutorials-future.
Please Fill in the Following Information before submitting your Issue:
Tutorial Name: Learning the ropes of Hortonworks Sandbox
HDP Version are you Running: 2.4
Applicable HDP Components: Sandbox
Link to tutorial(Hortonworks or GitHub website): http://hortonworks.com/hadoop-tutorial/learning-the-ropes-of-the-hortonworks-sandbox/
The tutorial needs to include Azure.
Update the tutorial with screenshots of the sandbox deployed on Azure.
There is a mix-up of columns that get loaded into the finalresults table in the tutorial at http://hortonworks.com/hadoop-tutorial/hello-world-an-introduction-to-hadoop-hcatalog-hive-and-pig/#section_7.
Check out cols 2 and 3 of the next line from the Compute Driver Risk Factor section.
val risk_factor_spark=hiveContext.sql("select driverid, totmiles,occurance, totmiles/occurance riskfactor from joined")
Check out cols 2 and 3 in the next line from the Create and ORC table section.
hiveContext.sql("create table finalresults( driverid String, occurance bigint,totmiles bigint,riskfactor double) stored as orc").toDF()
When the next line runs, we end up storing the totmiles value in the occurance column and vice versa.
hiveContext.sql("load data inpath 'risk_factor_spark' into table finalresults")
GOOD NEWS; EASY FIX!! Just switch the order in the first code snippet above to now read as follows.
val risk_factor_spark=hiveContext.sql("select driverid, occurance, totmiles, totmiles/occurance riskfactor from joined")
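The bug comes from columns being matched by position rather than by name during the load, as described above. A small Python model of that positional mapping shows why the original SELECT swaps the two values and why reordering fixes it. This is an illustration under that assumption, not Hive code, and the driver record is hypothetical.

```python
# Column order declared by CREATE TABLE finalresults.
TABLE_COLS = ["driverid", "occurance", "totmiles", "riskfactor"]
# Column order produced by the original and the corrected SELECT.
ORIGINAL_SELECT = ["driverid", "totmiles", "occurance", "riskfactor"]
FIXED_SELECT = ["driverid", "occurance", "totmiles", "riskfactor"]


def load_positionally(row, target_cols):
    """Mimic a positional load: the i-th value lands in the i-th table column,
    whatever that value originally represented."""
    return dict(zip(target_cols, row))


# Hypothetical driver: 10000 miles, 5 occurrences, riskfactor 10000/5.
record = {"driverid": "A1", "totmiles": 10000, "occurance": 5, "riskfactor": 2000.0}

loaded = load_positionally(tuple(record[c] for c in ORIGINAL_SELECT), TABLE_COLS)
fixed = load_positionally(tuple(record[c] for c in FIXED_SELECT), TABLE_COLS)
print(loaded)  # occurance and totmiles are swapped
print(fixed)   # values land in the intended columns
```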
Issues currently being resolved: