parquet-site's Introduction

Parquet Website

This website is built with Hugo and extends the Docsy theme.

The following steps assume that you have Hugo installed and working. You can also use Docker; see the "Building and Running in Docker" section for more information.
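
Before starting, it can help to confirm that the tools used in the steps below are on your PATH. The following is only a sketch: `missing_tools` is a hypothetical helper name, not something this repository provides, and the tool list (git, hugo, npm) is simply taken from the commands that appear in this README.

```shell
# Hypothetical helper (not part of parquet-site): print each named
# command that cannot be found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# The local build steps in this README invoke git, hugo and npm.
# Nothing is printed when all three are available.
missing_tools git hugo npm
```

If anything is printed, install that tool first or use the Docker route instead.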

Building and Running Locally

Clone this repository to run the website locally:

git clone git@github.com:apache/parquet-site.git
cd parquet-site
git submodule update --init --recursive

To build or update CSS resources, you also need PostCSS to create the final assets. By default npm installs tools under the directory where you run npm install.

npm install -D autoprefixer
npm install -D postcss-cli
npm install -D postcss

To preview this website locally, run the following in the root of the repository:

hugo server

Building and Running in Docker

If you don't want to install Hugo and its dependencies on your local machine, you can use Docker. To do so, check out the parquet-site repo as explained above and then use the Dockerfile to build an image with the required tools:

docker build -t parquet-site .

Then run the container mounting the current directory to /parquet-site and exposing local port 1313:

docker run -it -v "$(pwd)":/parquet-site -p 1313:1313 parquet-site

Once inside the container, run the following to preview the site:

# Install necessary npm modules in parquet-site directory
cd parquet-site
npm install -D autoprefixer
npm install -D postcss-cli
npm install -D postcss
hugo server --bind 0.0.0.0 # run the server

You can now preview the site locally on http://localhost:1313/

Release Process

To create documentation for a new release of parquet-format, create a new .md file under content/en/blog/parquet-format. See the existing files in that directory for examples.

To create documentation for a new release of parquet-java, create a new .md file under content/en/blog/parquet-java. See the existing files in that directory for examples.
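
Since the existing posts are the reference for the expected format, one way to start a new release page is to copy one of them and edit the copy. This is only a sketch: `new_release_post` is a hypothetical helper that the repository does not provide, and both the choice of template (the last existing file in sorted order) and the example file name are assumptions.

```shell
# Hypothetical helper (not part of parquet-site).
# usage: new_release_post <project> <new-file.md> [site-root]
#   <project> is parquet-format or parquet-java.
# Copies an existing post in content/en/blog/<project> to the new
# file name and prints the new path; edit the copy afterwards.
new_release_post() {
  nrp_dir="${3:-.}/content/en/blog/$1"
  nrp_new="$nrp_dir/$2"
  nrp_template=""
  # Globs expand in sorted order, so this keeps the last existing post.
  for nrp_f in "$nrp_dir"/*.md; do
    [ -e "$nrp_f" ] && nrp_template=$nrp_f
  done
  if [ -z "$nrp_template" ]; then
    echo "no existing .md files in $nrp_dir" >&2
    return 1
  fi
  if [ -e "$nrp_new" ]; then
    echo "$nrp_new already exists" >&2
    return 1
  fi
  cp "$nrp_template" "$nrp_new" && printf '%s\n' "$nrp_new"
}

# Example (the file name is made up; follow the names already in the directory):
# new_release_post parquet-format my-new-release.md
```

The helper refuses to overwrite an existing file, so running it twice with the same name is harmless.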

Website development and deployment

Staging

To make a change to the staging version of the website:

  1. Make a PR against the staging branch in the repository
  2. Once the PR is merged, the Build and Deploy Parquet Site job in the deployment workflow will be run, populating the asf-staging branch on this repo with the necessary files.

Do not edit the asf-staging branch of this repo directly; the deployment workflow populates it.

Production

To make a change to the production version of the website:

  1. Make a PR against the production branch in the repository
  2. Once the PR is merged, the Build and Deploy Parquet Site job in the deployment workflow will be run, populating the asf-site branch on this repo with the necessary files.

Do not edit the asf-site branch of this repo directly; the deployment workflow populates it.
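
The staging and production workflows differ only in the branch a PR targets. As a rough sketch, that rule can be written as a small guard. `check_pr_base` is a hypothetical helper for illustration; it is not part of the repository or its deployment workflow, and the remote and branch names in the usage comment are assumptions.

```shell
# Hypothetical helper (not part of parquet-site): succeed only for the
# branches this README says PRs should target.
check_pr_base() {
  case "$1" in
    staging|production)
      return 0 ;;
    asf-staging|asf-site)
      echo "$1 is populated by the deployment workflow; do not edit it directly" >&2
      return 1 ;;
    *)
      echo "unexpected base branch: $1" >&2
      return 1 ;;
  esac
}

# Example: start a change intended for the staging site.
# The remote name "origin" and the branch name "my-change" are assumptions.
# check_pr_base staging && git fetch origin staging && git switch -c my-change origin/staging
```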

parquet-site's People

Contributors

alamb, alippai, charlesmahler, deining, etseidl, fokko, gszadovszky, jfarrell, jonahgao, kevinburkesegment, martin-g, pitrou, rdblue, shangxinli, vegarsti, vinooganesh, waldyrious, wgtmac

parquet-site's Issues

Update download links

Based on the following mail sent to the private list, we should update the download links on our site.

Hello, Apache PMCs,

In order to better provide our millions of users with downloads, the
Apache Infrastructure Team has been restructuring the way downloads work
for our main distribution channels in the past few weeks. For users,
this will largely go unnoticed, and for projects likely the same, but we
did want to reach out to projects and inform them of the changes we've
made:

As of March 2020, we are deprecating www.apache.org/dist/ in favor of
https://downloads.apache.org/ for backup downloads as well as signature
and checksum verification. The primary driver has been splitting up web
site visits and downloads to gain better control and offer a better
service for both downloads and web site visits.

As stated, this does not impact end-users, and should have a minimal
impact on projects, as our download selectors as well as visits to
www.apache.org/dist/ have been adjusted to make use of
downloads.apache.org instead. We do however ask that projects, in their
own time-frame, change references on their own web sites from
www.apache.org/dist/ to downloads.apache.org wherever such references
may exist, to complete the switch in full. We will NOT be turning off
www.apache.org/dist/ in the near future, but would greatly appreciate if
projects could help us transition away from the old URLs in their
documentation and on their download pages.

The standard way of uploading releases[1] will STILL apply, however
there may be a short delay (<= 15 minutes) between releasing and
releases showing up on downloads.apache.org for technical reasons.

If you have any questions about this change, please do not hesitate
to reach out to us at [email protected].

With regards,
Daniel on behalf of ASF Infrastructure.

[1] https://www.apache.org/legal/release-policy.html#upload-ci
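
The change the mail asks for is a textual substitution of the old host and path prefix. A minimal sketch with sed follows. `rewrite_dist_urls` is a hypothetical helper name and the sample URL path is only illustrative. Any real change should be reviewed by hand, since not every occurrence on a page is necessarily a download link.

```shell
# Hypothetical helper: read text on stdin and rewrite references to
# www.apache.org/dist/ (with or without a scheme) to downloads.apache.org.
rewrite_dist_urls() {
  sed -E 's#(https?://)?www\.apache\.org/dist/#https://downloads.apache.org/#g'
}

# Illustrative input; the path after dist/ is made up.
echo 'See http://www.apache.org/dist/example/file.tar.gz' | rewrite_dist_urls
# prints: See https://downloads.apache.org/example/file.tar.gz
```

Other www.apache.org URLs, such as the release policy link, are left untouched because the pattern requires the /dist/ prefix.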

Reporter: Gabor Szadovszky / @gszadovszky
Assignee: Gabor Szadovszky / @gszadovszky

Note: This issue was originally created as PARQUET-1811. Please see the migration documentation for further details.

Clarify parquet-format with respect to repeated fields across boundaries

Several implementors have reported that the Parquet spec is unclear about when repeated fields can span page boundaries, that is, whether a logical record can be split across a page and/or row group boundary.

Discussion on list: https://lists.apache.org/thread/rd8twnvg4bg3558r507rzpxckcxt5wdn

The conclusion seems to be that records cannot be split across boundaries for "v2 data pages" or when a page index is present.

We should update the spec to make this explicit.

Reporter: Andrew Lamb / @alamb
Assignee: Andrew Lamb / @alamb

Note: This issue was originally created as PARQUET-2473. Please see the migration documentation for further details.

Automate site generation

We moved our site source to GitHub. It is much better than SVN, but it still does not work as it should: currently, we have to generate the site manually before checking in. It would be much better if site generation were automatic so that we could simply accept PRs on the source files.
One option is the Pelican CMS, as described in ".asf.yaml features for git repositories", though I am not sure it is the best solution. Another option might be to trigger a Jenkins build for changes on master and, after generating the site with Middleman, commit the files to the asf-site branch.

Reporter: Gabor Szadovszky / @gszadovszky

Note: This issue was originally created as PARQUET-1686. Please see the migration documentation for further details.

Update the website to describe the larger role of Parquet

I personally believe Parquet will be at the center of the analytics ecosystem.

https://parquet.apache.org/ currently emphasizes Parquet's role in the Hadoop ecosystem. I think this causes confusion in several ways:

  1. It implies that Parquet is only focused on Hadoop, when I think it is a critical technology across other ecosystems that are unrelated to Hadoop (e.g. Apache Iceberg, Delta Lake, etc.)
  2. It may further the perception that the Apache Parquet project only focuses on / cares about Hadoop and the Java implementation

I would like to update the site to focus less on the Hadoop aspects and more on the broader nature of Parquet.

If people like where this is headed, I would next like to expand the documentation to better explain how the various implementations are related (e.g. how parquet-mr relates to the readers in arrow-rs, arrow, etc.)

Reporter: Andrew Lamb / @alamb
Assignee: Andrew Lamb / @alamb

Note: This issue was originally created as PARQUET-2470. Please see the migration documentation for further details.

The announcement email on the web site does not comply with ASF rules

After following the instructions in the release guide on the Parquet website, my mail to [email protected] was rejected with the following message:

Sorry, but the announce email cannot be accepted as it stands.

Announcements of Apache project releases must contain a link to the
relevant download page. [1]
The download page must provide public download links where current official
source releases and accompanying cryptographic files may be obtained. [2]
It must also link to the KEYS file at
https://www.apache.org/dist//KEYS,
and provide details of how to verify a download using the signature or a
hash [3]

Note also that MD5 and SHA1 hashes are deprecated and should not be used
for new releases. [4]

Announcements that contain a link to the dyn/closer page alone will be
rejected by the moderators.

Announcements that contain a link to the dist.apache.org host will be
rejected by the moderators.

Announcements that contain a link to a web page that does not include a
link to a mirror to the artifact plus links to the signature and at least
one sha checksum will be rejected.

[1] https://www.apache.org/legal/release-policy.html#release-announcements
<http://www.apache.org/legal/release-policy.html#release-announcements>
[2] https://www.apache.org/dev/release-distribution#download-links
[3] https://www.apache.org/dev/release-download-pages.html#download-page
[4] https://www.apache.org/dev/release-distribution#sigs-and-sums
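
For a download page, the verification details the moderators ask for usually come down to a checksum comparison and a signature check. The sketch below shows a SHA-512 comparison. `verify_sha512` is a hypothetical helper, it assumes the GNU coreutils `sha512sum` command is available, and the file names in the usage comments are placeholders rather than real artifact names.

```shell
# Hypothetical helper: compare a file's SHA-512 digest with an expected
# hex string. Assumes GNU coreutils' sha512sum is installed.
# usage: verify_sha512 <file> <expected-hex-digest>
verify_sha512() {
  vs_actual=$(sha512sum "$1" 2>/dev/null | cut -d ' ' -f 1)
  # Compare case-insensitively, since published digests vary in case.
  vs_expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
  [ -n "$vs_actual" ] && [ "$vs_actual" = "$vs_expected" ]
}

# Usage (placeholder names):
# verify_sha512 release.tar.gz "<digest from the .sha512 file>" && echo OK
# Signature check with GnuPG, after importing the project's KEYS file:
# gpg --import KEYS
# gpg --verify release.tar.gz.asc release.tar.gz
```

SHA-512 is used here because the message notes that MD5 and SHA1 are deprecated for new releases.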

Reporter: Jim Apple / @jbapple
Assignee: Gabor Szadovszky / @gszadovszky

Note: This issue was originally created as PARQUET-1674. Please see the migration documentation for further details.
