Click this link to access the Spanish version of the setup.
Please choose your operating system (OS):
Setup instructions for Le Wagon's students on their first day of Data Science Bootcamp
Home Page: https://www.lewagon.com/data
brew install xz readline
then
pyenv install 3.7.7
When you are told that Python is already installed and asked whether you wish to reinstall it, accept.
I seem to be unable to use pip to install certain packages like numpy, pandas or dotenv in a given pyenv environment on an M1 MacBook Air.
I've tried updating xcode-select, setuptools and wheel, but none yielded any results.
➜ pip install dotenv
Collecting dotenv
Using cached dotenv-0.0.5.tar.gz (2.4 kB)
ERROR: Command errored out with exit status 1:
command: /Users/alephpei/.pyenv/versions/3.8.12/envs/comtools/bin/python3.8 -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/37/hrj2n25s0szb0w7_wg4v0yw80000gn/T/pip-install-sr4mje_9/dotenv_747d32c001b5422f8fb2b775e72f4686/setup.py'"'"'; __file__='"'"'/private/var/folders/37/hrj2n25s0szb0w7_wg4v0yw80000gn/T/pip-install-sr4mje_9/dotenv_747d32c001b5422f8fb2b775e72f4686/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /private/var/folders/37/hrj2n25s0szb0w7_wg4v0yw80000gn/T/pip-pip-egg-info-p9qxvhmc
cwd: /private/var/folders/37/hrj2n25s0szb0w7_wg4v0yw80000gn/T/pip-install-sr4mje_9/dotenv_747d32c001b5422f8fb2b775e72f4686/
Complete output (1589 lines):
ERROR: Command errored out with exit status 1:
command: /Users/alephpei/.pyenv/versions/3.8.12/envs/comtools/bin/python3.8 -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/37/hrj2n25s0szb0w7_wg4v0yw80000gn/T/pip-wheel-7aglt0px/distribute_e2997156b23e4dad80657a4786abad0a/setup.py'"'"'; __file__='"'"'/private/var/folders/37/hrj2n25s0szb0w7_wg4v0yw80000gn/T/pip-wheel-7aglt0px/distribute_e2997156b23e4dad80657a4786abad0a/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /private/var/folders/37/hrj2n25s0szb0w7_wg4v0yw80000gn/T/pip-pip-egg-info-yi7x1rto
cwd: /private/var/folders/37/hrj2n25s0szb0w7_wg4v0yw80000gn/T/pip-wheel-7aglt0px/distribute_e2997156b23e4dad80657a4786abad0a/
Complete output (15 lines):
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/private/var/folders/37/hrj2n25s0szb0w7_wg4v0yw80000gn/T/pip-wheel-7aglt0px/distribute_e2997156b23e4dad80657a4786abad0a/setuptools/__init__.py", line 2, in <module>
from setuptools.extension import Extension, Library
File "/private/var/folders/37/hrj2n25s0szb0w7_wg4v0yw80000gn/T/pip-wheel-7aglt0px/distribute_e2997156b23e4dad80657a4786abad0a/setuptools/extension.py", line 5, in <module>
from setuptools.dist import _get_unpatched
File "/private/var/folders/37/hrj2n25s0szb0w7_wg4v0yw80000gn/T/pip-wheel-7aglt0px/distribute_e2997156b23e4dad80657a4786abad0a/setuptools/dist.py", line 7, in <module>
from setuptools.command.install import install
File "/private/var/folders/37/hrj2n25s0szb0w7_wg4v0yw80000gn/T/pip-wheel-7aglt0px/distribute_e2997156b23e4dad80657a4786abad0a/setuptools/command/__init__.py", line 8, in <module>
from setuptools.command import install_scripts
File "/private/var/folders/37/hrj2n25s0szb0w7_wg4v0yw80000gn/T/pip-wheel-7aglt0px/distribute_e2997156b23e4dad80657a4786abad0a/setuptools/command/install_scripts.py", line 3, in <module>
from pkg_resources import Distribution, PathMetadata, ensure_directory
File "/private/var/folders/37/hrj2n25s0szb0w7_wg4v0yw80000gn/T/pip-wheel-7aglt0px/distribute_e2997156b23e4dad80657a4786abad0a/pkg_resources.py", line 1518, in <module>
register_loader_type(importlib_bootstrap.SourceFileLoader, DefaultProvider)
AttributeError: module 'importlib._bootstrap' has no attribute 'SourceFileLoader'
----------------------------------------
WARNING: Discarding https://files.pythonhosted.org/packages/5f/ad/1fde06877a8d7d5c9b60eff7de2d452f639916ae1d48f0b8f97bf97e570a/distribute-0.7.3.zip#sha256=3dc7a8d059dcf72f0ead2fa2144a24ee0ef07dce816e8c3545d7345767138c5e (from https://pypi.org/simple/distribute/). Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
ERROR: Command errored out with exit status 1:
command: /Users/alephpei/.pyenv/versions/3.8.12/envs/comtools/bin/python3.8 -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/37/hrj2n25s0szb0w7_wg4v0yw80000gn/T/pip-wheel-7aglt0px/distribute_b3164b4c41d24a6da43d4e7a4c1b03a3/setup.py'"'"'; __file__='"'"'/private/var/folders/37/hrj2n25s0szb0w7_wg4v0yw80000gn/T/pip-wheel-7aglt0px/distribute_b3164b4c41d24a6da43d4e7a4c1b03a3/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /private/var/folders/37/hrj2n25s0szb0w7_wg4v0yw80000gn/T/pip-pip-egg-info-0c7g8a40
cwd: /private/var/folders/37/hrj2n25s0szb0w7_wg4v0yw80000gn/T/pip-wheel-7aglt0px/distribute_b3164b4c41d24a6da43d4e7a4c1b03a3/
Complete output (10 lines):
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/private/var/folders/37/hrj2n25s0szb0w7_wg4v0yw80000gn/T/pip-wheel-7aglt0px/distribute_b3164b4c41d24a6da43d4e7a4c1b03a3/setuptools/__init__.py", line 2, in <module>
from setuptools.extension import Extension, Library
File "/private/var/folders/37/hrj2n25s0szb0w7_wg4v0yw80000gn/T/pip-wheel-7aglt0px/distribute_b3164b4c41d24a6da43d4e7a4c1b03a3/setuptools/extension.py", line 5, in <module>
from setuptools.dist import _get_unpatched
File "/private/var/folders/37/hrj2n25s0szb0w7_wg4v0yw80000gn/T/pip-wheel-7aglt0px/distribute_b3164b4c41d24a6da43d4e7a4c1b03a3/setuptools/dist.py", line 103
except ValueError, e:
^
SyntaxError: invalid syntax
----------------------------------------
Which repeats itself until it says:
WARNING: Discarding https://files.pythonhosted.org/packages/fa/5a/6dcdddeaa0ddc0bd331fdd1bc8696d9650ede0cb014083716976174eb4b8/dotenv-0.0.1.tar.gz#sha256=04006132a48e301a40b5bc3e8ea0d667a68981f277bb1785af0f8b9f7958e278 (from https://pypi.org/simple/dotenv/). Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
ERROR: Could not find a version that satisfies the requirement dotenv (from versions: 0.0.1, 0.0.2, 0.0.4, 0.0.5)
ERROR: No matching distribution found for dotenv
Suggested process (or maybe integrate it directly in the .dotfile setup process? Not sure what the best practice is):
pip install pylint
sublimeLinter.sublime-settings - User
{
"paths": {
"osx": [
"~/.pyenv/shims",
]
},
"linters": {
"pylint": {
"disable": false
},
}
}
Students really need to be able to code in their text editor starting in week 7, and being able to display docstrings in the editor makes their lives much easier.
I have found a Sublime Text 3 docstring display that works well using AnacondaIDE, but I had to add the following file as User/Anaconda.sublime-settings to make it work with pyenv and stop it interfering with Flake8:
{
"anaconda_linting": false,
"python_interpreter": "/Users/brunolajoie/.pyenv/shims/python"
}
Shortcut to display docstring is then Cmd-Shift-D
🚀
This occurs on old macOS versions (High Sierra).
I tried installing a new version of sqlite using brew, then installing a new version of Python and creating a new virtual env, but to no effect.
I have only heard about this issue from one student, in batch #502.
A few students have been replacing user and email in the following command. Can we add a comment suggesting not to edit that line?
It should already be installed on your laptop from the previous commands. First you need to log in:
gh auth login -s 'user:email' -w
ISSUE: Some Windows students encounter the following error
"Please enable the Virtual Machine Platform Windows feature and ensure virtualization is enabled in the BIOS."
at step https://github.com/lewagon/data-setup/blob/master/WINDOWS.md#upgrade-to-wsl-2
SOLUTION:
Update the install link; with the general link, it is not clear to the students which instructions to use:
Although the setup deals with this by running brew install xz before installing pyenv, a lot of students on macOS still get the lzma warning:
UserWarning: Could not import the lzma module. Your installed Python is incomplete. Attempting to use lzma compression will result in a RuntimeError. warnings.warn(msg)
To solve this, it was necessary to upgrade Python to 3.8.6 and create a new virtual env:
brew install xz
pyenv install 3.8.6
pyenv virtualenv 3.8.6 data
pyenv activate data
(then update the ~/.zshrc in order to use the new env as default, and rerun the setup pip install ... steps)
which is very strange, since these steps exactly replicate the setup
Visual C++ redistributable package is needed for tensorflow
It would be good to add it to the setup
Lots of Mac students are getting this error when installing Homebrew:
Error: Not a valid ref: refs/remotes/origin/master :
fatal: ambiguous argument 'refs/remotes/origin/master': unknown revision or path not in the working tree.
The fix we found was to have them do this, as explained in Homebrew/brew#10368
rm -fr $(brew --repo homebrew/core) # because you can't `brew untap homebrew/core`
brew tap homebrew/core
We should improve the setup so that it doesn't
that would be great
https://github.com/lewagon/data-setup/blob/master/macOS.md#custom-css
Context: the setup was done in July 2020 on my machine. I wanted to upgrade to Python 3.8.6 and followed the steps again in June 2021.
For the custom CSS, I did not have a JUPYTER_CONFIG_DIR variable defined.
On Slack @gmanchon mentioned that we could find the config file with this command:
jupyter --config-dir
=> it could make sense to add this to the setup page
👉 Open an Ubuntu terminal and run the following commands
🚨 replace GITHUB_NICKNAME by your GitHub nickname
cd ~/code/GITHUB_NICKNAME
ls -la
If the command does not show the data-challenges and dotfiles directories, ask for a TA 🙏
Otherwise, you can proceed with the setup:
Issue => At this point of the setup, students have not forked/cloned the data-challenges repository yet; that is done afterwards in the Kitt instructions.
We've had several issues with students editing their .zshrc files with TextEdit on macOS, which saved them with a hidden .txt extension. Zsh then wouldn't source these files.
I suggest the students add the following line to their shell configuration
export EDITOR='code'
so they don't accidentally end up editing their source files with Word or something similar.
I've been running into this issue for quite some time now and I actually found it weird that no one pointed that out yet hahaha
Thing is, inside ~/.zshrc we have the following line:
type -a pyenv > /dev/null && eval "$(pyenv init -)" && eval "$(pyenv virtualenv-init -)"
However, this returns an error in the students' terminals ("pyenv init -" no longer sets PATH), making python and pip unknown commands. The workaround I found was to change this line to this one:
type -a pyenv > /dev/null && eval "$(pyenv init --path)" && eval "$(pyenv virtualenv-init -)"
If correct, I think we should change the dotfiles to make that change. Otherwise, I'd like to know what I'm missing :)
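A quick way to tell whether a student's terminal is affected is to look for the pyenv shims directory in PATH. The sketch below is only an illustration: has_pyenv_shims is a hypothetical helper, and it assumes pyenv lives in its default location (~/.pyenv) unless another root is passed.

```shell
# Hypothetical helper: succeeds if the pyenv shims directory appears in a PATH string.
# $1: PATH string to inspect
# $2: pyenv root (assumption: defaults to ~/.pyenv, the default install location)
has_pyenv_shims() {
  root="${2:-$HOME/.pyenv}"
  case ":$1:" in
    *":$root/shims:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (left as a comment so the snippet has no side effects):
#   has_pyenv_shims "$PATH" || echo "pyenv shims are not on PATH, check ~/.zprofile and ~/.zshrc"
```

If the check fails in a fresh terminal, python and pip will resolve to the system versions (or not at all), which matches the symptom described above.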
Nearly every student who completed the setup ended up with this error when running make the first time:
This was solved by having them run pip install pylint, but it also resulted in some other dependency errors (related to typing_extensions) in the terminal output. It still successfully installed pylint, but may cause some other issues down the road.
Windows students lack a lot of aliases that are present in the dotfiles.
dotfiles for git bash users
Nowhere is a Windows user required to install MS Visual Studio. When one tries to follow the sum_of_three (Kick-start terminal instructions), the code . command fails with the "Command Not Found" error.
Either request that we install MS Visual Studio or provide a different command.
Our current data-setup creates the following error in VS Code when opening a terminal:
pyenv shell shims
❯ pyenv shell shims
pyenv: version `shims' not installed
In our setup we require "python.pythonPath": "~/.pyenv/shims/python", which used to work fine until we created a .zprofile
Link #150
Removing this setup line fixes the shims issue, but puts the VS Code terminal and the OS terminal out of sync. We now have to manually select the shims venv at every OS login.
Add these lines to the Windows setup to start ssh-agent and avoid being asked for the passphrase again:
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_ed25519
plugins=(gitfast last-working-dir common-aliases sublime zsh-syntax-highlighting history-substring-search pyenv ssh-agent)
git is missing here; it is needed to have access to the git shortcuts
illegal hardware instruction
It seems that he has an old CPU. According to the doc, starting with TensorFlow 1.6, binaries use AVX instructions which may not run on older CPUs.
student config : INTEL(R) Celeron(R) N4030 CPU @1.10GHz 1.10 GHz
✅ It looks like the only workaround is to use an older version of tensorflow (<1.6) or to build from source.
:github: official issue : tensorflow/tensorflow#17411
Proposal to fix this issue: check the configuration of students' computers beforehand with:
$ grep flags -m1 /proc/cpuinfo | cut -d ":" -f 2 | tr '[:upper:]' '[:lower:]' | { read FLAGS; OPT="-march=native"; for flag in $FLAGS; do case "$flag" in "sse4_1" | "sse4_2" | "ssse3" | "fma" | "cx16" | "popcnt" | "avx" | "avx2") OPT+=" -m$flag";; esac; done; MODOPT=${OPT//_/\.}; echo "$MODOPT"; }
If -mavx and/or -mavx2 is not shown when executing the command, it can be confirmed that AVX support is missing, and the source build should be done with the other optimization flags displayed in the output.
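For a plain yes/no answer before setup day, the same check can be reduced to looking for the avx flag alone. This is a minimal sketch, not part of the setup: has_avx is a hypothetical helper that takes the flags line of /proc/cpuinfo as its argument, so it only applies to Linux and WSL.

```shell
# Hypothetical helper: succeeds if the given /proc/cpuinfo "flags" line lists avx.
# $1: the flags line, e.g. the output of: grep -m1 flags /proc/cpuinfo
has_avx() {
  flags=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  # Pad with spaces so that only the whole word "avx" matches (not "avx2" alone).
  case " $flags " in
    *" avx "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (left as a comment because /proc/cpuinfo only exists on Linux/WSL):
#   has_avx "$(grep -m1 flags /proc/cpuinfo)" || echo "No AVX: prebuilt TensorFlow >= 1.6 will not run"
```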
This article explains how to build TensorFlow from source and optimize it for the older CPU. The key is detecting the CPU flags and enabling all of them for optimization when configuring the build.
Thanks to :
https://stackoverflow.com/questions/49094597/illegal-instruction-core-dumped-after-running-import-tensorflow
It seems like students copy the entire line of plugins in the "Disable ssh passphrase prompt" section instead of just adding ssh-agent at the end.
The line is missing the pyenv plugin though, causing errors afterwards!
https://github.com/lewagon/data-setup/blob/master/WINDOWS.md#disable-ssh-passphrase-prompt
So ideally this would be the line for the students' copy/paste convenience
plugins=(git gitfast last-working-dir common-aliases sublime zsh-syntax-highlighting history-substring-search pyenv ssh-agent)
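An alternative to giving students a full line to copy is to append the plugin to whatever line they already have, so nothing gets dropped. The following is only a sketch of that idea: add_plugin is a hypothetical helper that works on the text of a single-line plugins=(...) entry and does not touch ~/.zshrc itself.

```shell
# Hypothetical helper: print a plugins=(...) line with one plugin appended,
# leaving the line unchanged if the plugin is already listed.
# $1: existing single-line entry, e.g. 'plugins=(git pyenv)'
# $2: plugin name to add, e.g. ssh-agent
add_plugin() {
  inner=$(printf '%s' "$1" | sed -e 's/^plugins=(//' -e 's/)$//')
  case " $inner " in
    *" $2 "*) printf '%s\n' "$1" ;;
    *) printf '%s\n' "$1" | sed "s/)\$/ $2)/" ;;
  esac
}

# Usage (comment only):
#   add_plugin 'plugins=(git gitfast pyenv)' ssh-agent
```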
If Sublime hasn't been launched at all, this step:
https://github.com/lewagon/data-setup/blob/master/WINDOWS.md#sublime-text-3---package
will throw an error and not allow it.
However, if the app has been launched for the first time, the step works fine. I suggest we add a small note to have the students open Sublime once.
as discussed in https://github.com/lewagon/data-challenges/issues/378
With the current setup, students will have a Sublime Text whose build system is still Python 2.7.7 and does not contain any of the pip modules installed in their pyenv virtualenv.
For instance, I couldn't build (Cmd-B) a file starting with "import pandas" because pandas was not installed.
I've followed the following steps and it works like a charm.
Do you reckon we should update the setup to configure their build system? Or do you have another alternative in mind?
How do they do with Ruby?
Students are confused about where to type these on Windows:
VS Code Extensions
Let's gain time now and add other extensions that will be helpful during your Bootcamp:
--> add to instruction: "Please copy those in your powershell"
cf #75
With GCP you get 3 months of free credits. If the pt students install GCP on setup day, their free credits will expire before they get to data engineering :(
Maybe mention in the GCP section that the free credits expire after 3 months and will only be used in the data engineering week, so it is better to wait.
With WSL on Windows, putting the GCP keys in windows/documents won't work, as the WSL file system is separate.
Instead, they should copy the gcp keys in a folder in the WSL file system, for example
/home/barangerbenjamin/code/barangerbenjamin/gcp_keys
Having the initialisation of pyenv in both .zprofile and .zshrc seems to generate some confusion
https://lewagon-alumni.slack.com/archives/G02NFDT0J/p1632938371252800?thread_ts=1632502334.175100&cid=G02NFDT0J
https://lewagon-alumni.slack.com/archives/C02EDQ2SNBE/p1633184235135600
Not an issue, a nice to have instruction for students.
Disk space taken by Ubuntu on WSL2 (hereafter "Ubuntu") is not given back to the host when files or Docker images are deleted.
A manual action on the Ubuntu virtual disk is required to reclaim that space for windows.
Ubuntu virtual disk is located here:
%LOCALAPPDATA%/Packages/CanonicalGroupLimited./LocalState/ext4.vhdx
This operation will do the trick for Windows Pro users:
cd $env:LOCALAPPDATA\Packages\<TARGET_DISTRO_FOLDERNAME>\LocalState\
wsl --shutdown
optimize-vhd -Path .\ext4.vhdx -Mode full
And this one should be fine for Windows Home:
wsl --shutdown
diskpart
select vdisk file=$env:LOCALAPPDATA\Packages\<TARGET_DISTRO_FOLDERNAME>\LocalState\ext4.vhdx
attach vdisk readonly
compact vdisk
detach vdisk
exit
It's for edge cases, but for PCs built with an SSD smaller than 256 GB, which is common, this is a game changer.
During projects students often build a lot of docker images, create new virtualenvs, etc... and it eats space quickly.
I think it'd be good for them to have this cleaning method within reach during the bootcamp.
The official pyenv documentation says that you need to do the following when installing with Homebrew.
echo 'eval "$(pyenv init --path)"' >> ~/.zprofile
echo 'eval "$(pyenv init -)"' >> ~/.zshrc
Received a warning in the first step to install Visual Studio Code:
wget -q https://packages.microsoft.com/keys/microsoft.asc -O- | sudo apt-key add -
Error:
Warning: apt-key is deprecated. Manage keyring files in trusted.gpg.d instead (see apt-key(8)).
Worked around it by downloading the .deb from here and installed in the graphical interface.
Running Ubuntu 21.04 - kernel 5.14.2
It's never mentioned that the modification needs to take place in the .zshrc file
https://github.com/lewagon/data-setup/blob/master/WINDOWS.md#disable-ssh-passphrase-prompt
The dotfiles have been updated a lot since 2014. If alumni did the web development program more than a year ago, their dotfiles are out of date and must be updated.
During the setup, 3 commands should be run inside the PowerShell console.
To run those commands, the console needs to be opened with Run as administrator, otherwise the commands will fail with the require elevated rights error message.
ctrl + shift + enter doesn't seem to be explicit enough for the students.
@krokrob @brunolajoie @zzuziak @ssaunier
Hello all,
Yesterday during the setup, a good half of the class had to manually set up MagicPython in Sublime Text (download the package, deactivate the default Python).
Perhaps we could add that part in the instructions? It would save us a lot of tickets.
When running the check for pandas, scikit and Tensorflow, the following message comes up (WSL student):
After digging into it we followed this article, it seems that his computer is missing AVX support. Downgrading to Tensorflow 1.5 would require Python 3.7. Building from source might not be stable, but we could try that.
For now we skipped the Tensorflow installation; if needed, he will run it on Google Colab.
Any solution would be appreciated!
We can activate a local virtual environment on a per-folder basis with pyenv local <virtualenv> (for example pyenv local lewagon in the data-challenges folder, or pyenv local project-env in another project's folder).
The virtual environment automatically switches depending on the current folder of the terminal.
This seems like a better practice than the current solution (adding a static pyenv activate lewagon in the .zshrc file, and updating it when switching projects), not only for the bootcamp but also for the future, when they take on multiple data projects.
The updated setup would be as simple as navigating to the data-challenges folder, then running pyenv local lewagon.
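The per-folder switching works because pyenv local writes a .python-version file in the current folder, and pyenv looks for that file in the current folder and then in its parents. The sketch below mimics that lookup to show which file would apply from a given folder; find_python_version_file is a hypothetical helper written for illustration, not a pyenv command.

```shell
# Hypothetical helper: print the nearest .python-version file found by walking
# up from the given directory, the way pyenv resolves a local version.
# $1: directory to start from
find_python_version_file() {
  dir="$1"
  while :; do
    if [ -f "$dir/.python-version" ]; then
      printf '%s\n' "$dir/.python-version"
      return 0
    fi
    # Stop at the filesystem root (or at "." for relative paths).
    if [ "$dir" = "/" ] || [ "$dir" = "." ]; then
      return 1
    fi
    dir=$(dirname "$dir")
  done
}

# Usage (comment only):
#   cat "$(find_python_version_file "$PWD")"   # name of the environment pyenv would pick here
```

A student who runs pyenv local lewagon once in data-challenges gets the lewagon environment in every challenge subfolder, without any line in .zshrc.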
So apparently the homebrew path has changed from /usr/local to /opt/homebrew depending on the system architecture as per this reddit post:
https://www.reddit.com/r/MacOS/comments/jw9guu/why_did_homebrew_move_from_usrlocalto_opthomebrew/
A student was trying to execute this command in the Google Cloud setup:
/usr/local/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/install.sh
But it had to be changed to the following in order to work:
/opt/homebrew/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/install.sh
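The setup could avoid hardcoding either path by deriving the prefix from the machine architecture. The sketch below assumes Homebrew's default install locations (/opt/homebrew on Apple Silicon, /usr/local on Intel Macs); brew_prefix_for_arch is a hypothetical helper, and on a machine where Homebrew is already installed, brew --prefix reports the actual prefix directly.

```shell
# Hypothetical helper: map a macOS architecture name (as printed by `uname -m`)
# to Homebrew's default prefix. Assumes a default Homebrew installation.
brew_prefix_for_arch() {
  case "$1" in
    arm64) echo "/opt/homebrew" ;;
    x86_64) echo "/usr/local" ;;
    *) return 1 ;;
  esac
}

# Usage (comment only):
#   "$(brew_prefix_for_arch "$(uname -m)")/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/install.sh"
# or, relying on Homebrew itself:
#   "$(brew --prefix)/Caskroom/google-cloud-sdk/latest/google-cloud-sdk/install.sh"
```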
Several students experienced a weird error upon adding the GPG key from the Docker installation tutorial for Linux.
The key message in the error is suspended tty, which means that some process running in the background is interfering with the terminal. We couldn't find anything on Google, but what worked for everyone was to turn the computer off and on again ♥, then add the GPG key, run sudo apt update and continue with the installation.
I can make a PR for it, but first I wanted to check whether it also happened to any Mac users. If not, I'll update the Ubuntu and Windows docs.
The Virtual Environment step in the Windows setup could use some screenshots to show what the terminal should look like after the virtual environment is successfully created. Some students fail this step and just continue, which causes lots of backtracking later.
Right now it tells you to run python -m venv lewagon and after that says "Awesome! The virtual environment has been created." But if this step fails, or if the bash and git config commands after it fail, the student gets no feedback on whether they can continue.
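Besides screenshots, the step could end with a check the student runs themselves. The sketch below is one possible form, assuming the environment is created from a Linux or WSL shell (where python -m venv writes pyvenv.cfg and bin/activate inside the target folder); venv_created is a hypothetical helper, not part of the current setup.

```shell
# Hypothetical helper: succeeds only if the given folder looks like a virtual
# environment created by `python -m venv` on Linux/WSL.
# $1: path to the environment folder, e.g. ./lewagon
venv_created() {
  [ -f "$1/pyvenv.cfg" ] && [ -f "$1/bin/activate" ]
}

# Usage (comment only):
#   venv_created ./lewagon && echo "OK, you can continue" || echo "Stop and ask a TA"
```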
We don't use it since we have zoom campus
The shortcut ctrl + , doesn't work in the Terminal (WSL) - setup here.
We managed to sort it by opening the settings manually: