
Google Summer of Code

| Sub organizations | IDEAS LIST | Student guides |

NumFOCUS will be applying again as an umbrella mentoring organization for Google Summer of Code 2024. NumFOCUS supports and promotes world-class, innovative, open source scientific software.

NumFOCUS is committed to promoting and sustaining a professional and ethical community. Our Code of Conduct is our effort to uphold these values and it provides a guideline and some of the tools and resources necessary to achieve this.

Google Summer of Code is an annual open source mentoring program sponsored by Google. This repository contains information specific to NumFOCUS' participation in GSoC. For general information about the program, including this year's application timeline and key phases, please see the GSoC website.

This Git repository stores information about NumFOCUS' application for Google Summer of Code in the current and previous years.

Table of Contents

Students

NumFOCUS is participating as an umbrella organization. This means that you will need to identify a specific project to apply to under the NumFOCUS umbrella. (Projects are listed below.)

Read this document to learn how to apply for the GSoC program with NumFOCUS. Please also check out our ideas list.

For any questions, please open an issue in our issue tracker or send an email to [email protected], our mailing list address. Please also consider subscribing to the mailing list at https://groups.google.com/a/numfocus.org/forum/#!forum/gsoc.

Sub Organizations

If you want to participate as a sub organization of NumFOCUS please read this guide.

Organizations Confirmed Under NumFOCUS Umbrella

In alphabetical order.

aeon

aeon is an open-source scikit-learn compatible toolkit for time series tasks such as forecasting, classification, regression, clustering, anomaly detection and segmentation. It provides a broad library of time series algorithms, including efficient implementations of the latest advances in research.

Website | Ideas Page | Slack | Source Code

AiiDA

AiiDA is a Python framework for managing computational science workflows, with roots in computational materials science. It helps researchers manage large numbers of simulations (10k, 100k, 1M, ...) and complex workflows involving multiple executables. At the same time, it records the provenance of the entire simulation pipeline with the aim of making it fully reproducible.

Website | Ideas List | Discourse | Source Code

ArviZ

ArviZ is a project dedicated to promoting and building tools for exploratory analysis of Bayesian models. It currently has a Python and a Julia interface. ArviZ aims to integrate seamlessly with established probabilistic programming languages like PyStan, PyMC, Turing, Soss, emcee, or Pyro. Where the probabilistic programming languages aim to make it easy to build and solve Bayesian models, the ArviZ libraries aim to make it easy to process and analyze the results from those Bayesian models.

Website | Ideas List | Contact (Gitter) | Source Code
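
As a rough illustration of the exploratory-analysis workflow described above, here is a minimal Python sketch using ArviZ's bundled example data; the variable names shown belong to that example dataset.

```python
# Illustrative only: summarize and plot a bundled example InferenceData object.
import arviz as az

idata = az.load_arviz_data("centered_eight")       # example dataset shipped with ArviZ
print(az.summary(idata, var_names=["mu", "tau"]))  # posterior summary table
az.plot_trace(idata, var_names=["mu", "tau"])      # trace and density plots
```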

Bambi

Bambi (BAyesian Model Building Interface) is an open source Python package designed to make it easier for practitioners to build statistical models from a wide range of families using a formula notation similar to those found in R. It is built on top of the PyMC probabilistic programming framework and the ArviZ package for exploratory analysis of Bayesian models.

Website | Ideas List | Discussions | Source Code
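
A minimal sketch of the formula-based workflow described above; the data frame and draw count here are purely illustrative.

```python
# Illustrative only: fit a simple linear model with an R-style formula.
import bambi as bmb
import pandas as pd

df = pd.DataFrame({"y": [1.1, 2.3, 2.9, 4.2, 5.1], "x": [1, 2, 3, 4, 5]})
model = bmb.Model("y ~ x", df)     # formula notation similar to R
results = model.fit(draws=500)     # returns an ArviZ InferenceData object
```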

biocommons

The biocommons is a community that fosters collaboration on pre-competitive, interoperable, and high-quality bioinformatics open source software and data, primarily for biological sequence analysis and interpretation. Our software is used by clinical genetics/diagnostics companies, computational biologists and scientists, and tool and database developers.

Website | Project Ideas | Getting Connected | GitHub

CB-Geo MPM

CB-Geo MPM is an HPC-enabled Material Point Method solver for large-deformation modeling. It supports isoparametric elements to model complex geometries and produces photo-realistic renderings.

Website | Ideas List | Discussions | Source Code

Colour

Colour is an open-source Python package providing a comprehensive number of algorithms and datasets for colour science.

It is freely available under the New BSD License terms.

Website | Ideas List | Contact | Source Code

CuPy

CuPy is a NumPy/SciPy-compatible array library for GPU-accelerated computing with Python. CuPy acts as a drop-in replacement to run existing NumPy/SciPy code on NVIDIA CUDA or AMD ROCm platforms.

Website | Ideas List | Chat on Gitter | Contact | Source Code
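
A minimal sketch of the drop-in-replacement idea described above; the array size and operations are arbitrary, and a CUDA- or ROCm-capable GPU is assumed.

```python
# Illustrative only: NumPy-style code executed on the GPU with CuPy.
import cupy as cp

x = cp.arange(1_000_000, dtype=cp.float32)  # array allocated on the GPU
y = cp.sqrt(x).sum()                        # computed on the GPU
print(cp.asnumpy(y))                        # copy the result back to host memory
```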

Conda Forge

A community-led collection of recipes, build infrastructure, and distributions for the conda package manager.

Website | Ideas List | Gitter | Source Code

Data Retriever

The Data Retriever ecosystem improves reproducible research through data product management. The platform takes advantage of freely available data sources in a variety of formats, standardizes them, and makes them available to scientists in a form that is ready to analyze. Data sources range from tabular data to spatial data packages and APIs. Several data packages use the ecosystem, and many projects support or rely on it.

Website | Ideas List | Contact (Gitter) | Source Code

FEniCS

FEniCS is an automated finite element library for solving the equations that arise in modeling, featuring a domain-specific language and automated code generation. Users input a problem in notation that looks very much like mathematics; FEniCS then translates it into computer code and numerically solves problems for which there is no analytical (exact) solution.

Website | Ideas List | Contact | Source Code

FluxML

FluxML is a 100%-pure Julia machine learning stack built on top of Julia's native automatic differentiation and GPU support. Our organization maintains packages for building and training neural networks, data pre-processing pipelines, standard deep learning models, automatic differentiation, and more. By writing our complete toolchain in Julia, we aim to make machine learning simple, extensible, and performant.

Website | Ideas List | Contact (Slack or Zulip) | Source Code

Gridap

Gridap is a new generation, open-source, finite element (FE) library implemented in the Julia programming language. Gridap aims at adopting a more modern programming style than existing FE applications written in C/C++ or Fortran.

Website | Ideas List | Contact (Gitter) | Source Code

JuMP

JuMP is a modeling language and collection of supporting packages for mathematical optimization in Julia. JuMP makes it easy to formulate and solve a range of problem classes, including linear programs, integer programs, conic programs, semidefinite programs, and constrained nonlinear programs.

Website | Ideas List | Contact | Source Code

JupyterLab

JupyterLab is a web-based interactive development environment for notebooks, code, and data. Its flexible interface allows users to configure and arrange workflows in data science, scientific computing, computational journalism, and machine learning. A modular design invites extensions to expand and enrich functionality.

JupyterLab is a project of the Jupyter organization: free software, open standards, and web services for interactive computing across all programming languages.

JupyterLab Website | Jupyter Website | Ideas List | Contact | Source Code

Matplotlib

Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. Matplotlib makes easy things easy and hard things possible.

Website | Ideas List | Gitter | Source Code
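
A minimal sketch of a static Matplotlib plot; the data points are arbitrary.

```python
# Illustrative only: a simple static line plot.
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [0, 1, 4, 9], marker="o", label="y = x^2")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
plt.show()
```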

Mesa

Mesa allows users to quickly create agent-based models using built-in core components (such as spatial grids and agent schedulers) or customized implementations; visualize them using a browser-based interface; and analyze their results using Python’s data analysis tools. Its goal is to be the Python 3-based counterpart to NetLogo, Repast, or MASON.

Website | Ideas Page | Contact (Mailing List) | Source Code

NetworkX

NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.

Website | Ideas Page | Contact (GitHub Discussions) | Source Code
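
A minimal sketch of creating and analyzing a small graph with NetworkX; the nodes and edges are arbitrary.

```python
# Illustrative only: build a tiny graph and compute basic properties.
import networkx as nx

G = nx.Graph()
G.add_edges_from([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
print(nx.shortest_path(G, "a", "d"))  # e.g. ['a', 'c', 'd']
print(nx.degree_centrality(G))        # centrality score per node
```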

OpenFHE

OpenFHE is an open-source Fully Homomorphic Encryption (FHE) library that includes efficient implementations of all common FHE schemes: BFV, BGV, CKKS, DM and CGGI.

Website | Ideas Page | Contact Us (Discourse) | Source Code

Open Science Labs

Open Science Labs is a global community dedicated to creating an open space for teaching, learning, and sharing information about open science and computational tools. Our community develops tools that address real-world problems and collaborates with other projects and workgroups to improve technology and create international opportunities for our community. Although our focus may seem broad, we initially prioritize supporting Research Software Engineers (RSEs) who often face computational challenges in their work.

Website | Ideas Page | Contact (GitHub Discussions) | Source Code

Optuna

Optuna is an open source hyperparameter optimization framework to automate hyperparameter search. Optuna features (1) a define-by-run interface for defining search spaces, (2) state-of-the-art algorithms to efficiently search large spaces and prune unpromising trials for faster results, and (3) easy parallelization of hyperparameter searches over multiple threads or processes without modifying the code.

Website | Ideas Page | Contact ([email protected]) | Source Code
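
A minimal sketch of the define-by-run interface described above, minimizing a toy objective; the search space and trial count are arbitrary.

```python
# Illustrative only: the search space is defined at runtime inside the objective.
import optuna

def objective(trial):
    x = trial.suggest_float("x", -10, 10)
    return (x - 2) ** 2

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```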

pvlib

pvlib python provides a set of functions and classes for simulating the performance of photovoltaic energy systems.

Website | Google Group Forum | Ideas Page | Source Code

PyBaMM

PyBaMM (Python Battery Mathematical Modelling) solves physics-based electrochemical DAE models by using state-of-the-art automatic differentiation and numerical solvers.

Website | Contact | Ideas Page | Source Code

PyLops

PyLops is an open-source Python library focused on providing a backend-agnostic, idiomatic, matrix-free library of linear operators and related computations. It is inspired by the iconic MATLAB Spot – A Linear-Operator Toolbox project.

Website | Slack | Ideas Page | Source Code

PyMC

PyMC is a Python package for Bayesian statistical modeling and model fitting which focuses on advanced Markov chain Monte Carlo and variational fitting algorithms. Its flexibility and extensibility make it applicable to a large suite of problems.

Website | Discourse | Ideas Page | Source Code
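
A minimal sketch of specifying and sampling a model with PyMC (assuming PyMC v4+, imported as `pymc`); the observations are made up.

```python
# Illustrative only: estimate the mean of a few observations.
import pymc as pm

observations = [0.1, -0.3, 0.2, 0.5]

with pm.Model():
    mu = pm.Normal("mu", mu=0, sigma=1)
    pm.Normal("obs", mu=mu, sigma=1, observed=observations)
    idata = pm.sample(1000, tune=1000)  # MCMC sampling; returns ArviZ InferenceData
```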

PySAL

PySAL is a Python library for geographical data science. It consists of 18 subpackages that cover a wide range of spatial analytical methods, including exploratory spatial data analysis, spatial interaction modeling, spatial optimization, spatial econometrics, segregation analysis, and spatial interpolation.

Website | Gitter | Ideas Page | Source Code

PyTorch-Ignite

PyTorch-Ignite is a high-level library that helps with training neural networks in PyTorch.

Website | Discord | GitHub Discussions | Ideas Page | Source Code

QuTiP

QuTiP is software for simulating quantum systems. QuTiP aims to provide tools for user-friendly and efficient numerical simulations of open quantum systems. It can be used to simulate a wide range of physical phenomena in areas such as quantum optics, trapped ions, superconducting circuits, and quantum nanomechanical resonators. In addition, it contains a number of other modules to simplify the numerical simulation and study of many topics in quantum physics, such as quantum optimal control, quantum information, and quantum computing.

Website | Contact | Ideas Page | Source Code

SciML

SciML is an open source software organization created to unify the packages for scientific machine learning. This includes the development of modular scientific simulation support software, such as differential equation solvers, along with methodologies for inverse problems and automated model discovery. By providing a diverse set of tools with a common interface, we offer a modular, easily extendable, and highly performant ecosystem for handling a wide variety of scientific simulations.

Website | Contact | Ideas Page | Source Code

Taskflow

Taskflow is a library for parallel and heterogeneous task programming that aims to combine high performance with high productivity.

Website | Contact | Ideas Page | Source Code

TNL

TNL is a collection of building blocks that facilitate the development of efficient numerical solvers and HPC algorithms. It is implemented in C++ using modern programming paradigms to provide a flexible and user-friendly interface similar to, for example, the C++ standard library (STL). TNL provides native support for modern hardware architectures such as multicore CPUs, GPUs, and distributed systems, which can be managed via a unified interface.

Website | Gitter | Ideas Page | Source Code

Zarr

Zarr is a file storage format for chunked, compressed, N-dimensional arrays based on an open-source specification.

Website | Gitter | Ideas Page | Source Code
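
A minimal sketch of creating a chunked on-disk array with the zarr-python package (v2-style API); the file name, shape, and chunk sizes are arbitrary.

```python
# Illustrative only: writes touch only the affected chunks on disk.
import numpy as np
import zarr

z = zarr.open("example.zarr", mode="w", shape=(1000, 1000),
              chunks=(100, 100), dtype="f8")
z[0, :] = np.arange(1000)
print(z.shape, z.chunks)
```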


Issues

Write a result-aggregation server for the installation-test scripts

Background

Software Carpentry has installation-test scripts so students can check that they've successfully installed any software required by their workshop. However, we don't collect the results of student tests, which makes a number of things more difficult than they need to be. Statistics about installed versions would make it easy to:

Approach

This project would:

  • Create a model for installed packages, failed package checks, and diagnostic system information.
  • Design an API so clients can submit the results of their installation-test script.
  • Write a small server to serve the API, store the results in a relational database, and allow administrators to analyze the content.
  • Update the installation-test scripts to (optionally) submit their results to the new server.

I'm not particular about the web framework you use to write the server, but I have the most experience with Django and Flask. If you prefer a different framework, I'm fine with anything that takes care of the boilerplate and lets you focus on the high-level tasks.
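
As a rough illustration only (not the project's actual design), here is a minimal sketch of such a submission endpoint, assuming Flask and SQLite; the route, table, and field names are hypothetical.

```python
# Hypothetical sketch: endpoint, table, and field names are illustrative only.
import json
import sqlite3

from flask import Flask, jsonify, request

app = Flask(__name__)
DB_PATH = "install_results.db"


def init_db():
    # One row per package check; diagnostic system info kept as a JSON blob.
    with sqlite3.connect(DB_PATH) as conn:
        conn.execute(
            """CREATE TABLE IF NOT EXISTS results (
                   id INTEGER PRIMARY KEY AUTOINCREMENT,
                   package TEXT NOT NULL,
                   version TEXT,
                   status TEXT NOT NULL,
                   system_info TEXT
               )"""
        )


@app.route("/api/v1/results", methods=["POST"])
def submit_results():
    """Accept one JSON document per installation-test run and store each check."""
    payload = request.get_json(force=True)
    checks = payload.get("checks", [])
    system_info = json.dumps(payload.get("system", {}))
    with sqlite3.connect(DB_PATH) as conn:
        for check in checks:
            conn.execute(
                "INSERT INTO results (package, version, status, system_info) "
                "VALUES (?, ?, ?, ?)",
                (check["package"], check.get("version"), check["status"], system_info),
            )
    return jsonify({"stored": len(checks)}), 201


if __name__ == "__main__":
    init_db()
    app.run(debug=True)
```

The installation-test scripts could then POST their results to this endpoint as a single JSON document per run.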

Challenges

Designing and implementing a simple API for storing test results, error messages, diagnostic system information, etc. We want a robust, flexible system that's small and easy to maintain going forward.

Involved toolkits or projects

  • Python, for extending the existing installation test scripts.
  • A web framework like Django, Flask, RoR, Express, ...
  • Relational database management systems (RDBMS) including MySQL, PostgreSQL, SQLite.

Degree of difficulty and needed skills

  • Moderate difficulty. This will be a simple server application, but you'll be designing and writing it from scratch.
  • Knowledge of Python, to write client-side code for the installation test scripts.
  • Knowledge of a web framework and basic API design.
  • Knowledge of relational databases or a wrapping library for storing the results.

Any of these skills could be learned during the project, but you probably can't learn all of them during the project ;).

Involved developer communities

The Software Carpentry community primarily interacts via issues and pull requests on GitHub and the discuss@ mailing list. There's also an IRC channel.

Mentors

Acknowledgements

Thanks to @xuf12 for the initial idea behind this project.

Query: Survey Responses Visualizer

Hi.

After going through the Abstract and the various surveys conducted by Software Carpentry on SurveyMonkey, I have understood the application as follows:

  1. There are 5 surveys associated with each workshop ID
  2. Since only a workshop conductor (host) can view the results of their workshops, the conductor has to log into the application using their SurveyMonkey account (OAuth and the SurveyMonkey API will be used for authorization).
  3. The conductor either provides a link to the survey with the workshop ID as a query parameter or feeds the app the workshop ID directly.
  4. In the latter case, the 5 surveys associated with the workshop ID are shown, of which the user chooses one.
  5. The results of the selected survey are shown to the logged-in conductor graphically or in whatever format is requested.
  6. Integration with AMY.

Also, there is some confusion regarding the statement:

These surveys are shared among all of the workshops

I request you to point out any misunderstanding as I'm currently studying the SurveyMonkey API and drafting a proposal.

Thanking You.
Regards,
Meet Vora.

Survey responses visualizer - Misunderstood requirements

Hello! I am Robert Dumitru, a freshman-year Computer Science student at the University of Bucharest. I have extensive knowledge of C/C++, I know Python at a medium level, and I have fairly broad knowledge of algorithms and data structures. Regarding my experience with web development, I have taken a course in Ruby on Rails, that being my starting point, and I have studied Django on my own, building on previously learned mechanisms and concepts. I can't say I have really broad experience with Django, but I know enough to be confident that the web framework side of this project would not be a difficulty for me. I have a basic understanding of Bootstrap, but I will keep practicing to improve my skills. I read the SurveyMonkey API docs and implemented some scripts that make use of its functionality, so I got pretty familiar with it. Tomorrow I'll explore the available plotting libraries and try to choose the one that I consider best suited to the project's needs.
I have three questions:

  1. Do we have to create a pull request to get a review of our draft proposal?
  2. Where should I send a sample Django project to get your opinion about it? (because I don't think I'm 100% sure that I have understood how the required functionalities should be implemented)
  3. Is a GitHub repository link alright for the sample project?

Question about result-aggregation server

Hello Developers,

I am Ryan Stolliker, a Computer Science student at the University of California, Irvine, and I have experience with Python, HTML, CSS, JavaScript, and web frameworks for Python. I am interested in the result-aggregation server for installation-test scripts project, but I have a question: will the data be stored permanently, collecting data across many workshops, or will it be created and destroyed for each workshop?

Thank you.

Project Idea for GSoC 2017

Hi everybody,
I am Amritanshu Jain, a sophomore currently studying at BITS Pilani, Pilani Campus. I wish to apply for GSoC under your organisation. I have over six years of programming experience in Python and JS, and have gained substantial command of Python libraries/frameworks such as Django, Scrapy, NumPy, OpenCV, and Pandas. I have also worked with Node.js and React/Redux for a few projects.

While going through your project ideas, I was very fascinated by the idea of Add Spatial Data Support to Data Retriever, since I recently made a small version of Data Retriever for mutual funds. The script scrapes the past 10 years of data for almost 170 mutual fund schemes, processes it into a CSV, and then saves it into a local PostgreSQL database.

For the past few months I have been more involved with the development of web applications and text mining. As discussed with one of the mentors, I would like to extend Data Retriever to disciplines where it lacks data, such as market data and more. Kindly go through my CV if time permits; it briefly covers my experience and projects.

Lately I have been doing plenty of projects, so I am already in a good programming flow. I would love to build a basic prototype so that I can assure you that I am capable of undertaking this GSoC project. Please guide me on how to proceed.
Thanks,
Amritanshu Jain
Github - https://github.com/jainamritanshu/
Email - [email protected]
@ethanwhite @henrykironde #2017 #DataRetriever

Manage workflow for instructor training - ideas discussion

Project Title: Manage workflow for instructor training.

  1. Issue: Recording applications from individuals and groups interested in instructor training.
    Solution: At this point I noticed that only some users can log into AMY, mostly admins and some other instructors. A new trainee can only apply (through a form) and wait for the committee to contact them. What I propose is to create login accounts for trainees as well. These would be different from the AMY dashboard (as I understand it, the AMY dashboard and its other features are designed to meet administrative requirements and are not general purpose) and would have minimal options. In fact I have just two in mind: one for notifications (using the Django notification system) and another listing the other trainees applying for instructor training (a simple view, with filtering options on location, university, etc., much like the filtering AMY already has).

  2. Issue: Help individuals form groups.
    Solution: Extending the above idea, we can easily search for people and other groups and send requests to join them.

  3. Issue: Tracking progress.
    Solution: We create a view for admins and other mentors that lists all the trainees and the tasks they have completed.

  4. Issue: When someone completes training, we manually create an entry in AMY to record it.
    Solution: By maintaining accounts like this we can very easily export trainees' details to the instructors list. This is very similar to what AMY already does with "Bulk add people".

These are a few of my ideas to tackle the issues mentioned. This list is by no means complete, and many issues have not been mentioned. I'm opening this issue so that I can discuss them and start on a small piece of code as a proof of concept before writing the whole proposal.

Interested in contributing to Data Retriever

Hi, I am a graduate student from Germany interested in Data Science.

I would like to extend the Retriever project as part of GSoC '17 to include support for spatial data in its current implementation. I have a fair understanding of spatial data and am currently working on formulating an idea to bring the installation of the vast variety of GIS data under the same umbrella. I was hoping to have @ethanwhite and @henrykironde as my mentors.

As mentioned here, I am opening this issue for further discussion.

Discussion: result-aggregation server for installation-scripts

Hi @rgaiacs, @pbanaszkiewicz!

(Quoting from issue opened at swc-repo)

I would like to put forward a suggestion of using conda as a package installer and manager. It is not only platform agnostic but also a generic package installer, unlike pip, which manages only Python packages. It also allows us to create separate Python 2 and Python 3 environments and to resolve dependency issues without any hassle.

This might be more useful for the target audience of Software Carpentry, who are mostly scientists and researchers with little to no expertise with computers. For example, bioconda is a conda channel that allows one to install bioinformatics tools through the conda program.

I'd like to extend this discussion in relation to the idea at numfocus/gsoc/2016/swc.

Thanks!

Add taxonomic name resolution to the EcoData Retriever to facilitate data science approaches to ecology - doubts

@ethanwhite @henrykironde
Hi, I am Bishakh Ghosh.
I am really interested in participating in GSoC 2016 under EcoData Retriever. I have cloned the source code, installed it, and was able to download a couple of datasets.
I have made one PR, weecology/retriever#442, but I don't know if it is any good.

I have knowledge of Python and some web frameworks like Django and Flask, and I know how to use REST APIs.
I also have some knowledge of SQL and ORMs like SQLAlchemy.
Here is a list of my works: http://ghoshbishakh.github.io/blog/about/

What should my next step be? Should I try to make a sample app that fetches data from iPlant's Taxonomic Name Resolution Service?

Also, is it possible to apply for both projects? Both seem interesting to me, although since I know web APIs a bit, my first preference would be adding taxonomic name resolution to the EcoData Retriever.

Organization disambiguation

I spoke this morning with Rainere about NumFOCUS's role as an umbrella org for GSoC and it sounds like you all want to accept applications for any NumFOCUS organizations. To make this more NumFOCUS oriented I'd like to propose some ideas/questions:

  • May we move this repository to the NumFOCUS Github account?
  • May we add a page to the NumFOCUS website about its participation in GSoC 2015?
  • Might we create an ideas page that is more like the one PSF does, where projects simply link to their ideas pages? See: https://wiki.python.org/moin/SummerOfCode/2015

I'm happy to do the grunt work to make this happen if people agree.

Interested in contributing to PyMC3

Hi! I'm a third-year undergraduate student interested in Bayesian statistics, machine learning, and economics.

I want to contribute to pymc3 and extend the variational inference methods it currently supports. Moreover, I am sure that it is possible to make a unified interface for these methods so that it will be easy to implement new ones. I hope @twiecki can be my mentor.

I'll write my proposal a bit later as I have exams now, but having feedback a bit earlier would be great.

ANNOUNCEMENT to prospective students

Hello all,

today (Pi Day: March 14, 2016) marks the start of the GSoC '16 application period. Since the 2015 edition, some rules have changed, and starting later today you can submit a draft of your proposal to the GSoC website, which we, the mentors, will comment on.

As you noted, NumFOCUS began accepting the drafts as Pull Requests earlier (https://github.com/numfocus/gsoc/pulls).

We will keep an eye on the list of proposal drafts on the GSOC website, but we encourage everyone submitting their proposal draft to link to their PR in this repository – if you keep everything here, it will be a lot easier for us to follow.

I'll add comments to this issue if anything new turns out. You can subscribe to it to receive notifications.

Thanks in advance,
Piotr

Simplify proposal method for students

Hi,

This document seems overly complicated to me:

https://github.com/numfocus/gsoc/blob/master/CONTRIBUTING-students.md

Melange requires students to submit their applications in its system, which is basically a text form (I think it may support formatting). At SymPy we've always used the GitHub wiki for students to develop their proposals. We create an application template and then they simply copy and paste it into a new page on the wiki.

We also require students to submit a patch with their application, so they do have to learn how GitHub works during the application process, but having the applications on the wiki removes a lot of the headache that the PR system adds. I'm not sure having them fork and submit PRs for their proposal is that useful.

Secondly, each sub org usually has its own process for this. If NumFOCUS, an umbrella org, enforces this method, then all sub orgs have to adjust their current systems. For example, we (SymPy) have a process that we've honed for eight years and would like to stick to it.

Last note: we always have formatting problems when students transfer their proposals from our system (the GitHub wiki in SymPy's case) to Melange's system. We should figure out what the Melange system supports (it may not be GitHub-flavored markdown, I can't remember).

Interested in contributing to "EcoData Retriever" project.

Hello @ethanwhite, @henrykironde

I am a 3rd-year B.Tech (Computer Science and Engineering) student from UIET. I have gone through the project ideas list and am very much interested in the "EcoData Retriever" project.

I am comfortable with the required skill set and have relevant experience, including research projects where I handled large data sets and worked on time series analysis and prediction.

Please guide me on how I can start contributing to this project. I would be interested in sharing ideas and other possibilities, as well as understanding the metadata and the JSON input format for the Python parser.

Thanks.

Interested in pandas project

Hello,
I am very interested in developing the "Weights support in most stat functions" project.
May I ask if the project is still open?

Enhance Amy, a Django application for managing workshops

Amy is a Django application for managing workshops like those run by Software Carpentry. Basic functionality is in place, but there are many outstanding feature requests, and it could use a lot more testing.

Amy is a straightforward Django application; experience with that framework (and with Python) is required. Some of the features we would like to add require JavaScript, so familiarity with jQuery and other frameworks is an asset.

Amy will manage personal identifying information, so applicants should also have a basic understanding of security engineering. Experience with deploying and maintaining applications is also an asset.

Improving reproducibility in science by adding provenance tracking to the EcoData Retriever

Rationale

The EcoData Retriever is a Python based tool for automatically downloading, cleaning up, and restructuring ecological data. It does the hard work of data munging so that scientists can focus on doing science.

Science is experiencing a reproducibility crisis, in that it is becoming clear that published results often cannot be reproduced. Part of the challenge of reproducibility in data-science-style research is that most of the steps related to data (downloading it, cleaning it up, restructuring it) are either done manually or using one-off scripts, and therefore this phase of the scientific process is not reproducible. The EcoData Retriever already solves many of these problems, but it doesn't currently keep track of exactly what has been done and therefore fails to support fully reproducible workflows.

Approach

This project would extend the EcoData Retriever using Python to store all of the metadata necessary for full reproduction in an associated SQLite database.

Specifically this would involve:

  • Object oriented programming in Python
  • Design an SQLite database for storing provenance information or use an existing framework (e.g., http://code.google.com/p/core-provenance-library/)
  • Implement checks to make sure that the data is in the same form created
    by the Retriever when retrieving provenance information
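
As a rough illustration of the kind of provenance record this could store, here is a minimal sketch assuming SQLite via Python's standard library; the table and column names are hypothetical, not the Retriever's actual schema.

```python
# Hypothetical sketch of a provenance store; schema and names are illustrative only.
import hashlib
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS provenance (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    dataset TEXT NOT NULL,           -- name of the dataset script
    source_url TEXT NOT NULL,        -- where the raw data came from
    retrieved_at TEXT NOT NULL,      -- UTC timestamp of the download
    retriever_version TEXT NOT NULL,
    file_sha256 TEXT NOT NULL        -- checksum used to verify the stored data later
);
"""


def record_provenance(db_path, dataset, source_url, retriever_version, data_file):
    """Store one provenance record for a downloaded data file."""
    with open(data_file, "rb") as fh:
        digest = hashlib.sha256(fh.read()).hexdigest()
    with sqlite3.connect(db_path) as conn:
        conn.executescript(SCHEMA)
        conn.execute(
            "INSERT INTO provenance (dataset, source_url, retrieved_at, "
            "retriever_version, file_sha256) VALUES (?, ?, ?, ?, ?)",
            (dataset, source_url, datetime.now(timezone.utc).isoformat(),
             retriever_version, digest),
        )
```

The stored checksum could later be compared against the data on disk to check that it is still in the form the Retriever created.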

Involved toolkits or projects

  • The EcoData Retriever
  • Python
  • Relational database management systems (RDBMS) including MySQL,
    PostgreSQL, SQLite
  • Potentially an existing provenance library

Degree of difficulty and needed skills

  • Moderate Difficulty
  • Knowledge of Python
  • Some experience with SQL

Involved developer communities

The EcoData Retriever primarily interacts via issues and pull requests on GitHub.

Mentors

Manage workflow for instructor training

Hi @pbanaszkiewicz and @rgaiacs, I'd like to participate in GSoC and work on "managing workflow for instructor training" project. So far, I've created this pull request to Amy: carpentries/amy#722

What are the next steps? I guess that I should analyze the current workflow. I know about the two etherpads ("Registration for one-hour discussion" and "Teaching demo") but nothing more. Where can I find more information about the current workflow?

Queries regarding Survey Response Visualizer

Can we have a survey page made public so that we can ascertain the fields that constitute it?
That would also allow us to recreate the survey form and browse the SurveyMonkey admin.

Policy on editing proposals after deadline

I have a request from a student to allow them to make some edits to their proposal following the deadline. It is possible to allow this with Melange, but it seems like we should have a policy for this that applies to all of the applicants.

Upgrade to datapackage.json standard - Questions

Hello @ethanwhite @henrykironde ,

I am a 4th-year B.Tech student from DTU, Delhi. I am interested in contributing to this project. I have been working with Python libraries and frameworks such as scikit-learn and Flask.

For this project, I assume we have to port all these scripts to the datapackage.json format and edit the parser so that it understands JSON instead of a text file. A further goal is to provide a CLI (and maybe a GUI) for adding new scripts by letting the user specify the input.

Are there any requirements prior to submitting a proposal? I would also like to discuss the layout of the new metadata with the mentors.

Thanks

Issue with the grammar of a question in pre-workshop survey

The 18th question in the "pre-workshop" survey is titled "Was the workshop held in the United States?".

This seems a little wrong in context, as we are talking to the user before the workshop. There might be cases where the user takes the survey before coming to the workshop.

We use the same title in the "post-workshop" survey as well, which is fine.

For the "pre-workshop" survey, I think it should be something like "Is the workshop being held in the United States?".

Pardon me if I should not have created an issue for this tiny doubt.

General purpose next-generation sequence pre-processor for QIIME 2-Questions

Hello @gregcaporaso! I am Sourav Singh, and I have made pull requests to QIIME 1, micronota, and the scikit-bio project. As a bioinformatics enthusiast, I am really interested in working on this project, and I have used some preprocessing tools like FastQC and Trimmomatic.

I would like to know if we can add a plugin that allows paired-end read merging, which could make it easier to process the sequence data.

Also, are there any requirements prior to sending a proposal? I would also like to discuss the plugins that could be added to QIIME 2.

EcoData Retriever Python3 Support

Good evening,

In the next few hours I will be sending a proposal for this project. I hate being the guy who gets in touch at the last minute, but sometimes there is no way around that.

I already cloned the repo and installed everything to get myself started.

Thanks,
Ernesto Cedeno.

A survey responses visualizer application - queries

Hi,
I am Prerit Garg, a final-year B.Tech (Computer Engineering) student at Jamia Millia Islamia University. I have good skills in Python, Flask, Django, HTML, CSS, JavaScript, Bootstrap, etc. I have worked on several full-stack projects involving both the front end and the back end while working at Zopper as an intern, where I developed several dashboards and RESTful APIs. I would like to contribute to the idea "A survey responses visualizer application" as my project for GSoC 2016. As a heads-up, I am currently going through the SurveyMonkey API. I would appreciate more guidance on getting started and on writing a proposal.
Thanks.

Add MDAnalysis as a project

MDAnalysis would like to take part under the NumFOCUS umbrella as well. We already have a list of project ideas that I would copy here.

@rgaiacs, are you OK with this? I just want to ask officially instead of just adding it myself.

Improve / Re-engineer the web and network framework for Julia

Julia is a high-level, high-performance dynamic programming language for technical computing and has seen significant adoption in various fields/areas of practice. One of the current deficiencies in the language is the lack of a robust networking / data exchange framework for obtaining and exporting data using standard network protocols. Existing efforts have relied on unstable/minimally-supported external packages; as a result, there is an underlying fragility to the network and web stacks.

We would like to improve the robustness of the Julia network/web framework by initially focusing on the sound implementation of a TLS package that would underpin secure communications between Julia programs and the general internet community. It is our hope that the TLS package would be the official way to develop secure higher level network interfaces such as LDAPS, HTTPS, FTPS, etc. The non-secure versions of these interfaces/protocols will also need some work.

Support for complex numbers within Convex.jl

I am a second-year computer science student from Thapar University, India, and I am interested in working on the idea "Support for complex numbers within Convex.jl".
Could someone guide me regarding this project? I am not sure what I should start studying, and I am also not quite clear about what exactly has to be built.

Waiting for your early response,
Thank you,
Rohan Goel
([email protected])

Add taxonomic name resolution to the EcoData Retriever to facilitate data science approaches to ecology

Rationale

The EcoData Retriever is a Python based tool for automatically downloading, cleaning up, and restructuring ecological data. It does the hard work of data munging so that scientists can focus on doing science.

One of the challenges of ecological (and evolutionary) data is that the names of species are constantly being redefined. This makes it difficult to combine datasets to do interesting science. By automating the reconciliation of different species names as part of the process of accessing the data in the first place, it will become much easier to combine diverse datasets in new and interesting ways.

Approach

This project would extend the EcoData Retriever using Python to access one or more of the existing web services that reconcile species names (e.g., iPlant's Taxonomic Name Resolution Service) and automatically replace the species' scientific names in the generated databases. Specifically this would involve:

  • Object oriented programming in Python
  • Using Python to query web service APIs
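
As a rough illustration of querying such a web service from Python, here is a minimal sketch using the requests library; the endpoint URL and response fields are hypothetical placeholders, not the actual TNRS API.

```python
# Hypothetical sketch: the URL and JSON fields below are placeholders only.
import requests

TNRS_URL = "https://example.org/tnrs/api/resolve"  # placeholder endpoint


def resolve_names(names):
    """Return a mapping of submitted names to accepted scientific names."""
    response = requests.get(TNRS_URL, params={"names": "|".join(names)}, timeout=30)
    response.raise_for_status()
    resolved = {}
    for match in response.json().get("matches", []):
        resolved[match["submitted_name"]] = match.get(
            "accepted_name", match["submitted_name"]
        )
    return resolved


# Example use: replace species names in a generated table before loading it.
# for row in rows:
#     row["species"] = resolved.get(row["species"], row["species"])
```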

Challenges

Scientific names are stored inconsistently across datasets, so it will be necessary either to modify the scripts that hold information on each dataset to indicate the location of the species information, or to use an existing ontology to automatically identify that location.

Involved toolkits or projects

  • The EcoData Retriever
  • Python
  • The APIs for the taxonomic name resolution services.
  • Relational database management systems (RDBMS) including MySQL,
    PostgreSQL, SQLite

Degree of difficulty and needed skills

  • Moderate Difficulty
  • Knowledge of Python
  • Knowledge of interacting with web services via APIs is a plus, but could
    be learned during the project

Involved developer communities

The EcoData Retriever primarily interacts via issues and pull requests on GitHub.

Mentors

Result Aggregation Server project - next steps

Hi @pbanaszkiewicz and @rgaiacs

I was going through the idea of a Result Aggregation Server for the installation-test scripts and found it interesting. I have worked extensively on Flask RESTful APIs while working at Zopper as an intern. As mentioned in the project description, I have created a demo API using Flask and MySQL that stores the OS details from the client in a MySQL database. Please guide me through the next steps so that I can move towards writing the proposal.

cheers!
Prerit Garg

Invite mentors

If you want to be a mentor for NumFOCUS on GSoC 2016 please email me your email address. You will find my email address on my GitHub profile page.

Interested in working on pymc3

I am an undergraduate student from India interested in Machine Learning and Bioinformatics.

I am interested in working on-

  1. Implement non-parametric Bayesian methods
  2. Alternative Computational Engines.

I am opening this issue as a discussion forum on the projects.

Query about the survey visualizer project

Hi @rgaiacs and @pbanaszkiewicz,

I have a question about the "A survey responses visualizer application" project idea. Hope you will help me.

I saw that SurveyMonkey already provides features for visualizing the responses. For example, this is for a sample survey I have created.
sample

Although I'm not using the "Premium" service of SurveyMonkey (and I assume that SWC does), a few more improvements could be made with the responses.

For example

In the visualizer project, we could use the IP address to geolocate (probabilistically) our respondents for a survey.

Can you please explain to me why we are not using the default visualizations shown to us by SurveyMonkey?

Is it because of the freedom to have additional features like caching, and to integrate it with AMY?

Sorry if this sounds like a naive question.
