numfocus / gsoc
NumFOCUS Google Summer of Code Materials
Home Page: http://numfocus.org/programs/gsoc.html
Hi,
This document seems overly complicated to me:
https://github.com/numfocus/gsoc/blob/master/CONTRIBUTING-students.md
Melange requires students to submit their applications in its own system, which is basically a text form (I think it may support formatting). At SymPy we've always used the GitHub wiki for students to develop their proposals. We create an application template, and students simply copy and paste it into a new page on the wiki.
We also require students to submit a patch with their application, so they do have to learn how GitHub works during the application process, but keeping the applications on the wiki avoids a lot of the headaches that the PR system adds. I'm not sure that having them fork the repository and submit PRs for their proposals is that useful.
Secondly, each sub-org usually has its own process for this. If NumFOCUS, as an umbrella org, enforces this method, then all sub-orgs will have to adjust their current systems. For example, we (SymPy) have a process that we've honed over eight years and would like to stick with it.
Last note: we always have formatting problems when students transfer their proposals from our system (the GitHub wiki, in SymPy's case) to Melange. We should figure out what formatting Melange supports (it may not be GitHub-flavored Markdown; I can't remember).
The aim of this project is to build tools to turn lessons into packages that can be installed via standard package management applications. A proof of concept exists at https://github.com/swcarpentry/installable-lesson-demo-01, and more information is given in the ideas-list file.
The current description of the Categorical Axes project doesn't convey the exact nature of the work to be done. Could the abstract be elaborated?
Julia is a high-level, high-performance dynamic programming language for technical computing and has seen significant adoption in various fields/areas of practice. One of the current deficiencies in the language is the lack of a robust networking / data exchange framework for obtaining and exporting data using standard network protocols. Existing efforts have relied on unstable/minimally-supported external packages; as a result, there is an underlying fragility to the network and web stacks.
We would like to improve the robustness of the Julia network/web framework by initially focusing on the sound implementation of a TLS package that would underpin secure communications between Julia programs and the general internet community. It is our hope that the TLS package would be the official way to develop secure higher level network interfaces such as LDAPS, HTTPS, FTPS, etc. The non-secure versions of these interfaces/protocols will also need some work.
Hello @gregcaporaso! I am Sourav Singh, and I have made pull requests to QIIME 1, micronota, and the scikit-bio project. As an enthusiast in bioinformatics, I am really interested in working on this project. I am a student who has used preprocessing tools such as FastQC and Trimmomatic.
I would like to know whether we could add a plugin for paired-end read merging, which would make the sequence data easier to process.
Also, are there any requirements prior to sending a proposal? I would also like to discuss the plugins that could be added to QIIME 2.
I have a request from a student to allow them to make some edits to their proposal following the deadline. It is possible to allow this with Melange, but it seems like we should have a policy for this that applies to all of the applicants.
pomegranate is a library for fast and flexible probabilistic modeling for python, implemented in cython. We have a list of open projects stratified by model here: https://github.com/jmschrei/pomegranate/projects. I have served as a mentor for GSoC before as part of the sklearn team.
@aterrel would NumFOCUS be interested in supporting this project this summer?
Hi,
I am interested in developing the scikit-bio project "A scikit-bio-based bioinformatics file format converter". However, I see there are no mentors listed on the ideas page; is this a problem?
Hello all,
Today (Pi Day, March 14, 2016) marks the start of the GSoC '16 application period. Since the 2015 edition, some rules have changed: starting later today, you can submit a draft of your proposal to the GSoC website, and we mentors will comment on it.
As you may have noticed, NumFOCUS began accepting drafts as pull requests earlier (https://github.com/numfocus/gsoc/pulls).
We will keep an eye on the list of proposal drafts on the GSoC website, but we encourage everyone submitting a proposal draft to link to their PR in this repository: if you keep everything here, it will be a lot easier for us to follow.
I'll add comments to this issue if anything new comes up. You can subscribe to it to receive notifications.
Thanks in advance,
Piotr
Hi @tacaswell, I am interested in the Categorical Axes project for matplotlib and very keen to work on it. I would like to know what kind of interaction is required with packages like seaborn and pandas, and with the Compound Artists project.
Hi, I am a graduate student from Germany interested in Data Science.
I would like to extend the retriever project as part of GSoC '17 to add support for spatial data to its current implementation. I have a fair understanding of spatial data and am currently formulating an idea to bring installation of the wide variety of GIS data under the same umbrella. I was hoping to have @ethanwhite and @henrykironde as my mentors.
As referenced here, I am opening this issue for further discussion.
Hi.
After going through the Abstract and the various surveys conducted by Software Carpentry on SurveyMonkey, I have understood the application as follows:
Also, there is some confusion regarding the statement:
These surveys are shared among all of the workshops
Please point out any misunderstandings, as I'm currently studying the SurveyMonkey API and drafting a proposal.
Thanking You.
Regards,
Meet Vora.
The 18th question in the "pre-workshop" survey is titled "Was the workshop held in the United States?".
This seems a little off in context, since the survey is taken before the workshop: the user may be filling it in before attending.
We use the same title for the "post-workshop" survey, which is fine.
For "pre-workshop" surveys, I think it should be something like "Is the workshop being held in the United States?".
Pardon me if I should not have created an issue for such a small question.
Hi,
I am Prerit Garg, a final-year B.Tech (Computer Engineering) student at Jamia Millia Islamia University. I have good skills in Python, Flask, Django, HTML, CSS, JavaScript, Bootstrap, etc. While working with Zopper as an intern, I worked on several full-stack projects involving both the front end and the back end, and developed several dashboards and RESTful APIs. I would like to contribute to the idea "A survey responses visualizer application" as my GSoC 2016 project. As a heads-up, I am currently going through the SurveyMonkey API. I would appreciate more guidance on getting started and writing a proposal for it.
Thanks.
Could a survey page be made public so that we can see which fields it contains?
That would also allow us to recreate the survey form and browse the SurveyMonkey admin interface.
The EcoData Retriever is a Python-based tool
for automatically downloading, cleaning up,
and restructuring ecological data. It does the hard work of data munging so
that scientists can focus on doing science.
One of the challenges of ecological (and evolutionary) data is that the
names of species are constantly being
redefined. This makes it difficult to combine datasets to do interesting
science. By automating reconciliation of
different species names as part of the process of accessing the data in the
first place it will become much easier
to combine diverse datasets in new and interesting ways.
This project would extend the EcoData Retriever using Python to access one
or more of the existing web
services that reconcile species names
(e.g., iPlant's Taxonomic
Name Resolution Service)
and automatically replace the species' scientific names in the generated
databases. Specifically this
would involve:
Scientific names are stored inconsistently across datasets, so it will be
necessary to either modify the scripts
that hold information on each dataset to indicate the location of the
species information or use an existing ontology
to automatically identify the location.
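A minimal sketch of the replacement step, assuming each dataset script records which column holds the species name (the "species_column" key is hypothetical, not part of the existing script format) and that the results from a name-resolution service have already been collected into a plain dictionary mapping submitted names to accepted names. No real web-service call is shown, and the species names are placeholders.

```python
# Hypothetical metadata for one dataset script: the real Retriever script format
# is not shown here, so "species_column" is an assumed field name.
DATASET_SCRIPT = {"name": "example_dataset", "species_column": "species"}


def reconcile_species(rows, species_column, resolved_names):
    """Return copies of rows with species names replaced by accepted names.

    resolved_names maps a submitted name to an accepted name, e.g. built from
    the response of a name-resolution service; here it is a plain dict.
    Names without a match are kept as-is so no records are dropped.
    """
    reconciled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not mutated
        name = new_row[species_column]
        new_row[species_column] = resolved_names.get(name, name)
        reconciled.append(new_row)
    return reconciled


if __name__ == "__main__":
    rows = [
        {"site": 1, "species": "Oldgenus exemplaris"},
        {"site": 2, "species": "Othergenus stabilis"},
    ]
    # Placeholder names standing in for a service's synonym resolution.
    resolved = {"Oldgenus exemplaris": "Newgenus exemplaris"}
    for row in reconcile_species(rows, DATASET_SCRIPT["species_column"], resolved):
        print(row)
```

Keeping unmatched names unchanged is a deliberate choice in this sketch: a failed lookup should not silently drop records from the generated database.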
The EcoData Retriever primarily interacts via issues and pull requests on
GitHub.
Hi! I'm a third-year undergraduate student interested in Bayesian statistics, machine learning, and economics.
I want to contribute to pymc3 and extend the Variational Inference methods it currently supports. Moreover, I am sure it is possible to build a unified interface for these methods so that new ones are easy to implement. I hope @twiecki can be my mentor.
I'll write my proposal a bit later, as I have exams now, but early feedback would be great.
Hi everybody,
I am Amritanshu Jain, a sophomore currently studying at BITS Pilani, Pilani Campus. I wish to apply for GSoC under your organisation. I have over six years of programming experience in Python and JS, and have gained substantial command of Python libraries and frameworks such as Django, Scrapy, NumPy, OpenCV, and Pandas. I have also worked with Node.js and React/Redux on a few projects.
While going through your project ideas, I was fascinated by the idea "Add Spatial Data Support to Data Retriever", since I recently made a small version of Data Retriever for mutual funds. The script scrapes the past 10 years of data for almost 170 mutual fund schemes, processes it into a CSV, and saves it into a local Postgres database.
For the past few months I have been more involved with the development of web applications and text mining. As discussed with one of the mentors, I would like to extend Data Retriever to disciplines where it lacks data, such as market data. Please go through my CV if time permits; it briefly describes my experience and projects.
Lately I have been working on plenty of projects, so I am already in a good programming flow. I would love to build a basic prototype to show that I am capable of undertaking this GSoC project. Please guide me on how to proceed.
Thanks,
Amritanshu Jain
Github - https://github.com/jainamritanshu/
Email - [email protected]
@ethanwhite @henrykironde #2017 #DataRetriever
Hi @pbanaszkiewicz ,
I am Aditya Narayan. I have contributed six commits to AMY and wish to participate in GSoC 2016 under SWC.
I have created a test survey and written a sample project which creates bar graph from the responses. It is hosted here
The sample project is minimal and queries the SurveyMonkey API on every request.
Could you guide me on what I should do next?
I am an undergraduate student from India interested in Machine Learning and Bioinformatics.
I am interested in working on-
I am opening this issue as a discussion forum on the projects.
@ethanwhite Please have a look at the proposal.
Hi @rgaiacs and @pbanaszkiewicz,
I have a question about the "A survey responses visualizer application" project idea. I hope you can help.
I saw that SurveyMonkey already provides features for visualizing the responses. For example, this is for a sample survey I have created.
I'm not using SurveyMonkey's "Premium" service (I assume SWC uses it), but a few further improvements could be made to how the responses are presented.
For example
In the visualizer project, we could use respondents' IP addresses to (probabilistically) geolocate them.
Can you please explain to me why we are not using the default visualizations shown to us by SurveyMonkey?
Is it because of the freedom to have additional features like caching, and to integrate it with AMY?
Sorry if this is a naive question.
It looks like the README has SymPy's description in place of Spyder's.
If you want to be a mentor for NumFOCUS on GSoC 2016 please email me your email address. You will find my email address on my GitHub profile page.
Software Carpentry has installation-test scripts so students can check that they've successfully installed any software required by their workshop. However, we don't collect the results of student tests, which makes a number of things more difficult than they need to be. Statistics about installed versions would make it easy to:
This project would:
I'm not particular about the web framework you use to write the server, but I have the most experience with Django and Flask. If you prefer a different framework, I'm fine with anything that takes care of the boilerplate and lets you focus on the high-level tasks.
Designing and implementing a simple API for storing test results, error messages, diagnostic system information, etc. We want a robust, flexible system that's small and easy to maintain going forward.
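A minimal sketch of the storage side of such an API, assuming SQLite and one JSON payload per submission. The table name, the function names, and the payload format (package name mapped to a version and an optional error message) are all hypothetical; the project description does not specify them, and the web-framework layer is left out.

```python
import json
import sqlite3
from collections import Counter

# Hypothetical schema: one row per submitted test run, with per-package
# results kept as a JSON blob so new fields don't require migrations.
SCHEMA = """
CREATE TABLE IF NOT EXISTS test_results (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    workshop TEXT NOT NULL,
    os_name TEXT NOT NULL,
    payload TEXT NOT NULL
)
"""


def open_store(path=":memory:"):
    """Open (or create) the results database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def store_result(conn, workshop, os_name, packages):
    """Store one submission.

    packages maps a package name to a dict with a "version" key and an
    optional "error" key (an assumed format, not a defined one).
    """
    cur = conn.execute(
        "INSERT INTO test_results (workshop, os_name, payload) VALUES (?, ?, ?)",
        (workshop, os_name, json.dumps(packages)),
    )
    conn.commit()
    return cur.lastrowid


def version_counts(conn, workshop, package):
    """Count the installed versions of one package across a workshop's submissions."""
    counts = Counter()
    rows = conn.execute(
        "SELECT payload FROM test_results WHERE workshop = ?", (workshop,)
    )
    for (payload,) in rows:
        info = json.loads(payload).get(package)
        if info is not None:
            counts[info.get("version")] += 1
    return counts
```

Storing the raw payload as JSON keeps the schema small and easy to maintain; per-version statistics are computed at query time rather than stored.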
Any of these skills could be learned during the project, but you probably can't learn all of them during the project ;).
The Software Carpentry community primarily interacts via issues and pull requests on
GitHub and the discuss@
mailing list. There's also an IRC channel.
Thanks to @xuf12 for the initial idea behind this project.
Hi @rgaiacs, @pbanaszkiewicz!
(Quoting from issue opened at swc-repo)
I would like to suggest using conda as the package installer and manager. It is not only platform-agnostic but also a general-purpose package installer, unlike pip, which manages only Python packages. It also lets us create separate py2 and py3 environments and resolve dependency issues without any hassle. This might be more useful for Software Carpentry's target audience, which is mostly scientists and researchers with little to no computing expertise. For example, bioconda is a conda channel that lets one install bioinformatics tools through the conda program. I'd like to extend this discussion in relation to the idea at numfocus/gsoc/2016/swc.
Thanks!
Hello @ethanwhite, @henrykironde
I am a 3rd-year B.Tech (Computer Science and Engineering) student from UIET. I have gone through the project ideas list and am very much interested in the "EcoData Retriever" project.
I am comfortable with the required skill set and have relevant experience: I have done research projects that involved handling large datasets and working on time series analysis and prediction.
Could you guide me on how I can start contributing to this project? I would be interested in sharing ideas and exploring other possibilities, as well as in understanding the metadata and the JSON input format for the Python parser.
Thanks.
Hi @pbanaszkiewicz and @rgaiacs, I'd like to participate in GSoC and work on "managing workflow for instructor training" project. So far, I've created this pull request to Amy: carpentries/amy#722
What are the next steps? I guess that I should analyze the current workflow. I know about the two etherpads ("Registration for one-hour discussion" and "Teaching demo") and nothing more. Where can I find more information about the current workflow?
Hello,
I am very interested in developing the "Weights support in most stat functions" project.
May I ask if the project is still open?
MDAnalysis would like to take part under the NumFOCUS umbrella as well. We already have a list of project ideas that I would copy here.
@rgaiacs are you OK with this? I just want to ask officially instead of just adding it myself.
I am a second-year computer science student from Thapar University, India.
I am interested in working on the idea "Support for complex numbers within Convex.jl".
Could someone guide me on this project and on what I should start studying? I am also not quite clear on what exactly has to be built.
Looking forward to your response,
Thank you
Rohan Goel
([email protected])
JuliaDiffEq is the organization that hosts DifferentialEquations.jl and associated libraries for numerically solving differential equations in Julia. We have a wide range of projects already laid out on the Julia GSoC website (including one more currently in a PR to the website). Please let me know what you need me to do to make this official.
Hello! I am Robert Dumitru, a first-year Computer Science student at the University of Bucharest. I have extensive knowledge of C/C++, intermediate Python, and a fairly broad knowledge of algorithms and data structures. Regarding web development, I took a course in Ruby on Rails as my starting point, and I have studied Django on my own, building on previously learned mechanisms and concepts. I can't say I have very broad experience with Django, but I know enough to be confident that the web framework would not be a difficulty for me in this project. I have a basic understanding of Bootstrap and will keep practicing to improve my skills. I have read the SurveyMonkey API docs and implemented some scripts that use its functionality, so I am fairly familiar with it. Tomorrow I'll explore the available plotting libraries and try to choose the one that best suits the project's needs.
I have three questions:
@aterrel Could you provide write access to this repository to
Hello @ethanwhite @henrykironde ,
I am a 4th-year B.Tech. student from DTU, Delhi, and I am interested in contributing to this project. I have been working with Python libraries and frameworks such as scikit-learn and Flask.
For this project, I assume we have to port all these scripts to the datapackage.json format and edit the parser so that it reads JSON instead of a text file. A further goal is to provide a CLI (and maybe a GUI) for adding new scripts by letting the user specify the input.
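A minimal sketch of what the JSON side of such a parser could look like, assuming a description loosely following the Frictionless Data Package layout (a top-level "name", a "resources" list, and per-resource "schema" with "fields"). The example dataset, its URL, and the parse_package function are hypothetical; the layout the Retriever actually adopts would be settled with the mentors.

```python
import json

# A hypothetical dataset description loosely following the Data Package
# layout (name, resources, schema.fields); the URL is a placeholder.
EXAMPLE = """
{
  "name": "example-dataset",
  "resources": [
    {
      "name": "main",
      "path": "https://example.org/data/main.csv",
      "schema": {
        "fields": [
          {"name": "site", "type": "integer"},
          {"name": "species", "type": "string"}
        ]
      }
    }
  ]
}
"""


def parse_package(text):
    """Parse a datapackage-style JSON document.

    Returns the package name and a list of (table name, source path,
    [(column name, column type), ...]) tuples.
    """
    package = json.loads(text)
    tables = []
    for resource in package.get("resources", []):
        fields = resource.get("schema", {}).get("fields", [])
        # Default to "string" when a field gives no explicit type.
        columns = [(field["name"], field.get("type", "string")) for field in fields]
        tables.append((resource["name"], resource["path"], columns))
    return package["name"], tables


if __name__ == "__main__":
    name, tables = parse_package(EXAMPLE)
    print(name)
    for table in tables:
        print(table)
```

Because the description is plain JSON, a CLI for adding new scripts could build the same dictionary from user input and write it out with json.dump, rather than generating a custom text format.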
Are there any requirements prior to submitting a proposal? I would like to discuss the layout for the new metadata with mentors as well.
Thanks
I have already filled in the GSoC 2016 application; a copy is at https://github.com/numfocus/gsoc/blob/master/organization/profile.md#organization-profile. Changes can be made until February 19, 2016 at 19:00 UTC, so please review it.
I will close this issue after February 19, 2016 at 19:00 UTC.
I had a look at the provenance library. Let us discuss the requirements so that we can get started on the design of the problem, and see if this is a good fit or not.
Link -> http://archive.is/74MON
Good evening,
In the next few hours I will be sending a proposal for this project. I hate being the guy who gets in touch at the last minute, but sometimes there is no way around that.
I already cloned the repo and installed everything to get myself started.
Thanks,
Ernesto Cedeno.
I wrote the proposal here
I would love to hear your feedback :-)
I wish to write a proposal for the "Managing workflow for instructor training" project.
Can a student write two proposals under NumFocus?
Hello Developers,
I am Ryan Stolliker, a Computer Science student at the University of California, Irvine, and I have experience with Python, HTML, CSS, JavaScript, and Python web frameworks. I am interested in the result-aggregation server for installation-test scripts project, but I have a question: will the data be stored permanently, accumulating across many workshops, or will it be created and destroyed for each workshop?
Thank you.
Amy is a Django application for managing workshops like those run by Software Carpentry. Basic functionality is in place, but there are many outstanding feature requests, and it could use a lot more testing.
Amy is a straightforward Django application; experience with that framework (and with Python) is required. Some of the features we would like to add require JavaScript, so familiarity with jQuery and other frameworks is an asset.
Amy will manage personal identifying information, so applicants should also have a basic understanding of security engineering. Experience with deploying and maintaining applications is also an asset.
Project title: Manage workflow for instructor training.
Issue: Recording applications from individuals and groups interested in instructor training.
Solution:
At this point I noticed that only some users can log into AMY, mostly admins and some instructors. A new trainee can only apply (through a form) and wait for the committee to get in touch. I propose creating login accounts for trainees as well. These would be separate from the AMY dashboard (as I understand it, the AMY dashboard and its other features are designed to meet administrative requirements and are not general-purpose). They would have minimal options; in fact, I have just two in mind. One is for notifications (using a Django notification system), and the other is a list of other trainees applying for instructor training (a simple view listing the other applicants, with filtering on location, university, etc., much like the filtering AMY already has).
Issue: Helping individuals form groups.
Solution:
Extending the idea above, trainees could easily search for other people and groups and send requests to join them.
Issue: Tracking progress.
Solution: Create a view for admins and mentors that lists all trainees and the tasks they have completed.
Issue: When someone completes training, we manually create an entry in AMY to record it.
Solution: With accounts like these, we can easily export trainees' details to the instructor list. This is very similar to what AMY already does with "Bulk add people".
These are a few of my ideas for tackling the issues mentioned. The list is by no means complete, and many issues have not been covered. I'm opening this issue so that I can discuss them and start on a small piece of code as a proof of concept before writing the whole proposal.
Hi @pbanaszkiewicz and @rgaiacs
I was going through the idea of a Result Aggregation Server for the installation-test scripts and found it interesting. I have worked extensively on Flask RESTful APIs while interning at Zopper. As mentioned in the project description, I have created a demo API using Flask and MySQL that stores OS details from the client in a MySQL database. Please guide me through the next steps so that I can move toward writing the proposal.
cheers!
Prerit Garg
@ethanwhite @henrykironde
Hi, I am Bishakh Ghosh.
I am really interested in participating in GSoC 2016 with EcoData Retriever. I have cloned the source code, installed it, and was able to download a couple of datasets.
I have made one PR, weecology/retriever#442, though I don't know if it is any good.
I know Python and some web frameworks like Django and Flask, and I know how to use REST APIs.
I also have some knowledge of SQL and ORMs like SQLAlchemy.
Here is a list of my work: http://ghoshbishakh.github.io/blog/about/
What should my next step be? Should I try to make a sample app that fetches data from iPlant's Taxonomic Name Resolution Service?
Also, is it possible to apply for both projects? Both seem interesting to me, although since I know web APIs a bit, my first preference would be adding taxonomic name resolution to the EcoData Retriever.
I spoke this morning with Rainere about NumFOCUS's role as an umbrella org for GSoC and it sounds like you all want to accept applications for any NumFOCUS organizations. To make this more NumFOCUS oriented I'd like to propose some ideas/questions:
I'm happy to do the grunt work to make this happen if people agree.
I just noticed that the ideas we contributed weren't linked to from anywhere (that I could find easily anyway). Should they be linked from the SWC pages, or somewhere else? Thanks.
Dataretriever wants to be a sub-org again this year. They already participated with NumFOCUS last year.
Is it OK with NumFOCUS if they participate again this year @aterrel ?
The EcoData Retriever is a Python-based tool
for automatically downloading, cleaning up,
and restructuring ecological data. It does the hard work of data munging so
that scientists can focus on doing science.
Science is experiencing a reproducibility crisis in that it is becoming
clear that published results often cannot be
reproduced. Part of the challenge of reproducibility in data-science-style
research is that most of the steps
related to data: downloading it, cleaning it up, restructuring it, are
either done manually or using one-off scripts
and therefore this phase of the scientific process is not reproducible. The
EcoData Retriever already solves many of
these problems, but it doesn't currently keep track of exactly what has
been done and therefore fails to support
full reproducible workflows.
This project would extend the EcoData Retriever using Python to store all
of the metadata necessary for
full reproduction in an associated SQLite database.
Specifically this would involve:
The EcoData Retriever primarily interacts via issues and pull requests on
GitHub.
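A minimal sketch of the kind of provenance record this project describes, assuming one SQLite row per download step holding the dataset name, source URL, retrieval time, tool version, and a SHA-256 checksum of the raw file. The table layout and function names are hypothetical; the full list of metadata the project would store is not specified here.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone

# Hypothetical provenance table: what was downloaded, from where, when,
# with which tool version, and a checksum of the raw bytes.
PROVENANCE_SCHEMA = """
CREATE TABLE IF NOT EXISTS provenance (
    dataset TEXT NOT NULL,
    source_url TEXT NOT NULL,
    retrieved_at TEXT NOT NULL,
    tool_version TEXT NOT NULL,
    sha256 TEXT NOT NULL
)
"""


def open_provenance_db(path=":memory:"):
    """Open (or create) the provenance database."""
    conn = sqlite3.connect(path)
    conn.execute(PROVENANCE_SCHEMA)
    return conn


def record_download(conn, dataset, source_url, raw_bytes, tool_version):
    """Record one download step so it can later be checked or repeated."""
    digest = hashlib.sha256(raw_bytes).hexdigest()
    conn.execute(
        "INSERT INTO provenance VALUES (?, ?, ?, ?, ?)",
        (dataset, source_url, datetime.now(timezone.utc).isoformat(),
         tool_version, digest),
    )
    conn.commit()
    return digest


def matches_recorded(conn, dataset, raw_bytes):
    """Return True if raw_bytes matches the most recently recorded download."""
    row = conn.execute(
        "SELECT sha256 FROM provenance WHERE dataset = ? "
        "ORDER BY rowid DESC LIMIT 1",
        (dataset,),
    ).fetchone()
    return row is not None and row[0] == hashlib.sha256(raw_bytes).hexdigest()
```

The checksum lets a later run detect whether the upstream data changed since the original analysis; the cleaning and restructuring steps would need similar records to make the whole workflow reproducible.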