
gsoc's Issues

Simplify proposal method for students

Hi,

This document seems overly complicated to me:

https://github.com/numfocus/gsoc/blob/master/CONTRIBUTING-students.md

Melange requires students to submit their applications in its system, which is basically a text form (I think it may support formatting). At SymPy we've always used the GitHub wiki for students to develop their proposals: we create an application template and they simply copy and paste it into a new page on the wiki.

We also require students to submit a patch with their application, so they do have to learn how GitHub works during the application process, but keeping the applications on the wiki removes a lot of the headache that the PR system adds. I'm not sure that having them fork and submit PRs for their proposals is that useful.

Secondly, each sub-org usually has its own process for this. If NumFOCUS, an umbrella org, enforces this method, then all sub-orgs have to adjust their current systems. For example, we (SymPy) have a process that we've honed for eight years and would like to stick to it.

Last note: we always have formatting problems when students transfer their proposals from our system (the GitHub wiki in SymPy's case) to Melange's system. We should figure out what the Melange system supports (it may not be GitHub-flavored Markdown, I can't remember).

Improve / Re-engineer the web and network framework for Julia

Julia is a high-level, high-performance dynamic programming language for technical computing that has seen significant adoption across many fields of practice. One of the current deficiencies in the language is the lack of a robust networking / data-exchange framework for obtaining and exporting data using standard network protocols. Existing efforts have relied on unstable or minimally supported external packages; as a result, there is an underlying fragility to the network and web stacks.

We would like to improve the robustness of the Julia network/web framework by initially focusing on a sound implementation of a TLS package that would underpin secure communications between Julia programs and the wider internet. It is our hope that the TLS package would become the official basis for developing secure higher-level network interfaces such as LDAPS, HTTPS, FTPS, etc. The non-secure versions of these protocols will also need some work.

General purpose next-generation sequence pre-processor for QIIME 2-Questions

Hello @gregcaporaso! I am Sourav Singh, and I have made pull requests to the QIIME 1, micronota, and scikit-bio projects. As a bioinformatics enthusiast, I am really interested in working on this project, and as a student I have used preprocessing tools like FastQC and Trimmomatic.

I would like to know if we can add a plugin which can allow for paired-end read merging, which can make it easier for the sequence data to be processed.
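For illustration of what such a plugin would do at its core, here is a standalone sketch of exact-overlap paired-end merging. This is not the QIIME 2 plugin API, and real mergers (e.g., those wrapping vsearch) also handle quality scores and mismatches; all names here are invented for the example.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = str.maketrans("ACGT", "TGCA")
    return seq.translate(comp)[::-1]

def merge_pair(forward, reverse, min_overlap=3):
    """Merge a forward read with its reverse-strand mate by exact overlap.

    The reverse read is reverse-complemented, then the longest suffix of
    the forward read matching a prefix of it is used as the join point.
    Returns the merged sequence, or None if no sufficient overlap exists.
    """
    rev = reverse_complement(reverse)
    # Try the longest possible overlap first, down to min_overlap.
    for olen in range(min(len(forward), len(rev)), min_overlap - 1, -1):
        if forward[-olen:] == rev[:olen]:
            return forward + rev[olen:]
    return None

# Example: a fragment "AACCGGTT" + "AA" read from both ends.
merged = merge_pair("AACCGGTT", "TTAACC")
```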

Also, are there any requirements prior to sending a proposal? I would also like to discuss the plugins that could be added to QIIME 2.

Policy on editing proposals after deadline

I have a request from a student to allow them to make some edits to their proposal following the deadline. It is possible to allow this with Melange, but it seems like we should have a policy for this that applies to all of the applicants.

ANNOUNCEMENT to prospective students

Hello all,

Today (Pi Day: March 14, 2016) marks the start of the GSoC '16 application period. Some rules have changed since the 2015 edition, and starting later today you can submit a draft of your proposal to the GSoC website, which we, the mentors, will comment on.

As you may have noted, NumFOCUS began accepting drafts as pull requests earlier (https://github.com/numfocus/gsoc/pulls).

We will keep an eye on the list of proposal drafts on the GSoC website, but we encourage everyone submitting a proposal draft to link to their PR in this repository; if you keep everything here, it will be a lot easier for us to follow.

I'll add comments to this issue if anything new comes up. You can subscribe to it to receive notifications.

Thanks in advance,
Piotr

Interested in contributing to Data Retriever

Hi, I am a graduate student from Germany interested in Data Science.

I would like to extend the Retriever project as part of GSoC '17 to include support for spatial data in its current implementation. I have a fair understanding of spatial data and am currently formulating an idea to bring installation of the vast variety of GIS data under the same umbrella. I was hoping to have @ethanwhite and @henrykironde as my mentors.

As suggested, I am opening this issue for further discussion.

Query: Survey Responses Visualizer

Hi.

After going through the Abstract and the various surveys conducted by Software Carpentry on SurveyMonkey, I have understood the application as follows:

  1. There are 5 surveys associated with each workshop ID.
  2. Since only a workshop conductor (host) can view the results of his/her workshops, the conductor has to log into the application using his/her SurveyMonkey account (OAuth and the SurveyMonkey API will be used for authorization).
  3. The conductor either provides a link to the survey with the workshop ID as a query parameter, or feeds the app the workshop ID directly.
  4. In the latter case, the 5 surveys associated with the workshop ID are shown, of which the user chooses one.
  5. The results of the selected survey are shown graphically, or in whatever format the logged-in conductor requests.
  6. Integration with AMY.

Also, there is some confusion regarding the statement:

These surveys are shared among all of the workshops

I request you to point out any misunderstanding, as I'm currently studying the SurveyMonkey API and drafting a proposal.

Thanking You.
Regards,
Meet Vora.

Issue with the grammar of a question in pre-workshop survey

The 18th question in the "pre-workshop" survey is titled "Was the workshop held in the United States?".

This seems a little wrong in context, as we are talking to the user before the workshop. There might be cases where the user takes the survey before coming to the workshop.

We use the same title in the "post-workshop" survey as well, which is fine there.

For "pre-workshop" surveys, I think it should be something like "Is the workshop being held in the United States?".

Pardon me if I should not have created an issue for this tiny doubt.

A survey responses visualizer application - queries

Hi,
I am Prerit Garg, a final-year B.Tech (Computer Engineering) student at Jamia Millia Islamia University. I have pretty good skills in Python, Flask, Django, HTML, CSS, JavaScript, Bootstrap, etc. I have worked on several full-stack projects involving both front end and back end while working with Zopper as an intern, where I developed several dashboards and RESTful APIs. I would like to contribute to the idea "A survey responses visualizer application" as my project for GSoC 2016. As a heads-up, I am currently going through the SurveyMonkey API. I would appreciate more guidance on getting started and on writing a proposal.
Thanks.

Queries regarding Survey Response Visualizer

Can we have a survey page made public so that we can ascertain the fields that constitute it?
It would also allow us to recreate the survey form and browse the SurveyMonkey admin.

Add taxonomic name resolution to the EcoData Retriever to facilitate data science approaches to ecology

Rationale

The EcoData Retriever is a Python based tool
for automatically downloading, cleaning up,
and restructuring ecological data. It does the hard work of data munging so
that scientists can focus on doing science.

One of the challenges of ecological (and evolutionary) data is that the
names of species are constantly being redefined. This makes it difficult
to combine datasets to do interesting science. By automating the
reconciliation of different species names as part of the process of
accessing the data in the first place, it will become much easier to
combine diverse datasets in new and interesting ways.

Approach

This project would extend the EcoData Retriever using Python to access one
or more of the existing web services that reconcile species names
(e.g., iPlant's Taxonomic Name Resolution Service) and automatically
replace the species' scientific names in the generated databases.
Specifically, this would involve:

  • Object oriented programming in Python
  • Using Python to query web service APIs
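The replacement step could look something like the sketch below. The synonym lookup is a stand-in for the web-service call (the real project would query a service such as iPlant's TNRS, whose request and response formats are not shown here); the example synonym pair and all function names are assumptions for illustration.

```python
# Hypothetical synonym table; in the real project this lookup would be
# backed by a taxonomic name resolution web service, not a local dict.
SYNONYMS = {
    "Picoides villosus": "Dryobates villosus",  # example synonym pair (assumed)
}

def resolve_name(name, synonyms=SYNONYMS):
    """Return the accepted name for a species, or the input name if unknown."""
    return synonyms.get(name, name)

def resolve_records(records, column="species"):
    """Replace the species column of each record with its accepted name."""
    return [dict(r, **{column: resolve_name(r[column])}) for r in records]
```

Keeping the resolver behind a small function like `resolve_name` would let the Retriever swap between services (or cache results) without touching the record-rewriting logic.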

Challenges

Scientific names are stored inconsistently across datasets, so it will be
necessary to either modify the scripts that hold information on each
dataset to indicate the location of the species information, or use an
existing ontology to automatically identify the location.

Involved toolkits or projects

  • The EcoData Retriever
  • Python
  • The APIs for the taxonomic name resolution services.
  • Relational database management systems (RDBMS) including MySQL,
    PostgreSQL, SQLite

Degree of difficulty and needed skills

  • Moderate Difficulty
  • Knowledge of Python
  • Knowledge of interacting with web services via APIs is a plus, but could
    be learned during the project

Involved developer communities

The EcoData Retriever primarily interacts via issues and pull requests on
GitHub.

Mentors

Interested in contributing to PyMC3

Hi! I'm a third-year undergraduate student interested in Bayesian statistics, machine learning, and economics.

I want to contribute to PyMC3 and extend the variational inference methods it currently supports. Moreover, I am sure that it is possible to build a unified interface for these methods so that it will be easy to implement new ones. I hope @twiecki can be my mentor.

I'll write my proposal a bit later, as I have exams now, but having feedback a bit earlier would be great.

Project Idea for GSoC 2017

Hi everybody,
I am Amritanshu Jain, a sophomore currently studying at BITS Pilani, Pilani Campus. I wish to apply for GSoC under your organisation. I have over six years of programming experience in Python and JS, and have gained substantial command over Python libraries/frameworks such as Django, Scrapy, NumPy, OpenCV, and Pandas. I have also worked with Node.js and React/Redux on a few projects.

While going through your project ideas I was very fascinated by "Add Spatial Data Support to Data Retriever", since I recently made a small version of Data Retriever for mutual funds. The script scrapes the past 10 years of data for almost 170 mutual fund schemes, processes it into a CSV, and saves it into a local Postgres database.

For the past few months I have been more involved with web application development and text mining. As discussed with one of the mentors, I would like to extend Data Retriever to disciplines where it currently lacks data, such as market data. Kindly go through my CV if time permits; I have briefly described my experience and projects in it.

Lately I have been doing plenty of projects, so I am already in a very good programming flow. I would love to program a basic prototype so that I can assure you that I am capable of undertaking this GSoC project. Please guide me on how to proceed.
Thanks,
Amritanshu Jain
Github - https://github.com/jainamritanshu/
Email - [email protected]
@ethanwhite @henrykironde #2017 #DataRetriever

Interested in working on pymc3

I am an undergraduate student from India interested in Machine Learning and Bioinformatics.

I am interested in working on-

  1. Implement non-parametric Bayesian methods.
  2. Alternative computational engines.

I am opening this issue as a discussion forum on the projects.

Query about the survey visualizer project

Hi @rgaiacs and @pbanaszkiewicz,

I have a question about the "A survey responses visualizer application" project idea. Hope you will help me.

I saw that SurveyMonkey already provides features for visualizing the responses; for example, it does so for a sample survey I have created.

Although I'm not using SurveyMonkey's "Premium" service (I assume SWC does), a few more enhancements could be made to the responses.

For example

In the visualizer project, we could use the IP address to geolocate (probabilistically) our respondents for a survey.

Can you please explain to me why we are not using the default visualizations shown to us by SurveyMonkey?

Is it because of the freedom to have additional features like caching, and to integrate it with AMY?

Sorry if this sounds like a naive question to ask.

Invite mentors

If you want to be a mentor for NumFOCUS on GSoC 2016 please email me your email address. You will find my email address on my GitHub profile page.

Write a result-aggregation server for the installation-test scripts

Background

Software Carpentry has installation-test scripts so students can check that they've successfully installed any software required by their workshop. However, we don't collect the results of these student tests, which makes a number of things more difficult than they need to be; collected statistics about installed versions alone would make several of them easier.

Approach

This project would:

  • Create a model for installed packages, failed package checks, and diagnostic system information.
  • Design an API so clients can submit the results of their installation-test script.
  • Write a small server to serve the API, store the results in a relational database, and allow administrators to analyze the content.
  • Update the installation-test scripts to (optionally) submit their results to the new server.

I'm not particular about the web framework you use to write the server, but I have the most experience with Django and Flask. If you prefer a different framework, I'm fine with anything that takes care of the boilerplate and lets you focus on the high-level tasks.
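Whatever framework is chosen, the storage layer underneath it can be sketched with nothing but the standard library. Everything here (the `reports` table, the payload shape, the function names) is invented for illustration and would be replaced by whatever schema and framework the project settles on.

```python
import json
import sqlite3

# Hypothetical schema: one row per package check in a submitted report.
SCHEMA = """
CREATE TABLE IF NOT EXISTS reports (
    id INTEGER PRIMARY KEY,
    submitted_at TEXT DEFAULT CURRENT_TIMESTAMP,
    os TEXT,
    package TEXT,
    version TEXT,
    status TEXT CHECK (status IN ('ok', 'failed'))
);
"""

def store_report(conn, payload):
    """Validate a JSON payload from a client and store one row per check."""
    report = json.loads(payload)
    for check in report["checks"]:
        conn.execute(
            "INSERT INTO reports (os, package, version, status) VALUES (?, ?, ?, ?)",
            (report["os"], check["package"], check.get("version"), check["status"]),
        )
    conn.commit()

def version_stats(conn, package):
    """Count reports per installed version of one package."""
    rows = conn.execute(
        "SELECT version, COUNT(*) FROM reports WHERE package = ? GROUP BY version",
        (package,),
    )
    return dict(rows.fetchall())
```

A Flask or Django view would then be a thin wrapper: parse the POST body, call `store_report`, return a status code.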

Challenges

Designing and implementing a simple API for storing test results, error messages, diagnostic system information, etc. We want a robust, flexible system that's small and easy to maintain going forward.

Involved toolkits or projects

  • Python, for extending the existing installation test scripts.
  • A web framework like Django, Flask, RoR, Express, ...
  • Relational database management systems (RDBMS) including MySQL, PostgreSQL, SQLite.

Degree of difficulty and needed skills

  • Moderate difficulty. This will be a simple server application, but you'll be designing and writing it from scratch.
  • Knowledge of Python, to write client-side code for the installation test scripts.
  • Knowledge of a web framework and basic API design.
  • Knowledge of relational databases or a wrapping library for storing the results.

Any of these skills could be learned during the project, but you probably can't learn all of them during the project ;).

Involved developer communities

The Software Carpentry community primarily interacts via issues and pull requests on
GitHub
and the discuss@ mailing list. There's also an IRC channel.

Mentors

Acknowledgements

Thanks to @xuf12 for the initial idea behind this project.

Discussion: result-aggregation server for installation-scripts

Hi @rgaiacs, @pbanaszkiewicz!

(Quoting from issue opened at swc-repo)

I would like to put forward a suggestion of using conda as the package installer and manager. It is not only platform-agnostic but also a generic package installer, compared to pip, which manages only Python packages. It also lets us create separate py2 and py3 environments and solve dependency issues without any hassle.

This might be more useful for the target audience of Software Carpentry, who are mostly scientists and researchers with little to no computing expertise. For example, bioconda is a conda channel that lets one install bioinformatics tools through the conda program.

I'd like to extend this discussion in relation to the idea at numfocus/gsoc/2016/swc.

Thanks!

Interested in contributing to "EcoData Retriever" project.

Hello @ethanwhite, @henrykironde

I am a 3rd-year B.Tech (Computer Science and Engineering) student from UIET. I have gone through the project ideas list and am very much interested in the "EcoData Retriever" project.

I am comfortable with the required skill set and have relevant experience: I have done research projects handling large data sets and working on time series analysis and prediction.

Please guide me on how I can start contributing to this project. I would also be interested in sharing ideas and other possibilities, as well as in understanding the metadata and JSON input format for the Python parser.

Thanks.

Manage workflow for instructor training

Hi @pbanaszkiewicz and @rgaiacs, I'd like to participate in GSoC and work on "managing workflow for instructor training" project. So far, I've created this pull request to Amy: carpentries/amy#722

What are the next steps? I guess that I should analyze the current workflow. I know about the two etherpads ("Registration for one-hour discussion" and "Teaching demo") and nothing more. Where can I find more information about the current workflow?

Interested in pandas project

Hello,
I am very interested in developing the "Weights support in most stat functions" project.
May I ask if the project is still open?

Add MDAnalysis as a project

MDAnalysis would like to take part under the NumFOCUS umbrella as well. We already have a list of project ideas that I would copy here.

@rgaiacs are you OK with this? I just want to ask officially instead of just adding it myself.

Support for complex numbers within Convex.jl

I am a second-year computer science student from Thapar University, India,
and I am interested in working on the idea "Support for complex numbers within Convex.jl".
Can someone guide me regarding this project: what should I start studying? I am also not quite clear on what exactly has to be built.

Waiting for your early response,
Thank you
Rohan Goel
([email protected])

Survey responses visualizer - Misunderstood requirements

Hello! I am Robert Dumitru, a first-year Computer Science student at the University of Bucharest. I have extensive knowledge of C/C++, I know Python at a medium level, and I have fairly broad knowledge of algorithms and data structures. Regarding my experience with web development, I have taken a course in Ruby on Rails as a starting point, and I have studied Django on my own, building on previously learned mechanisms and concepts. I can't say I have really broad experience with Django, but I know enough to be sure that the web framework would not be a difficulty for me in this project. I have a basic understanding of Bootstrap, but I will keep practicing to improve my skills. I have read the SurveyMonkey API docs and implemented some scripts that use its functionality, so I am pretty familiar with it. Tomorrow I'll be exploring the available plotting libraries and will try to choose the one that best suits the project's needs.
I have three questions:

  1. Do we have to create a pull request to get a review of our draft proposal?
  2. Where should I send a sample Django project to get your opinion on it? (I'm not 100% sure that I have understood how the required functionality should be implemented.)
  3. Is a GitHub repository link alright for the sample project?

Upgrade to datapackage.json standard - Questions

Hello @ethanwhite @henrykironde,

I am a 4th-year B.Tech student from DTU, Delhi. I am interested in contributing to this project. I have been working with Python libraries and frameworks such as scikit-learn and Flask.

For this project, I assume we have to port all these scripts to the datapackage.json format and edit the parser so that it understands JSON instead of a text file. A further goal is to provide a CLI (and maybe a GUI) for adding new scripts by letting the user specify the input.
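To make the porting idea concrete, here is a minimal loader sketch. The example document loosely follows the datapackage.json convention; the exact keys the Retriever's parser would need (and whether `url` or `path` is used for resources) is an open design question, and the dataset itself is made up.

```python
import json

# A minimal, hypothetical dataset description in datapackage.json style.
EXAMPLE = """
{
    "name": "bird-counts",
    "title": "Example bird count dataset",
    "resources": [
        {"name": "counts", "url": "http://example.com/counts.csv",
         "schema": {"fields": [{"name": "species", "type": "string"},
                               {"name": "count", "type": "integer"}]}}
    ]
}
"""

def load_package(text):
    """Parse a datapackage-style JSON document and do minimal validation."""
    package = json.loads(text)
    for key in ("name", "resources"):
        if key not in package:
            raise ValueError("missing required key: %s" % key)
    return package
```

A CLI for adding new scripts would then just be a prompt loop that builds such a dictionary and writes it out with `json.dump`.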

Are there any requirements prior to submitting a proposal? I would like to discuss the layout for the new metadata with mentors as well.

Thanks

EcoData Retriever Python3 Support

Good evening,

In the next few hours I will be sending a proposal for this project. I hate being the guy who gets in touch at the last minute, but sometimes there is no way around that.

I already cloned the repo and installed everything to get myself started.

Thanks,
Ernesto Cedeno.

Question about result-aggregation server

Hello Developers,

I am Ryan Stolliker, a Computer Science student at the University of California, Irvine, and I have experience with Python, HTML, CSS, JavaScript, and web frameworks for Python. I am interested in the result-aggregation server for the installation-test scripts project, but I have a question: will the data be stored permanently, collecting results across many workshops, or will data be created and destroyed for each workshop?

Thank you.

Enhance Amy, a Django application for managing workshops

Amy is a Django application for managing workshops like those run by Software Carpentry. Basic functionality is in place, but there are many outstanding feature requests, and it could use a lot more testing.

Amy is a straightforward Django application; experience with that framework (and with Python) is required. Some of the features we would like to add require Javascript, so familiarity with JQuery and other frameworks is an asset.

Amy will manage personal identifying information, so applicants should also have a basic understanding of security engineering. Experience with deploying and maintaining applications is also an asset.

Manage workflow for instructor training - ideas discussion

Project title: Manage workflow for instructor training.

  1. Issue: Recording applications from individuals and groups interested in instructor training.
    Solution: At this point, I noticed that only some users can log into AMY, mostly admins and some other instructors. A new trainee can only apply (through a form) and wait for the committee to make contact. What I propose is to create login accounts for trainees as well. These would be different from the AMY dashboard (as I understand it, the AMY dashboard and its other features are designed to meet administrative requirements and are not general purpose). They would have minimal options; in fact, I have just two in mind: one for notifications (using the Django notification system), and the other a list of trainees also applying for instructor training (a simple view listing the other applicants, with filtering options on location, university, etc., much like the filtering AMY already has).

  2. Issue: Help individuals form groups.
    Solution: Extending the above idea, we can easily search for people and other groups and send requests to join them.

  3. Issue: Tracking progress.
    Solution: We create a window for admins and other mentors, which is mostly a view listing all the trainees and the tasks they have completed.

  4. Issue: When someone completes training, we manually create an entry in AMY to record it.
    Solution: By maintaining accounts like this, we can very easily export trainees' details to the instructors list. This is very similar to what AMY already does in "bulk add people".

These are a few of my ideas for tackling the issues mentioned. This list is by no means complete, and many issues have not been mentioned. I'm opening this issue so that I can discuss them and start on a small piece of code as a proof of concept before writing the whole proposal.

Result Aggregation Server project - next steps

Hi @pbanaszkiewicz and @rgaiacs

I was going through the idea of a result-aggregation server for the installation-test scripts and found it interesting. I have worked extensively on Flask RESTful APIs while interning with Zopper. As mentioned in the project description, I have created a demo API using Flask and MySQL which stores the OS details from the client in the MySQL database. Please guide me through the next steps so that I can move toward writing the proposal.

cheers!
Prerit Garg

Add taxonomic name resolution to the EcoData Retriever to facilitate data science approaches to ecology - doubts

@ethanwhite @henrykironde
Hi, I am Bishakh Ghosh.
I am really interested in participating in GSoC 2016 with the EcoData Retriever. I have cloned the source code, installed it, and was able to download a couple of datasets.
I have made one PR, weecology/retriever#442, though I don't know if it is any good.

I have knowledge of Python and some web frameworks like Django and Flask, and I know how to use REST APIs.
I also have some knowledge of SQL and ORMs like SQLAlchemy.
Here is a list of my works: http://ghoshbishakh.github.io/blog/about/

What should be my next step? Should I try to make a sample app that fetches data from iPlant's Taxonomic Name Resolution Service?

Also, is it possible to apply for both projects? Both seem interesting to me, although since I know web APIs a bit, my first preference is adding taxonomic name resolution to the EcoData Retriever.

Organization disambiguation

I spoke this morning with Rainere about NumFOCUS's role as an umbrella org for GSoC, and it sounds like you all want to accept applications for any NumFOCUS organization. To make this more NumFOCUS-oriented, I'd like to propose some ideas/questions:

  • May we move this repository to the NumFOCUS Github account?
  • May we add a page to the NumFOCUS website about its participation in GSoC 2015?
  • Might we create an ideas page that is more like the one PSF does, where projects simply link to their ideas pages? See: https://wiki.python.org/moin/SummerOfCode/2015

I'm happy to do the grunt work to make this happen if people agree.

Improving reproducibility in science by adding provenance tracking to the EcoData Retriever

Rationale

The EcoData Retriever is a Python based tool
for automatically downloading, cleaning up,
and restructuring ecological data. It does the hard work of data munging so
that scientists can focus on doing science.

Science is experiencing a reproducibility crisis in that it is becoming
clear that published results often cannot be
reproduced. Part of the challenge of reproducibility in data science style
research is that most of the steps
related to data: downloading it, cleaning it up, restructuring it, are
either done manually or using one-off scripts
and therefore this phase of the scientific process is not reproducible. The
EcoData Retriever already solves many of
these problems, but it doesn't currently keep track of exactly what has
been done and therefore fails to support
full reproducible workflows.

Approach

This project would extend the EcoData Retriever using Python to store all
of the metadata necessary for
full reproduction in an associated SQLite database.

Specifically this would involve:

  • Object oriented programming in Python
  • Design an SQLite database for storing provenance information, or use an
    existing framework (e.g., http://code.google.com/p/core-provenance-library/)
  • Implement checks to make sure that the data is in the same form created
    by the Retriever when retrieving provenance information
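The database and the integrity check above can be sketched with the standard library alone. The table layout, function names, and checksum choice are all assumptions for illustration, not the project's actual design.

```python
import hashlib
import sqlite3

# Hypothetical provenance table: one row per retrieved dataset.
SCHEMA = """
CREATE TABLE IF NOT EXISTS provenance (
    dataset TEXT PRIMARY KEY,
    script_version TEXT,
    retrieved_at TEXT,
    sha256 TEXT
);
"""

def record_provenance(conn, dataset, script_version, retrieved_at, data):
    """Store what was done to a dataset, plus a checksum of the result."""
    digest = hashlib.sha256(data).hexdigest()
    conn.execute(
        "INSERT OR REPLACE INTO provenance VALUES (?, ?, ?, ?)",
        (dataset, script_version, retrieved_at, digest),
    )
    conn.commit()

def verify(conn, dataset, data):
    """Check that a dataset still matches the form recorded at retrieval time."""
    row = conn.execute(
        "SELECT sha256 FROM provenance WHERE dataset = ?", (dataset,)
    ).fetchone()
    return row is not None and row[0] == hashlib.sha256(data).hexdigest()
```

A checksum mismatch in `verify` is exactly the "data is not in the same form created by the Retriever" case the project would need to flag.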

Involved toolkits or projects

  • The EcoData Retriever
  • Python
  • Relational database management systems (RDBMS) including MySQL,
    PostgreSQL, SQLite
  • Potentially an existing provenance library

Degree of difficulty and needed skills

  • Moderate Difficulty
  • Knowledge of Python
  • Some experience with SQL

Involved developer communities

The EcoData Retriever primarily interacts via issues and pull requests on
GitHub.

Mentors
