Comments (6)

dabico commented on June 14, 2024

Aah, okay, I think I see what's going on. This looks like a mining issue that stems from how the per-language mining ranges are created, split, and iterated. If a repository is updated as frequently as PyTorch, it is constantly moved to the end of the very search results we are mining. Combined with the precautions we take to avoid over-mining, this causes very active repositories to be missed during updates. Thank you for bringing this to my attention; I'll devise a fix.
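To make this failure mode concrete, here is a toy model, not the ghs crawler. It assumes the crawl walks push-date windows in increasing order up to a cutoff fixed when the crawl starts (a hypothetical stand-in for the over-mining precautions, which the comment does not describe), and that a "hot" repository receives a new push while each window is processed. The `Repo` and `crawl` names and the integer timestamps are made up for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class Repo:
    name: str
    pushed_at: int  # hypothetical integer timestamp of the last push


def crawl(repos: List[Repo], start: int, cutoff: int, step: int,
          hot: Iterable[str] = ()) -> Set[str]:
    """Toy crawler: visit push-date windows [lo, hi) from `start` up to a fixed `cutoff`.

    After each window, every repository named in `hot` receives a new push,
    so its pushed_at jumps past the cutoff captured when the crawl started.
    """
    hot_names = set(hot)
    seen: Set[str] = set()
    now = cutoff  # the clock reads `cutoff` when the crawl begins
    lo = start
    while lo < cutoff:
        hi = min(lo + step, cutoff)
        for repo in repos:
            if lo <= repo.pushed_at < hi:
                seen.add(repo.name)
        # Time passes while the window is processed: hot repos are pushed again.
        now += 1
        for repo in repos:
            if repo.name in hot_names:
                repo.pushed_at = now
        lo = hi
    return seen


repos = [Repo("example/quiet-lib", 5), Repo("pytorch/pytorch", 7)]
print(sorted(crawl(repos, start=0, cutoff=10, step=2, hot={"pytorch/pytorch"})))
```

The quiet repository is found in its [4, 6) window, while the hot one has already jumped past the cutoff by the time the crawler reaches its original window, so no window of this run ever contains it.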

dabico commented on June 14, 2024

Hi @egor-bogomolov

If I understand correctly, you would like to be able to filter by both the last commit date and the last update date? Currently we only support the former, although the export includes all of this information. We can also add support for filtering by non-code updates through another date range input group. We originally omitted it because the date of the last repository page update is not as interesting to researchers as the date of the last commit. People also tend to confuse the two, so we only offered filtering by the one that was in higher demand.
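A minimal sketch of what filtering by two independent date ranges could look like, assuming each exported record carries both dates. The `RepoRecord` fields `last_commit` and `last_updated`, and the helper names, are hypothetical and not the ghs schema; an open-ended bound is modeled as `None`.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class RepoRecord:
    name: str
    last_commit: date   # hypothetical: date of the last commit on the default branch
    last_updated: date  # hypothetical: date of the last update of any kind


@dataclass
class DateRange:
    start: Optional[date] = None  # None means "no lower bound"
    end: Optional[date] = None    # None means "no upper bound"

    def contains(self, d: date) -> bool:
        return (self.start is None or d >= self.start) and (self.end is None or d <= self.end)


def filter_repos(repos: List[RepoRecord],
                 commit_range: Optional[DateRange] = None,
                 update_range: Optional[DateRange] = None) -> List[RepoRecord]:
    """Keep records whose dates fall inside every range that was supplied."""
    commit_range = commit_range if commit_range is not None else DateRange()
    update_range = update_range if update_range is not None else DateRange()
    return [r for r in repos
            if commit_range.contains(r.last_commit)
            and update_range.contains(r.last_updated)]


repos = [
    RepoRecord("example/stale-code", last_commit=date(2021, 5, 1), last_updated=date(2023, 6, 1)),
    RepoRecord("example/active-code", last_commit=date(2023, 3, 1), last_updated=date(2023, 3, 2)),
]
since_2023 = DateRange(start=date(2023, 1, 1))
print([r.name for r in filter_repos(repos, commit_range=since_2023)])  # ['example/active-code']
print([r.name for r in filter_repos(repos, update_range=since_2023)])  # ['example/stale-code', 'example/active-code']
```

Keeping the two ranges independent lets a user combine them (for example, "no commits since 2023, but updated since 2023") without the two dates being conflated.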

To answer your question regarding updates: we crawl each language by the last repository push date. We operate under the assumption that active projects are pushed to regularly, which implies that new data will eventually be encountered by the crawler as long as there are pushes to the remote. However, detecting non-code updates (such as changes to issue labels or the repository's list of topics) would most likely require some form of background job for our data to stay fully consistent with GitHub. At the moment, we already use two scheduled tasks: one that checks whether repositories in our database still exist on GitHub, and another that computes line information through static analysis of the last commit on the default branch. I'll give this some more thought, but this "data maintenance" task will either replace one of the aforementioned jobs, or all three might be merged into one.
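Crawling by push date maps naturally onto GitHub's search qualifiers. The sketch below builds a `language:… pushed:FROM..TO` query and bisects a date range until each piece fits under the search API's cap of 1,000 results per query. The bisection strategy, the helper names, and the `count` callback (standing in for the `total_count` of a real search request) are assumptions for illustration, not how ghs actually creates or splits its ranges.

```python
from datetime import date, timedelta
from typing import Callable, List, Tuple

MAX_RESULTS = 1000  # GitHub's search API returns at most 1,000 results per query

DateSpan = Tuple[date, date]


def push_query(language: str, lo: date, hi: date) -> str:
    """Search query for repositories in `language` last pushed between lo and hi (inclusive)."""
    return f"language:{language} pushed:{lo.isoformat()}..{hi.isoformat()}"


def split_ranges(lo: date, hi: date, count: Callable[[date, date], int]) -> List[DateSpan]:
    """Bisect [lo, hi] until each span holds at most MAX_RESULTS repos or is a single day.

    `count(lo, hi)` is a hypothetical callback returning how many repositories
    match the span.
    """
    if lo >= hi or count(lo, hi) <= MAX_RESULTS:
        return [(lo, hi)]
    mid = lo + timedelta(days=(hi - lo).days // 2)
    return split_ranges(lo, mid, count) + split_ranges(mid + timedelta(days=1), hi, count)


def fake_count(lo: date, hi: date) -> int:
    # Pretend exactly 100 repositories are pushed every day.
    return 100 * ((hi - lo).days + 1)


spans = split_ranges(date(2023, 1, 1), date(2023, 1, 31), fake_count)
for lo, hi in spans:
    print(push_query("Python", lo, hi))
```

With 100 pushes a day, the 31-day January range is split into four contiguous spans of at most eight days each. Note that a range fixed in advance like this is exactly where a frequently pushed repository can slip out of the span the crawler is currently iterating.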

egor-bogomolov commented on June 14, 2024

I would like to filter by the last commit date, but it turns out that I miss quite a few repositories this way. For example, the PyTorch repo shows no updates in the past two years, so it does not pass the filter "last commit after 01.01.2023". However, the PyTorch repo is updated regularly, so I expect it to be found among the results.

egor-bogomolov commented on June 14, 2024

Thanks a lot! 🚀

dabico commented on June 14, 2024

I just merged the changes. The results should be updated after some time.

dabico commented on June 14, 2024
[Screenshot, 2023-08-06 15:51:57]
