
npm-search's Issues

Package Security Check

I think it would be quite helpful to check a package and its dependencies for security issues (as nsp does) and display the result on the package details page. It might even be a good idea to show it in the search results list.

dependents is broken

Every record seems to have 1437319 as value 🤔

https://replicate.npmjs.com/registry/_design/app/_view/dependedUpon?startkey=%5B%22react%22%5D&endkey=%5B%22react%22%2C%22%EF%BF%B0%22%5D&limit=1&reduce=true&stale=update_after

Some seem to work, though. react-condition and react-conditional both appear to have 1.4m as their Algolia value.
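To compare a package's registry count with its Algolia value, the query above can be rebuilt for any name. A minimal sketch, assuming the `dependedUpon` view is keyed by arrays that start with the package name; `"\ufff0"` is the high sentinel used in the issue's `endkey` to close the range. The helper name `dependedUponUrl` is hypothetical.

```javascript
// Hypothetical helper: rebuilds the dependedUpon query from the issue for any package.
function dependedUponUrl(pkg) {
  const base =
    "https://replicate.npmjs.com/registry/_design/app/_view/dependedUpon";
  // startkey [pkg] and endkey [pkg, "\ufff0"] select every key that starts with pkg.
  const startkey = encodeURIComponent(JSON.stringify([pkg]));
  const endkey = encodeURIComponent(JSON.stringify([pkg, "\ufff0"]));
  // reduce=true asks the view for its aggregated value instead of individual rows.
  return `${base}?startkey=${startkey}&endkey=${endkey}&limit=1&reduce=true&stale=update_after`;
}

console.log(dependedUponUrl("react-condition"));
```

For `"react"` this produces exactly the URL quoted in the issue, so the same function can be pointed at react-condition or react-conditional.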

Some READMEs are the localised version (README.cn.md) instead of the normal one.
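One way to avoid picking a localised file is to prefer an unlocalised name whenever one exists. A minimal sketch, assuming a list of file names is available for the package (where it comes from is left open); the accepted extensions and the name `pickReadme` are assumptions.

```javascript
// Hypothetical helper: prefer README / README.md over variants like README.cn.md.
function pickReadme(fileNames) {
  // Unlocalised: "README" optionally followed by one known extension.
  const plain = /^readme(\.(md|markdown|txt|rst))?$/i;
  // Any README variant, including localised ones, as a fallback.
  const anyReadme = /^readme(\..+)?$/i;
  return (
    fileNames.find((f) => plain.test(f)) ??
    fileNames.find((f) => anyReadme.test(f)) ??
    null
  );
}
```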

Performance optimisation

db.allDocs

This is probably the slowest operation at the moment, and the good news is that it can be improved. The payload is big but the API responds almost instantly; the time is mostly idle CPU followed by a long parsing step at the end.

If we wrap the call in an in-memory store (either in the same thread or in a fork), we could always fetch a few thousand rows in the background, so that the next loop() does not have to wait at all.
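The same-thread variant can be sketched as a prefetcher that starts the next request as soon as the current batch is handed out. This assumes a hypothetical async `fetchPage(key)` resolving to `{ rows, nextKey }` (with `nextKey` null on the last page); it is not the existing db.allDocs wrapper. Only the network wait overlaps with processing; parsing still runs on the same thread, which is the trade-off listed in the cons below.

```javascript
// Sketch of a background prefetcher around a paginated fetch.
// fetchPage(key) is hypothetical and resolves to { rows, nextKey }.
function createPrefetcher(fetchPage, firstKey) {
  // Start the first request immediately.
  let pending = fetchPage(firstKey);
  return async function nextBatch() {
    const page = await pending;
    // Request the following page before returning, so it downloads
    // while the caller processes the current rows.
    pending =
      page.nextKey === null
        ? Promise.resolve({ rows: [], nextKey: null })
        : fetchPage(page.nextKey);
    return page.rows;
  };
}
```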

Cons:

  • Forking means serializing/deserializing between processes, so I'm not sure we would gain that much performance.
  • Keeping it in memory means we cannot prefetch too much.

(One problem that could be solved another way: we are downloading all versions of every package, which means all the HTML READMEs too. This can be pretty big for some packages. We could find a way to get only the latest version and fetch the list of versions through another endpoint.)
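Once a full registry document has been fetched, what we keep can at least be reduced to the latest version plus the version list. A minimal sketch using the standard npm document fields (`dist-tags`, `versions`, `time`); the output shape and the name `slimPackument` are hypothetical. This only shrinks what is held in memory; avoiding the download itself still needs another endpoint, as noted above.

```javascript
// Sketch: keep the latest version's manifest and the list of version numbers.
function slimPackument(doc) {
  const latest = doc["dist-tags"] && doc["dist-tags"].latest;
  return {
    name: doc.name,
    // Manifest of the version the "latest" dist-tag points to, if any.
    latest: latest ? doc.versions[latest] : undefined,
    versionList: Object.keys(doc.versions || {}),
    time: doc.time,
  };
}
```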

getDownloads

Not as slow as I imagined, but it takes 20-30% of the saveDocs duration. It is not critical information (even though it is part of the website's UX), so it could be done in a separate process.

getTSSupport

Might be the slowest metadata operation: between 3 s and 7 s for 100 packages, accounting for more than 50% of the saveDocs duration.
Sometimes it can get stuck for a very long time, blocking everything.
[Screenshot: 2019-08-03 18:56:04]
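To keep one stuck lookup from blocking everything, each call could be raced against a timer. A minimal sketch; the timeout value and fallback are assumptions, and the underlying request is not cancelled, only no longer awaited.

```javascript
// Sketch: resolve with `fallback` if `promise` takes longer than `ms`.
function withTimeout(promise, ms, fallback) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  // Whichever settles first wins; the timer is cleared either way.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

A call site might look like `withTimeout(getTSSupport(pkg), 2000, null)`, where the 2 s limit and the `null` fallback are illustrative choices.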

getChangelogs

Surprisingly, it is not as slow as expected, probably because we match CHANGELOG.md almost every time. This could be done in another process too.
It accounts for 20-30% of the saveDocs duration.
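The speed probably comes from checking candidates in priority order and stopping at the first hit, so most packages need a single check. A minimal sketch; the candidate list and the `hasFile` predicate are hypothetical stand-ins for whatever lookup the indexer actually performs.

```javascript
// Sketch: probe candidate changelog names in order and stop at the first match.
const CHANGELOG_CANDIDATES = ["CHANGELOG.md", "ChangeLog.md", "changelog.md", "HISTORY.md", "CHANGES.md"];

function findChangelog(hasFile) {
  for (const name of CHANGELOG_CANDIDATES) {
    // Early exit: when CHANGELOG.md exists, only one check is made.
    if (hasFile(name)) return name;
  }
  return null;
}
```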

Ref
(Except for formatPackages, all timings on the left of the charts are in seconds.)
[Screenshot of the charts: 2019-08-03 18:48:50]

npm description is often less reliable than GitHub description

Because the description on npm can basically be full Markdown/HTML, you can do a lot there that you can't do in GitHub's plain-text description field.

IMO it might be a good idea to check whether a GitHub description is available before falling back to the one from npm.

Yarn, for example, has

  description: '<p align="center">   <a href="https://yarnpkg.com/">     <img alt="Yarn" src="https://github.com/yarnpkg/assets/blob/master/yarn-kitten-full.png?raw=true" width="546">   </a> </p>',

But a GitHub description of

📦🐈 Fast, reliable, and secure dependency management.

This will coincidentally still have the emoji issues described in #17 (though those are internal to the highlighting), but at least it is always plain text.
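The preference order could look like the sketch below, assuming the GitHub description has already been fetched separately. The regex tag stripping in the npm fallback is a crude assumption, not a proper HTML sanitizer.

```javascript
// Sketch: prefer the plain-text GitHub description, else a tag-stripped npm one.
function pickDescription(npmDescription, githubDescription) {
  if (githubDescription && githubDescription.trim() !== "") {
    return githubDescription.trim();
  }
  // Crude cleanup: drop anything that looks like an HTML tag, collapse whitespace.
  return (npmDescription || "")
    .replace(/<[^>]*>/g, "")
    .replace(/\s+/g, " ")
    .trim();
}
```

For Yarn, this returns the GitHub text; if no GitHub description exists, the npm HTML above reduces to an empty string rather than raw markup.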

Have a deploy preview index

Index part of the data and stop at some point (100k records should be enough to include a few interesting packages), then show a version of the website with that data.

This requires a second Heroku worker, since it would take too long to do on Travis. It might be doable with CircleCI since it's faster, but I'm not sure.

setup prettier

It would be nice to set up Prettier, as in the Yarn website.

Data discrepancy between algolia & replicate.npmjs.com/registry

We're using this package on https://www.gatsbyjs.org/plugins/ to show information from npm. The data does not seem to be 100% correct: when we manually crawl the registry endpoint we get the correct data, but through Algolia we don't.

On this page https://www.gatsbyjs.org/packages/gatsby-source-sanity/?=sanity we display the readme string. If we crawl the registry at https://replicate.npmjs.com/registry/gatsby-source-sanity we get a valid readme string, but whenever we use Algolia the readme field is an empty string.

This is the Gatsby bug for more info: gatsbyjs/gatsby#11129

[FEATURES] add additional information for a package

Hi guys. I've got requests for some new additions to the npm-search index. Let's discuss them here:

  • open issues on GitHub: how many open issues there are.
  • number of GitHub stars: how many people starred this project.
  • number of GitHub forks: how many people forked this project.
  • open PRs on GitHub: how many open PRs there are.
  • last commit on GitHub: when the last commit on master was made.
  • any other useful information (here are all the properties returned by GitHub's API)

Does this make sense?
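Most of these fields can be read from one repository object returned by GitHub's REST API (`GET /repos/{owner}/{repo}`). A minimal sketch; the output field names are hypothetical. Two caveats: `open_issues_count` counts open pull requests too, so separating issues from PRs needs another request, and `pushed_at` is the last push to any branch, only an approximation of the last commit on master.

```javascript
// Sketch: map a GitHub repository object to the proposed index fields.
function githubStats(repo) {
  return {
    stars: repo.stargazers_count,
    forks: repo.forks_count,
    // Includes open pull requests as well as issues.
    openIssuesAndPRs: repo.open_issues_count,
    // Last push to any branch, not specifically master.
    lastPush: repo.pushed_at,
  };
}
```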

Add gravatar URLs

For the author, lastPublisher and owners, we need their Gravatar URLs.

See this doc to understand how to generate the links.
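Gravatar URLs are derived from a hash of the trimmed, lower-cased email address. A minimal sketch using MD5, the long-standing Gravatar hash, and assuming each person object has an email; check the doc linked above for the currently recommended hash and optional query parameters. `gravatarUrl` is a hypothetical name.

```javascript
const crypto = require("crypto");

// Sketch: build a Gravatar avatar URL from an email address.
function gravatarUrl(email) {
  const hash = crypto
    .createHash("md5")
    .update(email.trim().toLowerCase())
    .digest("hex");
  return `https://www.gravatar.com/avatar/${hash}`;
}
```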

How to filter out deprecated packages?

I already have one <RefinementList> for specifying Gatsby's keywords. I tried adding another one to filter out deprecated packages: <RefinementList attributeName="deprecated" defaultRefinement={[false]} />

This caused this error: Warning: Failed prop type: Invalid prop defaultRefinement[0] supplied to "AlgoliaRefinementList(Translatable(RefinementList))"

I tried putting false in quotes but then no packages were returned.

I'm using react-instantsearch@4

Upgrade all deps and activate renovate

The current state of this project's dependencies is not good: a lot of them are outdated. We had better keep dependencies upgraded to avoid getting stuck at some point.

Frontend search changes proposal

Proposal to change the frontend UI results to something like:


algoliasearch :octocat: algolia/algoliasearch ⬇️6K 🔥 (if popular)
v2.3.4 📅 1 day ago by 🏙️ Algolia (or 👤 vvo when no org)
description
tags (light contrast) 🔗 algolia.com


Discussed with @Shipow and @redox.

  • algoliasearch: sends to https://npmjs.com/algoliasearch
  • :octocat: algolia/algoliasearch: sends to https://github.com/algolia/algoliasearch. This is frequently asked for. We found ourselves frustrated that on npm this information is only available on the package page instead of in the search results.
  • ⬇️6K 🔥 (when the popular download threshold is reached): clearly identifies important packages
  • v2.3.4: latest version
  • 📅 1 day ago: relative activity of the package
  • by 🏙️ Algolia is shown when a GitHub repo is found. We then link to https://github.com/algolia.
  • 👤 vvo is shown, and links to https://npmjs.com/~vvo, when the package cannot be linked to a GitHub repo. This is the last publisher, not the original author of the package: what we want here is to know who is the most knowledgeable person for the latest version.
  • description
  • tags (light contrast): light contrast is important because tags are not widely used on npm; they are filled in manually and very often left empty. Still interesting information for discoverability.
  • 🔗 algolia.com when a homepage exists and differs from the GitHub repo

Still open to ideas, of course.
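Two of the display rules are small enough to sketch: compact download counts ("6K") and showing the homepage only when it differs from the GitHub repo. The URL normalisation (ignoring protocol, `www.`, `.git`, trailing slash and `#fragment`) is an assumption about what "different" should mean.

```javascript
// Sketch: compact download count, e.g. 6000 -> "6K".
function formatDownloads(n) {
  return new Intl.NumberFormat("en", { notation: "compact" }).format(n);
}

// Sketch: show the homepage only when it is not just the repo URL again.
function showHomepage(homepage, repoUrl) {
  if (!homepage) return false;
  const norm = (u) =>
    u
      .toLowerCase()
      .replace(/^https?:\/\//, "")
      .replace(/^www\./, "")
      .replace(/(\.git)?\/?(#.*)?$/, "");
  return !repoUrl || norm(homepage) !== norm(repoUrl);
}
```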

Handle deleted packages

I never implemented the delete change event. I think there's a _deleted flag somewhere in the feed, but it's not well documented and I am not sure npm follows it.
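In CouchDB's `_changes` feed, a deletion shows up as a change row with `deleted: true` (and a deleted document carries `_deleted: true`); whether npm's replica always sets it is exactly the open question here. A minimal routing sketch, where `removeFromIndex` and `saveToIndex` are hypothetical callbacks.

```javascript
// Sketch: route a change-feed entry to deletion or indexing.
async function handleChange(change, { removeFromIndex, saveToIndex }) {
  const isDeleted =
    change.deleted === true || (change.doc && change.doc._deleted === true);
  if (isDeleted) {
    await removeFromIndex(change.id);
    return "deleted";
  }
  await saveToIndex(change.doc);
  return "saved";
}
```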

Replicating crashes because of change in npm API

When loading downloads, an HTTP 400 is now returned if there is a scoped package (@scope/pkg) among the arguments.

for example: http://api.npmjs.org/downloads/point/last-month/,@lab009/babel-preset-magma,@lab009/eslint-config-magma,@lab009/magma-utils,ringpop,@lab009/magma-config,node-sftp-s3,@lab009/magma-server,nativescript-wkwebview,koo,react-native-div,mdeb,is-in-view,ut-codec,kb-path,tslib,rollup-plugin-jspicl,com.bma4s.sdk.plugins.cordova,triton-amp-core-error-type,six-key-figures-listing-widget,musicxmljs,jqtree,@cheesecakelabs/boilerplate,cubx-http-server,log-http,@honeo/test,rne,cleans,egeria-pontifex,fh-wfm-mediator,gulp-sass-inline-svg,@bennerinformatics/ember-fw,six-key-figures-organization-widget,webpack-chain,wipeer,babel-plugin-styled-components-named,react-stateless-infinite-scroll,generator-mendix,react-teleportation,rest-trankil.js,lowdb,six-listing-details-widget,grad-factions,egeria-youtube-vestal,flashheart,cover-generator-by-quicklook,vue-template-compiler,generator-finaps-xamarin-ci,@quantumblack/javascript-standards,stretchy-cli,egeria-youtubedl-vestal,six-market-temperature-widget,remaps-china,rc-inputs,generator-finaps-xamarin,manateeworks-barcodescanner-v3,rainbow-node-sdk,rung-cli,homebridge-ikea,npm-publish-git-tag,leo-wp,masonreact,react-background-video-player,homebridge-synology,ilp-connector,newuser,rc-wrapper-loader,gulp-mjml,ng2-archwizard,vscode-nls-dev,promise-rabbit-rpc,simple-context-angular2,joynr,generator-gfe-lego,salsa-calendar,react-drag-tool,prettysize,aws-signing-utils,six-mini-indicator-widget,ilp-plugin-virtual,lucify-notifier,components-orikami,@prometheusresearch/react-ui,generator-dizmo,comindware.core.ui,reactmaterial,localstorage-api,angular-emojify,d3-tooltip-ninjapixel,done-ssr,ember-mobile-inputs,react-document-title,angular2-emojify,@azinasili/bytes,ng2-daterangepicker,rslinq,six-alert-list-widget,cfapp,jrs-react-components,nicktestsem

This change happened on Wednesday.

Fix: remove the scoped packages from the downloads lookup
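The fix can be sketched as filtering scoped names out before building the bulk request. The chunk size of 128 is an assumption about the bulk endpoint's limit, and what happens to the skipped scoped names (for example, separate per-package requests) is left open.

```javascript
// Sketch: build bulk download URLs for unscoped names only.
function downloadsBulkUrls(names, chunkSize = 128) {
  const isScoped = (n) => n.startsWith("@");
  const unscoped = names.filter((n) => !isScoped(n));
  const urls = [];
  for (let i = 0; i < unscoped.length; i += chunkSize) {
    urls.push(
      "https://api.npmjs.org/downloads/point/last-month/" +
        unscoped.slice(i, i + chunkSize).join(",")
    );
  }
  return { urls, skipped: names.filter(isScoped) };
}
```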

add a [cdns] key in every package

Its goal is to order the CDNs by the module author's preference: if the README mentions unpkg or jsDelivr first, the author will want that one to show up first on the detail page.

1: have the default ordering [jsdelivr, unpkg, bundleRun]
2: order those based on their order of occurrence in the README
3: if the README contains another CDN (one we whitelisted but can't automatically match to npm packages, like cdnjs), it should be added in the right order.

We should decide whether to store the complete URL, the URL without the prefix, or just the name. Just the name might be problematic for the extra CDNs.
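The three rules could combine as follows: CDNs mentioned in the README come first in order of first occurrence, then the remaining defaults in default order. The search string used to detect each CDN is an assumption, and only names (not URLs) are returned here.

```javascript
// Sketch of the ordering rules; the detection strings are assumptions.
const DEFAULT_CDNS = ["jsdelivr", "unpkg", "bundleRun"];
// Whitelisted CDNs that can only be detected from the README.
const EXTRA_CDNS = ["cdnjs"];
const PATTERNS = { jsdelivr: "jsdelivr", unpkg: "unpkg", bundleRun: "bundle.run", cdnjs: "cdnjs" };

function orderCdns(readme = "") {
  const text = readme.toLowerCase();
  const position = (cdn) => text.indexOf(PATTERNS[cdn]);
  // Rules 2 and 3: mentioned CDNs, sorted by first occurrence.
  const mentioned = [...DEFAULT_CDNS, ...EXTRA_CDNS]
    .filter((cdn) => position(cdn) !== -1)
    .sort((a, b) => position(a) - position(b));
  // Rule 1: unmentioned defaults keep the default order.
  const rest = DEFAULT_CDNS.filter((cdn) => !mentioned.includes(cdn));
  return [...mentioned, ...rest];
}
```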

add alternative name without dashes

For create-react-native-app, this should be createreactnativeapp, because it's often easier to just type it without dashes.

Characters to remove:

  • .
  • -
  • _
  • @
  • /

The alternative name should have a lower relevance score than the real name, to avoid react-native scoring above reactnative when you search for reactnative.

This should also be done for the keywords, but in that case the concatenated version can simply be added.
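Building the alternative name is a single character-class replacement over the listed characters; for keywords, the concatenated form is added next to the original. A minimal sketch with hypothetical function names; the lower relevance for the alternative name would be configured in the index settings (for instance by ranking it after `name` in the searchable attributes), which is not shown.

```javascript
// Sketch: remove . - _ @ / to build the alternative name.
function alternativeName(name) {
  return name.replace(/[.\-_@/]/g, "");
}

// For keywords, add the concatenated version alongside the original.
function withConcatenatedKeywords(keywords) {
  const extra = keywords
    .map(alternativeName)
    .filter((k, i) => k !== keywords[i]);
  return [...new Set([...keywords, ...extra])];
}
```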

Download based popular count is not well computed

Some libraries are marked as popular when they should not be.

yarn global add json
curl 'https://ofcncog2cu-dsn.algolia.net/1/indexes/npm-search/jquery?x-algolia-application-id=OFCNCOG2CU&x-algolia-api-key=f54e21fa3a2a0160595bb058179bfb1e' | json popular
true # jQuery is popular inside index
curl http://api.npmjs.org/downloads/point/last-month/jquery | json downloads
2575423 # number of downloads for jQuery last 30 days
curl http://api.npmjs.org/downloads/point/last-month | json downloads
7009616298 # total number of downloads for npm registry last 30 days

Popular ratio is 0.005:

popularDownloadsRatio: 0.005,

Number of downloads for jQuery in one month: 2575423
Number of downloads on the npm registry: 7009616298

Ratio for jQuery: 2575423/7009616298 ~= 0.00037 < 0.005, but it is still marked as popular

Two issues:

  • bad computation of the popular flag
  • if the computation were right, jQuery wouldn't be popular, so we may need other ways to decide whether a package is popular (but we might hit the GitHub rate limit again)
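The intended rule, as described above, compares a package's share of all registry downloads with `popularDownloadsRatio`. A minimal sketch of that comparison, checked against the issue's jQuery numbers; it does not claim to reproduce the current (buggy) code path.

```javascript
// Sketch: popular when the package's share of total downloads exceeds the ratio.
const popularDownloadsRatio = 0.005;

function isPopular(packageDownloads, totalDownloads, ratio = popularDownloadsRatio) {
  if (!totalDownloads) return false; // missing or zero totals: never popular
  return packageDownloads / totalDownloads > ratio;
}

// Numbers from the issue: jQuery's monthly downloads vs the whole registry.
console.log((2575423 / 7009616298).toFixed(5), isPopular(2575423, 7009616298));
```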

add GitHub branch

Some repositories don't keep their package on master, which makes the computed location of the README, etc. wrong.
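Storing the branch lets raw file URLs use it instead of a hard-coded `master`. A minimal sketch; the parameter names are hypothetical, and the branch source (for example GitHub's `default_branch` repository field) is an assumption.

```javascript
// Sketch: raw file URL built from the stored branch, defaulting to master.
function rawFileUrl({ user, project, branch = "master", path = "README.md" }) {
  return `https://raw.githubusercontent.com/${user}/${project}/${branch}/${path}`;
}
```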

Add filterable library-specific subsets in a future-proof way

const otherKeys = {}; // placeholder for the rest of the package record
const packageDetail = {
  ...otherKeys,
  subset: {
    babel: "6",
    "angular-cli": "/path-to/scaffold",
    "gatsby": "component"
  }
}

I still need to check how we can easily make it facetable; I'm not sure this shape allows that. One idea I had to solve it:

const otherKeys = {}; // placeholder for the rest of the package record
const packageDetail = {
  ...otherKeys,
  subset: {
    "keys": ["babel", "angular-cli", "gatsby"],
    babel: "6",
    "angular-cli": "/path-to/scaffold",
    "gatsby": "component"
  }
}

In this case only subset.keys would be facetable, but it would mean that we can't use the name keys for any of the subsets (seems fine to me).
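The `keys` array can be derived from the subset object itself, so it never drifts out of sync, while rejecting a subset literally named `keys`. A minimal sketch; `withSubsetKeys` is a hypothetical name.

```javascript
// Sketch: add subset.keys (the facetable list) derived from the subset names.
function withSubsetKeys(subset) {
  if (Object.prototype.hasOwnProperty.call(subset, "keys")) {
    throw new Error('"keys" is reserved and cannot be used as a subset name');
  }
  return { keys: Object.keys(subset), ...subset };
}
```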

JSON parse error

It seems some updates are not parsable; we should probably try/catch those.

Aug 05 14:16:58 npm-search app/worker.1: INFO:  🐌  Deleted @mobile-generator/mobile-generator 
Aug 05 14:17:00 npm-search app/worker.1: SyntaxError: Unexpected end of JSON input 
Aug 05 14:17:00 npm-search app/worker.1:     at JSON.parse (<anonymous>) 
Aug 05 14:17:00 npm-search app/worker.1:     at onSuccess (/app/node_modules/pouchdb-ajax/lib/index.js:41:20) 
Aug 05 14:17:00 npm-search app/worker.1:     at Request._callback (/app/node_modules/pouchdb-ajax/lib/index.js:101:7) 
Aug 05 14:17:00 npm-search app/worker.1:     at Request.self.callback (/app/node_modules/pouchdb-ajax/node_modules/request/request.js:187:22) 
Aug 05 14:17:00 npm-search app/worker.1:     at Request.emit (events.js:203:13) 
Aug 05 14:17:00 npm-search app/worker.1:     at Request.<anonymous> (/app/node_modules/pouchdb-ajax/node_modules/request/request.js:1126:10) 
Aug 05 14:17:00 npm-search app/worker.1:     at Request.emit (events.js:203:13) 
Aug 05 14:17:00 npm-search app/worker.1:     at IncomingMessage.<anonymous> (/app/node_modules/pouchdb-ajax/node_modules/request/request.js:1046:12) 
Aug 05 14:17:00 npm-search app/worker.1:     at Object.onceWrapper (events.js:291:20) 
Aug 05 14:17:00 npm-search app/worker.1:     at IncomingMessage.emit (events.js:208:15) 
Aug 05 14:17:00 npm-search app/worker.1: error Command failed with exit code 1. 

Add a flag for spammy packages

Packages that are low quality but not technically spam should be harder to find unless typed exactly. Things I was thinking of are:

npmdoc-package
npmtest-package
package-cdn (was actually spam and removed)

See #52 for a first stab at solving this

Better indicate how results are ranked

To better fit a package search, we could display results like this:

  • Best match/Best result ("reactnative" query => react-native package)
  • Exact match ("reactnative" query => reactnative package) IF ANY
  • other results with the two previous packages removed

(only for page one)
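The page-one layout can be sketched as a reordering of hits already sorted by relevance, each assumed to have a `name` field. If the exact match is not among the fetched hits, a separate exact-name lookup would be needed; that part is not shown.

```javascript
// Sketch: best match, then the exact name match (if different), then the rest.
function arrangeFirstPage(hits, query) {
  if (hits.length === 0) return [];
  const [best] = hits;
  const exact = hits.find((h) => h.name === query && h !== best);
  const pinned = exact ? [best, exact] : [best];
  return [...pinned, ...hits.filter((h) => !pinned.includes(h))];
}
```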

Script is not resilient to "db crash"

Aug 06 13:07:43 npm-search app/worker.1: [unknown_error: Database encountered an unknown error] {
Aug 06 13:07:43 npm-search app/worker.1: status: 503,
Aug 06 13:07:43 npm-search app/worker.1: name: 'unknown_error',
Aug 06 13:07:43 npm-search app/worker.1: message: 'Database encountered an unknown error',
Aug 06 13:07:43 npm-search app/worker.1: error: true,
Aug 06 13:07:43 npm-search app/worker.1: data: '503 Service Unavailable\n' +
Aug 06 13:07:43 npm-search app/worker.1: 'No server is available to handle this request.\n' +
Aug 06 13:07:43 npm-search app/worker.1: '\n' +
Aug 06 13:07:43 npm-search app/worker.1: '\n'
Aug 06 13:07:43 npm-search app/worker.1: }
Aug 06 13:07:43 npm-search app/worker.1: error Command failed with exit code 1.
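The error above carries `status: 503`, so transient server errors could be retried with exponential backoff instead of letting the process exit. A minimal sketch; the retry count, the base delay and treating every status of 500 or above as transient are assumptions.

```javascript
// Sketch: retry fn() on 5xx errors with exponential backoff.
async function withRetry(fn, { retries = 5, baseDelayMs = 1000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const transient = Boolean(err) && err.status >= 500;
      if (!transient || attempt >= retries) throw err;
      // Wait 1x, 2x, 4x, ... the base delay before the next attempt.
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
}
```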

Completely reindex weekly

Packages like eslint-config-algolia-cdn are actually spam and were removed by npm a while ago, so we shouldn't list them either.

Maybe we should periodically make a second index, fill it from scratch, and then moveIndex to replace the main one?
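The rebuild-and-swap flow could be sketched as below. `client.moveIndex` mirrors the Algolia operation named above, but the client here is a hypothetical stand-in, as are `fillIndex` and the temporary index name. Since a move also carries the source index's settings, the temporary index would need the main index's settings first (for example through a settings-scoped copy); check the Algolia docs before relying on this.

```javascript
// Sketch: fill a temporary index from scratch, then move it over the main one.
async function rebuildIndex(client, mainIndex, fillIndex) {
  const tmpIndex = `${mainIndex}-rebuild`;
  // Index every package into the temporary index.
  await fillIndex(tmpIndex);
  // Replace the main index with the freshly built one.
  await client.moveIndex(tmpIndex, mainIndex);
  return mainIndex;
}
```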

encoding issues

At some point the description is decoded as something other than UTF-8.

This makes emoji show up as something other than their actual characters.

Example of this error happening: floating.js
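The failure mode is easy to reproduce: decoding UTF-8 bytes as Latin-1 turns each emoji into several unrelated characters, one per byte. A minimal Node sketch; the real fix is decoding as UTF-8 wherever the bytes are read, and reversing it afterwards only works because the Latin-1 decoding is lossless.

```javascript
// Sketch: UTF-8 bytes mis-decoded as Latin-1, then repaired.
const original = "📦🐈 Fast, reliable, and secure dependency management.";
const garbled = Buffer.from(original, "utf8").toString("latin1");
const repaired = Buffer.from(garbled, "latin1").toString("utf8");

console.log(garbled.slice(0, 8)); // the two emoji as 8 one-byte characters
console.log(repaired === original);
```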

Action Required: Fix Renovate Configuration

There is an error with this repository's Renovate configuration that needs to be fixed. As a precaution, Renovate will stop PRs until it is resolved.

Error type: Preset name not found within published preset config (:prConcurrentLimit4). Note: this is a nested preset so please contact the preset author if you are unable to fix it yourself.
