greencommons / commons
Home Page: https://greencommons.net
Currently, the front-runner for how to do this is to use WebMock to mock any request to http://localhost:9200.
When running rake db:seed for the first time on production, Elasticsearch threw an error about a document not being found.
After running it once, the error didn't happen again. I'd like to reproduce the issue consistently and make sure it doesn't happen again. I suspect it only occurs on the first run, before the documents have been created in ES, but I'm not sure.
Occasionally, the rake task will fail because it can't find the class Resource. This can be solved by adding :environment to the rake task definition.
For example: https://cl.ly/1O2545251725
The titles in those summary cards should link to the view page for the corresponding record. Let me know if you need any more clarification on this one.
I bet CloudSearch already supports this. Maybe we should just create a set of api docs/examples to show things like or and not and whatnot(?).
Should error out if not installed, similar to foreman/heroku here: https://github.com/greencommons/commons/blob/master/bin/setup#L23-L28
I've had the issue both in local and in production. Anytime I run Resource.import from the console or from a rake task, I'm getting Faraday::ConnectionFailed: Broken pipe.
D, [2017-01-13T13:35:53.791263 #4] DEBUG -- : Resource Load (1112.1ms) SELECT "resources".* FROM "resources" ORDER BY "resources"."id" ASC LIMIT $1 [["LIMIT", 1000]]
rake aborted!
Faraday::ConnectionFailed: Broken pipe
I'm not sure what's wrong, and I haven't looked into it yet since it doesn't prevent the application from working (it only makes the reset_resource_index task fail).
I'm planning to do some research and see if I can figure it out. Any insights appreciated!
We'd like to start building out an external-facing JSON RESTful API. Let's start the discussion & development of said API for the Search endpoint. We're choosing this endpoint because it's probably the most useful, and also won't require any kind of authentication mechanism.
These requirements are presented in order of priority:
GET https://greencommons.herokuapp.com/api/v1/search.json?q=<query>
Feel free to submit each of the requirements in separate PRs (smaller is better!). Also if you think we're not thinking of anything, please raise the issue here.
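As a starting point for discussion, here is a minimal sketch of how a client might build the request URL for the endpoint above. Only the URL and the q parameter come from this issue; the helper name search_uri is hypothetical.

```ruby
require 'uri'

# Hypothetical helper: builds the search URL described in this issue.
# URI.encode_www_form takes care of escaping the query string.
def search_uri(query)
  uri = URI('https://greencommons.herokuapp.com/api/v1/search.json')
  uri.query = URI.encode_www_form(q: query)
  uri
end

puts search_uri('sustainable development')
```

Since the endpoint needs no authentication, a plain GET against this URL should be all a client has to do.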
As mentioned in PR #117, we need to have a unique column containing the date that will be used to sort the records when different models are returned.
For all of them, we can use the date present in the metadata if there's one, or fall back to the created_at date. I guess we could call the field date or published_at maybe?
Let me know if you want me to proceed with this.
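To make the proposal concrete, here is a rough sketch of the fallback logic, assuming the metadata is a hash with a "date" entry holding a date string. The helper name sortable_date is hypothetical, and in the real app created_at would be a timestamp rather than a Date.

```ruby
require 'date'

# Hypothetical helper: prefer the metadata date, fall back to created_at.
# Assumes metadata is a Hash that may carry a "date" (or :date) string.
def sortable_date(metadata, created_at)
  raw = metadata['date'] || metadata[:date]
  return created_at if raw.nil? || raw.to_s.strip.empty?

  Date.parse(raw.to_s)
rescue ArgumentError
  # Unparseable metadata dates also fall back to created_at.
  created_at
end
```

The result could then be stored in the new column (date or published_at) whenever a record is saved.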
groups_users table with appropriate indexes
GroupsUser model with appropriate relationships
I was happily running the import when I (sadly) ran into this issue:
NoMethodError: undefined method `fetch' for #<String:0x007f9cc589aa80>
Here are the logs:
2017-01-13T14:39:52.627365+00:00 app[run.3772]: "Title: The Dominant Animal"
2017-01-13T14:39:52.627427+00:00 app[run.3772]: "Metadata: "
2017-01-13T14:39:52.627659+00:00 app[run.3772]: {
2017-01-13T14:39:52.627672+00:00 app[run.3772]: "creators" => "Paul R. Ehrlich",
2017-01-13T14:39:52.627673+00:00 app[run.3772]: "date" => "2012-06-15",
2017-01-13T14:39:52.627674+00:00 app[run.3772]: "publisher" => "Island Press"
2017-01-13T14:39:52.627674+00:00 app[run.3772]: }
2017-01-13T14:39:52.631014+00:00 app[run.3772]: "Content: \n\n\n\n\tThe Dominant Animal\n\t\n\n\n\n\n\n\n\n\n\n\n\n\tThe Dominant Animal\n\t\n\n\n\n\n\n\n\n\n\n\n\tThe Dominant Animal\n\t\n\n\n\n..."
2017-01-13T14:39:52.682561+00:00 app[run.3772]: "Error opening epub: unable to locate end-of-central-directory record"
2017-01-13T14:39:52.682821+00:00 app[run.3772]: rake aborted!
2017-01-13T14:39:52.686129+00:00 app[run.3772]: NoMethodError: undefined method `fetch' for #<String:0x007f9cc589aa80>
And the code responsible for the error:
require 'epub/parser'

class TransformEpub
  def process(input_epub)
    @input_epub = input_epub
    {
      title: title,
      content: PageContentExtractor.new(parsed_book).start,
      metadata: {
        creators: creators,
        date: date,
        publisher: publisher,
      }
    }
  rescue => error
    ap "Error opening epub: #{error}"
  end
  # ...
end

class CreateNewResourceRecord
  # Hidden
  def title
    attributes.fetch(:title)
  end
end
When an error is raised, the transformer will return a string instead of a hash. I'm not sure why it happens exactly yet since the transformer shouldn't pass anything if an error occurs.
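One possible explanation, sketched below: in Ruby, the value of a rescue clause becomes the method's return value, so if ap returns its argument (an assumption here, which I believe holds outside a console), process returns the error string. The names fake_ap, process_unsafe and process_safe are made up for this illustration.

```ruby
# Stand-in for awesome_print's `ap`, ASSUMED to print and return its argument.
def fake_ap(object)
  puts object.inspect
  object
end

# Mirrors the shape of TransformEpub#process: the rescue clause's last
# expression becomes the return value when an error is raised.
def process_unsafe(input)
  raise ArgumentError, 'unable to locate end-of-central-directory record' if input.nil?

  { title: input }
rescue => error
  fake_ap "Error opening epub: #{error}"
end

# One possible fix: log the error, then return nil explicitly so callers
# can skip the record instead of calling fetch on a String.
def process_safe(input)
  raise ArgumentError, 'unable to locate end-of-central-directory record' if input.nil?

  { title: input }
rescue => error
  warn "Error opening epub: #{error}"
  nil
end
```

With the second form, the caller can check for nil and move on to the next file.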
Reduces the number of places we need to update when we update a model.
Please try to follow a similar structure:
Wireframes: https://invis.io/BH8GKH38D#/185307862_RESOURCE_-_View
We'd like to build out the base UI for the Resource page.
Most of it should be straightforward, but for sake of specificity I'd like to clarify a few things:
The following code in the Resource model might not be generating the index name we expect.
class Resource < ApplicationRecord
  include Elasticsearch::Model
  index_name SearchIndex.index_name(self)
Since self here is the class, and given the following code in the SearchIndex class:
def self.index_name(record)
  "#{record.class.name.pluralize.downcase}-#{Rails.env}"
end
We end up with the following index: classes-development (or classes-production).
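The behaviour can be reproduced outside Rails with the sketch below. naive_pluralize is only a stand-in for ActiveSupport's pluralize, the environment is passed in explicitly instead of reading Rails.env, and the "fixed" helper is one possible approach rather than the project's actual fix.

```ruby
# Naive stand-in for ActiveSupport's String#pluralize, good enough for this demo only.
def naive_pluralize(word)
  word.end_with?('s') ? "#{word}es" : "#{word}s"
end

class Resource; end

# Mirrors the reported code: calling .class on a class returns Class.
def buggy_index_name(record, env)
  "#{naive_pluralize(record.class.name).downcase}-#{env}"
end

# Hypothetical fix: accept either a model class or an instance.
def index_name(record_or_class, env)
  klass = record_or_class.is_a?(Class) ? record_or_class : record_or_class.class
  "#{naive_pluralize(klass.name).downcase}-#{env}"
end

puts buggy_index_name(Resource, 'development')
puts index_name(Resource, 'development')
```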
Also, in the reset_resource_index rake task, we are still using the old index name instead of relying on the new method to generate it:
namespace :elasticsearch do
  desc 'Deletes the "resource" index and regenerates it with all records currently in Resource'
  task reset_resource_index: :environment do
    client = Elasticsearch::Client.new(
      url: ENV.fetch('BONSAI_URL', 'http://localhost:9200'), log: true
    )
    client.indices.delete index: 'resources' # <--- HERE
    Resource.__elasticsearch__.create_index!
    Resource.import
  end
end
Based on the wireframes for the Group View page, please start coding up a skeleton for the Group view.
Some points about functionality:
A Group can have many Lists. Each of these Lists will have many resources -- anyone in a group can modify a List that belongs to the Group.
List owned by the Group. This might require creating some kind of helper method / scope on Group to access all of its Resources through its various Lists. Open to discussion on the best way to approach this.
Currently, Sidekiq fails to add new resources to the Elasticsearch index in production with the following error:
Elasticsearch::Transport::Transport::Errors::BadRequest: [400]
{"error":
{
"root_cause":
[{"type":"mapper_parsing_exception","reason":"object mapping for [content] tried to parse field [content] as object, but found a concrete value"}],
"type":"mapper_parsing_exception","reason":"object mapping for [content] tried to parse field [content] as object, but found a concrete value"
},
"status":400
}
This might be related to issue 97 (or maybe not).
We run CircleCI and HoundCI but we currently don't have anything checking Rubocop offenses.
It would be great to have Pronto and Pronto-Rubocop checking and leaving the offenses as comments.
/discuss
(And yes, it's totally because I forgot to run Rubocop for the latest PR I merged.)
Please run the rake task rake etl:import_s3_epub on production (https://greencommons.herokuapp.com).
To do it, I imagine we need to do the following steps:
List and Resource objects
We have the concept of "Summary Cards" in a number of places in the wireframes.
Examples:
Essentially, we want to create reusable components that we can drop into anywhere on the site to display a Group, List, or Resource. We'd like them to have a similar structure, but differ based on the object they're representing.
Initially, we don't really have a ton of info or metadata to display for a Group summary card. Please keep it in a good place in the views / directory structure, and make it easy to pass a Group object to it so that it will know what info to populate in the view.
This is an example of a Group summary card: https://cl.ly/122E0e0d001e
We want to build out the Search results page: https://invis.io/BH8GKH38D#/185307864_SEARCH
This will involve a few different things:
Group, Resource, and List to begin with. The search page should return a list of all items combined together.
@dweinberger wants to assign a "Stackscore" or "Relevancy" number to every search result we get. This can be a number between 0 and 100 or something similar, but ultimately is a "smart" determination of what an item's relevancy is to what you've searched for. Initially, we can make this very simple, but we should build a stub or foundation to expand and make this more sophisticated as we go along. I'm happy to walk through this via an audio / video chat.
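As one very simple stub for the 0 to 100 number, the raw scores returned by the search backend could be scaled relative to the best hit. This is only an assumption about what "simple" might mean here; the helper name relevancy_scores and the choice of normalization are hypothetical.

```ruby
# Hypothetical stub: scale raw search scores to 0..100 relative to the best hit.
# raw_scores is assumed to be an array of non-negative numbers, one per result.
def relevancy_scores(raw_scores)
  max = raw_scores.max
  return raw_scores.map { 0 } if max.nil? || max <= 0

  raw_scores.map { |score| ((score / max.to_f) * 100).round }
end
```

A smarter determination could later replace the body of this method without changing how the view consumes the numbers.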
Not pictured in the wireframes, we want to include a small section below the search results that provides suggestions on other things a user may be interested in. Initially, we can keep it simple, and the recommendations can be rather uninteresting, but it'd be good to set up a way to see "related items" based on a search term or an actual object in the database. This is somewhat similar in idea to "Stackscore / Relevancy," so it's worth discussing what makes sense here.
NOTE: This is a rather large task, so if it makes sense, feel free to break this up into smaller issues and re-order as you see fit.
layouts/application.html.erb
layouts/mailer.html.erb
devise/registrations/new.html.erb
devise/sessions/new.html.erb
devise/shared/_links.html.erb
pages/style.html.erb
shared/_alerts.html.erb
shared/_nav.html.erb
Right now, resources are created as private by default. In the ETL, we should make them public.
Originally discussed here: #25 (comment)
If we set these callbacks to run asynchronously (e.g. spin up a Sidekiq job to change the search index), we can have a bit more control over when we actually update the index.
For example, we could handle errors more gracefully. Or just not execute the job in development or test environments unless some ENV is set.
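A minimal sketch of the environment guard described above. The method name matches the search_index_callbacks_enabled? check used elsewhere in the codebase, but this body and the ENABLE_SEARCH_INDEX variable name are assumptions made for illustration.

```ruby
# Hypothetical guard: always index outside development/test; inside them,
# index only when an opt-in environment variable is set.
# ENABLE_SEARCH_INDEX is a made-up variable name.
def search_index_callbacks_enabled?(rails_env, env = ENV)
  return true unless %w[development test].include?(rails_env)

  env['ENABLE_SEARCH_INDEX'] == 'true'
end
```

The callback would then enqueue the indexing job only when this returns true, and log a warning otherwise.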
The name "Bhaskar Chakravorti" is in exactly one record of the json data provided thus far, namely, in the contents of a guardian article from the data file: guardian-sustainability_OR_sustainable_AND_environment-1.json; that article's title is "Sustainable business and sustainable development: two sides of the same coin".
The single match that the site returns is for a different article, namely: the guardian article titled "Davos 2013: new vision for agriculture is old news for farmers | David Nally and Bhaskar Vira" that comes from data file: guardian-sustainability_OR_sustainable_AND_environment-22.json
Try it, here:
https://greencommons.herokuapp.com/search?utf8=✓&query=Bhaskar+Chakravorti
Alternatively, a search for "Sustainable business and sustainable development: two sides of the same coin" does bring up the right record, so at least for this record, it seems that matching based on title works correctly but matching based on content doesn't.
There are currently no complete feature tests to ensure that the various parts of the Group Functionality work as expected.
Here are the happy path scenarios that have to be implemented as acceptance tests:
scenario 'users can create a group'
scenario 'users can update a group'
scenario 'users can add members'
scenario 'users can remove members'
scenario 'users can make other members admins'
scenario 'users can remove admin from other members'
Some of these scenarios might get merged together to have faster tests.
Can you add an example of the group summary card (and a code snippet of its usage in Rails) on the /style page? I'd like to keep an up-to-date list of all the reusable components we create.
Sometimes when running rake db:reset the index doesn't get reset. Would be good to have a rake task to handle this. All it needs to do is:
client = Elasticsearch::Client.new log: true
client.indices.delete index: 'resources'
Groups can have many admins and many regular members. Admin members are allowed to add/remove people from the group or grant admin rights to the group, but regular members cannot. For the time being, that's the only difference.
/groups/id/members
We need to update the filtering system to be more dynamic and use all the resource types defined in the Resource model.
I ran into this issue (after previously manually deleting record 100): https://cl.ly/2x2c1r1I0W26
So that pic is showing errors where the job is failing because it's trying to find record 100 but it doesn't exist. That's a bug - we are asking ES to remove a record from the index which does not exist in the database anymore :(
I'm opening an issue for this. Solution is probably a separate delete job, something like "don't try to rehydrate the AR record when you are deleting it - just use what the job was sent."
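A rough sketch of that idea: the delete job receives plain values (index name, type, id) captured before the record was destroyed, so it never has to load the record again. FakeSearchClient and RemoveFromIndexJob are hypothetical names; a real job would presumably be a Sidekiq worker calling the Elasticsearch client.

```ruby
# Hypothetical fake of the search client; it only records delete calls
# so the sketch can run without a cluster.
class FakeSearchClient
  attr_reader :deleted

  def initialize
    @deleted = []
  end

  def delete(index:, type:, id:)
    @deleted << { index: index, type: type, id: id }
  end
end

# Hypothetical delete job: works only with the values it was sent and
# never looks the record up in the database.
class RemoveFromIndexJob
  def initialize(client)
    @client = client
  end

  def perform(index_name, type, id)
    @client.delete(index: index_name, type: type, id: id)
  end
end

client = FakeSearchClient.new
RemoveFromIndexJob.new(client).perform('resources-development', 'resource', 100)
```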
Based on the discussion, we should be able to handle metadata date containing only a year for resources.
For reference, the current code looks like this.
This is a task to add the "index" or "View all" view for the Group model.
Group "summary cards" to display each group.
I think we should separate the create/update callback here and start using the update_document from there. I'm worried about duplicates and was able to produce a bug by updating a record, searching for its old name and still finding it even though it should have been de-indexed.
It would also be interesting to unify the way we interact with ES by not using the Elasticsearch::Model.client directly and instead relying on the methods provided by elasticsearch-rails.
delete_document could replace the three lines below for example.
def remove
  if search_index_callbacks_enabled?
    Elasticsearch::Model.client.delete(
      index: model_name.constantize.index_name,
      type: model_name.downcase,
      id: id,
    )
  else
    log_callback_warning
  end
end
@ptrikutam Let me know if that's something you'd like me to work on or not.
There are currently no permission checks in the GroupsController for update and destroy. These actions should only be performed by group admins, and therefore, we need to ensure that the current_user is an admin before allowing the changes.
FYI, for me, 16 of the Island Press epubs have parse issues using ruby 'epub/parser'.
Three of these files are legitimately bad (corresponding to the message "unable to locate end-of-central-directory record"), the other 13 epubs can actually parse correctly (as witnessed elsewhere).
"Error opening epub 9781597265171.epub: unable to locate end-of-central-directory record"
"Error opening epub 9781597265935.epub: unable to locate end-of-central-directory record"
"Error opening epub 9781610914529.epub: undefined method `refines=' for #<EPUB::Metadata::UnsupportedModel:0x007fa704b56e10>\nDid you mean? readlines"
"Error opening epub 9781610914574.epub: undefined method `refines=' for #<EPUB::Metadata::UnsupportedModel:0x007fa703b02ac8>\nDid you mean? readlines"
"Error opening epub 9781610914802.epub: undefined method `refines=' for #<EPUB::Metadata::UnsupportedModel:0x007fa703949920>\nDid you mean? readlines"
"Error opening epub 9781610915007.epub: undefined method `refines=' for #<EPUB::Metadata::UnsupportedModel:0x007fa70301ff98>\nDid you mean? readlines"
"Error opening epub 9781610915403.epub: unable to locate end-of-central-directory record"
"Error opening epub 9781610915762.epub: undefined method `refines=' for #<EPUB::Metadata::UnsupportedModel:0x007fa7049005f8>\nDid you mean? readlines"
"Error opening epub 9781610915861.epub: undefined method `refines=' for #<EPUB::Metadata::UnsupportedModel:0x007fa704ab8c88>\nDid you mean? readlines"
"Error opening epub 9781610916639.epub: undefined method `refines=' for #<EPUB::Metadata::UnsupportedModel:0x007fa7048522f0>\nDid you mean? readlines"
"Error opening epub 9781610916677.epub: undefined method `refines=' for #<EPUB::Metadata::UnsupportedModel:0x007fa704ac47e0>\nDid you mean? readlines"
"Error opening epub 9781610916684.epub: undefined method `refines=' for #<EPUB::Metadata::UnsupportedModel:0x007fa703a8b888>\nDid you mean? readlines"
"Error opening epub 9781610916691.epub: undefined method `refines=' for #<EPUB::Metadata::UnsupportedModel:0x007fa704a71888>\nDid you mean? readlines"
"Error opening epub 9781610916707.epub: undefined method `refines=' for #<EPUB::Metadata::UnsupportedModel:0x007fa70387ad28>\nDid you mean? readlines"
"Error opening epub 9781610916714.epub: undefined method `refines=' for #<EPUB::Metadata::UnsupportedModel:0x007fa703a90270>\nDid you mean? readlines"
"Error opening epub 9781610916745.epub: undefined method `refines=' for #<EPUB::Metadata::UnsupportedModel:0x007fa7028c6940>\nDid you mean? readlines"
It looks like the # of results displayed on the top right is actually the number of results that are currently visible on the page itself, not the total number of results available. This should display the full number of results returned for a query.
Currently, if one of the summary cards is too big, it will create an empty space like in the screenshot below:
We can fix it by adding a .row class every 2 records. That also means we cannot display more cards per row in the future (but one card per row for the mobile version is fine).
Let me know if you want me to proceed with this.
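For illustration, grouping the records two at a time is a one-liner with Enumerable#each_slice; each inner array would then be rendered inside its own .row element. The helper name rows_of_cards is hypothetical.

```ruby
# Hypothetical helper: split the records into rows of two cards each.
def rows_of_cards(cards, per_row = 2)
  cards.each_slice(per_row).to_a
end

rows_of_cards(%w[group_a group_b group_c]).each do |row|
  puts row.join(' | ')
end
```

An odd number of records simply leaves the last row with a single card.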
2016-12-09 09:57:22 -0800: < {"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"resources","index_uuid":"_na_","index":"resources"}],"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"resources","index_uuid":"_na_","index":"resources"},"status":404}
2016-12-09 09:57:22 -0800: [404] {"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"resources","index_uuid":"_na_","index":"resources"}],"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"resources","index_uuid":"_na_","index":"resources"},"status":404}
rake aborted!
Elasticsearch::Transport::Transport::Errors::NotFound: [404] {"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"resources","index_uuid":"_na_","index":"resources"}],"type":"index_not_found_exception","reason":"no such index","resource.type":"index_or_alias","resource.id":"resources","index_uuid":"_na_","index":"resources"},"status":404}
/Users/pavan/.rvm/gems/ruby-2.3.2@commons/gems/elasticsearch-transport-5.0.0/lib/elasticsearch/transport/transport/base.rb:201:in `__raise_transport_error'
/Users/pavan/.rvm/gems/ruby-2.3.2@commons/gems/elasticsearch-transport-5.0.0/lib/elasticsearch/transport/transport/base.rb:312:in `perform_request'
/Users/pavan/.rvm/gems/ruby-2.3.2@commons/gems/elasticsearch-transport-5.0.0/lib/elasticsearch/transport/transport/http/faraday.rb:20:in `perform_request'
/Users/pavan/.rvm/gems/ruby-2.3.2@commons/gems/elasticsearch-transport-5.0.0/lib/elasticsearch/transport/client.rb:128:in `perform_request'
/Users/pavan/.rvm/gems/ruby-2.3.2@commons/gems/elasticsearch-api-5.0.0/lib/elasticsearch/api/namespace/common.rb:21:in `perform_request'
/Users/pavan/.rvm/gems/ruby-2.3.2@commons/gems/elasticsearch-api-5.0.0/lib/elasticsearch/api/actions/indices/delete.rb:44:in `delete'
/Users/pavan/Development/delete/commons/lib/tasks/elasticsearch.rake:7:in `block (2 levels) in <top (required)>'
/Users/pavan/.rvm/gems/ruby-2.3.2@commons/gems/rake-11.3.0/exe/rake:27:in `<top (required)>'
/Users/pavan/.rvm/gems/ruby-2.3.2@commons/bin/ruby_executable_hooks:15:in `eval'
/Users/pavan/.rvm/gems/ruby-2.3.2@commons/bin/ruby_executable_hooks:15:in `<main>'
Tasks: TOP => elasticsearch:reset_resource_index
(See full trace by running task with --trace)
Probably just need to rescue the error and move to the next step.
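A sketch of that rescue, runnable without a cluster. IndexNotFound stands in for Elasticsearch::Transport::Transport::Errors::NotFound (the class named in the trace above), and FakeIndices is a hypothetical fake of client.indices.

```ruby
# Stand-in for Elasticsearch::Transport::Transport::Errors::NotFound.
class IndexNotFound < StandardError; end

# Hypothetical fake of client.indices, so the sketch runs without Elasticsearch.
class FakeIndices
  def initialize(existing)
    @existing = existing
  end

  def delete(index:)
    raise IndexNotFound, "no such index: #{index}" unless @existing.delete(index)

    true
  end
end

# Deletes the index if it exists; a missing index is not treated as a failure.
def delete_index_if_present(indices, name)
  indices.delete(index: name)
  true
rescue IndexNotFound
  # Nothing to delete (e.g. first run); carry on with create_index! and import.
  false
end
```

In the real rake task, the rescue would wrap the client.indices.delete call and let the task continue to the create and import steps.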
I've provisioned a free tier of Heroku Redis for now. Need to activate the worker and test the indexer jobs are working properly.
There's no need to test the registration / auth controllers, but in case we've messed something up on the views end we should add a few integration tests to verify the auth flow works as we expect.