
fio-de-ariadne's Introduction

Fio de Ariadne


This is a proof of concept for a system that scrapes and structures data about missing children in Brazil. Fio de Ariadne requires Python 3.7+ and Poetry.

Running Fio de Ariadne locally (without Docker)

Installing the dependencies

$ poetry install

To use the dependencies, you need to enter the virtualenv that Poetry created:

$ poetry shell

Use exit to leave the virtualenv whenever you want.

Configuring the Django application

Run this command and follow the instructions:

$ createnv

Scraping the data

These commands only need to be run once. They create the database structure, scrape the data, and save everything to that database:

$ python manage.py migrate
$ python manage.py crawl

You can also create a user to access the control panel:

$ python manage.py createsuperuser

Starting the web application

Run this command and then open localhost:8000:

$ python manage.py runserver

Running Fio de Ariadne with Docker

No configuration is needed to run Fio de Ariadne in development mode.

These commands only need to be run once (as explained earlier).

$ docker-compose run --rm web python manage.py migrate
$ docker-compose run --rm web python manage.py crawl
$ docker-compose run --rm web python manage.py createsuperuser

To start the web application at 0.0.0.0:8000, use:

$ docker-compose up

Web API

GET /api/kid

Lists the children in our database.

Accepts as search parameters (exact match) URL parameters named after the fields of the web.core.models.Kid model.

Example

GET /api/kid?eyes=Pretos&hair=Castanho escuro lists only the children:

  • whose eyes field has the exact value "Pretos" (the match is case-sensitive)
  • and whose hair field has the exact value "Castanho escuro" (also case-sensitive)
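
As a minimal sketch of how a client might build the request above, the snippet below assembles the query string with the standard library. The helper name build_kid_url is hypothetical, and the base URL assumes the local development server described earlier.

```python
from urllib.parse import urlencode


def build_kid_url(base_url: str, **filters: str) -> str:
    """Build a GET /api/kid URL with exact-match filters as query parameters."""
    endpoint = f"{base_url.rstrip('/')}/api/kid"
    if not filters:
        return endpoint
    # urlencode escapes spaces and non-ASCII characters in the values.
    return f"{endpoint}?{urlencode(filters)}"


url = build_kid_url("http://localhost:8000", eyes="Pretos", hair="Castanho escuro")
print(url)
```

Because the match is exact and case-sensitive, the values must be passed exactly as they are stored; the helper only takes care of URL escaping.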

Contributing

We need help

You can contribute code improvements and use a few quality checks:

$ mypy crawler
$ pytest

fio-de-ariadne's People

Contributors

cuducos, danielslz, eltonarodrigues, glairtonsantos, jairojair, juniorcarvalho

fio-de-ariadne's Issues

Adapt the scraper to collect photos

After @juniorcarvalho's contribution (#23) and once the S3 keys question (#21) is resolved, we will have the infrastructure to store photos. So we need to adapt the scraper to collect them.

Since #23 only fails in production (that is, it works in development), I think it is worth starting to implement this side of the scraper based on that branch.

Create a Mother Class for Spiders params

Hey guys, I saw the website you are scraping, and if you change the URL to another state, it still works and returns the missing children from that state.

I think it would be nice to have a mother class where we just set the state in our daughter classes, so the base URL would be universal.
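
A rough sketch of that idea in plain Python follows. The class names and the URL pattern are hypothetical placeholders (the real site's URL is not given in this issue), and the sketch deliberately leaves out any scraping framework.

```python
class StateSpider:
    """Hypothetical base ("mother") class: subclasses only set `state`."""

    # Placeholder URL pattern, not the real site being scraped.
    url_template = "https://example.org/desaparecidos/{state}"
    state = None  # two-letter state code, set by each subclass

    def start_url(self) -> str:
        if not self.state:
            raise ValueError("subclasses must define `state`")
        return self.url_template.format(state=self.state.lower())


class ParanaSpider(StateSpider):
    state = "PR"


class SantaCatarinaSpider(StateSpider):
    state = "SC"


print(ParanaSpider().start_url())
```

With this layout, supporting a new state would mean adding a two-line subclass, provided that state's pages really follow the same structure.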

Container

Hello,

I am going to create a container to run the code on my computer. I can open a pull request that sets up a container for the whole repository. What do you think?

Regards,
Humberto

Improving data quality: natural language processing?

The scraping logic depends on a textual structure adopted by Paraná, so the chances of the same script working for states that follow no pattern at all in their posts are very low.

Does it make sense, and is it feasible, to consider abandoning the regular expression approach and testing NLP? Do we have enough data (categorized and uncategorized) for that?

Create S3 keys for a single bucket only

I ran into this problem at the end of the review of PR #21. Do we have any AWS experts here?

TL;DR: I cannot create an (IAM) account that only allows access to one of my S3 buckets.

Context: in my personal account I have 2 buckets; one is my personal website and the other I want to use for our project here, Fio de Ariadne.

Problem: I need to create API keys so that Fio de Ariadne's Django can store static and media files. But since more people in this project have access to the production environment, I would not want them to be able (accidentally or not) to touch the S3 bucket of my personal website.

What I tried

As far as I understand AWS, the way to go would be to create a policy, attach that policy to a group, and then create a user in that group, all through IAM. I did all that, but it did not work. When Django tries to upload files, it fails with an error saying this API key does not have permission to create objects in S3:

botocore.exceptions.ClientError: An error occurred (AccessDenied) when calling the PutObject operation: Access Denied

My access policy looks like this:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowUserToReadWriteObjectData",
            "Action": [
                "s3:PutObject"
            ],
            "Effect": "Allow",
            "Resource": [
                "arn:aws:s3:::fio-de-ariadne/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::fio-de-ariadne"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::fio-de-ariadne/*"
            ]
        }
    ]
}

I also tried without that first block (AllowUserToReadWriteObjectData) and it did not work. In both cases, the AWS policy simulator shows the same error, Implicitly denied (no matching statements), with this description (which does not help much, since it basically says this test would be the exception):

This action belongs to a service that supports special access control mechanisms in addition to resource-based policies, such as S3 ACLs or Glacier vault lock policies. The policy simulator does not support these mechanisms, so the results can differ from your production environment.

Any tips?

Thank you very much,
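
As a sanity check on the policy quoted in this issue, here is a deliberately simplified local matcher. It only looks at Allow statements and wildcard patterns; it ignores Deny statements, conditions, ACLs, and bucket policies, so it is not a substitute for AWS's own evaluation. The function names, the object key, and the other-bucket name are hypothetical.

```python
from fnmatch import fnmatchcase


def as_list(value):
    """IAM allows a single string or a list for Action and Resource."""
    return [value] if isinstance(value, str) else list(value)


def is_allowed(policy: dict, action: str, resource: str) -> bool:
    """Simplified check: True if any Allow statement matches action and resource."""
    for statement in policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        actions = as_list(statement.get("Action", []))
        resources = as_list(statement.get("Resource", []))
        if any(fnmatchcase(action, a) for a in actions) and any(
            fnmatchcase(resource, r) for r in resources
        ):
            return True
    return False


# The last two statements of the policy quoted in the issue.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": ["arn:aws:s3:::fio-de-ariadne"],
        },
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
            "Resource": ["arn:aws:s3:::fio-de-ariadne/*"],
        },
    ],
}

key = "arn:aws:s3:::fio-de-ariadne/media/photo.jpg"  # hypothetical object key
print(is_allowed(policy, "s3:PutObject", key))
print(is_allowed(policy, "s3:PutObject", "arn:aws:s3:::other-bucket/index.html"))
print(is_allowed(policy, "s3:PutObjectAcl", key))
```

Under this simplified model the statements do cover PutObject on objects in the fio-de-ariadne bucket and nothing in other buckets, which suggests the denial comes from something the check does not model. One possibility, not confirmed anywhere in this thread, is an upload that also sets an object ACL, since that requires s3:PutObjectAcl, which the policy does not list.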

Improving data quality: cleaning

Today we scrape some "dirty" data such as Cor do Cabelo: Preto - crespo (hair color: black - curly) instead of just Preto. We could standardize this at capture time and in the database (using choices would make organizing, filtering, visualizing, and manipulating the data much easier).
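
One way this cleaning step might look, as a sketch only: the list of accepted colors below is hypothetical (the real values would have to come from the scraped data), and the function name is made up for illustration.

```python
from typing import Optional

# Hypothetical set of accepted values; the real list would come from the data.
HAIR_COLORS = ("Preto", "Castanho escuro", "Castanho claro", "Loiro", "Ruivo")


def clean_hair_color(raw: str) -> Optional[str]:
    """Keep only the color part of values such as "Preto - crespo"."""
    color = raw.split(" - ", 1)[0].strip()
    for known in HAIR_COLORS:
        if color.casefold() == known.casefold():
            return known
    return None  # unknown value: leave it for manual review


print(clean_hair_color("Preto - crespo"))
```

Returning None for unknown values, rather than guessing, keeps unexpected inputs visible so the accepted list can be extended deliberately.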

Update the API

I believe that after #14 the instructions in the API documentation no longer work (the field values are now integers). We need to improve the “API of the API”, that is, allow searching by the human-readable value Preto instead of the number.
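
A sketch of the translation layer this issue asks for, assuming the fields use Django-style choices of (stored integer, human label). The concrete numbers, labels, and the FIELD_CHOICES mapping are hypothetical, not taken from the project's models.

```python
# Hypothetical choices in Django's (stored value, human label) format.
HAIR_CHOICES = ((1, "Preto"), (2, "Castanho escuro"), (3, "Loiro"))
FIELD_CHOICES = {"hair": HAIR_CHOICES}


def to_stored_filters(params: dict) -> dict:
    """Translate human labels in query parameters to stored integers."""
    filters = {}
    for field, label in params.items():
        choices = FIELD_CHOICES.get(field)
        if choices is None:
            filters[field] = label  # not a choices field: pass through
            continue
        by_label = {text: value for value, text in choices}
        if label not in by_label:
            raise ValueError(f"unknown value {label!r} for field {field!r}")
        filters[field] = by_label[label]
    return filters


print(to_stored_filters({"hair": "Castanho escuro"}))
```

The resulting dictionary could then be passed to the existing exact-match filtering, so clients keep searching by Preto while the database keeps integers.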

Problems installing with Docker

Hi all, when I run the program with Docker I get errors from two commands: docker-compose run --rm web python manage.py migrate and docker-compose run --rm web python manage.py createsuperuser

Am I running something wrong, or is it a problem with the code?

The outputs follow below:

$ docker-compose run --rm web python manage.py migrate
Starting fio-de-ariadne_db_1 ... done
Creating fio-de-ariadne_web_run ... done
Traceback (most recent call last):
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/db/backends/base/base.py", line 220, in ensure_connection
    self.connect()
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/db/backends/base/base.py", line 197, in connect
    self.connection = self.get_new_connection(conn_params)
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/db/backends/postgresql/base.py", line 185, in get_new_connection
    connection = Database.connect(**conn_params)
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/psycopg2/__init__.py", line 127, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: FATAL:  the database system is starting up


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "manage.py", line 21, in <module>
    main()
  File "manage.py", line 17, in main
    execute_from_command_line(sys.argv)
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/core/management/__init__.py", line 401, in execute_from_command_line
    utility.execute()
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/core/management/__init__.py", line 395, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/core/management/base.py", line 328, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/core/management/base.py", line 369, in execute
    output = self.handle(*args, **options)
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/core/management/base.py", line 83, in wrapped
    res = handle_func(*args, **kwargs)
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/core/management/commands/migrate.py", line 86, in handle
    executor = MigrationExecutor(connection, self.migration_progress_callback)
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/db/migrations/executor.py", line 18, in __init__
    self.loader = MigrationLoader(self.connection)
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/db/migrations/loader.py", line 49, in __init__
    self.build_graph()
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/db/migrations/loader.py", line 212, in build_graph
    self.applied_migrations = recorder.applied_migrations()
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/db/migrations/recorder.py", line 76, in applied_migrations
    if self.has_table():
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/db/migrations/recorder.py", line 56, in has_table
    return self.Migration._meta.db_table in self.connection.introspection.table_names(self.connection.cursor())
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/db/backends/base/base.py", line 260, in cursor
    return self._cursor()
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/db/backends/base/base.py", line 236, in _cursor
    self.ensure_connection()
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/db/backends/base/base.py", line 220, in ensure_connection
    self.connect()
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/db/backends/base/base.py", line 220, in ensure_connection
    self.connect()
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/db/backends/base/base.py", line 197, in connect
    self.connection = self.get_new_connection(conn_params)
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/utils/asyncio.py", line 26, in inner
    return func(*args, **kwargs)
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/db/backends/postgresql/base.py", line 185, in get_new_connection
    connection = Database.connect(**conn_params)
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/psycopg2/__init__.py", line 127, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
django.db.utils.OperationalError: FATAL:  the database system is starting up
$ docker-compose run --rm web python manage.py createsuperuser
Creating fio-de-ariadne_web_run ... done

You have 24 unapplied migration(s). Your project may not work properly until you apply the migrations for app(s): admin, auth, contenttypes, core, sessions.
Run 'python manage.py migrate' to apply them.

Traceback (most recent call last):
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/db/backends/utils.py", line 86, in _execute
    return self.cursor.execute(sql, params)
psycopg2.errors.UndefinedTable: relation "auth_user" does not exist
LINE 1: ...user"."is_active", "auth_user"."date_joined" FROM "auth_user...
                                                             ^


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "manage.py", line 21, in <module>
    main()
  File "manage.py", line 17, in main
    execute_from_command_line(sys.argv)
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/core/management/__init__.py", line 401, in execute_from_command_line
    utility.execute()
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/core/management/__init__.py", line 395, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/core/management/base.py", line 328, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/contrib/auth/management/commands/createsuperuser.py", line 79, in execute
    return super().execute(*args, **options)
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/core/management/base.py", line 369, in execute
    output = self.handle(*args, **options)
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/contrib/auth/management/commands/createsuperuser.py", line 100, in handle
    default_username = get_default_username()
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/contrib/auth/management/__init__.py", line 140, in get_default_username
    auth_app.User._default_manager.get(username=default_username)
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/db/models/manager.py", line 82, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/db/models/query.py", line 411, in get
    num = len(clone)
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/db/models/query.py", line 258, in __len__
    self._fetch_all()
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/db/models/query.py", line 1261, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/db/models/query.py", line 57, in __iter__
    results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/db/models/sql/compiler.py", line 1151, in execute_sql
    cursor.execute(sql, params)
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/db/backends/utils.py", line 100, in execute
    return super().execute(sql, params)
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/db/backends/utils.py", line 68, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/db/backends/utils.py", line 77, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/db/backends/utils.py", line 86, in _execute
    return self.cursor.execute(sql, params)
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/root/.cache/pypoetry/virtualenvs/fio-de-ariadne-91Mc58yS-py3.7/lib/python3.7/site-packages/django/db/backends/utils.py", line 86, in _execute
    return self.cursor.execute(sql, params)
django.db.utils.ProgrammingError: relation "auth_user" does not exist
LINE 1: ...user"."is_active", "auth_user"."date_joined" FROM "auth_user...
                                                             ^
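
Reading the two tracebacks together: the first fails because PostgreSQL was still starting up when migrate ran, and the second fails because the migrations were therefore never applied (the log reports 24 unapplied migrations). A common remedy is to retry until the database is ready. The helper below is a generic sketch of that idea; retry and connect are hypothetical names, and connect is only a stand-in for a real connection attempt.

```python
import time


def retry(operation, attempts=5, delay=1.0, exceptions=(Exception,)):
    """Call `operation` until it succeeds or `attempts` is exhausted."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except exceptions:
            if attempt == attempts:
                raise  # give up and surface the last error
            time.sleep(delay)


# Stand-in for a database that needs two tries before it is ready.
calls = {"count": 0}


def connect():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("the database system is starting up")
    return "connected"


print(retry(connect, attempts=5, delay=0.01, exceptions=(ConnectionError,)))
```

In practice, simply re-running the migrate command a few seconds later, and only then createsuperuser, should have the same effect.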

Installation problems on Windows

Hi @cuducos, I have a few questions:

[…]

  • My Windows is 10 Home, so I have to use Docker Toolbox to try to run the environment. I do not know whether that is recommended for the project or whether any configuration is required.
  • When I run docker-compose, the log shows that both images were created successfully and everything was downloaded, but I got this error when it tries to run web_1.
web_1  | [RuntimeError]
web_1  | Poetry could not find a pyproject.toml file in /fio or its parents

Originally posted by @JoaoSant0s in #20 (comment)

What is the default username / password to access the web app?

Hi all,

I can run the web app but I am not able to log in. What is the default username and password for getting into the web app from the login screen (see image below)?

image

P.S.: I am running the Docker images.

Add an explicit field with the child's photo(s)

In the data model, filters, and Admin views, show the photo(s) of the missing child. We will probably have to use some storage for this. Anything that integrates nicely with django-storages should be simple to implement!

Installation problems

Hello, I am trying to test the application, but I still keep running into some USB problems. After cloning the repository I entered the following commands. I found your project through

http://www.projetohumanos.com.br/o-caso-evandro/1-o-caso-evandro/

I am still on the first episode of the podcast,

newbee@newbee ~/Documentos/fio-de-ariadne $ sudo pip install poetry
DEPRECATION: Python 2.7 reached the end of its life on January 1st,
2020. Please upgrade your Python as Python 2.7 is no longer maintained.
A future version of pip will drop support for Python 2.7. More details
about Python 2 support in pip, can be found at
https://pip.pypa.io/en/latest/development/release-process/#python-2-support
WARNING: The directory '/home/newbee/.cache/pip' or its parent directory
is not owned or is not writable by the current user. The cache has been
disabled. Check the permissions and owner of that directory. If
executing pip with sudo, you may want sudo's -H flag.
Collecting poetry
   Downloading poetry-1.0.5-py2.py3-none-any.whl (220 kB)
      |████████████████████████████████| 220 kB 1.7 MB/s
Collecting virtualenv<17.0.0,>=16.7.9; python_version >= "2.7" and
python_version < "2.8"
   Downloading virtualenv-16.7.10-py2.py3-none-any.whl (3.4 MB)
      |████████████████████████████████| 3.4 MB 13.1 MB/s
Collecting shellingham<2.0,>=1.1
   Downloading shellingham-1.3.2-py2.py3-none-any.whl (11 kB)
Collecting clikit<0.5.0,>=0.4.2
   Downloading clikit-0.4.3-py2.py3-none-any.whl (88 kB)
      |████████████████████████████████| 88 kB 12.7 MB/s
Collecting functools32<4.0.0,>=3.2.3; python_version >= "2.7" and
python_version < "2.8"
   Downloading functools32-3.2.3-2.tar.gz (31 kB)
Collecting pkginfo<2.0,>=1.4
   Downloading pkginfo-1.5.0.1-py2.py3-none-any.whl (25 kB)
Collecting cleo<0.8.0,>=0.7.6
   Downloading cleo-0.7.6-py2.py3-none-any.whl (21 kB)
Collecting subprocess32<4.0,>=3.5; python_version >= "2.7" and
python_version < "2.8" or python_version >= "3.4" and python_version < "3.5"
   Downloading subprocess32-3.5.4.tar.gz (97 kB)
      |████████████████████████████████| 97 kB 32.6 MB/s
Requirement already satisfied: requests<3.0,>=2.18 in
/usr/lib/python2.7/dist-packages (from poetry) (2.18.4)
Collecting cachy<0.4.0,>=0.3.0
   Downloading cachy-0.3.0-py2.py3-none-any.whl (20 kB)
Collecting cachecontrol[filecache]<0.13.0,>=0.12.4
   Downloading CacheControl-0.12.6-py2.py3-none-any.whl (19 kB)
Collecting tomlkit<0.6.0,>=0.5.11
   Downloading tomlkit-0.5.11-py2.py3-none-any.whl (31 kB)
Collecting pexpect<5.0.0,>=4.7.0
   Downloading pexpect-4.8.0-py2.py3-none-any.whl (59 kB)
      |████████████████████████████████| 59 kB 19.0 MB/s
Collecting pathlib2<3.0,>=2.3; python_version >= "2.7" and
python_version < "2.8" or python_version >= "3.4" and python_version < "3.5"
   Downloading pathlib2-2.3.5-py2.py3-none-any.whl (18 kB)
Collecting pyrsistent<0.15.0,>=0.14.2
   Downloading pyrsistent-0.14.11.tar.gz (104 kB)
      |████████████████████████████████| 104 kB 9.3 MB/s
Collecting requests-toolbelt<0.9.0,>=0.8.0
   Downloading requests_toolbelt-0.8.0-py2.py3-none-any.whl (54 kB)
      |████████████████████████████████| 54 kB 20.7 MB/s
Collecting jsonschema<4.0,>=3.1
   Downloading jsonschema-3.2.0-py2.py3-none-any.whl (56 kB)
      |████████████████████████████████| 56 kB 26.9 MB/s
Collecting html5lib<2.0,>=1.0
   Downloading html5lib-1.0.1-py2.py3-none-any.whl (117 kB)
      |████████████████████████████████| 117 kB 13.0 MB/s
Collecting keyring<19.0.0,>=18.0.1; python_version >= "2.7" and
python_version < "2.8" or python_version >= "3.4" and python_version < "3.5"
   Downloading keyring-18.0.1-py2.py3-none-any.whl (35 kB)
Collecting glob2<0.7,>=0.6; python_version >= "2.7" and python_version <
"2.8" or python_version >= "3.4" and python_version < "3.5"
   Downloading glob2-0.6.tar.gz (10 kB)
Collecting typing<4.0,>=3.6; python_version >= "2.7" and python_version
< "2.8" or python_version >= "3.4" and python_version < "3.5"
   Downloading typing-3.7.4.1-py2-none-any.whl (26 kB)
Collecting pyparsing<3.0,>=2.2
   Downloading pyparsing-2.4.7-py2.py3-none-any.whl (67 kB)
      |████████████████████████████████| 67 kB 15.3 MB/s
Collecting importlib-metadata<1.2.0,>=1.1.3; python_version < "3.8"
   Downloading importlib_metadata-1.1.3-py2.py3-none-any.whl (29 kB)
Collecting pylev<2.0,>=1.3
   Downloading pylev-1.3.0-py2.py3-none-any.whl (4.9 kB)
Requirement already satisfied: enum34<2.0,>=1.1; python_version >= "2.7"
and python_version < "2.8" in /usr/lib/python2.7/dist-packages (from
clikit<0.5.0,>=0.4.2->poetry) (1.1.6)
Collecting pastel<0.3.0,>=0.2.0
   Downloading pastel-0.2.0-py2.py3-none-any.whl (6.0 kB)
Collecting msgpack>=0.5.2
   Downloading msgpack-1.0.0.tar.gz (232 kB)
      |████████████████████████████████| 232 kB 13.6 MB/s
Requirement already satisfied: lockfile>=0.9; extra == "filecache" in
/usr/lib/python2.7/dist-packages (from
cachecontrol[filecache]<0.13.0,>=0.12.4->poetry) (0.12.2)
Collecting ptyprocess>=0.5
   Downloading ptyprocess-0.6.0-py2.py3-none-any.whl (39 kB)
Collecting scandir; python_version < "3.5"
   Downloading scandir-1.10.0.tar.gz (33 kB)
Requirement already satisfied: six in /usr/lib/python2.7/dist-packages
(from pathlib2<3.0,>=2.3; python_version >= "2.7" and python_version <
"2.8" or python_version >= "3.4" and python_version < "3.5"->poetry)
(1.11.0)
Requirement already satisfied: attrs>=17.4.0 in
/usr/lib/python2.7/dist-packages (from jsonschema<4.0,>=3.1->poetry)
(17.4.0)
Requirement already satisfied: setuptools in
/usr/local/lib/python2.7/dist-packages (from
jsonschema<4.0,>=3.1->poetry) (39.1.0)
Collecting webencodings
   Downloading webencodings-0.5.1-py2.py3-none-any.whl (11 kB)
Collecting secretstorage<3; (sys_platform == "linux2" or sys_platform ==
"linux") and python_version < "3.5"
   Downloading SecretStorage-2.3.1.tar.gz (16 kB)
Collecting entrypoints
   Downloading entrypoints-0.3-py2.py3-none-any.whl (11 kB)
Collecting contextlib2; python_version < "3"
   Downloading contextlib2-0.6.0.post1-py2.py3-none-any.whl (9.8 kB)
Collecting zipp>=0.5
   Downloading zipp-1.2.0-py2.py3-none-any.whl (4.8 kB)
Collecting configparser>=3.5; python_version < "3"
   Downloading configparser-4.0.2-py2.py3-none-any.whl (22 kB)
Requirement already satisfied: cryptography in
/usr/lib/python2.7/dist-packages (from secretstorage<3; (sys_platform ==
"linux2" or sys_platform == "linux") and python_version <
"3.5"->keyring<19.0.0,>=18.0.1; python_version >= "2.7" and
python_version < "2.8" or python_version >= "3.4" and python_version <
"3.5"->poetry) (2.1.4)
Installing collected packages: virtualenv, shellingham, typing, pylev,
pastel, clikit, functools32, pkginfo, cleo, subprocess32, cachy,
msgpack, cachecontrol, tomlkit, ptyprocess, pexpect, scandir, pathlib2,
pyrsistent, requests-toolbelt, contextlib2, zipp, configparser,
importlib-metadata, jsonschema, webencodings, html5lib, secretstorage,
entrypoints, keyring, glob2, pyparsing, poetry
     Running setup.py install for functools32 ... done
     Running setup.py install for subprocess32 ... error
     ERROR: Command errored out with exit status 1:
      command: /usr/bin/python -u -c 'import sys, setuptools, tokenize;
sys.argv[0] = '"'"'/tmp/pip-install-LKAezx/subprocess32/setup.py'"'"';
__file__='"'"'/tmp/pip-install-LKAezx/subprocess32/setup.py'"'"';f=getattr(tokenize,
'"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"',
'"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))'
install --record /tmp/pip-record-0LfomA/install-record.txt
--single-version-externally-managed --compile --install-headers
/usr/local/include/python2.7/subprocess32
          cwd: /tmp/pip-install-LKAezx/subprocess32/
     Complete output (60 lines):
     running install
     running build
     running build_py
     creating build
     creating build/lib.linux-x86_64-2.7
     copying subprocess32.py -> build/lib.linux-x86_64-2.7
     running build_ext
     running build_configure
     checking for gcc... gcc
     checking whether the C compiler works... yes
     checking for C compiler default output file name... a.out
     checking for suffix of executables...
     checking whether we are cross compiling... no
     checking for suffix of object files... o
     checking whether we are using the GNU C compiler... yes
     checking whether gcc accepts -g... yes
     checking for gcc option to accept ISO C89... none needed
     checking how to run the C preprocessor... gcc -E
     checking for grep that handles long lines and -e... /bin/grep
     checking for egrep... /bin/grep -E
     checking for ANSI C header files... yes
     checking for sys/types.h... yes
     checking for sys/stat.h... yes
     checking for stdlib.h... yes
     checking for string.h... yes
     checking for memory.h... yes
     checking for strings.h... yes
     checking for inttypes.h... yes
     checking for stdint.h... yes
     checking for unistd.h... yes
     checking for unistd.h... (cached) yes
     checking fcntl.h usability... yes
     checking fcntl.h presence... yes
     checking for fcntl.h... yes
     checking signal.h usability... yes
     checking signal.h presence... yes
     checking for signal.h... yes
     checking sys/cdefs.h usability... yes
     checking sys/cdefs.h presence... yes
     checking for sys/cdefs.h... yes
     checking for sys/types.h... (cached) yes
     checking for sys/stat.h... (cached) yes
     checking sys/syscall.h usability... yes
     checking sys/syscall.h presence... yes
     checking for sys/syscall.h... yes
     checking for dirent.h that defines DIR... yes
     checking for library containing opendir... none required
     checking for pipe2... yes
     checking for setsid... yes
     checking whether dirfd is declared... yes
     configure: creating ./config.status
     config.status: creating _posixsubprocess_config.h
     building '_posixsubprocess32' extension
     creating build/temp.linux-x86_64-2.7
     x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -Wdate-time
-D_FORTIFY_SOURCE=2 -g
-fdebug-prefix-map=/build/python2.7-5Z483E/python2.7-2.7.17=.
-fstack-protector-strong -Wformat -Werror=format-security -fPIC
-I/usr/include/python2.7 -c _posixsubprocess.c -o
build/temp.linux-x86_64-2.7/_posixsubprocess.o
     _posixsubprocess.c:16:10: fatal error: Python.h: Arquivo ou
diretório inexistente
      #include "Python.h"
               ^~~~~~~~~~
     compilation terminated.
     error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
     ----------------------------------------
ERROR: Command errored out with exit status 1: /usr/bin/python -u -c
'import sys, setuptools, tokenize; sys.argv[0] =
'"'"'/tmp/pip-install-LKAezx/subprocess32/setup.py'"'"';
__file__='"'"'/tmp/pip-install-LKAezx/subprocess32/setup.py'"'"';f=getattr(tokenize,
'"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"',
'"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))'
install --record /tmp/pip-record-0LfomA/install-record.txt
--single-version-externally-managed --compile --install-headers
/usr/local/include/python2.7/subprocess32 Check the logs for full
command output.
newbee@newbee ~/Documentos/fio-de-ariadne $ sudo pip install shell
DEPRECATION: Python 2.7 reached the end of its life on January 1st,
2020. Please upgrade your Python as Python 2.7 is no longer maintained.
A future version of pip will drop support for Python 2.7. More details
about Python 2 support in pip, can be found at
https://pip.pypa.io/en/latest/development/release-process/#python-2-support
WARNING: The directory '/home/newbee/.cache/pip' or its parent directory
is not owned or is not writable by the current user. The cache has been
disabled. Check the permissions and owner of that directory. If
executing pip with sudo, you may want sudo's -H flag.
Collecting shell
   Downloading shell-1.0.1-py2.py3-none-any.whl (5.4 kB)
Installing collected packages: shell
Successfully installed shell-1.0.1
newbee@newbee ~/Documentos/fio-de-ariadne $ createenv
createenv: comando não encontrado
newbee@newbee ~/Documentos/fio-de-ariadne $ python version
python: can't open file 'version': [Errno 2] No such file or directory
newbee@newbee ~/Documentos/fio-de-ariadne $ python3 version
python3: can't open file 'version': [Errno 2] No such file or directory
newbee@newbee ~/Documentos/fio-de-ariadne $ python --version
Python 3.6.5 :: Anaconda, Inc.
newbee@newbee ~/Documentos/fio-de-ariadne $
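
Two things stand out in this log: pip ran under Python 2.7, and the final python --version reports 3.6.5, while the README asks for Python 3.7+. (The log also types createenv, whereas the README command is createnv.) A small sketch of a version check follows; the function name meets_minimum is hypothetical.

```python
import sys

MINIMUM = (3, 7)  # from the README: "Python 3.7+"


def meets_minimum(version=None, minimum=MINIMUM) -> bool:
    """Return True if `version` (default: the running interpreter) is new enough."""
    current = tuple(version if version is not None else sys.version_info)[:2]
    return current >= minimum


print(meets_minimum((3, 6, 5)))  # the version reported at the end of the log
print(meets_minimum((3, 7, 0)))
```

Calling meets_minimum() with no arguments checks the interpreter that is actually running the script, which helps when several Python installations coexist.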

Clean up city names during collection

Today the data collection produces dirty results for cities (for example, the same city spelled in different ways):

Adrianópolis/PR - Bairro Capelinha
Almirante Tamandaré/PR
Araucária / Pr.
Cascavel / Pr.
Colombo / Pr.
Colombo/PR
Corbélia/PR
Curitiba / Pr.
Curitiba/PR
Foz do Iguaçu/PR
Guarapuava/PR - Distrito Guará - Assentamento Rio Banana
Guaratuba / Pr.
Iporã / Pr.
Km 35-BR, João Lunardelli, PR 170 – Sítio São Marcos/PR
Lapa / Pr.
Lidianópolis/PR - Bairro Água da Barra - Sitio São Francisco
Londrina / Pr.
Maringá / Pr.
PARANAGUA/PR
-

We need to standardize this somehow. It could be with a City model, or any other strategy. Any ideas?
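
As one possible starting point, here is a regex-based sketch that reduces the values listed above to a City/UF form. The function name and the pattern are assumptions. It would not fix entries such as the "Km 35-BR, ..." address, where the text before the slash is not a city, and it does not restore missing accents.

```python
import re
from typing import Optional

# City name, a slash with optional spaces, then a two-letter state code.
CITY_STATE = re.compile(r"^\s*(?P<city>[^/]+?)\s*/\s*(?P<state>[A-Za-z]{2})\b")


def normalize_city(raw: str) -> Optional[str]:
    """Return "City/UF", or None when the value has no city/state pair."""
    match = CITY_STATE.match(raw)
    if match is None:
        return None
    city = match.group("city")
    if city.isupper():
        city = city.title()  # e.g. "PARANAGUA" becomes "Paranagua"
    return f"{city}/{match.group('state').upper()}"


for raw in ("Curitiba / Pr.", "Curitiba/PR", "Adrianópolis/PR - Bairro Capelinha", "-"):
    print(repr(raw), "->", normalize_city(raw))
```

A City model, as suggested above, could then be keyed on the normalized string, with the discarded neighborhood or district text kept in a separate field if it is still useful.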
