cleeland avatar cleeland commented on June 14, 2024

Any hints on paragraph 2, developers?

In attempting to debug this enough to create a small publishable test case, I cannot figure out where the plugin generates its data or, more importantly, which bit of code DELETES all the data generated by the plugin.

from dxr.

erikrose avatar erikrose commented on June 14, 2024

Well, there's plugins/clang/indexer.py:generate_callgraph(). It looks like that reads some raw data emitted by the clang plugin and emits a call graph. Does that help?

cleeland avatar cleeland commented on June 14, 2024

Thanks for the reply, @erikrose. As I read your answer, I realize that my question unfortunately used the word "plugin" ambiguously. I was referring to the clang plugin, dxr-index.c, rather than the dxr plugin.

It appears that the dxr plugin (authored in Python) is generating a proper callgraph for the data provided by the clang plugin. The clang plugin is providing incomplete data, at least incomplete relative to what actually happens in the code.

erikrose avatar erikrose commented on June 14, 2024

Yep, we should fix that. :-) Sorry for my short reply; I've had to change my question-answering philosophy lately to output short spurts and see if that's enough. Otherwise, I'll never keep up with the pull reqs. :-)

Incidentally, I wonder if this is the same problem I'm running into in https://bugzilla.mozilla.org/show_bug.cgi?id=823777. I assumed that "called-by" and "callers" queries don't work at all, but maybe I ran into the (more insidious?) problem you have. I commented out the tests in json-test/search-test.py. Maybe json-test would serve as a reduction for you.

cleeland avatar cleeland commented on June 14, 2024

I'll give the json-test a try. Do you know if there are any docs anywhere on the format of the stuff spewed by dxr-index and consumed by the clang plugin?

erikrose avatar erikrose commented on June 14, 2024

I've read all the docs and can't say I remember running into any. It's possible they're floating around on the wiki, but imo that'd be the wrong place for them.

cleeland avatar cleeland commented on June 14, 2024

Thanks for the tip on VisitFunctionDecl(). I wonder if the interestingLocation() predicate is deeming things "uninteresting" and bailing quickly. I will fiddle with VisitFunctionDecl() a bit to see if I can get better coverage on the output. Once I get what I think is correct, or at least more correct than what I'm getting now, maybe I can work backwards to figure out the real problem/solution.

erikrose avatar erikrose commented on June 14, 2024

No problem! I deleted my comment because I wasn't sure that was emitting calling info—just function decls.

I've been looking through indexer.py and can give you a partial outline of the flow:

generate_callgraph() pulls the raw calling info out of a global named calls. That global is populated by process_call(), which is called from dump_indexer_output(), which in turn pulls its input from CSV files in a temp dir.

erikrose avatar erikrose commented on June 14, 2024

All you have to do is figure out who's writing that CSV file, which shouldn't be too hard (hope, hope).

cleeland avatar cleeland commented on June 14, 2024

Just to confirm, json-test is a suitable reduction. callers:getHello returns no results despite the fact that it's called by main().

I looked directly at the database in table callers, and it's listed in there correctly, i.e., there is a row in there with callerid==main and targetid==getHello.

So, at least with the json-test case, it may actually be a search-side issue.

erikrose avatar erikrose commented on June 14, 2024

Thanks; I've added that to my other bug.

cleeland avatar cleeland commented on June 14, 2024

Looks like query.py handles the actual inquiry into the DB, though it looks like some sort of pseudo-embedded SQL Python-fu. My Python skillz aren't great, so it's going to take me a while to understand what's happening in there.

erikrose avatar erikrose commented on June 14, 2024

My Python fu is in good shape; let me know if you get stuck.

cleeland avatar cleeland commented on June 14, 2024

Trying to figure out how the filters defined in query.py actually get invoked. I manually executed the SQL that I think should get executed when I search for "callers:getHello":

select functions.extent_start, functions.extent_end from functions
where functions.file_id = 3 and
            exists (select 1 from functions as target, callers
                         where target.fname like 'getHello'
                         and callers.targetid = target.funcid
                         and callers.callerid = functions.funcid)
order by functions.extent_start;

and got back

89|93

but I don't know if it's right or not.

erikrose avatar erikrose commented on June 14, 2024

I'm not sure offhand what data the queries are supposed to emit, but you can always stick print statements in the code or even a breakpoint if you want to confirm the queries you deduced are accurate. To insert a breakpoint, just stick import pdb;pdb.set_trace() in the code. The interface is a lot like gdb.
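
If stepping through with a debugger is awkward, another way to confirm which SQL actually reaches SQLite is to have the connection report every statement it runs. This is only a sketch: it assumes Python 3's sqlite3 module (Connection.set_trace_callback), and the functions table here is a made-up stand-in, not DXR's real schema.

```python
import sqlite3

executed = []  # every SQL statement SQLite runs gets appended here

conn = sqlite3.connect(":memory:")
conn.set_trace_callback(executed.append)

# Hypothetical stand-in table, not DXR's real schema.
conn.execute("CREATE TABLE functions (funcid INTEGER, fname TEXT)")
conn.execute("INSERT INTO functions VALUES (?, ?)", (1, "getHello"))
rows = conn.execute(
    "SELECT funcid FROM functions WHERE fname LIKE ?", ("%getHello%",)
).fetchall()

# Dump what was really executed, in order.
for statement in executed:
    print(statement)
```

In a CGI script the callback could write to a log file instead of collecting into a list, since printing would end up in the HTTP response.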

cleeland avatar cleeland commented on June 14, 2024

Oooookaaaayyy...

First, pdb isn't fun to use in python cgi scripts :\

I've narrowed the issue with callers not working to the sql statement generated by a query such as "callers:getHello". The sql statement returns no rows. This is a problem. The statement looks like this

SELECT thefiles.path, thefiles.icon, thefiles.ID,
    extents(trg_index.contents)
      FROM trg_index, files as thefiles
     WHERE  thefiles.ID = trg_index.id
  AND EXISTS (SELECT 1
                        FROM functions as caller, functions as target, callers
                       WHERE  target.fname LIKE '%GetNextBuf%' ESCAPE "\" 
                         AND callers.targetid = target.funcid
                         AND callers.callerid = caller.funcid
                         AND caller.file_id = thefiles.ID
                       )
  AND EXISTS (SELECT 2
                        FROM functions as caller, functions as target, callers
                       WHERE  target.fname LIKE '%GetNextBuf%' ESCAPE "\" 
                         AND  EXISTS ( SELECT 1 FROM targets
                                        WHERE targets.funcid = target.funcid
                                          AND targets.targetid = callers.targetid
                                     )
                         AND callers.callerid = caller.funcid
                         AND caller.file_id = thefiles.ID
                    )
LIMIT 100 OFFSET 0
;

More specifically, the problem lies in the sub-queries inside the EXISTS in the WHERE clause. The first sub-query, denoted by the SELECT 1 ..., comes from the direct-calls callers filter, whereas the second sub-query, denoted by the SELECT 2 ..., comes from the INdirect-calls callers filter.

Removing either of those two sub-queries makes the query work and generate results, although the results are incomplete because they do not include the results from the missing filter.

Eliminating the AND EXISTS from around the second sub-query and replacing it with UNION yields complete results, but I don't see how to generate that from the existing code base and filter set.

One possibility is to collapse the two callers filters into a single filter that has the UNION embedded already, yielding a query that looks like this:

SELECT thefiles.path, thefiles.icon, thefiles.ID,
    extents(trg_index.contents)
      FROM trg_index, files as thefiles
     WHERE  thefiles.ID = trg_index.id
  AND EXISTS (SELECT 1
                        FROM functions as caller, functions as target, callers
                       WHERE  target.fname LIKE '%GetNextBuf%' ESCAPE "\" 
                         AND callers.targetid = target.funcid
                         AND callers.callerid = caller.funcid
                         AND caller.file_id = thefiles.ID
          UNION
                      SELECT 2
                        FROM functions as caller, functions as target, callers
                       WHERE  target.fname LIKE '%GetNextBuf%' ESCAPE "\" 
                         AND  EXISTS ( SELECT 1 FROM targets
                                        WHERE targets.funcid = target.funcid
                                          AND targets.targetid = callers.targetid
                                     )
                         AND callers.callerid = caller.funcid
                         AND caller.file_id = thefiles.ID
                    )
LIMIT 100 OFFSET 0
;
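
The difference between the two forms can be reproduced on a toy database. This is only a sketch: the table layouts are guessed from the column names in the queries above, the data is invented (one file in which main() calls getHello() directly and nothing is called indirectly), and trg_index and extents() are left out entirely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed layouts, inferred from the column names in the query.
    CREATE TABLE files     (id INTEGER, path TEXT);
    CREATE TABLE functions (funcid INTEGER, fname TEXT, file_id INTEGER);
    CREATE TABLE callers   (callerid INTEGER, targetid INTEGER);
    CREATE TABLE targets   (funcid INTEGER, targetid INTEGER);

    INSERT INTO files VALUES (1, 'main.c');
    INSERT INTO functions VALUES (10, 'main', 1), (11, 'getHello', 1);
    -- main() calls getHello() directly; there are no indirect calls.
    INSERT INTO callers VALUES (10, 11);
""")

DIRECT = """
    SELECT 1 FROM functions AS caller, functions AS target, callers
     WHERE target.fname LIKE :name
       AND callers.targetid = target.funcid
       AND callers.callerid = caller.funcid
       AND caller.file_id = files.id
"""
INDIRECT = """
    SELECT 2 FROM functions AS caller, functions AS target, callers
     WHERE target.fname LIKE :name
       AND EXISTS (SELECT 1 FROM targets
                    WHERE targets.funcid = target.funcid
                      AND targets.targetid = callers.targetid)
       AND callers.callerid = caller.funcid
       AND caller.file_id = files.id
"""

def search(where):
    """Return the paths of all files matching the given WHERE clause."""
    sql = "SELECT path FROM files WHERE " + where
    return [row[0] for row in conn.execute(sql, {"name": "%getHello%"})]

# Two separate filters ANDed together: a file must satisfy both.
both = search("EXISTS (%s) AND EXISTS (%s)" % (DIRECT, INDIRECT))
# One filter with a UNION inside: a file may satisfy either.
either = search("EXISTS (%s UNION %s)" % (DIRECT, INDIRECT))

print(both, either)  # [] ['main.c']
```

With the conjunctive form a file only matches if it contains both a direct and an indirect caller of the target, which this toy data never has.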

cleeland avatar cleeland commented on June 14, 2024

I tried quickly hacking in a combined filter, but the string substitution for like_name and qual_name breaks. I think Python has named substitution that can work, and I might try that.

cleeland avatar cleeland commented on June 14, 2024

Even with named substitution there are still issues when calling into sqlite because not enough parameters are passed for sqlite to substitute for the proper number of question mark placeholders in the query. Sigh.
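
The mismatch is easy to reproduce outside DXR. A minimal sketch, assuming Python's sqlite3 module and a made-up two-column functions table (the column names are illustrative, not DXR's): each "?" needs its own bound value, whereas a named placeholder such as :pat can be repeated and is bound once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table for illustration only.
conn.execute("CREATE TABLE functions (fname TEXT, qualname TEXT)")
conn.execute("INSERT INTO functions VALUES ('getHello', 'ns::getHello')")

pattern = "%getHello%"

# Two "?" placeholders but only one value: sqlite3 refuses to run it.
positional = "SELECT fname FROM functions WHERE fname LIKE ? OR qualname LIKE ?"
try:
    conn.execute(positional, (pattern,))
    error = None
except sqlite3.ProgrammingError as exc:
    error = exc

# A named placeholder may appear any number of times and is bound once.
named = "SELECT fname FROM functions WHERE fname LIKE :pat OR qualname LIKE :pat"
rows = conn.execute(named, {"pat": pattern}).fetchall()

print(type(error).__name__, rows)
```

Whether named placeholders fit the way query.py assembles its argument lists is a separate question; this only shows the behaviour of the sqlite3 module itself.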

cleeland avatar cleeland commented on June 14, 2024

So...does anybody know why two filters are matched rather than just having one filter?

jonasfj avatar jonasfj commented on June 14, 2024
> So...does anybody know why two filters are matched rather than just having one filter?

That would be because I introduced a bug... Obviously, the two filters are conjunctive, so together they exclude most (if not all) results.
Callers should be either direct or indirect, not both. So these filters should be merged into one.

To fix this we'll probably need to write a new SearchFilter subclass, or update ExistsLikeFilter to replace %s with self.like_expr and count the number of occurrences of %s, so that it can be matched with an appropriate number of arguments.
This way it would be possible to have multiple occurrences of "%s" inside one filter_sql statement, and the UNION query as proposed by @cleeland would work.
This is just an illustration for lines 505-510:

for arg in query.params[self.param]:
  yield (
          "EXISTS (%s)" % (self.filter_sql.replace("%s", self.like_expr)),
          ['%' + like_escape(arg) + '%'] * self.filter_sql.count("%s"),
          self.ext_sql is not None
        )

I don't have a working DXR install at the moment. So I didn't test this, nor is this a complete solution.
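
The same idea as a standalone function, so it can be tried without a DXR install. This is a sketch, not the actual DXR code: exists_like() and the body of like_escape() are hypothetical stand-ins, and the one-column functions table is invented for the check at the end.

```python
import sqlite3

def like_escape(val):
    # Hypothetical stand-in for DXR's helper: backslash-escape LIKE wildcards.
    return val.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")

def exists_like(filter_sql, like_expr, arg):
    """Return (sql, params) with one bound value per %s in filter_sql."""
    sql = "EXISTS (%s)" % filter_sql.replace("%s", like_expr)
    params = ["%" + like_escape(arg) + "%"] * filter_sql.count("%s")
    return sql, params

# A filter with two %s slots, as a merged UNION filter would have.
filter_sql = (
    "SELECT 1 FROM functions AS target WHERE %s "
    "UNION SELECT 2 FROM functions AS target WHERE %s"
)
sql, params = exists_like(filter_sql, "target.fname LIKE ? ESCAPE '\\'", "get_Hello")

# Check that the placeholder count and the argument count agree in practice.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE functions (fname TEXT)")
conn.execute("INSERT INTO functions VALUES ('get_Hello')")
found = conn.execute("SELECT " + sql, params).fetchone()[0]
print(sql.count("?"), len(params), found)
```

Because the number of parameters is derived from the same filter_sql string that supplies the placeholders, the two counts cannot drift apart.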

cleeland avatar cleeland commented on June 14, 2024

Thanks @jonasfj for the feedback. I think it may be more complicated than what you suggest, though I probably don't have as good an understanding of the data model or python as you.

I have been hacking away at query.py in a working copy of dxr and have done a lot of different things. It will take me a while to pull it all apart in logical stages.

I have the filter_sql working (I think), but now it's stumbling on ext_sql. In this case, the query itself is so complicated that it takes a long time, and then yields no rows. I know for my code base that's not true, so either there is a problem with what got parsed and shoved into the DB or there is a problem with the query. Once again I will need to try this with the accompanying test source code to see if I can reproduce the problem.

Thank you all for your continued assistance in figuring this out.

cleeland avatar cleeland commented on June 14, 2024

Interesting! I see that abbeyj did some work to fix this.

FWIW, I put my work-in-progress on this issue in a branch on my dxr fork:

cleeland/dxr/issue_57_work

Unfortunately, there are a myriad of changes commingled with no granularity, and some of them are obviated by pull request #74 . However, in browsing the commit associated with #74, it doesn't look like the SQL itself was changed to be more efficient. One of the changes in my branch also attempted to make the SQL more efficient by reducing the number of nested queries and inner joins.

At some point in the next month I will hopefully have time to get back to this and investigate integrating my query changes with the solution in #74 unless somebody else takes the helm.

cleeland avatar cleeland commented on June 14, 2024

[Why doesn't GFM properly hyperlink the references to my branch? Here is the URL https://github.com/cleeland/dxr/tree/issue_57_work ]
