joern's People

Contributors

a0x77n, agelastic, fabsx00, ideadapt, l4l, malteskoruppa, rieck, soiax, svanteschubert, timhemel, tvondra, vlad902, vvalter, yangke

joern's Issues

Unsanitized probably doesn't grab arrays

A query like getArguments('f', '0').unsanitized({ it, s -> it._().codeContains('$a') }) (i.e. unsanitized shouldn't filter anything)

With the following signature of f:

void f(int *a);

works as expected with:

void g(int *a) {
  f(a);
}

but doesn't work with:

void g(int *a[]) {
  f(a[0]);
}

Documentation update

Are you planning to update the Joern and python-joern docs? They are now about two years old.
There are some cases that can really make your life miserable and cost a lot of time, for example:
Not working: .unsanitized{ it.isCheck( paramName ) }; working: .unsanitized{ it -> it.isCheck( paramName ) } (this may also be due to Groovy/Gremlin/Neo4j updates; that is just a guess after numerous attempts)
Not working: getArguments('(f1 OR f2)', '0'); possible workaround: OR(getArguments('f1', '0'), getArguments('f2', '0'))
And there is probably much more like this.
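
The OR(...) workaround above can be generated rather than typed by hand when the list of functions is long. The following is a minimal sketch for python-joern users: the helper name get_arguments_any is hypothetical, and it only builds the query string; it assumes that OR and getArguments behave as described in this issue.

```python
def get_arguments_any(function_names, arg_index):
    """Build one getArguments() step per function and join them with OR(...).

    get_arguments_any is a hypothetical helper, not part of python-joern.
    """
    if not function_names:
        raise ValueError("at least one function name is required")
    steps = ["getArguments('%s', '%s')" % (name, arg_index)
             for name in function_names]
    if len(steps) == 1:
        return steps[0]
    return "OR(%s)" % ", ".join(steps)


query = get_arguments_any(["f1", "f2"], 0)
print(query)
```

The resulting string can then be passed to whatever query-running call is already in use.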

Warning in nested macro conditionals and parse error in embedded macro

The following code is not parsed; joern throws two warnings and an exception.

#if !_LIBC
/* Return non-zero if FD is open.  */
static int
is_open (int fd)
{
# if (defined _WIN32 || defined __WIN32__) && ! defined __CYGWIN__
  /* On native Windows: The initial state of unassigned standard file
     descriptors is that they are open but point to an INVALID_HANDLE_VALUE.
     There is no fcntl, and the gnulib replacement fcntl does not support
     F_GETFL.  */
  return (HANDLE) _get_osfhandle (fd) != INVALID_HANDLE_VALUE;
# else
#  ifndef F_GETFL
#   error Please port fcntl to your platform
#  endif
  return 0 <= fcntl (fd, F_GETFL);
# endif
}
#endif
static void
flush_stdout (void)
{
#if !_LIBC
  int stdout_fd;
# if GNULIB_FREOPEN_SAFER
  /* Use of gnulib's freopen-safer module normally ensures that
       fileno (stdout) == 1
     whenever stdout is open.  */
  stdout_fd = STDOUT_FILENO;
# else
  /* POSIX states that fileno (stdout) after fclose is unspecified.  But in
     practice it is not a problem, because stdout is statically allocated and
     the fd of a FILE stream is stored as a field in its allocated memory.  */
  stdout_fd = fileno (stdout);
# endif
  /* POSIX states that fflush (stdout) after fclose is unspecified; it
     is safe in glibc, but not on all other platforms.  fflush (NULL)
     is always defined, but too draconian.  */
  if (0 <= stdout_fd && is_open (stdout_fd))
#endif
    fflush (stdout);
}
static void
print_errno_message (int errnum)
{
  char const *s;
#if defined HAVE_STRERROR_R || _LIBC
  char errbuf[1024];
# if STRERROR_R_CHAR_P || _LIBC
  s = __strerror_r (errnum, errbuf, sizeof errbuf);
# else
  if (__strerror_r (errnum, errbuf, sizeof errbuf) == 0)
    s = errbuf;
  else
    s = 0;
# endif
#else
  s = strerror (errnum);
#endif
#if !_LIBC
  if (! s)
    s = _("Unknown system error");
#endif
#if _LIBC
  __fxprintf (NULL, ": %s", s);
#else
  fprintf (stderr, ": %s", s);
#endif
}

[phpsupport] Move node and edge strings for unit test cases into individual files

Many node and edge strings for PHP in CSV format are found in TestPHPCSVASTBuilder and TestPHPCSVASTBuilderMinimal. A subset of the former are additionally reproduced in CSVASTSamples. We should move them into individual files and initialize such strings from these files, thus separating program logic and test resources. It will also make it much easier to apply changes to the CSV format, as files can be easily regenerated by some shell script. (This could be made part of rebuildTestDB.sh, or introduced in a similarly-named file such as rebuildTestCSVs.sh.)

Joern and Joern-Tools

I installed Joern and its dependencies as described in the documentation; however, there were issues encountered in using the tool.

  1. The py2neo-gremlin tool you have linked no longer appears compatible with the latest py2neo tool.

Current Link:
https://github.com/fabsx00/py2neo-gremlin/blob/master/py2neo_gremlin/gremlin.py

Latest from py2neo:
https://github.com/nigelsmall/py2neo/blob/a3eae0dae3f18548b4dacfb54df283d5b35c8b3b/py2neo/ext/gremlin/__init__.py

When accessing the code via python-joern per the "Basic Usage" example in the documentation, several errors arise from your gremlin link due to the latest version of py2neo.

For example,

  • graph_db no longer has the attribute "metadata" (line 13)
  • Resource does not have _post as an attribute ( line 22)
  • neo4j does not have the attribute _hydrated (line 23)

After updating gremlin.py to reflect the latest version in py2neo/ext/gremlin, it compiled successfully; but, the Basic Usage example still returned no results.

  2. When adding and analyzing "testCode", no data and no relationships were shown in Neo4J (http://localhost:7474/db/data/node/0); furthermore, when executing "echo 'g.v(0).out()' | joern-lookup -g", no results were returned.
  3. When following the VLC tutorial, no data and no relationships were shown in Neo4J (http://localhost:7474/db/data/node/0) after adding the VLC source code; furthermore, when executing "echo 'g.v(0).out()' | joern-lookup -g", no results were returned.

I would love to utilize your tool, but I don't understand why no results are being returned for any source code analysis after following your documentation.

I appreciate your insight and I look forward to hearing from you.

CSV output

Since a clear interface to the output module exists now, it would be pretty nice to have a CSV output module that stores nodes and edges in a format suitable to be imported using the batch-importer by Michael Hunger at https://github.com/jexp/batch-import/ .
For one thing, we could use this as an interchange format between different graph databases. For another, it might result in a performance gain, and it would outsource the job of efficiently importing into the database to a generic tool.
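
To make the idea concrete, here is a rough sketch of what such an output module could emit. It is written in Python for brevity, not in Joern's own Java, and the column layout (tab-separated, a header row of property names for nodes, and start/end/type for edges) is an assumption that should be checked against the batch-import README before relying on it. The node and edge data are made up for illustration.

```python
import csv
import io


def write_nodes(nodes, out):
    """Write one tab-separated row per node; the header names the properties."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["type", "code"])
    for node in nodes:
        writer.writerow([node["type"], node["code"]])


def write_edges(edges, out):
    """Write one row per edge: source row index, target row index, edge label."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["start", "end", "type"])
    for start, end, label in edges:
        writer.writerow([start, end, label])


# Illustrative data only: a call node and its first child.
nodes = [{"type": "CallExpression", "code": "f ( a )"},
         {"type": "Identifier", "code": "f"}]
edges = [(0, 1, "IS_AST_PARENT")]

node_buf, edge_buf = io.StringIO(), io.StringIO()
write_nodes(nodes, node_buf)
write_edges(edges, edge_buf)
print(node_buf.getvalue())
print(edge_buf.getvalue())
```

Writing to file-like objects keeps the sketch independent of where the files end up on disk.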

Bug when running CallGraphCreator after indexing a Linux kernel file such as linux-2.6.10/drivers/usb/media/sn9c102_core.c

First, the thrown exception is pasted here:

Exception in thread "main" java.lang.NullPointerException
at org.apache.lucene.util.SimpleStringInterner.intern(SimpleStringInterner.java:54)
at org.apache.lucene.util.StringHelper.intern(StringHelper.java:39)
at org.apache.lucene.index.Term.<init>(Term.java:38)
at org.apache.lucene.queryParser.QueryParser.getFieldQuery(QueryParser.java:643)
at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1436)
at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1319)
at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1275)
at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1234)
at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:206)
at org.neo4j.index.impl.lucene.IndexType.query(IndexType.java:300)
at org.neo4j.index.impl.lucene.LuceneBatchInserterIndex.query(LuceneBatchInserterIndex.java:372)
at output.neo4j.Neo4JBatchInserter.queryIndex(Neo4JBatchInserter.java:69)
at tools.callGraph.CallGraphListener.lookupCallee(CallGraphListener.java:62)
at tools.callGraph.CallGraphListener.resolveCalledFunction(CallGraphListener.java:40)
at tools.callGraph.CallGraphListener.createOutgoingCallGraphEdges(CallGraphListener.java:24)
at tools.callGraph.CallGraphListener.visitFunction(CallGraphListener.java:17)
at tools.callGraph.ImportedFunctionWalker.notifyListenersOfFunction(ImportedFunctionWalker.java:37)
at tools.callGraph.ImportedFunctionWalker.walk(ImportedFunctionWalker.java:29)
at tools.callGraph.CallGraphCreator.main(CallGraphCreator.java:19)

Second, I found that the exception is caused by the handling of code like:

linux-2.6.10/drivers/usb/media/sn9c102_core.c

sn9c102_urb_complete (struct urb * urb , struct pt_regs * regs)

if (status) {
DBG(3, "Error in isochronous frame")
(*f)->state = F_ERROR;
continue;
}

This is an "if" statement in Linux kernel 2.6.10.
When joern indexes this file, it mishandles the macro invocation DBG(...) inside the "if" body (the macro call has no trailing semicolon).
As a result, running the main class tools.callGraph.CallGraphCreator
throws a NullPointerException, because the indexer misrecognizes
DBG(3, "Error in isochronous frame")(*f) as a function call
and naively takes DBG(3, "Error in isochronous frame") to be the function name.

Here is the debug output from indexing the whole linux-2.6.10 directory.
The output comes from the function:

private String getCalleeString(Long callId)

which is called by:

private IndexHits resolveCalledFunction(long callId)

FirstChild of 2830017 is
2830021:DBG ( 3 , "Error in isochronous frame" )
AFTER REP:DBG ( 3 , "Error in isochronous frame" )
2829432:sn9c102_urb_complete (struct urb * urb , struct pt_regs * regs)
linux-2.6.10/drivers/usb/media/sn9c102_core.c
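
Assuming the diagnosis above is right, one defensive option is to refuse index lookups for callee strings that are not plain identifiers. The sketch below shows that check in Python; the function name callee_name_for_lookup is hypothetical, and the real code lives in the Java class CallGraphListener. Note that this would only avoid the crash, not fix the misparse, and it would also skip calls through function pointers or member accesses.

```python
import re

# A C identifier: a letter or underscore followed by letters, digits, underscores.
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def callee_name_for_lookup(callee_code):
    """Return a name that is safe to look up, or None if the callee string
    is missing or is not a plain identifier (hypothetical helper)."""
    if callee_code is None:
        return None
    candidate = callee_code.strip()
    return candidate if IDENTIFIER.match(candidate) else None


print(callee_name_for_lookup("memcpy"))
print(callee_name_for_lookup('DBG ( 3 , "Error in isochronous frame" )'))
```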

CSV issue

I tried the dev branch with a small project (I can share it if you need).

My first thought was that the CSV files are generated directly from the sources, but that throws an exception, so I guessed that it should work from an already generated database.

So after generating the database (181 nodes and 417 edges), I ran ./joern .joernIndex -outformat csv
and got this file hierarchy:
[screenshot of the resulting file hierarchy]
All the files look similar to the selected one, which is quite weird behaviour.

Parser fails on convoluted expressions

The parser fails on some valid (but convoluted) expressions such as:

int res = ((int (*)())(*(&function_table[atoi(argv[1])])))(atoi(argv[2]), atoi(argv[3]));

It ends up creating multiple statement nodes that will just contain one-character pieces of the whole expression. Also, it does not report an error.

Full example:

#include <stdio.h>
int sum(int a, int b) {return a+b;}
int mul(int a, int b) {return a*b;}
int main(int argc, char **argv) {
    int (*function_table[2])(int a, int b);
    function_table[0] = sum;
    function_table[1] = mul;
    int res = ((int (*)())(*(&function_table[atoi(argv[1])])))(atoi(argv[2]), atoi(argv[3]));
    printf("result %d\n", res);
    return 0;
}

Building dev branch

Is the dev branch meant to be buildable, or is it not ready yet?

I've tried a couple of times with different versions of OpenJDK (7 and 8). I removed unnecessary information from the logs; feel free to ask for the full ones.

For gradle jar/gradle classes with openjdk-8:

:projects:joern-pluginlib:compileJava .../projects/joern-pluginlib/src/main/java/joern/pluginlib/plugintypes/JoernProjectPlugin.java:4: error: cannot find symbol
import octopus.lib.plugintypes.OctopusProjectPlugin;
                              ^
  symbol:   class OctopusProjectPlugin
  location: package octopus.lib.plugintypes
.../projects/joern-pluginlib/src/main/java/joern/pluginlib/plugintypes/JoernProjectPlugin.java:6: error: cannot find symbol
public class JoernProjectPlugin extends OctopusProjectPlugin {
                                        ^
  symbol: class OctopusProjectPlugin
.../projects/joern-pluginlib/src/main/java/joern/pluginlib/plugintypes/JoernProjectPlugin.java:10: error: cannot find symbol
                setProjectConnector(new JoernProjectConnector());
                ^
  symbol:   method setProjectConnector(JoernProjectConnector)
  location: class JoernProjectPlugin
3 errors

BUILD FAILED

With gradle build and openjdk-8, the build fails on tests (maybe due to the version?):

:projects:joern-fuzzyc:test

tests.ddg.testDDGCreator > simplestTest FAILED
    java.util.NoSuchElementException at testDDGCreator.java:33

tests.ddg.testDDGCreator > testConverter FAILED
    java.util.NoSuchElementException at testDDGCreator.java:55

tests.argumentTainter.TestArgumentTainter > testDefUseCFGPatcher FAILED
    java.util.NoSuchElementException at TestArgumentTainter.java:65

135 tests completed, 3 failed
:projects:joern-fuzzyc:test FAILED

And finally, gradle build/gradle jar/gradle classes with openjdk-7 fails on the lambdas in CG.java, since Java 7 does not support them.

Incorrect handling of try/catch makes evaluating Windows drivers impossible

It seems to me that joern doesn't handle try{}catch{} correctly.

I'm now using the following code:
query = (""
"getArguments('ProbeForRead', '0')"
".match{ it.type == 'Identifier' && it.childNum == '0' }" # This gets the 'identifier' node
".sideEffect { paramName = it.code }"
".reachableCfgNodes()"
)

You can see from the output below, produced by running the query against the demo code below, that reachableCfgNodes returns neither the statements after ProbeForRead inside the "try" block nor the statements after the "catch" block. So starting from a node inside a "try" block, the only nodes Joern seems to consider reachable are those in the "catch" block.

This prevents Joern from being used against many Windows driver scenarios.

Output:
(n68 {childNum:"0",code:"ntStatus = GetExceptionCode ( )",functionId:14,isCFGNode:"True",location:"24:2:480:509",type:"ExpressionStatement"})

(n118 {code:"EXIT",functionId:14,isCFGNode:"True",type:"CFGExitNode"})

(n66 {childNum:"1",code:"goto Cleanup ;",functionId:14,isCFGNode:"True",location:"25:2:513:525",type:"GotoStatement"})

(n76 {childNum:"6",code:"EXCEPTION_EXECUTE_HANDLER",functionId:14,isCFGNode:"True",location:"21:8:447:471",type:"Statement"})

(n77 {childNum:"5",code:"(",functionId:14,isCFGNode:"True",location:"21:7:446:446",type:"Statement"})

(n78 {childNum:"4",code:"except",functionId:14,isCFGNode:"True",location:"21:1:440:445",type:"Statement"})

(n28 {childNum:"12",code:"return ntStatus ;",functionId:14,isCFGNode:"True",location:"40:1:804:819",type:"ReturnStatement"})

(n30 {childNum:"11",code:"Cleanup :",functionId:14,isCFGNode:"True",location:"39:0:794:801",type:"Label"})

(n75 {childNum:"7",code:")",functionId:14,isCFGNode:"True",location:"21:33:472:472",type:"Statement"})

(n273 {code:"EXIT",functionId:131,isCFGNode:"True",type:"CFGExitNode"})

(n190 {childNum:"8",code:")",functionId:131,isCFGNode:"True",location:"60:33:1323:1323",type:"Statement"})

(n191 {childNum:"7",code:"EXCEPTION_EXECUTE_HANDLER",functionId:131,isCFGNode:"True",location:"60:8:1298:1322",type:"Statement"})

(n145 {childNum:"13",code:"return ntStatus ;",functionId:131,isCFGNode:"True",location:"79:1:1637:1652",type:"ReturnStatement"})

(n147 {childNum:"12",code:"Cleanup :",functionId:131,isCFGNode:"True",location:"78:0:1627:1634",type:"Label"})

(n193 {childNum:"5",code:"except",functionId:131,isCFGNode:"True",location:"60:1:1291:1296",type:"Statement"})

(n183 {childNum:"0",code:"ntStatus = GetExceptionCode ( )",functionId:131,isCFGNode:"True",location:"63:2:1331:1360",type:"ExpressionStatement"})

(n192 {childNum:"6",code:"(",functionId:131,isCFGNode:"True",location:"60:7:1297:1297",type:"Statement"})

(n181 {childNum:"1",code:"goto Cleanup ;",functionId:131,isCFGNode:"True",location:"64:2:1364:1376",type:"GotoStatement"})

(n273 {code:"EXIT",functionId:131,isCFGNode:"True",type:"CFGExitNode"})

(n190 {childNum:"8",code:")",functionId:131,isCFGNode:"True",location:"60:33:1323:1323",type:"Statement"})

(n191 {childNum:"7",code:"EXCEPTION_EXECUTE_HANDLER",functionId:131,isCFGNode:"True",location:"60:8:1298:1322",type:"Statement"})

(n145 {childNum:"13",code:"return ntStatus ;",functionId:131,isCFGNode:"True",location:"79:1:1637:1652",type:"ReturnStatement"})

(n147 {childNum:"12",code:"Cleanup :",functionId:131,isCFGNode:"True",location:"78:0:1627:1634",type:"Label"})

(n193 {childNum:"5",code:"except",functionId:131,isCFGNode:"True",location:"60:1:1291:1296",type:"Statement"})

(n183 {childNum:"0",code:"ntStatus = GetExceptionCode ( )",functionId:131,isCFGNode:"True",location:"63:2:1331:1360",type:"ExpressionStatement"})

(n192 {childNum:"6",code:"(",functionId:131,isCFGNode:"True",location:"60:7:1297:1297",type:"Statement"})

(n181 {childNum:"1",code:"goto Cleanup ;",functionId:131,isCFGNode:"True",location:"64:2:1364:1376",type:"GotoStatement"})

Demo code:

typedef struct _DATACOPY
{
DWORD SourceLength;
PVOID Source;
DWORD DestLength;
PVOID Dest;
} DATACOPY, *PDATACOPY;

NTSTATUS WriteWhatWhere(PDEVICE_OBJECT DeviceObject, PIRP Irp, PDATACOPY dataCopy)
{
NTSTATUS ntStatus = STATUS_SUCCESS;

UNREFERENCED_PARAMETER(DeviceObject);
UNREFERENCED_PARAMETER(Irp);

try
{
    //Verify that the structure being read is in usermode
    ProbeForRead(dataCopy, sizeof(DATACOPY), sizeof(UCHAR));
}
except(EXCEPTION_EXECUTE_HANDLER)
{

    ntStatus = GetExceptionCode();
    goto Cleanup;
}


if (dataCopy->DestLength < dataCopy->SourceLength)
{
    ntStatus = STATUS_INVALID_BUFFER_SIZE;
    goto Cleanup;
}

//Vuln: Don't verify where the pointers in the structure are located
RtlCopyMemory(dataCopy->Dest, dataCopy->Source, dataCopy->SourceLength);

Cleanup:
return ntStatus;
}

NTSTATUS Safe(PDEVICE_OBJECT DeviceObject, PIRP Irp, PDATACOPY dataCopy)
{
NTSTATUS ntStatus = STATUS_SUCCESS;
size_t size;

UNREFERENCED_PARAMETER(DeviceObject);
UNREFERENCED_PARAMETER(Irp);

try
{
    //Verify that the structure being read is in usermode
    ProbeForRead(dataCopy, sizeof(DATACOPY), sizeof(UCHAR));
    size = dataCopy->SourceLength;
    ProbeForWrite(dataCopy->Dest, size, sizeof(UCHAR));
    ProbeForRead(dataCopy->Source, size, sizeof(UCHAR));
}
except(EXCEPTION_EXECUTE_HANDLER)
{

    ntStatus = GetExceptionCode();
    goto Cleanup;
}


if (dataCopy->DestLength < dataCopy->SourceLength)
{
    ntStatus = STATUS_INVALID_BUFFER_SIZE;
    goto Cleanup;
}

//Vuln: Don't verify where the pointers in the structure are located
RtlCopyMemory(dataCopy->Dest, dataCopy->Source, size);

Cleanup:
return ntStatus;
}
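
For comparison, here is a small sketch of what one would expect reachability to look like for WriteWhatWhere. The graph below is hand-written and simplified (the two "goto Cleanup" statements are merged into one node); it is not Joern's actual CFG, and the edge from the guarded call to both the handler and the code after the try block is an assumption about how structured exception handling should be modelled.

```python
from collections import deque


def reachable(cfg, start):
    """Return all nodes reachable from start (excluding start) in BFS order."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for succ in cfg.get(node, []):
            if succ not in seen:
                seen.add(succ)
                order.append(succ)
                queue.append(succ)
    return order


# Hand-written, simplified CFG of WriteWhatWhere (an assumption, see above).
cfg = {
    "ProbeForRead": ["if DestLength < SourceLength",
                     "ntStatus = GetExceptionCode()"],
    "ntStatus = GetExceptionCode()": ["goto Cleanup"],
    "goto Cleanup": ["Cleanup"],
    "if DestLength < SourceLength": ["RtlCopyMemory",
                                     "ntStatus = STATUS_INVALID_BUFFER_SIZE"],
    "ntStatus = STATUS_INVALID_BUFFER_SIZE": ["goto Cleanup"],
    "RtlCopyMemory": ["Cleanup"],
    "Cleanup": ["return ntStatus"],
    "return ntStatus": [],
}

nodes = reachable(cfg, "ProbeForRead")
for name in nodes:
    print(name)
```

Under this model RtlCopyMemory is reachable from ProbeForRead, which is exactly the path the query in this issue needs to see.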

Inconsistency in argument tainting

When tainting arguments using argumentTainter.jar, DEF-edges are only added from the affected statement nodes but not from the argument nodes. This is inconsistent, as USE-edges are present for the respective symbols.

Add Callee nodes in CallExpressions

Right now, the first child of a CallExpression is the callee; however, no pseudo node of type 'Callee' exists. This means that callees cannot be retrieved by type from the index directly but need to be extracted by traversing from the CallExpression to its first child. This is pretty clumsy and prohibits extracting callees using funcLs. It would make sense to introduce pseudo nodes here.
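
A toy illustration of the proposal, in Python with dictionaries standing in for AST nodes: the first child of every CallExpression is wrapped in a pseudo node of type 'Callee', after which callees can be collected by type alone. The node layout and the helper names add_callee_nodes and collect are assumptions made for this sketch, not Joern's data model.

```python
def add_callee_nodes(node):
    """Return a copy of the tree in which the first child of every
    CallExpression is wrapped in a pseudo node of type 'Callee'."""
    children = [add_callee_nodes(child) for child in node.get("children", [])]
    if node["type"] == "CallExpression" and children:
        first = children[0]
        children[0] = {"type": "Callee", "code": first["code"],
                       "children": [first]}
    return {"type": node["type"], "code": node["code"], "children": children}


def collect(node, node_type):
    """Collect the code of every node of the given type, depth first."""
    found = [node["code"]] if node["type"] == node_type else []
    for child in node.get("children", []):
        found.extend(collect(child, node_type))
    return found


call = {"type": "CallExpression", "code": "f ( a )", "children": [
    {"type": "Identifier", "code": "f", "children": []},
    {"type": "ArgumentList", "code": "a", "children": []},
]}

tree = add_callee_nodes(call)
print(collect(tree, "Callee"))
```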

JavaScript Support

In my fork I have started to teach joern ECMAScript 5 (JavaScript).

If I ever want to create a PR, which branch should I branch off from now, dev or master?

Type error in sideEffect clause

getCallsTo("foo")
.sideEffect{ arg = it.ithArguments("0").code; id = it.id; code = it.code;  }
.inE("IS_AST_PARENT")
.outV()
.as("x")
.outE("FLOWS_TO")
.inV()
.loop("x"){it.loops < 40 && it.object.hasNot("code", "EXIT") && !it.object.code.matches(".*RETURN.*")
}
{!it.object.has("code", "foo ( "+arg+" )" && it.object.id != fst_id}

py2neo.error.BadInputException: groovy.lang.MissingMethodException: No signature of method: java.lang.String.has() is applicable for argument types: (java.lang.String, java.lang.String) values: [code, foo (, ( [GremlinStartPipe, [GremlinStartPipe, [GremlinStartPipe, [GremlinStartPipe, VertexQueryPipe(out,[IS_AST_PARENT],vertex)], FilterFunctionPipe, [GremlinStartPipe, VertexQueryPipe(out,[IS_AST_PARENT],vertex)]], FilterFunctionPipe], PropertyPipe(code)] )]
Possible solutions: tr(java.lang.String, java.lang.String), wait(), any(), is(java.lang.Object), trim(), next()

Expected type of arg is string. Instead it's ( [GremlinStartPipe, [GremlinStartPi. How do I extract a string representing the function's parameter in sideEffect clause?

I want to find paths in the CFG that start and end with foo(arg) or bar(arg). The query above is my first try to do that. Interestingly, if I use it.object.code == code in the break clause of loop, query works and emits paths starting and ending with foo(arg).

Parser fails on multiline strings

The parser fails on multiline strings. Example:

#include <stdio.h>
int main(int argc, char **argv) {
    char buf[] = "this is a "
    "multiline string\n";

    printf("%s", buf);
    return 0;
}

Argument UNtainter

At some point, it would be nice to have an argument untainter. Right now, if you make a mistake while tainting, you have to start from the beginning.

Missing Def edges for reference arguments

For call expressions that accept pointer arguments, def edges seem to be missing.

If a parameter to a function is passed by reference, there should be a def edge for that symbol.

Problem with the knn tool in joern-tools

I have recently been working through "Finding Similar Functions with joern-tools". When I use the knn tool, a problem appears.
Command used:

joern-list-funcs -p flic_decode_frame_8BPP | awk -F "\t" '{print $2}' | joern-kn

and here is the error message in the terminal:

Traceback (most recent call last):
File "/usr/local/bin/joern-knn", line 5, in <module>
pkg_resources.run_script('joerntools==0.1', 'joern-knn')
File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 528, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 1394, in run_script
execfile(script_filename, namespace, namespace)
File "/usr/local/lib/python2.7/dist-packages/joerntools-0.1-py2.7.egg/EGG-INFO/scripts/joern-knn", line 77, in <module>
tool.run()
File "/usr/local/lib/python2.7/dist-packages/joerntools-0.1-py2.7.egg/joerntools/shelltool/CmdLineTool.py", line 19, in run
self._runImpl()
File "/usr/local/lib/python2.7/dist-packages/joerntools-0.1-py2.7.egg/joerntools/shelltool/PipeTool.py", line 28, in _runImpl
self._processStream()
File "/usr/local/lib/python2.7/dist-packages/joerntools-0.1-py2.7.egg/joerntools/shelltool/PipeTool.py", line 36, in _processStream
self.streamStart()
File "/usr/local/lib/python2.7/dist-packages/joerntools-0.1-py2.7.egg/EGG-INFO/scripts/joern-knn", line 57, in streamStart
self.knn.initialize()
File "/usr/local/lib/python2.7/dist-packages/joerntools-0.1-py2.7.egg/joerntools/KNN.py", line 33, in initialize
self.emb.x = Normalizer().fit_transform(self.emb.x)
AttributeError: 'NoneType' object has no attribute 'x'

Would it be possible for you to solve the problem? Thank you.

glibc import error

Hello,

I am trying to import glibc 2.17 using joern 0.3.1. However, I got the error message attached below. Could you please help me to resolve this? Thank you!

Some additional info:

  • The command I use: java -jar joern.jar ..\glibc
  • I am using neo4j 2.1.6 on Windows 8.
  • I have imported some other c programs (e.g. libexif 0.6.20) and they work fine.

Error output:

...
E:\Vulns\glibc_joern..\glibc\wctype\wctype.h
E:\Vulns\glibc_joern..\glibc\wctype\wctype_l.c
Exception in thread "main" org.neo4j.kernel.impl.nioneo.store.UnderlyingStorageException: Unable to close store .joernIndex\neostore.propertystore.db.strings
at org.neo4j.kernel.impl.nioneo.store.CommonAbstractStore.close(CommonAbstractStore.java:743)
at org.neo4j.kernel.impl.nioneo.store.PropertyStore.closeStorage(PropertyStore.java:125)
at org.neo4j.kernel.impl.nioneo.store.CommonAbstractStore.close(CommonAbstractStore.java:695)
at org.neo4j.kernel.impl.nioneo.store.NeoStore.closeStorage(NeoStore.java:279)
at org.neo4j.kernel.impl.nioneo.store.CommonAbstractStore.close(CommonAbstractStore.java:695)
at org.neo4j.unsafe.batchinsert.BatchInserterImpl.shutdown(BatchInserterImpl.java:800)
at neo4j.batchInserter.Neo4JBatchInserter.closeDatabase(Unknown Source)
at outputModules.neo4j.Neo4JIndexer.shutdownDatabase(Unknown Source)
at tools.index.Indexer.shutdown(Unknown Source)
at tools.index.IndexMain.walkCodebase(Unknown Source)
at tools.index.IndexMain.main(Unknown Source)
Caused by: java.io.IOException: The requested operation cannot be performed on a file with a user-mapped section open
at sun.nio.ch.FileDispatcherImpl.truncate0(Native Method)
at sun.nio.ch.FileDispatcherImpl.truncate(Unknown Source)
at sun.nio.ch.FileChannelImpl.truncate(Unknown Source)
at org.neo4j.kernel.impl.nioneo.store.StoreFileChannel.truncate(StoreFileChannel.java:98)
at org.neo4j.kernel.impl.nioneo.store.StoreFileChannel.truncate(StoreFileChannel.java:28)
at org.neo4j.kernel.impl.nioneo.store.CommonAbstractStore$1.perform(CommonAbstractStore.java:725)
at org.neo4j.kernel.impl.util.FileUtils.windowsSafeIOOperation(FileUtils.java:367)
at org.neo4j.kernel.impl.nioneo.store.CommonAbstractStore.close(CommonAbstractStore.java:715)
... 10 more

Some refactorings to abstract from C and PHP

As discussed in PR #79, I would suggest refactoring some classes within the ast package in order to abstract from C and from PHP. Basically every class in ast itself should be seen as a base class that can be extended by C- or PHP-aware implementations in ast.c resp. ast.php packages (the same goes, of course, for packages such as ast.expressions, ast.declarations, etc.).

My basic approach would be to say, "whenever a base class declares a field that is only used for C (resp. only for PHP), create a new extending class, declare that field in the extending class only, and move associated get- resp. set- methods to the extending class accordingly."
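
The approach above can be sketched as follows. The sketch is in Python for brevity, while the real classes are Java, and every class and field name in it (FunctionDef, CFunctionDef, PHPFunctionDef, return_type, returns_reference) is hypothetical and chosen only to illustrate the pattern: the base class keeps what both languages share, and each language-specific field moves to an extending class together with its accessors.

```python
class ASTNode:
    """Language-neutral base node: only knows about its children."""

    def __init__(self):
        self.children = []

    def add_child(self, node):
        self.children.append(node)


class FunctionDef(ASTNode):
    """Base class: holds only what C and PHP have in common."""

    def __init__(self, name):
        super().__init__()
        self.name = name


class CFunctionDef(FunctionDef):
    """C-aware extension: the C-only field lives here (hypothetical field)."""

    def __init__(self, name, return_type):
        super().__init__(name)
        self.return_type = return_type


class PHPFunctionDef(FunctionDef):
    """PHP-aware extension with its own PHP-only field (hypothetical field)."""

    def __init__(self, name, returns_reference=False):
        super().__init__(name)
        self.returns_reference = returns_reference


c_func = CFunctionDef("main", "int")
php_func = PHPFunctionDef("render")
print(c_func.name, c_func.return_type)
print(php_func.name, php_func.returns_reference)
```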

Another problem: Often, the C implementations override the methods addChild(ASTNode) and accept(ASTNodeVisitor), neither of which is needed by the PHP classes. I am not sure what to do about these. The absolutely 100% clean way would be to also move them to extending classes, but this would imply having to create C-aware extending implementations for each and every single class in the ast package. Therefore, as long as they do not hurt the PHP world, we can perhaps simply leave them in the base classes, and only move them to extending C-aware classes when such an extending class is declared anyway, due to the approach mentioned above.

Note that accept(ASTNodeVisitor) never hurts the PHP world. In contrast, addChild(ASTNode) can be confusing for the PHP world: there we do need the super-method ASTNode.addChild(ASTNode) in order to add a node to another node's list of children, but we do not need the addChild(ASTNode) override declared in most classes that inherit from ASTNode, which calls various handlers that are not needed for PHP and can have undesired side effects. The override, however, prevents us from calling the super-method ASTNode.addChild(ASTNode) directly in the PHP world. Currently this is solved by having the edge interpreter in PHP call the various handler methods directly, and having the handler methods call the super-method ASTNode.addChild(ASTNode).

All in all, attending to this issue will split the C and the PHP world cleanly and more easily allow for integration of further languages into Joern. I will add one comment to this thread per class that I think needs refactoring. Once that refactoring is done, the corresponding comment can be deleted, or marked in some way. 😊

The childNum bug of IdentifierDecl

Hi fabs, or any other kind person who can help,
When I use joern to parse linux-3.9.1/drivers/usb/class/cdc-wdm.c, I find that it assigns wrong child numbers to the AST nodes of a declaration statement like

int status = urb->status;

I ran this command:

echo "queryNodeIndex('type:IdentifierDecl').filter{it.code.contains(\"status\")}.out()" |joern-lookup -g

and got the following output:

(n579 {childNum:"0",code:"int",functionId:301,type:"IdentifierDeclType"})
(n578 {childNum:"0",code:"status",functionId:301,type:"Identifier"})
(n573 {childNum:"2",code:"status = urb -> status",functionId:301,operator:"=",type:"AssignmentExpr"})
(n1080 {childNum:"0",code:"int",functionId:618,type:"IdentifierDeclType"})
(n1079 {childNum:"0",code:"status",functionId:618,type:"Identifier"})
(n1074 {childNum:"2",code:"status = urb -> status",functionId:618,operator:"=",type:"AssignmentExpr"})

It seems the "childNum" of the "Identifier" node is wrongly set to "0".
The "childNum" property should be "1".
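
The expected numbering can be stated as a tiny sketch, assuming that childNum is meant to be the zero-based position of a node among its siblings and is stored as a string, as in the output above. The helper name assign_child_nums is hypothetical; the three children are taken from the output in this issue.

```python
def assign_child_nums(children):
    """Return copies of the children with childNum set to their position."""
    return [dict(child, childNum=str(position))
            for position, child in enumerate(children)]


children = [
    {"type": "IdentifierDeclType", "code": "int"},
    {"type": "Identifier", "code": "status"},
    {"type": "AssignmentExpr", "code": "status = urb -> status"},
]

numbered = assign_child_nums(children)
for child in numbered:
    print(child["childNum"], child["type"])
```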

Neo4j 2.x support

Are there any plans to support the 2.x series of Neo4j? I've been able to upgrade the DB produced by Joern to 2.x, and can run Cypher queries, but I realise that there would be a lot of work to move the tools from Gremlin to Cypher.

Default arguments

I am currently trying to use joern on the following C++ code:

void test(int a = 0)
{}

As soon as I leave out the default value, it seems to work.

I guess the grammar file doesn't recognize default arguments.

ithArgument() not working?

In the VLC tutorial, I am able to complete all the examples, except for when I attempt to list the arguments.

Per the example:

echo "getCallsTo('memcpy').ithArgument('2').code" | joern-lookup -g

I get the following error:

py2neo.execeptions.ClientError: javax.script.ScriptException: groovy.lang.MissingMethodException: No signature of method: com.tinkerpop.gremlin.groovy.GremlinGroovyPipeline.ithArgument() is applicable for argument types: (java.lang.String) values: [2]

The error is raised at line 369 of neo4j.py (version 1.6.1) in _post.

I appreciate your assistance and I look forward to hearing from you.

Cheers,

Parse error in nested conditionals

The else branch gets attached to the "if(x==0)" conditional rather than the "if(x&FLAGS)" one in the AST.

u_char *
foo(u_char *data)
{
    u_char x;
    x = *data;

    if (x & FLAGS){
//removing either of the following conditionals will result in correct parsing
        if (x == 0){
            return NULL;
        }
        if (x > sizeof(long)){
            return NULL;
        }
        return data;
    }
//commenting out the else will result in correct parsing (keeping the return statement)
    else {
//commenting out the return statement will result in correct parsing
        return data + 1;
    }
}

Typedef problem

Hi,
I'm parsing C programs containing many, many typedefs such as typedef short * foo, but it seems that joern does not cover them as types. As I understand it, this isn't a trivial problem, but I'm rather inexperienced with all of this and thought I'd ask whether such a thing could be added to Joern. It would be very helpful if I could query for such typedef'd types!

Feature Request: Function Call Graphs

Support for the extraction of Function Call Graphs (FCG) might be an interesting addition to Joern. An FCG is basically a directed (multi-)graph showing the relation between function callers and callees.

Bonus features:

  • (Visually) indicate functions from the same class or source file. This might help spotting interesting call patterns across different classes or files.
  • (Visually) indicate functions that are not part of the code base. This can help to identify entry points to external libraries and APIs.
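
As a rough illustration of the idea (not an existing Joern feature), an FCG can be kept as a counted edge list. The sketch below assumes the call sites are already available as (caller, callee) pairs; the helper names build_call_graph and external_functions and the sample data are hypothetical.

```python
from collections import Counter


def build_call_graph(calls):
    """Build a directed multigraph from (caller, callee) pairs.

    Repeated pairs are kept as edge multiplicities, which is what makes
    this a multigraph rather than a plain directed graph.
    """
    return Counter(calls)


def external_functions(graph, defined):
    """Return callees that have no definition in the code base."""
    return {callee for (_, callee) in graph if callee not in defined}


# Hypothetical call sites, e.g. as they might be exported from a code database.
calls = [
    ("main", "parse_args"),
    ("main", "run"),
    ("run", "memcpy"),
    ("run", "memcpy"),
    ("run", "log_error"),
]
defined = {"main", "parse_args", "run", "log_error"}

graph = build_call_graph(calls)
external = external_functions(graph, defined)
```

Here the two memcpy calls from run become one edge with multiplicity 2, and memcpy is reported as external because it has no definition in the analysed code.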

Moving to LGPL

I have split off a program analysis library from joern called jpanlib, and I would like to make that library available under the LGPL, so that people writing code that uses the library can use any license they want.

To be on the safe side, I'd like to ask everyone who has contributed a fix to joern at some point whether they are OK with this. I've gone through the logs, this should be: @ideadapt @l4l and @yangke

Please let me know if this is OK with you guys.

Assignment expressions for declarations don't have left and right sub-nodes

AssignmentExpression nodes usually have a left and right hand value [e.g. accessed by assignment.groovy helper methods lval() and rval()]; however, while hitting some puzzling null errors with a query I noticed that IdentifierDeclStatements with an assignment, e.g. 'int foo = bar;', have AssignmentExpression sub-nodes without left and right sub-nodes. This behavior seems like it's unintentional, though it's possible it's not desirable to duplicate AST Identifier nodes between the IdentifierDeclStatement and the AssignmentExpr? Either way, this should be fixed in the graph generation or the lval() and rval() definitions.

Does not appear to work with Neo4j-2.1.6

I'm attempting to containerize joern, as I'd like to make it super easy to install and use out of the box.

When attempting to follow the tutorial in my container, built with the Dockerfile below, I keep running into an error indicating that the database created by joern is corrupted. This error appears in neo4j's console.log when attempting to start the neo4j-server with org.neo4j.server.database.location set to my own .joernIndex directory.

2015-01-14 21:58:52.962+0000 INFO  [API] Setting startup timeout to: 120000ms based on -1
Detected incorrectly shut down database, performing recovery..

Steps to reproduce

Build the joern container
1.) run docker build --tag=joern-0.3.1 . from a directory with the attached Dockerfile

This uses neo4j's PPA to install the server from packages; the current version is 2.1.6 (as it is hard to find exact packages / downloads for 2.1.5).

2.) run the container

docker run -it --privileged --name joern joern-0.3.1 /bin/bash

3.) inside the container, from the root prompt, raise your open file limit via ulimit -n 65535

This works around Docker's default settings and is also why we need to use --privileged.

4.) cd to the $JOERN directory (/opt/joern-0.3.1) and
4a.) create a subdir named tutorial
4b.) download vlc-2.1.4 to that directory
4c.) cd to /opt/joern-0.3.1 and run ./joern tutorial/vlc-2.1.4 to create the .joernIndex in /opt/joern-0.3.1

5.) Edit /etc/neo4j/neo4j-server.properties to point to /opt/joern-0.3.1/.joernIndex/

namely make these changes

#location of the database directory
#org.neo4j.server.database.location=data/graph.db
org.neo4j.server.database.location=/opt/joern-0.3.1/.joernIndex/

6.) Attempt to start neo4j via the service command: service neo4j-service start

This will fail like the following:

root@6ae06dfe9067:/opt/joern-0.3.1# service neo4j-service start
Using additional JVM arguments:  -server -XX:+DisableExplicitGC -Dorg.neo4j.server.properties=conf/neo4j-server.properties -Djava.util.logging.config.file=conf/logging.properties -Dlog4j.configuration=file:conf/log4j.properties -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:-OmitStackTraceInFastThrow
Starting Neo4j Server...WARNING: not changing user
process [692]... waiting for server to be ready.. Failed to start within 120 seconds.
Neo4j Server may have failed to start, please check the logs.

7.) check /var/log/neo4j/console.log to see the aforementioned error message.

Since I can't attach the Dockerfile to a ticket, its contents are below.

FROM ubuntu:latest
MAINTAINER Pete Markowsky 
RUN apt-get update && apt-get install -y wget unzip git && apt-get clean
#Add the official neo4j PPA
RUN wget -O - http://debian.neo4j.org/neotechnology.gpg.key | apt-key add -
RUN echo 'deb http://debian.neo4j.org/repo stable/' > /etc/apt/sources.list.d/neo4j.list 
# install packages
RUN apt-get update && apt-get upgrade -y && apt-get install -y openjdk-7-jdk python-dev python-setuptools python-pip graphviz libgraphviz-dev neo4j ant python-pygraphviz python-pip && apt-get clean
# install joern 
RUN cd /opt/ && wget https://github.com/fabsx00/joern/archive/0.3.1.tar.gz && tar xfzv 0.3.1.tar.gz
RUN cd /opt/joern-0.3.1 && wget http://mlsec.org/joern/lib/lib.tar.gz && tar xfzv lib.tar.gz
RUN cd /opt/joern-0.3.1 && ant
# Install gremlin plugin
RUN cd /tmp && wget -O neo4j-gremlin.zip http://mlsec.org/joern/lib/neo4j-gremlin-plugin-2.1-SNAPSHOT-server-plugin.zip && unzip neo4j-gremlin.zip -d /usr/share/neo4j/plugins/gremlin-plugin
#install joern tools
RUN cd /tmp/ && git clone https://github.com/fabsx00/joern-tools && cd joern-tools && python2 setup.py install

[phpsupport] Some problems in CFG creation

While testing, I encountered some problems with the CFG generation for PHP code.

Some statements do not appear to be handled at all:

  • A switch statement generates an [ERROR] CFG node. Any code within the switch statement is ignored.
  • A foreach statement generates no CFG node at all. Any code within a foreach statement is ignored.
  • yield and yield from statements are treated as normal statements with no special control flow, though they actually behave as a mixture of normal statements and return statements.
  • A conditional statement (expr1 ? expr2 : expr3) is treated as a normal statement with no special control flow, though it should generate three CFG nodes, with true and false edges going from expr1 to expr2 and expr3, respectively.
  • A coalesce statement (expr1 ?? expr2) is treated as a normal statement with no special control flow.
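
To make the expected shape for the conditional case concrete, here is a minimal sketch of the edges one might expect for expr1 ? expr2 : expr3. It does not reflect Joern's actual CFG classes; the function name conditional_cfg, the edge triple format, and the synthetic "join" node are assumptions made for illustration.

```python
def conditional_cfg(cond, if_true, if_false):
    """Return CFG edges for `cond ? if_true : if_false`.

    Each edge is a (source, destination, label) triple. Both branches
    rejoin at a synthetic "join" node (an assumption of this sketch).
    """
    return [
        (cond, if_true, "true"),
        (cond, if_false, "false"),
        (if_true, "join", ""),
        (if_false, "join", ""),
    ]


edges = conditional_cfg("expr1", "expr2", "expr3")
```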

Some statements appear to be only partly handled, or handled incorrectly:

  • In a try statement, code within the try block appears to be ignored (no except edges to the [EXCEPTION] node). In exchange, it is the code within the finally block that has (incorrectly, I think) except edges back to the [EXCEPTION] node.
  • In break and continue statements, the numerical argument is ignored. That is, break; or continue; (i.e., without numerical argument) are handled correctly, but break 2; or continue 2; (or any other numerical argument) are incorrectly handled as if they had no numerical argument.

Concerning switch, foreach and the coalesce operator, I can see that CFG generation is not clearly defined: The predicates are implicit, one cannot simply generate true or false edges. Rather, one probably has to transform the AST to yield the needed predicates, which is non-trivial.

Concerning yield and yield from statements, it is even less clear. One could treat them as return statements, but then the code after those statements incorrectly becomes "dead". Generally, I do not know whether or how one can cope with generators as far as CFG generation is concerned.

However, for try statements, break and continue statements as well as for conditionals, there appears to be no "a priori" problem concerning CFG generation. In these cases, the CFG creation implementation simply seems to be incomplete or partly buggy. In particular, note that the numerical argument to break and continue statements in the AST is given as an IntegerExpression child of the PHPBreakStatement, respectively PHPContinueStatement AST nodes. That is, the numerical argument is always statically known (it cannot be a variable, for example.)
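
Since the argument is a static integer, resolving the jump target could be as simple as indexing into the stack of enclosing structures. The following is only a sketch under that assumption; resolve_jump_target and the list-based stack are hypothetical and not taken from the Joern code base.

```python
def resolve_jump_target(enclosing, depth=1):
    """Return the structure that `break depth;` or `continue depth;` targets.

    `enclosing` lists the enclosing loop/switch structures, outermost first.
    A depth of 1 (the default for a bare `break;`) selects the innermost one.
    """
    if depth < 1 or depth > len(enclosing):
        raise ValueError("depth exceeds nesting level")
    return enclosing[-depth]


# Hypothetical nesting: a for loop inside a while loop.
enclosing = ["outer_while", "inner_for"]
innermost = resolve_jump_target(enclosing)
outer = resolve_jump_target(enclosing, 2)
```

With this, break; inside the inner loop resolves to inner_for, while break 2; resolves to outer_while, so the CFG edge would leave both loops.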

Could you comment / have a look? 😃 Thanks!

Question on Queries

Hi Fabs and team,
Great project. Thank you!

In an attempt to reproduce your research presented at 31C3, executing the buffer overflow queries on the Linux mainline produces no results. The problem seems to be the sanitizers.

Specifically, this query produces no results:
query = """
getFunctionASTsByName('*_write*')
.getArguments('(copy_from_user OR memcpy)', '2')
.sideEffect{ paramName = 'c(ou)?nt'; }
.filter{ it.code.matches(paramName) }
.unsanitized(
{ it._().or(
_().isCheck('.*' + paramName + '.*'),
_().codeContains('.*alloc.*' + paramName + '.*'),
_().codeContains('.*min.*')
)} )
.param( '.*c(ou)?nt.*' )
.locations() """

This query produces results:
query = """
getFunctionASTsByName('*_write*')
.getArguments('copy_from_user OR memcpy', '2')
.sideEffect{ paramName = 'c(ou)?nt'; }
.filter{ it.code.matches(paramName) }
.locations()
"""

The 'no results' is consistent across other examples when adding the sanitizers. Is there a problem with the query or do I need to do something special to enable this functionality?

Thanks in advance.

Signature handling

The following signature is not recognized:
static void GetFilenames( libvlc_int_t *p_vlc, unsigned n, const char *const args[] )

I figure the 'const char *const args[]' is not captured by the grammar.

Incomplete def use edges

It seems that expressions without an explicit assignment operator are missing defs and uses.
Example:
The statement buf++ both uses and defines buf.
joern misses both edges.

Defs not completely populated

For statements where there is no explicit assignment, defs are not properly populated.
For example:
num_bytes++;
The above statement both uses and defines num_bytes, but its use and def edges are missing.
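
A small sketch of the expected behaviour, assuming a simplified statement model (an operator, a written variable, and the variables read on the right-hand side). The function defs_and_uses is hypothetical and only illustrates which edges should exist; it is not how joern computes them.

```python
def defs_and_uses(op, target, operands=()):
    """Return (defs, uses) for a simplified statement.

    op       -- "=", a compound operator such as "+=", or "++"/"--"
    target   -- the variable being written
    operands -- variables read on the right-hand side
    """
    defs = {target}
    uses = set(operands)
    if op != "=":
        # Increments and compound assignments read the old value first.
        uses.add(target)
    return defs, uses


increment = defs_and_uses("++", "num_bytes")
plain = defs_and_uses("=", "x", ("y",))
```

For num_bytes++ this yields num_bytes in both sets, whereas a plain x = y defines x and uses only y.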

[phpsupport] Null pointer exception

Calling the ./rebuildTestDB script on the current phpsupport branch results in null pointer exceptions when the neo4j batch importer adds CFG edges whose destination nodes do not seem to exist in the GraphNodeStore.

@einheitlix : could you check this out?
