Let me preface this by saying that the last time I did any coding in C/C++ (and only a little at that) was more than 10 years ago, so bear with me.
I've been trying to compile on an old Mac Pro with an ATI graphics card and have been running into some issues.
I'd like to submit a pull request, but I haven't solved all of them yet, and my 'fixes' so far do not involve any checking.
- In Yosemite (10.10.4) with Xcode 6, OpenMP is not supported out of the box (see this question on SO), so I installed gcc (importantly, perhaps: gcc-5) through Homebrew.
- Yosemite 10.10/Xcode 6 ships with OpenCL 1.2
- Some stuff down the road doesn't work fully (I'll get to that in a minute)
1 Fixing issues with OpenCL
In the root CMakeLists.txt, the line
set(OPENCL_LIBRARIES, " -lOpenCL")
will not include the OpenCL framework properly on OS X. Change it to:
if(APPLE)
set(OPENCL_LIBRARIES " -framework OpenCL")
else(APPLE)
set(OPENCL_LIBRARIES " -lOpenCL")
endif(APPLE)
1.1 OpenCL 1.1 vs 1.2 properties
In lib/mason/opencl/OclHost.cpp, the OpenCL property CL_DEVICE_PARTITION_EQUALLY_EXT only applies to OpenCL 1.1; CL_DEVICE_PARTITION_EQUALLY needs to be used for 1.2. Incidentally, the 1.2 name is used on line 102, but not on line 206. See here for a quick comparison. Similarly, CL_PROPERTIES_LIST_END_EXT on line 206 should terminate the list but strangely does not work, even though it is listed above as functional in OpenCL 1.2. CL_PROPERTIES_LIST_END_EXT is defined as 0, so setting partitionPrty[2] = 0 on line 208 should fix that.
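The property list described above can be sketched without the OpenCL headers, so the snippet compiles anywhere. The numeric values are assumed to match CL/cl.h and CL/cl_ext.h (prefer the real headers in actual code), and the names PartitionProps and makeEqualPartitionProps are made up for illustration; they are not part of the project.

```cpp
#include <stdint.h>

// Local stand-ins for the OpenCL constants. The values are assumptions that
// should match CL/cl.h (1.2 core) and CL/cl_ext.h (1.1 device fission).
const intptr_t kPartitionEqually    = 0x1086; // CL_DEVICE_PARTITION_EQUALLY
const intptr_t kPartitionEquallyExt = 0x4050; // CL_DEVICE_PARTITION_EQUALLY_EXT
const intptr_t kListEnd             = 0;      // CL_PROPERTIES_LIST_END_EXT

// Hypothetical container for the three-entry list {kind, units, terminator}.
// Note: the real _EXT list uses a cl_ulong-based type, not intptr_t.
struct PartitionProps {
    intptr_t v[3];
};

// Hypothetical helper: pick the property name by OpenCL version and always
// terminate the list with an explicit 0.
PartitionProps makeEqualPartitionProps(bool opencl12, int unitsPerSubDevice)
{
    PartitionProps props;
    props.v[0] = opencl12 ? kPartitionEqually : kPartitionEquallyExt;
    props.v[1] = unitsPerSubDevice;
    props.v[2] = kListEnd;
    return props;
}
```

With the real headers, an array laid out like props.v is what would be handed to the sub-device creation call; the sketch only shows the layout and the explicit terminator.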
1.2 NVidia vs ATI CL properties
Again in lib/mason/opencl/OclHost.cpp, this time in int OclHost::getThreadPerMulti(): CL_DEVICE_COMPUTE_CAPABILITY_MAJOR_NV only returns the major revision on NVIDIA cards. To get the same from an ATI card you need to use CL_DEVICE_GFXIP_MAJOR_AMD and CL_DEVICE_GFXIP_MINOR_AMD, respectively. Strangely, these are not defined by the OS X OpenCL framework, but defining them in include/CL/opencl.h helped. I added the following lines after the #ifdef __APPLE__ ... #endif block, as suggested in this AMD community post:
/* Two properties required for AMD in OpenCL 1.2 */
#define CL_DEVICE_GFXIP_MAJOR_AMD 0x404A
#define CL_DEVICE_GFXIP_MINOR_AMD 0x404B
I then changed the CL_DEVICE_COMPUTE_CAPABILITY_MAJOR_NV and CL_DEVICE_COMPUTE_CAPABILITY_MINOR_NV references to CL_DEVICE_GFXIP_MAJOR_AMD and CL_DEVICE_GFXIP_MINOR_AMD, respectively.
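Swapping the constants outright would presumably break NVIDIA cards, so one option is to choose the query pair from the device's vendor string at run time. The following is only a sketch: the struct RevisionQuery and the function pickRevisionQuery are hypothetical, the vendor substrings are guesses at what CL_DEVICE_VENDOR reports, the AMD values are the ones quoted above, and the NVIDIA values are assumed to match the cl_nv_device_attribute_query extension.

```cpp
#include <string>

// Assumed values; in real code take them from the OpenCL extension headers.
const unsigned kComputeCapabilityMajorNV = 0x4000; // CL_DEVICE_COMPUTE_CAPABILITY_MAJOR_NV
const unsigned kComputeCapabilityMinorNV = 0x4001; // CL_DEVICE_COMPUTE_CAPABILITY_MINOR_NV
const unsigned kGfxipMajorAMD            = 0x404A; // CL_DEVICE_GFXIP_MAJOR_AMD
const unsigned kGfxipMinorAMD            = 0x404B; // CL_DEVICE_GFXIP_MINOR_AMD

// Hypothetical result type: which two device-info parameters to query.
struct RevisionQuery {
    unsigned majorParam;
    unsigned minorParam;
    bool supported; // false if the vendor is not recognised
};

// Hypothetical helper: map a vendor string to the matching query pair.
RevisionQuery pickRevisionQuery(const std::string& vendor)
{
    RevisionQuery q;
    q.majorParam = 0;
    q.minorParam = 0;
    q.supported = false;

    if (vendor.find("NVIDIA") != std::string::npos) {
        q.majorParam = kComputeCapabilityMajorNV;
        q.minorParam = kComputeCapabilityMinorNV;
        q.supported = true;
    } else if (vendor.find("AMD") != std::string::npos ||
               vendor.find("ATI") != std::string::npos ||
               vendor.find("Advanced Micro Devices") != std::string::npos) {
        q.majorParam = kGfxipMajorAMD;
        q.minorParam = kGfxipMinorAMD;
        q.supported = true;
    }
    return q;
}
```

When supported is false the caller would need some default, since neither vendor extension applies.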
2 Stuff down the road
So far so good, but other errors are cropping up.
2.1 Function not defined on OS X
src/writer/PlainFileWriter.h:46:75: error: 'fwrite_unlocked' was not declared in this scope
Turns out, fwrite_unlocked is not defined on OS X.
As per this issue, adding the following to PlainFileWriter.h fixed it:
#ifdef __APPLE__
#define fwrite_unlocked fwrite
#define fflush_unlocked fflush
#endif
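An alternative to keying on __APPLE__ is to key on glibc, which is where the *_unlocked variants come from, and fall back to the ordinary locking calls everywhere else. This is a sketch under that assumption; the macro names PORTABLE_FWRITE and PORTABLE_FFLUSH and the helper writeRecord are invented for the example and do not exist in the project.

```cpp
#include <stdio.h>
#include <string>

// __GLIBC__ is only defined once a libc header has been included, so this
// check has to come after <stdio.h>.
#if defined(__GLIBC__)
#  define PORTABLE_FWRITE fwrite_unlocked
#  define PORTABLE_FFLUSH fflush_unlocked
#else
#  define PORTABLE_FWRITE fwrite
#  define PORTABLE_FFLUSH fflush
#endif

// Hypothetical helper: write the whole string, then flush.
// Returns true only if every byte was written and the flush succeeded.
bool writeRecord(FILE* out, const std::string& text)
{
    size_t written = PORTABLE_FWRITE(text.data(), 1, text.size(), out);
    return written == text.size() && PORTABLE_FFLUSH(out) == 0;
}
```

The fallback is slower when many threads share one stream, but it behaves the same from the caller's point of view.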
2.2 Multithreading on Mac
cpu_set_t, mask, CPU_SET, and pthread_setaffinity_np are not defined on OS X; thread affinity is handled by a Mac-specific thread-affinity API instead. As this isn't easy to fix, devs usually resort to disabling multithreading on Macs. However, Facebook has addressed some of these issues in a virtual machine they wrote (HHVM), which can be seen in the following header file:
https://github.com/facebook/hhvm/blob/master/hphp/runtime/ext/hotprofiler/ext_hotprofiler.h
Furthermore, BRL-CAD (through libbu++) has apparently also solved this problem, slightly differently:
https://github.com/kanzure/brlcad/blob/master/src/libbu/affinity.c
So I fiddled around a bit and modified src/core/unix_threads.cpp somewhat.
At the top, add:
#if defined(__APPLE__)
# include <mach/thread_policy.h>
# include <mach/mach.h>
# include <stdio.h>
# include <stdlib.h>
# include <sys/sysctl.h>
#endif
Change NGMSetThreadAffinity to:
void NGMSetThreadAffinity(NGMThread * thread, int cpu)
{
if (cpu == -1)
return;
pthread_t self = 0;
#if defined(__APPLE__)
/* Mac OS X mach thread affinity hinting. Mach implements a CPU
* affinity policy by default so this just sets up an additional
* hint on how threads can be grouped/ungrouped. Here we set all
* threads up into their own group so threads will get their own
* cpu and hopefully be kept in place by Mach from there.
*/
thread_extended_policy_data_t epolicy;
thread_affinity_policy_data_t apolicy;
// This should work
thread_t curr_thread = mach_thread_self();
kern_return_t ret;
/* discourage interrupting this thread */
epolicy.timeshare = FALSE;
ret = thread_policy_set(curr_thread, THREAD_EXTENDED_POLICY, (thread_policy_t) &epolicy, THREAD_EXTENDED_POLICY_COUNT);
if (ret != KERN_SUCCESS) {
/* I don't want to bother with error handling and void won't return int, so we'll just print and exit */
// return -1;
printf("thread_policy_set(1) returned %d\n", ret);
exit(1); /* braces are needed, otherwise this exit runs unconditionally */
}
/* Get number of CPUs from brlcad/src/libbu/parallel.c */
int ncpu = 1; /* fall back to one CPU if sysctl fails, so the modulo below is defined */
size_t len;
int maxproc;
int mib[] = {CTL_HW, HW_AVAILCPU};
len = sizeof(maxproc);
if (sysctl(mib, 2, &maxproc, &len, NULL, 0) == -1) {
perror("sysctl");
} else {
ncpu = maxproc; /* should be able to get sysctl to return maxproc */
}
/* put each thread into a separate group; the affinity struct needs the affinity flavor and count */
apolicy.affinity_tag = cpu % ncpu;
ret = thread_policy_set(curr_thread, THREAD_AFFINITY_POLICY, (thread_policy_t) &apolicy, THREAD_AFFINITY_POLICY_COUNT);
if (ret != KERN_SUCCESS) {
/* more errors */
// return -1;
printf("thread_policy_set(2) returned %d\n", ret);
exit(1);
}
#else
/* This works on linux */
if (thread == 0)
{
self = pthread_self();
thread = &self;
}
cpu_set_t mask; /* stack allocation avoids leaking one cpu_set_t per call */
CPU_ZERO(&mask);
CPU_SET(cpu, &mask);
pthread_setaffinity_np(*thread, sizeof(cpu_set_t), &mask);
#endif
}
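For comparison, here is a smaller sketch of the same idea that reports failure to the caller instead of exiting. It is not the project's code: pinCurrentThread is a hypothetical name, it only ever affects the calling thread, and on Linux it swaps in sched_setaffinity for pthread_setaffinity_np. On OS X the affinity tag is only a grouping hint, and since tag 0 means "no affinity" (THREAD_AFFINITY_TAG_NULL) the sketch shifts the cpu index by one.

```cpp
#if defined(__APPLE__)
#  include <mach/mach.h>
#  include <mach/thread_policy.h>
#  include <pthread.h>
#elif defined(__linux__)
#  include <sched.h>
#endif

// Hypothetical helper: ask the OS to keep the calling thread on the given
// cpu (Linux) or in its own affinity group (OS X). Returns false on failure.
bool pinCurrentThread(int cpu)
{
    if (cpu < 0)
        return true; // nothing requested, mirrors the cpu == -1 early return
#if defined(__APPLE__)
    // Tag 0 is THREAD_AFFINITY_TAG_NULL, so use cpu + 1 as the group tag.
    thread_affinity_policy_data_t policy;
    policy.affinity_tag = cpu + 1;
    thread_port_t self = pthread_mach_thread_np(pthread_self());
    kern_return_t ret = thread_policy_set(self, THREAD_AFFINITY_POLICY,
                                          (thread_policy_t) &policy,
                                          THREAD_AFFINITY_POLICY_COUNT);
    return ret == KERN_SUCCESS;
#elif defined(__linux__)
    if (cpu >= CPU_SETSIZE)
        return false; // index does not fit into a cpu_set_t
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    // A pid of 0 means "the calling thread".
    return sched_setaffinity(0, sizeof(mask), &mask) == 0;
#else
    return false; // no affinity API known for this platform
#endif
}
```

Returning a bool lets the caller decide whether a failed hint is fatal; for a pure performance hint, logging and carrying on is probably enough.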
2.3 Linking on OS X
Almost done now. The Mac linker (strangely, not GNU ld) does not accept the option -Bdynamic, so I tracked it down in lib/mason/opencl/CMakeLists.txt and changed target_link_libraries(MASonOpenCl "-Wl,-Bdynamic ${OPENCL_LIBRARIES}") to:
if(APPLE)
target_link_libraries(MASonOpenCl "${OPENCL_LIBRARIES}") # Apple doesn't like -Bdynamic
else(APPLE)
target_link_libraries(MASonOpenCl "-Wl,-Bdynamic ${OPENCL_LIBRARIES}")
endif(APPLE)
and did the same for MASonOpenCl-debug in the same file.
Now it looks like it's all dandy and it compiles, but when running ngm it segfaults.
gdb offers the following unhelpful insight:
[SEQPROV] 0 reference sequences were skipped (length < 10).
[SEQPROV] Writing encoded reference to
Program received signal SIGSEGV, Segmentation fault.
0x00007fff852f3c67 in ?? () from /usr/lib/system/libsystem_c.dylib
(gdb) bt
#0 0x00007fff852f3c67 in ?? () from /usr/lib/system/libsystem_c.dylib
#1 0x00007fff5fbff140 in ?? ()
#2 0x00007fff852f6ec5 in ?? () from /usr/lib/system/libsystem_c.dylib
#3 0x0000000000000001 in ?? ()
#4 0x0000206000711c70 in ?? ()
#5 0x0000000000000004 in ?? ()
#6 0x00007fff5fbff0f0 in ?? ()
#7 0x0000000000000000 in ?? ()
gdb also prints loads of "can't open to read symbols: No such file or directory." messages for a variety of files from libgomp, libstdc++-v3 and libgcc. Very weird, and I'm still digging into it. I just hope the SIGSEGV didn't come from the changes I made above…