
burlap's People

Contributors

andyltt, brawner, erickerby, f-leno, itzakh, jameskyle, jmacglashan, kylejmcintyre, postmasters, tuxdna


burlap's Issues

Possible bug in "performReachabilityFrom" function in PolicyIteration.java

The number of states found reachable from a given state differs between PolicyIteration and ValueIteration.

When both were run on a graph-defined MDP of 100 states, ValueIteration reported 100 reachable states but PolicyIteration reported 99.

Code snippet to create the graph and run policy and value iteration (warning: the code is untested; I'm working on an assignment, apologies):

    public class GraphDefinedMDP {

	private static final int NUM_STATES = 100;

	private GraphDefinedDomain graphDefinedDomainGen;
	private SADomain domain;

	private State initialState;
	private int[] goalStates = new int[NUM_STATES / 75];

	private HashableStateFactory hashingFactory;

	private Environment env;

	public GraphDefinedMDP(int initialStateNum) {
		graphDefinedDomainGen = new GraphDefinedDomain(NUM_STATES);

		// Deterministic goal states
		for (int i = 0; i < goalStates.length; i++) {
			goalStates[i] = i * goalStates.length + (goalStates.length / 2);
		}
		// Print goal states
		System.out.format("Goal States: %s\n",
				java.util.Arrays.toString(goalStates));

		// Set terminal states
		TerminalFunction tf = new GraphTF(goalStates);
		RewardFunction rf = new GraphRF() {
			@Override
			public double reward(int s, int a, int sprime) {
				for (int goalState : goalStates)
					if (goalState == sprime)
						return 2;
				return -1;
			}
		};
		graphDefinedDomainGen.setTf(tf);
		graphDefinedDomainGen.setRf(rf);

		// All nodes are equally reachable from every other node
		int action = 0;
		double probability = 1.0 / (NUM_STATES - 1);
		for (int srcNode = 0; srcNode < NUM_STATES; srcNode++) {
			for (int dstNode = 0; dstNode < NUM_STATES; dstNode++) {
				if (srcNode != dstNode) {
					graphDefinedDomainGen.setTransition(srcNode, action,
							dstNode, probability);
					action = (action + 1) % goalStates.length;
				}
			}
		}

		if (!graphDefinedDomainGen.isValidMDPGraph()) {
			// Invalid MDP graph
			System.exit(1);
		}

		domain = graphDefinedDomainGen.generateDomain();

		initialState = new GraphStateNode(initialStateNum);
		System.out.println("initialState: " + initialState);

		hashingFactory = new SimpleHashableStateFactory();

		env = new SimulatedEnvironment(domain, initialState);
	}

	public void valueIteration() {
		Planner planner = new ValueIteration(domain, 0.99, hashingFactory,
				0.001, 200);
		Policy p = planner.planFromState(initialState);

		Episode episode = PolicyUtils.rollout(p, initialState,
				domain.getModel(), 500);
		printEpisodeStats(episode);
	}

	public void policyIteration() {
		Planner planner = new PolicyIteration(domain, 0.99, hashingFactory,
				0.001, 200, 10);
		Policy p = planner.planFromState(initialState);

		Episode episode = PolicyUtils.rollout(p, initialState,
				domain.getModel(), 500);
		printEpisodeStats(episode);
	}

	public static void main(String args[]) {
		int initialStateNum = 99;
		System.out.println("---Value iteration---");
		GraphDefinedMDP obj1 = new GraphDefinedMDP(initialStateNum);
		long startTime = System.currentTimeMillis();
		obj1.valueIteration();
		long endTime = System.currentTimeMillis();
		System.out.format("Time taken for value iteration: %d ms\n\n\n",
				(endTime - startTime));

		System.out.println("---Policy iteration---");
		GraphDefinedMDP obj2 = new GraphDefinedMDP(initialStateNum);
		startTime = System.currentTimeMillis();
		obj2.policyIteration();
		endTime = System.currentTimeMillis();
		System.out.format("Time taken for policy iteration: %d ms\n\n\n",
				(endTime - startTime));
	}
}

Diff of the performReachabilityFrom(State state) function: https://www.diffchecker.com/4ZsjNQx7
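For anyone reproducing this, a quick way to get a ground-truth reachable-state count to compare both planners against (a minimal sketch assuming BURLAP 3's StateReachability utility; domain, initialState, and hashingFactory are the fields from the snippet above):

    import java.util.List;
    import burlap.behavior.singleagent.auxiliary.StateReachability;
    import burlap.mdp.core.state.State;

    // Enumerate every state reachable from the initial state under the domain's model.
    List<State> reachable = StateReachability.getReachableStates(
            initialState, domain, hashingFactory);
    System.out.println("Reachable states: " + reachable.size());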

getTotalValueIterations() for ValueIteration?

I see that the PolicyIteration class has a getTotalPolicyIterations() method, allowing you to see how many iterations it took to converge. The ValueIteration class, however, does not seem to have an analogous getTotalValueIterations().

Is there a good reason for that? Or am I overlooking it somewhere? It seems like this method would be necessary to compare how many iterations ValueIteration and PolicyIteration needed to converge. Or am I misunderstanding something?

Separate DISC attributes into discrete numeric, discrete categorical, and boolean types

Another low-priority feature request; it seems unlikely that this is stopping people from accomplishing what they want. It's really more of a convenience.

From James:
"That said, I've also considered splitting up the discrete attribute into two different possible types: discrete numeric and discrete categorical. Discrete numeric would allow negatives and discrete categorical would be like it is now. Discrete numeric would have utility in that it can be unbound in values. That said, I haven't gotten around to implementing it yet because I haven't had a strong need to do so."

DiscretizingMaskedHashableStateFactory is missing methods for handling config

While digging deeper into the code and trying out DiscretizingMaskedHashableStateFactory, I noticed that it is missing the methods MaskedHashableStateFactory offers for adding both variables and objects. The config field in the Discretizing class is also not protected, so I cannot extend the class and add the missing methods. I also cannot use the constructor properly, because it forces you to use either variable keys or class keys but not both, which can be worked around with the add methods in MaskedHashableStateFactory.

Perhaps DiscretizingMaskedHashableStateFactory is extending the wrong class? Maybe it should NOT extend SimpleHashableStateFactory but MaskedHashableStateFactory (which itself extends SimpleHashableStateFactory)? Then it should also use the config field from the superclass (for example via a protected superconstructor that subclasses can use to set the config) instead of introducing its own config field. Just an idea; there are other possible ways to fix this issue too :)

Thanks!

(PS: For now I helped myself by grabbing the config field of DiscretizingMaskedHashableStateFactory via reflection and then using its add methods, but that's not how it should be...)
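A rough sketch of that reflection workaround, given an existing factory instance (the field name "config" is an assumption; adjust it to whatever the actual private field is called):

    import java.lang.reflect.Field;

    // Hypothetical workaround: expose the factory's private config via reflection.
    Field configField = DiscretizingMaskedHashableStateFactory.class
            .getDeclaredField("config");
    configField.setAccessible(true);
    Object config = configField.get(factory);
    // The masking add methods can then be invoked on config, reflectively if
    // its declared type is not visible from here.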

Is this a bug in SarsaLam implementation?

Hello, I have recently been learning RL from Sutton's book along with the implementation in BURLAP, and I found an inconsistency in the SarsaLam algorithm. I am wondering if this is a bug? Thanks!

In Sutton's book, the replacing form of the eligibility trace is defined as either 1 or lambda * discount * the previous trace (I can't insert a picture here; see Sutton's online book v2, page 161).
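Reconstructing the update from the book (my reading of the replacing-traces definition, for the state $s_t$ and action $a_t$ taken at time $t$):

$$
e_t(s,a) =
\begin{cases}
1 & \text{if } s = s_t \text{ and } a = a_t \\
\gamma \lambda\, e_{t-1}(s,a) & \text{otherwise}
\end{cases}
$$

Sutton also describes a variant, replacing traces with clearing, which additionally sets $e_t(s_t, a) = 0$ for all $a \neq a_t$; the code below appears to implement that variant unconditionally.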

But in SarsaLam.java's runLearningEpisode, the trace ET(s,a) for the non-matching actions in the current state is always zeroed. It looks like it should not set 0 in the else branch here:

    if(et.sh.equals(curState)) {
        if(et.q.a.equals(action)){
            foundCurrentQTrace = true;
            et.eligibility = 1.; //replacing traces
        }
        else{
            et.eligibility = 0.; //replacing traces
        }
    }

Continuous Domain Tutorial Problem

Hello guys, I was just trying to run the continuous domain tutorial code. I know it was working at some point; I tested it about a month ago. But I had to reformat my computer, and when I tried again with the GitHub version it no longer compiles. Here is the relevant compiler output:

ContinuousDomainTutorial.java:58: error: qualified new of static class
        TerminalFunction tf = mcGen.new ClassicMCTF();
                              ^
ContinuousDomainTutorial.java:95: error: qualified new of static class
        TerminalFunction tf = mcGen.new ClassicMCTF();
                              ^
ContinuousDomainTutorial.java:145: error: cannot find symbol
        ip.actionNoise = 0.;
          ^
  symbol:   variable actionNoise
  location: variable ip of type InvertedPendulum
3 errors
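For what it's worth, the first two errors suggest that ClassicMCTF became a static nested class, so the qualified new syntax is no longer legal, and the third suggests the pendulum's noise parameter moved. A hedged guess at the fixes (the physParams location is an assumption inferred from the error message, not verified against the source):

    // ClassicMCTF is static now, so instantiate it without an enclosing instance:
    TerminalFunction tf = new MountainCar.ClassicMCTF();

    // actionNoise appears to live on a physics-parameters object now:
    ip.physParams.actionNoise = 0.;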

Actions are not published after terminal state

Is it supposed to be like that? I thought I needed a terminal state to end each episode; is it for ending the whole thing? I'm working with ROS, btw.

    for (int i = 0; i < 100; i++) {
        Episode ea = agent.runLearningEpisode(env);
        episodes.add(ea);
        System.out.println(i + ": " + ea.maxTimeStep());
        env.resetEnvironment();
    }

When i = 2, no actions are published

PerformancePlotter TRIAL_AVERAGES_ONLY Mode

If you run the experimentAndPlotter function from

burlap_examples/src/main/java/edu/brown/cs/burlap/tutorials/BasicBehavior.java

and change the TrialMode from MOST_RECENT_AND_AVERAGE to TRIAL_AVERAGES_ONLY (line 261)

exp.setUpPlottingConfiguration(500, 250, 2, 1000,
				TrialMode.TRIAL_AVERAGES_ONLY,
				PerformanceMetric.CUMULATIVE_STEPS_PER_EPISODE,
				PerformanceMetric.AVERAGE_EPISODE_REWARD);

The program gets stuck.
I think this is because PerformancePlotter only notifies that a trial is complete if the trial mode is MOST_RECENT_TRIAL_ONLY or MOST_RECENT_AND_AVERAGE. This can be seen in the updateTimeSeries function of the PerformancePlotter class (line 753).
So maybe the function should look like this?

	synchronized protected void updateTimeSeries(){
		
		SwingUtilities.invokeLater(new Runnable() {
			
			public void run() {

				synchronized (WorkingPerformancePlotter.this) {
						
					synchronized (WorkingPerformancePlotter.this.trialUpdateComplete) {
							
						if(WorkingPerformancePlotter.this.trialMode.mostRecentTrialEnabled()){
								
							if(WorkingPerformancePlotter.this.needsClearing){
								WorkingPerformancePlotter.this.curAgentDatasets.clearNonAverages();
								WorkingPerformancePlotter.this.needsClearing = false;
							}
							
							if(WorkingPerformancePlotter.this.curTimeStep > WorkingPerformancePlotter.this.lastTimeStepUpdate){
								WorkingPerformancePlotter.this.updateCSRSeries();
								WorkingPerformancePlotter.this.lastTimeStepUpdate = curTimeStep;
							}
							if(WorkingPerformancePlotter.this.curEpisode > WorkingPerformancePlotter.this.lastEpisode){
								WorkingPerformancePlotter.this.updateCERSeries();
								WorkingPerformancePlotter.this.updateAERSeris();
								WorkingPerformancePlotter.this.updateMERSeris();
								WorkingPerformancePlotter.this.updateCSESeries();
								WorkingPerformancePlotter.this.updateSESeries();
								
								WorkingPerformancePlotter.this.lastEpisode = WorkingPerformancePlotter.this.curEpisode;
							}
						}
						
						WorkingPerformancePlotter.this.trialUpdateComplete.b = true;
						WorkingPerformancePlotter.this.trialUpdateComplete.notifyAll();
					}
				}
			}
		});
	}

Not setting an Attribute value results in an index out of bounds exception during planning

This goes in the category of exceptions that would keep me from doing stupid things. Anyway, I created some ObjectInstance instances and neglected to set a DISC attribute's value. Because its default is -1, yet I gave it a range of 0-1, there was an index-out-of-range exception.

To keep me (and hopefully others) from doing stupid things, would you be able to throw an AttributeUnset exception or something? The current exception does not make clear what I did wrong.
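A minimal sketch of the kind of guard this asks for, borrowing the UnsetValueException that DiscreteValue already uses elsewhere on this page (the getter shown here is illustrative, not the exact source):

    // Illustrative guard: fail loudly on an unset value instead of letting the
    // -1 sentinel leak into an array index later during planning.
    public int getDiscVal(){
        if(this.discVal == -1){
            throw new UnsetValueException();
        }
        return this.discVal;
    }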

NullPointerException thrown in Policy.followAndRecordPolicy

Somehow the grounded action in the following line of code from Policy.followAndRecordPolicy is null:

GroundedAction ga = this.getAction(cur);

Again, I assume this means my code is bad; perhaps the agent can't get out of the state it found itself in, and my terminal function doesn't properly evaluate that state as a terminating state.

Exception in thread "main" java.lang.NullPointerException
    at burlap.behavior.singleagent.Policy.followAndRecordPolicy(Policy.java:215)
    at burlap.behavior.singleagent.Policy.evaluateBehavior(Policy.java:149)
    at SingleAgentKitchen.main(SingleAgentKitchen.java:229)

Q Learning Stochastic Movements

I'm currently using BURLAP for a project in my ML class. I am attempting to make the movements succeed with probability less than 1, but I don't see anything in the source that shows how to do that. I have set:

    GridWorldDomain x = new GridWorldDomain(9, 9);
    x.setProbSucceedTransitionDynamics(0.8);

However, when I try something like 0.1, I would expect the "directions" the Q-learning algorithm discovers to look abnormal to a normal person, but to make complete sense when your movements nearly always come out opposite of what was intended.

The last bit may be confusing so here is an example that hopefully clarifies.

Suppose you are in the bottom-left corner. If you want to go UP and the success probability is 0.1, then choosing UP more often results in going down, left (both of which really leave you stationary), or right. The logical move would be to select RIGHT: going down or left keeps you stationary, and you have a higher probability of actually going up than of going right.

I was hoping to experiment with a Q-learner that would discover that moving right mostly results in going UP (or rather, that moving east has a better chance of going up, etc.).

Looking in the code I don't see any references to transition dynamics in the Q learning class, but I could just be misunderstanding what I'm seeing.
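That is expected, for what it's worth: Q-learning is model-free, so the transition dynamics never appear in the learner itself; they only enter through the outcomes the environment samples. A minimal sketch of the setup under BURLAP 3 (the grid-world and Q-learning calls follow the standard tutorial; the initial-state construction is from memory and may need adjusting):

    import burlap.behavior.singleagent.learning.tdmethods.QLearning;
    import burlap.domain.singleagent.gridworld.GridWorldDomain;
    import burlap.domain.singleagent.gridworld.state.GridAgent;
    import burlap.domain.singleagent.gridworld.state.GridWorldState;
    import burlap.mdp.singleagent.SADomain;
    import burlap.mdp.singleagent.environment.SimulatedEnvironment;
    import burlap.statehashing.simple.SimpleHashableStateFactory;

    GridWorldDomain gwd = new GridWorldDomain(9, 9);
    gwd.setProbSucceedTransitionDynamics(0.1); // intended direction succeeds 10% of the time
    SADomain domain = gwd.generateDomain();

    SimulatedEnvironment env = new SimulatedEnvironment(
            domain, new GridWorldState(new GridAgent(0, 0)));

    // The learner never sees these dynamics directly; it only sees sampled
    // outcomes, so it can learn the counterintuitive policy described above.
    QLearning agent = new QLearning(domain, 0.99, new SimpleHashableStateFactory(), 0., 1.);
    for (int i = 0; i < 100; i++) {
        agent.runLearningEpisode(env);
        env.resetEnvironment();
    }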

ROS - Continuous Domain

Hi there,

Is there a tutorial I can follow to create states on ROS? I'm not sure how I can sample my space.

Creating instances of actions, attributes, and classes and attaching them to a domain doesn't necessarily 'use' them (Java warnings thrown)

This is small, but creating new class, action, or attribute instances can produce warnings about them not being used, even though other parts of the code effectively do use them. For example, the following code will produce warnings about ingredient, mix, and bake not being used.

public Domain generateDomain() {
    Domain domain = new SADomain();
    ObjectClass ingredient = new ObjectClass(domain, "ingredient");
    Action mix = new MixAction("mix", domain);
    Action bake = new BakeAction("bake", domain);
    return domain;
}

In my opinion, the code's format should be changed so Java doesn't produce warnings about this:

public Domain generateDomain() {
    Domain domain = new SADomain();
    ObjectClass obj1 = new ObjectClass("objectWithNoAttributes");
    domain.addObjectClass(obj1);

    ObjectClass obj2 = new ObjectClass("objectWith1Attribute");
    Attribute att = new Attribute("attributeName", Attribute.AttributeType.DISC);
    obj2.addAttribute(att);
    domain.addObjectClass(obj2);

    domain.addAction(new DerivedAction("name"));
    return domain;
}

This has the benefit of showing readers what the domain does with the created object instances, whereas passing the domain around in the constructors hides that.

Hashing Performance in SimpleHashableState

While doing performance profiling, I noticed that IISimpleHashableState's valuesEqual() method calls isArray(). That call performs quite badly and could perhaps be replaced by something more efficient, as you can see here: http://stackoverflow.com/questions/16170548/object-isarray-is-slow-is-there-a-fast-way-to-do-that

Perhaps by replacing it with instanceof Object[], as suggested on Stack Overflow, we would get a significant performance gain in the HashableState comparison, since the isArray() methods are called for every variable compared. Just a suggestion! Maybe there are more places like this in the code; a search for "isArray" might be worthwhile.
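A sketch of the trade-off (not BURLAP code): instanceof is much cheaper than the reflective Class.isArray(), but instanceof Object[] does not match primitive arrays such as int[] or double[], so a drop-in replacement needs a fallback:

    // Fast paths via instanceof; fall back to the reflective check for the
    // primitive-array cases that "instanceof Object[]" cannot see.
    static boolean isArrayFast(Object o) {
        return o instanceof Object[]
                || o instanceof int[]
                || o instanceof double[]
                || (o != null && o.getClass().isArray());
    }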

PolicyUtils.rollout(Policy, State, SampleModel) will hang when the policy does not reach a goal state

I'm experimenting with PolicyIteration params to see the differences between policy iteration and value iteration. While tuning the params in my model, I'm noticing that sometimes policy iteration ends before a policy that leads to the terminal state is found. On line 121 of PolicyUtils there is a do/while loop whose condition is !env.isInTerminalState(); if my policy doesn't lead to the terminal state, that loop hangs.

I feel like there should be a better way to follow the policy instead of checking to see if it reaches the terminal state.
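In the meantime, the overload that takes a step cap (the one used in the PolicyIteration snippets earlier on this page) avoids the hang by bounding the rollout length:

    // Bounded rollout: stops after at most 500 steps even if the policy never
    // reaches a terminal state.
    Episode episode = PolicyUtils.rollout(p, initialState, domain.getModel(), 500);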

StatePainter's Graphics2D Context Doesn't Update

I discovered that the Graphics2D context passed to a StatePainter is broken: updates don't appear correctly on-screen. The following is a comparison of the Visualizer view and the same context converted to SVG:
[Screenshots: steps 1-4, each comparing the Visualizer view with the same context converted to SVG]

I haven't investigated exactly which classes are affected, nor did I try to come up with a (minimal) reproduction case, but I can provide source code upon request. I suspect it may have something to do with AffineTransform which I use to rotate the canvas, as the rotated elements (text) are being drawn correctly.

Renaming objects in a GenericOOState changes the order in objectsByClass list

This is a small thing, but I think it goes against the abstraction paradigm. Basically, I wanted to rename all the objects of a certain class, but objectsOfClass just returns a reference to the underlying list, and state.renameObject(oldName, newName) performs a removeObject and addObject, so it changes the order of the objects while you are iterating through that list. Below is my old code. The fix for me was simple: just create a shallow copy of the objects list. But I'm not sure that's how this sort of thing is intended to be done.

Wrong code:

List<ObjectInstance> agentObjects = state.objectsOfClass(GridGame.CLASS_AGENT);
for (int i = 0; i < agentObjects.size(); i++) {
    ObjectInstance agentObject = agentObjects.get(i);
    state = state.renameObject(agentObject.name(), newAgentNames.get(i));
}

Fixed code:

List<ObjectInstance> agentObjects = new ArrayList<ObjectInstance>(state.objectsOfClass(GridGame.CLASS_AGENT));
for (int i = 0; i < agentObjects.size(); i++) {
    ObjectInstance agentObject = agentObjects.get(i);
    state = state.renameObject(agentObject.name(), newAgentNames.get(i));
}

DiscreteValue.getStringVal throws IndexOutOfBoundsException

    public String getStringVal(){
        if(this.discVal == -1){
            throw new UnsetValueException();
        }
        return attribute.discValues.get(discVal);
    }

This method indexes discValues (a List) by discVal, which is only correct if the range of discrete values starts at 0.

In my case, my discrete value range is -5000 to 5000, so it tries to look up negative indexes and throws an IndexOutOfBoundsException.

setDiscValuesForRange does not store the low value passed in, so maybe the best approach is to grab the first discrete value and use it as the offset? Or maybe this method should just return Integer.toString(discVal)? What are your thoughts?
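A sketch of the offset idea (this assumes discValues stores the range's values in ascending order so the first entry parses to the low bound; that's an assumption about how setDiscValuesForRange populates the list, and it only applies to range-defined attributes):

    public String getStringVal(){
        if(this.discVal == -1){
            throw new UnsetValueException();
        }
        // Assumed: discValues.get(0) is the string form of the range's low value.
        int low = Integer.parseInt(attribute.discValues.get(0));
        return attribute.discValues.get(discVal - low);
    }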

'stocashtic' games

These subpackages should probably be renamed 'stochastic' to avoid confusion.

NullPointerException thrown in planFromState

I have an action that calls a class method. The method checks whether certain attributes are true for one class instance; if they are, it creates an instance of a different class. If I understand the issue, another action doesn't properly set these attributes, so the planner has no way to plan. That being said, I don't think a NullPointerException is the desired notification here.

This causes the null pointer exception

public void mix(State state, Domain domain)
    {
        List<ObjectInstance> contentInstances = state.getObjectsOfTrueClass(CLASSCONTENTS);
        if (contentInstances.size() > 0)
        {
            ObjectInstance contents = contentInstances.get(0);
            if (contents.getValueForAttribute(ATTHASFLOUR).getDiscVal() == 1 && 
                contents.getValueForAttribute(ATTHASBUTTER).getDiscVal() == 1 &&  
                contents.getValueForAttribute(ATTHASCOCOA).getDiscVal() == 1)
            {
                state.addObject(new ObjectInstance(domain.getObjectClass(CLASSBATTER), CLASSBATTER+0));

                ObjectInstance batter = state.getObjectsOfTrueClass(CLASSBATTER).get(0);
                batter.setValue(ATTBAKED, 0);
            }           
        }
    }

This works fine

public void mix(State state, Domain domain)
    {
        List<ObjectInstance> contentInstances = state.getObjectsOfTrueClass(CLASSCONTENTS);
        if (contentInstances.size() > 0)
        {
            ObjectInstance contents = contentInstances.get(0);
            state.addObject(new ObjectInstance(domain.getObjectClass(CLASSBATTER), CLASSBATTER+0));

            ObjectInstance batter = state.getObjectsOfTrueClass(CLASSBATTER).get(0);
            batter.setValue(ATTBAKED, 0);           
        }
    }
Exception in thread "main" java.lang.NullPointerException
    at burlap.behavior.singleagent.planning.deterministic.DeterministicPlanner.encodePlanIntoPolicy(DeterministicPlanner.java:84)
    at burlap.behavior.singleagent.planning.deterministic.informed.BestFirst.planFromState(BestFirst.java:122)
    at SingleAgentKitchen.main(SingleAgentKitchen.java:191)

StochasticGames SCPSolver does not work with Java 64-bit under Windows x64

In the burlap_examples repository, the GridGameExample does not work with a 64-bit Java Runtime Environment under Windows x64.

You get the following error:

Exception in thread "main" java.lang.NullPointerException
at burlap.behavior.stochasticgames.solvers.MinMaxSolver.getColPlayersStrategy(MinMaxSolver.java:60)
at burlap.behavior.stochasticgames.solvers.MinMaxSolver.getRowPlayersStrategy(MinMaxSolver.java:25)
at burlap.behavior.stochasticgames.madynamicprogramming.backupOperators.CoCoQ.performBackup(CoCoQ.java:63)
at burlap.behavior.stochasticgames.madynamicprogramming.MADynamicProgramming.backupAllValueFunctions(MADynamicProgramming.java:193)
at burlap.behavior.stochasticgames.madynamicprogramming.dpplanners.MAValueIteration.runVI(MAValueIteration.java:183)
at burlap.behavior.stochasticgames.madynamicprogramming.dpplanners.MAValueIteration.planFromState(MAValueIteration.java:158)
at burlap.behavior.stochasticgames.agents.madp.MultiAgentDPPlanningAgent.action(MultiAgentDPPlanningAgent.java:102)
at burlap.mdp.stochasticgames.world.World.runStage(World.java:417)
at burlap.mdp.stochasticgames.world.World.runGame(World.java:338)
at burlap.mdp.stochasticgames.world.World.runGame(World.java:306)
at burlap.mdp.stochasticgames.world.World.runGame(World.java:296)
at edu.brown.cs.burlap.examples.GridGameExample.VICoCoTest(GridGameExample.java:80)
at edu.brown.cs.burlap.examples.GridGameExample.main(GridGameExample.java:233)

This error comes up (in similar form) in all tests except the single-agent-interface variant: you get it when calling VICoCoTest(), VICorrelatedTest(), and QLCoCoTest(), but not in saInterface(), which as far as I know does not use the SCPSolver library.

The real problem sits inside the SCPSolver library. It does not work on a 64-bit JRE under Windows x64, as explained on its website. It would only work with a 32-bit JRE under Windows x64, but since that is limited to 2 GB of RAM, it is not very useful for learning algorithms anymore.

So what needs to be done is to recompile the SCPSolver library and package a new version with BURLAP that adds support for a 64-bit JRE under Windows x64. The SCPSolver website also explains how to build the library.

Thank you :)

Agent can try to move out of bounds, throwing an ArrayIndexOutOfBoundsException

Because dir can be +1 and ax can be map.length - 1, the agent can try to move outside the bounds of the map array.

To reproduce, run this test with getLevel1 or getLevel2 instead of getLevel3:

return BlockDudeLevelConstructor.getLevel3(domain);
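A sketch of the kind of guard that would prevent it (the variables dir, ax, and map are taken from the issue; where exactly the check belongs in BlockDude's move logic is an assumption):

    // Bounds-check the target column before indexing into the map, so a +1 move
    // from the last column becomes a no-op instead of an exception.
    int nx = ax + dir;
    if (nx < 0 || nx >= map.length) {
        return; // out of bounds: treat the move as a no-op
    }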

applicableInState actions

What is the design purpose of defining SingleAction.isApplicableInState?

The way I interpret it, it is to limit the allowable actions to a subset of the total set of available actions in the world, depending on the current state.

Assuming this is the case, it is a convenient way to reduce the search space for, e.g., value learning. But in exploration (e.g. using SGVisualiser), a 'super-agent' (i.e. a player controller) might not know that some actions are invalid in a given state. As currently written, a runtime error is thrown, which seems a bit harsh. Wouldn't it be gentler to provide an InvalidActionHandler mechanism to gracefully recover from bad moves? The default could still throw an exception, but one could also, say, replace the invalid action with a universal NOOP-type action and try again. How would such an approach fit into the bigger picture?

It's possible I've misinterpreted the intent of isApplicableInState or subsequent error handling, in which case, please point me in the right direction.

Two files in burlap have unmappable characters

I ran into problems building from these two files. Pull request to follow shortly

[javac] /home/brawner/workspace/burlap/build.xml:23: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds
    [javac] Compiling 45 source files to /home/brawner/workspace/burlap/build
    [javac] /home/brawner/workspace/burlap/src/burlap/behavior/singleagent/planning/deterministic/informed/astar/DynamicWeightedAStar.java:34: error: unmappable character for encoding UTF8
    [javac]  * Proceedings of the Third International Joint Conference on Artificial Intelligence (IJCAI-73) 3. California, USA. pp. 11�17.
    [javac]                                                                                                                            ^
    [javac] /home/brawner/workspace/burlap/src/burlap/behavior/singleagent/planning/deterministic/informed/astar/StaticWeightedAStar.java:17: error: unmappable character for encoding UTF8
    [javac]  * 1. Pohl, Ira (1970). "First results on the effect of error in heuristic search". Machine Intelligence 5: 219�236.
    [javac]      

PolicyIteration error, type mismatch?

This works:
State initialState = GraphDefinedDomain.getState(d, init);
// pi.performReachabilityFrom(initialState);
GreedyQPolicy p = pi.planFromState(initialState);
return new PolicyLearningAgent(p);

But calling .performReachabilityFrom() manually first...

State initialState = GraphDefinedDomain.getState(d, init);
pi.performReachabilityFrom(initialState);
GreedyQPolicy p = pi.planFromState(initialState);
return new PolicyLearningAgent(p);

...throws this error:

Exception in thread "main" java.lang.ClassCastException: burlap.behavior.policy.GreedyDeterministicQPolicy cannot be cast to burlap.behavior.policy.GreedyQPolicy
    at burlap.behavior.singleagent.planning.stochastic.policyiteration.PolicyIteration.planFromState(PolicyIteration.java:185)
    at project4.Analysis$PIAgentFactory.generateAgent(Analysis.java:130)
    at burlap.behavior.singleagent.auxiliary.performance.LearningAlgorithmExperimenter.runEpisodeBoundTrial(LearningAlgorithmExperimenter.java:322)
    at burlap.behavior.singleagent.auxiliary.performance.LearningAlgorithmExperimenter.startExperiment(LearningAlgorithmExperimenter.java:249)
    at project4.Analysis.main(Analysis.java:48)

It looks like the if block that changes the type of this.evaluativePolicy to GreedyQPolicy doesn't get run if you use .performReachabilityFrom() manually before .planFromState(), presumably because the second call returns false once the reachable states have already been computed:

if(this.performReachabilityFrom(initialState)){

            double delta;
            do{
                delta = this.evaluatePolicy();
                iterations++;
                this.evaluativePolicy = new GreedyQPolicy(this.getCopyOfValueFunction());
            }while(delta > this.maxPIDelta && iterations < maxPolicyIterations);

        }

Not sure if this is intentional for some reason I'm missing; it just struck me as odd that the type shift occurred. Is performReachabilityFrom() only meant to be run internally?
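If that diagnosis is right, a minimal fix sketch would run the evaluation/improvement loop regardless of what performReachabilityFrom() returns (names are taken from the snippet above; this is a guess at the intent, not verified against the full source):

    // Hypothetical reshuffle of planFromState's body: perform the reachability
    // pass for its side effects, but do not gate the improvement loop on it.
    this.performReachabilityFrom(initialState); // may return false if already done

    double delta;
    do {
        delta = this.evaluatePolicy();
        iterations++;
        this.evaluativePolicy = new GreedyQPolicy(this.getCopyOfValueFunction());
    } while (delta > this.maxPIDelta && iterations < maxPolicyIterations);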
