
assistant-skill-analysis's People

Contributors

eric-wayne, gunnarhorve-ibm, haodeqi, mingtan888, navneetrao, pratyushsingh97, rfazeli, tsinggggg, yangyuphd


assistant-skill-analysis's Issues

DecodeError: It is required that you pass in a value for the "algorithms" argument when calling decode().

I'm receiving this error when running the notebook.
DecodeError: It is required that you pass in a value for the "algorithms" argument when calling decode().

This is in the cell after the heading

1.1 Set up access to the training data

on this line:
52 workspace = skills_util.retrieve_workspace(workspace_id=workspace_id,
---> 53 conversation=conversation)

It appears to be some type of authentication issue.
Any ideas what this might be?
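
This exact message is raised by PyJWT 2.x, which made the `algorithms` argument to `jwt.decode()` mandatory; presumably something in the SDK stack decodes the IAM token while validating credentials. Upgrading ibm-watson / ibm-cloud-sdk-core (recent releases pass `algorithms` explicitly) is the likely fix. For illustration, the API change on a self-contained token (token and secret here are made up, not from the notebook):

```python
# Hedged sketch: PyJWT 2.x requires an explicit `algorithms` list in decode().
import jwt

secret = "example-secret"
token = jwt.encode({"sub": "user"}, secret, algorithm="HS256")

# With PyJWT >= 2.0, omitting algorithms= raises exactly this DecodeError
payload = jwt.decode(token, secret, algorithms=["HS256"])
```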

Ambiguity in Training Data - Prioritization guidance

Cell 1.5 - Ambiguity in Training Data - is a very useful report.

Can we quantify or prioritize which overlaps should be addressed? Perhaps an "overlap score" showing the worst overlaps, or the ones most impactful to improving the training.

Ideally the tool would suggest what to do next, informed by:

  • Working on ambiguity between the most confused intents first (leveraging the earlier chi-square or a k-folds analysis)
  • Working on the most significant overlaps, i.e., the ones that give the most bang for the buck
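
One way such an "overlap score" could be computed — a hedged sketch, with all intent names, utterances, and the `overlap_scores` helper hypothetical — is Jaccard similarity between the unique-token sets of each intent pair:

```python
# Hedged sketch of one possible "overlap score": Jaccard similarity between
# the unique-token sets of each intent's training examples.
from itertools import combinations

def overlap_scores(intent_examples):
    """intent_examples: dict mapping intent name -> list of utterances."""
    token_sets = {
        intent: {tok.lower() for utt in utts for tok in utt.split()}
        for intent, utts in intent_examples.items()
    }
    scores = {}
    for a, b in combinations(token_sets, 2):
        union = token_sets[a] | token_sets[b]
        inter = token_sets[a] & token_sets[b]
        scores[(a, b)] = len(inter) / len(union) if union else 0.0
    # Highest-overlap pairs first: the candidates to disambiguate first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

data = {
    "updateBankAccount": ["update my bank account", "change my account details"],
    "addNewAccountHolder": ["add a holder to my account", "add my spouse to the account"],
    "greetings": ["hello there", "good morning"],
}
ranked = overlap_scores(data)
```

A real implementation would likely weight shared terms by chi-square correlation rather than raw token overlap, but even this crude ranking would tell the user which pair to look at first.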

Minor documentation improvement - section 1.2

In section 1.2 it would be good to clarify that the documentation does NOT refer to the user's bot. The diff below addresses that.

Better still would be to hyperlink to an existing bot (e.g., the Customer Care bot) with names of real intents that follow this rule, like #Customer_Care_Profile_Password vs #Customer_Care_Profile_Security_Questions and #Customer_Care_Contact_Us

diff --git a/skill_analysis.ipynb b/skill_analysis.ipynb
index 392476d..175cc5f 100644
--- a/skill_analysis.ipynb
+++ b/skill_analysis.ipynb
@@ -252,6 +252,8 @@
     "\n",
     "Class imbalance will not always lead to lower accuracy. All intents (classes) thus need not have the same number of examples.\n",
     "\n",
+    "Given a hypothetical chatbot related to banking:\n",
+    "\n",
     "1. For intents like `updateBankAccount` and `addNewAccountHolder` where the semantics difference between them is more subtle, the number of examples per intent needs to be somewhat balanced else the classifier might favor the intent with the higher number of examples.\n",
     "2. For intents like `greetings` that are semantically distinct from other intents like `updateBankAccount`, it may be okay for it to have fewer examples per intent and still be easy for the intent detector to classify.\n",
     "\n",

nbconvert - Module not found errors

I tried to run the notebook on a Python 3.10 kernel in Cloud Pak; the script stopped in step 3 complaining about a missing module: nbconvert.

!pip list revealed an outdated version of the module (version 5), so we installed the latest as of 2023-07:

!pip install --index-url https://pypi.python.org/simple  -U "nbconvert>=7.7.1"

This fixed the issue.

Keep getting Error: Resource not found, Code: 404

ApiException Traceback (most recent call last)
in
29 #).get_result()
30
---> 31 ws_json = conversation.get_workspace(workspace_id, export=True)
32 workspace = ws_json.get_result()
33

/opt/conda/envs/Python36/lib/python3.6/site-packages/ibm_watson/assistant_v1.py in get_workspace(self, workspace_id, export, include_audit, sort, **kwargs)
373 params=params,
374 accept_json=True)
--> 375 response = self.send(request)
376 return response
377

/opt/conda/envs/Python36/lib/python3.6/site-packages/ibm_cloud_sdk_core/base_service.py in send(self, request, **kwargs)
155 'invalid credentials'
156 raise ApiException(
--> 157 response.status_code, error_message, http_response=response)
158 except requests.exceptions.SSLError:
159 logging.exception(self.ERROR_MSG_DISABLE_SSL)

ApiException: Error: Resource not found, Code: 404 , X-global-transaction-id: c18842ee6e46277481ac022b07e3c10e

Minor documentation clarification in 1.3 about stop words

In section 1.3 it would be good to clarify that you are referring to stop words since I believe that is your intent. The diff below addresses that.

diff --git a/skill_analysis.ipynb b/skill_analysis.ipynb
index 392476d..175cc5f 100644
--- a/skill_analysis.ipynb
+++ b/skill_analysis.ipynb
@@ -282,7 +284,7 @@
     "\n",
     "If you see terms like `hi`, `hello` correlated with a `greeting` intent that would be reasonable. But if you see terms like `table`, `chair` correlated with the `greeting` intent that would be anomalous. A scan of the most correlated unigrams & bigrams for each intent can help you spot potential anomalies within your training data.\n",
     "\n",
-    "**Note**: We ignore the following common words from consideration `an, a, in, on, be, or, of, a, and, can, is, to, the, i`"
+    "**Note**: We ignore the following common words (\"stop words\") from consideration `an, a, in, on, be, or, of, a, and, can, is, to, the, i`"
    ]
   },
   {

Correlation analysis 1.4

Classic dialog skill analysis: I got up to section 1.4 when running the notebook and encountered this error:
InvalidParameterError: The 'stop_words' parameter of CountVectorizer must be a str among {'english'}, an instance of 'list' or None. Got {'i', 'the', 'of', 'and', 'is', 'or', 'be', 'in', 'a', 'an', 'to', 'on', 'can'} instead.

Chi-squared analysis - prioritization guidance

The chi-squared analysis is wonderful and I love how it does an "in place" analysis of training data without any input from the user.

Is there any way to improve consumability and next steps for this report?
Such as:

  • a visualization of strongest correlated terms
  • a narrowing of the list (i.e., only show the worst-performing intents by some metric, such as k-folds)

I tested with a workspace containing 60 intents and it gets fatiguing quickly to scan all of them. I do understand that user input is required to find terms that "do not belong" but it would be helpful to make the task a little easier.

error when running keyword_analyzer

When running the following code on an action skill:
keyword_analyzer.seaborn_heatmap(workspace_pd, lang_util, 30, 30, intent_list)

I get this error message from keyword_analyzer:
KeyError: 'n_w'

Defect/Code Issue while running classic_dialog_skill_analysis.ipynb

While trying to run this notebook, I get an error in the code below:
unigram_intent_dict, bigram_intent_dict = chi2_analyzer.get_chi2_analysis(workspace_pd, lang_util=lang_util)
I could fix this issue by making the modification below in assistant_skill_analysis/term_analysis/chi2_analyzer.py:
(screenshot of the proposed change)

Another error occurs while trying to execute keyword_analyzer.py. It would be fixed by the code change below:

(screenshot of the proposed change)

Need a way to disable SSL checking in calls to Skills_Util.py

I have a customer using the skills_util package who needs to call the retrieve_workspace function with SSL checking disabled.

The retrieve_workspace function should accept an option to disable SSL, and the IAM authentication would look like this when a user wants to skip SSL verification:
authenticator = IAMAuthenticator(apikey=iam_apikey, disable_ssl_verification=True)

Bearer token support

I need to run this against CP4D, but it only supplies a bearer token.

What would need to change in the code to get this to work for dialog?

new_experience_skill_analysis notebook is failing

The new_experience_skill_analysis notebook fails on this part of the code:

THREAD_NUM = min(4, os.cpu_count() if os.cpu_count() else 1)

full_results = inferencer.inference(conversation,
                                    test_df,
                                    max_thread=THREAD_NUM, 
                                    assistant_id=ASSISTANT_ID,
                                    intent_to_action_mapping=intent_to_action_mapping
                                   )

Error message:

 0%|                                                                                                                                 | 0/7 [00:01<?, ?it/s]
---------------------------------------------------------------------------
TimeoutError                              Traceback (most recent call last)
Cell In[26], line 3
      1 THREAD_NUM = min(4, os.cpu_count() if os.cpu_count() else 1)
----> 3 full_results = inferencer.inference(conversation,
      4                                     test_df,
      5                                     max_thread=THREAD_NUM, 
      6                                     assistant_id=ASSISTANT_ID,
      7                                     intent_to_action_mapping=intent_to_action_mapping
      8                                    )

File ~/_dev/assistant-skill-analysis-master/assistant_skill_analysis/inferencing/inferencer.py:110, in inference(conversation, test_data, max_thread, user_id, assistant_id, skill_id, intent_to_action_mapping)
    108     result_df = pd.DataFrame(data=responses)
    109 else:
--> 110     result_df = thread_inference(
    111         conversation=conversation,
    112         test_data=test_data,
    113         max_thread=max_thread,
    114         user_id=user_id,
    115         skill_id=skill_id,
    116         assistant_id=assistant_id,
    117         intent_to_action_mapping=intent_to_action_mapping,
    118     )
    119 return result_df

File ~/_dev/assistant-skill-analysis-master/assistant_skill_analysis/inferencing/inferencer.py:182, in thread_inference(conversation, test_data, max_thread, user_id, assistant_id, skill_id, intent_to_action_mapping)
    179     futures[future] = (test_example, ground_truth)
    181 for future in tqdm(futures):
--> 182     res = future.result(timeout=1)
    183     test_example, ground_truth = futures[future]
    184     result.append(
    185         process_result(
    186             test_example,
   (...)
    191         )
    192     )

File ~/anaconda3/lib/python3.10/concurrent/futures/_base.py:460, in Future.result(self, timeout)
    458             return self.__get_result()
    459         else:
--> 460             raise TimeoutError()
    461 finally:
    462     # Break a reference cycle with the exception in self._exception
    463     self = None

TimeoutError: 
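
The traceback shows thread_inference calling `future.result(timeout=1)`, so any single API call slower than one second raises TimeoutError. One workaround (a local modification, not an upstream fix) is to raise that per-future timeout. A stdlib sketch of the pattern, with `slow_call` standing in for the network round trip:

```python
# Stdlib sketch of the pattern in thread_inference: collecting futures with a
# per-future timeout. A 1-second timeout fails on slow calls; a more generous
# value avoids the spurious TimeoutError.
import time
from concurrent.futures import ThreadPoolExecutor

def slow_call(x):
    time.sleep(0.2)  # stand-in for a network round trip
    return x * 2

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(slow_call, i) for i in range(4)]
    # A generous timeout instead of timeout=1 per future
    results = [f.result(timeout=30) for f in futures]
```

Note that `Future.result` blocks until the result is ready or the timeout expires, so raising the timeout only delays failure on genuinely hung calls; it does not slow down the happy path.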
