
Comments (11)

thanh-lam commented on August 17, 2024

Changing the query to replace data.begin_time and data.history.end_time with @timestamp, which is a field of type date, proved that the time range search works on a date field but not on a text field.

Query body:  {'query': {'bool': {'must': [{'match': {'data.user_name': 'tlam'}}], 'should': [{'range': {'@timestamp': {'format': 'epoch_millis', 'lte': '1610487082000'}}}, {'range': {'@timestamp': {'format': 'epoch_millis', 'gte': '1610427600000'}}}, {'bool': {'must_not': {'exists': {'field': 'data.history.end_time'}}}}], 'minimum_should_match': 2}}, '_source': ['data.primary_job_id', 'data.secondary_job_id', 'data.allocation_id', 'data.user_name', 'data.begin_time', 'data.history.end_time', 'data.state'], 'size': 1000}

However, @timestamp is not the same field as data.begin_time or data.history.end_time.

A possible workaround for the problem is to create a new index with the same fields as cast-allocation, except that the two fields data.begin_time and data.history.end_time have type date. Then an Elasticsearch reindex can be run to copy the data from cast-allocation into the new index.
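As a rough sketch of that first step (syntax for a recent Elasticsearch version; host placeholder as in the later scripts; only the two date fields are spelled out, everything else is left to dynamic mapping), creating the test index could look like:

# Sketch: create a test index where only the two time fields are explicitly
# mapped as type date; all other fields fall back to dynamic mapping.
curl -X PUT "XX.X.X.XX:9200/cast-allocation-test" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "data": {
        "properties": {
          "begin_time": { "type": "date" },
          "history": {
            "properties": {
              "end_time": { "type": "date" }
            }
          }
        }
      }
    }
  }
}
'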

But the new index has a different name. Therefore, the current cast-allocation index may have to be deleted and recreated, and the data then copied back from the new index into the recreated cast-allocation index. That will take some work, but it is possible.

Nevertheless, the reindex process also hit a problem with copying the text strings into the date-typed fields.

POST _reindex
{
  "source": {
    "index": "cast-allocation"
  },
  "dest": {
    "index": "cast-allocation-test"
  }
}

That caused exceptions:

{
  "took": 273,
  "timed_out": false,
  "total": 252,
  "updated": 0,
  "created": 0,
  "deleted": 0,
  "batches": 1,
  "version_conflicts": 0,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1,
  "throttled_until_millis": 0,
  "failures": [
    {
      "index": "cast-allocation-test",
      "type": "_doc",
      "id": "193",
      "cause": {
        "type": "mapper_parsing_exception",
        "reason": "failed to parse field [data.begin_time] of type [date] in document with id '193'. Preview of field's value: '2020-11-02 13:04:22.413672'",
        "caused_by": {
          "type": "illegal_argument_exception",
          "reason": "failed to parse date field [2020-11-02 13:04:22.413672] with format [YYYY-MM-dd HH:mm:ss.000000]",
          "caused_by": {
            "type": "date_time_parse_exception",
            "reason": "Text '2020-11-02 13:04:22.413672' could not be parsed at index 20"
          }
        }
      },
      "status": 400
    },
...

Pending further investigation.


thanh-lam commented on August 17, 2024

For debugging purposes, we deleted the cast-allocation index in Elasticsearch. Then a new cast-allocation index was created with the two fields (data.begin_time, data.history.end_time) now having type "date".

With this new index, the incoming data also needs to have the correct date format. The data source for cast-allocation is the CSM transaction log, which can be manually edited to put the two fields into the correct format. For example:

{"type":"allocation","traceid":531084125,"uid":1,"timestamp":"2020-09-17T16:06:01Z","data":{"allocation_id":1,"begin_time":"2020-09-17 12:06:01.183794","primary_job_id":526,"secondary_job_id":0,"ssd_file_system_name":"","launch_node_name":"f5n06","user_flags":"","system_flags":"","ssd_min":0,"ssd_max":0,"num_nodes":1,"num_processors":0,"num_gpus":0,"projected_memory":0,"state":"running","type":"jsm","job_type":"batch","user_name":"root","user_id":0,"user_group_id":0,"user_script":"1 -q normal sleep 3333333333","account":"","comment":"","job_name":"","job_submit_time":"2020-09-17 12:06:57","queue":"normal","requeue":"","time_limit":2147483647,"wc_key":"","isolated_cores":0,"compute_nodes":["f10n16"]} }
{"type":"allocation","traceid":231640295,"uid":2,"timestamp":"2020-09-17T16:06:02Z","data":{"allocation_id":2,"begin_time":"2020-09-17 12:06:02.483204","primary_job_id":527,"secondary_job_id":0,"ssd_file_system_name":"","launch_node_name":"f5n06","user_flags":"","system_flags":"","ssd_min":0,"ssd_max":0,"num_nodes":1,"num_processors":0,"num_gpus":0,"projected_memory":0,"state":"running","type":"jsm","job_type":"batch","user_name":"root","user_id":0,"user_group_id":0,"user_script":"1 -q normal sleep 3333333333","account":"","comment":"","job_name":"","job_submit_time":"2020-09-17 12:06:59","queue":"normal","requeue":"","time_limit":2147483647,"wc_key":"","isolated_cores":0,"compute_nodes":["f10n15"]} }
{"type":"allocation","traceid":231640295,"uid":2,"timestamp":"2020-09-17T16:06:55Z","data":{"running-start-timestamp":"2020-09-17 12:06:55.93243"}}

A letter T has to be added in place of the space between the date and the time:

"begin_time":"2020-09-17 12:06:01.183794" ==> "begin_time":"2020-09-17T12:06:01.183794"

As a result, the following simple script shows that the time range query works:

=================================
$ cat timezone-query.sh
curl -X GET "XX.X.X.XX:9200/cast-allocation/_search?pretty" -H 'Content-Type: application/json' -d'
{
   "query" : {
      "range" : {
         "data.begin_time" : {
         "time_zone": "-05:00",
         "gte" : "2021-02-05T14:11:28.578Z",
         "lte" : "now",
         "relation": "within"
         }
      }
   }
}
'
======================================


thanh-lam commented on August 17, 2024

Another script, which mimics how the Python scripts query with "--starttime" and "--endtime", also works.

$ cat startend-epochmil-query.sh
curl -X GET "XX.X.X.XX:9200/cast-allocation/_search?pretty" -H 'Content-Type: application/json' -d'
{"query":
   {"bool":
      {"must":
         [{"match":
             {"data.user_name":"wcmorris"}
         }],
       "should":
         [{"range":
             {"data.begin_time":
                 {"format":"epoch_millis",
                  "lte":1612562710000
                 }
             }
          },
          {"range":
             {"data.history.end_time":
                 {"format":"epoch_millis",
                  "gte":1612450800000
                 }
             }
          },
          {"bool":
             {"must_not":
                 {"exists":
                    {"field":"data.history.end_time"}
                 }
             }
          }],
       "minimum_should_match":2
      }
   },
   "_source":
      ["data.primary_job_id",
       "data.secondary_job_id",
       "data.allocation_id",
       "data.user_name",
       "data.begin_time",
       "data.history.end_time",
       "data.state"
      ],
   "size":1000
}
'


thanh-lam commented on August 17, 2024

A note on how the Python scripts query with "--starttime" and "--endtime", as in the following example:

$ ./findUserJobs.py -u <user> --starttime A --endtime B

A and B need to have the correct date and time format, like the one in the transaction log. That turns out to be an Elasticsearch date format: "strict_date_optional_time_nanos". A and B form the two edges of the window within which allocations will be returned.

                  starttime               endtime
begin_time:           |<------------------------B
end_time:              A------------------------>|

The recommendation is: always input both A and B. Leaving B out will result in the following query:

                  starttime                endtime
begin_time:           |<------------------------"now"
end_time:              A------------------------>|

In other words, the query will return all allocations from time A to "now". It's important to note:
** There's no left-edge limit for begin_time, nor a right-edge limit for end_time. The following two conditions are combined with the "AND" operator:
*** Any allocation whose end_time >= A will be returned, regardless of its begin_time.
*** Any allocation whose begin_time <= B will be returned, regardless of its end_time.


thanh-lam commented on August 17, 2024

We need to find out where or how the allocation records are written into the transaction log, so that the data for "begin_time" and "history.end_time" can be changed to the correct date format discussed earlier.

The CSM transaction log is the data source from which filebeat retrieves the allocation records that are shipped to the Elasticsearch indexes.


thanh-lam commented on August 17, 2024

Bill and I did some digging through the sources, and it seems the following two programs write allocation records to the transaction log:

  • CSMIAllocationCreate.cc
  • logging.cc
    As in the following example for begin_time:
           if ( allocation->begin_time ) free(allocation->begin_time);
            allocation->begin_time    = strdup( tuples[0]->data[1] );

We just have to replace the ' ' with 'T' in allocation->begin_time.

Another observation: timestamp is also part of the allocation record, and it already has the correct date format (with the T). Why was allocation->begin_time not done that way? The difference in the current data type mapping is that timestamp has type date but begin_time has type string.


thanh-lam commented on August 17, 2024

I found a way to load a new index mapping for cast-allocation on the fly. With the manually modified date format for begin_time and end_time, the Python scripts seem to work, at least with the date. For example, the following command returned two allocation records in January:

[root@c685f4n07 python]# ./findUserJobs.py -u tlam --starttime "2021-01-06" --endtime "2021-01-12"
     State |   AID | P Job ID | S Job ID | Begin Time                 | End Time                  
  complete |   251 |     1038 | 0        | 2021-01-06T14:36:49.244459 | 2021-01-06T14:37:54.391456
  complete |   252 |     1039 | 0        | 2021-01-11T11:27:11.463397 | 2021-01-11T11:27:31.771474

Increasing the starttime excluded the first record and returned only the second record:

[root@c685f4n07 python]# ./findUserJobs.py -u tlam --starttime "2021-01-11" --endtime "2021-01-12"
     State |   AID | P Job ID | S Job ID | Begin Time                 | End Time                  
  complete |   252 |     1039 | 0        | 2021-01-11T11:27:11.463397 | 2021-01-11T11:27:31.771474


thanh-lam commented on August 17, 2024

Searching for the CSM code that writes allocation records to the transaction log is a daunting task that requires understanding the design and structure of CSM sources.

It also took quite a bit of searching, reading, and trial and error with the Elasticsearch online docs. But I finally made a breakthrough with a fix: adding a mutate filter to the logstash config file:

/etc/logstash/conf.d/logstash.conf

The mutate block should be added to the filter section (lines marked with ">>>"):

else if "transaction" in [tags] {
    mutate { remove_field => [ "beat", "host", "source", "offset", "prospector"] }

### Use logstash gsub filter plugin to process input data before sending to elastic index.
### For the two fields: data.begin_time and data.history.end_time, the blank is replaced
### with a letter T. Note: these fields have type "date" that requires the valid format.
### Also note: No change to csm_transaction.log with this process.

>>>    mutate {
>>>        gsub => [
>>>           "[data][begin_time]", " ", "T",
>>>           "[data][history][end_time]", " ", "T"
>>>        ]
>>>    }

}
else if "ras" in [tags] and "csm" in [tags] {
    mutate{
        gsub => [ "time_stamp", ".(\d{3})\d{3}", ".\1" ]
        add_field => { "type" => "log-ras"  }
    }

    date {
        match => ["time_stamp", "ISO8601","YYYY-MM-dd HH:mm:ss.SSS" ]
        target => "time_stamp"
    }
}

This is definitely a better solution than having CSM write the new date format into the transaction log: it requires no changes to the CSM sources or the log files.
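Applying the change is just a matter of restarting logstash so the new filter is picked up (assuming logstash runs as a systemd service on this node):

# Assumes logstash runs under systemd; restart so the updated filter config is loaded
systemctl restart logstash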


thanh-lam commented on August 17, 2024

For verification, after inserting the new mutate filter into logstash.conf and restarting logstash, create an allocation. Then check transaction.log.0 for the new allocation record and the date format of begin_time and history.end_time:

{"type":"allocation","traceid":151499658,"uid":272,"timestamp":"2021-03-02T21:35:58Z","data":{"allocation_id":272,"begin_time":"2021-03-02 16:35:58.805708","primary_job_id":1361,"secondary_job_id":0,"ssd_file_system_name":"","launch_node_name":"f5n06","user_flags":"","system_flags":"","ssd_min":0,"ssd_max":0,"num_nodes":2,"num_processors":0,"num_gpus":0,"projected_memory":0,"state":"running","type":"jsm-cgroup-step","job_type":"batch","user_name":"tlam","user_id":11745,"user_group_id":100,"user_script":"user script not known by LSF","account":"","comment":"","job_name":"","job_submit_time":"2021-03-02 16:34:47","queue":"normal","requeue":"","time_limit":2147483647,"wc_key":"","isolated_cores":2,"compute_nodes":["f10n16","f10n15"]} }
{"type":"allocation","traceid":151499658,"uid":272,"timestamp":"2021-03-02T21:35:59Z","data":{"running-start-timestamp":"2021-03-02 16:35:58.990888"}}
{"type":"allocation","traceid":841396274,"uid":272,"timestamp":"2021-03-02T21:36:18Z","data":{"state":"complete","history":{"end_time":"2021-03-02 16:36:18.830969"},"running-end-timestamp":"2021-03-02 16:36:18.830969"}}

Notice that in the log the date+time format remains the same, with a blank between date and time. However, a Python script that searches for user jobs displays the date+time with the letter T between date and time. That means the fix wrote the correct format into the Elasticsearch index cast-allocation.

     State |   AID | P Job ID | S Job ID | Begin Time                 | End Time                  
  complete |   267 |     1356 | 0        | 2021-03-02T16:30:46.674554 | 2021-03-02T16:31:06.911078
  complete |   268 |     1357 | 0        | 2021-03-02T16:34:39.823075 | 2021-03-02T16:35:00.121053
  complete |   269 |     1358 | 0        | 2021-03-02T16:35:01.72255  | 2021-03-02T16:35:18.721014
  complete |   270 |     1359 | 0        | 2021-03-02T16:35:20.41316  | 2021-03-02T16:35:40.431049
  complete |   271 |     1360 | 0        | 2021-03-02T16:35:41.092173 | 2021-03-02T16:35:58.160964
  complete |   272 |     1361 | 0        | 2021-03-02T16:35:58.805708 | 2021-03-02T16:36:18.830969

Before the fix, those date+time values would have had a blank in place of the T.
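As an extra check directly against the index (a sketch; the allocation id is taken from the record above), the stored time fields can be pulled back out to confirm they contain the T:

# Sketch: fetch the stored time fields for one allocation (id 272 from the record above)
curl -X GET "XX.X.X.XX:9200/cast-allocation/_search?pretty" -H 'Content-Type: application/json' -d'
{
   "query": { "match": { "data.allocation_id": 272 } },
   "_source": [ "data.begin_time", "data.history.end_time" ]
}
'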


thanh-lam commented on August 17, 2024

As discovered in a comment earlier:

A and B need to have the correct date and time format, like the one in the transaction log. That turns out to be an Elasticsearch date format: "strict_date_optional_time_nanos".

The date+time format in csm_transaction.log.0 carries sub-millisecond (nanos) precision. That's one of the reasons why time range queries using epoch_millis don't work. The script findUserJobs.py is one example.

The time format is defined in cast_helper.py. Changing it to the nanos format gets findUserJobs.py working with time range queries.

[root@c650f01p22 python]# grep TIME_SEARCH *.py
#cast_helper.py:#TIME_SEARCH_FORMAT = 'yyyy-MM-dd HH:mm:ss'
#cast_helper.py:#TIME_SEARCH_FORMAT = "epoch_millis"
cast_helper.py:TIME_SEARCH_FORMAT = "strict_date_optional_time_nanos"

This change eliminates the calls to convert_time_range(), which converts times to epoch_millis. These calls are in build_time_range() and build_timestamp_range().

[root@c650fXXpXX python]# grep convert_time *.py
cast_helper.py:def convert_timestamp( timestamp ):
cast_helper.py:    #start_time = convert_timestamp(start_time)
cast_helper.py:    #end_time = convert_timestamp(end_time)
cast_helper.py:    #start_time = convert_timestamp(start_time)
cast_helper.py:    #end_time   = convert_timestamp(end_time)

Before the change:

[root@c650fXXpXX python]# ./findUserJobs.py -u root --starttime 2021-03-01
     State |   AID | P Job ID | S Job ID | Begin Time                 | End Time                  
  complete |     1 |        1 | 0        | 2021-03-11T10:01:20.626411 | 2021-03-11T10:01:25.266256
  complete |     2 |        1 | 0        | 2021-03-11T10:01:26.803619 | 2021-03-11T10:01:29.501622
  complete |     3 |        1 | 0        | 2021-03-11T10:02:11.384713 | 2021-03-11T10:02:12.296012

[root@c650fXXpXX python]# ./findUserJobs.py -u root --starttime 2021-03-11T10:02:11.000000 --endtime 2021-03-11T10:02:12.999999
     State |   AID | P Job ID | S Job ID | Begin Time                 | End Time 

After the change to use nanos:

[root@c650fXXpXX python]# ./findUserJobs.py -u root --starttime 2021-03-11T10:02:11.000000 --endtime 2021-03-11T10:02:12.999999
     State |   AID | P Job ID | S Job ID | Begin Time                 | End Time                  
  complete |     3 |        1 | 0        | 2021-03-11T10:02:11.384713 | 2021-03-11T10:02:12.296012
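For reference, a simplified single-clause sketch (not the script's full bool query) of what a range clause looks like with the nanos format, using the ISO-8601 values from the example above:

# Simplified sketch: one range clause using strict_date_optional_time_nanos instead of epoch_millis
curl -X GET "XX.X.X.XX:9200/cast-allocation/_search?pretty" -H 'Content-Type: application/json' -d'
{"query":
   {"range":
      {"data.begin_time":
          {"format":"strict_date_optional_time_nanos",
           "gte":"2021-03-11T10:02:11.000000",
           "lte":"2021-03-11T10:02:12.999999"
          }
      }
   },
 "size":1000
}
'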


thanh-lam commented on August 17, 2024

@williammorrison2 found another script that has a problem with the time format (epoch_millis vs. nanos).

[root@c650fXXpXX python]# ./findJobsRunning.py -T 2021-03-01
A RequestError was recieved:
  type: search_phase_execution_exception
  reason: all shards failed
  root_0_type: parse_exception
  root_0_reason: failed to parse date field [1614574800] with format [strict_date_optional_time_nanos]: [failed to parse date field [1614574800] with format [strict_date_optional_time_nanos]]

But it is defined in a different way, not via convert_time_range().

date_print_format='%s'

That should be changed to the following format to be compatible with nanos.

    # Parse the user's date.
    date_format='(\d{4})-(\d{1,2})-(\d{1,2})[ \.T]*(\d{0,2}):{0,1}(\d{0,2}):{0,1}(\d{0,2})'
    date_print_format='%Y-%m-%dT%H:%M:%S'
    #date_print_format='%s'
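A rough shell illustration of the difference between the two print formats (the epoch value depends on the local timezone):

# '%s' prints epoch seconds, which strict_date_optional_time_nanos cannot parse
date -d "2021-03-01" +%s
# '%Y-%m-%dT%H:%M:%S' prints an ISO-8601-style value that the nanos format accepts
date -d "2021-03-01" "+%Y-%m-%dT%H:%M:%S"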

