Note Scoring fails due to changes in expected file format (twitter/communitynotes)

Comments (4)

chriskd commented on August 16, 2024

I'll note that adding this line to drop the columns allowed the execution to finish successfully:

noteScores = noteScores.drop(columns=['raterParticipantId_interval', 'raterParticipantId_same'])
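For anyone applying the same workaround, a slightly more defensive variant is sketched below. It assumes only pandas; the helper name `drop_stray_columns` and the toy frame are hypothetical and not part of the communitynotes code. Passing `errors="ignore"` to `DataFrame.drop` keeps the call from raising a `KeyError` on runs where the two columns are absent.

```python
import pandas as pd

# Column names taken from the AssertionError reported in this issue.
STRAY_COLUMNS = ["raterParticipantId_interval", "raterParticipantId_same"]


def drop_stray_columns(noteScores: pd.DataFrame) -> pd.DataFrame:
  """Return noteScores without the stray columns (hypothetical helper)."""
  # errors="ignore" skips any label that is not present in the frame.
  return noteScores.drop(columns=STRAY_COLUMNS, errors="ignore")


# Toy data for illustration only; not real scoring output.
toy = pd.DataFrame(
  {
    "noteId": [1, 2],
    "raterParticipantId_interval": ["a", "b"],
    "raterParticipantId_same": ["a", "b"],
  }
)
cleaned = drop_stray_columns(toy)
print(list(cleaned.columns))
```

This only hides the symptom: it does not explain why the two columns reach `noteScores` in the first place.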

jbaxter commented on August 16, 2024

I cannot reproduce this! Works for me. I do see that you are using old data files, so make sure you are using the newest code and data files and see if that resolves this.

chriskd commented on August 16, 2024

Strange - I'm willing to accept that there's a quirk on my end, but I just tried again after a fresh git pull with the most recent data files, and I'm still getting the error. I'll note that I had to patch a few numpy calls to get things working, but I don't believe they're responsible for the error.

Output from latest run:

$ python main.py -o ./
Timestamp of latest rating in data:  2023-06-05 00:32:21.977000
Timestamp of latest note in data:  2023-06-05 00:31:06.955000
total notes added to noteStatusHistory: 0
Preprocess Data: Filter misleading notes, starting with 4508470 ratings on 102944 notes
  Keeping 3271885 ratings on 70680 misleading notes
  Keeping 234213 ratings on 8746 deleted notes that were previously scored (in note status history)
  Removing 69201 ratings on 3420 older notes that aren't deleted, but are not-misleading.
  Removing 12096 ratings on 1774 notes that were deleted and not in note status history (e.g. old).
Num Ratings: 4427173, Num Unique Notes Rated: 97750, Num Unique Raters: 116038
Identifying core notes and ratings
  Total ratings: 4427173
  Ratings from user without modelingPopulation: 0
  Total notes: 117171
  Total notes with ratings: 97750
  Total core notes: 111103
  Total expansion notes: 6068
  Core ratings: 4029233
Filter notes and ratings with too few ratings
  After Filtering Notes w/less than 5 Ratings, Num Ratings: 3986125, Num Unique Notes Rated: 73863, Num Unique Raters: 80944
  After Filtering Raters w/less than 10 Notes, Num Ratings: 3842019, Num Unique Notes Rated: 73863, Num Unique Raters: 40437
  After Final Filtering of Notes w/less than 5 Ratings, Num Ratings: 3840894, Num Unique Notes Rated: 73572, Num Unique Raters: 40437
------------------
Users: 40437, Notes: 73572
cpu
epoch 0 6.589416980743408
TRAIN FIT LOSS:  6.125600814819336
epoch 50 0.12608759105205536
TRAIN FIT LOSS:  0.098120778799057
epoch 100 0.11418940871953964
TRAIN FIT LOSS:  0.08804900199174881
Num epochs: 145
epoch 145 0.11411625146865845
TRAIN FIT LOSS:  0.08798252791166306
Global Intercept:  0.1595355123281479
Applying scoring rule: InitialNMR (v1.0)
Applying scoring rule: GeneralCRH (v1.0)
Applying scoring rule: LcbCRH (v1.0)
Applying scoring rule: GeneralCRNH (v1.0)
Applying scoring rule: UcbCRNH (v1.0)
Applying scoring rule: NmCRNH (v1.0)
Total ratings: 3805141 post-tombstones and 224092 pre-tombstones
Total ratings created before statuses: 583066, including 513141 post-tombstones and 69925 pre-tombstones.
Total valid ratings: 201957
Unique Raters:  40437
People (Authors or Raters) With Helpfulness Scores:  33587
Raters Included Based on Helpfulness Scores:  26766
Included Raters who have rated at least 1 note in the final dataset:  24107
Number of Ratings Used For 1st Training:  3840894
Number of Ratings for Final Training:  2848482
------------------
Users: 24107, Notes: 73567
initializing notes
initializing users
cpu
epoch 0 0.4064728021621704
TRAIN FIT LOSS:  0.3407231569290161
epoch 50 0.11263342201709747
TRAIN FIT LOSS:  0.08677636831998825
epoch 100 0.11187303066253662
TRAIN FIT LOSS:  0.08480041474103928
Num epochs: 108
epoch 108 0.11187130212783813
TRAIN FIT LOSS:  0.08481691032648087
Global Intercept:  0.16299566626548767
------------------
Re-scoring all notes with extra rating added: {'internalRaterIntercept': None, 'internalRaterFactor1': None, 'helpfulNum': None}
------------------
Users: 24113, Notes: 73567
cpu
initializing notes
initializing users
initialized global intercept
epoch 0 0.1451594978570938
TRAIN FIT LOSS:  0.12055684626102448
epoch 50 0.11196774244308472
TRAIN FIT LOSS:  0.08494749665260315
Num epochs: 70
epoch 70 0.11189267784357071
TRAIN FIT LOSS:  0.08485785126686096
------------------
Re-scoring all notes with extra rating added: {'raterParticipantId': '-1', 'raterIndex': 24107, 'internalRaterIntercept': -0.20905748, 'internalRaterFactor1': -0.98706627, 'helpfulNum': 0.0}
------------------
Users: 24113, Notes: 73567
cpu
initializing notes
initializing users
initialized global intercept
epoch 0 0.16348758339881897
TRAIN FIT LOSS:  0.1395658701658249
epoch 50 0.1144440621137619
TRAIN FIT LOSS:  0.08890409022569656
Num epochs: 76
epoch 76 0.11435116827487946
TRAIN FIT LOSS:  0.08862535655498505
------------------
Re-scoring all notes with extra rating added: {'raterParticipantId': '-1', 'raterIndex': 24107, 'internalRaterIntercept': -0.20905748, 'internalRaterFactor1': -0.98706627, 'helpfulNum': 1.0}
------------------
Users: 24113, Notes: 73567
cpu
initializing notes
initializing users
initialized global intercept
epoch 0 0.1725122183561325
TRAIN FIT LOSS:  0.1293899267911911
epoch 50 0.12989546358585358
TRAIN FIT LOSS:  0.10157230496406555
Num epochs: 84
epoch 84 0.12979304790496826
TRAIN FIT LOSS:  0.10174272954463959
------------------
Re-scoring all notes with extra rating added: {'raterParticipantId': '-2', 'raterIndex': 24108, 'internalRaterIntercept': -0.20905748, 'internalRaterFactor1': 0.0, 'helpfulNum': 0.0}
------------------
Users: 24113, Notes: 73567
cpu
initializing notes
initializing users
initialized global intercept
epoch 0 0.15731190145015717
TRAIN FIT LOSS:  0.13753223419189453
epoch 50 0.11080730706453323
TRAIN FIT LOSS:  0.08443847298622131
Num epochs: 80
epoch 80 0.11070317775011063
TRAIN FIT LOSS:  0.08417756110429764
------------------
Re-scoring all notes with extra rating added: {'raterParticipantId': '-2', 'raterIndex': 24108, 'internalRaterIntercept': -0.20905748, 'internalRaterFactor1': 0.0, 'helpfulNum': 1.0}
------------------
Users: 24113, Notes: 73567
cpu
initializing notes
initializing users
initialized global intercept
epoch 0 0.16405029594898224
TRAIN FIT LOSS:  0.12321518361568451
epoch 50 0.1290624737739563
TRAIN FIT LOSS:  0.10025785863399506
Num epochs: 75
epoch 75 0.1289958357810974
TRAIN FIT LOSS:  0.10028310865163803
------------------
Re-scoring all notes with extra rating added: {'raterParticipantId': '-3', 'raterIndex': 24109, 'internalRaterIntercept': -0.20905748, 'internalRaterFactor1': 0.8957525, 'helpfulNum': 0.0}
------------------
Users: 24113, Notes: 73567
cpu
initializing notes
initializing users
initialized global intercept
epoch 0 0.16300328075885773
TRAIN FIT LOSS:  0.13849183917045593
epoch 50 0.11360931396484375
TRAIN FIT LOSS:  0.08788181096315384
Num epochs: 79
epoch 79 0.11351975798606873
TRAIN FIT LOSS:  0.08761853724718094
------------------
Re-scoring all notes with extra rating added: {'raterParticipantId': '-3', 'raterIndex': 24109, 'internalRaterIntercept': -0.20905748, 'internalRaterFactor1': 0.8957525, 'helpfulNum': 1.0}
------------------
Users: 24113, Notes: 73567
cpu
initializing notes
initializing users
initialized global intercept
epoch 0 0.1718447357416153
TRAIN FIT LOSS:  0.12911443412303925
epoch 50 0.12908950448036194
TRAIN FIT LOSS:  0.09993864595890045
Num epochs: 74
epoch 74 0.12899614870548248
TRAIN FIT LOSS:  0.10016079246997833
------------------
Re-scoring all notes with extra rating added: {'raterParticipantId': '-4', 'raterIndex': 24110, 'internalRaterIntercept': 0.598844, 'internalRaterFactor1': -0.98706627, 'helpfulNum': 0.0}
------------------
Users: 24113, Notes: 73567
cpu
initializing notes
initializing users
initialized global intercept
epoch 0 0.17577436566352844
TRAIN FIT LOSS:  0.15158946812152863
epoch 50 0.13234080374240875
TRAIN FIT LOSS:  0.10695061832666397
Num epochs: 89
epoch 89 0.13225241005420685
TRAIN FIT LOSS:  0.1068490669131279
------------------
Re-scoring all notes with extra rating added: {'raterParticipantId': '-4', 'raterIndex': 24110, 'internalRaterIntercept': 0.598844, 'internalRaterFactor1': -0.98706627, 'helpfulNum': 1.0}
------------------
Users: 24113, Notes: 73567
cpu
initializing notes
initializing users
initialized global intercept
epoch 0 0.1630665510892868
TRAIN FIT LOSS:  0.13770204782485962
epoch 50 0.11418108642101288
TRAIN FIT LOSS:  0.08842354267835617
Num epochs: 67
epoch 67 0.1141059547662735
TRAIN FIT LOSS:  0.08808497339487076
------------------
Re-scoring all notes with extra rating added: {'raterParticipantId': '-5', 'raterIndex': 24111, 'internalRaterIntercept': 0.598844, 'internalRaterFactor1': 0.0, 'helpfulNum': 0.0}
------------------
Users: 24113, Notes: 73567
cpu
initializing notes
initializing users
initialized global intercept
epoch 0 0.17783692479133606
TRAIN FIT LOSS:  0.15611062943935394
epoch 50 0.13092678785324097
TRAIN FIT LOSS:  0.1053718850016594
Num epochs: 62
epoch 62 0.13086439669132233
TRAIN FIT LOSS:  0.1052466407418251
------------------
Re-scoring all notes with extra rating added: {'raterParticipantId': '-5', 'raterIndex': 24111, 'internalRaterIntercept': 0.598844, 'internalRaterFactor1': 0.0, 'helpfulNum': 1.0}
------------------
Users: 24113, Notes: 73567
cpu
initializing notes
initializing users
initialized global intercept
epoch 0 0.15194520354270935
TRAIN FIT LOSS:  0.12724226713180542
epoch 50 0.11060148477554321
TRAIN FIT LOSS:  0.0838242843747139
Num epochs: 52
epoch 52 0.11060154438018799
TRAIN FIT LOSS:  0.08376222103834152
------------------
Re-scoring all notes with extra rating added: {'raterParticipantId': '-6', 'raterIndex': 24112, 'internalRaterIntercept': 0.598844, 'internalRaterFactor1': 0.8957525, 'helpfulNum': 0.0}
------------------
Users: 24113, Notes: 73567
cpu
initializing notes
initializing users
initialized global intercept
epoch 0 0.1766638457775116
TRAIN FIT LOSS:  0.1522710621356964
epoch 50 0.1312122344970703
TRAIN FIT LOSS:  0.10556632280349731
Num epochs: 79
epoch 79 0.1311303973197937
TRAIN FIT LOSS:  0.10544323921203613
------------------
Re-scoring all notes with extra rating added: {'raterParticipantId': '-6', 'raterIndex': 24112, 'internalRaterIntercept': 0.598844, 'internalRaterFactor1': 0.8957525, 'helpfulNum': 1.0}
------------------
Users: 24113, Notes: 73567
cpu
initializing notes
initializing users
initialized global intercept
epoch 0 0.16297021508216858
TRAIN FIT LOSS:  0.1364491581916809
epoch 50 0.11338061094284058
TRAIN FIT LOSS:  0.0873255655169487
Num epochs: 79
epoch 79 0.11329249292612076
TRAIN FIT LOSS:  0.08705952018499374
/Users/dehghanpoorc/git/communitynotes/sourcecode/scoring/incorrect_filter.py:59: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ratings_w_user_totals.drop(
/Users/dehghanpoorc/git/communitynotes/sourcecode/scoring/incorrect_filter.py:59: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ratings_w_user_totals.drop(
Applying scoring rule: InitialNMR (v1.0)
Applying scoring rule: GeneralCRH (v1.0)
Applying scoring rule: LcbCRH (v1.0)
Applying scoring rule: GeneralCRNH (v1.0)
Applying scoring rule: UcbCRNH (v1.0)
Applying scoring rule: NmCRNH (v1.0)
Applying scoring rule: GeneralCRHInertia (v1.0)
Applying scoring rule: TagFilter (v1.0)
CRH notes prior to tag filtering: 7739
CRH notes above crhSuperThreshold: 1988
Checking note tags:
notHelpfulOther
  ratio threshold: 0.0523903465459289
notHelpfulIncorrect
  ratio threshold: 0.030175009231865144
notHelpfulSourcesMissingOrUnreliable
  ratio threshold: 0.09133365436405545
notHelpfulOpinionSpeculationOrBias
  ratio threshold: 0.0
notHelpfulMissingKeyPoints
  ratio threshold: 0.10043737751997933
notHelpfulOutdated
  ratio threshold: 0.0
notHelpfulHardToUnderstand
  ratio threshold: 0.05357206358220941
outlier filtering disabled for tag: notHelpfulHardToUnderstand
notHelpfulArgumentativeOrBiased
  ratio threshold: 0.049932614782993594
notHelpfulOffTopic
  ratio threshold: 0.0
notHelpfulSpamHarassmentOrAbuse
  ratio threshold: 0.00031220707798889486
notHelpfulIrrelevantSources
  ratio threshold: 0.04074369069568569
notHelpfulOpinionSpeculation
  ratio threshold: 0.07354085054276029
notHelpfulNoteNotNeeded
  ratio threshold: 0.10933317613008935
outlier filtering disabled for tag: notHelpfulNoteNotNeeded
Total {note, tag} pairs where tag filter logic triggered: 380
Total unique notes impacted by tag filtering: 309
Applying scoring rule: CRHSuperThreshold (v1.0)
Applying scoring rule: ElevatedCRHInertia (v1.0)
Applying scoring rule: FilterIncorrect (v1.0)
Total notes impacted by incorrect filtering: 89
Traceback (most recent call last):
  File "/Users/dehghanpoorc/git/communitynotes/sourcecode/main.py", line 96, in <module>
    main()
  File "/Users/dehghanpoorc/git/communitynotes/sourcecode/main.py", line 84, in main
    scoredNotes, helpfulnessScores, newStatus, auxNoteInfo = run_scoring(
                                                             ^^^^^^^^^^^^
  File "/Users/dehghanpoorc/git/communitynotes/sourcecode/scoring/run_scoring.py", line 469, in run_scoring
    scoredNotes, helpfulnessScores, auxiliaryNoteInfo = _run_scorers(
                                                        ^^^^^^^^^^^^^
  File "/Users/dehghanpoorc/git/communitynotes/sourcecode/scoring/run_scoring.py", line 151, in _run_scorers
    modelResultsAndTimes = [
                           ^
  File "/Users/dehghanpoorc/git/communitynotes/sourcecode/scoring/run_scoring.py", line 152, in <listcomp>
    _run_scorer_parallelizable(s, ratings, noteStatusHistory, userEnrollment) for s in scorers
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dehghanpoorc/git/communitynotes/sourcecode/scoring/run_scoring.py", line 103, in _run_scorer_parallelizable
    result = ModelResult(*scorer.score(ratings, noteStatusHistory, userEnrollment))
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dehghanpoorc/git/communitynotes/sourcecode/scoring/scorer.py", line 109, in score
    assert set(noteScores.columns) == set(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: all columns must be either dropped or explicitly defined in an output. 
    Extra columns that were in noteScores: {'raterParticipantId_same', 'raterParticipantId_interval'}
    Missing expected columns that should've been in noteScores: set()
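The assertion compares the set of columns actually present with the set the scorer declares, then reports the difference in both directions. A minimal pure-Python sketch of that kind of check follows; `diff_columns` and the toy column lists are hypothetical and are not the code in `scorer.py`.

```python
from typing import Iterable, Set, Tuple


def diff_columns(actual: Iterable[str], expected: Iterable[str]) -> Tuple[Set[str], Set[str]]:
  """Return (extra, missing) column names (hypothetical helper)."""
  actualSet, expectedSet = set(actual), set(expected)
  # extra: present but never declared; missing: declared but not present.
  return actualSet - expectedSet, expectedSet - actualSet


# Toy column lists for illustration; only the two stray names come from the traceback.
actual = ["noteId", "someScore", "raterParticipantId_same", "raterParticipantId_interval"]
expected = ["noteId", "someScore"]

extra, missing = diff_columns(actual, expected)
print("extra:", sorted(extra))
print("missing:", sorted(missing))
```

With these toy inputs the two `raterParticipantId_*` names land in `extra` and `missing` is empty, the same shape as the error message above.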

Here's a git diff too, just in case:

diff --git a/sourcecode/scoring/constants.py b/sourcecode/scoring/constants.py
index 497c3d4..9e6af84 100644
--- a/sourcecode/scoring/constants.py
+++ b/sourcecode/scoring/constants.py
@@ -251,19 +251,19 @@ notMisleadingTagsAndTypes = [(tag, np.int64) for tag in notMisleadingTags]
 noteTSVColumnsAndTypes = (
   [
     (noteIdKey, np.int64),
-    (noteAuthorParticipantIdKey, np.object),
+    (noteAuthorParticipantIdKey, object),
     (createdAtMillisKey, np.int64),
     (tweetIdKey, np.int64),
-    (classificationKey, np.object),
-    ("believable", np.object),
-    ("harmful", np.object),
-    ("validationDifficulty", np.object),
+    (classificationKey, object),
+    ("believable", object),
+    ("harmful", object),
+    ("validationDifficulty", object),
   ]
   + misleadingTagsAndTypes
   + notMisleadingTagsAndTypes
   + [
     ("trustworthySources", np.int64),
-    (summaryKey, np.object),
+    (summaryKey, object),
     ("isMediaNote", np.int64)
   ]
 )
@@ -274,14 +274,14 @@ noteTSVTypeMapping = {col: dtype for (col, dtype) in noteTSVColumnsAndTypes}
 ratingTSVColumnsAndTypes = (
   [
     (noteIdKey, np.int64),
-    (raterParticipantIdKey, np.object),
+    (raterParticipantIdKey, object),
     (createdAtMillisKey, np.int64),
     ("version", np.int64),
     ("agree", np.int64),
     ("disagree", np.int64),
     (helpfulKey, np.int64),
     (notHelpfulKey, np.int64),
-    (helpfulnessLevelKey, np.object),
+    (helpfulnessLevelKey, object),
   ]
   + helpfulTagsAndTypesTSVOrder
   + notHelpfulTagsAndTypesTSVOrder
@@ -305,16 +305,16 @@ timestampMillisOfRetroLockKey = "timestampMillisOfRetroLock"

 noteStatusHistoryTSVColumnsAndTypes = [
   (noteIdKey, np.int64),
-  (noteAuthorParticipantIdKey, np.object),
+  (noteAuthorParticipantIdKey, object),
   (createdAtMillisKey, np.int64),
   (timestampMillisOfNoteFirstNonNMRLabelKey, np.double),  # double because nullable.
-  (firstNonNMRLabelKey, np.object),
+  (firstNonNMRLabelKey, object),
   (timestampMillisOfNoteCurrentLabelKey, np.double),  # double because nullable.
-  (currentLabelKey, np.object),
+  (currentLabelKey, object),
   (timestampMillisOfNoteMostRecentNonNMRLabelKey, np.double),  # double because nullable.
-  (mostRecentNonNMRLabelKey, np.object),
+  (mostRecentNonNMRLabelKey, object),
   (timestampMillisOfStatusLockKey, np.double),  # double because nullable.
-  (lockedStatusKey, np.object),
+  (lockedStatusKey, object),
   (timestampMillisOfRetroLockKey, np.double),  # double because nullable.
 ]
 noteStatusHistoryTSVColumns = [col for (col, dtype) in noteStatusHistoryTSVColumnsAndTypes]
@@ -355,12 +355,12 @@ core = "CORE"
 expansion = "EXPANSION"
 
 userEnrollmentTSVColumnsAndTypes = [
-  (participantIdKey, np.str),
-  (enrollmentState, np.str),
+  (participantIdKey, str),
+  (enrollmentState, str),
   (successfulRatingNeededToEarnIn, np.int64),
   (timestampOfLastStateChange, np.int64),
   (timestampOfLastEarnOut, np.double),  # double because nullable.
-  (modelingPopulationKey, np.str),
+  (modelingPopulationKey, str),
 ]
 userEnrollmentTSVColumns = [col for (col, _) in userEnrollmentTSVColumnsAndTypes]
 userEnrollmentTSVTypes = [dtype for (_, dtype) in userEnrollmentTSVColumnsAndTypes]
@@ -427,26 +427,26 @@ noteModelOutputTSVColumnsAndTypes = [
   (noteIdKey, np.int64),
   (coreNoteInterceptKey, np.double),
   (coreNoteFactor1Key, np.double),
-  (finalRatingStatusKey, np.str),
-  (firstTagKey, np.str),
-  (secondTagKey, np.str),
+  (finalRatingStatusKey, str),
+  (firstTagKey, str),
+  (secondTagKey, str),
   # Note that this column was formerly named "activeRules" and the name is now
   # updated to "coreActiveRules".  The data values remain the compatible,
   # but the new column only contains rules that ran when deciding status based on
   # the core model.
-  (coreActiveRulesKey, np.str),
-  (activeFilterTagsKey, np.str),
-  (classificationKey, np.str),
+  (coreActiveRulesKey, str),
+  (activeFilterTagsKey, str),
+  (classificationKey, str),
   (createdAtMillisKey, np.int64),
-  (coreRatingStatusKey, np.str),
-  (metaScorerActiveRulesKey, np.str),
-  (decidedByKey, np.str),
+  (coreRatingStatusKey, str),
+  (metaScorerActiveRulesKey, str),
+  (decidedByKey, str),
   (expansionNoteInterceptKey, np.double),
   (expansionNoteFactor1Key, np.double),
-  (expansionRatingStatusKey, np.str),
+  (expansionRatingStatusKey, str),
   (coverageNoteInterceptKey, np.double),
   (coverageNoteFactor1Key, np.double),
-  (coverageRatingStatusKey, np.str),
+  (coverageRatingStatusKey, str),
   (coreNoteInterceptMinKey, np.double),
   (coreNoteInterceptMaxKey, np.double),
   (expansionNoteInterceptMinKey, np.double),
@@ -477,7 +477,7 @@ raterModelOutputTSVColumnsAndTypes = [
   (notesAwaitingMoreRatings, np.int64),
   (enrollmentState, np.int32),
   (successfulRatingNeededToEarnIn, np.int64),
-  (authorTopNotHelpfulTagValues, np.str),
+  (authorTopNotHelpfulTagValues, str),
   (timestampOfLastStateChange, np.int64),
   (aboveHelpfulnessThresholdKey, np.bool_),
   (isEmergingWriterKey, np.bool_),
diff --git a/sourcecode/scoring/note_ratings.py b/sourcecode/scoring/note_ratings.py
index dad9dcf..4d2e1a2 100644
--- a/sourcecode/scoring/note_ratings.py
+++ b/sourcecode/scoring/note_ratings.py
@@ -83,7 +83,7 @@ def get_ratings_before_note_status_and_public_tsv(
   # c.timestampMillisOfNoteMostRecentNonNMRLabelKey are determined at runtime and cannot be statically
   # determined from the code above.  If noteStatusHistory is missing any noteIdKey which is found in
   # ratings, then the missing rows will have NaN values for c.createdAtMillisKey and
-  # c.timestampMillisOfNoteMostRecentNonNMRLabelKey, forcing the entire colum to have type np.float.
+  # c.timestampMillisOfNoteMostRecentNonNMRLabelKey, forcing the entire colum to have type float.
   # However, if there are no missing values in column noteIdKey then c.createdAtMillisKey and
   # c.timestampMillisOfNoteMostRecentNonNMRLabelKey will retain their int64 types.  The code below
   # coerces both columns to always have float types so the typecheck below will pass.
@@ -95,11 +95,11 @@ def get_ratings_before_note_status_and_public_tsv(
     ratingsWithNoteLabelInfoTypes = c.ratingTSVTypeMapping
     ratingsWithNoteLabelInfoTypes[
       c.createdAtMillisKey + "_note"
-    ] = np.float  # float because nullable after merge.
+    ] = float  # float because nullable after merge.
     ratingsWithNoteLabelInfoTypes[
       c.timestampMillisOfNoteMostRecentNonNMRLabelKey
-    ] = np.float  # float because nullable.
-    ratingsWithNoteLabelInfoTypes[c.helpfulNumKey] = np.float
+    ] = float  # float because nullable.
+    ratingsWithNoteLabelInfoTypes[c.helpfulNumKey] = float
 
     assert len(ratingsWithNoteLabelInfo) == len(ratings)
     mismatches = [
diff --git a/sourcecode/scoring/process_data.py b/sourcecode/scoring/process_data.py
index bc74a22..4b28839 100644
--- a/sourcecode/scoring/process_data.py
+++ b/sourcecode/scoring/process_data.py
@@ -342,7 +342,7 @@ def preprocess_data(
   ratings.loc[ratings[c.helpfulnessLevelKey] == c.helpfulValueTsv, c.helpfulNumKey] = 1
   ratings = ratings.loc[~pd.isna(ratings[c.helpfulNumKey])]
 
-  notes[c.tweetIdKey] = notes[c.tweetIdKey].astype(np.str)
+  notes[c.tweetIdKey] = notes[c.tweetIdKey].astype(str)
 
   noteStatusHistory = note_status_history.merge_note_info(noteStatusHistory, notes)
 
diff --git a/sourcecode/scoring/scorer.py b/sourcecode/scoring/scorer.py
index 20d0ff8..c75f557 100644
--- a/sourcecode/scoring/scorer.py
+++ b/sourcecode/scoring/scorer.py
@@ -105,6 +105,7 @@ class Scorer(ABC):
     userScores = userScores.rename(columns=self._get_user_col_mapping())
     # Process noteScores
     noteScores = noteScores.drop(columns=self._get_dropped_note_cols())
+    #noteScores = noteScores.drop(columns=['raterParticipantId_interval', 'raterParticipantId_same'])
     assert set(noteScores.columns) == set(
       self.get_scored_notes_cols() + self.get_auxiliary_note_info_cols()
     ), f"""all columns must be either dropped or explicitly defined in an output.
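The patches in this diff follow one pattern: `np.object`, `np.str`, and `np.float` were plain aliases for the Python builtins, deprecated in NumPy 1.20 and removed in 1.24, so swapping in `object`, `str`, and `float` should not change any dtype. A minimal sketch to check that, assuming only that NumPy is installed (the toy array is illustrative, not taken from the scoring data):

```python
import numpy as np

# The builtins map to the same dtypes the removed aliases used to name.
objectDtype = np.dtype(object)  # generic Python-object dtype, kind "O"
floatDtype = np.dtype(float)    # 64-bit float
strDtype = np.dtype(str)        # unicode string dtype, kind "U"

print(objectDtype.kind, floatDtype, strDtype.kind)

# astype(str) behaves the way astype(np.str) did on older NumPy releases.
tweetIds = np.array([101, 202]).astype(str)
print(tweetIds.tolist())
```

Because the aliases and the builtins were the same objects, this part of the diff is unlikely to be related to the extra `raterParticipantId_*` columns.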

jbaxter commented on August 16, 2024

Thank you! Please reopen this if my fix doesn't resolve this for you.
