
Comments (8)

cdown commented on June 3, 2024

https://github.com/badboy/iso8601/blob/master/src/parsers.rs is probably not a bad place to start looking at text-based nom parsing.

from mack.

cdown commented on June 3, 2024

For feat in titles at least, we probably want take_until!


cdown commented on June 3, 2024

I think one of the main downsides for nom is that the learning curve seems pretty high. For example, I've been struggling just to do feat extraction for a while, which is made much worse by cryptic type mismatches from within macros themselves:

mack % gd
diff --git src/fixers.rs src/fixers.rs
index 42265b7..f9a2f10 100644
--- src/fixers.rs
+++ src/fixers.rs
@@ -1,6 +1,6 @@
 use regex::{Regex, RegexBuilder};
 use taglib;
-use types::{Fixer, MackError, Track};
+use types::{Fixer, MackError, Track, TrackTitle};
 use std::borrow::Cow;
 
 lazy_static! {
@@ -9,6 +9,23 @@ lazy_static! {
     ).case_insensitive(true).build().unwrap();
 }
 
+named!(open_bracket, alt!(tag!("(") | tag!("[")));
+named!(close_bracket, alt!(tag!(")") | tag!("]")));
+named!(feature_word, alt!(tag_no_case!("featuring") | tag_no_case!("feat") | tag_no_case!("ft")));
+named!(parse_track_title<&[u8], TrackTitle>,
+    do_parse!(
+        track_name: take_till!(alt!(opt!(open_bracket) | eof!())) >>
+        feature_word >> opt!(".") >>
+
+        featured_artists: take_till!(opt!(close_bracket)) >>
+
+        (TrackTitle {
+            track_name: track_name,
+            featured_artists: featured_artists,
+        })
+    )
+);
+
 pub fn run_fixers(track: &mut Track, dry_run: bool) -> Result<Vec<Fixer>, MackError> {
     let mut applied_fixers = Vec::new();
     let mut tags = track.tag_file.tag()?;
diff --git src/main.rs src/main.rs
index 2901b45..f41bc87 100644
--- src/main.rs
+++ src/main.rs
@@ -4,6 +4,8 @@ extern crate ignore;
 extern crate lazy_static;
 extern crate regex;
 extern crate taglib;
+#[macro_use]
+extern crate nom;
 
 mod fixers;
 mod track;
diff --git src/types.rs src/types.rs
index 001e3d0..c3109a0 100644
--- src/types.rs
+++ src/types.rs
@@ -7,6 +7,12 @@ pub struct Track {
     pub tag_file: taglib::File,
 }
 
+#[derive(Debug)]
+pub struct TrackTitle {
+    pub track_name: String,
+    pub featured_artists: Vec<String>,
+}
+
 #[derive(Debug)]
 pub enum Fixer {
     FEAT,


cdown commented on June 3, 2024

Using regexes as a replacement mechanism really shows its limitations when it comes to things like replacing the final "and" with ", &", since we don't know whether there are only two artists (in which case there should be no comma) or more.
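To make the comma problem concrete, here is a minimal sketch of what the join looks like once the artists are already available as a list rather than as one string to rewrite. `join_artists` is a hypothetical helper for illustration, not something in mack, and the exact separator style is an assumption based on the ", &" wording above.

```rust
/// Join artist names for display. Hypothetical helper, not part of mack.
/// Two artists get a bare "&"; three or more get ", &" before the last one.
fn join_artists(artists: &[&str]) -> String {
    match artists {
        [] => String::new(),
        [only] => only.to_string(),
        // Exactly two artists: no comma.
        [first, second] => format!("{} & {}", first, second),
        // Three or more: comma-separate, then ", &" before the final name.
        [rest @ .., last] => format!("{}, & {}", rest.join(", "), last),
    }
}
```

With a list in hand the two-artist case is just another match arm, which is exactly the information a plain regex substitution does not have.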

I suspect the best balance for now is to extract using regex, and have specific types for a title and for an artist (possibly with "feat." in). I don't think we'll need types for other fields yet, since I can't think of any that contain multiple pieces of metadata.


cdown commented on June 3, 2024

Well, there are other limitations with the regex crate for our needs. Greedy matching to extract particular things (like "(feat. X)") while eliminating the surrounding context often leaves us unable to reasonably make that context optional while still preferring to cut it out when it is present. Since the regex crate has no negative lookahead, we can't do much. Maybe fancy-regex can help here.


cdown commented on June 3, 2024

Or maybe it's just more judicious use of regexes that's the solution. We could explicitly look for our feat string and do the rest manually.
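As a rough sketch of the "do the rest manually" idea, using only the standard library: scan for a feat marker that follows a space or an opening bracket, then trim the brackets and whitespace by hand. `SplitTitle` and `split_feat` are hypothetical names for illustration, not mack's actual types, and the marker list is assumed from the nom attempt earlier in this thread.

```rust
/// Result of splitting a title. Hypothetical type, for illustration only.
#[derive(Debug, PartialEq)]
struct SplitTitle {
    track_name: String,
    featured: Option<String>,
}

/// Look for a feat marker and split the title around it by hand.
fn split_feat(title: &str) -> SplitTitle {
    // Longest marker first, so "featuring" is tried before "feat".
    const MARKERS: [&str; 3] = ["featuring", "feat", "ft"];
    // ASCII lowercasing keeps byte offsets identical to `title`.
    let lower = title.to_ascii_lowercase();

    for (idx, _) in lower.char_indices().skip(1) {
        // The marker must follow a space or an opening bracket.
        let prev = lower.as_bytes()[idx - 1];
        if !matches!(prev, b' ' | b'(' | b'[') {
            continue;
        }
        for &marker in MARKERS.iter() {
            let after = match lower[idx..].strip_prefix(marker) {
                Some(after) => after,
                None => continue,
            };
            // Require "." or a space next, so words like "feather" don't match.
            if !(after.starts_with('.') || after.starts_with(' ')) {
                continue;
            }
            let artists = title[idx + marker.len()..]
                .trim_start_matches('.')
                .trim()
                .trim_end_matches(|c: char| c == ')' || c == ']')
                .trim();
            if artists.is_empty() {
                continue;
            }
            let name = title[..idx]
                .trim_end_matches(|c: char| c == '(' || c == '[' || c.is_whitespace());
            return SplitTitle {
                track_name: name.to_string(),
                featured: Some(artists.to_string()),
            };
        }
    }
    // No marker found: the whole string is the track name.
    SplitTitle { track_name: title.trim().to_string(), featured: None }
}
```

This deliberately ignores anything that comes after the featured artists (a remix suffix, say); handling that would need more rules than this sketch tries to cover.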


cdown commented on June 3, 2024

lalrpop possibly looks more promising: https://github.com/lalrpop/lalrpop


cdown commented on June 3, 2024

The current TrackFeat approach seems the best compromise for now.

