[lang] Wildcard regex

jodastephen
I don't think Commons Lang has a routine for converting a standard
wildcard string (with * and ?) to a regex.
Here is a first suggestion, although I'm sure it can be improved.

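  import java.util.StringTokenizer;
  import java.util.regex.Pattern;

  // Converts a wildcard string ('*' = any run of characters,
  // '?' = exactly one character) to a case-insensitive Pattern.
  public Pattern createPattern(String text) {
    // returnDelims = true, so "?" and "*" come back as their own tokens
    StringTokenizer tkn = new StringTokenizer(text, "?*", true);
    StringBuilder buf = new StringBuilder(text.length() + 10);
    buf.append('^');
    boolean lastStar = false;
    while (tkn.hasMoreTokens()) {
      String str = tkn.nextToken();
      if (str.equals("?")) {
        buf.append('.');
        lastStar = false;
      } else if (str.equals("*")) {
        // collapse consecutive stars into a single ".*"
        if (!lastStar) {
          buf.append(".*");
        }
        lastStar = true;
      } else {
        // literal text: quote it so regex metacharacters match verbatim
        buf.append(Pattern.quote(str));
        lastStar = false;
      }
    }
    buf.append('$');
    return Pattern.compile(buf.toString(), Pattern.CASE_INSENSITIVE);
  }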

Other possible conversions would be * and ? to database wildcards, so
perhaps there is scope for a few related methods here?
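
As a rough illustration of the database-wildcard conversion mentioned above, here is a minimal sketch that maps * and ? to the SQL LIKE wildcards % and _. The class name WildcardToLike, the method name toLikePattern and the choice of backslash as the escape character are assumptions made for this example, not anything proposed in the thread.

```java
/** Hypothetical helper: converts a '*' / '?' wildcard string to a SQL LIKE pattern. */
final class WildcardToLike {

    /** Assumed escape character; the query would need a matching ESCAPE clause. */
    static final char ESCAPE = '\\';

    static String toLikePattern(String wildcard) {
        StringBuilder buf = new StringBuilder(wildcard.length() + 8);
        for (int i = 0; i < wildcard.length(); i++) {
            char c = wildcard.charAt(i);
            switch (c) {
                case '*':
                    buf.append('%');   // any run of characters
                    break;
                case '?':
                    buf.append('_');   // exactly one character
                    break;
                case '%':
                case '_':
                case ESCAPE:
                    // characters that are special to LIKE must be escaped
                    buf.append(ESCAPE).append(c);
                    break;
                default:
                    buf.append(c);
            }
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        System.out.println(toLikePattern("Foo*")); // prints Foo%
    }
}
```

The result is meant to be bound as a parameter, for example in `name LIKE ? ESCAPE '\'`. How a backslash has to be written inside a string literal differs between databases, so the escape character may need adjusting.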

Stephen

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: [lang] Wildcard regex

Paul Benedict
Can I get some sense of use case? What would you use it for? Just curious.

On Fri, Oct 8, 2010 at 9:06 AM, Stephen Colebourne <[hidden email]> wrote:

> I don't think Commons Lang has a routine for converting a standard
> wildcard string (with * and ?) to a regex.

Re: [lang] Wildcard regex

Matt Benson-2

On Oct 8, 2010, at 9:10 AM, Paul Benedict wrote:

> Can I get some sense of use case? What would you use it for? Just curious.
>

I've got code like this as well.  It takes a pattern containing glob-style wildcards and converts it to a regex that you can then use for matching.  I would also suggest including support for Ant-style wildcards if possible, since those have achieved some ubiquity in the Java ecosystem.
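
To make the Ant-style suggestion concrete, here is a simplified sketch. It assumes these semantics: `?` matches one character other than `/`, `*` matches any run of characters within one path segment, `**/` matches zero or more whole directories, and any other `**` matches anything. The class name AntStyleWildcard is hypothetical, and the sketch does not claim to reproduce every rule of Ant's own matcher.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: converts a simplified Ant-style path pattern to a regex. */
final class AntStyleWildcard {

    static Pattern toPattern(String antPattern) {
        StringBuilder buf = new StringBuilder(antPattern.length() + 16);
        int i = 0;
        while (i < antPattern.length()) {
            char c = antPattern.charAt(i);
            if (antPattern.startsWith("**/", i)) {
                buf.append("(?:.*/)?");   // zero or more directories
                i += 3;
            } else if (antPattern.startsWith("**", i)) {
                buf.append(".*");         // anything, including '/'
                i += 2;
            } else if (c == '*') {
                buf.append("[^/]*");      // stays within one path segment
                i++;
            } else if (c == '?') {
                buf.append("[^/]");       // one character, not a separator
                i++;
            } else {
                // literal character, quoted so regex metacharacters are inert
                buf.append(Pattern.quote(String.valueOf(c)));
                i++;
            }
        }
        return Pattern.compile(buf.toString());
    }

    public static void main(String[] args) {
        Pattern p = toPattern("**/*.java");
        // matches() tests the whole input, so no ^ or $ anchors are needed
        System.out.println(p.matcher("src/main/Foo.java").matches()); // prints true
    }
}
```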

-Matt

Re: [lang] Wildcard regex

jodastephen
In reply to this post by Paul Benedict
Human users enter wildcards * and ? (because regex is too complex). In
my case, I'm passing it to MongoDB, which needs regex.

Stephen


On 8 October 2010 15:10, Paul Benedict <[hidden email]> wrote:

> Can I get some sense of use case? What would you use it for? Just curious.

Re: [lang] Wildcard regex

Siegfried Goeschl-3
Hi folks,

Assuming that "standard wildcard" actually means globbing, I came across
a glob-to-regexp converter somewhere. I will have a look.

Cheers,

Siegfried Goeschl

On 10/8/10 4:32 PM, Stephen Colebourne wrote:

> Human users enter wildcards * and ? (because regex is too complex). In
> my case, I'm passing it to MongoDB, which needs regex.
>
> Stephen

Re: [lang] Wildcard regex

sebb-2-2
In reply to this post by jodastephen
What does the regex represent? A filename?

If so, then maybe the code belongs in IO rather than Lang.

Also, filename globbing is not consistent across OSes.

On 8 October 2010 15:32, Stephen Colebourne <[hidden email]> wrote:

> Human users enter wildcards * and ? (because regex is too complex). In
> my case, I'm passing it to MongoDB, which needs regex.
>
> Stephen

Re: [lang] Wildcard regex

jodastephen
No, the wildcard is used for a database search. Find me all names
matching "Foo*". This is not for [io].

The oro code does look reasonable I'd say.

Stephen


On 9 October 2010 12:25, sebb <[hidden email]> wrote:

> What does the regex represent? A filename?
>
> If so, then maybe the code belongs in IO rather than Lang.
>
> Also, filename globbing is not consistent across OSes.

Re: [lang] Wildcard regex

sebb-2-2
On 9 October 2010 12:33, Stephen Colebourne <[hidden email]> wrote:
> No, the wildcard is used for a database search. Find me all names
> matching "Foo*". This is not for [io].

But where do humans get the idea that * and ? are wildcards?

>
> The oro code does look reasonable I'd say.

Agreed, it could be adapted for Commons.

However, whether the syntax it supports is universal I don't know.

I think it's a useful addition, but the syntax needs to be carefully documented.

And it would be useful if there were versions for different OSes, to
allow filename matching using the standard for that OS.
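
For the per-OS filename case, one option that postdates this thread is the glob support in java.nio.file (added in Java 7). The sketch below asks the default file system for a glob matcher. The class and method names are made up for the example, and details such as case sensitivity are implementation-dependent, so results can differ between platforms.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

/** Hypothetical example: filename matching with the platform's own glob rules. */
final class OsGlobExample {

    static boolean matches(String glob, String fileName) {
        // "glob:" selects the glob syntax; the default file system decides the details
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(fileName));
    }

    public static void main(String[] args) {
        System.out.println(matches("*.txt", "notes.txt")); // prints true
    }
}
```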

Re: [lang] Wildcard regex

Henri Yandell
+1. Seems to fit a RegexUtils.wildcardToRegex type method.

Note that IO has a Wildcard concept already, so ideally it would match
the same schema.
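
A minimal sketch of what such a method might look like, returning the regex as a String so the caller can either compile it or hand it to another system. The class RegexUtils and the method wildcardToRegex follow the names floated above but are hypothetical, not an existing Commons Lang API. The sketch relies on Pattern.quote, which emits \Q...\E quoting, so it assumes the consuming regex engine understands that syntax.

```java
import java.util.regex.Pattern;

/** Hypothetical utility class; not part of Commons Lang. */
final class RegexUtils {

    private RegexUtils() {
    }

    /** Converts a wildcard string ('*' and '?') to an unanchored regex string. */
    static String wildcardToRegex(String wildcard) {
        StringBuilder regex = new StringBuilder(wildcard.length() + 10);
        StringBuilder literal = new StringBuilder();
        boolean lastStar = false;
        for (int i = 0; i < wildcard.length(); i++) {
            char c = wildcard.charAt(i);
            if (c == '?') {
                flushLiteral(literal, regex);
                regex.append('.');
                lastStar = false;
            } else if (c == '*') {
                flushLiteral(literal, regex);
                if (!lastStar) {
                    regex.append(".*");   // consecutive stars collapse to one
                }
                lastStar = true;
            } else {
                literal.append(c);        // collect a run of literal characters
                lastStar = false;
            }
        }
        flushLiteral(literal, regex);
        return regex.toString();
    }

    /** Appends the pending literal run, quoted, and clears it. */
    private static void flushLiteral(StringBuilder literal, StringBuilder regex) {
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
            literal.setLength(0);
        }
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile(wildcardToRegex("Foo*"), Pattern.CASE_INSENSITIVE);
        System.out.println(p.matcher("foobar").matches()); // prints true
    }
}
```

Leaving out the ^ and $ anchors is a design choice for the sketch: Matcher.matches() already requires a full match, and a caller that needs anchors can add them.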

Hen

On Sat, Oct 9, 2010 at 6:13 AM, sebb <[hidden email]> wrote:

> On 9 October 2010 12:33, Stephen Colebourne <[hidden email]> wrote:
>> No, the wildcard is used for a database search. Find me all names
>> matching "Foo*". This is not for [io].
>
> But where do humans get the idea that * and ? are wildcards?
>
>>
>> The oro code does look reasonable I'd say.
>
> Agreed, it could be adapted for Commons.
>
> However, whether the syntax it supports is universal I don't know.
>
> I think it's a useful addition, but the syntax needs to be carefully documented.
>
> And it would be useful if there were versions for different OSes, to
> allow filename matching using the standard for that OS.