SoftTree Technologies
Technical Support Forums
FR: Data generation matching a regexp pattern

 
gemisigo



Joined: 11 Mar 2010
Posts: 1566

Using the library data files for data generation, while it's the best for taking the elements from a predetermined list, is still pretty limited (see issues here), and the rest are either random/sequence numbers or random words picked from a list of words.

It would be nice to have another type of randomly generated data, one that's like random with alpha-numeric characters with windows but still matches a given regexp pattern. I know this (sort of) could be achieved by creating the list as library data but sometimes that list could be a really large set. A simple matching against a regexp pattern could work better.
Mon Aug 19, 2019 5:35 am
SysOp
Site Admin


Joined: 26 Nov 2006
Posts: 6924

I think in the current version it's better to create a lookup table with a random value set and use it as the data source instead of a fairly small library file. It's also easier to populate such a lookup table using SQL expressions.
Mon Aug 19, 2019 8:55 am
gemisigo



Joined: 11 Mar 2010
Posts: 1566

It probably is. It also takes considerably more effort to fill such tables, partly because SQL Server, for example, doesn't support regexp, and even if it did, I'd have to provide a procedure to create values for each pattern. The overhead multiplies with the number of columns that need this.
Mon Aug 19, 2019 12:28 pm
SysOp
Site Admin


Joined: 26 Nov 2006
Posts: 6924

Have you looked into the options available for random text generation? It doesn't have to come from a data dictionary. You can choose, for example, alpha-numeric characters with some prefix entered to ensure they are not too random and always start with an alpha character. Min and max values can be used to define the min and max length of the generated values, which could be the same.
In theory a regular expression can be used for internal value validation, but I guess that might be very inefficient, wasting a lot of CPU cycles. Maybe a pattern with fill-in placeholders can do a better job.
Mon Aug 19, 2019 2:51 pm
gemisigo



Joined: 11 Mar 2010
Posts: 1566

I have. Full random is quite enough for automated tests. For a human-involved test, random is not that good. And while the prefix is great for making it a bit less random and min/max is good for setting the size, reducing the "randomness" a little bit more (e.g. using a sequence) would be awesome. Alas, only random works with the prefix; as soon as I switch to sequence, my prefix is gone.

Yes, using regexp is always resource intensive. But I don't really care, and I don't think anyone would. It uses CPU cycles and machine time. I can live with that. On the other hand, if I have to come up with a design for a solution (not to mention having to implement it too), that wastes my time, time that I could spend designing and implementing stuff instead of figuring out how to create usable test data that will be thrown away. And that time is much more precious than CPU cycles. If you really think about it, CPU cycles are wasted by the gazillions even now, while I'm browsing the net, reading/writing on forums, and not whipping this machine to give me the best it could.

A pattern with placeholders would be the best. And actually, that's exactly what a regexp is: a pattern with (possibly multiple) placeholders. Using regexp is always resource intensive, but there are smart ways of using it. For example, for the pattern 'abcd\d{1,4}efgh' you shouldn't mindlessly generate characters from '000000000' to 'ZZZZZZZZZZZZ' and then try to match them against the regexp pattern. That definitely would be a waste of resources. But if you analyze the pattern, it's a pattern with placeholders. The 'abcd' at the start is constant. The 'efgh' at the end is too. You'd only have to generate characters (numbers, really) from 0 to 9999, replace the placeholder, and then match the result against the pattern.
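
To illustrate the idea in the paragraph above, here is a minimal Python sketch (not anything SQL Assistant does today). It hard-codes the example pattern, copies the constant 'abcd' and 'efgh' parts, fills the single placeholder with a number from 0 to 9999, and uses the regexp only as a final sanity check. The function names and the seed parameter are made up for the example.

```python
import random
import re

# The example pattern from the post: constant prefix, 1-4 digits, constant suffix.
PATTERN = re.compile(r"abcd\d{1,4}efgh")


def generate_value(rng):
    """Build one value by filling the digit placeholder between the constants."""
    number = rng.randint(0, 9999)          # the only part that actually varies
    candidate = f"abcd{number}efgh"        # constants are copied, never guessed
    # The regexp is used once as a check, not as a filter over random strings.
    if not PATTERN.fullmatch(candidate):
        raise ValueError(f"generated value does not match: {candidate!r}")
    return candidate


def generate_values(count, seed=0):
    """Return `count` values; `seed` (hypothetical) makes runs repeatable."""
    rng = random.Random(seed)
    return [generate_value(rng) for _ in range(count)]
```

Note that \d{1,4} also matches strings with leading zeros such as '007', which this sketch never produces; whether that matters depends on the test data you need.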

Now, I know that analyzing a regexp pattern can be a pretty tough task, as patterns can become arbitrarily complex (regexp isn't called write-only without a reason). So figuring out how to create values for filling the placeholders might be far more work than it's worth. So, you might only allow a (severely) reduced set of regexp expressions. Or...

I think a prefix with a sequence could make most of the users happy. You might be right, regexp might be overkill. Though I, for one, would definitely utilize the hell out of it. But I would feel quite content with having prefix + cycle.
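
For what it's worth, the prefix + sequence behaviour I have in mind is roughly the following Python sketch. The function name, the 'CUST-' prefix and the zero-padding width are just assumptions for illustration, not existing SQL Assistant options.

```python
from itertools import count, islice


def prefixed_sequence(prefix, start=1, width=4):
    """Yield prefix + zero-padded sequence number, indefinitely."""
    for n in count(start):
        yield f"{prefix}{n:0{width}d}"


# First three values for a hypothetical 'CUST-' prefix.
first_three = list(islice(prefixed_sequence("CUST-"), 3))
# first_three == ['CUST-0001', 'CUST-0002', 'CUST-0003']
```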
Tue Aug 20, 2019 8:11 am
SysOp
Site Admin


Joined: 26 Nov 2006
Posts: 6924

Thank you for your input. Before I submit a specific enhancement request, can I propose something like a pattern for randomized values, like this:

Fixed size pattern with 3 random digits in the first placeholders group and 2 random letters in the second placeholder group
ABC-[0-9][0-9][0-9]-NA.[a-Z][a-Z]

Variable with up to 3 random digits in the first placeholders group and up to 2 random letters in the second placeholder group
ABC-([0-9]3)-NA.([a-Z]2)


All characters not within [...] and ([...]n) patterns are regular predefined characters that repeat in every generated value; they are optional, of course.
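
As a rough sketch of how such patterns could be expanded (Python, for illustration only; this is not an existing feature), the code below treats [class] as exactly one random character and ([class]n) as a random run of characters. It assumes "[a-Z]" means any ASCII letter and "up to n" means between 1 and n characters; both readings, and all names, are assumptions.

```python
import random
import re
import string

# Assumed meaning of the two classes used in the proposal.
CLASSES = {"0-9": string.digits, "a-Z": string.ascii_letters}

# Matches "([class]n)" (variable length) or "[class]" (exactly one character).
TOKEN = re.compile(r"\(\[([^\]]+)\](\d+)\)|\[([^\]]+)\]")


def generate(pattern, rng):
    """Expand one pattern into one generated value."""
    out = []
    pos = 0
    for m in TOKEN.finditer(pattern):
        out.append(pattern[pos:m.start()])      # literal text is copied as-is
        if m.group(1) is not None:              # ([class]n): 1..n characters (assumed)
            chars = CLASSES[m.group(1)]
            length = rng.randint(1, int(m.group(2)))
        else:                                   # [class]: exactly one character
            chars = CLASSES[m.group(3)]
            length = 1
        out.append("".join(rng.choice(chars) for _ in range(length)))
        pos = m.end()
    out.append(pattern[pos:])                   # trailing literal text, if any
    return "".join(out)


rng = random.Random(0)
fixed_value = generate("ABC-[0-9][0-9][0-9]-NA.[a-Z][a-Z]", rng)
variable_value = generate("ABC-([0-9]3)-NA.([a-Z]2)", rng)
```

An unknown class such as [x-y] raises a KeyError here; a real implementation would need to decide which classes to support.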

Would that work for your use cases?
Tue Aug 20, 2019 8:45 am
gemisigo



Joined: 11 Mar 2010
Posts: 1566

Yes, it would. It is also much simpler than regexp could be. It definitely would be my favorite. Though I still believe Prefix + Sequence would be more (most?) popular.
Tue Aug 20, 2019 6:29 pm
All times are GMT - 4 Hours