MATLAB: Finding e-mail address which begins with 2 character and different domain names as array

finding e-mail address which begins with 2 character and different domain names as array

hello every one; i have 3 arrays and they are declared as the following:
dom_lists={'000' 'hotmal.com';'001' 'gmail.com';'010' 'yahoo.com'; '011' 'mail.com';'100' 'live.com';'101' 'myspace.com';'110' 'msn.com';'111' 'mynet.com'};
sub_g1={'aj';'ih';'vn';'hu';'eg';'is';'rd';'nt';'me';'ah';'zb';'en';'mm'};
sub_g2={'001';'101'; '101';'101'; '000'; '110'; '001'; '111'; '101'; '000'; '110'; '000'; '000'};
list_emialadre={'aakm@hotmail.com';'abomcn@hotmail.com';...............}; 5408 emails which are based on domain names, e.g. hotmail.com have 676 email address and also gmail.com have 676....
what i want is; generating the sub_g2's meaning or equivalent strings from domain_lists vector. after that, sub_g1's data is also used for finding from the email address which begins with the sub_g1's dual characters and ends sub_2 domain_lis. so help me for solving this problem.

Best Answer

  • If you split the problem into parts then it is much easier to solve. Here are the data call arrays, which I altered e.g. by adding one sample email address that actually matches the second pair+domain data (otherwise the two email addresses given do not match any, so we would not have a positive result):
    dom_lists = {'001' 'gmail.com'; '000' 'hotmal.com';'010' 'yahoo.com'; '011' 'mail.com';'100' 'live.com';'101' 'myspace.com';'110' 'msn.com';'111' 'mynet.com'};
    sub_g1 = {'aj'; 'ih'; 'vn'; 'hu'; 'eg'; 'is'; 'rd'; 'nt'; 'me'; 'ah'; 'zb'; 'en'; 'mm'};
    sub_g2 = {'001';'101'; '101';'101'; '000'; '110'; '001'; '111'; '101'; '000'; '110'; '000'; '000'};
    email_list = {'aakm@hotmail.com'; 'abomcn@hotmail.com'; 'ihzzz@myspace.com'};
    First convert the binary strings into numeric values with bin2dec, as this makes them easier to work with:
    sub_N = bin2dec(cell2mat(sub_g2));
    dom_N = bin2dec(cell2mat(dom_lists(:,1)));
    Then simply match these numeric values using bsxfun, and extract the correct domain strings:
    [row,~] = find(bsxfun(@eq,dom_N,sub_N.'));
    dom_g2 = dom_lists(row,2);
    Then create regexp regular expressions based on the character-pair and domain strings, and use these to locate the matching email addresses:
    rgx = strcat('^',sub_g1,'.*@',strrep(dom_g2,'.','\.'),'$');
    mtc = cellfun(@(s)regexp(email_list,s),rgx, 'UniformOutput',false);
    out = ~cellfun('isempty',[mtc{:}]);
    where out can be shown in the command window:
    >> out
    out =
    0 0 0 0 0 0 0 0 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0
    0 1 0 0 0 0 0 0 0 0 0 0 0
    out is a logical array where each row corresponds to one of the given email addresses (only three in the sample data!) and each column corresponds to the pair+domain data of sub_g1 and sub_g2 (thus thirteen columns). From this array we can see that the third email address matches the data of the second pair+domain data, which is what was stated at the beginning, so the algorithm has successfully detected this positive test case.