1. Saman.firstname.lastname@example.org
2. email@example.com
3. firstname.lastname@example.org
4. Saman.email@example.com
5. saman@firstname.lastname@example.org
6. saman@mail@com
7. saman.desilva@yahoo com
I want to print only the valid email addresses from this list, but I'm having trouble working out how. So far I have this script, but it still gives incorrect output:
sed -nr '/\w+@\w+\.\w+$/p' emaillist.txt
email@example.com
firstname.lastname@example.org
email@example.com
Saman.firstname.lastname@example.org
saman@email@example.com
First of all, a regular expression that matches all valid email addresses is notoriously complex. I’m going to assume, given the test data, that you’re aiming for a much simpler concept of email address validity.
One issue with your regex is that it isn't anchored to the beginning of the line, which is signified with ^. This lets invalid addresses, such as the one with an extra at sign (@) in the username, match, because the pattern simply matches the part of the line after the first @. If we add the ^, we get the following output:
$ sed -nr '/^\w+@\w+\.\w+$/p' emaillist.txt
firstname.lastname@example.org
Well, that's not right either. The problem now is that \w only matches a letter, digit, or underscore. Periods are the other “valid” non-alphanumeric character in the usernames in your test data, so we also need to tweak the pattern to allow them, and now we get the correct output:
$ sed -nr '/^(\w|\.)+@\w+\.\w+$/p' emaillist.txt
Saman.email@example.com
firstname.lastname@example.org
Saman.email@example.com
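If you want to try the two fixes in isolation, here is a minimal, self-contained sketch. The sample addresses, the temporary file, and the function names unanchored and valid_emails are made up for illustration (they are not your original list). It also assumes you are happy to swap \w for the POSIX class [[:alnum:]_], which should behave the same way for this data but does not depend on GNU extensions, and to use -E, which both GNU and BSD sed accept in place of -r.

```shell
# Hypothetical sample data, not the original email list.
demo_file="$(mktemp)"
cat > "$demo_file" <<'EOF'
jane.doe@example.com
jane@example.com
jane@doe@example.com
jane@example
jane.doe@example com
EOF

# No ^ anchor: the pattern only has to match the tail of a line,
# so "jane@doe@example.com" gets through via "doe@example.com".
unanchored() {
  sed -nE '/[[:alnum:]_]+@[[:alnum:]_]+\.[[:alnum:]_]+$/p' "$1"
}

# Anchored at both ends, with periods allowed in the username.
# [[:alnum:]_.] plays the role of (\w|\.) from the answer above.
valid_emails() {
  sed -nE '/^[[:alnum:]_.]+@[[:alnum:]_]+\.[[:alnum:]_]+$/p' "$1"
}

unanchored "$demo_file"     # prints the first three sample lines
valid_emails "$demo_file"   # prints only the first two sample lines
```

Putting the period inside a single bracket expression rather than an alternation group is only a stylistic choice; either form restricts the username to word characters and periods.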