Normalization from UNF to 3NF


I keep getting confused trying to normalize this data.

A company makes and sells a product online. There are ten types of Products, each having a code and price.

Their Sales Order department takes customer information (name, address, trimmer product code and purchase quantity) from a simple email posting.

The products are shipped to the customer and the shipments logged into a delivery book, giving customer name and address, shipper (Parcel force, TNT, ANC, Overnite, Royal Mail), date shipped, quantity and product code.

Products are sold on a 30 day return policy, so after one calendar month of shipping an invoice is sent to the customer. The invoice details each product line supplied, unit cost of each line and sub-total, as well as the shipment date and the overall invoice cost (grand total).

Customer payments (cheque numbers) are logged and matched against a copy of the invoice before banking. Every month the log is searched for outstanding debts and these are compiled into an outstanding invoice report.

So I've created some data to help me understand:
enter image description here

This data has helped me alot when it comes to understanding what I need to do in terms of my final database, but I'm finding normalizing it tricky as I've got so much information. So I came up with this:

enter image description here

I would appreciate it if someone could give me some feedback and advice as this attempt feels wrong.

Best Answer

The simplest way to think about normalizing data is that you want to get to where all the data in the table is either your primary key or dependent on the primary key. It's also important to know how your entities relate to each other, because you'll have a different model for one-to-many versus many-to-many relationships.

I think the relationship modeling is where you've gone wrong. You're moving every relationship into its own table, which is not appropriate for a one-to-one or one-to-many relationship. For instance, unless a single order can belong to multiple customers, the Customer_Order table is not the correct way to model this. What you'd want instead is a CustomerNo column in your Order table, with the CustomerNo as a foreign key to the Customer table. That identifies the customer who placed the order.

Same thing with your shipping info. A table that contains only ShipperNo and DateShipped doesn't do you any good. If the entire order must go out in a single shipment, then adding ShipperNo and DateShipped to the Order table would work. If an order may be split into multiple shipments, then I'd create a Shipment table that contains (ShipmentNo, OrderNo, ShipperNo, DateShipped), as well as a ShipmentItems table that links the Shipment to the actual products from the order that were sent in that shipment. Now the shipments that resulted from an order can be identified by selecting all Shipment rows where OrderNo matches the order you're viewing.

Your Invoice_Order table is appropriate if you may issue multiple invoices for a single order, else I'd just add OrderNo to the Invoice table as a foreign key.

Is that enough to help you see the general pattern? You've done a good job separating your entities, if we can get you over the hump on modeling the relationships between the entities.